invoke_firecrawl_crawlhtml
Initiate an asynchronous web crawl that extracts HTML content starting from a specified URL. Results are uploaded to an S3 location, and an optional limit caps the number of pages crawled.
Instructions
Start an asynchronous web crawl job using Firecrawl to retrieve HTML content.
Args:
url: URL to crawl
s3_uri: S3 URI where results will be uploaded
limit: Maximum number of pages to crawl (default: 100)
Returns:
Dictionary with crawl job information including the job ID
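As an illustration of the arguments above, the following sketch builds a request payload for this tool. It assumes the tool is exposed through an MCP server that accepts JSON-RPC `tools/call` requests, which this page does not state explicitly. The helper name `build_tool_call`, the example URL, and the bucket name are hypothetical.

```python
import json


def build_tool_call(url, s3_uri, limit=100, request_id=1):
    """Build a JSON-RPC 'tools/call' request for invoke_firecrawl_crawlhtml.

    Assumption: the server speaks MCP over JSON-RPC 2.0. Only the three
    arguments (url, s3_uri, limit) come from the tool's documentation.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "invoke_firecrawl_crawlhtml",
            "arguments": {"url": url, "s3_uri": s3_uri, "limit": limit},
        },
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    request = build_tool_call(
        "https://example.com", "s3://example-bucket/crawls/", limit=25
    )
    print(json.dumps(request, indent=2))
```

Because the crawl is asynchronous, the response is expected to carry job information (including the job ID) rather than the crawled HTML itself; the pages would be read from the S3 location once the job completes.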
Input Schema
Name | Required | Description | Default |
---|---|---|---|
limit | No | Maximum number of pages to crawl | 100 |
s3_uri | Yes | S3 URI where results will be uploaded | |
url | Yes | URL to crawl | |
Input Schema (JSON Schema)
{
  "properties": {
    "limit": {
      "default": 100,
      "title": "Limit",
      "type": "integer"
    },
    "s3_uri": {
      "title": "S3 Uri",
      "type": "string"
    },
    "url": {
      "title": "Url",
      "type": "string"
    }
  },
  "required": [
    "url",
    "s3_uri"
  ],
  "title": "invoke_firecrawl_crawlhtmlArguments",
  "type": "object"
}
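The schema above can be checked on the client side before a call is made. The sketch below is a minimal, hand-rolled check that covers only the features this schema uses (`required`, `type`, `default`); it is not a general JSON Schema validator, and the function name `validate_arguments` is hypothetical.

```python
# The schema as documented above (titles omitted; they do not affect validation).
SCHEMA = {
    "properties": {
        "limit": {"default": 100, "type": "integer"},
        "s3_uri": {"type": "string"},
        "url": {"type": "string"},
    },
    "required": ["url", "s3_uri"],
    "type": "object",
}

# Mapping from the JSON Schema type names used here to Python types.
_JSON_TYPES = {"integer": int, "string": str}


def validate_arguments(arguments, schema=SCHEMA):
    """Return a copy of `arguments` with defaults applied.

    Raises ValueError for missing required keys and TypeError for values
    whose type does not match the schema.
    """
    missing = [key for key in schema["required"] if key not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")

    result = {}
    for name, spec in schema["properties"].items():
        if name in arguments:
            value = arguments[name]
        elif "default" in spec:
            value = spec["default"]
        else:
            continue
        expected = _JSON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, but JSON Schema does not
        # treat true/false as integers, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise TypeError(f"{name!r} must be of type {spec['type']}")
        result[name] = value
    return result


if __name__ == "__main__":
    # Hypothetical values; `limit` is omitted, so the default is applied.
    print(validate_arguments({"url": "https://example.com", "s3_uri": "s3://example-bucket/crawls/"}))
```

A full validator library would also handle features such as formats and nested objects; for a flat three-field schema like this one, the explicit checks keep the example free of dependencies.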