tavily-crawl
Start a structured web crawl from a base URL, using a tree-like approach to follow internal links. Control crawl depth and breadth, and focus the crawl on specific site sections, domains, or content types for targeted data extraction.
Instructions
A powerful web crawler that starts a structured crawl from a specified base URL. The crawler expands from that point like a tree, following internal links across pages. You can control how deep and wide it goes and guide it to focus on specific sections of the site.
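For illustration, here is a minimal sketch of a tavily-crawl invocation built as a plain Python dictionary. The parameter names come from the Input Schema below; the `call_tool` helper is hypothetical and stands in for whatever MCP client you use, and passing the regex filters as a list is an assumption based on the plural "patterns" in the schema.

```python
def call_tool(name: str, arguments: dict) -> dict:
    """Hypothetical stand-in for an MCP client's tool-call method."""
    print(f"would call {name} with:", arguments)
    return {}

# Arguments mirror the Input Schema below; url and instructions are required.
crawl_args = {
    "url": "https://example.com",                       # root URL to begin the crawl
    "instructions": "Collect the API reference pages",  # natural language guidance
    "max_depth": 2,         # how far from the base URL the crawler may explore
    "max_breadth": 20,      # max links followed per page
    "limit": 100,           # stop after processing this many links
    "select_paths": [r"/docs/.*", r"/api/v1.*"],  # keep only matching paths
    "allow_external": False,  # stay on the starting domain
}

result = call_tool("tavily-crawl", crawl_args)
```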
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| allow_external | No | Whether to allow following links that go to external domains | |
| categories | No | Filter URLs using predefined categories such as documentation, blog, or API | |
| extract_depth | No | Advanced extraction retrieves more data, including tables and embedded content, with higher success, but may increase latency | basic |
| format | No | The format of the extracted web page content. `markdown` returns content in Markdown format; `text` returns plain text and may increase latency. | markdown |
| instructions | Yes | Natural language instructions for the crawler | |
| limit | No | Total number of links the crawler will process before stopping | |
| max_breadth | No | Maximum number of links to follow per level of the tree (i.e., per page) | |
| max_depth | No | Maximum depth of the crawl; defines how far from the base URL the crawler can explore | |
| select_domains | No | Regex patterns to restrict crawling to specific domains or subdomains (e.g., `^docs\.example\.com$`) | |
| select_paths | No | Regex patterns to restrict crawling to URLs with specific path patterns (e.g., `/docs/.*`, `/api/v1.*`) | |
| url | Yes | Root URL to begin the crawl | |
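Because `select_domains` and `select_paths` take regular expressions, anchoring matters: `^docs\.example\.com$` matches that exact host and nothing else. The sketch below shows how such filters behave, using Python's `re` module and the example patterns from the table above; the filtering logic is illustrative, not the crawler's actual implementation.

```python
import re
from urllib.parse import urlparse

# Example patterns taken from the schema above.
domain_pattern = re.compile(r"^docs\.example\.com$")  # exact host only
path_patterns = [re.compile(p) for p in (r"/docs/.*", r"/api/v1.*")]

def would_follow(url: str) -> bool:
    """Illustrative filter: keep a URL only if its domain and path both match."""
    parts = urlparse(url)
    if not domain_pattern.search(parts.netloc):
        return False
    return any(p.search(parts.path) for p in path_patterns)

for url in (
    "https://docs.example.com/docs/intro",   # kept: domain and path match
    "https://docs.example.com/blog/post-1",  # dropped: no path pattern matches
    "https://www.example.com/docs/intro",    # dropped: domain anchor fails
):
    print(url, "->", would_follow(url))
```

Note that an unanchored pattern like `example\.com` would also match `notexample.com.evil.net`, so anchored patterns are the safer choice when restricting a crawl.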