Name: tavily-crawl AI Agent Skill
Author: tavily-ai

tavily-crawl

Crawl a website and extract content from multiple pages. Supports saving each page as a local markdown file.

Before running any command

If tvly is not found on PATH, install it first:

curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login

Do not skip this step or fall back to other tools.

See tavily-cli for alternative install methods and auth options.
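
The check described above can be wrapped in a small guard so the install only runs when tvly is actually missing. This is a minimal sketch: ensure_tvly is a hypothetical helper name, not part of the tvly CLI, and it assumes the install script puts tvly on PATH for the current shell.

```shell
# Sketch: install tvly only when it is missing from PATH.
# ensure_tvly is a hypothetical helper name, not part of the tvly CLI.
ensure_tvly() {
  if command -v tvly >/dev/null 2>&1; then
    return 0  # already available, nothing to do
  fi
  # Same install-and-login command as shown above
  curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login
}
```

Call ensure_tvly once before the first tvly crawl command in a script.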

When to use

You need content from many pages on a site (e.g., all /docs/)
You want to download documentation for offline use
Step 4 in the workflow: search → extract → map → crawl → research

Quick start

# Basic crawl
tvly crawl "https://docs.example.com" --json

# Save each page as a markdown file
tvly crawl "https://docs.example.com" --output-dir ./docs/

# Deeper crawl with limits
tvly crawl "https://docs.example.com" --max-depth 2 --limit 50 --json

# Filter to specific paths
tvly crawl "https://example.com" --select-paths "/api/.*,/guides/.*" --exclude-paths "/blog/.*" --json

# Semantic focus (returns relevant chunks, not full pages)
tvly crawl "https://docs.example.com" --instructions "Find authentication docs" --chunks-per-source 3 --json

Options

Option                            Description
--max-depth                       Levels deep (1-5, default: 1)
--max-breadth                     Links per page (default: 20)
--limit                           Total pages cap (default: 50)
--instructions                    Natural language guidance for semantic focus
--chunks-per-source               Chunks per page (1-5, requires --instructions)
--extract-depth                   basic (default) or advanced
--format                          markdown (default) or text
--select-paths                    Comma-separated regex patterns to include
--exclude-paths                   Comma-separated regex patterns to exclude
--select-domains                  Comma-separated regex for domains to include
--exclude-domains                 Comma-separated regex for domains to exclude
--allow-external / --no-external  Include external links (default: allow)
--include-images                  Include images
--timeout                         Max wait (10-150 seconds)
-o, --output                      Save JSON output to file
--output-dir                      Save each page as a .md file in directory
--json                            Structured JSON output
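
Several options accept only a bounded range, so a wrapper script may want to validate its inputs before calling the CLI. The sketch below checks the documented ranges for --max-depth (1-5) and --timeout (10-150 seconds). The names in_range and checked_crawl are hypothetical helpers, and the sketch assumes --timeout takes a plain number of seconds.

```shell
# in_range VALUE MIN MAX: succeeds only if VALUE is an integer within [MIN, MAX].
in_range() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;  # reject empty or non-numeric input
  esac
  [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# checked_crawl URL MAX_DEPTH TIMEOUT: validate against the documented ranges, then crawl.
# Hypothetical wrapper; it uses only flags listed in the options above.
checked_crawl() {
  in_range "$2" 1 5    || { echo "max-depth must be 1-5" >&2; return 2; }
  in_range "$3" 10 150 || { echo "timeout must be 10-150 seconds" >&2; return 2; }
  tvly crawl "$1" --max-depth "$2" --timeout "$3" --json
}
```

For example, checked_crawl "https://docs.example.com" 2 60 runs the crawl, while a depth of 9 is rejected before any request is made.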

Crawl for context vs. data collection

For agentic use (feeding results to an LLM):

Always combine --instructions with --chunks-per-source. The crawl then returns only the relevant chunks instead of full pages, which keeps the results from overflowing the LLM's context window.

tvly crawl "https://docs.example.com" --instructions "API authentication" --chunks-per-source 3 --json

For data collection (saving to files):

Use --output-dir without --chunks-per-source to get full pages as markdown files.

tvly crawl "https://docs.example.com" --max-depth 2 --output-dir ./docs/
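
The two patterns above can be kept side by side in one wrapper so a script picks the mode explicitly. This is only a sketch: crawl_docs and its mode names are hypothetical, and the flag values are copied from the two examples above.

```shell
# crawl_docs MODE URL ARG: hypothetical wrapper around the two patterns above.
#   context: ARG is the instruction text; prints relevant chunks as JSON
#   files:   ARG is the output directory; saves full pages as markdown files
crawl_docs() {
  case "$1" in
    context) tvly crawl "$2" --instructions "$3" --chunks-per-source 3 --json ;;
    files)   tvly crawl "$2" --max-depth 2 --output-dir "$3" ;;
    *)       echo "usage: crawl_docs context|files URL ARG" >&2; return 2 ;;
  esac
}
```

Usage: crawl_docs context "https://docs.example.com" "API authentication" for agentic use, or crawl_docs files "https://docs.example.com" ./docs/ for data collection.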

Tips

Start conservative (--max-depth 1, --limit 20) and scale up.
Use --select-paths to focus on the section you need.
Use map first to understand site structure before a full crawl.
Always set --limit to prevent runaway crawls.
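
One way to apply these tips together is a wrapper that defaults to the conservative settings and lets you raise them deliberately. In this sketch, careful_crawl and the CRAWL_DEPTH and CRAWL_LIMIT environment variables are hypothetical names chosen for illustration; only the tvly flags come from the options listed earlier.

```shell
# careful_crawl URL PATHS: conservative first pass using the limits from the tips.
# CRAWL_DEPTH and CRAWL_LIMIT are hypothetical environment variables for scaling up later.
careful_crawl() {
  tvly crawl "$1" \
    --max-depth "${CRAWL_DEPTH:-1}" \
    --limit "${CRAWL_LIMIT:-20}" \
    --select-paths "$2" \
    --json
}
```

A first run such as careful_crawl "https://docs.example.com" "/api/.*" stays at depth 1 with a 20-page cap. Exporting CRAWL_DEPTH=2 and CRAWL_LIMIT=50 before the next call widens the crawl once the results look right.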
