---
id: daily-tavily-crawl
name: "tavily-crawl"
url: https://skills.yangsir.net/skill/daily-tavily-crawl
author: tavily-ai
domain: ai-data-management-analysis
tags: ["web-crawling", "data-extraction", "information-retrieval", "website-scraping", "content-extraction"]
install_count: 6900
rating: 4.50 (82 reviews)
github: https://github.com/tavily-ai/skills
---

# tavily-crawl

> 能够高效爬取整个网站，从多个页面中提取所需内容，并支持保存数据，为深度数据分析和信息收集提供支持。

**Stats**: 6,900 installs · 4.5/5 (82 reviews)

## Before / After 对比

### 中文

## Readme

# tavily-crawl

# tavily crawl

Crawl a website and extract content from multiple pages. Supports saving each page as a local markdown file.

## Before running any command

If `tvly` is not found on PATH, install it first:

```
curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login

```

Do not skip this step or fall back to other tools.

See [tavily-cli](https://github.com/tavily-ai/skills/blob/HEAD/skills/tavily-crawl/../tavily-cli/SKILL.md) for alternative install methods and auth options.

## When to use

- You need content from many pages on a site (e.g., all `/docs/`)

- You want to download documentation for offline use

- Step 4 in the [workflow](https://github.com/tavily-ai/skills/blob/HEAD/skills/tavily-crawl/../tavily-cli/SKILL.md): search → extract → map → **crawl** → research

## Quick start

```
# Basic crawl
tvly crawl "https://docs.example.com" --json

# Save each page as a markdown file
tvly crawl "https://docs.example.com" --output-dir ./docs/

# Deeper crawl with limits
tvly crawl "https://docs.example.com" --max-depth 2 --limit 50 --json

# Filter to specific paths
tvly crawl "https://example.com" --select-paths "/api/.*,/guides/.*" --exclude-paths "/blog/.*" --json

# Semantic focus (returns relevant chunks, not full pages)
tvly crawl "https://docs.example.com" --instructions "Find authentication docs" --chunks-per-source 3 --json

```

## Options

Option
Description

`--max-depth`
Levels deep (1-5, default: 1)

`--max-breadth`
Links per page (default: 20)

`--limit`
Total pages cap (default: 50)

`--instructions`
Natural language guidance for semantic focus

`--chunks-per-source`
Chunks per page (1-5, requires `--instructions`)

`--extract-depth`
`basic` (default) or `advanced`

`--format`
`markdown` (default) or `text`

`--select-paths`
Comma-separated regex patterns to include

`--exclude-paths`
Comma-separated regex patterns to exclude

`--select-domains`
Comma-separated regex for domains to include

`--exclude-domains`
Comma-separated regex for domains to exclude

`--allow-external / --no-external`
Include external links (default: allow)

`--include-images`
Include images

`--timeout`
Max wait (10-150 seconds)

`-o, --output`
Save JSON output to file

`--output-dir`
Save each page as a .md file in directory

`--json`
Structured JSON output

## Crawl for context vs. data collection

**For agentic use** (feeding results to an LLM):

Always use `--instructions` + `--chunks-per-source`. Returns only relevant chunks instead of full pages — prevents context explosion.

```
tvly crawl "https://docs.example.com" --instructions "API authentication" --chunks-per-source 3 --json

```

**For data collection** (saving to files):

Use `--output-dir` without `--chunks-per-source` to get full pages as markdown files.

```
tvly crawl "https://docs.example.com" --max-depth 2 --output-dir ./docs/

```

## Tips

- **Start conservative** — `--max-depth 1`, `--limit 20` — and scale up.

- **Use `--select-paths`** to focus on the section you need.

- **Use map first** to understand site structure before a full crawl.

- **Always set `--limit`** to prevent runaway crawls.

## See also

- [tavily-map](https://github.com/tavily-ai/skills/blob/HEAD/skills/tavily-crawl/../tavily-map/SKILL.md) — discover URLs before deciding to crawl

- [tavily-extract](https://github.com/tavily-ai/skills/blob/HEAD/skills/tavily-crawl/../tavily-extract/SKILL.md) — extract individual pages

- [tavily-search](https://github.com/tavily-ai/skills/blob/HEAD/skills/tavily-crawl/../tavily-search/SKILL.md) — find pages when you don't have a URL

Weekly Installs292Repository[tavily-ai/skills](https://github.com/tavily-ai/skills)GitHub Stars95First Seen2 days agoSecurity Audits[Gen Agent Trust HubFail](/tavily-ai/skills/tavily-crawl/security/agent-trust-hub)[SocketPass](/tavily-ai/skills/tavily-crawl/security/socket)[SnykFail](/tavily-ai/skills/tavily-crawl/security/snyk)Installed oncodex286opencode285cursor285kimi-cli284gemini-cli284amp284

---
*Source: https://skills.yangsir.net/skill/daily-tavily-crawl*
*Markdown mirror: https://skills.yangsir.net/api/skill/daily-tavily-crawl/markdown*