
tavily-crawl

by @tavily-ai · v1.0.0

Efficiently crawls an entire website, extracts the content you need from multiple pages, and supports saving the data, enabling in-depth data analysis and information gathering.

Tags: Web Crawling, Data Extraction, Information Retrieval, Website Scraping, Content Extraction
Installation
npx skills add tavily-ai/skills --skill tavily-crawl
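The skill's SKILL.md below also requires the `tvly` CLI itself to be on PATH before any crawl runs. A minimal guard sketch (the `need_install` helper name is hypothetical; the install one-liner is the one from the skill's own instructions, printed here rather than executed blindly):

```shell
# Check whether the tvly CLI is already on PATH; if it is missing,
# print the install command from the skill's SKILL.md.
need_install() {
  ! command -v tvly >/dev/null 2>&1
}

if need_install; then
  echo 'run: curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login'
fi
```

Printing instead of piping straight into `bash` keeps the step auditable in scripted environments.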

Before / After Comparison

Before

Previously, collecting a site's content meant manually visiting and copying from multiple pages, or writing complex crawler scripts: a time-consuming process with a high technical barrier.

After

The Tavily Crawl skill intelligently crawls an entire website and extracts content from multiple pages, greatly simplifying data collection and improving efficiency.

SKILL.md

tavily-crawl

Crawl a website and extract content from multiple pages. Supports saving each page as a local markdown file.

Before running any command

If `tvly` is not found on PATH, install it first:

```
curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login
```

Do not skip this step or fall back to other tools. See tavily-cli for alternative install methods and auth options.

When to use

- You need content from many pages on a site (e.g., all /docs/)
- You want to download documentation for offline use
- Step 4 in the workflow: search → extract → map → crawl → research

Quick start

```
# Basic crawl
tvly crawl "https://docs.example.com" --json

# Save each page as a markdown file
tvly crawl "https://docs.example.com" --output-dir ./docs/

# Deeper crawl with limits
tvly crawl "https://docs.example.com" --max-depth 2 --limit 50 --json

# Filter to specific paths
tvly crawl "https://example.com" --select-paths "/api/.*,/guides/.*" --exclude-paths "/blog/.*" --json

# Semantic focus (returns relevant chunks, not full pages)
tvly crawl "https://docs.example.com" --instructions "Find authentication docs" --chunks-per-source 3 --json
```

Options

| Option | Description |
| --- | --- |
| --max-depth | Levels deep (1-5, default: 1) |
| --max-breadth | Links per page (default: 20) |
| --limit | Total pages cap (default: 50) |
| --instructions | Natural-language guidance for semantic focus |
| --chunks-per-source | Chunks per page (1-5, requires --instructions) |
| --extract-depth | basic (default) or advanced |
| --format | markdown (default) or text |
| --select-paths | Comma-separated regex patterns to include |
| --exclude-paths | Comma-separated regex patterns to exclude |
| --select-domains | Comma-separated regex for domains to include |
| --exclude-domains | Comma-separated regex for domains to exclude |
| --allow-external / --no-external | Include external links (default: allow) |
| --include-images | Include images |
| --timeout | Max wait (10-150 seconds) |
| -o, --output | Save JSON output to file |
| --output-dir | Save each page as a .md file in directory |
| --json | Structured JSON output |

Crawl for context vs. data collection

For agentic use (feeding results to an LLM): always use --instructions + --chunks-per-source. This returns only relevant chunks instead of full pages, which prevents context explosion.

```
tvly crawl "https://docs.example.com" --instructions "API authentication" --chunks-per-source 3 --json
```

For data collection (saving to files): use --output-dir without --chunks-per-source to get full pages as markdown files.

```
tvly crawl "https://docs.example.com" --max-depth 2 --output-dir ./docs/
```

Tips

- Start conservative (--max-depth 1, --limit 20) and scale up.
- Use --select-paths to focus on the section you need.
- Use map first to understand site structure before a full crawl.
- Always set --limit to prevent runaway crawls.

See also

- tavily-map: discover URLs before deciding to crawl
- tavily-extract: extract individual pages
- tavily-search: find pages when you don't have a URL

Registry stats

- Weekly installs: 270
- Repository: tavily-ai/skills
- GitHub stars: 95
- First seen: 2 days ago
- Security audits: Gen Agent Trust Hub: Fail; Socket: Pass; Snyk: Fail
- Installed on: codex 265, opencode 264, cursor 264, kimi-cli 263, gemini-cli 263, amp 263
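The tips in SKILL.md recommend starting conservative (--max-depth 1, --limit 20) and always setting --limit. A minimal sketch of that habit, wrapping the crawl invocation so the limits are explicit defaults (the `crawl_conservative` helper name is hypothetical; the flags are the ones documented in the options table above):

```shell
# Hypothetical helper: build a tvly crawl command with conservative
# defaults (--max-depth 1, --limit 20), per the skill's own tips.
# It prints the command for review instead of running it immediately,
# which makes runaway crawls easy to spot before they start.
crawl_conservative() {
  url="$1"; depth="${2:-1}"; limit="${3:-20}"
  cmd="tvly crawl \"$url\" --max-depth $depth --limit $limit --json"
  printf '%s\n' "$cmd"   # review, then run with: eval "$cmd"
}

crawl_conservative "https://docs.example.com"
# prints: tvly crawl "https://docs.example.com" --max-depth 1 --limit 20 --json
```

Scaling up is then a deliberate choice, e.g. `crawl_conservative "https://docs.example.com" 2 50`.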

User Reviews (0)

No reviews yet. Be the first to write one.

Statistics

Installs: 0
Rating: 0.0 / 5.0
Version: 1.0.0
Updated: March 18, 2026
Before/After examples: 1


Compatible Platforms

Claude Code

Timeline

Created: March 18, 2026
Last updated: March 18, 2026