tavily-extract

by @tavily-ai
Rating: 4.5 (21 ratings)

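Extracts clean, distraction-free Markdown or plain-text content from specified URLs, making it easier to read, analyze, or process further and faster to get at the information you need.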

Tags: data-extraction, content-parsing, markdown-conversion, information-retrieval, text-extraction
Installation
npx skills add tavily-ai/skills --skill tavily-extract

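Before / After Comparison

Before

Copying content from a web page used to drag along ads, navigation menus, and broken formatting, which then took considerable time to clean up by hand.

After

The Tavily Extract skill pulls clean Markdown or text content from a specified URL, eliminating the tedious manual cleanup.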

SKILL.md

tavily-extract

tavily extract

Extract clean markdown or text content from one or more URLs.

Before running any command

If tvly is not found on PATH, install it first:

curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login

Do not skip this step or fall back to other tools.

See tavily-cli for alternative install methods and auth options.
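
The PATH check described above can be sketched as a small guard. This is a minimal sketch, not part of the skill itself: `have_cmd` and `ensure_tvly` are hypothetical helper names, and the install command is copied unchanged from the step above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds (exit 0) if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: runs the documented install + login only when tvly is missing.
ensure_tvly() {
  if ! have_cmd tvly; then
    curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login
  fi
}
```

Calling `ensure_tvly` once at the top of a script keeps later `tvly extract` calls from failing with "command not found".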

When to use

  • You have a specific URL and want its content

  • You need text from JavaScript-rendered pages

  • Step 2 in the workflow: search → extract → map → crawl → research

Quick start

# Single URL
tvly extract "https://example.com/article" --json

# Multiple URLs
tvly extract "https://example.com/page1" "https://example.com/page2" --json

# Query-focused extraction (returns relevant chunks only)
tvly extract "https://example.com/docs" --query "authentication API" --chunks-per-source 3 --json

# JS-heavy pages
tvly extract "https://app.example.com" --extract-depth advanced --json

# Save to file
tvly extract "https://example.com/article" -o article.md

Options

Option                 Description
--query                Rerank chunks by relevance to this query
--chunks-per-source    Chunks per URL (1-5, requires --query)
--extract-depth        basic (default) or advanced (for JS pages)
--format               markdown (default) or text
--include-images       Include image URLs
--timeout              Max wait time (1-60 seconds)
-o, --output           Save output to file
--json                 Structured JSON output

Extract depth

Depth       When to use
basic       Simple pages, fast; try this first
advanced    JS-rendered SPAs, dynamic content, tables
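
The "basic first, then advanced" choice can be sketched as a wrapper function. This is only an illustration under stated assumptions: `extract_with_fallback` and the `TVLY_BIN` override are hypothetical names, and treating an empty output file as "content is missing" is an assumption, since the CLI may signal a failed extraction differently.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper: try basic depth, retry with advanced if nothing was saved.
# Usage: extract_with_fallback URL OUTPUT_FILE
extract_with_fallback() {
  local url="$1" out="$2"
  # TVLY_BIN is an assumed override for testing; it defaults to the real CLI.
  local tvly_bin="${TVLY_BIN:-tvly}"

  "$tvly_bin" extract "$url" --extract-depth basic -o "$out"

  # Assumption: an empty or absent output file means the basic pass found no content.
  if [ ! -s "$out" ]; then
    "$tvly_bin" extract "$url" --extract-depth advanced -o "$out"
  fi
}
```

For example, `extract_with_fallback "https://app.example.com" page.md` would only pay the cost of the advanced pass when the basic one leaves `page.md` empty.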

Tips

  • Max 20 URLs per request — batch larger lists into multiple calls.

  • Use --query + --chunks-per-source to get only relevant content instead of full pages.

  • Try basic first, fall back to advanced if content is missing.

  • Set --timeout for slow pages (up to 60s).

  • If search results already contain the content you need (via --include-raw-content), skip the extract step.
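
The 20-URL limit from the first tip can be handled with a small batching loop. The following is a sketch, not an official helper: `extract_in_batches` and the `TVLY_BIN` override are hypothetical names, and only the `extract` subcommand and `--json` flag come from the text above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: split a long URL list into calls of at most 20 URLs each.
# Usage: extract_in_batches URL [URL ...]
extract_in_batches() {
  # TVLY_BIN is an assumed override for testing; it defaults to the real CLI.
  local tvly_bin="${TVLY_BIN:-tvly}"
  local batch_size=20   # documented maximum number of URLs per request

  while [ "$#" -gt 0 ]; do
    # Take up to batch_size URLs from the front of the argument list.
    local batch=("${@:1:batch_size}")
    "$tvly_bin" extract "${batch[@]}" --json
    # Drop the URLs that were just sent.
    shift "${#batch[@]}"
  done
}
```

Each call prints its own output, so a list of 45 URLs produces three separate results (20, 20, and 5 URLs) rather than one merged document.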

See also

Weekly installs: 280
Repository: tavily-ai/skills
GitHub stars: 95
First seen: 2 days ago
Security audits: Gen Agent Trust Hub (Fail), Socket (Pass), Snyk (Fail)
Installed on: codex (275), opencode (274), cursor (274), kimi-cli (273), gemini-cli (273), amp (273)


Statistics

Installs: 7.2K
Rating: 4.5 / 5.0
Version: not listed
Last updated: May 22, 2026
Comparison cases: 1

User Ratings

4.5 (21 ratings)
5 stars: 71%
4 stars: 29%
3 stars: 0%
2 stars: 0%
1 star: 0%


Compatible Platforms

Claude Code

Timeline

Created: March 18, 2026
Last updated: May 22, 2026