
tavily-extract

by @tavily-ai

Extracts clean, distraction-free Markdown or plain-text content from specified URLs for reading, analysis, or further processing.

Data Extraction · Content Parsing · Markdown Conversion · Information Retrieval · Text Extraction
Installation
```shell
npx skills add tavily-ai/skills --skill tavily-extract
```

Before / After Comparison

Before

Copying content from web pages used to pull in ads, navigation elements, and broken formatting, forcing significant manual cleanup afterwards.

After

The Tavily Extract skill can intelligently extract clean Markdown or text content from a specified URL, eliminating tedious post-extraction cleanup.

SKILL.md

tavily-extract

Extract clean markdown or text content from one or more URLs.

Before running any command

If tvly is not found on PATH, install it first:

```shell
curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login
```

Do not skip this step or fall back to other tools.

See tavily-cli for alternative install methods and auth options.
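As a minimal sketch, the PATH check above can be scripted so a missing CLI is caught before any extract command runs (the install command and `tvly login` come from this section; nothing else is assumed):

```shell
# Detect whether the tvly CLI is on PATH before running any commands,
# and print guidance instead of failing mid-workflow.
if command -v tvly >/dev/null 2>&1; then
  echo "tvly: found"
else
  echo "tvly: missing (run the install script above, then 'tvly login')"
fi
```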

When to use

  • You have a specific URL and want its content

  • You need text from JavaScript-rendered pages

  • Step 2 in the workflow: search → extract → map → crawl → research

Quick start

```shell
# Single URL
tvly extract "https://example.com/article" --json

# Multiple URLs
tvly extract "https://example.com/page1" "https://example.com/page2" --json

# Query-focused extraction (returns only the relevant chunks)
tvly extract "https://example.com/docs" --query "authentication API" --chunks-per-source 3 --json

# JS-heavy pages
tvly extract "https://app.example.com" --extract-depth advanced --json

# Save to a file
tvly extract "https://example.com/article" -o article.md
```
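The `--json` flag emits structured output that can be post-processed. A sketch of pulling the extracted text out with `jq`; the `results[].raw_content` shape is an assumption modeled on Tavily's Extract API, so verify it against your CLI version's actual output:

```shell
# Parse a --json response with jq. The sample stands in for real
# `tvly extract ... --json` output; the field names are assumptions.
sample='{"results":[{"url":"https://example.com/article","raw_content":"# Title\n\nBody text"}]}'
echo "$sample" | jq -r '.results[0].raw_content'
```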

Options

| Option | Description |
| --- | --- |
| `--query` | Rerank chunks by relevance to this query |
| `--chunks-per-source` | Chunks per URL (1-5; requires `--query`) |
| `--extract-depth` | `basic` (default) or `advanced` (for JS pages) |
| `--format` | `markdown` (default) or `text` |
| `--include-images` | Include image URLs |
| `--timeout` | Max wait time (1-60 seconds) |
| `-o, --output` | Save output to file |
| `--json` | Structured JSON output |

Extract depth

| Depth | When to use |
| --- | --- |
| `basic` | Simple pages, fast — try this first |
| `advanced` | JS-rendered SPAs, dynamic content, tables |
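The "try `basic` first" guidance can be scripted as a fallback. This is a sketch only: the `results[0].raw_content` JSON path is an assumed shape for the `--json` output, not something this document specifies.

```shell
# Extract with the default basic depth; retry with advanced depth only
# when nothing usable came back (raw_content field name is an assumption).
url="https://app.example.com"
content=$(tvly extract "$url" --json | jq -r '.results[0].raw_content // empty')
if [ -z "$content" ]; then
  tvly extract "$url" --extract-depth advanced --json
fi
```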

Tips

  • Max 20 URLs per request — batch larger lists into multiple calls.

  • Use --query + --chunks-per-source to get only relevant content instead of full pages.

  • Try basic first, fall back to advanced if content is missing.

  • Set --timeout for slow pages (up to 60s).

  • If search results already contain the content you need (via --include-raw-content), skip the extract step.
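The 20-URL cap in the first tip can be handled by batching with `xargs`. A sketch under two assumptions: `tvly extract` accepts flags before the URL list, and `urls.txt` (a hypothetical file) holds one URL per line.

```shell
# Invoke tvly extract once per group of at most 20 URLs.
# `urls.txt` is a hypothetical input file, one URL per line.
xargs -n 20 tvly extract --json < urls.txt
```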

