web-fetch
Fetches web content, preferring markdown-native endpoints, then selector-based HTML extraction, with a bundled Bun fallback script.
```bash
npx skills add 0xbigboss/claude-code --skill web-fetch
```

## Before / After Comparison
**Before**

Without the Web Fetch skill, fetching web content usually means running `curl` by hand and then piping the HTML through an extra tool (such as `html2text`) to make it readable, a tedious process that is hard to automate.

```bash
# Manually fetch HTML and convert it to text
curl -s "https://example.com/article" | html2text
```

**After**

With the Web Fetch skill, content is fetched as Markdown first whenever the endpoint serves it natively, or extracted from HTML with CSS selectors. A bundled Bun fallback script covers cases where selectors fail, so the needed content is fetched efficiently and accurately.
```bash
# Fetch content with the Web Fetch skill
# (the skill handles the underlying curl / html2markdown calls)
python skills/web-fetch/scripts/fetch.py --url "https://example.com/article" --selector "article.content"

# Or fetch markdown-native content directly
python skills/web-fetch/scripts/fetch.py --url "https://example.com/markdown-content.md"
```

## SKILL.md
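The ordering described above (markdown-native first, then selectors, then the bundled script) can be expressed as a small shell wrapper. This is a minimal sketch, not part of the skill: `fetch_page` and `is_markdown_headers` are hypothetical names, and the default selector mirrors the skill's generic pattern.

```shell
# Hypothetical wrapper (not part of the skill) chaining the three tiers:
# markdown-native endpoint -> selector extraction -> bundled Bun fallback.

# Return 0 when a raw response-header blob advertises a markdown body.
is_markdown_headers() {
  printf '%s' "$1" \
    | awk -F': ' 'tolower($1)=="content-type"{print tolower($2)}' \
    | tr -d '\r' | tail -1 \
    | grep -q markdown
}

fetch_page() {
  local url="$1" selector="${2:-article,main,[role=main]}"
  if is_markdown_headers "$(curl -sIL "$url")"; then
    curl -sL "$url"                                  # tier 1: already markdown
  elif command -v html2markdown >/dev/null; then
    curl -sL "$url" \
      | html2markdown \
        --include-selector "$selector" \
        --exclude-selector "nav,header,footer,script,style"   # tier 2
  else
    bun ~/.claude/skills/web-fetch/fetch.ts "$url"   # tier 3: Bun fallback
  fi
}
```

Header parsing reuses the same `awk`/`tr`/`tail` pipeline as the skill's default workflow, so redirects (multiple header blocks) resolve to the final response's content type.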
web-fetch
### Web Content Fetching

Fetch web content in this order:

1. Prefer markdown-native endpoints (`content-type: text/markdown`)
2. Use selector-based HTML extraction for known sites
3. Use the bundled Bun fallback script when selectors fail

### Prerequisites

Verify required tools before extracting:

```bash
command -v curl >/dev/null || echo "curl is required"
command -v html2markdown >/dev/null || echo "html2markdown is required for HTML extraction"
command -v bun >/dev/null || echo "bun is required for fetch.ts fallback"
```

Install Bun dependencies for the bundled script:

```bash
cd ~/.claude/skills/web-fetch && bun install
```

### Default Workflow

Use this as the default flow for any URL:

```bash
URL=""
CONTENT_TYPE="$(curl -sIL "$URL" | awk -F': ' 'tolower($1)=="content-type"{print tolower($2)}' | tr -d '\r' | tail -1)"
if echo "$CONTENT_TYPE" | grep -q "markdown"; then
  curl -sL "$URL"
else
  curl -sL "$URL" \
    | html2markdown \
      --include-selector "article,main,[role=main]" \
      --exclude-selector "nav,header,footer,script,style"
fi
```

### Known Site Selectors

| Site | Include Selector | Exclude Selector |
| --- | --- | --- |
| platform.claude.com | `#content-container` | - |
| docs.anthropic.com | `#content-container` | - |
| developer.mozilla.org | `article` | - |
| github.com (docs) | `article` | `nav,.sidebar` |
| Generic | `article,main,[role=main]` | `nav,header,footer,script,style` |

Example:

```bash
curl -sL "" \
  | html2markdown \
    --include-selector "#content-container" \
    --exclude-selector "nav,header,footer"
```

### Finding the Right Selector

When a site isn't in the patterns list:

```bash
# Check what content containers exist
curl -s "" | grep -o '<article[^>]*>\|<main[^>]*>\|id="[^"]*content[^"]*"' | head -10

# Test a selector
curl -sL "" | html2markdown --include-selector "" | head -30

# Check line count
curl -sL "" | html2markdown --include-selector "" | wc -l
```

### Universal Fallback Script

When selectors produce poor output, run the bundled parser:

```bash
bun ~/.claude/skills/web-fetch/fetch.ts ""
```

If already in the skill directory:

```bash
bun fetch.ts ""
```

### Options Reference

```bash
--include-selector "CSS"   # Keep only matching elements
--exclude-selector "CSS"   # Remove matching elements
--domain "https://..."     # Convert relative links to absolute
```

### Troubleshooting

- **Empty output with selectors**: The page might be markdown-native. Check headers first: `curl -sIL "" | grep -i '^content-type:'`
- **Wrong content selected**: The site may have multiple article/main regions: `curl -s "" | grep -o '<article[^>]*>'`
- **html2markdown not found**: Install it, then retry selector-based extraction.
- **bun or script deps missing**: Run `cd ~/.claude/skills/web-fetch && bun install`.
- **Missing code blocks**: Check if the site uses non-standard code formatting.
- **Client-rendered content**: If the HTML only has "Loading..." placeholders, the content is JS-rendered. Neither curl nor the Bun script can extract it; use browser-based tools.

Repository: 0xbigboss/claude-code · GitHub Stars: 38 · Weekly Installs: 445 · First Seen: Jan 20, 2026
Security Audits: Gen Agent Trust Hub: Pass · Socket: Pass · Snyk: Warn
Installed on: opencode 404 · codex 395 · gemini-cli 390 · github-copilot 376 · cursor 373 · amp 350
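The "selectors produce poor output" condition that triggers the fallback can be automated by treating near-empty extraction output as a failure, in the spirit of the "check line count" step above. A minimal sketch: `check_extraction` is a hypothetical helper and `MIN_LINES=5` is an arbitrary threshold, not something the skill defines.

```shell
# MIN_LINES is an arbitrary heuristic threshold, not defined by the skill.
MIN_LINES=5

# Fail (return 1) when an extraction result file is too short to be
# real article content, so a caller can switch to the Bun fallback.
check_extraction() {
  local lines
  lines="$(wc -l < "$1" | tr -d '[:space:]')"
  if [ "$lines" -lt "$MIN_LINES" ]; then
    echo "extraction too short ($lines lines); try the Bun fallback" >&2
    return 1
  fi
}
```

A caller could then write `check_extraction out.md || bun ~/.claude/skills/web-fetch/fetch.ts "$URL"` to chain the two stages.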