---
id: ssh2-parallel-web-extract
name: "parallel-web-extract"
url: https://skills.yangsir.net/skill/ssh2-parallel-web-extract
author: parallel-web
domain: ai-data-management-analysis
tags: ["web-data-extraction", "parallel-processing", "distributed-scraping", "data-pipelines"]
install_count: 7800
rating: 4.50 (30 reviews)
github: https://github.com/parallel-web/parallel-agent-skills
---

# parallel-web-extract

> 高效并行地从任意URL（包括网页、API等）提取内容，适用于大规模数据采集和信息处理任务。

**Stats**: 7,800 installs · 4.5/5 (30 reviews)

## Before / After 对比

### 并行高效提取网页内容，加速数据获取

## Readme

# URL Extraction

Extract content from: $ARGUMENTS

## Command

Choose a short, descriptive filename based on the URL or content (e.g., `vespa-docs`, `react-hooks-api`). Use lowercase with hyphens, no spaces.

```bash
parallel-cli extract "$ARGUMENTS" --json -o "/tmp/$FILENAME.md"
```

Options if needed:
- `--objective "focus area"` to focus on specific content

## Response format

Return content as:

**[Page Title](URL)**

Then the extracted content verbatim, with these rules:
- Keep content verbatim - do not paraphrase or summarize
- Parse lists exhaustively - extract EVERY numbered/bulleted item
- Strip only obvious noise: nav menus, footers, ads
- Preserve all facts, names, numbers, dates, quotes

After the response, mention the output file path (`/tmp/$FILENAME.md`) so the user knows it's available for follow-up questions.

## Setup

If `parallel-cli` is not found, install and authenticate:

```bash
curl -fsSL https://parallel.ai/install.sh | bash
```

If unable to install that way, install via pipx instead:

```bash
pipx install "parallel-web-tools[cli]"
pipx ensurepath
```

Then authenticate:

```bash
parallel-cli login
```

Or set an API key: `export PARALLEL_API_KEY="your-key"`


---
*Source: https://skills.yangsir.net/skill/ssh2-parallel-web-extract*
*Markdown mirror: https://skills.yangsir.net/api/skill/ssh2-parallel-web-extract/markdown*