parallel-findall

by @parallel-web
Rating: 4.5 (120)

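Discovers and lists entities (companies, people, products, and so on) that match a natural-language description, and presents them as a structured list. Use it when the user needs to find entities of a specific type; it is distinct from web search and from deep research reports.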

Tags: entity-discovery, information-retrieval, data-extraction, structured-data, natural-language-processing
Source: GitHub
Installation
git clone https://github.com/parallel-web/parallel-agent-skills.git

Before / After comparison

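Before

Entity information is searched for by hand across multiple websites and directories. This is slow and laborious, and it is easy to miss entities or include irrelevant ones, so the resulting list is incomplete or inaccurate.

After

A structured entity list is generated automatically from a natural-language description, which cuts the time spent on manual searching and cleanup and improves the accuracy and coverage of the list.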

SKILL.md

FindAll: Entity Discovery

Find: $ARGUMENTS

Requires parallel-cli ≥ 0.3.0 (the findall command was added in 0.3.0). If parallel-cli findall errors with no such command or similar, tell the user to run parallel-cli update (or pipx upgrade parallel-web-tools if installed via pipx), then retry.
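The version requirement above can be checked programmatically. The following is a minimal sketch, assuming that `parallel-cli --version` prints a string containing a dotted `x.y.z` version number; the exact output format is not documented here, and the helper names `parse_version` and `supports_findall` are hypothetical.

```python
import re

# findall was added in parallel-cli 0.3.0 (stated in the requirement above).
MIN_VERSION = (0, 3, 0)


def parse_version(text):
    """Extract the first dotted x.y.z version number found in `text`."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def supports_findall(version_output):
    """Return True if the reported version is at least 0.3.0.

    `version_output` is assumed to be the text printed by
    `parallel-cli --version`; the real format may differ.
    """
    return parse_version(version_output) >= MIN_VERSION
```

Comparing integer tuples rather than strings avoids treating `0.10.0` as older than `0.3.0`.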

When to use this skill

Use FindAll when the user wants a structured list of entities matching a description, not webpages or a narrative answer.

| User asks for… | Use |
| --- | --- |
| "Find all X that…" / "List every Y…" | parallel-findall (this skill) |
| Webpage results / quick answers / current info | parallel-web-search |
| Narrative report / analysis / "research X" | parallel-deep-research |
| Add fields to a list you already have | parallel-data-enrichment |

If the user already has a list and just wants to add fields, this is the wrong skill — use parallel-data-enrichment.

Step 1: Start the run

parallel-cli findall run "$ARGUMENTS" --no-wait --json

Defaults: generator core, match limit 10. Stick with core unless the user has a reason to escalate:

  • -g pro: the most thorough generator (slower, costlier). Use it when the user asks for "comprehensive" coverage or when matches are sparse on core
  • -g base — fastest, but markedly lower quality. Often returns query-echo entities (e.g., directory pages, the literal query string), entries with no URL, or category placeholders. Only use if the user explicitly asks for a quick scan and accepts noise; otherwise prefer core
  • -n 50 — return up to 50 matched entities (5–1000 allowed)
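The options above can be validated before a run is started. This is a sketch only: `build_run_command` is a hypothetical helper, and it uses just the flags listed above (`-g`, `-n`, `--no-wait`, `--json`). It assumes the flags may follow the objective in any order.

```python
ALLOWED_GENERATORS = ("base", "core", "pro")
DEFAULT_GENERATOR = "core"
DEFAULT_MATCH_LIMIT = 10


def build_run_command(objective, generator=DEFAULT_GENERATOR,
                      match_limit=DEFAULT_MATCH_LIMIT):
    """Build the argv list for `parallel-cli findall run`.

    Rejects generators other than base/core/pro and match limits
    outside the documented 5-1000 range.
    """
    if generator not in ALLOWED_GENERATORS:
        raise ValueError(f"unknown generator: {generator!r}")
    if not 5 <= match_limit <= 1000:
        raise ValueError("match limit must be between 5 and 1000")

    command = ["parallel-cli", "findall", "run", objective,
               "--no-wait", "--json"]
    # Only pass flags that differ from the documented defaults.
    if generator != DEFAULT_GENERATOR:
        command += ["-g", generator]
    if match_limit != DEFAULT_MATCH_LIMIT:
        command += ["-n", str(match_limit)]
    return command
```

Returning an argv list rather than a shell string means the objective never needs shell quoting when it is passed to `subprocess.run`.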

If the user wants to exclude known entities (e.g., "find competitors but not Google or OpenAI"):

parallel-cli findall run "$ARGUMENTS" --no-wait --json \
    --exclude '[{"name":"Google","url":"google.com"},{"name":"OpenAI","url":"openai.com"}]'
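Hand-writing the `--exclude` JSON is error-prone when entity names contain quotes. A small sketch of generating it instead, where `build_exclude_arg` is a hypothetical helper and the `name`/`url` keys are taken from the example above:

```python
import json


def build_exclude_arg(entities):
    """Serialize (name, url) pairs into the JSON array passed to --exclude."""
    payload = [{"name": name, "url": url} for name, url in entities]
    # Compact separators reproduce the single-line form used above.
    return json.dumps(payload, separators=(",", ":"))


exclude = build_exclude_arg([("Google", "google.com"), ("OpenAI", "openai.com")])
```

The resulting string can be appended to the argv list as `["--exclude", exclude]`.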

Tip — preview the schema first if the objective is ambiguous: parallel-cli findall ingest "$ARGUMENTS" --json shows the entity type and match conditions the API inferred, so you can refine wording before paying for a run.

Parse the JSON output to extract the findall_id and any monitoring URL. Tell the user:

  • A FindAll run has been started
  • Approximate run time (minutes for core, longer for pro)
  • They can keep working while it runs
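Extracting the run identifier could look like the sketch below. It assumes the `--json` output is a single JSON object with a top-level `findall_id` key. The key name comes from the step above, but the overall shape is an assumption. The name of the monitoring URL field is not specified, so the hypothetical `parse_run_output` helper simply collects any top-level string value that looks like an HTTP(S) URL.

```python
import json


def parse_run_output(stdout):
    """Return (findall_id, candidate_urls) from the run command's JSON output.

    Assumes a top-level object containing `findall_id`; adjust if the
    real payload nests it differently.
    """
    payload = json.loads(stdout)
    findall_id = payload.get("findall_id")
    if not findall_id:
        raise ValueError("findall_id missing from run output")
    urls = [
        value for value in payload.values()
        if isinstance(value, str) and value.startswith(("http://", "https://"))
    ]
    return findall_id, urls
```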

Step 2: Poll for results

Choose a descriptive filename (e.g., series-a-ai-2026, charlotte-roofers). Use lowercase with hyphens, no spaces.

parallel-cli findall poll "$FINDALL_ID" -o "/tmp/$FILENAME.json" --timeout 540
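The filename convention (lowercase, hyphens, no spaces) can be enforced with a short helper. This is an illustrative sketch; `slugify` is a hypothetical name and is not part of parallel-cli.

```python
import re


def slugify(label):
    """Turn a free-form label into a lowercase, hyphen-separated filename stem."""
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
    if not slug:
        raise ValueError(f"cannot derive a filename from {label!r}")
    return slug


output_path = f"/tmp/{slugify('Series A AI 2026')}.json"
```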

Important:

  • Use --timeout 540 (9 minutes) to stay within tool execution limits
  • Do NOT pass --json for large result sets — it will flood context. -o saves the full results to disk

If the poll times out

Re-run the same parallel-cli findall poll command to continue waiting. The run continues server-side regardless of the client-side timeout.
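The re-run-on-timeout behaviour can be sketched as a bounded retry loop. This assumes that `parallel-cli findall poll` exits with a non-zero status when it times out. The source does not state this, and a non-zero status could also mean a real error, so inspect the output before retrying blindly. `build_poll_command` and `poll_until_done` are hypothetical helpers.

```python
import subprocess


def build_poll_command(findall_id, filename, timeout_s=540):
    """Build the argv list for the poll command shown in Step 2."""
    return ["parallel-cli", "findall", "poll", findall_id,
            "-o", f"/tmp/{filename}.json", "--timeout", str(timeout_s)]


def poll_until_done(command, max_attempts=5, run=subprocess.run):
    """Re-run the same poll command until it succeeds.

    Returns the number of attempts used. ASSUMPTION: exit code 0 means
    results were written and any other code means the poll timed out.
    """
    for attempt in range(1, max_attempts + 1):
        result = run(command)
        if result.returncode == 0:
            return attempt
    raise TimeoutError(f"poll did not finish after {max_attempts} attempts")
```

The `run` parameter exists so the loop can be exercised with a stub instead of the real CLI.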

Response format

Before presenting matches, filter the results for obvious noise:

  • Drop entries with empty/missing url
  • Drop entries whose name echoes the user's query (e.g., literal "YC W25 batch companies in developer tools") — those are search-result placeholders, not real entities
  • Drop entries whose url is a third-party directory or profile page rather than the entity's own domain. Concretely: drop URLs on linkedin.com, ycombinator.com/companies/..., crunchbase.com, pitchbook.com, generic news/blog posts about the entity, etc. The URL should be something the entity itself owns (its product site, docs, or marketing site)

If filtering removes a meaningful share of matches, mention this to the user and suggest re-running with -g pro or a higher -n.

Sanity-check -g base results. The base generator can hallucinate categorical attributes (e.g., return a YC S22 company as a YC W25 match). The filter rules above only catch URL/name shape, not factual correctness. If the user's query has a falsifiable attribute (a specific batch, year, geography, etc.), spot-check the kept entries against the source URL and flag any that don't fit. Recommend re-running with -g core (or higher) if either multiple kept entries fail the spot-check or noise filtering dropped a meaningful share of the matched set (say, ≥40%) — both indicate base isn't producing reliable results for this query.
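The escalation rule can be written as a small predicate. This sketch reads "multiple" as two or more failed spot-checks and uses the 40% noise share suggested above as the default threshold; both are judgment calls rather than fixed limits, and `should_escalate_from_base` is a hypothetical name.

```python
def should_escalate_from_base(matched, dropped_as_noise, failed_spot_checks,
                              noise_threshold=0.40):
    """Return True if a base run should be repeated with -g core or higher.

    matched:            total entities returned by the run
    dropped_as_noise:   entities removed by the noise filter
    failed_spot_checks: kept entities that contradicted their source URL
    """
    noise_share = dropped_as_noise / matched if matched else 0.0
    return failed_spot_checks >= 2 or noise_share >= noise_threshold
```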

Present the remaining (real) entities as a markdown table or list. Lead with the count, then list each entity with its name, URL, and a one-line description if available. Cite each entity with its source URL.
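One way to render the presentation described above is sketched below. It assumes each kept entity is a dict with `name` and `url` keys and an optional `description` key. The `description` key and the `render_table` helper are assumptions for illustration.

```python
def render_table(entities):
    """Render kept entities as a markdown table, leading with the count."""
    lines = [
        f"Matched entities: {len(entities)}",
        "",
        "| Name | URL | Description |",
        "| --- | --- | --- |",
    ]
    for entity in entities:
        cells = [
            entity.get("name") or "",
            entity.get("url") or "",
            entity.get("description") or "",
        ]
        # Escape pipes so cell text cannot break the table layout.
        escaped = [str(cell).replace("|", "\\|") for cell in cells]
        lines.append("| " + " | ".join(escaped) + " |")
    return "\n".join(lines)
```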

Tell the user:

  • How many entities were matched (and how many were filtered as noise, if any)
  • The full results path (/tmp/$FILENAME.json)
  • That they can:
    • Add fields to these results, e.g.:

      parallel-cli findall enrich $FINDALL_ID '{"properties":{"ceo":{"type":"string"},"employee_count":{"type":"number"}}}'
      

      The schema is a JSON Schema-style object with properties mapping field names → {type, description?}.

    • Get more matches: parallel-cli findall extend $FINDALL_ID 50
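The two follow-up commands can be assembled in code so that the schema JSON is always well formed. This is a sketch with hypothetical helper names; it uses only the `enrich` and `extend` subcommands and the `properties` → `{type, description?}` shape described above.

```python
import json


def build_enrich_schema(fields):
    """Build the schema JSON from {field_name: (json_type, description_or_None)}."""
    properties = {}
    for name, (json_type, description) in fields.items():
        prop = {"type": json_type}
        if description:
            prop["description"] = description  # description is optional
        properties[name] = prop
    return json.dumps({"properties": properties}, separators=(",", ":"))


def build_enrich_command(findall_id, fields):
    """Build the argv list for `parallel-cli findall enrich`."""
    return ["parallel-cli", "findall", "enrich", findall_id,
            build_enrich_schema(fields)]


def build_extend_command(findall_id, extra_matches):
    """Build the argv list for `parallel-cli findall extend`."""
    return ["parallel-cli", "findall", "extend", findall_id, str(extra_matches)]
```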

Setup

Requires parallel-cli (installed and authenticated). If parallel-cli --version fails, or if a later command fails with an authentication error, tell the user to see https://docs.parallel.ai/integrations/cli and stop.


Statistics

Installs: 6.6K
Rating: 4.5 / 5.0
Version: not listed
Last updated: May 22, 2026
Comparison cases: 1

User ratings: 4.5 (120)

5 stars: 37%
4 stars: 43%
3 stars: 13%
2 stars: 5%
1 star: 2%


Compatible platforms

claude-code

Timeline

Created: May 21, 2026
Last updated: May 22, 2026