doc-pipeline

by @claude-office-skills · v1.0.0
Rating: 4.4 (3 ratings)

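Build document processing pipelines that chain extraction, transformation, and conversion operations into reusable workflows, with data flowing automatically between stages.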

Tags: data-extraction, automation, information-retrieval, workflow, data-analysis
Installation
npx skills add claude-office-skills/skills --skill doc-pipeline
Before / After Comparison

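Before

Processing a batch of documents means running several steps by hand: download, format conversion, data extraction, cleaning, and loading into a database. Each step is a separate operation, which is error-prone and not reusable, so processing 100 documents takes a full day.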

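After

Define the document processing pipeline once, then run the whole flow automatically on every later batch. Data moves between the extract, transform, and load stages without manual handoffs, with support for parallel processing and error retries; 100 documents finish in 10 minutes with controllable quality.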

SKILL.md

doc-pipeline

Doc Pipeline Skill

Overview

This skill builds document processing pipelines: it chains multiple operations (extract, transform, convert) into reusable workflows, with each stage's output passed as input to the next.

How to Use

  • Describe what you want to accomplish

  • Provide any required input data or files

  • I'll execute the appropriate operations

Example prompts:

  • "PDF → Extract Text → Translate → Generate DOCX"

  • "Image → OCR → Summarize → Create Report"

  • "Excel → Analyze → Generate Charts → Create PPT"

  • "Multiple inputs → Merge → Format → Output"

Domain Knowledge

Pipeline Architecture

  Stage 1        Stage 2        Stage 3        Stage 4
┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐
│ Extract │ →  │Transform│ →  │   AI    │ →  │ Output  │
│   PDF   │    │  Data   │    │ Analyze │    │  DOCX   │
└────┬────┘    └────┬────┘    └────┬────┘    └────┬────┘
     │              │              │              │
     └──────────────┴──────────────┴──────────────┘
                       Data Flow

Pipeline DSL (Domain Specific Language)

# pipeline.yaml
name: contract-review-pipeline
description: Extract, analyze, and report on contracts

stages:
  - name: extract
    operation: pdf-extraction
    input: $input_file
    output: $extracted_text
    
  - name: analyze
    operation: ai-analyze
    input: $extracted_text
    prompt: "Review this contract for risks..."
    output: $analysis
    
  - name: report
    operation: docx-generation
    input: $analysis
    template: templates/review_report.docx
    output: $output_file
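
One way a spec like the one above could be executed is sketched below. This is an assumption about how the DSL might be interpreted, not the skill's actual loader: the `OPERATIONS` registry, `run_spec`, and the stub operations are all hypothetical. The spec is written as a Python dict that mirrors the YAML; reading it from `pipeline.yaml` would need a YAML parser such as PyYAML's `yaml.safe_load`.

```python
from typing import Any, Callable

# Hypothetical registry mapping operation names in the spec to callables.
# The lambdas are stubs standing in for real extraction/analysis/generation code.
OPERATIONS: dict[str, Callable[..., Any]] = {
    "pdf-extraction": lambda data, **opts: f"text of {data}",
    "ai-analyze": lambda data, **opts: {"prompt": opts["prompt"], "source": data},
    "docx-generation": lambda data, **opts: {"template": opts["template"], "body": data},
}

STRUCTURAL_KEYS = ("name", "operation", "input", "output")

def run_spec(spec: dict, context: dict[str, Any]) -> dict[str, Any]:
    """Run each stage in order, resolving $names through a shared context."""
    for stage in spec["stages"]:
        operation = OPERATIONS[stage["operation"]]
        # "$input_file" is looked up as context["input_file"]
        data = context[stage["input"].removeprefix("$")]
        # Every non-structural key (prompt, template, ...) becomes an option.
        options = {k: v for k, v in stage.items() if k not in STRUCTURAL_KEYS}
        context[stage["output"].removeprefix("$")] = operation(data, **options)
    return context

spec = {
    "name": "contract-review-pipeline",
    "stages": [
        {"name": "extract", "operation": "pdf-extraction",
         "input": "$input_file", "output": "$extracted_text"},
        {"name": "analyze", "operation": "ai-analyze",
         "input": "$extracted_text",
         "prompt": "Review this contract for risks...",
         "output": "$analysis"},
        {"name": "report", "operation": "docx-generation",
         "input": "$analysis",
         "template": "templates/review_report.docx",
         "output": "$output_file"},
    ],
}

result = run_spec(spec, {"input_file": "contract.pdf"})
print(result["output_file"]["template"])
```

Keeping every intermediate value in the shared context means any stage output can be inspected after a run, at the cost of holding all of them in memory.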

Python Implementation

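from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Stage:
    """A named step that takes the previous stage's output and returns its own."""
    name: str
    operation: Callable[[Any], Any]

class Pipeline:
    """Runs stages in order, feeding each stage's output to the next."""

    def __init__(self, name: str):
        self.name = name
        self.stages: list[Stage] = []

    def add_stage(self, name: str, operation: Callable[[Any], Any]) -> "Pipeline":
        self.stages.append(Stage(name, operation))
        return self  # Fluent API: allows chained add_stage() calls

    def run(self, input_data: Any) -> Any:
        data = input_data
        for stage in self.stages:
            print(f"Running stage: {stage.name}")
            data = stage.operation(data)
        return data

# Placeholder operations (hypothetical stubs; replace with real implementations)
def extract_pdf_text(path: str) -> str:
    return f"text extracted from {path}"

def analyze_with_ai(text: str) -> dict:
    return {"source_text": text, "risks": []}

def create_docx_report(analysis: dict) -> str:
    return "review_report.docx"

# Example usage
pipeline = (
    Pipeline("contract-review")
    .add_stage("extract", extract_pdf_text)
    .add_stage("analyze", analyze_with_ai)
    .add_stage("generate", create_docx_report)
)

result = pipeline.run("/path/to/contract.pdf")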
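
The "After" description mentions parallel processing across a batch of documents. A minimal sketch of that idea follows, assuming each document can be processed independently and the stage functions are safe to call from several threads. `run_batch` and `fake_run` are hypothetical names; any callable that takes a path, such as `pipeline.run`, could be passed in instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

def run_batch(run: Callable[[str], Any], paths: list[str],
              max_workers: int = 4) -> dict[str, Any]:
    """Apply `run` to every path concurrently; store a result or an exception per path."""
    results: dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {path: pool.submit(run, path) for path in paths}
        for path, future in futures.items():
            try:
                results[path] = future.result()
            except Exception as exc:  # one bad document should not stop the batch
                results[path] = exc
    return results

# Hypothetical stand-in for pipeline.run
def fake_run(path: str) -> str:
    if not path.endswith(".pdf"):
        raise ValueError(f"unsupported file: {path}")
    return f"report for {path}"

batch = run_batch(fake_run, ["a.pdf", "b.pdf", "notes.txt"])
for path, outcome in batch.items():
    print(path, "->", outcome)
```

Threads suit stages that mostly wait on I/O (file reads, API calls). For CPU-bound stages, `ProcessPoolExecutor` from the same module is the usual alternative.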

Advanced: Conditional Pipelines

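# Builds on the Pipeline class and typing imports above
class ConditionalPipeline(Pipeline):
    def add_conditional_stage(self, name: str, condition: Callable[[Any], bool],
                              if_true: Callable[[Any], Any],
                              if_false: Callable[[Any], Any]) -> "Pipeline":
        """Add a stage that picks one of two operations based on the data."""
        def conditional_op(data):
            if condition(data):
                return if_true(data)
            return if_false(data)
        return self.add_stage(name, conditional_op)

# Placeholder OCR step (hypothetical stub; replace with a real implementation)
def run_ocr(doc: dict) -> dict:
    return {**doc, "text": "text recognized by OCR"}

# Usage: the pipeline must be a ConditionalPipeline, and the data a dict
pipeline = ConditionalPipeline("conditional-demo")
pipeline.add_conditional_stage(
    "ocr_if_needed",
    condition=lambda d: d.get("has_images", False),
    if_true=run_ocr,
    if_false=lambda d: d,  # pass the data through unchanged
)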

Best Practices

  • Keep stages focused (single responsibility)

  • Use intermediate outputs for debugging

  • Implement stage-level error handling

  • Make pipelines configurable via YAML/JSON
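
Two of these practices, stage-level error handling and intermediate outputs, can be sketched together. The example below is illustrative only: `run_stages`, `StageError`, and the flaky stage are hypothetical, and the retry policy (a fixed number of attempts with a fixed delay) is an assumption rather than something the skill prescribes.

```python
import time
from typing import Any, Callable

class StageError(Exception):
    """Raised when a stage still fails after all retry attempts."""

def run_stages(stages: list[tuple[str, Callable[[Any], Any]]], data: Any,
               retries: int = 2, delay: float = 0.0) -> tuple[Any, dict[str, Any]]:
    """Run stages in order, retrying failures and recording each stage's output."""
    intermediates: dict[str, Any] = {}
    for name, operation in stages:
        for attempt in range(1, retries + 2):  # first try plus `retries` retries
            try:
                data = operation(data)
                break
            except Exception as exc:
                if attempt == retries + 1:
                    raise StageError(
                        f"stage '{name}' failed after {attempt} attempts") from exc
                time.sleep(delay)
        intermediates[name] = data  # kept so each stage's output can be inspected
    return data, intermediates

# Hypothetical flaky stage that fails on its first call only
calls = {"count": 0}

def flaky_upper(text: str) -> str:
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("transient failure")
    return text.upper()

final, trace = run_stages([("strip", str.strip), ("upper", flaky_upper)], "  hello ")
print(final)
print(trace)
```

Wrapping the failure in `StageError` names the stage that broke, while `from exc` keeps the original traceback available for debugging.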

Installation

# Install required dependencies
pip install python-docx openpyxl python-pptx reportlab jinja2

Resources

Weekly Installs: 303
Repository: claude-office-s…s/skills
GitHub Stars: 35
First Seen: Mar 5, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: claude-code (256), opencode (120), github-copilot (119), kimi-cli (117), amp (117), cline (117)

Statistics

Installs: 217
Rating: 4.4 / 5.0
Version: 1.0.0
Updated: March 30, 2026
Comparison cases: 1

Compatible Platforms

Claude Code

Timeline

Created: March 30, 2026
Last updated: March 30, 2026