---
id: daily-batch-processor
name: "batch-processor"
url: https://skills.yangsir.net/skill/daily-batch-processor
author: claude-office-skills
domain: data-analysis
tags: ["batch-processing", "document-conversion", "automation", "data-extraction", "productivity"]
install_count: 2900
rating: 4.40 (25 reviews)
github: https://github.com/claude-office-skills/skills
---

# batch-processor

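> Efficiently batch-process hundreds of documents, with support for format conversion, data extraction, and analysis, plus parallel execution and progress tracking.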

**Stats**: 2,900 installs · 4.4/5 (25 reviews)

## Before / After Comparison

### Batch Processing Efficiency

**Before**:

Opening files one at a time, converting formats by hand, copying data, and recording results: processing 500 documents takes 2-3 days.

**After**:

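Processing documents in parallel batches, with automatic conversion and extraction and real-time progress tracking: 500 documents are processed in 4 hours.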

| Metric | Before | After | Change |
|---|---|---|---|
| Processing time | 48 hours / 500 docs | 4 hours / 500 docs | -91.7% |

## Readme

# Batch Processor Skill

## Overview

This skill enables efficient bulk processing of documents - convert, transform, extract, or analyze hundreds of files with parallel execution and progress tracking.

## How to Use

- Describe what you want to accomplish

- Provide any required input data or files

- I'll execute the appropriate operations

**Example prompts:**

- "Convert 100 PDFs to Word documents"

- "Extract text from all images in a folder"

- "Batch rename and organize files"

- "Mass update document headers/footers"
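As an illustration of what a prompt such as "Batch rename and organize files" might translate to, here is a minimal sketch that sorts files into subfolders named after their extension. The helper name `organize_by_extension`, the `dry_run` flag, and the folder layout are assumptions made for this example; they are not part of the skill.

```python
import shutil
from pathlib import Path


def organize_by_extension(input_dir: str, dry_run: bool = True) -> list:
    """Plan (and optionally perform) moving each file into a per-extension subfolder.

    Hypothetical helper for illustration; returns a list of (source, target) pairs.
    """
    root = Path(input_dir)
    moves = []
    # sorted() materializes the listing first, so folders created below
    # do not affect the iteration.
    for path in sorted(root.iterdir()):
        if not path.is_file():
            continue  # leave existing subfolders untouched
        # "report.PDF" -> "pdf"; files without an extension go to "no_extension"
        folder = path.suffix.lower().lstrip(".") or "no_extension"
        target = root / folder / path.name
        moves.append((path, target))
        if not dry_run:
            target.parent.mkdir(exist_ok=True)
            shutil.move(str(path), str(target))
    return moves
```

Running with `dry_run=True` first returns the planned moves without touching any files, which makes it easy to review the plan before applying it to a large folder.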

## Domain Knowledge

### Batch Processing Patterns

```
Input: [file1, file2, ..., fileN]
         │
         ▼
    ┌─────────────┐
    │  Parallel   │  ← Process multiple files concurrently
    │  Workers    │
    └─────────────┘
         │
         ▼
Output: [result1, result2, ..., resultN]

```

### Python Implementation

```
from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path

from tqdm import tqdm  # third-party: pip install tqdm


def process_file(file_path: Path) -> dict:
    """Process a single file."""
    # Your processing logic here
    return {"path": str(file_path), "status": "success"}


def batch_process(input_dir: str, pattern: str = "*.*", max_workers: int = 4) -> list:
    """Process all matching files in a directory in parallel."""
    # Keep regular files only; a glob such as "*.*" can also match directories.
    files = [p for p in Path(input_dir).glob(pattern) if p.is_file()]
    results = []

    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its file so failures can be attributed.
        futures = {executor.submit(process_file, f): f for f in files}

        for future in tqdm(as_completed(futures), total=len(files)):
            file = futures[future]
            try:
                results.append(future.result())
            except Exception as e:
                # Record the failure and continue with the remaining files.
                results.append({"path": str(file), "status": "error", "error": str(e)})

    return results


# Usage. The guard is needed because worker processes re-import this module
# on platforms that use the "spawn" start method (Windows, macOS).
if __name__ == "__main__":
    results = batch_process("/documents/invoices", "*.pdf", max_workers=8)
    print(f"Processed {len(results)} files")
```
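The list returned by `batch_process` mixes successful and failed entries, so a short summary step is often useful once a run finishes. The sketch below is one possible way to do that; `summarize_results` is a hypothetical helper, and it assumes that a failed entry is any result dict carrying an `"error"` key, as in the code above.

```python
from collections import Counter


def summarize_results(results: list) -> dict:
    """Count successes and failures in a list of per-file result dicts."""
    # An entry counts as failed when it carries an "error" key.
    counts = Counter("failed" if "error" in r else "succeeded" for r in results)
    failed_paths = [r["path"] for r in results if "error" in r]
    return {
        "total": len(results),
        "succeeded": counts["succeeded"],
        "failed": counts["failed"],
        "failed_paths": failed_paths,
    }


# Example with hand-written results (illustrative data, not real output)
sample = [
    {"path": "a.pdf", "status": "success"},
    {"path": "b.pdf", "error": "file is encrypted"},
]
print(summarize_results(sample))
# {'total': 2, 'succeeded': 1, 'failed': 1, 'failed_paths': ['b.pdf']}
```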

### Error Handling & Resume

```
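import json
import os
from pathlib import Path


class BatchProcessor:
    def __init__(self, checkpoint_file: str = "checkpoint.json"):
        self.checkpoint_file = checkpoint_file
        self.processed = self._load_checkpoint()

    def _load_checkpoint(self) -> dict:
        """Return results saved by a previous run, or an empty dict."""
        if Path(self.checkpoint_file).exists():
            with open(self.checkpoint_file, encoding="utf-8") as f:
                return json.load(f)
        return {}

    def _save_checkpoint(self) -> None:
        # Write to a temporary file and then swap it in, so an interruption
        # mid-write cannot leave a truncated checkpoint behind.
        tmp_file = self.checkpoint_file + ".tmp"
        with open(tmp_file, "w", encoding="utf-8") as f:
            json.dump(self.processed, f)
        os.replace(tmp_file, self.checkpoint_file)

    def process(self, files: list, processor_func) -> dict:
        """Run processor_func on every file not yet in the checkpoint.

        processor_func must return a JSON-serializable dict.
        """
        for file in files:
            if str(file) in self.processed:
                continue  # Skip already processed (including earlier failures)

            try:
                result = processor_func(file)
                self.processed[str(file)] = {"status": "success", **result}
            except Exception as e:
                self.processed[str(file)] = {"status": "error", "error": str(e)}

            self._save_checkpoint()  # Resume-safe

        return self.processed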
```
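Note that `BatchProcessor.process` skips every file that already has a checkpoint entry, including files whose earlier attempt ended with `"status": "error"`. If failed files should be retried on the next run, one option is to remove their entries first. The following is a minimal sketch of that idea; `clear_failed_entries` is a hypothetical helper, and it assumes the `{path: {"status": ...}}` checkpoint layout written by the class above.

```python
import json
from pathlib import Path


def clear_failed_entries(checkpoint_file: str = "checkpoint.json") -> list:
    """Drop entries with status "error" from the checkpoint and return their paths.

    Hypothetical helper; assumes the {path: {"status": ...}} checkpoint layout.
    """
    path = Path(checkpoint_file)
    if not path.exists():
        return []  # nothing recorded yet, so nothing to retry
    with open(path, encoding="utf-8") as f:
        processed = json.load(f)

    failed = [name for name, entry in processed.items() if entry.get("status") == "error"]
    for name in failed:
        del processed[name]

    with open(path, "w", encoding="utf-8") as f:
        json.dump(processed, f)
    return failed
```

Calling this before constructing a new `BatchProcessor` means the next `process` call treats the previously failed files as unprocessed, while successful entries are still skipped.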

## Best Practices

- **Use progress bars (tqdm) for user feedback**

- **Implement checkpointing for long jobs**

- **Set reasonable worker counts (about one per CPU core for CPU-bound work)**

- **Log failures for later review**
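Two of the practices above can be sketched with the standard library alone: deriving a worker count from the number of CPU cores, and writing failures to a log file. Both helper names (`default_worker_count`, `log_failures`), the `reserve` parameter, and the log file name are assumptions made for this example, and failed results are assumed to be dicts carrying an `"error"` key.

```python
import logging
import os


def default_worker_count(reserve: int = 1) -> int:
    """Suggest a worker count: CPU cores minus a reserve, never below 1."""
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, cores - reserve)


def log_failures(results: list, log_file: str = "batch_failures.log") -> int:
    """Append one line per failed result to log_file and return the failure count."""
    failures = [r for r in results if "error" in r]

    logger = logging.getLogger("batch_failures")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_file, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    try:
        for r in failures:
            logger.error("%s: %s", r.get("path", "<unknown>"), r["error"])
    finally:
        # Detach and close the handler so repeated calls do not duplicate lines.
        logger.removeHandler(handler)
        handler.close()
    return len(failures)
```

Reserving one core keeps the machine responsive during long jobs. For I/O-bound work, such as reading many small files from a network share, a higher count may be reasonable.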

## Installation

```
# Install required dependencies (tqdm provides the progress bars used above)
pip install python-docx openpyxl python-pptx reportlab jinja2 tqdm
```

## Resources

- [Custom Repository](https://github.com/claude-office-skills/skills)

- [Claude Office Skills Hub](https://github.com/claude-office-skills/skills)

---
*Source: https://skills.yangsir.net/skill/daily-batch-processor*
*Markdown mirror: https://skills.yangsir.net/api/skill/daily-batch-processor/markdown*