Automated Document Layout Analysis

Name: layout-analyzer AI Agent Skill
Rating: 4.3 (20 reviews)
Author: claude-office-skills



Document layout analysis with surya: identifies document structure and elements such as tables and images.

Tags: layout, analyzer, ai, llm, agents

Install

npx skills add claude-office-skills/skills --skill layout-analyzer


Before / After Comparison


Before

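Without the `layout-analyzer` skill, document layout analysis is tedious, largely manual work. Analysts have to open each document image or PDF, inspect it page by page, and identify and mark elements such as text blocks, tables, images, and headings, usually by drawing bounding boxes in an image editor and classifying each element by hand. For complex documents where reading order matters, they must also trace the text flow manually, which is slow and prone to inconsistency from subjective judgment. At scale, this approach is inefficient and costly, and the accuracy and consistency of the results are hard to guarantee.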

After

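With the `layout-analyzer` skill, layout analysis becomes largely automated. The user provides a document image or PDF and specifies which layout elements to detect (for example text, tables, headings, or images). The skill uses `surya` to recognize the document structure, detect each layout element, and return its bounding box, type, and confidence score; it also determines the reading order of complex pages. This greatly reduces manual effort: work that took hours or even days by hand can shrink to seconds per page (actual speed depends on hardware and document complexity), with better efficiency and more consistent results, freeing people for higher-value tasks.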

SKILL.md

layout-analyzer

Layout Analyzer Skill

Overview

This skill enables document layout analysis using surya, an open-source document OCR and layout analysis toolkit. It detects text blocks, tables, figures, and headings, and determines the reading order of elements in complex documents.

How to Use

Provide the document image or PDF
Specify what layout elements to detect
I'll analyze the structure and return detected regions

Example prompts:

"Analyze the layout of this document page"
"Detect all tables and text blocks in this image"
"Determine the reading order for this PDF page"
"Find headings and paragraphs in this document"

Domain Knowledge

surya Fundamentals

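from surya.layout import LayoutPredictor
from PIL import Image

# Load image; converting to RGB handles grayscale or palette images consistently
image = Image.open("document.png").convert("RGB")

# Detect layout elements: the predictor takes a list of images and returns
# one result per image. Newer surya releases may require constructor arguments
# (such as a FoundationPredictor); check the surya README for your version.
layout_predictor = LayoutPredictor()
layout_result = layout_predictor([image])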

Layout Element Types

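Element          Description
Text             Regular paragraph text
Title            Document/section titles
Section-header   Section headings
List-item        Bulleted/numbered list items
Table            Tabular data
Figure           Images/diagrams
Caption          Figure/table captions
Footnote         Footnotes
Formula          Mathematical equations
Page-header      Running page headers
Page-footer      Running page footers

The exact label set and spelling can differ between surya releases; check the `label` values your installed version returns.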
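To work with the labels above, it can help to bucket a page's layout elements by type. This minimal sketch assumes each element exposes `label` and `bbox` attributes, as in the surya examples in this document; `group_by_label` is a hypothetical helper, and `SimpleNamespace` objects stand in for real surya results.

```python
from collections import defaultdict
from types import SimpleNamespace

def group_by_label(elements):
    """Bucket layout elements by their `label` attribute (hypothetical helper)."""
    groups = defaultdict(list)
    for element in elements:
        groups[element.label].append(element)
    return dict(groups)

# Stand-in elements mimicking surya layout boxes (label + bbox only)
page_elements = [
    SimpleNamespace(label="Title", bbox=[50, 40, 550, 80]),
    SimpleNamespace(label="Text", bbox=[50, 100, 550, 300]),
    SimpleNamespace(label="Table", bbox=[50, 320, 550, 600]),
    SimpleNamespace(label="Text", bbox=[50, 620, 550, 760]),
]

groups = group_by_label(page_elements)
for label, items in groups.items():
    print(f"{label}: {len(items)}")
```

With real output you would call `group_by_label(layout_result[0].bboxes)`.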

Text Detection

from surya.detection import DetectionPredictor
from PIL import Image

# Initialize detector
detector = DetectionPredictor()

# Load image
image = Image.open("document.png")

# Detect text regions
results = detector([image])

# Access results
for page_result in results:
    for bbox in page_result.bboxes:
        print(f"Text region: {bbox.bbox}")
        print(f"Confidence: {bbox.confidence}")
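Text detection returns regions without an explicit reading order. For simple single-column pages, a common rough ordering is top-to-bottom, then left-to-right within a line. This sketch assumes `[x0, y0, x1, y1]` boxes (like `bbox.bbox` above) and a hypothetical `line_tolerance` in pixels for deciding that two boxes share a line; it is a heuristic, not surya's own ordering.

```python
def sort_boxes_naturally(boxes, line_tolerance=10):
    """Order [x0, y0, x1, y1] boxes top-to-bottom, then left-to-right.

    Boxes whose top edges are within `line_tolerance` pixels of the topmost
    remaining box are treated as one line and sorted by their left edge.
    """
    ordered = []
    remaining = sorted(boxes, key=lambda b: (b[1], b[0]))
    while remaining:
        line_top = remaining[0][1]
        line = [b for b in remaining if abs(b[1] - line_top) <= line_tolerance]
        ordered.extend(sorted(line, key=lambda b: b[0]))
        remaining = [b for b in remaining if abs(b[1] - line_top) > line_tolerance]
    return ordered

boxes = [
    [300, 102, 500, 120],  # right half of line 1 (top edge slightly lower)
    [10, 100, 290, 118],   # left half of line 1
    [10, 140, 480, 158],   # line 2
]
print(sort_boxes_naturally(boxes))
```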

Layout Analysis

from surya.layout import LayoutPredictor
from PIL import Image

# Initialize layout predictor
layout_predictor = LayoutPredictor()

# Analyze layout
image = Image.open("document.png")
layout_results = layout_predictor([image])

# Process results
for page_result in layout_results:
    for element in page_result.bboxes:
        print(f"Type: {element.label}")
        print(f"Bbox: {element.bbox}")
        print(f"Confidence: {element.confidence}")
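To store results or pass them to another tool, the layout output can be flattened into plain dictionaries and serialized with the standard `json` module. This sketch assumes elements expose `label`, `bbox`, and `confidence` as in the loop above; `layout_to_dicts` is a hypothetical name, and stand-in objects replace real surya output.

```python
import json
from types import SimpleNamespace

def layout_to_dicts(page_result):
    """Flatten one page's layout boxes into JSON-friendly dicts."""
    return [
        {
            "type": element.label,
            "bbox": [float(v) for v in element.bbox],
            "confidence": element.confidence,
        }
        for element in page_result.bboxes
    ]

# Stand-in for one surya page result
page = SimpleNamespace(bboxes=[
    SimpleNamespace(label="Section-header", bbox=[72.0, 90.0, 400.0, 110.0], confidence=0.97),
    SimpleNamespace(label="Text", bbox=[72.0, 120.0, 540.0, 300.5], confidence=0.93),
])

records = layout_to_dicts(page)
print(json.dumps(records, indent=2))
```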

Reading Order Detection

from surya.layout import LayoutPredictor
from PIL import Image

# Run layout analysis
layout_predictor = LayoutPredictor()
image = Image.open("document.png").convert("RGB")
layout_results = layout_predictor([image])

# surya's layout boxes carry a `position` field giving each element's
# reading-order index, so sorting by it yields the reading order
for page_result in layout_results:
    ordered = sorted(page_result.bboxes, key=lambda element: element.position)
    for i, element in enumerate(ordered):
        print(f"{i+1}. {element.label}: {element.bbox}")
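When a page's predicted order looks wrong, or when boxes come from a source without any ordering, a simple geometric fallback for two-column pages is: treat very wide boxes as full-width separators, and within each horizontal band read the left column top-to-bottom before the right. This is a heuristic sketch, not how surya computes reading order; `two_column_order` and the 0.75 width ratio are illustrative assumptions, and boxes are `[x0, y0, x1, y1]`.

```python
def two_column_order(boxes, page_width, full_width_ratio=0.75):
    """Heuristic reading order for a two-column page (hypothetical helper).

    Boxes at least `full_width_ratio` of the page wide are treated as
    full-width and split the page into horizontal bands; within each band
    the left column is read top-to-bottom, then the right column.
    """
    mid = page_width / 2

    def is_full_width(box):
        return (box[2] - box[0]) >= full_width_ratio * page_width

    def order_band(band):
        left = [b for b in band if (b[0] + b[2]) / 2 < mid]
        right = [b for b in band if (b[0] + b[2]) / 2 >= mid]
        return sorted(left, key=lambda b: b[1]) + sorted(right, key=lambda b: b[1])

    ordered, band = [], []
    for box in sorted(boxes, key=lambda b: b[1]):
        if is_full_width(box):
            ordered.extend(order_band(band))  # close the band above this box
            band = []
            ordered.append(box)
        else:
            band.append(box)
    ordered.extend(order_band(band))
    return ordered

# A 600 px wide page: title, two columns, a full-width figure, two more columns
title = [50, 20, 550, 60]
left1, right1, left2 = [40, 80, 290, 300], [310, 80, 560, 300], [40, 310, 290, 400]
figure = [50, 420, 550, 600]
left3, right2 = [40, 620, 290, 780], [310, 620, 560, 780]

result = two_column_order([right1, left2, figure, title, left3, left1, right2], 600)
print(result)
```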

OCR with Layout

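from surya.detection import DetectionPredictor
from surya.recognition import RecognitionPredictor
from surya.layout import LayoutPredictor
from PIL import Image

def boxes_overlap(a, b):
    """Return True if two [x0, y0, x1, y1] boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

# Initialize predictors. surya's OCR is RecognitionPredictor, which uses a
# DetectionPredictor to find text lines. The signatures below follow releases
# where predictors take no constructor arguments (like LayoutPredictor() in
# this document); newer releases differ (e.g. a FoundationPredictor argument
# and no language list), so check the surya README for your version.
ocr_predictor = RecognitionPredictor()
det_predictor = DetectionPredictor()
layout_predictor = LayoutPredictor()

# Load image
image = Image.open("document.png").convert("RGB")

# Get layout
layout_results = layout_predictor([image])

# Run OCR; the second argument holds one language list per image
ocr_results = ocr_predictor([image], [["en"]], det_predictor)

# Combine results
for layout, ocr in zip(layout_results, ocr_results):
    for layout_elem in layout.bboxes:
        print(f"Element: {layout_elem.label}")
        
        # Find OCR text within this layout element
        for text_line in ocr.text_lines:
            if boxes_overlap(layout_elem.bbox, text_line.bbox):
                print(f"  Text: {text_line.text}")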
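Matching by any overlap can attach one OCR line to several neighboring regions when boxes touch or overlap slightly. An alternative sketch assigns each line to a single region: the first layout box that contains the line's center point. It works on plain `[x0, y0, x1, y1]` lists, so with surya output you would pass `[e.bbox for e in layout.bboxes]` and `[t.bbox for t in ocr.text_lines]` (attribute names as used in the example above); the helper names are hypothetical.

```python
def center(bbox):
    """Center point of an [x0, y0, x1, y1] box."""
    return ((bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2)

def contains_point(bbox, point):
    """True if the point lies inside (or on the edge of) the box."""
    x, y = point
    return bbox[0] <= x <= bbox[2] and bbox[1] <= y <= bbox[3]

def assign_lines_to_regions(region_bboxes, line_bboxes):
    """Map each line index to the first region index containing the line's center.

    Lines whose center falls outside every region map to None.
    """
    assignment = {}
    for li, line in enumerate(line_bboxes):
        c = center(line)
        assignment[li] = next(
            (ri for ri, region in enumerate(region_bboxes) if contains_point(region, c)),
            None,
        )
    return assignment

regions = [[0, 0, 600, 130], [0, 120, 600, 400]]  # two slightly overlapping regions
lines = [[10, 20, 300, 40], [10, 125, 300, 145], [10, 150, 500, 170], [10, 450, 500, 470]]
print(assign_lines_to_regions(regions, lines))
```

Here the second line overlaps both regions, but its center places it only in the second one.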

Processing PDFs

from surya.layout import LayoutPredictor
from pdf2image import convert_from_path

def analyze_pdf_layout(pdf_path):
    """Analyze layout of all pages in PDF."""
    
    # Convert PDF to images
    images = convert_from_path(pdf_path)
    
    # Initialize predictor
    layout_predictor = LayoutPredictor()
    
    # Analyze all pages
    results = layout_predictor(images)
    
    document_structure = []
    
    for page_num, page_result in enumerate(results):
        page_elements = []
        
        for element in page_result.bboxes:
            page_elements.append({
                'type': element.label,
                'bbox': element.bbox,
                'confidence': element.confidence
            })
        
        document_structure.append({
            'page': page_num + 1,
            'elements': page_elements
        })
    
    return document_structure

structure = analyze_pdf_layout("document.pdf")
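A quick way to sanity-check the returned structure is to count element types per page and across the document. This sketch assumes the list-of-dicts shape produced by `analyze_pdf_layout()` above; `summarize_structure` is a hypothetical helper, and the sample values are invented purely for illustration.

```python
import json
from collections import Counter

def summarize_structure(document_structure):
    """Count element types per page and across the whole document."""
    per_page = {}
    total = Counter()
    for page in document_structure:
        counts = Counter(el["type"] for el in page["elements"])
        per_page[page["page"]] = dict(counts)
        total.update(counts)
    return {"per_page": per_page, "total": dict(total)}

# Sample input in the same shape analyze_pdf_layout() returns (illustrative values)
sample_structure = [
    {"page": 1, "elements": [
        {"type": "Title", "bbox": [60, 40, 540, 80], "confidence": 0.98},
        {"type": "Text", "bbox": [60, 100, 540, 400], "confidence": 0.95},
    ]},
    {"page": 2, "elements": [
        {"type": "Table", "bbox": [60, 60, 540, 500], "confidence": 0.91},
        {"type": "Text", "bbox": [60, 520, 540, 700], "confidence": 0.94},
    ]},
]

summary = summarize_structure(sample_structure)
print(json.dumps(summary, indent=2))
```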

Visualization

from surya.layout import LayoutPredictor
from PIL import Image, ImageDraw

def visualize_layout(image_path, output_path):
    """Visualize detected layout elements."""
    
    # Convert to RGB so colored outlines render on grayscale/palette images
    image = Image.open(image_path).convert("RGB")
    layout_predictor = LayoutPredictor()
    results = layout_predictor([image])
    
    # Create drawing context
    draw = ImageDraw.Draw(image)
    
    # Color mapping for element types
    colors = {
        'Text': 'blue',
        'Title': 'red',
        'Table': 'green',
        'Figure': 'purple',
        'Section-header': 'orange',
        'List-item': 'cyan',
    }
    
    for element in results[0].bboxes:
        bbox = element.bbox
        color = colors.get(element.label, 'gray')
        
        # Draw rectangle
        draw.rectangle(bbox, outline=color, width=2)
        
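        # Add label just above the box, kept inside the image at the top edge;
        # confidence may be missing in some outputs, so show it only if present
        label_text = element.label
        if element.confidence is not None:
            label_text += f" ({element.confidence:.2f})"
        draw.text((bbox[0], max(bbox[1] - 15, 0)), label_text, fill=color)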
    
    image.save(output_path)
    return output_path
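If you draw the boxes on a resized copy of the page (for example, a preview thumbnail), the coordinates must be scaled by the ratio between the two image sizes. A minimal sketch, assuming boxes are `[x0, y0, x1, y1]` in the pixel space of the analyzed image and sizes are `(width, height)` as returned by PIL's `Image.size`; `scale_bbox` is a hypothetical helper.

```python
def scale_bbox(bbox, from_size, to_size):
    """Rescale an [x0, y0, x1, y1] box from one (width, height) to another."""
    sx = to_size[0] / from_size[0]
    sy = to_size[1] / from_size[1]
    x0, y0, x1, y1 = bbox
    return [x0 * sx, y0 * sy, x1 * sx, y1 * sy]

# A box found on a 1700x2200 page image, drawn on an 850x1100 preview
print(scale_bbox([100, 200, 900, 600], (1700, 2200), (850, 1100)))
```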

Best Practices

Use High-Quality Images: 150+ DPI for best results
Preprocess if Needed: Deskew rotated documents
Validate Results: Check confidence scores
Handle Multi-page: Process pages individually
Combine with OCR: Get text within detected regions
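To act on "Check confidence scores", one option is to split elements into confident results and ones flagged for manual review. This sketch assumes elements expose a `confidence` attribute that may be `None`; the 0.5 threshold and the `filter_by_confidence` name are arbitrary placeholders to tune on your own documents.

```python
from types import SimpleNamespace

def filter_by_confidence(elements, min_confidence=0.5):
    """Split elements into (kept, flagged) by confidence.

    Elements with no confidence value (None) are flagged for review.
    """
    kept, flagged = [], []
    for element in elements:
        conf = element.confidence
        if conf is not None and conf >= min_confidence:
            kept.append(element)
        else:
            flagged.append(element)
    return kept, flagged

elements = [
    SimpleNamespace(label="Text", confidence=0.94),
    SimpleNamespace(label="Table", confidence=0.41),
    SimpleNamespace(label="Figure", confidence=None),
]
kept, flagged = filter_by_confidence(elements)
print([e.label for e in kept], [e.label for e in flagged])
```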

Common Patterns

Document Structure Extraction

def extract_document_structure(image_path):
    """Extract hierarchical document structure."""
    
    from PIL import Image
    from surya.layout import LayoutPredictor
    
    image = Image.open(image_path).convert("RGB")
    
    # Get layout
    layout_predictor = LayoutPredictor()
    layout_results = layout_predictor([image])
    
    # Put elements in reading order; each layout box's `position` field
    # holds its reading-order index
    ordered = sorted(layout_results[0].bboxes, key=lambda element: element.position)
    
    structure = {
        'title': None,
        'sections': [],
        'tables': [],
        'figures': []
    }
    
    current_section = None
    
    for element in ordered:
        if element.label == 'Title':
            structure['title'] = element
        elif element.label == 'Section-header':
            current_section = {'header': element, 'content': []}
            structure['sections'].append(current_section)
        elif element.label == 'Table':
            structure['tables'].append(element)
        elif element.label == 'Figure':
            structure['figures'].append(element)
        elif current_section and element.label in ['Text', 'List-item']:
            current_section['content'].append(element)
    
    return structure

Table Region Extraction

def extract_table_regions(image_path):
    """Extract table regions from document."""
    
    from PIL import Image
    from surya.layout import LayoutPredictor
    
    image = Image.open(image_path).convert("RGB")
    layout_predictor = LayoutPredictor()
    results = layout_predictor([image])
    
    tables = []
    
    for element in results[0].bboxes:
        if element.label == 'Table':
            bbox = element.bbox
            
            # Crop table region
            table_image = image.crop(bbox)
            
            tables.append({
                'bbox': bbox,
                'image': table_image,
                'confidence': element.confidence
            })
    
    return tables
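Tight table boxes can clip ruling lines or edge characters. One option is to pad the crop box slightly and clamp it to the image bounds. This sketch assumes `[x0, y0, x1, y1]` pixel boxes and an image size of `(width, height)`; `padded_crop_box` and the 8-pixel default are illustrative choices, not values from surya.

```python
import math

def padded_crop_box(bbox, image_size, padding=8):
    """Expand [x0, y0, x1, y1] by `padding` pixels, clamped to (width, height).

    Rounds outward to integers so the crop never cuts into the detected box.
    """
    width, height = image_size
    x0, y0, x1, y1 = bbox
    return (
        max(0, math.floor(x0) - padding),
        max(0, math.floor(y0) - padding),
        min(width, math.ceil(x1) + padding),
        min(height, math.ceil(y1) + padding),
    )

print(padded_crop_box([3.4, 120.6, 596.2, 410.0], (600, 800)))
```

With PIL, this would be used as `image.crop(padded_crop_box(bbox, image.size))`.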

Examples

Example 1: Academic Paper Analysis

from surya.layout import LayoutPredictor
from pdf2image import convert_from_path

def analyze_academic_paper(pdf_path):
    """Analyze structure of academic paper."""
    
    images = convert_from_path(pdf_path)
    
    layout_predictor = LayoutPredictor()
    
    paper_structure = {
        'pages': [],
        'element_counts': {
            'Title': 0,
            'Section-header': 0,
            'Text': 0,
            'Table': 0,
            'Figure': 0,
            'Formula': 0,
            'Footnote': 0
        }
    }
    
    # One layout result per page; each box's `position` is its reading-order index
    layout_results = layout_predictor(images)
    
    for page_num, layout in enumerate(layout_results):
        page_structure = {
            'page': page_num + 1,
            'elements': []
        }
        
        for element in sorted(layout.bboxes, key=lambda el: el.position):
            page_structure['elements'].append({
                'type': element.label,
                'bbox': element.bbox,
                'order': element.position
            })
            
            # Count element types
            if element.label in paper_structure['element_counts']:
                paper_structure['element_counts'][element.label] += 1
        
        paper_structure['pages'].append(page_structure)
    
    return paper_structure

paper = analyze_academic_paper('research_paper.pdf')
print(f"Total tables: {paper['element_counts']['Table']}")
print(f"Total figures: {paper['element_counts']['Figure']}")

Example 2: Form Field Detection

from surya.layout import LayoutPredictor
from PIL import Image

def detect_form_fields(image_path):
    """Detect form fields and labels."""
    
    image = Image.open(image_path)
    
    layout_predictor = LayoutPredictor()
    results = layout_predictor([image])
    
    form_fields = []
    
    for element in results[0].bboxes:
        # Treat text elements as candidate field labels. This simple version
        # does not check for a nearby box or line (a likely input field).
        if element.label == 'Text':
            form_fields.append({
                'type': 'potential_label',
                'bbox': element.bbox,
                'confidence': element.confidence
            })
    
    return form_fields

fields = detect_form_fields('form.png')
print(f"Found {len(fields)} potential form elements")
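The example above keeps every text block as a potential label without checking for an input area next to it. A sketch of that missing check: for a label box, pick the nearest candidate box that sits to its right on the same row or directly below it. Which regions count as candidates depends on your forms; `find_field_for_label` and the `max_gap` of 50 pixels are hypothetical, and boxes are `[x0, y0, x1, y1]`.

```python
def find_field_for_label(label_bbox, candidate_bboxes, max_gap=50):
    """Return the nearest candidate box right of or below a label, or None.

    "Right of" means vertically overlapping and starting after the label ends;
    "below" means horizontally overlapping and starting under the label.
    Candidates farther than `max_gap` pixels away are ignored.
    """
    lx0, ly0, lx1, ly1 = label_bbox
    best, best_gap = None, None
    for box in candidate_bboxes:
        x0, y0, x1, y1 = box
        if x0 >= lx1 and y0 < ly1 and y1 > ly0:      # same row, to the right
            gap = x0 - lx1
        elif y0 >= ly1 and x0 < lx1 and x1 > lx0:    # overlapping column, below
            gap = y0 - ly1
        else:
            continue
        if gap <= max_gap and (best_gap is None or gap < best_gap):
            best, best_gap = box, gap
    return best

label = [50, 100, 150, 120]   # e.g. a "Name:" label
candidates = [
    [160, 98, 400, 122],   # same row, 10 px to the right
    [50, 140, 400, 160],   # below, 20 px down
    [500, 100, 600, 120],  # same row but 350 px away
]
print(find_field_for_label(label, candidates))
```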

Example 3: Multi-column Article

from surya.layout import LayoutPredictor
from PIL import Image

def process_multicolumn_article(image_path):
    """Process multi-column article layout."""
    
    image = Image.open(image_path).convert("RGB")
    
    layout_predictor = LayoutPredictor()
    layout_results = layout_predictor([image])
    
    # Put elements in reading order using each box's `position` index
    ordered = sorted(layout_results[0].bboxes, key=lambda el: el.position)
    
    # Split at the page midline (assumes a two-column layout)
    image_width = image.width
    column_threshold = image_width / 2
    
    columns = {
        'left': [],
        'right': [],
        'full_width': []
    }
    
    for element in ordered:
        bbox = element.bbox
        element_center = (bbox[0] + bbox[2]) / 2
        element_width = bbox[2] - bbox[0]
        
        # Elements wider than 75% of the page (1.5 x half-width) span both columns
        if element_width > column_threshold * 1.5:
            columns['full_width'].append(element)
        elif element_center < column_threshold:
            columns['left'].append(element)
        else:
            columns['right'].append(element)
    
    return {
        'layout': 'multi-column',
        'columns': columns,
        'reading_order': ordered
    }

article = process_multicolumn_article('newspaper_page.png')
print(f"Left column: {len(article['columns']['left'])} elements")
print(f"Right column: {len(article['columns']['right'])} elements")
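The midline split above assumes the page really has two columns; on a single-column page, elements end up almost entirely in one bucket. A quick pre-check sketch: ignore full-width elements (using the same 75%-of-page width cutoff as above), then require a minimum number of elements lying entirely in each half. `looks_two_column` and `min_per_column=2` are illustrative assumptions; with surya output you would pass `[e.bbox for e in layout_results[0].bboxes]` and `image.width`.

```python
def looks_two_column(bboxes, page_width, min_per_column=2, full_width_ratio=0.75):
    """Heuristically decide whether a page has two text columns.

    Ignores boxes wider than `full_width_ratio` of the page, then checks that
    enough of the remaining boxes sit entirely in each half of the page.
    """
    mid = page_width / 2
    narrow = [b for b in bboxes if (b[2] - b[0]) <= full_width_ratio * page_width]
    left = sum(1 for b in narrow if b[2] <= mid)
    right = sum(1 for b in narrow if b[0] >= mid)
    return left >= min_per_column and right >= min_per_column

two_col = [[40, 20, 560, 60], [40, 80, 290, 300], [40, 320, 290, 500],
           [310, 80, 560, 300], [310, 320, 560, 500]]
single_col = [[40, 80, 560, 300], [40, 320, 560, 500], [40, 520, 250, 540]]
print(looks_two_column(two_col, 600), looks_two_column(single_col, 600))
```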

Limitations

Handwritten layouts may be inaccurate
Very small text regions may be missed
Complex nested layouts can be challenging
GPU recommended for batch processing
Multi-language support varies

Installation

pip install surya-ocr

# For PDF processing (pdf2image also requires the Poppler utilities to be installed)
pip install pdf2image

Resources

Weekly installs: 265
Repository: claude-office-s…s/skills
GitHub stars: 26
First seen: Mar 9, 2026
Security audits: Gen Agent Trust Hub: Pass; Socket: Pass; Snyk: Pass
Installed on: claude-code (214), opencode (111), github-copilot (110), gemini-cli (108), kimi-cli (108), codex (108)


Statistics

Installs: 1.8K

Last updated: May 21, 2026


Compatible platforms

Claude Code

Timeline

Created: March 26, 2026