---
id: daily-layout-analyzer
name: "layout-analyzer"
url: https://skills.yangsir.net/skill/daily-layout-analyzer
author: claude-office-skills
domain: data-ai
tags: ["layout", "analyzer", "ai", "llm", "agents"]
install_count: 1800
rating: 4.30 (20 reviews)
github: https://github.com/claude-office-skills/skills
---

# layout-analyzer

> Document layout analysis with surya: identifies document structure, tables, images, and other elements

**Stats**: 1,800 installs · 4.3/5 (20 reviews)

## Before / After Comparison

### Automated Document Layout Analysis

**Before**:

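Before the `layout-analyzer` skill, document layout analysis was tedious, largely manual work. An analyst had to open each document image or PDF, inspect it page by page, and identify and mark text blocks, tables, images, headings, and other elements. This usually meant drawing bounding boxes by hand in an image editor and classifying every element. For complex documents that require a reading order, the analyst had to trace the text flow manually, which was slow and prone to inconsistency from subjective judgment. At scale, this manual approach was inefficient and costly, and the accuracy and consistency of the results were hard to guarantee.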

**After**:

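With the `layout-analyzer` skill, layout analysis is largely automated. The user provides a document image or PDF and specifies which layout elements to detect (text, tables, headings, images, and so on). The skill uses `surya` to identify the document structure, detect each type of layout element, and return its bounding box, type, and confidence. It can also determine reading order in complex layouts. This greatly reduces manual intervention, cutting work that used to take hours or even days down to seconds, improving throughput, accuracy, and consistency, and freeing people for higher-value tasks.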

| Metric | Before | After | Change |
|---|---|---|---|
| Layout analysis time per page | 300 s | 3 s | -99% |
| Manual review workload | 100% | 10% | -90% |

## Readme

# Layout Analyzer Skill

## Overview

This skill performs document layout analysis with **surya**, a document understanding toolkit. It detects text blocks, tables, figures, and headings, and determines reading order in complex documents.

## How to Use

- Provide the document image or PDF

- Specify what layout elements to detect

- I'll analyze the structure and return detected regions

**Example prompts:**

- "Analyze the layout of this document page"

- "Detect all tables and text blocks in this image"

- "Determine the reading order for this PDF page"

- "Find headings and paragraphs in this document"

## Domain Knowledge

### surya Fundamentals

```
from surya.layout import LayoutPredictor
from PIL import Image

# Load the page image
image = Image.open("document.png")

# Detect layout elements; the predictor takes a list of images
# and returns one result per image
layout_predictor = LayoutPredictor()
layout_result = layout_predictor([image])

```

### Layout Element Types

| Element | Description |
|---|---|
| Text | Regular paragraph text |
| Title | Document/section titles |
| Section-header | Section headings |
| List-item | Bulleted/numbered items |
| Table | Tabular data |
| Figure | Images/diagrams |
| Caption | Figure/table captions |
| Footnote | Footnotes |
| Formula | Mathematical equations |
| Page-header | Headers |
| Page-footer | Footers |
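
Requests such as "detect all tables and text blocks" come down to filtering detected elements by label. The following is a minimal sketch, assuming each element is represented as a plain dict with a `label` key (a hypothetical stand-in for the objects the layout predictor returns). The helper name `filter_by_label` is illustrative and not part of surya.

```python
from typing import Iterable

# Labels listed in this section.
KNOWN_LABELS = {
    "Text", "Title", "Section-header", "List-item", "Table", "Figure",
    "Caption", "Footnote", "Formula", "Page-header", "Page-footer",
}


def filter_by_label(elements: Iterable[dict], wanted: Iterable[str]) -> list:
    """Keep only elements whose label is in `wanted`.

    Raises ValueError for labels outside KNOWN_LABELS, so typos such as
    "Tables" are caught early instead of silently matching nothing.
    """
    wanted_set = set(wanted)
    unknown = wanted_set - KNOWN_LABELS
    if unknown:
        raise ValueError(f"Unknown layout labels: {sorted(unknown)}")
    return [e for e in elements if e["label"] in wanted_set]
```

Rejecting unknown labels is a design choice for this sketch; if the installed model emits additional labels, extend `KNOWN_LABELS` accordingly.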

### Text Detection

```
from surya.detection import DetectionPredictor
from PIL import Image

# Initialize detector
detector = DetectionPredictor()

# Load image
image = Image.open("document.png")

# Detect text regions
results = detector([image])

# Access results
for page_result in results:
    for bbox in page_result.bboxes:
        print(f"Text region: {bbox.bbox}")
        print(f"Confidence: {bbox.confidence}")

```

### Layout Analysis

```
from surya.layout import LayoutPredictor
from PIL import Image

# Initialize layout predictor
layout_predictor = LayoutPredictor()

# Analyze layout
image = Image.open("document.png")
layout_results = layout_predictor([image])

# Process results
for page_result in layout_results:
    for element in page_result.bboxes:
        print(f"Type: {element.label}")
        print(f"Bbox: {element.bbox}")
        print(f"Confidence: {element.confidence}")

```
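
A quick per-label tally is often the first sanity check on a layout result. The sketch below assumes only that each element exposes a `.label` attribute, as in the loop above; the `SimpleNamespace` objects are stand-ins for demonstration, not surya types.

```python
from collections import Counter
from types import SimpleNamespace


def count_labels(elements):
    """Count elements per label; accepts any objects exposing `.label`."""
    return Counter(el.label for el in elements)


# Stand-in objects for demonstration; real elements would come from
# `page_result.bboxes`.
demo = [
    SimpleNamespace(label="Text"),
    SimpleNamespace(label="Text"),
    SimpleNamespace(label="Table"),
]
summary = count_labels(demo)
```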

### Reading Order Detection

```
from surya.reading_order import ReadingOrderPredictor
from surya.layout import LayoutPredictor
from PIL import Image

# Get layout first
layout_predictor = LayoutPredictor()
image = Image.open("document.png")
layout_results = layout_predictor([image])

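# Determine reading order.
# Assumption: the installed surya version provides ReadingOrderPredictor with
# this call signature; check your release, since the package layout has
# changed between versions.
reading_order_predictor = ReadingOrderPredictor()
order_results = reading_order_predictor([image], layout_results)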

# Access ordered elements
for page_result in order_results:
    for i, element in enumerate(page_result.ordered_bboxes):
        print(f"{i+1}. {element.label}: {element.bbox}")

```
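
When no reading-order output is available, a rough geometric ordering can serve as a fallback. The sketch below is a naive heuristic, not surya's algorithm: it assumes a single-column page and boxes given as `(x0, y0, x1, y1)` tuples, groups boxes into rows by vertical proximity, and sorts each row left to right. The function name and the `row_tolerance` parameter are hypothetical.

```python
def naive_reading_order(bboxes, row_tolerance=10.0):
    """Order boxes top-to-bottom, then left-to-right within a row.

    A box joins the current row when its top edge is within
    `row_tolerance` pixels of the top edge of the row's first box.
    """
    by_top = sorted(bboxes, key=lambda b: (b[1], b[0]))
    ordered = []
    row = []
    row_top = None
    for box in by_top:
        if row_top is None or box[1] - row_top <= row_tolerance:
            if row_top is None:
                row_top = box[1]
            row.append(box)
        else:
            # Flush the finished row, sorted left to right.
            ordered.extend(sorted(row, key=lambda b: b[0]))
            row = [box]
            row_top = box[1]
    ordered.extend(sorted(row, key=lambda b: b[0]))
    return ordered
```

This will interleave columns on multi-column pages, so it is only suitable for simple layouts.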

### OCR with Layout

```
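from surya.ocr import OCRPredictor
from surya.layout import LayoutPredictor
from PIL import Image

def boxes_overlap(a, b):
    """Return True if two (x0, y0, x1, y1) boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

# Initialize predictors
ocr_predictor = OCRPredictor()
layout_predictor = LayoutPredictor()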

# Load image
image = Image.open("document.png")

# Get layout
layout_results = layout_predictor([image])

# Run OCR
ocr_results = ocr_predictor([image])

# Combine results
for layout, ocr in zip(layout_results, ocr_results):
    for layout_elem in layout.bboxes:
        print(f"Element: {layout_elem.label}")
        
        # Find OCR text within this layout element
        for text_line in ocr.text_lines:
            if boxes_overlap(layout_elem.bbox, text_line.bbox):
                print(f"  Text: {text_line.text}")

```
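
A plain overlap test can attach one text line to several adjacent layout elements. One stricter alternative, sketched here under the assumption that boxes are `(x0, y0, x1, y1)` tuples, assigns each line to the element it overlaps most by area. Both function names are hypothetical helpers, not surya APIs.

```python
def intersection_area(a, b):
    """Area of overlap between two (x0, y0, x1, y1) boxes; 0.0 if disjoint."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    return float(width * height)


def assign_lines_to_elements(element_boxes, line_boxes):
    """Map each line index to the index of the element it overlaps most.

    Lines that overlap no element are mapped to None.
    """
    assignment = {}
    for line_idx, line in enumerate(line_boxes):
        best_idx, best_area = None, 0.0
        for elem_idx, elem in enumerate(element_boxes):
            area = intersection_area(elem, line)
            if area > best_area:
                best_idx, best_area = elem_idx, area
        assignment[line_idx] = best_idx
    return assignment
```

Ties go to the earlier element, which keeps the result deterministic.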

### Processing PDFs

```
from surya.layout import LayoutPredictor
from pdf2image import convert_from_path

def analyze_pdf_layout(pdf_path):
    """Analyze layout of all pages in PDF."""
    
    # Convert PDF to images
    images = convert_from_path(pdf_path)
    
    # Initialize predictor
    layout_predictor = LayoutPredictor()
    
    # Analyze all pages
    results = layout_predictor(images)
    
    document_structure = []
    
    for page_num, page_result in enumerate(results):
        page_elements = []
        
        for element in page_result.bboxes:
            page_elements.append({
                'type': element.label,
                'bbox': element.bbox,
                'confidence': element.confidence
            })
        
        document_structure.append({
            'page': page_num + 1,
            'elements': page_elements
        })
    
    return document_structure

structure = analyze_pdf_layout("document.pdf")

```
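
The structure returned by `analyze_pdf_layout` is a list of plain dicts, so it can be persisted as JSON. The sketch below assumes the dict shape shown above (`page`, `elements`, and per-element `type`, `bbox`, `confidence`); the helper name `structure_to_json` is hypothetical. Numeric values are coerced to built-in floats in case the library returns a non-native numeric type.

```python
import json


def structure_to_json(document_structure, indent=2):
    """Serialize the per-page structure to a JSON string.

    Bounding-box values are converted to rounded floats so tuples or
    non-native numeric types serialize cleanly.
    """
    serializable = []
    for page in document_structure:
        elements = []
        for el in page["elements"]:
            conf = el["confidence"]
            elements.append({
                "type": str(el["type"]),
                "bbox": [round(float(v), 2) for v in el["bbox"]],
                "confidence": None if conf is None else float(conf),
            })
        serializable.append({"page": int(page["page"]), "elements": elements})
    return json.dumps(serializable, indent=indent)
```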

### Visualization

```
from surya.layout import LayoutPredictor
from PIL import Image, ImageDraw

def visualize_layout(image_path, output_path):
    """Visualize detected layout elements."""
    
    # Convert to RGB so named outline colors render on any input mode
    image = Image.open(image_path).convert("RGB")
    layout_predictor = LayoutPredictor()
    results = layout_predictor([image])
    
    # Create drawing context
    draw = ImageDraw.Draw(image)
    
    # Color mapping for element types
    colors = {
        'Text': 'blue',
        'Title': 'red',
        'Table': 'green',
        'Figure': 'purple',
        'Section-header': 'orange',
        'List-item': 'cyan',
    }
    
    for element in results[0].bboxes:
        bbox = element.bbox
        color = colors.get(element.label, 'gray')
        
        # Draw rectangle
        draw.rectangle(bbox, outline=color, width=2)
        
        # Add label, keeping it inside the image when the box touches the top edge
        draw.text((bbox[0], max(0, bbox[1] - 15)),
                  f"{element.label} ({element.confidence:.2f})",
                  fill=color)
    
    image.save(output_path)
    return output_path

```

## Best Practices

- **Use High-Quality Images**: 150+ DPI for best results

- **Preprocess if Needed**: Deskew rotated documents

- **Validate Results**: Check confidence scores and review low-confidence elements manually

- **Handle Multi-page**: Process pages individually or in small batches to limit memory use

- **Combine with OCR**: Get text within detected regions
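
The "Validate Results" practice can be sketched as a simple partition by confidence. This assumes elements are dicts with a `confidence` key, as produced by `analyze_pdf_layout` above; the function name and the default threshold of 0.5 are illustrative assumptions and should be tuned on your own documents.

```python
def split_by_confidence(elements, threshold=0.5):
    """Partition elements into (accepted, needs_review) by confidence.

    Elements with a missing confidence go to review rather than being
    silently accepted.
    """
    accepted, needs_review = [], []
    for el in elements:
        conf = el.get("confidence")
        if conf is not None and conf >= threshold:
            accepted.append(el)
        else:
            needs_review.append(el)
    return accepted, needs_review
```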

## Common Patterns

### Document Structure Extraction

```
def extract_document_structure(image_path):
    """Extract hierarchical document structure."""
    
    from PIL import Image
    from surya.layout import LayoutPredictor
    from surya.reading_order import ReadingOrderPredictor
    
    image = Image.open(image_path)
    
    # Get layout
    layout_predictor = LayoutPredictor()
    layout_results = layout_predictor([image])
    
    # Get reading order
    order_predictor = ReadingOrderPredictor()
    order_results = order_predictor([image], layout_results)
    
    structure = {
        'title': None,
        'sections': [],
        'tables': [],
        'figures': []
    }
    
    current_section = None
    
    for element in order_results[0].ordered_bboxes:
        if element.label == 'Title':
            structure['title'] = element
        elif element.label == 'Section-header':
            current_section = {'header': element, 'content': []}
            structure['sections'].append(current_section)
        elif element.label == 'Table':
            structure['tables'].append(element)
        elif element.label == 'Figure':
            structure['figures'].append(element)
        elif current_section and element.label in ['Text', 'List-item']:
            current_section['content'].append(element)
    
    return structure

```

### Table Region Extraction

```
def extract_table_regions(image_path):
    """Extract table regions from document."""
    
    from PIL import Image
    from surya.layout import LayoutPredictor
    
    image = Image.open(image_path)
    layout_predictor = LayoutPredictor()
    results = layout_predictor([image])
    
    tables = []
    
    for element in results[0].bboxes:
        if element.label == 'Table':
            bbox = element.bbox
            
            # Crop table region
            table_image = image.crop(bbox)
            
            tables.append({
                'bbox': bbox,
                'image': table_image,
                'confidence': element.confidence
            })
    
    return tables

```
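
Detected table boxes can be tight enough to clip borders. A small margin, clamped to the image bounds, is one way to guard against that before cropping. The sketch below assumes `(x0, y0, x1, y1)` boxes and an image size given as `(width, height)`, matching PIL's `Image.size`; the helper name and default padding are hypothetical.

```python
import math


def pad_and_clamp(bbox, image_size, padding=5):
    """Expand a (x0, y0, x1, y1) box by `padding` pixels, clamped to the image.

    Returns integer coordinates: the top-left corner is rounded down and
    the bottom-right corner rounded up before padding is applied.
    """
    width, height = image_size
    x0, y0, x1, y1 = bbox
    left = max(0, math.floor(x0) - padding)
    top = max(0, math.floor(y0) - padding)
    right = min(width, math.ceil(x1) + padding)
    bottom = min(height, math.ceil(y1) + padding)
    return (left, top, right, bottom)
```

The result could then be passed to `image.crop(...)` in place of the raw box.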

## Examples

### Example 1: Academic Paper Analysis

```
from surya.layout import LayoutPredictor
from surya.reading_order import ReadingOrderPredictor
from pdf2image import convert_from_path

def analyze_academic_paper(pdf_path):
    """Analyze structure of academic paper."""
    
    images = convert_from_path(pdf_path)
    
    layout_predictor = LayoutPredictor()
    order_predictor = ReadingOrderPredictor()
    
    paper_structure = {
        'pages': [],
        'element_counts': {
            'Title': 0,
            'Section-header': 0,
            'Text': 0,
            'Table': 0,
            'Figure': 0,
            'Formula': 0,
            'Footnote': 0
        }
    }
    
    layout_results = layout_predictor(images)
    order_results = order_predictor(images, layout_results)
    
    for page_num, (layout, order) in enumerate(zip(layout_results, order_results)):
        page_structure = {
            'page': page_num + 1,
            'elements': []
        }
        
        for element in order.ordered_bboxes:
            page_structure['elements'].append({
                'type': element.label,
                'bbox': element.bbox,
                'order': element.position
            })
            
            # Count element types
            if element.label in paper_structure['element_counts']:
                paper_structure['element_counts'][element.label] += 1
        
        paper_structure['pages'].append(page_structure)
    
    return paper_structure

paper = analyze_academic_paper('research_paper.pdf')
print(f"Total tables: {paper['element_counts']['Table']}")
print(f"Total figures: {paper['element_counts']['Figure']}")

```
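
For academic papers it is often useful to attach each figure or table to its caption. The sketch below is a greedy heuristic under stated assumptions: elements are the per-page dicts built above (`type`, `bbox`), the nearest caption is chosen by vertical gap, and each caption is used at most once. The function names are hypothetical, and this is not how surya itself links captions.

```python
def vertical_gap(a, b):
    """Vertical distance between two boxes; 0 if they overlap vertically."""
    if a[3] < b[1]:
        return b[1] - a[3]
    if b[3] < a[1]:
        return a[1] - b[3]
    return 0


def pair_captions(elements):
    """Pair each Figure/Table with the closest unused Caption on the page.

    Returns a list of (target, caption) tuples; caption is None when no
    caption is left to assign.
    """
    captions = [e for e in elements if e["type"] == "Caption"]
    targets = [e for e in elements if e["type"] in ("Figure", "Table")]
    used = set()
    pairs = []
    for target in targets:
        best_idx, best_gap = None, None
        for idx, cap in enumerate(captions):
            if idx in used:
                continue
            gap = vertical_gap(target["bbox"], cap["bbox"])
            if best_gap is None or gap < best_gap:
                best_idx, best_gap = idx, gap
        if best_idx is None:
            pairs.append((target, None))
        else:
            used.add(best_idx)
            pairs.append((target, captions[best_idx]))
    return pairs
```

Because the matching is greedy in reading order, dense pages with several floats side by side may need a horizontal-overlap check as well.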

### Example 2: Form Field Detection

```
from surya.layout import LayoutPredictor
from PIL import Image

def detect_form_fields(image_path):
    """Detect form fields and labels."""
    
    image = Image.open(image_path)
    
    layout_predictor = LayoutPredictor()
    results = layout_predictor([image])
    
    form_fields = []
    
    for element in results[0].bboxes:
        # Treat each text element as a potential label
        if element.label == 'Text':
            # Not implemented here: checking for a nearby box/line (the input field)
            form_fields.append({
                'type': 'potential_label',
                'bbox': element.bbox,
                'confidence': element.confidence
            })
    
    return form_fields

fields = detect_form_fields('form.png')
print(f"Found {len(fields)} potential form elements")

```

### Example 3: Multi-column Article

```
from surya.layout import LayoutPredictor
from surya.reading_order import ReadingOrderPredictor
from PIL import Image

def process_multicolumn_article(image_path):
    """Process multi-column article layout."""
    
    image = Image.open(image_path)
    
    layout_predictor = LayoutPredictor()
    order_predictor = ReadingOrderPredictor()
    
    layout_results = layout_predictor([image])
    order_results = order_predictor([image], layout_results)
    
    # Group elements by column
    image_width = image.width
    column_threshold = image_width / 2
    
    columns = {
        'left': [],
        'right': [],
        'full_width': []
    }
    
    for element in order_results[0].ordered_bboxes:
        bbox = element.bbox
        element_center = (bbox[0] + bbox[2]) / 2
        element_width = bbox[2] - bbox[0]
        
        # Determine column
        if element_width > column_threshold * 1.5:
            columns['full_width'].append(element)
        elif element_center < column_threshold:
            columns['left'].append(element)
        else:
            columns['right'].append(element)
    
    return {
        'layout': 'multi-column',
        'columns': columns,
        'reading_order': order_results[0].ordered_bboxes
    }

article = process_multicolumn_article('newspaper_page.png')
print(f"Left column: {len(article['columns']['left'])} elements")
print(f"Right column: {len(article['columns']['right'])} elements")

```
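
The half-width threshold above only handles two columns. For pages with an arbitrary number of columns, one option is to cluster boxes by gaps between their horizontal centers. This is a sketch under assumptions: boxes are `(x0, y0, x1, y1)` tuples, full-width elements have already been set aside, and `min_gap` (a hypothetical parameter, in pixels) is chosen to suit the page resolution.

```python
def group_into_columns(bboxes, min_gap=50.0):
    """Cluster boxes into columns by gaps between horizontal centers.

    Boxes are sorted by x-center; a new column starts whenever the gap
    to the previous center exceeds `min_gap`. Returns columns from left
    to right, each sorted top to bottom.
    """
    if not bboxes:
        return []
    by_center = sorted(bboxes, key=lambda b: (b[0] + b[2]) / 2)
    columns = [[by_center[0]]]
    prev_center = (by_center[0][0] + by_center[0][2]) / 2
    for box in by_center[1:]:
        center = (box[0] + box[2]) / 2
        if center - prev_center > min_gap:
            columns.append([])
        columns[-1].append(box)
        prev_center = center
    return [sorted(col, key=lambda b: b[1]) for col in columns]
```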

## Limitations

- Results on handwritten documents may be inaccurate

- Very small text regions may be missed

- Complex nested layouts are challenging

- GPU recommended for batch processing

- Multi-language support varies

## Installation

```
pip install surya-ocr

# For PDF processing (pdf2image also requires the Poppler utilities installed on the system)
pip install pdf2image

```
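
After installing, a quick check that the packages are importable can save a confusing failure later. The sketch below uses only the standard library; the import names `surya`, `PIL`, and `pdf2image` correspond to the `surya-ocr`, Pillow, and `pdf2image` distributions, and the helper name is hypothetical.

```python
from importlib.util import find_spec


def check_dependencies(modules=("surya", "PIL", "pdf2image")):
    """Return a dict mapping each top-level module name to its availability."""
    return {name: find_spec(name) is not None for name in modules}
```

Calling `check_dependencies()` returns `False` for any package that still needs to be installed. It does not verify that Poppler is present, since that is a system binary rather than a Python module.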

## Resources

- [surya GitHub](https://github.com/VikParuchuri/surya)

- [Model Documentation](https://github.com/VikParuchuri/surya#models)

- [Examples](https://github.com/VikParuchuri/surya/tree/master/examples)

---
*Source: https://skills.yangsir.net/skill/daily-layout-analyzer*
*Markdown mirror: https://skills.yangsir.net/api/skill/daily-layout-analyzer/markdown*