首页/多媒体与音视频/transcription-automation
T

transcription-automation

by @claude-office-skillsv1.0.0
4.1(28)

自动化音视频转录流程,支持多语言识别、说话人分离和内容处理,AI Agent Skill,提升工作效率和自动化能力

speech-to-textvideo-productionautomationGitHub
安装方式
npx skills add claude-office-skills/skills --skill transcription-automation
compare_arrows

Before / After 效果对比

1
使用前

手动听写音视频内容,1小时音频需要6-8小时整理,识别说话人困难,多语言内容处理复杂易错。

使用后

自动识别音语音并转为文字,智能分离说话人,支持多语言,自动生成字幕和摘要,准确率超过95%。

description SKILL.md

transcription-automation

Transcription Automation

Comprehensive skill for automating audio/video transcription and content processing.

Core Workflows

1. Transcription Pipeline

TRANSCRIPTION FLOW:
┌─────────────────┐
│  Audio/Video    │
│     Input       │
└────────┬────────┘
         ▼
┌─────────────────┐
│  Pre-Processing │
│  - Convert      │
│  - Enhance      │
│  - Split        │
└────────┬────────┘
         ▼
┌─────────────────┐
│  Transcription  │
│  - STT Engine   │
│  - Diarization  │
└────────┬────────┘
         ▼
┌─────────────────┐
│ Post-Processing │
│  - Format       │
│  - Timestamps   │
│  - Speakers     │
└────────┬────────┘
         ▼
┌─────────────────┐
│     Output      │
│  - Text/SRT/VTT │
│  - Summary      │
└─────────────────┘

2. Transcription Configuration

transcription_config:
  engine: whisper  # whisper, assembly_ai, deepgram
  
  audio_settings:
    sample_rate: 16000
    channels: mono
    format: wav
    
  transcription:
    language: auto  # or specific: en, zh, es
    model: large  # tiny, base, small, medium, large
    task: transcribe  # transcribe or translate
    
  features:
    speaker_diarization: true
    word_timestamps: true
    punctuation: true
    profanity_filter: false
    
  output:
    formats:
      - txt
      - srt
      - vtt
      - json
    include_confidence: true
    include_timestamps: true

Meeting Transcription

Meeting Notes Template

meeting_transcript:
  metadata:
    title: "{{meeting_title}}"
    date: "{{date}}"
    duration: "{{duration}}"
    attendees: "{{speakers}}"
    
  output_template: |
    # {{title}}
    
    **Date:** {{date}}
    **Duration:** {{duration}}
    **Attendees:** {{attendees}}
    
    ## Summary
    {{ai_summary}}
    
    ## Key Points
    {{#each key_points}}
    - {{this}}
    {{/each}}
    
    ## Action Items
    {{#each action_items}}
    - [ ] {{task}} - @{{assignee}} - Due: {{due_date}}
    {{/each}}
    
    ## Full Transcript
    {{#each segments}}
    **[{{timestamp}}] {{speaker}}:** {{text}}
    
    {{/each}}

Speaker Diarization

diarization_config:
  min_speakers: 2
  max_speakers: 10
  
  speaker_labels:
    - name: "Speaker 1"
      voice_sample: "sample_1.wav"  # Optional
    - name: "Speaker 2"
      voice_sample: "sample_2.wav"
      
  output_format:
    speaker_prefix: true
    speaker_timestamps: true
    
  example_output: |
    [00:00:05] SPEAKER_1: Welcome everyone to today's meeting.
    [00:00:12] SPEAKER_2: Thanks for having us.
    [00:00:18] SPEAKER_1: Let's start with the agenda.

Subtitle Generation

SRT Format

subtitle_config:
  format: srt
  
  timing:
    max_duration: 7  # seconds per subtitle
    min_gap: 0.1     # seconds between subtitles
    chars_per_line: 42
    max_lines: 2
    
  style:
    case: sentence  # sentence, upper, lower
    numbers: words  # words, digits
    
  example_output: |
    1
    00:00:05,000 --> 00:00:08,500
    Welcome to today's presentation
    about transcription automation.
    
    2
    00:00:09,000 --> 00:00:12,000
    Let me start by explaining
    the basic concepts.

VTT Format

vtt_config:
  format: vtt
  
  features:
    cue_settings: true
    styling: true
    
  example_output: |
    WEBVTT
    
    00:00:05.000 --> 00:00:08.500 align:center
    Welcome to today's presentation
    about transcription automation.
    
    00:00:09.000 --> 00:00:12.000 align:center
    <v Speaker 1>Let me start by explaining
    the basic concepts.

Integration Workflows

Zoom Integration

zoom_transcription:
  trigger:
    event: recording_completed
    
  workflow:
    - step: download_recording
      source: zoom_cloud
      
    - step: transcribe
      engine: whisper
      language: auto
      
    - step: diarize
      identify_speakers: true
      
    - step: generate_notes
      template: meeting_notes
      include_summary: true
      extract_action_items: true
      
    - step: distribute
      destinations:
        - notion_page
        - slack_channel
        - email_attendees

YouTube Integration

youtube_subtitles:
  trigger:
    event: video_uploaded
    
  workflow:
    - step: download_audio
      source: youtube_video
      
    - step: transcribe
      engine: whisper
      task: transcribe
      
    - step: generate_subtitles
      formats: [srt, vtt]
      
    - step: translate
      target_languages: [es, zh, ja, de, fr]
      
    - step: upload_subtitles
      destination: youtube
      as_cc: true

Podcast Processing

podcast_workflow:
  input:
    source: rss_feed
    format: audio/mp3
    
  processing:
    - transcribe:
        engine: whisper
        model: large
        
    - generate_chapters:
        detect_topics: true
        min_duration: 60  # seconds
        
    - create_show_notes:
        summarize: true
        extract_links: true
        highlight_quotes: true
        
    - create_searchable_index:
        full_text: true
        timestamps: true
        
  output:
    - transcript_txt
    - chapters_json
    - show_notes_md
    - search_index

Language Support

Multi-Language Transcription

multilingual:
  auto_detect: true
  
  supported_languages:
    - code: en
      name: English
      model: large
      
    - code: zh
      name: Chinese
      model: large
      
    - code: es
      name: Spanish
      model: large
      
    - code: ja
      name: Japanese
      model: medium
      
  translation:
    enabled: true
    target: en
    preserve_original: true

Code-Switching

code_switching:
  enabled: true
  primary_language: en
  secondary_languages: [zh, es]
  
  output: |
    [00:01:23] The next topic is about 人工智能,
    which has been muy importante in recent years.
    
  handling:
    detect_language_per_segment: true
    tag_language_switches: true

Quality Enhancement

Post-Processing

post_processing:
  text_cleanup:
    - remove_filler_words: ["um", "uh", "like"]
    - fix_common_errors: true
    - normalize_numbers: true
    
  formatting:
    - add_punctuation: true
    - capitalize_sentences: true
    - paragraph_breaks: true
    
  speaker_attribution:
    - merge_short_segments: true
    - min_segment_duration: 1.0
    
  output_enhancement:
    - add_timestamps: true
    - highlight_keywords: true
    - generate_summary: true

Accuracy Metrics

TRANSCRIPTION QUALITY REPORT
═══════════════════════════════════════

File: meeting_2024_01_15.mp3
Duration: 45:32
Engine: Whisper Large

METRICS:
Word Error Rate (WER):  4.2%
Character Error Rate:   2.8%
Confidence Score:       0.94

SPEAKER DIARIZATION:
Speakers Detected: 4
Diarization Accuracy: 91%

PROCESSING TIME:
Total: 8m 23s
Real-time Factor: 0.18x

DETECTED ISSUES:
• Low confidence at 12:34 (background noise)
• Overlapping speech at 23:45
• Unknown speaker at 34:12

API Examples

OpenAI Whisper

import openai

# Transcribe audio
with open("meeting.mp3", "rb") as audio_file:
    transcript = openai.Audio.transcribe(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["word", "segment"]
    )

# Access results
for segment in transcript.segments:
    print(f"[{segment.start:.2f}] {segment.text}")

AssemblyAI

import assemblyai as aai

transcriber = aai.Transcriber()

config = aai.TranscriptionConfig(
    speaker_labels=True,
    auto_chapters=True,
    entity_detection=True
)

transcript = transcriber.transcribe(
    "https://example.com/meeting.mp3",
    config=config
)

for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")

Best Practices

  • Quality Audio: Clean input = better output

  • Choose Right Model: Balance speed vs accuracy

  • Use Diarization: Identify speakers clearly

  • Post-Process: Clean up automated output

  • Verify Critical Content: Human review important

  • Consider Privacy: Handle sensitive content

  • Store Efficiently: Compress and index

  • Provide Context: Vocabulary hints help

Weekly Installs–Repositoryclaude-office-s…s/skillsGitHub Stars26First Seen–Security AuditsGen Agent Trust HubPassSocketPassSnykWarn

forum用户评价 (0)

发表评价

效果
易用性
文档
兼容性

暂无评价,来写第一条吧

统计数据

安装量639
评分4.1 / 5.0
版本1.0.0
更新日期2026年3月25日
对比案例1 组

用户评分

4.1(28)
5
0%
4
0%
3
0%
2
0%
1
0%

为此 Skill 评分

0.0

兼容平台

🔧Claude Code

时间线

创建2026年3月25日
最后更新2026年3月25日