
video-understand

by @heygen-com, v1.0.0
Rating: 4.7 (65 ratings)

Understand video content locally, using ffmpeg to extract frames and Whisper to transcribe audio. Runs fully offline, with no API keys required.

Tags: video-processing, transcription, whisper, offline, ffmpeg

Installation
npx skills add heygen-com/skills --skill video-understand

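Before / After Comparison

Before

Running ffmpeg frame extraction and the related tasks by hand takes repeated steps and checks. The whole process takes about 98 minutes and is error-prone and inefficient.

After

The Skill automates the work, analyzing and executing it end to end. Everything finishes within 9 minutes, with high accuracy and a standardized workflow.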

SKILL.md

video-understand


Understand video content locally using ffmpeg for frame extraction and Whisper for transcription. Fully offline, no API keys required.

Prerequisites

  • ffmpeg + ffprobe (required): brew install ffmpeg

  • openai-whisper (optional, for transcription): pip install openai-whisper
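Before running the script, it can help to confirm that the prerequisites above are actually reachable. The following is a minimal sketch, not part of the skill itself: `check_prerequisites` is a hypothetical helper name, and it assumes the `openai-whisper` package is importable under the module name `whisper`.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Return a dict describing which prerequisites are available locally."""
    return {
        # ffmpeg and ffprobe are required and must be on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "ffprobe": shutil.which("ffprobe") is not None,
        # Optional: the openai-whisper package is imported as `whisper`.
        "whisper": importlib.util.find_spec("whisper") is not None,
    }


if __name__ == "__main__":
    for tool, found in check_prerequisites().items():
        print(f"{tool}: {'found' if found else 'missing'}")
```

If `whisper` is reported missing, frame extraction should still be usable with `--no-transcribe`.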

Commands

# Scene detection + transcribe (default)
python3 skills/video-understand/scripts/understand_video.py video.mp4

# Keyframe extraction
python3 skills/video-understand/scripts/understand_video.py video.mp4 -m keyframe

# Regular interval extraction
python3 skills/video-understand/scripts/understand_video.py video.mp4 -m interval

# Limit frames extracted
python3 skills/video-understand/scripts/understand_video.py video.mp4 --max-frames 10

# Use a larger Whisper model
python3 skills/video-understand/scripts/understand_video.py video.mp4 --whisper-model small

# Frames only, skip transcription
python3 skills/video-understand/scripts/understand_video.py video.mp4 --no-transcribe

# Quiet mode (JSON only, no progress)
python3 skills/video-understand/scripts/understand_video.py video.mp4 -q

# Output to file
python3 skills/video-understand/scripts/understand_video.py video.mp4 -o result.json
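The commands above can also be driven from another Python program. This is a minimal sketch under stated assumptions: `build_command` and `understand` are hypothetical helper names, the script path is the one used in the commands above, and it assumes that `-q` leaves only the JSON result on stdout, as the quiet-mode comment describes.

```python
import json
import subprocess

# Path taken from the commands above; adjust it to your checkout.
SCRIPT = "skills/video-understand/scripts/understand_video.py"


def build_command(video, mode="scene", max_frames=20, transcribe=True):
    """Build an argument list using only flags documented on this page."""
    cmd = ["python3", SCRIPT, video, "-m", mode,
           "--max-frames", str(max_frames), "-q"]
    if not transcribe:
        cmd.append("--no-transcribe")
    return cmd


def understand(video, **options):
    """Run the script and return its parsed JSON result (hypothetical helper)."""
    completed = subprocess.run(
        build_command(video, **options),
        capture_output=True, text=True, check=True,
    )
    return json.loads(completed.stdout)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.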

CLI Options

| Flag | Description |
| --- | --- |
| video | Input video file (positional, required) |
| -m, --mode | Extraction mode: scene (default), keyframe, interval |
| --max-frames | Maximum frames to keep (default: 20) |
| --whisper-model | Whisper model size: tiny, base, small, medium, large (default: base) |
| --no-transcribe | Skip audio transcription, extract frames only |
| -o, --output | Write result JSON to file instead of stdout |
| -q, --quiet | Suppress progress messages, output only JSON |
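To make the defaults concrete, here is an `argparse` sketch that mirrors the documented options. It is an illustration reconstructed from this list of flags, not the script's actual source; `build_parser` is a hypothetical name and the real parser may differ in help text or validation.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="understand_video.py")
    parser.add_argument("video", help="Input video file")
    parser.add_argument("-m", "--mode", default="scene",
                        choices=["scene", "keyframe", "interval"])
    parser.add_argument("--max-frames", type=int, default=20)
    parser.add_argument("--whisper-model", default="base",
                        choices=["tiny", "base", "small", "medium", "large"])
    parser.add_argument("--no-transcribe", action="store_true")
    parser.add_argument("-o", "--output", help="Write result JSON to this file")
    parser.add_argument("-q", "--quiet", action="store_true")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args(["video.mp4", "-m", "interval"]))
```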

Extraction Modes

| Mode | How it works | Best for |
| --- | --- | --- |
| scene | Detects scene changes via ffmpeg select='gt(scene,0.3)' | Most videos, varied content |
| keyframe | Extracts I-frames (codec keyframes) | Encoded video with natural keyframe placement |
| interval | Evenly spaced frames based on duration and max-frames | Fixed sampling, predictable output |

If scene mode detects no scene changes, it automatically falls back to interval mode.
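The selection logic described here can be sketched as follows. This is an illustration only: the function names are hypothetical, the midpoint spacing used for interval mode and the simple truncation to `max_frames` are assumptions, and the real script may space or trim frames differently.

```python
def scene_filter(threshold=0.3):
    """Return the ffmpeg select expression quoted for scene mode."""
    return f"select='gt(scene,{threshold})'"


def interval_timestamps(duration, max_frames):
    """Evenly spaced timestamps in seconds, one at the midpoint of each slice.

    Assumption: the video is split into max_frames equal slices.
    """
    if duration <= 0 or max_frames <= 0:
        return []
    step = duration / max_frames
    return [round(step * (i + 0.5), 3) for i in range(max_frames)]


def choose_timestamps(scene_times, duration, max_frames):
    """Use scene-change times when present, else fall back to interval mode."""
    if scene_times:
        # Assumption: excess scene frames are dropped from the end.
        return scene_times[:max_frames]
    return interval_timestamps(duration, max_frames)


if __name__ == "__main__":
    print(scene_filter())
    print(choose_timestamps([], duration=10, max_frames=4))
```

With no detected scene changes, a 10-second video and `max_frames=4` yields four timestamps spread evenly across the clip, which is the predictable behaviour that interval mode is meant to provide.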

Output

The script writes JSON to stdout (or to a file with -o). See references/output-format.md for the full schema.

{
  "video": "video.mp4",
  "duration": 18.076,
  "resolution": {"width": 1224, "height": 1080},
  "mode": "scene",
  "frames": [
    {"path": "/abs/path/frame_0001.jpg", "timestamp": 0.0, "timestamp_formatted": "00:00"}
  ],
  "frame_count": 12,
  "transcript": [
    {"start": 0.0, "end": 2.5, "text": "Hello and welcome..."}
  ],
  "text": "Full transcript...",
  "note": "Use the Read tool to view frame images for visual understanding."
}

Use the Read tool on frame image paths to visually inspect extracted frames.
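A consumer of this JSON will usually want to line up each frame with what was being said at that moment. The sketch below does that using only the fields shown in the sample above. The helper names are hypothetical, the MM:SS format is inferred from the single "00:00" example, and the handling of a missing `transcript` key (for example after `--no-transcribe`) is an assumption; check references/output-format.md for the authoritative schema.

```python
def format_timestamp(seconds):
    """Format seconds as MM:SS, matching the "00:00" style in the sample."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def speech_at(result, timestamp):
    """Return the transcript text whose segment covers timestamp, or ''."""
    for segment in result.get("transcript", []):
        if segment["start"] <= timestamp <= segment["end"]:
            return segment["text"]
    return ""


def summarize(result):
    """Pair each frame with the transcript text spoken at that moment."""
    return [
        (frame["timestamp_formatted"], frame["path"],
         speech_at(result, frame["timestamp"]))
        for frame in result["frames"]
    ]


if __name__ == "__main__":
    sample = {
        "frames": [{"path": "/abs/path/frame_0001.jpg", "timestamp": 0.0,
                    "timestamp_formatted": "00:00"}],
        "transcript": [{"start": 0.0, "end": 2.5,
                        "text": "Hello and welcome..."}],
    }
    for stamp, path, text in summarize(sample):
        print(stamp, path, text)
```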

References

  • references/output-format.md -- Full JSON output schema documentation

Weekly Installs: 241
Repository: heygen-com/skills
GitHub Stars: 91
First Seen: 6 days ago
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: claude-code 217, cline 60, gemini-cli 60, kimi-cli 60, codex 60, cursor 60


Statistics

Installs: 1.7K
Rating: 4.7 / 5.0
Version: 1.0.0
Updated: March 23, 2026
Comparison cases: 1


Compatible Platforms

Claude Code

Timeline

Created: March 23, 2026
Last updated: March 23, 2026