---
id: daily-baoyu-youtube-transcript
name: "baoyu-youtube-transcript"
url: https://skills.yangsir.net/skill/daily-baoyu-youtube-transcript
author: jimliu
domain: ai-agent-external-interaction
tags: ["youtube", "transcription", "video-processing", "data-extraction", "content-management"]
install_count: 10600
rating: 4.50 (25 reviews)
github: https://github.com/jimliu/baoyu-skills
---

# baoyu-youtube-transcript

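> Automatically downloads subtitles and transcripts for YouTube videos. Supports both manually created and auto-generated captions, requires no API key or browser, and calls YouTube's InnerTube API directly.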

**Stats**: 10,600 installs · 4.5/5 (25 reviews)

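## Before / After Comparison

### Results Comparison

**Before**:

Handling YouTube download tasks by hand requires repeated manual steps and confirmation. The whole process takes roughly 115 hours and is error-prone and inefficient.

**After**:

With this Skill the work is automated, with intelligent analysis and execution. Everything completes within 6 hours, with high accuracy and a standardized workflow.

| Metric | Before | After | Change |
|---|---|---|---|
| Completion time | 115 hours | 6 hours | -95% |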

## Readme

# YouTube Transcript

Downloads transcripts (subtitles/captions) from YouTube videos. Works with both manually created and auto-generated transcripts. No API key or browser required — uses YouTube's InnerTube API directly.

Fetches video metadata and cover image on first run, caches raw data for fast re-formatting.

## Script Directory

Scripts in `scripts/` subdirectory. `{baseDir}` = this SKILL.md's directory path. Resolve `${BUN_X}` runtime: if `bun` installed → `bun`; if `npx` available → `npx -y bun`; else suggest installing bun. Replace `{baseDir}` and `${BUN_X}` with actual values.
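
The runtime resolution described above can be sketched as follows. This is a minimal illustration, not code from the skill: `resolveBunX`, `buildCommand`, and the `isAvailable` callback are hypothetical names, and how command availability is actually detected is left to the caller.

```typescript
// Hypothetical helper: picks the command prefix that stands in for ${BUN_X}.
// `isAvailable` is an assumed callback that reports whether a command is on PATH.
function resolveBunX(isAvailable: (cmd: string) => boolean): string | null {
  if (isAvailable("bun")) return "bun";
  if (isAvailable("npx")) return "npx -y bun";
  return null; // neither found: the caller should suggest installing bun
}

// Hypothetical helper: substitutes {baseDir} and ${BUN_X} into a full command line.
function buildCommand(bunX: string, baseDir: string, args: string[]): string {
  return [bunX, `${baseDir}/scripts/main.ts`, ...args].join(" ");
}
```

For example, `buildCommand("bun", "/skills/yt", ["dQw4w9WgXcQ", "--list"])` yields `bun /skills/yt/scripts/main.ts dQw4w9WgXcQ --list`.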

| Script | Purpose |
|---|---|
| `scripts/main.ts` | Transcript download CLI |

## Usage

```
# Default: markdown with timestamps (English)
${BUN_X} {baseDir}/scripts/main.ts <youtube-url-or-id>

# Specify languages (priority order)
${BUN_X} {baseDir}/scripts/main.ts <url> --languages zh,en,ja

# Without timestamps
${BUN_X} {baseDir}/scripts/main.ts <url> --no-timestamps

# With chapter segmentation
${BUN_X} {baseDir}/scripts/main.ts <url> --chapters

# With speaker identification (requires AI post-processing)
${BUN_X} {baseDir}/scripts/main.ts <url> --speakers

# SRT subtitle file
${BUN_X} {baseDir}/scripts/main.ts <url> --format srt

# Translate transcript
${BUN_X} {baseDir}/scripts/main.ts <url> --translate zh-Hans

# List available transcripts
${BUN_X} {baseDir}/scripts/main.ts <url> --list

# Force re-fetch (ignore cache)
${BUN_X} {baseDir}/scripts/main.ts <url> --refresh

```

## Options

| Option | Description | Default |
|---|---|---|
| `<url-or-id>` | YouTube URL or video ID (multiple allowed) | Required |
| `--languages <codes>` | Language codes, comma-separated, in priority order | `en` |
| `--format <fmt>` | Output format: `text`, `srt` | `text` |
| `--translate <code>` | Translate to specified language code | |
| `--list` | List available transcripts instead of fetching | |
| `--timestamps` | Include `[HH:MM:SS → HH:MM:SS]` timestamps per paragraph | on |
| `--no-timestamps` | Disable timestamps | |
| `--chapters` | Chapter segmentation from video description | |
| `--speakers` | Raw transcript with metadata for speaker identification | |
| `--exclude-generated` | Skip auto-generated transcripts | |
| `--exclude-manually-created` | Skip manually created transcripts | |
| `--refresh` | Force re-fetch, ignore cached data | |
| `-o, --output <path>` | Save to specific file path | auto-generated |
| `--output-dir <dir>` | Base output directory | `youtube-transcript` |
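
As an illustration of how `--languages`, `--exclude-generated`, and `--exclude-manually-created` might interact, here is a minimal sketch. The `TranscriptTrack` shape, the `pickTrack` name, and the rule of preferring a manually created track within the same language are all assumptions, not behavior confirmed by the skill.

```typescript
// Hypothetical shape of one available transcript; field names are assumptions.
interface TranscriptTrack {
  languageCode: string;
  isGenerated: boolean;
}

interface TrackFilter {
  excludeGenerated?: boolean;
  excludeManuallyCreated?: boolean;
}

// Walks the language codes in priority order and returns the first usable track.
function pickTrack(
  tracks: TranscriptTrack[],
  languages: string[],
  filter: TrackFilter = {}
): TranscriptTrack | null {
  const allowed = tracks.filter((t) =>
    t.isGenerated ? !filter.excludeGenerated : !filter.excludeManuallyCreated
  );
  for (const code of languages) {
    const matches = allowed.filter((t) => t.languageCode === code);
    // Assumption: a manually created track wins over an auto-generated one.
    const best = matches.find((t) => !t.isGenerated) ?? matches[0];
    if (best) return best;
  }
  return null; // corresponds to the "No transcript found" case
}
```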

## Input Formats

Accepts any of these as video input:

- Full URL: `https://www.youtube.com/watch?v=dQw4w9WgXcQ`

- Short URL: `https://youtu.be/dQw4w9WgXcQ`

- Embed URL: `https://www.youtube.com/embed/dQw4w9WgXcQ`

- Shorts URL: `https://www.youtube.com/shorts/dQw4w9WgXcQ`

- Video ID: `dQw4w9WgXcQ`
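
A minimal sketch of normalizing the input shapes listed above to a video ID. This is not the skill's actual parser: `extractVideoId` is a hypothetical name, and the 11-character ID pattern is an assumption based on the example ID.

```typescript
// Assumption: video IDs are 11 characters from the URL-safe base64 alphabet.
const VIDEO_ID = /^[A-Za-z0-9_-]{11}$/;

// Hypothetical helper: returns the video ID, or null if the input is not recognized.
function extractVideoId(input: string): string | null {
  const trimmed = input.trim();
  if (VIDEO_ID.test(trimmed)) return trimmed; // bare video ID

  let url: URL;
  try {
    url = new URL(trimmed);
  } catch {
    return null; // neither a bare ID nor a parseable URL
  }

  const host = url.hostname.replace(/^www\./, "");
  let candidate: string | null = null;
  if (host === "youtu.be") {
    // Short URL: the ID is the first path segment.
    candidate = url.pathname.split("/")[1] ?? null;
  } else if (host === "youtube.com" || host.endsWith(".youtube.com")) {
    if (url.pathname === "/watch") {
      candidate = url.searchParams.get("v"); // full URL
    } else {
      // Embed and Shorts URLs carry the ID in the path.
      const match = url.pathname.match(/^\/(?:embed|shorts)\/([^/]+)/);
      candidate = match?.[1] ?? null;
    }
  }
  return candidate !== null && VIDEO_ID.test(candidate) ? candidate : null;
}
```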

## Output Formats

| Format | Extension | Description |
|---|---|---|
| `text` | `.md` | Markdown with frontmatter, natural paragraphs, optional timestamps/chapters/speakers |
| `srt` | `.srt` | SubRip subtitle format for video players |

## Output Directory

```
youtube-transcript/
├── .index.json                          # Video ID → directory path mapping (for cache lookup)
└── {channel-slug}/{title-full-slug}/
    ├── meta.json                        # Video metadata (title, channel, description, duration, chapters, etc.)
    ├── transcript-raw.json              # Raw transcript snippets from YouTube API (cached)
    ├── transcript-sentences.json        # Sentence-segmented transcript (split by punctuation, merged across snippets)
    ├── imgs/
    │   └── cover.jpg                    # Video thumbnail
    ├── transcript.md                    # Markdown transcript (generated from sentences)
    └── transcript.srt                   # SRT subtitle (generated from raw snippets, if --format srt)

```

- `{channel-slug}`: Channel name in kebab-case

- `{title-full-slug}`: Full video title in kebab-case
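
The layout above could be derived roughly as in this sketch. Both `toKebabCase` and `transcriptPath` are hypothetical names, and the slug rule shown here is an assumption: it keeps only ASCII letters and digits, so the skill's real handling of non-Latin titles may differ.

```typescript
// Assumed slug rule: lowercase, collapse non-alphanumeric runs to "-", trim dashes.
function toKebabCase(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Hypothetical helper: builds the output file path for one video.
function transcriptPath(
  outputDir: string,
  channel: string,
  title: string,
  format: "text" | "srt"
): string {
  const file = format === "srt" ? "transcript.srt" : "transcript.md";
  return [outputDir, toKebabCase(channel), toKebabCase(title), file].join("/");
}
```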

The `--list` mode outputs to stdout only (no file saved).

## Caching

On first fetch, the script saves:

- `meta.json` — video metadata, chapters, cover image path, language info

- `transcript-raw.json` — raw transcript snippets from YouTube API (`{ text, start, duration }[]`)

- `transcript-sentences.json` — sentence-segmented transcript (`{ text, start: "HH:mm:ss", end: "HH:mm:ss" }[]`), split by sentence-ending punctuation (`.?!…。？！` etc.), timestamps proportionally allocated by character length, CJK-aware text merging

- `imgs/cover.jpg` — video thumbnail
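
The sentence segmentation described for `transcript-sentences.json` can be sketched as below, under simplifying assumptions. The function names are hypothetical, snippets are joined with a plain space (the CJK-aware merging is not reproduced), and abbreviations or decimals such as `3.14` are split naively.

```typescript
interface CaptionSnippet { text: string; start: number; duration: number; }
interface Sentence { text: string; start: string; end: string; }

const SENTENCE_END = /[.?!…。？！]/;

// Formats seconds as HH:mm:ss, truncating fractional seconds.
function formatClock(seconds: number): string {
  const total = Math.max(0, Math.floor(seconds));
  const pad2 = (n: number): string => (n < 10 ? "0" : "") + n;
  return `${pad2(Math.floor(total / 3600))}:${pad2(Math.floor((total % 3600) / 60))}:${pad2(total % 60)}`;
}

function segmentSentences(snippets: CaptionSnippet[]): Sentence[] {
  const sentences: Sentence[] = [];
  let buffer = "";
  let sentenceStart = 0;

  const flush = (endTime: number): void => {
    const text = buffer.trim();
    if (text) sentences.push({ text, start: formatClock(sentenceStart), end: formatClock(endTime) });
    buffer = "";
  };

  for (const snippet of snippets) {
    const chars = Array.from(snippet.text.trim());
    if (chars.length === 0) continue;
    // Time at the boundary before character k, proportional to character count.
    const timeAt = (k: number): number => snippet.start + (snippet.duration * k) / chars.length;
    if (buffer) buffer += " "; // sentence continues across snippets (Latin-style join)
    chars.forEach((ch, i) => {
      if (buffer.trim() === "") sentenceStart = timeAt(i);
      buffer += ch;
      if (SENTENCE_END.test(ch)) flush(timeAt(i + 1));
    });
  }

  // Trailing text without closing punctuation ends with the last snippet.
  const last = snippets[snippets.length - 1];
  if (last) flush(last.start + last.duration);
  return sentences;
}
```

A sentence that spans two snippets takes its start time from the first snippet and its end time from the interpolated position of the closing punctuation in the second.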

Subsequent runs for the same video use cached data (no network calls). Use `--refresh` to force re-fetch. If a different language is requested, the cache is automatically refreshed.

SRT output (`--format srt`) is generated from `transcript-raw.json`. Text/markdown output uses `transcript-sentences.json` for natural sentence boundaries.
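
A minimal sketch of turning raw `{ text, start, duration }` snippets into SubRip cues. `srtTime` and `toSrt` are hypothetical names; the skill's own generator may differ, for example in how it handles overlapping auto-generated snippets.

```typescript
interface RawSnippet { text: string; start: number; duration: number; }

// SRT timestamps use the form HH:MM:SS,mmm.
function srtTime(seconds: number): string {
  const pad = (n: number, width: number): string => {
    let s = String(n);
    while (s.length < width) s = "0" + s;
    return s;
  };
  let ms = Math.max(0, Math.round(seconds * 1000));
  const h = Math.floor(ms / 3600000);
  ms -= h * 3600000;
  const m = Math.floor(ms / 60000);
  ms -= m * 60000;
  const s = Math.floor(ms / 1000);
  ms -= s * 1000;
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// One numbered cue per snippet, with cues separated by a blank line.
function toSrt(snippets: RawSnippet[]): string {
  return snippets
    .map((s, i) => `${i + 1}\n${srtTime(s.start)} --> ${srtTime(s.start + s.duration)}\n${s.text.trim()}\n`)
    .join("\n");
}
```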

## Workflow

When user provides a YouTube URL and wants the transcript:

- Run with `--list` first if the user hasn't specified a language, to show available options

- Default: run with `--chapters --speakers` for the richest output (chapters + speaker identification)

- The script auto-saves cached data + output file and prints the file path

- For `--speakers` mode: after the script saves the raw file, follow the speaker identification workflow below to post-process with speaker labels

When user only wants a cover image or metadata, running the script with any option will also cache `meta.json` and `imgs/cover.jpg`.

When re-formatting the same video (e.g., first text then SRT), the cached data is reused — no re-fetch needed.

## Chapter & Speaker Workflow

### Chapters (`--chapters`)

The script parses chapter timestamps from the video description (e.g., `0:00 Introduction`), segments the transcript by chapter boundaries, groups snippets into readable paragraphs, and saves as `.md` with a Table of Contents. No further processing needed.

If no chapter timestamps exist in the description, the transcript is output as grouped paragraphs without chapter headings.
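
Chapter parsing of this kind can be sketched as follows. The regular expression and the names `parseChapters` and `chapterIndexAt` are assumptions for illustration; the skill's real parser may accept more formats.

```typescript
interface Chapter { title: string; start: number; } // start in seconds

// Matches lines such as "0:00 Introduction" or "1:02:30 - Q&A".
const CHAPTER_LINE = /^\s*(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\s+(?:-\s*)?(.+?)\s*$/;

function parseChapters(description: string): Chapter[] {
  const chapters: Chapter[] = [];
  for (const line of description.split(/\r?\n/)) {
    const m = CHAPTER_LINE.exec(line);
    if (!m) continue;
    const hours = m[1] ? Number(m[1]) : 0;
    const start = hours * 3600 + Number(m[2]) * 60 + Number(m[3]);
    chapters.push({ title: (m[4] ?? "").trim(), start });
  }
  return chapters;
}

// Index of the chapter containing `seconds`, or -1 if it precedes the first chapter.
// Assumes chapters are sorted by start time, as they appear in a description.
function chapterIndexAt(chapters: Chapter[], seconds: number): number {
  let index = -1;
  chapters.forEach((c, i) => {
    if (c.start <= seconds) index = i;
  });
  return index;
}
```

When `parseChapters` returns an empty array, the caller would fall back to grouped paragraphs without chapter headings.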

### Speaker Identification (`--speakers`)

Speaker identification requires AI processing. The script outputs a raw `.md` file containing:

- YAML frontmatter with video metadata (title, channel, date, cover, language)

- Video description (for speaker name extraction)

- Chapter list from description (if available)

- Raw transcript in SRT format (pre-computed start/end timestamps, token-efficient)

After the script saves the raw file, spawn a sub-agent (use a cheaper model like Sonnet for cost efficiency) to process speaker identification:

- Read the saved `.md` file

- Read the prompt template at `{baseDir}/prompts/speaker-transcript.md`

- Process the raw transcript following the prompt:

  - Identify speakers using video metadata (title → guest, channel → host, description → names)

  - Detect speaker turns from conversation flow, question-answer patterns, and contextual cues

  - Segment into chapters (use description chapters if available, else create from topic shifts)

  - Format with `**Speaker Name:**` labels, paragraph grouping (2-4 sentences), and `[HH:MM:SS → HH:MM:SS]` timestamps

- Overwrite the `.md` file with the processed transcript (keep the YAML frontmatter)

When `--speakers` is used, `--chapters` is implied — the processed output always includes chapter segmentation.

## Error Cases

| Error | Meaning |
|---|---|
| Transcripts disabled | Video has no captions at all |
| No transcript found | Requested language not available |
| Video unavailable | Video deleted, private, or region-locked |
| IP blocked | Too many requests, try again later |
| Age restricted | Video requires login for age verification |


---
*Source: https://skills.yangsir.net/skill/daily-baoyu-youtube-transcript*
*Markdown mirror: https://skills.yangsir.net/api/skill/daily-baoyu-youtube-transcript/markdown*