---
id: sm-transcribe
name: "transcribe"
url: https://skills.yangsir.net/skill/sm-transcribe
author: openai
domain: multimedia
tags: ["speech-to-text", "audio-transcription", "openai-api", "natural-language-processing", "voice-ai"]
install_count: 1400
rating: 4.30 (23 reviews)
github: https://github.com/openai/skills
---

# transcribe

> Transcribe audio with OpenAI, with optional speaker diarization. Prefer the bundled CLI for deterministic runs and more accurate transcripts.

**Stats**: 1,400 installs · 4.3/5 (23 reviews)

## Before / After comparison

### Improving the accuracy and efficiency of audio transcription

## Readme

# Audio Transcribe

Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.

## Workflow

- Collect inputs: audio file path(s), desired response format (text/json/diarized_json), optional language hint, and any known speaker references.

- Verify `OPENAI_API_KEY` is set. If missing, ask the user to set it locally (do not ask them to paste the key).

- Run the bundled `transcribe_diarize.py` CLI with sensible defaults (fast text transcription).

- Validate the output: transcription quality, speaker labels, and segment boundaries; iterate with a single targeted change if needed.

- Save outputs under `output/transcribe/` when working in this repo.
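The validation step above can be partly automated. The following is a minimal sketch, assuming the `diarized_json` output is a JSON object with a `segments` list whose items carry `speaker`, `start`, `end`, and `text` fields; that shape is an assumption, so check `references/api.md` before relying on it. `check_segments` is a hypothetical helper, not part of the bundled CLI.

```python
from typing import Any


def check_segments(payload: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a diarized transcript."""
    problems: list[str] = []
    previous_end = 0.0
    # Assumed shape: {"segments": [{"speaker", "start", "end", "text"}, ...]}
    for index, segment in enumerate(payload.get("segments", [])):
        start, end = segment["start"], segment["end"]
        if end <= start:
            problems.append(f"segment {index}: end {end} is not after start {start}")
        if start < previous_end:
            problems.append(f"segment {index}: overlaps the previous segment")
        if not str(segment.get("speaker", "")).strip():
            problems.append(f"segment {index}: missing speaker label")
        if not segment.get("text", "").strip():
            problems.append(f"segment {index}: empty text")
        previous_end = max(previous_end, end)
    return problems


# Made-up sample data for illustration only.
sample = {
    "segments": [
        {"speaker": "Alice", "start": 0.0, "end": 2.5, "text": "Hello."},
        {"speaker": "", "start": 2.0, "end": 4.0, "text": "Hi there."},
    ]
}
for problem in check_segments(sample):
    print(problem)
```

Overlapping segments can be legitimate crosstalk, so treat the findings as prompts for a manual check rather than hard failures.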

## Decision rules

- Default to `gpt-4o-mini-transcribe` with `--response-format text` for fast transcription.

- If the user wants speaker labels or diarization, use `--model gpt-4o-transcribe-diarize --response-format diarized_json`.

- If audio is longer than ~30 seconds, keep `--chunking-strategy auto`.

- Prompting is not supported for `gpt-4o-transcribe-diarize`.
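The rules above map onto a small flag-selection helper. This is a sketch only: `choose_flags` is a hypothetical name, the 30-second threshold is the approximate figure from the rule, and passing `--chunking-strategy auto` explicitly is assumed to be equivalent to keeping the CLI default.

```python
def choose_flags(want_speakers: bool, duration_seconds: float) -> list[str]:
    """Pick CLI flags according to the decision rules (illustrative only)."""
    if want_speakers:
        # Speaker labels requested: use the diarization model and format.
        flags = ["--model", "gpt-4o-transcribe-diarize",
                 "--response-format", "diarized_json"]
    else:
        # Fast default: plain text from the mini model.
        flags = ["--model", "gpt-4o-mini-transcribe",
                 "--response-format", "text"]
    if duration_seconds > 30:
        # Longer audio: keep automatic chunking.
        flags += ["--chunking-strategy", "auto"]
    return flags


print(choose_flags(want_speakers=True, duration_seconds=1800))
print(choose_flags(want_speakers=False, duration_seconds=12))
```

Because prompting is not supported for `gpt-4o-transcribe-diarize`, the helper adds no prompt-related option on the diarization path.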

## Output conventions

- Use `output/transcribe/<job-id>/` for evaluation runs.

- Use `--out-dir` for multiple files to avoid overwriting.
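One way to derive a per-job directory that follows this convention is sketched below. `job_output_dir` is a hypothetical helper, and rejecting path separators in the job id is a design choice of this sketch, not a documented rule.

```python
from pathlib import Path


def job_output_dir(job_id: str, root: Path = Path("output/transcribe")) -> Path:
    """Return output/transcribe/<job-id> without touching the filesystem."""
    # Refuse ids that would escape or nest under the root directory.
    if not job_id or job_id in {".", ".."} or "/" in job_id or "\\" in job_id:
        raise ValueError(f"unsafe job id: {job_id!r}")
    return root / job_id


out_dir = job_output_dir("meeting")
print(out_dir.as_posix())
```

Call `out_dir.mkdir(parents=True, exist_ok=True)` before the run, then pass the path to `--out-dir`.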

## Dependencies (install if missing)

Prefer `uv` for dependency management.

```
uv pip install openai
```

If `uv` is unavailable:

```
python3 -m pip install openai
```

## Environment

- `OPENAI_API_KEY` must be set for live API calls.

- If the key is missing, instruct the user to create one in the OpenAI platform UI and export it in their shell.

- Never ask the user to paste the full key in chat.
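A pre-flight check along these lines reports whether the key is present without ever printing or requesting its value. `api_key_status` is a hypothetical helper, and the message wording is only a suggestion.

```python
import os
from typing import Mapping, Optional


def api_key_status(env: Optional[Mapping[str, str]] = None) -> str:
    """Report whether OPENAI_API_KEY is set, never echoing the key itself."""
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY", "").strip():
        return "OPENAI_API_KEY is set."
    return (
        "OPENAI_API_KEY is not set. Create a key in the OpenAI platform UI "
        "and export it in your own shell; do not paste it into chat."
    )


print(api_key_status())
```

Passing a mapping instead of reading `os.environ` directly keeps the check easy to exercise without touching the real environment.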

## Skill path (set once)

```
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export TRANSCRIBE_CLI="$CODEX_HOME/skills/transcribe/scripts/transcribe_diarize.py"
```

User-scoped skills install under `$CODEX_HOME/skills` (default: `~/.codex/skills`).
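When the CLI is driven from a Python script rather than a shell, the same path can be resolved as sketched here. `transcribe_cli_path` is a hypothetical helper that mirrors the `${CODEX_HOME:-$HOME/.codex}` fallback, treating an empty `CODEX_HOME` like an unset one.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def transcribe_cli_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the bundled CLI script under $CODEX_HOME/skills."""
    env = os.environ if env is None else env
    # Fall back to ~/.codex when CODEX_HOME is unset or empty.
    codex_home = env.get("CODEX_HOME") or str(Path.home() / ".codex")
    return Path(codex_home) / "skills" / "transcribe" / "scripts" / "transcribe_diarize.py"


cli = transcribe_cli_path()
print(cli, "(found)" if cli.is_file() else "(not found)")
```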

## CLI quick start

Single file (fast text default):

```
python3 "$TRANSCRIBE_CLI" \
  path/to/audio.wav \
  --out transcript.txt
```

Diarization with known speakers (up to 4):

```
python3 "$TRANSCRIBE_CLI" \
  meeting.m4a \
  --model gpt-4o-transcribe-diarize \
  --known-speaker "Alice=refs/alice.wav" \
  --known-speaker "Bob=refs/bob.wav" \
  --response-format diarized_json \
  --out-dir output/transcribe/meeting
```

Plain text output (explicit):

```
python3 "$TRANSCRIBE_CLI" \
  interview.mp3 \
  --response-format text \
  --out interview.txt
```
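For scripted runs, the diarization command above can be assembled as an argument list. This is a sketch: `diarize_command` is a hypothetical helper, it uses only flags that appear in the quick start, and it enforces the "up to 4" known-speaker limit before anything is executed.

```python
import shlex

MAX_KNOWN_SPEAKERS = 4  # limit stated in the quick start


def diarize_command(cli_path: str, audio: str,
                    known_speakers: dict[str, str], out_dir: str) -> list[str]:
    """Build the argv for a diarized run; nothing is executed here."""
    if len(known_speakers) > MAX_KNOWN_SPEAKERS:
        raise ValueError(f"at most {MAX_KNOWN_SPEAKERS} known speakers are supported")
    argv = ["python3", cli_path, audio, "--model", "gpt-4o-transcribe-diarize"]
    for name, reference in known_speakers.items():
        # Each known speaker is passed as NAME=path/to/reference audio.
        argv += ["--known-speaker", f"{name}={reference}"]
    argv += ["--response-format", "diarized_json", "--out-dir", out_dir]
    return argv


argv = diarize_command(
    "transcribe_diarize.py",
    "meeting.m4a",
    {"Alice": "refs/alice.wav", "Bob": "refs/bob.wav"},
    "output/transcribe/meeting",
)
print(shlex.join(argv))
```

Handing the list to `subprocess.run(argv, check=True)` avoids shell quoting problems with file names that contain spaces.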

## Reference map

- `references/api.md`: supported formats, limits, response formats, and known-speaker notes.

---
*Source: https://skills.yangsir.net/skill/sm-transcribe*
*Markdown mirror: https://skills.yangsir.net/api/skill/sm-transcribe/markdown*