transcribe
Transcribes audio using OpenAI, with optional speaker diarization. Prefers the bundled CLI for deterministic, repeatable runs.
npx skills add openai/skills --skill transcribe
Before / After Comparison
Before: General-purpose transcription tools often produce less accurate transcripts, especially in multi-speaker recordings where speakers are hard to tell apart. Without deterministic, repeatable runs, transcripts also need extensive manual proofreading.
After: With OpenAI transcription models and the bundled CLI, runs are deterministic and repeatable, and speaker diarization is available on request. This improves transcript quality and reduces the time spent on manual proofreading.
Audio Transcribe
Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.
Workflow
- Collect inputs: audio file path(s), desired response format (text/json/diarized_json), optional language hint, and any known speaker references.
- Verify OPENAI_API_KEY is set. If missing, ask the user to set it locally (do not ask them to paste the key).
- Run the bundled transcribe_diarize.py CLI with sensible defaults (fast text transcription).
- Validate the output: transcription quality, speaker labels, and segment boundaries; iterate with a single targeted change if needed.
- Save outputs under output/transcribe/ when working in this repo.
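The validation step for diarized output can be partly automated. The following is a minimal sketch, not part of the bundled CLI. It assumes that each segment in the diarized JSON is a dictionary with `speaker`, `start`, and `end` keys; that schema is an assumption and should be checked against references/api.md. The function name `segment_problems` is hypothetical.

```python
from typing import Any, Dict, List


def segment_problems(segments: List[Dict[str, Any]]) -> List[str]:
    """List issues worth a manual look in diarized segments.

    Assumed (unverified) schema: each segment has "speaker", "start", "end".
    """
    problems: List[str] = []
    previous_end = 0.0
    for index, seg in enumerate(segments):
        start, end = seg.get("start"), seg.get("end")
        if not seg.get("speaker"):
            problems.append(f"segment {index}: missing speaker label")
        if start is None or end is None or end <= start:
            problems.append(f"segment {index}: invalid boundaries")
            continue
        if start < previous_end:
            # Overlapping speech can be legitimate; flag it for review only.
            problems.append(f"segment {index}: overlaps previous segment")
        previous_end = end
    return problems
```

An empty list means no structural issues were found. Transcription quality itself still needs a human read.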
Decision rules
- Default to gpt-4o-mini-transcribe with --response-format text for fast transcription.
- If the user wants speaker labels or diarization, use --model gpt-4o-transcribe-diarize --response-format diarized_json.
- If audio is longer than ~30 seconds, keep --chunking-strategy auto.
- Prompting is not supported for gpt-4o-transcribe-diarize.
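The decision rules above can be expressed as a small helper. This is a sketch only: the function name `choose_cli_flags` and its parameters are hypothetical, while the model names and flags are the ones listed above. Passing `--chunking-strategy auto` explicitly for longer audio is an assumption about how to "keep" that setting.

```python
from typing import List


def choose_cli_flags(want_speakers: bool, duration_seconds: float) -> List[str]:
    """Map the decision rules to CLI flags (illustrative helper)."""
    if want_speakers:
        # Diarization path. Prompting is not supported for this model,
        # so no prompt-related option is ever added here.
        flags = ["--model", "gpt-4o-transcribe-diarize",
                 "--response-format", "diarized_json"]
    else:
        # Fast default: plain text from the mini model.
        flags = ["--model", "gpt-4o-mini-transcribe",
                 "--response-format", "text"]
    if duration_seconds > 30:
        # Assumption: pass the chunking strategy explicitly for longer audio.
        flags += ["--chunking-strategy", "auto"]
    return flags
```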
Output conventions
- Use output/transcribe/<job-id>/ for evaluation runs.
- Use --out-dir for multiple files to avoid overwriting.
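One way to derive the per-job directory from the convention above is sketched below. The helper name `job_output_dir` and its validation rules are illustrative assumptions. Only the output/transcribe/<job-id>/ layout comes from this document.

```python
from pathlib import Path


def job_output_dir(job_id: str, root: str = "output/transcribe") -> Path:
    """Return output/transcribe/<job-id> without creating it."""
    # Reject ids that would escape or collapse the output root.
    if not job_id or "/" in job_id or job_id in {".", ".."}:
        raise ValueError(f"invalid job id: {job_id!r}")
    return Path(root) / job_id
```

The returned path can be passed to --out-dir. Call `.mkdir(parents=True, exist_ok=True)` on it first if the directory has to exist beforehand.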
Dependencies (install if missing)
Prefer uv for dependency management.
uv pip install openai
If uv is unavailable:
python3 -m pip install openai
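The same preference order can be checked from Python. This is a sketch: the helper names are hypothetical, and it only decides which install command to suggest rather than running it.

```python
import importlib.util
import shutil
from typing import List, Optional


def openai_missing() -> bool:
    """True if the openai package cannot be imported in this environment."""
    return importlib.util.find_spec("openai") is None


def install_command(uv_available: Optional[bool] = None) -> List[str]:
    """Prefer uv; fall back to pip when uv is not on PATH."""
    if uv_available is None:
        uv_available = shutil.which("uv") is not None
    if uv_available:
        return ["uv", "pip", "install", "openai"]
    return ["python3", "-m", "pip", "install", "openai"]
```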
Environment
- OPENAI_API_KEY must be set for live API calls.
- If the key is missing, instruct the user to create one in the OpenAI platform UI and export it in their shell.
- Never ask the user to paste the full key in chat.
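A pre-flight check along these lines might look like the sketch below. The function name `api_key_status` is hypothetical. The function reports only whether the variable is set and never returns or prints the key itself.

```python
import os
from typing import Mapping, Optional


def api_key_status(env: Optional[Mapping[str, str]] = None) -> str:
    """Describe whether OPENAI_API_KEY is set, without exposing its value."""
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY", "").strip():
        return "OPENAI_API_KEY is set."
    return (
        "OPENAI_API_KEY is not set. Create a key in the OpenAI platform UI "
        "and export it in your shell; do not paste it into chat."
    )
```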
Skill path (set once)
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export TRANSCRIBE_CLI="$CODEX_HOME/skills/transcribe/scripts/transcribe_diarize.py"
User-scoped skills install under $CODEX_HOME/skills (default: ~/.codex/skills).
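When the CLI is launched from Python rather than from a shell, the same path can be resolved as sketched below. The helper name `transcribe_cli_path` is hypothetical. It mirrors the `${CODEX_HOME:-$HOME/.codex}` fallback, which applies when CODEX_HOME is unset or empty.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def transcribe_cli_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the bundled CLI path the same way as the shell exports."""
    env = os.environ if env is None else env
    home = env.get("HOME") or str(Path.home())
    # Empty or unset CODEX_HOME falls back to ~/.codex, like ${VAR:-default}.
    codex_home = env.get("CODEX_HOME") or str(Path(home) / ".codex")
    return Path(codex_home) / "skills" / "transcribe" / "scripts" / "transcribe_diarize.py"
```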
CLI quick start
Single file (fast text default):
python3 "$TRANSCRIBE_CLI" \
path/to/audio.wav \
--out transcript.txt
Diarization with known speakers (up to 4):
python3 "$TRANSCRIBE_CLI" \
meeting.m4a \
--model gpt-4o-transcribe-diarize \
--known-speaker "Alice=refs/alice.wav" \
--known-speaker "Bob=refs/bob.wav" \
--response-format diarized_json \
--out-dir output/transcribe/meeting
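The repeated --known-speaker arguments can be generated from a mapping of names to reference clips. This is a sketch with a hypothetical helper name. The "Name=path" form and the limit of 4 come from the example above, and the check is done locally before any call is made.

```python
from typing import Dict, List

MAX_KNOWN_SPEAKERS = 4  # limit stated in the quick start above


def known_speaker_flags(speakers: Dict[str, str]) -> List[str]:
    """Build repeated --known-speaker "Name=path" arguments."""
    if len(speakers) > MAX_KNOWN_SPEAKERS:
        raise ValueError(
            f"at most {MAX_KNOWN_SPEAKERS} known speakers, got {len(speakers)}"
        )
    flags: List[str] = []
    for name, reference_path in speakers.items():
        flags += ["--known-speaker", f"{name}={reference_path}"]
    return flags
```

For example, `known_speaker_flags({"Alice": "refs/alice.wav", "Bob": "refs/bob.wav"})` yields the two --known-speaker pairs used in the command above.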
Plain text output (explicit):
python3 "$TRANSCRIBE_CLI" \
interview.mp3 \
--response-format text \
--out interview.txt
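To call the CLI from a script instead of a shell, the arguments can be passed as a list, which avoids quoting problems with paths that contain spaces. The sketch below only builds and prints the command. The helper name `build_transcribe_command` is hypothetical, and the flags are the ones shown in the plain-text example above.

```python
import shlex
from typing import List


def build_transcribe_command(cli_path: str, audio: str, out: str,
                             response_format: str = "text") -> List[str]:
    """Build the argv list for the plain-text example above."""
    return [
        "python3", cli_path,
        audio,
        "--response-format", response_format,
        "--out", out,
    ]


command = build_transcribe_command(
    "transcribe_diarize.py", "interview.mp3", "interview.txt"
)
# Print a shell-safe rendering for review before running anything.
print(shlex.join(command))
```

To execute it, pass the list to `subprocess.run(command, check=True)` once OPENAI_API_KEY is set.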
Reference map
references/api.md: supported formats, limits, response formats, and known-speaker notes.
Weekly installs: 343. Repository: openai/skills. GitHub stars: 14.5K. First seen: Feb 1, 2026.
Security audits: Gen Agent Trust Hub (pass), Socket (pass), Snyk (pass).
Installed on: codex 303, opencode 289, gemini-cli 281, github-copilot 269, cursor 262, kimi-cli 260.