transcribe
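Transcribes audio with OpenAI, with optional speaker diarization. Prefers the bundled CLI for deterministic runs and better transcription accuracy.
npx skills add openai/skills --skill transcribe
Before / After comparison
Before: Conventional audio transcription tended to rely on general-purpose tools and suffered from low accuracy. Speakers were especially hard to tell apart in multi-speaker recordings, and with no guarantee of deterministic behavior, downstream proofreading became a heavy burden that lowered overall efficiency.
After: With OpenAI's transcription models and the bundled CLI, audio transcription becomes efficient and accurate. The CLI gives deterministic behavior and supports speaker diarization, which substantially improves transcript quality, cuts manual proofreading time, and raises overall efficiency.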
Audio Transcribe
Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.
Workflow
- Collect inputs: audio file path(s), desired response format (text/json/diarized_json), optional language hint, and any known speaker references.
- Verify OPENAI_API_KEY is set. If missing, ask the user to set it locally (do not ask them to paste the key).
- Run the bundled transcribe_diarize.py CLI with sensible defaults (fast text transcription).
- Validate the output: transcription quality, speaker labels, and segment boundaries; iterate with a single targeted change if needed.
- Save outputs under output/transcribe/ when working in this repo.
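The validation step can be partly automated for diarized output. The sketch below is a minimal check, not part of the skill: the function name check_segments is hypothetical, and it assumes each segment is a dict with speaker, start, and end keys, which should be confirmed against references/api.md. Overlapping segments are flagged for review only, since speakers can legitimately talk over each other.

```python
def check_segments(segments):
    """Return a list of human-readable problems found in diarized segments.

    Assumption: each segment is a dict with "speaker", "start", and "end"
    keys (times in seconds). Confirm the real schema before relying on this.
    """
    problems = []
    previous_end = 0.0
    for index, segment in enumerate(segments):
        speaker = segment.get("speaker")
        start = segment.get("start")
        end = segment.get("end")
        if not speaker:
            problems.append(f"segment {index}: missing speaker label")
        if start is None or end is None:
            problems.append(f"segment {index}: missing start or end time")
            continue
        if end < start:
            problems.append(f"segment {index}: end {end} precedes start {start}")
        if start < previous_end:
            # Not necessarily an error: flag it so a human can review it.
            problems.append(f"segment {index}: overlaps the previous segment")
        previous_end = max(previous_end, end)
    return problems


if __name__ == "__main__":
    sample = [
        {"speaker": "Alice", "start": 0.0, "end": 1.5},
        {"speaker": "Bob", "start": 1.5, "end": 3.0},
    ]
    for problem in check_segments(sample):
        print(problem)
```

An empty result only means the structure looks sane; transcription quality still needs a human read.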
Decision rules
- Default to gpt-4o-mini-transcribe with --response-format text for fast transcription.
- If the user wants speaker labels or diarization, use --model gpt-4o-transcribe-diarize --response-format diarized_json.
- If audio is longer than ~30 seconds, keep --chunking-strategy auto.
- Prompting is not supported for gpt-4o-transcribe-diarize.
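The decision rules above can be written as one small function. This is an illustrative sketch: choose_flags and its parameters are hypothetical names, and only the flags and model names listed in the rules are emitted. The prompt argument exists solely to reject the unsupported combination; no prompt flag is generated because the rules do not name one.

```python
def choose_flags(want_diarization, duration_seconds=None, prompt=None):
    """Map the decision rules to CLI flags (hypothetical helper)."""
    if want_diarization:
        if prompt:
            raise ValueError(
                "prompting is not supported for gpt-4o-transcribe-diarize"
            )
        flags = [
            "--model", "gpt-4o-transcribe-diarize",
            "--response-format", "diarized_json",
        ]
    else:
        # Fast default: plain text from the mini transcription model.
        flags = [
            "--model", "gpt-4o-mini-transcribe",
            "--response-format", "text",
        ]
    # The ~30 second threshold is approximate in the rules; when the
    # duration is unknown, the flag is left to the CLI's own default.
    if duration_seconds is not None and duration_seconds > 30:
        flags += ["--chunking-strategy", "auto"]
    return flags
```

For example, choose_flags(True, duration_seconds=600) yields the diarization model, the diarized_json format, and automatic chunking.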
Output conventions
- Use output/transcribe/<job-id>/ for evaluation runs.
- Use --out-dir for multiple files to avoid overwriting.
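A small helper can keep the output convention consistent across runs. The sketch below assumes nothing about how the CLI names files inside the directory; job_out_dir and out_dir_args are hypothetical names, and the job id is whatever label the caller picks.

```python
from pathlib import Path

OUTPUT_ROOT = Path("output") / "transcribe"


def job_out_dir(job_id, root=OUTPUT_ROOT):
    """Return output/transcribe/<job-id>, rejecting ids that escape the root."""
    if not job_id or job_id in {".", ".."} or Path(job_id).name != job_id:
        raise ValueError(f"job id must be a single path component: {job_id!r}")
    return root / job_id


def out_dir_args(job_id):
    """Return the --out-dir flag pointing at the job's directory."""
    return ["--out-dir", str(job_out_dir(job_id))]
```

Rejecting separators in the job id keeps every run under output/transcribe/, so one job cannot overwrite another's files by accident.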
Dependencies (install if missing)
Prefer uv for dependency management.
uv pip install openai
If uv is unavailable:
python3 -m pip install openai
Environment
- OPENAI_API_KEY must be set for live API calls.
- If the key is missing, instruct the user to create one in the OpenAI platform UI and export it in their shell.
- Never ask the user to paste the full key in chat.
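One way to apply these rules is a preflight check that only tests whether the variable is present and never echoes its value. This is a sketch; api_key_message is a hypothetical name and the wording of the message is illustrative.

```python
import os


def api_key_message(environ=os.environ):
    """Return None if OPENAI_API_KEY is set, else instructions for the user.

    The key's value is never returned, logged, or printed.
    """
    if environ.get("OPENAI_API_KEY", "").strip():
        return None
    return (
        "OPENAI_API_KEY is not set. Create a key in the OpenAI platform UI, "
        "then export it in your own shell. Do not paste the key into chat."
    )


if __name__ == "__main__":
    message = api_key_message()
    print(message or "OPENAI_API_KEY is set.")
```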
Skill path (set once)
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export TRANSCRIBE_CLI="$CODEX_HOME/skills/transcribe/scripts/transcribe_diarize.py"
User-scoped skills install under $CODEX_HOME/skills (default: ~/.codex/skills).
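When the CLI is launched from Python rather than a shell, the same path can be resolved in code. The sketch below mirrors the two exports above; resolve_transcribe_cli is a hypothetical helper, not something the skill ships.

```python
import os
from pathlib import Path


def resolve_transcribe_cli(environ=os.environ):
    """Mirror the shell default: CODEX_HOME falls back to ~/.codex."""
    # Like ${CODEX_HOME:-$HOME/.codex}, an empty value also uses the default.
    codex_home = environ.get("CODEX_HOME") or str(Path.home() / ".codex")
    return (
        Path(codex_home)
        / "skills" / "transcribe" / "scripts" / "transcribe_diarize.py"
    )
```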
CLI quick start
Single file (fast text default):
python3 "$TRANSCRIBE_CLI" \
path/to/audio.wav \
--out transcript.txt
Diarization with known speakers (up to 4):
python3 "$TRANSCRIBE_CLI" \
meeting.m4a \
--model gpt-4o-transcribe-diarize \
--known-speaker "Alice=refs/alice.wav" \
--known-speaker "Bob=refs/bob.wav" \
--response-format diarized_json \
--out-dir output/transcribe/meeting
Plain text output (explicit):
python3 "$TRANSCRIBE_CLI" \
interview.mp3 \
--response-format text \
--out interview.txt
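The diarization command above can also be assembled programmatically, which makes the four-speaker limit easy to enforce before anything is sent. This is a sketch using only the flags shown in the quick start; build_diarize_command and run are hypothetical names, and the CLI's exit-code behavior is assumed rather than documented here.

```python
import subprocess
import sys

MAX_KNOWN_SPEAKERS = 4  # limit stated in the quick start


def build_diarize_command(cli_path, audio_path, out_dir, known_speakers=None):
    """Build the argv list for a diarized run with optional known speakers.

    known_speakers maps a display name to a reference audio path.
    """
    known_speakers = known_speakers or {}
    if len(known_speakers) > MAX_KNOWN_SPEAKERS:
        raise ValueError(f"at most {MAX_KNOWN_SPEAKERS} known speakers are supported")
    command = [
        sys.executable, str(cli_path), str(audio_path),
        "--model", "gpt-4o-transcribe-diarize",
    ]
    for name, reference in known_speakers.items():
        command += ["--known-speaker", f"{name}={reference}"]
    command += ["--response-format", "diarized_json", "--out-dir", str(out_dir)]
    return command


def run(command):
    """Run the CLI; assumes it exits non-zero on failure."""
    return subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.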
Reference map
references/api.md: supported formats, limits, response formats, and known-speaker notes.