podcast-generation
by @sickn33 · v1.0.0
"Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creatio..."
Installation
npx skills add sickn33/antigravity-awesome-skills --skill podcast-generation
name: podcast-generation
description: "Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creatio..."
risk: unknown
source: community
date_added: "2026-02-27"
Podcast Generation with GPT Realtime Mini
Generate real audio narratives from text content using Azure OpenAI's Realtime API.
Quick Start
- Configure environment variables for Realtime API
- Connect via WebSocket to Azure OpenAI Realtime endpoint
- Send text prompt, collect PCM audio chunks + transcript
- Convert PCM to WAV format
- Return base64-encoded audio to frontend for playback
Environment Configuration
```
AZURE_OPENAI_AUDIO_API_KEY=your_realtime_api_key
AZURE_OPENAI_AUDIO_ENDPOINT=https://your-resource.cognitiveservices.azure.com
AZURE_OPENAI_AUDIO_DEPLOYMENT=gpt-realtime-mini
```
Note: Endpoint should NOT include /openai/v1/ - just the base URL.
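A minimal sketch of reading this configuration on the backend (the os.environ usage is an assumption about how you load config; the resulting names feed the workflow code below):

```python
import os

# Pull the Realtime API settings from the environment
api_key = os.environ["AZURE_OPENAI_AUDIO_API_KEY"]
endpoint = os.environ["AZURE_OPENAI_AUDIO_ENDPOINT"].rstrip("/")
deployment = os.environ.get("AZURE_OPENAI_AUDIO_DEPLOYMENT", "gpt-realtime-mini")
```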
Core Workflow
Backend Audio Generation
```python
from openai import AsyncOpenAI
import base64

# Convert HTTPS endpoint to WebSocket URL
ws_url = endpoint.replace("https://", "wss://") + "/openai/v1"

client = AsyncOpenAI(
    websocket_base_url=ws_url,
    api_key=api_key,
)

audio_chunks = []
transcript_parts = []

async with client.realtime.connect(model="gpt-realtime-mini") as conn:
    # Configure for audio-only output
    await conn.session.update(session={
        "output_modalities": ["audio"],
        "instructions": "You are a narrator. Speak naturally.",
    })

    # Send text to narrate
    await conn.conversation.item.create(item={
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": prompt}],
    })
    await conn.response.create()

    # Collect streaming events
    async for event in conn:
        if event.type == "response.output_audio.delta":
            audio_chunks.append(base64.b64decode(event.delta))
        elif event.type == "response.output_audio_transcript.delta":
            transcript_parts.append(event.delta)
        elif event.type == "response.done":
            break

# Convert PCM to WAV (see scripts/pcm_to_wav.py)
pcm_audio = b''.join(audio_chunks)
wav_audio = pcm_to_wav(pcm_audio, sample_rate=24000)
```
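The pcm_to_wav helper ships with the skill in scripts/pcm_to_wav.py; a minimal sketch of that conversion, assuming the 24kHz, 16-bit, mono PCM format described under Audio Format below, can use Python's standard wave module:

```python
import io
import wave

def pcm_to_wav(pcm_data: bytes, sample_rate: int = 24000) -> bytes:
    """Wrap raw 16-bit mono PCM samples in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 16-bit samples = 2 bytes
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_data)
    return buffer.getvalue()
```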
Frontend Audio Playback
```javascript
// Convert base64 WAV to playable blob
const base64ToBlob = (base64, mimeType) => {
  const bytes = atob(base64);
  const arr = new Uint8Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) arr[i] = bytes.charCodeAt(i);
  return new Blob([arr], { type: mimeType });
};

const audioBlob = base64ToBlob(response.audio_data, 'audio/wav');
const audioUrl = URL.createObjectURL(audioBlob);
new Audio(audioUrl).play();
```
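The frontend above expects a response carrying an audio_data field. One way the backend might expose that is sketched below; FastAPI, the /narrate route, and the generate_podcast_audio wrapper are all assumptions for illustration, not part of the skill:

```python
import base64

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class NarrateRequest(BaseModel):
    prompt: str

@app.post("/narrate")
async def narrate(request: NarrateRequest):
    # generate_podcast_audio is a hypothetical wrapper around the
    # Realtime workflow above, returning (wav_bytes, transcript).
    wav_audio, transcript = await generate_podcast_audio(request.prompt)
    return {
        "audio_data": base64.b64encode(wav_audio).decode("ascii"),
        "transcript": transcript,
    }
```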
Voice Options
| Voice | Character |
|---------|------------|
| alloy | Neutral |
| echo | Warm |
| fable | Expressive |
| onyx | Deep |
| nova | Friendly |
| shimmer | Clear |
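The workflow above does not show how a voice is selected. Assuming the Realtime session accepts a voice setting in its audio output configuration (an assumption; verify against the session schema for your Azure OpenAI API version), it could be set during session.update:

```python
# Assumption: the Realtime session schema accepts a voice field under
# audio.output; check the schema for your API version before relying on it.
await conn.session.update(session={
    "output_modalities": ["audio"],
    "audio": {"output": {"voice": "alloy"}},
    "instructions": "You are a narrator. Speak naturally.",
})
```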
Realtime API Events
- response.output_audio.delta - Base64 audio chunk
- response.output_audio_transcript.delta - Transcript text
- response.done - Generation complete
- error - Handle with event.error.message
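To act on the error event, the collection loop from the backend example can be extended as below; raising RuntimeError is one choice, an assumption rather than part of the skill:

```python
async for event in conn:
    if event.type == "response.output_audio.delta":
        audio_chunks.append(base64.b64decode(event.delta))
    elif event.type == "response.output_audio_transcript.delta":
        transcript_parts.append(event.delta)
    elif event.type == "response.done":
        break
    elif event.type == "error":
        # Surface the server-side failure instead of waiting forever
        raise RuntimeError(f"Realtime API error: {event.error.message}")
```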
Audio Format
- Input: Text prompt
- Output: PCM audio (24kHz, 16-bit, mono)
- Storage: Base64-encoded WAV
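Because the stream is 24kHz, 16-bit (2 bytes per sample), mono, a clip's duration follows directly from the PCM byte count. A quick sanity check:

```python
SAMPLE_RATE = 24_000   # samples per second
BYTES_PER_SAMPLE = 2   # 16-bit PCM
CHANNELS = 1           # mono

def pcm_duration_seconds(pcm_data: bytes) -> float:
    """Duration of a raw PCM buffer in seconds."""
    return len(pcm_data) / (SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)

# e.g. 480,000 bytes of PCM -> 10.0 seconds of audio
```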
References
- Full architecture: See references/architecture.md for complete stack design
- Code examples: See references/code-examples.md for production patterns
- PCM conversion: Use scripts/pcm_to_wav.py for audio format conversion
When to Use
Use this skill when building text-to-speech features, generating audio narratives from text, or producing podcast-style content with Azure OpenAI's Realtime API.