---
id: english-pronunciation-audio
name: "english-pronunciation-audio"
url: https://skills.yangsir.net/skill/english-pronunciation-audio
author: cxwqs
domain: education
tags: ["english-learning", "tts", "telegram-bot", "language-practice", "audio-generation"]
install_count: 120
rating: 4.00 (10 reviews)
github: https://github.com/cxwqs/english-coach-telegram
---

# english-pronunciation-audio

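> This skill provides English pronunciation audio for Telegram users and is designed to be used together with `english-daily-coach`. It automatically extracts English sentences from coach replies or user questions and generates high-quality speech files. In particular, when a user asks "怎么读" ("how do I pronounce this?"), the skill sends audio instead of a text explanation, which simplifies pronunciation lookups and helps users practice spoken English more efficiently and naturally.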

**Stats**: 120 installs · 4.0/5 (10 reviews)

## Before / After Comparison

### More efficient English pronunciation practice

**Before**:

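Previously, users practicing English in Telegram who were unsure how to pronounce a word or sentence had to copy it into a separate translation or pronunciation tool and then switch back to Telegram. This was tedious and interrupted the flow of study, so pronunciation practice was inefficient and it was hard to build a habit of immediate feedback.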

**After**:

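Now, when a user asks "怎么读" in Telegram or receives a coach reply, the skill automatically generates and sends pronunciation audio for the English sentence. Users hear the pronunciation immediately without leaving Telegram, which shortens pronunciation lookups and keeps practice smooth and natural.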

| Metric | Before | After | Change |
|---|---|---|---|
| Time per pronunciation lookup | 5 min | 0.5 min | -90% |

## Readme

---
name: english-pronunciation-audio
description: Generate TTS audio for English practice replies and send via Telegram. Use together with english-daily-coach. Extracts spoken English lines, skips Chinese, and sends one audio file per reply. When user asks "怎么读", generate audio for the English sentence — do NOT explain pronunciation in text.
---

# English Pronunciation Audio

## CRITICAL Rules

1. This skill generates AUDIO files only. NEVER output text-based pronunciation guides (e.g. "vuh-LOR-unt", IPA symbols).
2. **When user asks "X怎么读"** (how to pronounce the sentence they sent): Pass ONLY the exact sentence X the user gave you. Example: user says "What would you like to order for lunch today?怎么读" → exec with `--text "What would you like to order for lunch today"`. Do NOT pass your reply or any extra text.
3. **When sending coaching reply**: Pass the full draft (你说/➡️/💬/📚/🎯) so the script extracts translation + examples + question.
4. Generate at most one audio file per reply.
5. Skip Chinese lines (lines starting with `提示:`, `你说:`, `📚`, `💬`).
6. Skip `You said:` lines. Only read `More natural:` and `Reusable phrase:` lines.
7. Skip vocabulary definitions (`• word — 释义`) and phonetic lines (IPA, "pronounced", "vuh-LOR-unt").
8. If the user ends the session, skip audio for the final reply.
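
Rule 2 can be illustrated with a small helper that isolates the sentence X from an "X怎么读" message before it is handed to `--text`. This is only a sketch: `sentence_for_pronunciation` is a hypothetical name, not part of the skill's script, and the trailing-punctuation stripping simply mirrors the example in rule 2.

```python
from typing import Optional

# Marker the user appends when asking how to pronounce a sentence.
HOW_TO_READ = "怎么读"


def sentence_for_pronunciation(message: str) -> Optional[str]:
    """Return the sentence X from an "X怎么读" message, or None if the marker is absent.

    Hypothetical helper for illustration only.
    """
    if HOW_TO_READ not in message:
        return None
    # Keep only what precedes the marker: the exact sentence the user sent.
    sentence = message.split(HOW_TO_READ, 1)[0].strip()
    # Drop closing punctuation, as in the rule 2 example.
    sentence = sentence.rstrip("?？!！.。").strip()
    return sentence or None
```

The returned string is what would be passed as the `--text` value; a `None` result means the message is not a "怎么读" request and rule 3 applies instead.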

## Supported Reply Shapes

Spoken content is extracted from:

- `➡️` line (translation)
- `①②③` lines (example sentences)
- Line after `🎯 我会这样继续问你：` (follow-up question)
- `Translation:` / `You can say:` / `Now you try:` / `More natural:` / `Reusable phrase:` (legacy labels)
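
The extraction described above might look roughly like the following. It is a simplified sketch, not the logic of `scripts/tts_openrouter.py` (which this page does not show); the function name `extract_spoken` and the exact matching rules are assumptions.

```python
from typing import List

# Legacy labels whose remainder is spoken. `You said:` is deliberately absent.
LEGACY_LABELS = (
    "Translation:",
    "You can say:",
    "Now you try:",
    "More natural:",
    "Reusable phrase:",
)
ARROW = "\u27a1"            # the arrow emoji, with or without a variation selector
EXAMPLE_MARKERS = "①②③"
FOLLOW_UP_MARKER = "🎯"     # assumption: the follow-up header line starts with this emoji


def extract_spoken(draft: str) -> List[str]:
    """Collect the English lines that should be read aloud (illustrative only)."""
    spoken: List[str] = []
    take_next = False
    for raw in draft.splitlines():
        line = raw.strip()
        if not line:
            continue
        if take_next:
            # First non-empty line after the follow-up header is the question.
            spoken.append(line)
            take_next = False
        elif line.startswith(FOLLOW_UP_MARKER):
            take_next = True
        elif line.startswith(ARROW):
            spoken.append(line.lstrip(ARROW + "\ufe0f").strip())
        elif line[0] in EXAMPLE_MARKERS:
            spoken.append(line[1:].strip())
        else:
            for label in LEGACY_LABELS:
                if line.startswith(label):
                    spoken.append(line[len(label):].strip())
                    break
    return [s for s in spoken if s]
```

Lines that match none of these shapes (for example `你说:`, `📚`, `💬`, or `You said:` lines) fall through and are skipped, which is consistent with rules 5 to 7.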

## Script

- Use `scripts/tts_openrouter.py`.
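- For coaching replies, pass the full draft reply text with `--text`; the script extracts the spoken English automatically. For "怎么读" requests, pass only the user's sentence (see CRITICAL Rules, rule 2).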
- Config: `assets/tts-config.json`.
- If script returns `no_spoken_text` or `too_long`, send text reply only.
- If script fails, do not block the text reply. Retry at most once.
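
The failure handling above could be wrapped as follows. This sketch rests on two assumptions that this page does not confirm: that `no_spoken_text` and `too_long` appear in the script's standard output, and that a failure is signalled by a non-zero exit code. The names `run_tts` and `audio_or_text_only` are hypothetical.

```python
import subprocess
from typing import Callable

SCRIPT = "scripts/tts_openrouter.py"
# Assumption: these statuses are printed to stdout by the script.
TEXT_ONLY_STATUSES = ("no_spoken_text", "too_long")


def run_tts(text: str) -> subprocess.CompletedProcess:
    """Invoke the TTS script with the documented --text flag."""
    return subprocess.run(
        ["python3", SCRIPT, "--text", text],
        capture_output=True,
        text=True,
    )


def audio_or_text_only(
    text: str,
    runner: Callable[[str], subprocess.CompletedProcess] = run_tts,
    max_retries: int = 1,
) -> bool:
    """Return True if audio was produced, False if only the text reply should go out."""
    for _ in range(1 + max_retries):  # one attempt plus at most one retry
        try:
            result = runner(text)
        except OSError:
            continue  # e.g. interpreter or script missing; counts as a failed attempt
        if any(status in (result.stdout or "") for status in TEXT_ONLY_STATUSES):
            return False  # nothing to read aloud; not a failure, so no retry
        if result.returncode == 0:
            return True
    return False  # both attempts failed; the text reply is still sent
```

Whatever this returns, the text reply is sent; the boolean only decides whether an audio file accompanies it.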

## Reply Shaping

- Keep extracted English under 300 characters total.
- Keep Chinese on separate `提示:` lines.
- One template per reply for deterministic extraction.
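
The 300-character guideline can be checked before a draft is sent. The sketch below assumes the extracted lines are joined with single spaces and that "under 300" means strictly fewer than 300 characters; how the script itself counts is not stated here, and `fits_audio_budget` is a hypothetical name.

```python
from typing import List

MAX_SPOKEN_CHARS = 300  # from the guideline above


def fits_audio_budget(spoken_lines: List[str]) -> bool:
    """Return True if the joined English text stays under the character limit."""
    total = len(" ".join(spoken_lines))
    return total < MAX_SPOKEN_CHARS
```

A draft that fails this check is a candidate for trimming examples before the script reports `too_long`.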

## Debugging

- `--extract-only`: inspect extracted English as JSON.
- `--dry-run`: generate audio but skip Telegram upload.
- To verify what audio will be sent: `python3 scripts/tts_openrouter.py --text "your text" --extract-only`

---
*Source: https://skills.yangsir.net/skill/english-pronunciation-audio*
*Markdown mirror: https://skills.yangsir.net/api/skill/english-pronunciation-audio/markdown*