alicloud-ai-audio-tts
Convert text to speech with the Qwen TTS models in Alibaba Cloud Model Studio.
npx skills add cinience/alicloud-skills --skill alicloud-ai-audio-tts
Before / After comparison
Before: earlier text-to-speech systems produced audio that sounded mechanical and unnatural, lacking emotion and natural intonation. They were a poor fit for applications that need convincing human-voice simulation, and the user experience suffered.
After: with the Alibaba Cloud Qwen TTS models, speech can be generated with rich timbre, expressive emotion, and natural intonation, greatly improving the listening experience. This applies across areas such as smart customer service and audiobooks.
SKILL.md
alicloud-ai-audio-tts
Category: provider
Description: Model Studio Qwen TTS
Validation
mkdir -p output/alicloud-ai-audio-tts
python -m py_compile skills/ai/audio/alicloud-ai-audio-tts/scripts/generate_tts.py && echo "py_compile_ok" > output/alicloud-ai-audio-tts/validate.txt
Pass criteria: command exits 0 and output/alicloud-ai-audio-tts/validate.txt is generated.
Output And Evidence
- Save generated audio links, sample audio files, and request payloads to output/alicloud-ai-audio-tts/.
- Keep one validation log per execution.
Critical model names
Use one of the recommended models:
- qwen3-tts-flash
- qwen3-tts-instruct-flash
- qwen3-tts-instruct-flash-2026-01-26
Prerequisites
- Install SDK (recommended in a venv to avoid PEP 668 limits):
python3 -m venv .venv
. .venv/bin/activate
python -m pip install dashscope
- Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials (the environment variable takes precedence); see the resolution sketch below.
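A minimal key-resolution sketch, assuming an INI-style credentials file with dashscope_api_key under a [default] section (as in the quick start below); the helper name is illustrative:

import configparser
import os
from pathlib import Path

def resolve_api_key():
    # Environment variable takes precedence over the credentials file.
    key = os.getenv("DASHSCOPE_API_KEY")
    if key:
        return key
    # Fall back to ~/.alibabacloud/credentials, assumed INI-style.
    creds = Path.home() / ".alibabacloud" / "credentials"
    if creds.exists():
        parser = configparser.ConfigParser()
        parser.read(creds)
        return parser.get("default", "dashscope_api_key", fallback=None)
    return None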
Normalized interface (tts.generate)
Request
- text (string, required)
- voice (string, required)
- language_type (string, optional; default Auto)
- instruction (string, optional; recommended for instruct models)
- stream (bool, optional; default false)
Response
- audio_url (string, when stream=false)
- audio_base64_pcm (string, when stream=true)
- sample_rate (int, 24000)
- format (string, wav or pcm depending on mode)
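As an illustration only (these TypedDicts are not part of the SDK), the normalized shapes above can be written down like this:

from typing import TypedDict

class TTSRequest(TypedDict, total=False):
    text: str            # required
    voice: str           # required
    language_type: str   # optional; default "Auto"
    instruction: str     # optional; recommended for instruct models
    stream: bool         # optional; default False

class TTSResponse(TypedDict, total=False):
    audio_url: str         # when stream=False
    audio_base64_pcm: str  # when stream=True
    sample_rate: int       # 24000
    format: str            # "wav" or "pcm" depending on mode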
Quick start (Python + DashScope SDK)
import os
import dashscope
# Prefer env var for auth: export DASHSCOPE_API_KEY=...
# Or use ~/.alibabacloud/credentials with dashscope_api_key under [default].
# Beijing region; for Singapore use: https://dashscope-intl.aliyuncs.com/api/v1
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"
text = "Hello, this is a short voice line."
response = dashscope.MultiModalConversation.call(
    model="qwen3-tts-instruct-flash",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text=text,
    voice="Cherry",
    language_type="English",
    instruction="Warm and calm tone, slightly slower pace.",
    stream=False,
)
audio_url = response.output.audio.url
print(audio_url)
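To keep the evidence files required above, persist the returned audio. A standard-library sketch; the sample.wav name is arbitrary, and OUTPUT_DIR is described under Output location below:

import os
import urllib.request
from pathlib import Path

# Save the synthesized audio under the skill's output directory.
audio_dir = Path(os.environ.get("OUTPUT_DIR", "output/alicloud-ai-audio-tts")) / "audio"
audio_dir.mkdir(parents=True, exist_ok=True)
urllib.request.urlretrieve(audio_url, str(audio_dir / "sample.wav"))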
Streaming notes
- stream=True returns Base64-encoded PCM chunks at 24 kHz; see the decoding sketch after this list.
- Decode the chunks and play them, or concatenate them into a PCM buffer.
- The response contains finish_reason == "stop" when the stream ends.
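A decoding sketch under these notes, with error handling omitted. The per-chunk field name (output.audio["data"]) and 16-bit mono PCM are assumptions derived from the audio_base64_pcm mapping above; verify both against references/api_reference.md:

import base64
import os
import wave

import dashscope

dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"

# stream=True yields chunks; collect the decoded PCM into one buffer.
pcm = bytearray()
chunks = dashscope.MultiModalConversation.call(
    model="qwen3-tts-instruct-flash",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text="Hello, this is a short voice line.",
    voice="Cherry",
    language_type="English",
    stream=True,
)
for chunk in chunks:
    data = chunk.output.audio["data"]  # assumed field for base64 PCM
    if data:
        pcm.extend(base64.b64decode(data))
    if chunk.output.finish_reason == "stop":
        break

# Wrap the raw PCM as WAV: mono, 16-bit samples, 24 kHz (per sample_rate above).
with wave.open("stream.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(24000)
    wf.writeframes(bytes(pcm))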
Operational guidance
- Keep requests concise; split long text into multiple calls if you hit size or timeout errors.
- Use a language_type consistent with the text to improve pronunciation.
- Use instruction only when you need explicit style/tone control.
- Cache by (text, voice, language_type) to avoid repeat costs; see the sketch after this list.
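A caching sketch for the last point; the hash-to-filename layout is an assumption, not part of the skill:

import hashlib
import os
from pathlib import Path

def cache_path(text, voice, language_type):
    # Stable key over the fields that determine the rendered audio.
    key = hashlib.sha256(f"{text}\x00{voice}\x00{language_type}".encode()).hexdigest()
    base = Path(os.environ.get("OUTPUT_DIR", "output/alicloud-ai-audio-tts"))
    return base / "audio" / f"{key}.wav"

path = cache_path("Hello, this is a short voice line.", "Cherry", "English")
if not path.exists():
    path.parent.mkdir(parents=True, exist_ok=True)
    # ... synthesize via tts.generate and save the result to `path` ...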
Output location
- Default output: output/alicloud-ai-audio-tts/audio/
- Override the base directory with OUTPUT_DIR.
Workflow
1. Confirm user intent, region, identifiers, and whether the operation is read-only or mutating.
2. Run one minimal read-only query first to verify connectivity and permissions.
3. Execute the target operation with explicit parameters and bounded scope.
4. Verify results and save output/evidence files.
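TTS has no strictly read-only endpoint, so one pragmatic preflight for step 2 is a minimal one-word synthesis that only checks the status code (the text is arbitrary):

import os
import dashscope

dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"

# Minimal preflight: confirms the API key, region, and model access.
resp = dashscope.MultiModalConversation.call(
    model="qwen3-tts-flash",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text="ping",
    voice="Cherry",
    stream=False,
)
assert resp.status_code == 200, f"preflight failed: {resp.code} {resp.message}"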
References
- references/api_reference.md for parameter mapping and a streaming example.
- Realtime mode is provided by skills/ai/audio/alicloud-ai-audio-tts-realtime/.
- Voice cloning and voice design are provided by skills/ai/audio/alicloud-ai-audio-tts-voice-clone/ and skills/ai/audio/alicloud-ai-audio-tts-voice-design/.
- Source list: references/sources.md