alicloud-ai-audio-tts
Uses the Qwen TTS model in Alibaba Cloud AI Model Studio for audio text-to-speech.
npx skills add cinience/alicloud-skills --skill alicloud-ai-audio-tts
Before / After Comparison
Previous text-to-speech systems produced robotic, stiff-sounding speech that lacked emotion and natural intonation. This made them a poor fit for applications that need human-like voices and led to a poor user experience.
With the Alibaba Cloud Qwen TTS model, generated speech has richer timbre, emotional expression, and natural intonation. Typical applications include smart customer service and audiobooks.
alicloud-ai-audio-tts
Category: provider
Model Studio Qwen TTS
Validation
mkdir -p output/alicloud-ai-audio-tts
python -m py_compile skills/ai/audio/alicloud-ai-audio-tts/scripts/generate_tts.py && echo "py_compile_ok" > output/alicloud-ai-audio-tts/validate.txt
Pass criteria: command exits 0 and output/alicloud-ai-audio-tts/validate.txt is generated.
Output And Evidence
- Save generated audio links, sample audio files, and request payloads to output/alicloud-ai-audio-tts/.
- Keep one validation log per execution.
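The evidence rule above could be implemented along these lines. This is a minimal sketch, not part of the skill's scripts: the `save_evidence` helper, the one-JSON-file-per-run layout, and the record fields are all hypothetical.

```python
import json
import time
from pathlib import Path


def save_evidence(payload, audio_url, out_dir="output/alicloud-ai-audio-tts"):
    """Write one JSON evidence record per execution (hypothetical layout)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Never persist credentials alongside the request payload.
    safe_payload = {k: v for k, v in payload.items() if k != "api_key"}
    record = {
        "timestamp": time.strftime("%Y%m%dT%H%M%S"),
        "request": safe_payload,
        "audio_url": audio_url,
    }
    # The nanosecond suffix keeps file names unique within the same second.
    path = out / f"run-{record['timestamp']}-{time.time_ns()}.json"
    path.write_text(json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

Dropping `api_key` before writing keeps the saved payload safe to share as evidence.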
Critical model names
Use one of the recommended models:
- qwen3-tts-flash
- qwen3-tts-instruct-flash
- qwen3-tts-instruct-flash-2026-01-26
Prerequisites
- Install SDK (recommended in a venv to avoid PEP 668 limits):
python3 -m venv .venv
. .venv/bin/activate
python -m pip install dashscope
- Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials (env takes precedence).
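The precedence rule above (environment first, then the credentials file) might be resolved as follows. This is a sketch under two assumptions: the credentials file is INI-formatted, and the key sits under a `[default]` section. The `resolve_api_key` helper is hypothetical and not part of the DashScope SDK.

```python
import configparser
import os
from pathlib import Path


def resolve_api_key(environ=None, credentials_path=None):
    """Return the DashScope API key, preferring the environment variable."""
    environ = os.environ if environ is None else environ
    key = environ.get("DASHSCOPE_API_KEY")
    if key:
        return key
    path = Path(credentials_path or Path.home() / ".alibabacloud" / "credentials")
    parser = configparser.ConfigParser()
    parser.read(path, encoding="utf-8")  # a missing file is silently skipped
    # Assumption: the key is stored under a [default] section.
    return parser.get("default", "dashscope_api_key", fallback=None)
```

The function returns `None` when neither source provides a key, so the caller can fail with a clear message before making a request.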
Normalized interface (tts.generate)
Request
- text (string, required)
- voice (string, required)
- language_type (string, optional; default Auto)
- instruction (string, optional; recommended for instruct models)
- stream (bool, optional; default false)
Response
- audio_url (string, when stream=false)
- audio_base64_pcm (string, when stream=true)
- sample_rate (int, 24000)
- format (string, wav or pcm depending on mode)
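The normalized interface above could be modeled as in the following sketch. `TtsRequest` and `normalize_response` are hypothetical names, and the mapping of wav to non-streaming and pcm to streaming is an assumption inferred from the field list rather than something the skill states outright.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TtsRequest:
    text: str
    voice: str
    language_type: str = "Auto"
    instruction: Optional[str] = None
    stream: bool = False

    def to_call_kwargs(self):
        """Validate required fields and drop unset optional ones."""
        if not self.text.strip():
            raise ValueError("text is required")
        if not self.voice.strip():
            raise ValueError("voice is required")
        return {k: v for k, v in asdict(self).items() if v is not None}


def normalize_response(stream, audio_url=None, audio_base64_pcm=None):
    """Build the normalized response dict for either mode."""
    if stream:
        # Assumption: streaming mode corresponds to format "pcm".
        return {"audio_base64_pcm": audio_base64_pcm, "sample_rate": 24000, "format": "pcm"}
    # Assumption: non-streaming mode corresponds to format "wav".
    return {"audio_url": audio_url, "sample_rate": 24000, "format": "wav"}
```

Omitting `instruction` when it is unset keeps the call arguments valid for models that are not instruct variants.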
Quick start (Python + DashScope SDK)
import os
import dashscope
# Prefer env var for auth: export DASHSCOPE_API_KEY=...
# Or use ~/.alibabacloud/credentials with dashscope_api_key under [default].
# Beijing region; for Singapore use: https://dashscope-intl.aliyuncs.com/api/v1
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"
text = "Hello, this is a short voice line."
response = dashscope.MultiModalConversation.call(
    model="qwen3-tts-instruct-flash",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text=text,
    voice="Cherry",
    language_type="English",
    instruction="Warm and calm tone, slightly slower pace.",
    stream=False,
)
audio_url = response.output.audio.url
print(audio_url)
Streaming notes
- stream=True returns Base64-encoded PCM chunks at 24 kHz.
- Decode chunks and play or concatenate to a PCM buffer.
- The response contains finish_reason == "stop" when the stream ends.
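The decode-and-concatenate step above can be sketched with the standard library alone. The example assumes 16-bit mono samples, which the notes do not state; only the 24 kHz rate comes from the text. `pcm_chunks_to_wav` is a hypothetical helper that takes Base64 strings already extracted from the stream.

```python
import base64
import io
import wave


def pcm_chunks_to_wav(b64_chunks, sample_rate=24000, channels=1, sample_width=2):
    """Decode Base64 PCM chunks and wrap the joined buffer in a WAV container."""
    # Empty chunks (for example, a final chunk without audio) are skipped.
    pcm = b"".join(base64.b64decode(chunk) for chunk in b64_chunks if chunk)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)      # assumption: mono
        wav.setsampwidth(sample_width)  # assumption: 16-bit samples
        wav.setframerate(sample_rate)   # 24 kHz, per the notes above
        wav.writeframes(pcm)
    return buf.getvalue()
```

Wrapping the buffer in a WAV header makes the result playable in ordinary audio tools, whereas raw PCM needs the rate and sample width supplied out of band.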
Operational guidance
- Keep requests concise; split long text into multiple calls if you hit size or timeout errors.
- Use language_type consistent with the text to improve pronunciation.
- Use instruction only when you need explicit style/tone control.
- Cache by (text, voice, language_type) to avoid repeat costs.
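Two of the points above, splitting long text and caching by (text, voice, language_type), could look like the sketch below. Both helpers are hypothetical, and the 300-character limit is an arbitrary placeholder, not a documented service limit.

```python
import hashlib
import re


def split_text(text, max_chars=300):
    """Pack whole sentences into chunks of at most max_chars characters."""
    # Split after Latin punctuation followed by whitespace, or directly
    # after CJK punctuation (which is usually not followed by a space).
    sentences = re.split(r"(?<=[.!?])\s+|(?<=[。！？])", text.strip())
    chunks, current = [], ""
    for sentence in filter(None, sentences):
        candidate = f"{current} {sentence}".strip()
        # A single sentence longer than max_chars is kept whole.
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def cache_key(text, voice, language_type="Auto"):
    """Derive a stable cache key from the three fields named above."""
    raw = "\x1f".join((text, voice, language_type))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

If `instruction` varies between calls, adding it to the key is probably wise, although the guidance above does not list it.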
Output location
- Default output: output/alicloud-ai-audio-tts/audio/
- Override base dir with OUTPUT_DIR.
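One possible reading of the override is sketched below. It assumes OUTPUT_DIR replaces only the leading output directory and that the alicloud-ai-audio-tts/audio suffix is kept; the skill's own script may resolve the path differently. `resolve_audio_dir` is a hypothetical helper.

```python
import os
from pathlib import Path


def resolve_audio_dir(environ=None):
    """Return the audio output directory, honoring OUTPUT_DIR if set."""
    environ = os.environ if environ is None else environ
    # Assumption: OUTPUT_DIR replaces the leading "output" directory only.
    base = Path(environ.get("OUTPUT_DIR") or "output")
    return base / "alicloud-ai-audio-tts" / "audio"
```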
Workflow
- Confirm user intent, region, identifiers, and whether the operation is read-only or mutating.
- Run one minimal read-only query first to verify connectivity and permissions.
- Execute the target operation with explicit parameters and bounded scope.
- Verify results and save output/evidence files.
References
references/api_reference.md for parameter mapping and streaming example.
Realtime mode is provided by skills/ai/audio/alicloud-ai-audio-tts-realtime/.
Voice cloning/design are provided by skills/ai/audio/alicloud-ai-audio-tts-voice-clone/ and skills/ai/audio/alicloud-ai-audio-tts-voice-design/.
Source list: references/sources.md