Home/多媒体与音视频/ai-voice-cloning
A

ai-voice-cloning

by @inferen-shv
4.8(97)

Utilizes AI technology to clone specific human voices, generating highly realistic speech for personalized voice assistants, audiobooks, or film/TV dubbing.

Voice SynthesisText-to-Speech (TTS)ElevenLabsResemble AIDeepfake AudioGitHub
Installation
npx skills add inferen-sh/skills --skill ai-voice-cloning
compare_arrows

Before / After Comparison

1
Before

Traditional speech synthesis technology lacks personalization, making it difficult to simulate the timbre and emotion of a real human voice, resulting in a poor user experience and limited application scenarios.

After

Leveraging this skill, one can quickly clone a specific human voice to generate natural and realistic voice content. This is widely applicable in scenarios such as audiobooks and virtual assistants, thereby enhancing user immersion.

description SKILL.md

ai-voice-cloning

AI Voice Generation

Generate natural AI voices via inference.sh CLI.

Quick Start

Requires inference.sh CLI (infsh). Install instructions

infsh login

# Generate speech
infsh app run infsh/kokoro-tts --input '{
  "prompt": "Hello! This is an AI-generated voice that sounds natural and engaging.",
  "voice": "af_sarah"
}'

Available Models

Model App ID Best For

ElevenLabs TTS elevenlabs/tts Premium quality, 22+ voices, 32 languages

ElevenLabs Voice Changer elevenlabs/voice-changer Transform existing voice recordings

Kokoro TTS infsh/kokoro-tts Natural, multiple voices

DIA infsh/dia-tts Conversational, expressive

Chatterbox infsh/chatterbox Casual, entertainment

Higgs infsh/higgs-tts Professional narration

VibeVoice infsh/vibevoice Emotional range

Kokoro Voice Library

American English

Voice ID Gender Style

af_sarah Female Warm, friendly

af_nicole Female Professional

af_sky Female Youthful

am_michael Male Authoritative

am_adam Male Conversational

am_echo Male Clear, neutral

British English

Voice ID Gender Style

bf_emma Female Refined

bf_isabella Female Warm

bm_george Male Classic

bm_lewis Male Modern

Voice Generation Examples

Professional Narration

infsh app run infsh/kokoro-tts --input '{
  "prompt": "Welcome to our quarterly earnings call. Today we will discuss the financial performance and strategic initiatives for the past quarter.",
  "voice": "am_michael",
  "speed": 1.0
}'

Conversational Style

infsh app run infsh/dia-tts --input '{
  "text": "Hey, so I was thinking about that project we discussed. What if we tried a different approach?",
  "voice": "conversational"
}'

Audiobook Narration

infsh app run infsh/kokoro-tts --input '{
  "prompt": "Chapter One. The morning mist hung low over the valley as Sarah made her way down the winding path. She had been walking for hours.",
  "voice": "bf_emma",
  "speed": 0.9
}'

Video Voiceover

infsh app run infsh/kokoro-tts --input '{
  "prompt": "Introducing the next generation of productivity. Work smarter, not harder.",
  "voice": "af_nicole",
  "speed": 1.1
}'

Podcast Host

infsh app run infsh/kokoro-tts --input '{
  "prompt": "Welcome back to Tech Talk! Im your host, and today we are diving deep into the world of artificial intelligence.",
  "voice": "am_adam"
}'

Multi-Voice Conversation

# Generate dialogue between two speakers
# Speaker 1
infsh app run infsh/kokoro-tts --input '{
  "prompt": "Have you seen the latest AI developments? Its incredible how fast things are moving.",
  "voice": "am_michael"
}' > speaker1.json

# Speaker 2
infsh app run infsh/kokoro-tts --input '{
  "prompt": "I know, right? Just last week I tried that new image generator and was blown away.",
  "voice": "af_sarah"
}' > speaker2.json

# Merge conversation
infsh app run infsh/media-merger --input '{
  "audio_files": ["<speaker1-url>", "<speaker2-url>"],
  "crossfade_ms": 300
}'

Long-Form Content

Chunked Processing

For content over 5000 characters, split into chunks:

# Process long text in chunks
TEXT="Your very long text here..."

# Split and generate
# Chunk 1
infsh app run infsh/kokoro-tts --input '{
  "prompt": "<chunk-1>",
  "voice": "bf_emma"
}' > chunk1.json

# Chunk 2
infsh app run infsh/kokoro-tts --input '{
  "prompt": "<chunk-2>",
  "voice": "bf_emma"
}' > chunk2.json

# Merge chunks
infsh app run infsh/media-merger --input '{
  "audio_files": ["<chunk1-url>", "<chunk2-url>"],
  "crossfade_ms": 100
}'

Voice + Video Workflow

Add Voiceover to Video

# 1. Generate voiceover
infsh app run infsh/kokoro-tts --input '{
  "prompt": "This stunning footage shows the beauty of nature in its purest form.",
  "voice": "am_michael"
}' > voiceover.json

# 2. Merge with video
infsh app run infsh/media-merger --input '{
  "video_url": "https://your-video.mp4",
  "audio_url": "<voiceover-url>"
}'

Create Talking Head

# 1. Generate speech
infsh app run infsh/kokoro-tts --input '{
  "prompt": "Hi, Im excited to share some updates with you today.",
  "voice": "af_sarah"
}' > speech.json

# 2. Animate with avatar
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "<speech-url>"
}'

Speed and Pacing

Speed Effect Use For

0.8 Slow, deliberate Audiobooks, meditation

0.9 Slightly slow Education, tutorials

1.0 Normal General purpose

1.1 Slightly fast Commercials, energy

1.2 Fast Quick announcements

# Slow narration
infsh app run infsh/kokoro-tts --input '{
  "prompt": "Take a deep breath. Let yourself relax.",
  "voice": "bf_emma",
  "speed": 0.8
}'

Punctuation for Pacing

Use punctuation to control speech rhythm:

Punctuation Effect

Period . Full pause

Comma , Brief pause

... Extended pause

! Emphasis

? Question intonation

- Quick break

infsh app run infsh/kokoro-tts --input '{
  "prompt": "Wait... Did you hear that? Something is coming. Something big!",
  "voice": "am_adam"
}'

Best Practices

  • Match voice to content - Professional voice for business, casual for social

  • Use punctuation - Control pacing with periods and commas

  • Keep sentences short - Easier to generate and sounds more natural

  • Test different voices - Same text sounds different across voices

  • Adjust speed - Slightly slower often sounds more natural

  • Break long content - Process in chunks for consistency

Use Cases

  • Voiceovers - Video narration, commercials

  • Audiobooks - Full book narration

  • Podcasts - AI hosts and guests

  • E-learning - Course narration

  • Accessibility - Screen reader content

  • IVR - Phone system messages

  • Content localization - Translate and voice

Related Skills

# ElevenLabs TTS (premium, 22+ voices)
npx skills add inference-sh/skills@elevenlabs-tts

# ElevenLabs voice changer (transform recordings)
npx skills add inference-sh/skills@elevenlabs-voice-changer

# All TTS models
npx skills add inference-sh/skills@text-to-speech

# Podcast creation
npx skills add inference-sh/skills@ai-podcast-creation

# AI avatars
npx skills add inference-sh/skills@ai-avatar-video

# Video generation
npx skills add inference-sh/skills@ai-video-generation

# Full platform skill
npx skills add inference-sh/skills@infsh-cli

Browse audio apps: infsh app list --category audio Weekly Installs4.4KRepositoryinferen-sh/skillsGitHub Stars159First Seen6 days agoSecurity AuditsGen Agent Trust HubPassSocketWarnSnykPassInstalled onclaude-code3.5Kgemini-cli3.2Kcodex3.2Kamp3.2Kkimi-cli3.2Kgithub-copilot3.2K

forumUser Reviews (0)

Write a Review

Effect
Usability
Docs
Compatibility

No reviews yet

Statistics

Installs6.5K
Rating4.8 / 5.0
Version
Updated2026年3月17日
Comparisons1

User Rating

4.8(97)
5
0%
4
0%
3
0%
2
0%
1
0%

Rate this Skill

0.0

Compatible Platforms

🔧Claude Code
🔧OpenClaw
🔧OpenCode
🔧Codex
🔧Gemini CLI
🔧GitHub Copilot
🔧Amp
🔧Kimi CLI

Timeline

Created2026年3月17日
Last Updated2026年3月17日