elevenlabs-stt

by @inferen-sh · v1.0.0

Use ElevenLabs Scribe models for high-accuracy speech-to-text with support for many languages, improving transcription accuracy and efficiency.

Speech-to-Text, Audio Transcription, AI Voice Recognition, ElevenLabs STT

Installation
npx skills add inferen-sh/skills --skill elevenlabs-stt
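Before / After Comparison

Before

Traditional speech-to-text has low recognition accuracy and handles complex audio poorly.

After

High-accuracy speech-to-text that recognizes complex audio accurately and improves productivity.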

SKILL.md

elevenlabs-stt

# ElevenLabs Speech-to-Text

High-accuracy transcription with Scribe models via the inference.sh CLI.

## Quick Start

Requires the inference.sh CLI (`infsh`); see its install instructions.

```bash
infsh login

# Transcribe audio
infsh app run elevenlabs/stt --input '{"audio": "https://audio.mp3"}'
```

## Available Models

| Model | ID | Best For |
|-------|----|----------|
| Scribe v2 | `scribe_v2` | Latest, highest accuracy (default) |
| Scribe v1 | `scribe_v1` | Stable, proven |

- 98%+ transcription accuracy
- 90+ languages with auto-detection

## Examples

### Basic Transcription

```bash
infsh app run elevenlabs/stt --input '{"audio": "https://meeting-recording.mp3"}'
```

### With Speaker Identification

```bash
infsh app run elevenlabs/stt --input '{
  "audio": "https://meeting.mp3",
  "diarize": true
}'
```

### Audio Event Tagging

Detect laughter, applause, music, and other non-speech events:

```bash
infsh app run elevenlabs/stt --input '{
  "audio": "https://podcast.mp3",
  "tag_audio_events": true
}'
```

### Specify Language

```bash
infsh app run elevenlabs/stt --input '{
  "audio": "https://spanish-audio.mp3",
  "language_code": "spa"
}'
```

### Full Options

```bash
infsh app run elevenlabs/stt --input '{
  "audio": "https://conference.mp3",
  "model": "scribe_v2",
  "diarize": true,
  "tag_audio_events": true,
  "language_code": "eng"
}'
```

## Forced Alignment

Get precise word-level and character-level timestamps by aligning known text to audio. Useful for subtitles, lip-sync, and karaoke.

```bash
infsh app run elevenlabs/forced-alignment --input '{
  "audio": "https://narration.mp3",
  "text": "This is the exact text spoken in the audio file."
}'
```

### Output Format

```json
{
  "words": [
    {"text": "This", "start": 0.0, "end": 0.3},
    {"text": "is", "start": 0.35, "end": 0.5},
    {"text": "the", "start": 0.55, "end": 0.65}
  ],
  "text": "This is the exact text spoken in the audio file."
}
```

### Forced Alignment Use Cases

- **Subtitles**: Precise timing for video captions
- **Lip-sync**: Align audio to animated characters
- **Karaoke**: Word-by-word timing for lyrics
- **Accessibility**: Synchronized transcripts

## Workflow: Video Subtitles

```bash
# 1. Transcribe video audio
infsh app run elevenlabs/stt --input '{
  "audio": "https://video.mp4",
  "diarize": true
}' > transcript.json

# 2. Use transcript for captions
infsh app run infsh/caption-videos --input '{
  "video_url": "https://video.mp4",
  "captions": ""
}'
```

## Supported Languages

90+ languages including: English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, Turkish, Dutch, Swedish, and many more.

Leave `language_code` empty for automatic detection.

## Use Cases

- **Meetings**: Transcribe recordings with speaker identification
- **Podcasts**: Generate transcripts with audio event tags
- **Subtitles**: Create timed captions for videos
- **Research**: Interview transcription with diarization
- **Accessibility**: Make audio content searchable and accessible
- **Lip-sync**: Forced alignment for animation timing

## Related Skills

```bash
# ElevenLabs TTS (reverse direction)
npx skills add inference-sh/skills@elevenlabs-tts

# ElevenLabs dubbing (translate audio)
npx skills add inference-sh/skills@elevenlabs-dubbing

# Other STT models (Whisper)
npx skills add inference-sh/skills@speech-to-text

# Full platform skill (all 150+ apps)
npx skills add inference-sh/skills@infsh-cli
```

Browse all audio apps: `infsh app list --category audio`
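The forced-alignment result shown under Output Format is a `words` list in which each entry carries `text`, `start`, and `end` (in seconds). As a minimal sketch of the Subtitles use case, the Python below groups those words into SRT cues. The helper names `format_timestamp` and `words_to_srt` and the `max_words` grouping rule are illustrative assumptions, not part of the skill or the `infsh` CLI; only the JSON shape comes from the documentation above.

```python
import json


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words, max_words=7):
    """Group word timings into numbered SRT cues of at most max_words words.

    Hypothetical helper: the fixed-size grouping is an assumption made for
    illustration; real captions may prefer sentence or pause boundaries.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = format_timestamp(chunk[0]["start"])
        end = format_timestamp(chunk[-1]["end"])
        text = " ".join(w["text"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Sample data copied from the documented forced-alignment output format.
alignment = json.loads("""
{
  "words": [
    {"text": "This", "start": 0.0, "end": 0.3},
    {"text": "is", "start": 0.35, "end": 0.5},
    {"text": "the", "start": 0.55, "end": 0.65}
  ],
  "text": "This is the exact text spoken in the audio file."
}
""")

print(words_to_srt(alignment["words"], max_words=2))
```

In practice the JSON would presumably be read from a file saved from the CLI (for example via shell redirection, as in the workflow above) rather than embedded as a string.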

Statistics

Installs: 0
Rating: 0.0 / 5.0
Version: 1.0.0
Updated: March 18, 2026
Comparison cases: 1

Compatible Platforms

Claude Code

Timeline

Created: March 18, 2026
Last updated: March 18, 2026