# elevenlabs-stt
High-accuracy speech-to-text using ElevenLabs Scribe models. Supports many languages and improves transcription accuracy and efficiency.
```bash
npx skills add inferen-sh/skills --skill elevenlabs-stt
```

## Before / After
**Before:** Traditional speech-to-text has low recognition accuracy and handles complex audio poorly.

**After:** High-accuracy speech-to-text that reliably recognizes complex audio and improves productivity.
## SKILL.md
# ElevenLabs Speech-to-Text

High-accuracy transcription with Scribe models via the inference.sh CLI.

## Quick Start

Requires the inference.sh CLI (`infsh`); see its install instructions.

```bash
infsh login

# Transcribe audio
infsh app run elevenlabs/stt --input '{"audio": "https://audio.mp3"}'
```

## Available Models

| Model     | ID          | Best For                           |
|-----------|-------------|------------------------------------|
| Scribe v2 | `scribe_v2` | Latest, highest accuracy (default) |
| Scribe v1 | `scribe_v1` | Stable, proven                     |

- 98%+ transcription accuracy
- 90+ languages with auto-detection

## Examples

### Basic Transcription

```bash
infsh app run elevenlabs/stt --input '{"audio": "https://meeting-recording.mp3"}'
```

### With Speaker Identification

```bash
infsh app run elevenlabs/stt --input '{
  "audio": "https://meeting.mp3",
  "diarize": true
}'
```

### Audio Event Tagging

Detect laughter, applause, music, and other non-speech events:

```bash
infsh app run elevenlabs/stt --input '{
  "audio": "https://podcast.mp3",
  "tag_audio_events": true
}'
```

### Specify Language

```bash
infsh app run elevenlabs/stt --input '{
  "audio": "https://spanish-audio.mp3",
  "language_code": "spa"
}'
```

### Full Options

```bash
infsh app run elevenlabs/stt --input '{
  "audio": "https://conference.mp3",
  "model": "scribe_v2",
  "diarize": true,
  "tag_audio_events": true,
  "language_code": "eng"
}'
```

## Forced Alignment

Get precise word-level and character-level timestamps by aligning known text to audio. Useful for subtitles, lip-sync, and karaoke.

```bash
infsh app run elevenlabs/forced-alignment --input '{
  "audio": "https://narration.mp3",
  "text": "This is the exact text spoken in the audio file."
}'
```

### Output Format

```json
{
  "words": [
    {"text": "This", "start": 0.0, "end": 0.3},
    {"text": "is", "start": 0.35, "end": 0.5},
    {"text": "the", "start": 0.55, "end": 0.65}
  ],
  "text": "This is the exact text spoken in the audio file."
}
```

### Forced Alignment Use Cases

- Subtitles: Precise timing for video captions
- Lip-sync: Align audio to animated characters
- Karaoke: Word-by-word timing for lyrics
- Accessibility: Synchronized transcripts

## Workflow: Video Subtitles

```bash
# 1. Transcribe video audio
infsh app run elevenlabs/stt --input '{
  "audio": "https://video.mp4",
  "diarize": true
}' > transcript.json

# 2. Use transcript for captions
infsh app run infsh/caption-videos --input '{
  "video_url": "https://video.mp4",
  "captions": ""
}'
```

## Supported Languages

90+ languages, including English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, Turkish, Dutch, Swedish, and many more. Leave `language_code` empty for automatic detection.

## Use Cases

- Meetings: Transcribe recordings with speaker identification
- Podcasts: Generate transcripts with audio event tags
- Subtitles: Create timed captions for videos
- Research: Interview transcription with diarization
- Accessibility: Make audio content searchable and accessible
- Lip-sync: Forced alignment for animation timing

## Related Skills

```bash
# ElevenLabs TTS (reverse direction)
npx skills add inference-sh/skills@elevenlabs-tts

# ElevenLabs dubbing (translate audio)
npx skills add inference-sh/skills@elevenlabs-dubbing

# Other STT models (Whisper)
npx skills add inference-sh/skills@speech-to-text

# Full platform skill (all 150+ apps)
npx skills add inference-sh/skills@infsh-cli
```

Browse all audio apps: `infsh app list --category audio`

---

Weekly installs: 332 · Repository: inferen-sh/skills · GitHub stars: 159 · First seen: 1 day ago

Security audits: Gen Agent Trust Hub: Pass · Socket: Warn · Snyk: Warn

Installed on: claude-code (268), gemini-cli (238), amp (238), github-copilot (238), codex (238), kimi-cli (238)
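The forced-alignment output shown earlier (a `words` array with per-word `start`/`end` times) maps naturally onto SRT subtitles. A minimal sketch of that conversion, assuming the documented output shape; the cue-grouping size and helper names are illustrative, not part of the `infsh` output:

```python
import json

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def words_to_srt(words, max_words_per_cue=7):
    """Group word-level timestamps into numbered SRT cues."""
    cues = []
    for i in range(0, len(words), max_words_per_cue):
        chunk = words[i:i + max_words_per_cue]
        start, end = chunk[0]["start"], chunk[-1]["end"]
        text = " ".join(w["text"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n"
                    f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
                    f"{text}\n")
    return "\n".join(cues)

# Example using the output format documented above
alignment = json.loads('''{
  "words": [
    {"text": "This", "start": 0.0, "end": 0.3},
    {"text": "is", "start": 0.35, "end": 0.5},
    {"text": "the", "start": 0.55, "end": 0.65}
  ],
  "text": "This is the exact text spoken in the audio file."
}''')
print(words_to_srt(alignment["words"]))
```

The same grouping logic works for karaoke or lip-sync timing by shrinking `max_words_per_cue` to 1, which yields one cue per word.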
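The inline JSON passed to `--input` is easy to mis-quote in a shell, especially once URLs or text contain quotes. One option is to build the payload and argv programmatically and let `json.dumps` handle escaping; a minimal sketch, where the wrapper function and its defaults are illustrative assumptions layered over the documented `elevenlabs/stt` options:

```python
import json
import shlex

def build_stt_command(audio_url, model="scribe_v2", diarize=False,
                      tag_audio_events=False, language_code=None):
    """Build the `infsh app run` argv for an elevenlabs/stt call.

    json.dumps handles quoting/escaping, so special characters in the
    inputs cannot break the payload.
    """
    payload = {"audio": audio_url, "model": model}
    if diarize:
        payload["diarize"] = True
    if tag_audio_events:
        payload["tag_audio_events"] = True
    if language_code:
        payload["language_code"] = language_code
    return ["infsh", "app", "run", "elevenlabs/stt",
            "--input", json.dumps(payload)]

cmd = build_stt_command("https://meeting.mp3", diarize=True)
print(shlex.join(cmd))  # copy-paste-safe shell form
# To execute directly (requires infsh installed and logged in):
#   subprocess.run(cmd, capture_output=True, text=True, check=True)
```

Passing the argv list straight to `subprocess.run` skips the shell entirely, which avoids the quoting problem altogether.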