asr

by @marswaveai
Rating: 4.4 (45)

Transcribe audio files to text using local speech recognition. Triggers: "转录", "transcribe", "语音转文字", "ASR", "识别音频", "把这段音频转成文字"

Tags: Automatic Speech Recognition (ASR), Speech-to-Text, Natural Language Processing (NLP), Audio Processing, Voice AI

Installation
npx skills add marswaveai/skills --skill asr
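Before / After Comparison

Before

Manual transcription is slow and labor-intensive, and important details are easy to miss. Relying on online services adds privacy risk and may be limited by network restrictions.

After

Local speech recognition converts audio files to text quickly and accurately. It protects data privacy, is unaffected by network conditions, and greatly improves working efficiency.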

SKILL.md


name: asr
metadata:
  openclaw:
    emoji: "🎙️"
    requires:
      tools: ["coli"]
description: |
  Transcribe audio files to text using local speech recognition.
  Triggers on: "转录", "transcribe", "语音转文字", "ASR", "识别音频", "把这段音频转成文字".

When to Use

  • User wants to transcribe an audio file to text
  • User provides an audio file path and asks for transcription
  • User says "转录", "识别", "transcribe", "语音转文字"

When NOT to Use

  • User wants to synthesize speech from text (use /tts)
  • User wants to create a podcast or explainer (use /podcast or /explainer)

Purpose

Transcribe audio files to text using coli asr, which runs fully offline on local speech recognition models. No API key is required. The sensevoice model supports Chinese, English, Japanese, Korean, and Cantonese; the whisper model is English-only.

Run coli asr --help for current CLI options and supported flags.

Hard Constraints

  • No shell scripts. Use direct commands only.
  • Always read config following shared/config-pattern.md before any interaction.
  • Follow shared/common-patterns.md for interaction patterns.
  • Never ask more than one question at a time.

Interaction Flow

Step -1: Prerequisites Check

Before config setup, silently check the environment:

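# command -v prints the resolved path, so discard stdout as well as stderr;
# otherwise each variable would hold the path plus "yes" instead of just yes/no.
COLI_OK=$(command -v coli >/dev/null 2>&1 && echo yes || echo no)
FFMPEG_OK=$(command -v ffmpeg >/dev/null 2>&1 && echo yes || echo no)
MODELS_DIR="$HOME/.coli/models"
# Models count as present once MODELS_DIR holds an entry whose name contains "sherpa".
MODELS_OK=$([ -d "$MODELS_DIR" ] && ls "$MODELS_DIR" | grep -q sherpa && echo yes || echo no)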

| Issue | Action |
|---|---|
| coli not found | Block. Tell user to run npm install -g @marswave/coli first |
| ffmpeg not found | Warn (WAV files still work). Suggest brew install ffmpeg / sudo apt install ffmpeg |
| Models not downloaded | Inform user: first transcription will auto-download models (~60MB) to ~/.coli/models/ |

If coli is missing, stop here and do not proceed.
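
The checks and the Issue/Action table above can be sketched as a small decision function. This is only an illustration under stated assumptions: the names `check_environment` and `plan_actions` and the severity labels `block`, `warn`, and `info` are hypothetical, not part of coli or the skill.

```python
import shutil
from pathlib import Path


def check_environment(models_dir: Path) -> dict:
    """Mirror the three prerequisite checks without invoking a shell."""
    models_ok = models_dir.is_dir() and any(
        "sherpa" in entry.name for entry in models_dir.iterdir()
    )
    return {
        "coli": shutil.which("coli") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "models": models_ok,
    }


def plan_actions(env: dict) -> list:
    """Turn check results into (severity, message) pairs, following the table."""
    if not env["coli"]:
        # Blocking: nothing else matters until coli is installed.
        return [("block", "Run: npm install -g @marswave/coli")]
    actions = []
    if not env["ffmpeg"]:
        actions.append(("warn", "ffmpeg not found (WAV files still work). "
                                "Try: brew install ffmpeg / sudo apt install ffmpeg"))
    if not env["models"]:
        actions.append(("info", "First transcription will auto-download models "
                                "(~60MB) to ~/.coli/models/"))
    return actions
```

A caller would stop as soon as the first entry has severity `block`, and otherwise relay the warnings before moving on to config setup.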

Step 0: Config Setup

Follow shared/config-pattern.md Step 0.

Initial defaults:

# Current directory:
mkdir -p ".listenhub/asr"
echo '{"model":"sensevoice","polish":true}' > ".listenhub/asr/config.json"
CONFIG_PATH=".listenhub/asr/config.json"

# Global:
mkdir -p "$HOME/.listenhub/asr"
echo '{"model":"sensevoice","polish":true}' > "$HOME/.listenhub/asr/config.json"
CONFIG_PATH="$HOME/.listenhub/asr/config.json"

Config summary display:

当前配置 (asr):
  模型:sensevoice / whisper-tiny.en
  润色:开启 / 关闭
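
As a rough sketch of how the two config locations and the summary display fit together, the snippet below loads whichever file exists and renders the summary. The lookup order (current directory first, then global) is an assumption, since the actual rules live in shared/config-pattern.md; `load_config` and `summarize` are hypothetical helper names.

```python
import json
from pathlib import Path

DEFAULTS = {"model": "sensevoice", "polish": True}


def load_config(cwd: Path, home: Path):
    """Return (config, path); path is None when no config file exists yet."""
    candidates = [
        cwd / ".listenhub" / "asr" / "config.json",   # current directory
        home / ".listenhub" / "asr" / "config.json",  # global
    ]
    for path in candidates:
        if path.is_file():
            stored = json.loads(path.read_text(encoding="utf-8"))
            # Fill in any key missing from a hand-edited file.
            return {**DEFAULTS, **stored}, path
    return dict(DEFAULTS), None


def summarize(config: dict) -> str:
    """Render the summary in the same layout as the display above."""
    polish = "开启" if config["polish"] else "关闭"
    return f"当前配置 (asr):\n  模型:{config['model']}\n  润色:{polish}"
```

A `None` path is the signal to run the setup flow before anything else.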

Setup Flow (first run or reconfigure)

Ask in order:

  1. model: "默认使用哪个语音识别模型?"

    • "sensevoice(推荐)" — 支持中英日韩粤,可检测语言、情绪、音频事件
    • "whisper-tiny.en" — 仅英文
  2. polish: "转录后由 AI 润色文本?(修正标点、去语气词、提升可读性)"

    • "是(推荐)" → polish: true
    • "否,保留原始转录" → polish: false

Save all answers at once after collecting them.
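
One way to read "save all answers at once" is to collect both answers in memory and write the file in a single step. The sketch below assumes the config keeps the flat `{"model", "polish"}` shape shown in the initial defaults; `save_config` and `ALLOWED_MODELS` are hypothetical names.

```python
import json
from pathlib import Path

ALLOWED_MODELS = {"sensevoice", "whisper-tiny.en"}


def save_config(config_path: Path, model: str, polish: bool) -> dict:
    """Validate both answers, then write them to disk in one operation."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unknown model: {model}")
    config = {"model": model, "polish": bool(polish)}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config), encoding="utf-8")
    return config
```

Validating before writing means a rejected answer never leaves a half-updated config file behind.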

Step 1: Get Audio File

If the user hasn't provided a file path, ask:

"请提供要转录的音频文件路径。"

Verify the file exists before proceeding.
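
A minimal sketch of the existence check follows. It also surfaces the earlier ffmpeg warning for non-WAV input, which is an inference from the note that WAV files still work without ffmpeg; `validate_audio_path` is a hypothetical helper.

```python
from pathlib import Path


def validate_audio_path(raw_path: str, ffmpeg_available: bool):
    """Return (ok, message); message is empty when there is nothing to report."""
    path = Path(raw_path).expanduser()
    if not path.is_file():
        return False, f"File not found: {path}"
    if not ffmpeg_available and path.suffix.lower() != ".wav":
        # Assumption: formats other than WAV need ffmpeg for decoding.
        return True, "ffmpeg is missing; non-WAV input may fail to decode"
    return True, ""
```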

Step 2: Confirm

准备转录:

  文件:{filename}
  模型:{model}
  润色:{是 / 否}

继续?

Step 3: Transcribe

Run coli asr with JSON output (to get metadata):

coli asr -j --model {model} "{file}"

On first run, coli will automatically download the required model. This may take a moment — inform the user if models haven't been downloaded yet.

Parse the JSON result to extract text, lang, emotion, event, duration.
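
The run-and-parse step might look like the sketch below. It assumes `coli asr -j` prints a single flat JSON object whose keys match the five field names listed above; the real output shape is not documented here, so check it against `coli asr --help` before relying on this. The helper names are hypothetical.

```python
import json
import subprocess

FIELDS = ("text", "lang", "emotion", "event", "duration")


def build_command(model: str, audio_file: str) -> list:
    """Argument list for the command shown above; no shell is involved."""
    return ["coli", "asr", "-j", "--model", model, audio_file]


def parse_result(stdout: str) -> dict:
    """Pick out the documented fields; absent ones come back as None."""
    data = json.loads(stdout)
    return {name: data.get(name) for name in FIELDS}


def transcribe(model: str, audio_file: str) -> dict:
    """Run coli and parse its output (requires coli to be installed)."""
    completed = subprocess.run(
        build_command(model, audio_file),
        capture_output=True, text=True, check=True,
    )
    return parse_result(completed.stdout)
```

Passing the arguments as a list avoids quoting problems with file names that contain spaces.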

Step 4: Polish (if enabled)

If polish is true, take the raw text from the transcription result and rewrite it to fix punctuation, remove filler words, and improve readability. Preserve the original meaning and speaker intent. Do not summarize or paraphrase.

Step 5: Present Result

Display the transcript directly in the conversation:

转录完成

{transcript text}

─────────────────
语言:{lang} · 情绪:{emotion} · 时长:{duration}s

If polished, show the polished version with a note that it was AI-refined. Offer to show the raw original on request.
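
The display template can be filled in along these lines. The sketch assumes a result dict with the `text`, `lang`, `emotion`, and `duration` fields from Step 3; the wording of the AI-refined note and the name `format_result` are hypothetical.

```python
def format_result(result: dict, polished_text=None) -> str:
    """Fill the display template; prefer the polished text when one is given."""
    body = polished_text if polished_text is not None else result["text"]
    lines = [
        "转录完成",
        "",
        body,
        "",
        "─" * 17,
        f"语言:{result['lang']} · 情绪:{result['emotion']} · 时长:{result['duration']}s",
    ]
    if polished_text is not None:
        # Hypothetical wording; the skill only asks for a note that it was AI-refined.
        lines.append("(AI-refined; ask to see the raw transcript.)")
    return "\n".join(lines)
```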

Step 6: Export as Markdown (optional)

After presenting the result, ask:

Question: "保存为 Markdown 文件到当前目录?"
Options:
  - "是" — save to current directory
  - "否" — done

If yes, write {audio-filename}-transcript.md to the current working directory (where the user is running Claude Code). The file should contain the transcript text (polished version if polish was enabled), with a front-matter header:

---
source: {original audio filename}
date: {YYYY-MM-DD}
model: {model used}
duration: {duration}s
lang: {detected language}
---

{transcript text}
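
The export step could be sketched as below. It assumes `{audio-filename}` means the file name without its extension, which the template does not state, and that the result dict carries `duration` and `lang`. `transcript_path` and `render_transcript` are hypothetical names.

```python
from datetime import date
from pathlib import Path


def transcript_path(audio_file: str, out_dir: Path) -> Path:
    """meeting.m4a -> <out_dir>/meeting-transcript.md (extension dropped by assumption)."""
    return out_dir / f"{Path(audio_file).stem}-transcript.md"


def render_transcript(audio_file: str, model: str, result: dict, text: str, today=None) -> str:
    """Build the front-matter header followed by the transcript text."""
    today = today or date.today()
    header = [
        "---",
        f"source: {Path(audio_file).name}",
        f"date: {today.isoformat()}",
        f"model: {model}",
        f"duration: {result['duration']}s",
        f"lang: {result['lang']}",
        "---",
    ]
    return "\n".join(header) + "\n\n" + text + "\n"
```

Writing the file is then `transcript_path(audio_file, Path.cwd()).write_text(content, encoding="utf-8")`, where `content` is the rendered string and `text` is the polished version whenever polish was enabled.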

Composability

  • Invoked by: future skills that need to transcribe recorded audio
  • Invokes: nothing

Examples

"帮我转录这个文件 meeting.m4a"

  1. Check prerequisites
  2. Read config
  3. Confirm: meeting.m4a, sensevoice, polish on
  4. Run coli asr -j --model sensevoice "meeting.m4a"
  5. Polish the raw text
  6. Display inline

"transcribe interview.wav, no polish"

  1. Check prerequisites
  2. Read config
  3. Override polish to false for this session
  4. Run coli asr -j --model sensevoice "interview.wav"
  5. Display raw transcript inline
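
The second example overrides polish for the session without touching the saved config. A minimal sketch of that distinction, using hypothetical helpers `with_overrides` and `plan_steps`:

```python
def with_overrides(config: dict, **overrides) -> dict:
    """Return session settings; the saved config dict is left unchanged."""
    unknown = set(overrides) - set(config)
    if unknown:
        raise KeyError(f"unknown setting(s): {sorted(unknown)}")
    return {**config, **overrides}


def plan_steps(settings: dict, audio_file: str) -> list:
    """List the steps an invocation would take for the given settings."""
    steps = [
        "Check prerequisites",
        "Read config",
        f'Run coli asr -j --model {settings["model"]} "{audio_file}"',
    ]
    steps.append("Polish the raw text" if settings["polish"] else "Skip polishing")
    steps.append("Display inline")
    return steps
```

Because the override produces a new dict, the next invocation still starts from the saved defaults.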

Statistics

Installs: 1.1K
Rating: 4.4 / 5.0
Updated: March 16, 2026
Comparison cases: 1

Supported Platforms

  • Claude Code
  • OpenClaw
  • OpenCode
  • Codex
  • Gemini CLI
  • GitHub Copilot
  • Amp
  • Kimi CLI

Timeline

Created: March 16, 2026
Last updated: March 16, 2026