定制音乐创作效率

Before: 传统音乐创作流程中，为品牌定制一段 30 秒的背景音乐或宣传曲，从构思到最终交付可能需要数小时甚至数天，且修改成本高昂，迭代周期长。 After: 通过 AI 音乐生成，只需几分钟即可根据文本描述快速生成多种风格的 30 秒定制音乐，大幅缩短创作周期，降低试错成本，实现快速迭代。

elevenlabs-music-generation

ElevenLabs AI Music Generation — Pro Pack on RunComfy

Generate full songs and instrumental tracks from a text description — studio-quality 44.1 kHz stereo, 5 seconds to 5 minutes, with section-level structure control. ElevenLabs Music on the RunComfy Model API, called through the runcomfy CLI.

runcomfy.com · ElevenLabs Music model · CLI docs

Install this skill

npx skills add agentspace-so/runcomfy-agent-skills --skill elevenlabs-music-generation -g

Powered by the RunComfy CLI

# 1. Install (one of — see runcomfy-cli skill for details)
npm i -g @runcomfy/cli                              # global install
npx -y @runcomfy/cli --version                      # zero-install

# 2. Sign in
runcomfy login                                      # or in CI: export RUNCOMFY_TOKEN=<token>

# 3. Generate music
runcomfy run elevenlabs/elevenlabs/music-generation \
  --input '{"prompt": "..."}' \
  --output-dir ./out

CLI deep dive: runcomfy-cli skill.

When to use ElevenLabs Music

ElevenLabs Music's strength is structured songs with real vocals — it takes a style brief plus lyrics with section markers and returns a coherent, mixed track. Pick it for:

Full vocal songs — verse/chorus structure, multilingual lyrics, consistent meter
Instrumental beds — force_instrumental: true for background music, podcast intros, game loops
Short brand assets — jingles, stingers, theme music (5–30 s)
Long-form tracks — up to 5 minutes in a single call
Commercial work — output is commercial-friendly

If the user just wants ambient sound or a one-off SFX (thunder, footsteps), that's a sound-effects task, not music — ElevenLabs Music is for songs and tracks.

Endpoint + input schema

Model: elevenlabs/elevenlabs/music-generation

Field	Type	Required	Default	Notes
`prompt`	string	yes	—	Style description and lyrics with section markers. See prompting tips
`music_length_ms`	int	no	`40000`	Output duration in ms. 5000–300000 (5 s – 5 min)
`force_instrumental`	bool	no	`false`	`true` = instrumental only, no vocals
`output_format`	string	no	`mp3_standard`	`mp3_standard` (default), or WAV — see the model page API tab for the full format list

Output: 44.1 kHz stereo audio. The result JSON contains the generated audio URL — the CLI downloads it into --output-dir.

Pricing: ~$0.0083 per second of generated audio (30 s ≈ $0.25, 60 s ≈ $0.50, 5 min ≈ $2.49). Cost scales with music_length_ms, so draft short and finalize long.

How to invoke

Full vocal song with structure:

runcomfy run elevenlabs/elevenlabs/music-generation \
  --input '{
    "prompt": "Upbeat indie-pop anthem, bright electric guitars, driving drums, 120 BPM, female lead vocal. [Intro 8 bars] instrumental build. [Verse] Chalk on the palms, laces double-knotted, morning on the ridge. [Chorus] We rise, we strike, we never fade out. [Bridge] soft breakdown, just piano and voice. [Outro] full band, fade.",
    "music_length_ms": 60000
  }' \
  --output-dir ./out

Instrumental background bed:

runcomfy run elevenlabs/elevenlabs/music-generation \
  --input '{
    "prompt": "Calm lo-fi hip-hop instrumental for a study playlist. Warm Rhodes piano, soft vinyl crackle, mellow boom-bap drums, 75 BPM. No vocals. Consistent loop-friendly groove throughout.",
    "music_length_ms": 90000,
    "force_instrumental": true
  }' \
  --output-dir ./out

Short brand jingle:

runcomfy run elevenlabs/elevenlabs/music-generation \
  --input '{
    "prompt": "5-second cheerful brand stinger, bright marimba and a single uplifting chord resolve, no vocals.",
    "music_length_ms": 5000,
    "force_instrumental": true
  }' \
  --output-dir ./out

Prompting tips

ElevenLabs Music reads one prompt field that carries both the style brief and the lyrics. Structure it well:

Lead with the style brief: genre, mood, tempo (BPM), key instruments, vocal type. "Upbeat indie-pop anthem, bright electric guitars, 120 BPM, female lead vocal."
Then the lyrics with section markers: [Intro], [Verse], [Chorus], [Bridge], [Outro]. Add approximate durations or bar counts — [Intro 8 bars], [Verse 16 bars].
Keep lyrical meter consistent — even syllable counts per line, clear rhyme scheme. The model follows meter; sloppy meter produces awkward phrasing.
Name lead instruments and mix priorities — "electric guitar carries the chorus, drums sit back in the verse."
For instrumental, set force_instrumental: true AND say "no vocals" in the prompt — belt and suspenders.
Multilingual: write the lyrics in the target language; annotate accent/language inline if needed ([Verse] (sung in Brazilian Portuguese) ...).
Avoid contradictory style instructions — "aggressive metal" + "soft lullaby" in one prompt confuses the model. One coherent direction per call.
Draft short, finalize long: validate the direction with a 30–45 s draft (music_length_ms: 35000) before paying for a 5-minute render.

Common patterns

Theme song for a video

Full brief + lyrics + [Intro]/[Verse]/[Chorus] structure, music_length_ms matched to the video length

Podcast intro / outro

force_instrumental: true, 10–20 s, "loop-friendly, clean ending"

Game background loop

force_instrumental: true, describe "seamless loop", 60–120 s, consistent groove

Multilingual release (same song, multiple languages)

One call per language, identical style brief, swap only the lyric lines

Iterate then commit

Draft at music_length_ms: 35000 to lock genre/tempo/structure → final render at full length

Limitations

One prompt field carries everything (style + lyrics). There is no separate "lyrics" parameter.
5 s – 5 min per call (music_length_ms 5000–300000). For longer pieces, generate sections and stitch externally.
Cost scales with duration — a 5-minute render is ~10× a 30-second one.
force_instrumental is the only vocal toggle — you can't request specific voice identities or clone a singer through this endpoint.
This skill pins ElevenLabs Music specifically. For sound effects, text-to-speech, or voice cloning, that's a different ElevenLabs capability not exposed through this endpoint.

Exit codes

code	meaning
0	success
64	bad CLI args
65	bad input JSON / schema mismatch
69	upstream 5xx
75	retryable: timeout / 429
77	not signed in or token rejected

Full reference: docs.runcomfy.com/cli/troubleshooting.

How it works

The skill invokes runcomfy run elevenlabs/elevenlabs/music-generation with the JSON body. The CLI POSTs to the RunComfy Model API, polls request status, fetches the result, and downloads the generated audio file into --output-dir. Ctrl-C cancels the remote request before exit.

Security & Privacy

Install via verified package manager only. Use npm i -g @runcomfy/cli or npx -y @runcomfy/cli. Agents must not pipe an arbitrary remote install script into a shell on the user's behalf — if the operator wants the curl-pipe path documented at docs.runcomfy.com/cli/install, they should review the script first.
Token storage: runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600. Set RUNCOMFY_TOKEN env var to bypass the file in CI / containers. Never echo the token into a prompt, log it, or check it in.
Input boundary (shell injection): the prompt is passed as a JSON string via --input. The CLI does not shell-expand prompt content; it transmits the JSON body directly to the Model API over HTTPS. No shell-injection surface from prompt content, even with backticks, quotes, or $(...) patterns.
Lyrics provenance: if the user supplies lyrics, confirm they have the rights to them. Generating music around copyrighted lyrics is the operator's responsibility — the skill does not check.
Outbound endpoints (allowlist): only model-api.runcomfy.net (request submission) and *.runcomfy.net / *.runcomfy.com (download whitelist for generated audio). No telemetry, no callbacks.
Generated-file size cap: the CLI aborts any single download > 2 GiB.
Scope of bash usage: the skill only invokes runcomfy <subcommand> — npm / npx lines are one-time operator setup, not commands the skill executes per call.