Multimedia Content Generation

Name: mmx-cli AI Agent Skill
Availability: InStock
Rating: 4.5 (15 reviews)
Author: minimax-ai

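Before: Text, images, video, and audio are generated with separate tools and the multimodal content is integrated by hand; a complete project takes 2-3 days.
After: All types of multimedia content are generated through one unified CLI with multimodal outputs coordinated automatically; a complex project is finished in 3 hours.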

mmx-cli

MiniMax CLI — Agent Skill Guide

Use mmx to generate text, images, video, speech, and music, and to run web searches, via the MiniMax AI platform.

Prerequisites

# Install
npm install -g mmx-cli

# Auth (OAuth persists to ~/.mmx/credentials.json, API key persists to ~/.mmx/config.json)
mmx auth login --api-key sk-xxxxx

# Verify active auth source
mmx auth status

# Or pass per-call
mmx text chat --api-key sk-xxxxx --message "Hello"

Region is auto-detected. Override with --region global or --region cn.
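A script that depends on these prerequisites may want to fail early when the CLI is missing. The following is a minimal sketch; `require_mmx` is a hypothetical helper name, not part of mmx-cli, and it checks only that an `mmx` command is resolvable, not that authentication is configured.

```shell
# Preflight check (hypothetical helper, not shipped with mmx-cli).
# Returns 127 when no `mmx` command can be found on PATH.
require_mmx() {
  if ! command -v mmx >/dev/null 2>&1; then
    echo "mmx not found; install it with: npm install -g mmx-cli" >&2
    return 127
  fi
}
```

Calling `require_mmx || exit` at the top of a script stops it before any generation command runs.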

Agent Flags

Always use these flags in non-interactive (agent/CI) contexts:

Flag               Purpose
--non-interactive  Fail fast on missing args instead of prompting
--quiet            Suppress spinners/progress; stdout is pure data
--output json      Machine-readable JSON output
--async            Return task ID immediately (video generation)
--dry-run          Preview the API request without executing
--yes              Skip confirmation prompts
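One way to apply these flags consistently is a small wrapper. This is a sketch only: `mmx_agent` is a hypothetical name, and it assumes the three flags it appends are accepted after the subcommand's own arguments, as in the examples elsewhere in this guide.

```shell
# Hypothetical wrapper: forward any mmx subcommand and append the
# non-interactive flags listed above.
mmx_agent() {
  mmx "$@" --non-interactive --quiet --output json
}
```

For example, `mmx_agent text chat --message "user:Hello"` runs the chat command without prompts or spinners and prints JSON.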

Commands

text chat

Chat completion. Default model: MiniMax-M2.7.

mmx text chat --message <text> [flags]

Flag                    Type                          Description
--message <text>        string, required, repeatable  Message text. Prefix with role: to set role (e.g. "system:You are helpful", "user:Hello")
--messages-file <path>  string                        JSON file with messages array. Use - for stdin
--system <text>         string                        System prompt
--model <model>         string                        Model ID (default: MiniMax-M2.7)
--max-tokens <n>        number                        Max tokens (default: 4096)
--temperature <n>       number                        Sampling temperature (0.0, 1.0]
--top-p <n>             number                        Nucleus sampling threshold
--stream                boolean                       Stream tokens (default: on in TTY)
--tool <json-or-path>   string, repeatable            Tool definition JSON or file path

# Single message
mmx text chat --message "user:What is MiniMax?" --output json --quiet

# System prompt plus a user message
mmx text chat \
  --system "You are a coding assistant." \
  --message "user:Write fizzbuzz in Python" \
  --output json

# From file
cat conversation.json | mmx text chat --messages-file - --output json

stdout: response text (text mode) or full response object (json mode).
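Because --message is repeatable, a script can build the message list from its arguments. The sketch below is illustrative: `chat_with_context` is a hypothetical helper, and how the model treats several consecutive user messages is not covered by this guide.

```shell
# Hypothetical helper: one system prompt plus any number of user messages.
chat_with_context() {
  local system_prompt="$1"
  shift
  local remaining=$#
  # Rewrite each positional argument as a "--message user:<text>" pair.
  while [ "$remaining" -gt 0 ]; do
    set -- "$@" --message "user:$1"
    shift
    remaining=$((remaining - 1))
  done
  mmx text chat --system "$system_prompt" "$@" --output json --quiet
}
```

Usage: `chat_with_context "You are a coding assistant." "Write fizzbuzz in Python" "Now add tests"`.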

image generate

Generate images. Model: image-01.

mmx image generate --prompt <text> [flags]

Flag                    Type              Description
--prompt <text>         string, required  Image description
--aspect-ratio <ratio>  string            e.g. 16:9, 1:1
--n <count>             number            Number of images (default: 1)
--subject-ref <params>  string            Subject reference: type=character,image=path-or-url
--out-dir <dir>         string            Download images to directory
--out-prefix <prefix>   string            Filename prefix (default: image)

mmx image generate --prompt "A cat in a spacesuit" --quiet
# stdout: image URLs (one per line in quiet mode)

mmx image generate --prompt "Logo" --n 3 --out-dir ./gen/ --quiet
# stdout: saved file paths (one per line)
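The one-path-per-line output makes batch generation easy to script. A minimal sketch follows; `generate_images` is a hypothetical helper, and creating the directory up front is a precaution, since this guide does not say whether the CLI creates it.

```shell
# Hypothetical helper: generate COUNT images for PROMPT into OUT_DIR and
# print whatever the CLI reports on stdout (saved paths in quiet mode).
generate_images() {
  local prompt="$1" count="$2" out_dir="$3"
  mkdir -p "$out_dir" || return
  mmx image generate --prompt "$prompt" --n "$count" --out-dir "$out_dir" --quiet
}
```

A caller can then iterate over the results with `generate_images "Logo" 3 ./gen | while IFS= read -r path; do ...; done`.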

video generate

Generate video. Default model: MiniMax-Hailuo-2.3. Generation runs as an asynchronous task; by default the command polls until the task completes.

mmx video generate --prompt <text> [flags]

Flag                         Type              Description
--prompt <text>              string, required  Video description
--model <model>              string            MiniMax-Hailuo-2.3 (default) or MiniMax-Hailuo-2.3-Fast
--first-frame <path-or-url>  string            First frame image
--callback-url <url>         string            Webhook URL for completion
--download <path>            string            Save video to specific file
--async                      boolean           Return task ID immediately
--no-wait                    boolean           Same as --async
--poll-interval <seconds>    number            Polling interval (default: 5)

# Non-blocking: get task ID
mmx video generate --prompt "A robot." --async --quiet
# stdout: {"taskId":"..."}

# Blocking: wait and get file path
mmx video generate --prompt "Ocean waves." --download ocean.mp4 --quiet
# stdout: ocean.mp4
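When several clips are needed, the non-blocking form lets a script submit them all and collect task IDs first. The sketch below assumes the output has exactly the `{"taskId":"..."}` shape shown above and extracts the ID with sed to avoid extra dependencies; `start_videos` is a hypothetical helper, and `jq -r '.taskId'` is the more robust choice where jq is available.

```shell
# Hypothetical helper: submit one async video task per prompt and print
# "<taskId><TAB><prompt>" for each.
start_videos() {
  local prompt task_json task_id
  for prompt in "$@"; do
    task_json=$(mmx video generate --prompt "$prompt" --async --quiet) || return
    # Assumes the compact {"taskId":"..."} shape shown in this guide.
    task_id=$(printf '%s' "$task_json" | sed -n 's/.*"taskId":"\([^"]*\)".*/\1/p')
    printf '%s\t%s\n' "$task_id" "$prompt"
  done
}
```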

video task get

Query status of a video generation task.

mmx video task get --task-id <id> [--output json]

video download

Download a completed video by the file ID of its finished task.

mmx video download --file-id <id> [--out <path>]

speech synthesize

Text-to-speech. Default model: speech-2.8-hd. Input is limited to 10,000 characters.

mmx speech synthesize --text <text> [flags]

Flag                       Type                Description
--text <text>              string              Text to synthesize
--text-file <path>         string              Read text from file. Use - for stdin
--model <model>            string              speech-2.8-hd (default), speech-2.6, speech-02
--voice <id>               string              Voice ID (default: English_expressive_narrator)
--speed <n>                number              Speed multiplier
--volume <n>               number              Volume level
--pitch <n>                number              Pitch adjustment
--format <fmt>             string              Audio format (default: mp3)
--sample-rate <hz>         number              Sample rate (default: 32000)
--bitrate <bps>            number              Bitrate (default: 128000)
--channels <n>             number              Audio channels (default: 1)
--language <code>          string              Language boost
--subtitles                boolean             Include subtitle timing data
--pronunciation <from/to>  string, repeatable  Custom pronunciation
--sound-effect <effect>    string              Add sound effect
--out <path>               string              Save audio to file
--stream                   boolean             Stream raw audio to stdout

mmx speech synthesize --text "Hello world" --out hello.mp3 --quiet
# stdout: hello.mp3

echo "Breaking news." | mmx speech synthesize --text-file - --out news.mp3
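Given the character limit stated for this command, a script can check input length before calling the API. This is a sketch under assumptions: `synthesize_file` is a hypothetical helper, `wc -m` counts characters according to the current locale (and includes trailing newlines), and the service may count characters differently.

```shell
# Hypothetical helper: refuse files above the documented 10,000-character
# limit, otherwise synthesize them to OUT_FILE.
synthesize_file() {
  local text_file="$1" out_file="$2" max_chars=10000 chars
  # Arithmetic expansion strips the padding some wc implementations emit.
  chars=$(($(wc -m < "$text_file")))
  if [ "$chars" -gt "$max_chars" ]; then
    echo "error: $text_file has $chars characters (limit $max_chars)" >&2
    return 2
  fi
  mmx speech synthesize --text-file "$text_file" --out "$out_file" --quiet
}
```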

music generate

Generate music. Responds well to rich, structured descriptions.

Model: music-2.6-free (unlimited for API key users; rate limit: 3 requests per minute).

mmx music generate --prompt <text> [--lyrics <text>] [flags]

Flag                  Type     Description
--prompt <text>       string   Music style description (can be detailed)
--lyrics <text>       string   Song lyrics with structure tags. Required unless --instrumental or --lyrics-optimizer is used.
--lyrics-file <path>  string   Read lyrics from file. Use - for stdin
--lyrics-optimizer    boolean  Auto-generate lyrics from prompt. Cannot be used with --lyrics or --instrumental.
--instrumental        boolean  Generate instrumental music (no vocals). Cannot be used with --lyrics.
--vocals <text>       string   Vocal style, e.g. "warm male baritone", "bright female soprano", "duet with harmonies"
--genre <text>        string   Music genre, e.g. folk, pop, jazz
--mood <text>         string   Mood or emotion, e.g. warm, melancholic, uplifting
--instruments <text>  string   Instruments to feature, e.g. "acoustic guitar, piano"
--tempo <text>        string   Tempo description, e.g. fast, slow, moderate
--bpm <number>        number   Exact tempo in beats per minute
--key <text>          string   Musical key, e.g. C major, A minor, G sharp
--avoid <text>        string   Elements to avoid in the generated music
--use-case <text>     string   Use case context, e.g. "background music for video", "theme song"
--structure <text>    string   Song structure, e.g. "verse-chorus-verse-bridge-chorus"
--references <text>   string   Reference tracks or artists, e.g. "similar to Ed Sheeran"
--extra <text>        string   Additional fine-grained requirements
--aigc-watermark      boolean  Embed AI-generated content watermark
--format <fmt>        string   Audio format (default: mp3)
--sample-rate <hz>    number   Sample rate (default: 44100)
--bitrate <bps>       number   Bitrate (default: 256000)
--out <path>          string   Save audio to file
--stream              boolean  Stream raw audio to stdout

At least one of --prompt or --lyrics is required.

# With lyrics
mmx music generate --prompt "Upbeat pop" --lyrics "La la la..." --out song.mp3 --quiet

# Auto-generate lyrics from prompt
mmx music generate --prompt "Upbeat pop about summer" --lyrics-optimizer --out summer.mp3 --quiet

# Instrumental
mmx music generate --prompt "Cinematic orchestral, building tension" --instrumental --out bgm.mp3 --quiet

# Detailed prompt with vocal characteristics
mmx music generate --prompt "Warm morning folk" \
  --vocals "male and female duet, harmonies in chorus" \
  --instruments "acoustic guitar, piano" \
  --bpm 95 \
  --lyrics-file song.txt \
  --out duet.mp3
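The lyrics flags are mutually exclusive, so a wrapper that takes a single mode argument can keep invalid combinations from being built at all. A minimal sketch, in which `generate_music` and its mode names (lyrics, auto, instrumental) are hypothetical:

```shell
# Hypothetical helper: pick exactly one lyrics mode so that --lyrics-file,
# --lyrics-optimizer and --instrumental are never combined.
generate_music() {
  local mode="$1" prompt="$2" out="$3" lyrics_file="${4:-}"
  case "$mode" in
    lyrics)
      if [ -z "$lyrics_file" ]; then
        echo "lyrics mode needs a lyrics file" >&2
        return 2
      fi
      set -- --lyrics-file "$lyrics_file" ;;
    auto)         set -- --lyrics-optimizer ;;
    instrumental) set -- --instrumental ;;
    *)
      echo "unknown mode: $mode" >&2
      return 2 ;;
  esac
  mmx music generate --prompt "$prompt" "$@" --out "$out" --quiet
}
```

Usage: `generate_music auto "Upbeat pop about summer" summer.mp3` or `generate_music lyrics "Warm morning folk" duet.mp3 song.txt`.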

music cover

Generate a cover version of a song based on reference audio.

Model: music-cover-free (unlimited for API key users; rate limit: 3 requests per minute).

mmx music cover --prompt <text> (--audio <url> | --audio-file <path>) [flags]

Flag                  Type              Description
--prompt <text>       string, required  Target cover style, e.g. "Indie folk, acoustic guitar, warm male vocal"
--audio <url>         string            URL of reference audio (mp3, wav, flac, etc.; 6s to 6min, max 50MB)
--audio-file <path>   string            Local reference audio file (auto base64-encoded)
--lyrics <text>       string            Cover lyrics. If omitted, extracted from reference audio via ASR.
--lyrics-file <path>  string            Read lyrics from file. Use - for stdin
--seed <number>       number            Random seed 0–1000000 for reproducible results
--format <fmt>        string            Audio format: mp3, wav, pcm (default: mp3)
--sample-rate <hz>    number            Sample rate (default: 44100)
--bitrate <bps>       number            Bitrate (default: 256000)
--channel <n>         number            Channels: 1 (mono) or 2 (stereo, default)
--out <path>          string            Save audio to file
--stream              boolean           Stream raw audio to stdout

# Cover from URL
mmx music cover --prompt "Indie folk, acoustic guitar, warm male vocal" \
  --audio https://filecdn.minimax.chat/public/d20eda57-2e36-45bf-9e12-82d9f2e69a86.mp3 --out cover.mp3 --quiet

# Cover from local file with custom lyrics
mmx music cover --prompt "Jazz, piano, slow" \
  --audio-file original.mp3 --lyrics-file lyrics.txt --out jazz_cover.mp3 --quiet

# Reproducible result with seed
mmx music cover --prompt "Pop, upbeat" --audio https://filecdn.minimax.chat/public/d20eda57-2e36-45bf-9e12-82d9f2e69a86.mp3 --seed 42 --out cover.mp3
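With a limit of 3 requests per minute, a batch of covers needs pacing. The sketch below spaces calls 20 seconds apart (60 s / 3 requests). `cover_batch` and the `COVER_DELAY_SECONDS` variable are hypothetical names, and the fixed delay is a simple assumption rather than the CLI's own rate-limit handling.

```shell
# Hypothetical helper: cover each local file in STYLE, pausing between calls
# to stay within 3 requests per minute.
cover_batch() {
  local style="$1" delay="${COVER_DELAY_SECONDS:-20}" first=1 src
  shift
  for src in "$@"; do
    if [ "$first" -eq 0 ]; then
      sleep "$delay"
    fi
    first=0
    # a.mp3 -> a_cover.mp3, b.wav -> b_cover.mp3
    mmx music cover --prompt "$style" --audio-file "$src" \
      --out "${src%.*}_cover.mp3" --quiet || return
  done
}
```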

vision describe

Image understanding via VLM. Provide either --image or --file-id, not both.

mmx vision describe (--image <path-or-url> | --file-id <id>) [flags]

Flag                   Type    Description
--image <path-or-url>  string  Local path or URL (auto base64-encoded)
--file-id <id>         string  Pre-uploaded file ID (skips base64)
--prompt <text>        string  Question about the image (default: "Describe the image.")

mmx vision describe --image photo.jpg --prompt "What breed?" --output json

stdout: description text (text mode) or full response (json mode).
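To ask the same question about several images, the command can be looped. A minimal sketch; `describe_images` is a hypothetical helper and the `== <image>` separator line is its own formatting, not CLI output.

```shell
# Hypothetical helper: ask QUESTION about each image and label the answers.
describe_images() {
  local question="$1" image
  shift
  for image in "$@"; do
    printf '== %s\n' "$image"
    mmx vision describe --image "$image" --prompt "$question" --quiet || return
  done
}
```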

search query

Web search via MiniMax.

mmx search query --q <query>

Flag         Type              Description
--q <query>  string, required  Search query

mmx search query --q "MiniMax AI" --output json --quiet

quota show

Display Token Plan usage and remaining quotas.

mmx quota show [--output json]

Tool Schema Export

Export all commands as Anthropic/OpenAI-compatible JSON tool schemas:

# All tool-worthy commands (excludes auth/config/update)
mmx config export-schema

# Single command
mmx config export-schema --command "video generate"

Use this to dynamically register mmx commands as tools in your agent framework.
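For frameworks that load tool definitions from files, the per-command export can be written out one file per command. This is a sketch: `export_schemas` and the file-naming scheme are hypothetical, and the structure of the exported JSON is not described in this guide.

```shell
# Hypothetical helper: write one schema file per command into OUT_DIR,
# e.g. "video generate" -> OUT_DIR/video-generate.json.
export_schemas() {
  local out_dir="$1" cmd file
  shift
  mkdir -p "$out_dir" || return
  for cmd in "$@"; do
    file="$out_dir/$(printf '%s' "$cmd" | tr ' ' '-').json"
    mmx config export-schema --command "$cmd" > "$file" || return
  done
}
```

Usage: `export_schemas ./schemas "text chat" "video generate"`.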

Exit Codes

Code  Meaning
0     Success
1     General error
2     Usage error (bad flags, missing args)
3     Authentication error
4     Quota exceeded
5     Timeout
10    Content filter triggered
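An agent can branch on these codes instead of parsing error text. The sketch below maps each code to its meaning and retries only on timeout (code 5). `describe_exit`, `run_with_retry`, and `MMX_MAX_ATTEMPTS` are hypothetical names, and retrying only timeouts is an assumed policy, not something the CLI prescribes.

```shell
# Map a documented mmx exit code to its meaning.
describe_exit() {
  case "$1" in
    0)  echo "success" ;;
    1)  echo "general error" ;;
    2)  echo "usage error" ;;
    3)  echo "authentication error" ;;
    4)  echo "quota exceeded" ;;
    5)  echo "timeout" ;;
    10) echo "content filter triggered" ;;
    *)  echo "unknown exit code $1" ;;
  esac
}

# Hypothetical helper: run mmx, retrying only when it exits with 5 (timeout).
run_with_retry() {
  local attempts="${MMX_MAX_ATTEMPTS:-3}" try=1 code=0
  while :; do
    mmx "$@"
    code=$?
    # Stop unless this was a timeout and attempts remain.
    if [ "$code" -ne 5 ] || [ "$try" -ge "$attempts" ]; then
      break
    fi
    try=$((try + 1))
  done
  if [ "$code" -ne 0 ]; then
    echo "mmx failed: $(describe_exit "$code")" >&2
  fi
  return "$code"
}
```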

Piping Patterns

# stdout is always clean data — safe to pipe
mmx text chat --message "Hi" --output json | jq '.content'

# stderr has progress/spinners — discard if needed
mmx video generate --prompt "Waves" 2>/dev/null

# Chain: generate image → describe it
URL=$(mmx image generate --prompt "A sunset" --quiet)
mmx vision describe --image "$URL" --quiet

# Async video workflow
TASK=$(mmx video generate --prompt "A robot" --async --quiet | jq -r '.taskId')
mmx video task get --task-id "$TASK" --output json
# video download is documented with --file-id; set FILE_ID from the completed task's status output
mmx video download --file-id "$FILE_ID" --out robot.mp4

Configuration Precedence

CLI flags → environment variables → ~/.mmx/config.json → defaults.

# Persistent config
mmx config set --key region --value cn
mmx config show

# Environment
export MINIMAX_API_KEY=sk-xxxxx
export MINIMAX_REGION=cn
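The precedence order amounts to "first non-empty value wins". The sketch below models that lookup in plain shell to make the order concrete; it is an illustration, not the CLI's implementation, and `resolve_setting` is a hypothetical name.

```shell
# Model of the lookup order: print the first non-empty argument.
# Call as: resolve_setting "$flag_value" "$env_value" "$config_value" "$default"
resolve_setting() {
  local candidate
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```

For instance, with no --region flag, MINIMAX_REGION=cn in the environment, and region set to global in the config file, `resolve_setting "" "cn" "global" "auto"` prints `cn`.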

Default Model Configuration

Set per-modality defaults so you don't need --model every time:

# Set defaults
mmx config set --key default-text-model --value MiniMax-M2.7-highspeed
mmx config set --key default-speech-model --value speech-2.8-hd
mmx config set --key default-video-model --value MiniMax-Hailuo-2.3
mmx config set --key default-music-model --value music-2.6

# Use without --model
mmx text chat --message "Hello"
mmx speech synthesize --text "Hello" --out hello.mp3
mmx video generate --prompt "Ocean waves"
mmx music generate --prompt "Upbeat pop" --instrumental

# --model still overrides per-call
mmx text chat --model MiniMax-M2.7 --message "Hello"

Resolution priority: --model flag > config default > hardcoded fallback.

Repository: minimax-ai/cli
