wonda-cli
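Automatically generate images, video, music, and audio from the terminal, edit and compose media, and publish in one step to LinkedIn, Reddit, X/Twitter, and other social platforms.
npx skills add degausai/wonda --skill wonda-cli
Before / After comparison
Before: you use several tools by hand to generate image, video, and audio assets separately, then log in to each social platform and publish manually. One set of content takes 2-3 hours.
After: a single command generates all media assets, edits and composes them, and publishes in bulk to multiple social platforms. Full content creation and distribution takes 20 minutes.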
Wonda CLI
Wonda CLI is a content creation toolkit for terminal-based agents. Use it to generate images, videos, music, and audio; edit and compose media; publish to social platforms; and research/automate across LinkedIn, Reddit, and X/Twitter.
Install
If wonda is not found on PATH, install it first:
# npm
npm i -g @degausai/wonda
# Homebrew
brew tap degausai/tap && brew install wonda
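The "install only if missing" check can be sketched as a small POSIX shell helper. The function names `wonda_install_cmd` and `ensure_wonda` are hypothetical; the two install commands are the ones listed above, and the preference for npm over Homebrew is an arbitrary assumption.

```shell
# Print the install command for a package manager (helper name is illustrative).
wonda_install_cmd() {
  case "$1" in
    npm)  echo "npm i -g @degausai/wonda" ;;
    brew) echo "brew tap degausai/tap && brew install wonda" ;;
    *)    echo "unsupported package manager: $1" >&2; return 1 ;;
  esac
}

# Install wonda only when it is not already on PATH.
ensure_wonda() {
  if command -v wonda >/dev/null 2>&1; then
    return 0
  fi
  if command -v npm >/dev/null 2>&1; then
    eval "$(wonda_install_cmd npm)"
  elif command -v brew >/dev/null 2>&1; then
    eval "$(wonda_install_cmd brew)"
  else
    echo "neither npm nor brew found; install one of them first" >&2
    return 1
  fi
}
```

Calling `ensure_wonda` at the top of a script makes the rest of the script safe to run on a fresh machine.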
Setup
- Auth: `wonda auth login` (opens browser, recommended) or set the `WONDERCAT_API_KEY` env var
- Verify: `wonda auth check`
Access tiers
Not all commands are available to every account type:
| Tier | Access |
|---|---|
| Anonymous (temporary account, no login) | Media upload/download, editing (video/edit, image/edit, audio/edit), transcription, social publishing, scraping, analytics |
| Free (logged in, Basic/Free plan) | Everything above + generation (image/generate, video/generate, etc.), styles, recipes, brand |
| Paid (Plus, Pro, or Absolute plan) | Everything above + video analysis (requires credits), skill commands (wonda skill install/list/get) |
If a command returns a 403 error, check your plan at https://app.wondercat.ai/settings/billing.
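To anticipate a 403 before running a command, an agent could keep a small lookup of the tiers described above. This is only a sketch: the function name `required_tier` and the short group labels (`publish`, `transcribe`, `analyze`, and so on) are made-up shorthand for the table rows, not CLI subcommand names.

```shell
# Map a capability group to the lowest tier that includes it.
# Group labels are illustrative shorthand for the access table, not real subcommands.
required_tier() {
  case "$1" in
    media|edit|transcribe|publish|scrape|analytics) echo "anonymous" ;;
    generate|styles|recipes|brand)                  echo "free" ;;
    analyze|skill)                                  echo "paid" ;;
    *) echo "unknown"; return 1 ;;
  esac
}
```

For example, `required_tier generate` prints `free`, which tells the agent to run `wonda auth login` before attempting generation.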
Social signups (Instagram, TikTok, etc.)
Drive them with the wonda device primitives + a throwaway mailbox from wonda email. The screenshot → decide → tap/type/swipe loop is how these flows work — there's no shortcut command, and that's fine: social apps change their UI constantly and any canned flow would drift faster than you could maintain it.
Standard loop:
1. `wonda email account create --random` → save `{email, password}`.
2. `wonda device create` → pick a ready device (poll `wonda device get <id> --fields status`).
3. `wonda device launch <device-id> com.instagram.android` (or `com.zhiliaoapp.musically` for TikTok). Fall back to `wonda device open-url` if you'd rather start in the web flow.
4. Loop: `wonda device screenshot <device-id> > s.json` → decode the base64 PNG → read → pick an action → `tap | type | swipe | key` → screenshot again. Use `--text "SomeButtonLabel"` on `tap` before guessing coordinates; fall back to `--x --y` read off the screenshot for elements without matching text (number pickers, date spinners, etc.).
5. When the app sends a verification email, run `wonda email inbox wait <email> --timeout 120`, which returns `{codes: ["483921"], links: [...]}` with the 6-digit code already extracted. Feed it back with `wonda device type <device-id> --text "<code>"`.
6. For number/date spinners: tap on the highlighted cell; Android pops up a numeric or alphabetic keyboard, and `wonda device type --text "<value>"` replaces the selected text. `wonda device key --code 4` dismisses the keyboard when done.
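Two pieces of the loop above are easy to get wrong in a script: waiting for the device to become ready, and pulling the verification code out of the inbox response. The sketch below shows one way to do both in POSIX shell. `poll_until` and `first_code` are hypothetical helper names; the sketch assumes the inbox response is real JSON with a quoted `"codes"` key, and that the status value for a usable device is the literal string `ready`.

```shell
# Retry a command until its output equals the wanted value.
# usage: poll_until <wanted> <max_attempts> <delay_seconds> <command...>
poll_until() {
  wanted=$1; attempts=$2; delay=$3
  shift 3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$("$@" 2>/dev/null)" = "$wanted" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Read an inbox response on stdin and print the first 6-digit code in "codes".
first_code() {
  tr -d '\n' |
    sed -n 's/.*"codes"[[:space:]]*:[[:space:]]*\[[[:space:]]*"\([0-9]\{6\}\)".*/\1/p'
}

# Assumed usage (not executed here; the exact output of --jq is an assumption):
#   poll_until ready 30 2 wonda device get "$DEVICE_ID" --jq '.status'
#   CODE=$(wonda email inbox wait "$EMAIL" --timeout 120 --json | first_code)
#   wonda device type "$DEVICE_ID" --text "$CODE"
```

Bounding the number of attempts keeps an agent shell from hanging forever if the device never reaches the expected status.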
Consent-like taps — anything that accepts Terms/Privacy/Cookies, grants permissions, or publishes something — stop and ask the user for explicit confirmation in chat before tapping. That isn't about signups specifically; it applies to any automation step.
Rate-limit signals — if the app shows you a visual puzzle ("we want to make sure you're a real person"), stop and hand off to the user with wonda device stream <id> (see next section). Don't click through puzzles yourself.
Handing off to a human
If automation hits a screen that requires a human to take over (consent flow you shouldn't auto-accept, ambiguous UI, step where the user prefers to act themselves), use wonda device stream <device-id> — returns a playerUrl signed with a short-lived JWT (1h). Give that URL to the user, they act in their own browser, and automation can resume afterward.
wonda device stream <device-id>
# → { "streamUrl": "wss://…", "playerUrl": "https://…", "deviceType": "social" }
Global output flags
All commands support these output control flags:
- `--json`: Force JSON output (auto-enabled when stdout is piped)
- `--quiet`: Only output the primary identifier (job ID, media ID, etc.), ideal for scripting
- `-o <path>`: Download output to file (implies `--wait`)
- `--fields status,outputs`: Select specific JSON fields
- `--jq '.outputs[0].media.url'`: Filter JSON output with a jq expression
How to think about content creation
You are a marketing director with access to a full production toolkit. Before touching any tool, think:
- What product category? (beauty, food, tech, fashion, fitness, etc.)
- What format performs for this category? (UGC memes for everyday products, cinematic for luxury, before/after for transformations, testimonial for services)
- What's the hook? (relatable scenario, surprising twist, aspirational lifestyle, social proof)
- What specific scene? (not "product on table" but "person discovering the product in a funny situation")
Decision flow
When asked to create content, follow this order:
Step 1: Gather context
wonda brand # Brand identity, colors, products, audience
wonda analytics instagram # What content performs well
wonda scrape social --handle @competitor --platform instagram --wait # Competitive research (if relevant)
# Cross-platform research (if relevant)
wonda x search "topic OR keyword" # Find conversations on X/Twitter
wonda x user-tweets @competitor # Competitor's recent tweets
wonda reddit search "topic" --sort top --time week # Reddit discussions
wonda reddit feed marketing --sort hot # Subreddit trends
wonda linkedin search "topic" --type COMPANIES # LinkedIn company/people research
wonda linkedin profile competitor-vanity-name # LinkedIn profile intel
Step 2: Check content skills
Content skills are step-by-step guides for common content types. Each skill tells you exactly which models, prompts, and editing operations to use — and in what order. ALWAYS check skills before building from scratch.
wonda skill list # Browse all content skills
wonda skill get <slug> # Full step-by-step guide for a skill
Full skill index:
| Slug | Description | Input |
|---|---|---|
| product-video | Product/scene video: prompt library for all categories | optional product image |
| ugc-talking | Talking-head UGC: single clip, two-angle PIP, or 20s+ with B-roll | optional reference |
| ugc-reaction-batch | Batch TikTok-native UGC reactions with viral strategy | optional product image |
| tiktok-ugc-pipeline | Scrape viral reel → generate 5 UGC → post as drafts | reel or TikTok URL |
| ugc-dance-motion | Dance/motion transfer | image + video |
| marketing-brain | Marketing strategy brain: hooks, visuals, ads | user brief |
| reddit-subreddit-intel | Scrape top posts, analyze virality, generate ideas | subreddit + product |
| twitter-influencer-search | Find X influencers and amplifiers | competitor/niche keywords |
| tiktok-slideshow-carousel | 3-slide TikTok carousel: hook, bridge, product reveal | app screenshot + audience |
| ffmpeg-local-video-finishing | Local ffmpeg finishing for deterministic trims, muxes, reverses, and exports | local video path or mediaId |
| ffmpeg-burn-captions | Burn captions locally with ffmpeg after getting transcript/timing | local video path or mediaId |
| ffmpeg-social-formatting | Reformat local video for 9:16, 1:1, 16:9, and social-safe exports | local video path or mediaId |
| ffmpeg-scene-splitting | Detect scene boundaries locally, split into clips, or omit one scene | local video path or mediaId |
| ffmpeg-silence-cut | Detect and collapse dead air locally while preserving short natural pauses | local video path or mediaId |
| ffmpeg-frame-extraction | Extract single frames, poster frames, or evenly spaced stills locally | local video path or mediaId |
| ffmpeg-analysis-artifacts | Build local analysis artifacts: grid, first/last frame, and extracted audio | local video path or mediaId |
| ffmpeg-reference | Compact ffmpeg routing, font, codec, and command reference for agents | local media path |
| remotion-local-render | Render editorPipeline blueprint steps locally via @remotion/renderer | manifest JSON + editor job id |
If a skill matches → wonda skill get <slug>, read it, adapt to context, execute each step.
If no skill matches → build from scratch (Step 3).
Step 2.5: Decide whether finishing should be local
Not every media task should go back through Wonda editing. Use this routing rule:
- Use `wonda` for AI generation, AI transcription/alignment, scraping, publishing, hosted transitions, and workflows that need media IDs or remote jobs.
- Use local `ffmpeg` for deterministic transforms on files you already have or can download: trim, crop/scale/pad, concat, replace audio, extract audio/frame, reverse, normalize for delivery, burn captions, split scenes, cut silence, and build analysis artifacts.
When a task starts from a Wonda media ID but the actual edit is deterministic, move it to local files first:
wonda media download <mediaId> -o ./input.mp4
Before any local ffmpeg work:
which ffmpeg
which ffprobe
ffmpeg -version
ffprobe -v error -show_format -show_streams -of json ./input.mp4
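The preflight above can be wrapped in a guard so a script stops early when a tool is missing. This is a minimal sketch; `require_tools` is a hypothetical helper name, not part of wonda or ffmpeg.

```shell
# Fail with a clear message if any of the named tools is not on PATH.
require_tools() {
  missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing tools:$missing" >&2
    return 1
  fi
}

# Assumed usage before any local finishing work:
#   require_tools ffmpeg ffprobe || exit 1
```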
Font rule for local caption/text work:
- Prefer an explicit font file path over a family name.
- Never assume a font exists. Check first with `fc-match`, `fc-list`, `/System/Library/Fonts`, `/Library/Fonts`, `~/Library/Fonts`, or `/usr/share/fonts`.
- If the task is mainly local finishing/captions/formatting/splitting/artifact extraction, check the ffmpeg-specific skills before inventing commands.
- `wonda edit video` renders locally by default for single-video ops (`trim`, `crop`, `speed`, `volume`, `textOverlay`, `animatedCaptions` with supplied captions, `editAudio`). The server returns a manifest; the CLI runs `@remotion/renderer` against a CloudFront-hosted bundle, uploads the output, and finalizes the editor_job. No flag needed. Pass `--render-server` only to force Lambda. Multi-video ops (`overlay`, `splitScreen`, `merge`, `splitScenes`, `motionDesign`) auto-reject with a 400, and the CLI will tell you to use `--render-server`. See the `remotion-local-render` content skill for the full recipe (including the STT-free TikTok-style caption flow via `wonda alignment extract-timestamps` → `--caption-segments`).
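For the font rule in this list, one way to resolve an explicit font file path is to scan the known font directories and take the first match. The helper name `first_font_file` is hypothetical, and picking the alphabetically first file is an arbitrary choice for the sketch; a real task would usually look for a specific font name.

```shell
# Print the first .ttf/.otf/.ttc file found under the given directories.
# Directories that do not exist are skipped.
first_font_file() {
  for dir in "$@"; do
    [ -d "$dir" ] || continue
    found=$(find "$dir" -type f \( -name '*.ttf' -o -name '*.otf' -o -name '*.ttc' \) 2>/dev/null |
      sort | head -n 1)
    if [ -n "$found" ]; then
      echo "$found"
      return 0
    fi
  done
  return 1
}

# Assumed usage with the directories named in the font rule:
#   FONT=$(first_font_file /System/Library/Fonts /Library/Fonts \
#            "$HOME/Library/Fonts" /usr/share/fonts) || echo "no font found" >&2
```

Where fontconfig is installed, `fc-match --format='%{file}\n' "<family>"` is an alternative that prints the file path fontconfig would pick for a family name.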
Default local export target unless the user asked otherwise:
-c:v libx264 -preset medium -crf 18 -pix_fmt yuv420p -movflags +faststart -c:a aac -b:a 192k
Always pass -y as the first flag so the command overwrites the output without asking. ffmpeg prompts interactively when the output path exists, and agent shells hang on that prompt until timeout.
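Put together, the default export target and the `-y` rule might look like the wrapper below. The function name `export_default` and the `FFMPEG_BIN` override variable are assumptions added for this sketch (the override makes a dry run possible); the encoder flags are the ones listed above.

```shell
# Re-encode a file with the default export target.
# -y comes first so an existing output path never triggers a prompt.
# FFMPEG_BIN is a hypothetical override; it defaults to ffmpeg.
export_default() {
  "${FFMPEG_BIN:-ffmpeg}" -y -i "$1" \
    -c:v libx264 -preset medium -crf 18 -pix_fmt yuv420p -movflags +faststart \
    -c:a aac -b:a 192k "$2"
}

# Assumed usage:
#   export_default ./input.mp4 ./final.mp4
```

Quoting `"$1"` and `"$2"` keeps paths with spaces intact, which matters for files downloaded with arbitrary names.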
Step 3: Build from scratch (chain endpoints)
When no skill matches, chain individual CLI commands. Each step produces an output that feeds into the next.
Single asset:
wonda generate image --model nano-banana-2 --prompt "..." --aspect-ratio 9:16 --wait -o out.png
# --negative-prompt "..." — override what to exclude (models like cookie have good defaults)
# --seed <number> — pin the seed for reproducible results
wonda generate video --model seedance-2 --prompt "..." --duration 5 --params '{"quality":"high"}' --wait -o out.mp4
wonda generate text --model <model> --prompt "..." --wait
wonda generate music --model suno-music --prompt "upbeat lo-fi" --wait -o music.mp3
Audio (speech, transcription, dialogue):
# Text-to-speech
wonda audio speech --model elevenlabs-tts --prompt "Your script here" \
--params '{"voiceId":"21m00Tcm4TlvDq8ikWAM"}' --wait -o speech.mp3
# elevenlabs-tts always requires a voiceId param
# Common voice: Rachel (female) "21m00Tcm4TlvDq8ikWAM"
# Transcribe audio/video to text
wonda audio transcribe --model elevenlabs-stt --attach $MEDIA --wait
# Multi-speaker dialogue
wonda audio dialogue --model elevenlabs-dialogue --prompt "Speaker A: Hi! Speaker B: Hello!" \
--wait -o dialogue.mp3
Audio AI operations (direct-inference, NOT editor ops):
# Denoise / dereverberate speech
wonda audio enhance --model replicate-resemble-enhance --attach $MEDIA \
--params '{"denoise":true,"chunkSeconds":10}' --wait -o enhanced.wav
# Split a track into voice and instrumental stems
wonda audio extract-voice --model replicate-demucs --attach $MEDIA \
--wait -o vocals.wav
DO NOT use wonda edit video --operation enhanceAudio or --operation voiceExtractor — those paths are deprecated. They still work but emit a warning, and they route through the heavier editor_job pipeline for no functional reason.
Add animated captions to a video:
The animatedCaptions operation handles everything in one step — it extracts audio, transcribes for word-level timing, and renders animated word-by-word captions onto the video.
# Generate a video with speech audio
VID_JOB=$(wonda generate video --model seedance-2 --prompt "..." --duration 5 --aspect-ratio 9:16 --params '{"quality":"high"}' --wait --quiet)
VID_MEDIA=$(wonda jobs get inference $VID_JOB --jq '.outputs[0].media.mediaId')
# Add animated captions (single step)
wonda edit video --operation animatedCaptions --media $VID_MEDIA \
--params '{"fontFamily":"TikTok Sans SemiCondensed","position":"bottom-center","sizePercent":80,"strokeWidth":2.5,"fontSizeScale":0.8,"highlightColor":"rgb(252, 61, 61)"}' \
--wait -o final.mp4
The video's original audio is preserved. Do NOT replace the audio with TTS: the video model already generated the speech.
Transitions (effects pipelines on a single video):
wonda transitions presets # List built-in presets (JSON)
wonda transitions operations # Grouped by category (analysis/effect/...)
wonda transitions operations --json # Full per-param metadata
wonda transitions llms # Full reference (presets + ops + dependencies)
wonda transitions run --media $VID --preset flash_glow --wait -o out.mp4
# Or build a custom pipeline of steps:
wonda transitions run --media $VID \
--steps '[{"glow":{"spread":8}},{"scene_flash":{}}]' --wait -o out.mp4
# Or send an agent-generated timeline of clips (inline JSON):
wonda transitions run --media $VID \
--clips '[{"layer_type":"video","start_frame":0,"end_frame":60}]' --wait -o out.mp4
# …or from a file (handy for long agent timelines):
wonda transitions run --media $VID --clips ./timeline.json --wait -o out.mp4
wonda transitions job <jobId> # Poll a transition job
Use exactly one of --preset, --steps, or --clips. Requires a full (logged-in) account. Always read wonda transitions llms first when composing a custom pipeline or a clips timeline — it documents the detect→segment→effect dependencies, which ops need masks, and the full clip-spec shape (layer types, tracks, effects, transforms).
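An agent that assembles `wonda transitions run` calls from variables can enforce the "exactly one of" rule before submitting a job. A minimal sketch, with `exactly_one_mode` as a hypothetical helper name:

```shell
# Succeed only when exactly one of the three values is non-empty.
# usage: exactly_one_mode "$PRESET" "$STEPS" "$CLIPS"
exactly_one_mode() {
  count=0
  for value in "$1" "$2" "$3"; do
    if [ -n "$value" ]; then
      count=$((count + 1))
    fi
  done
  [ "$count" -eq 1 ]
}

# Assumed usage:
#   exactly_one_mode "$PRESET" "$STEPS" "$CLIPS" ||
#     { echo "pass exactly one of --preset, --steps, --clips" >&2; exit 1; }
```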
Preset variables (variables block). Each preset declares the template variables it accepts under `v
...