wonda-cli
Automatically generate images, videos, music, and audio in the terminal, supporting media editing and combination, with one-click publishing to social platforms like LinkedIn, Reddit, X/Twitter.
npx skills add degausai/wonda --skill wonda-cli
Before / After Comparison
Before: Generating image, video, and audio assets by hand in separate tools, then logging into each social platform to publish one by one. One set of content takes 2-3 hours.
After: One command generates all media assets, edits and combines them, and bulk-publishes to multiple social platforms. Content creation and distribution finish in 20 minutes.
Wonda CLI
Wonda CLI is a content creation toolkit for terminal-based agents. Use it to generate images, videos, music, and audio; edit and compose media; publish to social platforms; and research/automate across LinkedIn, Reddit, and X/Twitter.
Install
If wonda is not found on PATH, install it first:
# npm
npm i -g @degausai/wonda
# Homebrew
brew tap degausai/tap && brew install wonda
Setup
- Auth: wonda auth login (opens browser, recommended) or set the WONDERCAT_API_KEY env var
- Verify: wonda auth check
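The install and verify steps above can be combined into one pre-check. This is a minimal sketch: the function name preflight is hypothetical (it is not a wonda command), and it assumes that wonda auth check reports a missing login through a non-zero exit status, which this guide does not confirm.

```shell
# preflight: hypothetical helper, not part of the wonda CLI.
preflight() {
  # command -v is a shell builtin, so the lookup works even with a minimal PATH.
  if ! command -v wonda >/dev/null 2>&1; then
    echo "wonda not found on PATH; install with: npm i -g @degausai/wonda" >&2
    return 1
  fi
  # Assumption: a failed auth check is signalled by the exit status.
  wonda auth check
}
```

Calling preflight at the top of a script lets later steps assume the binary is present.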
Access tiers
Not all commands are available to every account type:
| Tier | Access |
| --- | --- |
| Anonymous (temporary account, no login) | Media upload/download, editing (video/edit, image/edit, audio/edit), transcription, social publishing, scraping, analytics |
| Free (logged in, Basic/Free plan) | Everything above + generation (image/generate, video/generate, etc.), styles, recipes, brand |
| Paid (Plus, Pro, or Absolute plan) | Everything above + video analysis (requires credits), skill commands (wonda skill install/list/get) |
If a command returns a 403 error, check your plan at https://app.wondercat.ai/settings/billing.
Social signups (Instagram, TikTok, etc.)
Drive them with the wonda device primitives + a throwaway mailbox from wonda email. The screenshot → decide → tap/type/swipe loop is how these flows work — there's no shortcut command, and that's fine: social apps change their UI constantly and any canned flow would drift faster than you could maintain it.
Standard loop:
- wonda email account create --random → save {email, password}.
- wonda device create → pick a ready device (poll wonda device get <id> --fields status).
- wonda device launch <device-id> com.instagram.android (or com.zhiliaoapp.musically for TikTok). Fall back to wonda device open-url if you'd rather start in the web flow.
- Loop: wonda device screenshot <device-id> > s.json → decode the base64 PNG → read → pick an action → tap | type | swipe | key → screenshot again. Use --text "SomeButtonLabel" on tap before guessing coordinates; fall back to --x --y read off the screenshot for elements without matching text (number pickers, date spinners, etc.).
- When the app sends a verification email, run wonda email inbox wait <email> --timeout 120. It returns {codes: ["483921"], links: [...]} with the 6-digit code already extracted. Use wonda device type <device-id> --text "<code>" to feed it back.
- For number/date spinners: tap the highlighted cell so that Android pops up a numeric or alphabetic keyboard; wonda device type --text "<value>" then replaces the selected text, and wonda device key --code 4 dismisses the keyboard when done.
Consent-like taps — anything that accepts Terms/Privacy/Cookies, grants permissions, or publishes something — stop and ask the user for explicit confirmation in chat before tapping. That isn't about signups specifically; it applies to any automation step.
Rate-limit signals — if the app shows you a visual puzzle ("we want to make sure you're a real person"), stop and hand off to the user with wonda device stream <id> (see next section). Don't click through puzzles yourself.
Handing off to a human
If automation hits a screen that requires a human to take over (consent flow you shouldn't auto-accept, ambiguous UI, step where the user prefers to act themselves), use wonda device stream <device-id> — returns a playerUrl signed with a short-lived JWT (1h). Give that URL to the user, they act in their own browser, and automation can resume afterward.
wonda device stream <device-id>
# → { "streamUrl": "wss://…", "playerUrl": "https://…", "deviceType": "social" }
Global output flags
All commands support these output control flags:
- --json: Force JSON output (auto-enabled when stdout is piped)
- --quiet: Only output the primary identifier (job ID, media ID, etc.); ideal for scripting
- -o <path>: Download output to file (implies --wait)
- --fields status,outputs: Select specific JSON fields
- --jq '.outputs[0].media.url': Filter JSON output with a jq expression
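The output flags are easiest to reuse when wrapped in small functions. The sketch below is an illustration only: job_media_url and job_summary are hypothetical names, and it assumes --jq and --fields behave on wonda jobs get inference exactly as the list above describes. Whether --jq prints a raw or a JSON-quoted string is not stated in this guide.

```shell
# job_media_url: hypothetical helper; prints the first output URL of an inference job.
job_media_url() {
  [ -n "$1" ] || { echo "usage: job_media_url <job-id>" >&2; return 2; }
  wonda jobs get inference "$1" --jq '.outputs[0].media.url'
}

# job_summary: hypothetical helper; limits the JSON to the status and outputs fields.
job_summary() {
  [ -n "$1" ] || { echo "usage: job_summary <job-id>" >&2; return 2; }
  wonda jobs get inference "$1" --fields status,outputs
}
```

Keeping the jq expression inside a function avoids retyping the quoting each time a job ID is resolved.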
How to think about content creation
You are a marketing director with access to a full production toolkit. Before touching any tool, think:
- What product category? (beauty, food, tech, fashion, fitness, etc.)
- What format performs for this category? (UGC memes for everyday products, cinematic for luxury, before/after for transformations, testimonial for services)
- What's the hook? (relatable scenario, surprising twist, aspirational lifestyle, social proof)
- What specific scene? (not "product on table" but "person discovering the product in a funny situation")
Decision flow
When asked to create content, follow this order:
Step 1: Gather context
wonda brand # Brand identity, colors, products, audience
wonda analytics instagram # What content performs well
wonda scrape social --handle @competitor --platform instagram --wait # Competitive research (if relevant)
# Cross-platform research (if relevant)
wonda x search "topic OR keyword" # Find conversations on X/Twitter
wonda x user-tweets @competitor # Competitor's recent tweets
wonda reddit search "topic" --sort top --time week # Reddit discussions
wonda reddit feed marketing --sort hot # Subreddit trends
wonda linkedin search "topic" --type COMPANIES # LinkedIn company/people research
wonda linkedin profile competitor-vanity-name # LinkedIn profile intel
Step 2: Check content skills
Content skills are step-by-step guides for common content types. Each skill tells you exactly which models, prompts, and editing operations to use — and in what order. ALWAYS check skills before building from scratch.
wonda skill list # Browse all content skills
wonda skill get <slug> # Full step-by-step guide for a skill
Full skill index:
| Slug | Description | Input |
| --- | --- | --- |
| product-video | Product/scene video: prompt library for all categories | optional product image |
| ugc-talking | Talking-head UGC: single clip, two-angle PIP, or 20s+ with B-roll | optional reference |
| ugc-reaction-batch | Batch TikTok-native UGC reactions with viral strategy | optional product image |
| tiktok-ugc-pipeline | Scrape viral reel → generate 5 UGC → post as drafts | reel or TikTok URL |
| ugc-dance-motion | Dance/motion transfer | image + video |
| marketing-brain | Marketing strategy brain: hooks, visuals, ads | user brief |
| reddit-subreddit-intel | Scrape top posts, analyze virality, generate ideas | subreddit + product |
| twitter-influencer-search | Find X influencers and amplifiers | competitor/niche keywords |
| tiktok-slideshow-carousel | 3-slide TikTok carousel: hook, bridge, product reveal | app screenshot + audience |
| ffmpeg-local-video-finishing | Local ffmpeg finishing for deterministic trims, muxes, reverses, and exports | local video path or mediaId |
| ffmpeg-burn-captions | Burn captions locally with ffmpeg after getting transcript/timing | local video path or mediaId |
| ffmpeg-social-formatting | Reformat local video for 9:16, 1:1, 16:9, and social-safe exports | local video path or mediaId |
| ffmpeg-scene-splitting | Detect scene boundaries locally, split into clips, or omit one scene | local video path or mediaId |
| ffmpeg-silence-cut | Detect and collapse dead air locally while preserving short natural pauses | local video path or mediaId |
| ffmpeg-frame-extraction | Extract single frames, poster frames, or evenly spaced stills locally | local video path or mediaId |
| ffmpeg-analysis-artifacts | Build local analysis artifacts: grid, first/last frame, and extracted audio | local video path or mediaId |
| ffmpeg-reference | Compact ffmpeg routing, font, codec, and command reference for agents | local media path |
| remotion-local-render | Render editorPipeline blueprint steps locally via @remotion/renderer | manifest JSON + editor job id |
If a skill matches → wonda skill get <slug>, read it, adapt to context, execute each step.
If no skill matches → build from scratch (Step 3).
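The match / no-match branch can be sketched as a lookup before any generation starts. Both function names are hypothetical. Because stdout is piped here, JSON output should be auto-enabled according to the output flags section; the sketch further assumes that each slug appears in that JSON as a quoted string, which is not confirmed by this guide.

```shell
# skill_exists: hypothetical helper; succeeds if the slug appears as a quoted
# JSON string in the piped output of `wonda skill list` (assumed output shape).
skill_exists() {
  [ -n "$1" ] || return 2
  wonda skill list | grep -qF -- "\"$1\""
}

# pick_route: prints which branch of the decision flow applies.
pick_route() {
  if skill_exists "$1"; then
    echo "skill:$1"   # next: wonda skill get <slug>, then adapt and execute
  else
    echo "scratch"    # no matching skill: build from scratch (Step 3)
  fi
}
```

Matching on the quoted form keeps a partial slug such as ugc from matching ugc-talking.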
Step 2.5: Decide whether finishing should be local
Not every media task should go back through Wonda editing. Use this routing rule:
- Use wonda for AI generation, AI transcription/alignment, scraping, publishing, hosted transitions, and workflows that need media IDs or remote jobs.
- Use local ffmpeg for deterministic transforms on files you already have or can download: trim, crop/scale/pad, concat, replace audio, extract audio/frame, reverse, normalize for delivery, burn captions, split scenes, cut silence, and build analysis artifacts.
When a task starts from a Wonda media ID but the actual edit is deterministic, move it to local files first:
wonda media download <mediaId> -o ./input.mp4
Before any local ffmpeg work:
which ffmpeg
which ffprobe
ffmpeg -version
ffprobe -v error -show_format -show_streams -of json ./input.mp4
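The checks above can be folded into a guard that fails early when either binary is missing. A minimal sketch: require_tools and probe_input are hypothetical helper names, and the ffprobe invocation is the one listed above.

```shell
# require_tools: hypothetical helper; reports every missing tool, then fails.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# probe_input: prints format and stream metadata as JSON, but only when
# both ffmpeg and ffprobe are available.
probe_input() {
  require_tools ffmpeg ffprobe || return 1
  ffprobe -v error -show_format -show_streams -of json "$1"
}
```

Reporting all missing tools in one pass saves a second failed run when both are absent.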
Font rule for local caption/text work:
- Prefer an explicit font file path over a family name.
- Never assume a font exists. Check first with fc-match, fc-list, /System/Library/Fonts, /Library/Fonts, ~/Library/Fonts, or /usr/share/fonts.
- If the task is mainly local finishing/captions/formatting/splitting/artifact extraction, check the ffmpeg-specific skills before inventing commands.
- wonda edit video renders locally by default for single-video ops (trim, crop, speed, volume, textOverlay, animatedCaptions with supplied captions, editAudio). The server returns a manifest; the CLI runs @remotion/renderer against a CloudFront-hosted bundle, uploads the output, and finalizes the editor_job. No flag needed. Pass --render-server only to force Lambda. Multi-video ops (overlay, splitScreen, merge, splitScenes, motionDesign) auto-reject with a 400; the CLI will tell you to use --render-server. See the remotion-local-render content skill for the full recipe (including the STT-free TikTok-style caption flow via wonda alignment extract-timestamps → --caption-segments).
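The font rule above (resolve an explicit file, never assume it exists) can be sketched as a directory search. find_font_file is a hypothetical helper, and limiting the search to .ttf and .otf files is an assumption made for brevity. The default directories are the ones named in the rule.

```shell
# find_font_file: hypothetical helper; prints the first .ttf/.otf file whose
# name contains the fragment (case-insensitive), or fails if none is found.
# Usage: find_font_file <name-fragment> [dir...]
find_font_file() {
  [ "$#" -ge 1 ] || { echo "usage: find_font_file <name-fragment> [dir...]" >&2; return 2; }
  fragment=$1
  shift
  # Default to the directories named in the font rule when none are given.
  [ "$#" -gt 0 ] || set -- /System/Library/Fonts /Library/Fonts "$HOME/Library/Fonts" /usr/share/fonts
  for dir in "$@"; do
    [ -d "$dir" ] || continue
    match=$(find "$dir" -type f \( -iname "*${fragment}*.ttf" -o -iname "*${fragment}*.otf" \) 2>/dev/null | head -n 1)
    if [ -n "$match" ]; then
      printf '%s\n' "$match"
      return 0
    fi
  done
  return 1
}
```

A non-zero return is the signal to pick another font or ask the user, instead of passing a family name and hoping it resolves.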
Default local export target unless the user asked otherwise:
-c:v libx264 -preset medium -crf 18 -pix_fmt yuv420p -movflags +faststart -c:a aac -b:a 192k
Always pass -y as the first flag so the command auto-overwrites the output. ffmpeg prompts interactively when the output path exists and agent shells hang on that prompt until timeout.
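Put together, the default export target and the -y rule look roughly like the wrapper below. export_h264 is a hypothetical name; the codec flags are copied from the default target above, and nothing else is added.

```shell
# export_h264: hypothetical wrapper around the default export target.
# -y comes first so ffmpeg never stops to ask about overwriting the output.
export_h264() {
  [ "$#" -eq 2 ] || { echo "usage: export_h264 <input> <output>" >&2; return 2; }
  ffmpeg -y -i "$1" \
    -c:v libx264 -preset medium -crf 18 -pix_fmt yuv420p \
    -movflags +faststart -c:a aac -b:a 192k \
    "$2"
}
```

Example call: export_h264 ./input.mp4 ./final.mp4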
Step 3: Build from scratch (chain endpoints)
When no skill matches, chain individual CLI commands. Each step produces an output that feeds into the next.
Single asset:
wonda generate image --model nano-banana-2 --prompt "..." --aspect-ratio 9:16 --wait -o out.png
# --negative-prompt "..." — override what to exclude (models like cookie have good defaults)
# --seed <number> — pin the seed for reproducible results
wonda generate video --model seedance-2 --prompt "..." --duration 5 --params '{"quality":"high"}' --wait -o out.mp4
wonda generate text --model <model> --prompt "..." --wait
wonda generate music --model suno-music --prompt "upbeat lo-fi" --wait -o music.mp3
Audio (speech, transcription, dialogue):
# Text-to-speech
wonda audio speech --model elevenlabs-tts --prompt "Your script here" \
--params '{"voiceId":"21m00Tcm4TlvDq8ikWAM"}' --wait -o speech.mp3
# elevenlabs-tts always requires a voiceId param
# Common voice: Rachel (female) "21m00Tcm4TlvDq8ikWAM"
# Transcribe audio/video to text
wonda audio transcribe --model elevenlabs-stt --attach $MEDIA --wait
# Multi-speaker dialogue
wonda audio dialogue --model elevenlabs-dialogue --prompt "Speaker A: Hi! Speaker B: Hello!" \
--wait -o dialogue.mp3
Audio AI operations (direct-inference, NOT editor ops):
# Denoise / dereverberate speech
wonda audio enhance --model replicate-resemble-enhance --attach $MEDIA \
--params '{"denoise":true,"chunkSeconds":10}' --wait -o enhanced.wav
# Split a track into voice and instrumental stems
wonda audio extract-voice --model replicate-demucs --attach $MEDIA \
--wait -o vocals.wav
DO NOT use wonda edit video --operation enhanceAudio or --operation voiceExtractor — those paths are deprecated. They still work but emit a warning, and they route through the heavier editor_job pipeline for no functional reason.
Add animated captions to a video:
The animatedCaptions operation handles everything in one step — it extracts audio, transcribes for word-level timing, and renders animated word-by-word captions onto the video.
# Generate a video with speech audio
VID_JOB=$(wonda generate video --model seedance-2 --prompt "..." --duration 5 --aspect-ratio 9:16 --params '{"quality":"high"}' --wait --quiet)
VID_MEDIA=$(wonda jobs get inference $VID_JOB --jq '.outputs[0].media.mediaId')
# Add animated captions (single step)
wonda edit video --operation animatedCaptions --media $VID_MEDIA \
--params '{"fontFamily":"TikTok Sans SemiCondensed","position":"bottom-center","sizePercent":80,"strokeWidth":2.5,"fontSizeScale":0.8,"highlightColor":"rgb(252, 61, 61)"}' \
--wait -o final.mp4
The video's original audio is preserved. Do NOT replace the audio with TTS: the video model (seedance-2 in the example above) already generated the speech.
Transitions (effects pipelines on a single video):
wonda transitions presets # List built-in presets (JSON)
wonda transitions operations # Grouped by category (analysis/effect/...)
wonda transitions operations --json # Full per-param metadata
wonda transitions llms # Full reference (presets + ops + dependencies)
wonda transitions run --media $VID --preset flash_glow --wait -o out.mp4
# Or build a custom pipeline of steps:
wonda transitions run --media $VID \
--steps '[{"glow":{"spread":8}},{"scene_flash":{}}]' --wait -o out.mp4
# Or send an agent-generated timeline of clips (inline JSON):
wonda transitions run --media $VID \
--clips '[{"layer_type":"video","start_frame":0,"end_frame":60}]' --wait -o out.mp4
# …or from a file (handy for long agent timelines):
wonda transitions run --media $VID --clips ./timeline.json --wait -o out.mp4
wonda transitions job <jobId> # Poll a transition job
Use exactly one of --preset, --steps, or --clips. Requires a full (logged-in) account. Always read wonda transitions llms first when composing a custom pipeline or a clips timeline — it documents the detect→segment→effect dependencies, which ops need masks, and the full clip-spec shape (layer types, tracks, effects, transforms).
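The "exactly one of" rule is easy to violate when an agent assembles flags dynamically, so a pre-check can help. This is a sketch under assumptions: exactly_one_mode and run_transition are hypothetical helpers, and the CLI's own behaviour when the rule is broken is not described in this guide.

```shell
# exactly_one_mode: hypothetical pre-check; succeeds only when exactly one of
# its three arguments (preset, steps, clips) is non-empty.
exactly_one_mode() {
  count=0
  for value in "$1" "$2" "$3"; do
    if [ -n "$value" ]; then
      count=$((count + 1))
    fi
  done
  [ "$count" -eq 1 ]
}

# run_transition: hypothetical wrapper.
# Usage: run_transition <media> <preset> <steps-json> <clips-json-or-file> <out>
# Pass "" for the two modes that are not used.
run_transition() {
  if ! exactly_one_mode "$2" "$3" "$4"; then
    echo "pass exactly one of preset, steps, or clips" >&2
    return 2
  fi
  if [ -n "$2" ]; then
    wonda transitions run --media "$1" --preset "$2" --wait -o "$5"
  elif [ -n "$3" ]; then
    wonda transitions run --media "$1" --steps "$3" --wait -o "$5"
  else
    wonda transitions run --media "$1" --clips "$4" --wait -o "$5"
  fi
}
```

Example call: run_transition "$VID" flash_glow "" "" out.mp4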
Preset variables (variables block). Each preset declares the template variables it accepts under `v
...