
wonda-cli

by @degausaiv

Automatically generate images, videos, music, and audio in the terminal, edit and combine media, and publish with one click to social platforms such as LinkedIn, Reddit, and X/Twitter.

Tags: multimedia, social-media, automation, content-generation, video-editing
Installation
npx skills add degausai/wonda --skill wonda-cli

Before / After Comparison

Before

Image, video, and audio assets are generated separately in multiple tools, then each social platform is logged into one by one to publish by hand. One set of content takes 2-3 hours to complete.

After

One command generates all media assets, edits and combines them, and bulk-publishes to multiple social platforms. Content creation and distribution together finish in 20 minutes.

SKILL.md

Wonda CLI

Wonda CLI is a content creation toolkit for terminal-based agents. Use it to generate images, videos, music, and audio; edit and compose media; publish to social platforms; and research/automate across LinkedIn, Reddit, and X/Twitter.

Install

If wonda is not found on PATH, install it first:

# npm
npm i -g @degausai/wonda

# Homebrew
brew tap degausai/tap && brew install wonda

Setup

  • Auth: wonda auth login (opens browser, recommended) or set WONDERCAT_API_KEY env var

  • Verify: wonda auth check
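A minimal preflight sketch for the install and auth checks above. `ensure_wonda` is a hypothetical helper name, and the sketch assumes `wonda auth check` reports a missing or invalid session through a non-zero exit status, which this document does not state.

```shell
# Hypothetical helper: verify the CLI is installed and authenticated.
ensure_wonda() {
  if ! command -v wonda >/dev/null 2>&1; then
    echo "wonda not found on PATH; install it with: npm i -g @degausai/wonda" >&2
    return 1
  fi
  # Assumption: a failed auth check is signalled by a non-zero exit status.
  if ! wonda auth check >/dev/null 2>&1; then
    echo "not authenticated; run 'wonda auth login' or set WONDERCAT_API_KEY" >&2
    return 2
  fi
  return 0
}
```

Calling `ensure_wonda || exit 1` at the top of a script keeps later commands from failing with less obvious errors.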

Access tiers

Not all commands are available to every account type:

| Tier | Access |
| --- | --- |
| Anonymous (temporary account, no login) | Media upload/download, editing (video/edit, image/edit, audio/edit), transcription, social publishing, scraping, analytics |
| Free (logged in, Basic/Free plan) | Everything above + generation (image/generate, video/generate, etc.), styles, recipes, brand |
| Paid (Plus, Pro, or Absolute plan) | Everything above + video analysis (requires credits), skill commands (wonda skill install/list/get) |

If a command returns a 403 error, check your plan at https://app.wondercat.ai/settings/billing.

Social signups (Instagram, TikTok, etc.)

Drive them with the wonda device primitives + a throwaway mailbox from wonda email. The screenshot → decide → tap/type/swipe loop is how these flows work — there's no shortcut command, and that's fine: social apps change their UI constantly and any canned flow would drift faster than you could maintain it.

Standard loop:

  • wonda email account create --random → save {email, password}.

  • wonda device create → pick a ready device (poll wonda device get <id> --fields status).

  • wonda device launch <device-id> com.instagram.android (or com.zhiliaoapp.musically for TikTok). Fall back to wonda device open-url if you'd rather start in the web flow.

  • Loop: wonda device screenshot <device-id> > s.json → decode the base64 PNG → read → pick an action → tap | type | swipe | key → screenshot again. Use --text "SomeButtonLabel" on tap before guessing coordinates; fall back to --x --y read off the screenshot for elements without matching text (number pickers, date spinners, etc.).

  • When the app sends a verification email, wonda email inbox wait <email> --timeout 120 — returns {codes: ["483921"], links: [...]} with the 6-digit code already extracted. wonda device type <device-id> --text "<code>" to feed it back.

  • For number/date spinners: tap the highlighted cell so that Android pops up a numeric or alphabetic keyboard, then use wonda device type --text "<value>" to replace the selected text. wonda device key --code 4 (Android's Back keycode) dismisses the keyboard when done.
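Two steps of the standard loop, waiting for a ready device and feeding the emailed code back, can be sketched as small shell functions. The function names are hypothetical, the literal status value `ready` is an assumption based on the wording "pick a ready device", and `--jq` is the global output flag described under "Global output flags".

```shell
# Hypothetical helpers for the standard loop; the "ready" value is an assumption.

# wait_for_device DEVICE_ID [MAX_TRIES] [SLEEP_SECONDS]
wait_for_device() {
  device_id=$1
  tries=${2:-30}
  pause=${3:-5}
  while [ "$tries" -gt 0 ]; do
    # Strip quotes in case the value comes back JSON-encoded.
    status=$(wonda device get "$device_id" --jq '.status' | tr -d '"')
    if [ "$status" = "ready" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "$pause"
  done
  echo "device $device_id did not become ready" >&2
  return 1
}

# submit_verification_code EMAIL DEVICE_ID
submit_verification_code() {
  email=$1
  device_id=$2
  # The inbox response carries the extracted code in codes[0].
  code=$(wonda email inbox wait "$email" --timeout 120 --jq '.codes[0]' | tr -d '"')
  if [ -z "$code" ] || [ "$code" = "null" ]; then
    echo "no verification code arrived for $email" >&2
    return 1
  fi
  wonda device type "$device_id" --text "$code"
}
```

The screenshot, decide, and tap part of the loop is deliberately left out: it depends on reading each screen, so it does not reduce to a fixed script.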

Consent-like taps — anything that accepts Terms/Privacy/Cookies, grants permissions, or publishes something — stop and ask the user for explicit confirmation in chat before tapping. That isn't about signups specifically; it applies to any automation step.

Rate-limit signals — if the app shows you a visual puzzle ("we want to make sure you're a real person"), stop and hand off to the user with wonda device stream <id> (see next section). Don't click through puzzles yourself.

Handing off to a human

If automation hits a screen that requires a human to take over (consent flow you shouldn't auto-accept, ambiguous UI, step where the user prefers to act themselves), use wonda device stream <device-id> — returns a playerUrl signed with a short-lived JWT (1h). Give that URL to the user, they act in their own browser, and automation can resume afterward.

wonda device stream <device-id>
# → { "streamUrl": "wss://…", "playerUrl": "https://…", "deviceType": "social" }
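As a small sketch, the player URL can be pulled out of that response with the global `--jq` flag before passing it to the user. `handoff_url` is a hypothetical helper name; the `playerUrl` field is taken from the sample output above.

```shell
# Hypothetical helper: print the browser URL a human can use to take over.
handoff_url() {
  # Strip quotes in case the value comes back JSON-encoded.
  url=$(wonda device stream "$1" --jq '.playerUrl' | tr -d '"')
  case "$url" in
    https://*) printf '%s\n' "$url" ;;
    *) echo "no playerUrl returned for device $1" >&2; return 1 ;;
  esac
}
```

Because the URL is signed with a JWT that expires after an hour, it is safer to request a fresh one at hand-off time than to cache an earlier value.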

Global output flags

All commands support these output control flags:

  • --json — Force JSON output (auto-enabled when stdout is piped)

  • --quiet — Only output the primary identifier (job ID, media ID, etc.) — ideal for scripting

  • -o <path> — Download output to file (implies --wait)

  • --fields status,outputs — Select specific JSON fields

  • --jq '.outputs[0].media.url' — Filter JSON output with a jq expression

How to think about content creation

You are a marketing director with access to a full production toolkit. Before touching any tool, think:

  • What product category? (beauty, food, tech, fashion, fitness, etc.)

  • What format performs for this category? (UGC memes for everyday products, cinematic for luxury, before/after for transformations, testimonial for services)

  • What's the hook? (relatable scenario, surprising twist, aspirational lifestyle, social proof)

  • What specific scene? (not "product on table" but "person discovering the product in a funny situation")

Decision flow

When asked to create content, follow this order:

Step 1: Gather context

wonda brand                                                    # Brand identity, colors, products, audience
wonda analytics instagram                                      # What content performs well
wonda scrape social --handle @competitor --platform instagram --wait  # Competitive research (if relevant)

# Cross-platform research (if relevant)
wonda x search "topic OR keyword"                              # Find conversations on X/Twitter
wonda x user-tweets @competitor                                # Competitor's recent tweets
wonda reddit search "topic" --sort top --time week             # Reddit discussions
wonda reddit feed marketing --sort hot                         # Subreddit trends
wonda linkedin search "topic" --type COMPANIES                 # LinkedIn company/people research
wonda linkedin profile competitor-vanity-name                  # LinkedIn profile intel
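One way to make the Step 1 results reusable is to save them as JSON files that later steps can read. This is a sketch only: `gather_context` and the file names are hypothetical, and it relies on the global `--json` flag being accepted by these commands as stated under "Global output flags".

```shell
# Hypothetical helper: gather_context OUT_DIR [COMPETITOR_HANDLE]
gather_context() {
  out_dir=$1
  competitor=${2:-}
  mkdir -p "$out_dir"
  wonda brand --json > "$out_dir/brand.json"
  wonda analytics instagram --json > "$out_dir/analytics.json"
  # Competitive research is optional, so it only runs when a handle is given.
  if [ -n "$competitor" ]; then
    wonda scrape social --handle "$competitor" --platform instagram --wait --json \
      > "$out_dir/competitor.json"
  fi
}
```

For example, `gather_context ./context @competitor` leaves three files in `./context`, while omitting the handle skips the scrape.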

Step 2: Check content skills

Content skills are step-by-step guides for common content types. Each skill tells you exactly which models, prompts, and editing operations to use — and in what order. ALWAYS check skills before building from scratch.

wonda skill list                                # Browse all content skills
wonda skill get <slug>                          # Full step-by-step guide for a skill

Full skill index:

| Slug | Description | Input |
| --- | --- | --- |
| product-video | Product/scene video: prompt library for all categories | optional product image |
| ugc-talking | Talking-head UGC: single clip, two-angle PIP, or 20s+ with B-roll | optional reference |
| ugc-reaction-batch | Batch TikTok-native UGC reactions with viral strategy | optional product image |
| tiktok-ugc-pipeline | Scrape viral reel → generate 5 UGC → post as drafts | reel or TikTok URL |
| ugc-dance-motion | Dance/motion transfer | image + video |
| marketing-brain | Marketing strategy brain: hooks, visuals, ads | user brief |
| reddit-subreddit-intel | Scrape top posts, analyze virality, generate ideas | subreddit + product |
| twitter-influencer-search | Find X influencers and amplifiers | competitor/niche keywords |
| tiktok-slideshow-carousel | 3-slide TikTok carousel: hook, bridge, product reveal | app screenshot + audience |
| ffmpeg-local-video-finishing | Local ffmpeg finishing for deterministic trims, muxes, reverses, and exports | local video path or mediaId |
| ffmpeg-burn-captions | Burn captions locally with ffmpeg after getting transcript/timing | local video path or mediaId |
| ffmpeg-social-formatting | Reformat local video for 9:16, 1:1, 16:9, and social-safe exports | local video path or mediaId |
| ffmpeg-scene-splitting | Detect scene boundaries locally, split into clips, or omit one scene | local video path or mediaId |
| ffmpeg-silence-cut | Detect and collapse dead air locally while preserving short natural pauses | local video path or mediaId |
| ffmpeg-frame-extraction | Extract single frames, poster frames, or evenly spaced stills locally | local video path or mediaId |
| ffmpeg-analysis-artifacts | Build local analysis artifacts: grid, first/last frame, and extracted audio | local video path or mediaId |
| ffmpeg-reference | Compact ffmpeg routing, font, codec, and command reference for agents | local media path |
| remotion-local-render | Render editorPipeline blueprint steps locally via @remotion/renderer | manifest JSON + editor job id |

If a skill matches → wonda skill get <slug>, read it, adapt to context, execute each step.

If no skill matches → build from scratch (Step 3).
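The match-or-fall-back decision can be sketched as a helper that tries to load a guide and reports when there is none. `load_skill` is a hypothetical name, and the sketch assumes `wonda skill get` exits non-zero for an unknown slug (or for an account without access to skill commands), which this document does not confirm.

```shell
# Hypothetical helper: print the skill guide, or fail so the caller can
# fall back to building from scratch.
load_skill() {
  slug=$1
  # Assumption: an unknown slug yields a non-zero exit status or empty output.
  if guide=$(wonda skill get "$slug" 2>/dev/null) && [ -n "$guide" ]; then
    printf '%s\n' "$guide"
    return 0
  fi
  echo "no usable skill '$slug'; build from scratch (Step 3)" >&2
  return 1
}
```

A caller might run `load_skill product-video > guide.md || echo "chaining commands manually"`.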

Step 2.5: Decide whether finishing should be local

Not every media task should go back through Wonda editing. Use this routing rule:

  • Use wonda for AI generation, AI transcription/alignment, scraping, publishing, hosted transitions, and workflows that need media IDs or remote jobs.

  • Use local ffmpeg for deterministic transforms on files you already have or can download: trim, crop/scale/pad, concat, replace audio, extract audio/frame, reverse, normalize for delivery, burn captions, split scenes, cut silence, and build analysis artifacts.

When a task starts from a Wonda media ID but the actual edit is deterministic, move it to local files first:

wonda media download <mediaId> -o ./input.mp4

Before any local ffmpeg work:

which ffmpeg
which ffprobe
ffmpeg -version
ffprobe -v error -show_format -show_streams -of json ./input.mp4
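The `which` checks above can be wrapped in a helper that fails fast and names every missing tool at once. `require_tools` is a hypothetical name; it uses only the POSIX `command -v` builtin.

```shell
# Hypothetical helper: require_tools TOOL...
# Returns non-zero and lists every tool that is not on PATH.
require_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  if [ -n "$missing" ]; then
    echo "missing tools:$missing" >&2
    return 1
  fi
  return 0
}
```

Typical use before any local finishing work: `require_tools ffmpeg ffprobe || exit 1`.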

Font rule for local caption/text work:

  • Prefer an explicit font file path over a family name.

  • Never assume a font exists. Check first with fc-match, fc-list, /System/Library/Fonts, /Library/Fonts, ~/Library/Fonts, or /usr/share/fonts.

  • If the task is mainly local finishing/captions/formatting/splitting/artifact extraction, check the ffmpeg-specific skills before inventing commands.

  • wonda edit video renders locally by default for single-video ops (trim, crop, speed, volume, textOverlay, animatedCaptions with supplied captions, editAudio). The server returns a manifest; the CLI runs @remotion/renderer against a CloudFront-hosted bundle, uploads the output, and finalizes the editor_job. No flag is needed. Pass --render-server only to force Lambda. Multi-video ops (overlay, splitScreen, merge, splitScenes, motionDesign) are auto-rejected with a 400, and the CLI will tell you to use --render-server. See the remotion-local-render content skill for the full recipe (including the STT-free TikTok-style caption flow via wonda alignment extract-timestamps --caption-segments).
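For the font rule above, a minimal sketch that resolves a font name to an explicit file path by searching the listed directories. `find_font_file` is a hypothetical helper; it matches on file name only, so a family such as "TikTok Sans" should be passed the way it appears in file names (often without spaces). `fc-match` is deliberately not used as the test here, because it returns the closest fallback font rather than failing when the requested one is absent.

```shell
# Hypothetical helper: find_font_file NAME [DIR...]
# Prints the first .ttf/.otf/.ttc file whose name contains NAME (case-insensitive).
find_font_file() {
  name=$1
  shift
  if [ "$#" -eq 0 ]; then
    set -- /System/Library/Fonts /Library/Fonts "${HOME:-}/Library/Fonts" /usr/share/fonts
  fi
  for dir in "$@"; do
    [ -d "$dir" ] || continue
    match=$(find "$dir" -type f \( -iname "*${name}*.ttf" -o -iname "*${name}*.otf" \
      -o -iname "*${name}*.ttc" \) 2>/dev/null | head -n 1)
    if [ -n "$match" ]; then
      printf '%s\n' "$match"
      return 0
    fi
  done
  return 1
}
```

The printed path can then be passed to ffmpeg as an explicit font file instead of a family name.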

Default local export target unless the user asked otherwise:

-c:v libx264 -preset medium -crf 18 -pix_fmt yuv420p -movflags +faststart -c:a aac -b:a 192k

Always pass -y as the first flag so the command auto-overwrites the output. ffmpeg prompts interactively when the output path exists and agent shells hang on that prompt until timeout.
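Put together, the default export target and the `-y` rule look like the sketch below. `export_default` is a hypothetical wrapper name; the flags are exactly those listed above.

```shell
# Hypothetical wrapper: export_default INPUT OUTPUT
# -y comes first so an existing OUTPUT is overwritten without a prompt.
export_default() {
  ffmpeg -y -i "$1" \
    -c:v libx264 -preset medium -crf 18 -pix_fmt yuv420p -movflags +faststart \
    -c:a aac -b:a 192k \
    "$2"
}
```

For example: `export_default ./input.mp4 ./final.mp4`.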

Step 3: Build from scratch (chain endpoints)

When no skill matches, chain individual CLI commands. Each step produces an output that feeds into the next.

Single asset:

wonda generate image --model nano-banana-2 --prompt "..." --aspect-ratio 9:16 --wait -o out.png
# --negative-prompt "..." — override what to exclude (models like cookie have good defaults)
# --seed <number>         — pin the seed for reproducible results
wonda generate video --model seedance-2 --prompt "..." --duration 5 --params '{"quality":"high"}' --wait -o out.mp4
wonda generate text --model <model> --prompt "..." --wait
wonda generate music --model suno-music --prompt "upbeat lo-fi" --wait -o music.mp3

Audio (speech, transcription, dialogue):

# Text-to-speech
wonda audio speech --model elevenlabs-tts --prompt "Your script here" \
  --params '{"voiceId":"21m00Tcm4TlvDq8ikWAM"}' --wait -o speech.mp3
# elevenlabs-tts always requires a voiceId param
# Common voice: Rachel (female) "21m00Tcm4TlvDq8ikWAM"

# Transcribe audio/video to text
wonda audio transcribe --model elevenlabs-stt --attach $MEDIA --wait

# Multi-speaker dialogue
wonda audio dialogue --model elevenlabs-dialogue --prompt "Speaker A: Hi! Speaker B: Hello!" \
  --wait -o dialogue.mp3

Audio AI operations (direct-inference, NOT editor ops):

# Denoise / dereverberate speech
wonda audio enhance --model replicate-resemble-enhance --attach $MEDIA \
  --params '{"denoise":true,"chunkSeconds":10}' --wait -o enhanced.wav

# Split a track into voice and instrumental stems
wonda audio extract-voice --model replicate-demucs --attach $MEDIA \
  --wait -o vocals.wav

DO NOT use wonda edit video --operation enhanceAudio or --operation voiceExtractor — those paths are deprecated. They still work but emit a warning, and they route through the heavier editor_job pipeline for no functional reason.

Add animated captions to a video:

The animatedCaptions operation handles everything in one step — it extracts audio, transcribes for word-level timing, and renders animated word-by-word captions onto the video.

# Generate a video with speech audio
VID_JOB=$(wonda generate video --model seedance-2 --prompt "..." --duration 5 --aspect-ratio 9:16 --params '{"quality":"high"}' --wait --quiet)
VID_MEDIA=$(wonda jobs get inference $VID_JOB --jq '.outputs[0].media.mediaId')

# Add animated captions (single step)
wonda edit video --operation animatedCaptions --media $VID_MEDIA \
  --params '{"fontFamily":"TikTok Sans SemiCondensed","position":"bottom-center","sizePercent":80,"strokeWidth":2.5,"fontSizeScale":0.8,"highlightColor":"rgb(252, 61, 61)"}' \
  --wait -o final.mp4

The video's original audio is preserved. Do NOT replace the audio with TTS: the video model (seedance-2 in this example) already generated the speech.

Transitions (effects pipelines on a single video):

wonda transitions presets                            # List built-in presets (JSON)
wonda transitions operations                         # Grouped by category (analysis/effect/...)
wonda transitions operations --json                  # Full per-param metadata
wonda transitions llms                               # Full reference (presets + ops + dependencies)
wonda transitions run --media $VID --preset flash_glow --wait -o out.mp4
# Or build a custom pipeline of steps:
wonda transitions run --media $VID \
  --steps '[{"glow":{"spread":8}},{"scene_flash":{}}]' --wait -o out.mp4
# Or send an agent-generated timeline of clips (inline JSON):
wonda transitions run --media $VID \
  --clips '[{"layer_type":"video","start_frame":0,"end_frame":60}]' --wait -o out.mp4
# …or from a file (handy for long agent timelines):
wonda transitions run --media $VID --clips ./timeline.json --wait -o out.mp4
wonda transitions job <jobId>                        # Poll a transition job

Use exactly one of --preset, --steps, or --clips. Requires a full (logged-in) account. Always read wonda transitions llms first when composing a custom pipeline or a clips timeline — it documents the detect→segment→effect dependencies, which ops need masks, and the full clip-spec shape (layer types, tracks, effects, transforms).
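The "exactly one of" rule can be enforced by construction with a wrapper that takes a single mode argument. `run_transition` is a hypothetical helper, not part of the CLI; it only forwards the flags shown above.

```shell
# Hypothetical wrapper: run_transition MEDIA_ID MODE VALUE OUTPUT
# MODE is one of preset, steps, clips, so only one of the three flags is ever sent.
run_transition() {
  media=$1
  mode=$2
  value=$3
  out=$4
  case "$mode" in
    preset|steps|clips) ;;
    *) echo "mode must be preset, steps, or clips" >&2; return 2 ;;
  esac
  wonda transitions run --media "$media" "--$mode" "$value" --wait -o "$out"
}
```

For example: `run_transition "$VID" preset flash_glow out.mp4`, or `run_transition "$VID" clips ./timeline.json out.mp4`.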

Preset variables (variables block). Each preset declares the template variables it accepts under `v

...


Statistics

Installs: 21.5K
Rating: 4.6 / 5.0
Version: (not listed)
Updated: May 23, 2026
Comparisons: 1


Compatible Platforms

Claude Code

Timeline

Created: April 23, 2026
Last Updated: May 23, 2026