wonda-cli

by @degausaiv
Rating: 4.6 (4 ratings)

Automatically generates images, video, music, and audio from the terminal, and supports editing and combining media. Publishes to social platforms such as LinkedIn, Reddit, and X/Twitter in one click.

Tags: multimedia, social-media, automation, content-generation, video-editing

Installation
npx skills add degausai/wonda --skill wonda-cli

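Before / After comparison

Before

Manually use multiple tools to generate image, video, and audio assets separately, then log in to each social platform individually and publish the content by hand. One round of content creation takes 2 to 3 hours.

After

A single command generates all media assets automatically, edits and combines them, and publishes to multiple social platforms in one batch. The whole process, from creation to distribution, finishes in 20 minutes.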

SKILL.md

Wonda CLI

Wonda CLI is a content creation toolkit for terminal-based agents. Use it to generate images, videos, music, and audio; edit and compose media; publish to social platforms; and research/automate across LinkedIn, Reddit, and X/Twitter.

Install

If wonda is not found on PATH, install it first:

# npm
npm i -g @degausai/wonda

# Homebrew
brew tap degausai/tap && brew install wonda

Setup

  • Auth: wonda auth login (opens browser, recommended) or set WONDERCAT_API_KEY env var

  • Verify: wonda auth check

Access tiers

Not all commands are available to every account type:

| Tier | Access |
| --- | --- |
| Anonymous (temporary account, no login) | Media upload/download, editing (video/edit, image/edit, audio/edit), transcription, social publishing, scraping, analytics |
| Free (logged in, Basic/Free plan) | Everything above + generation (image/generate, video/generate, etc.), styles, recipes, brand |
| Paid (Plus, Pro, or Absolute plan) | Everything above + video analysis (requires credits), skill commands (wonda skill install/list/get) |

If a command returns a 403 error, check your plan at https://app.wondercat.ai/settings/billing.

Social signups (Instagram, TikTok, etc.)

Drive them with the wonda device primitives + a throwaway mailbox from wonda email. The screenshot → decide → tap/type/swipe loop is how these flows work — there's no shortcut command, and that's fine: social apps change their UI constantly and any canned flow would drift faster than you could maintain it.

Standard loop:

  • wonda email account create --random → save {email, password}.

  • wonda device create → pick a ready device (poll wonda device get <id> --fields status).

  • wonda device launch <device-id> com.instagram.android (or com.zhiliaoapp.musically for TikTok). Fall back to wonda device open-url if you'd rather start in the web flow.

  • Loop: wonda device screenshot <device-id> > s.json → decode the base64 PNG → read → pick an action → tap | type | swipe | key → screenshot again. Use --text "SomeButtonLabel" on tap before guessing coordinates; fall back to --x --y read off the screenshot for elements without matching text (number pickers, date spinners, etc.).

  • When the app sends a verification email, run wonda email inbox wait <email> --timeout 120. It returns {codes: ["483921"], links: [...]} with the 6-digit code already extracted. Feed the code back with wonda device type <device-id> --text "<code>".

  • For number/date spinners: tap the highlighted cell so Android pops up a numeric or alphabetic keyboard, then run wonda device type --text "<value>" to replace the selected text. wonda device key --code 4 dismisses the keyboard when done.
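
The device-polling and verification-code steps in the loop above can be sketched as two small shell helpers. This is a rough sketch under assumptions, not documented behavior: the function names wait_for_device and fetch_signup_code are hypothetical, the status value "ready" is inferred from "pick a ready device", the status is assumed to sit at the top-level .status key, and --jq is assumed to print the selected value on its own line (possibly JSON-quoted, so quotes are stripped).

```shell
# Hypothetical helper: poll until the device reports "ready" (assumed status value).
# Usage: wait_for_device <device-id> [max-attempts] [sleep-seconds]
wait_for_device() {
  device_id="$1"
  attempts="${2:-30}"
  pause="${3:-2}"
  tries=0
  while [ "$tries" -lt "$attempts" ]; do
    # --jq is one of the global output flags; strip quotes in case the value is JSON-quoted.
    state=$(wonda device get "$device_id" --jq '.status' | tr -d '"')
    if [ "$state" = "ready" ]; then
      return 0
    fi
    tries=$((tries + 1))
    sleep "$pause"
  done
  echo "device $device_id not ready after $attempts attempts" >&2
  return 1
}

# Hypothetical helper: wait for the verification email and print the first extracted code.
# Usage: fetch_signup_code <email>
fetch_signup_code() {
  wonda email inbox wait "$1" --timeout 120 --jq '.codes[0]' | tr -d '"'
}
```

A typical use would be to call wait_for_device "$DEVICE_ID" before launching the app, and later pass the output of fetch_signup_code "$EMAIL" to wonda device type <device-id> --text.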

Consent-like taps — anything that accepts Terms/Privacy/Cookies, grants permissions, or publishes something — stop and ask the user for explicit confirmation in chat before tapping. That isn't about signups specifically; it applies to any automation step.

Rate-limit signals — if the app shows you a visual puzzle ("we want to make sure you're a real person"), stop and hand off to the user with wonda device stream <id> (see next section). Don't click through puzzles yourself.

Handing off to a human

If automation hits a screen that requires a human to take over (consent flow you shouldn't auto-accept, ambiguous UI, step where the user prefers to act themselves), use wonda device stream <device-id> — returns a playerUrl signed with a short-lived JWT (1h). Give that URL to the user, they act in their own browser, and automation can resume afterward.

wonda device stream <device-id>
# → { "streamUrl": "wss://…", "playerUrl": "https://…", "deviceType": "social" }

Global output flags

All commands support these output control flags:

  • --json — Force JSON output (auto-enabled when stdout is piped)

  • --quiet — Only output the primary identifier (job ID, media ID, etc.) — ideal for scripting

  • -o <path> — Download output to file (implies --wait)

  • --fields status,outputs — Select specific JSON fields

  • --jq '.outputs[0].media.url' — Filter JSON output with a jq expression

How to think about content creation

You are a marketing director with access to a full production toolkit. Before touching any tool, think:

  • What product category? (beauty, food, tech, fashion, fitness, etc.)

  • What format performs for this category? (UGC memes for everyday products, cinematic for luxury, before/after for transformations, testimonial for services)

  • What's the hook? (relatable scenario, surprising twist, aspirational lifestyle, social proof)

  • What specific scene? (not "product on table" but "person discovering the product in a funny situation")

Decision flow

When asked to create content, follow this order:

Step 1: Gather context

wonda brand                                                    # Brand identity, colors, products, audience
wonda analytics instagram                                      # What content performs well
wonda scrape social --handle @competitor --platform instagram --wait  # Competitive research (if relevant)

# Cross-platform research (if relevant)
wonda x search "topic OR keyword"                              # Find conversations on X/Twitter
wonda x user-tweets @competitor                                # Competitor's recent tweets
wonda reddit search "topic" --sort top --time week             # Reddit discussions
wonda reddit feed marketing --sort hot                         # Subreddit trends
wonda linkedin search "topic" --type COMPANIES                 # LinkedIn company/people research
wonda linkedin profile competitor-vanity-name                  # LinkedIn profile intel

Step 2: Check content skills

Content skills are step-by-step guides for common content types. Each skill tells you exactly which models, prompts, and editing operations to use — and in what order. ALWAYS check skills before building from scratch.

wonda skill list                                # Browse all content skills
wonda skill get <slug>                          # Full step-by-step guide for a skill

Full skill index:

| Slug | Description | Input |
| --- | --- | --- |
| product-video | Product/scene video: prompt library for all categories | optional product image |
| ugc-talking | Talking-head UGC: single clip, two-angle PIP, or 20s+ with B-roll | optional reference |
| ugc-reaction-batch | Batch TikTok-native UGC reactions with viral strategy | optional product image |
| tiktok-ugc-pipeline | Scrape viral reel → generate 5 UGC → post as drafts | reel or TikTok URL |
| ugc-dance-motion | Dance/motion transfer | image + video |
| marketing-brain | Marketing strategy brain: hooks, visuals, ads | user brief |
| reddit-subreddit-intel | Scrape top posts, analyze virality, generate ideas | subreddit + product |
| twitter-influencer-search | Find X influencers and amplifiers | competitor/niche keywords |
| tiktok-slideshow-carousel | 3-slide TikTok carousel: hook, bridge, product reveal | app screenshot + audience |
| ffmpeg-local-video-finishing | Local ffmpeg finishing for deterministic trims, muxes, reverses, and exports | local video path or mediaId |
| ffmpeg-burn-captions | Burn captions locally with ffmpeg after getting transcript/timing | local video path or mediaId |
| ffmpeg-social-formatting | Reformat local video for 9:16, 1:1, 16:9, and social-safe exports | local video path or mediaId |
| ffmpeg-scene-splitting | Detect scene boundaries locally, split into clips, or omit one scene | local video path or mediaId |
| ffmpeg-silence-cut | Detect and collapse dead air locally while preserving short natural pauses | local video path or mediaId |
| ffmpeg-frame-extraction | Extract single frames, poster frames, or evenly spaced stills locally | local video path or mediaId |
| ffmpeg-analysis-artifacts | Build local analysis artifacts: grid, first/last frame, and extracted audio | local video path or mediaId |
| ffmpeg-reference | Compact ffmpeg routing, font, codec, and command reference for agents | local media path |
| remotion-local-render | Render editorPipeline blueprint steps locally via @remotion/renderer | manifest JSON + editor job id |

If a skill matches → wonda skill get <slug>, read it, adapt to context, execute each step.

If no skill matches → build from scratch (Step 3).

Step 2.5: Decide whether finishing should be local

Not every media task should go back through Wonda editing. Use this routing rule:

  • Use wonda for AI generation, AI transcription/alignment, scraping, publishing, hosted transitions, and workflows that need media IDs or remote jobs.

  • Use local ffmpeg for deterministic transforms on files you already have or can download: trim, crop/scale/pad, concat, replace audio, extract audio/frame, reverse, normalize for delivery, burn captions, split scenes, cut silence, and build analysis artifacts.

When a task starts from a Wonda media ID but the actual edit is deterministic, move it to local files first:

wonda media download <mediaId> -o ./input.mp4

Before any local ffmpeg work:

which ffmpeg
which ffprobe
ffmpeg -version
ffprobe -v error -show_format -show_streams -of json ./input.mp4

Font rule for local caption/text work:

  • Prefer an explicit font file path over a family name.

  • Never assume a font exists. Check first with fc-match, fc-list, /System/Library/Fonts, /Library/Fonts, ~/Library/Fonts, or /usr/share/fonts.

  • If the task is mainly local finishing/captions/formatting/splitting/artifact extraction, check the ffmpeg-specific skills before inventing commands.

  • wonda edit video renders locally by default for single-video ops (trim, crop, speed, volume, textOverlay, animatedCaptions with supplied captions, editAudio). The server returns a manifest; the CLI runs @remotion/renderer against a CloudFront-hosted bundle, uploads the output, and finalizes the editor_job. No flag needed. Pass --render-server only to force Lambda. Multi-video ops (overlay, splitScreen, merge, splitScenes, motionDesign) auto-reject with a 400, and the CLI will tell you to use --render-server. See the remotion-local-render content skill for the full recipe (including the STT-free TikTok-style caption flow via wonda alignment extract-timestamps --caption-segments).
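
For the font rule above, one way to resolve an explicit font file path before writing any caption command is a small lookup helper. This is a minimal sketch, not part of Wonda: the function name find_font_file is hypothetical, it matches on file names only (not on font family metadata), and it searches the directories listed above unless other directories are passed in.

```shell
# Hypothetical helper: print the first font file whose name contains the given fragment.
# Usage: find_font_file <name-fragment> [dir ...]
find_font_file() {
  needle="$1"
  shift
  # Default to the font locations named in the rule above.
  if [ "$#" -eq 0 ]; then
    set -- /System/Library/Fonts /Library/Fonts "$HOME/Library/Fonts" /usr/share/fonts
  fi
  for dir in "$@"; do
    [ -d "$dir" ] || continue
    hit=$(find "$dir" -type f \( -iname "*${needle}*.ttf" -o -iname "*${needle}*.otf" -o -iname "*${needle}*.ttc" \) 2>/dev/null | head -n 1)
    if [ -n "$hit" ]; then
      printf '%s\n' "$hit"
      return 0
    fi
  done
  # Nothing matched: the caller should pick another font rather than assume one exists.
  return 1
}
```

If the helper returns non-zero, treat the font as missing and choose a different one instead of passing a family name and hoping it resolves.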

Default local export target unless the user asked otherwise:

-c:v libx264 -preset medium -crf 18 -pix_fmt yuv420p -movflags +faststart -c:a aac -b:a 192k

Always pass -y as the first flag so the command auto-overwrites the output. ffmpeg prompts interactively when the output path exists and agent shells hang on that prompt until timeout.
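
Putting the default export target and the -y rule together, a wrapper might look like the sketch below. The function name export_default is hypothetical; the flags are exactly the ones listed above, with -y placed first.

```shell
# Hypothetical wrapper around the default export target described above.
# Usage: export_default <input> <output>
export_default() {
  # -y comes first so ffmpeg never stops to ask about overwriting the output.
  ffmpeg -y -i "$1" \
    -c:v libx264 -preset medium -crf 18 -pix_fmt yuv420p \
    -movflags +faststart \
    -c:a aac -b:a 192k \
    "$2"
}
```

For example, export_default ./input.mp4 ./final.mp4 re-encodes the downloaded file with the delivery settings and overwrites ./final.mp4 if it already exists.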

Step 3: Build from scratch (chain endpoints)

When no skill matches, chain individual CLI commands. Each step produces an output that feeds into the next.

Single asset:

wonda generate image --model nano-banana-2 --prompt "..." --aspect-ratio 9:16 --wait -o out.png
# --negative-prompt "..." — override what to exclude (models like cookie have good defaults)
# --seed <number>         — pin the seed for reproducible results
wonda generate video --model seedance-2 --prompt "..." --duration 5 --params '{"quality":"high"}' --wait -o out.mp4
wonda generate text --model <model> --prompt "..." --wait
wonda generate music --model suno-music --prompt "upbeat lo-fi" --wait -o music.mp3

Audio (speech, transcription, dialogue):

# Text-to-speech
wonda audio speech --model elevenlabs-tts --prompt "Your script here" \
  --params '{"voiceId":"21m00Tcm4TlvDq8ikWAM"}' --wait -o speech.mp3
# elevenlabs-tts always requires a voiceId param
# Common voice: Rachel (female) "21m00Tcm4TlvDq8ikWAM"

# Transcribe audio/video to text
wonda audio transcribe --model elevenlabs-stt --attach $MEDIA --wait

# Multi-speaker dialogue
wonda audio dialogue --model elevenlabs-dialogue --prompt "Speaker A: Hi! Speaker B: Hello!" \
  --wait -o dialogue.mp3

Audio AI operations (direct-inference, NOT editor ops):

# Denoise / dereverberate speech
wonda audio enhance --model replicate-resemble-enhance --attach $MEDIA \
  --params '{"denoise":true,"chunkSeconds":10}' --wait -o enhanced.wav

# Split a track into voice and instrumental stems
wonda audio extract-voice --model replicate-demucs --attach $MEDIA \
  --wait -o vocals.wav

DO NOT use wonda edit video --operation enhanceAudio or --operation voiceExtractor — those paths are deprecated. They still work but emit a warning, and they route through the heavier editor_job pipeline for no functional reason.

Add animated captions to a video:

The animatedCaptions operation handles everything in one step — it extracts audio, transcribes for word-level timing, and renders animated word-by-word captions onto the video.

# Generate a video with speech audio
VID_JOB=$(wonda generate video --model seedance-2 --prompt "..." --duration 5 --aspect-ratio 9:16 --params '{"quality":"high"}' --wait --quiet)
VID_MEDIA=$(wonda jobs get inference $VID_JOB --jq '.outputs[0].media.mediaId')

# Add animated captions (single step)
wonda edit video --operation animatedCaptions --media $VID_MEDIA \
  --params '{"fontFamily":"TikTok Sans SemiCondensed","position":"bottom-center","sizePercent":80,"strokeWidth":2.5,"fontSizeScale":0.8,"highlightColor":"rgb(252, 61, 61)"}' \
  --wait -o final.mp4

The video's original audio is preserved. Do NOT replace the audio with TTS: the video model already generated the speech.

Transitions (effects pipelines on a single video):

wonda transitions presets                            # List built-in presets (JSON)
wonda transitions operations                         # Grouped by category (analysis/effect/...)
wonda transitions operations --json                  # Full per-param metadata
wonda transitions llms                               # Full reference (presets + ops + dependencies)
wonda transitions run --media $VID --preset flash_glow --wait -o out.mp4
# Or build a custom pipeline of steps:
wonda transitions run --media $VID \
  --steps '[{"glow":{"spread":8}},{"scene_flash":{}}]' --wait -o out.mp4
# Or send an agent-generated timeline of clips (inline JSON):
wonda transitions run --media $VID \
  --clips '[{"layer_type":"video","start_frame":0,"end_frame":60}]' --wait -o out.mp4
# …or from a file (handy for long agent timelines):
wonda transitions run --media $VID --clips ./timeline.json --wait -o out.mp4
wonda transitions job <jobId>                        # Poll a transition job

Use exactly one of --preset, --steps, or --clips. Requires a full (logged-in) account. Always read wonda transitions llms first when composing a custom pipeline or a clips timeline — it documents the detect→segment→effect dependencies, which ops need masks, and the full clip-spec shape (layer types, tracks, effects, transforms).
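
The "exactly one of --preset, --steps, or --clips" rule can be checked in a script before calling wonda transitions run. The guard below is only an illustrative sketch; the function name exactly_one_mode and the idea of holding the three values in shell variables are assumptions, not part of the CLI.

```shell
# Hypothetical guard: succeed only when exactly one of the three values is non-empty.
# Usage: exactly_one_mode "$PRESET" "$STEPS" "$CLIPS"
exactly_one_mode() {
  set_count=0
  for value in "${1:-}" "${2:-}" "${3:-}"; do
    if [ -n "$value" ]; then
      set_count=$((set_count + 1))
    fi
  done
  [ "$set_count" -eq 1 ]
}
```

A script could run exactly_one_mode "$PRESET" "$STEPS" "$CLIPS" and stop with an error message when it fails, rather than sending a request that mixes modes.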

Preset variables (variables block). Each preset declares the template variables it accepts under `v

...

Statistics

Installs: 21.5K
Rating: 4.6 / 5.0
Version: not listed
Last updated: May 23, 2026
Comparison cases: 1

Supported platforms

Claude Code

Timeline

Created: April 23, 2026
Last updated: May 23, 2026