
Generative-Media-Skills

by @SamurAIGPT

Generative-Media-Skills is a multimodal generative media skills library designed for AI agents such as Claude Code, Cursor, and Gemini CLI. It provides a set of high-performance, schema-driven tools that let agents generate, edit, and display professional-grade image, video, and audio content, improving their capability and efficiency in creative work. All functionality is powered by muapi.ai.

Tags: AI Agents · Generative AI · Multimodal · Media Generation · Developer Tools · GitHub
Installation
npx skills add SamurAIGPT/Generative-Media-Skills

Before / After Comparison

Before

When generating high-quality multimodal media, AI agents typically have to coordinate separate tools by hand or write complex scripts. This is slow, yields inconsistent output quality, and is hard to integrate into automated workflows.

After

With Generative-Media-Skills, AI agents can efficiently and automatically generate, edit, and display professional-grade images, videos, and audio through a unified schema-driven interface, significantly improving creation efficiency and content quality.

SKILL.md

🎭 Generative Media Skills for AI Agents

The Ultimate Multimodal Toolset for Claude Code, Cursor, and Gemini CLI. A high-performance, schema-driven architecture for AI agents to generate, edit, and display professional-grade images, videos, and audio — powered by the muapi-cli.

🚀 Get Started | 🎨 Expert Library | ⚙️ Core Primitives | 🤖 MCP Server | 📖 Reference


✨ Key Features

  • 🤖 Agent-Native Design — CLI-powered scripts with structured JSON outputs, semantic exit codes, and --jq filtering for seamless agentic pipelines.
  • 🧠 Expert Knowledge Layer — Domain-specific skills that bake in professional cinematography, atomic design, and branding logic.
  • ⚡ CLI-Powered Core — All primitives delegate to muapi-cli — no curl, no JSON parsing, no boilerplate.
  • 🖼️ Direct Media Display — Use the --view flag to automatically download and open generated media in your system viewer.
  • 📁 Local File Support — Auto-upload images, videos, faces, and audio from your local machine to the CDN for processing.
  • 🌈 100+ AI Models — One-click access to Midjourney v7, Flux Kontext, Seedance 2.0, Kling 3.0, Veo3, and more.
  • 🔌 MCP Server — Run muapi mcp serve to expose all 19 tools directly to Claude Desktop, Cursor, or any MCP-compatible agent.
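
The structured-output idea behind the first bullet can be sketched end to end. Everything below is illustrative: the JSON payload is an assumed shape standing in for what a `--output-json` run might emit (not the CLI's documented schema), and a local `jq` call plays the role of the `--jq` flag.

```shell
#!/usr/bin/env sh
# Illustrative sketch of an agent consuming structured CLI output.
# The JSON below is an ASSUMED shape, not the real muapi schema.
result='{"status":"completed","outputs":["https://cdn.example.com/img.png"]}'

# jq stands in for the --jq flag: pull out just the fields the agent needs.
status=$(printf '%s' "$result" | jq -r '.status')
url=$(printf '%s' "$result" | jq -r '.outputs[0]')

# Semantic status + exit codes let the caller branch without scraping text.
if [ "$status" = "completed" ]; then
  echo "ready: $url"
else
  echo "not ready (status=$status)" >&2
  exit 1
fi
```

The same pattern composes into pipelines: capture the URL in a variable, feed it to a download step, and let a non-zero exit code abort the chain.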

🏗️ Scalable Architecture

This repository uses a Core/Library split to ensure efficiency and high-signal discovery for LLMs:

⚙️ Core Primitives (/core)

Thin wrappers around muapi-cli for raw API access.

  • core/media/ — File upload
  • core/edit/ — Image editing (prompt-based)
  • core/platform/ — Setup, auth & result polling

📚 Expert Library (/library)

High-value skills that translate creative intent into technical directives.

  • Cinema Director (/library/motion/cinema-director/) — Technical film direction & cinematography.
  • Nano-Banana (/library/visual/nano-banana/) — Reasoning-driven image generation (Gemini 3 Style).
  • UI Designer (/library/visual/ui-design/) — High-fidelity mobile/web mockups (Atomic Design).
  • Logo Creator (/library/visual/logo-creator/) — Minimalist vector branding (Geometric Primitives).
  • Seedance 2 (Doubao Video) (/library/motion/seedance-2/) — Director-level cinematic video generation: text-to-video, image-to-video, and video extension with native audio-video sync.

🚀 Quick Start

1. Install the muapi CLI

The core scripts require muapi-cli. Install it once:

# via npm (recommended — no Python required)
npm install -g muapi-cli

# via pip
pip install muapi-cli

# or run without installing
npx muapi-cli --help

2. Configure Your API Key

# Interactive setup
muapi auth configure

# Or pass directly
muapi auth configure --api-key "YOUR_MUAPI_KEY"

# Get your key at https://muapi.ai/dashboard

3. Install the Skills

# Install all skills to your AI agent
npx skills add SamurAIGPT/Generative-Media-Skills --all

# Or install a specific skill
npx skills add SamurAIGPT/Generative-Media-Skills --skill muapi-media-generation

# Install to specific agents
npx skills add SamurAIGPT/Generative-Media-Skills --all -a claude-code -a cursor

4. Generate Your First Image

muapi image generate "a cyberpunk city at night" --model flux-dev

# Download the result automatically
muapi image generate "a sunset over mountains" --model hidream-fast --download ./outputs

# Extract just the URL (agent-friendly)
muapi image generate "product on white bg" --model flux-schnell --output-json --jq '.outputs[0]'

5. Run an Expert Skill

# Use Nano-Banana reasoning to generate a 2K masterpiece
bash library/visual/nano-banana/scripts/generate-nano-art.sh \
  --file ./my-source-image.jpg \
  --subject "a glass hummingbird" \
  --style "macro photography" \
  --resolution "2k" \
  --view

6. Direct a Cinematic Scene

cd library/motion/cinema-director

# Create a 10-second epic reveal
bash scripts/generate-film.sh \
  --subject "a cybernetic dragon over Tokyo" \
  --intent "epic" \
  --model "kling-v3.0-pro" \
  --duration 10 \
  --view

# Animate a reference image into video
bash library/motion/seedance-2/scripts/generate-seedance.sh \
  --mode i2v \
  --file ./concept.jpg \
  --subject "camera slowly pulls back to reveal the full landscape" \
  --intent "reveal" \
  --view

# Extend an existing video
bash library/motion/seedance-2/scripts/generate-seedance.sh \
  --mode extend \
  --request-id "YOUR_REQUEST_ID" \
  --subject "camera continues pulling back to reveal the vast city" \
  --duration 10

🤖 MCP Server

Run muapi as a Model Context Protocol server so Claude Desktop, Cursor, or any MCP-compatible agent can call generation tools directly — no shell scripts needed.

muapi mcp serve

Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "muapi": {
      "command": "muapi",
      "args": ["mcp", "serve"],
      "env": { "MUAPI_API_KEY": "your-key-here" }
    }
  }
}

This exposes 19 structured tools with full JSON Schema input/output definitions:

Tool — Description
muapi_image_generate — Text-to-image (14 models)
muapi_image_edit — Image-to-image editing (11 models)
muapi_video_generate — Text-to-video (13 models)
muapi_video_from_image — Image-to-video (16 models)
muapi_audio_create — Music generation (Suno)
muapi_audio_from_text — Sound effects (MMAudio)
muapi_enhance_upscale — AI upscaling
muapi_enhance_bg_remove — Background removal
muapi_enhance_face_swap — Face swap (image/video)
muapi_enhance_ghibli — Ghibli style transfer
muapi_edit_lipsync — Lip sync to audio
muapi_edit_clipping — AI highlight extraction
muapi_predict_result — Poll prediction status
muapai_upload_file — Upload local file → URL
muapi_keys_list — List API keys
muapi_keys_create — Create API key
muapi_keys_delete — Delete API key
muapi_account_balance — Get credit balance
muapi_account_topup — Add credits (Stripe checkout)

⚡ Agentic Pipeline Examples

# Submit async, capture request_id, poll when ready
REQUEST_ID=$(muapi video generate "a dog running on a beach" \
  --model kling-master --no-wait --output-json --jq '.request_id' | tr -d '"')

# ... do other work ...

muapi predict wait "$REQUEST_ID" --download ./outputs

# Pipe a prompt from another command
generate_prompt | muapi image generate - --model flux-dev

# Chain: upload → edit → download
URL=$(muapi upload file ./photo.jpg --output-json --jq '.url' | tr -d '"')
muapi image edit "make it look like a painting" --image "$URL" \
  --model flux-kontext-pro --download ./outputs
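
In practice the async submit-and-poll pattern above needs a bounded retry loop. A minimal sketch of that control flow, with the status check stubbed out so it runs standalone: `check_status` is a placeholder (its "completes on the third poll" behavior is purely illustrative) that a real pipeline would replace with the actual status command.

```shell
#!/usr/bin/env sh
# Bounded polling loop around an async job. check_status is a STUB that
# simulates a job finishing on the third poll; in a real pipeline it
# would wrap the actual status query for the captured request_id.
check_status() {
  [ "$1" -ge 3 ] && echo "completed" || echo "processing"
}

attempt=0
max_polls=10
status="processing"

while [ "$status" != "completed" ] && [ "$attempt" -lt "$max_polls" ]; do
  attempt=$((attempt + 1))
  status=$(check_status "$attempt")
  # sleep 5  # back off between polls against a real API
done

if [ "$status" = "completed" ]; then
  echo "finished after $attempt polls"
else
  echo "timed out" >&2
  exit 1
fi
```

Capping the loop at `max_polls` (or a wall-clock deadline) keeps an agent pipeline from hanging forever on a stuck job.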

📖 Schema Reference

This repository includes a streamlined schema_data.json that core scripts use at runtime to:

  • Validate Model IDs: Ensures the requested model exists.
  • Resolve Endpoints: Automatically maps model names to API endpoints.
  • Check Parameters: Validates supported aspect_ratio, resolution, and duration values.
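
The lookup the core scripts perform can be sketched with jq. The real layout of schema_data.json is not documented here, so this ASSUMES a toy `{"models":[{"id":...,"endpoint":...}]}` shape, and the endpoint paths are made up for illustration.

```shell
#!/usr/bin/env sh
# Sketch of model-ID validation + endpoint resolution against a schema
# file. The schema shape and endpoint strings are ASSUMPTIONS, not the
# repository's actual schema_data.json format.
schema='{"models":[
  {"id":"flux-dev","endpoint":"/api/flux-dev"},
  {"id":"kling-v3.0-pro","endpoint":"/api/kling-v3"}
]}'

model="flux-dev"

# Look up the requested model; an empty result means an unknown model ID.
endpoint=$(printf '%s' "$schema" | jq -r --arg m "$model" \
  '.models[] | select(.id == $m) | .endpoint')

if [ -n "$endpoint" ]; then
  echo "resolved $model -> $endpoint"
else
  echo "unknown model: $model" >&2
  exit 2
fi
```

Failing fast with a distinct exit code on an unknown model ID is what lets agents catch typos before spending credits on an API call.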

Discover all available models via the CLI:

muapi models list
muapi models list --category video --output-json

🔧 Compatibility

Optimized for the next generation of AI development environments:

  • Claude Code — Direct terminal execution via tools + MCP server mode.
  • Gemini CLI / Cursor / Windsurf — Seamless integration as local scripts.
  • MCP — Full Model Context Protocol server with typed input/output schemas.
  • CI/CD — --output-json, --jq, and semantic exit codes for scripting.

📄 License

MIT © 2026


Statistics

Installs: 3.0K
Rating: 3.5 / 5.0
Updated: April 6, 2026
Comparisons: 1


Compatible Platforms

🔧 Claude Code

Timeline

Created: April 6, 2026
Last Updated: April 6, 2026