Multimedia & AV

flux-kontext

by @agentspace-so

Flux Kontext, a professional image editing model for high-quality image-to-image conversion and editing.

4.8(5)

122.1K

image-edit

by @agentspace-so

Integrates multiple professional image editing models into a one-stop agent skill for image editing and enhancement.

4.8(4)

121.9K

gpt-image-edit

by @agentspace-so

OpenAI's GPT Image Edit model for intelligent image understanding and editing.

4.8(12)

121.6K

nano-banana-edit

by @agentspace-so

Google's Nano Banana Edit, a professional image editing model for efficient image modification and enhancement.

4.8(20)

121.5K

flux-2-klein

by @agentspace-so

Flux 2 Klein, a lightweight text-to-image model for fast, high-quality image generation.

4.8(13)

121.5K

seedance-v2

by @agentspace-so

Professional video generation model on the RunComfy platform for high-quality text-to-video content creation.

4.8(6)

121.4K

happyhorse-1-0

by @agentspace-so

Alibaba's HappyHorse 1.0 text-to-video model, which generates physically realistic video content.

4.8(11)

121.3K

wan-2-7

by @agentspace-so

Wan 2.7 text-to-video model for automatic generation of high-quality AI video content.

4.8(6)

121.2K

gpt-image-2

by @agentspace-so

Uses your ChatGPT Plus subscription directly within the agent to call GPT Image 2 for image generation, without switching platforms.

4.8(15)

118.6K

image-to-video

by @inferen-sh

Provides your agent with hundreds of image-to-video production skills using the inference.sh API.

4.8(261)

106.4K

qwen-model

qwen-image-2

by @inferen-sh

Agent skill provided by inference.sh for image processing and analysis based on the Qwen model, available to data and AI applications via API.

98.5K

video-processing

p-video

by @inferen-sh

Multimedia agent skill provided by inference.sh, supporting video content generation, processing, and management via API to enhance multimedia application efficiency.

98.5K

qwen-model

qwen-image-2-pro

by @inferen-sh

Professional-grade agent skill provided by inference.sh for advanced image processing and analysis based on the Qwen model, available to complex data and AI applications via API.

98.4K

image-processing

p-image

by @inferen-sh

Multimedia agent skill provided by inference.sh, supporting image content generation, processing, and management via API to enhance multimedia application efficiency.

98.4K

kling-3-0

by @agentspace-so

Kling 3.0 is Kuaishou Technology's third-generation cinematic video model. This skill offers six Kling 3.0 rendering endpoints on RunComfy, supporting text-to-video and image-to-video, with Standard, Pro, and 4K quality tiers to meet diverse video production needs.

98.0K

face-swap

by @agentspace-so

This skill leverages the RunComfy CLI to perform face swapping in images and videos. It intelligently selects the appropriate model based on user intent, supporting still images, videos, single or batch processing, and accommodating photorealistic or stylized requirements, ensuring efficient and precise face manipulation.

64.8K

video-inpainting

by @agentspace-so

This skill leverages AI for video region editing, enabling removal of objects across multiple frames, cleaning watermarks, or replacing areas while maintaining motion consistency. It streamlines video repair and content modification via prompt-driven editing endpoints.

62.9K

lipsync

by @agentspace-so

This skill drives a face's mouth movements from an audio track, achieving lip-sync. It integrates multiple lip-sync models from the RunComfy platform, intelligently selecting the best model based on user intent and input type (e.g., portrait still + audio, video + audio, or script-only) to generate high-quality lip-synced videos.

62.8K

image-inpainting

by @agentspace-so

This skill provides image inpainting capabilities via the `runcomfy` CLI, supporting mask-driven region edits such as object removal, gap filling, or replacing specific areas. It intelligently routes to different models for precise image modifications.

62.8K

controlnet

controlnet-pose

by @agentspace-so

This skill leverages ControlNet and ComfyUI workflows to generate images or videos conditioned on pose, skeleton, or motion references. It integrates various model APIs for video motion transfer and image pose-conditioned generation, empowering creative content production.

62.7K

video-outpainting

by @agentspace-so

This skill leverages AI models to extend a video's spatial canvas, enabling vertical or horizontal outpainting and aspect ratio adjustments while preserving the central action. It's suitable for social media content creation and professional video editing, enhancing visual impact.

62.4K

video-extend

by @agentspace-so

This skill leverages the Google Veo 3.1 model to extend existing video clips or generate coherent narrative shot chains from a single seed. It offers both fast and high-quality extension modes, helping users create longer, smoother video content.

62.4K

relight

by @agentspace-so

This skill uses AI to change the lighting direction, color temperature, intensity, and mood of a still image without redoing the shot. It achieves precise control over image lighting through dedicated relighting LoRAs or identity-preserving edit endpoints, suitable for product photography, portrait adjustments, and more.

62.3K

image-outpainting

by @agentspace-so

This skill leverages AI to extend images, enabling outpainting, cropping adjustments, and aspect ratio modifications. It fills in missing parts of an image based on prompts or reference styles while maintaining content consistency.

62.1K

music-generation

elevenlabs-music-generation

by @agentspace-so

Generate studio-quality music, including vocal songs, instrumental tracks, and brand jingles, from text descriptions using the ElevenLabs Music API via RunComfy. Supports structured tracks up to 5 minutes, significantly boosting music creation efficiency.

62.1K

ai-video-generation

by @inferen-sh

AI video generation agent skill provided by inference.sh, empowering users via API to automate video content creation and processing.

61.0K

hyperframes

by @heygen-com

HTML-based video framework that uses `data-*` attributes to control the timeline and animations, and automatically handles clip synchronization and media playback.

4.7(31)

60.7K

hyperframes-cli

by @heygen-com

CLI tool for the HyperFrames framework, providing scaffolding, preview, build, and export for a complete workflow.

4.7(9)

58.4K

website-to-hyperframes

by @heygen-com

Automatically captures websites and generates professional product videos, with support for custom durations and platform formats.

4.7(19)

56.6K

hyperframes-registry

by @heygen-com

HyperFrames component registry providing reusable video blocks and effect components, which are installed and used via commands.

4.7(8)

56.4K

ai-video-generation

ai-avatar-video

by @inferen-sh

Creates virtual avatar videos using AI technology, achieving realistic character animation and voice synchronization, widely applied in education, marketing, and entertainment.

4.7(260)

54.8K

automation

ai-image-generation

by @inferen-sh

Generates images using over 50 AI models via the inference.sh command-line tool, providing quick start and installation guides for efficient AI image creation.

54.8K

music

ace-step

by @agentspace-so

ACE Step leverages StepFun-AI's open-weights model for tag-driven music generation, audio inpainting, and outpainting. It supports multilingual lyrics and high-quality vocal tracks, allowing for localized modifications or extensions of existing audio, making it a powerful tool for efficient music content creation.

54.2K

ai-music

by @agentspace-so

This skill generates AI music via the RunComfy CLI, supporting vocal songs, instrumentals, jingles, game loops, and multilingual covers. It intelligently selects the appropriate model based on user intent and provides detailed prompting patterns and precise CLI invocation commands, greatly simplifying the music creation process.

53.8K

text-to-speech

elevenlabs-tts

by @inferen-sh

Provides a text-to-speech (TTS) service with over 22 high-quality voices via ElevenLabs, generating natural and fluent voice content.

52.0K

ai-music-generation

elevenlabs-music

by @inferen-sh

Generates original music from text prompts via ElevenLabs, providing unique background music and sound effects for video, games, and other projects.

51.8K

gif-creation

slack-gif-creator

by @anthropics

Creates and optimizes animated GIFs to meet Slack platform size limits. It helps users easily produce dynamic images suitable for sharing on Slack, ensuring good display and fast loading.

4.6(456)

37.1K

tts

hyperframes-media

by @heygen-com

Asset preprocessing for HyperFrames compositions — text-to-speech narration (Kokoro), audio/video transcription (Whisper), and background removal for transparent overlays (u2net). Use when generating voiceover from text, transcribing speech for captions, removing the background from a video or image to use as a transparent overlay, choosing a TTS voice or whisper model, or chaining these (TTS → transcribe → captions). Each command downloads its own model on first run.

4.6(120)

33.4K

higgsfield-generate

by @higgsfield-ai

Unified invocation of all Higgsfield image and video generation models, supporting marketing materials and brand content creation.

4.6(32)

24.4K

ai-image-generation

baoyu-image-gen

by @JimLiu

Multi-platform AI image generation: supports 7 platforms, including OpenAI, Google, DashScope, Jimeng, Seedream, and Replicate, with aspect ratio and quality presets.

4.6(220)

22.7K

higgsfield-product-photoshoot

by @higgsfield-ai

Automatically generates e-commerce product photography-grade images without physical shooting, supporting various scenes and styles.

4.6(3)

21.8K

higgsfield-soul-id

by @higgsfield-ai

Trains personalized facial AI models that keep a character's identity consistent, so the character can be reused across scenes.

4.6(24)

21.5K

higgsfield-marketplace-cards

by @higgsfield-ai

Generates product display images compliant with e-commerce platform specifications, automatically optimizing size and composition to improve conversion rates.

4.6(9)

20.8K

webgpu

typegpu

by @heygen-com

TypeGPU and raw WebGPU adapter patterns for HyperFrames. Use when creating GPU-rendered compositions with TypeGPU, raw WebGPU, WGSL fragment shaders, compute pipelines, liquid glass effects, particle systems, or any canvas layer driven by `navigator.gpu` that responds to HyperFrames `hf-seek` events.

4.6(120)

19.2K

rest-api

seedance2-api

by @hexiaochun

This skill provides an end-to-end workflow from concept to final video, including storyboarding, reference image generation, video task submission, and result retrieval, enabling automated video content creation.

4.5(2)

11.5K

mmx-cli

by @minimax-ai

Generates text, images, videos, speech, and music, and performs web searches, via the MiniMax AI platform.

4.5(15)

10.4K

ai-video-generation

videoagent-video-studio

by @pexoai

Enables AI agents to use VideoAgent Video Studio for advanced video processing, generation, and analysis tasks.

4.5(348)

10.1K

imagemagick

image-manipulation-image-magick

by @github

Processes and manipulates images using ImageMagick, supporting resizing, format conversion, batch processing, and retrieving image metadata.

4.5(303)

9.0K

remotion

remotion-video-production

by @supercent-io

This skill uses Remotion to create programmable videos, automatically generating videos from text instructions and producing brand-consistent video content at scale.

4.5(356)

8.9K