minimax-image-understanding

by @imsus, v1.0.0
Rating: 3.9 (3)

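Uses MiniMax AI models to analyze image content, extract visual information, generate detailed descriptions, and answer complex questions about images.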

Tags: computer-vision, generative-ai, information-retrieval, image-processing, natural-language-processing
Installation
npx skills add imsus/pi-extension-minimax-coding-plan-mcp --skill minimax-image-understanding

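Before / After Comparison

Before

Images are reviewed one at a time and annotated by hand. Processing 500 images a day takes 3 hours, and fatigue leads to missed details.

After

Images are sent to the AI model in batches, and detailed descriptions and tags are generated automatically. 500 images are processed in 5 minutes, with a 40% improvement in accuracy.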

SKILL.md

MiniMax Image Understanding Skill

Use this skill when you need to analyze, describe, or extract information from images.

How to Use

Call the understand_image tool directly with a prompt and image URL:

understand_image({
  prompt: "Your question about the image",
  image_url: "https://example.com/image.png"
})

When to Use

Use understand_image for:

  • Screenshots: Error messages, UI issues, code in screenshots

  • Visual content: Photos, diagrams, charts, graphs

  • Documents: Extracting text from images (OCR), understanding layouts

  • UI/UX analysis: Evaluating designs, identifying components

  • Visual debugging: Understanding visual bugs or layout issues

When NOT to Use

Do NOT use understand_image when:

  • The image is already described in the conversation

  • The image is a simple icon or emoji you recognize

  • No image is provided or the image URL is inaccessible

  • The analysis would be redundant with existing context (e.g., file contents already visible)

Usage

understand_image({
  prompt: "What do you see in this image?",
  image_url: "https://example.com/screenshot.png"
})

API Details

Endpoint: POST {api_host}/v1/coding_plan/vlm

Request Body:

{
  "prompt": "Your question about the image",
  "image_url": "data:image/jpeg;base64,/9j/4AAQ..."
}

Response Format:

{
  "content": "AI analysis of the image...",
  "base_resp": {
    "status_code": 0,
    "status_msg": "success"
  }
}
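
The request and response shapes above can be sketched as typed helpers. This is only an illustration under stated assumptions: the helper names, the apiHost and apiKey parameters, and the Bearer Authorization header are hypothetical and not confirmed by this document, and a non-zero status_code is assumed to mean failure.

```typescript
// Request and response shapes as shown in "API Details".
interface VlmRequestBody {
  prompt: string;
  image_url: string;
}

interface VlmResponse {
  content: string;
  base_resp: { status_code: number; status_msg: string };
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the URL and options for POST {api_host}/v1/coding_plan/vlm.
// ASSUMPTION: the Bearer Authorization header is a guess, not documented here.
function buildVlmRequest(
  apiHost: string,
  apiKey: string,
  prompt: string,
  imageDataUrl: string
): PreparedRequest {
  const body: VlmRequestBody = { prompt, image_url: imageDataUrl };
  return {
    url: `${apiHost}/v1/coding_plan/vlm`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify(body),
    },
  };
}

// Return the analysis text, or throw when base_resp reports a non-zero status.
function extractContent(response: VlmResponse): string {
  const { status_code, status_msg } = response.base_resp;
  if (status_code !== 0) {
    throw new Error(`VLM request failed (${status_code}): ${status_msg}`);
  }
  return response.content;
}
```

The prepared url and init values could then be passed to an HTTP client such as fetch, with the parsed JSON reply handed to extractContent.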

Image Processing

The tool automatically handles three types of image inputs:

HTTP/HTTPS URLs: Downloads the image and converts it to base64

  • Example: https://example.com/image.jpg

Local file paths: Reads the local file and converts it to base64

  • Absolute: /Users/username/Documents/image.png

  • Relative: images/photo.png

  • Removes the @ prefix if present

Base64 data URLs: Passes existing base64 data through unchanged

  • Example: data:image/png;base64,iVBORw0KGgo...
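
How the three input types might be told apart can be sketched as follows. This is not the tool's actual implementation: classifyImageInput and its return shape are hypothetical names, and the detection rules (a "data:" prefix, an http(s) scheme, otherwise a local path) are assumptions based on the descriptions above.

```typescript
type ImageInputKind = "url" | "data-url" | "local-path";

interface ClassifiedInput {
  kind: ImageInputKind;
  value: string;
}

// Decide which of the three documented input types a string represents.
function classifyImageInput(raw: string): ClassifiedInput {
  if (raw.startsWith("data:")) {
    // Base64 data URL: passed through unchanged.
    return { kind: "data-url", value: raw };
  }
  if (/^https?:\/\//i.test(raw)) {
    // Remote image: to be downloaded and converted to base64.
    return { kind: "url", value: raw };
  }
  // Anything else is treated as a local path; strip a leading "@" if present.
  const path = raw.startsWith("@") ? raw.slice(1) : raw;
  return { kind: "local-path", value: path };
}
```

For example, "@images/photo.png" would be classified as a local path with the value "images/photo.png".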

Image Formats

Supported:

  • JPEG (.jpg, .jpeg)

  • PNG (.png)

  • WebP (.webp)

Not supported:

  • PDF, GIF, PSD, SVG, and other formats
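
A pre-flight check against the supported formats could look like the sketch below. The function name mimeTypeForPath is hypothetical, and checking by file extension is an assumption; the tool itself may inspect the file contents instead.

```typescript
// Supported extensions from the list above, mapped to their standard MIME types.
const SUPPORTED_EXTENSIONS: Record<string, string> = {
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".png": "image/png",
  ".webp": "image/webp",
};

// Return the MIME type for a supported image path, or undefined otherwise.
// Note: a URL with a query string after the extension is not handled here.
function mimeTypeForPath(path: string): string | undefined {
  const dot = path.lastIndexOf(".");
  if (dot === -1) return undefined;
  const ext = path.slice(dot).toLowerCase();
  return Object.prototype.hasOwnProperty.call(SUPPORTED_EXTENSIONS, ext)
    ? SUPPORTED_EXTENSIONS[ext]
    : undefined;
}
```

Rejecting unsupported files before calling understand_image avoids spending a request on an input such as a PDF or GIF.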

Crafting Effective Prompts

For Descriptions

  • "Describe what's in this image in detail"

  • "What is the main subject of this image?"

  • "Describe the visual style and composition"

For Code/Technical

  • "What code is shown in this screenshot?"

  • "Extract all text from this image"

  • "Identify the UI framework/components used"

For Analysis

  • "Analyze this UI design. What is working well and what could be improved?"

  • "What emotions or mood does this image convey?"

  • "Compare this design to Material Design principles"

For OCR/Text Extraction

  • "Extract all text from this image"

  • "Read the error message in this screenshot"

  • "What does the label say in this image?"

Examples

Error Analysis

understand_image({
  prompt: "What is the error message and where is it located in this screenshot?",
  image_url: "./error-screenshot.png"
})

Code Screenshot

understand_image({
  prompt: "What code is shown in this screenshot? Please transcribe it exactly.",
  image_url: "https://example.com/code.png"
})

Design Review

understand_image({
  prompt: "Analyze this UI design. What is working well and what could be improved?",
  image_url: "https://example.com/mockup.png"
})

OCR

understand_image({
  prompt: "Extract all text from this image",
  image_url: "/Users/username/Documents/scan.png"
})

Tips

  • Be specific in your prompt about what you want to know

  • Mention format if you need structured output (e.g., "list all elements")

  • Include context if the image is part of a larger task

  • For screenshots, specify if you need full-page or just a specific area

  • Complex analysis may trigger a confirmation prompt (associated keywords: analyze, extract, describe, recognize, transcribe, read)

Error Handling

  • Status code 1004: Authentication error - check API key and region

  • Status code 2038: Real-name verification required

  • Invalid image: File doesn't exist or URL is inaccessible

  • Unsupported format: The image is not JPEG, PNG, or WebP
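
The documented status codes can be turned into readable messages with a small lookup. This is a sketch only: describeStatus is a hypothetical helper, only the codes listed above (plus 0 from the response example) are covered, and any other code falls through to a generic message.

```typescript
// Map a base_resp.status_code value to a human-readable hint.
function describeStatus(statusCode: number): string {
  switch (statusCode) {
    case 0:
      return "Success";
    case 1004:
      return "Authentication error: check API key and region";
    case 2038:
      return "Real-name verification required";
    default:
      // Codes not listed in this document are reported as-is.
      return `Unrecognized status code ${statusCode}`;
  }
}
```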

Weekly Installs: 250
Repository: imsus/pi-extens…plan-mcp
GitHub Stars: 2
First Seen: Feb 18, 2026
Security Audits: Gen Agent Trust Hub (Fail), Socket (Pass), Snyk (Fail)
Installed on: opencode (248), gemini-cli (246), github-copilot (246), codex (246), cursor (245), kimi-cli (244)

Statistics

Installs: 200
Rating: 3.9 / 5.0
Version: 1.0.0
Last updated: March 24, 2026
Comparison cases: 1

Compatible Platforms

Claude Code

Timeline

Created: March 24, 2026
Last updated: March 24, 2026