browser-use
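This skill is a front-end development capability designed for AI coding agents, focused on browser use. It lets an agent understand and operate web page elements and carry out front-end tasks, which makes it useful in web interface development.
npx skills add browser-use/browser-use --skill browser-use
Before / After comparison
Before: AI agents doing front-end work cannot interact with a browser directly, so they struggle to simulate user behavior or debug an interface. This limits their use in front-end development.
After: AI agents can drive a browser, so they can develop, test, and debug front-end code. This raises the level of automation they can reach in front-end work.
SKILL.md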
Browser Automation with browser-use CLI
The browser-use command provides fast, persistent browser automation. It maintains browser sessions across commands, enabling complex multi-step workflows.
Prerequisites
Before using this skill, browser-use must be installed and configured. Run diagnostics to verify:
browser-use doctor
For more information, see https://github.com/browser-use/browser-use/blob/main/browser_use/skill_cli/README.md
Core Workflow
- Navigate: browser-use open <url> opens the URL (starts the browser if needed)
- Inspect: browser-use state returns clickable elements with indices
- Interact: use indices from state to interact (browser-use click 5, browser-use input 3 "text")
- Verify: browser-use state or browser-use screenshot to confirm actions
- Repeat: the browser stays open between commands
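The loop above can also be scripted. The following is a minimal sketch, not part of browser-use itself: the helper names workflow_commands and run_workflow are hypothetical, only the open, state, click, and screenshot commands documented here are used, and it assumes browser-use is on PATH and exits with a nonzero status on failure.

```python
import subprocess
from typing import List


def workflow_commands(url: str, click_index: int, shot_path: str) -> List[List[str]]:
    """Build the navigate -> inspect -> interact -> verify sequence as argv lists."""
    return [
        ["browser-use", "open", url],                # Navigate
        ["browser-use", "state"],                    # Inspect: list clickable elements
        ["browser-use", "click", str(click_index)],  # Interact using an index from state
        ["browser-use", "state"],                    # Verify: re-read the page state
        ["browser-use", "screenshot", shot_path],    # Verify visually
    ]


def run_workflow(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        # Assumption: a failed command returns a nonzero exit code.
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print the plan instead of running it; call run_workflow(plan) to execute.
    for argv in workflow_commands("https://example.com", 5, "after-click.png"):
        print(" ".join(argv))
```

The index 5 is only a placeholder. In real use the index has to be read from the output of browser-use state before clicking, because indices change from page to page.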
Browser Modes
browser-use --browser chromium open <url> # Default: headless Chromium
browser-use --browser chromium --headed open <url> # Visible Chromium window
browser-use --browser real open <url> # Real Chrome (no profile = fresh)
browser-use --browser real --profile "Default" open <url> # Real Chrome with your login sessions
browser-use --browser remote open <url> # Cloud browser
- chromium: Fast, isolated, headless by default
- real: Uses a real Chrome binary. Without --profile, uses a persistent but empty CLI profile at ~/.config/browseruse/profiles/cli/. With --profile "ProfileName", copies your actual Chrome profile (cookies, logins, extensions)
- remote: Cloud-hosted browser with proxy support
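When commands are assembled programmatically, the mode flags can be built in one place. This is a rough sketch under stated assumptions: open_command is a hypothetical helper, and only the flag combinations listed above are confirmed by this document. Other combinations the function can produce are untested.

```python
from typing import List, Optional

KNOWN_MODES = ("chromium", "real", "remote")


def open_command(url: str, browser: str = "chromium",
                 headed: bool = False, profile: Optional[str] = None) -> List[str]:
    """Build a `browser-use --browser <mode> ... open <url>` argv list."""
    if browser not in KNOWN_MODES:
        raise ValueError(f"unknown browser mode: {browser}")
    argv = ["browser-use", "--browser", browser]
    if headed:
        argv.append("--headed")          # shown in this document with chromium
    if profile is not None:
        argv += ["--profile", profile]   # shown in this document with real
    return argv + ["open", url]


if __name__ == "__main__":
    print(open_command("https://example.com"))
    print(open_command("https://github.com", browser="real", profile="Default"))
```

Passing an argv list rather than a shell string avoids quoting problems with profile names that contain spaces, such as "Profile 1".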
Essential Commands
# Navigation
browser-use open <url> # Navigate to URL
browser-use back # Go back
browser-use scroll down # Scroll down (--amount N for pixels)
# Page State (always run state first to get element indices)
browser-use state # Get URL, title, clickable elements
browser-use screenshot # Take screenshot (base64)
browser-use screenshot path.png # Save screenshot to file
# Interactions (use indices from state)
browser-use click <index> # Click element
browser-use type "text" # Type into focused element
browser-use input <index> "text" # Click element, then type
browser-use keys "Enter" # Send keyboard keys
browser-use select <index> "option" # Select dropdown option
# Data Extraction
browser-use eval "document.title" # Execute JavaScript
browser-use get text <index> # Get element text
browser-use get html --selector "h1" # Get scoped HTML
# Wait
browser-use wait selector "h1" # Wait for element
browser-use wait text "Success" # Wait for text
# Session
browser-use sessions # List active sessions
browser-use close # Close current session
browser-use close --all # Close all sessions
# AI Agent
browser-use -b remote run "task" # Run agent in cloud (async by default)
browser-use task status <id> # Check cloud task progress
Commands
Navigation & Tabs
browser-use open <url> # Navigate to URL
browser-use back # Go back in history
browser-use scroll down # Scroll down
browser-use scroll up # Scroll up
browser-use scroll down --amount 1000 # Scroll by specific pixels (default: 500)
browser-use switch <tab> # Switch to tab by index
browser-use close-tab # Close current tab
browser-use close-tab <tab> # Close specific tab
Page State
browser-use state # Get URL, title, and clickable elements
browser-use screenshot # Take screenshot (outputs base64)
browser-use screenshot path.png # Save screenshot to file
browser-use screenshot --full path.png # Full page screenshot
Interactions
browser-use click <index> # Click element
browser-use type "text" # Type text into focused element
browser-use input <index> "text" # Click element, then type text
browser-use keys "Enter" # Send keyboard keys
browser-use keys "Control+a" # Send key combination
browser-use select <index> "option" # Select dropdown option
browser-use hover <index> # Hover over element (triggers CSS :hover)
browser-use dblclick <index> # Double-click element
browser-use rightclick <index> # Right-click element (context menu)
Use indices from browser-use state.
JavaScript & Data
browser-use eval "document.title" # Execute JavaScript, return result
browser-use get title # Get page title
browser-use get html # Get full page HTML
browser-use get html --selector "h1" # Get HTML of specific element
browser-use get text <index> # Get text content of element
browser-use get value <index> # Get value of input/textarea
browser-use get attributes <index> # Get all attributes of element
browser-use get bbox <index> # Get bounding box (x, y, width, height)
Cookies
browser-use cookies get # Get all cookies
browser-use cookies get --url <url> # Get cookies for specific URL
browser-use cookies set <name> <value> # Set a cookie
browser-use cookies set name val --domain .example.com --secure --http-only
browser-use cookies set name val --same-site Strict # SameSite: Strict, Lax, or None
browser-use cookies set name val --expires 1735689600 # Expiration timestamp
browser-use cookies clear # Clear all cookies
browser-use cookies clear --url <url> # Clear cookies for specific URL
browser-use cookies export <file> # Export all cookies to JSON file
browser-use cookies export <file> --url <url> # Export cookies for specific URL
browser-use cookies import <file> # Import cookies from JSON file
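The --expires value in the example above, 1735689600, corresponds to 2025-01-01 00:00:00 UTC when read as seconds since the Unix epoch. Assuming that is the unit the CLI expects (the source only says "expiration timestamp"), the value can be computed rather than hand-typed. The helper name cookie_set_command is hypothetical.

```python
from datetime import datetime, timezone
from typing import List


def cookie_set_command(name: str, value: str, expires_at: datetime) -> List[str]:
    """Build `browser-use cookies set` with --expires as whole epoch seconds."""
    if expires_at.tzinfo is None:
        # Naive datetimes would be interpreted in local time; require an explicit zone.
        raise ValueError("expires_at must be timezone-aware")
    expires = int(expires_at.timestamp())
    return ["browser-use", "cookies", "set", name, value, "--expires", str(expires)]


if __name__ == "__main__":
    new_year = datetime(2025, 1, 1, tzinfo=timezone.utc)
    print(" ".join(cookie_set_command("name", "val", new_year)))
    # prints: browser-use cookies set name val --expires 1735689600
```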
Wait Conditions
browser-use wait selector "h1" # Wait for element to be visible
browser-use wait selector ".loading" --state hidden # Wait for element to disappear
browser-use wait selector "#btn" --state attached # Wait for element in DOM
browser-use wait text "Success" # Wait for text to appear
browser-use wait selector "h1" --timeout 5000 # Custom timeout in ms
Python Execution
browser-use python "x = 42" # Set variable
browser-use python "print(x)" # Access variable (outputs: 42)
browser-use python "print(browser.url)" # Access browser object
browser-use python --vars # Show defined variables
browser-use python --reset # Clear Python namespace
browser-use python --file script.py # Execute Python file
The Python session maintains state across commands. The browser object provides:
- browser.url, browser.title, browser.html: page info
- browser.goto(url), browser.back(): navigation
- browser.click(index), browser.type(text), browser.input(index, text), browser.keys(keys): interactions
- browser.screenshot(path), browser.scroll(direction, amount): visual
- browser.wait(seconds), browser.extract(query): utilities
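A file passed to browser-use python --file might look like the sketch below. It assumes that the browser object is already defined in the namespace when the file runs, and that browser.url, browser.title, and browser.html are strings. Neither assumption is confirmed by this document. page_summary is a hypothetical helper.

```python
def page_summary(browser) -> dict:
    """Collect basic page info using the attributes listed above."""
    return {
        "url": browser.url,
        "title": browser.title,
        "html_length": len(browser.html),  # assumes html is a string
    }


# Assumption: `browser` is predefined when this file is run through
# `browser-use python --file script.py`. Outside that session the name is
# missing, so the guard keeps the file importable and testable.
if "browser" in globals():
    print(page_summary(browser))
```

Keeping the logic in a function that takes browser as a parameter makes it possible to exercise the script with a stand-in object before running it against a live session.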
Agent Tasks
Remote Mode Options
When using --browser remote, additional options are available:
# Specify LLM model
browser-use -b remote run "task" --llm gpt-4o
browser-use -b remote run "task" --llm claude-sonnet-4-20250514
# Proxy configuration (default: us)
browser-use -b remote run "task" --proxy-country uk
# Session reuse
browser-use -b remote run "task 1" --keep-alive # Keep session alive after task
browser-use -b remote run "task 2" --session-id abc-123 # Reuse existing session
# Execution modes
browser-use -b remote run "task" --flash # Fast execution mode
browser-use -b remote run "task" --wait # Wait for completion (default: async)
# Advanced options
browser-use -b remote run "task" --thinking # Extended reasoning mode
browser-use -b remote run "task" --no-vision # Disable vision (enabled by default)
# Using a cloud profile (create session first, then run with --session-id)
browser-use session create --profile <cloud-profile-id> --keep-alive
# → returns session_id
browser-use -b remote run "task" --session-id <session-id>
# Task configuration
browser-use -b remote run "task" --start-url https://example.com # Start from specific URL
browser-use -b remote run "task" --allowed-domain example.com # Restrict navigation (repeatable)
browser-use -b remote run "task" --metadata key=value # Task metadata (repeatable)
browser-use -b remote run "task" --skill-id skill-123 # Enable skills (repeatable)
browser-use -b remote run "task" --secret key=value # Secret metadata (repeatable)
# Structured output and evaluation
browser-use -b remote run "task" --structured-output '{"type":"object"}' # JSON schema for output
browser-use -b remote run "task" --judge # Enable judge mode
browser-use -b remote run "task" --judge-ground-truth "expected answer"
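For --structured-output, building the JSON schema in code avoids hand-quoting JSON on the command line. This is a minimal sketch: run_with_schema is a hypothetical helper, the schema is an invented example, and combining --structured-output with --wait in one call is an assumption rather than something this document shows.

```python
import json
import subprocess
from typing import List


def run_with_schema(task: str, schema: dict) -> List[str]:
    """Build a remote `run` command that passes a JSON schema for the output."""
    return [
        "browser-use", "-b", "remote", "run", task,
        "--structured-output", json.dumps(schema),
        "--wait",  # block until the task completes instead of the async default
    ]


if __name__ == "__main__":
    # Illustrative schema only; replace with the shape your task should return.
    schema = {
        "type": "object",
        "properties": {"title": {"type": "string"}},
        "required": ["title"],
    }
    argv = run_with_schema("Get the page title of example.com", schema)
    print(argv)
    # To execute for real: subprocess.run(argv, check=True)
```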
Task Management
browser-use task list # List recent tasks
browser-use task list --limit 20 # Show more tasks
browser-use task list --status finished # Filter by status (finished, stopped)
browser-use task list --session <id> # Filter by session ID
browser-use task list --json # JSON output
browser-use task status <task-id> # Get task status (latest step only)
browser-use task status <task-id> -c # All steps with reasoning
browser-use task status <task-id> -v # All steps with URLs + actions
browser-use task status <task-id> --last 5 # Last N steps only
browser-use task status <task-id> --step 3 # Specific step number
browser-use task status <task-id> --reverse # Newest first
browser-use task stop <task-id> # Stop a running task
browser-use task logs <task-id> # Get task execution logs
Cloud Session Management
browser-use session list # List cloud sessions
browser-use session list --limit 20 # Show more sessions
browser-use session list --status active # Filter by status
browser-use session list --json # JSON output
browser-use session get <session-id> # Get session details + live URL
browser-use session get <session-id> --json
browser-use session stop <session-id> # Stop a session
browser-use session stop --all # Stop all active sessions
browser-use session create # Create with defaults
browser-use session create --profile <id> # With cloud profile
browser-use session create --proxy-country uk # With geographic proxy
browser-use session create --start-url https://example.com
browser-use session create --screen-size 1920x1080
browser-use session create --keep-alive
browser-use session create --persist-memory
browser-use session share <session-id> # Create public share URL
browser-use session share <session-id> --delete # Delete public share
Tunnels
browser-use tunnel <port> # Start tunnel (returns URL)
browser-use tunnel <port> # Idempotent - returns existing URL
browser-use tunnel list # Show active tunnels
browser-use tunnel stop <port> # Stop tunnel
browser-use tunnel stop --all # Stop all tunnels
Session Management
browser-use sessions # List active sessions
browser-use close # Close current session
browser-use close --all # Close all sessions
Profile Management
Local Chrome Profiles (--browser real)
browser-use -b real profile list # List local Chrome profiles
browser-use -b real profile cookies "Default" # Show cookie domains in profile
Cloud Profiles (--browser remote)
browser-use -b remote profile list # List cloud profiles
browser-use -b remote profile list --page 2 --page-size 50
browser-use -b remote profile get <id> # Get profile details
browser-use -b remote profile create # Create new cloud profile
browser-use -b remote profile create --name "My Profile"
browser-use -b remote profile update <id> --name "New"
browser-use -b remote profile delete <id>
Syncing
browser-use profile sync --from "Default" --domain github.com # Domain-specific
browser-use profile sync --from "Default" # Full profile
browser-use profile sync --from "Default" --name "Custom Name" # With custom name
Server Control
browser-use server logs # View server logs
Common Workflows
Exposing Local Dev Servers
Use when you have a local dev server and need a cloud browser to reach it.
Core workflow: Start dev server → create tunnel → browse the tunnel URL remotely.
# 1. Start your dev server
npm run dev & # localhost:3000
# 2. Expose it via Cloudflare tunnel
browser-use tunnel 3000
# → url: https://abc.trycloudflare.com
# 3. Now the cloud browser can reach your local server
browser-use --browser remote open https://abc.trycloudflare.com
browser-use state
browser-use screenshot
Note: Tunnels are independent of browser sessions. They persist across browser-use close and can be managed separately. Cloudflared must be installed — run browser-use doctor to check.
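To chain the tunnel and the remote browser in a script, the tunnel URL has to be read from the command output. The sketch below assumes the output contains a line of the form "url: https://..." as in the example above; the real output format may differ. tunnel_url and start_tunnel are hypothetical helpers.

```python
import re
import subprocess
from typing import Optional

# Assumed output shape, taken from the example: "url: https://abc.trycloudflare.com"
_URL_LINE = re.compile(r"url:\s*(https://\S+)")


def tunnel_url(output: str) -> Optional[str]:
    """Return the first https URL found after 'url:', or None if absent."""
    match = _URL_LINE.search(output)
    return match.group(1) if match else None


def start_tunnel(port: int) -> Optional[str]:
    """Run `browser-use tunnel <port>` and extract the public URL from stdout."""
    result = subprocess.run(
        ["browser-use", "tunnel", str(port)],
        capture_output=True, text=True, check=True,
    )
    return tunnel_url(result.stdout)


if __name__ == "__main__":
    print(tunnel_url("url: https://abc.trycloudflare.com\n"))
```

Because the tunnel command is described as idempotent, calling start_tunnel again for the same port should return the existing URL rather than open a second tunnel.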
Authenticated Browsing with Profiles
Use when a task requires browsing a site the user is already logged into (e.g. Gmail, GitHub, internal tools).
Core workflow: Check existing profiles → ask user which profile and browser mode → browse with that profile. Only sync cookies if no suitable profile exists.
Before browsing an authenticated site, the agent MUST:
- Ask the user whether to use real (local Chrome) or remote (cloud) browser
- List available profiles for that mode
- Ask which profile to use
- If no profile has the right cookies, offer to sync (see below)
Step 1: Check existing profiles
# Option A: Local Chrome profiles (--browser real)
browser-use -b real profile list
# → Default: Person 1 (user@gmail.com)
# → Profile 1: Work (work@company.com)
# Option B: Cloud profiles (--browser remote)
browser-use -b remote profile list
# → abc-123: "Chrome - Default (github.com)"
# → def-456: "Work profile"
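An agent that needs to present these profiles as choices could split each line into an identifier and a label. This sketch assumes the "id: label" line format shown in the sample output above, which may not match the real CLI output exactly. parse_profiles is a hypothetical helper.

```python
from typing import List, Tuple


def parse_profiles(output: str) -> List[Tuple[str, str]]:
    """Split 'id: label' lines into (id, label) pairs, skipping other lines."""
    profiles = []
    for line in output.splitlines():
        ident, sep, label = line.strip().partition(": ")
        if not sep:
            continue  # not an 'id: label' line
        # Cloud profile names appear quoted in the sample output; drop the quotes.
        profiles.append((ident, label.strip('"')))
    return profiles


if __name__ == "__main__":
    sample = 'Default: Person 1 (user@gmail.com)\nabc-123: "Chrome - Default (github.com)"\n'
    for ident, label in parse_profiles(sample):
        print(ident, "->", label)
```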
Step 2: Browse with the chosen profile
# Real browser — uses local Chrome with existing login sessions
browser-use --browser real --profile "Default" open https://github.com
# Cloud browser — uses cloud profile
...