name: agent-device description: Automates interactions for iOS simulators/devices and Android emulators/devices. Use when navigating apps, taking snapshots/screenshots, tapping, typing, scrolling, or extracting UI info on mobile targets.

Mobile Automation with agent-device

For exploration, use snapshot refs. For deterministic replay, use selectors. For structured exploratory QA bug hunts and reporting, use ../dogfood/SKILL.md.

Start Here (Read This First)

Use this skill as a router, not a full manual.

Pick one mode:
- Normal interaction flow
- Debug/crash flow
- Replay maintenance flow
Run one canonical flow below.
Open references only if blocked.

Decision Map

No target context yet: devices -> pick target -> open.
Normal UI task: open -> snapshot -i -> press/fill -> diff snapshot -i -> close
Debug/crash: open <app> -> logs clear --restart -> reproduce -> network dump -> logs path -> targeted grep
Replay drift: replay -u <path> -> verify updated selectors
Remote multi-tenant run: allocate lease -> point client at remote daemon base URL -> run commands with tenant isolation flags -> heartbeat/release lease
Device-scope isolation run: set iOS simulator set / Android allowlist -> run selectors within scope only

Target Selection Rules

iOS local QA: use simulators unless the task explicitly requires a physical device.
iOS local QA in mixed simulator/device environments: run ensure-simulator first and pass --device, --udid, or --ios-simulator-device-set on later commands.
Android local QA: use install or reinstall for .apk/.aab files, then relaunch by installed package name.
Android React Native + Metro flows: set runtime hints with runtime set before open <package> --relaunch.
In mixed-device environments, always pin the exact target with --serial, --device, --udid, or an isolation scope.
For session-bound automation runs, prefer a pre-bound session/platform instead of repeating selectors on every command: set AGENT_DEVICE_SESSION, set AGENT_DEVICE_PLATFORM, and the daemon will enforce the shared lock policy across CLI, typed client, and RPC entry points.
Use --session-lock reject|strip (or AGENT_DEVICE_SESSION_LOCK) only when you need to override the default reject behavior. Lock mode applies to nested batch steps too.

Canonical Flows

1) Normal Interaction Flow

agent-device open Settings --platform ios
agent-device snapshot -i
agent-device press @e3
agent-device diff snapshot -i
agent-device fill @e5 "test"
agent-device close

1a) Local iOS Simulator QA Flow

agent-device ensure-simulator --platform ios --device "iPhone 16" --boot
agent-device open MyApp --platform ios --device "iPhone 16" --session qa-ios --relaunch
agent-device snapshot -i
agent-device press @e3
agent-device close

Use this when a physical iPhone is also connected and you want deterministic simulator-only automation.

1b) Android React Native + Metro QA Flow

agent-device reinstall MyApp /path/to/app-debug.apk --platform android --serial emulator-5554
agent-device runtime set --session qa-android --platform android --metro-host 10.0.2.2 --metro-port 8081
agent-device open com.example.myapp --platform android --serial emulator-5554 --session qa-android --relaunch
agent-device snapshot -i
agent-device close

Do not use open <apk|aab> --relaunch on Android. Install/reinstall binaries first, then relaunch by package.

1c) Session-Bound Automation Flow

export AGENT_DEVICE_SESSION=qa-ios
export AGENT_DEVICE_PLATFORM=ios
export AGENT_DEVICE_SESSION_LOCK=strip

agent-device open MyApp --relaunch
agent-device snapshot -i
agent-device batch --steps-file /tmp/qa-steps.json --json
agent-device close

Use this for orchestrators that must preserve one bound session/device across many plain CLI calls without a wrapper script. In strip mode, conflicting selectors such as --target, --device, --udid, --serial, and isolation-scope overrides are ignored instead of retargeting the run.

1d) Android Emulator Session-Bound Flow

export AGENT_DEVICE_SESSION=qa-android
export AGENT_DEVICE_PLATFORM=android

agent-device reinstall MyApp /path/to/app-debug.apk --serial emulator-5554
agent-device --session-lock reject open com.example.myapp --relaunch
agent-device snapshot -i
agent-device close --shutdown

Use this when an Android emulator session must stay pinned while an agent or test runner issues plain CLI commands over time.

2) Debug/Crash Flow

agent-device open MyApp --platform ios
agent-device logs clear --restart
agent-device network dump 25
agent-device logs path

Logging is off by default. Enable only for debugging windows. logs clear --restart requires an active app session (open <app> first).

3) Replay Maintenance Flow

agent-device replay -u ./session.ad

4) Remote Tenant Lease Flow (HTTP JSON-RPC)

# Client points directly at the remote daemon HTTP base URL.
export AGENT_DEVICE_DAEMON_BASE_URL=http://mac-host.example:4310
export AGENT_DEVICE_DAEMON_AUTH_TOKEN=<token>

# Allocate lease
curl -sS "${AGENT_DEVICE_DAEMON_BASE_URL}/rpc" \
  -H "content-type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"jsonrpc":"2.0","id":"alloc-1","method":"agent_device.lease.allocate","params":{"runId":"run-123","tenantId":"acme","ttlMs":60000}}'

# Use lease in tenant-isolated command execution
agent-device \
  --tenant acme \
  --session-isolation tenant \
  --run-id run-123 \
  --lease-id <lease-id> \
  session list --json

# Heartbeat and release
curl -sS "${AGENT_DEVICE_DAEMON_BASE_URL}/rpc" \
  -H "content-type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"jsonrpc":"2.0","id":"hb-1","method":"agent_device.lease.heartbeat","params":{"leaseId":"<lease-id>","ttlMs":60000}}'
curl -sS "${AGENT_DEVICE_DAEMON_BASE_URL}/rpc" \
  -H "content-type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"jsonrpc":"2.0","id":"rel-1","method":"agent_device.lease.release","params":{"leaseId":"<lease-id>"}}'

Notes:

AGENT_DEVICE_DAEMON_BASE_URL makes the CLI skip local daemon discovery/startup and call the remote HTTP daemon directly.
AGENT_DEVICE_DAEMON_AUTH_TOKEN is sent in both the JSON-RPC request token and HTTP auth headers.
In remote daemon mode, --debug does not tail a local daemon.log; inspect logs on the remote host instead.

Command Skeleton (Minimal)

Session and navigation

agent-device devices
agent-device devices --platform ios --ios-simulator-device-set /tmp/tenant-a/simulators
agent-device devices --platform android --android-device-allowlist emulator-5554,device-1234
agent-device ensure-simulator --device "iPhone 16" --ios-simulator-device-set /tmp/tenant-a/simulators
agent-device ensure-simulator --device "iPhone 16" --runtime com.apple.CoreSimulator.SimRuntime.iOS-18-4 --ios-simulator-device-set /tmp/tenant-a/simulators --boot
agent-device open [app|url] [url]
agent-device open [app] --relaunch
agent-device close [app]
agent-device install <app> <path-to-binary>
agent-device reinstall <app> <path-to-binary>
agent-device session list

Use boot only as fallback when open cannot find/connect to a ready target. If the workspace repeats the same selectors or device/session flags, prefer a checked-in agent-device.json or --config <path> over repeating them inline. Environment-level defaults follow the same fields via AGENT_DEVICE_* names, so persistent host-specific values belong there rather than in committed project config. That includes bound-session defaults such as sessionLock / AGENT_DEVICE_SESSION_LOCK when automation should consistently reject or strip conflicting device routing flags. For Android emulators by AVD name, use boot --platform android --device <avd-name>. For Android emulators without GUI, add --headless. Use --target mobile|tv with --platform (required) to pick phone/tablet vs TV targets (AndroidTV/tvOS). For Android React Native + Metro flows, install or reinstall the APK first, set runtime hints with runtime set, then use open <package> --relaunch; do not use open <apk|aab> --relaunch. For local iOS QA in mixed simulator/device environments, use ensure-simulator and pass --device or --udid so automation does not attach to a physical device by accident. For session-bound automation, prefer AGENT_DEVICE_SESSION + AGENT_DEVICE_PLATFORM; that bound-session default now enables lock mode automatically.

Isolation scoping quick reference:

--ios-simulator-device-set <path> scopes iOS simulator discovery + command execution to one simulator set.
--android-device-allowlist <serials> scopes Android discovery/selection to comma/space separated serials.
Scope is applied before selectors (--device, --udid, --serial); out-of-scope selectors fail with DEVICE_NOT_FOUND.
With iOS simulator-set scope enabled, iOS physical devices are not enumerated.
In bound-session strip mode, conflicting per-call scope/selectors are ignored and the configured binding is restored for the request. Batch steps still inherit the parent --platform when they do not set their own.

Simulator provisioning quick reference:

Use ensure-simulator to create or reuse a named iOS simulator inside a device set before starting a session.
--device <name> is required (e.g. "iPhone 16 Pro"). --runtime <id> pins the runtime; omit to use the newest compatible one.
--boot boots it immediately. Returns udid, device, runtime, ios_simulator_device_set, created, booted.
Idempotent: safe to call repeatedly; reuses an existing matching simulator by default.

TV quick reference:

AndroidTV: open/apps use TV launcher discovery automatically.
TV target selection works on emulators/simulators and connected physical devices (AndroidTV + AppleTV).
tvOS: runner-driven interactions and snapshots are supported (snapshot, wait, press, fill, get, scroll, back, home, app-switcher, record and related selector flows).
tvOS back/home/app-switcher map to Siri Remote actions (menu, home, double-home) in the runner.
tvOS follows iOS simulator-only command semantics for helpers like pinch, settings, and push.

Snapshot and targeting

agent-device snapshot -i
agent-device diff snapshot -i
agent-device find "Sign In" click
agent-device press @e1
agent-device fill @e2 "text"
agent-device is visible 'id="anchor"'

press is canonical tap command; click is an alias.

Utilities

agent-device appstate
agent-device clipboard read
agent-device clipboard write "token"
agent-device keyboard status
agent-device keyboard dismiss
agent-device perf --json
agent-device network dump [limit] [summary|headers|body|all]
agent-device push <bundle|package> <payload.json|inline-json>
agent-device trigger-app-event screenshot_taken '{"source":"qa"}'
agent-device get text @e1
agent-device screenshot out.png
agent-device settings permission grant notifications
agent-device settings permission reset camera
agent-device trace start
agent-device trace stop ./trace.log

Batch (when sequence is already known)

agent-device batch --steps-file /tmp/batch-steps.json --json

Performance Check

Use agent-device perf --json (or metrics --json) after open.
For detailed metric semantics, caveats, and interpretation guidance, see references/perf-metrics.md.

Guardrails (High Value Only)

Re-snapshot after UI mutations (navigation/modal/list changes).
Prefer snapshot -i; scope/depth only when needed.
Use refs for discovery, selectors for replay/assertions.
find "<query>" click --json returns { ref, locator, query, x, y } — all derived from the matched snapshot node. Do not rely on these fields from raw press/click responses for observability; use find instead.
Use fill for clear-then-type semantics; use type for focused append typing.
Use install for in-place app upgrades (keep app data when platform permits), and reinstall for deterministic fresh-state runs.
App binary format support for install/reinstall: Android .apk/.aab, iOS .app/.ipa.
Android .aab requires bundletool in PATH, or AGENT_DEVICE_BUNDLETOOL_JAR=<path-to-bundletool-all.jar> with java in PATH.
Android .aab optional: set AGENT_DEVICE_ANDROID_BUNDLETOOL_MODE=<mode> to control bundletool build-apks --mode (default: universal).
iOS .ipa: extract/install from Payload/*.app; when multiple app bundles are present, <app> is used as a bundle id/name hint.
iOS appstate is session-scoped; Android appstate is live foreground state. iOS responses include device_udid and ios_simulator_device_set for isolation verification.
iOS open responses include device_udid and ios_simulator_device_set to confirm which simulator handled the session.
Clipboard helpers: clipboard read / clipboard write <text> are supported on Android and iOS simulators; iOS physical devices are not supported yet.
Android keyboard helpers: keyboard status|get|dismiss report keyboard visibility/type and dismiss via keyevent when visible.
network dump is best-effort and parses HTTP(s) entries from the session app log file.
Biometric settings: iOS simulator supports settings faceid|touchid <match|nonmatch|enroll|unenroll>; Android supports settings fingerprint <match|nonmatch> where runtime tooling is available.
For AndroidTV/tvOS selection, always pair --target with --platform (ios, android, or apple alias); target-only selection is invalid.
push simulates notification delivery:
- iOS simulator uses APNs-style payload JSON.
- Android uses broadcast action + typed extras (string/boolean/number).
trigger-app-event requires app-defined deep-link hooks and URL template configuration (AGENT_DEVICE_APP_EVENT_URL_TEMPLATE or platform-specific variants).
trigger-app-event requires an active session or explicit selectors (--platform, --device, --udid, --serial); on iOS physical devices, custom-scheme triggers require active app context.
Canonical trigger behavior and caveats are documented in website/docs/docs/commands.md under App event triggers.
Permission settings are app-scoped and require an active session app: settings permission <grant|deny|reset> <camera|microphone|photos|contacts|notifications> [full|limited]
iOS simulator permission alerts: use alert wait then alert accept/dismiss — accept/dismiss retry internally for up to 2 s so you do not need manual sleeps. See references/permissions.md.
full|limited mode applies only to iOS photos; other targets reject mode.
On Android, non-ASCII fill/type may require an ADB keyboard IME on some system images; only install IME APKs from trusted sources and verify checksum/signature.
If using --save-script, prefer explicit path syntax (--save-script=flow.ad or ./flow.ad).
For tenant-isolated remote runs, always pass --tenant, --session-isolation tenant, --run-id, and --lease-id together.
Use short lease TTLs and heartbeat only while work is active; release leases immediately after run completion/failure.
Env equivalents for scoped runs: AGENT_DEVICE_IOS_SIMULATOR_DEVICE_SET (compat IOS_SIMULATOR_DEVICE_SET) and AGENT_DEVICE_ANDROID_DEVICE_ALLOWLIST (compat ANDROID_DEVICE_ALLOWLIST).
For explicit remote client mode, prefer AGENT_DEVICE_DAEMON_BASE_URL / --daemon-base-url instead of relying on local daemon metadata or loopback-only ports.

Common Failure Patterns

Failed to access Android app sandbox for /path/app-debug.apk: Android relaunch/runtime-hint flow received an APK path instead of an installed package name. Use reinstall first, then open <package> --relaunch.
mkdir: Needs 1 argument while writing ReactNativeDevPrefs.xml: likely an older agent-device build or stale global install is still using the shell-based Android runtime-hint writer. Verify the exact binary being invoked.
Failed to terminate iOS app: the flow may have selected a physical iPhone or an unavailable iOS target. Re-run with ensure-simulator, then pin the simulator with --device or --udid.

Security and Trust Notes

Prefer a preinstalled agent-device binary over on-demand package execution.
If install is required, pin an exact version (for example: npx --yes agent-device@<exact-version> --help).
Signing/provisioning environment variables are optional, sensitive, and only for iOS physical-device setup.
Logs/artifacts are written under ~/.agent-device; replay scripts write to explicit paths you provide.
For remote daemon mode, prefer AGENT_DEVICE_DAEMON_SERVER_MODE=http|dual on the host plus client-side AGENT_DEVICE_DAEMON_BASE_URL, with AGENT_DEVICE_HTTP_AUTH_HOOK and tenant-scoped lease admission where needed.
Keep logging off unless debugging and use least-privilege/isolated environments for autonomous runs.

Common Mistakes

Mixing debug flow into normal runs (keep logs off unless debugging).
Continuing to use stale refs after screen transitions.
Using URL opens with Android --activity (unsupported combination).
Treating boot as default first step instead of fallback.

SkillForge

agent-device

Before / After 效果对比

description 文档