playwright-interactive

Playwright Interactive Skill

Use a persistent js_repl Playwright session to debug local web or Electron apps, keep the same handles alive across iterations, and run functional plus visual QA without restarting the whole toolchain unless process ownership has changed.

Preconditions

js_repl must be enabled for this skill.
If js_repl is missing, enable it in ~/.codex/config.toml:

[features]
js_repl = true

You can also start a new session with --enable js_repl (equivalent to -c features.js_repl=true).
After enabling js_repl, start a new Codex session so the tool list refreshes.
For now, run this workflow with sandboxing disabled: start Codex with --sandbox danger-full-access (or the equivalent config for sandbox_mode=danger-full-access). This is a temporary requirement while js_repl + Playwright support inside the sandbox is still being completed.
Run setup from the same project directory you need to debug.
Treat js_repl_reset as a recovery tool, not routine cleanup. Resetting the kernel destroys your Playwright handles.

One-time setup

test -f package.json || npm init -y
npm install playwright
# Web-only, for headed Chromium or mobile emulation:
# npx playwright install chromium
# Electron-only, and only if the target workspace is the app itself:
# npm install --save-dev electron
node -e "import('playwright').then(() => console.log('playwright import ok')).catch((error) => { console.error(error); process.exit(1); })"

If you switch to a different workspace later, repeat setup there.

Core Workflow

1. Write a brief QA inventory before testing:
   - Build the inventory from three sources: the user's requested requirements, the user-visible features or behaviors you actually implemented, and the claims you expect to make in the final response.
   - Anything that appears in any of those three sources must map to at least one QA check before signoff.
   - List the user-visible claims you intend to sign off on.
   - List every meaningful user-facing control, mode switch, or implemented interactive behavior.
   - List the state changes or view changes each control or implemented behavior can cause.
   - Use this as the shared coverage list for both functional QA and visual QA.
   - For each claim or control-state pair, note the intended functional check, the specific state where the visual check must happen, and the evidence you expect to capture.
   - If a requirement is visually central but subjective, convert it into an observable QA check instead of leaving it implicit.
   - Add at least 2 exploratory or off-happy-path scenarios that could expose fragile behavior.
2. Run the bootstrap cell once.
3. Start or confirm any required dev server in a persistent TTY session.
4. Launch the correct runtime and keep reusing the same Playwright handles.
5. After each code change, reload for renderer-only changes or relaunch for main-process/startup changes.
6. Run functional QA with normal user input.
7. Run a separate visual QA pass.
8. Verify viewport fit and capture the screenshots needed to support your claims.
9. Clean up the Playwright session only when the task is actually finished.
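
The QA inventory described in this workflow can be kept as plain data in the same js_repl session so that gaps are easy to spot before signoff. The following is only a sketch under assumptions: the field names (source, item, functionalCheck, visualState, evidence, exploratory), the sample entries, and the helper findInventoryGaps are all hypothetical and not a format this skill prescribes.

```javascript
// Hypothetical inventory shape; field names and entries are illustrative only.
var qaInventory = [
  {
    source: "requirement", // one of "requirement", "implemented", "claim"
    item: "Theme toggle switches between light and dark",
    functionalCheck: "Click the toggle twice and confirm the theme changes and reverts",
    visualState: "Dark theme with the settings panel open",
    evidence: "Viewport screenshot in each theme",
    exploratory: false,
  },
  {
    source: "claim",
    item: "Chart legend stays readable at the minimum width",
    functionalCheck: "Resize to the minimum supported width",
    visualState: "Minimum width with the densest data set loaded",
    evidence: "", // deliberately left empty so the gap check has something to report
    exploratory: false,
  },
  {
    source: "implemented",
    item: "Rapid double-click on Save",
    functionalCheck: "Double-click Save and confirm a single record is created",
    visualState: "List view after the save completes",
    evidence: "Viewport screenshot of the list",
    exploratory: true,
  },
];

// Report entries missing a check, a visual state, or evidence, and
// flag the inventory if it has too few exploratory scenarios.
var findInventoryGaps = function (inventory, minExploratory = 2) {
  const gaps = [];
  for (const entry of inventory) {
    for (const field of ["functionalCheck", "visualState", "evidence"]) {
      if (!entry[field] || !String(entry[field]).trim()) {
        gaps.push(`${entry.item}: missing ${field}`);
      }
    }
  }
  const exploratoryCount = inventory.filter((entry) => entry.exploratory).length;
  if (exploratoryCount < minExploratory) {
    gaps.push(`only ${exploratoryCount} exploratory scenario(s); need at least ${minExploratory}`);
  }
  return gaps;
};

console.log(findInventoryGaps(qaInventory));
```

An empty result means every listed item has a functional check, a visual state, and expected evidence; it does not prove the inventory itself is complete.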

Bootstrap (Run Once)

var chromium;
var electronLauncher;
var browser;
var context;
var page;
var mobileContext;
var mobilePage;
var electronApp;
var appWindow;

try {
  ({ chromium, _electron: electronLauncher } = await import("playwright"));
  console.log("Playwright loaded");
} catch (error) {
  throw new Error(
    `Could not load playwright from the current js_repl cwd. Run the setup commands from this workspace first. Original error: ${error}`
  );
}

Binding rules:

Use var for the shared top-level Playwright handles because later js_repl cells reuse them.
The setup cells below are intentionally short happy paths. If a handle looks stale, set that binding to undefined and rerun the cell instead of adding recovery logic everywhere.
Prefer one named handle per surface you care about (page, mobilePage, appWindow) over repeatedly rediscovering pages from the context.
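
To judge whether a handle looks stale before clearing it, a small read-only report can help. This is a sketch, not part of the skill: describeHandles is a hypothetical helper name, and it relies only on browser.isConnected() and page.isClosed(), the same calls the cells in this document use.

```javascript
// Hypothetical helper: summarize which shared handles are still usable.
// It takes the handles as an argument and never modifies the bindings.
var describeHandles = function (handles) {
  const report = {};
  for (const [name, handle] of Object.entries(handles)) {
    if (!handle) {
      report[name] = "unset";
    } else if (typeof handle.isConnected === "function") {
      // Browser-like handle.
      report[name] = handle.isConnected() ? "live" : "stale";
    } else if (typeof handle.isClosed === "function") {
      // Page-like handle (web page or Electron window).
      report[name] = handle.isClosed() ? "stale" : "live";
    } else {
      // The handle exposes neither method, so this helper cannot tell.
      report[name] = "unknown";
    }
  }
  return report;
};

// In js_repl, after the bootstrap cell has declared the bindings:
// console.log(describeHandles({ browser, page, mobilePage, appWindow }));
```

Anything reported as stale is a candidate for setting to undefined and rerunning the matching setup cell.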

Shared web helpers:

var resetWebHandles = function () {
  context = undefined;
  page = undefined;
  mobileContext = undefined;
  mobilePage = undefined;
};

var ensureWebBrowser = async function () {
  if (browser && !browser.isConnected()) {
    browser = undefined;
    resetWebHandles();
  }

  browser ??= await chromium.launch({ headless: false });
  return browser;
};

var reloadWebContexts = async function () {
  for (const currentContext of [context, mobileContext]) {
    if (!currentContext) continue;
    for (const p of currentContext.pages()) {
      await p.reload({ waitUntil: "domcontentloaded" });
    }
  }
  console.log("Reloaded existing web tabs");
};

Choose Session Mode

For web apps, use an explicit viewport by default and treat native-window mode as a separate validation pass.

Use an explicit viewport for routine iteration, breakpoint checks, reproducible screenshots, snapshot diffs, and model-assisted localization. This is the default because it is stable across machines and avoids host window-manager variability.
When you need deterministic high-DPI behavior, keep the explicit viewport and add deviceScaleFactor rather than switching straight to native-window mode.
Use native-window mode (viewport: null) for a separate headed pass when you need to validate launched window size, OS-level DPI behavior, browser chrome interactions, or bugs that may depend on the host display configuration.
For Electron, always assume native-window behavior. Electron launches through Playwright with noDefaultViewport, so treat it like a real desktop window and check the as-launched size and layout before resizing anything.
When signoff depends on both layout breakpoints and real desktop behavior, do both passes: explicit viewport first for deterministic QA, then native-window validation for final environment-specific checks.
Treat switching modes as a context reset. Do not reuse a viewport-emulated context for a native-window pass or vice versa; close the old page and context, then create a new one for the new mode.
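
The mode choices above map onto a small set of browser.newContext() options. The helper below is a sketch under assumptions: contextOptionsForMode and its mode names are hypothetical, and the default size and scale are placeholders. The native-window branch deliberately omits deviceScaleFactor, because as far as this sketch assumes Playwright does not accept it together with viewport: null.

```javascript
// Hypothetical helper: translate a session mode into newContext() options.
var contextOptionsForMode = function (mode, { width = 1600, height = 900, scale = 2 } = {}) {
  switch (mode) {
    case "viewport":
      // Default: explicit, reproducible viewport.
      return { viewport: { width, height } };
    case "viewport-hidpi":
      // Deterministic high-DPI: keep the explicit viewport and add a scale factor.
      return { viewport: { width, height }, deviceScaleFactor: scale };
    case "native-window":
      // viewport: null lets the page follow the real window size.
      return { viewport: null };
    default:
      throw new Error(`Unknown session mode: ${mode}`);
  }
};

console.log(contextOptionsForMode("viewport-hidpi"));

// In js_repl, after closing the previous page and context:
// context = await browser.newContext(contextOptionsForMode("native-window"));
```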

Start or Reuse Web Session

Desktop and mobile web sessions share the same browser, helpers, and QA flow. The main difference is which context and page pair you create.

Desktop Web Context

Set TARGET_URL to the app you are debugging. For local servers, prefer 127.0.0.1 over localhost.

var TARGET_URL = "http://127.0.0.1:3000";

if (page?.isClosed()) page = undefined;

await ensureWebBrowser();
context ??= await browser.newContext({
  viewport: { width: 1600, height: 900 },
});
page ??= await context.newPage();

await page.goto(TARGET_URL, { waitUntil: "domcontentloaded" });
console.log("Loaded:", await page.title());

If context or page is stale, set context = page = undefined and rerun the cell.

Mobile Web Context

Reuse TARGET_URL when it already exists; otherwise set a mobile target directly.

var MOBILE_TARGET_URL = typeof TARGET_URL === "string"
  ? TARGET_URL
  : "http://127.0.0.1:3000";

if (mobilePage?.isClosed()) mobilePage = undefined;

await ensureWebBrowser();
mobileContext ??= await browser.newContext({
  viewport: { width: 390, height: 844 },
  isMobile: true,
  hasTouch: true,
});
mobilePage ??= await mobileContext.newPage();

await mobilePage.goto(MOBILE_TARGET_URL, { waitUntil: "domcontentloaded" });
console.log("Loaded mobile:", await mobilePage.title());

If mobileContext or mobilePage is stale, set mobileContext = mobilePage = undefined and rerun the cell.

Native-Window Web Pass

var TARGET_URL = "http://127.0.0.1:3000";

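// Ensure a connected headed browser; this also clears handles if the old browser died.
await ensureWebBrowser();

// Switching modes is a context reset: drop the viewport-emulated page and context first.
await page?.close().catch(() => {});
await context?.close().catch(() => {});
page = undefined;
context = undefined;

// viewport: null makes the page follow the real window size.
context = await browser.newContext({ viewport: null });
page = await context.newPage();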

await page.goto(TARGET_URL, { waitUntil: "domcontentloaded" });
console.log("Loaded native window:", await page.title());

Start or Reuse Electron Session

Set ELECTRON_ENTRY to "." when the current workspace is the Electron app and the main field in package.json points to the right entry file. If you need to target a specific main-process file directly, use a path such as "./main.js" instead.

var ELECTRON_ENTRY = ".";

if (appWindow?.isClosed()) appWindow = undefined;

if (!appWindow && electronApp) {
  await electronApp.close().catch(() => {});
  electronApp = undefined;
}

electronApp ??= await electronLauncher.launch({
  args: [ELECTRON_ENTRY],
});

appWindow ??= await electronApp.firstWindow();

console.log("Loaded Electron window:", await appWindow.title());

If js_repl is not already running from the Electron app workspace, pass cwd explicitly when launching.
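
A minimal sketch of passing cwd, assuming the app lives in a different directory from the js_repl working directory. The helper name electronLaunchOptions and the path "/path/to/electron-app" are placeholders; args and cwd are options accepted by the Electron launch call used in the cell above.

```javascript
// Hypothetical helper: build options for electronLauncher.launch().
// appDir is an assumed absolute path to the Electron app workspace.
var electronLaunchOptions = function (entry, appDir) {
  const options = { args: [entry] };
  if (appDir) options.cwd = appDir; // only set cwd when one is provided
  return options;
};

console.log(electronLaunchOptions(".", "/path/to/electron-app"));

// In js_repl:
// electronApp ??= await electronLauncher.launch(
//   electronLaunchOptions(ELECTRON_ENTRY, "/path/to/electron-app")
// );
```

Building the options in one place makes it easier to pass the same cwd again when relaunching.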

If the app process looks stale, set electronApp = appWindow = undefined and rerun the cell.

If you already have an Electron session but need a fresh process after a main-process, preload, or startup change, use the restart cell in the next section instead of rerunning this one.

Reuse Sessions During Iteration

Keep the same session alive whenever you can.

Web renderer reload:

await reloadWebContexts();

Electron renderer-only reload:

await appWindow.reload({ waitUntil: "domcontentloaded" });
console.log("Reloaded Electron window");

Electron restart after main-process, preload, or startup changes:

await electronApp.close().catch(() => {});
electronApp = undefined;
appWindow = undefined;

electronApp = await electronLauncher.launch({
  args: [ELECTRON_ENTRY],
});

appWindow = await electronApp.firstWindow();
console.log("Relaunched Electron window:", await appWindow.title());

If your launch requires an explicit cwd, include the same cwd here.

Default posture:

Keep each js_repl cell short and focused on one interaction burst.
Reuse the existing top-level bindings (browser, context, page, electronApp, appWindow) instead of redeclaring them.
If you need isolation, create a new page or a new context inside the same browser.
For Electron, use electronApp.evaluate(...) only for main-process inspection or purpose-built diagnostics.
Fix helper mistakes in place; do not reset the REPL unless the kernel is actually broken.
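
When isolation is needed, one option is a throwaway context inside the shared browser that is closed as soon as the probe finishes. The helper below is a sketch: withScratchPage is a hypothetical name, and the usage comment reuses browser and TARGET_URL from the earlier cells.

```javascript
// Hypothetical helper: run a probe in a temporary context, then close
// only that context so the shared browser and named handles stay alive.
var withScratchPage = async function (browserHandle, contextOptions, probe) {
  const scratchContext = await browserHandle.newContext(contextOptions);
  try {
    const scratchPage = await scratchContext.newPage();
    return await probe(scratchPage);
  } finally {
    // Closing the context also closes the pages that belong to it.
    await scratchContext.close();
  }
};

// In js_repl:
// await withScratchPage(
//   browser,
//   { viewport: { width: 1280, height: 720 } },
//   async (scratchPage) => {
//     await scratchPage.goto(TARGET_URL, { waitUntil: "domcontentloaded" });
//     return scratchPage.title();
//   }
// );
```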

Checklists

Session Loop

Bootstrap js_repl once, then keep the same Playwright handles alive across iterations.
Launch the target runtime from the current workspace.
Make the code change.
Reload or relaunch using the correct path for that change.
Update the shared QA inventory if exploration reveals an additional control, state, or visible claim.
Re-run functional QA.
Re-run visual QA.
Capture final artifacts only after the current state is the one you are evaluating.

Reload Decision

Renderer-only change: reload the existing page or Electron window.
Main-process, preload, or startup change: relaunch Electron.
New uncertainty about process ownership or startup code: relaunch instead of guessing.
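
The reload decision above reduces to a small lookup. This is a sketch with hypothetical names: decideRefreshAction and the change-kind labels are illustrative, and anything not known to be renderer-only falls through to a relaunch.

```javascript
// Hypothetical helper: map the kind of code change to the refresh path.
var decideRefreshAction = function (changeKind) {
  if (changeKind === "renderer") {
    return "reload"; // reload the existing page or Electron window
  }
  // "main-process", "preload", "startup", or anything uncertain:
  // relaunch instead of guessing.
  return "relaunch";
};

console.log(decideRefreshAction("renderer"));
console.log(decideRefreshAction("preload"));
```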

Functional QA

Use real user controls for signoff: keyboard, mouse, click, touch, or equivalent Playwright input APIs.
Verify at least one end-to-end critical flow.
Confirm the visible result of that flow, not just internal state.
For realtime or animation-heavy apps, verify behavior under actual interaction timing.
Work through the shared QA inventory rather than ad hoc spot checks.
Cover every obvious visible control at least once before signoff, not only the main happy path.
For reversible controls or stateful toggles in the inventory, test the full cycle: initial state, changed state, and return to the initial state.
After the scripted checks pass, do a short exploratory pass using normal input for 30-90 seconds instead of following only the intended path.
If the exploratory pass reveals a new state, control, or claim, add it to the shared QA inventory and cover it before signoff.
page.evaluate(...) and electronApp.evaluate(...) may inspect or stage state, but they do not count as signoff input.
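
The full-cycle rule for reversible controls can be sketched as a small helper that drives the control with real clicks and reads back what the user would see. Everything named here is an assumption: checkToggleCycle is hypothetical, control is any object with an async click() such as a Playwright locator, and readVisibleState is a callback you supply. The role name and attribute in the usage comment are placeholders.

```javascript
// Hypothetical helper: exercise a reversible control through
// initial state, changed state, and return to the initial state.
var checkToggleCycle = async function (control, readVisibleState) {
  const initial = await readVisibleState();
  await control.click(); // real user input, so it counts toward signoff
  const changed = await readVisibleState();
  await control.click();
  const restored = await readVisibleState();
  return {
    initial,
    changed,
    restored,
    passed: changed !== initial && restored === initial,
  };
};

// In js_repl:
// console.log(await checkToggleCycle(
//   page.getByRole("button", { name: "Dark mode" }),
//   () => page.locator("html").getAttribute("data-theme")
// ));
```

A passing result covers the functional cycle only; the visual check for each state still belongs in the separate visual QA pass.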

Visual QA

Treat visual QA as separate from functional QA.
Use the same shared QA inventory defined before testing and updated during QA; do not start visual coverage from a different implicit list.
Restate the user-visible claims and verify each one explicitly; do not assume a functional pass proves a visual claim.
A user-visible claim is not signed off until it has been inspected in the specific state where it is meant to be perceived.
Inspect the initial viewport before scrolling.
Confirm that the initial view visibly supports the interface's primary claims; if a core promised element is not clearly perceptible there, treat that as a bug.
Inspect all required visible regions, not just the main interaction surface.
Inspect the states and modes already enumerated in the shared QA inventory, including at least one meaningful post-interaction state when the task is interactive.
If motion or transitions are part of the experience, inspect at least one in-transition state in addition to the settled endpoints.
If labels, overlays, annotations, guides, or highlights are meant to track changing content, verify that relationship after the relevant state change.
For dynamic or interaction-dependent visuals, inspect long enough to judge stability, layering, and readability; do not rely on a single screenshot for signoff.
For interfaces that can become denser after loading or interaction, inspect the densest realistic state you can reach during QA, not only the empty, loading, or collapsed state.
If the product has a defined minimum supported viewport or window size, run a separate visual QA pass there; otherwise, choose a smaller but still realistic size and inspect it explicitly.
Distinguish perceptible presence from mere implementation: if an intended affordance is technically there but not clearly perceptible because of weak contrast, occlusion, clipping, or instability, treat that as a visual failure.
If any required visible region is clipped, cut off, obscured, or pushed outside the viewport in the state you are evaluating, treat that as a bug even if page-level scroll metrics appear acceptable.
Look for clipping, overflow, distortion, layout imbalance, inconsistent spacing, alignment problems, illegible text, weak contrast, broken layering, and awkward motion states.
Judge aesthetic quality as well as correctness. The UI should feel intentional, coherent, and visually pleasing for the task.
Prefer viewport screenshots for signoff. Use full-page captures only as secondary debugging artifacts, and capture a focused screenshot when a region needs closer inspection.
If motion makes a screenshot ambiguous, wait briefly for the UI to settle, then capture the i

...

