---
id: daily-momentic-result-classification
name: "momentic-result-classification"
url: https://skills.yangsir.net/skill/daily-momentic-result-classification
author: momentic-ai
domain: testing
tags: ["testing", "classification", "momentic", "e2e", "analysis"]
install_count: 11700
rating: 4.50 (8 reviews)
github: https://github.com/momentic-ai/skills
---

# momentic-result-classification

> Classifies and analyzes Momentic test results to identify failure patterns, performance issues, and anomalies.

**Stats**: 11,700 installs · 4.5/5 (8 reviews)

## Before / After comparison

### Intelligent classification of Momentic test failures

**Before**:

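When a Momentic test fails, engineers have to dig into the cause by hand. That usually means downloading and unzipping the run results (such as `runId.zip`), then inspecting `metadata.json`, logs, screenshots, network requests, and other assets one by one. Spotting recurring patterns or regressions also takes considerable time spent manually finding and comparing past runs of the same test, especially across different Git branches. The whole process depends heavily on individual experience, is slow and error-prone, and produces inconsistent failure classifications, which seriously slows root-cause analysis and fixes and reduces the speed and accuracy of test feedback.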

**After**:

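With the `momentic-result-classification` skill, analysis and classification of Momentic test failures is automated. The skill calls `momentic_get_run` to fetch details of the current run and uses `momentic_list_runs` to retrieve and compare relevant past runs (including runs on a specific Git branch). It processes raw test results quickly, identifies failure patterns, performance issues, and anomalies, and assigns each failure to a predefined category with a clear explanation. This greatly shortens diagnosis time and improves classification consistency and accuracy, so engineers can understand the problem and act sooner, which speeds up the test feedback loop.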

| Metric | Before | After | Change |
|---|---|---|---|
| Failure classification time | 20 min | 2 min | -90% |
| Failure classification consistency | 65% | 95% | +46.15% |

## Readme

# Momentic result classification (MCP)

Momentic is an end-to-end testing framework where each test is composed of browser interaction steps. Each step combines Momentic-specific behavior (AI checks, natural-language locators, AI actions, etc.) with Playwright capabilities wrapped in our YAML step schema. When these tests are run, they produce results data that can be used to analyze the outcome of the test. The results data contains metadata about the run as well as any assets generated by the run (e.g. screenshots, logs, network requests, video recordings). Your job is to use these test results to classify failures that occurred in Momentic test runs.

## Instructions

- Given a failing test run, analyze why the test run failed. Often you'll need to look beyond the current run to understand this, at past runs of the same test or at other context provided by the Momentic MCP tools.

- After analyzing why the run failed, bucket the failure into one of the below categories, explaining the reasoning for choosing the specific category.

## Helpful MCP tools

`momentic_get_run` — Returns some metadata about the run and the path to the full run results. Use the metadata to help you parse through the run results (e.g. which attempt to look at, which step failed, etc.)

`momentic_list_runs` — Recent runs for a test so you can compare the result of past runs over time. **Always pass `gitBranchName` when it exists on the run in question** so that it's more likely you're looking at the same version of the test.

## Background

### Test run result structure

When Momentic tests are run via the CLI, the results are stored in a "run group". The data for this run group is stored in a single directory within the Momentic project. By default, the directory is called `test-results`, but it can be changed in the Momentic project settings or overridden for a single run of a run group. The run group results folder has the following structure:

```
test-results/
├── metadata.json         data about the run group, including git metadata and timing info.
└── runs/                 One zip for each test run in the run group.
    ├── <runId_1>.zip         a zipped run directory containing data about this specific test run.  Follows the structure described below.
    └── <runId_2>.zip

```

When unzipped, run directories have the following structure:

```
<runId>/
├── metadata.json           run-level metadata.
└── attempts/<n>/           one folder per attempt (1-based n).
    ├── metadata.json       attempt outcome and step results.
    ├── console.json        optional browser console output.
    └── assets/
        ├── <snapshotId>.jpeg     before/after screenshot for each step (see attempt metadata.json for snapshot ID).
        ├── <snapshotId>.html     before/after DOM snapshot for each step (see attempt metadata.json for snapshot ID).
        ├── har-pages.log         HAR pages (ndjson).
        ├── har-entries.log       HAR network entries (ndjson).
        ├── resource-usage.ndjson CPU/memory samples taken during the attempt.
        ├── <videoName>           video recording (when video recording is enabled).
        └── browser-crash.zip     browser crash dump (only present on crash).

```
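As an illustration of walking this layout by hand, here is a minimal Python sketch that unzips one run archive, lists its attempts, and parses an ndjson asset. Only the file and folder names come from the structure above. The helper names (`extract_run`, `list_attempts`, `read_ndjson`) are hypothetical, and no assumption is made about the fields inside any `metadata.json`.

```python
import json
import zipfile
from pathlib import Path


def extract_run(zip_path: Path, dest_root: Path) -> Path:
    """Unzip one <runId>.zip under dest_root and return the run directory."""
    dest = dest_root / zip_path.stem
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    # Assumption: the archive may or may not wrap its contents in a <runId>/ folder.
    nested = dest / zip_path.stem
    return nested if nested.is_dir() else dest


def list_attempts(run_dir: Path) -> list[tuple[int, dict]]:
    """Return (attempt number, attempt metadata) pairs, sorted by attempt number."""
    attempts = []
    for attempt_dir in (run_dir / "attempts").iterdir():
        meta_file = attempt_dir / "metadata.json"
        if attempt_dir.name.isdigit() and meta_file.is_file():
            meta = json.loads(meta_file.read_text(encoding="utf-8"))
            attempts.append((int(attempt_dir.name), meta))
    return sorted(attempts, key=lambda pair: pair[0])


def read_ndjson(path: Path) -> list[dict]:
    """Parse a newline-delimited JSON file such as har-entries.log."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

For example, `list_attempts(extract_run(Path("test-results/runs/<runId>.zip"), Path("unzipped")))` would give the attempt metadata in order, and `read_ndjson` can then be pointed at `attempts/<n>/assets/har-entries.log`.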

When getting run results via the Momentic MCP, tools such as `momentic_get_run` will return links to the MCP working directory (default `.momentic-mcp`). This directory will contain unzipped run result folders, following the structure above, named `run-result-<runId>`.

### Element locators

Certain step types that interact with elements have a "target" property, or **locator**, that specifies which element the step should interact with.

#### Locator caches

Locators identify elements by sending the page state (HTML/XML) and a screenshot to an LLM. The LLM identifies which element on the page the user is referring to. Momentic will attempt to "cache" the answer from the LLM so that future runs don't require AI calls. On future runs, the page state is checked against the cached element to determine whether the element is still usable, or whether the page has changed enough that another AI call is required.

A locator cache can bust for a variety of reasons:

- the element description has changed, in which case we'll always bust the cache

- the cached element could not be located in the current page state

- the cached element was located in the page state, but fails certain checks specified on the cache entry, such as requiring a certain position, shape, or content.

You can find the `cacheBustReason` on the `trace` property in the results for a given step. The `cache` property is also listed on the results, showing the full cache saved for that element.
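Because the exact nesting of step results is not spelled out here, one cautious way to find every `cacheBustReason` is a generic recursive search over the attempt's parsed `metadata.json`. The sketch below assumes nothing about the schema beyond the key name. The helper `find_key` is hypothetical, and the sample data in the usage note is made up for illustration.

```python
from typing import Any, Iterator


def find_key(node: Any, key: str, path: str = "$") -> Iterator[tuple[str, Any]]:
    """Yield (json_path, value) for every occurrence of `key` in nested JSON data."""
    if isinstance(node, dict):
        for k, v in node.items():
            child = f"{path}.{k}"
            if k == key:
                yield child, v
            # Keep descending: the same key may also appear deeper in the tree.
            yield from find_key(v, key, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_key(item, key, f"{path}[{i}]")
```

Calling `list(find_key(attempt_metadata, "cacheBustReason"))` returns each reason together with the path where it was found, which shows which step result it belongs to.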

#### Identifying bad caches

Sometimes the element that was cached is not the element that the user intended to target. This can cause failures or unexpected behaviors in tests. In these cases, it helps to verify exactly why the wrong cache was saved in the first place. Use the `runId` property of the `targetUpdateLoggerTags` on the incorrect cache to get the details of the original run, calling `momentic_get_run` with this runId. This will return the run where the cache target was updated.

## Using past runs

You MUST look at past runs of the same test when understanding why a test failed. Looking at past runs helps you identify:

- When did this test start failing?

- What differed vs the last passing run?

- Did the same action behave differently on an earlier run?

Use step results and screenshots on past runs to answer these questions. Do NOT rely only on summaries from `momentic_get_run` or `momentic_list_runs` to understand what happened in a test run. You MUST look at the specific run details, including step results and screenshots, to determine the behavior of past runs.

When looking at past runs, use the following workflow:

- Call the `momentic_list_runs` tool to identify the runs you want more detail on.

- Call `momentic_get_run` for that specific run to get the run details.

**ALWAYS** look at screenshots when determining the behavior of test runs.

### Multi-attempt runs

When `momentic_list_runs` shows a passing run with `attempts > 1`, treat it as a partial failure worth investigating, not a clean passing run. Pull the first attempt's step results and failure messages to understand what was going wrong before the retry succeeded.
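A small filter can make these retried passes stand out in a list of runs. This is only a sketch: the field names `status` and `attempts` and the value `"PASSED"` are assumptions about the shape of the `momentic_list_runs` response, so adjust them to whatever the tool actually returns.

```python
def retried_passes(runs: list[dict]) -> list[dict]:
    """Return runs that passed overall but needed more than one attempt."""
    return [
        run
        for run in runs
        # Assumed fields: "status" and "attempts" (hypothetical names).
        if run.get("status") == "PASSED" and run.get("attempts", 1) > 1
    ]
```

Each run returned by this filter is a candidate for pulling the first attempt's step results and failure messages.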

### Flakiness and intermittent failures

- In order to consider a test flaky or failing intermittently, it must be intermittently failing **for the same app and test behavior**.

Just because a test failed once does NOT mean that it's flaky - it could have failed because of an application change. You need to determine whether or not there was an application or test change between runs by analyzing the screenshots and/or browser state in the results.

- **IMPORTANT**: You cannot make assumptions about flakiness or intermittent failures without verifying whether there was an application or test change that caused the failure

### Test temporality

- Any past results may not necessarily match today’s test file. The test may have changed, meaning the result was on a different version of the test.

- Looking at the `simplifiedTestSteps` property in the response from `momentic_get_run` can help you determine whether the test has changed.

- For specific step configuration details, look at the `stepsSnapshot` property of the full run results.
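One way to check whether the test itself changed between two runs is to diff their `simplifiedTestSteps`. The sketch below assumes that property can be treated as a list of strings, one per step, which may not match the real response shape. The helper `steps_diff` is hypothetical.

```python
import difflib


def steps_diff(old_steps: list[str], new_steps: list[str]) -> list[str]:
    """Return unified-diff lines between two step lists; empty if they are identical."""
    return list(
        difflib.unified_diff(
            old_steps,
            new_steps,
            fromfile="past run",
            tofile="current run",
            lineterm="",
        )
    )
```

An empty result suggests the two runs used the same test steps. A non-empty diff means a behavior difference between the runs may come from a test edit rather than from the application.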

## Identifying related vs unrelated issues

- Use the test name, description, and the `simplifiedTestSteps` property on the response from `momentic_get_run` to determine what the test is intending to verify.

- Failures outside that intent are unrelated; otherwise, consider them related.

- Failures in setup or teardown steps are almost always considered unrelated.

## Bug vs change

- Bug: something very clearly went wrong when it shouldn't have, such as an error message appearing. It's obvious just by looking at a single step or two that this is a bug.

- Change: any other behavior changes in the application
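When a failure has been attributed to the application (rather than to the test, infrastructure, performance, or Momentic itself), the related/unrelated and bug/change judgments combine into one of the four application category ids. The sketch below shows that mapping. The function name `application_category` is hypothetical, and the two boolean inputs still have to come from your own investigation.

```python
def application_category(related: bool, is_bug: bool) -> str:
    """Combine the two judgments into one of the four *_APPLICATION_* category ids."""
    scope = "RELATED" if related else "UNRELATED"
    kind = "BUG" if is_bug else "CHANGE"
    return f"{scope}_APPLICATION_{kind}"
```

For instance, an error banner appearing during a teardown step would be `application_category(related=False, is_bug=True)`, which gives `UNRELATED_APPLICATION_BUG`.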

## Formal classification output

- Exactly one category id — no new labels, no multi-label.

- Ground your decision in data. Be sure that you've fully investigated the run before assigning the category.

```
Reasoning: <a few sentences tied to summary, past runs, and intent>
Category: <one id from the list>

```

## Category ids

Use these strings verbatim:

- `NO_FAILURE` — Nothing failed; all attempts passed.

- `RELATED_APPLICATION_CHANGE` — Related to intent; expectation drift / change, not a clear defect.

- `RELATED_APPLICATION_BUG` — Related to intent; clearly incorrect behavior.

- `UNRELATED_APPLICATION_CHANGE` — Outside intent; not a clear bug.

- `UNRELATED_APPLICATION_BUG` — Outside intent but clearly broken.

- `TEST_CAN_BE_IMPROVED` — Test/automation issue (race, vague locator or assertion).

- `INFRA` — Rare or external (browser crash, resource pressure, rate limits, flaky environment).

- `PERFORMANCE` — Load/responsiveness (stuck spinner, assertion timeouts) when not pure infra.

- `MOMENTIC_ISSUE` — There was an issue with momentic itself, the platform running the test (e.g. an AI hallucination, data issues, incorrectly redirecting to the wrong element).
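
If the classification output is consumed by a script, a strict parser can enforce the "exactly one category id, verbatim" rule. The following is a minimal sketch, not part of the skill itself: the function name `parse_classification` and the regular expression are illustrative choices, while the category ids are copied from the list above.

```python
import re

CATEGORY_IDS = frozenset({
    "NO_FAILURE",
    "RELATED_APPLICATION_CHANGE",
    "RELATED_APPLICATION_BUG",
    "UNRELATED_APPLICATION_CHANGE",
    "UNRELATED_APPLICATION_BUG",
    "TEST_CAN_BE_IMPROVED",
    "INFRA",
    "PERFORMANCE",
    "MOMENTIC_ISSUE",
})

# Expects "Reasoning: ..." followed by a final "Category: <id>" line.
_OUTPUT_RE = re.compile(
    r"\AReasoning:\s*(?P<reasoning>.+?)\s*\nCategory:\s*(?P<category>\S+)\s*\Z",
    re.DOTALL,
)


def parse_classification(text: str) -> tuple[str, str]:
    """Return (reasoning, category) or raise ValueError if the output is malformed."""
    match = _OUTPUT_RE.match(text.strip())
    if match is None:
        raise ValueError("expected 'Reasoning: ...' followed by a single 'Category: <id>' line")
    category = match["category"]
    if category not in CATEGORY_IDS:
        raise ValueError(f"unknown category id: {category!r}")
    return match["reasoning"], category
```

Multi-label output such as `Category: INFRA, PERFORMANCE` and invented labels are both rejected with a `ValueError`.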

---
*Source: https://skills.yangsir.net/skill/daily-momentic-result-classification*
*Markdown mirror: https://skills.yangsir.net/api/skill/daily-momentic-result-classification/markdown*