
minimal-run-and-audit

by @lllllllamav
4.8(656)

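RigorPilot Skills are research-first agent skills for deep learning experiments. They enable AI agents to reproduce, improve, explore, and audit deep learning research projects with scientific rigor, ensuring meaningful changes, fair comparison, reproducible evidence, and auditable modifications.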

Tags: deep-learning, ai-research, reproducibility, agent-skills, experimentation
GitHub
Install
npx skills add lllllllama/ai-paper-reproduction-skill --skill minimal-run-and-audit

Before / After Comparison

1
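Before

Without standardized tooling, AI agents and researchers who reproduce and audit deep learning experiments often lose large amounts of time troubleshooting environment differences, code changes, and poor data management. Research progress slows and results become unreliable.

After

RigorPilot keeps experiments scientifically rigorous, reproducible, and auditable. AI agents can quickly identify and verify experiment settings and results, which cuts troubleshooting time, speeds up research iteration, and makes results more trustworthy.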

SKILL.md

RigorPilot Skills

Research-first Agent Skills for Deep Learning Experiments.

RigorPilot helps AI agents reproduce, improve, and explore deep learning research projects with scientific rigor: meaningful changes, fair comparison, reproducible evidence, and auditable modifications.

Not just higher scores. Meaningful deep learning research progress.

Brand note: the project brand is RigorPilot Skills; the recommended GitHub repository slug is rigorpilot-skills. Legacy install paths remain documented only as compatibility fallbacks while older clients and bookmarks migrate.

Migration note:

  • Project brand: ai-research-workflow-skills -> RigorPilot Skills
  • Existing compatible skill slugs remain available.
  • Preferred install source: lllllllama/rigorpilot-skills
  • Legacy fallback source: lllllllama/ai-paper-reproduction-skills
  • Renamed skill slug: ai-paper-reproduction -> ai-research-reproduction
  • Renamed skill slug: research-explore -> ai-research-explore

What RigorPilot Is

  • Research-first Agent Skills for deep learning experiments.
  • It helps AI agents reproduce, improve, explore, and audit deep learning research work.
  • It is designed for personal research use first.
  • It values scientific meaning, fair comparison, reproducibility, explainability, and collaborator control.
  • It encourages meaningful novelty during exploration, but does not overclaim novelty.

What RigorPilot Is Not

  • Not a generic coding agent.
  • Not a score-chasing automation framework.
  • Not a guarantee of novel discoveries.
  • Not a replacement for researcher judgment.
  • Not a rigid workflow that should weaken strong models.

Core Principles

  1. Do not chase scores blindly.
  2. Do not claim novelty lightly.
  3. Do not break comparability silently.
  4. Do not disguise engineering fixes as research contributions.
  5. Do not leave collaborators out of control.

See references/research-rigor-principles.md.

Rigor and Novelty

Rigor is the baseline. Novel is the aspiration.

Novelty and significance remain hypotheses until supported by literature contrast, ablation evidence, and fair comparison.

RigorPilot should add research judgment and audit awareness without making strong models slower, more mechanical, or less capable.

Deep Learning Focus

RigorPilot is built for deep learning research repositories where README commands, environment setup, data, weights, checkpoints, training, evaluation, metrics, logs, baselines, SOTA tables, and ablations all matter.

This repository is still built around one compatibility rule: trusted by default.

  • Ambiguous requests route to the trusted lane.
  • Exploration requires explicit authorization.
  • Trusted outputs are auditable and durable.
  • Explore outputs are candidate-only and disposable.
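The trusted-by-default rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Request` class, the `explore_authorized` flag, and the `OUTPUT_POLICY` table are hypothetical names and are not part of the repository.

```python
from dataclasses import dataclass

TRUSTED = "trusted"
EXPLORE = "explore"


@dataclass(frozen=True)
class Request:
    """Hypothetical request record; not a RigorPilot type."""
    text: str
    # Assumption: explicit authorization is carried as a boolean flag.
    explore_authorized: bool = False


def route_lane(request: Request) -> str:
    """Ambiguous or unauthorized requests fall back to the trusted lane."""
    return EXPLORE if request.explore_authorized else TRUSTED


# Output expectations per lane, paraphrasing the four bullets above.
OUTPUT_POLICY = {
    TRUSTED: {"candidate_only": False, "durable": True},
    EXPLORE: {"candidate_only": True, "durable": False},
}

lane = route_lane(Request("maybe try a different optimizer?"))
print(lane, OUTPUT_POLICY[lane])
```

The point of the sketch is that the explore lane is only reachable through an explicit opt-in; the wording of the request alone never selects it.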

Shared operating principles live in references/agent-operating-principles.md. They keep the skills focused on high-level guidance: think before acting, keep the solution small, change only what is necessary, and work toward verifiable goals. They are guardrails, not a detailed script for every implementation choice.

🧭 Current Repo Snapshot

This repository currently ships:

  • 11 skills total: 9 public skills and 2 helper skills.
  • 6 trusted-lane public skills and 3 explore-lane public skills.
  • 4 project-scoped Claude Code command wrappers under .claude/commands/.
  • 45 Python scripts, including 43 test scripts with focused research-explore regressions and document-structure checks.
  • A RigorPilot Explore chain that now includes bounded idea-seed generation, explicit idea score breakdowns, atomic idea decomposition, and implementation-fidelity evidence split into planned, heuristic, and observed layers.
  • A documented and tested workflow intended to be usable from both Windows PowerShell and Linux shells.

The skills use the open SKILL.md layout, so the same repository can be installed into neutral Agent Skills directories as well as Codex and Claude Code. For shared local installs, prefer ~/.agents/skills/ or ./.agents/skills/. Client-specific installs under ~/.codex/skills/ and ~/.claude/skills/ remain supported.

💻 Windows and Linux Notes

This repository is intended to be usable on both Windows and Linux.

  • The command examples below are written in a shell-neutral style around python ..., npx ..., and relative paths.
  • For user-scoped install targets, prefer $HOME/.agents/skills, $HOME/.codex/skills, and $HOME/.claude/skills. These work well in Linux shells and in PowerShell, and Python accepts forward slashes on Windows paths.
  • Project-scoped paths such as ./.agents/skills and ./tmp/codex-skills are also valid on both platforms.
  • The repository validation and routing checks are already exercised on Windows and Linux-oriented environments through local tests and CI.
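As a rough illustration of the path conventions above, the following sketch builds user-scoped and project-scoped install targets with `pathlib`, which handles separators on both Windows and Linux. The helper names `user_skills_dir` and `project_skills_dir` are hypothetical and do not come from the repository's installer.

```python
from pathlib import Path
from typing import Optional

# Client name -> dot-directory, taken from the install targets named above.
CLIENT_DIRS = {"agents": ".agents", "codex": ".codex", "claude": ".claude"}


def user_skills_dir(client: str, home: Optional[Path] = None) -> Path:
    """User-scoped target such as ~/.codex/skills (hypothetical helper)."""
    base = home if home is not None else Path.home()
    return base / CLIENT_DIRS[client] / "skills"


def project_skills_dir(client: str, project_root: Path = Path(".")) -> Path:
    """Project-scoped target such as ./.agents/skills (hypothetical helper)."""
    return project_root / CLIENT_DIRS[client] / "skills"


print(user_skills_dir("agents"))
# as_posix() renders forward slashes on every platform.
print(project_skills_dir("claude").as_posix())
```

Building paths from components rather than concatenating strings avoids having to choose between forward slashes and backslashes by hand.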

🛠️ Install

For most users, start with npx. It is the shortest path and should be enough for normal use.

Recommended: npx

Install the full repository skill set:

npx skills add lllllllama/rigorpilot-skills --all

Install only the trusted main entrypoint:

npx skills add lllllllama/rigorpilot-skills --skill ai-research-reproduction

Install only the exploratory main entrypoint:

npx skills add lllllllama/rigorpilot-skills --skill ai-research-explore

If you only want to get started quickly, stop here.

Claude Code can auto-invoke these skills when the descriptions match, or you can call them directly with commands such as /ai-research-reproduction, /ai-research-explore, and /safe-debug.

Project-scoped Claude Code slash commands currently ship for:

  • /ai-research-reproduction
  • /ai-research-explore
  • /analyze-project
  • /safe-debug

Advanced: local clone installs

Use the Python installer only if you are developing locally, need a project-scoped install, or want to target neutral Agent Skills, Codex, or Claude Code directories manually.

Install from a local clone into a neutral Agent Skills directory:

python scripts/install_skills.py --client agents --target "$HOME/.agents/skills" --force

Install into a project-scoped neutral Agent Skills directory:

python scripts/install_skills.py --client agents --target ./.agents/skills --force

Install with the default neutral target:

python scripts/install_skills.py --force

Install the full repository skill set in Codex:

npx skills add lllllllama/rigorpilot-skills --all

Install only the trusted reproduction orchestrator in Codex:

npx skills add lllllllama/rigorpilot-skills --skill ai-research-reproduction

Legacy GitHub source fallback, if the new slug is not yet available in your environment:

npx skills add lllllllama/ai-paper-reproduction-skills --all

Install from a local clone into Codex:

python scripts/install_skills.py --client codex --target "$HOME/.codex/skills" --force

Install from a local clone into Claude Code:

python scripts/install_skills.py --client claude --target "$HOME/.claude/skills" --force

Install into a project-scoped Claude Code skills directory:

python scripts/install_skills.py --client claude --target ./.claude/skills --force

PowerShell note:

  • In Windows PowerShell, the same commands work as written above.
  • If you prefer explicit Windows-style paths, replace $HOME/.codex/skills with something like $env:USERPROFILE\.codex\skills.

🎯 Choose an Entry Point

RigorPilot modes map to the current compatible skill slugs:

| If you want to... | RigorPilot mode | Current skill slug |
| --- | --- | --- |
| Reproduce a deep learning repository from README commands | Reproduce | ai-research-reproduction |
| Explore meaningful and potentially novel ideas on top of current research | Explore | ai-research-explore |
| Improve a baseline while preserving comparability | Improve | ai-research-explore, explore-code, explore-run |
| Audit changes, scientific meaning, and comparability | Audit | analyze-project, safe-debug, generated reports |
| Analyze repository structure without editing | Analyze | analyze-project |
| Prepare environment, datasets, weights, and cache assumptions | Setup | env-and-assets-bootstrap |
| Run documented evaluation or inference conservatively | Run | minimal-run-and-audit |
| Start or verify training conservatively | Train | run-train |
| Debug a failure safely | Debug | safe-debug |

Bundled helper skills:

  • repo-intake-and-plan
  • paper-context-resolver

🛣️ Lane Model

🔒 Trusted Lane

Use the trusted lane for reproduction, setup, analysis, bounded execution, training verification, and debugging.

  • Primary end-to-end orchestrator: ai-research-reproduction
  • Output directories: repro_outputs/, train_outputs/, analysis_outputs/, debug_outputs/
  • Default stance: preserve scientific meaning, minimize semantic changes, surface assumptions and blockers

🧪 Explore Lane

Use the explore lane only when the researcher explicitly authorizes candidate-only exploratory work.

  • Primary end-to-end orchestrator: ai-research-explore
  • Narrow leaf skills: explore-code, explore-run
  • Output directory: explore_outputs/
  • Key anchor: current_research

current_research should be a durable reference such as a branch, commit, checkpoint, run record, or already-trained local model state. It does not imply a trusted baseline; it is the context the exploration branches from.
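One way to picture the anchor is as a small validated record. The sketch below is an assumption for illustration only: the `CurrentResearch` class, its field names, and the `kind` vocabulary are hypothetical and are not the repository's actual representation.

```python
from dataclasses import dataclass

# Kinds of durable reference listed in the paragraph above (names are illustrative).
ANCHOR_KINDS = {"branch", "commit", "checkpoint", "run_record", "local_model_state"}


@dataclass(frozen=True)
class CurrentResearch:
    """Hypothetical anchor record: what the exploration branches from."""
    kind: str
    ref: str

    def __post_init__(self) -> None:
        if self.kind not in ANCHOR_KINDS:
            raise ValueError(f"unsupported anchor kind: {self.kind!r}")
        if not self.ref.strip():
            raise ValueError("anchor reference must not be empty")


# Note: a valid anchor is only context; it does not mark a trusted baseline.
anchor = CurrentResearch(kind="commit", ref="<commit-sha>")
print(anchor)
```

Rejecting empty or unknown anchors early matches the idea that exploration should stop when the anchor is missing rather than guess one.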

🧰 Helper Lane

Helpers are intentionally narrow and should usually be orchestrator-invoked rather than used as the first entry point.

🔗 Client Compatibility

SKILL.md is the canonical cross-client contract in this repository.

  • Required for portability: SKILL.md, repository-local scripts/, and references/
  • Optional Codex UI metadata: agents/openai.yaml
  • Optional Claude Code project entrypoints: .claude/commands/*.md
  • Not allowed: making skill behavior depend on a client-specific metadata file
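A minimal portability check along these lines might look as follows. It is a sketch under an assumption: it treats a single directory as holding SKILL.md, scripts/, and references/ together, which may not match how the repository actually lays out each skill. The function name `portability_problems` is hypothetical.

```python
from pathlib import Path
from typing import List


def portability_problems(root: Path) -> List[str]:
    """List the required portable pieces that are missing under root."""
    problems = []
    if not (root / "SKILL.md").is_file():
        problems.append("missing SKILL.md")
    for name in ("scripts", "references"):
        if not (root / name).is_dir():
            problems.append(f"missing {name}/")
    # Optional client metadata (agents/openai.yaml, .claude/commands/*.md)
    # is deliberately not checked: behavior must not depend on it.
    return problems


print(portability_problems(Path(".")))
```

An empty list means the three required pieces are present; the optional client files are ignored on purpose.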

See references/client-compatibility-policy.md.

🔁 Lifecycle View

The repository follows a lifecycle-oriented routing model:

flowchart LR
    A[Understand] --> B[Reproduce]
    B --> C[Set up]
    C --> D[Run or train]
    D --> E[Debug]
    E --> F[Report]
    B -. explicit only .-> G[Explore]
    G --> H[Rank candidates]
    H --> F

This lifecycle is intentionally shallow. It helps the agent choose the right lane and evidence target without forcing a fixed implementation sequence inside each repository.

🗺️ Routing Overview

flowchart TD
    A[User request] --> B{Explicit candidate-only exploration?}
    B -- No --> C[Trusted lane]
    B -- Yes --> D[Explore lane]

    C --> C1[ai-research-reproduction]
    C --> C2[analyze-project]
    C --> C3[env-and-assets-bootstrap]
    C --> C4[minimal-run-and-audit]
    C --> C5[run-train]
    C --> C6[safe-debug]

    D --> D1[ai-research-explore]
    D --> D2[explore-code]
    D --> D3[explore-run]

    C1 -. helper .-> H1[repo-intake-and-plan]
    C1 -. helper .-> H2[paper-context-resolver]

🧠 RigorPilot Explore Flow

ai-research-explore is the RigorPilot Explore entrypoint when the researcher has already frozen the task family, dataset, evaluation method, and provided SOTA references, then explicitly authorizes candidate-only exploration on top of current_research. In RigorPilot terms, this is meaningful and potentially novel candidate work, not verified novelty.

flowchart LR
    A[current_research + frozen campaign] --> B[Outer loop:<br/>understand, source, gate]
    B --> C{candidate worth trying?}
    C -- No --> D[Stop with blocker or checkpoint]
    C -- Yes --> E[Inner loop:<br/>bounded change or run]
    E --> F[Smoke and evidence]
    F --> G[Rank candidate]
    G --> B
    G --> H[explore_outputs<br/>candidate-only summary]

Current RigorPilot implementation highlights:

  • Researcher ideas are preserved, then optionally expanded with bounded synthesized or hybrid seed ideas in analysis_outputs/IDEA_SEEDS.json.
  • Idea ranking uses hard gates plus explicit weighted breakdowns in analysis_outputs/IDEA_SCORES.json.
  • Selected ideas are decomposed into atomic academic concepts in analysis_outputs/ATOMIC_IDEA_MAP.md and analysis_outputs/ATOMIC_IDEA_MAP.json.
  • Implementation fidelity distinguishes planned, heuristic, and observed implementation evidence in analysis_outputs/IMPLEMENTATION_FIDELITY.md and analysis_outputs/IMPLEMENTATION_FIDELITY.json.
  • Executor-observed evidence now comes from emitted changed_files, new_files, deleted_files, and touched_paths rather than planned target placeholders.
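The "hard gates plus explicit weighted breakdowns" idea from the list above can be sketched like this. Everything specific here is an assumption for illustration: the gate names, criteria, weights, and the shape of the returned dictionary are hypothetical and are not the actual schema of analysis_outputs/IDEA_SCORES.json.

```python
from typing import Dict


def score_idea(idea: Dict, weights: Dict[str, float]) -> Dict:
    """Apply hard gates first, then compute a per-criterion weighted breakdown."""
    failed = [name for name, ok in idea["gates"].items() if not ok]
    if failed:
        # A failed hard gate removes the idea regardless of its soft scores.
        return {"id": idea["id"], "passed_gates": False,
                "failed_gates": failed, "breakdown": {}, "total": 0.0}
    breakdown = {k: weights[k] * idea["scores"][k] for k in weights}
    return {"id": idea["id"], "passed_gates": True,
            "failed_gates": [], "breakdown": breakdown,
            "total": sum(breakdown.values())}


weights = {"feasibility": 0.5, "expected_signal": 0.5}  # hypothetical weights
idea = {
    "id": "idea-1",
    "gates": {"keeps_eval_protocol": True},  # hypothetical gate
    "scores": {"feasibility": 0.8, "expected_signal": 0.4},
}
print(score_idea(idea, weights))
```

Keeping the breakdown next to the total lets a reviewer see which criterion drove a ranking, instead of trusting a single opaque number.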

The two-loop rhythm is a guide, not a never-stop autonomous agent. RigorPilot Explore stops at explicit blockers, unclear scientific meaning, exhausted budget, missing anchors, or human checkpoints. The explore lane must not claim trusted reproduction success, global benchmark completeness, or verified novelty.
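The stop conditions named above can be illustrated with a bounded loop. This is a sketch, not the repository's executor: the `Stop` enum, the `explore` function, and the choice to end at a human checkpoint once candidates run out are all assumptions made for the example.

```python
from enum import Enum
from typing import Callable, List, Tuple, Union


class Stop(Enum):
    BLOCKER = "explicit blocker"
    UNCLEAR_MEANING = "unclear scientific meaning"
    BUDGET_EXHAUSTED = "exhausted budget"
    MISSING_ANCHOR = "missing anchor"
    HUMAN_CHECKPOINT = "human checkpoint"


def explore(candidates: List[str], budget: int, anchor: str,
            try_candidate: Callable[[str], Union[float, Stop]]
            ) -> Tuple[List[Tuple[float, str]], Stop]:
    """Try candidates until a stop condition fires; return ranking and reason."""
    if not anchor:
        return [], Stop.MISSING_ANCHOR
    ranked: List[Tuple[float, str]] = []
    for cand in candidates:
        if budget <= 0:
            return ranked, Stop.BUDGET_EXHAUSTED
        result = try_candidate(cand)  # inner loop: bounded change or run
        budget -= 1
        if isinstance(result, Stop):
            return ranked, result
        ranked.append((result, cand))
        ranked.sort(reverse=True)  # best candidate first
    # Assumption: once candidates run out, control returns to the researcher.
    return ranked, Stop.HUMAN_CHECKPOINT


scores = {"a": 0.2, "b": 0.9}
ranking, reason = explore(["a", "b", "c"], budget=2, anchor="commit:<sha>",
                          try_candidate=lambda c: scores[c])
print(ranking, reason.value)
```

Every exit path returns a named reason, so the summary can state why exploration ended and never implies trusted reproduction success or verified novelty.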

📦 Public Skill Matrix

| Lane | Skill | Purpose |
| --- | --- | --- |
| Trusted | ai-research-reproduction | End-to-end README-first reproduction orchestrator |
| Trusted | env-and-assets-bootstrap | Conservative environment, dataset, checkpoint, and cache planning |
| Trusted | minimal-run-and-audit | Trusted inference, evaluation, smoke, and sanity execution |
| Trusted | analyze-project | Read-only project analysis, model mapping, and risk surfacing |
| Trusted | run-train | Training startup verification, resume handling, bounded monitoring, and training records |
| Trusted | safe-debug | Research-safe debugging: analyze first, patch only after approval |
Explore`ai-research

...


Statistics

Installs: 127.4K
Rating: 4.8 / 5.0
Version:
Last updated: May 23, 2026
Comparison cases: 1

User ratings

4.8 (656)
5 stars: 41%
4 stars: 47%
3 stars: 12%
2 stars: 1%
1 star: 0%


Compatible platforms

Claude Code

Timeline

Created: March 31, 2026
Last updated: May 23, 2026