repo-intake-and-plan
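The RigorPilot skill enables AI agents to reproduce, improve, and explore deep learning research projects with scientific rigor. It emphasizes meaningful changes, fair comparison, reproducible evidence, and auditable modifications, driving substantive progress in deep learning research rather than simply chasing scores. The skill focuses on deep learning research repositories that cover key aspects such as environment setup, data, model training, and evaluation.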
npx skills add lllllllama/ai-paper-reproduction-skill --skill repo-intake-and-plan
Before / After Comparison
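Before: Manually reproducing the experiments of a deep learning paper demands substantial time for environment setup, code debugging, and result verification, and often takes days or weeks. The process is prone to human error, which leads to failed reproductions and inconsistent results.
After: AI agents equipped with the RigorPilot skill can automate the process of reproducing, improving, and exploring deep learning experiments. This significantly shortens experiment cycles, raises reproduction success rates, and makes results more reliable, so researchers can focus on innovation.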
RigorPilot Skills
Research-first Agent Skills for Deep Learning Experiments.
RigorPilot helps AI agents reproduce, improve, and explore deep learning research projects with scientific rigor: meaningful changes, fair comparison, reproducible evidence, and auditable modifications.
Not just higher scores. Meaningful deep learning research progress.
Brand note: the project brand is RigorPilot Skills; the recommended GitHub
repository slug is rigorpilot-skills. Legacy install paths remain documented
only as compatibility fallbacks while older clients and bookmarks migrate.
Migration note:
- Project brand: ai-research-workflow-skills -> RigorPilot Skills
- Existing compatible skill slugs remain available.
- Preferred install source: lllllllama/rigorpilot-skills
- Legacy fallback source: lllllllama/ai-paper-reproduction-skills
- Skill slug renames: ai-paper-reproduction -> ai-research-reproduction, research-explore -> ai-research-explore
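The slug renames in the migration note amount to a small lookup table. The sketch below shows how a script might normalize an old slug before installing; current_slug and install_command are hypothetical helpers, not part of the repository, and only the npx flags shown elsewhere in this README are used.

```python
# Hypothetical helpers: the renames come from the migration note above;
# the function names are illustrative, not part of the repository's scripts.
PREFERRED_SOURCE = "lllllllama/rigorpilot-skills"

# Old skill slug -> current skill slug, as listed in the migration note.
SKILL_RENAMES = {
    "ai-paper-reproduction": "ai-research-reproduction",
    "research-explore": "ai-research-explore",
}


def current_slug(slug: str) -> str:
    """Return the current slug; unrenamed slugs pass through unchanged."""
    return SKILL_RENAMES.get(slug, slug)


def install_command(slug: str) -> list[str]:
    """Build an `npx skills add` argument list from the preferred source."""
    return ["npx", "skills", "add", PREFERRED_SOURCE, "--skill", current_slug(slug)]


print(" ".join(install_command("research-explore")))
```

Slugs that were never renamed, such as safe-debug, pass through untouched, so the lookup is safe to apply to any slug.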
What RigorPilot Is
- Research-first Agent Skills for deep learning experiments.
- It helps AI agents reproduce, improve, explore, and audit deep learning research work.
- It is designed for personal research use first.
- It values scientific meaning, fair comparison, reproducibility, explainability, and collaborator control.
- It encourages meaningful novelty during exploration, but does not overclaim novelty.
What RigorPilot Is Not
- Not a generic coding agent.
- Not a score-chasing automation framework.
- Not a guarantee of novel discoveries.
- Not a replacement for researcher judgment.
- Not a rigid workflow that should weaken strong models.
Core Principles
- Do not chase scores blindly.
- Do not claim novelty lightly.
- Do not break comparability silently.
- Do not disguise engineering fixes as research contributions.
- Do not leave collaborators out of control.
See references/research-rigor-principles.md.
Rigor and Novelty
Rigor is the baseline. Novelty is the aspiration.
Novelty and significance remain hypotheses until supported by literature contrast, ablation evidence, and fair comparison.
RigorPilot should add research judgment and audit awareness without making strong models slower, more mechanical, or less capable.
Deep Learning Focus
RigorPilot is built for deep learning research repositories where README commands, environment setup, data, weights, checkpoints, training, evaluation, metrics, logs, baselines, SOTA tables, and ablations all matter.
This repository is still built around one compatibility rule: trusted by default.
- Ambiguous requests route to the trusted lane.
- Exploration requires explicit authorization.
- Trusted outputs are auditable and durable.
- Explore outputs are candidate-only and disposable.
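The trusted-by-default rule can be read as a tiny routing function. This is an illustrative sketch only: Request, route, and the two flags are hypothetical names, not part of the repository's scripts. The point is that ambiguity and missing authorization both fall through to the trusted lane.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Lane(Enum):
    TRUSTED = "trusted"  # auditable, durable outputs
    EXPLORE = "explore"  # candidate-only, disposable outputs


@dataclass
class Request:
    # Hypothetical fields. None means the request is ambiguous about exploration.
    wants_exploration: Optional[bool]
    # True only if the researcher explicitly authorized candidate-only work.
    explore_authorized: bool = False


def route(request: Request) -> Lane:
    """Trusted by default: explore only when intent and authorization are both explicit."""
    if request.wants_exploration is True and request.explore_authorized:
        return Lane.EXPLORE
    return Lane.TRUSTED
```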
Shared operating principles live in references/agent-operating-principles.md. They keep the skills focused on high-level guidance: think before acting, keep the solution small, change only what is necessary, and work toward verifiable goals. They are guardrails, not a detailed script for every implementation choice.
🧭 Current Repo Snapshot
This repository currently ships:
- 11 skills total: 9 public skills and 2 helper skills.
- 6 trusted-lane public skills and 3 explore-lane public skills.
- 4 project-scoped Claude Code command wrappers under .claude/commands/.
- 45 Python scripts, including 43 test scripts with focused research-explore regressions and document-structure checks.
- A RigorPilot Explore chain that now includes bounded idea-seed generation, explicit idea score breakdowns, atomic idea decomposition, and implementation-fidelity evidence split into planned, heuristic, and observed layers.
- A documented and tested workflow intended to be usable from both Windows PowerShell and Linux shells.
The skills use the open SKILL.md layout, so the same repository can be installed into neutral Agent Skills directories as well as Codex and Claude Code. For shared local installs, prefer ~/.agents/skills/ or ./.agents/skills/. Client-specific installs under ~/.codex/skills/ and ~/.claude/skills/ remain supported.
💻 Windows and Linux Notes
This repository is intended to be usable on both Windows and Linux.
- The command examples below are written in a shell-neutral style around python ..., npx ..., and relative paths.
- For user-scoped install targets, prefer $HOME/.agents/skills, $HOME/.codex/skills, and $HOME/.claude/skills. These work well in Linux shells and in PowerShell, and Python accepts forward slashes in Windows paths.
- Project-scoped paths such as ./.agents/skills and ./tmp/codex-skills are also valid on both platforms.
- The repository validation and routing checks are already exercised on Windows and Linux-oriented environments through local tests and CI.
🛠️ Install
For most users, start with npx. It is the shortest path and should be enough for normal use.
Recommended: npx
Install the full repository skill set:
npx skills add lllllllama/rigorpilot-skills --all
Install only the trusted main entrypoint:
npx skills add lllllllama/rigorpilot-skills --skill ai-research-reproduction
Install only the exploratory main entrypoint:
npx skills add lllllllama/rigorpilot-skills --skill ai-research-explore
If you only want to get started quickly, stop here.
Claude Code can auto-invoke these skills when the descriptions match, or you can call them directly with commands such as /ai-research-reproduction, /ai-research-explore, and /safe-debug.
Project-scoped Claude Code slash commands currently ship for:
- /ai-research-reproduction
- /ai-research-explore
- /analyze-project
- /safe-debug
Advanced: local clone installs
Use the Python installer only if you are developing locally, need a project-scoped install, or want to target neutral Agent Skills, Codex, or Claude Code directories manually.
Install from a local clone into a neutral Agent Skills directory:
python scripts/install_skills.py --client agents --target "$HOME/.agents/skills" --force
Install into a project-scoped neutral Agent Skills directory:
python scripts/install_skills.py --client agents --target ./.agents/skills --force
Install with the default neutral target:
python scripts/install_skills.py --force
Install the full repository skill set in Codex:
npx skills add lllllllama/rigorpilot-skills --all
Install only the trusted reproduction orchestrator in Codex:
npx skills add lllllllama/rigorpilot-skills --skill ai-research-reproduction
Legacy GitHub source fallback, if the new slug is not yet available in your environment:
npx skills add lllllllama/ai-paper-reproduction-skills --all
Install from a local clone into Codex:
python scripts/install_skills.py --client codex --target "$HOME/.codex/skills" --force
Install from a local clone into Claude Code:
python scripts/install_skills.py --client claude --target "$HOME/.claude/skills" --force
Install into a project-scoped Claude Code skills directory:
python scripts/install_skills.py --client claude --target ./.claude/skills --force
PowerShell note:
- In Windows PowerShell, the same commands work as written above.
- If you prefer explicit Windows-style paths, replace $HOME/.codex/skills with something like $env:USERPROFILE\.codex\skills.
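The install targets above follow one pattern: a client-specific directory under either the home directory or a project root. The sketch below resolves those targets portably with pathlib and builds an installer argument list using only the flags shown in the commands above (--client, --target, --force). CLIENT_DIRS, skills_target, and install_args are hypothetical; the real scripts/install_skills.py may resolve its defaults differently.

```python
import sys
from pathlib import Path
from typing import Optional

# Illustrative mapping from installer client name to a skills directory,
# based on the targets used above (assumption: not read from install_skills.py).
CLIENT_DIRS = {
    "agents": Path(".agents", "skills"),
    "codex": Path(".codex", "skills"),
    "claude": Path(".claude", "skills"),
}


def skills_target(client: str, project_root: Optional[Path] = None) -> Path:
    """User-scoped target under the home directory, or project-scoped if a root is given."""
    # Path.home() resolves HOME on Linux and USERPROFILE on Windows.
    base = project_root if project_root is not None else Path.home()
    return base / CLIENT_DIRS[client]


def install_args(client: str, target: Path) -> list[str]:
    """Argument list mirroring the documented installer flags."""
    return [sys.executable, "scripts/install_skills.py",
            "--client", client, "--target", str(target), "--force"]


print(install_args("claude", skills_target("claude", Path("."))))
```

Letting pathlib join the components avoids hand-writing either separator, which is why the same call works in PowerShell and Linux shells.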
🎯 Choose an Entry Point
RigorPilot modes map to the current compatible skill slugs:
| If you want to... | RigorPilot mode | Current skill slug |
|---|---|---|
| Reproduce a deep learning repository from README commands | Reproduce | ai-research-reproduction |
| Explore meaningful and potentially novel ideas on top of current research | Explore | ai-research-explore |
| Improve a baseline while preserving comparability | Improve | ai-research-explore, explore-code, explore-run |
| Audit changes, scientific meaning, and comparability | Audit | analyze-project, safe-debug, generated reports |
| Analyze repository structure without editing | Analyze | analyze-project |
| Prepare environment, datasets, weights, and cache assumptions | Setup | env-and-assets-bootstrap |
| Run documented evaluation or inference conservatively | Run | minimal-run-and-audit |
| Start or verify training conservatively | Train | run-train |
| Debug a failure safely | Debug | safe-debug |
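The table above can be expressed as data, which makes it easy to check which modes need the explicit authorization that the explore lane requires. This is a sketch; ENTRY_POINTS and both helpers are hypothetical, and "generated reports" from the Audit row are omitted because they are outputs, not skills.

```python
# Illustrative data mirroring the entry-point table; helper names are hypothetical.
ENTRY_POINTS: dict[str, tuple[str, ...]] = {
    "Reproduce": ("ai-research-reproduction",),
    "Explore": ("ai-research-explore",),
    "Improve": ("ai-research-explore", "explore-code", "explore-run"),
    "Audit": ("analyze-project", "safe-debug"),  # generated reports are not skills
    "Analyze": ("analyze-project",),
    "Setup": ("env-and-assets-bootstrap",),
    "Run": ("minimal-run-and-audit",),
    "Train": ("run-train",),
    "Debug": ("safe-debug",),
}

# The three explore-lane public skills.
EXPLORE_LANE = frozenset({"ai-research-explore", "explore-code", "explore-run"})


def skills_for(mode: str) -> tuple[str, ...]:
    return ENTRY_POINTS[mode]


def requires_explore_authorization(mode: str) -> bool:
    """A mode needs explicit authorization if any of its skills is explore-lane."""
    return any(skill in EXPLORE_LANE for skill in ENTRY_POINTS[mode])
```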
Bundled helper skills:
- repo-intake-and-plan
- paper-context-resolver
🛣️ Lane Model
🔒 Trusted Lane
Use the trusted lane for reproduction, setup, analysis, bounded execution, training verification, and debugging.
- Primary end-to-end orchestrator: ai-research-reproduction
- Output directories: repro_outputs/, train_outputs/, analysis_outputs/, debug_outputs/
- Default stance: preserve scientific meaning, minimize semantic changes, surface assumptions and blockers
🧪 Explore Lane
Use the explore lane only when the researcher explicitly authorizes candidate-only exploratory work.
- Primary end-to-end orchestrator: ai-research-explore
- Narrow leaf skills: explore-code, explore-run
- Output directory: explore_outputs/
- Key anchor: current_research
current_research should be a durable reference such as a branch, commit, checkpoint, run record, or already-trained local model state. It does not imply a trusted baseline; it is the context the exploration branches from.
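One way to make the current_research anchor concrete is a small immutable record that rejects non-durable references. The anchor kinds come from the paragraph above; the CurrentResearch class and the kind labels are hypothetical, and the record deliberately carries no "trusted" flag because the anchor does not imply a trusted baseline.

```python
from dataclasses import dataclass

# Durable anchor kinds named in the text; the identifiers are hypothetical labels.
ANCHOR_KINDS = frozenset({"branch", "commit", "checkpoint", "run_record", "local_model_state"})


@dataclass(frozen=True)
class CurrentResearch:
    """Context an exploration branches from. It is NOT a trusted baseline."""
    kind: str
    ref: str  # e.g. a branch name, commit hash, or checkpoint path

    def __post_init__(self) -> None:
        if self.kind not in ANCHOR_KINDS:
            raise ValueError(f"unsupported anchor kind: {self.kind!r}")
        if not self.ref.strip():
            raise ValueError("current_research needs a non-empty, durable reference")

    def describe(self) -> str:
        return f"{self.kind}:{self.ref}"
```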
🧰 Helper Lane
Helpers are intentionally narrow and should usually be orchestrator-invoked rather than used as the first entry point.
🔗 Client Compatibility
SKILL.md is the canonical cross-client contract in this repository.
- Required for portability: SKILL.md, repository-local scripts/, and references/
- Optional Codex UI metadata: agents/openai.yaml
- Optional Claude Code project entrypoints: .claude/commands/*.md
- Not allowed: making skill behavior depend on a client-specific metadata file
See references/client-compatibility-policy.md.
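The "not allowed" rule implies that a portability verdict should never look at client metadata. The sketch below assumes a per-skill directory layout (an assumption, not taken from the repository) and judges portability from SKILL.md alone, merely reporting agents/openai.yaml if present; portability_report is a hypothetical helper.

```python
from pathlib import Path

# Optional, client-specific metadata named in the policy above.
OPTIONAL_METADATA = ("agents/openai.yaml",)


def portability_report(skill_dir: Path) -> dict:
    """Judge portability from SKILL.md alone; client metadata is only reported."""
    has_skill_md = (skill_dir / "SKILL.md").is_file()
    optional_present = [p for p in OPTIONAL_METADATA if (skill_dir / p).is_file()]
    return {
        "portable": has_skill_md,  # never depends on optional metadata
        "optional_metadata": optional_present,
    }
```

Keeping the verdict independent of optional files means a skill that only ships Codex metadata is flagged, while one that only ships SKILL.md still passes.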
🔁 Lifecycle View
The repository follows a lifecycle-oriented routing model:
flowchart LR
A[Understand] --> B[Reproduce]
B --> C[Set up]
C --> D[Run or train]
D --> E[Debug]
E --> F[Report]
B -. explicit only .-> G[Explore]
G --> H[Rank candidates]
H --> F
This lifecycle is intentionally shallow. It helps the agent choose the right lane and evidence target without forcing a fixed implementation sequence inside each repository.
🗺️ Routing Overview
flowchart TD
A[User request] --> B{Explicit candidate-only exploration?}
B -- No --> C[Trusted lane]
B -- Yes --> D[Explore lane]
C --> C1[ai-research-reproduction]
C --> C2[analyze-project]
C --> C3[env-and-assets-bootstrap]
C --> C4[minimal-run-and-audit]
C --> C5[run-train]
C --> C6[safe-debug]
D --> D1[ai-research-explore]
D --> D2[explore-code]
D --> D3[explore-run]
C1 -. helper .-> H1[repo-intake-and-plan]
C1 -. helper .-> H2[paper-context-resolver]
🧠 RigorPilot Explore Flow
ai-research-explore is the RigorPilot Explore entrypoint when the researcher has
already frozen the task family, dataset, evaluation method, and provided SOTA
references, then explicitly authorizes candidate-only exploration on top of
current_research. In RigorPilot terms, this is meaningful and potentially
novel candidate work, not verified novelty.
flowchart LR
A[current_research + frozen campaign] --> B[Outer loop:<br/>understand, source, gate]
B --> C{candidate worth trying?}
C -- No --> D[Stop with blocker or checkpoint]
C -- Yes --> E[Inner loop:<br/>bounded change or run]
E --> F[Smoke and evidence]
F --> G[Rank candidate]
G --> B
G --> H[explore_outputs<br/>candidate-only summary]
Current RigorPilot implementation highlights:
- Researcher ideas are preserved, then optionally expanded with bounded synthesized or hybrid seed ideas in analysis_outputs/IDEA_SEEDS.json.
- Idea ranking uses hard gates plus explicit weighted breakdowns in analysis_outputs/IDEA_SCORES.json.
- Selected ideas are decomposed into atomic academic concepts in analysis_outputs/ATOMIC_IDEA_MAP.md and analysis_outputs/ATOMIC_IDEA_MAP.json.
- Implementation fidelity distinguishes planned, heuristic, and observed implementation evidence in analysis_outputs/IMPLEMENTATION_FIDELITY.md and analysis_outputs/IMPLEMENTATION_FIDELITY.json.
- Executor-observed evidence now comes from emitted changed_files, new_files, deleted_files, and touched_paths rather than planned target placeholders.
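"Hard gates plus explicit weighted breakdowns" can be illustrated as a two-stage score: any failed gate removes the idea from ranking, and surviving ideas get a per-criterion breakdown that sums to the score. The criteria, weights, and field names below are hypothetical; the real IDEA_SCORES.json schema may differ.

```python
import json
from dataclasses import dataclass

# Hypothetical criteria and weights (sum to 1.0); not the repository's schema.
WEIGHTS = {"scientific_meaning": 0.4, "comparability": 0.3, "feasibility": 0.2, "novelty_signal": 0.1}


@dataclass
class Idea:
    name: str
    gates: dict[str, bool]     # hard gates, e.g. {"frozen_eval": True}
    ratings: dict[str, float]  # each criterion rated in [0, 1]; missing counts as 0


def score_idea(idea: Idea) -> dict:
    """Fail fast on hard gates, then report an explicit weighted breakdown."""
    failed = sorted(g for g, ok in idea.gates.items() if not ok)
    if failed:
        return {"idea": idea.name, "passed_gates": False,
                "failed_gates": failed, "score": None, "breakdown": {}}
    breakdown = {c: round(w * idea.ratings.get(c, 0.0), 4) for c, w in WEIGHTS.items()}
    return {"idea": idea.name, "passed_gates": True, "failed_gates": [],
            "score": round(sum(breakdown.values()), 4), "breakdown": breakdown}


def rank(ideas: list[Idea]) -> list[dict]:
    """Rank only gate-passing ideas, best score first."""
    passed = [s for s in map(score_idea, ideas) if s["passed_gates"]]
    return sorted(passed, key=lambda s: s["score"], reverse=True)


print(json.dumps(score_idea(Idea("demo", {"frozen_eval": True}, {"comparability": 1.0})), indent=2))
```

Keeping the breakdown next to the total lets a reviewer see why an idea ranked where it did, rather than trusting a single opaque number.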
The two-loop rhythm is a guide, not a never-stop autonomous agent. RigorPilot Explore stops at explicit blockers, unclear scientific meaning, exhausted budget, missing anchors, or human checkpoints. The explore lane must not claim trusted reproduction success, global benchmark completeness, or verified novelty.
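The two-loop rhythm and its stop conditions can be sketched as a bounded loop that returns both the ranked candidates and the reason it stopped. Everything here is illustrative: explore, gate, and try_candidate are hypothetical callables, where gate returns a blocker or checkpoint reason (or None if the candidate is worth trying) and try_candidate returns an evidence score (or None if the smoke check fails). The ranking is candidate-only and carries no novelty or reproduction claim.

```python
from typing import Callable, Iterable, Optional


def explore(
    candidates: Iterable[str],
    gate: Callable[[str], Optional[str]],
    try_candidate: Callable[[str], Optional[float]],
    budget: int,
) -> tuple[list[tuple[str, float]], str]:
    """Bounded two-loop sketch: returns (ranked candidates, stop reason)."""
    ranked: list[tuple[str, float]] = []
    stop_reason = "candidates_exhausted"
    for candidate in candidates:
        # Outer loop: stop on a blocker or human checkpoint instead of pressing on.
        blocker = gate(candidate)
        if blocker is not None:
            stop_reason = blocker
            break
        if budget <= 0:
            stop_reason = "budget_exhausted"
            break
        budget -= 1
        # Inner loop: one bounded change or run, then smoke and evidence.
        evidence = try_candidate(candidate)
        if evidence is None:
            continue  # smoke failed: nothing to rank
        ranked.append((candidate, evidence))
        ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked, stop_reason
```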
📦 Public Skill Matrix
| Lane | Skill | Purpose |
|---|---|---|
| Trusted | ai-research-reproduction | End-to-end README-first reproduction orchestrator |
| Trusted | env-and-assets-bootstrap | Conservative environment, dataset, checkpoint, and cache planning |
| Trusted | minimal-run-and-audit | Trusted inference, evaluation, smoke, and sanity execution |
| Trusted | analyze-project | Read-only project analysis, model mapping, and risk surfacing |
| Trusted | run-train | Training startup verification, resume handling, bounded monitoring, and training records |
| Trusted | safe-debug | Research-safe debugging: analyze first, patch only after approval |
| Explore | `ai-research |
...