
repo-intake-and-plan

by @lllllllamav
4.8(716)

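The RigorPilot skill lets AI agents reproduce, improve, and explore deep learning research projects with scientific rigor. It does not simply chase scores. Instead, it emphasizes meaningful changes, fair comparison, reproducible evidence, and auditable modifications, so that the work produces substantive progress in deep learning research. The skill targets deep learning research repositories and covers key aspects such as environment setup, data, model training, and evaluation.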

Tags: deep-learning, ai-agents, research, reproducibility, experimentation, GitHub
Installation
npx skills add lllllllama/ai-paper-reproduction-skill --skill repo-intake-and-plan

Before / After Comparison

Before

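Reproducing the experiments of a deep learning paper by hand means spending a lot of time on environment setup, code debugging, and result verification, and it often takes days or weeks. The process is prone to human error, which can lead to failed reproductions or inconsistent results.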

After

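An AI agent using the RigorPilot skill can automate the process of reproducing, improving, and exploring deep learning experiments. This can substantially shorten the experiment cycle, raise the reproduction success rate, and increase confidence in the results, so researchers can focus on innovation.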

SKILL.md

RigorPilot Skills

Research-first Agent Skills for Deep Learning Experiments.

RigorPilot helps AI agents reproduce, improve, and explore deep learning research projects with scientific rigor: meaningful changes, fair comparison, reproducible evidence, and auditable modifications.

Not just higher scores. Meaningful deep learning research progress.

Brand note: the project brand is RigorPilot Skills; the recommended GitHub repository slug is rigorpilot-skills. Legacy install paths remain documented only as compatibility fallbacks while older clients and bookmarks migrate.

Migration note:

  • Project brand: ai-research-workflow-skills -> RigorPilot Skills
  • Existing compatible skill slugs remain available.
  • Preferred install source: lllllllama/rigorpilot-skills
  • Legacy fallback source: lllllllama/ai-paper-reproduction-skills
  • ai-paper-reproduction -> ai-research-reproduction
  • research-explore -> ai-research-explore
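The migration note works like a small alias table. The sketch below is a hypothetical illustration and is not code shipped in the repository. `LEGACY_SKILL_ALIASES`, `INSTALL_SOURCES`, and `resolve_skill_slug` are invented names. Only the slugs and install sources come from the note.

```python
from typing import List

# Hypothetical alias table transcribed from the migration note; not shipped with the repository.
LEGACY_SKILL_ALIASES = {
    "ai-paper-reproduction": "ai-research-reproduction",
    "research-explore": "ai-research-explore",
}

# Install sources in order of preference: the new slug first, the legacy fallback second.
INSTALL_SOURCES: List[str] = [
    "lllllllama/rigorpilot-skills",
    "lllllllama/ai-paper-reproduction-skills",
]


def resolve_skill_slug(slug: str) -> str:
    """Map a legacy skill slug to its current name; other slugs pass through unchanged."""
    return LEGACY_SKILL_ALIASES.get(slug, slug)


print(resolve_skill_slug("research-explore"))  # ai-research-explore
print(resolve_skill_slug("safe-debug"))        # safe-debug
```

Unknown slugs pass through unchanged, which matches the note that existing compatible slugs remain available.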

What RigorPilot Is

  • Research-first Agent Skills for deep learning experiments.
  • It helps AI agents reproduce, improve, explore, and audit deep learning research work.
  • It is designed for personal research use first.
  • It values scientific meaning, fair comparison, reproducibility, explainability, and collaborator control.
  • It encourages meaningful novelty during exploration, but does not overclaim novelty.

What RigorPilot Is Not

  • Not a generic coding agent.
  • Not a score-chasing automation framework.
  • Not a guarantee of novel discoveries.
  • Not a replacement for researcher judgment.
  • Not a rigid workflow that should weaken strong models.

Core Principles

  1. Do not chase scores blindly.
  2. Do not claim novelty lightly.
  3. Do not break comparability silently.
  4. Do not disguise engineering fixes as research contributions.
  5. Do not leave collaborators out of control.

See references/research-rigor-principles.md.

Rigor and Novelty

Rigor is the baseline. Novelty is the aspiration.

Novelty and significance remain hypotheses until supported by literature contrast, ablation evidence, and fair comparison.

RigorPilot should add research judgment and audit awareness without making strong models slower, more mechanical, or less capable.

Deep Learning Focus

RigorPilot is built for deep learning research repositories where README commands, environment setup, data, weights, checkpoints, training, evaluation, metrics, logs, baselines, SOTA tables, and ablations all matter.

This repository is still built around one compatibility rule: trusted by default.

  • Ambiguous requests route to the trusted lane.
  • Exploration requires explicit authorization.
  • Trusted outputs are auditable and durable.
  • Explore outputs are candidate-only and disposable.
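The trusted-by-default rule can be sketched as a routing function. This is an illustrative assumption and not RigorPilot's implementation. How an agent decides that authorization is explicit is left to the agent, and `Lane` and `route_request` are hypothetical names. `None` stands for an ambiguous request.

```python
from enum import Enum
from typing import Optional


class Lane(Enum):
    TRUSTED = "trusted"
    EXPLORE = "explore"


def route_request(explore_authorized: Optional[bool]) -> Lane:
    """Route a request under the "trusted by default" rule.

    explore_authorized is True only when the researcher explicitly authorized
    candidate-only exploration; None models an ambiguous request.
    """
    if explore_authorized is True:
        return Lane.EXPLORE
    # Both False and None (ambiguous) fall back to the trusted lane.
    return Lane.TRUSTED


print(route_request(None).value)  # trusted
print(route_request(True).value)  # explore
```

The asymmetry is deliberate: only one input opens the explore lane, and every other input, including missing information, lands in the trusted lane.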

Shared operating principles live in references/agent-operating-principles.md. They keep the skills focused on high-level guidance: think before acting, keep the solution small, change only what is necessary, and work toward verifiable goals. They are guardrails, not a detailed script for every implementation choice.

🧭 Current Repo Snapshot

This repository currently ships:

  • 11 skills total: 9 public skills and 2 helper skills.
  • 6 trusted-lane public skills and 3 explore-lane public skills.
  • 4 project-scoped Claude Code command wrappers under .claude/commands/.
  • 45 Python scripts, including 43 test scripts with focused research-explore regressions and document-structure checks.
  • A RigorPilot Explore chain that now includes bounded idea-seed generation, explicit idea score breakdowns, atomic idea decomposition, and implementation-fidelity evidence split into planned, heuristic, and observed layers.
  • A documented and tested workflow intended to be usable from both Windows PowerShell and Linux shells.

The skills use the open SKILL.md layout, so the same repository can be installed into neutral Agent Skills directories as well as Codex and Claude Code. For shared local installs, prefer ~/.agents/skills/ or ./.agents/skills/. Client-specific installs under ~/.codex/skills/ and ~/.claude/skills/ remain supported.

💻 Windows and Linux Notes

This repository is intended to be usable on both Windows and Linux.

  • The command examples below are written in a shell-neutral style around python ..., npx ..., and relative paths.
  • For user-scoped install targets, prefer $HOME/.agents/skills, $HOME/.codex/skills, and $HOME/.claude/skills. These work well in Linux shells and in PowerShell, and Python accepts forward slashes on Windows paths.
  • Project-scoped paths such as ./.agents/skills and ./tmp/codex-skills are also valid on both platforms.
  • The repository validation and routing checks are already exercised on Windows and Linux-oriented environments through local tests and CI.
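You can check the path claims with the pure path classes in Python's `pathlib`. These classes parse Windows and POSIX paths on any host without touching the filesystem. The user name `alice` and the helper `parts_on_both_platforms` are placeholders for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import Tuple


def parts_on_both_platforms(relative: str) -> Tuple[Tuple[str, ...], Tuple[str, ...]]:
    """Split a path with POSIX rules and with Windows rules, without touching the filesystem."""
    return PurePosixPath(relative).parts, PureWindowsPath(relative).parts


# A project-scoped path written with forward slashes splits the same way on both platforms;
# pathlib drops the leading "./" component.
posix_parts, windows_parts = parts_on_both_platforms("./.agents/skills")
print(posix_parts == windows_parts)  # True

# Windows accepts "/" as a separator, so both spellings name the same directory.
forward = PureWindowsPath("C:/Users/alice/.codex/skills")
backward = PureWindowsPath("C:\\Users\\alice\\.codex\\skills")
print(forward == backward)  # True
print(forward)              # C:\Users\alice\.codex\skills
```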

🛠️ Install

For most users, start with npx. It is the shortest path and should be enough for normal use.

Recommended: npx

Install the full repository skill set:

npx skills add lllllllama/rigorpilot-skills --all

Install only the trusted main entrypoint:

npx skills add lllllllama/rigorpilot-skills --skill ai-research-reproduction

Install only the exploratory main entrypoint:

npx skills add lllllllama/rigorpilot-skills --skill ai-research-explore

If you only want to get started quickly, stop here.

Claude Code can auto-invoke these skills when the descriptions match, or you can call them directly with commands such as /ai-research-reproduction, /ai-research-explore, and /safe-debug.

Project-scoped Claude Code slash commands currently ship for:

  • /ai-research-reproduction
  • /ai-research-explore
  • /analyze-project
  • /safe-debug

Advanced: local clone installs

Use the Python installer only if you are developing locally, need a project-scoped install, or want to target neutral Agent Skills, Codex, or Claude Code directories manually.

Install from a local clone into a neutral Agent Skills directory:

python scripts/install_skills.py --client agents --target "$HOME/.agents/skills" --force

Install into a project-scoped neutral Agent Skills directory:

python scripts/install_skills.py --client agents --target ./.agents/skills --force

Install with the default neutral target:

python scripts/install_skills.py --force

Install the full repository skill set in Codex:

npx skills add lllllllama/rigorpilot-skills --all

Install only the trusted reproduction orchestrator in Codex:

npx skills add lllllllama/rigorpilot-skills --skill ai-research-reproduction

Legacy GitHub source fallback, if the new slug is not yet available in your environment:

npx skills add lllllllama/ai-paper-reproduction-skills --all

Install from a local clone into Codex:

python scripts/install_skills.py --client codex --target "$HOME/.codex/skills" --force

Install from a local clone into Claude Code:

python scripts/install_skills.py --client claude --target "$HOME/.claude/skills" --force

Install into a project-scoped Claude Code skills directory:

python scripts/install_skills.py --client claude --target ./.claude/skills --force

PowerShell note:

  • In Windows PowerShell, the same commands work as written above.
  • If you prefer explicit Windows-style paths, replace $HOME/.codex/skills with something like $env:USERPROFILE\.codex\skills.
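The documented `--client`, `--target`, and `--force` flags can be mirrored with `argparse` to show how a client maps to an install directory. This is a hypothetical re-creation and not `scripts/install_skills.py` itself. The `agents` default follows from the "default neutral target" example. The user-scoped default target and the overwrite meaning of `--force` are assumptions. `CLIENT_DIRS` and `parse_install_args` are invented names.

```python
import argparse
from pathlib import Path
from typing import List, Optional

# Directory names come from the install examples; the mapping itself is a hypothetical sketch.
CLIENT_DIRS = {
    "agents": Path(".agents") / "skills",
    "codex": Path(".codex") / "skills",
    "claude": Path(".claude") / "skills",
}


def parse_install_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the documented flags; this is NOT the real installer's parser."""
    parser = argparse.ArgumentParser(description="Sketch of a skills installer CLI.")
    parser.add_argument("--client", choices=sorted(CLIENT_DIRS), default="agents")
    parser.add_argument("--target", type=Path, default=None)
    # Assumption: --force allows overwriting an existing installation.
    parser.add_argument("--force", action="store_true")
    args = parser.parse_args(argv)
    if args.target is None:
        # Assumption: without --target, install into the client's user-scoped directory.
        args.target = Path.home() / CLIENT_DIRS[args.client]
    return args


args = parse_install_args(["--client", "claude", "--target", "./.claude/skills", "--force"])
print(args.client, args.target.as_posix(), args.force)  # claude .claude/skills True
```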

🎯 Choose an Entry Point

RigorPilot modes map to the current compatible skill slugs:

| If you want to... | RigorPilot mode | Current skill slug |
| --- | --- | --- |
| Reproduce a deep learning repository from README commands | Reproduce | ai-research-reproduction |
| Explore meaningful and potentially novel ideas on top of current research | Explore | ai-research-explore |
| Improve a baseline while preserving comparability | Improve | ai-research-explore, explore-code, explore-run |
| Audit changes, scientific meaning, and comparability | Audit | analyze-project, safe-debug, generated reports |
| Analyze repository structure without editing | Analyze | analyze-project |
| Prepare environment, datasets, weights, and cache assumptions | Setup | env-and-assets-bootstrap |
| Run documented evaluation or inference conservatively | Run | minimal-run-and-audit |
| Start or verify training conservatively | Train | run-train |
| Debug a failure safely | Debug | safe-debug |
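Several modes share slugs, so the entry-point table is easiest to query as a mapping with a reverse lookup. This is a minimal sketch. `MODE_TO_SKILLS` and `modes_for_skill` are hypothetical helpers and are not part of the repository.

```python
from typing import Dict, List

# Transcribed from the entry-point table; "generated reports" under Audit is not a skill, so it is omitted.
MODE_TO_SKILLS: Dict[str, List[str]] = {
    "Reproduce": ["ai-research-reproduction"],
    "Explore": ["ai-research-explore"],
    "Improve": ["ai-research-explore", "explore-code", "explore-run"],
    "Audit": ["analyze-project", "safe-debug"],
    "Analyze": ["analyze-project"],
    "Setup": ["env-and-assets-bootstrap"],
    "Run": ["minimal-run-and-audit"],
    "Train": ["run-train"],
    "Debug": ["safe-debug"],
}


def modes_for_skill(slug: str) -> List[str]:
    """Reverse lookup: every RigorPilot mode that a skill slug serves."""
    return [mode for mode, slugs in MODE_TO_SKILLS.items() if slug in slugs]


print(modes_for_skill("safe-debug"))           # ['Audit', 'Debug']
print(modes_for_skill("ai-research-explore"))  # ['Explore', 'Improve']
```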

Bundled helper skills:

  • repo-intake-and-plan
  • paper-context-resolver

🛣️ Lane Model

🔒 Trusted Lane

Use the trusted lane for reproduction, setup, analysis, bounded execution, training verification, and debugging.

  • Primary end-to-end orchestrator: ai-research-reproduction
  • Output directories: repro_outputs/, train_outputs/, analysis_outputs/, debug_outputs/
  • Default stance: preserve scientific meaning, minimize semantic changes, surface assumptions and blockers

🧪 Explore Lane

Use the explore lane only when the researcher explicitly authorizes candidate-only exploratory work.

  • Primary end-to-end orchestrator: ai-research-explore
  • Narrow leaf skills: explore-code, explore-run
  • Output directory: explore_outputs/
  • Key anchor: current_research

current_research should be a durable reference such as a branch, commit, checkpoint, run record, or already-trained local model state. It does not imply a trusted baseline; it is the context the exploration branches from.
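One way to keep the `current_research` anchor durable is to record it as a typed reference and reject empty or unrecognized kinds. This is a sketch under assumptions. `CurrentResearch` and `ANCHOR_KINDS` are hypothetical, and the kind identifiers paraphrase the examples above, which the README presents as a non-exhaustive list.

```python
from dataclasses import dataclass

# Illustrative anchor kinds paraphrased from the README's examples ("such as ..."); not exhaustive.
ANCHOR_KINDS = {"branch", "commit", "checkpoint", "run_record", "local_model_state"}


@dataclass(frozen=True)
class CurrentResearch:
    """Hypothetical record of the context an exploration branches from.

    It is an anchor, not a claim that the referenced state is a trusted baseline.
    """

    kind: str
    ref: str

    def __post_init__(self) -> None:
        if self.kind not in ANCHOR_KINDS:
            raise ValueError(f"unsupported anchor kind: {self.kind!r}")
        if not self.ref.strip():
            raise ValueError("current_research needs a non-empty, durable reference")


anchor = CurrentResearch(kind="commit", ref="a1b2c3d")
print(anchor)  # CurrentResearch(kind='commit', ref='a1b2c3d')
```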

🧰 Helper Lane

Helpers are intentionally narrow and should usually be orchestrator-invoked rather than used as the first entry point.

🔗 Client Compatibility

SKILL.md is the canonical cross-client contract in this repository.

  • Required for portability: SKILL.md, repository-local scripts/, and references/
  • Optional Codex UI metadata: agents/openai.yaml
  • Optional Claude Code project entrypoints: .claude/commands/*.md
  • Not allowed: making skill behavior depend on a client-specific metadata file
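A portability check only needs to confirm that the canonical contract is present, and it should treat client metadata as optional. This is a sketch under assumptions. `check_skill_dir` and the `empty-skill` directory are hypothetical, and placing `agents/openai.yaml` inside each skill directory is an assumed layout. File checks alone cannot verify the rule against client-specific behavior, so the sketch does not attempt to.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# Assumed per-skill location of the optional Codex UI metadata.
OPTIONAL_CLIENT_METADATA = ("agents/openai.yaml",)


def check_skill_dir(skill_dir: Path) -> Dict[str, List[str]]:
    """Report whether a skill directory carries its canonical SKILL.md contract."""
    problems: List[str] = []
    if not (skill_dir / "SKILL.md").is_file():
        problems.append("missing SKILL.md")
    # Optional metadata is only reported; its absence is never an error.
    present = [rel for rel in OPTIONAL_CLIENT_METADATA if (skill_dir / rel).is_file()]
    return {"problems": problems, "optional_metadata": present}


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "safe-debug"
    good.mkdir()
    (good / "SKILL.md").write_text("# safe-debug\n", encoding="utf-8")
    bad = Path(tmp) / "empty-skill"
    bad.mkdir()
    good_report = check_skill_dir(good)
    bad_report = check_skill_dir(bad)

print(good_report)  # {'problems': [], 'optional_metadata': []}
print(bad_report)   # {'problems': ['missing SKILL.md'], 'optional_metadata': []}
```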

See references/client-compatibility-policy.md.

🔁 Lifecycle View

The repository follows a lifecycle-oriented routing model:

flowchart LR
    A[Understand] --> B[Reproduce]
    B --> C[Set up]
    C --> D[Run or train]
    D --> E[Debug]
    E --> F[Report]
    B -. explicit only .-> G[Explore]
    G --> H[Rank candidates]
    H --> F

This lifecycle is intentionally shallow. It helps the agent choose the right lane and evidence target without forcing a fixed implementation sequence inside each repository.
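The lifecycle chart can be stored as plain data, with the dotted Reproduce-to-Explore edge gated on explicit authorization. In keeping with the shallow design, the sketch lists the next options and does not enforce a sequence. `LIFECYCLE_EDGES`, `EXPLICIT_ONLY`, and `next_stages` are hypothetical names.

```python
from typing import Dict, List, Set, Tuple

# Edges transcribed from the lifecycle flowchart.
LIFECYCLE_EDGES: Dict[str, Set[str]] = {
    "Understand": {"Reproduce"},
    "Reproduce": {"Set up", "Explore"},
    "Set up": {"Run or train"},
    "Run or train": {"Debug"},
    "Debug": {"Report"},
    "Explore": {"Rank candidates"},
    "Rank candidates": {"Report"},
    "Report": set(),
}

# The dotted "explicit only" edge.
EXPLICIT_ONLY: Set[Tuple[str, str]] = {("Reproduce", "Explore")}


def next_stages(stage: str, explore_authorized: bool = False) -> List[str]:
    """List reachable next stages, hiding explicit-only edges unless authorized."""
    options = sorted(LIFECYCLE_EDGES.get(stage, set()))
    return [dst for dst in options if explore_authorized or (stage, dst) not in EXPLICIT_ONLY]


print(next_stages("Reproduce"))                           # ['Set up']
print(next_stages("Reproduce", explore_authorized=True))  # ['Explore', 'Set up']
```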

🗺️ Routing Overview

flowchart TD
    A[User request] --> B{Explicit candidate-only exploration?}
    B -- No --> C[Trusted lane]
    B -- Yes --> D[Explore lane]

    C --> C1[ai-research-reproduction]
    C --> C2[analyze-project]
    C --> C3[env-and-assets-bootstrap]
    C --> C4[minimal-run-and-audit]
    C --> C5[run-train]
    C --> C6[safe-debug]

    D --> D1[ai-research-explore]
    D --> D2[explore-code]
    D --> D3[explore-run]

    C1 -. helper .-> H1[repo-intake-and-plan]
    C1 -. helper .-> H2[paper-context-resolver]

🧠 RigorPilot Explore Flow

ai-research-explore is the RigorPilot Explore entrypoint when the researcher has already frozen the task family, dataset, evaluation method, and provided SOTA references, then explicitly authorizes candidate-only exploration on top of current_research. In RigorPilot terms, this is meaningful and potentially novel candidate work, not verified novelty.

flowchart LR
    A[current_research + frozen campaign] --> B[Outer loop:<br/>understand, source, gate]
    B --> C{candidate worth trying?}
    C -- No --> D[Stop with blocker or checkpoint]
    C -- Yes --> E[Inner loop:<br/>bounded change or run]
    E --> F[Smoke and evidence]
    F --> G[Rank candidate]
    G --> B
    G --> H[explore_outputs<br/>candidate-only summary]

Current RigorPilot implementation highlights:

  • Researcher ideas are preserved, then optionally expanded with bounded synthesized or hybrid seed ideas in analysis_outputs/IDEA_SEEDS.json.
  • Idea ranking uses hard gates plus explicit weighted breakdowns in analysis_outputs/IDEA_SCORES.json.
  • Selected ideas are decomposed into atomic academic concepts in analysis_outputs/ATOMIC_IDEA_MAP.md and analysis_outputs/ATOMIC_IDEA_MAP.json.
  • Implementation fidelity distinguishes planned, heuristic, and observed implementation evidence in analysis_outputs/IMPLEMENTATION_FIDELITY.md and analysis_outputs/IMPLEMENTATION_FIDELITY.json.
  • Executor-observed evidence now comes from emitted changed_files, new_files, deleted_files, and touched_paths rather than planned target placeholders.
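Splitting implementation evidence into layers comes down to comparing what was planned with what the executor actually emitted. This is a minimal sketch under stated assumptions. The four report fields are the ones named above. The same-directory rule for the heuristic layer, the POSIX-style relative paths, and the function names are all hypothetical.

```python
from typing import Dict, Iterable, List, Set

OBSERVED_FIELDS = ("changed_files", "new_files", "deleted_files", "touched_paths")


def observed_paths(executor_report: Dict[str, List[str]]) -> Set[str]:
    """Union of every path the executor actually emitted across the four fields."""
    paths: Set[str] = set()
    for name in OBSERVED_FIELDS:
        paths.update(executor_report.get(name, []))
    return paths


def fidelity_layers(planned_targets: Iterable[str], executor_report: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Assign each planned target to the strongest evidence layer available.

    observed:  the executor emitted this exact path.
    heuristic: only another path in the same directory was emitted (assumed rule).
    planned:   no executor evidence; the target exists only in the plan.
    Paths are assumed to be POSIX-style and relative to the repository root.
    """
    seen = observed_paths(executor_report)
    seen_dirs = {p.rsplit("/", 1)[0] for p in seen if "/" in p}
    layers: Dict[str, List[str]] = {"observed": [], "heuristic": [], "planned": []}
    for target in planned_targets:
        if target in seen:
            layers["observed"].append(target)
        elif "/" in target and target.rsplit("/", 1)[0] in seen_dirs:
            layers["heuristic"].append(target)
        else:
            layers["planned"].append(target)
    return layers


report = {"changed_files": ["models/head.py"], "new_files": ["models/adapter.py"]}
print(fidelity_layers(["models/head.py", "models/loss.py", "train.py"], report))
```

The point of the split is that a planned target never counts as implemented unless the executor reported touching it.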

The two-loop rhythm is a guide, not a never-stop autonomous agent. RigorPilot Explore stops at explicit blockers, unclear scientific meaning, exhausted budget, missing anchors, or human checkpoints. The explore lane must not claim trusted reproduction success, global benchmark completeness, or verified novelty.
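The two-loop rhythm and its stop conditions can be sketched as a bounded loop that returns a stop reason instead of running forever. This is a hypothetical illustration. `ExploreState`, `explore`, the callables, and the numeric score are invented names, and the budget unit (one inner-loop run) is an assumption. The resulting ranking is candidate-only and says nothing about novelty.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class ExploreState:
    budget: int            # assumed unit: number of bounded inner-loop runs
    anchor: Optional[str]  # current_research reference; None means the anchor is missing
    ranked: List[Tuple[str, float]] = field(default_factory=list)


def explore(
    state: ExploreState,
    candidates: Iterable[str],
    worth_trying: Callable[[str], bool],
    needs_checkpoint: Callable[[str], bool],
    run_candidate: Callable[[str], Optional[float]],
) -> str:
    """Outer loop gates each candidate; inner loop makes one bounded change or run."""
    if state.anchor is None:
        return "stop: missing current_research anchor"
    for cand in candidates:
        if state.budget <= 0:
            return "stop: budget exhausted"
        if needs_checkpoint(cand):
            return f"stop: human checkpoint before {cand}"
        if not worth_trying(cand):
            return f"stop: {cand} failed the gate"
        state.budget -= 1
        score = run_candidate(cand)  # smoke run plus evidence; None signals a blocker
        if score is None:
            return f"stop: blocker while running {cand}"
        state.ranked.append((cand, score))
        state.ranked.sort(key=lambda item: item[1], reverse=True)
    return "done: candidate-only summary"


state = ExploreState(budget=2, anchor="commit:a1b2c3d")
scores = {"idea-a": 0.1, "idea-b": 0.3, "idea-c": 0.2}
result = explore(state, ["idea-a", "idea-b", "idea-c"], worth_trying=lambda c: True,
                 needs_checkpoint=lambda c: False, run_candidate=scores.get)
print(result)        # stop: budget exhausted
print(state.ranked)  # [('idea-b', 0.3), ('idea-a', 0.1)]
```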

📦 Public Skill Matrix

| Lane | Skill | Purpose |
| --- | --- | --- |
| Trusted | ai-research-reproduction | End-to-end README-first reproduction orchestrator |
| Trusted | env-and-assets-bootstrap | Conservative environment, dataset, checkpoint, and cache planning |
| Trusted | minimal-run-and-audit | Trusted inference, evaluation, smoke, and sanity execution |
| Trusted | analyze-project | Read-only project analysis, model mapping, and risk surfacing |
| Trusted | run-train | Training startup verification, resume handling, bounded monitoring, and training records |
| Trusted | safe-debug | Research-safe debugging: analyze first, patch only after approval |
| Explore | `ai-research

...


Statistics

Installs: 127.5K
Rating: 4.8 / 5.0
Updated: May 23, 2026
Comparison examples: 1

User ratings

5 stars: 20%
4 stars: 50%
3 stars: 27%
2 stars: 3%
1 star: 0%


Supported platforms

🔧 Claude Code

Timeline

Created: March 31, 2026
Last updated: May 23, 2026