
run

by @alirezarezvaniv
Rating: 4.3 (20 ratings)

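Runs a single experiment iteration: review the history, decide on a change, edit the code, commit, and evaluate the result. Supports a systematic, scientific-method experiment workflow.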

Tags: experiment-tracking, scientific-method, research-methodology, automation, iterative-testing
Installation
npx skills add alirezarezvani/claude-skills --skill run

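Before / After Comparison

Before

Experiment history is recorded by hand, and so are the analysis of results, the design of the next experiment, the code edits, and the validation runs. The workflow is disorganized, the effect of each variable is hard to track, and results are not reproducible.

After

Experiment history and context are managed automatically, and the full iteration workflow runs systematically. Every change is recorded and evaluated, so results are traceable and reproducible.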

SKILL.md

run

/ar:run — Single Experiment Iteration

Run exactly ONE experiment iteration: review history, decide a change, edit, commit, evaluate.

Usage

/ar:run engineering/api-speed              # Run one iteration
/ar:run                                     # List experiments, let user pick

What It Does

Step 1: Resolve experiment

If no experiment specified, run python {skill_path}/scripts/setup_experiment.py --list and ask the user to pick.
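The skill delegates listing to `setup_experiment.py --list`, and that script's internals are not documented here. As a hedged illustration only, the Python sketch below discovers experiments by scanning the `.autoresearch/{domain}/{name}/` layout that Step 2 reads. The function names `list_experiments` and `resolve_experiment` are hypothetical. Treating any directory that contains a `config.cfg` as an experiment is an assumption.

```python
from __future__ import annotations

from pathlib import Path


def list_experiments(root: str = ".autoresearch") -> list[str]:
    """Return sorted experiment IDs such as 'engineering/api-speed'.

    Assumption: every {domain}/{name} directory under root that holds a
    config.cfg file is an experiment (the layout read in Step 2).
    """
    base = Path(root)
    if not base.is_dir():
        return []
    ids = []
    for cfg in base.glob("*/*/config.cfg"):
        exp_dir = cfg.parent
        ids.append(f"{exp_dir.parent.name}/{exp_dir.name}")
    return sorted(ids)


def resolve_experiment(arg: str | None, root: str = ".autoresearch") -> str | None:
    """Validate an explicit experiment ID, or return None so the caller asks the user."""
    if arg is None:
        return None
    if arg not in list_experiments(root):
        raise ValueError(f"unknown experiment: {arg}")
    return arg
```

When `resolve_experiment` returns `None`, the caller would show `list_experiments()` to the user and ask them to pick. This mirrors `/ar:run` with no argument.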

Step 2: Load context

# Read experiment config
cat .autoresearch/{domain}/{name}/config.cfg

# Read strategy and constraints
cat .autoresearch/{domain}/{name}/program.md

# Read experiment history
cat .autoresearch/{domain}/{name}/results.tsv

# Checkout the experiment branch
git checkout autoresearch/{domain}/{name}
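For readers who want to script this step, here is a minimal Python sketch that gathers the same three files and the branch name in one place. `load_context` is a hypothetical helper, not part of the skill. It returns raw text because the format of `config.cfg` is not specified. It treats a missing `results.tsv` as empty, which is an assumption (the file might not exist before the first run). It does not run `git checkout` itself.

```python
from pathlib import Path


def load_context(domain: str, name: str, root: str = ".autoresearch") -> dict:
    """Read one experiment's config, strategy, and history as raw text.

    Hypothetical helper mirroring the cat commands above. Missing files are
    returned as empty strings (assumption: a new experiment may have no
    results yet).
    """
    exp_dir = Path(root) / domain / name
    context = {}
    for filename in ("config.cfg", "program.md", "results.tsv"):
        path = exp_dir / filename
        context[filename] = path.read_text() if path.exists() else ""
    # Branch naming follows the git checkout command shown above.
    context["branch"] = f"autoresearch/{domain}/{name}"
    return context
```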

Step 3: Decide what to try

Review results.tsv:

  • What changes were kept? What pattern do they share?

  • What was discarded? Avoid repeating those approaches.

  • What crashed? Understand why.

  • How many runs so far? Escalate the strategy accordingly.
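The review questions above become mechanical once the history is parsed. This sketch assumes that `results.tsv` has a header row with `status` (keep / discard / crash) and `description` columns. The skill docs do not confirm that layout, so adjust the column names to match the real file. `summarize_history` is a hypothetical name.

```python
import csv
import io


def summarize_history(tsv_text: str) -> dict:
    """Group results.tsv rows by outcome for the Step 3 review.

    Assumed (not confirmed) layout: a header row that includes 'status'
    (keep / discard / crash) and 'description' columns.
    """
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    grouped = {"keep": [], "discard": [], "crash": []}
    for row in rows:
        # Missing cells come back as None, so fall back to "" before normalizing.
        status = (row.get("status") or "").strip().lower()
        if status in grouped:
            grouped[status].append(row.get("description") or "")
    return {
        "runs": len(rows),
        "kept": grouped["keep"],
        "discarded": grouped["discard"],
        "crashed": grouped["crash"],
    }
```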

Strategy escalation:

  • Runs 1-5: Low-hanging fruit (obvious improvements)

  • Runs 6-15: Systematic exploration (vary one parameter)

  • Runs 16-30: Structural changes (algorithm swaps)

  • Runs 31+: Radical experiments (completely different approaches)
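A small helper can make the escalation schedule explicit. The hypothetical `strategy_phase` function below assigns run 30 to the structural-changes phase and starts radical experiments at run 31, so every run number maps to exactly one phase.

```python
def strategy_phase(run_number: int) -> str:
    """Map a 1-based run number to its escalation phase.

    Run 30 is the last structural-change run; radical experiments start at 31.
    """
    if run_number < 1:
        raise ValueError("run numbers start at 1")
    if run_number <= 5:
        return "low-hanging fruit"
    if run_number <= 15:
        return "systematic exploration"
    if run_number <= 30:
        return "structural changes"
    return "radical experiments"
```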

Step 4: Make ONE change

Edit only the target file specified in config.cfg. Change one thing. Keep it simple.
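"Change one thing" is a judgment call that code cannot verify. The file scope, however, can be checked before committing. The sketch below is a hedged example in which `changed_files` and `only_target_changed` are hypothetical helpers. `git diff --name-only HEAD` lists tracked files that differ from the last commit, whether staged or not. A newly created untracked file would therefore not appear.

```python
import subprocess


def changed_files(repo: str = ".") -> list:
    """List tracked files that differ from HEAD (staged or unstaged)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "HEAD"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def only_target_changed(files, target: str) -> bool:
    """True if the pending change touches exactly the configured target file."""
    return list(files) == [target]
```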

Step 5: Commit and evaluate

git add {target}
git commit -m "experiment: {short description of what changed}"

python {skill_path}/scripts/run_experiment.py \
  --experiment {domain}/{name} --single
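Here is the same step in Python, as a hedged sketch. It passes argument lists to `subprocess`, so a commit description containing quotes or spaces needs no shell escaping. `commit_and_evaluate_commands` and `run_step5` are hypothetical names. How `run_experiment.py` signals KEEP, DISCARD, or CRASH (exit code or printed output) is not documented here, so the `check=True` behavior may need adjusting.

```python
import subprocess


def commit_and_evaluate_commands(target, description, skill_path, experiment):
    """Return the Step 5 commands as argv lists, without running them."""
    return [
        ["git", "add", target],
        ["git", "commit", "-m", f"experiment: {description}"],
        [
            "python",
            f"{skill_path}/scripts/run_experiment.py",
            "--experiment",
            experiment,
            "--single",
        ],
    ]


def run_step5(target, description, skill_path, experiment):
    """Run the commands in order.

    check=True raises CalledProcessError on the first non-zero exit, so a
    failed commit stops before evaluation. Whether run_experiment.py exits
    non-zero for DISCARD or CRASH is not documented; adjust if needed.
    """
    for argv in commit_and_evaluate_commands(target, description, skill_path, experiment):
        subprocess.run(argv, check=True)
```

Building the commands separately from running them makes it easy to print or log exactly what will execute.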

Step 6: Report result

Read the script output. Tell the user:

  • KEEP: "Improvement! {metric}: {value} ({delta} from previous best)"

  • DISCARD: "No improvement. {metric}: {value} vs best {best}. Reverted."

  • CRASH: "Evaluation failed: {reason}. Reverted."
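Here is a minimal sketch that renders those three messages. It assumes the status has already been read from the script output, since parsing that output is not documented here. `format_report` is hypothetical. It also assumes `delta` is numeric and prints it with an explicit sign.

```python
def format_report(status, metric=None, value=None, best=None, delta=None, reason=None):
    """Render the Step 6 message for a KEEP, DISCARD, or CRASH outcome."""
    status = status.upper()
    if status == "KEEP":
        # :+g always shows the sign, e.g. -7 or +3.5
        return f"Improvement! {metric}: {value} ({delta:+g} from previous best)"
    if status == "DISCARD":
        return f"No improvement. {metric}: {value} vs best {best}. Reverted."
    if status == "CRASH":
        return f"Evaluation failed: {reason}. Reverted."
    raise ValueError(f"unknown status: {status}")
```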

Step 7: Self-improvement check

After every 10th experiment (check results.tsv line count), update the Strategy section of program.md with patterns learned.
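One way to detect the "every 10th experiment" checkpoint, as a sketch, is to count non-blank data rows in `results.tsv` and test for a multiple of 10. Whether the file has a header row is an assumption, exposed here as the `has_header` parameter. Both function names are hypothetical.

```python
def completed_runs(tsv_text: str, has_header: bool = True) -> int:
    """Count experiment rows in results.tsv, ignoring blank lines.

    Assumption: the first non-blank line is a header row; pass
    has_header=False if the file has none.
    """
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    if has_header and lines:
        lines = lines[1:]
    return len(lines)


def strategy_review_due(runs: int) -> bool:
    """True after the 10th, 20th, 30th, ... experiment."""
    return runs > 0 and runs % 10 == 0
```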

Rules

  • ONE change per iteration. Don't change 5 things at once.

  • NEVER modify the evaluator (evaluate.py). It's ground truth.

  • Simplicity wins. Equal performance with simpler code is an improvement.

  • No new dependencies.
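The "simplicity wins" rule implies a keep/discard decision with a tie-breaker. The hypothetical `should_keep` below sketches that decision. It uses lines of code as a stand-in for simplicity, which is an assumption, because the skill does not define how simplicity is measured. It only compares numbers reported by the evaluator and never touches `evaluate.py`.

```python
def should_keep(new, best, new_loc, best_loc, lower_is_better=True, tol=0.0):
    """Decide KEEP vs DISCARD under the 'simplicity wins' rule.

    Keep if the metric beats the best by more than tol, or if it ties the
    best (within tol) while using fewer lines of code.
    """
    improvement = (best - new) if lower_is_better else (new - best)
    if improvement > tol:
        return True
    tied = abs(new - best) <= tol
    return tied and new_loc < best_loc
```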

Weekly installs: 448
Repository: alirezarezvani/…e-skills
GitHub stars: 10.1K
First seen: Mar 13, 2026
Security audits: Gen Agent Trust Hub (Warn), Socket (Pass), Snyk (Pass)
Installed on: opencode (423), codex (422), cursor (422), gemini-cli (421), github-copilot (421), amp (420)


Statistics

Installs: 1.2K
Rating: 4.3 / 5.0
Updated: May 19, 2026
Comparison cases: 1

User ratings

  • 5 stars: 85%
  • 4 stars: 15%
  • 3 stars: 0%
  • 2 stars: 0%
  • 1 star: 0%


Compatible platforms

Claude Code

Timeline

Created: April 9, 2026
Last updated: May 19, 2026