Auto-claude-code-research-in-sleep

by @wanshuiyinv
Rating: 4.5 (269 ratings)

ARIS (Auto-Research-In-Sleep) is a lightweight AI skill repository designed to enable autonomous machine learning research through automation. It supports cross-model review, new idea discovery, and experiment automation, and features a persistent knowledge base and self-evolving capabilities. Users can utilize it as a skill on platforms like Claude Code, or run it as a standalone CLI, greatly enhancing AI research efficiency.

Tags: ai-agent, ml-research, automation, llm, knowledge-management · GitHub
Installation
npx skills add wanshuiyin/Auto-claude-code-research-in-sleep --skill Auto-claude-code-research-in-sleep
Before / After Comparison

Before

Traditional ML research is manual: it is time-consuming and labor-intensive, information is hard to integrate, iteration cycles are long, and promising ideas are easily missed.

After

ARIS automates ML research, including cross-model review, idea discovery, and experiment automation, significantly improving efficiency and accelerating innovation.

SKILL.md

Auto-claude-code-research-in-sleep (ARIS ⚔️🌙)

💡 Use ARIS in Claude Code / Cursor / Trae as a skill-based workflow, or get the full experience with the standalone CLI — enjoy any way you like!

🤖 AI agents: Read AGENT_GUIDE.md instead — structured for LLM consumption, not human browsing.

🔥 ARIS-Code CLI (standalone install) · English | ⬇️ Download

📰 ARIS-Code v0.4.1 (2026-04-15) — Plan mode (/plan) | Cooperative Ctrl+C interrupt | Auto-retry (429/5xx/network) | Research Wiki 📚 (persistent knowledge base) | Self-Evolution 🧬 (/meta-optimize) | Local models (LM Studio/Ollama) | 62 skills synced

v0.3.11 (2026-04-13) — Reviewer Anthropic-compatible mode (Claude via proxy)

v0.3.9 (2026-04-11) — Proxy/custom base URL (CCSwitch) | Local models (LM Studio/Ollama) | Windows (experimental)

v0.3.5 (2026-04-08) — Research Wiki (persistent papers/ideas/experiments/claims + relationship graph) | Meta-Optimize self-evolution (analyze logs → propose SKILL.md patches)

v0.3.0 (2026-04-03) — Multi-file memory index | Rich task system (TodoWrite) | /plan | Security hardening

v0.2.2 (2026-04-03) — /plan step-by-step planning | /tasks persistent tracking

v0.2.1 (2026-04-03) — Persistent Memory | Kimi K2.5 multi-turn fix | CJK cursor fix

v0.2.0 (2026-04-02) — Open source | Kimi + MiniMax + GLM support | Smart LlmReview routing | CI/CD

v0.1.0 (2026-04-02) — Initial release | Multi-executor & reviewer | 42 bundled skills

Chinese README | English

[Figure: Score Progression]

🌙 Let Claude Code do research while you sleep. Wake up to find your paper scored, weaknesses identified, experiments run, and narrative rewritten — autonomously.

🪶 Radically lightweight — zero dependencies, zero lock-in. The entire system is plain Markdown files. No framework to learn, no database to maintain, no Docker to configure, no daemon to babysit. Every skill is a single SKILL.md readable by any LLM — swap Claude Code for Codex CLI, OpenClaw, Cursor, Trae, Antigravity, Windsurf, or your own agent and the workflows still work. Fork it, rewrite it, adapt it to your stack.

💡 ARIS is a methodology, not a platform. What matters is the research workflow — take it wherever you go. 🌱

Featured on PaperWeekly · PaperWeekly — MiniMax-M2.7 · Featured in awesome-agent-skills · AI Digital Crew - Project of the Day · 💬 Join Community · Cite

Custom Claude Code skills for autonomous ML research workflows. These skills orchestrate cross-model collaboration: Claude Code drives the research while an external LLM (via Codex MCP) acts as a critical reviewer.

  • 🔀 Alternative model combinations (Kimi, LongCat, DeepSeek, etc.) are also supported, with no Claude or OpenAI API required. For example, MiniMax-M2.7 + GLM-5 or GLM-5 + MiniMax-M2.7.
  • 🤖 Codex CLI native: the full skill set is also available for OpenAI Codex.
  • 🖱️ Cursor: works in Cursor too.
  • 🖥️ Trae: ByteDance AI IDE.
  • 🚀 Antigravity: Google's agent-first IDE.
  • 🆓 Free tier via ModelScope: zero cost, zero lock-in.

💭 Why not self-play with a single model? Using Claude Code subagents or agent teams for both execution and review is technically possible, but tends to fall into local minima — the same model reviewing its own patterns creates blind spots.

Think of it like adversarial vs. stochastic bandits: a single model self-reviewing is the stochastic case (predictable reward noise), while cross-model review is adversarial (the reviewer actively probes weaknesses the executor didn't anticipate) — and adversarial bandits are fundamentally harder to game.

💭 Why two models, not more? Two is the minimum needed to break self-play blind spots, and 2-player games converge to Nash equilibrium far more efficiently than n-player ones. Adding more reviewers increases API cost and coordination overhead with diminishing returns — the biggest gain is going from 1→2, not 2→4.

Claude Code's strength is fast, fluid execution; Codex (GPT-5.4 xhigh) is slower but more deliberate and rigorous in critique. These complementary styles — speed × rigor — produce better outcomes than either model talking to itself.
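
The executor/reviewer split described above can be pictured as a small loop. The following is a minimal sketch under stated assumptions, not ARIS's implementation: `review_loop`, `Round`, the toy reviewer and executor, and the stopping threshold are all hypothetical names chosen for illustration, and the two models are replaced by plain Python callables so the example runs without any API access.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical types: a reviewer returns (score out of 10, weaknesses);
# an executor takes a draft plus weaknesses and returns a revised draft.
Reviewer = Callable[[str], Tuple[int, List[str]]]
Executor = Callable[[str, List[str]], str]


@dataclass
class Round:
    score: int
    weaknesses: List[str]


def review_loop(draft: str, executor: Executor, reviewer: Reviewer,
                target: int = 7, max_rounds: int = 4) -> Tuple[str, List[Round]]:
    """Alternate review and revision until the score reaches `target`."""
    history: List[Round] = []
    for _ in range(max_rounds):
        # The reviewer sees only the current draft, never the executor's
        # notes about what it fixed (a fresh context every round).
        score, weaknesses = reviewer(draft)
        history.append(Round(score, weaknesses))
        if score >= target or not weaknesses:
            break
        draft = executor(draft, weaknesses)
    return draft, history


# Stand-in "models" (assumptions for the demo only).
def toy_reviewer(draft: str) -> Tuple[int, List[str]]:
    missing = [k for k in ("baseline", "ablation") if k not in draft]
    return 9 - 3 * len(missing), [f"no {k}" for k in missing]


def toy_executor(draft: str, weaknesses: List[str]) -> str:
    # Address one weakness per round.
    return draft + " + " + weaknesses[0].replace("no ", "", 1)


final, rounds = review_loop("method", toy_executor, toy_reviewer)
print(final, [r.score for r in rounds])
```

The point of the design is that `reviewer` and `executor` are separate callables with separate state; swapping in a second model for `reviewer` changes nothing else in the loop.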

🧿 Want the strongest possible reviewer? Add — reviewer: oracle-pro to any skill to route reviews through GPT-5.4 Pro via Oracle MCP. Pro-level reasoning for proof verification, experiment auditing, and final stress tests. Works with API key or free browser mode. Setup →

🎯 More Than Just a Prompt

These are full pipelines — you can also use each workflow independently. Already have an idea? Skip to Workflow 1.5. Have results? Jump to Workflow 3. Got reviews? Jump to Workflow 4. Want persistent memory? Enable Research Wiki. See Quick Start for all commands and Workflows for the full breakdown.

Basic mode — give ARIS a research direction, it handles everything:

/research-pipeline "factorized gap in discrete diffusion LMs"

🔥 Targeted mode — got a paper you want to improve? Give ARIS the paper + the code:

/research-pipeline "improve method X" — ref paper: https://arxiv.org/abs/2406.04329, base repo: https://github.com/org/project

ARIS reads the paper → finds its weaknesses → clones the codebase → generates ideas that specifically fix those weaknesses with that code → runs experiments → writes your paper. Like telling a research assistant: "read this paper, use this repo, find what's missing, and fix it."

Mix and match: ref paper only = "what can be improved?", base repo only = "what can I build with this code?", both = "improve this paper using this code."
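
To make the mix-and-match rule concrete, here is a hedged sketch of how an invocation line could be split into a prompt plus options, and how the options select a mode. The parsing convention (a ` — ` separator followed by comma-separated `key: value` pairs) is inferred from the examples above; `parse_invocation` and `pipeline_mode` are hypothetical helpers, not part of ARIS.

```python
from typing import Dict, Tuple


def parse_invocation(line: str) -> Tuple[str, str, Dict[str, str]]:
    """Split '/skill "prompt" — key: value, key: value' into its parts."""
    head, _, tail = line.partition(" — ")
    command, _, prompt = head.partition(" ")
    options: Dict[str, str] = {}
    for item in filter(None, (part.strip() for part in tail.split(","))):
        # Split on the first colon only, so URLs in the value stay intact.
        key, _, value = item.partition(":")
        options[key.strip()] = value.strip()
    return command, prompt.strip('"'), options


def pipeline_mode(options: Dict[str, str]) -> str:
    """Map the presence of 'ref paper' / 'base repo' to the described mode."""
    has_paper = "ref paper" in options
    has_repo = "base repo" in options
    if has_paper and has_repo:
        return "improve this paper using this code"
    if has_paper:
        return "what can be improved?"
    if has_repo:
        return "what can I build with this code?"
    return "basic: research direction only"


cmd, prompt, opts = parse_invocation(
    '/research-pipeline "improve method X" — '
    "ref paper: https://arxiv.org/abs/2406.04329, "
    "base repo: https://github.com/org/project"
)
print(cmd, prompt, pipeline_mode(opts))
```

In practice the skill text is interpreted by the LLM rather than by a parser like this; the sketch only shows which combinations of options correspond to which behavior.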

🔥 Rebuttal mode — reviews just dropped? Don't panic. ARIS reads every concern, builds a strategy, and drafts a rebuttal that's grounded, structured, and under the character limit:

/rebuttal "paper/ + reviews" — venue: ICML, character limit: 5000

| Parameter | Default | What it does |
|---|---|---|
| venue | ICML | Target venue (ICML/NeurIPS/ICLR/CVPR/ACL/AAAI/ACM) |
| character limit | | Required. Hard character limit for rebuttal text |
| quick mode | false | Stop after parsing + strategy (Phase 0-3). See what reviewers want before drafting |
| auto experiment | false | Auto-run supplementary experiments via /experiment-bridge when reviewers ask for new evidence |
| max stress test rounds | 1 | How many times GPT-5.4 xhigh stress-tests the draft |
| max followup rounds | 3 | Per-reviewer follow-up round limit |

Three safety gates — rebuttal will NOT finalize if any fails:

  • 🔒 No fabrication — every claim maps to paper/review/user-confirmed result
  • 🔒 No overpromise — every promise is user-approved
  • 🔒 Full coverage — every reviewer concern is tracked

Two outputs: PASTE_READY.txt (exact char count, paste to venue) + REBUTTAL_DRAFT_rich.md (extended version for manual editing).
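
The three gates and the hard character limit can be read as a checklist that must come back empty before the rebuttal is finalized. The sketch below is an illustration under assumptions, not the skill's actual logic: `Claim`, `RebuttalDraft`, `gate_failures`, and the sample data are hypothetical, and `len()` counts Unicode code points, which may differ from how a venue's submission form counts characters.

```python
from dataclasses import dataclass
from typing import List

# Sources a claim may legitimately map to, per the "no fabrication" gate.
ALLOWED_SOURCES = {"paper", "review", "user-confirmed"}


@dataclass
class Claim:
    text: str
    source: str  # one of ALLOWED_SOURCES, or "" if unsupported


@dataclass
class RebuttalDraft:
    text: str
    claims: List[Claim]
    promises_approved: List[bool]   # one flag per promise made in the draft
    concerns_addressed: List[bool]  # one flag per tracked reviewer concern


def gate_failures(draft: RebuttalDraft, char_limit: int) -> List[str]:
    """Return the names of every failed check; an empty list means 'finalize'."""
    failures: List[str] = []
    if any(c.source not in ALLOWED_SOURCES for c in draft.claims):
        failures.append("fabrication")
    if not all(draft.promises_approved):
        failures.append("overpromise")
    if not all(draft.concerns_addressed):
        failures.append("coverage")
    # Not one of the three gates: the hard limit from the parameter table.
    if len(draft.text) > char_limit:
        failures.append("character limit")
    return failures


# Made-up sample draft, for illustration only.
draft = RebuttalDraft(
    text="We thank the reviewers.",
    claims=[Claim("stated in Section 3", "paper"), Claim("new result", "")],
    promises_approved=[True],
    concerns_addressed=[True, False],
)
print(gate_failures(draft, char_limit=5000))
```

Returning every failure at once, rather than stopping at the first, lets the author see all blocking issues in a single pass.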

After acceptance — your paper is in, now prepare the presentation:

/paper-slides "paper/"     # → Beamer PDF + PPTX + speaker notes + Q&A prep
/paper-poster "paper/"     # → A0/A1 poster PDF + editable PPTX + SVG

💡 From idea to paper to podium — one toolchain. 🌱

🏆 Papers Built with ARIS

| Paper | Score | Venue | Author | Stack |
|---|---|---|---|---|
| CS Paper | 8/10 "clear accept" | CS Conference | @DefanXue & @Monglitay | Claude Code + GPT-5.4 |
| AAAI Paper | 7/10 "good paper, accept" | AAAI 2026 Main Technical | @xinbo820-web | Pure Codex CLI |
| UAV-CC | Under review | IEEE TGRS | @wxx827 | Claude Opus 4.6 + Codex 5.4 xhigh + Cursor |

🎉 Built with ARIS — from idea to submission. Full details + PDFs →

📢 What's New

  • 2026-04-15 NEW 🛡️ Paper Writing Pipeline Hardening: 10 empirically motivated patches from a real NeurIPS run. REVIEWER_BIAS_GUARD=true: every review round uses a fresh thread (codex-reply inflated 3→8/10). Reviewer Independence Protocol: no fix summaries to reviewer. Step 4.5 Restatement Regression Test: catches theorem drift across fix rounds. Step 5.5 Kill Argument Exercise: final-round adversarial attack/defense for theory papers. Location-aware overfull blocking. Theory Paper Consistency Pass in /paper-write. Enforced Bib Hygiene with DBLP/CrossRef validation. Phase 5.5 Mandatory Final Claim Audit as submission gate. Review Tracing Protocol: full prompt/response pairs saved to .aris/traces/ for reviewer-independence audit (review-tracing.md, save_trace.sh). Inspired by a community contribution from @李傲龍.
  • 2026-04-15 NEW 🎨 FigureSpec Renderer v2: deterministic JSON→SVG figure generation for academic papers. Shape-aware edge clipping (rect/circle/ellipse/diamond), self-loops, curved edges, multi-line labels with CJK width estimation, comprehensive validation (type checks, structure, palette). Went through 5 rounds of Codex review (3/10→7/10). All architecture and workflow diagrams in the ARIS tech report were generated with this pipeline. New — mode: vector option for the /paper-illustration skill.
  • 2026-04-14 NEW 📋 /paper-claim-audit: zero-context paper-to-evidence verification. A fresh reviewer with NO prior context compares every number in the paper against raw result files. Catches rounding inflation, best-seed cherry-pick, config mismatch, delta errors, scope overclaim. Auto-integrated into Workflow 3 (Phase 4.7). Completes the 3-layer audit chain: /experiment-audit (code) → /result-to-claim (science) → /paper-claim-audit (reporting). 👁️ Visual PDF review also added to the improvement loop: the reviewer now sees the compiled PDF, not just the LaTeX source. Inspired by Hermes Agent.
  • 2026-04-13 NEW 🧿 GPT-5.4 Pro via Oracle: add — reviewer: oracle-pro to any skill for the strongest available reviewer. API mode (fast) or browser mode (free). Supported on: /research-review, /auto-review-loop, /experiment-audit, /proof-checker, /rebuttal, /idea-creator, /research-lit. Default stays Codex xhigh. Not installed = zero impact. Setup →
  • 2026-04-13 NEW 🔬 /proof-checker: rigorous mathematical proof verification via cross-model review. 20-category issue taxonomy, two-axis severity, side-condition checklists (DCT/MCT/Fubini/IFT/...), counterexample red team, proof-obligation ledger. Auto-integrated into Workflow 3: detects \begin{theorem} and runs before the improvement loop. Complements /proof-writer.
  • 2026-04-10 NEW Effort Levels: — effort: lite | balanced | max | beast. Controls work intensity across all skills: papers found, ideas generated, review rounds, writing depth. Codex reasoning always stays xhigh. beast = every knob to maximum for top-venue sprints. Default balanced = zero change for existing users. Details →
  • 2026-04-10 NEW 🔎 DeepXiv integration: progressive paper retrieval via DeepXiv CLI. Opt-in: — sources: deepxiv or — sources: all, deepxiv. Staged reading: search → brief → head → section. Run pip install deepxiv-sdk to enable. Community contribution by @DreamEnding.
  • 2026-04-10 NEW 🛡️ /experiment-audit: cross-model experiment integrity verification. GPT-5.4 reads your eval scripts and results directly, checks for fake ground truth, self-normalized scores, phantom results, and scope inflation (#131, #57). Advisory: warns loudly, never blocks. /result-to-claim auto-reads the audit if present. New experiment-integrity.md shared reference. The executor must never judge its own integrity.
  • 2026-04-10 NEW 🧠 tools/smart_update.sh: intelligent skill updater. Compares local vs upstream, detects personal customizations (server paths, API keys), and only updates safe skills. Run bash tools/smart_update.sh --apply
  • 2026-04-10 NEW 🏆 Community paper

...
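
The /paper-claim-audit entry above describes comparing every number in the paper against raw result files. A minimal sketch of two of the listed checks (rounding inflation and best-seed cherry-pick) follows. It is an illustration, not the skill's implementation: `rounds_to` and `audit_claim` are hypothetical helpers, the seed scores are made-up sample data, and the sketch assumes the paper is supposed to report the mean across seeds, rounded half-up to the printed precision.

```python
from decimal import Decimal, ROUND_HALF_UP
from statistics import mean
from typing import List


def rounds_to(raw: float, reported: str) -> bool:
    """True if `raw` rounds (half-up) to `reported` at its printed precision."""
    target = Decimal(reported)
    # "85.3" has exponent -1, so the quantum is 0.1; "85" gives a quantum of 1.
    quantum = Decimal(1).scaleb(target.as_tuple().exponent)
    return Decimal(str(raw)).quantize(quantum, rounding=ROUND_HALF_UP) == target


def audit_claim(reported: str, seed_scores: List[float]) -> str:
    """Classify a reported number against the raw per-seed results."""
    if rounds_to(mean(seed_scores), reported):
        return "ok"
    if rounds_to(max(seed_scores), reported):
        return "best-seed cherry-pick"
    return "mismatch"


# Made-up raw results for three seeds (illustrative data only).
seeds = [84.9, 85.2, 85.7]
for claimed in ("85.3", "85.7", "85.4"):
    print(claimed, audit_claim(claimed, seeds))
```

Keeping the reported value as a string preserves the precision printed in the paper, which is what makes a rounding check possible in the first place.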

Statistics

Installs: 6.7K
Rating: 4.5 / 5.0
Updated: May 17, 2026
Comparisons: 1

User Rating

4.5 (269)
5 stars: 65%
4 stars: 25%
3 stars: 6%
2 stars: 3%
1 star: 1%

Compatible Platforms

🔧 Claude Code / Cursor / Trae
🔧 Standalone CLI

Timeline

Created: April 16, 2026
Last Updated: May 17, 2026