council
Establishes a multi-model consensus council: parallel judges evaluate a target independently, and their verdicts are consolidated into a single consensus decision.
npx skills add boshu2/agentops --skill council
Before / After Comparison
A single AI model can be biased or limited in complex decisions. Its judgments are not cross-checked from other perspectives, so errors are more likely to slip through and erode system reliability and user trust.
This skill builds a multi-model consensus council with parallel judges and configurable perspectives. It consolidates the opinions of multiple AI models, which is intended to make decisions more accurate and robust and to reduce risk.
/council — Multi-Model Consensus Council
Spawn parallel judges with different perspectives, consolidate into consensus. Works for any task — validation, research, brainstorming.
Quick Start
/council --quick validate recent # fast inline check
/council validate this plan # validation (2 agents)
/council brainstorm caching approaches # brainstorm
/council validate the implementation # validation (critique triggers map here)
/council research kubernetes upgrade strategies # research
/council research the CI/CD pipeline bottlenecks # research (analyze triggers map here)
/council --preset=security-audit validate the auth system # preset personas
/council --deep --explorers=3 research upgrade automation # deep + explorers
/council --debate validate the auth system # adversarial 2-round review
/council --deep --debate validate the migration plan # thorough + debate
/council # infers from context
Council works independently — no RPI workflow, no ratchet chain, no ao CLI required. Zero setup beyond initial install.
Modes
| Mode | Agents | Execution Backend | Use Case |
|---|---|---|---|
| --quick | 0 (inline) | Self | Fast single-agent check, no spawning |
| default | 2 | Runtime-native (Codex sub-agents preferred; Claude teams fallback) | Independent judges (no perspective labels) |
| --deep | 3 | Runtime-native | Thorough review |
| --mixed | 3+3 | Runtime-native + Codex CLI | Cross-vendor consensus |
| --debate | 2+ | Runtime-native | Adversarial refinement (2 rounds) |
/council --quick validate recent # inline single-agent check, no spawning
/council recent # 2 runtime-native judges
/council --deep recent # 3 runtime-native judges
/council --mixed recent # runtime-native + Codex CLI
Spawn Backend (MANDATORY)
Council requires a runtime that can spawn parallel subagents and (for --debate) send messages between agents. Use whatever multi-agent primitives your runtime provides. If no multi-agent capability is detected, fall back to --quick (inline single-agent).
Required capabilities:
- Spawn subagent: create a parallel agent with a prompt (required for all modes except --quick)
- Agent messaging: send a message to a specific agent (required for --debate)
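The capability rules above can be sketched as a small gate. This is only an illustration under stated assumptions: `Capabilities` and `resolve_flags` are hypothetical names, and how a runtime actually detects its capabilities is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical record of what the current runtime can do."""
    can_spawn: bool    # can create parallel subagents
    can_message: bool  # can send a message to a specific agent


def resolve_flags(caps: Capabilities, quick: bool, debate: bool) -> dict:
    """Degrade the requested flags to what the runtime supports.

    Conflicts between flags the user passed explicitly are assumed to be
    validated separately; this function only handles missing capabilities.
    """
    notes = []
    if not caps.can_spawn:
        if not quick:
            notes.append("no multi-agent capability: falling back to --quick")
        quick = True
    # --debate needs both spawning and agent-to-agent messaging.
    if debate and not (caps.can_spawn and caps.can_message):
        debate = False
        notes.append("--debate unavailable: single-round review only")
    return {"quick": quick, "debate": debate, "notes": notes}
```

Returning the notes alongside the flags lets the caller surface each degradation in the final report instead of failing silently.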
Skills describe WHAT to do, not WHICH tool to call. See skills/shared/SKILL.md for the capability contract.
After detecting your backend, read the matching reference for concrete spawn/wait/message/cleanup examples:
- Shared Claude feature contract → skills/shared/references/claude-code-latest-features.md
- Local mirrored contract for runtime-local reads → references/claude-code-latest-features.md
- Claude Native Teams → references/backend-claude-teams.md
- Codex Sub-Agents / CLI → references/backend-codex-subagents.md
- Background Tasks → references/backend-background-tasks.md
- Inline (--quick) → references/backend-inline.md
See also references/cli-spawning.md for council-specific spawning flow (phases, timeouts, output collection).
When to Use --debate
Use --debate for high-stakes or ambiguous reviews where judges are likely to disagree:
- Security audits, architecture decisions, migration plans
- Reviews where multiple valid perspectives exist
- Cases where a missed finding has real consequences
Skip --debate for routine validation where consensus is expected. Debate adds the latency of a second round (R2): judges stay alive and process that round via backend messaging.
Incompatibilities:
- --quick and --debate cannot be combined. --quick runs inline with no spawning; --debate requires multi-agent rounds. If both are passed, exit with error: "Error: --quick and --debate are incompatible."
- --debate is only supported with validate mode. Brainstorm and research do not produce PASS/WARN/FAIL verdicts. If combined, exit with error: "Error: --debate is only supported with validate mode."
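A minimal sketch of these two checks, assuming the flags have already been parsed into booleans and a mode string. The function name `check_flags` is hypothetical, and raising `ValueError` stands in for "exit with error"; only the error messages come from the text above.

```python
def check_flags(quick: bool, debate: bool, mode: str) -> None:
    """Raise ValueError with the documented message for incompatible flags."""
    if quick and debate:
        # --quick never spawns agents, but --debate needs multi-agent rounds.
        raise ValueError("Error: --quick and --debate are incompatible.")
    if debate and mode != "validate":
        # Only validate mode produces PASS/WARN/FAIL verdicts to debate.
        raise ValueError("Error: --debate is only supported with validate mode.")
```

For example, `check_flags(quick=False, debate=True, mode="research")` raises with the second message, while `check_flags(quick=False, debate=True, mode="validate")` returns without error.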
Task Types
| Type | Trigger Words | Perspective Focus |
|---|---|---|
| validate | validate, check, review, assess, critique, feedback, improve | Is this correct? What's wrong? What could be better? |
| brainstorm | brainstorm, explore, options, approaches | What are the alternatives? Pros/cons? |
| research | research, investigate, deep dive, explore deeply, analyze, examine, evaluate, compare | What can we discover? What are the properties, trade-offs, and structure? |
Natural language works — the skill infers task type from your prompt.
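One way to picture the inference is a trigger-word lookup over the table above. This is a rough sketch, not the skill's actual logic: `infer_task_type` is a hypothetical helper, and the tie-breaking rules (earliest trigger in the prompt wins, longer phrases beat shorter ones at the same position) are assumptions made so that "explore deeply" maps to research rather than brainstorm.

```python
import re
from typing import Optional

# Trigger words copied from the Task Types table.
TRIGGERS = {
    "validate": ["validate", "check", "review", "assess", "critique", "feedback", "improve"],
    "brainstorm": ["brainstorm", "explore", "options", "approaches"],
    "research": ["research", "investigate", "deep dive", "explore deeply",
                 "analyze", "examine", "evaluate", "compare"],
}


def infer_task_type(prompt: str) -> Optional[str]:
    """Return the task type of the earliest trigger phrase, or None if none match."""
    text = prompt.lower()
    best = None  # (position, -phrase_length, task_type); smaller tuple wins
    for task_type, phrases in TRIGGERS.items():
        for phrase in phrases:
            match = re.search(r"\b" + re.escape(phrase) + r"\b", text)
            if match:
                candidate = (match.start(), -len(phrase), task_type)
                if best is None or candidate < best:
                    best = candidate
    return best[2] if best else None
```

A prompt with no trigger word returns `None`, which leaves room for the caller to infer the type from context instead.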
First-pass rigor gate for plan/spec validation (MANDATORY)
When mode is validate and the target is a plan/spec/contract (or contains boundary rules, state transitions, or conformance tables), judges must apply this gate before returning PASS:
- Canonical mutation + ack sequence is explicit, single-path, and non-contradictory.
- Consume-at-most-once path is crash-safe with explicit atomic boundary and restart recovery semantics.
- Status/precedence behavior is defined with a field-level truth table and anomaly reason codes for conflicting evidence.
- Conformance includes explicit boundary failpoint tests and deterministic assertions for replay/no-duplicate-effect outcomes.
Verdict policy for this gate:
- Missing or contradictory gate item: minimum WARN.
- Missing deterministic conformance coverage for any gate item: minimum WARN.
- Critical lifecycle invariant not mechanically verifiable: FAIL.
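The verdict policy behaves like a floor on the judge's own verdict. The sketch below assumes the judge has already reduced its gate findings to three booleans; `apply_gate` and its parameter names are hypothetical.

```python
# Severity ordering used to compare verdicts.
ORDER = {"PASS": 0, "WARN": 1, "FAIL": 2}


def apply_gate(verdict: str,
               missing_or_contradictory: bool,
               missing_conformance: bool,
               unverifiable_invariant: bool) -> str:
    """Raise a judge's verdict to the minimum the rigor gate allows."""
    floor = "PASS"
    if missing_or_contradictory or missing_conformance:
        floor = "WARN"
    if unverifiable_invariant:
        floor = "FAIL"
    # The gate can only make a verdict stricter, never more lenient.
    return max(verdict, floor, key=ORDER.__getitem__)
```

So a judge that would otherwise return PASS ends at WARN when any gate item is missing, and a judge that already returned FAIL is left unchanged.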
Architecture
Context Budget Rule (CRITICAL)
Judges write ALL analysis to output files. Messages to the lead contain ONLY a
minimal completion signal: {"type":"verdict","verdict":"...","confidence":"...","file":"..."}.
The lead reads output files during consolidation. This prevents N judges from
exploding the lead's context window with N full reports via SendMessage.
Consolidation runs inline as the lead — no separate chairman agent. The lead reads each judge's output file sequentially with the Read tool and synthesizes.
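As an illustration of the context budget rule, the lead could validate each incoming message against the four documented keys before reading the referenced file. `parse_signal` is a hypothetical helper, and rejecting extra keys is an assumption in the spirit of "ONLY a minimal completion signal", not a documented requirement.

```python
import json

REQUIRED_KEYS = {"type", "verdict", "confidence", "file"}


def parse_signal(message: str) -> dict:
    """Parse a judge's completion signal and check that it is minimal."""
    signal = json.loads(message)  # raises ValueError on malformed JSON
    if not isinstance(signal, dict):
        raise ValueError("completion signal must be a JSON object")
    missing = REQUIRED_KEYS - set(signal)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    extra = set(signal) - REQUIRED_KEYS
    if extra:
        # Assumption: anything beyond the four keys belongs in the output file.
        raise ValueError(f"unexpected keys: {sorted(extra)}")
    if signal["type"] != "verdict":
        raise ValueError("completion signal must have type 'verdict'")
    return signal
```

The lead would then open `signal["file"]` during consolidation, so the full analysis never travels through agent messages.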
Execution Flow
┌─────────────────────────────────────────────────────────────────┐
│ Phase 1: Build Packet (JSON) │
│ - Task type (validate/brainstorm/research) │
│ - Target description │
│ - Context (files, diffs, prior decisions) │
│ - Perspectives to assign │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Phase 1a: Select spawn backend │
│ codex_subagents | claude_teams | background_fallback │
│ Team lead = spawner (this agent) │
└─────────────────────────────────────────────────────────────────┘
│
┌─────────────────┴─────────────────┐
▼ ▼
┌───────────────────────┐ ┌───────────────────────┐
│ RUNTIME-NATIVE JUDGES│ │ CODEX AGENTS │
│ (spawn_agent or teams)│ │ (Bash tool, parallel)│
│ │ │ Agent 1 (independent │
│ Agent 1 (independent │ │ or with preset) │
│ or with preset) │ │ Agent 2 │
│ Agent 2 │ │ Agent 3 │
│ Agent 3 (--deep only)│ │ (--mixed only) │
│ (--deep/--mixed only)│ │ │
│ │ │ Output: JSON + MD │
│ Write files, then │ │ Files: .agents/ │
│ wait()/SendMessage to │ │ council/codex-* │
│ lead │ │ │
│ Files: .agents/ │ └───────────────────────┘
│ council/claude-* │ │
└───────────────────────┘ │
│ │
└─────────────────┬─────────────────┘
▼
┌─────────────────────────────────────────────────────────────────┐
│ Phase 2: Consolidation (Team Lead — inline, no extra agent) │
│ - Receive MINIMAL completion signals (verdict + file path) │
│ - Read each judge's output file with Read tool │
│ - If schema_version is missing from a judge's output, treat │
│ as version 0 (backward compatibility) │
│ - Compute consensus verdict │
│ - Identify shared findings │
│ - Surface disagreements with attribution │
│ - Generate Markdown report for human │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Phase 3: Cleanup │
│ - Cleanup backend resources (close_agent / TeamDelete / none) │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Output: Markdown Council Report │
│ - Consensus: PASS/WARN/FAIL │
│ - Shared findings │
│ - Disagreements (if any) │
│ - Recommendations │
└─────────────────────────────────────────────────────────────────┘
Graceful Degradation
| Failure | Behavior |
|---|---|
| 1 of N agents times out | Proceed with N-1, note in report |
| All Codex CLI agents fail | Proceed with runtime-native judges only, note degradation |
| All agents fail | Return error, suggest retry |
| Codex CLI not installed | Skip Codex CLI judges, continue with runtime judges only (warn user) |
| No multi-agent capability | Fall back to --quick (inline single-agent review) |
| No agent messaging | --debate unavailable, single-round review only |
| Output dir missing | Create .agents/council/ automatically |
Timeout: 120s per agent (configurable via --timeout=N in seconds).
Minimum quorum: At least 1 agent must respond for a valid council. If 0 agents respond, return error.
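The quorum rule and the "proceed with N-1" behavior can be sketched as follows. `quorum_note` is a hypothetical helper, agents are identified by plain strings for illustration, and the wording of the note and error is an assumption.

```python
from typing import List, Optional


def quorum_note(spawned: List[str], responded: List[str]) -> Optional[str]:
    """Return a degradation note for the report, or None if all agents responded.

    Raises RuntimeError when no agent responded (quorum of 1 not met).
    """
    if not responded:
        raise RuntimeError("No agents responded; council is invalid. Please retry.")
    missing = [name for name in spawned if name not in responded]
    if missing:
        return (f"Proceeded with {len(responded)} of {len(spawned)} agents; "
                f"no response from: {', '.join(missing)}")
    return None
```

With three judges spawned and one timing out, the council still completes and the returned note records which judge is absent from the consensus.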
Effort Levels for Judges
Use the effort command to optimize token spend per judge role:
| Agent Role | Recommended Effort | Rationale |
|---|---|---|
| Judges (validate/research) | low | Judges review evidence rather than implement, so shallow reasoning suffices |
| Explorers | low | Fast breadth-first scanning |
| Lead (consolidation) | medium | Needs balanced reasoning for consensus synthesis |
Pre-Flight Checks
- Multi-agent capability: Detect whether runtime supports spawning parallel subagents. If not, degrade to --quick.
- Agent messaging: Detect whether runtime supports agent-to-agent messaging. If not, disable --debate.
- Codex CLI judges (--mixed only): Check which codex, test model availability, test --output-schema support. Downgrade mixed mode when unavailable.
- Agent count: Verify judges * (1 + explorers) <= MAX_AGENTS (12)
- Output dir: mkdir -p .agents/council
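The agent-count check is plain arithmetic: each judge counts once, plus once per explorer it spawns. A minimal sketch, with hypothetical function names and the limit of 12 taken from the list above:

```python
MAX_AGENTS = 12  # limit stated in the pre-flight checks


def total_agents(judges: int, explorers: int = 0) -> int:
    """Each judge counts as one agent plus one per explorer it spawns."""
    return judges * (1 + explorers)


def within_budget(judges: int, explorers: int = 0) -> bool:
    """True when the planned council fits under MAX_AGENTS."""
    return total_agents(judges, explorers) <= MAX_AGENTS
```

For instance, `--deep --explorers=3` gives 3 * (1 + 3) = 12 agents, which sits exactly at the limit, while a fourth explorer per judge would give 15 and fail the check.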
Quick Mode (--quick)
Single-agent inline validation. No subprocess spawning, no Task tool, no Codex. The current agent performs a structured self-review using the same output schema as a full council.
When to use: Routine checks, mid-implementation sanity checks, pre-commit quick scan.
Execution: Gather context (files, diffs) -> perform structured self-review inline using the council output_schema (verdict, confidence, findings, recommendation) -> write report to .agents/council/YYYY-MM-DD-quick-<target>.md labeled as Mode: quick (single-agent).
Limitations: No cross-perspective disagreement, no cross-vendor insights, lower confidence ceiling. Not suitable for security audits or architecture decisions.
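The report path from the Execution step can be built as shown below. Only the `.agents/council/YYYY-MM-DD-quick-<target>.md` pattern comes from the text; the helper name `quick_report_path` and the rule for turning a target into a filename-safe slug are assumptions.

```python
import re
from datetime import date
from pathlib import Path


def quick_report_path(target: str, today: date) -> Path:
    """Build .agents/council/YYYY-MM-DD-quick-<target>.md for a quick-mode report."""
    # Assumed slug rule: lowercase, runs of non-alphanumerics become one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", target.lower()).strip("-")
    return Path(".agents/council") / f"{today.isoformat()}-quick-{slug}.md"
```

Passing the date in explicitly, rather than calling `date.today()` inside the function, keeps the helper deterministic and easy to test.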
Packet Format (JSON)
The packet sent to each agent. File contents are included inline — agents receive the actual code/plan text in the packet, not just paths. This ensures both Claude and Codex agents can analyze without needing file access.
If .agents/ao/environment.json exists, include it in the context packet so judges can reason about available tools and environment state.
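A small sketch of the inlining step, assuming the packet's file entries are objects with `path` and `content` keys. `inline_files` is a hypothetical helper; the injectable `read_text` parameter exists only so the example can be exercised without touching the file system.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def _read(path: str) -> str:
    """Default reader: load the file as UTF-8 text."""
    return Path(path).read_text(encoding="utf-8")


def inline_files(paths: Iterable[str],
                 read_text: Callable[[str], str] = _read) -> List[dict]:
    """Return file entries with contents inlined, so judges need no file access."""
    return [{"path": str(p), "content": read_text(str(p))} for p in paths]
```

The resulting list would sit under the packet's `context.files`, giving every judge the same text regardless of which backend it runs on.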
Judge prompt boundary:
- Do NOT include .agents/ references in judge prompts.
- Do NOT instruct judges to search .agents/ directories. Judges operate on the council packet only.
{
"council_packet": {
"version": "1.0",
"mode": "validate | brainstorm | research",
"target": "Implementation of user authentication system",
"context": {
"files": [
{
"path": "src/auth/jwt.py",
"content": "<file contents inlined here>"
},
{
"path": "src/auth/middleware.py",
"content": "<file contents inlined here>"
}
],
"diff": "git diff output if applicable",
"spec": {
"source": "bead na-0042 | plan doc | none",
"content": "The spec/bead description text (optional — included when wrapper provides it)"
},
"prior_decisions": [
"Using JWT, not sessions",
"Refresh tokens required"
],
"empirical_results": "(optional) test output, CLI flag verification, or Wave 0 findings — include when evaluating feasibility"
},
"perspective": "skeptic (only when --preset or --perspectives used)",
"perspective_description": "What could go wrong? (only when --preset or --perspectives used)",
"output_schema": {
"verdict": "PASS | WARN | FAIL",
"confidence": "HIGH | MEDIUM | LOW",
"key_insight": "Single sentence summary",
"findings": [
{
"severity": "critical | significant | minor",
"category": "security | architecture | performance | style",
"id": "(optional) Stable finding ID for cross-skill correlation (e.g., f-council-001)",
"description": "What was found",
"location": "file:line if applicable",
"recommendation": "How to address",
"fix": "Specific action to resolve this finding",
"why": "Root cause or rationale",
"ref": "File path, spec anchor, or doc reference"
}
],
"recommendation": "Concrete next step",
"schema_ve
...