skill-comply
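Automatically generates test scenarios for AI agent behavior specifications and verifies whether coding agents strictly follow skill definitions and rule requirements.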
npx skills add affaan-m/everything-claude-code --skill skill-comply
Before / After comparison
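Before: Test cases verifying AI agent behavior are written by hand, and every output is checked manually against the spec. Validating a single agent takes several days, coverage is limited, and edge cases are easily missed.
After: A complete test suite is generated automatically from the spec document, with test scenarios at multiple strictness levels. It runs automatically and produces a compliance report, completing a comprehensive validation in 2 hours.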
skill-comply: Automated Compliance Measurement
Measures whether coding agents actually follow skills, rules, or agent definitions by:
- Auto-generating expected behavioral sequences (specs) from any `.md` file
- Auto-generating scenarios with decreasing prompt strictness (supportive → neutral → competing)
- Running `claude -p` and capturing tool call traces via stream-json
- Classifying tool calls against spec steps using an LLM (not regex)
- Checking temporal ordering deterministically
- Generating self-contained reports with spec, prompts, and timelines
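The trace-capture step can be sketched as follows. The sketch assumes that `claude -p ... --output-format stream-json` emits one JSON event per line, and that tool calls appear as `tool_use` content blocks inside `assistant` events. That schema is an assumption here, so check it against your Claude Code version. `ToolCall` and `extract_tool_calls` are hypothetical names and are not skill-comply's actual code.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class ToolCall:
    index: int  # position of the call in the trace, starting at 0
    name: str  # tool name, e.g. "Read" or "Bash"
    input: Dict[str, Any] = field(default_factory=dict)  # raw tool arguments


def extract_tool_calls(lines: Iterable[str]) -> List[ToolCall]:
    """Collect tool_use blocks from newline-delimited stream-json output.

    Assumed (hypothetical) event shape:
    {"type": "assistant", "message": {"content": [{"type": "tool_use", "name": ..., "input": {...}}]}}
    """
    calls: List[ToolCall] = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore non-JSON noise such as stray log lines
        if not isinstance(event, dict) or event.get("type") != "assistant":
            continue  # only assistant events are assumed to carry tool calls
        for block in event.get("message", {}).get("content", []):
            if isinstance(block, dict) and block.get("type") == "tool_use":
                calls.append(ToolCall(len(calls), block.get("name", ""), block.get("input", {})))
    return calls


sample = [
    '{"type": "system", "subtype": "init"}',
    '{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "Read", "input": {"file_path": "a.py"}}]}}',
    '{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "Bash", "input": {"command": "pytest"}}]}}',
]
print([c.name for c in extract_tool_calls(sample)])  # ['Read', 'Bash']
```

Because the function takes any iterable of lines, it can read a saved trace file directly or consume a subprocess's stdout line by line.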
Supported Targets
- Skills (`skills/*/SKILL.md`): Workflow skills like search-first, TDD guides
- Rules (`rules/common/*.md`): Mandatory rules like testing.md, security.md, git-workflow.md
- Agent definitions (`agents/*.md`): Whether an agent gets invoked when expected (internal workflow verification not yet supported)
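The three target kinds differ by path pattern. The helper below shows one way a runner could tell them apart using `pathlib.PurePosixPath.match`, which matches relative patterns against the right-hand end of a path. `target_kind` and `TARGET_PATTERNS` are hypothetical: skill-comply's documented interface only takes a path, and this is not its detection logic.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical mapping from target kind to the path patterns listed above.
TARGET_PATTERNS = [
    ("skill", "skills/*/SKILL.md"),
    ("rule", "rules/common/*.md"),
    ("agent", "agents/*.md"),
]


def target_kind(path: str) -> Optional[str]:
    """Return "skill", "rule", or "agent" for a supported path, else None."""
    p = PurePosixPath(path)
    for kind, pattern in TARGET_PATTERNS:
        # A relative pattern is matched from the right, so absolute paths work too.
        if p.match(pattern):
            return kind
    return None


print(target_kind("/home/me/.claude/skills/search-first/SKILL.md"))  # skill
```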
When to Activate
- User runs `/skill-comply <path>`
- User asks "is this rule actually being followed?"
- After adding new rules/skills, to verify agent compliance
- Periodically as part of quality maintenance
Usage
```bash
# Full run
uv run python -m scripts.run ~/.claude/rules/common/testing.md

# Dry run (no cost, spec + scenarios only)
uv run python -m scripts.run --dry-run ~/.claude/skills/search-first/SKILL.md

# Custom models
uv run python -m scripts.run --gen-model haiku --model sonnet <path>
```
Key Concept: Prompt Independence
Measures whether a skill or rule is followed even when the prompt doesn't explicitly support it. This is why scenarios range from supportive through neutral to competing prompts.
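The README does not define a numeric prompt-independence metric. The sketch below is therefore a hypothetical summary with two numbers: the compliance floor across the three strictness levels, and the drop from the supportive to the competing scenario. `summarize` and `IndependenceSummary` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Dict

# Strictness levels from the scenario generator, strictest (most supportive) first.
LEVELS = ("supportive", "neutral", "competing")


@dataclass
class IndependenceSummary:
    floor: float  # worst compliance rate across the three levels
    gap: float  # supportive minus competing; a large gap suggests prompt dependence


def summarize(scores: Dict[str, float]) -> IndependenceSummary:
    """Summarize per-level compliance rates (each in [0, 1]) for one skill or rule."""
    for level in LEVELS:
        if level not in scores:
            raise ValueError(f"missing score for the {level!r} scenario")
        if not 0.0 <= scores[level] <= 1.0:
            raise ValueError(f"score for {level!r} must be in [0, 1]")
    return IndependenceSummary(
        floor=min(scores[level] for level in LEVELS),
        gap=scores["supportive"] - scores["competing"],
    )


print(summarize({"supportive": 1.0, "neutral": 0.75, "competing": 0.25}))
# IndependenceSummary(floor=0.25, gap=0.75)
```

A rule that holds up only under the supportive prompt shows a large gap. A rule that is truly internalized keeps both a high floor and a small gap.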
Report Contents
Reports are self-contained and include:
- Expected behavioral sequence (auto-generated spec)
- Scenario prompts (what was asked at each strictness level)
- Compliance scores per scenario
- Tool call timelines with LLM classification labels
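A per-scenario score can be derived deterministically from a labeled timeline. The sketch below assumes the classifier has assigned each tool call either a spec step ID or `None`. It walks the spec in order and accepts a step only if it appears after the previously accepted one. `ordered_compliance` and the TDD-style step IDs are hypothetical, not skill-comply's scoring rule.

```python
from typing import List, Optional, Sequence, Tuple


def ordered_compliance(
    spec: Sequence[str], labels: Sequence[Optional[str]]
) -> Tuple[float, List[str]]:
    """Return (score, missed_steps) for one scenario.

    spec:   expected step IDs in their required order.
    labels: step ID assigned to each tool call by the classifier (None = unrelated call).
    """
    if not spec:
        raise ValueError("spec must contain at least one step")
    cursor = 0  # timeline position after the last accepted step
    missed: List[str] = []
    for step in spec:
        try:
            pos = list(labels).index(step, cursor)
        except ValueError:
            missed.append(step)  # absent, or only seen before an earlier required step
            continue
        cursor = pos + 1
    return (len(spec) - len(missed)) / len(spec), missed


spec = ["write_test", "run_test_fail", "implement", "run_test_pass"]
labels = ["implement", "write_test", "run_test_fail", None, "run_test_pass"]
print(ordered_compliance(spec, labels))  # (0.75, ['implement'])
```

This matching is greedy, so it anchors each step on its earliest valid occurrence. A longest-common-subsequence score would be more forgiving when an early step appears late, at the cost of a less obvious explanation in the report.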
Advanced (optional)
For users familiar with hooks, reports also include hook promotion recommendations for steps with low compliance. This is informational; the main value is the compliance visibility itself.
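The README does not say how "low compliance" is decided. As a hypothetical illustration, suppose each scenario reports the list of spec steps it failed to satisfy. Per-step compliance rates can then be computed across scenarios, and steps below an assumed threshold of 0.5 flagged as hook candidates. `step_rates`, `hook_candidates`, and the threshold are all assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def step_rates(spec: Sequence[str], missed_per_scenario: Iterable[Sequence[str]]) -> Dict[str, float]:
    """Fraction of scenarios in which each spec step was satisfied."""
    runs = [set(missed) for missed in missed_per_scenario]
    if not runs:
        raise ValueError("need at least one scenario result")
    misses = Counter(step for missed in runs for step in missed)
    return {step: 1 - misses[step] / len(runs) for step in spec}


def hook_candidates(rates: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Steps whose compliance rate is below the (hypothetical) threshold, lowest first."""
    low = sorted((rate, step) for step, rate in rates.items() if rate < threshold)
    return [step for _, step in low]


rates = step_rates(
    ["write_test", "implement", "run_tests"],
    [["implement"], ["implement", "run_tests"], []],  # missed steps per scenario
)
print(hook_candidates(rates))  # ['implement']
```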