
skill-comply

by @affaan-m · v1.0.0
4.4 (3 ratings)

Automatically generates test scenarios for AI agent behavioral specs, verifying whether coding agents strictly follow skill definitions and rule requirements.

Tags: ai-agents · test-automation · prompt-engineering · quality-assurance · llm-orchestration
Installation
npx skills add affaan-m/everything-claude-code --skill skill-comply

Before / After Comparison

Before

Manually writing test cases to verify AI agent behavior and hand-checking every output against the spec takes days per agent, with limited coverage and easily missed edge cases.

After

Automatically generates a full test suite from the spec document, including test scenarios at multiple strictness levels, runs them, and produces a compliance report; comprehensive verification finishes in 2 hours.

SKILL.md


skill-comply: Automated Compliance Measurement

Measures whether coding agents actually follow skills, rules, or agent definitions by:

  • Auto-generating expected behavioral sequences (specs) from any .md file

  • Auto-generating scenarios with decreasing prompt strictness (supportive → neutral → competing)

  • Running claude -p and capturing tool call traces via stream-json

  • Classifying tool calls against spec steps using LLM (not regex)

  • Checking temporal ordering deterministically

  • Generating self-contained reports with spec, prompts, and timelines
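The classification and ordering bullets above can be combined in a deterministic subsequence check: once the LLM has labeled each tool call with the spec step it satisfies (or none), verifying temporal order needs no model at all. A minimal sketch with hypothetical names, not the skill's actual API:

```python
def ordering_compliant(classified_calls, spec_steps):
    """Check that spec steps appear in order within the tool-call trace.

    classified_calls: spec-step labels the LLM assigned to each tool call,
                      in call order (None for calls matching no step).
    spec_steps: expected step labels, in their required order.
    Returns (compliant, matched_steps).
    """
    matched = []
    idx = 0  # index of the next spec step we expect to see
    for label in classified_calls:
        if idx < len(spec_steps) and label == spec_steps[idx]:
            matched.append(label)
            idx += 1
    return idx == len(spec_steps), matched

# A trace that searches before editing satisfies a search-first-style spec:
ok, hit = ordering_compliant(
    ["search", None, "edit", "test"],
    ["search", "edit", "test"],
)
```

Because the ordering check is deterministic, only the per-call labeling depends on the LLM, which keeps reruns cheap and reproducible.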

Supported Targets

  • Skills (skills/*/SKILL.md): Workflow skills like search-first, TDD guides

  • Rules (rules/common/*.md): Mandatory rules like testing.md, security.md, git-workflow.md

  • Agent definitions (agents/*.md): Whether an agent gets invoked when expected (internal workflow verification not yet supported)

When to Activate

  • User runs /skill-comply <path>

  • User asks "is this rule actually being followed?"

  • After adding new rules/skills, to verify agent compliance

  • Periodically as part of quality maintenance

Usage

# Full run
uv run python -m scripts.run ~/.claude/rules/common/testing.md

# Dry run (no cost, spec + scenarios only)
uv run python -m scripts.run --dry-run ~/.claude/skills/search-first/SKILL.md

# Custom models
uv run python -m scripts.run --gen-model haiku --model sonnet <path>
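Under the hood, `claude -p` can emit its transcript as JSON Lines when stream-json output is requested. A minimal extractor for tool-call names might look like the sketch below; the event shape (assistant events carrying Anthropic-style content blocks) is an assumption here, not a documented contract:

```python
import json

def extract_tool_calls(jsonl_text):
    """Pull tool_use names, in call order, from a stream-json transcript."""
    calls = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        # Assumed shape: assistant events carry content blocks,
        # and tool_use blocks name the tool being invoked.
        for block in event.get("message", {}).get("content", []):
            if block.get("type") == "tool_use":
                calls.append(block["name"])
    return calls

sample = "\n".join([
    '{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Grep"}]}}',
    '{"type":"assistant","message":{"content":[{"type":"text","text":"done"}]}}',
    '{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Edit"}]}}',
])
```

The resulting list of names is what gets handed to the LLM classifier and the ordering check.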

Key Concept: Prompt Independence

Measures whether a skill/rule is followed even when the prompt doesn't explicitly support it.
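One way to turn the three strictness levels into a single number is to weight harsher prompts more heavily, since following a rule despite a competing prompt is stronger evidence than following it when explicitly asked to. A hypothetical scoring sketch, not the skill's actual formula:

```python
def prompt_independence(scores):
    """scores: compliance in [0, 1] for each strictness level.

    Competing prompts weigh most: compliance there shows the rule
    holds even when the prompt pulls the other way.
    """
    weights = {"supportive": 1, "neutral": 2, "competing": 3}
    total = sum(weights.values())
    return sum(weights[k] * scores[k] for k in weights) / total

# A rule followed only when the prompt supports it scores poorly:
score = prompt_independence(
    {"supportive": 1.0, "neutral": 1.0, "competing": 0.0}
)
```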

Report Contents

Reports are self-contained and include:

  • Expected behavioral sequence (auto-generated spec)

  • Scenario prompts (what was asked at each strictness level)

  • Compliance scores per scenario

  • Tool call timelines with LLM classification labels
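A self-contained report with those four ingredients could be modeled roughly as follows; the field names are illustrative, not the skill's real schema:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class ScenarioResult:
    strictness: str        # supportive / neutral / competing
    prompt: str            # the exact prompt sent to the agent
    compliance: float      # fraction of spec steps satisfied
    timeline: list         # (tool_name, llm_label) pairs in call order

@dataclass
class ComplianceReport:
    target: str            # path to the .md file under test
    spec: list             # auto-generated expected behavioral sequence
    scenarios: list = field(default_factory=list)

report = ComplianceReport(
    target="rules/common/testing.md",
    spec=["write-test", "run-test", "implement"],
    scenarios=[
        ScenarioResult("neutral", "Fix the bug in parser.py", 0.67,
                       [("Edit", "implement")]),
    ],
)
```

Keeping the spec and prompts inside the report is what makes it self-contained: a reader can judge a score without re-running anything.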

Advanced (optional)

For users familiar with hooks, reports also include hook promotion recommendations for steps with low compliance. This is informational: the main value is the compliance visibility itself.

Listing Stats

  • Weekly Installs: 325

  • Repository: affaan-m/everyt…ude-code

  • GitHub Stars: 114.8K

  • First Seen: 6 days ago

  • Security Audits: Gen Agent Trust Hub: Fail, Socket: Warn, Snyk: Fail

  • Installed on: codex (314), cursor (281), gemini-cli (280), github-copilot (280), cline (280), opencode (280)

User Reviews (0)

No reviews yet.

Statistics

Installs: 200
Rating: 4.4 / 5.0
Version: 1.0.0
Updated: March 30, 2026
Before/After comparisons: 1


Compatible Platforms

🔧 Claude Code

Timeline

Created: March 30, 2026
Last Updated: March 30, 2026