首页/DevOps/incident-response
I

incident-response

by @anthropicsv1.0.0
3.9(40)

管理事故全生命周期,从检测到响应再到事后复盘,建立标准化的应急处理流程

incident-responsedevopssrepostmortemincident-managementGitHub
安装方式
npx skills add anthropics/knowledge-work-plugins --skill incident-response
compare_arrows

Before / After 效果对比

1
使用前

事故响应混乱,责任不清,缺乏复盘同样问题重复发生

使用后

标准化响应流程,快速定位根因,系统复盘防止复发

description SKILL.md

incident-response

/incident-response

If you see unfamiliar placeholders or need to check which tools are connected, see CONNECTORS.md.

Manage an incident from detection through postmortem.

Usage

/incident-response $ARGUMENTS

Modes

/incident-response new [description]     # Start a new incident
/incident-response update [status]       # Post a status update
/incident-response postmortem            # Generate postmortem from incident data

If no mode is specified, ask what phase the incident is in.

How It Works

┌─────────────────────────────────────────────────────────────────┐
│                    INCIDENT RESPONSE                               │
├─────────────────────────────────────────────────────────────────┤
│  Phase 1: TRIAGE                                                  │
│  ✓ Assess severity (SEV1-4)                                     │
│  ✓ Identify affected systems and users                          │
│  ✓ Assign roles (IC, comms, responders)                         │
│                                                                    │
│  Phase 2: COMMUNICATE                                              │
│  ✓ Draft internal status update                                  │
│  ✓ Draft customer communication (if needed)                     │
│  ✓ Set up war room and cadence                                   │
│                                                                    │
│  Phase 3: MITIGATE                                                 │
│  ✓ Document mitigation steps taken                               │
│  ✓ Track timeline of events                                      │
│  ✓ Confirm resolution                                            │
│                                                                    │
│  Phase 4: POSTMORTEM                                               │
│  ✓ Blameless postmortem document                                 │
│  ✓ Timeline reconstruction                                       │
│  ✓ Root cause analysis (5 whys)                                  │
│  ✓ Action items with owners                                      │
└─────────────────────────────────────────────────────────────────┘

Severity Classification

Level Criteria Response Time

SEV1 Service down, all users affected Immediate, all-hands

SEV2 Major feature degraded, many users affected Within 15 min

SEV3 Minor feature issue, some users affected Within 1 hour

SEV4 Cosmetic or low-impact issue Next business day

Communication Guidance

Provide clear, factual updates at regular cadence. Include: what's happening, who's affected, what we're doing, when the next update is.

Output — Status Update

## Incident Update: [Title]
**Severity:** SEV[1-4] | **Status:** Investigating | Identified | Monitoring | Resolved
**Impact:** [Who/what is affected]
**Last Updated:** [Timestamp]

### Current Status
[What we know now]

### Actions Taken
- [Action 1]
- [Action 2]

### Next Steps
- [What's happening next and ETA]

### Timeline
| Time | Event |
|------|-------|
| [HH:MM] | [Event] |

Output — Postmortem

## Postmortem: [Incident Title]
**Date:** [Date] | **Duration:** [X hours] | **Severity:** SEV[X]
**Authors:** [Names] | **Status:** Draft

### Summary
[2-3 sentence plain-language summary]

### Impact
- [Users affected]
- [Duration of impact]
- [Business impact if quantifiable]

### Timeline
| Time (UTC) | Event |
|------------|-------|
| [HH:MM] | [Event] |

### Root Cause
[Detailed explanation of what caused the incident]

### 5 Whys
1. Why did [symptom]? → [Because...]
2. Why did [cause 1]? → [Because...]
3. Why did [cause 2]? → [Because...]
4. Why did [cause 3]? → [Because...]
5. Why did [cause 4]? → [Root cause]

### What Went Well
- [Things that worked]

### What Went Poorly
- [Things that didn't work]

### Action Items
| Action | Owner | Priority | Due Date |
|--------|-------|----------|----------|
| [Action] | [Person] | P0/P1/P2 | [Date] |

### Lessons Learned
[Key takeaways for the team]

If Connectors Available

If ~~monitoring is connected:

  • Pull alert details and metrics

  • Show graphs of affected metrics

If ~~incident management is connected:

  • Create or update incident in PagerDuty/Opsgenie

  • Page on-call responders

If ~~chat is connected:

  • Post status updates to incident channel

  • Create war room channel

Tips

  • Start writing immediately — Don't wait for complete information. Update as you learn more.

  • Keep updates factual — What we know, what we've done, what's next. No speculation.

  • Postmortems are blameless — Focus on systems and processes, not individuals.

Weekly Installs220Repositoryanthropics/know…-pluginsGitHub Stars9.9KFirst SeenFeb 24, 2026Security AuditsGen Agent Trust HubPassSocketPassSnykPassInstalled ongemini-cli209opencode209cursor208github-copilot208codex208amp207

forum用户评价 (0)

发表评价

效果
易用性
文档
兼容性

暂无评价,来写第一条吧

统计数据

安装量1.2K
评分3.9 / 5.0
版本1.0.0
更新日期2026年3月20日
对比案例1 组

用户评分

3.9(40)
5
0%
4
0%
3
0%
2
0%
1
0%

为此 Skill 评分

0.0

兼容平台

🔧Claude Code

时间线

创建2026年3月20日
最后更新2026年3月20日