---
id: ssh2-enterprise-agent-ops
name: "enterprise-agent-ops"
url: https://skills.yangsir.net/skill/ssh2-enterprise-agent-ops
author: affaan-m
domain: ai-system-observability-sre
tags: ["ai-agent-operations", "llm-orchestration", "enterprise-ai", "monitoring", "deployment"]
install_count: 3800
rating: 4.40 (20 reviews)
github: https://github.com/affaan-m/everything-claude-code
---

# enterprise-agent-ops

> Manage and operate long-running AI agent workloads with the observability, security, and reliability that enterprise AI applications require.

**Stats**: 3,800 installs · 4.4/5 (20 reviews)

## Before / After Comparison

### Improving operational capability for enterprise agent workloads

## Readme

# Enterprise Agent Ops

Use this skill for cloud-hosted or continuously running agent systems that need operational controls beyond single CLI sessions.

## Operational Domains

1. runtime lifecycle (start, pause, stop, restart)
2. observability (logs, metrics, traces)
3. safety controls (scopes, permissions, kill switches)
4. change management (rollout, rollback, audit)
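
The lifecycle and kill-switch domains above can be sketched as a small state machine. This is a minimal illustration, not part of the skill itself: the `AgentLifecycle` class, the `State` names, and the transition table are all hypothetical, and a real runtime would also have to stop the underlying process.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"
    RUNNING = "running"
    PAUSED = "paused"


# Allowed lifecycle transitions (hypothetical; adjust to your runtime).
TRANSITIONS = {
    ("start", State.STOPPED): State.RUNNING,
    ("start", State.PAUSED): State.RUNNING,  # resume from pause
    ("pause", State.RUNNING): State.PAUSED,
    ("stop", State.RUNNING): State.STOPPED,
    ("stop", State.PAUSED): State.STOPPED,
}


class AgentLifecycle:
    def __init__(self):
        self.state = State.STOPPED
        self.killed = False  # kill switch: once set, every start is refused

    def apply(self, action: str) -> State:
        if action == "kill":
            self.killed = True
            self.state = State.STOPPED
            return self.state
        if action == "restart":
            # restart = stop (unless already stopped) followed by start
            if self.state is not State.STOPPED:
                self.apply("stop")
            return self.apply("start")
        if self.killed and action == "start":
            raise PermissionError("kill switch engaged; start refused")
        key = (action, self.state)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action} from {self.state.value}")
        self.state = TRANSITIONS[key]
        return self.state
```

Keeping the kill switch as a separate flag, rather than as one more state, means a later `restart` cannot quietly bring the agent back.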

## Baseline Controls

- immutable deployment artifacts
- least-privilege credentials
- environment-level secret injection
- hard timeout and retry budgets
- audit log for high-risk actions
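
As one way to read "hard timeout and retry budgets", the sketch below caps both the number of attempts and the total wall-clock time for a task. The names `run_with_budget` and `BudgetExceeded` are hypothetical. Note the simplification: the deadline is checked only between attempts, so a single hung attempt is not interrupted. A truly hard timeout would need process-level enforcement.

```python
import time


class BudgetExceeded(Exception):
    """Raised when the attempt budget or the time budget is used up."""


def run_with_budget(task, max_attempts=3, deadline_s=30.0, clock=time.monotonic):
    """Call task() until it succeeds, attempts run out, or the deadline passes.

    Returns (result, attempts_used). The deadline is checked before each
    attempt; a running attempt is never interrupted.
    """
    start = clock()
    last_error = None
    for attempt in range(1, max_attempts + 1):
        if clock() - start >= deadline_s:
            raise BudgetExceeded(f"deadline of {deadline_s}s exceeded") from last_error
        try:
            return task(), attempt
        except Exception as exc:  # broad on purpose: any failure uses budget
            last_error = exc
    raise BudgetExceeded(f"failed after {max_attempts} attempts") from last_error
```

Returning the attempt count alongside the result makes it straightforward to feed a retries-per-task metric.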

## Metrics to Track

- success rate
- mean retries per task
- time to recovery
- cost per successful task
- failure class distribution
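
Most of these metrics can be derived from per-task records. The following is a rough sketch under assumed field names (`TaskRecord`, `cost_usd`, and `failure_class` are hypothetical, not defined by the skill). Time to recovery is left out because it needs incident start and end timestamps rather than task records.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskRecord:
    succeeded: bool
    retries: int
    cost_usd: float
    failure_class: Optional[str] = None  # e.g. "timeout", "tool_error"


def summarize(records):
    """Aggregate a non-empty list of TaskRecord into the tracked metrics."""
    total = len(records)
    if total == 0:
        raise ValueError("no records to summarize")
    successes = sum(1 for r in records if r.succeeded)
    return {
        "success_rate": successes / total,
        "mean_retries_per_task": sum(r.retries for r in records) / total,
        # Total spend, including failed tasks, divided by successful tasks.
        "cost_per_successful_task": (
            sum(r.cost_usd for r in records) / successes if successes else None
        ),
        "failure_class_distribution": dict(
            Counter(r.failure_class for r in records if not r.succeeded)
        ),
    }
```

Dividing total spend by successes, rather than averaging the cost of successful tasks only, keeps the cost of failed attempts visible in the number.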

## Incident Pattern

When failures spike:
1. freeze new rollout
2. capture representative traces
3. isolate failing route
4. patch with smallest safe change
5. run regression + security checks
6. resume gradually
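
Steps 1 and 6 (freeze, then resume gradually) could be automated along these lines. The `RolloutGate` class, the window size, the threshold, and the traffic steps are illustrative assumptions. Trace capture, isolation, and patching stay manual in this sketch.

```python
from collections import deque


class RolloutGate:
    """Freeze rollout when the recent failure rate spikes; resume in steps."""

    def __init__(self, window=20, threshold=0.3, steps=(5, 25, 50, 100)):
        self.results = deque(maxlen=window)  # most recent task outcomes
        self.threshold = threshold
        self.steps = steps
        self.frozen = False
        self.step_index = len(steps) - 1  # start at full traffic

    def failure_rate(self) -> float:
        if not self.results:
            return 0.0
        return self.results.count(False) / len(self.results)

    def record(self, succeeded: bool) -> None:
        self.results.append(succeeded)
        # Judge only on a full window to avoid freezing on a single failure.
        window_full = len(self.results) == self.results.maxlen
        if window_full and self.failure_rate() > self.threshold:
            self.frozen = True

    def resume(self) -> int:
        """Call after the patch passes checks: restart at the smallest step."""
        self.frozen = False
        self.results.clear()
        self.step_index = 0
        return self.traffic_percent()

    def advance(self) -> int:
        """Move to the next traffic step; refused while frozen."""
        if self.frozen:
            raise RuntimeError("rollout is frozen")
        self.step_index = min(self.step_index + 1, len(self.steps) - 1)
        return self.traffic_percent()

    def traffic_percent(self) -> int:
        return 0 if self.frozen else self.steps[self.step_index]
```

Clearing the window on `resume` means the patched route is judged on fresh outcomes only, not on the failures that triggered the freeze.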

## Deployment Integrations

This skill pairs with:
- PM2 workflows
- systemd services
- container orchestrators
- CI/CD gates
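
A CI/CD gate can be as small as a script that turns metric thresholds into an exit code. The sketch below assumes a metrics dictionary with `success_rate` and `mean_retries_per_task` keys. The key names, thresholds, and function names are hypothetical, and how the metrics reach the script depends on the pipeline.

```python
import sys


def gate(metrics, min_success_rate=0.95, max_mean_retries=1.0):
    """Return a list of violations; an empty list means the gate passes."""
    violations = []
    if metrics["success_rate"] < min_success_rate:
        violations.append(
            f"success_rate {metrics['success_rate']:.2f} < {min_success_rate:.2f}"
        )
    if metrics["mean_retries_per_task"] > max_mean_retries:
        violations.append(
            f"mean_retries_per_task {metrics['mean_retries_per_task']:.2f}"
            f" > {max_mean_retries:.2f}"
        )
    return violations


def exit_code(metrics) -> int:
    """Print violations to stderr and return a CI-friendly exit code."""
    violations = gate(metrics)
    for violation in violations:
        print(f"GATE FAIL: {violation}", file=sys.stderr)
    return 1 if violations else 0
```

A pipeline step would call `sys.exit(exit_code(metrics))` so that a non-zero status blocks the rollout.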


---
*Source: https://skills.yangsir.net/skill/ssh2-enterprise-agent-ops*
*Markdown mirror: https://skills.yangsir.net/api/skill/ssh2-enterprise-agent-ops/markdown*