llm-monitoring-dashboard
This skill uses the Tokuin CLI to track LLM API cost, token usage, and latency, and auto-generates a data-driven admin dashboard that provides project-management insights.
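npx skills add supercent-io/skills-template --skill llm-monitoring-dashboard

Before / After comparison

Before: LLM API cost, token usage, and latency were hard to track. Without a unified view, project-management insights were difficult to obtain and resources were wasted.

After: This skill auto-generates an LLM API monitoring dashboard through the Tokuin CLI, surfaces cost, token, and latency insights, and helps optimize resource management.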
LLM Usage Monitoring Dashboard

Tracks LLM API costs, tokens, and latency using the Tokuin CLI, and auto-generates a data-driven admin dashboard with PM insights.

When to use this skill

- LLM cost visibility: when you want to monitor API usage costs per team or individual in real time
- PM reporting dashboard: when you need weekly reports on who uses AI, how much, and how
- User adoption management: when you want to track inactive users and increase AI adoption rates
- Model optimization evidence: when you need data-driven decisions for model switching or cost reduction
- Add monitoring tab to admin dashboard: when adding an LLM monitoring section to an existing Admin page

Prerequisites

1. Verify Tokuin CLI installation

```bash
# Check if installed
which tokuin && tokuin --version || echo "Not installed: run Step 1 first"
```

2. Environment variables (only needed for live API calls)

```bash
# Store in .env file (never hardcode directly in source)
OPENAI_API_KEY=sk-...            # OpenAI
ANTHROPIC_API_KEY=sk-ant-...     # Anthropic
OPENROUTER_API_KEY=sk-or-...     # OpenRouter (400+ models)

# LLM monitoring settings
LLM_USER_ID=dev-alice            # User identifier
LLM_USER_ALIAS=Alice             # Display name
COST_THRESHOLD_USD=10.00         # Cost threshold (alert when exceeded)
DASHBOARD_PORT=3000              # Dashboard port
MAX_COST_USD=5.00                # Max cost per single run
SLACK_WEBHOOK_URL=https://...    # For alerts (optional)
```

3. Project stack requirements

- Option A (recommended): Next.js 15+ + React 18 + TypeScript
- Option B (lightweight): Python 3.8+ + HTML/JavaScript (minimal dependencies)

Instructions

Step 0: Safety check (always run this first)

⚠️ Run this script before executing the skill. Any FAIL items will halt execution.
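One of the checks in the safety script is a regex scan for hardcoded API keys. As a rough illustration of what that pattern does and does not catch, here is a minimal Python sketch using the same three key shapes. The function name `find_hardcoded_keys` and the sample text are hypothetical, and masking the match is an addition of this sketch, not something the script does.

```python
import re
from typing import List

# The same three key shapes the guard script greps for
# (OpenAI, Anthropic, OpenRouter prefixes followed by 20+ alphanumerics).
KEY_PATTERN = re.compile(
    r"sk-[a-zA-Z0-9]{20,}|sk-ant-[a-zA-Z0-9]{20,}|sk-or-[a-zA-Z0-9]{20,}"
)


def find_hardcoded_keys(text: str) -> List[str]:
    """Return masked matches so the keys themselves are never echoed back."""
    return [m.group(0)[:6] + "..." for m in KEY_PATTERN.finditer(text)]


# Hypothetical file contents, made up for illustration only.
sample = 'const key = "sk-' + "a" * 24 + '"\nconst placeholder = "sk-..."'
print(find_hardcoded_keys(sample))  # the "sk-..." placeholder is not flagged
```

Placeholders such as `sk-...` pass the scan because the pattern requires at least 20 alphanumeric characters after the prefix. Keys that contain other characters after the prefix would not be matched by this pattern either.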
```bash
cat > safety-guard.sh << 'SAFETY_EOF'
#!/usr/bin/env bash
# safety-guard.sh - Safety gate before running the LLM monitoring dashboard
set -euo pipefail

RED='\033[0;31m'; YELLOW='\033[1;33m'; GREEN='\033[0;32m'; NC='\033[0m'
ALLOW_LIVE="${1:-}"; PASS=0; WARN=0; FAIL=0

# Counters use VAR=$((VAR + 1)): under `set -e`, ((VAR++)) returns a non-zero
# status when VAR is 0 and would abort the script on the first check.
log_pass() { echo -e "${GREEN}✅ PASS${NC} $1"; PASS=$((PASS + 1)); }
log_warn() { echo -e "${YELLOW}⚠️ WARN${NC} $1"; WARN=$((WARN + 1)); }
log_fail() { echo -e "${RED}❌ FAIL${NC} $1"; FAIL=$((FAIL + 1)); }

echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "🛡 LLM Monitoring Dashboard - Safety Guard v1.0"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"

# ── 1. Check Tokuin CLI installation ────────────────────────────────
if command -v tokuin &>/dev/null; then
  log_pass "Tokuin CLI installed: $(tokuin --version 2>&1 | head -1)"
else
  log_fail "Tokuin not installed → install with the command below and re-run:"
  echo "  curl -fsSL https://raw.githubusercontent.com/nooscraft/tokuin/main/install.sh | bash"
fi

# ── 2. Detect hardcoded API keys ────────────────────────────────────
KEY_PATTERN='sk-[a-zA-Z0-9]{20,}|sk-ant-[a-zA-Z0-9]{20,}|sk-or-[a-zA-Z0-9]{20,}'
# `|| true` keeps a "no match" exit status from grep from tripping pipefail
HARDCODED=$( { grep -rE "$KEY_PATTERN" . \
    --include="*.ts" --include="*.tsx" --include="*.js" --include="*.jsx" \
    --include="*.html" --include="*.sh" --include="*.py" --include="*.json" \
    --exclude-dir=node_modules --exclude-dir=.git 2>/dev/null || true; } \
  | { grep -vE '\.env|example' || true; } | wc -l | tr -d ' ')
if [ "$HARDCODED" -eq 0 ]; then
  log_pass "No hardcoded API keys found"
else
  log_fail "⚠️ ${HARDCODED} hardcoded API key(s) detected! → Move to environment variables (.env) immediately"
  grep -rE "$KEY_PATTERN" . \
    --include="*.ts" --include="*.js" --include="*.html" \
    --exclude-dir=node_modules 2>/dev/null | head -5 || true
fi

# ── 3. Check .env is in .gitignore ──────────────────────────────────
if [ -f .env ]; then
  if [ -f .gitignore ] && grep -qF ".env" .gitignore; then
    log_pass ".env is listed in .gitignore"
  else
    log_fail ".env exists but is not in .gitignore! → echo '.env' >> .gitignore"
  fi
else
  log_warn ".env file not found: create one before making live API calls"
fi

# ── 4. Check live API call mode ─────────────────────────────────────
if [ "$ALLOW_LIVE" = "--allow-live" ]; then
  log_warn "Live API call mode enabled! Costs will be incurred."
  # \$ prints a literal dollar sign; a bare $$ would expand to the shell PID
  log_warn "Max cost threshold: \$${MAX_COST_USD:-5.00} (adjust via MAX_COST_USD env var)"
  read -p "  Allow live API calls? [y/N] " -r
  echo
  [[ $REPLY =~ ^[Yy]$ ]] || { echo "Cancelled. Re-run in dry-run mode."; exit 1; }
else
  log_pass "dry-run mode (default): no API costs incurred"
fi

# ── 5. Check port conflicts ─────────────────────────────────────────
PORT="${DASHBOARD_PORT:-3000}"
if lsof -i ":${PORT}" &>/dev/null; then
  ALT_PORT=$((PORT + 1))
  log_warn "Port ${PORT} is in use → use ${ALT_PORT} instead: export DASHBOARD_PORT=${ALT_PORT}"
else
  log_pass "Port ${PORT} is available"
fi

# ── 6. Initialize data/ directory ───────────────────────────────────
mkdir -p ./data
if [ -f ./data/metrics.jsonl ]; then
  BYTES=$(wc -c < ./data/metrics.jsonl || echo 0)
  if [ "$BYTES" -gt 10485760 ]; then
    log_warn "metrics.jsonl exceeds 10MB (${BYTES}B) → consider applying a rolling policy"
    echo "  cp data/metrics.jsonl data/metrics-$(date +%Y%m%d).jsonl.bak && > data/metrics.jsonl"
  else
    log_pass "data/ ready (metrics.jsonl: ${BYTES}B)"
  fi
else
  log_pass "data/ ready (new)"
fi

# ── Summary ─────────────────────────────────────────────────────────
echo ""
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo -e "Result: ${GREEN}PASS $PASS${NC} / ${YELLOW}WARN $WARN${NC} / ${RED}FAIL $FAIL${NC}"
if [ "$FAIL" -gt 0 ]; then
  echo -e "${RED}❌ Safety check failed. Resolve the FAIL items above and re-run.${NC}"
  exit 1
else
  echo -e "${GREEN}✅ Safety check passed. Continuing skill execution.${NC}"
  exit 0
fi
SAFETY_EOF
chmod +x safety-guard.sh

# Run (halts immediately if any FAIL)
bash safety-guard.sh
```

Step 1: Install Tokuin CLI and verify with dry-run
```bash
# 1-1. Install (macOS / Linux)
curl -fsSL https://raw.githubusercontent.com/nooscraft/tokuin/main/install.sh | bash
# Windows PowerShell:
# irm https://raw.githubusercontent.com/nooscraft/tokuin/main/install.ps1 | iex

# 1-2. Verify installation
tokuin --version
which tokuin   # expected: /usr/local/bin/tokuin or ~/.local/bin/tokuin

# 1-3. Basic token count test
echo "Hello, world!" | tokuin --model gpt-4

# 1-4. dry-run cost estimate (no API key needed ✅)
echo "Analyze user behavior patterns from the following data" | \
  tokuin load-test \
    --model gpt-4 \
    --runs 50 \
    --concurrency 5 \
    --dry-run \
    --estimate-cost \
    --output-format json | python3 -m json.tool
# Expected output structure:
# {
#   "total_requests": 50,
#   "successful": 50,
#   "failed": 0,
#   "latency_ms": { "average": ..., "p50": ..., "p95": ... },
#   "cost": { "input_tokens": ..., "output_tokens": ..., "total_cost": ... }
# }

# 1-5. Multi-model comparison (dry-run)
echo "Translate this to Korean" | tokuin --compare gpt-4 gpt-3.5-turbo claude-3-haiku --price

# 1-6. Verify Prometheus format output
echo "Benchmark" | tokuin load-test --model gpt-4 --runs 10 --dry-run --output-format prometheus
# Expected: "# HELP", "# TYPE", metrics with "tokuin_" prefix
```

Step 2: Data collection pipeline with user context
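The collection script in this step consumes the JSON report whose shape is listed under "Expected output structure" in Step 1. The sketch below shows one way to pull per-request figures out of such a report. It is a minimal illustration: the numbers in `SAMPLE_REPORT` are invented for the example and are not real Tokuin output, and `summarize` is a hypothetical helper name.

```python
import json
from typing import Any, Dict

# Invented values in the documented report shape; not real Tokuin output.
SAMPLE_REPORT = """
{
  "total_requests": 50,
  "successful": 50,
  "failed": 0,
  "latency_ms": {"average": 800, "p50": 750, "p95": 1200},
  "cost": {"input_tokens": 500, "output_tokens": 2500, "total_cost": 0.15}
}
"""


def summarize(report: Dict[str, Any]) -> Dict[str, float]:
    """Derive a few dashboard-friendly numbers, tolerating missing fields."""
    total = report.get("total_requests", 0)
    cost = report.get("cost", {})
    latency = report.get("latency_ms", {})
    return {
        "success_rate_pct": 100.0 * report.get("successful", 0) / total if total else 0.0,
        "cost_per_request_usd": cost.get("total_cost", 0.0) / total if total else 0.0,
        "latency_p95_ms": float(latency.get("p95", 0)),
    }


summary = summarize(json.loads(SAMPLE_REPORT))
print(summary)
```

Using `.get(...)` with defaults mirrors how the collection script reads the report, so a partial or empty report yields zeros instead of a `KeyError`.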
```bash
# 2-1. Create prompt auto-categorization module
cat > categorize_prompt.py << 'PYEOF'
#!/usr/bin/env python3
"""Auto-categorize prompts based on keywords"""
import hashlib

# Checked in insertion order: the first category with a matching keyword wins
CATEGORIES = {
    "coding": ["code", "function", "class", "implement", "debug", "fix", "refactor"],
    "analysis": ["analyze", "compare", "evaluate", "assess"],
    "translation": ["translate", "translation"],
    "summary": ["summarize", "summary", "tldr", "brief"],
    "writing": ["write", "draft", "create", "generate"],
    "question": ["what is", "how to", "explain", "why"],
    "data": ["data", "table", "csv", "json", "sql"],
}


def categorize(prompt: str) -> str:
    p = prompt.lower()
    for cat, keywords in CATEGORIES.items():
        if any(k in p for k in keywords):
            return cat
    return "other"


def hash_prompt(prompt: str) -> str:
    """First 16 chars of SHA-256 (stored instead of raw text for privacy protection)"""
    return hashlib.sha256(prompt.encode()).hexdigest()[:16]


def truncate_preview(prompt: str, limit: int = 100) -> str:
    return prompt[:limit] + ("…" if len(prompt) > limit else "")


if __name__ == "__main__":
    import sys
    prompt = sys.argv[1] if len(sys.argv) > 1 else ""
    print(categorize(prompt))
PYEOF
```
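Because `categorize` uses substring matching and returns the first category that hits, the result depends on both keyword overlap and category order. The sketch below reproduces that behavior with an abbreviated copy of two categories and contrasts it with a whole-word variant. `categorize_words` is a hypothetical alternative added for comparison and is not part of the original module.

```python
import re

# Abbreviated copy of two categories from categorize_prompt.py, in the same order.
CATEGORIES = {
    "coding": ["code", "function", "class", "implement", "debug", "fix", "refactor"],
    "summary": ["summarize", "summary", "tldr", "brief"],
}


def categorize(prompt: str) -> str:
    """Substring matching, first category wins (the module's behavior)."""
    p = prompt.lower()
    for cat, keywords in CATEGORIES.items():
        if any(k in p for k in keywords):
            return cat
    return "other"


def categorize_words(prompt: str) -> str:
    """Hypothetical variant: keywords must match as whole words."""
    p = prompt.lower()
    for cat, keywords in CATEGORIES.items():
        if any(re.search(r"\b" + re.escape(k) + r"\b", p) for k in keywords):
            return cat
    return "other"


# "code" is checked before "summarize", so category order decides the result.
print(categorize("Summarize this code review"))
# "fix" is a substring of "suffix", so substring matching picks "coding".
print(categorize("Give a brief note on suffix trees"))
# Whole-word matching skips "suffix" and falls through to "summary".
print(categorize_words("Give a brief note on suffix trees"))
```

If category counts on the dashboard look skewed toward `coding`, short keywords such as `fix` or `class` matching inside longer words are a likely cause.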
```bash
# 2-2. Create metrics collection script with user context
cat > collect-metrics.sh << 'COLLECT_EOF'
#!/usr/bin/env bash
# collect-metrics.sh - Run Tokuin and save with user context (dry-run by default)
set -euo pipefail

# User info
USER_ID="${LLM_USER_ID:-$(whoami)}"
USER_ALIAS="${LLM_USER_ALIAS:-$USER_ID}"
SESSION_ID="${LLM_SESSION_ID:-$(date +%Y%m%d-%H%M%S)-$$}"
PROMPT="${1:-Benchmark prompt}"
MODEL="${MODEL:-gpt-4}"
PROVIDER="${PROVIDER:-openai}"
RUNS="${RUNS:-50}"
CONCURRENCY="${CONCURRENCY:-5}"
TAGS="${LLM_TAGS:-[]}"
TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
CATEGORY=$(python3 categorize_prompt.py "$PROMPT" 2>/dev/null || echo "other")
# Falls back to "unknown" where sha256sum is unavailable
PROMPT_HASH=$( (printf '%s' "$PROMPT" | sha256sum | cut -c1-16) 2>/dev/null || echo "unknown")
PROMPT_LEN=${#PROMPT}

# Dry-run unless ALLOW_LIVE is set to a non-empty value.
# An array keeps each flag a separate argument.
if [ -n "${ALLOW_LIVE:-}" ]; then
  MODE_FLAGS=()
  IS_DRY_RUN=false
else
  MODE_FLAGS=(--dry-run --estimate-cost)
  IS_DRY_RUN=true
fi

# Run Tokuin (dry-run by default)
RESULT=$(echo "$PROMPT" | tokuin load-test \
  --model "$MODEL" \
  --provider "$PROVIDER" \
  --runs "$RUNS" \
  --concurrency "$CONCURRENCY" \
  --output-format json \
  ${MODE_FLAGS[@]+"${MODE_FLAGS[@]}"} 2>/dev/null)

# Save to JSONL with user context.
# Values are passed through the environment and the heredoc is quoted, so the
# shell never interpolates into the Python source.
mkdir -p ./data
export RESULT TIMESTAMP MODEL PROVIDER USER_ID USER_ALIAS SESSION_ID \
       PROMPT_HASH CATEGORY PROMPT_LEN TAGS IS_DRY_RUN

python3 - << 'PYEOF'
import json
import os

env = os.environ
result = json.loads(env["RESULT"])
latency = result.get("latency_ms", {})
cost = result.get("cost", {})
record = {
    "id": f"{env['PROMPT_HASH']}-{env['SESSION_ID']}",
    "timestamp": env["TIMESTAMP"],
    "model": env["MODEL"],
    "provider": env["PROVIDER"],
    "user_id": env["USER_ID"],
    "user_alias": env["USER_ALIAS"],
    "session_id": env["SESSION_ID"],
    "prompt_hash": env["PROMPT_HASH"],
    "prompt_category": env["CATEGORY"],
    "prompt_length": int(env["PROMPT_LEN"]),
    "tags": json.loads(env["TAGS"]),
    "is_dry_run": env["IS_DRY_RUN"] == "true",
    "total_requests": result.get("total_requests", 0),
    "successful": result.get("successful", 0),
    "failed": result.get("failed", 0),
    "input_tokens": cost.get("input_tokens", 0),
    "output_tokens": cost.get("output_tokens", 0),
    "cost_usd": cost.get("total_cost", 0),
    "latency_avg_ms": latency.get("average", 0),
    "latency_p50_ms": latency.get("p50", 0),
    "latency_p95_ms": latency.get("p95", 0),
    "status_code": 200 if result.get("successful", 0) > 0 else 500,
}
with open("./data/metrics.jsonl", "a") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
print(f"✅ Saved: [{record['user_alias']}] {record['prompt_category']} | ${record['cost_usd']:.4f} | {record['latency_avg_ms']:.0f}ms")
PYEOF
COLLECT_EOF
chmod +x collect-metrics.sh

# 2-3. Set up cron (auto-collect every 5 minutes)
(crontab -l 2>/dev/null; echo "*/5 * * * * cd $(pwd) && bash collect-metrics.sh 'Scheduled benchmark' >> ./data/collect.log 2>&1") | crontab -
echo "✅ Cron registered (every 5 minutes)"

# 2-4. First collection test (dry-run)
bash collect-metrics.sh "Analyze user behavior patterns"
# metrics.jsonl holds one JSON object per line, so pretty-print only the latest record
tail -n 1 ./data/metrics.jsonl | python3 -m json.tool
```

Step 3: Routing structure and dashboard frame

Option A: Next.js (recommended)

```bash
# 3-1. Initialize Next.js project (skip this if adding to an existing project)
npx create-next-app@latest llm-dashboard \
  --typescript \
  --tailwind \
  --app \
  --no-src-dir
cd llm-dashboard

# 3-2. Install dependencies
npm install recharts better-sqlite3
npm install -D @types/better-sqlite3
```
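Before wiring up any UI, the JSONL records written in Step 2 can be sanity-checked with a few lines of Python. The sketch below totals `cost_usd` per `user_alias` and lists users above a threshold such as `COST_THRESHOLD_USD`. The helper names and the sample records are hypothetical; only the field names come from the record format above.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def cost_by_user(lines: Iterable[str]) -> Dict[str, float]:
    """Sum cost_usd per user_alias over JSONL lines, skipping blank lines."""
    totals: Dict[str, float] = defaultdict(float)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        totals[rec.get("user_alias", "anonymous")] += rec.get("cost_usd", 0)
    return dict(totals)


def over_threshold(totals: Dict[str, float], threshold: float) -> List[str]:
    """Users whose total cost exceeds the threshold, sorted by name."""
    return sorted(user for user, cost in totals.items() if cost > threshold)


# Hypothetical records using only the fields this sketch needs.
sample = [
    '{"user_alias": "Alice", "cost_usd": 7.5}',
    '{"user_alias": "Bob", "cost_usd": 1.25}',
    '{"user_alias": "Alice", "cost_usd": 4.0}',
]
totals = cost_by_user(sample)
print(totals)
print(over_threshold(totals, threshold=10.00))
```

To run it against real data, pass an open file handle instead of the list, for example `cost_by_user(open("data/metrics.jsonl"))`.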
```bash
# 3-3. Set design tokens (consistent tone and style)
cat > app/globals.css << 'CSS_EOF'
:root {
  /* Background layers */
  --bg-base: #0f1117;
  --bg-surface: #1a1d27;
  --bg-elevated: #21253a;
  --border: rgba(255, 255, 255, 0.06);

  /* Text layers */
  --text-primary: #f1f5f9;
  --text-secondary: #94a3b8;
  --text-muted: #475569;

  /* 3-level traffic light system (use consistently across all components) */
  --color-ok: #22c55e;      /* Normal: Green 500 */
  --color-warn: #f59e0b;    /* Warning: Amber 500 */
  --color-danger: #ef4444;  /* Danger: Red 500 */
  --color-neutral: #60a5fa; /* Neutral: Blue 400 */

  /* Data series colors (colorblind-friendly palette) */
  --series-1: #818cf8; /* Indigo: System/GPT-4 */
  --series-2: #38bdf8; /* Sky: User/Claude */
  --series-3: #34d399; /* Emerald: Assistant/Gemini */
  --series-4: #fb923c; /* Orange: 4th series */

  /* Cost-specific */
  --cost-input: #a78bfa;
  --cost-output: #f472b6;

  /* Ranking colors */
  --rank-gold: #fbbf24;
  --rank-silver: #94a3b8;
  --rank-bronze: #b45309;
  --rank-inactive: #374151;

  /* Typography */
  --font-mono: 'JetBrains Mono', 'Fira Code', monospace;
  --font-ui: 'Geist', 'Plus Jakarta Sans', system-ui, sans-serif;
}

body {
  background: var(--bg-base);
  color: var(--text-primary);
  font-family: var(--font-ui);
}

/* Numbers: alignment stability */
.metric-value {
  font-family: var(--font-mono);
  font-variant-numeric: tabular-nums;
  font-feature-settings: 'tnum';
}

/* KPI card accent-bar */
.status-ok { border-left-color: var(--color-ok); }
.status-warn { border-left-color: var(--color-warn); }
.status-danger { border-left-color: var(--color-danger); }
CSS_EOF

# 3-4. Create routing structure
mkdir -p app/admin/llm-monitoring
mkdir -p app/admin/llm-monitoring/users
mkdir -p "app/admin/llm-monitoring/users/[userId]"
mkdir -p "app/admin/llm-monitoring/runs/[runId]"
mkdir -p components/llm-monitoring
mkdir -p lib/llm-monitoring
```
```bash
# 3-5. Initialize SQLite DB
cat > lib/llm-monitoring/db.ts << 'TS_EOF'
import Database from 'better-sqlite3'
import fs from 'fs'
import path from 'path'

const DB_PATH = path.join(process.cwd(), 'data', 'monitoring.db')

// better-sqlite3 does not create missing parent directories
fs.mkdirSync(path.dirname(DB_PATH), { recursive: true })

const db = new Database(DB_PATH)

db.exec(`
  CREATE TABLE IF NOT EXISTS runs (
    id TEXT PRIMARY KEY,
    timestamp DATETIME NOT NULL DEFAULT (datetime('now')),
    model TEXT NOT NULL,
    provider TEXT NOT NULL,
    user_id TEXT DEFAULT 'anonymous',
    user_alias TEXT DEFAULT 'anonymous',
    session_id TEXT,
    prompt_hash TEXT,
    prompt_category TEXT DEFAULT 'other',
    prompt_length INTEGER DEFAULT 0,
    tags TEXT DEFAULT '[]',
    is_dry_run INTEGER DEFAULT 1,
    total_requests INTEGER DEFAULT 0,
    successful INTEGER DEFAULT 0,
    failed INTEGER DEFAULT 0,
    input_tokens INTEGER DEFAULT 0,
    output_tokens INTEGER DEFAULT 0,
    cost_usd REAL DEFAULT 0,
    latency_avg_ms REAL DEFAULT 0,
    latency_p50_ms REAL DEFAULT 0,
    latency_p95_ms REAL DEFAULT 0,
    status_code INTEGER DEFAULT 200
  );

  CREATE TABLE IF NOT EXISTS user_profiles (
    user_id TEXT PRIMARY KEY,
    user_alias TEXT NOT NULL,
    team TEXT DEFAULT '',
    role TEXT DEFAULT 'user',
    created_at DATETIME DEFAULT (datetime('now')),
    last_seen DATETIME,
    notes TEXT DEFAULT ''
  );

  CREATE INDEX IF NOT EXISTS idx_runs_timestamp ON runs(timestamp DESC);
  CREATE INDEX IF NOT EXISTS idx_runs_user_id ON runs(user_id);
  CREATE INDEX IF NOT EXISTS idx_runs_model ON runs(model);

  -- Per-user rollup; NULLIF avoids division by zero when total_requests is 0
  CREATE VIEW IF NOT EXISTS user_stats AS
  SELECT
    user_id,
    user_alias,
    COUNT(*) AS total_runs,
    SUM(input_tokens + output_tokens) AS total_tokens,
    ROUND(SUM(cost_usd), 4) AS total_cost,
    ROUND(AVG(latency_avg_ms), 1) AS avg_latency,
    ROUND(AVG(CAST(successful AS REAL) / NULLIF(total_requests, 0) * 100), 1) AS success_rate,
    COUNT(DISTINCT model) AS models_used,
    MAX(timestamp) AS last_seen
  FROM runs
  GROUP BY user_id;
`)

export default db
TS_EOF
```

Option B: Lightweight HTML (minimal dependencies)

```bash
# Use this when there's no existing project or you need a quick prototype
mkdir -p llm-monitoring/data
cat > llm-monitoring/index.html << 'HTML_
```
...