
llm-monitoring-dashboard

by @supercent-iov
4.5(274)

This skill uses Tokuin CLI to track LLM API costs, tokens, and latency, and automatically generates data-driven management dashboards, providing project management insights.

Tags: llm-monitoring, ai-observability, dashboarding, data-visualization, performance-metrics
Installation
npx skills add supercent-io/skills-template --skill llm-monitoring-dashboard

Before / After Comparison

Before

Without a unified view, LLM API costs, tokens, and latency are hard to track. Project management insights stay hidden and resources get wasted.

After

The skill uses Tokuin CLI to generate an LLM API monitoring dashboard automatically, showing cost, token, and latency data in one place to support resource management.

SKILL.md

llm-monitoring-dashboard

LLM Usage Monitoring Dashboard

Tracks LLM API costs, tokens, and latency using Tokuin CLI, and auto-generates a data-driven admin dashboard with PM insights.

When to use this skill

- LLM cost visibility: When you want to monitor API usage costs per team or individual in real time
- PM reporting dashboard: When you need weekly reports on who uses AI, how much, and how
- User adoption management: When you want to track inactive users and increase AI adoption rates
- Model optimization evidence: When you need data-driven decisions for model switching or cost reduction
- Add monitoring tab to admin dashboard: When adding an LLM monitoring section to an existing Admin page

Prerequisites

1. Verify Tokuin CLI installation

# Check if installed
which tokuin && tokuin --version || echo "Not installed: run Step 1 first"

2. Environment variables (only needed for live API calls)

# Store in .env file (never hardcode directly in source)
OPENAI_API_KEY=sk-...            # OpenAI
ANTHROPIC_API_KEY=sk-ant-...     # Anthropic
OPENROUTER_API_KEY=sk-or-...     # OpenRouter (400+ models)

# LLM monitoring settings
LLM_USER_ID=dev-alice            # User identifier
LLM_USER_ALIAS=Alice             # Display name
COST_THRESHOLD_USD=10.00         # Cost threshold (alert when exceeded)
DASHBOARD_PORT=3000              # Dashboard port
MAX_COST_USD=5.00                # Max cost per single run
SLACK_WEBHOOK_URL=https://...    # For alerts (optional)

3. Project stack requirements

- Option A (recommended): Next.js 15+ + React 18 + TypeScript
- Option B (lightweight): Python 3.8+ + HTML/JavaScript (minimal dependencies)

Instructions

Step 0: Safety check (always run this first)

⚠️ Run this script before executing the skill. Any FAIL items will halt execution.

cat > safety-guard.sh << 'SAFETY_EOF'
#!/usr/bin/env bash
# safety-guard.sh: safety gate before running the LLM monitoring dashboard
set -euo pipefail

RED='\033[0;31m'; YELLOW='\033[1;33m'; GREEN='\033[0;32m'; NC='\033[0m'
ALLOW_LIVE="${1:-}"; PASS=0; WARN=0; FAIL=0

# Use VAR=$((VAR + 1)) rather than ((VAR++)): the latter returns a non-zero
# status when the old value is 0, which aborts the script under `set -e`.
log_pass() { echo -e "${GREEN}✅ PASS${NC} $1"; PASS=$((PASS + 1)); }
log_warn() { echo -e "${YELLOW}⚠️ WARN${NC} $1"; WARN=$((WARN + 1)); }
log_fail() { echo -e "${RED}❌ FAIL${NC} $1"; FAIL=$((FAIL + 1)); }

echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "🛡 LLM Monitoring Dashboard: Safety Guard v1.0"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"

# ── 1. Check Tokuin CLI installation ──
if command -v tokuin &>/dev/null; then
  log_pass "Tokuin CLI installed: $(tokuin --version 2>&1 | head -1)"
else
  log_fail "Tokuin not installed → install with the command below and re-run:"
  echo "  curl -fsSL https://raw.githubusercontent.com/nooscraft/tokuin/main/install.sh | bash"
fi

# ── 2. Detect hardcoded API keys ──
KEY_PATTERN='(sk-[a-zA-Z0-9]{20,}|sk-ant-[a-zA-Z0-9]{20,}|sk-or-[a-zA-Z0-9]{20,})'
# --include takes a glob, so each extension needs a leading "*"
INCLUDES=(--include="*.ts" --include="*.tsx" --include="*.js" --include="*.jsx"
          --include="*.html" --include="*.sh" --include="*.py" --include="*.json")
# `|| true` keeps pipefail from aborting the script when grep finds nothing
HARDCODED=$(grep -rE "$KEY_PATTERN" . "${INCLUDES[@]}" \
  --exclude-dir=node_modules --exclude-dir=.git 2>/dev/null \
  | grep -v "\.env" | grep -v "example" | wc -l || true)
if [ "$HARDCODED" -eq 0 ]; then
  log_pass "No hardcoded API keys found"
else
  log_fail "⚠️ ${HARDCODED} hardcoded API key(s) detected! → Move to environment variables (.env) immediately"
  # List file names only, so the keys themselves are not echoed to the terminal
  grep -rlE "$KEY_PATTERN" . "${INCLUDES[@]}" \
    --exclude-dir=node_modules --exclude-dir=.git 2>/dev/null | head -5 || true
fi

# ── 3. Check .env is in .gitignore ──
if [ -f .env ]; then
  if [ -f .gitignore ] && grep -qF ".env" .gitignore; then
    log_pass ".env is listed in .gitignore"
  else
    log_fail ".env exists but is not in .gitignore! → echo '.env' >> .gitignore"
  fi
else
  log_warn ".env file not found: create one before making live API calls"
fi

# ── 4. Check live API call mode ──
if [ "$ALLOW_LIVE" = "--allow-live" ]; then
  log_warn "Live API call mode enabled! Costs will be incurred."
  # \$ prints a literal dollar sign; a bare $$ would expand to the shell PID
  log_warn "Max cost threshold: \$${MAX_COST_USD:-5.00} (adjust via MAX_COST_USD env var)"
  read -p "  Allow live API calls? [y/N] " -r
  echo
  [[ $REPLY =~ ^[Yy]$ ]] || { echo "Cancelled. Re-run in dry-run mode."; exit 1; }
else
  log_pass "dry-run mode (default): no API costs incurred"
fi

# ── 5. Check port conflicts ──
PORT="${DASHBOARD_PORT:-3000}"
if lsof -i ":${PORT}" &>/dev/null; then
  ALT_PORT=$((PORT + 1))
  log_warn "Port ${PORT} is in use → use ${ALT_PORT} instead: export DASHBOARD_PORT=${ALT_PORT}"
else
  log_pass "Port ${PORT} is available"
fi

# ── 6. Initialize data/ directory ──
mkdir -p ./data
if [ -f ./data/metrics.jsonl ]; then
  BYTES=$(wc -c < ./data/metrics.jsonl || echo 0)
  if [ "$BYTES" -gt 10485760 ]; then
    log_warn "metrics.jsonl exceeds 10MB (${BYTES}B) → consider applying a rolling policy"
    echo "  cp data/metrics.jsonl data/metrics-$(date +%Y%m%d).jsonl.bak && > data/metrics.jsonl"
  else
    log_pass "data/ ready (metrics.jsonl: ${BYTES}B)"
  fi
else
  log_pass "data/ ready (new)"
fi

# ── Summary ──
echo ""
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo -e "Result: ${GREEN}PASS $PASS${NC} / ${YELLOW}WARN $WARN${NC} / ${RED}FAIL $FAIL${NC}"
if [ "$FAIL" -gt 0 ]; then
  echo -e "${RED}❌ Safety check failed. Resolve the FAIL items above and re-run.${NC}"
  exit 1
else
  echo -e "${GREEN}✅ Safety check passed. Continuing skill execution.${NC}"
  exit 0
fi
SAFETY_EOF
chmod +x safety-guard.sh

# Run (halts immediately if any FAIL)
bash safety-guard.sh

Step 1: Install Tokuin CLI and verify with dry-run

# 1-1.
# Install (macOS / Linux)
curl -fsSL https://raw.githubusercontent.com/nooscraft/tokuin/main/install.sh | bash

# Windows PowerShell:
# irm https://raw.githubusercontent.com/nooscraft/tokuin/main/install.ps1 | iex

# 1-2. Verify installation
tokuin --version
which tokuin   # expected: /usr/local/bin/tokuin or ~/.local/bin/tokuin

# 1-3. Basic token count test
echo "Hello, world!" | tokuin --model gpt-4

# 1-4. dry-run cost estimate (no API key needed ✅)
echo "Analyze user behavior patterns from the following data" | \
  tokuin load-test \
    --model gpt-4 \
    --runs 50 \
    --concurrency 5 \
    --dry-run \
    --estimate-cost \
    --output-format json | python3 -m json.tool

# Expected output structure:
# {
#   "total_requests": 50,
#   "successful": 50,
#   "failed": 0,
#   "latency_ms": { "average": ..., "p50": ..., "p95": ... },
#   "cost": { "input_tokens": ..., "output_tokens": ..., "total_cost": ... }
# }

# 1-5. Multi-model comparison (dry-run)
echo "Translate this to Korean" | tokuin --compare gpt-4 gpt-3.5-turbo claude-3-haiku --price

# 1-6. Verify Prometheus format output
echo "Benchmark" | tokuin load-test --model gpt-4 --runs 10 --dry-run --output-format prometheus
# Expected: "# HELP", "# TYPE", metrics with "tokuin_" prefix

Step 2: Data collection pipeline with user context

# 2-1.
# Create prompt auto-categorization module
cat > categorize_prompt.py << 'PYEOF'
#!/usr/bin/env python3
"""Auto-categorize prompts based on keywords"""
import hashlib

CATEGORIES = {
    "coding": ["code", "function", "class", "implement", "debug", "fix", "refactor"],
    "analysis": ["analyze", "compare", "evaluate", "assess"],
    "translation": ["translate", "translation"],
    "summary": ["summarize", "summary", "tldr", "brief"],
    "writing": ["write", "draft", "create", "generate"],
    "question": ["what is", "how to", "explain", "why"],
    "data": ["data", "table", "csv", "json", "sql"],
}


def categorize(prompt: str) -> str:
    # Substring match; the first category with a matching keyword wins
    p = prompt.lower()
    for cat, keywords in CATEGORIES.items():
        if any(k in p for k in keywords):
            return cat
    return "other"


def hash_prompt(prompt: str) -> str:
    """First 16 chars of SHA-256 (stored instead of raw text, for privacy)"""
    return hashlib.sha256(prompt.encode()).hexdigest()[:16]


def truncate_preview(prompt: str, limit: int = 100) -> str:
    return prompt[:limit] + ("…" if len(prompt) > limit else "")


if __name__ == "__main__":
    import sys
    prompt = sys.argv[1] if len(sys.argv) > 1 else ""
    print(categorize(prompt))
PYEOF

# 2-2.
# Create metrics collection script with user context
cat > collect-metrics.sh << 'COLLECT_EOF'
#!/usr/bin/env bash
# collect-metrics.sh: run Tokuin and save with user context (dry-run by default)
set -euo pipefail

# User info
USER_ID="${LLM_USER_ID:-$(whoami)}"
USER_ALIAS="${LLM_USER_ALIAS:-$USER_ID}"
SESSION_ID="${LLM_SESSION_ID:-$(date +%Y%m%d-%H%M%S)-$$}"
PROMPT="${1:-Benchmark prompt}"
MODEL="${MODEL:-gpt-4}"
PROVIDER="${PROVIDER:-openai}"
RUNS="${RUNS:-50}"
CONCURRENCY="${CONCURRENCY:-5}"
TAGS="${LLM_TAGS:-[]}"
TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
CATEGORY=$(python3 categorize_prompt.py "$PROMPT" 2>/dev/null || echo "other")
PROMPT_HASH=$(printf '%s' "$PROMPT" | sha256sum 2>/dev/null | cut -c1-16 || echo "unknown")
PROMPT_LEN=${#PROMPT}

# Dry-run unless ALLOW_LIVE is set to a non-empty value
if [ -n "${ALLOW_LIVE:-}" ]; then
  MODE_FLAGS=""; IS_DRY_RUN=false
else
  MODE_FLAGS="--dry-run --estimate-cost"; IS_DRY_RUN=true
fi

# Run Tokuin ($MODE_FLAGS is intentionally unquoted so it splits into separate flags)
RESULT=$(echo "$PROMPT" | tokuin load-test \
  --model "$MODEL" \
  --provider "$PROVIDER" \
  --runs "$RUNS" \
  --concurrency "$CONCURRENCY" \
  --output-format json \
  $MODE_FLAGS 2>/dev/null)

# Save to JSONL with user context. Values are passed through the environment and
# the heredoc is quoted, so the shell never interpolates into the Python source.
export USER_ID USER_ALIAS SESSION_ID MODEL PROVIDER TAGS TIMESTAMP CATEGORY \
       PROMPT_HASH PROMPT_LEN IS_DRY_RUN RESULT
python3 - << 'PYEOF'
import json, os

env = os.environ
result = json.loads(env["RESULT"])
latency = result.get("latency_ms", {})
cost = result.get("cost", {})
record = {
    "id": f'{env["PROMPT_HASH"]}-{env["SESSION_ID"]}',
    "timestamp": env["TIMESTAMP"],
    "model": env["MODEL"],
    "provider": env["PROVIDER"],
    "user_id": env["USER_ID"],
    "user_alias": env["USER_ALIAS"],
    "session_id": env["SESSION_ID"],
    "prompt_hash": env["PROMPT_HASH"],
    "prompt_category": env["CATEGORY"],
    "prompt_length": int(env["PROMPT_LEN"]),
    "tags": json.loads(env["TAGS"]),
    "is_dry_run": env["IS_DRY_RUN"] == "true",
    "total_requests": result.get("total_requests", 0),
    "successful": result.get("successful", 0),
    "failed": result.get("failed", 0),
    "input_tokens": cost.get("input_tokens", 0),
    "output_tokens": cost.get("output_tokens", 0),
    "cost_usd": cost.get("total_cost", 0),
    "latency_avg_ms": latency.get("average", 0),
    "latency_p50_ms": latency.get("p50", 0),
    "latency_p95_ms": latency.get("p95", 0),
    "status_code": 200 if result.get("successful", 0) > 0 else 500,
}
os.makedirs("./data", exist_ok=True)
with open("./data/metrics.jsonl", "a") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
print(f"✅ Saved: [{record['user_alias']}] {record['prompt_category']} | ${record['cost_usd']:.4f} | {record['latency_avg_ms']:.0f}ms")
PYEOF
COLLECT_EOF
chmod +x collect-metrics.sh

# 2-3. Set up cron (auto-collect every 5 minutes)
(crontab -l 2>/dev/null; echo "*/5 * * * * cd $(pwd) && bash collect-metrics.sh 'Scheduled benchmark' >> ./data/collect.log 2>&1") | crontab -
echo "✅ Cron registered (every 5 minutes)"

# 2-4. First collection test (dry-run)
bash collect-metrics.sh "Analyze user behavior patterns"
# --json-lines (Python 3.8+) parses one JSON object per line
python3 -m json.tool --json-lines ./data/metrics.jsonl | head -30

Step 3: Routing structure and dashboard frame

Option A: Next.js (recommended)

# 3-1. Initialize Next.js project (skip this if adding to an existing project)
npx create-next-app@latest llm-dashboard \
  --typescript \
  --tailwind \
  --app \
  --no-src-dir
cd llm-dashboard

# 3-2. Install dependencies
npm install recharts better-sqlite3 @types/better-sqlite3

# 3-3.
# Set design tokens (consistent tone and style)
cat > app/globals.css << 'CSS_EOF'
:root {
  /* Background layers */
  --bg-base: #0f1117;
  --bg-surface: #1a1d27;
  --bg-elevated: #21253a;
  --border: rgba(255, 255, 255, 0.06);

  /* Text layers */
  --text-primary: #f1f5f9;
  --text-secondary: #94a3b8;
  --text-muted: #475569;

  /* 3-level traffic light system (use consistently across all components) */
  --color-ok: #22c55e;       /* Normal: Green 500 */
  --color-warn: #f59e0b;     /* Warning: Amber 500 */
  --color-danger: #ef4444;   /* Danger: Red 500 */
  --color-neutral: #60a5fa;  /* Neutral: Blue 400 */

  /* Data series colors (colorblind-friendly palette) */
  --series-1: #818cf8;  /* Indigo: System/GPT-4 */
  --series-2: #38bdf8;  /* Sky: User/Claude */
  --series-3: #34d399;  /* Emerald: Assistant/Gemini */
  --series-4: #fb923c;  /* Orange: 4th series */

  /* Cost-specific */
  --cost-input: #a78bfa;
  --cost-output: #f472b6;

  /* Ranking colors */
  --rank-gold: #fbbf24;
  --rank-silver: #94a3b8;
  --rank-bronze: #b45309;
  --rank-inactive: #374151;

  /* Typography */
  --font-mono: 'JetBrains Mono', 'Fira Code', monospace;
  --font-ui: 'Geist', 'Plus Jakarta Sans', system-ui, sans-serif;
}

body {
  background: var(--bg-base);
  color: var(--text-primary);
  font-family: var(--font-ui);
}

/* Numbers: alignment stability */
.metric-value {
  font-family: var(--font-mono);
  font-variant-numeric: tabular-nums;
  font-feature-settings: 'tnum';
}

/* KPI card accent-bar */
.status-ok { border-left-color: var(--color-ok); }
.status-warn { border-left-color: var(--color-warn); }
.status-danger { border-left-color: var(--color-danger); }
CSS_EOF

# 3-4. Create routing structure
mkdir -p app/admin/llm-monitoring
mkdir -p app/admin/llm-monitoring/users
mkdir -p "app/admin/llm-monitoring/users/[userId]"
mkdir -p "app/admin/llm-monitoring/runs/[runId]"
mkdir -p components/llm-monitoring
mkdir -p lib/llm-monitoring

# 3-5.
# Initialize SQLite DB
cat > lib/llm-monitoring/db.ts << 'TS_EOF'
import Database from 'better-sqlite3'
import fs from 'fs'
import path from 'path'

const DB_PATH = path.join(process.cwd(), 'data', 'monitoring.db')
// better-sqlite3 does not create missing parent directories
fs.mkdirSync(path.dirname(DB_PATH), { recursive: true })
const db = new Database(DB_PATH)

db.exec(`
  CREATE TABLE IF NOT EXISTS runs (
    id TEXT PRIMARY KEY,
    timestamp DATETIME NOT NULL DEFAULT (datetime('now')),
    model TEXT NOT NULL,
    provider TEXT NOT NULL,
    user_id TEXT DEFAULT 'anonymous',
    user_alias TEXT DEFAULT 'anonymous',
    session_id TEXT,
    prompt_hash TEXT,
    prompt_category TEXT DEFAULT 'other',
    prompt_length INTEGER DEFAULT 0,
    tags TEXT DEFAULT '[]',
    is_dry_run INTEGER DEFAULT 1,
    total_requests INTEGER DEFAULT 0,
    successful INTEGER DEFAULT 0,
    failed INTEGER DEFAULT 0,
    input_tokens INTEGER DEFAULT 0,
    output_tokens INTEGER DEFAULT 0,
    cost_usd REAL DEFAULT 0,
    latency_avg_ms REAL DEFAULT 0,
    latency_p50_ms REAL DEFAULT 0,
    latency_p95_ms REAL DEFAULT 0,
    status_code INTEGER DEFAULT 200
  );

  CREATE TABLE IF NOT EXISTS user_profiles (
    user_id TEXT PRIMARY KEY,
    user_alias TEXT NOT NULL,
    team TEXT DEFAULT '',
    role TEXT DEFAULT 'user',
    created_at DATETIME DEFAULT (datetime('now')),
    last_seen DATETIME,
    notes TEXT DEFAULT ''
  );

  CREATE INDEX IF NOT EXISTS idx_runs_timestamp ON runs(timestamp DESC);
  CREATE INDEX IF NOT EXISTS idx_runs_user_id ON runs(user_id);
  CREATE INDEX IF NOT EXISTS idx_runs_model ON runs(model);

  CREATE VIEW IF NOT EXISTS user_stats AS
  SELECT
    user_id,
    user_alias,
    COUNT(*) AS total_runs,
    SUM(input_tokens + output_tokens) AS total_tokens,
    ROUND(SUM(cost_usd), 4) AS total_cost,
    ROUND(AVG(latency_avg_ms), 1) AS avg_latency,
    ROUND(AVG(CAST(successful AS REAL) / NULLIF(total_requests, 0) * 100), 1) AS success_rate,
    COUNT(DISTINCT model) AS models_used,
    MAX(timestamp) AS last_seen
  FROM runs
  GROUP BY user_id;
`)

export default db
TS_EOF

Option B: Lightweight HTML (minimal dependencies)

# Use this when there's no existing project or you need a quick prototype
mkdir -p llm-monitoring/data
cat > llm-monitoring/index.html << 'HTML_

...
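
The `user_stats` view in Step 3 aggregates runs per user inside SQLite. As a rough sketch, the same per-user totals can be computed directly from `data/metrics.jsonl` before any database exists. The function name `user_stats` and the reduced set of output fields are illustrative assumptions, not part of the skill; the field names follow the record written by `collect-metrics.sh`.

```python
import json
from collections import defaultdict


def user_stats(lines):
    """Aggregate JSONL metric records per user_id (hypothetical helper)."""
    acc = defaultdict(lambda: {
        "total_runs": 0,
        "total_tokens": 0,
        "total_cost": 0.0,
        "latency_sum": 0.0,
        "models": set(),
    })
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        r = json.loads(line)
        s = acc[r.get("user_id", "anonymous")]
        s["total_runs"] += 1
        s["total_tokens"] += r.get("input_tokens", 0) + r.get("output_tokens", 0)
        s["total_cost"] += r.get("cost_usd", 0.0)
        s["latency_sum"] += r.get("latency_avg_ms", 0.0)
        s["models"].add(r.get("model", "unknown"))
    # Same rounding as the SQL view: cost to 4 places, latency to 1
    return {
        uid: {
            "total_runs": s["total_runs"],
            "total_tokens": s["total_tokens"],
            "total_cost": round(s["total_cost"], 4),
            "avg_latency": round(s["latency_sum"] / s["total_runs"], 1),
            "models_used": len(s["models"]),
        }
        for uid, s in acc.items()
    }
```

A typical call would be `user_stats(open("data/metrics.jsonl"))`, since a file object iterates line by line.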

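The prerequisites describe `COST_THRESHOLD_USD` as the level above which an alert should fire, but the excerpt does not show the check itself. One possible shape for it is sketched below, assuming the threshold applies to each user's summed `cost_usd`; the source does not say whether it is meant per user, per run, or globally, and `users_over_threshold` is a hypothetical name.

```python
import os
from collections import defaultdict


def users_over_threshold(records, threshold=None):
    """Return (user_id, total_cost) pairs whose summed cost exceeds the threshold.

    Assumption: the threshold is compared against per-user totals. When no
    threshold is passed, COST_THRESHOLD_USD is read from the environment,
    with the 10.00 default shown in the prerequisites.
    """
    if threshold is None:
        threshold = float(os.environ.get("COST_THRESHOLD_USD", "10.00"))
    totals = defaultdict(float)
    for r in records:
        totals[r.get("user_id", "anonymous")] += float(r.get("cost_usd", 0))
    return sorted((uid, round(cost, 4)) for uid, cost in totals.items() if cost > threshold)
```

The returned list could then feed whatever alert channel is configured, such as the optional `SLACK_WEBHOOK_URL`.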
Statistics

Installs: 6.8K
Rating: 4.5 / 5.0
Version
Updated: May 17, 2026
Comparisons: 1

User Rating

4.5 (274)
5 stars: 60%
4 stars: 40%
3 stars: 0%
2 stars: 0%
1 star: 0%


Compatible Platforms

- Claude Code
- OpenClaw
- OpenCode
- Codex
- Gemini CLI
- GitHub Copilot
- Amp
- Kimi CLI

Timeline

Created: March 17, 2026
Last Updated: May 17, 2026