
adk-observability-guide

by @googlev
4.4(20)

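Read this guide before setting up observability for ADK agents. It covers configuration and practices for logs, metrics, and traces, keeping system state transparent for monitoring and troubleshooting.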

Installation
npx skills add google/adk-docs --skill adk-observability-guide
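Before / After comparison

Before

Without effective monitoring of ADK agent runtime state, latent problems are hard to detect and diagnose, which makes troubleshooting difficult and undermines system stability.

After

By following the ADK observability guide, I can monitor agent performance and behavior end to end. Problems are found and resolved much faster, which keeps the system stable and reliable.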

SKILL.md

ADK Observability Guide

Scaffolded project? Cloud Trace and prompt-response logging are pre-configured by Terraform. See references/cloud-trace-and-logging.md for infrastructure details, env vars, and verification commands.

No scaffold? Follow the ADK docs links below for manual setup. For production infrastructure, scaffold with /adk-scaffold.

Reference Files

| File | Contents |
| --- | --- |
| references/cloud-trace-and-logging.md | Scaffolded project details: Terraform-provisioned resources, environment variables, verification commands, enabling/disabling locally |
| references/bigquery-agent-analytics.md | BQ Agent Analytics plugin: enabling, key features, GCS offloading, tool provenance |

Observability Tiers

Choose the right level of observability based on your needs:

| Tier | What It Does | Scope | Default State | Best For |
| --- | --- | --- | --- | --- |
| Cloud Trace | Distributed tracing: execution flow, latency, errors via OpenTelemetry spans | All templates, all environments | Always enabled | Debugging latency, understanding agent execution flow |
| Prompt-Response Logging | GenAI interactions exported to GCS, BigQuery, and Cloud Logging | ADK agents only | Disabled locally, enabled when deployed | Auditing LLM interactions, compliance |
| BigQuery Agent Analytics | Structured agent events (LLM calls, tool use, outcomes) to BigQuery | ADK agents with plugin enabled | Opt-in (--bq-analytics at scaffold time) | Conversational analytics, custom dashboards, LLM-as-judge evals |
| Third-Party Integrations | External observability platforms (AgentOps, Phoenix, MLflow, etc.) | Any ADK agent | Opt-in, per-provider setup | Team collaboration, specialized visualization, prompt management |

Ask the user which tier(s) they need — they can be combined. Cloud Trace is always on; the others are additive.


Cloud Trace

ADK uses OpenTelemetry to emit distributed traces. Every agent invocation produces spans that track the full execution flow.

Span Hierarchy

invocation
  └── agent_run (one per agent in the chain)
        ├── call_llm (model request/response)
        └── execute_tool (tool execution)
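
To make the hierarchy above concrete, here is a minimal sketch that models a trace as a tree and sums time per span name. It is not ADK's or OpenTelemetry's API: the `Span` dataclass, the `render` and `total_by_name` helpers, and the durations are all hypothetical, chosen only to mirror the span names listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    """Illustrative stand-in for a trace span (not an OpenTelemetry type)."""
    name: str
    duration_ms: float
    children: List["Span"] = field(default_factory=list)


def render(span: Span, depth: int = 0) -> List[str]:
    """Return the span tree as indented text lines, parents before children."""
    lines = [f"{'  ' * depth}{span.name} ({span.duration_ms:.0f} ms)"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


def total_by_name(span: Span, totals: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Sum durations per span name across the whole tree."""
    totals = {} if totals is None else totals
    totals[span.name] = totals.get(span.name, 0.0) + span.duration_ms
    for child in span.children:
        total_by_name(child, totals)
    return totals


# Made-up durations: one invocation, one agent, two LLM calls around a tool call.
trace = Span("invocation", 1200, [
    Span("agent_run", 1150, [
        Span("call_llm", 700),
        Span("execute_tool", 300),
        Span("call_llm", 100),
    ]),
])

if __name__ == "__main__":
    print("\n".join(render(trace)))
    print(total_by_name(trace))
```

Grouping by span name like this is one way to see whether latency is dominated by model calls or by tool execution before opening the trace explorer.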

Setup by Deployment Type

| Deployment | Setup |
| --- | --- |
| Agent Engine | Automatic: traces are exported to Cloud Trace by default |
| Cloud Run (scaffolded) | Automatic: otel_to_cloud=True in the FastAPI app |
| Cloud Run (manual) | Configure OpenTelemetry exporter in your app |
| Local dev | Works with make playground; traces visible in Cloud Console |

View traces: Cloud Console → Trace → Trace explorer

For detailed setup instructions (Agent Engine CLI/SDK, Cloud Run, custom deployments), fetch https://google.github.io/adk-docs/integrations/cloud-trace/index.md.


Prompt-Response Logging

Captures GenAI interactions (model name, tokens, timing) and exports to GCS (JSONL), BigQuery (external tables), and Cloud Logging (dedicated bucket). Privacy-preserving by default — only metadata is logged unless explicitly configured otherwise.

Key env var: OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT — set to NO_CONTENT (metadata only, default in deployed envs), true (full content), or false (disabled). Logging is disabled locally unless LOGS_BUCKET_NAME is set.
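
The rules in the paragraph above can be sketched as a small resolver. This is an illustration, not code from ADK or the scaffold: `resolve_logging_mode` and its return labels are hypothetical, and it assumes that an unset content variable falls back to `NO_CONTENT`, that values are matched case-insensitively, and that a missing `LOGS_BUCKET_NAME` always means logging is off.

```python
import os
from typing import Mapping, Optional

CONTENT_VAR = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"
BUCKET_VAR = "LOGS_BUCKET_NAME"


def resolve_logging_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return 'disabled', 'metadata_only', or 'full_content' (hypothetical labels)."""
    env = os.environ if env is None else env

    # Assumption: without a bucket name, prompt-response logging stays off.
    if not env.get(BUCKET_VAR):
        return "disabled"

    # Assumption: an unset variable behaves like NO_CONTENT.
    raw = env.get(CONTENT_VAR, "NO_CONTENT").strip().lower()
    if raw == "false":
        return "disabled"
    if raw == "true":
        return "full_content"
    if raw == "no_content":
        return "metadata_only"
    raise ValueError(f"Unrecognized {CONTENT_VAR} value: {raw!r}")


if __name__ == "__main__":
    print(resolve_logging_mode())
```

Passing a plain dict instead of `os.environ` makes it easy to check a planned configuration before deploying it.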

For scaffolded project details (Terraform resources, env vars, privacy modes, enabling/disabling, verification commands), see references/cloud-trace-and-logging.md.

For ADK logging docs (log levels, configuration, debugging), fetch https://google.github.io/adk-docs/observability/logging/index.md.


BigQuery Agent Analytics Plugin

Optional plugin that logs structured agent events to BigQuery. Enable with --bq-analytics at scaffold time. See references/bigquery-agent-analytics.md for details.


Third-Party Integrations

ADK supports several third-party observability platforms. Each uses OpenTelemetry or custom instrumentation to capture agent behavior.

| Platform | Key Differentiator | Setup Complexity | Self-Hosted Option |
| --- | --- | --- | --- |
| AgentOps | Session replays, 2-line setup, replaces native telemetry | Minimal | No (SaaS) |
| Arize AX | Commercial platform, production monitoring, evaluation dashboards | Low | No (SaaS) |
| Phoenix | Open-source, custom evaluators, experiment testing | Low | Yes |
| MLflow | OTel traces to MLflow Tracking Server, span tree visualization | Medium (needs SQL backend) | Yes |
| Monocle | 1-call setup, VS Code Gantt chart visualizer | Minimal | Yes (local files) |
| Weave | W&B platform, team collaboration, timeline views | Low | No (SaaS) |
| Freeplay | Prompt management + evals + observability in one platform | Low | No (SaaS) |

Ask the user which platform they prefer — present the trade-offs and let them choose. For setup details, fetch the relevant ADK docs page from the Deep Dive table below.
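
When presenting the trade-offs, it can help to narrow the list first. The sketch below encodes only the setup-complexity and self-hosting columns from the platform table as data and filters on them. The `shortlist` helper and the Minimal < Low < Medium ordering are assumptions made for illustration, not part of ADK.

```python
from typing import List, NamedTuple, Optional


class Platform(NamedTuple):
    name: str
    complexity: str      # "Minimal", "Low", or "Medium", as in the table
    self_hosted: bool    # True where the table lists a self-hosted option


# Values transcribed from the platform comparison table.
PLATFORMS = [
    Platform("AgentOps", "Minimal", False),
    Platform("Arize AX", "Low", False),
    Platform("Phoenix", "Low", True),
    Platform("MLflow", "Medium", True),
    Platform("Monocle", "Minimal", True),
    Platform("Weave", "Low", False),
    Platform("Freeplay", "Low", False),
]

# Assumed ordering of the complexity labels, lowest effort first.
COMPLEXITY_RANK = {"Minimal": 0, "Low": 1, "Medium": 2}


def shortlist(self_hosted: Optional[bool] = None, max_complexity: str = "Medium") -> List[str]:
    """Return platform names matching the hosting preference and complexity cap."""
    limit = COMPLEXITY_RANK[max_complexity]
    return [
        p.name
        for p in PLATFORMS
        if (self_hosted is None or p.self_hosted == self_hosted)
        and COMPLEXITY_RANK[p.complexity] <= limit
    ]


if __name__ == "__main__":
    print(shortlist(self_hosted=True, max_complexity="Low"))
```

The filter only narrows the options; the key differentiators in the table still need to be weighed with the user.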


Troubleshooting

| Issue | Solution |
| --- | --- |
| No traces in Cloud Trace | Verify otel_to_cloud=True in FastAPI app; check service account has cloudtrace.agent role |
| Prompt-response data not appearing | Check LOGS_BUCKET_NAME is set; verify SA has storage.objectCreator on the bucket; check app logs for telemetry setup warnings |
| Privacy mode misconfigured | Check OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT value; use NO_CONTENT for metadata-only, false to disable |
| BigQuery Analytics not logging | Verify plugin is configured in app/agent.py; check BQ_ANALYTICS_DATASET_ID env var is set |
| Third-party integration not capturing spans | Check provider-specific env vars (API keys, endpoints); some providers (AgentOps) replace native telemetry |
| Traces missing tool spans | Tool execution spans appear under execute_tool; check trace explorer filters |
| High telemetry costs | Switch to NO_CONTENT mode; reduce BigQuery retention; disable unused tiers |
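
Several rows in the table above come down to a missing environment variable, so a quick preflight check can rule those out first. This is a minimal sketch under stated assumptions: the tier labels and the `missing_env_vars` helper are hypothetical, and only the two variables the table names are checked. IAM roles and plugin wiring still need manual verification.

```python
import os
from typing import Dict, Iterable, List, Mapping

# Hypothetical tier labels mapped to the env vars named in the troubleshooting table.
REQUIRED_ENV: Dict[str, List[str]] = {
    "prompt_response_logging": ["LOGS_BUCKET_NAME"],
    "bigquery_agent_analytics": ["BQ_ANALYTICS_DATASET_ID"],
}


def missing_env_vars(tiers: Iterable[str], env: Mapping[str, str]) -> Dict[str, List[str]]:
    """Map each requested tier to the required variables that are unset or empty."""
    report: Dict[str, List[str]] = {}
    for tier in tiers:
        # Tiers with no known requirement (for example Cloud Trace) are skipped.
        missing = [name for name in REQUIRED_ENV.get(tier, []) if not env.get(name)]
        if missing:
            report[tier] = missing
    return report


if __name__ == "__main__":
    wanted = ["prompt_response_logging", "bigquery_agent_analytics"]
    for tier, names in missing_env_vars(wanted, os.environ).items():
        print(f"{tier}: missing {', '.join(names)}")
```

An empty result means the variables are present, not that telemetry is working end to end.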

Deep Dive: ADK Docs (WebFetch URLs)

For detailed documentation beyond what this skill covers, fetch these pages:

| Topic | URL |
| --- | --- |
| Observability overview | https://google.github.io/adk-docs/observability/index.md |
| Agent activity logging | https://google.github.io/adk-docs/observability/logging/index.md |
| Cloud Trace integration | https://google.github.io/adk-docs/integrations/cloud-trace/index.md |
| BigQuery Agent Analytics | https://google.github.io/adk-docs/integrations/bigquery-agent-analytics/index.md |
| AgentOps | https://google.github.io/adk-docs/integrations/agentops/index.md |
| Arize AX | https://google.github.io/adk-docs/integrations/arize-ax/index.md |
| Phoenix (Arize) | https://google.github.io/adk-docs/integrations/phoenix/index.md |
| MLflow tracing | https://google.github.io/adk-docs/integrations/mlflow/index.md |
| Monocle | https://google.github.io/adk-docs/integrations/monocle/index.md |
| W&B Weave | https://google.github.io/adk-docs/integrations/weave/index.md |
| Freeplay | https://google.github.io/adk-docs/integrations/freeplay/index.md |


Statistics

Installs: 2.6K
Rating: 4.4 / 5.0
Version
Last updated: May 17, 2026
Comparison cases: 1


Compatible platforms

Claude Code
OpenClaw
OpenCode
Codex
Gemini CLI
GitHub Copilot
Amp
Kimi CLI

Timeline

Created: March 16, 2026
Last updated: May 17, 2026