首页/AI 系统可观测与 SRE/google-agents-cli-observability
G

google-agents-cli-observability

by @googlev
4.5(25)

配置 ADK Agent 可观测性,集成 Cloud Trace、日志记录和 BigQuery 分析,监控生产环境性能

observabilitymonitoringcloud-infraanalyticsGitHub
安装方式
npx skills add google/agents-cli --skill google-agents-cli-observability
compare_arrows

Before / After 效果对比

1
使用前

生产环境出现问题时,手动检查分散的日志和指标,关联调用链困难,平均故障排查时间超过 1 小时

使用后

通过 Cloud Trace 自动关联调用链,BigQuery 分析提示响应模式,10 分钟内定位问题根因并获取修复建议

SKILL.md

google-agents-cli-observability

ADK Observability Guide

Cloud Trace works out of the box — no infrastructure needed. Prompt-response logging and BigQuery Agent Analytics require Terraform-provisioned infrastructure (service account, GCS bucket, BigQuery dataset). Run agents-cli infra single-project --project PROJECT_ID to provision these resources. See references/cloud-trace-and-logging.md for details, env vars, and verification commands. If your project isn't scaffolded yet, see /google-agents-cli-scaffold first.

Order of operations for agent_runtime deployments

For deployment_target = agent_runtime, run agents-cli infra single-project before the first agents-cli deploy. The Terraform module owns the entire Reasoning Engine resource (display_name, service account, deployment spec, env vars), so applying it after a SDK-based deploy creates a state mismatch — Terraform has no record of the SDK-deployed instance and cannot layer env vars onto it without taking ownership of the whole resource.

If you have already run agents-cli deploy, you have two options:

  • Switch to Terraform-managed. Delete the SDK-deployed Reasoning Engine, then run agents-cli infra single-project followed by agents-cli deploy. Sessions and any in-flight state on the previous instance are lost.

  • Keep the SDK-deployed instance. Skip infra single-project and set the observability env vars on the running instance directly via the vertexai client update API. You will also need to grant the instance's service account the IAM permissions required to emit telemetry — writing to the logs GCS bucket, BigQuery dataset access, log writer, etc. See deployment/terraform/single-project/iam.tf and telemetry.tf in your scaffolded project for the full set of bindings the Terraform module would otherwise provision. Terraform-managed env vars are not available in this mode.

Reference Files

File Contents

references/cloud-trace-and-logging.md Scaffolded project details — Terraform-provisioned resources, environment variables, verification commands, enabling/disabling locally

references/bigquery-agent-analytics.md BQ Agent Analytics plugin — enabling, key features, GCS offloading, tool provenance

Observability Tiers

Choose the right level of observability based on your needs:

Tier What It Does Scope Default State Best For

Cloud Trace Distributed tracing — execution flow, latency, errors via OpenTelemetry spans All templates, all environments Always enabled Debugging latency, understanding agent execution flow

Prompt-Response Logging GenAI interactions exported to GCS, BigQuery, and Cloud Logging ADK agents only Disabled locally, enabled when deployed Auditing LLM interactions, compliance

BigQuery Agent Analytics Structured agent events (LLM calls, tool use, outcomes) to BigQuery ADK agents with plugin enabled Opt-in (--bq-analytics at scaffold time) Conversational analytics, custom dashboards, LLM-as-judge evals

Third-Party Integrations External observability platforms (AgentOps, Phoenix, MLflow, etc.) Any ADK agent Opt-in, per-provider setup Team collaboration, specialized visualization, prompt management

Ask the user which tier(s) they need — they can be combined. Cloud Trace is always on; the others are additive.

Cloud Trace

ADK uses OpenTelemetry to emit distributed traces. Every agent invocation produces spans that track the full execution flow.

Span Hierarchy

invocation
  └── agent_run (one per agent in the chain)
        ├── call_llm (model request/response)
        └── execute_tool (tool execution)

Setup by Deployment Type

Deployment Setup

Agent Runtime Automatic — traces are exported to Cloud Trace by default

Cloud Run (scaffolded) Automatic — otel_to_cloud=True in the FastAPI app

GKE (scaffolded) Automatic — otel_to_cloud=True in the FastAPI app

Cloud Run / GKE (manual) Configure OpenTelemetry exporter in your app

Local dev Works with agents-cli playground; traces visible in Cloud Console

View traces: Cloud Console → Trace → Trace explorer

For detailed setup instructions (Agent Runtime CLI/SDK, Cloud Run, custom deployments), fetch https://adk.dev/integrations/cloud-trace/index.md.

Prompt-Response Logging

Captures GenAI interactions (model name, tokens, timing) and exports to GCS (JSONL) and BigQuery (via direct log sinks and external tables). Privacy-preserving by default — only metadata is logged unless explicitly configured otherwise.

Key env var: OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT — set to NO_CONTENT (metadata only, default in deployed envs), true (full content), or false (disabled). Logging is disabled locally unless LOGS_BUCKET_NAME is set.

For scaffolded project details (Terraform resources, env vars, privacy modes, enabling/disabling, verification commands), see references/cloud-trace-and-logging.md.

For ADK logging docs (log levels, configuration, debugging), fetch https://adk.dev/observability/logging/index.md.

BigQuery Agent Analytics Plugin

Optional plugin that logs structured agent events to BigQuery. Enable with --bq-analytics at scaffold time. See references/bigquery-agent-analytics.md for details.

Third-Party Integrations

ADK supports several third-party observability platforms. Each uses OpenTelemetry or custom instrumentation to capture agent behavior.

Platform Key Differentiator Setup Complexity Self-Hosted Option

AgentOps Session replays, 2-line setup, replaces native telemetry Minimal No (SaaS)

Arize AX Commercial platform, production monitoring, evaluation dashboards Low No (SaaS)

Phoenix Open-source, custom evaluators, experiment testing Low Yes

MLflow OTel traces to MLflow Tracking Server, span tree visualization Medium (needs SQL backend) Yes

Monocle 1-call setup, VS Code Gantt chart visualizer Minimal Yes (local files)

Weave W&B platform, team collaboration, timeline views Low No (SaaS)

Freeplay Prompt management + evals + observability in one platform Low No (SaaS)

Ask the user which platform they prefer — present the trade-offs and let them choose. For setup details, fetch the relevant ADK docs page from the Deep Dive table below.

Troubleshooting

Issue Solution

No traces in Cloud Trace Verify otel_to_cloud=True in FastAPI app; check service account has cloudtrace.agent role

Prompt-response data not appearing Check LOGS_BUCKET_NAME is set; verify SA has storage.objectCreator on the bucket; check app logs for telemetry setup warnings

Privacy mode misconfigured Check OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT value — use NO_CONTENT for metadata-only, false to disable

BigQuery Analytics not logging Verify plugin is configured in app/agent.py; check BQ_ANALYTICS_DATASET_ID env var is set

Third-party integration not capturing spans Check provider-specific env vars (API keys, endpoints); some providers (AgentOps) replace native telemetry

Traces missing tool spans Tool execution spans appear under execute_tool — check trace explorer filters

High telemetry costs Switch to NO_CONTENT mode; reduce BigQuery retention; disable unused tiers

Deep Dive: ADK Docs (WebFetch URLs)

For detailed documentation beyond what this skill covers, fetch these pages:

Topic URL

Observability overview https://adk.dev/observability/index.md

Agent activity logging https://adk.dev/observability/logging/index.md

Cloud Trace integration https://adk.dev/integrations/cloud-trace/index.md

BigQuery Agent Analytics https://adk.dev/integrations/bigquery-agent-analytics/index.md

AgentOps https://adk.dev/integrations/agentops/index.md

Arize AX https://adk.dev/integrations/arize-ax/index.md

Phoenix (Arize) https://adk.dev/integrations/phoenix/index.md

MLflow tracing https://adk.dev/integrations/mlflow-tracing/index.md

Monocle https://adk.dev/integrations/monocle/index.md

W&B Weave https://adk.dev/integrations/weave/index.md

Freeplay https://adk.dev/integrations/freeplay/index.md

Related Skills

  • /google-agents-cli-deploy — Deployment targets, CI/CD pipelines, and production workflows

  • /google-agents-cli-workflow — Development workflow, coding guidelines, and operational rules

  • /google-agents-cli-adk-code — ADK Python API quick reference for writing agent code

Weekly Installs880Repositorygoogle/agents-cliGitHub Stars694First SeenTodaySecurity AuditsGen Agent Trust HubPassSocketPassSnykPass

用户评价 (0)

发表评价

效果
易用性
文档
兼容性

暂无评价

统计数据

安装量8.6K
评分4.5 / 5.0
版本
更新日期2026年5月23日
对比案例1 组

用户评分

4.5(25)
5
52%
4
32%
3
12%
2
4%
1
0%

为此 Skill 评分

0.0

兼容平台

🔧Claude Code

时间线

创建2026年4月25日
最后更新2026年5月23日