---
id: daily-google-agents-cli-observability
name: "google-agents-cli-observability"
url: https://skills.yangsir.net/skill/daily-google-agents-cli-observability
author: google
domain: ai-system-observability-sre
tags: ["observability", "monitoring", "cloud-infra", "analytics"]
install_count: 8600
rating: 4.50 (25 reviews)
github: https://github.com/google/agents-cli
---

# google-agents-cli-observability

> 配置 ADK Agent 可观测性，集成 Cloud Trace、日志记录和 BigQuery 分析，监控生产环境性能

**Stats**: 8,600 installs · 4.5/5 (25 reviews)

## Before / After 对比

### 故障排查效率

**Before**:

生产环境出现问题时，手动检查分散的日志和指标，关联调用链困难，平均故障排查时间超过 1 小时

**After**:

通过 Cloud Trace 自动关联调用链，BigQuery 分析提示响应模式，10 分钟内定位问题根因并获取修复建议

| Metric | Before | After | Change |
|---|---|---|---|
| 排障时间 | 60分钟 | 10分钟 | -83% |

## Readme

# google-agents-cli-observability

# ADK Observability Guide

**Cloud Trace** works out of the box — no infrastructure needed. **Prompt-response logging** and **BigQuery Agent Analytics** require Terraform-provisioned infrastructure (service account, GCS bucket, BigQuery dataset). Run `agents-cli infra single-project --project PROJECT_ID` to provision these resources. See `references/cloud-trace-and-logging.md` for details, env vars, and verification commands. If your project isn't scaffolded yet, see `/google-agents-cli-scaffold` first.

### Order of operations for `agent_runtime` deployments

For `deployment_target = agent_runtime`, run `agents-cli infra single-project` **before** the first `agents-cli deploy`. The Terraform module owns the entire Reasoning Engine resource (display_name, service account, deployment spec, env vars), so applying it after a SDK-based deploy creates a state mismatch — Terraform has no record of the SDK-deployed instance and cannot layer env vars onto it without taking ownership of the whole resource.

If you have already run `agents-cli deploy`, you have two options:

- **Switch to Terraform-managed.** Delete the SDK-deployed Reasoning Engine, then run `agents-cli infra single-project` followed by `agents-cli deploy`. Sessions and any in-flight state on the previous instance are lost.

- **Keep the SDK-deployed instance.** Skip `infra single-project` and set the observability env vars on the running instance directly via the `vertexai` client `update` API. You will also need to grant the instance's service account the IAM permissions required to emit telemetry — writing to the logs GCS bucket, BigQuery dataset access, log writer, etc. See `deployment/terraform/single-project/iam.tf` and `telemetry.tf` in your scaffolded project for the full set of bindings the Terraform module would otherwise provision. Terraform-managed env vars are not available in this mode.

### Reference Files

File
Contents

`references/cloud-trace-and-logging.md`
Scaffolded project details — Terraform-provisioned resources, environment variables, verification commands, enabling/disabling locally

`references/bigquery-agent-analytics.md`
BQ Agent Analytics plugin — enabling, key features, GCS offloading, tool provenance

## Observability Tiers

Choose the right level of observability based on your needs:

Tier
What It Does
Scope
Default State
Best For

**Cloud Trace**
Distributed tracing — execution flow, latency, errors via OpenTelemetry spans
All templates, all environments
Always enabled
Debugging latency, understanding agent execution flow

**Prompt-Response Logging**
GenAI interactions exported to GCS, BigQuery, and Cloud Logging
ADK agents only
Disabled locally, enabled when deployed
Auditing LLM interactions, compliance

**BigQuery Agent Analytics**
Structured agent events (LLM calls, tool use, outcomes) to BigQuery
ADK agents with plugin enabled
Opt-in (`--bq-analytics` at scaffold time)
Conversational analytics, custom dashboards, LLM-as-judge evals

**Third-Party Integrations**
External observability platforms (AgentOps, Phoenix, MLflow, etc.)
Any ADK agent
Opt-in, per-provider setup
Team collaboration, specialized visualization, prompt management

**Ask the user** which tier(s) they need — they can be combined. Cloud Trace is always on; the others are additive.

## Cloud Trace

ADK uses OpenTelemetry to emit distributed traces. Every agent invocation produces spans that track the full execution flow.

### Span Hierarchy

```
invocation
  └── agent_run (one per agent in the chain)
        ├── call_llm (model request/response)
        └── execute_tool (tool execution)

```

### Setup by Deployment Type

Deployment
Setup

**Agent Runtime**
Automatic — traces are exported to Cloud Trace by default

**Cloud Run (scaffolded)**
Automatic — `otel_to_cloud=True` in the FastAPI app

**GKE (scaffolded)**
Automatic — `otel_to_cloud=True` in the FastAPI app

**Cloud Run / GKE (manual)**
Configure OpenTelemetry exporter in your app

**Local dev**
Works with `agents-cli playground`; traces visible in Cloud Console

View traces: **Cloud Console → Trace → Trace explorer**

For detailed setup instructions (Agent Runtime CLI/SDK, Cloud Run, custom deployments), fetch `https://adk.dev/integrations/cloud-trace/index.md`.

## Prompt-Response Logging

Captures GenAI interactions (model name, tokens, timing) and exports to GCS (JSONL) and BigQuery (via direct log sinks and external tables). Privacy-preserving by default — only metadata is logged unless explicitly configured otherwise.

Key env var: `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` — set to `NO_CONTENT` (metadata only, default in deployed envs), `true` (full content), or `false` (disabled). Logging is disabled locally unless `LOGS_BUCKET_NAME` is set.

For scaffolded project details (Terraform resources, env vars, privacy modes, enabling/disabling, verification commands), see `references/cloud-trace-and-logging.md`.

For ADK logging docs (log levels, configuration, debugging), fetch `https://adk.dev/observability/logging/index.md`.

## BigQuery Agent Analytics Plugin

Optional plugin that logs structured agent events to BigQuery. Enable with `--bq-analytics` at scaffold time. See `references/bigquery-agent-analytics.md` for details.

## Third-Party Integrations

ADK supports several third-party observability platforms. Each uses OpenTelemetry or custom instrumentation to capture agent behavior.

Platform
Key Differentiator
Setup Complexity
Self-Hosted Option

**AgentOps**
Session replays, 2-line setup, replaces native telemetry
Minimal
No (SaaS)

**Arize AX**
Commercial platform, production monitoring, evaluation dashboards
Low
No (SaaS)

**Phoenix**
Open-source, custom evaluators, experiment testing
Low
Yes

**MLflow**
OTel traces to MLflow Tracking Server, span tree visualization
Medium (needs SQL backend)
Yes

**Monocle**
1-call setup, VS Code Gantt chart visualizer
Minimal
Yes (local files)

**Weave**
W&B platform, team collaboration, timeline views
Low
No (SaaS)

**Freeplay**
Prompt management + evals + observability in one platform
Low
No (SaaS)

**Ask the user** which platform they prefer — present the trade-offs and let them choose. For setup details, fetch the relevant ADK docs page from the Deep Dive table below.

## Troubleshooting

Issue
Solution

No traces in Cloud Trace
Verify `otel_to_cloud=True` in FastAPI app; check service account has `cloudtrace.agent` role

Prompt-response data not appearing
Check `LOGS_BUCKET_NAME` is set; verify SA has `storage.objectCreator` on the bucket; check app logs for telemetry setup warnings

Privacy mode misconfigured
Check `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` value — use `NO_CONTENT` for metadata-only, `false` to disable

BigQuery Analytics not logging
Verify plugin is configured in `app/agent.py`; check `BQ_ANALYTICS_DATASET_ID` env var is set

Third-party integration not capturing spans
Check provider-specific env vars (API keys, endpoints); some providers (AgentOps) replace native telemetry

Traces missing tool spans
Tool execution spans appear under `execute_tool` — check trace explorer filters

High telemetry costs
Switch to `NO_CONTENT` mode; reduce BigQuery retention; disable unused tiers

## Deep Dive: ADK Docs (WebFetch URLs)

For detailed documentation beyond what this skill covers, fetch these pages:

Topic
URL

Observability overview
`https://adk.dev/observability/index.md`

Agent activity logging
`https://adk.dev/observability/logging/index.md`

Cloud Trace integration
`https://adk.dev/integrations/cloud-trace/index.md`

BigQuery Agent Analytics
`https://adk.dev/integrations/bigquery-agent-analytics/index.md`

AgentOps
`https://adk.dev/integrations/agentops/index.md`

Arize AX
`https://adk.dev/integrations/arize-ax/index.md`

Phoenix (Arize)
`https://adk.dev/integrations/phoenix/index.md`

MLflow tracing
`https://adk.dev/integrations/mlflow-tracing/index.md`

Monocle
`https://adk.dev/integrations/monocle/index.md`

W&B Weave
`https://adk.dev/integrations/weave/index.md`

Freeplay
`https://adk.dev/integrations/freeplay/index.md`

## Related Skills

- `/google-agents-cli-deploy` — Deployment targets, CI/CD pipelines, and production workflows

- `/google-agents-cli-workflow` — Development workflow, coding guidelines, and operational rules

- `/google-agents-cli-adk-code` — ADK Python API quick reference for writing agent code

Weekly Installs880Repository[google/agents-cli](https://github.com/google/agents-cli)GitHub Stars694First SeenTodaySecurity Audits[Gen Agent Trust HubPass](/google/agents-cli/google-agents-cli-observability/security/agent-trust-hub)[SocketPass](/google/agents-cli/google-agents-cli-observability/security/socket)[SnykPass](/google/agents-cli/google-agents-cli-observability/security/snyk)

---
*Source: https://skills.yangsir.net/skill/daily-google-agents-cli-observability*
*Markdown mirror: https://skills.yangsir.net/api/skill/daily-google-agents-cli-observability/markdown*