Monitor Agent Activity

Use monitoring to track agent performance, analyze conversation patterns, debug execution issues, and optimize token costs.

The Agentic Data Plane is supported on BYOC clusters running on AWS with Redpanda version 25.3 or later.

After reading this page, you will be able to:

  • Verify agent behavior using the Inspector tab

  • Track token usage and performance metrics

  • Debug agent execution using Transcripts

For conceptual background on traces and observability, see Transcripts and AI Observability.

Prerequisites

You must have a running agent. If you do not have one, see AI Agent Quickstart.

Debug agent execution with Transcripts

The Transcripts view shows execution traces with detailed timing, errors, and performance metrics. Use this view to debug issues, verify agent behavior, and monitor performance in real time.

  1. Click Transcripts.

  2. Select a recent transcript from your agent executions.

The Transcripts view displays:

  • Timeline: Visual history of recent executions with success/error indicators

  • Trace list: Hierarchical view of traces and spans

  • Summary panel: Detailed metrics when you select a transcript

Timeline visualization

The timeline shows execution patterns over time:

  • Green bars: Successful executions

  • Red bars: Failed executions with errors

  • Gray bars: Incomplete traces or traces still loading

  • Time range: Displays the last few hours by default

Use the timeline to spot patterns like error clusters, performance degradation over time, or gaps indicating downtime.

Trace hierarchy

The trace list shows nested operations with visual duration bars indicating how long each operation took. Click the expand arrows (▶) to drill into nested spans and see the complete execution flow.

For details on span types, see Agent trace hierarchy.

Summary panel

When you select a transcript, the summary panel shows:

  • Duration: Total execution time for this request

  • Total Spans: Number of operations in the trace

  • Token Usage: Input tokens, output tokens, and total (critical for cost tracking)

  • LLM Calls: How many times the agent called the language model

  • Service: The agent identifier

  • Conversation ID: Links to session data topics

Check agent health

Use the Transcripts view to verify your agent is healthy. Look for consistent green bars in the timeline, which indicate successful executions. Duration should stay within your expected range, and token usage should remain stable without unexpected growth.

Several warning signs indicate problems. Red bars in the timeline mean errors or failures that need investigation. When duration increases over time, your context window may be growing or tool calls could be slowing down. Many LLM calls for simple requests often signal that the agent is stuck in a loop or making unnecessary iterations. If transcripts are missing, the agent may be stopped or may have a deployment issue.

Pay attention to patterns across multiple executions. When all recent transcripts show errors, start by checking agent status, MCP server connectivity, and system prompt configuration. A spiky timeline that alternates between success and error typically points to intermittent tool failures or external API issues. If duration increases steadily over a session, your context window is likely filling up. Clear the conversation history to reset it. High token usage combined with relatively few LLM calls usually means tool results are large or your system prompts are verbose.
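
If you export transcript summaries for your own analysis, the following is a minimal sketch of how these heuristics might be applied programmatically. The field names (status, duration_ms, total_tokens) and the threshold values are illustrative placeholders, not a Redpanda API.

    # Minimal sketch: applying the health heuristics above to a list of
    # transcript summaries. Field names and thresholds are illustrative
    # placeholders, not a Redpanda API.
    from statistics import mean

    transcripts = [
        {"status": "ok", "duration_ms": 850, "total_tokens": 5200},
        {"status": "ok", "duration_ms": 910, "total_tokens": 5400},
        {"status": "error", "duration_ms": 4200, "total_tokens": 9800},
    ]

    # Share of failed executions in the window you are reviewing
    error_rate = sum(t["status"] == "error" for t in transcripts) / len(transcripts)
    avg_duration = mean(t["duration_ms"] for t in transcripts)
    # Growth in token usage between the oldest and newest transcript
    token_growth = transcripts[-1]["total_tokens"] - transcripts[0]["total_tokens"]

    if error_rate > 0.2:
        print(f"Investigate errors: {error_rate:.0%} of recent transcripts failed")
    if token_growth > 0:
        print(f"Token usage grew by {token_growth} tokens across this window")
    print(f"Average duration: {avg_duration:.0f} ms")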

Debug with Transcripts

Use Transcripts to diagnose specific issues:

If the agent is not responding:

  1. Check the timeline for recent transcripts. If none appear, the agent may be stopped.

  2. Verify agent status in the main AI Agents view.

  3. Look for error transcripts with deployment or initialization failures.

If the agent fails during execution:

  1. Select the failed transcript (red bar in timeline).

  2. Expand the trace hierarchy to find the tool invocation span.

  3. Check the span details for error messages.

  4. Cross-reference with MCP server status.

If performance is slow:

  1. Compare duration across multiple transcripts in the summary panel.

  2. Look for specific spans with long durations (wide bars in trace list).

  3. Check if LLM calls are taking longer than expected.

  4. Verify tool execution time by examining nested spans.

Track token usage and costs

View token consumption in the Summary panel when you select a transcript. The breakdown shows input tokens (everything sent to the LLM including system prompt, conversation history, and tool results), output tokens (what the LLM generates in agent responses), and total tokens as the sum of both.

Calculate cost per request:

Cost = (input_tokens x input_price) + (output_tokens x output_price)

Example: GPT-5.2 with 4,302 input tokens and 1,340 output tokens, at $0.00000175 per input token and $0.000014 per output token, costs approximately $0.026 per request.
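
As a sketch, you can express this formula in a few lines of Python. The prices below are the example rates from above, not current provider pricing.

    # Sketch of the per-request cost formula, using the example figures above.
    # Prices are per token; substitute your provider's current rates.
    def request_cost(input_tokens: int, output_tokens: int,
                     input_price: float, output_price: float) -> float:
        return input_tokens * input_price + output_tokens * output_price

    cost = request_cost(4_302, 1_340, 0.00000175, 0.000014)
    print(f"${cost:.3f} per request")  # prints $0.026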

For cost optimization strategies, see Cost calculation.

Test agent behavior with Inspector

The Inspector tab provides real-time conversation testing. Use it to test agent responses interactively and verify behavior before deploying changes.

Access Inspector

  1. Navigate to Agentic AI > AI Agents in the Redpanda Cloud Console.

  2. Click your agent name.

  3. Open the Inspector tab.

  4. Enter test queries and review responses.

  5. Check the conversation panel to see tool calls.

  6. Start a new session to test fresh conversations or click Clear context to reset history.

Testing best practices

Test your agents systematically by exploring edge cases and potential failure scenarios. Begin with boundary testing: requests at the edge of the agent's capabilities verify that scope enforcement works correctly. Error handling becomes clear when you request unavailable data and observe whether the agent degrades gracefully. Even with proper system prompt constraints, testing confirms that your agent responds appropriately to edge cases.

Monitor iteration counts during complex requests to ensure they complete within your configured limits. Ambiguous or vague queries reveal whether the agent asks clarifying questions or makes risky assumptions. Throughout testing, track token usage per request to estimate costs and identify which query patterns consume the most resources.
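
If you record token usage by query category as you test, a simple aggregation can show which patterns are most expensive. The following sketch assumes values read manually from the Summary panel; the categories and numbers are illustrative, not actual measurements.

    # Sketch: aggregating token usage by query category during a test session
    # to find the most expensive patterns. Categories and numbers are
    # illustrative values recorded by hand from the Summary panel.
    from collections import defaultdict

    observations = [
        ("boundary", {"input": 3100, "output": 450}),
        ("ambiguous", {"input": 2800, "output": 900}),
        ("ambiguous", {"input": 3300, "output": 1100}),
        ("in-scope", {"input": 1900, "output": 300}),
    ]

    totals = defaultdict(int)
    for category, usage in observations:
        totals[category] += usage["input"] + usage["output"]

    # Print categories from most to least expensive
    for category, tokens in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{category}: {tokens} total tokens")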