# Troubleshoot Agents


---
title: Troubleshoot Agents
latest-operator-version: v26.1.5
latest-console-tag: v3.7.4
latest-connect-version: 4.96.1
latest-redpanda-tag: v26.1.10
docname: troubleshoot-ai-agents
page-component-name: agentic-data-plane
page-version: master
page-component-version: master
page-component-title: Agentic Data Plane
page-relative-src-path: troubleshoot-ai-agents.adoc
page-edit-url: https://github.com/redpanda-data/adp-docs/edit/main/modules/monitor/pages/troubleshoot-ai-agents.adoc
description: Diagnose and fix common issues with AI agents including deployment failures, runtime behavior problems, and tool execution errors.
page-topic-type: troubleshooting
personas: agent_builder, platform_engineer
learning-objective-1: Diagnose deployment failures and resource allocation errors
learning-objective-2: Resolve runtime behavior issues including tool selection and iteration limits
learning-objective-3: Fix tool execution problems and authentication failures
page-git-created-date: "2026-05-28"
page-git-modified-date: "2026-06-04"
---

<!-- Source: https://docs.redpanda.com/agentic-data-plane/monitor/troubleshoot-ai-agents.md -->

Use this page to diagnose and fix common issues with AI agents, including deployment failures, runtime behavior problems, tool execution errors, and integration issues.

## [](#deployment-issues)Deployment issues

Fix issues that prevent agents from connecting to required resources.

### [](#mcp-server-connection-failures)MCP server connection failures

**Symptoms:** Agent starts but the tools don’t respond or return connection errors.

**Causes:**

-   MCP server stopped or crashed after agent creation

-   Network connectivity issues between agent and MCP server

-   MCP server authentication or permission issues


**Solution:**

1.  Verify MCP server status in **Remote MCP**.

2.  Check MCP server logs for errors.

3.  Restart the MCP server if needed.

4.  Verify that the agent has permission to access the MCP server.


**Prevention:**

-   Monitor MCP server health

-   Use appropriate retry logic in tools
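
The retry guidance above can be sketched in Python as exponential backoff around a single call. This is a minimal illustration, not Redpanda's implementation: `call_with_retry`, its parameters, and the choice of `ConnectionError` and `TimeoutError` as the transient errors are all assumptions made for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient connection errors with exponential backoff.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except (ConnectionError, TimeoutError):
            # Re-raise once the retry budget is spent so the caller sees the failure.
            if attempt == attempts - 1:
                raise
            # Wait 0.5s, 1s, 2s, ... between attempts (with the default base_delay).
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")
```

Keeping the attempt count small matters here: a tool that retries for a long time looks like a hung tool from the agent's side.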


## [](#runtime-behavior-issues)Runtime behavior issues

Resolve problems with agent decision-making, tool selection, and response generation.

### [](#agent-not-calling-tools)Agent not calling tools

**Symptoms:** Agent responds without calling any tools, or fabricates information instead of using tools.

**Causes:**

-   System prompt doesn’t clearly specify when to use tools

-   Tool descriptions are vague or missing

-   Model lacks sufficient reasoning capability

-   Max iterations is too low


**Solution:**

1.  Strengthen tool usage guidance in your system prompt:

    ```text
    ALWAYS use get_order_status when customer mentions an order ID.
    NEVER respond about order status without calling the tool first.
    ```

2.  Review tool descriptions in your MCP server configuration.

3.  Use a more capable model from the supported list for your gateway.

4.  Increase max iterations if the agent is stopping before reaching tools.


**Prevention:**

-   Write explicit tool selection criteria in system prompts

-   Test agents with the [systematic testing approach](https://docs.redpanda.com/agentic-data-plane/connect/system-prompts/#evaluation-and-testing)

-   Use models appropriate for your task complexity


### [](#calling-wrong-tools)Calling wrong tools

**Symptoms:** Agent selects incorrect tools for the task, or calls tools with invalid parameters.

**Causes:**

-   Tool descriptions are ambiguous or overlap

-   Too many similar tools confuse the LLM

-   System prompt doesn’t provide clear tool selection guidance


**Solution:**

1.  Make tool descriptions more specific and distinct.

2.  Add "when to use" guidance to your system prompt:

    ```text
    Use get_order_status when:
    - Customer provides an order ID (ORD-XXXXX)
    - You need to check current order state

    Use get_shipping_info when:
    - Order status is "shipped"
    - Customer asks about delivery or tracking
    ```

3.  Reduce the number of tools you expose to the agent.

4.  Use subagents to partition tools by domain.


**Prevention:**

-   Follow tool design patterns in [How MCP Servers Work](https://docs.redpanda.com/agentic-data-plane/connect/mcp-overview/)

-   Limit each agent to 10-15 tools maximum

-   Test boundary cases where multiple tools might apply
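
One rough way to spot overlapping tool descriptions before testing is to compare their word sets. The sketch below uses Jaccard similarity; the function names, the example descriptions, and the 0.6 threshold are assumptions chosen for illustration, and a high score is only a hint that two descriptions deserve a closer look.

```python
import re
from itertools import combinations


def description_overlap(a: str, b: str) -> float:
    """Jaccard similarity of the lowercase word sets of two descriptions."""
    words_a = set(re.findall(r"[a-z0-9]+", a.lower()))
    words_b = set(re.findall(r"[a-z0-9]+", b.lower()))
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def find_overlapping_tools(tools: dict[str, str], threshold: float = 0.6) -> list[tuple[str, str]]:
    """Return tool-name pairs whose descriptions overlap at or above the threshold."""
    return [
        (first, second)
        for first, second in combinations(sorted(tools), 2)
        if description_overlap(tools[first], tools[second]) >= threshold
    ]


# Hypothetical tool descriptions, not taken from a real MCP server.
tools = {
    "get_order_status": "Get the status of an order",
    "get_order_state": "Get the state of an order",
    "get_shipping_info": "Look up tracking and delivery details for a shipped package",
}
print(find_overlapping_tools(tools))
```

Pairs that this check flags are good candidates for rewriting, merging, or moving behind separate subagents.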


### [](#stuck-in-loops-or-exceeding-max-iterations)Stuck in loops or exceeding max iterations

**Symptoms:** Agent reaches max iterations without completing the task, or repeatedly calls the same tool with the same parameters.

**Causes:**

-   Tool returns errors that the agent doesn’t know how to handle

-   Agent doesn’t recognize when the task is complete

-   Tool returns incomplete data that prompts another call

-   System prompt encourages exhaustive exploration


**Solution:**

1.  Add completion criteria to your system prompt:

    ```text
    When you have retrieved all requested information:
    1. Present the results to the user
    2. Stop calling additional tools
    3. Do not explore related data unless asked
    ```

2.  Add error handling guidance:

    ```text
    If a tool fails after 2 attempts:
    - Explain what went wrong
    - Do not retry the same tool again
    - Move on or ask for user guidance
    ```

3.  Review tool output to ensure it signals completion clearly.

4.  Increase max iterations if the task legitimately requires many steps.


**Prevention:**

-   Design tools to return complete information in one call

-   Set max iterations appropriate for task complexity (see [Why iterations matter](https://docs.redpanda.com/agentic-data-plane/connect/concepts/#why-iterations-matter))

-   Test with ambiguous requests that might cause loops
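
When reviewing a looping agent, it can help to count identical tool calls mechanically. The following sketch assumes you have already extracted tool invocations into a list of dictionaries with `tool` and `args` keys; that shape is hypothetical and is not the transcript format Redpanda uses.

```python
import json
from collections import Counter


def find_repeated_calls(calls: list[dict], max_repeats: int = 2) -> list[tuple[str, int]]:
    """Return (tool name, count) for identical calls seen more than max_repeats times."""
    counts: Counter = Counter()
    for call in calls:
        # Serialize arguments with sorted keys so key order does not hide duplicates.
        key = (call["tool"], json.dumps(call["args"], sort_keys=True))
        counts[key] += 1
    return [(tool, n) for (tool, _), n in counts.items() if n > max_repeats]


# Hypothetical call log: the same lookup issued three times, then one other call.
calls = [
    {"tool": "get_order_status", "args": {"order_id": "ORD-12345"}},
    {"tool": "get_order_status", "args": {"order_id": "ORD-12345"}},
    {"tool": "get_order_status", "args": {"order_id": "ORD-12345"}},
    {"tool": "get_shipping_info", "args": {"order_id": "ORD-12345"}},
]
print(find_repeated_calls(calls))
```

A repeated call with unchanged arguments usually points at a tool response the agent cannot interpret, which is where the completion and error-handling prompts above apply.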


### [](#making-up-information)Making up information

**Symptoms:** Agent provides plausible-sounding answers without calling tools, or invents data when tools fail.

**Causes:**

-   System prompt doesn’t explicitly forbid fabrication

-   Agent works around a tool failure by guessing instead of reporting it

-   Model is hallucinating due to lack of constraints


**Solution:**

1.  Add explicit constraints to your system prompt:

    ```text
    Critical rules:
    - NEVER make up order numbers, tracking numbers, or customer data
    - If a tool fails, explain the failure - do not guess
    - If you don't have information, say so explicitly
    ```

2.  Test error scenarios by temporarily disabling tools.

3.  Use a more capable model that follows instructions better.


**Prevention:**

-   Include "never fabricate" rules in all system prompts

-   Test with requests that require unavailable data

-   Monitor **Transcripts** and the session topic for fabricated responses


### [](#analyzing-conversation-patterns)Analyzing conversation patterns

**Symptoms:** Agent behavior is inconsistent or produces unexpected results.

**Solution:**

Review conversation history in **Transcripts** to identify problematic patterns:

-   Agents calling the same tool repeatedly: Indicates loop detection is needed

-   Large gaps between messages: Suggests tool timeout or slow execution

-   Agent responses without tool calls: Indicates a tool selection issue

-   Fabricated information: Suggests a missing "never make up data" constraint

-   Truncated early messages: Indicates the context window was exceeded


**Analysis workflow:**

1.  Use **Inspector** to reproduce the issue.

2.  Review the full conversation, including tool invocations.

3.  Identify where agent behavior diverged from what you expected.

4.  Check system prompt for missing guidance.

5.  Verify tool responses are formatted correctly.
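
The "large gaps between messages" pattern can also be checked mechanically once you have message timestamps. This sketch assumes ISO 8601 timestamp strings exported from a conversation; the export format and the 10-second threshold are assumptions, not documented product behavior.

```python
from datetime import datetime


def find_slow_gaps(timestamps: list[str], threshold_seconds: float = 10.0) -> list[tuple[int, float]]:
    """Return (message index, gap in seconds) for each message that followed a long pause."""
    times = [datetime.fromisoformat(ts) for ts in timestamps]
    gaps = []
    for i in range(1, len(times)):
        gap = (times[i] - times[i - 1]).total_seconds()
        if gap > threshold_seconds:
            gaps.append((i, gap))
    return gaps


# Hypothetical timestamps: the third message arrives 18 seconds after the second.
print(find_slow_gaps([
    "2026-06-04T10:00:00",
    "2026-06-04T10:00:02",
    "2026-06-04T10:00:20",
]))
```

The message just before each reported index is the one to inspect for a slow or timed-out tool call.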


## [](#performance-issues)Performance issues

Diagnose and fix issues related to agent speed and resource consumption.

### [](#slow-response-times)Slow response times

**Symptoms:** Agent takes 10+ seconds to respond to simple queries.

**Causes:**

-   Model is slow, for example when it processes a large context

-   Too many tool calls in sequence

-   Tools themselves are slow (database queries, API calls)

-   Large context window from long conversation history


**Solution:**

1.  Use a faster, lower-latency model tier for simple queries and reserve larger models for complex reasoning.

2.  Review conversation history in the **Inspector** tab to identify unnecessary tool calls.

3.  Optimize tool implementations:

    1.  Add caching where appropriate

    2.  Reduce query complexity

    3.  Return only needed data (use pagination, filters)


4.  Clear the conversation history if the context is very large.


**Prevention:**

-   Right-size model selection based on task complexity

-   Design tools to execute quickly (< 2 seconds ideal)

-   Set appropriate max iterations to prevent excessive exploration

-   Monitor token usage and conversation length
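
For tools backed by slow lookups, a small time-to-live cache is one way to apply the caching advice above. The class below is a sketch for a custom tool written in Python; `TTLCache` and `get_or_compute` are hypothetical names, and whether caching is safe depends on how stale a result your use case tolerates.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Cache tool results for a fixed number of seconds (illustrative helper)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # Fresh hit: skip the slow backend call.
        value = compute()
        self._entries[key] = (now, value)
        return value
```

A short TTL (seconds to a few minutes) is usually enough to absorb the repeated lookups an agent makes within one conversation.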


### [](#high-token-costs)High token costs

**Symptoms:** Token usage is higher than expected, or costs are increasing rapidly.

**Causes:**

-   Max iterations configured too high

-   Agent making unnecessary tool calls

-   Large tool results filling context window

-   Long conversation history not being managed

-   Using expensive models for simple tasks


**Solution:**

1.  Review token usage in **Transcripts**.

2.  Lower max iterations for this agent.

3.  Optimize tool responses to return less data:

    ```text
    Bad:  Return all 10,000 customer records
    Good: Return paginated results, 20 records at a time
    ```

4.  Add cost control guidance to system prompt:

    ```text
    Efficiency guidelines:
    - Request only the data you need
    - Stop when you have enough information
    - Do not call tools speculatively
    ```

5.  Switch to a more cost-effective model for simple queries.

6.  Clear conversation history periodically in the **Inspector** tab.


**Prevention:**

-   Set appropriate max iterations (10-20 for simple tasks, 30-40 for complex tasks)

-   Design tools to return minimal necessary data

-   Monitor token usage trends

-   See cost calculation guidance in [Cost calculation](https://docs.redpanda.com/agentic-data-plane/connect/concepts/#cost-calculation)
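
The "20 records at a time" advice above amounts to returning one page plus enough metadata for the agent to decide whether to ask for more. A minimal sketch, assuming the tool has the records in a Python list; the `paginate` name and the response keys are made up for this example.

```python
def paginate(records: list, page: int, page_size: int = 20) -> dict:
    """Return one page of records plus the metadata needed to request the next page."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    items = records[start:start + page_size]
    return {
        "items": items,
        "page": page,
        "total": len(records),
        # True only when records remain beyond the end of this page.
        "has_more": start + page_size < len(records),
    }


# 45 hypothetical records: page 3 holds the final 5.
result = paginate(list(range(45)), page=3)
print(len(result["items"]), result["has_more"])
```

Returning `total` and `has_more` lets the agent stop early instead of fetching pages speculatively.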


## [](#tool-execution-issues)Tool execution issues

Fix problems with timeouts, invalid parameters, and error responses.

### [](#tool-timeouts)Tool timeouts

**Symptoms:** Tools fail with timeout errors, or the agent receives incomplete results.

**Causes:**

-   External API is slow or unresponsive

-   Database query is too complex

-   Network latency between tool and external system

-   Tool processing large datasets in memory


**Solution:**

1.  Add timeout handling to tool implementation:

    ```yaml
    http:
      url: https://api.example.com/data
      timeout: "5s"  # Set explicit timeout
    ```

2.  Optimize external queries:

    1.  Add database indexes

    2.  Reduce query scope

    3.  Cache frequent queries


3.  Increase the tool timeout if the operation legitimately takes longer.

4.  Add retry logic for transient failures.


**Prevention:**

-   Set explicit timeouts in all tool configurations

-   Test tools under load

-   Monitor external API performance

-   Design tools to fail fast on unavailable services
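
For a custom tool written in Python rather than configured in YAML, failing fast can be sketched by running the slow operation in a worker thread and giving up after a deadline. The `run_with_timeout` helper and its result shape are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Any, Callable


def run_with_timeout(fn: Callable[[], Any], timeout_seconds: float) -> dict:
    """Run fn in a worker thread and return a structured result or a timeout error."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return {"ok": True, "result": future.result(timeout=timeout_seconds)}
    except FutureTimeout:
        return {"ok": False, "error": f"timed out after {timeout_seconds}s"}
    finally:
        # Do not block on the worker: a thread that is already running cannot be cancelled.
        pool.shutdown(wait=False)
```

Note that the abandoned call keeps running in the background until it finishes, so this pattern bounds how long the agent waits, not how much work the backend does. Where the underlying client supports its own timeout, setting it there is the better option.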


### [](#invalid-parameters)Invalid parameters

**Symptoms:** Tools return validation errors about missing or incorrectly formatted parameters.

**Causes:**

-   Tool schema doesn’t match implementation

-   Agent passes wrong data types

-   Required parameters not marked as required in schema

-   Agent misunderstands parameter purpose


**Solution:**

1.  Verify tool schema matches implementation:

    ```yaml
    input_schema:
      properties:
        order_id:
          type: string  # Must match what tool expects
          description: "Order ID in format ORD-12345"
    ```

2.  Add parameter validation to tools.

3.  Improve parameter descriptions in tool schema.

4.  Add examples to tool descriptions:

    ```yaml
    description: |
      Get order status by order ID.
      Example: get_order_status(order_id="ORD-12345")
    ```


**Prevention:**

-   Write detailed parameter descriptions

-   Include format requirements and examples

-   Test tools with invalid inputs to verify error messages

-   Use JSON Schema validation in tool implementations
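
Parameter validation inside a tool can be as simple as checking presence, type, and format, then returning messages the agent can act on. The sketch below uses only the standard library; the function name is hypothetical, and the five-digit pattern is an assumption inferred from the `ORD-12345` example above.

```python
import re

# Assumed format: "ORD-" followed by exactly five digits, as in ORD-12345.
ORDER_ID_PATTERN = re.compile(r"ORD-\d{5}")


def validate_order_params(params: dict) -> list[str]:
    """Return a list of human-readable validation errors (empty when valid)."""
    errors = []
    order_id = params.get("order_id")
    if order_id is None:
        errors.append("order_id is required")
    elif not isinstance(order_id, str):
        errors.append(f"order_id must be a string, got {type(order_id).__name__}")
    elif not ORDER_ID_PATTERN.fullmatch(order_id):
        errors.append("order_id must match the format ORD-12345")
    return errors


print(validate_order_params({"order_id": 12345}))
```

Error messages that name the parameter and the expected format give the agent a chance to correct the call on its next iteration.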


### [](#tool-returns-errors)Tool returns errors

**Symptoms:** Tools execute but return error responses or unexpected data formats.

**Causes:**

-   External API returned error

-   Tool implementation has bugs

-   Data format changed in external system

-   Tool lacks error handling


**Solution:**

1.  Check tool logs in MCP server.

2.  Test tool directly (outside agent context).

3.  Verify external system is operational.

4.  Add error handling to tool implementation:

    ```yaml
    processors:
      - try:
          - http:
              url: ${API_URL}
      # catch is a separate processor that runs only for messages that failed above
      - catch:
          - mapping: |
              root.error = "API unavailable: " + error()

5.  Update agent system prompt to handle this error type.


**Prevention:**

-   Implement comprehensive error handling in tools

-   Monitor external system health

-   Add retries for transient failures

-   Log all tool errors for analysis


## [](#integration-issues)Integration issues

Fix problems with external applications calling agents and pipeline-to-agent integration.

### [](#agent-card-does-not-contain-a-url)Agent card does not contain a URL

**Symptoms:** Pipeline fails with error: `agent card does not contain a URL` or `failed to init processor <no label> path root.pipeline.processors.0`

**Causes:**

-   The `agent_card_url` points to the base agent endpoint instead of the agent card JSON file


**Solution:**

The `agent_card_url` must point to the agent card JSON file, not the base agent endpoint.

**Incorrect configuration:**

```yaml
processors:
  - a2a_message:
      agent_card_url: "https://your-agent-id.ai-agents.your-cluster-id.cloud.redpanda.com"
      prompt: "Analyze this transaction: ${!content()}"
```

**Correct configuration:**

```yaml
processors:
  - a2a_message:
      agent_card_url: "https://your-agent-id.ai-agents.your-cluster-id.cloud.redpanda.com/.well-known/agent-card.json"
      prompt: "Analyze this transaction: ${!content()}"
```

The agent card is always available at `/.well-known/agent-card.json` according to the A2A protocol standard.

**Prevention:**

-   Always append `/.well-known/agent-card.json` to the agent endpoint URL

-   Test the agent card URL in a browser before using it in pipeline configuration

-   See [Agent card location](https://docs.redpanda.com/agentic-data-plane/connect/a2a-concepts/#agent-card-location) for details
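
If pipeline configurations are generated by a script, a small helper can apply the "always append" rule consistently. This is an illustrative sketch using Python's standard library; `agent_card_url` is a hypothetical function name, and the hostname is the placeholder from the examples above.

```python
from urllib.parse import urlsplit, urlunsplit

AGENT_CARD_PATH = "/.well-known/agent-card.json"


def agent_card_url(agent_endpoint: str) -> str:
    """Return the agent card URL for a base agent endpoint, leaving complete URLs unchanged."""
    parts = urlsplit(agent_endpoint)
    if parts.path.endswith(AGENT_CARD_PATH):
        return agent_endpoint
    # Drop any trailing slash so the result never contains "//.well-known".
    path = parts.path.rstrip("/") + AGENT_CARD_PATH
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(agent_card_url("https://your-agent-id.ai-agents.your-cluster-id.cloud.redpanda.com"))
```

The helper is idempotent, so it is safe to run over URLs that already point at the agent card.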


### [](#pipeline-integration-failures)Pipeline integration failures

**Symptoms:** Pipelines using the `a2a_message` processor fail or time out.

**Causes:**

-   Agent is not running or is restarting

-   Agent timeout is too low for pipeline workload

-   Authentication issues between pipeline and agent

-   High event volume overwhelming agent


**Solution:**

1.  Check agent status and resource allocation.

2.  Increase agent resource tier for high-volume pipelines.

3.  Add error handling in pipeline:

    ```yaml
    processors:
      - try:
          - a2a_message:
              agent_card_url: "https://your-agent-url/.well-known/agent-card.json"
      # catch is a separate processor that runs only for messages that failed above
      - catch:
          - log:
              message: "Agent invocation failed: ${! error() }"
    ```


**Prevention:**

-   Test pipeline-agent integration with low volume first

-   Size agent resources appropriately for event rate

-   See integration patterns in [Integrate with Redpanda Pipelines](https://docs.redpanda.com/agentic-data-plane/connect/pipeline-integration-patterns/)


## [](#monitor-and-debug-agents)Monitor and debug agents

For comprehensive guidance on monitoring agent activity, analyzing conversation history, tracking token usage, and debugging issues, see [Monitor Agent Activity](https://docs.redpanda.com/agentic-data-plane/monitor/monitor-agents/).

## [](#next-steps)Next steps

-   [Write Effective System Prompts](https://docs.redpanda.com/agentic-data-plane/connect/system-prompts/)

-   [How MCP Servers Work](https://docs.redpanda.com/agentic-data-plane/connect/mcp-overview/)

-   [Choose an Agent Architecture](https://docs.redpanda.com/agentic-data-plane/connect/architecture-patterns/)