# Troubleshoot AI Agents

Use this page to diagnose and fix common issues with AI agents, including deployment failures, runtime behavior problems, tool execution errors, and integration issues.

The Agentic Data Plane is supported on BYOC clusters on AWS running Redpanda version 25.3 or later.

## Deployment issues

Fix issues that prevent agents from connecting to required resources.

### MCP server connection failures

**Symptoms:** Agent starts, but tools don't respond or return connection errors.

**Causes:**

- MCP server stopped or crashed after agent creation
- Network connectivity issues between the agent and the MCP server
- MCP server authentication or permission issues

**Solution:**

1. Verify MCP server status in **Agentic AI > Remote MCP**.
2. Check MCP server logs for errors.
3. Restart the MCP server if needed.
4. Verify that the agent has permission to access the MCP server.

**Prevention:**

- Monitor MCP server health
- Use appropriate retry logic in tools

## Runtime behavior issues

Resolve problems with agent decision-making, tool selection, and response generation.

### Agent not calling tools

**Symptoms:** Agent responds without calling any tools, or fabricates information instead of using tools.

**Causes:**

- System prompt doesn't clearly specify when to use tools
- Tool descriptions are vague or missing
- LLM model lacks sufficient reasoning capability
- Max iterations is too low

**Solution:**

1. Strengthen tool usage guidance in your system prompt:

    ```
    ALWAYS use get_order_status when customer mentions an order ID.
    NEVER respond about order status without calling the tool first.
    ```

2. Review tool descriptions in your MCP server configuration.
3. Use a more capable model from the supported list for your gateway.
4. Increase max iterations if the agent is stopping before reaching tools.

**Prevention:**

- Write explicit tool selection criteria in system prompts
- Test agents with the systematic testing approach
- Use models appropriate for your task complexity

### Calling wrong tools

**Symptoms:** Agent selects incorrect tools for the task, or calls tools with invalid parameters.

**Causes:**

- Tool descriptions are ambiguous or overlap
- Too many similar tools confuse the LLM
- System prompt doesn't provide clear tool selection guidance

**Solution:**

1. Make tool descriptions more specific and distinct (see the sketch at the end of this section).
2. Add "when to use" guidance to your system prompt:

    ```
    Use get_order_status when:
    - Customer provides an order ID (ORD-XXXXX)
    - You need to check current order state

    Use get_shipping_info when:
    - Order status is "shipped"
    - Customer asks about delivery or tracking
    ```

3. Reduce the number of tools you expose to the agent.
4. Use subagents to partition tools by domain.

**Prevention:**

- Follow tool design patterns in MCP Tool Patterns
- Limit each agent to 10-15 tools maximum
- Test boundary cases where multiple tools might apply
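The following minimal sketch shows what "specific and distinct" descriptions can look like for the two order tools used in the examples above. The layout (a `tools` list with `name`, `description`, and `input_schema` fields) is an illustrative assumption, not the exact format your MCP server uses; adapt it to your server's tool definition syntax.

```yaml
# Illustrative only: adapt to your MCP server's tool definition format.
# Each description states when to use the tool and when NOT to use it,
# so the two tools don't overlap.
tools:
  - name: get_order_status
    description: >
      Look up the CURRENT status of a single order.
      Use ONLY when the customer provides an order ID (format ORD-12345).
    input_schema:
      properties:
        order_id:
          type: string
          description: "Order ID in format ORD-12345"
  - name: get_shipping_info
    description: >
      Look up delivery and tracking details for an order that has ALREADY shipped.
      Do NOT use for orders that are still processing; use get_order_status instead.
    input_schema:
      properties:
        order_id:
          type: string
          description: "Order ID in format ORD-12345"
```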
### Stuck in loops or exceeding max iterations

**Symptoms:** Agent reaches max iterations without completing the task, or repeatedly calls the same tool with the same parameters.

**Causes:**

- Tool returns errors that the agent doesn't know how to handle
- Agent doesn't recognize when the task is complete
- Tool returns incomplete data that prompts another call
- System prompt encourages exhaustive exploration

**Solution:**

1. Add completion criteria to your system prompt:

    ```
    When you have retrieved all requested information:
    1. Present the results to the user
    2. Stop calling additional tools
    3. Do not explore related data unless asked
    ```

2. Add error handling guidance:

    ```
    If a tool fails after 2 attempts:
    - Explain what went wrong
    - Do not retry the same tool again
    - Move on or ask for user guidance
    ```

3. Review tool output to ensure it signals completion clearly.
4. Increase max iterations if the task legitimately requires many steps.

**Prevention:**

- Design tools to return complete information in one call
- Set max iterations appropriate for task complexity (see Why iterations matter)
- Test with ambiguous requests that might cause loops

### Making up information

**Symptoms:** Agent provides plausible-sounding answers without calling tools, or invents data when tools fail.

**Causes:**

- System prompt doesn't explicitly forbid fabrication
- Agent treats tool failures as suggestions rather than requirements
- Model is hallucinating due to lack of constraints

**Solution:**

1. Add explicit constraints to your system prompt:

    ```
    Critical rules:
    - NEVER make up order numbers, tracking numbers, or customer data
    - If a tool fails, explain the failure - do not guess
    - If you don't have information, say so explicitly
    ```

2. Test error scenarios by temporarily disabling tools.
3. Use a more capable model that follows instructions better.

**Prevention:**

- Include "never fabricate" rules in all system prompts
- Test with requests that require unavailable data
- Monitor Transcripts and the session topic for fabricated responses

### Analyzing conversation patterns

**Symptoms:** Agent behavior is inconsistent or produces unexpected results.

**Solution:** Review conversation history in Transcripts to identify problematic patterns:

- Agent calling the same tool repeatedly: indicates that loop detection is needed
- Large gaps between messages: suggests a tool timeout or slow execution
- Agent responses without tool calls: indicates a tool selection issue
- Fabricated information: suggests a missing "never make up data" constraint
- Truncated early messages: indicates the context window was exceeded

**Analysis workflow:**

1. Use the Inspector to reproduce the issue.
2. Review the full conversation, including tool invocations.
3. Identify where agent behavior diverged from what you expected.
4. Check the system prompt for missing guidance.
5. Verify that tool responses are formatted correctly.

## Performance issues

Diagnose and fix issues related to agent speed and resource consumption.

### Slow response times

**Symptoms:** Agent takes 10+ seconds to respond to simple queries.

**Causes:**

- LLM model is slow (large context processing)
- Too many tool calls in sequence
- Tools themselves are slow (database queries, API calls)
- Large context window from long conversation history

**Solution:**

1. Use a faster, lower-latency model tier for simple queries and reserve larger models for complex reasoning.
2. Review conversation history in the Inspector tab to identify unnecessary tool calls.
3. Optimize tool implementations (see the sketch at the end of this section):
    - Add caching where appropriate
    - Reduce query complexity
    - Return only needed data (use pagination, filters)
4. Clear the conversation history if the context is very large.

**Prevention:**

- Right-size model selection based on task complexity
- Design tools to execute quickly (< 2 seconds ideal)
- Set appropriate max iterations to prevent excessive exploration
- Monitor token usage and conversation length
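As an example of returning only needed data, the sketch below pushes filtering and pagination to the backend and trims the response before it reaches the agent. It follows the YAML processor style used elsewhere on this page; the endpoint, query parameters, and response field names are illustrative assumptions, not part of the product.

```yaml
# Illustrative only: filter and paginate server-side so the tool returns
# a small, relevant result set instead of the full dataset.
processors:
  - http:
      url: https://api.example.com/orders?status=open&limit=20   # hypothetical endpoint
      verb: GET
      timeout: "5s"   # fail fast so the agent loop isn't blocked
  - mapping: |
      # Keep only the fields the agent actually needs; drop large payload bodies.
      root.orders = this.orders.map_each(o -> o.with("id", "status", "updated_at"))
```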
### High token costs

**Symptoms:** Token usage is higher than expected, and costs are increasing rapidly.

**Causes:**

- Max iterations configured too high
- Agent making unnecessary tool calls
- Large tool results filling the context window
- Long conversation history not being managed
- Using expensive models for simple tasks

**Solution:**

1. Review token usage in Transcripts.
2. Lower max iterations for this agent.
3. Optimize tool responses to return less data:
    - Bad: return all 10,000 customer records
    - Good: return paginated results, 20 records at a time
4. Add cost control guidance to your system prompt:

    ```
    Efficiency guidelines:
    - Request only the data you need
    - Stop when you have enough information
    - Do not call tools speculatively
    ```

5. Switch to a more cost-effective model for simple queries.
6. Clear conversation history periodically in the Inspector tab.

**Prevention:**

- Set appropriate max iterations (10-20 for simple tasks, 30-40 for complex tasks)
- Design tools to return minimal necessary data
- Monitor token usage trends
- See cost calculation guidance in Cost calculation

## Tool execution issues

Fix problems with timeouts, invalid parameters, and error responses.

### Tool timeouts

**Symptoms:** Tools fail with timeout errors, and the agent receives incomplete results.

**Causes:**

- External API is slow or unresponsive
- Database query is too complex
- Network latency between the tool and the external system
- Tool processing large datasets in memory

**Solution:**

1. Add timeout handling to the tool implementation:

    ```yaml
    http:
      url: https://api.example.com/data
      timeout: "5s" # Set explicit timeout
    ```

2. Optimize external queries:
    - Add database indexes
    - Reduce query scope
    - Cache frequent queries
3. Increase the tool timeout if the operation legitimately takes longer.
4. Add retry logic for transient failures (see the retry sketch at the end of Tool execution issues).

**Prevention:**

- Set explicit timeouts in all tool configurations
- Test tools under load
- Monitor external API performance
- Design tools to fail fast on unavailable services

### Invalid parameters

**Symptoms:** Tools return validation errors about missing or incorrectly formatted parameters.

**Causes:**

- Tool schema doesn't match the implementation
- Agent passes wrong data types
- Required parameters not marked as required in the schema
- Agent misunderstands the parameter's purpose

**Solution:**

1. Verify that the tool schema matches the implementation:

    ```yaml
    input_schema:
      properties:
        order_id:
          type: string # Must match what the tool expects
          description: "Order ID in format ORD-12345"
    ```

2. Add parameter validation to tools (see the schema sketch at the end of Tool execution issues).
3. Improve parameter descriptions in the tool schema.
4. Add examples to tool descriptions:

    ```yaml
    description: |
      Get order status by order ID.
      Example: get_order_status(order_id="ORD-12345")
    ```

**Prevention:**

- Write detailed parameter descriptions
- Include format requirements and examples
- Test tools with invalid inputs to verify error messages
- Use JSON Schema validation in tool implementations

### Tool returns errors

**Symptoms:** Tools execute but return error responses or unexpected data formats.

**Causes:**

- External API returned an error
- Tool implementation has bugs
- Data format changed in the external system
- Tool lacks error handling

**Solution:**

1. Check the tool logs in the MCP server.
2. Test the tool directly (outside the agent context).
3. Verify that the external system is operational.
4. Add error handling to the tool implementation:

    ```yaml
    processors:
      - try:
          - http:
              url: ${API_URL}
      - catch:
          - mapping: |
              root.error = "API unavailable: " + error()
    ```

5. Update the agent system prompt to handle this error type.

**Prevention:**

- Implement comprehensive error handling in tools
- Monitor external system health
- Add retries for transient failures
- Log all tool errors for analysis
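For transient failures, bounded retries combined with a short timeout keep a tool responsive without masking real outages. This is a sketch only: the `retries` and `retry_period` fields are assumed from the Redpanda Connect `http` processor, so verify the exact field names against the connector reference for the version you run.

```yaml
# Sketch: bounded retries for transient failures, combined with a short timeout.
# Field names assumed from the Redpanda Connect http processor; verify before use.
http:
  url: https://api.example.com/data
  timeout: "5s"        # fail fast so the agent isn't blocked
  retries: 2           # retry transient failures a limited number of times
  retry_period: "1s"   # wait between attempts
```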
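If mandatory parameters aren't marked as required, the agent can omit them and the call fails later with a confusing error. The sketch below extends the `input_schema` example above with a standard JSON Schema `required` list and a format `pattern`; the `order_id` field and its format are carried over from the illustrative examples on this page.

```yaml
# Illustrative only: mark mandatory parameters as required so invalid calls
# are rejected with a clear validation error instead of failing downstream.
input_schema:
  type: object
  properties:
    order_id:
      type: string
      description: "Order ID in format ORD-12345"
      pattern: "^ORD-[0-9]{5}$"   # reject malformed IDs early
  required:
    - order_id
```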
## Integration issues

Fix problems with external applications calling agents and pipeline-to-agent integration.

### Agent card does not contain a URL

**Symptoms:** The pipeline fails with an error such as `agent card does not contain a URL` or `failed to init processor <no label> path root.pipeline.processors.0`.

**Causes:**

- The `agent_card_url` points to the base agent endpoint instead of the agent card JSON file

**Solution:** The `agent_card_url` must point to the agent card JSON file, not the base agent endpoint.

Incorrect configuration:

```yaml
processors:
  - a2a_message:
      agent_card_url: "https://your-agent-id.ai-agents.your-cluster-id.cloud.redpanda.com"
      prompt: "Analyze this transaction: ${!content()}"
```

Correct configuration:

```yaml
processors:
  - a2a_message:
      agent_card_url: "https://your-agent-id.ai-agents.your-cluster-id.cloud.redpanda.com/.well-known/agent-card.json"
      prompt: "Analyze this transaction: ${!content()}"
```

The agent card is always available at `/.well-known/agent-card.json`, as defined by the A2A protocol standard.

**Prevention:**

- Always append `/.well-known/agent-card.json` to the agent endpoint URL
- Test the agent card URL in a browser before using it in the pipeline configuration
- See Agent card location for details

### Pipeline integration failures

**Symptoms:** Pipelines using the `a2a_message` processor fail or time out.

**Causes:**

- Agent is not running or is restarting
- Agent timeout is too low for the pipeline workload
- Authentication issues between the pipeline and the agent
- High event volume overwhelming the agent

**Solution:**

1. Check agent status and resource allocation.
2. Increase the agent resource tier for high-volume pipelines.
3. Add error handling in the pipeline:

    ```yaml
    processors:
      - try:
          - a2a_message:
              agent_card_url: "https://your-agent-url/.well-known/agent-card.json"
      - catch:
          - log:
              message: "Agent invocation failed: ${! error() }"
    ```

**Prevention:**

- Test pipeline-agent integration with low volume first
- Size agent resources appropriately for the event rate
- See integration patterns in Pipeline Integration Patterns

## Monitor and debug agents

For comprehensive guidance on monitoring agent activity, analyzing conversation history, tracking token usage, and debugging issues, see Monitor Agent Activity.

## Next steps

- System Prompt Best Practices
- Agent Concepts
- MCP Tool Patterns
- Agent Architecture Patterns