# What is an AI Gateway?

> **Note:** Redpanda Agentic Data Plane is supported on BYOC clusters running on AWS with Redpanda version 25.3 and later. It is currently in a limited availability release.

Redpanda AI Gateway keeps your AI-powered applications highly available and your AI spend under control. It sits between your applications and the LLM providers and AI tools they depend on. If a provider goes down, the gateway provides automatic failover to keep your apps running, and centralized budget controls prevent runaway costs. For platform teams, it adds governance at the model-fallback level, tenancy modeling for teams, individuals, apps, and service accounts, and a single proxy layer for both LLM models and MCP servers.

## The problem

Modern AI applications face two business-critical challenges: staying up and staying on budget.

First, applications typically hardcode provider-specific SDKs. An application using OpenAI’s SDK cannot easily switch to Anthropic or Google without code changes and redeployment. When a provider hits rate limits, suffers an outage, or degrades in performance, your application goes down with it. Your end users don’t care which provider you use; they care that the app works.

Second, costs can spiral without centralized controls. Without a single view of token consumption across teams and applications, it’s difficult to attribute costs to specific customers, features, or environments. Testing and debugging can generate unexpected bills, and there’s no way to enforce budgets or rate limits per team, application, or service account. The result: runaway spend that finance discovers only after the fact.

These two challenges are compounded by fragmented observability across provider dashboards, which makes it harder to detect availability issues or cost anomalies in time to act. And as organizations adopt AI agents that call MCP tools, the lack of centralized tool governance adds another dimension of uncontrolled cost and risk.

## What AI Gateway solves

Redpanda AI Gateway delivers two core business outcomes: high availability and cost governance. Both are backed by platform-level controls that set it apart from simple proxy layers.

### High availability through governed failover

Your end users don’t care whether you use OpenAI, Anthropic, or Google: they care that your app stays up. AI Gateway lets you configure provider pools with automatic failover, so when your primary provider hits rate limits, times out, or returns errors, the gateway routes requests to a fallback provider with no code changes and no downtime for your users.

Unlike simple retry logic, AI Gateway provides governance at the failover level: you define which providers fail over to which, under what conditions, and with what priority. This controlled failover can significantly improve uptime even during extended provider outages.
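From the application’s point of view, failover is invisible. The following is a minimal sketch, assuming an OpenAI-compatible gateway endpoint and a Redpanda token (both placeholders here); note that the request itself carries no retry or fallback logic:

```python
from openai import OpenAI

# Placeholder endpoint and token: substitute your gateway's values.
client = OpenAI(
    base_url="<your-gateway-endpoint>",
    api_key="your-redpanda-token",
)

# A single request with no client-side retry logic. If the primary
# provider in the pool is rate-limited, times out, or errors, the
# gateway applies your configured failover policy and answers from
# the fallback provider instead.
response = client.chat.completions.create(
    model="openai/gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```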
### Cost governance and budget controls

AI Gateway gives you centralized fiscal control over AI spend. Set monthly budget caps for each gateway, enforce them automatically, and set rate limits per team, environment, or application. No more runaway costs discovered after the fact.

You can route requests to different models based on user attributes. For example, to direct premium users to a more capable model while routing free-tier users to a cost-effective option, use a CEL expression:

```
// Route premium users to the best model, free users to a cost-effective model
request.headers["x-user-tier"] == "premium"
  ? "anthropic/claude-opus-4.6"
  : "anthropic/claude-sonnet-4.5"
```

You can also set different rate limits and spend limits for each environment to prevent staging or development traffic from consuming production budgets.

### Tenancy and access governance

AI Gateway provides multi-tenant isolation by design. Create separate gateways for teams, individual developers, applications, or service accounts, each with its own budgets, rate limits, routing policies, and observability scope. This tenancy model lets platform teams govern who uses what, how much they spend, and which models and tools they can access, without building custom authorization layers.

### Unified LLM access (single endpoint for all providers)

AI Gateway provides a single OpenAI-compatible endpoint that routes requests to multiple LLM providers. Instead of integrating with each provider’s SDK separately, you configure your application once and switch providers by changing only the model parameter.

Without AI Gateway, you need a different SDK and different patterns for each provider:

```python
# OpenAI
from openai import OpenAI

client = OpenAI(api_key="sk-...")
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}]
)

# Anthropic (different SDK, different patterns)
from anthropic import Anthropic

client = Anthropic(api_key="sk-ant-...")
response = client.messages.create(
    model="claude-sonnet-4.5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)
```

With AI Gateway, you use the OpenAI SDK for all providers:

```python
from openai import OpenAI

# Single configuration, multiple providers
client = OpenAI(
    base_url="<your-gateway-endpoint>",
    api_key="your-redpanda-token",
)

# Route to OpenAI
response = client.chat.completions.create(
    model="openai/gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}]
)

# Route to Anthropic (same code, different model string)
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",
    messages=[{"role": "user", "content": "Hello"}]
)

# Route to Google Gemini (same code, different model string)
response = client.chat.completions.create(
    model="google/gemini-2.0-flash",
    messages=[{"role": "user", "content": "Hello"}]
)
```

To switch providers, you change only the model parameter, for example from `openai/gpt-5.2` to `anthropic/claude-sonnet-4.5`. No code changes or redeployment needed.
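Because the target model is just a string, you can treat it as configuration rather than code. Here is a minimal sketch, assuming the model ID is supplied through an environment variable (the variable name `LLM_MODEL` is illustrative):

```python
import os

from openai import OpenAI

# Illustrative variable name: any configuration source works. Switching
# providers means changing this value, not the application code.
model = os.environ.get("LLM_MODEL", "openai/gpt-5.2")

client = OpenAI(
    base_url="<your-gateway-endpoint>",  # placeholder
    api_key="your-redpanda-token",       # placeholder
)

# Setting LLM_MODEL=anthropic/claude-sonnet-4.5 reroutes this exact
# call to Anthropic through the same gateway endpoint.
response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Hello"}],
)
```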
### Proxy for LLM models and MCP servers

AI Gateway acts as a single proxy layer for both LLM model requests and MCP servers. For LLM traffic, it provides a unified endpoint. For AI agents that use MCP tools, it aggregates multiple MCP servers and provides deferred tool loading, which dramatically reduces token costs.

Without AI Gateway, agents typically load all available MCP tools from multiple MCP servers at startup. This approach sends 50+ tool definitions with every request, creating high token costs (thousands of tokens per request), slow agent startup times, and no centralized governance over which tools agents can access.

With AI Gateway, you configure approved MCP servers once, and the gateway initially loads only search and orchestrator tools. Agents query for specific tools only when needed, which often reduces token usage by 80-90%, depending on your configuration and the number of tools aggregated. You also gain centralized approval and governance over which MCP servers your agents can access.

For complex workflows, AI Gateway provides a JavaScript-based orchestrator tool that reduces multi-step workflows from multiple round trips to a single call. For example, you can create a workflow that searches a vector database and, if the results are insufficient, falls back to web search, all in one orchestration step.

### Unified observability and cost tracking

AI Gateway provides a single dashboard that tracks all LLM traffic across providers, eliminating the need to switch between multiple provider dashboards. The dashboard tracks request volume for each gateway, model, and provider, along with token usage for both prompt and completion tokens. You can view estimated spend per model with cross-provider comparisons, latency metrics (p50, p95, p99), and errors broken down by type, provider, and model.

This unified view helps you answer critical questions such as which model is the most cost-effective for your use case, why a specific user request failed, how much your staging environment costs each week, and what the latency difference is between providers for your workload.

## Common gateway patterns

Some common patterns for configuring gateways include:

- **Team isolation:** When multiple teams share infrastructure but need separate budgets and policies, create one gateway for each team. For example, you might configure Team A’s gateway with a $5K/month budget for both staging and production environments, while Team B’s gateway has a $10K/month budget with different rate limits. Each team sees only its own traffic in the observability dashboards, providing clear cost attribution and isolation.
- **Environment separation:** To prevent staging traffic from affecting production metrics, create separate gateways for each environment. Configure the staging gateway with lower rate limits, restricted model access, and aggressive cost controls to prevent runaway expenses. The production gateway can have higher rate limits, access to all models, and alerting configured to detect anomalies.
- **Primary and fallback for reliability:** To ensure uptime during provider outages, configure provider pools with automatic failover. For example, you can set OpenAI as your primary provider (preferred for quality) and configure Anthropic as the fallback that activates when the gateway detects rate limits or timeouts from OpenAI. Monitor the fallback rate to detect primary provider issues early, before they impact your users.
- **A/B testing models:** To compare model quality and cost without dual integration, route a percentage of traffic to different models. For example, you can send 80% of traffic to claude-sonnet-4.5 and 20% to claude-opus-4.6, then compare quality metrics and costs in the observability dashboard before adjusting the split.
- **Customer-based routing:** For SaaS products with tiered pricing (for example, free, pro, enterprise), use CEL routing based on request headers to match users with appropriate models, as shown below.

```
request.headers["x-customer-tier"] == "enterprise" ? "anthropic/claude-opus-4.6" :
request.headers["x-customer-tier"] == "pro" ? "anthropic/claude-sonnet-4.5" :
"anthropic/claude-haiku"
```
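For header-based routing like this to work, clients must send the tier header with their requests. A minimal sketch using the OpenAI SDK’s `default_headers` and per-request `extra_headers` options (the header name matches the CEL example above; the endpoint and token are placeholders, and how CEL-selected models interact with the `model` parameter depends on your gateway configuration):

```python
from openai import OpenAI

client = OpenAI(
    base_url="<your-gateway-endpoint>",  # placeholder
    api_key="your-redpanda-token",       # placeholder
    # Sent with every request; the gateway's CEL expression reads it.
    default_headers={"x-customer-tier": "pro"},
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",
    messages=[{"role": "user", "content": "Hello"}],
    # Per-request override, for example when one process serves many tenants.
    extra_headers={"x-customer-tier": "enterprise"},
)
```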
"anthropic/claude-sonnet-4.5" : "anthropic/claude-haiku" When to use AI Gateway AI Gateway is ideal for organizations that: Use or plan to use multiple LLM providers Need centralized cost tracking and budgeting Want to experiment with different models without code changes Require high availability during provider outages Have multiple teams or customers using AI services Build AI agents that need MCP tool aggregation Need unified observability across all AI traffic AI Gateway may not be necessary if: You only use a single provider with simple requirements You have minimal AI traffic (< 1000 requests/day) You don’t need cost tracking or policy enforcement Your application doesn’t require provider switching Next steps Gateway Quickstart - Get started quickly with a basic gateway setup For Administrators: Setup Guide - Enable providers, models, and create gateways Architecture Deep Dive - Technical architecture details For Builders: Discover Available Gateways - Find which gateways you can access Connect Your Agent - Integrate your application Back to top × Simple online edits For simple changes, such as fixing a typo, you can edit the content directly on GitHub. Edit on GitHub Or, open an issue to let us know about something that you want us to change. Open an issue Contribution guide For extensive content updates, or if you prefer to work locally, read our contribution guide . Was this helpful? thumb_up thumb_down group Ask in the community mail Share your feedback group_add Make a contribution 🎉 Thanks for your feedback! AI Gateway Quickstart