# What is an AI Gateway?

> **Note:** Redpanda Agentic Data Plane is supported on BYOC clusters running on AWS with Redpanda version 25.3 and later. It is currently in a limited availability release.

Redpanda AI Gateway keeps your AI-powered applications highly available and your AI spend under control. It sits between your applications and the LLM providers and AI tools they depend on. If a provider goes down, the gateway provides automatic failover to keep your apps running, and centralized budget controls prevent runaway costs. For platform teams, it adds governance at the model-fallback level, tenancy modeling for teams, individuals, apps, and service accounts, and a single proxy layer for both LLM models and MCP servers.

## The problem

Modern AI applications face two business-critical challenges: staying up and staying on budget.

First, applications typically hardcode provider-specific SDKs. An application using OpenAI’s SDK cannot easily switch to Anthropic or Google without code changes and redeployment. When a provider hits rate limits, suffers an outage, or degrades in performance, your application goes down with it. Your end users don’t care which provider you use; they care that the app works.

Second, costs can spiral without centralized controls. Without a single view of token consumption across teams and applications, it’s difficult to attribute costs to specific customers, features, or environments. Testing and debugging can generate unexpected bills, and there’s no way to enforce budgets or rate limits per team, application, or service account. The result: runaway spend that finance discovers only after the fact.

These two challenges are compounded by fragmented observability across provider dashboards, which makes it harder to detect availability issues or cost anomalies in time to act. And as organizations adopt AI agents that call MCP tools, the lack of centralized tool governance adds another dimension of uncontrolled cost and risk.

## What AI Gateway solves

Redpanda AI Gateway delivers two core business outcomes: high availability and cost governance. Both are backed by platform-level controls that set it apart from simple proxy layers.

### High availability through governed failover

Your end users don’t care whether you use OpenAI, Anthropic, or Google: they care that your app stays up. AI Gateway lets you configure provider pools with automatic failover, so when your primary provider hits rate limits, times out, or returns errors, the gateway routes requests to a fallback provider with no code changes and no downtime for your users.

Unlike simple retry logic, AI Gateway provides governance at the failover level: you define which providers fail over to which, under what conditions, and with what priority. This controlled failover can significantly improve uptime even during extended provider outages.
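From the application’s point of view, failover is invisible. The following is a minimal sketch, assuming an OpenAI-compatible gateway endpoint and a Redpanda token (both placeholders here); note that the request itself carries no retry or fallback logic:

```python
from openai import OpenAI

# Placeholder endpoint and token: substitute your gateway's values.
client = OpenAI(
    base_url="<your-gateway-endpoint>",
    api_key="your-redpanda-token",
)

# A single request with no client-side retry logic. If the primary
# provider in the pool is rate-limited, times out, or errors, the
# gateway applies your configured failover policy and answers from
# the fallback provider instead.
response = client.chat.completions.create(
    model="openai/gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```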
### Cost governance and budget controls

AI Gateway gives you centralized fiscal control over AI spend. Set monthly budget caps for each gateway, enforce them automatically, and set rate limits per team, environment, or application. No more runaway costs discovered after the fact.

You can route requests to different models based on user attributes. For example, to direct premium users to a more capable model while routing free-tier users to a cost-effective option, use a CEL expression:

```
// Route premium users to the best model, free users to a cost-effective model
request.headers["x-user-tier"] == "premium"
  ? "anthropic/claude-opus-4.6"
  : "anthropic/claude-sonnet-4.5"
```

You can also set different rate limits and spend limits for each environment to prevent staging or development traffic from consuming production budgets.

### Tenancy and access governance

AI Gateway provides multi-tenant isolation by design. Create separate gateways for teams, individual developers, applications, or service accounts, each with its own budgets, rate limits, routing policies, and observability scope. This tenancy model lets platform teams govern who uses what, how much they spend, and which models and tools they can access, without building custom authorization layers.

### Unified LLM access (single endpoint for all providers)

AI Gateway provides a single OpenAI-compatible endpoint that routes requests to multiple LLM providers. Instead of integrating with each provider’s SDK separately, you configure your application once and switch providers by changing only the model parameter.

Without AI Gateway, you need a different SDK and different patterns for each provider:

```python
# OpenAI
from openai import OpenAI

client = OpenAI(api_key="sk-...")
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}]
)

# Anthropic (different SDK, different patterns)
from anthropic import Anthropic

client = Anthropic(api_key="sk-ant-...")
response = client.messages.create(
    model="claude-sonnet-4.5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)
```

With AI Gateway, you use the OpenAI SDK for all providers:

```python
from openai import OpenAI

# Single configuration, multiple providers
client = OpenAI(
    base_url="<your-gateway-endpoint>",
    api_key="your-redpanda-token",
)

# Route to OpenAI
response = client.chat.completions.create(
    model="openai/gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}]
)

# Route to Anthropic (same code, different model string)
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",
    messages=[{"role": "user", "content": "Hello"}]
)

# Route to Google Gemini (same code, different model string)
response = client.chat.completions.create(
    model="google/gemini-2.0-flash",
    messages=[{"role": "user", "content": "Hello"}]
)
```

To switch providers, you change only the model parameter, for example from `openai/gpt-5.2` to `anthropic/claude-sonnet-4.5`. No code changes or redeployment needed.
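Because the target model is just a string, you can treat it as configuration rather than code. Here is a minimal sketch, assuming the model ID is supplied through an environment variable (the variable name `LLM_MODEL` is illustrative):

```python
import os

from openai import OpenAI

# Illustrative variable name: any configuration source works. Switching
# providers means changing this value, not the application code.
model = os.environ.get("LLM_MODEL", "openai/gpt-5.2")

client = OpenAI(
    base_url="<your-gateway-endpoint>",  # placeholder
    api_key="your-redpanda-token",       # placeholder
)

# Setting LLM_MODEL=anthropic/claude-sonnet-4.5 reroutes this exact
# call to Anthropic through the same gateway endpoint.
response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Hello"}],
)
```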
### Proxy for LLM models and MCP servers

AI Gateway acts as a single proxy layer for both LLM model requests and MCP servers. For LLM traffic, it provides a unified endpoint. For AI agents that use MCP tools, it aggregates multiple MCP servers and provides deferred tool loading, which dramatically reduces token costs.

Without AI Gateway, agents typically load all available MCP tools from multiple MCP servers at startup. This approach sends 50+ tool definitions with every request, creating high token costs (thousands of tokens per request), slow agent startup times, and no centralized governance over which tools agents can access.

With AI Gateway, you configure approved MCP servers once, and the gateway initially loads only search and orchestrator tools. Agents query for specific tools only when needed, which often reduces token usage by 80-90%, depending on your configuration and the number of tools aggregated. You also gain centralized approval and governance over which MCP servers your agents can access.

For complex workflows, AI Gateway provides a JavaScript-based orchestrator tool that reduces multi-step workflows from multiple round trips to a single call. For example, you can create a workflow that searches a vector database and, if the results are insufficient, falls back to web search, all in one orchestration step.

### Unified observability and cost tracking

AI Gateway provides a single dashboard that tracks all LLM traffic across providers, eliminating the need to switch between multiple provider dashboards. The dashboard tracks request volume for each gateway, model, and provider, along with token usage for both prompt and completion tokens. You can view estimated spend per model with cross-provider comparisons, latency metrics (p50, p95, p99), and errors broken down by type, provider, and model.

This unified view helps you answer critical questions such as which model is the most cost-effective for your use case, why a specific user request failed, how much your staging environment costs each week, and what the latency difference is between providers for your workload.

## Common gateway patterns

Some common patterns for configuring gateways include:

- **Team isolation:** When multiple teams share infrastructure but need separate budgets and policies, create one gateway for each team. For example, you might configure Team A’s gateway with a $5K/month budget for both staging and production environments, while Team B’s gateway has a $10K/month budget with different rate limits. Each team sees only its own traffic in the observability dashboards, providing clear cost attribution and isolation.
- **Environment separation:** To prevent staging traffic from affecting production metrics, create separate gateways for each environment. Configure the staging gateway with lower rate limits, restricted model access, and aggressive cost controls to prevent runaway expenses. The production gateway can have higher rate limits, access to all models, and alerting configured to detect anomalies.
- **Primary and fallback for reliability:** To ensure uptime during provider outages, configure provider pools with automatic failover. For example, you can set OpenAI as your primary provider (preferred for quality) and configure Anthropic as the fallback that activates when the gateway detects rate limits or timeouts from OpenAI. Monitor the fallback rate to detect primary provider issues early, before they impact your users.
- **A/B testing models:** To compare model quality and cost without dual integration, route a percentage of traffic to different models. For example, you can send 80% of traffic to claude-sonnet-4.5 and 20% to claude-opus-4.6, then compare quality metrics and costs in the observability dashboard before adjusting the split.
- **Customer-based routing:** For SaaS products with tiered pricing (for example, free, pro, enterprise), use CEL routing based on request headers to match users with appropriate models, as shown below.

```
request.headers["x-customer-tier"] == "enterprise" ? "anthropic/claude-opus-4.6" :
request.headers["x-customer-tier"] == "pro" ? "anthropic/claude-sonnet-4.5" :
"anthropic/claude-haiku"
```
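For header-based routing like this to work, clients must send the tier header with their requests. A minimal sketch using the OpenAI SDK’s `default_headers` and per-request `extra_headers` options (the header name matches the CEL example above; the endpoint and token are placeholders, and how CEL-selected models interact with the `model` parameter depends on your gateway configuration):

```python
from openai import OpenAI

client = OpenAI(
    base_url="<your-gateway-endpoint>",  # placeholder
    api_key="your-redpanda-token",       # placeholder
    # Sent with every request; the gateway's CEL expression reads it.
    default_headers={"x-customer-tier": "pro"},
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",
    messages=[{"role": "user", "content": "Hello"}],
    # Per-request override, for example when one process serves many tenants.
    extra_headers={"x-customer-tier": "enterprise"},
)
```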
"anthropic/claude-sonnet-4.5" : "anthropic/claude-haiku" When to use AI Gateway AI Gateway is ideal for organizations that: Use or plan to use multiple LLM providers Need centralized cost tracking and budgeting Want to experiment with different models without code changes Require high availability during provider outages Have multiple teams or customers using AI services Build AI agents that need MCP tool aggregation Need unified observability across all AI traffic AI Gateway may not be necessary if: You only use a single provider with simple requirements You have minimal AI traffic (< 1000 requests/day) You don’t need cost tracking or policy enforcement Your application doesn’t require provider switching Next steps Gateway Quickstart - Get started quickly with a basic gateway setup For Administrators: Setup Guide - Enable providers, models, and create gateways Architecture Deep Dive - Technical architecture details For Builders: Discover Available Gateways - Find which gateways you can access Connect Your Agent - Integrate your application Back to top × Simple online edits For simple changes, such as fixing a typo, you can edit the content directly on GitHub. Edit on GitHub Or, open an issue to let us know about something that you want us to change. Open an issue Contribution guide For extensive content updates, or if you prefer to work locally, read our contribution guide . Was this helpful? thumb_up thumb_down group Ask in the community mail Share your feedback group_add Make a contribution 🎉 Thanks for your feedback! AI Gateway Quickstart