AI Gateway Architecture

Redpanda Agentic Data Plane is supported on BYOC clusters running on AWS with Redpanda version 25.3 or later. It is currently in a limited availability release.

This page provides technical details about AI Gateway’s architecture, request processing, and capabilities. For an overview of AI Gateway, see What is an AI Gateway?

Architecture overview

AI Gateway consists of a control plane for configuration and management, a data plane for request processing and routing, and an observability plane for monitoring and analytics.

Control plane

The control plane manages gateway configuration and policy definition:

  • Workspace management: Multi-tenant isolation with separate namespaces for different teams or environments

  • Provider configuration: Enable and configure LLM providers (such as OpenAI and Anthropic)

  • Gateway creation: Define gateways with specific routing rules, budgets, and rate limits

  • Policy definition: Create CEL-based routing policies, spend limits, and rate limits (see the example after this list)

  • MCP server registration: Configure which MCP servers are available to agents
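
For a concrete sense of how a CEL routing policy reads, the sketch below evaluates an illustrative CEL expression against hypothetical request attributes. The attribute names (request.model, request.team) and the use of the cel-python package are assumptions for demonstration only, not the gateway's actual policy schema:

    # Illustrative only: evaluating a CEL expression the way a routing policy would.
    # Attribute names and values are hypothetical; the gateway evaluates its own
    # policy schema internally.
    import celpy

    env = celpy.Environment()
    ast = env.compile('request.model == "gpt-4o" && request.team == "research"')
    program = env.program(ast)

    # Build a CEL-compatible view of an incoming request
    request = celpy.json_to_cel({"model": "gpt-4o", "team": "research"})
    print(bool(program.evaluate({"request": request})))  # True: the rule matches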

Data plane

The data plane handles all runtime request processing:

  • Request ingestion: Accept requests via OpenAI-compatible API endpoints (see the example after this list)

  • Authentication: Validate API keys and gateway access

  • Policy evaluation: Apply rate limits, spend limits, and routing policies

  • Provider pool management: Select primary or fallback providers based on availability

  • MCP proxy: Aggregate tools from multiple MCP servers with deferred loading

  • Response transformation: Normalize provider-specific responses to OpenAI format

  • Metrics collection: Record token usage, latency, and cost for every request
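
Because the data plane speaks the OpenAI API, an existing OpenAI client can be pointed at the gateway by changing only its base URL and key. This is a minimal sketch using the openai Python package; the gateway URL, API key, and model name are placeholders:

    # Minimal sketch: send a chat completion through the gateway's
    # OpenAI-compatible endpoint. URL, key, and model are placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://<your-gateway-endpoint>",  # gateway base URL (placeholder)
        api_key="<gateway-api-key>",                 # key for this gateway (placeholder)
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # routing policies may map this to a different provider or model
        messages=[{"role": "user", "content": "Summarize today's on-call incidents."}],
    )
    print(response.choices[0].message.content)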

Observability plane

The observability plane provides monitoring and analytics:

  • Request logs: Store full request/response history with prompt and completion content

  • Metrics aggregation: Calculate token usage, costs, latency percentiles, and error rates

  • Dashboard UI: Display real-time and historical analytics per gateway, model, or provider

  • Cost tracking: Estimate spend based on provider pricing and token consumption (see the example after this list)
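
As a rough illustration of how spend is estimated from token counts, the sketch below multiplies prompt and completion tokens by per-million-token prices. The prices shown are hypothetical; the gateway uses the pricing configured for each provider:

    # Illustrative cost estimate; prices per million tokens are hypothetical.
    def estimate_cost(prompt_tokens: int, completion_tokens: int,
                      input_price_per_m: float, output_price_per_m: float) -> float:
        return (prompt_tokens * input_price_per_m
                + completion_tokens * output_price_per_m) / 1_000_000

    # 1,200 prompt tokens and 350 completion tokens at $2.50 / $10.00 per million
    print(estimate_cost(1200, 350, 2.50, 10.00))  # 0.0065 (USD)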

Request lifecycle

When a request flows through AI Gateway, it passes through several policy and routing stages before reaching the LLM provider. Understanding this lifecycle helps you configure policies effectively and troubleshoot issues:

  1. Application sends request to gateway endpoint

  2. Gateway authenticates request

  3. Rate limit policy is evaluated (allow or deny)

  4. Spend limit policy is evaluated (allow or deny)

  5. Routing policy is evaluated to select the model and provider

  6. Provider pool selects backend (primary/fallback)

  7. Request is forwarded to the LLM provider

  8. Response returned to application

  9. Request is logged with tokens, cost, latency, and status

Each policy evaluation happens synchronously in the request path. If rate limits or spend limits reject the request, the gateway returns an error immediately without calling the LLM provider, which helps you control costs.
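
Because rejected requests never reach the provider, a policy denial surfaces to your application as an ordinary HTTP error from the gateway. The sketch below assumes the gateway signals a rate limit denial with HTTP 429, as OpenAI-compatible APIs commonly do; the endpoint, key, and model are placeholders:

    # Sketch: back off and retry when the gateway's rate limit policy denies a request.
    # Assumes a 429 status for denials; endpoint, key, and model are placeholders.
    import time
    import httpx

    GATEWAY_URL = "https://<your-gateway-endpoint>/v1/chat/completions"
    HEADERS = {"Authorization": "Bearer <gateway-api-key>"}
    PAYLOAD = {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Classify this support ticket."}],
    }

    for attempt in range(3):
        resp = httpx.post(GATEWAY_URL, headers=HEADERS, json=PAYLOAD, timeout=30)
        if resp.status_code == 429:      # denied by the rate limit policy
            time.sleep(2 ** attempt)     # back off before retrying
            continue
        resp.raise_for_status()
        print(resp.json()["choices"][0]["message"]["content"])
        break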

MCP tool request lifecycle

For MCP tool requests, the lifecycle differs slightly to support deferred tool loading:

  1. Application discovers tools via /mcp endpoint

  2. Gateway aggregates tools from approved MCP servers

  3. Application receives the search and orchestrator tools (deferred loading)

  4. Application invokes specific tool

  5. Gateway routes to appropriate MCP server

  6. Tool execution result returned

  7. Request is logged with execution time and status

The gateway loads and exposes specific tools only when they are requested, which dramatically reduces token overhead compared to loading all tools upfront.
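
The sketch below shows this flow from the client side, using the MCP Python SDK (the mcp package): initialize a session against the gateway's /mcp endpoint, list the aggregated tools, then invoke one so the gateway routes it to the owning MCP server. The gateway URL, tool name, and arguments are placeholders:

    # Sketch: discover and invoke tools through the gateway's MCP endpoint.
    # URL, tool name, and arguments are placeholders.
    import asyncio
    from mcp import ClientSession
    from mcp.client.streamable_http import streamablehttp_client

    GATEWAY_MCP_URL = "https://<your-gateway-endpoint>/mcp"

    async def main() -> None:
        async with streamablehttp_client(GATEWAY_MCP_URL) as (read, write, _):
            async with ClientSession(read, write) as session:
                await session.initialize()
                tools = await session.list_tools()            # aggregated tool list
                print([tool.name for tool in tools.tools])

                # Invoke a tool; the gateway routes the call to the owning server.
                result = await session.call_tool("search", {"query": "open incidents"})
                print(result.content)

    asyncio.run(main())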

Next steps