AI Gateway Architecture

Redpanda Agentic Data Plane is supported on BYOC clusters running on AWS with Redpanda version 25.3 or later. It is currently in a limited availability release.

This page provides technical details about AI Gateway’s architecture, request processing, and capabilities. For an overview of AI Gateway, see What is an AI Gateway?

Architecture overview

AI Gateway consists of a control plane for configuration and management, a data plane for request processing and routing, and an observability plane for monitoring and analytics.

Control plane

The control plane manages gateway configuration and policy definition:

  • Workspace management: Multi-tenant isolation with separate namespaces for different teams or environments

  • Provider configuration: Enable and configure LLM providers (such as OpenAI and Anthropic)

  • Gateway creation: Define gateways with specific routing rules, budgets, and rate limits

  • Policy definition: Create CEL-based routing policies, spend limits, and rate limits (see the example after this list)

  • MCP server registration: Configure which MCP servers are available to agents
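
For a concrete sense of how a CEL routing policy reads, the sketch below evaluates an illustrative CEL expression against hypothetical request attributes. The attribute names (request.model, request.team) and the use of the cel-python package are assumptions for demonstration only, not the gateway's actual policy schema:

    # Illustrative only: evaluating a CEL expression the way a routing policy would.
    # Attribute names and values are hypothetical; the gateway evaluates its own
    # policy schema internally.
    import celpy

    env = celpy.Environment()
    ast = env.compile('request.model == "gpt-4o" && request.team == "research"')
    program = env.program(ast)

    # Build a CEL-compatible view of an incoming request
    request = celpy.json_to_cel({"model": "gpt-4o", "team": "research"})
    print(bool(program.evaluate({"request": request})))  # True: the rule matches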

Data plane

The data plane handles all runtime request processing:

  • Request ingestion: Accept requests via OpenAI-compatible API endpoints (see the example after this list)

  • Authentication: Validate API keys and gateway access

  • Policy evaluation: Apply rate limits, spend limits, and routing policies

  • Provider pool management: Select primary or fallback providers based on availability

  • MCP proxy: Aggregate tools from multiple MCP servers with deferred loading

  • Response transformation: Normalize provider-specific responses to OpenAI format

  • Metrics collection: Record token usage, latency, and cost for every request
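
Because the data plane speaks the OpenAI API, an existing OpenAI client can be pointed at the gateway by changing only its base URL and key. This is a minimal sketch using the openai Python package; the gateway URL, API key, and model name are placeholders:

    # Minimal sketch: send a chat completion through the gateway's
    # OpenAI-compatible endpoint. URL, key, and model are placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://<your-gateway-endpoint>",  # gateway base URL (placeholder)
        api_key="<gateway-api-key>",                 # key for this gateway (placeholder)
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # routing policies may map this to a different provider or model
        messages=[{"role": "user", "content": "Summarize today's on-call incidents."}],
    )
    print(response.choices[0].message.content)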

Observability plane

The observability plane provides monitoring and analytics:

  • Request logs: Store full request/response history with prompt and completion content

  • Metrics aggregation: Calculate token usage, costs, latency percentiles, and error rates

  • Dashboard UI: Display real-time and historical analytics per gateway, model, or provider

  • Cost tracking: Estimate spend based on provider pricing and token consumption (see the example after this list)
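
As a rough illustration of how spend is estimated from token counts, the sketch below multiplies prompt and completion tokens by per-million-token prices. The prices shown are hypothetical; the gateway uses the pricing configured for each provider:

    # Illustrative cost estimate; prices per million tokens are hypothetical.
    def estimate_cost(prompt_tokens: int, completion_tokens: int,
                      input_price_per_m: float, output_price_per_m: float) -> float:
        return (prompt_tokens * input_price_per_m
                + completion_tokens * output_price_per_m) / 1_000_000

    # 1,200 prompt tokens and 350 completion tokens at $2.50 / $10.00 per million
    print(estimate_cost(1200, 350, 2.50, 10.00))  # 0.0065 (USD)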

Request lifecycle

When a request flows through AI Gateway, it passes through several policy and routing stages before reaching the LLM provider. Understanding this lifecycle helps you configure policies effectively and troubleshoot issues:

  1. Application sends request to gateway endpoint

  2. Gateway authenticates request

  3. Rate limit policy is evaluated (allow or deny)

  4. Spend limit policy is evaluated (allow or deny)

  5. Routing policy is evaluated to select the model and provider

  6. Provider pool selects backend (primary/fallback)

  7. Request is forwarded to the LLM provider

  8. Response returned to application

  9. Request is logged with tokens, cost, latency, and status

Each policy evaluation happens synchronously in the request path. If rate limits or spend limits reject the request, the gateway returns an error immediately without calling the LLM provider, which helps you control costs.
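
Because rejected requests never reach the provider, a policy denial surfaces to your application as an ordinary HTTP error from the gateway. The sketch below assumes the gateway signals a rate limit denial with HTTP 429, as OpenAI-compatible APIs commonly do; the endpoint, key, and model are placeholders:

    # Sketch: back off and retry when the gateway's rate limit policy denies a request.
    # Assumes a 429 status for denials; endpoint, key, and model are placeholders.
    import time
    import httpx

    GATEWAY_URL = "https://<your-gateway-endpoint>/v1/chat/completions"
    HEADERS = {"Authorization": "Bearer <gateway-api-key>"}
    PAYLOAD = {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Classify this support ticket."}],
    }

    for attempt in range(3):
        resp = httpx.post(GATEWAY_URL, headers=HEADERS, json=PAYLOAD, timeout=30)
        if resp.status_code == 429:      # denied by the rate limit policy
            time.sleep(2 ** attempt)     # back off before retrying
            continue
        resp.raise_for_status()
        print(resp.json()["choices"][0]["message"]["content"])
        break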

MCP tool request lifecycle

For MCP tool requests, the lifecycle differs slightly to support deferred tool loading:

  1. Application discovers tools via /mcp endpoint

  2. Gateway aggregates tools from approved MCP servers

  3. Application receives the search and orchestrator tools (deferred loading)

  4. Application invokes specific tool

  5. Gateway routes to appropriate MCP server

  6. Tool execution result returned

  7. Request is logged with execution time and status

The gateway loads and exposes specific tools only when they are requested, which dramatically reduces token overhead compared to loading all tools upfront.
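
The sketch below shows this flow from the client side, using the MCP Python SDK (the mcp package): initialize a session against the gateway's /mcp endpoint, list the aggregated tools, then invoke one so the gateway routes it to the owning MCP server. The gateway URL, tool name, and arguments are placeholders:

    # Sketch: discover and invoke tools through the gateway's MCP endpoint.
    # URL, tool name, and arguments are placeholders.
    import asyncio
    from mcp import ClientSession
    from mcp.client.streamable_http import streamablehttp_client

    GATEWAY_MCP_URL = "https://<your-gateway-endpoint>/mcp"

    async def main() -> None:
        async with streamablehttp_client(GATEWAY_MCP_URL) as (read, write, _):
            async with ClientSession(read, write) as session:
                await session.initialize()
                tools = await session.list_tools()            # aggregated tool list
                print([tool.name for tool in tools.tools])

                # Invoke a tool; the gateway routes the call to the owning server.
                result = await session.call_tool("search", {"query": "open incidents"})
                print(result.content)

    asyncio.run(main())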

Next steps