AI Gateway Setup Guide

Redpanda Agentic Data Plane is supported on BYOC clusters running on AWS with Redpanda version 25.3 and later. It is currently in a limited availability release.

This guide walks administrators through the setup process for AI Gateway, from enabling LLM providers to configuring routing policies and MCP tool aggregation.

After completing this guide, you will be able to:

  • Enable LLM providers and models in the catalog

  • Create and configure gateways with routing policies, rate limits, and spend limits

  • Set up MCP tool aggregation for AI agents

Prerequisites

  • Access to the Redpanda Cloud Console with administrator privileges

  • API keys for at least one LLM provider (OpenAI, Anthropic, Google AI)

  • (Optional) MCP server endpoints if you plan to use tool aggregation

Enable a provider

Providers represent upstream services (Anthropic, OpenAI, Google AI) and associated credentials. Providers are disabled by default and must be enabled explicitly by an administrator.

  1. In the Redpanda Cloud Console, navigate to Agentic AI > Providers.

  2. Select a provider (for example, Anthropic).

  3. On the Configuration tab for the provider, click Add configuration.

  4. Enter your API Key for the provider.

    Store provider API keys securely. Each provider configuration can have multiple API keys for rotation and redundancy.
  5. Click Save to enable the provider.

Repeat this process for each LLM provider you want to make available through AI Gateway.

Enable models

The model catalog is the set of models made available through the gateway. Models are disabled by default. After enabling a provider, you can enable its models.

The infrastructure that serves a model differs by provider. For example, OpenAI and Anthropic have different reliability and availability characteristics. Taking these differences into account, you can design your gateway to use different providers for different use cases.

  1. Navigate to Agentic AI > Models.

  2. Review the list of available models from enabled providers.

  3. For each model you want to expose through gateways, toggle it to Enabled. For example:

    • openai/gpt-5.2

    • openai/gpt-5.2-mini

    • anthropic/claude-sonnet-4.5

    • anthropic/claude-opus-4.6

  4. Click Save changes.

Only enabled models are accessible through gateways. You can enable or disable models at any time without affecting existing gateways.

Model naming convention

Model requests must use the vendor/model_id format in the model property of the request body. This format allows AI Gateway to route requests to the appropriate provider. For example:

  • openai/gpt-5.2

  • anthropic/claude-sonnet-4.5

  • openai/gpt-5.2-mini
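
For example, a chat completion request sets the model property to one of these identifiers. The following sketch reuses the ${GATEWAY_ENDPOINT} and ${REDPANDA_CLOUD_TOKEN} values described in Verify your setup later in this guide:

curl ${GATEWAY_ENDPOINT}/chat/completions \
  -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [{"role": "user", "content": "Summarize this in one sentence."}]
  }'

The vendor prefix (anthropic/ in this sketch) is what AI Gateway uses to select the upstream provider.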

Create a gateway

A gateway is a logical configuration boundary (policies + routing + observability) on top of a single deployment. It’s a "virtual gateway" that you can create per team, environment (staging/production), product, or customer.

  1. Navigate to Agentic AI > Gateways.

  2. Click Create Gateway.

  3. Configure the gateway:

    • Name: Choose a descriptive name (for example, production-gateway, team-ml-gateway, staging-gateway)

    • Workspace: Select the workspace this gateway belongs to

      A workspace is conceptually similar to a resource group in Redpanda streaming.
    • Description (optional): Add context about this gateway’s purpose

    • Tags (optional): Add metadata for organization and filtering

  4. Click Create.

  5. After creation, note the Gateway Endpoint shown on the gateway details page.

You'll share this endpoint with users who need to access the gateway.

Configure LLM routing

On the gateway details page, select the LLM tab to configure rate limits, spend limits, routing, and provider pools with fallback options.

The LLM routing pipeline visually represents the request lifecycle:

  1. Rate Limit: Global rate limit (for example, 100 requests/second)

  2. Spend Limit / Monthly Budget: Monthly budget with blocking enforcement (for example, $15K/month)

  3. Routing: Primary provider pool with optional fallback provider pools

Configure rate limits

Rate limits control how many requests can be processed within a time window.

  1. In the LLM tab, locate the Rate Limit section.

  2. Click Add rate limit.

  3. Configure the limit:

    • Requests per second: Maximum requests per second (for example, 100)

    • Burst allowance (optional): Allow temporary bursts above the limit

  4. Click Save.

Rate limits apply to all requests through this gateway, regardless of model or provider.
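
To observe a rate limit from the client side, a rough sketch like the following sends a burst of list-models requests and tallies the response codes. It assumes the gateway rejects over-limit requests with HTTP 429, the conventional status code for rate limiting:

# Send a burst of requests and count the HTTP status codes returned.
# Assumes over-limit requests are rejected with HTTP 429.
for i in $(seq 1 200); do
  curl -s -o /dev/null -w "%{http_code}\n" ${GATEWAY_ENDPOINT}/models \
    -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}"
done | sort | uniq -c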

Configure spend limits and budgets

Spend limits prevent runaway costs by blocking requests after a monthly budget is exceeded.

  1. In the LLM tab, locate the Spend Limit section.

  2. Click Configure budget.

  3. Set the budget:

    • Monthly budget: Maximum spend per month (for example, $15000)

    • Enforcement: Choose Block to reject requests after the budget is exceeded, or Alert to notify but allow requests

    • Notification threshold (optional): Alert when X% of budget is consumed (for example, 80%)

  4. Click Save.

Budget tracking uses estimated costs based on token usage and public provider pricing.
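
As a rough sketch of how a token-based estimate works, the shell snippet below multiplies monthly token counts by per-million-token prices. The prices are placeholders for illustration, not actual provider pricing:

# Placeholder prices for illustration only; check each provider's current pricing.
input_tokens=40000000        # input tokens consumed this month
output_tokens=5000000        # output tokens generated this month
input_price_per_m=3          # USD per million input tokens (placeholder)
output_price_per_m=15        # USD per million output tokens (placeholder)
echo "Estimated spend: $(( input_tokens / 1000000 * input_price_per_m + output_tokens / 1000000 * output_price_per_m )) USD"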

Configure routing and provider pools

Provider pools define which LLM providers handle requests, with support for primary and fallback configurations.

  1. In the LLM tab, locate the Routing section.

  2. Click Add provider pool.

  3. Configure the primary pool:

    • Name: For example, primary-anthropic

    • Providers: Select one or more providers (for example, Anthropic)

    • Models: Choose which models to include (for example, anthropic/claude-sonnet-4.5)

    • Load balancing: If multiple providers are selected, choose a distribution strategy, such as round-robin or weighted (see Load balancing and multi-provider distribution below)

  4. (Optional) Click Add fallback pool to configure automatic failover:

    • Name: For example, fallback-openai

    • Providers: Select fallback provider (for example, OpenAI)

    • Models: Choose fallback models (for example, openai/gpt-5.2)

    • Trigger conditions: Conditions that activate the fallback pool:

      • Rate limit exceeded (429 from primary)

      • Timeout (primary provider slow)

      • Server errors (5xx from primary)

  5. Configure routing rules using CEL expressions (optional):

    For simple routing, select Route all requests to primary pool.

    For advanced routing based on request properties, use CEL expressions. See CEL Routing Cookbook for examples; a second illustrative expression appears after this procedure.

    Example CEL expression for tier-based routing:

    request.headers["x-user-tier"] == "premium"
      ? "anthropic/claude-opus-4.6"
      : "anthropic/claude-sonnet-4.5"
  6. Click Save routing configuration.
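
Another illustrative expression, which assumes callers set a custom x-environment header (a hypothetical header, not one defined by AI Gateway), sends non-production traffic to a smaller model:

    request.headers["x-environment"] == "production"
      ? "anthropic/claude-sonnet-4.5"
      : "openai/gpt-5.2-mini"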

A provider pool in the UI corresponds to a backend pool in the API.

Load balancing and multi-provider distribution

If a provider pool contains multiple providers, you can distribute traffic to balance load or optimize for cost/performance:

  • Round-robin: Distribute evenly across all providers

  • Weighted: Assign weights (for example, 80% to Anthropic, 20% to OpenAI)

  • Least latency: Route to fastest provider based on recent performance

  • Cost-optimized: Route to cheapest provider for each model

Configure MCP tools (optional)

If your users will build AI agents that need access to tools via Model Context Protocol (MCP), configure MCP tool aggregation.

On the gateway details page, select the MCP tab to configure tool discovery and execution. The MCP proxy aggregates multiple MCP servers, allowing agents to find and call tools through a single endpoint.

Configure MCP rate limits

Rate limits for MCP work the same way as LLM rate limits.

  1. In the MCP tab, locate the Rate Limit section.

  2. Click Add rate limit.

  3. Configure the maximum requests per second and optional burst allowance.

  4. Click Save.

Add MCP servers

  1. In the MCP tab, click Create MCP Server.

  2. Configure the server:

    • Server ID: Unique identifier for this server

    • Display Name: Human-readable name (for example, database-server, slack-server)

    • Server Address: Endpoint URL for the MCP server (for example, https://mcp-database.example.com)

  3. Configure server settings:

    • Timeout (seconds): Maximum time to wait for a response from this server

    • Enabled: Whether this server is active and accepting requests

    • Defer Loading Override: Controls whether tools from this server are loaded upfront or on demand

      • Inherit from gateway: Use the gateway-level deferred loading setting (default).

      • Enabled: Always defer loading from this server. Agents receive only a search tool initially and query for specific tools when needed. This can reduce token usage by 80-90%.

      • Disabled: Always load all tools from this server upfront.

    • Forward OIDC Token Override: Controls whether the client’s OIDC token is forwarded to this MCP server

      • Inherit from gateway: Use the gateway-level OIDC forwarding setting (default).

      • Enabled: Always forward the OIDC token to this server.

      • Disabled: Never forward the OIDC token to this server.

  4. Click Save to add the server to this gateway.

Repeat for each MCP server you want to aggregate.

See MCP Gateway for detailed information about MCP aggregation.

Configure the MCP orchestrator

The MCP orchestrator is a built-in MCP server that enables programmatic tool calling. Agents can generate JavaScript code to call multiple tools in a single orchestrated step, reducing the number of round trips.

Example: A workflow requiring 47 file reads can be reduced from 49 round trips to just 1 round trip using the orchestrator.

The orchestrator is pre-configured when you initialize the MCP gateway. Its server configuration (Server ID, Display Name, Transport, Command, and Timeout) is system-managed and cannot be modified.

You can configure blocked tool patterns to prevent specific tools from being called through the orchestrator:

  1. In the MCP tab, select the orchestrator server to edit it.

  2. Under Blocked Tools, click Add Pattern to add glob patterns for tools that should be blocked from execution.

    Example patterns:

    • server_id:* - Block all tools from a specific server

    • *:dangerous_tool - Block a specific tool across all servers

    • specific:tool - Block a single tool on a specific server

    The orchestrator’s own tools are blocked by default to prevent recursive execution.
  3. Click Save.

Verify your setup

After completing the setup, verify that the gateway is working correctly:

Test the gateway endpoint

curl ${GATEWAY_ENDPOINT}/models \
  -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}"

Expected result: List of enabled models.

Send a test request

curl ${GATEWAY_ENDPOINT}/chat/completions \
  -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.2-mini",
    "messages": [{"role": "user", "content": "Hello, AI Gateway!"}],
    "max_tokens": 50
  }'

Expected result: Successful completion response.
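
A successful response generally follows the OpenAI-style chat completion shape. The trimmed sketch below is illustrative; field values are made up, and the exact schema can vary by provider and gateway version:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "openai/gpt-5.2-mini",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 13, "completion_tokens": 9, "total_tokens": 22}
}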

Check the gateway overview

  1. Navigate to Agentic AI > Gateways, select your gateway, and open Overview.

  2. Check the aggregate metrics to verify your test request was processed:

    • Total Requests: Should have incremented

    • Total Tokens: Should show tokens consumed

    • Total Cost: Should show estimated cost

Share access with users

Now that your gateway is configured, share access with users (builders):

  1. Provide the Gateway Endpoint (for example, https://example/gateways/gw_abc123/v1)

  2. Share API credentials (Redpanda Cloud tokens with appropriate permissions)

  3. (Optional) Document available models and any routing policies

  4. (Optional) Share rate limits and budget information

Users can then discover and connect to the gateway using the information provided. See Discover Available Gateways for user documentation.

Next steps

Configure and optimize: