MCP Gateway

Redpanda Agentic Data Plane is supported on BYOC clusters running on AWS with Redpanda version 25.3 or later. It is currently in a limited availability release.

The MCP Gateway provides Model Context Protocol (MCP) aggregation, allowing AI agents to access tools from multiple MCP servers through a single unified endpoint. This eliminates the need for agents to manage multiple MCP connections and significantly reduces token costs through deferred tool loading.

MCP Gateway benefits:

  • Single endpoint: One MCP endpoint aggregates all approved MCP servers

  • Token reduction: Fewer tokens through deferred tool loading (depending on configuration)

  • Centralized governance: Admin-approved MCP servers only

  • Orchestration: JavaScript-based orchestrator reduces multi-step round trips

  • Security: Controlled tool execution environment

What is MCP?

Model Context Protocol (MCP) is a standard for exposing tools (functions) that AI agents can discover and invoke. MCP servers provide tools like:

  • Database queries

  • File system operations

  • API integrations (CRM, payment, analytics)

  • Search (web, vector, enterprise)

  • Code execution

  • Workflow automation

Without AI Gateway:

  • Agent connects to each MCP server individually

  • Agent loads ALL tools from ALL servers upfront (high token cost)

  • No centralized governance or security

  • Complex configuration

With AI Gateway:

  • Agent connects to the gateway's unified /mcp endpoint

  • Gateway aggregates tools from approved MCP servers

  • Deferred loading: only search + orchestrator tools sent initially

  • Agent queries for specific tools when needed (token savings)

  • Centralized governance and observability

Architecture

┌─────────────────┐
│   AI Agent      │
│  (Claude, GPT)  │
└────────┬────────┘
         │
         │ 1. Discover tools with /mcp endpoint
         │ 2. Invoke specific tool
         │
┌────────▼────────────────────────────────┐
│      AI Gateway (MCP Aggregator)        │
│                                         │
│  ┌─────────────────────────────────┐   │
│  │  Deferred tool loading          │   │
│  │  (Send search + orchestrator    │   │
│  │   initially, defer others)      │   │
│  └─────────────────────────────────┘   │
│                                         │
│  ┌─────────────────────────────────┐   │
│  │  Orchestrator (JavaScript)      │   │
│  │  (Reduce round trips for        │   │
│  │   multi-step workflows)         │   │
│  └─────────────────────────────────┘   │
│                                         │
│  ┌─────────────────────────────────┐   │
│  │  Approved MCP Server Registry   │   │
│  │  (Admin-controlled)             │   │
│  └─────────────────────────────────┘   │
└────────┬────────────────────────────────┘
         │
         │ Routes to appropriate MCP server
         │
    ┌────▼─────┬────────────┬───────────┐
    │          │            │           │
┌───▼────┐ ┌───▼──────┐ ┌───▼─────┐ ┌───▼───┐
│  MCP   │ │   MCP    │ │   MCP   │ │  MCP  │
│Database│ │Filesystem│ │  Slack  │ │Search │
│ Server │ │  Server  │ │ Server  │ │Server │
└────────┘ └──────────┘ └─────────┘ └───────┘

MCP request lifecycle

Tool discovery (initial connection)

Agent request:

GET /mcp/tools
Headers:
  Authorization: Bearer {TOKEN}
  rp-aigw-mcp-deferred: true  # Enable deferred loading

Gateway response (with deferred loading):

{
  "tools": [
    {
      "name": "search_tools",
      "description": "Query available tools by keyword or category",
      "input_schema": {
        "type": "object",
        "properties": {
          "query": {"type": "string"},
          "category": {"type": "string"}
        }
      }
    },
    {
      "name": "orchestrator",
      "description": "Execute multi-step workflows with JavaScript logic",
      "input_schema": {
        "type": "object",
        "properties": {
          "workflow": {"type": "string"},
          "context": {"type": "object"}
        }
      }
    }
  ]
}

Note: Only 2 tools returned initially (search + orchestrator), not all 50+ tools from all MCP servers.

Token savings:

  • Without deferred loading: ~5,000-10,000 tokens (all tool definitions)

  • With deferred loading: ~500-1,000 tokens (2 tool definitions)

  • Typically 80-90% reduction

Tool query (when agent needs specific tool)

Agent request:

POST /mcp/tools/search_tools
Headers:
  Authorization: Bearer {TOKEN}
Body:
{
  "query": "database query"
}

Gateway response:

{
  "tools": [
    {
      "name": "execute_sql",
      "description": "Execute SQL query against the database",
      "mcp_server": "database-server",
      "input_schema": {
        "type": "object",
        "properties": {
          "query": {"type": "string"},
          "database": {"type": "string"}
        },
        "required": ["query"]
      }
    },
    {
      "name": "list_tables",
      "description": "List all tables in the database",
      "mcp_server": "database-server",
      "input_schema": {
        "type": "object",
        "properties": {
          "database": {"type": "string"}
        }
      }
    }
  ]
}

Agent receives only relevant tools based on query.

Tool execution

Agent request:

POST /mcp/tools/execute_sql
Headers:
  Authorization: Bearer {TOKEN}
Body:
{
  "query": "SELECT * FROM users WHERE tier = 'premium' LIMIT 10",
  "database": "prod"
}

Gateway:

  1. Routes to appropriate MCP server (database-server)

  2. Executes tool

  3. Returns result

Gateway response:

{
  "result": [
    {"id": 1, "name": "Alice", "tier": "premium"},
    {"id": 2, "name": "Bob", "tier": "premium"},
    ...
  ]
}

Agent receives result and can continue reasoning.
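
The full lifecycle, sketched in Python with the requests library. This is a minimal sketch, assuming the endpoint paths, headers, and response shapes shown in the examples above; adjust to match your gateway configuration.

import os

import requests

GATEWAY = os.getenv("GATEWAY_ENDPOINT")
HEADERS = {"Authorization": f"Bearer {os.getenv('REDPANDA_CLOUD_TOKEN')}"}

# 1. Discovery: with deferred loading, only search_tools + orchestrator come back
tools = requests.get(
    f"{GATEWAY}/mcp/tools",
    headers={**HEADERS, "rp-aigw-mcp-deferred": "true"},
).json()["tools"]

# 2. Query: ask the gateway for tools matching what the agent needs
matches = requests.post(
    f"{GATEWAY}/mcp/tools/search_tools",
    headers=HEADERS,
    json={"query": "database query"},
).json()["tools"]

# 3. Execution: the gateway routes the call to the owning MCP server
result = requests.post(
    f"{GATEWAY}/mcp/tools/execute_sql",
    headers=HEADERS,
    json={"query": "SELECT * FROM users WHERE tier = 'premium' LIMIT 10",
          "database": "prod"},
).json()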

Deferred tool loading

How it works

Traditional MCP (No deferred loading):

  1. Agent connects to MCP endpoint

  2. Gateway sends all tools from all MCP servers (50+ tools)

  3. Agent includes all tool definitions in every LLM request

  4. High token cost: ~5,000-10,000 tokens per request

Deferred loading (AI Gateway):

  1. Agent connects to MCP endpoint with rp-aigw-mcp-deferred: true header

  2. Gateway sends only 2 tools: search_tools + orchestrator

  3. Agent includes only 2 tool definitions in LLM request (~500-1,000 tokens)

  4. When agent needs specific tool:

    • Agent calls search_tools with query (for example, "database")

    • Gateway returns matching tools

    • Agent calls specific tool (for example, execute_sql)

  5. Total token cost: Initial 500-1,000 + per-query ~200-500

When to use deferred loading

Use deferred loading when:

  • You have 10+ tools across multiple MCP servers

  • Agents don’t need all tools for every request

  • Token costs are a concern

  • Agents can handle multi-step workflows (search → execute)

Don’t use deferred loading when:

  • You have <5 tools total (overhead not worth it)

  • Agents need all tools for every request (rare)

  • Latency is more important than token costs (deferred adds 1 round trip)
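
The deferred behavior is driven by the rp-aigw-mcp-deferred header shown earlier, so a client with a small tool set can simply omit it. A minimal sketch, assuming that omitting the header falls back to loading all tools upfront (the effective behavior also depends on the gateway and per-server settings described below):

import os

import requests

headers = {"Authorization": f"Bearer {os.getenv('REDPANDA_CLOUD_TOKEN')}"}
endpoint = f"{os.getenv('GATEWAY_ENDPOINT')}/mcp/tools"

# Deferred: only search_tools + orchestrator are returned
deferred = requests.get(endpoint, headers={**headers, "rp-aigw-mcp-deferred": "true"}).json()

# Not deferred: every tool from every approved MCP server is returned
full = requests.get(endpoint, headers=headers).json()

print(len(deferred["tools"]), "tools vs", len(full["tools"]), "tools")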

Configure deferred loading

Deferred loading is configured for each MCP server through the Defer Loading Override setting in the Create MCP Server dialog.

  1. Navigate to your gateway’s MCP tab.

  2. Create or edit an MCP server.

  3. Under Server Settings, set Defer Loading Override:

    • Inherit from gateway: Use the gateway-level deferred loading setting (default).

    • Enabled: Always defer loading from this server. Agents receive only a search tool initially and query for specific tools when needed.

    • Disabled: Always load all tools from this server upfront.

  4. Click Save.

Measure token savings

Compare token usage before/after deferred loading:

  1. Check logs without deferred loading:

    • Filter: Gateway = your-gateway, Model = your-model, Date = before enabling

    • Note the average tokens per request

  2. Enable deferred loading

  3. Check logs after deferred loading:

    • Filter: Same gateway/model, Date = after enabling

    • Note the average tokens per request

  4. Calculate savings:

    Savings % = ((Before - After) / Before) × 100

Expected results: Typically 80-90% reduction in average tokens per request
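
A quick check of the formula above, with illustrative numbers:

def token_savings(before: float, after: float) -> float:
    """Savings % = ((Before - After) / Before) × 100"""
    return (before - after) / before * 100

# For example, 8,000 average tokens per request before deferred loading, 900 after:
print(f"{token_savings(8000, 900):.1f}%")  # 88.8% — within the typical 80-90% range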

Orchestrator: multi-step workflows

What is the orchestrator?

The orchestrator is a special tool that executes JavaScript workflows, reducing multi-step interactions from multiple round trips to a single request.

Without Orchestrator:

  1. Agent: "Search vector database for relevant docs" → Round trip 1

  2. Agent receives results, evaluates: "Results insufficient"

  3. Agent: "Fallback to web search" → Round trip 2

  4. Agent receives results, processes → Round trip 3

  5. Total: 3 round trips (high latency, 3x token cost)

With Orchestrator:

  1. Agent: "Execute workflow: Search vector DB → if insufficient, fallback to web search"

  2. Gateway executes entire workflow in JavaScript

  3. Agent receives final result → 1 round trip

Benefits:

  • Latency Reduction: 1 round trip vs 3+

  • Token Reduction: No intermediate LLM calls needed

  • Reliability: Workflow logic executes deterministically

  • Cost: Single LLM call instead of multiple

When to use orchestrator

Use orchestrator when:

  • Multi-step workflows with conditional logic (if/else)

  • Fallback patterns (try A, if fails, try B)

  • Sequential tool calls with dependencies

  • Loop-based operations (iterate, aggregate)

Don’t use orchestrator when:

  • Single tool call (no benefit)

  • Agent needs to reason between steps (orchestrator is deterministic)

  • Workflow requires LLM judgment at each step

Orchestrator example: search with fallback

Scenario: Search vector database; if results insufficient, fallback to web search.

Without Orchestrator (3 round trips):

# Agent's internal reasoning (3 separate LLM calls)

# Round trip 1: Search vector DB
vector_results = call_tool("vector_search", {"query": "Redpanda pricing"})

# Round trip 2: Agent evaluates results
if len(vector_results) < 3:
    # Round trip 3: Fallback to web search
    web_results = call_tool("web_search", {"query": "Redpanda pricing"})
    results = web_results
else:
    results = vector_results

# Agent processes final results

With Orchestrator (1 round trip):

# Agent invokes orchestrator once
results = call_tool("orchestrator", {
    "workflow": """
        // JavaScript workflow
        const vectorResults = await tools.vector_search({
            query: context.query
        });

        if (vectorResults.length < 3) {
            // Fallback to web search
            const webResults = await tools.web_search({
                query: context.query
            });
            return webResults;
        }

        return vectorResults;
    """,
    "context": {
        "query": "Redpanda pricing"
    }
})

# Agent receives final results directly

Savings:

  • Latency: ~3-5 seconds (3 round trips) → ~1-2 seconds (1 round trip)

  • Tokens: ~1,500 tokens (3 LLM calls) → ~500 tokens (1 LLM call)

  • Cost: ~$0.0075 → ~$0.0025 (67% reduction)

Orchestrator API

Tool name: orchestrator

Input schema:

{
  "workflow": "string (JavaScript code)",
  "context": "object (variables available to workflow)"
}

Available in workflow:

  • tools.{tool_name}(params): Call any tool from approved MCP servers

  • context.{variable}: Access context variables

  • Standard JavaScript: if, for, while, try/catch, async/await

Security:

  • Sandboxed execution (no file system, network, or system access)

  • Timeout and memory limits are system-managed and cannot be modified

Limitations:

  • Cannot call external APIs directly (must use MCP tools)

  • Cannot import npm packages (built-in JS only)

Orchestrator example: data aggregation

Scenario: Fetch user data from database, calculate summary statistics.

results = call_tool("orchestrator", {
    "workflow": """
        // Fetch all premium users
        const users = await tools.execute_sql({
            query: "SELECT * FROM users WHERE tier = 'premium'",
            database: "prod"
        });

        // Calculate statistics
        const stats = {
            total: users.length,
            by_region: {},
            avg_spend: 0
        };

        let totalSpend = 0;
        for (const user of users) {
            // Count by region
            if (!stats.by_region[user.region]) {
                stats.by_region[user.region] = 0;
            }
            stats.by_region[user.region]++;

            // Sum spend
            totalSpend += user.monthly_spend;
        }

        stats.avg_spend = totalSpend / users.length;

        return stats;
    """,
    "context": {}
})

Output:

{
  "total": 1250,
  "by_region": {
    "us-east": 600,
    "us-west": 400,
    "eu": 250
  },
  "avg_spend": 149.50
}

vs Without Orchestrator:

  • Would require fetching all users to agent → agent processes → 2 round trips

  • Orchestrator: All processing in gateway → 1 round trip

Orchestrator best practices

DO:

  • Use for deterministic workflows (same input → same output)

  • Use for sequential operations with dependencies

  • Use for fallback patterns

  • Handle errors with try/catch

  • Keep workflows readable (add comments)

DON’T:

  • Use for workflows requiring LLM reasoning at each step (let agent handle that)

  • Execute long-running operations (timeout will hit)

  • Access external resources (use MCP tools instead)

  • Execute untrusted user input (security risk)

MCP server administration

Add MCP servers

Prerequisites:

  • MCP server URL

  • Authentication method (if required)

  • List of tools to enable

Steps:

  1. Navigate to MCP servers:

    • In the sidebar, navigate to Agentic AI > Gateways, select your gateway, then select the MCP tab.

  2. Configure server:

    # PLACEHOLDER: Actual configuration format
    name: database-server
    url: https://mcp-database.example.com
    authentication:
      type: bearer_token
      token: ${SECRET_REF}  # Reference to secret
    enabled_tools:
      - execute_sql
      - list_tables
      - describe_table

  3. Test connection:

    • Gateway attempts connection to MCP server

    • Verifies authentication

    • Retrieves tool list

  4. Enable server:

    • Server status: Active

    • Tools available to agents

Common MCP servers:

  • Database: PostgreSQL, MySQL, MongoDB query tools

  • Filesystem: Read/write/search files

  • API integrations: Slack, GitHub, Salesforce, Stripe

  • Search: web search, vector search, enterprise search

  • Code execution: Python, JavaScript sandboxes

  • Workflow: Zapier, n8n integrations

MCP server approval workflow

Why approval is required:

  • Security: Prevent agents from accessing unauthorized systems

  • Governance: Control which tools are available

  • Cost: Some tools are expensive (API calls, compute)

  • Compliance: Audit trail of approved tools

Typical approval process:

  1. Request: User/team requests MCP server

  2. Review: Admin reviews security, cost, necessity

  3. Approval/Rejection: Admin decision

  4. Configuration: If approved, admin adds server to gateway

The exact approval workflow may vary by organization. In some cases, admins may directly enable servers without a formal workflow.

Rejected server behavior:

  • Server not listed in tool discovery

  • Agent cannot query or invoke tools from this server

  • Requests return 403 Forbidden
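
Clients should treat a 403 from a tool invocation as a governance signal, not a transient error. A minimal sketch (the tool name here is hypothetical):

import os

import requests

resp = requests.post(
    f"{os.getenv('GATEWAY_ENDPOINT')}/mcp/tools/delete_file",  # tool on an unapproved server
    headers={"Authorization": f"Bearer {os.getenv('REDPANDA_CLOUD_TOKEN')}"},
    json={"path": "/tmp/report.csv"},
)
if resp.status_code == 403:
    # Don't retry: the server isn't approved for this gateway.
    # Request admin approval instead (see the approval process above).
    raise PermissionError("MCP server not approved for this gateway")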

Restrict MCP server access

Per-gateway restrictions:

# PLACEHOLDER: Actual configuration format
gateways:
  - name: production-gateway
    mcp_servers:
      allowed:
        - database-server  # Only this server allowed
      denied:
        - filesystem-server  # Explicitly denied

  - name: staging-gateway
    mcp_servers:
      allowed:
        - "*"  # All approved servers allowed

Use cases:

  • Production gateway: Only production-safe tools

  • Staging gateway: All tools for testing

  • Customer-specific gateway: Only tools relevant to customer

MCP server versioning

Challenge: MCP server updates may change tool schemas.

Best practices for version management:

  1. Pin versions (if supported):

    mcp_servers:
      - name: database-server
        version: "1.2.3"  # Pin to specific version

  2. Test in staging first:

    • Update MCP server in staging gateway

    • Test agent workflows

    • Promote to production when validated

  3. Monitor breaking changes:

    • Subscribe to MCP server changelogs

    • Set up alerts for schema changes

MCP observability

Logs

MCP tool invocations appear in request logs with:

  • Tool name

  • MCP server

  • Input parameters

  • Output result

  • Execution time

  • Errors (if any)

Filter logs by MCP:

Filter: request.path.startsWith("/mcp")

Common log fields:

  • Tool: Tool invoked (for example, execute_sql)

  • MCP Server: Which server handled it (for example, database-server)

  • Input: Parameters sent (for example, {"query": "SELECT …"})

  • Output: Result returned (for example, [{"id": 1, …}])

  • Latency: Tool execution time (for example, 250ms)

  • Status: Success/failure (for example, 200, 500)
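
A hypothetical log entry combining these fields (the exact field names in your logs may differ):

{
  "tool": "execute_sql",
  "mcp_server": "database-server",
  "input": {"query": "SELECT …"},
  "output": [{"id": 1, "name": "Alice"}],
  "latency_ms": 250,
  "status": 200
}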

Metrics

The following MCP-specific metrics may be available depending on your gateway configuration:

  • MCP requests per second

  • Tool invocation count (by tool, by MCP server)

  • MCP latency (p50, p95, p99)

  • MCP error rate (by server, by tool)

  • Orchestrator execution count

  • Orchestrator execution time

Dashboard: MCP Analytics

  • Top tools by usage

  • Top MCP servers by latency

  • Error rate by MCP server

  • Token savings from deferred loading

Debug MCP issues

Issue: "Tool not found"

Possible causes:

  1. MCP server not added to gateway

  2. Tool not enabled in MCP server configuration

  3. Deferred loading enabled but agent didn’t query for tool first

Solution:

  1. Verify MCP server is active in the Redpanda Cloud console

  2. Verify tool is in enabled_tools list

  3. If deferred loading: Agent must call search_tools first

Issue: "MCP server timeout"

Possible causes:

  1. MCP server is down/unreachable

  2. Tool execution is slow (for example, expensive database query)

  3. Gateway timeout too short

Solution:

  1. Check MCP server health

  2. Optimize tool (for example, add database index)

  3. Contact support if you need to adjust timeout limits

Issue: "Orchestrator workflow failed"

Possible causes:

  1. JavaScript syntax error

  2. Tool invocation failed inside workflow

  3. Timeout exceeded

  4. Memory limit exceeded

Solution:

  1. Test workflow syntax in JavaScript playground

  2. Check logs for tool error inside orchestrator

  3. Simplify workflow or increase timeout

  4. Reduce data processing in workflow

Security considerations

Authentication

Gateway → MCP server:

  • Bearer token (most common)

  • API key

  • mTLS (for high-security environments)

Agent → Gateway:

  • Standard gateway authentication (Redpanda Cloud token)

  • Gateway endpoint URL identifies the gateway (and its approved MCP servers)

Audit trail

All MCP operations logged:

  • Who (agent/user) invoked tool

  • When (timestamp)

  • What tool was invoked

  • What parameters were sent

  • What result was returned

  • Whether it succeeded or failed

Use case: Compliance, security investigation, debugging

Restrict dangerous tools

Recommendation: Don’t enable destructive tools in production gateways

Examples of dangerous tools:

  • File deletion (delete_file)

  • Database writes without safeguards (execute_sql with UPDATE/DELETE)

  • Payment operations (charge_customer)

  • System commands (execute_bash)

Best practice:

  • Read-only tools in production gateway

  • Write tools only in staging gateway (with approval workflows)

  • Wrap dangerous operations in MCP server with safeguards (for example, "require confirmation token")
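
One way to wrap a dangerous operation behind a safeguard, sketched as a hypothetical MCP-server-side handler (the charge_customer signature and confirmation-token flow are illustrative, not a gateway API):

import hashlib
import hmac
import os

SECRET = os.environ.get("CONFIRM_SECRET", "dev-only-secret").encode()

def _confirmation_token(customer_id: str, amount_cents: int) -> str:
    # Bind the token to the exact operation being confirmed
    msg = f"{customer_id}:{amount_cents}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def charge_customer(customer_id: str, amount_cents: int, confirm_token: str | None = None) -> dict:
    """Refuse destructive work unless the caller echoes back a valid confirmation token."""
    if confirm_token is None:
        # First call: no money moves; the agent (or a human) must confirm explicitly
        return {
            "status": "confirmation_required",
            "confirm_token": _confirmation_token(customer_id, amount_cents),
        }
    if not hmac.compare_digest(confirm_token, _confirmation_token(customer_id, amount_cents)):
        return {"status": "rejected", "reason": "invalid confirmation token"}
    # ... perform the actual charge here ...
    return {"status": "charged", "customer_id": customer_id, "amount_cents": amount_cents}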

MCP + LLM routing

Combine MCP with CEL routing

Use case: Route agents to different MCP servers based on customer tier

CEL expression:

request.headers["x-customer-tier"] == "enterprise"
  ? "gateway-with-premium-mcp-servers"
  : "gateway-with-basic-mcp-servers"

Result:

  • Enterprise customers: Access to proprietary data, expensive APIs

  • Basic customers: Access to public data, free APIs

MCP with provider pools

Scenario: Different agents use different models + different tools

Configuration:

  • Gateway A: GPT-5.2 + database + CRM MCP servers

  • Gateway B: Claude Sonnet + web search + analytics MCP servers

Use case: Optimize model-tool pairing (some models better at certain tools)

Integration examples

  • Python (OpenAI SDK)

  • Claude Code CLI

  • LangChain

Python (OpenAI SDK):

import json
import os

import requests
from openai import OpenAI

# Initialize client with MCP endpoint
client = OpenAI(
    base_url=os.getenv("GATEWAY_ENDPOINT"),
    api_key=os.getenv("REDPANDA_CLOUD_TOKEN"),
    default_headers={
        "rp-aigw-mcp-deferred": "true"  # Enable deferred loading
    }
)

# Discover tools
tools_response = requests.get(
    f"{os.getenv('GATEWAY_ENDPOINT')}/mcp/tools",
    headers={
        "Authorization": f"Bearer {os.getenv('REDPANDA_CLOUD_TOKEN')}",
        "rp-aigw-mcp-deferred": "true"
    }
)
tools = tools_response.json()["tools"]

# Agent uses tools
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",
    messages=[
        {"role": "user", "content": "Query the database for premium users"}
    ],
    tools=tools,  # Pass MCP tools to agent
    tool_choice="auto"
)

# Handle tool calls
if response.choices[0].message.tool_calls:
    for tool_call in response.choices[0].message.tool_calls:
        # Execute tool via gateway
        tool_result = requests.post(
            f"{os.getenv('GATEWAY_ENDPOINT')}/mcp/tools/{tool_call.function.name}",
            headers={
                "Authorization": f"Bearer {os.getenv('REDPANDA_CLOUD_TOKEN')}",
            },
            json=json.loads(tool_call.function.arguments)
        )

        # Continue conversation with tool result
        response = client.chat.completions.create(
            model="anthropic/claude-sonnet-4.5",
            messages=[
                {"role": "user", "content": "Query the database for premium users"},
                response.choices[0].message,
                {
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(tool_result.json())
                }
            ]
        )

Claude Code CLI:

# Configure gateway with MCP
export CLAUDE_API_BASE="https://{CLUSTER_ID}.cloud.redpanda.com/ai-gateway/v1"
export ANTHROPIC_API_KEY="your-redpanda-token"

# Claude Code automatically discovers MCP tools from gateway
claude code

# Agent can now use aggregated MCP tools

LangChain:

import os

from langchain_openai import ChatOpenAI
from langchain.agents import initialize_agent, Tool

# Initialize LLM with gateway
llm = ChatOpenAI(
    base_url=os.getenv("GATEWAY_ENDPOINT"),
    api_key=os.getenv("REDPANDA_CLOUD_TOKEN"),
)

# Fetch MCP tools from gateway
# PLACEHOLDER: LangChain-specific integration code

# Create agent with MCP tools
agent = initialize_agent(
    tools=mcp_tools,
    llm=llm,
    agent="openai-tools",
    verbose=True
)

# Agent can now use MCP tools
response = agent.run("Find all premium users in the database")