Configure an LLM Provider
Create an LLM provider to give your applications a managed proxy URL: Redpanda handles the upstream API keys, forwards requests to the provider, and records usage for you. Create a provider for each upstream you use, whether that’s OpenAI, Anthropic, Google AI, AWS Bedrock, or an OpenAI-compatible endpoint.
After reading this page, you will be able to:
-
Create an LLM provider for OpenAI, Anthropic, Google AI, AWS Bedrock, or an OpenAI-compatible endpoint
-
Select the models you want to expose through the provider
-
Verify the provider is reachable using the built-in Test Connection control
Prerequisites
-
An API key (or AWS credentials for Bedrock) for the upstream provider you want to configure.
-
One or more secrets already created in your dataplane’s secret store for the provider’s credentials. Secret references must use UPPER_SNAKE_CASE. For example: OPENAI_API_KEY, ANTHROPIC_API_KEY, AWS_ACCESS_KEY_ID.
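The naming rule above can be checked before you create a secret. The following is a minimal sketch, not the validator AI Gateway uses: it assumes UPPER_SNAKE_CASE means uppercase letters and digits separated by single underscores, starting with a letter. The function name `is_valid_secret_ref` is hypothetical.

```python
import re

# Assumption: uppercase letters and digits, single underscores as separators,
# first character is a letter. The real validation rule may differ.
SECRET_REF_PATTERN = re.compile(r"[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*")


def is_valid_secret_ref(name: str) -> bool:
    """Return True if name looks like an UPPER_SNAKE_CASE secret reference."""
    return SECRET_REF_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("OPENAI_API_KEY", "AWS_ACCESS_KEY_ID", "openai_api_key"):
        print(candidate, is_valid_secret_ref(candidate))
```

Running a check like this in a provisioning script catches casing mistakes before the provider form rejects the reference.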
Fill in the provider card
The first card on the page collects identity fields. Enter a Display name; the form auto-derives the Resource ID from it as you type.
| Field | Required | Notes |
|---|---|---|
|
Yes |
Human-readable label shown in dashboards and model selectors. Up to 253 characters. The form auto-derives the |
|
Yes |
Machine identifier used in API calls and CLI commands. Lowercase letters, numbers, and hyphens only ( |
The Summary panel labels the Resource ID as Name.
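To preview what a derived Resource ID might look like, the sketch below slugifies a display name into lowercase letters, numbers, and hyphens. The exact derivation the form applies is not documented here, so treat `derive_resource_id` and `is_valid_resource_id` as hypothetical helpers that only illustrate the character rule.

```python
import re


def derive_resource_id(display_name: str) -> str:
    """Lowercase the name and collapse every other character run into a hyphen.

    Assumption: this mirrors the general idea of the form's auto-derivation,
    not its exact behavior.
    """
    lowered = display_name.strip().lower()
    slug = re.sub(r"[^a-z0-9]+", "-", lowered)
    return slug.strip("-")


def is_valid_resource_id(resource_id: str) -> bool:
    """Check the documented character set: lowercase letters, numbers, hyphens.

    Assumption: no leading, trailing, or doubled hyphens. Length limits are
    not checked because they are not stated in this section.
    """
    return re.fullmatch(r"[a-z0-9]+(?:-[a-z0-9]+)*", resource_id) is not None


if __name__ == "__main__":
    print(derive_resource_id("OpenAI (Production)"))
```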
Choose a provider type
The Provider type card shows five cards. Pick the one that matches your upstream.
| Type | Use when |
|---|---|
| OpenAI | Proxy GPT, o-series, and embeddings through the OpenAI API. Best when you already hold an OpenAI API key or want the broadest GPT model catalog. |
| Anthropic | Call Claude Opus, Sonnet, and Haiku directly. Strong at coding, long-context reasoning, and tool use. Supports forwarding client |
| Google AI | Reach Gemini Pro, Flash, and multimodal models through Google AI Studio. Ideal for long-context workloads and image/video inputs. |
| AWS Bedrock | Invoke foundation models (Claude, Llama, Titan, Nova, Mistral, AI21 Jamba) hosted inside your AWS account. Requires an AWS region and credentials (static, STS-assumed role, or the default credential chain). Supports the native Bedrock APIs ( |
| OpenAI-compatible | Point at any OpenAI-compatible endpoint that ships |
Selecting a type reveals the type-specific configuration fields.
Fill in the type-specific configuration
Each API key reference and credential field points at a secret-store entry, not the secret value itself. Use the Existing tab to pick a secret already in your dataplane’s secret store, or the New tab to create one inline.
-
OpenAI
-
Anthropic
-
Google AI
-
AWS Bedrock
-
OpenAI-compatible
| Field | Notes |
|---|---|
|
Optional. Leave empty for the standard OpenAI API ( |
|
Required. Secret-store reference for the OpenAI API key. Must be |
| Field | Notes |
|---|---|
|
Optional. Leave empty for the standard Anthropic API ( |
|
Required unless |
|
Optional toggle. When on, AI Gateway forwards the client’s |
| Field | Notes |
|---|---|
|
Optional. Leave empty for the standard Google AI API ( |
|
Required. Secret-store reference for the Google AI API key. |
|
Gemini uses the |
| Field | Notes |
|---|---|
|
Required. AWS region where the Bedrock endpoint is deployed, for example |
|
Optional. Override the default regional Bedrock endpoint. |
|
How AI Gateway authenticates to Bedrock: Default chain, Static keys, or Assume IAM role. The fields below depend on the mode you pick. |
|
Static keys only. Secret-store reference for the AWS access key ID, |
|
Static keys only. Secret-store reference for the AWS secret access key, |
|
Assume IAM role only. Required. ARN of the IAM role AI Gateway assumes through AWS STS, for example |
|
Assume IAM role only. Optional. External ID for cross-account role assumption. Set it only when the role’s trust policy mandates an external ID. |
|
Assume IAM role only. Optional. Session name that appears in AWS CloudTrail audit logs, for example |
|
Optional. Name of a guardrail to attach to this provider, or empty for none. Only the Bedrock provider type exposes this setting. AI Gateway validates the name when you save: it rejects a guardrail that doesn’t exist or is being deleted, so set the field to an existing guardrail or leave it empty. See Create a guardrail. |
Pick a Credential type to control how AI Gateway authenticates to Bedrock:
-
Default chain (default): Leave the credentials unset to use the AWS SDK’s default provider chain (environment variables, shared config, EKS Pod Identity, IRSA, or instance profile). Use this when the gateway already runs with an AWS identity.
-
Static keys: An access key pair stored in the secret store. Use this when no ambient AWS identity is available. This is the path the Bedrock setup guide walks through.
-
Assume IAM role: AI Gateway assumes an IAM role through AWS STS. Use this for cross-account access or when your security policy requires short-lived credentials.
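The three credential modes imply different required fields. The sketch below models that dependency so you can sanity-check a configuration before saving it. The class, field names, and mode strings are hypothetical. It also assumes that both secret references are mandatory in Static keys mode, which this section implies but does not state outright.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BedrockCredentials:
    # Hypothetical mode names: "default_chain", "static_keys", "assume_role".
    mode: str = "default_chain"
    access_key_id_ref: Optional[str] = None      # Static keys only
    secret_access_key_ref: Optional[str] = None  # Static keys only
    role_arn: Optional[str] = None               # Assume IAM role only, required
    external_id: Optional[str] = None            # Assume IAM role only, optional
    session_name: Optional[str] = None           # Assume IAM role only, optional


def validation_errors(creds: BedrockCredentials) -> List[str]:
    """Return a list of problems; an empty list means the mode is satisfied."""
    errors: List[str] = []
    if creds.mode == "default_chain":
        # Nothing to set: the AWS SDK's default provider chain is used.
        pass
    elif creds.mode == "static_keys":
        if not creds.access_key_id_ref:
            errors.append("static_keys requires an access key ID secret reference")
        if not creds.secret_access_key_ref:
            errors.append("static_keys requires a secret access key secret reference")
    elif creds.mode == "assume_role":
        if not creds.role_arn:
            errors.append("assume_role requires a role ARN")
    else:
        errors.append(f"unknown credential mode: {creds.mode}")
    return errors


if __name__ == "__main__":
    print(validation_errors(BedrockCredentials(mode="assume_role")))
```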
| Field | Notes |
|---|---|
|
Required. URL of your OpenAI-compatible endpoint, for example |
|
Optional. Leave empty for endpoints with no authentication (common for local runtimes). |
OpenAI-compatible endpoints can serve any model. Enter the exact model identifiers your upstream server exposes (for example, meta-llama/Llama-3.3-70B-Instruct or qwen3:8b).
For the OpenAI, Google AI, and AWS Bedrock provider types, AI Gateway validates that the credential references resolve before it accepts the create or update. AI Gateway rejects a missing or empty secret reference at save time instead of failing at first call. The OpenAI-compatible type does not require a credential reference, so it can be created with no authentication for local runtimes such as Ollama or vLLM.
Select models
Models you select on this form become the catalog the provider exposes. Leave the list empty to allow every model the upstream catalog returns.
For OpenAI, Anthropic, Google AI, and AWS Bedrock, the form shows a picker backed by the provider’s catalog. Each model in the picker shows its input and output price per million tokens. Pick from the list, or type a model identifier the catalog doesn’t show. For OpenAI-compatible, the form takes a freeform list: type the exact identifiers your upstream serves.
Redpanda maintains the catalog of available models in the picker. When an upstream provider publishes a new model, it usually appears in the picker within a day or two; admins don’t have to wait for a Redpanda release. New models aren’t enabled automatically: an admin still selects the model in the catalog to make it callable through this provider.
For Bedrock, the picker exposes inference profiles, not raw foundation-model IDs. See AWS Bedrock: Inference profiles and IAM.
|
Redpanda stores models as structured |
Override per-model pricing
Cost reporting prices each call at the catalog rates for the model. If your organization negotiates non-standard rates, or you track spend against an internal chargeback rate, override the rates per model on this provider.
In the model picker, each selected model carries a pencil icon (Override pricing). Click it to open the pricing dialog for that model. The dialog lists one field per billing bucket, in the same order as the provider’s published rate card:
| Bucket | What it bills |
|---|---|
| Input | Per 1M input tokens. Tool-use input also bills at this rate. |
| Output | Per 1M output tokens. Reasoning tokens also bill at this rate. |
| Cached input | Per 1M tokens read from prompt cache. |
| Cache write (5-minute TTL) | Per 1M tokens written to a 5-minute prompt cache. |
| Cache write (1-hour TTL) | Per 1M tokens written to a 1-hour prompt cache. |
Enter rates in dollars per million tokens. Each field is independent:
-
Leave a field blank to keep the catalog rate for that bucket. The catalog rate shows as the field’s placeholder.
-
Enter a positive value to replace the catalog rate for that bucket only.
-
Enter 0 to make that bucket explicitly free, which is different from leaving it blank.
Cache writes with an unknown TTL always bill at the catalog rate; they have no override field.
Use the reset control on a field to clear a single override, or clear every field to drop all overrides for the model. Overrides are scoped to this provider and model, and they change what ADP’s cost reporting computes, not what the upstream provider actually charges you.
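The blank, positive, and zero cases are easiest to see in a small calculation. The sketch below is an illustration of the override semantics, not the gateway's cost engine: `None` stands for a blank field, the bucket keys are hypothetical names for the five buckets in the table, and the catalog rates are made-up numbers.

```python
from typing import Dict, Optional

# Hypothetical keys for the five billing buckets described above.
BUCKETS = ("input", "output", "cached_input", "cache_write_5m", "cache_write_1h")


def effective_rate(catalog_rate: float, override: Optional[float]) -> float:
    """Blank (None) keeps the catalog rate; 0 is explicitly free."""
    if override is None:
        return catalog_rate
    if override < 0:
        raise ValueError("override must be zero or positive")
    return override


def estimate_cost(
    tokens: Dict[str, int],
    catalog: Dict[str, float],
    overrides: Dict[str, Optional[float]],
) -> float:
    """Price each bucket independently, in dollars per million tokens."""
    total = 0.0
    for bucket in BUCKETS:
        rate = effective_rate(catalog[bucket], overrides.get(bucket))
        total += tokens.get(bucket, 0) / 1_000_000 * rate
    return total


if __name__ == "__main__":
    # Illustrative catalog rates only; real rates come from the picker.
    catalog = {"input": 3.0, "output": 15.0, "cached_input": 0.3,
               "cache_write_5m": 3.75, "cache_write_1h": 6.0}
    overrides = {"input": 2.5, "cached_input": 0.0}  # output left blank
    tokens = {"input": 2_000_000, "output": 1_000_000, "cached_input": 4_000_000}
    print(estimate_cost(tokens, catalog, overrides))
```

In this example the input bucket uses the override, the output bucket falls back to the catalog rate, and cached input costs nothing because its override is 0 rather than blank.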
After you create the provider, its detail page has two tabs. The Overview tab carries a Last 7 days KPI strip (TOTAL SPEND, REQUESTS, TOKENS) with sparklines and a View more link on each card, the Connection card (provider type, status, authentication passthrough state, proxy URL, upstream base URL, and the API key secret reference), and the model list, where each model shows its input and output prices per million tokens and its spend from requests routed through this provider over the last 7 days. For analysis across providers, use the Cost & Usage page under Governance (see View cost and usage).
The Connect tab generates ready-made client configuration for this provider: a gateway-token step, setup instructions for popular clients such as Claude Code, and code examples in several languages, all with the provider’s proxy URL prefilled. See Connect your app to AI Gateway for the underlying flow.
Configure transcript logging
The Transcripts card controls whether AI Gateway records the message bodies this provider proxies. It has two independent toggles, both off by default:
| Toggle | What it captures |
|---|---|
|
Captures the full request body (prompt content and tool-call arguments) on observability traces. |
|
Captures the full response body (completion content and tool-call results) on observability traces. |
Because both toggles default to off, AI Gateway does not retain message bodies for a new provider until you turn them on. Enable them to power turn-by-turn investigation and per-conversation drill-down in the Transcripts view. Leave them off for workloads where the message body must not be retained, such as regulated PII or customer secrets.
These are per-provider settings, not per-request: applications cannot opt in or out at call time. To split sensitive from non-sensitive traffic, create one provider with recording on and another with it off, and route each application to whichever proxy URL matches its data class.
Recording settings do not affect cost and usage telemetry. Token counts, latency, and provider/model attribution are always recorded, so the Cost & Usage page reports spend for traffic on the provider regardless of these toggles; only the message bodies are withheld when the toggles are off.
Changing a toggle takes effect for new requests. Transcripts already captured under the previous setting are not retroactively redacted; delete or rotate the provider if you need to purge historical content.
Save and verify
-
Click Create provider. The button activates after Name and Type are both set. The Summary panel checks them off as you fill them in.
-
On the provider’s detail page, the Connection card shows your Proxy URL, Discovery URL, Base URL, and API key ref. Copy the Proxy URL: this is where your applications point.
-
Scroll to the Verify connection section. Pick a model from the dropdown and click Test Connection. The status updates from Not tested yet to a pass/fail indicator. Use the Show commands disclosure if you want to see the equivalent curl or SDK call.
-
To wire up an application, open Connect your app further down the page or follow Connect your app to AI Gateway.
A successful Test Connection result confirms that the provider’s credentials, region (Bedrock), and network path are all correct. If the call fails, see Troubleshooting.
AWS Bedrock: Inference profiles and IAM
Bedrock has three concepts that affect how you configure a provider: foundation models, cross-region inference profiles, and IAM. Get these right and the Test Connection check passes. Get them wrong and Bedrock returns AccessDenied or ValidationException errors.
Foundation models versus inference profiles
A foundation model is the base model AWS exposes (for example, anthropic.claude-sonnet-4-6). It runs in the AWS region you call.
A cross-region inference profile wraps a foundation model with a geography prefix that routes requests across multiple regions for higher availability and throughput. The prefix tells AWS which geography the request should run in:
| Prefix | Geography |
|---|---|
| us. | US regions |
| eu. | EU regions |
| apac. | Asia-Pacific regions |
| au. | Australia regions |
| jp. | Japan regions |
| global. | Any region; routes for lowest cost |
Examples: us.anthropic.claude-sonnet-4-6 (Claude Sonnet 4.6 routed across US regions), eu.anthropic.claude-haiku-4-5 (Haiku 4.5 routed across EU regions).
Anthropic Claude 4.6+ models (Sonnet 4.6, Opus 4.6, Opus 4.7) cannot be invoked with the bare foundation-model ID; they require an inference profile. If you try the bare ID, Bedrock returns an error such as "Invocation of model ID … with on-demand throughput isn’t supported". Claude 4.5 and earlier models still accept bare IDs.
Pricing varies by profile. The bare foundation-model ID and the global. profile share AWS’s headline rate; geo profiles (us., eu., apac., au., jp.) carry approximately a 10% cross-region inference premium. Use global. when you want the headline rate and don’t need a specific geography. Use us. / eu. / apac. when data residency matters.
AI Gateway preserves the regional prefix end to end when it records spend, so the Cost & Usage page attributes usage to the correct regional rate. A call to eu.anthropic.claude-haiku-4-5 is billed at the EU Haiku rate, not the headline foundation-model rate.
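The prefix rules can be expressed as a small parser. The sketch below splits a model identifier into its geography prefix and base model, then applies the approximate 10% geo premium to a headline rate. It is a rough estimator under that stated approximation, not the catalog lookup AI Gateway performs, and the function names are hypothetical.

```python
from typing import Optional, Tuple

GEO_PREFIXES = ("us", "eu", "apac", "au", "jp")
GLOBAL_PREFIX = "global"
GEO_PREMIUM = 0.10  # approximate cross-region premium stated above


def split_profile(model_id: str) -> Tuple[Optional[str], str]:
    """Return (prefix, base_model_id); prefix is None for a bare model ID."""
    head, sep, rest = model_id.partition(".")
    if sep and head in GEO_PREFIXES + (GLOBAL_PREFIX,):
        return head, rest
    return None, model_id


def approximate_rate(headline_rate: float, model_id: str) -> float:
    """Geo profiles carry the premium; bare IDs and global. use the headline rate."""
    prefix, _ = split_profile(model_id)
    if prefix in GEO_PREFIXES:
        return headline_rate * (1 + GEO_PREMIUM)
    return headline_rate


if __name__ == "__main__":
    print(split_profile("us.anthropic.claude-sonnet-4-6"))
    print(split_profile("anthropic.claude-sonnet-4-6"))
```

Because the premium is only approximate, use the rates shown in the model picker when you need exact figures.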
IAM policy patterns
Bedrock IAM resources have different ARN structures depending on whether you reference a foundation model, a system-defined inference profile, or an account-scoped application inference profile. The provider’s IAM principal needs bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream on every resource it calls.
| Resource type | ARN shape |
|---|---|
Foundation model |
|
System-defined inference profile |
|
Application inference profile (account-scoped) |
|
A minimal policy granting access to all foundation models plus all cross-region profiles:
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
"Resource": [
"arn:aws:bedrock:*::foundation-model/*",
"arn:aws:bedrock:*:*:inference-profile/*"
]
}]
}
For production, scope to specific models and regions instead of using wildcards.
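One way to scope the policy is to generate the resource list from the models and regions you actually use. The sketch below builds such a policy document. The account ID, regions, and model identifiers are placeholders, and `scoped_policy` is a hypothetical helper. It covers foundation models and system-defined inference profiles only, and it assumes the ARN shapes implied by the wildcard policy above.

```python
import json
from typing import Dict, List

ACTIONS = ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"]


def scoped_policy(
    regions: List[str],
    foundation_models: List[str],
    account_id: str,
    inference_profiles: List[str],
) -> Dict:
    """Build an IAM policy limited to the given models, profiles, and regions."""
    resources: List[str] = []
    for region in regions:
        for model_id in foundation_models:
            # Foundation-model ARNs have an empty account segment.
            resources.append(f"arn:aws:bedrock:{region}::foundation-model/{model_id}")
        for profile_id in inference_profiles:
            resources.append(
                f"arn:aws:bedrock:{region}:{account_id}:inference-profile/{profile_id}"
            )
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": ACTIONS, "Resource": resources}],
    }


if __name__ == "__main__":
    policy = scoped_policy(
        regions=["us-east-1", "us-west-2"],             # placeholder regions
        foundation_models=["anthropic.claude-haiku-4-5"],
        account_id="123456789012",                      # placeholder account ID
        inference_profiles=["us.anthropic.claude-haiku-4-5"],
    )
    print(json.dumps(policy, indent=2))
```

A cross-region profile can route a request to a foundation model in another region of its geography, so the region list likely needs to include every destination region, not only the one the provider is configured with.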
Anthropic: Authorization passthrough
If you want each client to authenticate against Anthropic with its own subscription (Claude Pro, Max, Team, or enterprise), enable Authorization passthrough instead of configuring a server-side API key. In this mode:
-
Leave the API key field empty.
-
Clients must send their own Anthropic Authorization header with every request. AI Gateway forwards it unchanged.
-
Use this when you want to aggregate individual client subscriptions rather than share a single API account.
The provider detail page shows whether Authorization passthrough is enabled in the Connection card.
Browse providers in the list view
The LLM Providers list page is the at-a-glance home for every provider in your dataplane. Open it from the sidebar’s LLM Providers entry.
| Column | What it shows |
|---|---|
|
User-given name plus the provider-type icon (OpenAI, Anthropic, Google, AWS Bedrock, OpenAI-compatible) and a copyable preview of the proxy base URL. |
|
Shows |
|
First two model identifiers exposed by the provider, plus a |
|
Spend over the last 7 days with a small sparkline. The window is fixed at 7 days on this view. Longer-range analysis runs through the Cost & Usage page under Governance (see View cost and usage). |
|
Relative timestamp of the last edit. |
The Filter button narrows the list by provider type, status, or name. The Create provider button opens the create flow described in Open the Create LLM provider page. The list paginates, with a rows-per-page selector in the footer.
View cost and usage
The Cost & Usage page tracks spend, request volume, and token volume over time across providers and models. Open Cost & Usage under Governance in the sidebar. Use it when you want to understand which provider or model generated usage during a selected time window.
The page includes these charts:
-
Spend over time: Estimated spend in USD for the selected range.
-
Requests over time: Request count for the selected range.
-
Tokens over time: Token count for the selected range.
Use Group by to switch the chart breakdown between providers, models, and token type. Group by provider to see which upstream consumed the most budget. Group by model to see which model drove spend inside one or more providers. Group by token type to separate input, output, cached, cache-write, and reasoning usage where those buckets apply.
Use Filter to narrow the charts by provider, model, cost type, token type, user, or agent. Each filter appears as a chip above the chart, and you can combine them. For example, filter to one Anthropic provider, drill into claude-opus-4-7, then limit the spend view to input tokens. Selecting an agent also narrows the provider options to the providers that agent used.
The date-range picker supports last 7 days, last 14 days, last 30 days, last 90 days, month to date, quarter to date, year to date, and custom ranges. The chart subtitle shows the selected date range and bucket size.
A custom range writes customStart and customEnd ISO-8601 timestamps to the page URL, so the view is shareable: copy the URL after picking a custom range and any teammate who opens it lands on the same window.
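If you generate report links from a script, you can build the same kind of URL yourself. The sketch below appends customStart and customEnd as ISO-8601 query parameters and reads them back. The base URL is a placeholder, and the exact timestamp format the page writes (for example, a Z suffix versus a +00:00 offset) is an assumption.

```python
from datetime import datetime, timezone
from typing import Tuple
from urllib.parse import parse_qs, urlencode, urlsplit


def custom_range_url(base_url: str, start: datetime, end: datetime) -> str:
    """Append customStart/customEnd ISO-8601 timestamps to a page URL."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    query = urlencode({
        "customStart": start.isoformat(),
        "customEnd": end.isoformat(),
    })
    return f"{base_url}?{query}"


def read_custom_range(url: str) -> Tuple[datetime, datetime]:
    """Recover the window from a URL produced by custom_range_url."""
    params = parse_qs(urlsplit(url).query)
    start = datetime.fromisoformat(params["customStart"][0])
    end = datetime.fromisoformat(params["customEnd"][0])
    return start, end


if __name__ == "__main__":
    url = custom_range_url(
        "https://console.example.com/cost-usage",  # placeholder base URL
        datetime(2025, 1, 1, tzinfo=timezone.utc),
        datetime(2025, 1, 8, tzinfo=timezone.utc),
    )
    print(url)
```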
The chart renders empty buckets in the selected range as zero-height bars rather than gaps, so quiet days line up with their date label and the trend stays readable when traffic is bursty.
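The same zero-fill approach is useful if you export usage data and chart it yourself. The sketch below pads a sparse set of daily counts so that every day in the range has a value. It illustrates the technique only, assumes daily buckets, and uses the hypothetical helper name `zero_fill`.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def zero_fill(
    daily_counts: Dict[date, int], start: date, end: date
) -> List[Tuple[date, int]]:
    """Return one (day, count) pair per day from start to end, inclusive.

    Days with no recorded traffic get a count of 0 instead of being skipped.
    """
    if end < start:
        raise ValueError("end must not be earlier than start")
    days = (end - start).days + 1
    series = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        series.append((day, daily_counts.get(day, 0)))
    return series


if __name__ == "__main__":
    sparse = {date(2025, 3, 1): 120, date(2025, 3, 4): 45}
    for day, count in zero_fill(sparse, date(2025, 3, 1), date(2025, 3, 5)):
        print(day.isoformat(), count)
```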
The chart palette is colorblind-safe. When multiple providers of the same type exist (for example, two OpenAI providers), the chart renders each one with a distinct hatched pattern so the series stay visually distinguishable.
The spend chart footer summarizes the selected view by cost bucket, including total, input, output, cached, cache writes, and reasoning when the selected traffic includes those categories.
Edit, disable, or delete a provider
-
Edit: Click Edit on the detail page. You can change any field except Name and Type, which are immutable. Model lists, credential references, and the enabled state can all change.
-
Disable: Click Disable on the detail page. The provider remains in the list, but requests to its proxy URL are rejected until you enable it again. Use this when you want to pause traffic without losing configuration.
-
Delete: Scroll to the Delete this provider section at the bottom of the detail page and click Delete. The action is permanent. In-flight requests fail and downstream clients receive errors until reconfigured.
Troubleshooting
| Symptom | What to check |
|---|---|
|
Confirm the secret exists in your dataplane’s secret store and the reference in the provider configuration is spelled identically ( |
Bedrock returns |
Verify the AWS region field matches the region where your Bedrock models are enabled. Bedrock model availability varies by region. Confirm the IAM principal has |
Bedrock returns "Invocation of model ID … with on-demand throughput isn’t supported" |
You called a Claude 4.6+ model with a bare foundation-model ID. Switch to an inference profile (for example, |
Anthropic returns 401 when passthrough is enabled |
Confirm the client is sending its own |
Gemini returns 401 |
Gemini uses the |
Provider list empty or 403 |
Confirm your account has the |
Limitations
AI Gateway does not provide these capabilities. For current status, consult the ADP release notes.
-
Multi-provider routing, failover, and retries across providers. A synthetic provider that fans requests to multiple upstreams is not part of AI Gateway.
-
Rate limits. Requests-per-second, per-minute, or per-day limits are not available. To cap spend rather than request rate, use budgets, which enforce a per-agent hard cap.
-
Managed MCP aggregation at the gateway. Register MCP tool servers separately under MCP Servers in ADP.