Agentic Data Plane

Set Up Budgets

The Agentic Data Plane caps LLM spend with budgets and records every LLM call as a spending event. Set a budget to enforce a hard spending cap per agent, then read what you actually spend on the Cost & Usage page under Governance, through individual transcripts, and through breakdown queries by provider, model, user, agent, or provider type.

After completing these steps, you will be able to:

  • Set a per-agent budget that caps LLM spend and warns before the cap

  • Identify what spending data the Agentic Data Plane records automatically

  • View spend breakdowns by agent, model, and provider

Set a budget

A budget caps LLM spend over a recurring period. When an agent’s spend for the period reaches the budget’s hard limit, AI Gateway rejects that agent’s next LLM request with HTTP 429 until the period resets. A separate warning threshold fires before the cap, so you can react before AI Gateway cuts the agent off.

Redpanda ADP enforces budgets per agent. A budget identifies an agent by its resource name in the form agents/<slug>, the same identity that appears as agent_name in spend data.
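A caller can tell the two budget signals apart from the HTTP response alone. The following is a minimal sketch, assuming the warning arrives as the SpendLimit-Warning response header that this page describes for the warning threshold. The function name classify_budget_signal and its return labels are hypothetical, not part of any ADP client.

```python
def classify_budget_signal(status_code, headers):
    """Return 'blocked', 'warning', or 'ok' for one gateway response.

    Assumption: every HTTP 429 is treated as a budget block. A real
    gateway may also return 429 for other limits, so inspect the
    response body before acting on it.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    if status_code == 429:
        return "blocked"
    if "spendlimit-warning" in normalized:
        return "warning"
    return "ok"
```

A scheduler could use the "warning" result to slow an agent down before the hard cap cuts it off.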

How a budget works

  • Limit: The hard cap on per-period spend. You set it in dollars in the UI; the API stores it in USD microcents (1 cent = 1,000,000 microcents). When a matching agent’s accrued spend for the period reaches this value, the agent’s next LLM request through the gateway gets HTTP 429. Spend resets to zero at the start of each period.

  • Warning threshold: A spend level, lower than the limit, at which ADP warns: gateway responses carry a SpendLimit-Warning header and ADP records a warning metric. Requests still pass. Must be greater than zero and less than the limit.

  • Period: How often the spend pool resets: daily, weekly, or monthly. Periods are calendar-aligned in UTC: daily at 00:00 UTC, weekly at 00:00 UTC Monday, monthly at 00:00 UTC on the first of the month.

  • Target agent: Which agent the budget applies to. Leave it unset to create the tenant default budget. Set it to an agent’s resource name (agents/<slug>) to create a per-agent override.
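The calendar alignment of periods can be worked out locally, for example to show when an agent's pool next resets. This is a sketch of the UTC boundaries described above, not ADP's implementation; the function name period_window and the lowercase period strings are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def period_window(now, period):
    """Return (period_start, next_reset) in UTC.

    `now` must be timezone-aware. `period` is 'daily', 'weekly', or
    'monthly' (hypothetical labels for the three period choices).
    """
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "daily":
        return midnight, midnight + timedelta(days=1)
    if period == "weekly":
        # weekday() is 0 on Monday, so this steps back to Monday 00:00 UTC.
        start = midnight - timedelta(days=midnight.weekday())
        return start, start + timedelta(days=7)
    if period == "monthly":
        start = midnight.replace(day=1)
        # 32 days past the first always lands in the next month.
        next_reset = (start + timedelta(days=32)).replace(day=1)
        return start, next_reset
    raise ValueError(f"unknown period: {period!r}")
```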

Default and per-agent override budgets

You can have one default budget per tenant and at most one override per agent:

  • The default budget (no target agent) gives every agent its own independent pool of the limit per period. One agent reaching its cap doesn’t affect another.

  • A per-agent override targets a single agent by resource name (agents/<slug>) and replaces the default for that agent. Use an override to give a specific agent a higher or lower cap than the fleet default.

The target agent is immutable. To move an override to a different agent, delete it and create a new one. An override matches on the agent’s resource name, so recreating an agent with the same name keeps the override pointed at it; the new instance counts as a continuation for budget purposes.

When you read a budget, it also reports the current period’s spend, when the period started, when it resets, and (for the default budget) the agent currently closest to its cap.
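The relationship between the default budget and overrides can be modeled in a few lines. The sketch below is illustrative only: the Budget dataclass, its field names, and the 'ok', 'warn', and 'block' labels are hypothetical, and it assumes both thresholds trigger when spend reaches them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Budget:
    limit_microcents: int
    warning_microcents: int
    target_agent: Optional[str] = None  # None marks the tenant default


def effective_budget(agent, budgets):
    """Return the override for `agent` if one exists, else the default."""
    default = None
    for budget in budgets:
        if budget.target_agent == agent:
            return budget  # an override replaces the default
        if budget.target_agent is None:
            default = budget
    return default


def evaluate(spend_microcents, budget):
    """Classify one agent's spend for the current period."""
    if budget is None:
        return "ok"  # no budget applies to this agent
    if spend_microcents >= budget.limit_microcents:
        return "block"
    if spend_microcents >= budget.warning_microcents:
        return "warn"
    return "ok"
```

Because each agent is evaluated against its own spend, one agent reaching "block" has no effect on the result for another.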

Set a budget in the UI

Open Budgets under Governance in the sidebar. The page shows the tenant default budget as a card: its cap and period, the warn threshold, and how many agents are doing fine, getting close, or over the limit. Per-agent overrides appear in a table below the card, with each override’s Agent override target, Period, current Usage against the cap, Warn at threshold, and when it was last Updated.

To create the tenant default budget:

  1. Click Create default budget.

  2. Under Budget, set Cap usage at to a dollar amount and choose a period (day, week, or month). Use the quick-set chips ($25, $100, $500, $1,000) for common values.

  3. Drag the Warn at slider to set the warning threshold as a percentage of the cap (80% by default).

  4. Review the Configuration preview panel, which summarizes the budget, period, and the warn and block thresholds in dollars, then click Create default budget.
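The dollar values in the preview follow from the cap and the Warn at percentage. As a rough sketch of that arithmetic, assuming the warning threshold is simply the percentage applied to the cap, the hypothetical helper below converts both to the microcent unit the API stores:

```python
from decimal import Decimal

MICROCENTS_PER_DOLLAR = 100_000_000  # 1 cent = 1,000,000 microcents


def thresholds_from_ui(cap_dollars, warn_percent):
    """Return (limit_microcents, warning_microcents) for a cap and warn %."""
    # The warning threshold must be above zero and below the limit.
    if not 0 < warn_percent < 100:
        raise ValueError("warn_percent must be between 0 and 100, exclusive")
    # Decimal(str(...)) avoids binary floating-point drift on dollar input.
    limit = int(Decimal(str(cap_dollars)) * MICROCENTS_PER_DOLLAR)
    if limit <= 0:
        raise ValueError("cap must be greater than zero")
    warning = limit * warn_percent // 100
    return limit, warning
```

For a $100 cap with the default 80% slider position, this yields a warning threshold equivalent to $80.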

Open the default budget from the Budgets page to see its detail view: the per-agent spending limit (each agent gets its own limit, not one shared pool), the reset schedule, the warn threshold, and how much each agent has spent so far toward its limit.

To give one agent a different cap, click Add override, pick the agent (each agent can have at most one override; agents that already have one are grayed out), then set the Budget and Warn at controls the same way. The Resource name is auto-derived from the picked agent and is immutable; the Display name is editable.

Manage budgets through the API

BudgetService exposes standard create, read, update, and delete operations:

  • CreateBudget: Create the tenant default or a per-agent override.

  • GetBudget, ListBudgets: Read one budget, or list all budgets, each with current-period spend status.

  • UpdateBudget: Change the limit, warning threshold, or period. Send a field mask naming the fields you change.

  • DeleteBudget: Remove a budget. Deleting the default removes the per-agent pools; deleting an override returns that agent to the default budget.

A service account needs the matching dataplane_adp_budget_* permission for each operation (create, get, list, update, or delete). See Budget permissions.

What ADP records automatically

Every LLM call routed through AI Gateway becomes a spending event. Each event captures:

  • Input tokens, output tokens, and cached tokens.

  • Total cost (in microcents).

  • Request count.

  • The provider, model, user, and organization context the call ran under.

No setup required: the gateway captures spending the moment your first agent runs through it.

ADP tracks streaming and non-streaming requests the same way, and attributes cache-write tokens (Anthropic 4.x, OpenAI 4.x prompt caches) correctly on streaming responses, so cost rollups stay accurate when an agent reuses long system prompts.

ADP reports cost in microcents. 1 cent = 1,000,000 microcents, so $1 = 100,000,000 microcents. Divide total_cost_microcents by 100,000,000 to convert to dollars.
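The conversion is easy to get wrong by a factor of 100, so it helps to keep it in one place. A minimal sketch follows; the helper name is hypothetical, and it accepts strings as well as integers on the assumption that 64-bit values may arrive as JSON strings.

```python
from decimal import Decimal

MICROCENTS_PER_DOLLAR = 100_000_000  # 100 cents x 1,000,000 microcents


def microcents_to_dollars(microcents):
    """Convert a microcent amount (int or numeric string) to dollars."""
    # Decimal keeps the result exact instead of rounding through a float.
    return Decimal(int(microcents)) / MICROCENTS_PER_DOLLAR
```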

Per-request pricing variations

A few request- or response-time signals change the rate ADP applies to a single call. You don’t configure these; the spending pipeline picks them up from the upstream response or request and bills accordingly.

  • Anthropic fast mode: Anthropic exposes a fast-mode option on some models (for example, Opus 4.6 fast) that carries a per-token premium over the default rate. ADP reads the speed field on each Anthropic response and bills fast-mode calls at the model’s fast-mode rate. Requests without a speed field fall back to the default rate.

  • Context-tier pricing: A few models charge a different rate once a request crosses a context-length threshold. Gemini Pro, for example, prices requests above a 128K-token context at a higher tier than shorter requests. ADP uses the call’s context-token count so requests at or above the threshold bill at the tiered rate automatically.
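The two variations above amount to a rate lookup per call. The sketch below only illustrates that idea: the function names, the "fast" value of the speed field, and all rates are assumptions, and it does not show how ADP combines the two signals when both apply.

```python
def rate_for_speed(default_rate, fast_rate, speed=None):
    """Pick a per-token rate from the response's speed field.

    Assumption: fast-mode responses report speed == "fast". A missing
    field falls back to the default rate.
    """
    return fast_rate if speed == "fast" else default_rate


def rate_for_context(base_rate, tiered_rate, threshold_tokens, context_tokens):
    """Pick a per-token rate from the call's context-token count.

    Requests at or above the threshold bill at the tiered rate.
    """
    if context_tokens >= threshold_tokens:
        return tiered_rate
    return base_rate
```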

Where to view your spend

The Budgets page shows spend only as progress toward each cap. To analyze spend in detail, use the Cost & Usage page, transcripts, and breakdown queries:

  • Cost & Usage page (Governance sidebar group): Time-series spend, request, and token charts across providers and models. Use it to group by provider, model, or token type, then filter by provider, model, cost type, token type, user, or agent. See View cost and usage.

  • Transcripts: Per-call cost on individual executions. Useful when investigating a specific agent run or debugging a cost anomaly. See Read a transcript.

  • Breakdown queries: Aggregated spend by provider, model, user, agent, or provider type, available through GetSpendingBreakdown for programmatic access.

Every breakdown and time-series query uses the same SpendingFilter shape: a time range plus optional provider_name, model_id, user_email, agent_name, agent_uid, or organization_id filters. Combine filters to scope a query (for example, "all spend on Anthropic for user alice in April"). You can break results down by provider, model, user, agent, or provider type; organization_id is a filter only, not a breakdown dimension.

For more expressive queries, SpendingFilter also accepts an AIP-160 filter expression that lets you combine and negate dimensions in a single string (for example, provider_name="anthropic" AND model_id!="claude-sonnet-4-6"). The convenience fields and the filter expression compose; populate one or both.
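Filter expressions like the one above can be assembled rather than hand-written. The helper below is a hypothetical sketch that covers only equality and inequality clauses joined with AND. It rejects values containing double quotes instead of guessing at an escaping rule.

```python
def build_filter(equals=None, not_equals=None):
    """Build an AND-joined expression from field/value mappings."""
    clauses = []
    for operator, pairs in (("=", equals or {}), ("!=", not_equals or {})):
        for field, value in pairs.items():
            if '"' in value:
                raise ValueError(f"unsupported character in value for {field}")
            clauses.append(f'{field}{operator}"{value}"')
    return " AND ".join(clauses)
```

Pass the result as the filter field of a SpendingFilter, alongside or instead of the structured fields.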

user_email and organization_id are populated automatically from the request’s authenticated identity (the caller’s email and organization), so spend is attributed without any setup on your part.

Query spend programmatically

SpendingService.GetSpendingBreakdown is the canonical RPC for pulling spend out of ADP. Use it for chargeback reporting, scheduled emails, internal cost dashboards, or any workflow the built-in UI doesn’t cover.

Authenticate

SpendingService uses the same OIDC client-credentials grant as the rest of AI Gateway. Mint a service-account access token using the flow in Authenticate with OIDC client credentials, then pass the token in the Authorization: Bearer <token> header on every call. The service account needs dataplane_adp_spending_get on the resource you’re querying. See Spending permissions.

Request shape

GetSpendingBreakdown takes a SpendingFilter plus a dimension. The filter accepts:

  • start_time, end_time: RFC 3339 timestamps bracketing the window. Required.

  • provider_name: Restrict to one LLM provider (matches the Name field on the provider’s detail page).

  • model_id: Restrict to one model identifier (claude-sonnet-4-6, gpt-5.2, and so on).

  • user_email: Restrict to one identified user, matched on the caller’s email. Anonymous traffic is excluded.

  • agent_name: Restrict to one agent by its resource name (agents/<slug>), recorded on every call made by or on behalf of an agent. Leave it empty to match every row, including direct user calls; set it to scope spend to a single agent, summed across every instance that has used the name.

  • agent_uid: Restrict to one agent instance, identified by an opaque UUID. Only valid when agent_name is also set: setting agent_uid alone is rejected. Use it to exclude spend from a previously deleted agent that reused the same name.

  • organization_id: Restrict to one organization. Multi-tenant deployments only.

  • filter: AIP-160 expression that combines and negates dimensions in a single string (for example, provider_name="anthropic" AND model_id!="claude-sonnet-4-6"). Composes with the structured fields above; populate one or both.

The dimension value chooses the breakdown dimension. Valid values are the BreakdownDimension enum: BREAKDOWN_DIMENSION_PROVIDER, BREAKDOWN_DIMENSION_MODEL, BREAKDOWN_DIMENSION_USER, BREAKDOWN_DIMENSION_AGENT, and BREAKDOWN_DIMENSION_PROVIDER_TYPE. A breakdown on BREAKDOWN_DIMENSION_AGENT keys on the agent’s resource name (agents/<slug>) and excludes rows with no agent (direct user calls), the same way the other dimensions skip empty keys. Spend is summed across every instance that has used the name, so an agent that was deleted and recreated appears as a single entry.
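Some of these rules can be checked before a request leaves the client. The following sketch builds a GetSpendingBreakdown body and enforces the two constraints stated above: the dimension must be one of the listed enum values, and agent_uid requires agent_name. The function name and its keyword-argument interface are hypothetical.

```python
VALID_DIMENSIONS = frozenset({
    "BREAKDOWN_DIMENSION_PROVIDER",
    "BREAKDOWN_DIMENSION_MODEL",
    "BREAKDOWN_DIMENSION_USER",
    "BREAKDOWN_DIMENSION_AGENT",
    "BREAKDOWN_DIMENSION_PROVIDER_TYPE",
})
FILTER_FIELDS = frozenset({
    "provider_name", "model_id", "user_email", "agent_name",
    "agent_uid", "organization_id", "filter",
})


def build_breakdown_request(start_time, end_time, dimension, **filters):
    """Return a request body, raising ValueError on invalid input."""
    if dimension not in VALID_DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension!r}")
    unknown = set(filters) - FILTER_FIELDS
    if unknown:
        raise ValueError(f"unknown filter fields: {sorted(unknown)}")
    if filters.get("agent_uid") and not filters.get("agent_name"):
        raise ValueError("agent_uid is only valid when agent_name is set")
    spending_filter = {"start_time": start_time, "end_time": end_time}
    # Drop empty values so unset filters are omitted from the body.
    spending_filter.update({k: v for k, v in filters.items() if v})
    return {"filter": spending_filter, "dimension": dimension}
```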

cURL example

Pull per-user spend for a 7-day window against an Anthropic provider:

ACCESS_TOKEN="<oidc-access-token>"   # from the client_credentials flow
DATAPLANE_BASE="https://aigw.<cluster-id>.clusters.rdpa.co"

curl -s --request POST \
  --url "${DATAPLANE_BASE}/redpanda.api.adp.v1alpha1.SpendingService/GetSpendingBreakdown" \
  --header "Authorization: Bearer ${ACCESS_TOKEN}" \
  --header 'Content-Type: application/json' \
  --data '{
    "filter": {
      "start_time": "2026-05-17T00:00:00Z",
      "end_time":   "2026-05-24T00:00:00Z",
      "provider_name": "prod-anthropic"
    },
    "dimension": "BREAKDOWN_DIMENSION_USER"
  }' | jq

The response carries an entries array with one row per user in the window. Each entry has a key (the user) and a stats object with total_cost_microcents, total_requests, total_tokens (server-derived), and per-bucket input, output, and cached usage. Divide total_cost_microcents by 100,000,000 to convert to dollars.

Python example

Generated client code lives in the proto bundle; if your project doesn’t already import it from cloudv2, drive SpendingService over plain HTTPS:

import os
from datetime import datetime, timedelta, timezone

import requests

token = os.environ["ACCESS_TOKEN"]            # from the client_credentials flow
base  = os.environ["DATAPLANE_BASE"]          # https://aigw.<cluster-id>.clusters.rdpa.co
end   = datetime.now(timezone.utc)
start = end - timedelta(days=7)

body = {
    "filter": {
        "start_time": start.isoformat().replace("+00:00", "Z"),
        "end_time":   end.isoformat().replace("+00:00", "Z"),
        "filter": 'provider_name="prod-anthropic"',
    },
    "dimension": "BREAKDOWN_DIMENSION_USER",
}

r = requests.post(
    f"{base}/redpanda.api.adp.v1alpha1.SpendingService/GetSpendingBreakdown",
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    json=body,
    timeout=30,  # fail fast instead of hanging if the gateway is unreachable
)
r.raise_for_status()
for entry in r.json().get("entries", []):
    stats = entry["stats"]
    dollars = int(stats["total_cost_microcents"]) / 100_000_000
    print(f"{entry['key']}: ${dollars:,.2f}  ({stats['total_requests']} requests)")

The proto-generated client (Connect-Go or grpc-python) is the long-term recommendation; the cURL and requests examples are for quick scripting.

SpendingService exposes additional methods that follow the same SpendingFilter shape:

  • GetSpendingSummary: Total spend, tokens, and requests for the range, with no breakdown. Also returns the previous comparable period so you can show a trend.

  • GetSpendingTimeSeries: Spend bucketed over the time range (hourly or daily), for chart-style consumers.

  • GetSpendingTimeSeriesByDimension: Time-series buckets split by a breakdown dimension (top-N keys by cost), for stacked charts. Reports truncated_key_count when more keys matched than were returned.
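Because these methods share the SpendingFilter shape, one helper can prepare a request for any of them. The sketch below assumes that each method is served at the same URL pattern as GetSpendingBreakdown and accepts the filter under a filter key. Neither assumption is confirmed here, and method-specific fields such as the breakdown dimension or bucket size are passed through as extra body fields.

```python
SERVICE = "redpanda.api.adp.v1alpha1.SpendingService"
METHODS = frozenset({
    "GetSpendingBreakdown",
    "GetSpendingSummary",
    "GetSpendingTimeSeries",
    "GetSpendingTimeSeriesByDimension",
})


def spending_request(base, token, method, spending_filter, **extra_body):
    """Return keyword arguments suitable for requests.post()."""
    if method not in METHODS:
        raise ValueError(f"unknown SpendingService method: {method!r}")
    return {
        "url": f"{base.rstrip('/')}/{SERVICE}/{method}",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        "json": {"filter": spending_filter, **extra_body},
    }
```

With the requests library installed, a call would look like requests.post(**spending_request(base, token, "GetSpendingSummary", spending_filter), timeout=30).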

Guardrail cost

AWS bills guardrail evaluation directly to the AWS account whose credentials the guardrail’s backend uses. This cost does not appear in ADP cost reporting and is not counted against budgets. For current rates, see AWS Bedrock pricing.

For what each policy does, see How guardrails work and Guardrail policy reference.

Override per-model pricing

The Agentic Data Plane ships with default per-model pricing per provider, covering input, output, and cache-read prices for every model in the built-in catalog. Cost reporting uses these prices when it computes per-call spend, which is why every dollar value on the Cost & Usage page, in transcripts, and in SpendingService queries works without any setup.

If your organization negotiates non-standard pricing, or you want to track spend against an internal chargeback rate, override the rates as part of configuring an LLM provider. Overrides are scoped to a single provider, where you edit the rate per model. See Override per-model pricing.