# Set Up Budgets

> For the complete documentation index, see [llms.txt](https://docs.redpanda.com/llms.txt). Component-specific: [agentic-data-plane-full.txt](https://docs.redpanda.com/agentic-data-plane-full.txt)

---
title: Set Up Budgets
latest-operator-version: v26.1.5
latest-console-tag: v3.7.4
latest-connect-version: 4.96.1
latest-redpanda-tag: v26.1.10
docname: budgets
page-component-name: agentic-data-plane
page-version: master
page-component-version: master
page-component-title: Agentic Data Plane
page-relative-src-path: budgets.adoc
page-edit-url: https://github.com/redpanda-data/adp-docs/edit/main/modules/control/pages/budgets.adoc
description: Cap LLM spend with per-agent budgets, and see what the Agentic Data Plane records automatically and where to view it.
page-topic-type: how-to
personas: platform_engineer, pilot_lead
learning-objective-1: Set a per-agent budget that caps LLM spend and warns before the cap
learning-objective-2: Identify what spending data the Agentic Data Plane records automatically
learning-objective-3: View spend breakdowns by agent, model, and provider
page-git-created-date: "2026-05-28"
page-git-modified-date: "2026-06-10"
---

<!-- Source: https://docs.redpanda.com/agentic-data-plane/control/budgets.md -->

The Agentic Data Plane caps LLM spend with budgets and records every LLM call as a spending event. Set a budget to enforce a hard spending cap per agent, then read what you actually spend on the **Cost & Usage** page under **Governance**, through individual [transcripts](https://docs.redpanda.com/agentic-data-plane/reference/glossary/#transcript), and through breakdown queries by provider, model, user, agent, or provider type.

After completing these steps, you will be able to:

-   Set a per-agent budget that caps LLM spend and warns before the cap

-   Identify what spending data the Agentic Data Plane records automatically

-   View spend breakdowns by agent, model, and provider


## [](#set-a-spend-limit)Set a budget

A **budget** caps LLM spend over a recurring period. When an agent’s spend for the period reaches the budget’s hard limit, AI Gateway rejects that agent’s next LLM request with `HTTP 429` until the period resets. A separate warning threshold fires before the cap, so you can react before AI Gateway cuts the agent off.

Redpanda ADP enforces budgets per agent. A budget identifies an _agent_ by its resource name in the form `agents/<slug>`, the same identity that appears as `agent_name` in spend data.
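
A client can watch for both budget signals on each gateway response. The following is a minimal sketch, not ADP client code: `BudgetSignal` and `read_budget_signal` are hypothetical names, the `SpendLimit-Warning` header is the one listed in the settings table in the next section, and its value is treated as an opaque string because its format is not described here. An `HTTP 429` can also come from causes other than a budget, such as rate limiting, and this sketch does not distinguish between them.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class BudgetSignal:
    blocked: bool            # True when the gateway answered with HTTP 429
    warning: Optional[str]   # raw SpendLimit-Warning header value, if present


def read_budget_signal(status_code: int, headers: Mapping[str, str]) -> BudgetSignal:
    """Classify one gateway response (hypothetical helper, not an ADP API)."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return BudgetSignal(
        blocked=(status_code == 429),
        warning=lowered.get("spendlimit-warning"),
    )
```

An agent loop could log `warning` when it is set and stop issuing LLM requests once `blocked` is true, since retries fail until the period resets.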

### [](#how-a-budget-works)How a budget works

| Setting | What it does |
| --- | --- |
| Limit | The hard cap on per-period spend. You set it in dollars in the UI; the API stores it in USD microcents (1 cent = 1,000,000 microcents). When a matching agent’s accrued spend for the period reaches this value, the agent’s next LLM request through the gateway gets HTTP 429. Spend resets to zero at the start of each period. |
| Warning threshold | A spend level, lower than the limit, at which ADP warns: gateway responses carry a SpendLimit-Warning header and ADP records a warning metric. Requests still pass. Must be greater than zero and less than the limit. |
| Period | How often the spend pool resets: daily, weekly, or monthly. Periods are calendar-aligned in UTC: daily at 00:00 UTC, weekly at 00:00 UTC Monday, monthly at 00:00 UTC on the first of the month. |
| Target agent | Which agent the budget applies to. Leave it unset to create the tenant default budget. Set it to an agent’s resource name (agents/<slug>) to create a per-agent override. |

### [](#default-and-per-agent-override-budgets)Default and per-agent override budgets

You can have one default budget per tenant and at most one override per agent:

-   The **default** budget (no target agent) gives every agent its own independent pool of the limit per period. One agent reaching its cap doesn’t affect another.

-   A **per-agent override** targets a single agent by resource name (`agents/<slug>`) and replaces the default for that agent. Use an override to give a specific agent a higher or lower cap than the fleet default.


The target agent is immutable. To move an override to a different agent, delete it and create a new one. An override matches on the agent’s resource name, so recreating an agent with the same name keeps the override pointed at it; the new instance counts as a continuation for budget purposes.

When you read a budget, the response also reports the current period’s spend, when the period started, when it resets, and (for the default budget) the agent currently closest to its cap.
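
The precedence between the default budget and an override can be sketched as a lookup. This is a simplified model of the rule described above, not the gateway's implementation: `effective_limit_microcents` is a hypothetical helper, and the agent names are made up for the example. It returns `None` when neither a default nor an override exists.

```python
from typing import Dict, Optional


def effective_limit_microcents(
    agent_name: str,
    default_limit: Optional[int],
    overrides: Dict[str, int],
) -> Optional[int]:
    """Pick the limit that applies to one agent (illustrative sketch).

    overrides maps an agent resource name (agents/<slug>) to its limit.
    """
    # A per-agent override replaces the default for that agent.
    if agent_name in overrides:
        return overrides[agent_name]
    # Otherwise the agent gets its own pool sized by the tenant default.
    return default_limit
```

Because the lookup keys on the resource name, an agent that is deleted and recreated under the same name resolves to the same override.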

### [](#set-a-budget-in-the-ui)Set a budget in the UI

Open **Budgets** under **Governance** in the sidebar. The page shows the tenant default budget as a card: its cap and period, the warn threshold, and how many agents are doing fine, getting close, or over the limit. Per-agent overrides appear in a table below the card, with each override’s `Agent override` target, `Period`, current `Usage` against the cap, `Warn at` threshold, and when it was last `Updated`.

To create the tenant default budget:

1.  Click **Create default budget**.

2.  Under `Budget`, set `Cap usage at` to a dollar amount and choose a period (day, week, or month). Use the quick-set chips ($25, $100, $500, $1,000) for common values.

3.  Drag the `Warn at` slider to set the warning threshold as a percentage of the cap (80% by default).

4.  Review the Configuration preview panel, which summarizes the budget, period, and the warn and block thresholds in dollars, then click **Create default budget**.


Open the default budget from the **Budgets** page to see its detail view: the per-agent spending limit (each agent gets its own limit, not one shared pool), the reset schedule, the warn threshold, and how much each agent has spent so far toward its limit.

To give one agent a different cap, click **Add override** and pick the agent. Each agent can have at most one override, so agents that already have one are grayed out. Set the `Budget` and `Warn at` controls the same way as for the default budget. The `Resource name` is auto-derived from the picked agent and is immutable; the `Display name` is editable.
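
The UI takes the cap in dollars and the warning threshold as a percentage of the cap, while the API stores amounts in microcents. The arithmetic can be sketched as follows. The function name and the returned keys `limit_microcents` and `warning_microcents` are hypothetical and are not confirmed `BudgetService` field names.

```python
from decimal import Decimal

MICROCENTS_PER_DOLLAR = 100_000_000  # 1 cent = 1,000,000 microcents


def budget_thresholds(cap_dollars: str, warn_percent: int = 80) -> dict:
    """Convert a dollar cap and a warn percentage to microcents (sketch)."""
    # Decimal avoids float rounding on amounts such as "0.10".
    limit = int(Decimal(cap_dollars) * MICROCENTS_PER_DOLLAR)
    warning = limit * warn_percent // 100
    # The warning threshold must be greater than zero and less than the limit.
    if not 0 < warning < limit:
        raise ValueError("warning threshold must be > 0 and < limit")
    return {"limit_microcents": limit, "warning_microcents": warning}
```

With a $100 cap and the default 80% slider position, the limit is 10,000,000,000 microcents and the warning threshold is 8,000,000,000 microcents ($80).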

### [](#manage-budgets-through-the-api)Manage budgets through the API

`BudgetService` exposes standard create, read, update, and delete operations:

| Method | Use it to |
| --- | --- |
| CreateBudget | Create the tenant default or a per-agent override. |
| GetBudget, ListBudgets | Read one budget, or list all budgets, each with current-period spend status. |
| UpdateBudget | Change the limit, warning threshold, or period. Send a field mask naming the fields you change. |
| DeleteBudget | Remove a budget. Deleting the default removes the per-agent pools; deleting an override returns that agent to the default budget. |

A service account needs the matching `dataplane_adp_budget_*` permission for each operation (`create`, `get`, `list`, `update`, or `delete`). See [Budget permissions](https://docs.redpanda.com/agentic-data-plane/control/permissions-reference/#budget-permissions).

## [](#what-adp-records-automatically)What ADP records automatically

Every LLM call routed through AI Gateway becomes a spending event. Each event captures:

-   Input tokens, output tokens, and cached tokens.

-   Total cost (in microcents).

-   Request count.

-   The provider, model, user, and organization context the call ran under.


No setup required: the gateway captures spending the moment your first agent runs through it.

ADP tracks streaming and non-streaming requests the same way, and attributes cache-write tokens (Anthropic 4.x, OpenAI 4.x prompt caches) correctly on streaming responses, so cost rollups stay accurate when an agent reuses long system prompts.

> 📝 **NOTE**
>
> ADP reports cost in **microcents**. 1 cent = 1,000,000 microcents, so $1 = 100,000,000 microcents. Divide `total_cost_microcents` by 100,000,000 to convert to dollars.

### [](#per-request-pricing-variations)Per-request pricing variations

A few request- or response-time signals change the rate ADP applies to a single call. You don’t configure these; the spending pipeline picks them up from the upstream response or request and bills accordingly.

-   **Anthropic fast mode**: Anthropic exposes a fast-mode option on some models (for example, Opus 4.6 fast) that carries a per-token premium over the default rate. ADP reads the `speed` field on each Anthropic response and bills fast-mode calls at the model’s fast-mode rate. Requests without a `speed` field fall back to the default rate.

-   **Context-tier pricing**: A few models charge a different rate once a request crosses a context-length threshold. Gemini Pro, for example, prices requests above a 128K-token context at a higher tier than shorter requests. ADP uses the call’s context-token count so requests at or above the threshold bill at the tiered rate automatically.
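
The two variations above amount to choosing a per-token rate for each call. The sketch below illustrates that selection under stated assumptions: `ModelRates` and `pick_rate` are hypothetical, the rates are placeholders rather than real prices, the `speed` value `"fast"` is assumed, and the order in which the two rules are checked is a choice made for this example, not documented ADP behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelRates:
    default: int                                   # microcents per token, standard rate
    fast: Optional[int] = None                     # fast-mode rate, if the model has one
    long_context: Optional[int] = None             # tiered rate for long contexts
    long_context_threshold: Optional[int] = None   # context tokens where the tier starts


def pick_rate(rates: ModelRates, speed: Optional[str], context_tokens: int) -> int:
    """Choose the per-token rate for one call (illustrative sketch)."""
    # Fast mode applies only when the response reports it and a rate exists.
    if speed == "fast" and rates.fast is not None:
        return rates.fast
    # Requests at or above the context threshold use the tiered rate.
    if (
        rates.long_context is not None
        and rates.long_context_threshold is not None
        and context_tokens >= rates.long_context_threshold
    ):
        return rates.long_context
    # No speed field, or no special tier: fall back to the default rate.
    return rates.default
```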


## [](#where-to-view-your-spend)Where to view your spend

The **Budgets** page shows progress toward each cap, not a detailed spend breakdown. To analyze spend, use the **Cost & Usage** page, transcripts, and breakdown queries:

| Surface | Use it for |
| --- | --- |
| Cost & Usage page (Governance sidebar group) | Time-series spend, request, and token charts across providers and models. Use it to group by provider, model, or token type, then filter by provider, model, cost type, token type, user, or agent. See View cost and usage. |
| Transcripts | Per-call cost on individual executions. Useful when investigating a specific agent run or debugging a cost anomaly. See Read a transcript. |
| Breakdown queries | Aggregated spend by provider, model, user, agent, or provider type, available through GetSpendingBreakdown for programmatic access. |

Every breakdown and time-series query reads from the same `SpendingFilter` shape: a time range plus optional `provider_name`, `model_id`, `user_email`, `agent_name`, `agent_uid`, or `organization_id` filters. Combine filters to scope a query (for example, "all spend on Anthropic for user `alice` in April"). You can break results down by provider, model, user, agent, or provider type; `organization_id` is a filter only, not a breakdown dimension.

For more expressive queries, `SpendingFilter` also accepts an AIP-160 `filter` expression that lets you combine and negate dimensions in a single string (for example, `provider_name="anthropic" AND model_id!="claude-sonnet-4-6"`). The convenience fields and the `filter` expression compose; populate one or both.
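
A `filter` string like the one above can be assembled from individual clauses. The helper below is a minimal sketch that covers only `=` and `!=` comparisons joined with `AND`, which is the subset shown in this section. `aip160_filter` is a hypothetical name, and the helper rejects values containing double quotes instead of guessing at escaping rules.

```python
from typing import Iterable, Tuple


def aip160_filter(clauses: Iterable[Tuple[str, str, str]]) -> str:
    """Join (field, operator, value) clauses into one expression (sketch)."""
    parts = []
    for field, op, value in clauses:
        if op not in ("=", "!="):
            raise ValueError(f"unsupported operator: {op!r}")
        if '"' in value:
            raise ValueError("values containing double quotes are not supported")
        parts.append(f'{field}{op}"{value}"')
    return " AND ".join(parts)
```

Calling `aip160_filter([("provider_name", "=", "anthropic"), ("model_id", "!=", "claude-sonnet-4-6")])` produces the expression used in the example above.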

`user_email` and `organization_id` are populated automatically from the request’s authenticated identity (the caller’s email and organization), so spend is attributed without any setup on your part.

## [](#query-spend-programmatically)Query spend programmatically

`SpendingService.GetSpendingBreakdown` is the canonical RPC for pulling spend out of ADP. Use it for chargeback reporting, scheduled emails, internal cost dashboards, or any workflow the built-in UI doesn’t cover.

### [](#authenticate)Authenticate

`SpendingService` uses the same OIDC client-credentials grant as the rest of AI Gateway. Mint a service-account access token using the flow in [Authenticate with OIDC client credentials](https://docs.redpanda.com/agentic-data-plane/gateway/connect-agent/#authenticate-with-oidc-client-credentials), then pass the token in the `Authorization: Bearer <token>` header on every call. The service account needs `dataplane_adp_spending_get` on the resource you’re querying. See [Spending permissions](https://docs.redpanda.com/agentic-data-plane/control/permissions-reference/#spending-permissions).

### [](#request-shape)Request shape

`GetSpendingBreakdown` takes a `SpendingFilter` plus a `dimension`. The filter accepts:

| Field | Meaning |
| --- | --- |
| start_time, end_time | RFC 3339 timestamps bracketing the window. Required. |
| provider_name | Restrict to one LLM provider (matches the Name field on the provider’s detail page). |
| model_id | Restrict to one model identifier (claude-sonnet-4-6, gpt-5.2, and so on). |
| user_email | Restrict to one identified user, matched on the caller’s email. Anonymous traffic is excluded. |
| agent_name | Restrict to one agent by its resource name (agents/<slug>), recorded on every call made by or on behalf of an agent. Leave it empty to match every row, including direct user calls; set it to scope spend to a single agent, summed across every instance that has used the name. |
| agent_uid | Restrict to one agent instance, identified by an opaque UUID. Only valid when agent_name is also set: setting agent_uid alone is rejected. Use it to exclude spend from a previously deleted agent that reused the same name. |
| organization_id | Restrict to one organization. Multi-tenant deployments only. |
| filter | AIP-160 expression that combines and negates dimensions in a single string (for example, provider_name="anthropic" AND model_id!="claude-sonnet-4-6"). Composes with the structured fields above; populate one or both. |

The `dimension` value chooses the breakdown dimension. Valid values are the `BreakdownDimension` enum: `BREAKDOWN_DIMENSION_PROVIDER`, `BREAKDOWN_DIMENSION_MODEL`, `BREAKDOWN_DIMENSION_USER`, `BREAKDOWN_DIMENSION_AGENT`, and `BREAKDOWN_DIMENSION_PROVIDER_TYPE`. A breakdown on `BREAKDOWN_DIMENSION_AGENT` keys on the agent’s resource name (`agents/<slug>`) and excludes rows with no agent (direct user calls), the same way the other dimensions skip empty keys. Spend is summed across every instance that has used the name, so an agent that was deleted and recreated appears as a single entry.
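
The rules in this section can be checked on the client before a request is sent. The sketch below builds a `GetSpendingBreakdown` body from the fields and enum values listed above. `breakdown_request` and `rfc3339` are hypothetical helpers, and the check that `start_time` precedes `end_time` is an assumption added for this example.

```python
from datetime import datetime, timezone
from typing import Optional

DIMENSIONS = {
    "BREAKDOWN_DIMENSION_PROVIDER",
    "BREAKDOWN_DIMENSION_MODEL",
    "BREAKDOWN_DIMENSION_USER",
    "BREAKDOWN_DIMENSION_AGENT",
    "BREAKDOWN_DIMENSION_PROVIDER_TYPE",
}


def rfc3339(ts: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def breakdown_request(
    start: datetime,
    end: datetime,
    dimension: str,
    agent_name: Optional[str] = None,
    agent_uid: Optional[str] = None,
) -> dict:
    """Build a request body and validate it locally (illustrative sketch)."""
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension!r}")
    # agent_uid is only valid together with agent_name.
    if agent_uid and not agent_name:
        raise ValueError("agent_uid requires agent_name")
    if start >= end:  # assumption: the window must be non-empty
        raise ValueError("start must be before end")
    flt = {"start_time": rfc3339(start), "end_time": rfc3339(end)}
    if agent_name:
        flt["agent_name"] = agent_name
    if agent_uid:
        flt["agent_uid"] = agent_uid
    return {"filter": flt, "dimension": dimension}
```

For instance, passing `agent_name="agents/support-bot"` (a made-up agent) with `BREAKDOWN_DIMENSION_MODEL` yields a body that asks for that agent's spend split by model.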

### [](#curl-example)cURL example

Pull per-user spend for a 7-day window against an Anthropic provider:

```bash
ACCESS_TOKEN="<oidc-access-token>"   # from the client_credentials flow
DATAPLANE_BASE="https://aigw.<cluster-id>.clusters.rdpa.co"

curl -s --request POST \
  --url "${DATAPLANE_BASE}/redpanda.api.adp.v1alpha1.SpendingService/GetSpendingBreakdown" \
  --header "Authorization: Bearer ${ACCESS_TOKEN}" \
  --header 'Content-Type: application/json' \
  --data '{
    "filter": {
      "start_time": "2026-05-17T00:00:00Z",
      "end_time":   "2026-05-24T00:00:00Z",
      "provider_name": "prod-anthropic"
    },
    "dimension": "BREAKDOWN_DIMENSION_USER"
  }' | jq
```

The response carries one `entries` row per user in the window. Each entry has a `key` (the user) and a `stats` object with `total_cost_microcents`, `total_requests`, `total_tokens` (server-derived), and per-bucket `input`, `output`, and `cached` usage. Divide `total_cost_microcents` by 100,000,000 to convert to dollars.

### [](#python-example)Python example

Generated client code lives in the proto bundle; if your project doesn’t already import it from cloudv2, drive `SpendingService` over plain HTTPS:

```python
import os
from datetime import datetime, timedelta, timezone

import requests

token = os.environ["ACCESS_TOKEN"]            # from the client_credentials flow
base  = os.environ["DATAPLANE_BASE"]          # https://aigw.<cluster-id>.clusters.rdpa.co

# Query the trailing 7 days, ending now (UTC).
end   = datetime.now(timezone.utc)
start = end - timedelta(days=7)

body = {
    "filter": {
        # RFC 3339 timestamps with a trailing "Z" for UTC.
        "start_time": start.isoformat().replace("+00:00", "Z"),
        "end_time":   end.isoformat().replace("+00:00", "Z"),
        # AIP-160 expression form of the provider restriction.
        "filter": 'provider_name="prod-anthropic"',
    },
    "dimension": "BREAKDOWN_DIMENSION_USER",
}

r = requests.post(
    f"{base}/redpanda.api.adp.v1alpha1.SpendingService/GetSpendingBreakdown",
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    json=body,
    timeout=30,  # fail fast instead of hanging on a stalled connection
)
r.raise_for_status()

# Print each user's spend in dollars ($1 = 100,000,000 microcents).
for entry in r.json().get("entries", []):
    stats = entry["stats"]
    dollars = int(stats["total_cost_microcents"]) / 100_000_000
    print(f"{entry['key']}: ${dollars:,.2f}  ({stats['total_requests']} requests)")

The proto-generated client (Connect-Go or grpc-python) is the long-term recommendation; the cURL and `requests` examples are for quick scripting.

### [](#related-methods)Related methods

`SpendingService` exposes additional methods that follow the same `SpendingFilter` shape:

-   `GetSpendingSummary`: Total spend, tokens, and requests for the range, with no breakdown. Also returns the previous comparable period so you can show a trend.

-   `GetSpendingTimeSeries`: Spend bucketed over the time range (hourly or daily), for chart-style consumers.

-   `GetSpendingTimeSeriesByDimension`: Time-series buckets split by a breakdown dimension (top-N keys by cost), for stacked charts. Reports `truncated_key_count` when more keys matched than were returned.


## [](#guardrail-cost)Guardrail cost

AWS bills guardrail evaluation directly to the AWS account whose credentials the guardrail’s backend uses. This cost does not appear in ADP cost reporting and is not counted against budgets. For current rates, see [AWS Bedrock pricing](https://aws.amazon.com/bedrock/pricing/).

For what each policy does, see [How guardrails work](https://docs.redpanda.com/agentic-data-plane/control/guardrails/overview/) and [Guardrail policy reference](https://docs.redpanda.com/agentic-data-plane/control/guardrails/types-reference/).

## [](#override-per-model-pricing)Override per-model pricing

The Agentic Data Plane ships with default per-model pricing per provider, covering input, output, and cache-read prices for every model in the built-in catalog. Cost reporting uses these prices when it computes per-call spend, which is why every dollar value on the **Cost & Usage** page, in transcripts, and in `SpendingService` queries works without any setup.

If your organization negotiates non-standard pricing, or you want to track spend against an internal chargeback rate, override the rates as part of configuring an LLM provider. Overrides are scoped to a single provider, where you edit the rate per model. See [Override per-model pricing](https://docs.redpanda.com/agentic-data-plane/gateway/configure-provider/#pricing-overrides).
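
The effect of an override on per-call cost can be sketched as a rate lookup followed by a sum over token buckets. This is a simplified model, not the spending pipeline: `call_cost_microcents` is hypothetical, the rates are placeholders in microcents per token, and the sketch assumes the input, output, and cached token counts do not overlap.

```python
from typing import Dict, Optional

Rates = Dict[str, int]  # keys: "input", "output", "cache_read" (microcents per token)


def call_cost_microcents(
    model_id: str,
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int,
    defaults: Dict[str, Rates],
    overrides: Optional[Dict[str, Rates]] = None,
) -> int:
    """Price one call, preferring a provider-level override (sketch)."""
    # Use the override for this model when one exists, else the catalog default.
    rates = (overrides or {}).get(model_id) or defaults[model_id]
    return (
        input_tokens * rates["input"]
        + output_tokens * rates["output"]
        + cached_tokens * rates["cache_read"]
    )
```

With placeholder default rates of 300, 1,500, and 30 microcents per token, a call with 1,000 input, 200 output, and 500 cached tokens costs 615,000 microcents; an override to 100, 500, and 10 brings the same call to 205,000 microcents.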

## [](#next-steps)Next steps

-   [Read a transcript](https://docs.redpanda.com/agentic-data-plane/monitor/transcripts/)

-   [How guardrails work](https://docs.redpanda.com/agentic-data-plane/control/guardrails/overview/)