Troubleshoot Query Out-of-Memory Errors
If a Redpanda SQL query exhausts the memory available to it, the engine cancels the query and returns an error to the client:
ERROR: Query Out of Memory!
Canceling the query frees its memory and allows the engine to continue serving other queries. This is a normal protection mechanism and is not a sign of cluster failure.
Use this page to:
-
Identify when a query was canceled because it ran out of memory
-
Recover from a query out-of-memory error and reduce its frequency
-
Monitor node memory usage to anticipate memory pressure
How Redpanda SQL uses memory
Redpanda SQL queries can read very large input sources (many terabytes), and simple operations such as filtering or projection process input incrementally with a small, roughly constant memory footprint. In those cases the engine can process far more data than fits in RAM.
Memory pressure comes from operations that materialize intermediate state: hash tables for JOIN and GROUP BY, heaps for ORDER BY and top-K, and network buffers between nodes. All of this intermediate state, along with the final result set, must fit into the aggregate memory available across the cluster. The engine does not spill intermediate state to disk, so a query that builds intermediate structures larger than available memory is canceled rather than slowed.
All concurrently running queries contribute to total memory consumption, so a single query can hit the node memory limit because of pressure from other queries running at the same time.
Recover from the error
When a single query fails with Query out of Memory, retry it. The error frees the query’s memory, so the next attempt often succeeds, especially if other concurrent queries have completed in the meantime.
If the same query keeps failing, the query itself is too memory-hungry for the current cluster size, or too many other queries are competing for memory at the same time. Reduce the query’s memory footprint or reduce concurrent load:
-
Reduce concurrency.
Run fewer queries in parallel against the cluster. Other queries running at the same time contribute to the total memory pressure.
-
Simplify the query.
Narrow the scan range with tighter
WHEREfilters, reduce the number of joins, or break a large aggregation into smaller ones. Operations that materialize wide intermediate results (joins, sorts, distinct aggregations) drive memory consumption the most. -
Scale the cluster.
Add SQL nodes to increase the aggregate memory available to queries. See Scale Redpanda SQL.
Monitor memory usage
Use the following Prometheus gauge to track memory consumed by the SQL workload and watch for sustained growth toward the node’s limit:
| Metric | Description |
|---|---|
|
Memory consumed by queries on the node, in bytes. Use this metric to monitor workload memory usage. Unlike |
OOM prevention cancellations
The cancelled due to OOM prevention error is a separate case. Redpanda SQL’s engine includes an overseer that monitors overall node memory independently of per-query accounting. When the overseer detects that the untracked memory pool has grown unexpectedly, it cancels running queries on the affected node to keep the engine operational.
This condition is rare and almost always indicates a bug in memory accounting or an unexpected workload pattern. Collect the cluster logs from around the time of the error and contact Redpanda Support.