# Tune Performance for Iceberg Topics

---
title: Tune Performance for Iceberg Topics
latest-redpanda-tag: v26.1.9
latest-console-tag: v3.7.3
latest-operator-version: v26.1.4
# EOL = End-of-Life (support lifecycle status)
page-is-nearing-eol: "false"
page-is-past-eol: "false"
page-eol-date: March 31, 2027
latest-connect-version: 4.93.0
docname: iceberg/iceberg-performance-tuning
page-component-name: streaming
page-version: "26.1"
page-component-version: "26.1"
page-component-title: Streaming
page-relative-src-path: iceberg/iceberg-performance-tuning.adoc
page-edit-url: https://github.com/redpanda-data/docs/edit/main/modules/manage/pages/iceberg/iceberg-performance-tuning.adoc
description: Optimize query performance and translation throughput for Iceberg topics with partitioning, compaction, flush threshold tuning, and cluster sizing guidance.
page-topic-type: best-practices
personas: ops_admin, streaming_developer
page-git-created-date: "2026-05-06"
page-git-modified-date: "2026-05-06"
support-status: supported
---

<!-- Source: https://docs.redpanda.com/streaming/current/manage/iceberg/iceberg-performance-tuning.md -->

> 📝 **NOTE**
>
> This feature requires an [enterprise license](https://docs.redpanda.com/streaming/current/get-started/licensing/). To get a trial license key or extend your trial period, [generate a new trial license key](https://redpanda.com/try-enterprise). To purchase a license, contact [Redpanda Sales](https://redpanda.com/upgrade).
>
> If Redpanda has enterprise features enabled and it cannot find a valid license, [restrictions](https://docs.redpanda.com/streaming/current/get-started/licensing/#self-managed) apply.

This guide covers strategies for optimizing the performance of Iceberg topics in Redpanda, including improving downstream query performance, tuning the Iceberg translation pipeline, and monitoring translation throughput.

After reading this page, you will be able to:

-   Apply partitioning and compaction strategies to improve query performance

-   Choose appropriate flush threshold and lag target values for your workload

-   Identify translation performance signals using Iceberg metrics


## [](#prerequisites)Prerequisites

You must be familiar with how Iceberg topics work in Redpanda. See [About Iceberg Topics](https://docs.redpanda.com/streaming/current/manage/iceberg/about-iceberg-topics/).

## [](#optimize-query-performance)Optimize query performance

Query engines read Parquet files from object storage to process Iceberg table data. Partitioning, compaction, and schema design affect how efficiently those reads perform.

### [](#use-custom-partitioning)Use custom partitioning

To improve query performance, consider implementing custom [partitioning](https://iceberg.apache.org/docs/nightly/partitioning/) for the Iceberg topic. Use the [`redpanda.iceberg.partition.spec`](https://docs.redpanda.com/streaming/current/reference/properties/topic-properties/#redpanda-iceberg-partition-spec) topic property to define the partitioning scheme:

```bash
# Create new topic with five topic partitions, replication factor 3, and custom table partitioning for Iceberg
rpk topic create <new-topic-name> -p5 -r3 -c redpanda.iceberg.mode=value_schema_id_prefix -c "redpanda.iceberg.partition.spec=(<partition-key1>, <partition-key2>, ...)"
```

Valid `<partition-key>` values include a source column name or a transformation of a column. The columns referenced can be Redpanda-defined (such as `redpanda.timestamp`) or user-defined based on a schema that you register for the topic. Based on this specification, the Iceberg table stores records with different partition key values in separate files.

For example:

-   To partition the table by a single key, such as a column `col1`, use: `redpanda.iceberg.partition.spec=(col1)`.

-   To partition by multiple columns, use a comma-separated list: `redpanda.iceberg.partition.spec=(col1, col2)`.

-   To partition by the year of a timestamp column `ts1`, and a string column `col1`, use: `redpanda.iceberg.partition.spec=(year(ts1), col1)`.


To learn more about how partitioning schemes can affect query performance, and for details on the partitioning specification such as allowed transforms, see the [Apache Iceberg documentation](https://iceberg.apache.org/spec/#partitioning).
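
As a minimal sketch of how the property value is assembled, the following shell helper joins partition keys into the parenthesized, comma-separated form used in the examples above, then prints a matching `rpk topic create` command instead of running it. The helper name `partition_spec` and the topic name `orders` are hypothetical, and `day(redpanda.timestamp)` assumes that the `day` transform is applied to the Redpanda-defined timestamp column.

```bash
# Hypothetical helper: join partition keys into "(key1, key2, ...)".
partition_spec() {
    joined=""
    for key in "$@"; do
        if [ -z "$joined" ]; then
            joined="$key"
        else
            joined="$joined, $key"
        fi
    done
    printf '(%s)\n' "$joined"
}

# Build a spec that partitions by day of the record timestamp, then by col1.
spec=$(partition_spec "day(redpanda.timestamp)" "col1")

# Print (do not run) a create command for a hypothetical topic named "orders".
printf '%s\n' "rpk topic create orders -p5 -r3 -c redpanda.iceberg.mode=value_schema_id_prefix -c \"redpanda.iceberg.partition.spec=$spec\""
```

Printing the command first makes it easy to review the quoting of the spec before you create the topic.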

> 💡 **TIP**
>
> -   Partition by columns that you frequently use in queries. Columns with relatively few unique values (low cardinality) are good candidates for partitioning.
>
> -   If you must partition based on columns with high cardinality, for example timestamps, use Iceberg’s available transforms such as extracting the year, month, or day to avoid creating too many partitions. Too many partitions can be detrimental to performance because more files need to be scanned and managed.

### [](#compact-iceberg-tables)Compact Iceberg tables

Over time, Iceberg translation can produce many small Parquet files, especially with low-throughput topics or short lag targets. Compaction merges small files into larger ones, reducing the number of metadata operations query engines must perform and improving read performance.

-   Automatic compaction: Some catalog and data platform services, such as AWS Glue and Databricks, automatically compact Iceberg tables.

-   Manual or scheduled compaction: Tools like [Apache Spark](https://spark.apache.org/) can run compaction jobs on a schedule. This is useful if your catalog or platform does not compact automatically.


If you observe degraded read performance or a high number of small files, check whether your catalog or platform supports automatic compaction, or schedule periodic compaction jobs.
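
As an illustration of a scheduled compaction job, the sketch below prints a Spark SQL statement that calls Apache Iceberg's `rewrite_data_files` procedure. The helper name `compaction_sql`, the catalog name `my_catalog`, and the table identifier `redpanda.orders` are placeholders. The sketch assumes a Spark session that already has the Iceberg runtime and your catalog configured, so check your catalog's documentation before running anything like it against production tables.

```bash
# Hypothetical helper: print a Spark SQL call to Iceberg's rewrite_data_files
# procedure, which rewrites small data files into larger ones.
# $1 = catalog name as configured in Spark
# $2 = table identifier in namespace.table form
compaction_sql() {
    printf "CALL %s.system.rewrite_data_files(table => '%s');\n" "$1" "$2"
}

# Placeholder names: replace them with your own catalog and table.
compaction_sql my_catalog redpanda.orders

# One possible way to run it from a scheduler (assumes spark-sql is on PATH):
#   spark-sql -e "$(compaction_sql my_catalog redpanda.orders)"
```

Keeping the statement in a small helper lets a scheduler loop over several tables with the same job definition.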

### [](#avoid-high-column-count)Avoid high column count

A high column count or schema field count results in more overhead when translating topics to the Iceberg table format. Small message sizes can also increase CPU utilization. To minimize the performance impact on your cluster, keep to a low column count and large message size for Iceberg topics.

## [](#tune-translation-performance)Tune translation performance

Translation is the process in which Redpanda converts topic data into Parquet files for the Iceberg table. Each round of translation processes one topic partition at a time.

Under typical conditions, Iceberg translation has the following performance characteristics:

-   Throughput: Approximately 5 MiB/s per core.

-   Flush threshold: Controlled by [`datalake_translator_flush_bytes`](https://docs.redpanda.com/streaming/current/reference/properties/cluster-properties/#datalake_translator_flush_bytes) (default: 32 MiB). Each translation process uploads its on-disk data when accumulated data reaches this threshold. This is the primary control for Parquet file size.

-   Lag target: Controlled by [`iceberg_target_lag_ms`](https://docs.redpanda.com/streaming/current/reference/properties/cluster-properties/#iceberg_target_lag_ms) (default: 1 minute). Redpanda tries to commit all data produced to an Iceberg-enabled topic within this window.


The flush threshold and lag target together determine the size of the Parquet files written to object storage. Larger Parquet files generally improve downstream query performance by reducing the number of metadata operations query engines must perform.
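
To see how the two settings interact, the following back-of-the-envelope sketch estimates how much raw data one translator accumulates before it uploads. It assumes a simplified model that is not taken from Redpanda's implementation: data arrives at a steady rate, and an upload happens when either the flush threshold is reached or the lag window expires, whichever comes first. The function name and the sample ingest rate are hypothetical, and the resulting Parquet files are typically smaller than the raw input because of columnar encoding and compression.

```bash
# Simplified model (an assumption, not Redpanda's actual algorithm):
#   bytes per upload ~= min(flush_bytes, ingest_rate * lag_window)
# $1 = flush threshold in bytes
# $2 = lag target in milliseconds
# $3 = steady ingest rate for one translator, in bytes per second
estimate_upload_bytes() {
    by_lag=$(( $3 * $2 / 1000 ))
    if [ "$by_lag" -lt "$1" ]; then
        echo "$by_lag"
    else
        echo "$1"
    fi
}

# Defaults from this page: 32 MiB flush threshold, 1 minute lag target.
# Hypothetical ingest rate: 100 KiB/s (102400 bytes/s) for one translator.
estimate_upload_bytes 33554432 60000 102400    # lag-bound: 6144000 bytes (about 5.9 MiB)

# Raising only the flush threshold does not help at this rate.
estimate_upload_bytes 67108864 60000 102400    # still 6144000 bytes

# Raising the lag target to 5 minutes lets more data accumulate.
estimate_upload_bytes 67108864 300000 102400   # 30720000 bytes (about 29.3 MiB)
```

In this model, the flush threshold only limits file size once the lag window is long enough for data to reach it.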

### [](#tune-flush-threshold-and-lag-target)Tune flush threshold and lag target

Increase both the flush threshold and the lag target together to produce larger Parquet files with more records per file. This is the primary way to tune Iceberg translation performance.

1.  Increase `datalake_translator_flush_bytes` to control the size of Parquet files. A good starting value depends on your workload:

    ```bash
    rpk cluster config set datalake_translator_flush_bytes <bytes>
    ```

    For example, to set a 64 MiB flush threshold:

    ```bash
    rpk cluster config set datalake_translator_flush_bytes 67108864
    ```

2.  Increase `iceberg_target_lag_ms` to give translators more time to accumulate data before committing:

    ```bash
    rpk cluster config set iceberg_target_lag_ms 300000
    ```

    You can also set the lag target per topic using the [`redpanda.iceberg.target.lag.ms`](https://docs.redpanda.com/streaming/current/reference/properties/topic-properties/#redpanda-iceberg-target-lag-ms) topic property.

    > 📝 **NOTE**
    >
    > Increasing the lag target means Iceberg tables receive new data less frequently. Choose a lag value that balances file efficiency against how current your downstream data must be.


> 💡 **TIP**
>
> `datalake_translator_flush_bytes` and `iceberg_target_lag_ms` work best when tuned together. A high flush threshold combined with a short lag window may not improve file sizes if the lag window expires before enough data has accumulated.

To check the current values of key translation properties:

```bash
rpk cluster config get datalake_translator_flush_bytes
rpk cluster config get iceberg_target_lag_ms
```

To check topic-level overrides:

```bash
rpk topic describe <topic-name> -c
```
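
Both cluster properties take raw byte and millisecond values, which are easy to mistype. As a small convenience sketch, the hypothetical helpers `mib_to_bytes` and `minutes_to_ms` below convert human-friendly units and print the `rpk cluster config set` commands from the steps above without running them.

```bash
# Hypothetical helpers for unit conversion.
mib_to_bytes()  { echo $(( $1 * 1024 * 1024 )); }
minutes_to_ms() { echo $(( $1 * 60 * 1000 )); }

# Print the commands for a 64 MiB flush threshold and a 5 minute lag target.
echo "rpk cluster config set datalake_translator_flush_bytes $(mib_to_bytes 64)"
echo "rpk cluster config set iceberg_target_lag_ms $(minutes_to_ms 5)"
```

The printed values match the examples in the steps: 67108864 bytes and 300000 ms.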

### [](#optimize-message-size)Optimize message size

Redpanda has validated 32 MiB as the maximum recommended message size for Iceberg-enabled topics. With large messages, each Parquet file contains fewer records because the flush threshold is reached sooner. This can reduce the efficiency of analytical queries that need to scan many records.
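
The effect of message size on records per file can be approximated with simple division. The sketch below assumes, as a simplification, that every upload is triggered by the flush threshold and that all messages are the same size. The function name is hypothetical.

```bash
# Rough estimate (an assumption): records per upload ~= flush_bytes / message_bytes
# $1 = flush threshold in bytes
# $2 = average message size in bytes
records_per_upload() {
    echo $(( $1 / $2 ))
}

# Default 32 MiB flush threshold with 1 KiB messages.
records_per_upload 33554432 1024        # 32768
# Same threshold with 1 MiB messages.
records_per_upload 33554432 1048576     # 32
# Same threshold with 32 MiB messages (the maximum recommended size).
records_per_upload 33554432 33554432    # 1
```

At the maximum recommended message size and the default threshold, this model yields roughly one record per upload, which is the case where raising the threshold matters most.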

If query latency is a concern and your workload produces large messages, consider:

-   Reducing individual message sizes if your data model allows it.

-   Increasing `datalake_translator_flush_bytes` and `iceberg_target_lag_ms` to produce Parquet files with more records per file. See [Tune flush threshold and lag target](#tune-flush-threshold-and-lag-target).


### [](#size-clusters-for-iceberg-workloads)Size clusters for Iceberg workloads

When you enable Iceberg for any substantial workload and start translating topic data to the Iceberg format, you may see your cluster's CPU utilization increase. If this additional workload overwhelms the brokers and causes the Iceberg table lag to exceed the configured target lag, Redpanda automatically increases the scheduling priority of Iceberg translation to help it catch up with incoming data. However, this does not substitute for adequate cluster resources.

You may need to increase the size of your Redpanda cluster to accommodate the additional workload. To ensure that your cluster is sized appropriately, contact the Redpanda Customer Success team.
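
For a first-pass sanity check before that conversation, the sketch below divides a target ingest rate by the approximate 5 MiB/s per core translation throughput quoted earlier on this page. It is only a rough estimate that assumes throughput scales linearly with cores, and it says nothing about the CPU your cluster needs for its other work. The function name and the sample ingest rates are hypothetical.

```bash
# Rough estimate (an assumption): cores busy with translation
#   ~= ceil(ingest MiB/s / 5 MiB/s per core)
# $1 = sustained ingest rate into Iceberg-enabled topics, in MiB/s
translation_cores() {
    per_core=5
    echo $(( ($1 + per_core - 1) / per_core ))
}

translation_cores 40    # 8
translation_cores 42    # 9 (rounded up)
```

Treat the result as a lower bound for discussion, not as a sizing recommendation.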

### [](#monitor-translation-performance)Monitor translation performance

Use the following [Iceberg metrics](https://docs.redpanda.com/streaming/current/reference/public-metrics-reference/#iceberg-metrics) to understand whether translation is keeping pace with incoming data:

-   [`redpanda_iceberg_translation_raw_bytes_processed`](https://docs.redpanda.com/streaming/current/reference/public-metrics-reference/#redpanda_iceberg_translation_raw_bytes_processed): Total raw bytes consumed for translation input. Use this to monitor input throughput and compare against the expected 5 MiB/s per core baseline.

-   [`redpanda_iceberg_translation_parquet_bytes_added`](https://docs.redpanda.com/streaming/current/reference/public-metrics-reference/#redpanda_iceberg_translation_parquet_bytes_added): Total bytes written to Parquet files. Divide by `redpanda_iceberg_translation_files_created` to estimate the average file size produced by your workload.

-   [`redpanda_iceberg_translation_files_created`](https://docs.redpanda.com/streaming/current/reference/public-metrics-reference/#redpanda_iceberg_translation_files_created): Number of Parquet files created. A high file creation rate relative to bytes added indicates many small files. Consider increasing `datalake_translator_flush_bytes` and `iceberg_target_lag_ms`.

-   [`redpanda_iceberg_translation_parquet_rows_added`](https://docs.redpanda.com/streaming/current/reference/public-metrics-reference/#redpanda_iceberg_translation_parquet_rows_added): Total rows written to Parquet files. Useful for understanding record-level throughput.

-   [`redpanda_iceberg_translation_translations_finished`](https://docs.redpanda.com/streaming/current/reference/public-metrics-reference/#redpanda_iceberg_translation_translations_finished): Number of completed translator executions. A stalled or zero rate indicates that translation has stopped.


For metrics related to DLQ files, invalid records, and catalog commit failures, see [Troubleshooting metrics](https://docs.redpanda.com/streaming/current/manage/iceberg/iceberg-troubleshooting/#troubleshooting-metrics).
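
The average file size calculation described in the metrics list above can be scripted. The following sketch reads Prometheus-format metrics text on standard input, sums the two counters across all label sets, and prints the average bytes per Parquet file. The function name is hypothetical, the metric names are used exactly as written on this page, and the `curl` command in the closing comment assumes that the broker exposes its public metrics at `/public_metrics` on the Admin API port (9644 by default) on `localhost`.

```bash
# Hypothetical helper: average Parquet file size from Prometheus-format metrics text.
# Reads metrics on stdin and prints bytes per file (0 if no files were created).
avg_parquet_file_bytes() {
    awk '
        # Return 1 if the first field is metric m, with or without labels.
        function is_metric(field, m,    next_char) {
            if (index(field, m) != 1) return 0
            next_char = substr(field, length(m) + 1, 1)
            return (next_char == "" || next_char == "{")
        }
        /^#/ { next }    # skip HELP and TYPE comment lines
        is_metric($1, "redpanda_iceberg_translation_parquet_bytes_added") { bytes += $NF }
        is_metric($1, "redpanda_iceberg_translation_files_created")       { files += $NF }
        END {
            if (files > 0) printf "%.0f\n", bytes / files
            else print 0
        }
    '
}

# Example against a local broker (assumed endpoint):
#   curl -s http://localhost:9644/public_metrics | avg_parquet_file_bytes
```

Because both metrics are cumulative totals, the result is an average over the broker's whole uptime. To see the effect of a recent tuning change, compare two samples taken some time apart.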

> 💡 **TIP**
>
> If translation consistently lags despite available CPU headroom, the workload may be partition-bound. Each core translates its assigned partitions independently, so distributing data across more partitions allows more cores to contribute to translation and can improve total throughput.

## Suggested labs

-   [Redpanda Iceberg Docker Compose Example](https://docs.redpanda.com/labs/docker-compose/iceberg/)
-   [Iceberg Streaming on Kubernetes with Redpanda, MinIO, and Spark](https://docs.redpanda.com/labs/kubernetes/iceberg/)

[Search all labs](https://docs.redpanda.com/labs)