# About Iceberg Topics

> For the complete documentation index, see [llms.txt](https://docs.redpanda.com/llms.txt). Component-specific: [streaming-full.txt](https://docs.redpanda.com/streaming-full.txt)

---
title: About Iceberg Topics
latest-redpanda-tag: v25.3.11
latest-console-tag: v3.7.3
latest-operator-version: v26.1.4
# EOL = End-of-Life (support lifecycle status)
page-is-nearing-eol: "false"
page-is-past-eol: "false"
page-eol-date: November 19, 2026
latest-connect-version: 4.93.0
docname: iceberg/about-iceberg-topics
page-component-name: streaming
page-version: "25.3"
page-component-version: "25.3"
page-component-title: Streaming
page-relative-src-path: iceberg/about-iceberg-topics.adoc
page-edit-url: https://github.com/redpanda-data/docs/edit/v/25.3/modules/manage/pages/iceberg/about-iceberg-topics.adoc
description: Learn how Redpanda can integrate topics with Apache Iceberg.
page-git-created-date: "2025-04-08"
page-git-modified-date: "2026-03-03"
support-status: supported
---

<!-- Source: https://docs.redpanda.com/streaming/25.3/manage/iceberg/about-iceberg-topics.md -->

> 📝 **NOTE**
>
> This feature requires an [enterprise license](https://docs.redpanda.com/streaming/25.3/get-started/licensing/). To get a trial license key or extend your trial period, [generate a new trial license key](https://redpanda.com/try-enterprise). To purchase a license, contact [Redpanda Sales](https://redpanda.com/upgrade).
>
> If Redpanda has enterprise features enabled and it cannot find a valid license, [restrictions](https://docs.redpanda.com/streaming/25.3/get-started/licensing/#self-managed) apply.

The Apache Iceberg integration for Redpanda allows you to store topic data in the cloud in the Iceberg open table format. This makes your streaming data immediately available in downstream analytical systems, including data warehouses like Snowflake, Databricks, ClickHouse, and Redshift, without setting up and maintaining additional ETL pipelines. You can also integrate your data directly into commonly used big data processing frameworks, such as Apache Spark and Flink, which standardizes and simplifies the consumption of streams as tables in a wide variety of data analytics pipelines.

Redpanda supports [version 2](https://iceberg.apache.org/spec/#format-versioning) of the Iceberg table format.

## [](#iceberg-concepts)Iceberg concepts

[Apache Iceberg](https://iceberg.apache.org) is an open source format specification for defining structured tables in a data lake. The table format lets you quickly and easily manage, query, and process huge amounts of structured and unstructured data. This is similar to the way you would manage and run SQL queries against relational data in a database or data warehouse. The open format lets you use many different languages, tools, and applications to process the same data in a consistent way, so you can avoid vendor lock-in. This data management system is also known as a _data lakehouse_.

In the Iceberg specification, tables consist of the following layers:

-   **Data layer**: Stores the data in data files. The Iceberg integration currently supports the Parquet file format. Parquet files are column-based and suitable for analytical workloads at scale. They come with compression capabilities that optimize files for object storage.

-   **Metadata layer**: Stores table metadata separately from data files. The metadata layer allows multiple writers to stage metadata changes and apply updates atomically. It also supports table snapshots and time travel queries, which query the table as it existed at a previous point in time.

    -   Manifest files: Track data files and contain metadata about these files, such as record count, partition membership, and file paths.

    -   Manifest list: Tracks all the manifest files belonging to a table, including file paths and upper and lower bounds for partition fields.

    -   Metadata file: Stores metadata about the table, including its schema, partition information, and snapshots. Whenever a change is made to the table, a new metadata file is created and becomes the latest version of the metadata in the catalog.


    For Iceberg-enabled topics, the manifest files are in JSON format.

-   **Catalog**: Contains the current metadata pointer for the table. Clients reading and writing data to the table see the same version of the current state of the table. The Iceberg integration supports two [catalog integration](https://docs.redpanda.com/streaming/25.3/manage/iceberg/use-iceberg-catalogs/) types. You can configure Redpanda to catalog files stored in the same object storage bucket or container where the Iceberg data files are located, or you can configure Redpanda to use an [Iceberg REST catalog](https://iceberg.apache.org/terms/#decoupling-using-the-rest-catalog) endpoint to update an externally-managed catalog when there are changes to the Iceberg data and metadata.
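
The relationship between these layers can be sketched as a small data model. The following is only a conceptual illustration, not the on-disk Iceberg format or anything Redpanda exposes: every class, field, and file path here is a hypothetical name chosen for the example, and the manifest list is simplified to an in-memory list.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ManifestFile:
    """Tracks a group of data files (Parquet files in the data layer)."""
    data_file_paths: list


@dataclass
class Snapshot:
    """A snapshot points at a manifest list, modeled here as a plain list."""
    snapshot_id: int
    manifest_list: list


@dataclass
class MetadataFile:
    """Table metadata: the schema plus every snapshot taken so far."""
    schema: dict
    snapshots: list = field(default_factory=list)
    current_snapshot_id: Optional[int] = None


@dataclass
class Catalog:
    """Maps a table name to its current metadata file (the metadata pointer)."""
    tables: dict = field(default_factory=dict)


def data_files(catalog: Catalog, table: str, snapshot_id: Optional[int] = None) -> list:
    """Resolve catalog -> metadata file -> snapshot -> manifests -> data files.

    Passing an older snapshot_id mimics a time travel query.
    """
    metadata = catalog.tables[table]
    wanted = metadata.current_snapshot_id if snapshot_id is None else snapshot_id
    snapshot = next(s for s in metadata.snapshots if s.snapshot_id == wanted)
    return [path for manifest in snapshot.manifest_list for path in manifest.data_file_paths]


# Hypothetical table with two snapshots; the second adds one Parquet file.
first = ManifestFile(["data/part-0.parquet"])
second = ManifestFile(["data/part-1.parquet"])
metadata = MetadataFile(
    schema={"user_id": "int", "event_type": "string"},
    snapshots=[Snapshot(1, [first]), Snapshot(2, [first, second])],
    current_snapshot_id=2,
)
catalog = Catalog({"demo-topic": metadata})

print(data_files(catalog, "demo-topic"))     # current state of the table
print(data_files(catalog, "demo-topic", 1))  # table as of snapshot 1
```

Because readers always start from the catalog's pointer, swapping that pointer to a new metadata file is what makes a table update appear atomically.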


![Redpanda’s Iceberg integration](https://docs.redpanda.com/streaming/25.3/shared/_images/iceberg-integration-optimized.png)

When you enable the Iceberg integration for a Redpanda topic, Redpanda brokers store streaming data in the Iceberg-compatible format in Parquet files in object storage, in addition to the log segments uploaded using Tiered Storage. Storing the streaming data in Iceberg tables in the cloud allows you to derive real-time insights through many compatible data lakehouse, data engineering, and business intelligence [tools](https://iceberg.apache.org/vendors/).

## [](#prerequisites)Prerequisites

To enable Iceberg for Redpanda topics, you must have the following:

-   **rpk**: See [Install or Update rpk](https://docs.redpanda.com/streaming/25.3/get-started/rpk-install/).

-   **Enterprise license**: To check if you already have a license key applied to your cluster:

    ```bash
    rpk cluster license info
    ```

-   **Tiered Storage**: Enable [Tiered Storage](https://docs.redpanda.com/streaming/25.3/manage/tiered-storage/#set-up-tiered-storage) for the topics for which you want to generate Iceberg tables.


## [](#limitations)Limitations

-   You cannot append topic data to an existing Iceberg table that was not created by Redpanda.

-   If you enable the Iceberg integration on an existing Redpanda topic, Redpanda does not backfill the generated Iceberg table with topic data.

-   JSON schemas are supported starting with Redpanda version 25.2.


## [](#enable-iceberg-integration)Enable Iceberg integration

To create an Iceberg table for a Redpanda topic, you must set the cluster configuration property [`iceberg_enabled`](https://docs.redpanda.com/streaming/25.3/reference/properties/cluster-properties/#iceberg_enabled) to `true`, and also configure the topic property [`redpanda.iceberg.mode`](https://docs.redpanda.com/streaming/25.3/reference/properties/topic-properties/#redpanda-iceberg-mode). You can choose to provide a schema if you need the Iceberg table to be structured with defined columns.

1.  Set the `iceberg_enabled` configuration option on your cluster to `true`.

    ```bash
    rpk cluster config set iceberg_enabled true
    ```

    ```bash
    Successfully updated configuration. New configuration version is 2.
    ```

    You must restart your cluster if you change this configuration for a running cluster.

2.  (Optional) Create a new topic.

    ```bash
    rpk topic create <new-topic-name>
    ```

    ```bash
    TOPIC              STATUS
    <new-topic-name>   OK
    ```

3.  Configure `redpanda.iceberg.mode` for the topic. You can choose one of the following [Iceberg modes](https://docs.redpanda.com/streaming/25.3/manage/iceberg/specify-iceberg-schema/):

    -   `key_value`: Creates an Iceberg table using a simple schema, consisting of two columns, one for the record metadata including the key, and another binary column for the record’s value.

    -   `value_schema_id_prefix`: Creates an Iceberg table whose structure matches the Redpanda schema for this topic, with columns corresponding to each field. You must register a schema in the Schema Registry (see next step), and producers must write to the topic using the Schema Registry wire format.

    -   `value_schema_latest`: Creates an Iceberg table whose structure matches the latest schema registered for the subject in the Schema Registry.

    -   `disabled` (default): Disables writing to an Iceberg table for this topic.


    ```bash
    rpk topic alter-config <new-topic-name> --set redpanda.iceberg.mode=<topic-iceberg-mode>
    ```

    ```bash
    TOPIC              STATUS
    <new-topic-name>   OK
    ```

4.  Register a schema for the topic. This step is required for the `value_schema_id_prefix` and `value_schema_latest` modes.

    ```bash
    rpk registry schema create <subject-name> --schema </path-to-schema> --type <format>
    ```

    ```bash
    SUBJECT          VERSION   ID   TYPE
    <subject-name>   1         1    PROTOBUF
    ```
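
In `value_schema_id_prefix` mode, each record value must begin with the Schema Registry wire format header. The following is a minimal sketch of that framing, assuming the commonly used layout of one magic byte (`0x00`) followed by a 4-byte big-endian schema ID. The `frame` and `unframe` helpers and the placeholder payload are hypothetical, and Protobuf payloads carry additional message-index bytes after the schema ID that this sketch does not handle.

```python
import struct

MAGIC_BYTE = 0  # first byte of the Schema Registry wire format


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix a serialized payload with the magic byte and a 4-byte big-endian schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes) -> tuple:
    """Split a framed message into (schema_id, payload); raise if the header is invalid."""
    if len(message) < 5 or message[0] != MAGIC_BYTE:
        raise ValueError("not in Schema Registry wire format")
    (schema_id,) = struct.unpack(">I", message[1:5])
    return schema_id, message[5:]


# Placeholder bytes stand in for a record serialized with the registered schema.
framed = frame(1, b"<serialized record>")
print(framed[:5].hex())  # 0000000001
print(unframe(framed))
```

Client serializers that integrate with the Schema Registry typically add this header for you, so you rarely need to build it by hand.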


As you produce records to the topic, the data also becomes available in object storage for Iceberg-compatible clients to consume. You can use the same analytical tools to [read the Iceberg topic data](https://docs.redpanda.com/streaming/25.3/manage/iceberg/query-iceberg-topics/) in a data lake as you would for a relational database.

See also: [Schema types translation](https://docs.redpanda.com/streaming/25.3/manage/iceberg/specify-iceberg-schema/#schema-types-translation).

### [](#iceberg-data-retention)Iceberg data retention

Data in an Iceberg-enabled topic is consumable from Kafka based on the configured [topic retention policy](https://docs.redpanda.com/streaming/25.3/manage/cluster-maintenance/disk-utilization/). Conversely, data written to Iceberg remains queryable as Iceberg tables indefinitely. The Iceberg table persists unless you:

-   Delete the Redpanda topic associated with the Iceberg table. This is the default behavior set by the [`iceberg_delete`](https://docs.redpanda.com/streaming/25.3/reference/properties/cluster-properties/#iceberg_delete) cluster property and the `redpanda.iceberg.delete` topic property. If you set this property to `false`, the Iceberg table remains even after you delete the topic.

-   Explicitly delete data from the Iceberg table using a query engine.

-   Disable the Iceberg integration for the topic and delete the Parquet files in object storage.


The DLQ table (`<topic-name>~dlq`) follows the same persistence rules as the main Iceberg table.

## [](#schema-evolution)Schema evolution

Redpanda supports schema evolution in accordance with the [Iceberg specification](https://iceberg.apache.org/spec/#schema-evolution). Permitted schema evolutions include reordering fields and promoting field types. When you update the schema in Schema Registry, Redpanda automatically updates the Iceberg table schema to match the new schema.

For example, if you produce records to a topic `demo-topic` with the following Avro schema:

schema\_1.avsc

```avro
{
  "type": "record",
  "name": "ClickEvent",
  "fields": [
    {
      "name": "user_id",
      "type": "int"
    },
    {
      "name": "event_type",
      "type": "string"
    }
  ]
}
```

```bash
rpk registry schema create demo-topic-value --schema schema_1.avsc

echo '{"user_id":23, "event_type":"BUTTON_CLICK"}' | rpk topic produce demo-topic --format='%v\n' --schema-id=topic
```

Then, you update the schema to add a new field `ts`, and produce records with the updated schema:

schema\_2.avsc

```avro
{
  "type": "record",
  "name": "ClickEvent",
  "fields": [
    {
      "name": "user_id",
      "type": "int"
    },
    {
      "name": "event_type",
      "type": "string"
    },
    {
      "name": "ts",
      "type": [
          "null",
          { "type": "long", "logicalType": "timestamp-millis" }
        ],
      "default": null
    }
  ]
}
```

The `ts` field can be either null or a long representing epoch milliseconds. The default value is null.

```bash
rpk registry schema create demo-topic-value --schema schema_2.avsc

echo '{"user_id":858, "event_type":"BUTTON_CLICK", "ts":1737998723230}' | rpk topic produce demo-topic --format='%v\n' --schema-id=topic
```

Querying the Iceberg table for `demo-topic` includes the new column `ts`:

```bash
+---------+--------------+--------------------------+
| user_id | event_type   | ts                       |
+---------+--------------+--------------------------+
| 858     | BUTTON_CLICK | 2025-01-27T17:25:23.230Z |
| 23      | BUTTON_CLICK | NULL                     |
+---------+--------------+--------------------------+
```
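
Under Avro's schema resolution rules, a field added to a schema needs a default so that records written with the older schema can still be read. That is why the `user_id` 23 row shows `NULL` for `ts`. The following is a minimal sketch of a pre-registration check for that one rule. `fields_missing_defaults` is a hypothetical helper, not part of Redpanda or the Schema Registry, and it does not replace a full compatibility check.

```python
def fields_missing_defaults(old_schema: dict, new_schema: dict) -> list:
    """Return the names of fields added in new_schema that declare no default."""
    old_names = {f["name"] for f in old_schema["fields"]}
    return [
        f["name"]
        for f in new_schema["fields"]
        if f["name"] not in old_names and "default" not in f
    ]


# The two schemas from the example above, expressed as Python dictionaries.
schema_1 = {
    "type": "record",
    "name": "ClickEvent",
    "fields": [
        {"name": "user_id", "type": "int"},
        {"name": "event_type", "type": "string"},
    ],
}
ts_type = ["null", {"type": "long", "logicalType": "timestamp-millis"}]
schema_2 = {
    **schema_1,
    "fields": schema_1["fields"] + [{"name": "ts", "type": ts_type, "default": None}],
}

print(fields_missing_defaults(schema_1, schema_2))  # [] because ts has a default
```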

## [](#troubleshoot-errors)Troubleshoot errors

If Redpanda encounters an error while writing a record to the Iceberg table, Redpanda by default writes the record to a separate dead-letter queue (DLQ) Iceberg table named `<topic-name>~dlq`. The following can cause errors to occur when translating records in the `value_schema_id_prefix` and `value_schema_latest` modes to the Iceberg table format:

-   Redpanda cannot find the embedded schema ID in the Schema Registry.

-   Redpanda fails to translate one or more schema data types to an Iceberg type.

-   In `value_schema_id_prefix` mode, you do not use the Schema Registry wire format with the magic byte.


The DLQ table itself uses the `key_value` schema, consisting of two columns: the record metadata including the key, and a binary column for the record’s value.

> 📝 **NOTE**
>
> Topic property misconfiguration, such as [overriding the default behavior of `value_schema_latest` mode](https://docs.redpanda.com/streaming/25.3/manage/iceberg/specify-iceberg-schema/#override-value-schema-latest-default) but not specifying the fully qualified Protobuf message name, does not cause records to be written to the DLQ table. Instead, Redpanda pauses the topic data translation to the Iceberg table until you fix the misconfiguration.

### [](#inspect-dlq-table)Inspect DLQ table

You can inspect the DLQ table for records that failed to write to the Iceberg table, and you can take further action on these records, such as transforming and reprocessing them, or debugging issues that occurred upstream.

The following example produces a record to a topic named `ClickEvent` and does not use the Schema Registry wire format that includes the magic byte and schema ID:

```bash
echo '"key1" {"user_id":2324,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:23:59.380Z"}' | rpk topic produce ClickEvent --format='%k %v\n'
```

Querying the DLQ table returns the record that was not translated:

```sql
SELECT
    value
FROM <catalog-name>."ClickEvent~dlq"; -- Fully qualified table name
```

```bash
+-------------------------------------------------+
| value                                           |
+-------------------------------------------------+
| 7b 22 75 73 65 72 5f 69 64 22 3a 32 33 32 34 2c |
| 22 65 76 65 6e 74 5f 74 79 70 65 22 3a 22 42 55 |
| 54 54 4f 4e 5f 43 4c 49 43 4b 22 2c 22 74 73 22 |
| 3a 22 32 30 32 34 2d 31 31 2d 32 35 54 32 30 3a |
| 32 33 3a 35 39 2e 33 38 30 5a 22 7d             |
+-------------------------------------------------+
```

The value is stored as binary. Its first byte is `0x7b` (the `{` character) rather than the `0x00` magic byte, which indicates that the record was not produced in the Schema Registry wire format.
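
To confirm what a DLQ value contains, you can decode it outside the query engine. The following sketch copies the hex rows from the query output above, checks the first byte, and parses the bytes as UTF-8 JSON. It assumes that you have already exported the value as a hex string. How you do that depends on your query engine.

```python
import json

# Hex rows copied from the DLQ query output above.
hex_rows = [
    "7b 22 75 73 65 72 5f 69 64 22 3a 32 33 32 34 2c",
    "22 65 76 65 6e 74 5f 74 79 70 65 22 3a 22 42 55",
    "54 54 4f 4e 5f 43 4c 49 43 4b 22 2c 22 74 73 22",
    "3a 22 32 30 32 34 2d 31 31 2d 32 35 54 32 30 3a",
    "32 33 3a 35 39 2e 33 38 30 5a 22 7d",
]

value = bytes.fromhex(" ".join(hex_rows))  # spaces between byte pairs are ignored

# A value in the Schema Registry wire format would start with the 0x00 magic byte.
has_magic_byte = value[0] == 0x00
record = json.loads(value.decode("utf-8"))

print(has_magic_byte)
print(record)
```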

### [](#reprocess-dlq-records)Reprocess DLQ records

You can transform a DLQ record in your data lakehouse and write it to the original Iceberg table. In this case, the value is JSON encoded as UTF-8 bytes. Depending on your query engine, you might need to decode the binary value before extracting the JSON fields, although some engines decode it automatically.

ClickHouse SQL example to reprocess DLQ record

```sql
SELECT
    CAST(JSONExtractInt(json, 'user_id') AS Int32) AS user_id, -- user_id is a JSON number
    JSONExtractString(json, 'event_type') AS event_type,
    JSONExtractString(json, 'ts') AS ts
FROM (
    SELECT
        CAST(value AS String) AS json
    FROM <catalog-name>.`ClickEvent~dlq` -- Ensure that the table name is properly parsed
);
```

```bash
+---------+--------------+--------------------------+
| user_id | event_type   | ts                       |
+---------+--------------+--------------------------+
|    2324 | BUTTON_CLICK | 2024-11-25T20:23:59.380Z |
+---------+--------------+--------------------------+
```

You can now insert the transformed record back into the main Iceberg table. Redpanda recommends employing a strategy for exactly-once processing to avoid duplicates when reprocessing records.
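
One common way to avoid duplicates is idempotent reprocessing: remember which source records have already been applied, and skip them on a rerun. The following is a minimal in-memory sketch of that idea, not a Redpanda feature. It assumes that each DLQ row can be identified by its partition and offset from the record metadata. The `reprocess` function, the row layout, and the `insert` callback are all hypothetical.

```python
from typing import Callable, Iterable


def reprocess(
    dlq_rows: Iterable[dict],
    already_applied: set,
    insert: Callable[[dict], None],
) -> int:
    """Insert each DLQ row at most once, keyed by (partition, offset)."""
    inserted = 0
    for row in dlq_rows:
        key = (row["partition"], row["offset"])
        if key in already_applied:
            continue  # a previous run already handled this record
        insert(row["record"])
        already_applied.add(key)
        inserted += 1
    return inserted


# Hypothetical row: the transformed record plus its source position.
rows = [
    {"partition": 0, "offset": 7, "record": {"user_id": 2324, "event_type": "BUTTON_CLICK"}},
]
applied: set = set()
main_table: list = []  # stands in for the main Iceberg table

print(reprocess(rows, applied, main_table.append))  # first run inserts the record
print(reprocess(rows, applied, main_table.append))  # rerun inserts nothing
```

In a real pipeline, the set of applied keys must be stored durably and updated in the same transaction as the insert. Otherwise a crash between the two steps can still produce a duplicate.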

### [](#drop-invalid-records)Drop invalid records

To disable the default behavior and drop an invalid record, set the [`redpanda.iceberg.invalid.record.action`](https://docs.redpanda.com/streaming/25.3/reference/properties/topic-properties/#redpanda-iceberg-invalid-record-action) topic property to `drop`. You can also configure the default cluster-wide behavior for invalid records by setting the `iceberg_invalid_record_action` property.

## [](#performance-considerations)Performance considerations

When you enable Iceberg for any substantial workload and start translating topic data to the Iceberg format, you may see CPU utilization increase across your cluster. If this additional workload overwhelms the brokers and causes the Iceberg table lag to exceed the configured target lag, Redpanda automatically applies backpressure to producers to prevent Iceberg tables from lagging further. This ensures that Iceberg tables keep up with the volume of incoming data, but sacrifices ingress throughput of the cluster.

You may need to increase the size of your Redpanda cluster to accommodate the additional workload. To ensure that your cluster is sized appropriately, contact the Redpanda Customer Success team.

### [](#use-custom-partitioning)Use custom partitioning

To improve query performance, consider implementing custom [partitioning](https://iceberg.apache.org/docs/nightly/partitioning/) for the Iceberg topic. Use the [`redpanda.iceberg.partition.spec`](https://docs.redpanda.com/streaming/25.3/reference/properties/topic-properties/#redpanda-iceberg-partition-spec) topic property to define the partitioning scheme:

```bash
# Create new topic with five topic partitions, replication factor 3, and custom table partitioning for Iceberg
rpk topic create <new-topic-name> -p5 -r3 -c redpanda.iceberg.mode=value_schema_id_prefix -c "redpanda.iceberg.partition.spec=(<partition-key1>, <partition-key2>, ...)"
```

Valid `<partition-key>` values include a source column name or a transformation of a column. The columns referenced can be Redpanda-defined (such as `redpanda.timestamp`) or user-defined based on a schema that you register for the topic. Based on this specification, the Iceberg table stores records with different partition key values in separate files.

For example:

-   To partition the table by a single key, such as a column `col1`, use: `redpanda.iceberg.partition.spec=(col1)`.

-   To partition by multiple columns, use a comma-separated list: `redpanda.iceberg.partition.spec=(col1, col2)`.

-   To partition by the year of a timestamp column `ts1`, and a string column `col1`, use: `redpanda.iceberg.partition.spec=(year(ts1), col1)`.


To learn more about how partitioning schemes can affect query performance, and for details on the partitioning specification such as allowed transforms, see the [Apache Iceberg documentation](https://iceberg.apache.org/spec/#partitioning).
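
To see how a specification such as `(year(ts1), col1)` separates records, the following sketch groups a few hypothetical records by the same key. It is a simplified model for illustration only: it uses the calendar year in UTC, while Iceberg's `year` transform has its own internal representation, and it assumes that `ts1` holds epoch milliseconds.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(record: dict) -> tuple:
    """Mimic the spec (year(ts1), col1): calendar year of ts1 (epoch ms, UTC) plus col1."""
    year = datetime.fromtimestamp(record["ts1"] / 1000, tz=timezone.utc).year
    return (year, record["col1"])


def group_by_partition(records: list) -> dict:
    """Group records that share a partition key, as they would share data files."""
    groups = defaultdict(list)
    for record in records:
        groups[partition_key(record)].append(record)
    return dict(groups)


# Hypothetical records: one from 2023 and three from 2025.
records = [
    {"ts1": 1700000000000, "col1": "a"},
    {"ts1": 1737998723230, "col1": "a"},
    {"ts1": 1740000000000, "col1": "a"},
    {"ts1": 1740000000000, "col1": "b"},
]

groups = group_by_partition(records)
for key in sorted(groups):
    print(key, len(groups[key]))
```

A query that filters on the year or on `col1` only needs to read the files for the matching groups, which is the source of the performance benefit.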

> 💡 **TIP**
>
> -   Partition by columns that you frequently use in queries. Columns with relatively few unique values (low cardinality) are also good candidates for partitioning.
>
> -   If you must partition based on columns with high cardinality, for example timestamps, use Iceberg’s available transforms such as extracting the year, month, or day to avoid creating too many partitions. Too many partitions can be detrimental to performance because more files need to be scanned and managed.

### [](#avoid-high-column-count)Avoid high column count

A high column count or schema field count results in more overhead when translating topics to the Iceberg table format. Small message sizes can also increase CPU utilization. To minimize the performance impact on your cluster, keep to a low column count and large message size for Iceberg topics.

## [](#next-steps)Next steps

-   [Use Iceberg Catalogs](https://docs.redpanda.com/streaming/25.3/manage/iceberg/use-iceberg-catalogs/)

-   [Migrate existing Iceberg integrations to Iceberg Topics](https://docs.redpanda.com/streaming/25.3/manage/iceberg/migrate-to-iceberg-topics/)


## [](#suggested-reading)Suggested reading

-   [Server-Side Schema ID Validation](https://docs.redpanda.com/streaming/25.3/manage/schema-reg/schema-id-validation/)

-   [Understanding Apache Kafka Schema Registry](https://www.redpanda.com/blog/schema-registry-kafka-streaming#how-does-serialization-work-with-schema-registry-in-kafka)


## Suggested labs

-   [Redpanda Iceberg Docker Compose Example](https://docs.redpanda.com/labs/docker-compose/iceberg/)
-   [Iceberg Streaming on Kubernetes with Redpanda, MinIO, and Spark](https://docs.redpanda.com/labs/kubernetes/iceberg/)

[Search all labs](https://docs.redpanda.com/labs)