# Migrate to Iceberg Topics

> For the complete documentation index, see [llms.txt](https://docs.redpanda.com/llms.txt). Component-specific: [streaming-full.txt](https://docs.redpanda.com/streaming-full.txt)

---
title: Migrate to Iceberg Topics
latest-redpanda-tag: v26.1.9
latest-console-tag: v3.7.3
latest-operator-version: v26.1.4
# EOL = End-of-Life (support lifecycle status)
page-is-nearing-eol: "false"
page-is-past-eol: "false"
page-eol-date: March 31, 2027
latest-connect-version: 4.93.0
docname: iceberg/migrate-to-iceberg-topics
page-component-name: streaming
page-version: "26.1"
page-component-version: "26.1"
page-component-title: Streaming
page-relative-src-path: iceberg/migrate-to-iceberg-topics.adoc
page-edit-url: https://github.com/redpanda-data/docs/edit/main/modules/manage/pages/iceberg/migrate-to-iceberg-topics.adoc
description: Migrate existing Iceberg integrations to Redpanda Iceberg topics.
page-topic-type: how-to
learning-objective-1: Compare external Iceberg integrations with Iceberg Topics architectures
learning-objective-2: Implement data merge strategies using SQL patterns
learning-objective-3: Execute validation checks and perform cutover procedures
page-git-created-date: "2026-02-28"
page-git-modified-date: "2026-05-26"
support-status: supported
---

<!-- Source: https://docs.redpanda.com/streaming/current/manage/iceberg/migrate-to-iceberg-topics.md -->

Migrate existing Iceberg pipelines to Redpanda Iceberg topics to simplify your architecture and reduce operational overhead.

After reading this page, you will be able to:

-   Compare external Iceberg integrations with Iceberg Topics architectures

-   Implement data merge strategies using SQL patterns

-   Execute validation checks and perform cutover procedures


## [](#why-migrate-to-iceberg-topics)Why migrate to Iceberg Topics

Redpanda’s built-in Iceberg-enabled topics offer a simpler alternative to external Iceberg integrations for writing streaming data to Iceberg tables.

> 📝 **NOTE**
>
> This page focuses on migrating from Kafka Connect Iceberg Sink. The migration patterns and SQL examples can be adapted for other Iceberg sources such as Apache Flink or Spark.

### [](#kafka-connect-iceberg-sink-comparison)Kafka Connect Iceberg Sink comparison

The following table compares Kafka Connect Iceberg Sink with Redpanda Iceberg Topics:

| Aspect | Kafka Connect Iceberg Sink | Iceberg Topics |
| --- | --- | --- |
| Infrastructure | Requires external Kafka Connect cluster | Built into Redpanda brokers |
| Dependencies | Separate service to manage | No external dependencies |
| Setup time | Medium (deploy connector) | Fast (enable topic property and post schema) |

## [](#prerequisites)Prerequisites

To migrate from an existing Iceberg integration to Iceberg Topics, you must have:

-   [Tiered Storage](https://docs.redpanda.com/streaming/current/manage/tiered-storage/) enabled.

-   [Iceberg Topics](https://docs.redpanda.com/streaming/current/manage/iceberg/about-iceberg-topics/) enabled on your Redpanda cluster.

-   Understanding of your current schema format (Avro, Protobuf, or JSON Schema).

-   For Kafka Connect migrations, knowledge of your Kafka Connect configuration, especially if using `iceberg.tables.route-field` for multi-table routing.

    -   If migrating multi-table fan-out patterns, [data transforms](https://docs.redpanda.com/streaming/current/develop/data-transforms/how-transforms-work/) enabled on your cluster.


-   Access to both source and target (Iceberg Topics) tables in your query engine.

-   Query engine access (Snowflake, Databricks, ClickHouse, or Spark) for data merging.


## [](#migration-steps)Migration steps

Redpanda recommends following a phased approach to ensure data consistency and minimize risk:

1.  Enable Iceberg on target topics and verify new data flows.

2.  Run both systems concurrently during transition.

3.  Choose a strategy to combine historical and new data.

4.  Verify data completeness and accuracy.

5.  Disable the external Iceberg integration.


> ❗ **IMPORTANT**
>
> Iceberg Topics cannot append to existing Iceberg tables that are not created by Redpanda. You must create new Iceberg tables and merge historical data separately.

### [](#enable-iceberg-topics)Enable Iceberg Topics

For simple migrations (one topic mapping to one Iceberg table), enable the Iceberg integration for your Redpanda topics.

1.  Set the `iceberg_enabled` configuration property on your cluster to `true`:

    ```bash
    rpk cluster config set iceberg_enabled true
    ```

    If you change this configuration for a running cluster, restart the cluster.

2.  Configure the `redpanda.iceberg.mode` property for the topic:

    ```bash
    rpk topic alter-config <topic-name> --set redpanda.iceberg.mode=<mode>
    ```

    Choose the mode based on your message format and schema configuration. For Kafka Connect migrations, use this mapping:

    | Kafka Connect Converter | Recommended Iceberg Mode |
    | --- | --- |
    | io.confluent.connect.avro.AvroConverter | value_schema_id_prefix (messages already use Schema Registry wire format) |
    | io.confluent.connect.protobuf.ProtobufConverter | value_schema_id_prefix (messages already use Schema Registry wire format) |
    | org.apache.kafka.connect.json.JsonConverter with schemas | value_schema_latest (Schema Registry resolves schema automatically) |
    | org.apache.kafka.connect.json.JsonConverter with embedded schemas | key_value (schema included with each message) |

    See [Specify Iceberg Schema](https://docs.redpanda.com/streaming/current/manage/iceberg/specify-iceberg-schema/) to learn more about the different Iceberg modes.

3.  If using `value_schema_id_prefix` or `value_schema_latest` modes, register a schema for the topic:

    ```bash
    rpk registry schema create <topic-name>-value --schema <path-to-schema> --type <format>
    ```

    > ❗ **IMPORTANT**
    >
    > If using the `value_schema_id_prefix` mode, schema subjects must use the `<topic-name>-value` [naming convention](https://docs.redpanda.com/streaming/current/manage/schema-reg/schema-id-validation/#set-subject-name-strategy-per-topic) (TopicNameStrategy).

    Note the schema ID returned, in case you need it for troubleshooting.

4.  Verify that new records are being written to the Iceberg table:

    -   Check that data appears in your query engine.

    -   Validate that the schema translation is correct.

    -   Confirm record counts are increasing.



#### [](#multi-table-fan-out-pattern)Multi-table fan-out pattern

If your existing integration routes records to multiple Iceberg tables based on a field value (for example, Kafka Connect’s `iceberg.tables.route-field` property), you need to implement equivalent routing logic. You create separate Iceberg-enabled topics for each target table, and Redpanda automatically creates corresponding Iceberg tables. Use either of the following approaches to route records to the correct topic:

##### [](#option-1-data-transforms-with-separate-topics-recommended)Option 1: Data transforms with separate topics (recommended)

Use a data transform to read the routing field from each message and write records to separate Iceberg-enabled topics. This approach keeps routing logic within Redpanda and avoids external dependencies. When using Iceberg modes that require schema validation, the transform can register schemas dynamically and encode messages with the appropriate format.

1.  Enable data transforms on your cluster:

    ```bash
    rpk cluster config set data_transforms_enabled true
    ```

2.  Create output topics and enable Iceberg with Schema Registry validation:

    ```bash
    rpk topic create <output-topic-1> <output-topic-2> <output-topic-3>
    rpk topic alter-config <output-topic-1> --set redpanda.iceberg.mode=value_schema_id_prefix
    rpk topic alter-config <output-topic-2> --set redpanda.iceberg.mode=value_schema_id_prefix
    rpk topic alter-config <output-topic-3> --set redpanda.iceberg.mode=value_schema_id_prefix
    ```

3.  Implement a transform function that:

    1.  Reads the routing field from each input message.

    2.  If using Schema Registry validation, registers schemas dynamically and encodes messages with the appropriate format.

    3.  Writes to a specific output topic based on the routing field.


4.  Deploy the transform, specifying multiple output topics:

    ```bash
    rpk transform deploy \
      --file transform.wasm \
      --name <transform-name> \
      --input-topic <input-topic> \
      --output-topic <output-topic-1> \
      --output-topic <output-topic-2> \
      --output-topic <output-topic-3>
    ```

5.  Validate the fanout by checking that each output topic receives the correct records.


For a complete implementation example with dynamic schema registration, see [Multi-topic fan-out with Schema Registry](https://docs.redpanda.com/streaming/current/develop/data-transforms/build/#multi-topic-fanout). The example demonstrates Schema Registry wire format encoding for use with `value_schema_id_prefix` mode.

##### [](#option-2-external-stream-processor)Option 2: External stream processor

Use an external stream processor for complex routing logic:

1.  Use a stream processor ([Redpanda Connect](https://docs.redpanda.com/connect/home/) or Flink) to split records.

2.  Write to separate Iceberg-enabled topics.


This approach is more complex but offers more flexibility for advanced routing requirements not supported by data transforms.

### [](#validate-schema-registry-integration)Validate Schema Registry integration

If using [`value_schema_id_prefix`](https://docs.redpanda.com/streaming/current/manage/iceberg/specify-iceberg-schema/#value_schema_id_prefix) mode, verify that messages use the Schema Registry [wire format](https://docs.redpanda.com/streaming/current/manage/schema-reg/schema-reg-overview/#wire-format).

```bash
rpk topic consume <topic-name> --num=1 --format='%v\n' | xxd | head -n 1
```

If the first byte is not `00` (magic byte), you must configure your producer to use the wire format.

The `value_schema_id_prefix` mode also requires that schema subjects follow the TopicNameStrategy: `<topic-name>-value`. Verify your schemas use the correct naming:

```bash
rpk registry schema list
```

#### [](#verify-no-records-in-dlq)Verify no records in DLQ

Check that no records failed validation and were written to the dead-letter queue. If records are present, see [Records in DLQ table](#records-in-dlq-table) for resolution steps.

```sql
SELECT COUNT(*) FROM <catalog>."<topic-name>~dlq";
```

### [](#run-systems-in-parallel)Run systems in parallel

Keep your existing Iceberg integration running while Iceberg Topics is enabled. This provides a safety net during the transition period:

-   New data flows to both the source tables and new Iceberg Topics tables.

-   You can validate data consistency between both systems.

-   You have a fallback option if issues arise.


Run a query to compare record counts between systems:

```sql
-- Source table
SELECT COUNT(*) AS source_count FROM <catalog>.<source-table>;

-- Iceberg Topics table
SELECT COUNT(*) AS iceberg_topics_count FROM <catalog>.<iceberg-topics-table>;
```

Record counts should increase at similar rates, accounting for the time Iceberg Topics was enabled. Check for DLQ records (see [Records in DLQ table](#records-in-dlq-table)).

Monitor Iceberg topic metrics to validate that data is flowing at expected rates:

-   `redpanda_iceberg_translation_parquet_rows_added`: Track rows written to Iceberg tables (compare with source write rate)

-   `redpanda_iceberg_translation_translations_finished`: Number of completed translation executions

-   `redpanda_iceberg_translation_invalid_records`: Records that failed validation

-   `redpanda_iceberg_translation_dlq_files_created`: Dead-letter queue activity

-   `redpanda_iceberg_rest_client_num_commit_table_update_requests_failed`: Failed table commits to catalog


If using data transforms for multi-table fanout, also monitor:

-   `redpanda_transform_processor_lag`: Records pending processing in transform input topic


For a complete list of Iceberg metrics, see the [Iceberg metrics reference](https://docs.redpanda.com/streaming/current/reference/public-metrics-reference/#iceberg-metrics).

> 💡 **TIP**
>
> Run both systems for at least 24-48 hours to ensure stability before proceeding with data merge.

### [](#merge-historical-data)Merge historical data

Choose a strategy to combine your historical data with new Iceberg Topics data.

#### [](#option-1-insert-into-pattern-recommended)Option 1: INSERT INTO pattern (recommended)

Use this approach to create a unified table with all data, taking into consideration the following:

-   You want a single table for queries.

-   You can afford the one-time data copy cost.

-   You need optimal query performance.


This SQL pattern uses partition and offset metadata to identify and copy only records not yet in the target table:

```sql
-- Step 1: Find the latest offset per partition in the target (Iceberg Topics) table
WITH latest_offsets AS (
    SELECT
        partition,
        MAX(offset) AS max_offset
    FROM target_iceberg_topics_table
    GROUP BY partition
)
-- Step 2: Insert records from source table that don't exist in target
INSERT INTO target_iceberg_topics_table
SELECT s.*
FROM source_table AS s
LEFT JOIN latest_offsets AS t
    ON s.partition = t.partition
WHERE t.max_offset IS NULL      -- Partition not seen before in target
   OR s.offset > t.max_offset;  -- Record is newer than target's latest offset
```

-   The `latest_offsets` CTE finds the highest offset in the target table for each partition.

-   The `LEFT JOIN` ensures you include partitions never seen before in the target (`t.max_offset IS NULL`).

-   The `WHERE` clause filters to only records with offsets greater than the target’s latest.

-   This avoids duplicates by using Kafka partition and offset as the deduplication key.


This approach may take significant time for large datasets. Consider executing this process during low-query periods. You can also execute on an incremental basis to ease the load on your query engine, for example, by date or partition ranges.

#### [](#option-2-view-based-query-federation)Option 2: View-based query federation

Use this approach to query both tables without copying data if:

-   You cannot afford data copy time or cost.

-   You need immediate access to a unified view.

-   Query complexity and performance are acceptable with federated queries.

-   You may consolidate data later.


Create a view that queries both tables and deduplicates on the fly:

```sql
CREATE VIEW unified_iceberg_view AS
WITH latest_offsets AS (
    SELECT
        partition,
        MAX(offset) AS max_offset
    FROM target_iceberg_topics_table
    GROUP BY partition
),
historical_data AS (
    SELECT s.*
    FROM source_table AS s
    LEFT JOIN latest_offsets AS t
        ON s.partition = t.partition
    WHERE t.max_offset IS NULL
       OR s.offset <= t.max_offset  -- Only historical records not in target
),
new_data AS (
    SELECT *
    FROM target_iceberg_topics_table
)
SELECT * FROM historical_data
UNION ALL
SELECT * FROM new_data;
```

Most Iceberg-compatible query engines support views, including Snowflake, Databricks, ClickHouse, and Spark.

### [](#validate-the-migration)Validate the migration

After completing the data merge, verify the migration before cutting over:

-    Record counts match between source and target:

    ```sql
    -- Compare record counts
    SELECT 'Source' AS table_name, COUNT(*) AS record_count
    FROM <catalog>.<source-table>
    UNION ALL
    SELECT 'Target', COUNT(*)
    FROM <catalog>.<iceberg-topics-table>;
    ```

-    All partitions are represented in the target:

    ```sql
    -- Check for missing partitions
    SELECT DISTINCT partition FROM <catalog>.<source-table>
    EXCEPT
    SELECT DISTINCT partition FROM <catalog>.<iceberg-topics-table>;
    -- Should return no rows
    ```

-    Date ranges cover the full historical period. Compare `MIN(timestamp)` and `MAX(timestamp)` between source and target tables to ensure the target covers the same time range.

-    No gaps in offset sequences:

    ```sql
    -- Check for offset gaps (may indicate missing data)
    WITH offset_check AS (
      SELECT
        partition,
        offset,
        LAG(offset) OVER (PARTITION BY partition ORDER BY offset) AS prev_offset
      FROM <catalog>.<iceberg-topics-table>
    )
    SELECT * FROM offset_check
    WHERE offset - prev_offset > 1;
    -- Should return no rows
    ```

-    Sample queries return expected results. Spot check specific records by ID to verify data accuracy.

-    Schema translation is correct. Run `DESCRIBE` on both tables and verify all fields are present with correct data types.

-    New records are flowing to Iceberg Topics. Check record count for a recent time window (for example, the last hour).

-    Query performance is acceptable.

-    Monitoring and alerts are configured.

-    No records in DLQ (see [Records in DLQ table](#records-in-dlq-table)).


### [](#troubleshoot-common-migration-issues)Troubleshoot common migration issues

#### [](#records-in-dlq-table)Records in DLQ table

Iceberg Topics write records that fail validation to a dead-letter queue (DLQ) table. Records may appear in the DLQ due to:

-   Schema Registry issues. For example, using the wrong schema subject name, or Redpanda cannot find the embedded schema ID in Schema Registry.

-   When using `value_schema_id_prefix` mode: messages not encoded with Schema Registry wire format.

-   Incompatible schema changes. For example, changing field types or removing required fields.

-   Data type translation failures.


To check for DLQ records during migration:

```sql
SELECT COUNT(*) FROM <catalog>."<topic-name>~dlq";
```

If the count is greater than zero, inspect the failed records. See [Troubleshoot Iceberg Topics](https://docs.redpanda.com/streaming/current/manage/iceberg/iceberg-troubleshooting/) for steps to inspect and reprocess DLQ records.

#### [](#multi-table-fan-out-transform-issues)Multi-table fan-out transform issues

If the transform does not process messages, check if:

-   The specified output topics don’t exist or aren’t enabled with Iceberg.

-   The routing logic in the transform is incorrect, or the routing field is missing from input messages.

-   (When using Schema Registry validation) The schema registration failed during initialization, preventing the transform from starting.


To check the transform status:

```bash
rpk transform list
```

To view logs and check for errors:

```bash
rpk transform logs <transform-name>
```

To check for routing errors:

```bash
rpk transform logs <transform-name> | grep -i "unknown\|error"
```

If using Schema Registry validation, verify schema registration:

```bash
# Check transform logs for schema registration messages
rpk transform logs <transform-name> | grep -i "schema"

# List registered schemas
rpk registry schema list
```

### [](#plan-for-rollback)Plan for rollback

Before cutting over, ensure you have a rollback strategy. See the [Pre-cutover checklist](#pre-cutover-checklist) in the cutover section to verify you’re ready.

#### [](#rollback-during-parallel-operation)Rollback during parallel operation

If you discover issues while both systems are running:

1.  Keep producing to both systems.

2.  Point consumers back to source tables.

3.  Investigate Iceberg Topics issues using troubleshooting section.

4.  Fix issues and re-validate.

5.  Attempt cutover again when ready.


#### [](#rollback-after-external-integration-disabled)Rollback after external integration disabled

> ⚠️ **WARNING**
>
> Rollback after stopping your external Iceberg integration may result in data loss or gaps.

If you must rollback after disabling the external integration:

1.  Restart your external Iceberg integration immediately.

2.  Identify data written only to Iceberg Topics during the gap.

3.  Export that data from Iceberg Topics tables:

    ```sql
    SELECT * FROM iceberg_topics_table
    WHERE timestamp > '<integration_stop_time>';
    ```

4.  Write exported data back to the source system (for example, Kafka Connect input topics or directly to source tables).

5.  Verify data completeness across both systems.

6.  Resume operations on the external integration.


Redpanda recommends maintaining the ability to rollback for at least seven days after cutover to allow for issue discovery.

### [](#cut-over-to-iceberg-topics)Cut over to Iceberg Topics

#### [](#pre-cutover-checklist)Pre-cutover checklist

Before disabling your external Iceberg integration, ensure you have completed all validation steps:

-    All historical data is successfully merged (see [Merge historical data](#merge-historical-data)).

-    Parallel operation is complete and stable for at least 24-48 hours.

-    All validation queries pass (see [Validate the migration](#validate-the-migration)).

-    No records in DLQ tables, or all DLQ records are investigated and resolved.

-    Query performance meets requirements.

-    Downstream consumers are successfully tested with Iceberg Topics tables.

-    Monitoring and alerts are configured.

-    Rollback plan is verified and documented.


#### [](#cutover-procedure)Cutover procedure

1.  Set an appropriate maintenance window, ideally during low-traffic periods.

2.  Stop your external Iceberg integration.

3.  Monitor Iceberg Topics to ensure data continues flowing.

4.  Verify that no new records are being written to source tables:

    ```sql
    SELECT MAX(timestamp) FROM <catalog>.<source-table>;
    -- Should not change after integration is stopped
    ```

5.  Run validation queries from [Validate the migration](#validate-the-migration) after 1-2 hours of operation.

6.  Wait for a short period, such as 24-48 hours, to monitor and validate stability.

7.  If migrating to a unified table of historical plus new data, optionally delete old source tables after an extended validation period (for example, at least seven days):

    > 📝 **NOTE**
    >
    > Ensure you have backups before deleting historical data. Some organizations keep old tables for compliance or audit purposes.

    ```sql
    DROP TABLE <catalog>.<source-table>;
    ```

8.  Decommission external Iceberg infrastructure after an extended safety period (30+ days, for example).


If any issues arise during cutover, see [Plan for rollback](#plan-for-rollback).

## [](#next-steps)Next steps

-   [Query Iceberg Topics](https://docs.redpanda.com/streaming/current/manage/iceberg/query-iceberg-topics/)

-   [About Iceberg Topics](https://docs.redpanda.com/streaming/current/manage/iceberg/about-iceberg-topics/)