# Query Iceberg Topics

> For the complete documentation index, see [llms.txt](https://docs.redpanda.com/llms.txt). Component-specific: [streaming-full.txt](https://docs.redpanda.com/streaming-full.txt)

---
title: Query Iceberg Topics
latest-redpanda-tag: v26.1.9
latest-console-tag: v3.7.3
latest-operator-version: v26.1.4
# EOL = End-of-Life (support lifecycle status)
page-is-nearing-eol: "false"
page-is-past-eol: "false"
page-eol-date: March 31, 2027
latest-connect-version: 4.93.0
docname: iceberg/query-iceberg-topics
page-component-name: streaming
page-version: "26.1"
page-component-version: "26.1"
page-component-title: Streaming
page-relative-src-path: iceberg/query-iceberg-topics.adoc
page-edit-url: https://github.com/redpanda-data/docs/edit/main/modules/manage/pages/iceberg/query-iceberg-topics.adoc
description: Query Redpanda topic data stored in Iceberg tables, based on the topic Iceberg mode and schema.
page-git-created-date: "2025-04-07"
page-git-modified-date: "2026-04-24"
support-status: supported
---

<!-- Source: https://docs.redpanda.com/streaming/current/manage/iceberg/query-iceberg-topics.md -->

> 📝 **NOTE**
>
> This feature requires an [enterprise license](https://docs.redpanda.com/streaming/current/get-started/licensing/). To get a trial license key or extend your trial period, [generate a new trial license key](https://redpanda.com/try-enterprise). To purchase a license, contact [Redpanda Sales](https://redpanda.com/upgrade).
>
> If Redpanda has enterprise features enabled and it cannot find a valid license, [restrictions](https://docs.redpanda.com/streaming/current/get-started/licensing/#self-managed) apply.

When you access Iceberg topics from a data lakehouse or other Iceberg-compatible tools, how you consume the data depends on the topic [Iceberg mode](https://docs.redpanda.com/streaming/current/manage/iceberg/specify-iceberg-schema/) and whether you’ve registered a schema for the topic in the [Redpanda Schema Registry](https://docs.redpanda.com/streaming/current/manage/schema-reg/schema-reg-overview/). You do not need to rely on complex ETL jobs or pipelines to access real-time data from Redpanda.

## [](#access-iceberg-tables)Access Iceberg tables

> ❗ **IMPORTANT**
>
> Redpanda v25.3 introduces breaking schema changes for Iceberg topics. If you are using Iceberg topics and want to retain the data in the corresponding Iceberg tables, review [Schema Changes and Migration Guide for Iceberg Topics in Redpanda v25.3](https://docs.redpanda.com/streaming/current/upgrade/iceberg-schema-changes-and-migration-guide/) before upgrading your cluster, and follow the required migration steps to avoid sending new records to a dead-letter queue table.

Redpanda generates an Iceberg table with the same name as the topic. Depending on the processing engine and your Iceberg [catalog implementation](https://docs.redpanda.com/streaming/current/manage/iceberg/use-iceberg-catalogs/), you may also need to define the table (for example using `CREATE TABLE`) to point the data lakehouse to its location in the catalog. For an example, see [Query Iceberg Topics using Snowflake and Open Catalog](https://docs.redpanda.com/streaming/current/manage/iceberg/redpanda-topics-iceberg-snowflake-catalog/).

Some query engines may require you to manually refresh the Iceberg table snapshot (for example, by running a command like `ALTER TABLE <table-name> REFRESH;`) to see the latest data.

If your engine needs the full JSON metadata path, use the following:

```none
redpanda-iceberg-catalog/redpanda/<topic-name>/metadata/v<version-number>.metadata.json
```

This provides read access to all snapshots written as of the specified table version (denoted by `version-number`).

> 📝 **NOTE**
>
> Redpanda automatically removes expired snapshots on a periodic basis. Snapshot expiry helps maintain a smaller metadata size and reduces the window available for [time travel](#time-travel-queries).

## [](#query-examples)Query examples

To follow along with the examples on this page, suppose you produce the same stream of events to a topic `ClickEvent`, which uses a schema, and another topic `ClickEvent_key_value`, which uses the key-value mode. The topics have [Tiered Storage](https://docs.redpanda.com/streaming/current/reference/glossary/#tiered-storage) configured to an AWS S3 bucket. A sample record contains the following data:

```bash
{"user_id": 2324, "event_type": "BUTTON_CLICK", "ts": "2024-11-25T20:23:59.380Z"}
```

> 📝 **NOTE**
>
> The query examples on this page use `redpanda` as the Iceberg namespace, which is the default. If you configured a different namespace using `[iceberg_default_catalog_namespace](https://docs.redpanda.com/streaming/current/reference/properties/cluster-properties/#iceberg_default_catalog_namespace)`, replace `redpanda` with your configured namespace.

### [](#topic-with-schema-value_schema_id_prefix-mode)Topic with schema (`value_schema_id_prefix` mode)

> 📝 **NOTE**
>
> The steps in this section also apply to the `value_schema_latest` mode, except the produce step. The `value_schema_latest` mode is not compatible with the Schema Registry wire format. The [`rpk topic produce`](#reference:rpk/rpk-topic/rpk-topic-produce) command embeds the wire format header, so you must use your own producer code with `value_schema_latest`.

Assume that you have created the `ClickEvent` topic, set `redpanda.iceberg.mode` to `value_schema_id_prefix`, and are connecting to a REST-based Iceberg catalog. The following is an Avro schema for `ClickEvent`:

`schema.avsc`

```avro
{
    "type" : "record",
    "namespace" : "com.redpanda.examples.avro",
    "name" : "ClickEvent",
    "fields" : [
       { "name": "user_id", "type" : "int" },
       { "name": "event_type", "type" : "string" },
       { "name": "ts", "type": "string" }
    ]
 }
```

1.  Register the schema under the `ClickEvent-value` subject:

    ```bash
    rpk registry schema create ClickEvent-value --schema path/to/schema.avsc --type avro
    ```

2.  Produce to the `ClickEvent` topic using the following format:

    ```bash
    echo '"key1" {"user_id":2324,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:23:59.380Z"}' | rpk topic produce ClickEvent --format='%k %v\n' --schema-id=topic
    ```

    The `value_schema_id_prefix` mode requires that you produce to a topic using the [Schema Registry wire format](https://docs.redpanda.com/streaming/current/manage/schema-reg/schema-reg-overview/#wire-format), which includes the magic byte and schema ID in the prefix of the message payload. This allows Redpanda to identify the correct schema version in the Schema Registry for a record.

3.  The following Spark SQL query returns values from columns in the `ClickEvent` table, with the table structure derived from the schema, and column names matching the schema fields. If you’ve integrated a catalog, query engines such as Spark SQL provide Iceberg integrations that allow easy discovery and access to existing Iceberg tables in object storage.

    ```sql
    SELECT *
    FROM `<catalog-name>`.redpanda.ClickEvent;
    ```

    ```bash
    +-----------------------------------+---------+--------------+--------------------------+
    | redpanda                          | user_id | event_type   | ts                       |
    +-----------------------------------+---------+--------------+--------------------------+
    | {"partition":0,"offset":0,"timestamp":2025-03-05 15:09:20.436,"headers":null,"key":null} | 2324    | BUTTON_CLICK | 2024-11-25T20:23:59.380Z |
    +-----------------------------------+---------+--------------+--------------------------+
    ```


### [](#topic-in-key-value-mode)Topic in key-value mode

In `key_value` mode, you do not associate the topic with a schema in the Schema Registry, which means using semi-structured data in Iceberg. The record keys and values can have an arbitrary structure, so Redpanda stores them in [binary format](https://apache.github.io/iceberg/spec/?h=spec#primitive-types) in Iceberg.

In this example, assume that you have created the `ClickEvent_key_value` topic, and set `redpanda.iceberg.mode` to `key_value`.

1.  Produce to the `ClickEvent_key_value` topic using the following format:

    ```bash
    echo '"key1" {"user_id":2324,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:23:59.380Z"}' | rpk topic produce ClickEvent_key_value --format='%k %v\n'
    ```

2.  The following Spark SQL query returns the semi-structured data in the `ClickEvent_key_value` table. The table consists of two columns: one named `redpanda`, containing the record key and other metadata, and another binary column named `value` for the record’s value:

    ```sql
    SELECT *
    FROM `<catalog-name>`.redpanda.ClickEvent_key_value;
    ```

    ```bash
    +-----------------------------------+------------------------------------------------------------------------------+
    | redpanda                          | value                                                                        |
    +-----------------------------------+------------------------------------------------------------------------------+
    | {"partition":0,"offset":0,"timestamp":2025-03-05 15:14:30.931,"headers":null,"key":key1} | {"user_id":2324,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:23:59.380Z"} |
    +-----------------------------------+------------------------------------------------------------------------------+
    ```


Depending on your query engine, you might need to first decode the binary value to display the record key and value using a SQL helper function. For example, see the [`decode` and `unhex`](https://spark.apache.org/docs/latest/api/sql/index.html#unhex) Spark SQL functions, or the [HEX\_DECODE\_STRING](https://docs.snowflake.com/en/sql-reference/functions/hex_decode_string) Snowflake function. Some engines may also automatically decode the binary value for you.

### [](#time-travel-queries)Time travel queries

Some query engines, such as Spark, support time travel with Iceberg, allowing you to query the table as it existed at a specific point in the past. You can run a time travel query by specifying a timestamp or version number.

Redpanda automatically removes expired snapshots on a periodic basis, which also reduces the window available for time travel queries. By default, Redpanda retains snapshots for five days, so you can query Iceberg tables as of up to five days ago.

The following example queries a `ClickEvent` table at a specific timestamp in Spark:

```sql
SELECT * FROM `<catalog-name>`.redpanda.ClickEvent TIMESTAMP AS OF '2025-03-02 10:00:00';
```

## Suggested labs

-   [Redpanda Iceberg Docker Compose Example](https://docs.redpanda.com/labs/docker-compose/iceberg/)
-   [Iceberg Streaming on Kubernetes with Redpanda, MinIO, and Spark](https://docs.redpanda.com/labs/kubernetes/iceberg/)

[Search all labs](https://docs.redpanda.com/labs)