# Query Iceberg Topics

> For the complete documentation index, see [llms.txt](https://docs.redpanda.com/llms.txt). Component-specific: [streaming-full.txt](https://docs.redpanda.com/streaming-full.txt)

---
title: Query Iceberg Topics
latest-redpanda-tag: v25.1.1
latest-console-tag: v3.7.3
latest-operator-version: v26.1.4
# EOL = End-of-Life (support lifecycle status)
page-is-nearing-eol: "false"
page-is-past-eol: "true"
page-eol-date: April 7, 2026
latest-connect-version: 4.93.0
docname: iceberg/query-iceberg-topics
page-component-name: streaming
page-version: "25.1"
page-component-version: "25.1"
page-component-title: Streaming
page-relative-src-path: iceberg/query-iceberg-topics.adoc
page-edit-url: https://github.com/redpanda-data/docs/edit/v/25.1/modules/manage/pages/iceberg/query-iceberg-topics.adoc
description: Query Redpanda topic data stored in Iceberg tables, based on the topic Iceberg mode and schema.
page-git-created-date: "2025-04-07"
page-git-modified-date: "2025-04-18"
support-status: past end-of-life
---

<!-- Source: https://docs.redpanda.com/streaming/25.1/manage/iceberg/query-iceberg-topics.md -->

> 📝 **NOTE**
>
> This feature requires an [enterprise license](https://docs.redpanda.com/streaming/25.1/get-started/licensing/overview/). To get a trial license key or extend your trial period, [generate a new trial license key](https://redpanda.com/try-enterprise). To purchase a license, contact [Redpanda Sales](https://redpanda.com/upgrade).
>
> If Redpanda has enterprise features enabled and it cannot find a valid license, [restrictions](https://docs.redpanda.com/streaming/25.1/get-started/licensing/overview/#self-managed) apply.

When you access Iceberg topics from a data lakehouse or other Iceberg-compatible tools, how you consume the data depends on the topic [Iceberg mode](https://docs.redpanda.com/streaming/25.1/manage/iceberg/choose-iceberg-mode/) and whether you’ve registered a schema for the topic in the [Redpanda Schema Registry](https://docs.redpanda.com/streaming/25.1/manage/schema-reg/schema-reg-overview/). You do not need to rely on complex ETL jobs or pipelines to access real-time data from Redpanda.

## [](#access-iceberg-tables)Access Iceberg tables

Redpanda generates an Iceberg table with the same name as the topic. Depending on the processing engine and your Iceberg [catalog implementation](https://docs.redpanda.com/streaming/25.1/manage/iceberg/use-iceberg-catalogs/), you may also need to define the table (for example using `CREATE TABLE`) to point the data lakehouse to its location in the catalog. For an example, see [Query Iceberg Topics using Snowflake and Open Catalog](https://docs.redpanda.com/streaming/25.1/manage/iceberg/redpanda-topics-iceberg-snowflake-catalog/).

If your engine needs the full JSON metadata path, use the following:

```none
redpanda-iceberg-catalog/metadata/redpanda/<topic-name>/v<version-number>.metadata.json
```

This provides read access to all snapshots written as of the specified table version (denoted by `version-number`).

> 📝 **NOTE**
>
> Redpanda automatically removes expired snapshots on a periodic basis. Snapshot expiry helps maintain a smaller metadata size and reduces the window available for [time travel](#time-travel-queries).

## [](#query-examples)Query examples

To follow along with the examples on this page, suppose you produce the same stream of events to a topic `ClickEvent`, which uses a schema, and another topic `ClickEvent_key_value`, which uses the key-value mode. The topics have [Tiered Storage](https://docs.redpanda.com/streaming/25.1/reference/glossary/#tiered-storage) configured to an AWS S3 bucket. A sample record contains the following data:

```bash
{"user_id": 2324, "event_type": "BUTTON_CLICK", "ts": "2024-11-25T20:23:59.380Z"}
```

### [](#topic-with-schema-value_schema_id_prefix-mode)Topic with schema (`value_schema_id_prefix` mode)

> 📝 **NOTE**
>
> The steps in this section also apply to the `value_schema_latest` mode, except for step 2. The `value_schema_latest` mode doesn’t require the Schema Registry wire format, so you’ll use your own producer code instead of [`rpk topic produce`](#reference:rpk/rpk-topic/rpk-topic-produce).

Assume that you have created the `ClickEvent` topic, set `redpanda.iceberg.mode` to `value_schema_id_prefix`, and are connecting to a REST-based Iceberg catalog. The following is an Avro schema for `ClickEvent`:

`schema.avsc`

```avro
{
    "type" : "record",
    "namespace" : "com.redpanda.examples.avro",
    "name" : "ClickEvent",
    "fields" : [
       { "name": "user_id", "type" : "int" },
       { "name": "event_type", "type" : "string" },
       { "name": "ts", "type": "string" }
    ]
 }
```

1.  Register the schema under the `ClickEvent-value` subject:

    ```bash
    rpk registry schema create ClickEvent-value --schema path/to/schema.avsc --type avro
    ```

2.  Produce to the `ClickEvent` topic using the following format:

    ```bash
    echo '"key1" {"user_id":2324,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:23:59.380Z"}' | rpk topic produce ClickEvent --format='%k %v\n' --schema-id=topic
    ```

    The `value_schema_id_prefix` mode requires that you produce to a topic using the [Schema Registry wire format](https://docs.redpanda.com/streaming/25.1/manage/schema-reg/schema-reg-overview/#wire-format), which includes the magic byte and schema ID in the prefix of the message payload. This allows Redpanda to identify the correct schema version in the Schema Registry for a record.

3.  The following Spark SQL query returns values from columns in the `ClickEvent` table, with the table structure derived from the schema, and column names matching the schema fields. If you’ve integrated a catalog, query engines such as Spark SQL provide Iceberg integrations that allow easy discovery and access to existing Iceberg tables in object storage.

    ```sql
    SELECT *
    FROM `<catalog-name>`.redpanda.ClickEvent;
    ```

    ```bash
    +-----------------------------------+---------+--------------+--------------------------+
    | redpanda                          | user_id | event_type   | ts                       |
    +-----------------------------------+---------+--------------+--------------------------+
    | {"partition":0,"offset":0,"timestamp":2025-03-05 15:09:20.436,"headers":null,"key":null} | 2324    | BUTTON_CLICK | 2024-11-25T20:23:59.380Z |
    +-----------------------------------+---------+--------------+--------------------------+
    ```


### [](#topic-in-key-value-mode)Topic in key-value mode

In `key_value` mode, you do not associate the topic with a schema in the Schema Registry, which means using semi-structured data in Iceberg. The record keys and values can have an arbitrary structure, so Redpanda stores them in [binary format](https://apache.github.io/iceberg/spec/?h=spec#primitive-types) in Iceberg.

In this example, assume that you have created the `ClickEvent_key_value` topic, and set `redpanda.iceberg.mode` to `key_value`.

1.  Produce to the `ClickEvent_key_value` topic using the following format:

    ```bash
    echo 'key1 {"user_id":2324,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:23:59.380Z"}' | rpk topic produce ClickEvent_key_value --format='%k %v\n'
    ```

2.  The following Spark SQL query returns the semi-structured data in the `ClickEvent_key_value` table. The table consists of two columns: one named `redpanda`, containing the record key and other metadata, and another binary column named `value` for the record’s value:

    ```sql
    SELECT *
    FROM `<catalog-name>`.redpanda.ClickEvent_key_value;
    ```

    ```bash
    +-----------------------------------+------------------------------------------------------------------------------+
    | redpanda                          | value                                                                        |
    +-----------------------------------+------------------------------------------------------------------------------+
    | {"partition":0,"offset":0,"timestamp":2025-03-05 15:14:30.931,"headers":null,"key":key1} | {"user_id":2324,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:23:59.380Z"} |
    +-----------------------------------+------------------------------------------------------------------------------+
    ```


Depending on your query engine, you might need to first decode the binary value to display the record key and value using a SQL helper function. For example, see the [`decode` and `unhex`](https://spark.apache.org/docs/latest/api/sql/index.html#unhex) Spark SQL functions, or the [HEX\_DECODE\_STRING](https://docs.snowflake.com/en/sql-reference/functions/hex_decode_string) Snowflake function. Some engines may also automatically decode the binary value for you.

### [](#time-travel-queries)Time travel queries

Some query engines, such as Spark, support time travel with Iceberg, allowing you to query the table at a specific point in time. You can query the table as it existed at a specific timestamp or version number.

Redpanda automatically removes expired snapshots on a periodic basis, which also reduces the window available for time travel queries.

The following example queries a `ClickEvent` table at a specific timestamp in Spark:

```sql
SELECT * FROM `<catalog-name>`.redpanda.ClickEvent TIMESTAMP AS OF '2025-03-02 10:00:00';
```

## [](#next-steps)Next steps

-   [Query Iceberg Topics using Snowflake and Open Catalog](https://docs.redpanda.com/streaming/25.1/manage/iceberg/redpanda-topics-iceberg-snowflake-catalog/)


## Suggested labs

-   [Redpanda Iceberg Docker Compose Example](https://docs.redpanda.com/labs/docker-compose/iceberg/)
-   [Iceberg Streaming on Kubernetes with Redpanda, MinIO, and Spark](https://docs.redpanda.com/labs/kubernetes/iceberg/)

[Search all labs](https://docs.redpanda.com/labs)