Cloud

Query Iceberg-enabled Topics

With Iceberg-enabled topics, you can retain streaming data in your lakehouse beyond your Redpanda topic retention window and continue to query it with SQL. A single query returns both the live records still in the topic and the Iceberg-committed history that has aged out under topic retention, in one result.

For the streaming-only query path (live records only, without Iceberg history), see Query Streaming Topics.

After reading this page, you will be able to:

  • Identify the storage and catalog objects Cloud sets up for querying Iceberg-enabled topics

  • Query both the live records and the Iceberg-committed history of a topic in a single result

  • Handle schema differences between a Redpanda topic and its Iceberg table

Prerequisites

  • A Redpanda BYOC cluster on AWS with Redpanda SQL enabled. See Enable Redpanda SQL.

  • The cluster’s iceberg_catalog_type property is set to rest. The object_storage catalog type does not support Iceberg queries from Redpanda SQL.

  • The cluster’s Iceberg REST catalog is configured. See Integrate with REST Catalogs for the supported REST catalog options (AWS Glue, Snowflake, Databricks Unity, Polaris, and others) and their configuration steps.

  • An Iceberg-enabled Redpanda topic with a schema (Protobuf, Avro, or JSON) registered in Schema Registry. To enable Iceberg on a topic, see About Iceberg Topics.

  • Connect to Redpanda SQL with psql or another PostgreSQL client. See Connect to Redpanda SQL.

How the query catalogs are set up

When you enable Redpanda SQL on a BYOC cluster with the Iceberg REST catalog configured, Cloud runs the following statements under the hood so you don’t have to. The result is a default_redpanda_catalog already linked to an Iceberg catalog, ready for CREATE TABLE against an Iceberg-enabled topic. The storage and Iceberg-catalog options match the cluster’s REST catalog configuration (endpoint, credentials, region).

  1. A storage connection holds the object-storage credentials your cluster’s REST catalog uses:

    CREATE STORAGE iceberg_storage TYPE = S3 WITH (
        endpoint = '<rest-catalog-storage-endpoint>',
        access_key_id = '<access-key-id>',
        secret_access_key = '<secret-access-key>',
        region = '<region>',
        path_style = 'true'
    );
  2. An Iceberg catalog points at the cluster’s REST catalog endpoint:

    CREATE ICEBERG CATALOG lakehouse_catalog STORAGE iceberg_storage WITH (
        uri = '<rest-catalog-uri>',
        warehouse = '<warehouse>',
        auth_type = '<auth-type>'
    );
  3. The default_redpanda_catalog is linked to the Iceberg catalog. The pandaproxy_url option is required when a Redpanda catalog is linked:

    ALTER REDPANDA CATALOG default_redpanda_catalog
    USING CATALOG lakehouse_catalog WITH (
        pandaproxy_url = '<pandaproxy-url>'
    );

To inspect the link on your cluster, run DESCRIBE REDPANDA CATALOG default_redpanda_catalog. For full option lists, see CREATE STORAGE, CREATE ICEBERG CATALOG, and CREATE REDPANDA CATALOG.

Map a topic as a SQL table

Define a SQL table against the Iceberg-enabled topic in default_redpanda_catalog. Replace orders and orders-value with your topic name and Schema Registry value subject:

CREATE TABLE default_redpanda_catalog=>orders WITH (
  topic = 'orders',
  schema_subject = 'orders-value'
);

Redpanda SQL reads the registered schema from Schema Registry to map fields to SQL columns.

When you query a table mapped from a Redpanda topic, Redpanda SQL also exposes two reserved metadata columns alongside your schema’s columns:

  • redpanda (a struct with topic-level metadata such as partition, offset, timestamp, key, and headers)

  • redpanda_raw (raw key and value bytes)

These column names are reserved. A topic schema with a top-level redpanda or redpanda_raw field conflicts with the metadata columns.

Query live and historical records together

Query the table using standard SELECT syntax. The query returns records from both the live Redpanda topic and the Iceberg-committed history in one result, with no overlap between them.

After any shape change to the Iceberg table or its corresponding Redpanda topic schema, run REFRESH <catalog>⇒<table> to update the schema view Redpanda SQL uses for query planning. Without a refresh, the next query against the table fails at planning time with Schema not found for Iceberg table '<table>'. REFRESH updates only the schema view, not data; you don’t run it between writes.
SELECT * FROM default_redpanda_catalog=>orders LIMIT 10;

Redpanda SQL plans the union internally, so you don’t write a UNION ALL.

Iceberg-committed data persists independently of Redpanda topic retention. Queries continue to return records past the Redpanda topic’s retention window, provided they were committed to Iceberg first.

Handle schema differences

A topic schema can evolve over time. You might add or remove fields in your Schema Registry value subject as your application changes. Redpanda writes new records to the Iceberg table forward-only: it adds columns to the Iceberg table when the topic schema widens, but it does not drop columns from the Iceberg table when the topic schema narrows. As a result, the Iceberg table can carry columns the current topic schema doesn’t have.

Redpanda SQL handles this divergence by treating the topic’s Schema Registry subject as canonical and filling in missing fields on the Iceberg side. The rules that apply to a query against a linked Redpanda catalog:

  • The result’s column order, count, and types come from the topic schema.

  • If the Iceberg table is missing a column that the topic schema has, Redpanda SQL returns NULL for that column on Iceberg-side rows. This is common after adding a new column to the topic schema, since earlier records already in Iceberg don’t have it.

  • If the Iceberg table has columns the topic schema does not, the query fails at planning time with the error Kafka schema must be a name-superset of Iceberg schema. This happens when a column is removed from the topic’s Schema Registry subject. To resolve, re-add the column to the topic’s Schema Registry subject so the topic schema covers every column present in the Iceberg table.

Next steps