Use Iceberg Catalogs

Beta

To read from a Redpanda-generated Iceberg table, your Iceberg-compatible client or tool needs access to the catalog. The catalog provides the current table metadata, which reflects the current state of the table and includes the locations of all the table’s data files. You can configure Redpanda either to connect to a REST-based catalog or to use a filesystem-based catalog.

The Iceberg integration for Redpanda Cloud is a beta feature. It is not supported for production deployments. To configure REST catalog authentication for use with Iceberg topics in your cloud cluster, contact Redpanda support.

For production deployments, Redpanda recommends using an external REST catalog to manage Iceberg metadata. This enables built-in table maintenance, safely handles multiple engines and tools accessing tables at the same time, facilitates data governance, and maximizes data discovery. However, if it is not possible to use a REST catalog, you may use the filesystem-based catalog (object_storage catalog type), which does not require you to maintain a separate service to access the Iceberg data. In either case, you use the catalog to load, query, or refresh the Iceberg table as you produce to the Redpanda topic. See the documentation for your query engine or Iceberg-compatible tool for specific guidance on adding the Iceberg tables to your data warehouse or lakehouse using the catalog.

After you have selected a catalog type at the cluster level and enabled the Iceberg integration for a topic, you cannot switch to another catalog type.

Integrate filesystem-based catalog (object_storage)

By default, Iceberg topics use the filesystem-based catalog (iceberg_catalog_type cluster property set to object_storage). Redpanda stores the table metadata in HadoopCatalog format (https://iceberg.apache.org/docs/latest/java-api-quickstart/#using-a-hadoop-catalog) in the same object storage bucket or container as the data files.

If you use the object_storage catalog type, provide the object storage URI of the table’s metadata.json file to your Iceberg client so that it can access the catalog and data files for your Redpanda Iceberg tables.

The metadata.json file points to a specific Iceberg table snapshot. In your query engine, you must update your tables whenever a new snapshot is created so that they point to the latest snapshot. See the official Iceberg documentation for more information, and refer to the documentation for your query engine or Iceberg-compatible tool for specific guidance on Iceberg table update or refresh.
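As an illustration of handing a metadata.json URI to an Iceberg client, the following is a minimal sketch that assumes PyIceberg as the client (Redpanda does not prescribe a specific one). The helper names `load_static_table` and `current_snapshot_id` are hypothetical, and any object storage credentials or endpoint settings passed in `properties` depend on your provider.

```python
from typing import Optional


def load_static_table(metadata_uri: str, properties: Optional[dict] = None):
    """Open a read-only view of an Iceberg table from its metadata.json URI."""
    if not metadata_uri.endswith("metadata.json"):
        raise ValueError(f"expected a metadata.json URI, got: {metadata_uri}")
    # Imported lazily so this helper can be defined without PyIceberg installed.
    from pyiceberg.table import StaticTable

    # A StaticTable is pinned to the snapshot referenced by this one metadata file.
    return StaticTable.from_metadata(metadata_uri, properties=properties or {})


def current_snapshot_id(table) -> Optional[int]:
    """Return the ID of the snapshot the loaded metadata points to, or None for an empty table."""
    snapshot = table.current_snapshot()
    return snapshot.snapshot_id if snapshot is not None else None
```

Because a table loaded this way stays pinned to one metadata file, seeing newer data means calling `load_static_table` again with the URI of the newer metadata.json file. Comparing `current_snapshot_id` before and after the reload is one way to confirm that the table actually advanced.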

Example filesystem-based catalog configuration

To configure Apache Spark to use a filesystem-based catalog, specify at least the following properties:

spark.sql.catalog.streaming = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.streaming.type = hadoop
# URI for table metadata: AWS S3 example
spark.sql.catalog.streaming.warehouse = s3a://<bucket-name>/redpanda-iceberg-catalog
# You may need to configure additional properties based on your object storage provider.
# See https://iceberg.apache.org/docs/latest/spark-configuration/#spark-configuration and https://spark.apache.org/docs/latest/configuration.html
# For example, for AWS S3:
# spark.hadoop.fs.s3.impl = org.apache.hadoop.fs.s3a.S3AFileSystem
# spark.hadoop.fs.s3a.endpoint = http://<s3-uri>
# spark.sql.catalog.streaming.s3.endpoint = http://<s3-uri>
Redpanda recommends setting credentials in environment variables so Spark can securely access your Iceberg data in object storage. For example, for AWS, use AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.

Depending on your processing engine, you may also need to create a new table that points the data lakehouse to the table location.
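The Spark properties listed above can also be set programmatically. The following is a minimal sketch that assumes PySpark with the Iceberg Spark runtime already on the classpath and AWS credentials supplied through environment variables. The function names `spark_catalog_conf` and `build_session` and the application name are hypothetical; the catalog name `streaming` and the `redpanda-iceberg-catalog` path mirror the example properties.

```python
from typing import Dict

# Base path of the filesystem-based catalog, as used in the example properties.
CATALOG_BASE_PATH = "redpanda-iceberg-catalog"


def spark_catalog_conf(bucket: str, catalog: str = "streaming") -> Dict[str, str]:
    """Build the minimum Spark properties for a Hadoop-type Iceberg catalog on S3."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hadoop",
        f"{prefix}.warehouse": f"s3a://{bucket}/{CATALOG_BASE_PATH}",
    }


def build_session(conf: Dict[str, str], app_name: str = "redpanda-iceberg"):
    """Create (or reuse) a SparkSession with the given catalog properties applied."""
    # Imported lazily so the configuration helper works without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

For example, `build_session(spark_catalog_conf("<bucket-name>"))` returns a session in which the `streaming` catalog resolves tables under the configured warehouse path. Provider-specific settings, such as a custom S3 endpoint, can be merged into the dictionary before the session is built.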

Specify metadata location

When you use the object_storage catalog type, the base path of the filesystem-based catalog is redpanda-iceberg-catalog.