# Use Iceberg Catalogs (Beta)

> **Note:** This feature requires an enterprise license. To get a trial license key or extend your trial period, generate a new trial license key. To purchase a license, contact Redpanda Sales. If Redpanda has enterprise features enabled and cannot find a valid license, restrictions apply.

To read from a Redpanda-generated Iceberg table, your Iceberg-compatible client or tool needs access to the catalog to retrieve the table metadata and determine the current state of the table. The catalog provides the current table metadata, which includes the locations of all the table's data files. You can configure Redpanda either to connect to a REST-based catalog or to use a file-system based catalog.

For production deployments, Redpanda recommends using an external REST catalog to manage Iceberg metadata. A REST catalog enables built-in table maintenance, safely handles multiple engines and tools accessing tables at the same time, facilitates data governance, and maximizes data discovery. However, if it is not possible to use a REST catalog, you can use the file-system based catalog, which does not require you to maintain a separate service to access the Iceberg data.

In either case, you use the catalog to load, query, or refresh the Iceberg table as you produce to the Redpanda topic. See the documentation for your query engine or Iceberg-compatible tool for specific guidance on adding Iceberg tables to your data warehouse or lakehouse using the catalog.

> **Important:** After you have selected a catalog type at the cluster level and enabled the Iceberg integration for a topic, you cannot switch to another catalog type.

## Connect to a REST catalog

Redpanda supports connecting to an Iceberg REST catalog using the standard REST API supported by many catalog providers. Use this catalog integration type with REST-enabled Iceberg catalog services, such as Databricks Unity and Snowflake Open Catalog.

To connect to a REST catalog, set the following cluster configuration properties (a sample sketch follows this list):

- `iceberg_catalog_type`: Set to `rest`.
- `iceberg_rest_catalog_endpoint`: The endpoint URL for your Iceberg catalog, which you either manage directly or have managed for you by an external catalog service.
- `iceberg_rest_catalog_client_id`: The ID used to connect to the REST catalog.
- `iceberg_rest_catalog_client_secret`: The secret used to connect to the REST catalog.

For REST catalogs that use self-signed certificates, also configure these properties:

- `iceberg_rest_catalog_trust_file`: The path to a file containing a certificate chain to trust for the REST catalog.
- `iceberg_rest_catalog_crl_file`: The path to the certificate revocation list for the specified trust file.

See Cluster Configuration Properties for the full list of cluster properties to configure for a catalog integration.
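As a minimal sketch, assuming you manage cluster configuration with `rpk`, you could set these properties as follows. The endpoint URL and credential values are placeholders for your own catalog service:

```bash
# Select the REST catalog integration (cluster-wide setting).
rpk cluster config set iceberg_catalog_type rest

# Endpoint URL of your Iceberg REST catalog service (placeholder value).
rpk cluster config set iceberg_rest_catalog_endpoint http://catalog-service:8181

# Credentials used to authenticate to the REST catalog (placeholder values).
rpk cluster config set iceberg_rest_catalog_client_id <rest-connection-user>
rpk cluster config set iceberg_rest_catalog_client_secret <rest-connection-password>
```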
### Example REST catalog configuration

For example, suppose you have Redpanda cluster configuration properties set to connect to a REST catalog:

```
iceberg_catalog_type: rest
iceberg_rest_catalog_endpoint: http://catalog-service:8181
iceberg_rest_catalog_client_id: <rest-connection-user>
iceberg_rest_catalog_client_secret: <rest-connection-password>
```

And you use Apache Spark as a processing engine, configured to use a catalog named `streaming`:

```
spark.sql.catalog.streaming = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.streaming.type = rest
spark.sql.catalog.streaming.uri = http://catalog-service:8181

# You may need to configure additional properties based on your object storage provider.
# See https://iceberg.apache.org/docs/latest/spark-configuration/#catalog-configuration and https://spark.apache.org/docs/latest/configuration.html
# For example, for AWS S3:
# spark.sql.catalog.streaming.io-impl = org.apache.iceberg.aws.s3.S3FileIO
# spark.sql.catalog.streaming.warehouse = s3://<bucket-name>/
# spark.sql.catalog.streaming.s3.endpoint = http://<s3-uri>
```

Redpanda recommends setting credentials in environment variables so Spark can securely access your Iceberg data in object storage. For example, for AWS, use `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.

Using Spark SQL, you can query the Iceberg table directly by specifying the catalog name, the namespace, and the table name:

```sql
SELECT * FROM streaming.redpanda.<table-name>;
```

The Iceberg table name is the name of your Redpanda topic. Redpanda puts the Iceberg table into a namespace called `redpanda`, creating the namespace if necessary. The Spark engine can use the REST catalog to automatically discover the topic's Iceberg table.

Depending on your processing engine, you may also need to create a table in the engine to point the data lakehouse to the table location in the catalog. For an example, see Query Iceberg Topics using Snowflake and Open Catalog.

## Integrate file-system based catalog (object_storage)

By default, Iceberg topics use the file-system based catalog (the `iceberg_catalog_type` cluster configuration property set to `object_storage`). Redpanda stores the table metadata in HadoopCatalog format in the same object storage bucket or container as the data files.

If you use the `object_storage` catalog type, you provide the object storage URI of the table's `metadata.json` file to an Iceberg client so it can access the catalog and data files for your Redpanda Iceberg tables.

### Example file-system based catalog configuration

To configure Apache Spark to use a file-system based catalog, specify at least the following properties (a sample launch sketch follows below):

```
spark.sql.catalog.streaming = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.streaming.type = hadoop
# URI for table metadata: AWS S3 example
spark.sql.catalog.streaming.warehouse = s3a://<bucket-name>/redpanda-iceberg-catalog

# You may need to configure additional properties based on your object storage provider.
# See https://iceberg.apache.org/docs/latest/spark-configuration/#spark-configuration and https://spark.apache.org/docs/latest/configuration.html
# For example, for AWS S3:
# spark.hadoop.fs.s3.impl = org.apache.hadoop.fs.s3a.S3AFileSystem
# spark.hadoop.fs.s3a.endpoint = http://<s3-uri>
# spark.sql.catalog.streaming.s3.endpoint = http://<s3-uri>
```

Redpanda recommends setting credentials in environment variables so Spark can securely access your Iceberg data in object storage. For example, for AWS, use `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.
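As a minimal sketch, assuming a Spark 3.5 deployment that can reach your bucket, you could pass these catalog properties directly on the `spark-sql` command line. The `iceberg-spark-runtime` version and bucket name are assumptions; adapt them to your Spark, Scala, and Iceberg versions:

```bash
# Launch spark-sql with a file-system (Hadoop) catalog named "streaming".
# The runtime coordinate below is an assumption; match it to your environment.
spark-sql \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1 \
  --conf spark.sql.catalog.streaming=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.streaming.type=hadoop \
  --conf spark.sql.catalog.streaming.warehouse=s3a://<bucket-name>/redpanda-iceberg-catalog
```

Once the shell starts, the same `SELECT * FROM streaming.redpanda.<table-name>;` query shown for the REST catalog applies here, because the catalog name, namespace, and table name are unchanged.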
Depending on your processing engine, you may also need to create a new table in the engine to point the data lakehouse to the table location.

## Specify metadata location

The `iceberg_catalog_base_location` cluster configuration property stores the base path for the file-system based catalog when you use the `object_storage` catalog type. The default value is `redpanda-iceberg-catalog`. Do not change the `iceberg_catalog_base_location` value after you have enabled Iceberg integration for a topic.

## Next steps

- Query Iceberg Topics
- Query Iceberg Topics using Snowflake and Open Catalog

## Suggested labs

- Redpanda Iceberg Docker Compose Example