Query Iceberg Topics using Snowflake and Open Catalog

Beta

This feature requires an enterprise license. To get a trial license key or extend your trial period, generate a new trial license key. To purchase a license, contact Redpanda Sales.

If Redpanda has enterprise features enabled and it cannot find a valid license, restrictions apply.

This guide walks you through querying Redpanda topics as Iceberg tables in Snowflake, with AWS S3 as object storage and a catalog integration using Open Catalog.

Prerequisites

  • Object storage configured for your cluster and Tiered Storage enabled for the topics for which you want to generate Iceberg tables.

    • The S3 bucket URI so that you can configure it as external storage for Open Catalog.

  • A Snowflake account.

  • An Open Catalog account. To create an Open Catalog account, you require ORGADMIN access in Snowflake.

  • An internal catalog created in Open Catalog with your Tiered Storage AWS S3 bucket configured as external storage.

    • Follow this guide to create a catalog with the S3 bucket configured as external storage. You require admin permissions to carry out these steps in AWS:

      1. If you don’t already have one, create an IAM policy that gives Open Catalog read and write access to your S3 bucket.

      2. Create an IAM role and attach the IAM policy to the role.

      3. After creating a new catalog in Open Catalog, grant the catalog’s AWS IAM user access to the S3 bucket.

  • A Snowflake external volume set up using the Tiered Storage bucket.

Set up catalog integration using Open Catalog

Create a new Open Catalog service connection for Redpanda

To create a new service connection to integrate the Iceberg-enabled topics into Open Catalog:

  1. In Open Catalog, select Connections, then + Connection.

  2. In Configure Service Connection, provide a name. Open Catalog creates a new principal with this name.

  3. Make sure Create new principal role is toggled on.

  4. Enter a name for the principal role. Then, click Create.

After you create the connection, you are provided the client ID and client secret. Save these credentials to add to your cluster configuration in a later step.

Create a catalog role

Grant privileges to the principal created in the previous step:

  1. In Open Catalog, select Catalogs, and select your catalog.

  2. On the Roles tab of your catalog, click + Catalog Role.

  3. Give the catalog role a name.

  4. Under Privileges, select CATALOG_MANAGE_CONTENT. This provides full management privileges for the catalog. Then, click Create.

  5. On the Roles tab of the catalog, click Grant to Principal Role.

  6. Select the catalog role you just created.

  7. Select the principal role you created earlier. Click Grant.

Update cluster configuration

To configure your Redpanda cluster to enable Iceberg on a topic, as well as set up the integration with Open Catalog:

  1. Edit your cluster configuration to set the iceberg_enabled property to true, and set the catalog integration properties listed in the example below. You must restart your cluster if you change this configuration for a running cluster. You can run rpk cluster config edit to update these properties:

    iceberg_enabled: true
    iceberg_catalog_type: rest
    iceberg_rest_catalog_endpoint: https://<snowflake-orgname>-<open-catalog-account-name>.snowflakecomputing.com/polaris/api/catalog
    iceberg_rest_catalog_client_id: <open-catalog-connection-client-id>
    iceberg_rest_catalog_client_secret: <open-catalog-connection-client-secret>
    iceberg_rest_catalog_prefix: <open-catalog-name>
    
    # Optional
    iceberg_translation_interval_ms_default: 1000
    iceberg_catalog_commit_interval_ms: 1000

    Use your own values for the following placeholders:

    • <snowflake-orgname> and <open-catalog-account-name>: Your Open Catalog account URI is composed of these values.

      In Snowflake, navigate to Admin, then Accounts. Click the ellipsis near your Open Catalog account name, and select Manage URLs. The Current URL contains <snowflake-orgname> and <open-catalog-account-name>.
    • <open-catalog-connection-client-id>: The client ID of the service connection you created in an earlier step.

    • <open-catalog-connection-client-secret>: The client secret of the service connection you created in an earlier step.

    • <open-catalog-name>: The name of your catalog in Open Catalog.

    Successfully updated configuration. New configuration version is 2.
  2. You must restart your cluster so that the configuration changes take effect.

  3. Enable the integration for a topic by configuring the topic property redpanda.iceberg.mode. This mode creates an Iceberg table for the topic consisting of two columns, one for the record metadata including the key, and another binary column for the record’s value. See Enable Iceberg integration for more details on Iceberg modes. The following examples show how to use rpk to either create a new topic, or alter the configuration for an existing topic, to set the Iceberg mode to key_value.

    Create a new topic and set redpanda.iceberg.mode:
    rpk topic create <topic-name> --topic-config=redpanda.iceberg.mode=key_value
    Set redpanda.iceberg.mode for an existing topic:
    rpk topic alter-config <topic-name> --set redpanda.iceberg.mode=key_value
  4. Produce to the topic. For example,

    echo "hello world\nfoo bar\nbaz qux" | rpk topic produce <topic-name> --format='%k %v\n'

You should see the topic as a table in Open Catalog.

  1. In Open Catalog, select Catalogs, then open your catalog.

  2. Under your catalog, you should see the redpanda namespace, and a table with the name of your topic. The redpanda namespace and the table are automatically added for you.

Query Iceberg table in Snowflake

To query the topic in Snowflake, you must create a catalog integration so that Snowflake has access to the table data and metadata.

Configure catalog integration with Snowflake

  1. Run the CREATE CATALOG INTEGRATION command in Snowflake:

    CREATE CATALOG INTEGRATION <catalog-integration-name>
      CATALOG_SOURCE = POLARIS
      TABLE_FORMAT = ICEBERG
      CATALOG_NAMESPACE = 'redpanda'
      REST_CONFIG = (
        CATALOG_URI = '<open-catalog-uri>'
        WAREHOUSE = '<open-catalog-name>'
      )
      REST_AUTHENTICATION = (
        TYPE = OAUTH
        OAUTH_CLIENT_ID = '<open-catalog-connection-client-id>'
        OAUTH_CLIENT_SECRET = '<open-catalog-connection-client-secret>'
        OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
      )
      REFRESH_INTERVAL_SECONDS = 30
      ENABLED = TRUE;

    Use your own values for the following placeholders:

  2. Run the following command to verify that the catalog is integrated correctly:

    SELECT SYSTEM$LIST_ICEBERG_TABLES_FROM_CATALOG('<catalog-integration-name>');
    # Example result for redpanda.iceberg.mode=key_value
    +-----------------------------------------------------------------------+
    | SYSTEM$LIST_ICEBERG_TABLES_FROM_CATALOG('<catalog_integration_name>') |
    +-----------------------------------------------------------------------+
    | [{"namespace":"redpanda","name":"<table_name>"}]                      |
    +-----------------------------------------------------------------------+

Create Iceberg table in Snowflake

After creating the catalog integration, you must create an externally-managed table in Snowflake. You must run your Snowflake queries against this table.

In your Snowflake database, run the CREATE ICEBERG TABLE command. The following example also specifies that the table should automatically refresh metadata:

CREATE ICEBERG TABLE <table-name>
  CATALOG = '<catalog-integration-name>'
  EXTERNAL_VOLUME = '<iceberg-external-volume-name>'
  CATALOG_TABLE_NAME = '<topic-name>'
  AUTO_REFRESH = TRUE

Use your own values for the following placeholders:

  • <table-name>: Provide a name for your table in Snowflake.

  • <catalog-integration-name>: The name of the catalog integration you configured in an earlier step.

  • <iceberg-external-volume-name>: The name of the external volume you configured using the Tiered Storage bucket.

  • <topic-name>: The name of the table in your catalog, which is the same as your Redpanda topic name.

Query table

To verify that Snowflake has successfully created the table containing the topic data, run the following:

SELECT * FROM <table-name>;

Your query results should look like the following:

# Example for redpanda.iceberg.mode=key_value with 3 records produced to topic

+--------------------------------------------------------------------------------------------------------------+------------+
|                                                   REDPANDA                                                   |   VALUE    |
+--------------------------------------------------------------------------------------------------------------+------------+
| { "partition": 0, "offset": 0, "timestamp": "2025-02-07 16:29:50.122", "headers": null, "key": "68656C6C6F"} | 776F726C64 |
| { "partition": 0, "offset": 1, "timestamp": "2025-02-07 16:29:50.122", "headers": null, "key": "666F6F"}     | 626172     |
| { "partition": 0, "offset": 2, "timestamp": "2025-02-07 16:29:50.122", "headers": null, "key": "62617A" }    | 717578     |
+--------------------------------------------------------------------------------------------------------------+------------+