Query Iceberg Topics using Snowflake and Open Catalog

beta

This guide walks you through querying Redpanda topics as Iceberg tables in Snowflake, with AWS S3 as object storage and a catalog integration using Open Catalog.

Prerequisites

  • rpk or familiarity with the Redpanda Cloud API to use secrets in your cluster configuration. For rpk, see Install or Update rpk. For the Cloud API, you must authenticate using a service account.

  • A Snowflake account.

  • An Open Catalog account. To create an Open Catalog account, you require ORGADMIN access in Snowflake.

  • An internal catalog created in Open Catalog with your Tiered Storage AWS S3 bucket configured as external storage.

    Follow this guide to create a catalog with the S3 bucket configured as external storage. You require admin permissions to carry out these steps in AWS:

    1. If you don’t already have one, create an IAM policy that gives Open Catalog read and write access to your S3 bucket.

    2. Create an IAM role and attach the IAM policy to the role.

    3. After creating a new catalog in Open Catalog, grant the catalog’s AWS IAM user access to the S3 bucket.

  • A Snowflake external volume set up using the Tiered Storage bucket.

    Follow this guide to configure the external volume with S3. You can use the same IAM policy as the catalog for the external volume’s IAM role and user.

Set up catalog integration using Open Catalog

Create a new Open Catalog service connection for Redpanda

To create a new service connection to integrate the Iceberg-enabled topics into Open Catalog:

  1. In Open Catalog, select Connections, then + Connection.

  2. In Configure Service Connection, provide a name. Open Catalog creates a new principal with this name.

  3. Make sure Create new principal role is selected.

  4. Enter a name for the principal role. Then, click Create.

After you create the connection, get the client ID and client secret. Save these credentials to add to your cluster configuration in a later step.

Create a catalog role

Grant privileges to the principal created in the previous step:

  1. In Open Catalog, select Catalogs, and select your catalog.

  2. On the Roles tab of your catalog, click + Catalog Role.

  3. Give the catalog role a name.

  4. Under Privileges, select CATALOG_MANAGE_CONTENT. This provides full management privileges for the catalog. Then, click Create.

  5. On the Roles tab of the catalog, click Grant to Principal Role.

  6. Select the catalog role you just created.

  7. Select the principal role you created earlier. Click Grant.

Update cluster configuration

To configure your Redpanda cluster to enable Iceberg on a topic and integrate with Open Catalog:

  1. Store the Open Catalog client secret in your cluster using rpk or the Data Plane API.

  2. Edit your cluster configuration to set the iceberg_enabled property to true, and set the catalog integration properties listed in the example below using rpk or the Control Plane API. For example, to use rpk cluster config set, run:

    rpk cluster config set \
      iceberg_enabled=true \
      iceberg_catalog_type=rest \
      iceberg_rest_catalog_endpoint=https://<snowflake-orgname>-<open-catalog-account-name>.snowflakecomputing.com/polaris/api/catalog \
      iceberg_rest_catalog_authentication_mode=oauth2 \
      iceberg_rest_catalog_client_id=<open-catalog-connection-client-id> \
      iceberg_rest_catalog_client_secret=${secrets.<open-catalog-client-secret-name>} \
      iceberg_rest_catalog_prefix=<open-catalog-name>
    
    # Optional properties:
    # iceberg_translation_interval_ms_default=1000
    # iceberg_catalog_commit_interval_ms=1000

    Use your own values for the following placeholders:

    • <snowflake-orgname> and <open-catalog-account-name>: Your Open Catalog account URI is composed of these values.

      In Snowflake, navigate to Admin, then Accounts. Click the ellipsis near your Open Catalog account name, and select Manage URLs. The Current URL contains <snowflake-orgname> and <open-catalog-account-name>.
    • <open-catalog-connection-client-id>: The client ID of the service connection you created in an earlier step.

    • <open-catalog-client-secret-name>: The name of the secret you created in the previous step. You must pass the secret name to the ${secrets.<secret-name>} placeholder, not the secret value itself.

    • <open-catalog-name>: The name of your catalog in Open Catalog.

    Successfully updated configuration. New configuration version is 2.
  3. Enable the integration for a topic by configuring the topic property redpanda.iceberg.mode. This mode creates an Iceberg table for the topic consisting of two columns: one for the record metadata including the key, and another binary column for the record’s value. See Enable Iceberg integration for more details on Iceberg modes.

    Use any of the following to set redpanda.iceberg.mode:

    • rpk. See the following examples to run rpk topic commands.

    • The Cloud UI. Navigate to Topics to create a new topic and specify redpanda.iceberg.mode in Additional Configuration, or edit an existing topic under the topic’s Configuration tab.

    • The Data Plane API to create a new topic or update a property for an existing topic. Specify the key-value pair for redpanda.iceberg.mode in the request body.

      The following examples show how to use rpk to create a new topic or alter the configuration for an existing topic, setting the Iceberg mode to key_value.

      Create a new topic and set redpanda.iceberg.mode:
      rpk topic create <topic-name> --topic-config=redpanda.iceberg.mode=key_value
      Set redpanda.iceberg.mode for an existing topic:
      rpk topic alter-config <topic-name> --set redpanda.iceberg.mode=key_value
  4. Produce to the topic. For example,

    echo "hello world\nfoo bar\nbaz qux" | rpk topic produce <topic-name> --format='%k %v\n'

You should see the topic as a table in Open Catalog.

  1. In Open Catalog, select Catalogs, then open your catalog.

  2. Under your catalog, you should see the redpanda namespace, and a table with the name of your topic. The redpanda namespace and the table are automatically added for you.

Query Iceberg table in Snowflake

To query the topic in Snowflake, you must create a catalog integration so that Snowflake has access to the table data and metadata.

Configure catalog integration with Snowflake

  1. Run the CREATE CATALOG INTEGRATION command in Snowflake:

    CREATE CATALOG INTEGRATION <catalog-integration-name>
      CATALOG_SOURCE = POLARIS
      TABLE_FORMAT = ICEBERG
      CATALOG_NAMESPACE = 'redpanda'
      REST_CONFIG = (
        CATALOG_URI = '<open-catalog-uri>'
        WAREHOUSE = '<open-catalog-name>'
      )
      REST_AUTHENTICATION = (
        TYPE = OAUTH
        OAUTH_CLIENT_ID = '<open-catalog-connection-client-id>'
        OAUTH_CLIENT_SECRET = '<open-catalog-connection-client-secret>'
        OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
      )
      REFRESH_INTERVAL_SECONDS = 30
      ENABLED = TRUE;

    Use your own values for the following placeholders:

  2. Run the following command to verify that the catalog is integrated correctly:

    SELECT SYSTEM$LIST_ICEBERG_TABLES_FROM_CATALOG('<catalog-integration-name>');
    # Example result for redpanda.iceberg.mode=key_value
    +-----------------------------------------------------------------------+
    | SYSTEM$LIST_ICEBERG_TABLES_FROM_CATALOG('<catalog_integration_name>') |
    +-----------------------------------------------------------------------+
    | [{"namespace":"redpanda","name":"<table_name>"}]                      |
    +-----------------------------------------------------------------------+

Create Iceberg table in Snowflake

After creating the catalog integration, you must create an externally-managed table in Snowflake. You must run your Snowflake queries against this table.

In your Snowflake database, run the CREATE ICEBERG TABLE command. The following example also specifies that the table should automatically refresh metadata:

CREATE ICEBERG TABLE <table-name>
  CATALOG = '<catalog-integration-name>'
  EXTERNAL_VOLUME = '<iceberg-external-volume-name>'
  CATALOG_TABLE_NAME = '<topic-name>'
  AUTO_REFRESH = TRUE

Use your own values for the following placeholders:

  • <table-name>: Provide a name for your table in Snowflake.

  • <catalog-integration-name>: The name of the catalog integration you configured in an earlier step.

  • <iceberg-external-volume-name>: The name of the external volume you configured using the Tiered Storage bucket.

  • <topic-name>: The name of the table in your catalog, which is the same as your Redpanda topic name.

Query table

To verify that Snowflake has successfully created the table containing the topic data, run the following:

SELECT * FROM <table-name>;

Your query results should look like the following:

# Example for redpanda.iceberg.mode=key_value with 3 records produced to topic

+--------------------------------------------------------------------------------------------------------------+------------+
|                                                   REDPANDA                                                   |   VALUE    |
+--------------------------------------------------------------------------------------------------------------+------------+
| { "partition": 0, "offset": 0, "timestamp": "2025-02-07 16:29:50.122", "headers": null, "key": "68656C6C6F"} | 776F726C64 |
| { "partition": 0, "offset": 1, "timestamp": "2025-02-07 16:29:50.122", "headers": null, "key": "666F6F"}     | 626172     |
| { "partition": 0, "offset": 2, "timestamp": "2025-02-07 16:29:50.122", "headers": null, "key": "62617A" }    | 717578     |
+--------------------------------------------------------------------------------------------------------------+------------+