Query Iceberg Topics using AWS Glue

This guide walks you through querying Redpanda topics as Iceberg tables stored in Amazon S3, using a catalog integration with AWS Glue. For general information about Iceberg catalog integrations in Redpanda, see Use Iceberg Catalogs.

Prerequisites

Limitations

Nested partition spec support

AWS Glue does not support partitioning on nested fields. The default partition spec, (hour(redpanda.timestamp)), partitions on the timestamp field nested inside the redpanda column. If Redpanda detects that this default is in use, it applies an empty partition spec () instead, which means the table is not partitioned.

If you want to use partitioning, you must specify a custom partition specification using your own partition columns (columns that are not nested).
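As a rough illustration of this constraint, the following Python sketch flags partition fields that point into a nested column. It assumes a spec is written as a parenthesized, comma-separated list of column names, each optionally wrapped in a transform, like the default (hour(redpanda.timestamp)), and it treats any dotted name as nested. The function names, the list of transform names, and the example columns event_time and region are hypothetical. The parsing is deliberately simplistic: a column whose name matches a transform name would be skipped.

```python
import re

# Transform names that may wrap a column; assumed here for illustration only.
TRANSFORMS = {"identity", "year", "month", "day", "hour", "bucket", "truncate"}

# An identifier, optionally dotted (a dot indicates a path into a nested struct).
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")


def partition_fields(spec: str) -> list[str]:
    """Return the column names referenced by a partition spec string."""
    return [name for name in _NAME.findall(spec) if name not in TRANSFORMS]


def nested_fields(spec: str) -> list[str]:
    """Return the referenced columns that live inside a nested struct."""
    return [name for name in partition_fields(spec) if "." in name]


if __name__ == "__main__":
    for spec in ("(hour(redpanda.timestamp))", "(day(event_time), region)", "()"):
        print(spec, "->", nested_fields(spec) or "no nested fields")
```

A spec for which nested_fields returns an empty list uses only top-level columns, which is the kind of spec this limitation requires.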

Authorize access to AWS Glue

You must allow Redpanda to access AWS Glue services in your AWS account. Create a new IAM policy or role that manages access to AWS Glue and allows all AWS Glue API actions ("glue:*") on the following resources:

  • Root catalog (catalog)

  • All databases (database/*)

  • All tables (table/*/*)

Your IAM policy should include a statement similar to the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:*"
      ],
      "Resource": [
        "arn:aws:glue:<aws-region>:<aws-account-id>:catalog",
        "arn:aws:glue:<aws-region>:<aws-account-id>:database/*",
        "arn:aws:glue:<aws-region>:<aws-account-id>:table/*/*"
      ]
    }
  ]
}

For more information on configuring IAM permissions, see the AWS Glue documentation.
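If you manage policies from scripts, the statement above can be generated instead of edited by hand. The following Python sketch builds the same policy document for a given region and account ID. The function name glue_policy and the sample region and account values are placeholders for illustration, not part of any Redpanda or AWS tooling.

```python
import json


def glue_policy(region: str, account_id: str) -> dict:
    """Build an IAM policy allowing all Glue actions on the catalog, databases, and tables."""
    prefix = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["glue:*"],
                "Resource": [
                    f"{prefix}:catalog",       # root catalog
                    f"{prefix}:database/*",    # all databases
                    f"{prefix}:table/*/*",     # all tables in all databases
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own region and account ID.
    print(json.dumps(glue_policy("us-east-1", "123456789012"), indent=2))
```

The printed JSON can be pasted into the IAM console or saved to a file for use with your usual deployment tooling.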

Configure authentication and credentials

You must configure credentials for the AWS Glue Data Catalog integration using the following properties:

Update cluster configuration

To configure your Redpanda cluster to enable Iceberg on a topic and integrate with the AWS Glue Data Catalog:

  1. Edit your cluster configuration to set the iceberg_enabled property to true, and set the catalog integration properties listed in the example below. Use rpk as in the following example, or use the Cloud API to update these cluster properties. The update might take several minutes to complete.

    rpk cloud login
    
    rpk profile create --from-cloud <cluster-id>
    
    rpk cluster config set \
      iceberg_enabled=true \
      iceberg_catalog_type=rest \
      iceberg_rest_catalog_endpoint=https://glue.<glue-region>.amazonaws.com/iceberg \
      iceberg_rest_catalog_authentication_mode=aws_sigv4 \
      iceberg_rest_catalog_base_location=s3://<bucket-name>/<warehouse-path> \
      iceberg_rest_catalog_aws_region=<glue-region> \
      iceberg_rest_catalog_aws_access_key=<glue-access-key> \
      iceberg_rest_catalog_aws_secret_key='${secrets.<glue-secret-key-name>}'

    Use your own values for the following placeholders:

    • <glue-region>: The AWS region where your Data Catalog is located. The region in the AWS Glue endpoint must match the region specified in your iceberg_rest_catalog_aws_region property.

    • <bucket-name> and <warehouse-path>: AWS Glue requires you to specify the base location where Redpanda stores Iceberg data and metadata files. You must use an S3 URI; for example, s3://<bucket-name>/iceberg. As a security best practice, Redpanda Data recommends specifying a subfolder (using prefixes) rather than the root of the bucket.

    • <glue-access-key>: The AWS access key ID for your Glue service account.

    • <glue-secret-key-name>: The name of the secret that stores the AWS secret access key for your Glue service account. To reference a secret in a cluster property, for example iceberg_rest_catalog_aws_secret_key, you must first store the secret value.

    If the update succeeds, rpk prints a message similar to the following:

    Successfully updated configuration. New configuration version is 2.

  2. Enable the integration for a topic by configuring the topic property redpanda.iceberg.mode. The following examples show how to use rpk to either create a new topic or alter the configuration for an existing topic and set the Iceberg mode to key_value. The key_value mode creates a two-column Iceberg table for the topic, with one column for the record metadata including the key, and another binary column for the record’s value. See Specify Iceberg Schema for more details on Iceberg modes.

    Create a new topic and set redpanda.iceberg.mode:

    rpk topic create <topic-name> --topic-config=redpanda.iceberg.mode=key_value

    Set redpanda.iceberg.mode for an existing topic:

    rpk topic alter-config <topic-name> --set redpanda.iceberg.mode=key_value

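  3. Produce to the topic. For example, the following command produces three records, taking the first word of each line as the key and the second word as the value:

    printf 'hello world\nfoo bar\nbaz qux\n' | rpk topic produce <topic-name> --format='%k %v\n'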
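The cluster properties from step 1 share several values: the Glue region appears in both the endpoint and the region property, and the base location is built from the bucket and path. The following Python sketch derives all eight properties from one set of inputs so that they stay consistent, then prints the equivalent rpk command. The helper names and sample values are hypothetical; only the property names and value formats come from the example in step 1.

```python
import shlex


def glue_catalog_properties(region, bucket, warehouse_path, access_key, secret_name):
    """Return the cluster properties from step 1, derived from one set of inputs."""
    return {
        "iceberg_enabled": "true",
        "iceberg_catalog_type": "rest",
        # The endpoint region and iceberg_rest_catalog_aws_region must match.
        "iceberg_rest_catalog_endpoint": f"https://glue.{region}.amazonaws.com/iceberg",
        "iceberg_rest_catalog_authentication_mode": "aws_sigv4",
        "iceberg_rest_catalog_base_location": f"s3://{bucket}/{warehouse_path.strip('/')}",
        "iceberg_rest_catalog_aws_region": region,
        "iceberg_rest_catalog_aws_access_key": access_key,
        # Reference to a stored secret, not the secret value itself.
        "iceberg_rest_catalog_aws_secret_key": "${secrets." + secret_name + "}",
    }


def rpk_config_command(properties):
    """Build the argument list for `rpk cluster config set key=value ...`."""
    return ["rpk", "cluster", "config", "set"] + [f"{k}={v}" for k, v in properties.items()]


if __name__ == "__main__":
    # Placeholder values for illustration.
    props = glue_catalog_properties("us-east-1", "my-bucket", "iceberg", "AKIAEXAMPLE", "glue-secret")
    # shlex.join quotes the secret reference so the shell does not try to expand it.
    print(shlex.join(rpk_config_command(props)))
```

The sketch only prints the command. Running it, for example with subprocess.run, is left to the caller after reviewing the output.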

You should see the topic as a table with data in the AWS Glue Data Catalog. The data may take some time to become visible, depending on your iceberg_target_lag_ms setting. To view the table:

  1. In AWS Glue Studio, go to Databases.

  2. Select the redpanda database. The redpanda database and the table within are automatically added for you. The table name is the same as the topic name.
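The same check can be scripted. The following Python sketch looks up the table through the Glue GetTable API and returns a few identifying fields. It assumes the table parameters include table_type and metadata_location, which is common for Iceberg tables registered in Glue but is not confirmed by this guide, so both are read with a fallback of None. The function name is hypothetical, and the client is passed in so that any object with a compatible get_table method works.

```python
def describe_iceberg_table(glue_client, topic_name, database="redpanda"):
    """Look up the Glue table for a topic and return a few identifying fields.

    glue_client is expected to behave like boto3.client("glue").
    """
    table = glue_client.get_table(DatabaseName=database, Name=topic_name)["Table"]
    params = table.get("Parameters", {})
    return {
        "name": table["Name"],
        # Assumed parameter keys; None if the catalog does not set them.
        "table_type": params.get("table_type"),
        "metadata_location": params.get("metadata_location"),
    }
```

With boto3 installed and AWS credentials configured, a call might look like describe_iceberg_table(boto3.client("glue", region_name="<glue-region>"), "<topic-name>"). If the table does not exist yet, the Glue client raises an error, which can mean that the data has not been committed yet.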

Query Iceberg table

You can query the Iceberg table using different engines, such as Amazon Athena, PyIceberg, or Apache Spark. To query the table or view the table data in AWS Glue, ensure that your account has the necessary permissions to access the catalog, database, and table.
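As one possible starting point for PyIceberg, the following sketch loads the table through PyIceberg's Glue catalog support and reads a few rows into an Arrow table. It assumes PyIceberg is installed with its glue and pyarrow extras, that AWS credentials with access to the catalog and the S3 bucket are available from the default credential chain, and that the table is in the redpanda database. The function names are hypothetical, and the catalog property names may differ between PyIceberg versions.

```python
def pyiceberg_glue_properties(region):
    """Catalog properties for PyIceberg's Glue catalog (assumed property names)."""
    return {"type": "glue", "glue.region": region}


def preview_topic_table(topic_name, region, limit=10, database="redpanda"):
    """Return up to `limit` rows of the topic's Iceberg table as a pyarrow.Table."""
    # Third-party dependency, imported here so the helper above works without it:
    #   pip install "pyiceberg[glue,pyarrow]"
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog("glue", **pyiceberg_glue_properties(region))
    # The table name is the same as the topic name.
    table = catalog.load_table((database, topic_name))
    return table.scan(limit=limit).to_arrow()
```

For example, preview_topic_table("<topic-name>", "<glue-region>") returns the first rows, with the redpanda metadata column and the binary value column described earlier for key_value mode.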

To query the table in Amazon Athena:

  1. On the list of tables in AWS Glue Studio, click "Table data" under the View data column.

  2. Click "Proceed" to be redirected to the Athena query editor.

  3. In the query editor, select AwsDataCatalog as the data source, and select the redpanda database.

  4. The SQL query editor should be pre-populated with a query that selects 10 rows from the Iceberg table. Run the query to see a preview of the table data.

    SELECT * FROM "AwsDataCatalog"."redpanda"."<table-name>" limit 10;

    Your query results should look like the following:

    +-----------------------------------------------------+----------------+
    | redpanda                                            | value          |
    +-----------------------------------------------------+----------------+
    | {partition=0, offset=0, timestamp=2025-07-21        | 77 6f 72 6c 64 |
    | 18:11:25.070000, headers=null, key=[B@1900af31}     |                |
    +-----------------------------------------------------+----------------+
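In key_value mode the value column is binary, so the query editor displays it as hexadecimal bytes. The following Python sketch decodes such a string back to text, assuming the record value was UTF-8 encoded, as it is for the example records produced earlier. The function name is hypothetical.

```python
def decode_hex_value(hex_bytes: str, encoding: str = "utf-8") -> str:
    """Decode a space-separated hex string, as shown in the value column, to text."""
    # bytes.fromhex ignores whitespace between byte pairs.
    return bytes.fromhex(hex_bytes).decode(encoding)


if __name__ == "__main__":
    print(decode_hex_value("77 6f 72 6c 64"))  # prints "world"
```

The decoded value, world, is the second word of the first example record, hello world, whose first word was used as the key.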