# Query Iceberg Topics using AWS Glue


---
title: Query Iceberg Topics using AWS Glue
page-beta-text: This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments.
latest-operator-version: v26.1.4
latest-console-tag: v3.7.3
latest-connect-version: 4.93.0
latest-redpanda-tag: v26.1.9
docname: iceberg/iceberg-topics-aws-glue
page-component-name: cloud-data-platform
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: iceberg/iceberg-topics-aws-glue.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/manage/pages/iceberg/iceberg-topics-aws-glue.adoc
description: Add Redpanda topics as Iceberg tables that you can access through the AWS Glue Data Catalog.
# Beta release status
page-beta: "true"
page-git-created-date: "2025-08-05"
page-git-modified-date: "2026-05-26"
release-status: beta - This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments.
---

<!-- Source: https://docs.redpanda.com/cloud-data-platform/manage/iceberg/iceberg-topics-aws-glue.md -->

This guide walks you through querying Redpanda topics as Iceberg tables stored in AWS S3, using a catalog integration with [AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/components-overview.html#data-catalog-intro). For general information about Iceberg catalog integrations in Redpanda, see [Use Iceberg Catalogs](https://docs.redpanda.com/cloud-data-platform/manage/iceberg/use-iceberg-catalogs/).

## [](#prerequisites)Prerequisites

-   An AWS account with access to [AWS Glue Data Catalog](https://docs.aws.amazon.com/glue/latest/dg/what-is-glue.html).

    -   AWS Glue Data Catalog must be in the same AWS account and region as the cluster.


-   Redpanda version 25.2 or later.

-   [`rpk`](https://docs.redpanda.com/cloud-data-platform/manage/rpk/rpk-install/) installed or updated to the latest version.

    -   You can also use the Redpanda Cloud API to [reference secrets in your cluster configuration](https://docs.redpanda.com/cloud-data-platform/manage/cluster-maintenance/config-cluster/#set-cluster-configuration-properties).


-   Admin permissions to create IAM policies and roles in AWS.


## [](#limitations)Limitations

### [](#lowercase-field-names-required)Lowercase field names required

Use only lowercase field names. AWS Glue converts all table column names to lowercase, and Redpanda requires exact column name matches to manage schemas. Using uppercase letters prevents Redpanda from finding matching columns, which breaks schema management.

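One way to catch this early is to scan the field names in your schema for uppercase characters before you produce to the topic. The following is a minimal sketch: `find_uppercase_fields` is a hypothetical helper (not part of `rpk`), and it assumes you pass the field names as arguments.

```bash
# Hypothetical helper (not part of rpk): print every field name that
# contains an uppercase character, one per line.
find_uppercase_fields() {
  for name in "$@"; do
    case "$name" in
      *[[:upper:]]*) printf '%s\n' "$name" ;;
    esac
  done
}

# Example: only "userName" is reported.
find_uppercase_fields user_id userName event_ts
```

An empty result means that every name you passed is already lowercase.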
### [](#nested-partition-spec-support)Nested partition spec support

AWS Glue does not support partitioning on nested fields. If Redpanda detects that the default partitioning `(hour(redpanda.timestamp))`, which is based on record metadata, is in use, it applies an empty partition spec `()` instead, so the table is not partitioned.

To use partitioning, you must implement custom partitioning using your own partition columns (that is, columns that are not nested).

> 📝 **NOTE**
>
> In Redpanda versions 25.2.1 and earlier, an empty partition spec `()` can trigger a known issue that prevents certain engines, such as Amazon Redshift, from successfully querying the table. To resolve this issue, specify custom partitioning, or upgrade Redpanda to version 25.2.2 or later.

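As a sketch of what a custom partition spec might look like, the helper below builds a topic configuration value and rejects nested fields such as `redpanda.timestamp`. It assumes the topic property is named `redpanda.iceberg.partition.spec` and that your table has a top-level column named `event_ts`. Neither is stated on this page, so verify both against the Iceberg topics documentation before relying on them. `partition_spec_arg` is a hypothetical helper.

```bash
# Hypothetical helper: build a partition spec setting for a top-level column.
#   $1 = transform (for example, hour or day)
#   $2 = column name; a dot indicates a nested field, which AWS Glue rejects
partition_spec_arg() {
  case "$2" in
    *.*)
      echo "nested field not supported by AWS Glue: $2" >&2
      return 1
      ;;
  esac
  # Assumption: redpanda.iceberg.partition.spec is the topic property name.
  printf 'redpanda.iceberg.partition.spec=(%s(%s))\n' "$1" "$2"
}

partition_spec_arg hour event_ts
# Possible use (not executed here):
#   rpk topic alter-config <topic-name> --set "$(partition_spec_arg hour event_ts)"
```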
### [](#manual-deletion-of-iceberg-tables)Manual deletion of Iceberg tables

The AWS Glue catalog integration does not support automatic deletion of Iceberg tables from Redpanda. To manually delete Iceberg tables in AWS Glue, you must either:

-   Set the cluster property [`iceberg_delete`](https://docs.redpanda.com/cloud-data-platform/reference/properties/cluster-properties/#iceberg_delete) to `false` when you configure the catalog integration.

-   Override the cluster property `iceberg_delete` by setting the topic property `redpanda.iceberg.delete` to `false` for the topic you want to delete.


When `iceberg_delete` or the topic override `redpanda.iceberg.delete` is set to `false`, you can delete the Redpanda topic, and then delete the table in AWS Glue and the Iceberg data and metadata files in the S3 bucket. If you plan to re-create the topic after deleting it, you must delete the table data entirely before re-creating the topic.

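The cleanup order described above can be sketched as a dry run that prints the commands instead of executing them. `print_delete_plan` is a hypothetical helper, and the S3 location of the table's files is something you must look up yourself (for example, from the table details in AWS Glue). This sketch does not derive it.

```bash
# Hypothetical dry-run helper: print the cleanup commands in order
# without executing any of them.
#   $1 = topic name (also the Glue table name)
#   $2 = Glue database (the Iceberg namespace, for example redpanda)
#   $3 = S3 URI of the table's data and metadata files (you supply this)
print_delete_plan() {
  printf 'rpk topic alter-config %s --set redpanda.iceberg.delete=false\n' "$1"
  printf 'rpk topic delete %s\n' "$1"
  printf 'aws glue delete-table --database-name %s --name %s\n' "$2" "$1"
  printf 'aws s3 rm %s --recursive\n' "$3"
}

print_delete_plan orders redpanda "s3://<cluster-storage-bucket-name>/<table-path>"
```

Review the printed commands before running them, because the `aws s3 rm --recursive` step permanently removes the files under the given prefix.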
## [](#authorize-access-to-aws-glue)Authorize access to AWS Glue

For BYOC clusters created in March 2026 or later, the required AWS Glue IAM policy is automatically provisioned and attached to the cluster’s IAM role when Iceberg is enabled. You don’t need to manually create IAM policies or roles for Glue access.

For clusters created before March 2026, you must re-run `rpk byoc apply` to provision the Glue IAM policy before enabling Iceberg. This is a one-time operation that updates the cluster’s IAM role with the necessary Glue permissions.

## [](#configure-authentication-and-credentials)Configure authentication and credentials

You can configure credentials for the AWS Glue Data Catalog integration in either of the following ways:

-   Allow Redpanda to use the same object storage credential properties already configured for S3. This is the recommended approach, especially in BYOC deployments where the cluster’s existing AWS credentials already include the necessary Glue permissions.

    For an example cluster configuration that uses the same IAM credentials for both S3 and AWS Glue, see the **Use cluster’s IAM credentials** example in the [next section](#update-cluster-configuration).

-   If you want to configure authentication to AWS Glue separately from authentication to S3, there are equivalent credential configuration properties named `iceberg_rest_catalog_aws_*` that override the object storage credentials. These properties only apply to REST catalog authentication, and never to S3 authentication:

    -   [`iceberg_rest_catalog_credentials_source`](https://docs.redpanda.com/cloud-data-platform/reference/properties/cluster-properties/#iceberg_rest_catalog_credentials_source). To use the cluster’s IAM role, set the property to `aws_instance_metadata`. To use static credentials, set to `config_file`.

    -   [`iceberg_rest_catalog_aws_access_key`](https://docs.redpanda.com/cloud-data-platform/reference/properties/cluster-properties/#iceberg_rest_catalog_aws_access_key) (static credentials only)

    -   [`iceberg_rest_catalog_aws_secret_key`](https://docs.redpanda.com/cloud-data-platform/reference/properties/cluster-properties/#iceberg_rest_catalog_aws_secret_key) (static credentials only), added as a secret value (see the [next section](#update-cluster-configuration) for details)

    -   [`iceberg_rest_catalog_aws_region`](https://docs.redpanda.com/cloud-data-platform/reference/properties/cluster-properties/#iceberg_rest_catalog_aws_region)


    For an example cluster configuration that uses separate access keys for AWS Glue, see the **Use static credentials (override IAM)** example in the [next section](#update-cluster-configuration).


## [](#update-cluster-configuration)Update cluster configuration

To configure your Redpanda cluster to enable Iceberg on a topic and integrate with the AWS Glue Data Catalog:

1.  Edit your cluster configuration to set the `iceberg_enabled` property to `true`, and set the catalog integration properties listed in the example below.

    By default, Redpanda creates Iceberg tables in a namespace called `redpanda`. Because AWS Glue provides a single catalog per account, each Redpanda cluster that writes to the same Glue catalog must use a distinct namespace to avoid table name collisions. To set a unique namespace, also set [`iceberg_default_catalog_namespace`](https://docs.redpanda.com/cloud-data-platform/reference/properties/cluster-properties/#iceberg_default_catalog_namespace) when you set `iceberg_enabled`. This property cannot be changed after Iceberg is enabled.

    Use `rpk` as shown in the following examples, or [use the Cloud API](https://docs.redpanda.com/cloud-data-platform/manage/cluster-maintenance/config-cluster/#set-cluster-configuration-properties) to update these cluster properties. The update might take several minutes to complete.

    ### Use cluster’s IAM credentials

    ```bash
    # Glue requires Redpanda Iceberg tables to be manually deleted
    # so iceberg_delete is set to false.
    rpk cloud login

    rpk profile create --from-cloud <cluster-id>

    rpk cluster config set \
      iceberg_enabled=true \
      iceberg_delete=false \
      iceberg_default_catalog_namespace='["<custom-namespace>"]' \
      iceberg_catalog_type=rest \
      iceberg_rest_catalog_endpoint=https://glue.<glue-region>.amazonaws.com/iceberg \
      iceberg_rest_catalog_authentication_mode=aws_sigv4 \
      iceberg_rest_catalog_credentials_source=aws_instance_metadata \
      iceberg_rest_catalog_aws_region=<glue-region> \
      iceberg_rest_catalog_base_location=s3://<cluster-storage-bucket-name>/<warehouse-path>
    ```


    ### Use static credentials (override IAM)

    ```bash
    # Glue requires Redpanda Iceberg tables to be manually deleted
    # so iceberg_delete is set to false.
    rpk cluster config set \
      iceberg_enabled=true \
      iceberg_delete=false \
      iceberg_default_catalog_namespace='["<custom-namespace>"]' \
      iceberg_catalog_type=rest \
      iceberg_rest_catalog_endpoint=https://glue.<glue-region>.amazonaws.com/iceberg \
      iceberg_rest_catalog_authentication_mode=aws_sigv4 \
      iceberg_rest_catalog_credentials_source=config_file \
      iceberg_rest_catalog_aws_region=<glue-region> \
      iceberg_rest_catalog_aws_access_key=<glue-access-key> \
      iceberg_rest_catalog_aws_secret_key='${secrets.<glue-secret-key-name>}' \
      iceberg_rest_catalog_base_location=s3://<cluster-storage-bucket-name>/<warehouse-path>
    ```

    Use your own values for the following placeholders:

    -   `<custom-namespace>`: A unique namespace for this cluster’s Iceberg tables. Each Redpanda cluster that writes to the same Glue catalog must use a distinct namespace to avoid table name collisions. If omitted, the default namespace `redpanda` is used.

    -   `<glue-region>`: The AWS region where your Data Catalog is located. The region in the AWS Glue endpoint must match the region specified in your [`iceberg_rest_catalog_aws_region`](https://docs.redpanda.com/cloud-data-platform/reference/properties/cluster-properties/#iceberg_rest_catalog_aws_region) property.

    -   `<cluster-storage-bucket-name>` and `<warehouse-path>`: AWS Glue requires you to specify the base location where Redpanda stores Iceberg data and metadata files. You must use an S3 URI; for example, `s3://<cluster-storage-bucket-name>/iceberg`.

        -   Bucket name: For BYOC clusters, the bucket name is `redpanda-cloud-storage-<cluster-id>`. For BYOVPC clusters, use the name of the object storage bucket you created as a [customer-managed resource](https://docs.redpanda.com/cloud-data-platform/get-started/cluster-types/byoc/aws/vpc-byo-aws/#configure-the-redpanda-network-and-cluster).

            This must be the same bucket used for your cluster’s object storage. You cannot specify a different bucket for Iceberg data.

        -   Warehouse path: A logical name you choose (such as `iceberg`) for the warehouse that holds all Redpanda Iceberg topic data in the cluster.

            As a security best practice, do not use the bucket root for the base location. Always specify a subfolder to avoid interfering with the rest of your cluster’s data in object storage.


    -   `<glue-access-key>` (static credentials only): The AWS access key ID for your Glue service account.

    -   `<glue-secret-key-name>` (static credentials only): The name of the secret that stores the AWS secret access key for your Glue service account. To reference a secret in a cluster property, for example `iceberg_rest_catalog_aws_secret_key`, you must first [store the secret value](https://docs.redpanda.com/cloud-data-platform/manage/iceberg/use-iceberg-catalogs/#store-a-secret-for-rest-catalog-authentication).


    If the update succeeds, `rpk` prints a confirmation similar to the following:

    ```text
    Successfully updated configuration. New configuration version is 2.
    ```

2.  Enable the integration for a topic by configuring the topic property `redpanda.iceberg.mode`. The following examples show how to use [`rpk`](https://docs.redpanda.com/cloud-data-platform/manage/rpk/rpk-install/) to either create a new topic or alter the configuration for an existing topic and set the Iceberg mode to `key_value`. The `key_value` mode creates a two-column Iceberg table for the topic, with one column for the record metadata including the key, and another binary column for the record’s value. See [Specify Iceberg Schema](https://docs.redpanda.com/cloud-data-platform/manage/iceberg/specify-iceberg-schema/) for more details on Iceberg modes.

    Create a new topic and set `redpanda.iceberg.mode`:

    ```bash
    rpk topic create <topic-name> --topic-config=redpanda.iceberg.mode=key_value
    ```

    Set `redpanda.iceberg.mode` for an existing topic:

    ```bash
    rpk topic alter-config <topic-name> --set redpanda.iceberg.mode=key_value
    ```

3.  Produce to the topic. For example:

    ```bash
    printf 'hello world\nfoo bar\nbaz qux\n' | rpk topic produce <topic-name> --format='%k %v\n'
    ```

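To make the `%k %v\n` format in step 3 concrete: each input line is divided at the first space into a key and a value. The shell loop below is only a rough imitation of that split for illustration. It is not how `rpk` parses input, and `split_key_value` is a hypothetical name.

```bash
# Rough illustration of how '%k %v\n' divides each line: the text before
# the first space becomes the key, and the rest of the line becomes the value.
split_key_value() {
  while read -r key value; do
    printf 'key=%s value=%s\n' "$key" "$value"
  done
}

printf 'hello world\nfoo bar\nbaz qux\n' | split_key_value
# prints:
#   key=hello value=world
#   key=foo value=bar
#   key=baz value=qux
```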

You should see the topic as a table with data in AWS Glue Data Catalog. The data may take some time to become visible, depending on your [`iceberg_target_lag_ms`](https://docs.redpanda.com/cloud-data-platform/reference/properties/cluster-properties/#iceberg_target_lag_ms) setting.

1.  In AWS Glue Studio, go to Databases.

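2.  Select the database that matches your Iceberg namespace: `redpanda` by default, or the value of `iceberg_default_catalog_namespace` if you set a custom namespace. Redpanda automatically adds the database and the table within it. The table name is the same as the topic name.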

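The placeholder rules for the cluster configuration in this section can be sketched as two small helpers that assemble the endpoint and the base location. Both function names are hypothetical. The bucket naming pattern applies to BYOC clusters only, as described above.

```bash
# Hypothetical helpers that assemble values for the cluster properties.

# Glue Iceberg REST endpoint for a region, for example us-east-1.
glue_endpoint() {
  printf 'https://glue.%s.amazonaws.com/iceberg\n' "$1"
}

# Base location for a BYOC cluster. An empty warehouse path is rejected
# so that the bucket root is never used.
byoc_base_location() {
  cluster_id=$1
  warehouse_path=${2:-}
  warehouse_path=${warehouse_path#/}   # drop a leading slash, if any
  if [ -z "$warehouse_path" ]; then
    echo "warehouse path must not be empty (do not use the bucket root)" >&2
    return 1
  fi
  printf 's3://redpanda-cloud-storage-%s/%s\n' "$cluster_id" "$warehouse_path"
}

glue_endpoint us-east-1
byoc_base_location "<cluster-id>" iceberg
```

Pass the same region to `glue_endpoint` that you set in `iceberg_rest_catalog_aws_region`, because the two must match.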

## [](#query-iceberg-table)Query Iceberg table

You can query the Iceberg table using different engines, such as Amazon Athena, PyIceberg, or Apache Spark. To query the table or view the table data in AWS Glue, ensure that your account has the necessary permissions to access the catalog, database, and table.

To query the table in Amazon Athena:

1.  On the list of tables in AWS Glue Studio, click "Table data" under the **View data** column.

2.  Click "Proceed" to be redirected to the Athena query editor.

3.  In the query editor, select AwsDataCatalog as the data source, and select the `redpanda` database (or the database for your custom namespace, if you set one).

4.  The SQL query editor should be pre-populated with a query that selects 10 rows from the Iceberg table. Run the query to see a preview of the table data.

    ```sql
    SELECT * FROM "AwsDataCatalog"."redpanda"."<table-name>" limit 10;
    ```

    Your query results should look like the following:

    ```text
    +-----------------------------------------------------+----------------+
    | redpanda                                            | value          |
    +-----------------------------------------------------+----------------+
    | {partition=0, offset=0, timestamp=2025-07-21        | 77 6f 72 6c 64 |
    | 18:11:25.070000, headers=null, key=[B@1900af31}     |                |
    +-----------------------------------------------------+----------------+
    ```

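In the sample results above, the binary `value` column appears as space-separated hexadecimal bytes. To read such a value as text, you can decode the bytes locally. The following is a minimal sketch that uses only shell built-ins. `hex_to_text` is a hypothetical helper and assumes the bytes represent printable text.

```bash
# Hypothetical helper: decode space-separated hex bytes into text.
hex_to_text() {
  for byte in $1; do
    # Convert each hex byte to a three-digit octal escape, then print it.
    printf "\\$(printf '%03o' "0x$byte")"
  done
  printf '\n'
}

hex_to_text "77 6f 72 6c 64"
# prints: world
```

Here `77 6f 72 6c 64` decodes to `world`, the value produced with the key `hello`. The `key=[B@...` text in the `redpanda` column is the default string form of a byte array rather than the key's content. Inside Athena, a function such as `from_utf8(value)` may give you readable text without leaving the query editor.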

## [](#suggested-reading)Suggested reading

-   [Query Iceberg Topics](https://docs.redpanda.com/cloud-data-platform/manage/iceberg/query-iceberg-topics/)