Redpanda Iceberg Docker Compose Example

This lab provides a Docker Compose environment to help you get started with Redpanda and its integration with Apache Iceberg. It shows how Redpanda, using an S3-compatible object store such as MinIO as its Tiered Storage backend, writes topic data in the Iceberg format so that analytics tools can query it directly. The lab also includes a Spark environment configured for querying the Iceberg tables using SQL within a Jupyter Notebook interface.

In this setup, you will:

- Produce data to Redpanda topics that are Iceberg-enabled.
- Observe how Redpanda writes this data in Iceberg format to MinIO as the Tiered Storage backend.
- Use Spark to query the Iceberg tables, demonstrating a complete pipeline from data production to querying.

This environment is ideal for experimenting with Redpanda’s Iceberg and Tiered Storage capabilities, enabling you to test end-to-end workflows for analytics and data lake architectures.

Prerequisites

You must have the following installed on your machine, all of which are used by the commands in this lab:

- Git
- Docker and Docker Compose
- rpk

This lab is intended for Linux and macOS users. If you are using Windows, you must use the Windows Subsystem for Linux (WSL) to run the commands in this lab.

Run the lab

Clone this repository:

```
git clone https://github.com/redpanda-data/redpanda-labs.git
```

Change into the docker-compose/iceberg/ directory:
```
cd redpanda-labs/docker-compose/iceberg
```
Set the REDPANDA_VERSION environment variable to at least version 24.3.1. For all available versions, see the GitHub releases.

For example:
```
export REDPANDA_VERSION=v26.1.14
```
Set the REDPANDA_CONSOLE_VERSION environment variable to the version of Redpanda Console that you want to run. For all available versions, see the GitHub releases.

You must use at least version v3.0.0 of Redpanda Console to deploy this lab.

For example:
```
export REDPANDA_CONSOLE_VERSION=v3.9.0
```
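Before starting the environment, it can help to confirm that both variables meet the minimum versions stated above. The following is a minimal sketch, not part of the lab: `version_ge` is a hypothetical helper, and it assumes plain `MAJOR.MINOR.PATCH` versions with an optional leading `v` (no pre-release suffixes).

```shell
# Hypothetical helper: succeeds when version $1 >= version $2.
# Assumes plain MAJOR.MINOR.PATCH versions with an optional leading "v".
version_ge() {
  a=${1#v}
  b=${2#v}
  for i in 1 2 3; do
    # Extract the i-th dot-separated component of each version.
    x=$(printf '%s' "$a" | cut -d. -f"$i")
    y=$(printf '%s' "$b" | cut -d. -f"$i")
    if [ "$x" -gt "$y" ]; then return 0; fi
    if [ "$x" -lt "$y" ]; then return 1; fi
  done
  return 0
}

# Minimum versions are the ones stated in this lab.
if version_ge "${REDPANDA_VERSION:-v0.0.0}" "24.3.1"; then
  echo "REDPANDA_VERSION OK"
else
  echo "REDPANDA_VERSION must be at least 24.3.1" >&2
fi

if version_ge "${REDPANDA_CONSOLE_VERSION:-v0.0.0}" "3.0.0"; then
  echo "REDPANDA_CONSOLE_VERSION OK"
else
  echo "REDPANDA_CONSOLE_VERSION must be at least v3.0.0" >&2
fi
```

An unset variable is treated as `v0.0.0`, so the check reports it as too old instead of failing with a syntax error.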
Start the Docker Compose environment, which includes Redpanda, MinIO, Spark, and Jupyter Notebook:
```
docker compose build && docker compose up
```
The build process may take a few minutes to complete, as it builds the Spark image with the necessary dependencies for Iceberg.

Create and switch to a new rpk profile that connects to your Redpanda broker:

```
rpk profile create docker-compose-iceberg --set=admin_api.addresses=localhost:19644 --set=brokers=localhost:19092 --set=schema_registry.addresses=localhost:18081
```

Create two topics with Iceberg enabled:

```
rpk topic create key_value --topic-config=redpanda.iceberg.mode=key_value
rpk topic create value_schema_id_prefix --topic-config=redpanda.iceberg.mode=value_schema_id_prefix
```

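Produce a record to the key_value topic. With the format string '%k %v\n', the text before the first space becomes the record key and the rest of the line becomes the value:

```
echo "hello world" | rpk topic produce key_value --format='%k %v\n'
```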
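To make the key/value split concrete, the sketch below mimics it with the shell's own `read` builtin. This is only an illustration of how a `key value` line is divided; `split_record` is a hypothetical helper, and rpk does its own parsing of the `--format` string.

```shell
# Illustration only: split each input line at the first space, the way the
# '%k %v\n' format separates key and value. Not part of the lab itself.
split_record() {
  while read -r key value; do
    printf 'key=%s value=%s\n' "$key" "$value"
  done
}

printf 'hello world\n' | split_record
# prints: key=hello value=world
```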

Open Redpanda Console at http://localhost:8081/topics to see that the topics exist in Redpanda.

Open MinIO at http://localhost:9001/browser to view your data stored in the S3-compatible object store.

Login credentials:
- Username: minio
- Password: minio123

Open the Jupyter Notebook server at http://localhost:8888. The notebook guides you through querying Iceberg tables created from Redpanda topics. Complete the next two steps before running the code in the notebook.

Create a schema in the Schema Registry:

```
rpk registry schema create value_schema_id_prefix-value --schema schema.avsc
```
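For orientation, an Avro schema that would fit the sample records produced in the next step could look like the sketch below. It is an assumption inferred from those records (`user_id`, `event_type`, `ts`), not a copy of the repository's schema.avsc. The record name `ClickEvent`, the field types, and the file name are hypothetical. The sketch writes to a separate temporary file so that it does not overwrite the lab's own schema.

```shell
# Hypothetical schema: the record name and field types are assumptions inferred
# from the lab's sample records, not the contents of the repository's schema.avsc.
schema_file="${TMPDIR:-/tmp}/example_click_event.avsc"

cat > "$schema_file" <<'EOF'
{
  "type": "record",
  "name": "ClickEvent",
  "fields": [
    {"name": "user_id", "type": "int"},
    {"name": "event_type", "type": "string"},
    {"name": "ts", "type": "string"}
  ]
}
EOF

# List the field names declared in the example schema.
grep -o '"name": "[a-z_]*"' "$schema_file"
```

Use the schema.avsc file that ships with the lab when you run the `rpk registry schema create` command.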

Produce data to the value_schema_id_prefix topic:

```
# printf writes one JSON record per line; echo treats "\n" differently across shells.
printf '%s\n' \
  '{"user_id":2324,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:23:59.380Z"}' \
  '{"user_id":3333,"event_type":"SCROLL","ts":"2024-11-25T20:24:14.774Z"}' \
  '{"user_id":7272,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:24:34.552Z"}' \
  | rpk topic produce value_schema_id_prefix --format='%v\n' --schema-id=topic
```

Once the data is committed, it should be available in Iceberg format, and you can query the table lab.redpanda.value_schema_id_prefix in the Jupyter Notebook.

Alternative query interfaces

While the environment is running, you can query the Iceberg tables directly with Spark's CLI tools instead of Jupyter Notebook:

Spark Shell

```
docker exec -it spark-iceberg spark-shell
```

Spark SQL

```
docker exec -it spark-iceberg spark-sql
```

PySpark

```
docker exec -it spark-iceberg pyspark
```
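The Spark SQL CLI can also run a single statement non-interactively through its `-e` option. The sketch below builds a query against the table named in this lab and wraps the call in a hypothetical helper, `run_query`. It assumes the `spark-sql` binary in the spark-iceberg container is already configured with the lab's catalog, and the `LIMIT 10` is an arbitrary choice.

```shell
# Table name taken from this lab; the LIMIT value is an arbitrary choice.
table="lab.redpanda.value_schema_id_prefix"
query="SELECT * FROM ${table} LIMIT 10"

# Hypothetical helper: runs one SQL statement in the spark-iceberg container.
# Assumes the container's spark-sql is already configured with the lab catalog.
run_query() {
  docker exec spark-iceberg spark-sql -e "$1"
}

# Show the statement that would be executed.
printf '%s\n' "$query"

# To execute it against the running lab:
#   run_query "$query"
```

The helper is defined but not called, so running the sketch only prints the query. Uncomment the last line once the containers are up and data has been committed.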

Clean up

To shut down and delete the containers along with all your cluster data:

```
docker compose down -v
```
