# Redpanda Iceberg Docker Compose Example
This lab provides a Docker Compose environment to help you quickly get started with Redpanda and its integration with Apache Iceberg. It showcases how Redpanda, when paired with a Tiered Storage solution like MinIO, can write data in the Iceberg format, enabling seamless analytics workflows. The lab also includes a Spark environment configured for querying the Iceberg tables using SQL within a Jupyter Notebook interface.
In this setup, you will:

- Produce data to Redpanda topics that are Iceberg-enabled.
- Observe how Redpanda writes this data in Iceberg format to MinIO as the Tiered Storage backend.
- Use Spark to query the Iceberg tables, demonstrating a complete pipeline from data production to querying.
This environment is ideal for experimenting with Redpanda’s Iceberg and Tiered Storage capabilities, enabling you to test end-to-end workflows for analytics and data lake architectures.
## Run the lab
- Clone this repository:

  ```bash
  git clone https://github.com/redpanda-data/redpanda-labs.git
  ```
- Change into the `docker-compose/iceberg/` directory:

  ```bash
  cd redpanda-labs/docker-compose/iceberg
  ```
- Set the `REDPANDA_VERSION` environment variable to at least version 24.3.1. For all available versions, see the GitHub releases. For example:

  ```bash
  export REDPANDA_VERSION=v25.1.3
  ```
- Set the `REDPANDA_CONSOLE_VERSION` environment variable to the version of Redpanda Console that you want to run. For all available versions, see the GitHub releases. You must use at least version v3.0.0 of Redpanda Console to deploy this lab. For example:

  ```bash
  export REDPANDA_CONSOLE_VERSION=v3.1.0
  ```
- Start the Docker Compose environment, which includes Redpanda, MinIO, Spark, and Jupyter Notebook:

  ```bash
  docker compose build && docker compose up
  ```
- Create and switch to a new `rpk` profile that connects to your Redpanda broker:

  ```bash
  rpk profile create docker-compose-iceberg \
    --set=admin_api.addresses=localhost:19644 \
    --set=brokers=localhost:19092 \
    --set=schema_registry.addresses=localhost:18081
  ```
- Create two topics with Iceberg enabled. The `key_value` mode writes each record's raw key and value to the Iceberg table, while `value_schema_id_prefix` uses the schema ID encoded in each record's value to look up the schema in the Schema Registry and map its fields to Iceberg columns:

  ```bash
  rpk topic create key_value --topic-config=redpanda.iceberg.mode=key_value
  rpk topic create value_schema_id_prefix --topic-config=redpanda.iceberg.mode=value_schema_id_prefix
  ```
- Produce data to the `key_value` topic:

  ```bash
  echo "hello world" | rpk topic produce key_value --format='%k %v\n'
  ```
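With `--format='%k %v\n'`, `rpk` splits each input line at the first space into a record key and value. A local sketch of that parsing in plain shell (this is only an illustration of the split, not `rpk` itself):

```bash
#!/bin/sh
# Mimic how a line like "hello world" is split under the format '%k %v\n':
line="hello world"
key="${line%% *}"    # text before the first space -> record key
value="${line#* }"   # text after the first space  -> record value
echo "key=$key value=$value"
# prints: key=hello value=world
```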
- Open Redpanda Console at http://localhost:8081/topics to see that the topics exist in Redpanda.
- Open MinIO at http://localhost:9001/browser to view your data stored in the S3-compatible object store. Login credentials:
  - Username: `minio`
  - Password: `minio123`
- Open the Jupyter Notebook server at http://localhost:8888. The notebook guides you through querying Iceberg tables created from Redpanda topics.
- Create a schema in the Schema Registry for the `value_schema_id_prefix` topic:

  ```bash
  rpk registry schema create value_schema_id_prefix-value --schema schema.avsc
  ```
- Produce data to the `value_schema_id_prefix` topic:

  ```bash
  echo '{"user_id":2324,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:23:59.380Z"}\n{"user_id":3333,"event_type":"SCROLL","ts":"2024-11-25T20:24:14.774Z"}\n{"user_id":7272,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:24:34.552Z"}' | rpk topic produce value_schema_id_prefix --format='%v\n' --schema-id=topic
  ```
When the data is committed, it is available in Iceberg format, and you can query the table `lab.redpanda.value_schema_id_prefix` in the Jupyter Notebook.
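The `schema.avsc` file referenced above ships with the repository; its exact contents are defined there. Judging from the sample records produced in the previous step, it is an Avro record along these lines (the record name `ClickEvent` and the field types are assumptions; the actual file may differ):

```json
{
  "type": "record",
  "name": "ClickEvent",
  "fields": [
    {"name": "user_id", "type": "int"},
    {"name": "event_type", "type": "string"},
    {"name": "ts", "type": "string"}
  ]
}
```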
## Alternative query interfaces
While the notebook server is running, you can also query Iceberg tables directly using Spark's CLI tools instead of Jupyter Notebook:
```bash
docker exec -it spark-iceberg spark-shell
docker exec -it spark-iceberg spark-sql
docker exec -it spark-iceberg pyspark
```
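Whichever interface you use, the queries themselves are plain Spark SQL. A sketch of the kind of query you might run, assuming the table name from the notebook step and field names from the sample records (the actual columns depend on the registered schema):

```sql
-- Count click events per user from the Iceberg table backing the topic
SELECT user_id, event_type, ts
FROM lab.redpanda.value_schema_id_prefix
WHERE event_type = 'BUTTON_CLICK'
ORDER BY ts;
```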
## Clean up
To shut down and delete the containers along with all your cluster data:

```bash
docker compose down -v
```