Redpanda Iceberg Docker Compose Example
This lab provides a Docker Compose environment to help you get started with Redpanda and its integration with Apache Iceberg. It shows how Redpanda, using MinIO as the S3-compatible object store behind Tiered Storage, writes topic data in the Iceberg format so that analytics tools can query it directly. The lab also includes a Spark environment configured for querying the Iceberg tables with SQL from a Jupyter Notebook interface.
In this setup, you will:
-
Produce data to Redpanda topics that are Iceberg-enabled.
-
Observe how Redpanda writes this data in Iceberg format to MinIO as the Tiered Storage backend.
-
Use Spark to query the Iceberg tables, demonstrating a complete pipeline from data production to querying.
This environment is ideal for experimenting with Redpanda’s Iceberg and Tiered Storage capabilities, enabling you to test end-to-end workflows for analytics and data lake architectures.
Prerequisites
You must have the following installed on your machine:
This lab is intended for Linux and macOS users. If you are using Windows, you must use the Windows Subsystem for Linux (WSL) to run the commands in this lab.
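The commands in this lab call git, docker, and rpk, so a quick check that they are on your PATH can save a failed run. The following is a minimal sketch: the tool list is an assumption inferred from the commands this lab runs, and missing_tools is a hypothetical helper name, not part of the lab.

```shell
# Print the name of every listed command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumption: these are the tools that the lab's commands invoke.
missing="$(missing_tools git docker rpk)"
if [ -n "$missing" ]; then
  printf 'Missing tools:\n%s\n' "$missing" >&2
fi
```

If nothing is printed, all three tools were found. The check does not verify tool versions or that the Docker daemon is running.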
Run the lab
-
Clone this repository:
git clone https://github.com/redpanda-data/redpanda-labs.git
-
Change into the docker-compose/iceberg/ directory:
cd redpanda-labs/docker-compose/iceberg
-
Set the REDPANDA_VERSION environment variable to at least version 24.3.1. For all available versions, see the GitHub releases. For example:
export REDPANDA_VERSION=v26.1.9
-
Set the REDPANDA_CONSOLE_VERSION environment variable to the version of Redpanda Console that you want to run. For all available versions, see the GitHub releases. You must use at least version v3.0.0 of Redpanda Console to deploy this lab. For example:
export REDPANDA_CONSOLE_VERSION=v3.7.3
-
Start the Docker Compose environment, which includes Redpanda, MinIO, Spark, and Jupyter Notebook:
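docker compose build && docker compose up
The build process may take a few minutes to complete, as it builds the Spark image with the necessary dependencies for Iceberg. Because docker compose up stays attached to your terminal, run the remaining steps in a second terminal from the same directory.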
-
Create and switch to a new rpk profile that connects to your Redpanda broker:
rpk profile create docker-compose-iceberg --set=admin_api.addresses=localhost:19644 --set=brokers=localhost:19092 --set=schema_registry.addresses=localhost:18081
-
Create two topics with Iceberg enabled:
rpk topic create key_value --topic-config=redpanda.iceberg.mode=key_value
rpk topic create value_schema_id_prefix --topic-config=redpanda.iceberg.mode=value_schema_id_prefix
-
Produce a record to the key_value topic. With this --format value, rpk reads hello as the record key and world as the record value:
echo "hello world" | rpk topic produce key_value --format='%k %v\n'
-
Open Redpanda Console at http://localhost:8081/topics to see that the topics exist in Redpanda.
-
Open MinIO at http://localhost:9001/browser to view your data stored in the S3-compatible object store.
Login credentials:
-
Username: minio
-
Password: minio123
-
Open the Jupyter Notebook server at http://localhost:8888. The notebook guides you through querying Iceberg tables created from Redpanda topics. Complete the next two steps before running the code in the notebook.
-
Create a schema in the Schema Registry:
rpk registry schema create value_schema_id_prefix-value --schema schema.avsc
-
Produce three JSON records, one per line, to the value_schema_id_prefix topic:
printf '%s\n' '{"user_id":2324,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:23:59.380Z"}' '{"user_id":3333,"event_type":"SCROLL","ts":"2024-11-25T20:24:14.774Z"}' '{"user_id":7272,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:24:34.552Z"}' | rpk topic produce value_schema_id_prefix --format='%v\n' --schema-id=topic
When the data is committed, it should be available in Iceberg format and you can query the table lab.redpanda.value_schema_id_prefix in the Jupyter Notebook.
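The version requirements in the steps above (Redpanda 24.3.1 or later, Redpanda Console v3.0.0 or later) can be checked before you start the environment. The following is a minimal sketch, not part of the lab: version_at_least and check_lab_versions are hypothetical helper names, and the comparison assumes plain MAJOR.MINOR.PATCH version strings with an optional leading "v".

```shell
# Return 0 if version $1 is >= version $2.
# Assumes both look like MAJOR.MINOR.PATCH, with an optional leading "v".
version_at_least() {
  have="${1#v}"
  want="${2#v}"
  have_major="${have%%.*}"; have="${have#*.}"
  want_major="${want%%.*}"; want="${want#*.}"
  have_minor="${have%%.*}"; have_patch="${have#*.}"
  want_minor="${want%%.*}"; want_patch="${want#*.}"
  # Drop any suffix such as "-rc1" from the patch component.
  have_patch="${have_patch%%[!0-9]*}"
  want_patch="${want_patch%%[!0-9]*}"

  [ "$have_major" -gt "$want_major" ] && return 0
  [ "$have_major" -lt "$want_major" ] && return 1
  [ "$have_minor" -gt "$want_minor" ] && return 0
  [ "$have_minor" -lt "$want_minor" ] && return 1
  [ "${have_patch:-0}" -ge "${want_patch:-0}" ]
}

# Check both environment variables against the minimums stated in this lab.
# An unset variable is treated as 0.0.0, so it fails the check.
check_lab_versions() {
  version_at_least "${REDPANDA_VERSION:-0.0.0}" 24.3.1 || {
    echo "REDPANDA_VERSION must be set to v24.3.1 or later" >&2
    return 1
  }
  version_at_least "${REDPANDA_CONSOLE_VERSION:-0.0.0}" 3.0.0 || {
    echo "REDPANDA_CONSOLE_VERSION must be set to v3.0.0 or later" >&2
    return 1
  }
}

# Example: check_lab_versions && docker compose build && docker compose up
```

The check only compares version numbers. It does not confirm that a matching image exists for the version you chose.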
Alternative query interfaces
While the notebook server is running, you can query Iceberg tables directly using Spark’s CLI tools instead of Jupyter Notebook:
docker exec -it spark-iceberg spark-shell
docker exec -it spark-iceberg spark-sql
docker exec -it spark-iceberg pyspark
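For a quick check without an interactive session, spark-sql can also run a single statement passed with its -e option. The following is a minimal sketch: query_iceberg and preview_lab_table are hypothetical helper names, and the container name and table name are taken from the commands and text above.

```shell
# Run one SQL statement non-interactively in the lab's Spark container.
# Assumes the container is named "spark-iceberg", as in the commands above.
query_iceberg() {
  docker exec spark-iceberg spark-sql -e "$1"
}

# Preview the first rows of the table backed by the value_schema_id_prefix
# topic. The optional argument is the row limit (default: 10).
preview_lab_table() {
  query_iceberg "SELECT * FROM lab.redpanda.value_schema_id_prefix LIMIT ${1:-10}"
}

# Example: preview_lab_table 5
```

The -it flags are omitted here because no interactive terminal is needed when the statement is passed on the command line.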
Clean up
To shut down and delete the containers along with all your cluster data:
docker compose down -v
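The command above removes the containers and volumes, but the rpk profile and the exported environment variables from the earlier steps remain on your machine. The following is a minimal sketch of a fuller cleanup: cleanup_lab is a hypothetical helper name, the profile name is the one created earlier in this lab, and the function must be run from the docker-compose/iceberg directory.

```shell
# Tear down the lab and remove the local state created by the earlier steps.
cleanup_lab() {
  # Stop the containers and delete their volumes (all cluster data).
  docker compose down -v || return 1
  # Delete the rpk profile that was created for this lab.
  rpk profile delete docker-compose-iceberg
  # Clear the version variables exported for this lab.
  unset REDPANDA_VERSION REDPANDA_CONSOLE_VERSION
}

# Example: cleanup_lab
```

Because unset only affects the current shell, define and call the function in the same terminal where you exported the variables.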