# Redpanda Iceberg Docker Compose Example

This lab provides a Docker Compose environment to help you quickly get started with Redpanda and its integration with Apache Iceberg. It showcases how Redpanda, when paired with a Tiered Storage solution such as MinIO, can write data in the Iceberg format, enabling seamless analytics workflows. The lab also includes a Spark environment configured for querying the Iceberg tables using SQL within a Jupyter Notebook interface.

In this setup, you will:

- Produce data to Redpanda topics that are Iceberg-enabled.
- Observe how Redpanda writes this data in Iceberg format to MinIO as the Tiered Storage backend.
- Use Spark to query the Iceberg tables, demonstrating a complete pipeline from data production to querying.

This environment is ideal for experimenting with Redpanda's Iceberg and Tiered Storage capabilities, enabling you to test end-to-end workflows for analytics and data lake architectures.

## Prerequisites

You must have the following installed on your machine:

- Docker and Docker Compose
- `rpk`

## Run the lab

1. Clone this repository:

   ```bash
   git clone https://github.com/redpanda-data/redpanda-labs.git
   ```

2. Change into the `docker-compose/iceberg/` directory:

   ```bash
   cd redpanda-labs/docker-compose/iceberg
   ```

3. Set the `REDPANDA_VERSION` environment variable to at least version 24.3.1. For all available versions, see the GitHub releases. For example:

   ```bash
   export REDPANDA_VERSION=24.3.1
   ```

4. Set the `REDPANDA_CONSOLE_VERSION` environment variable to the version of Redpanda Console that you want to run. For all available versions, see the GitHub releases. For example:

   ```bash
   export REDPANDA_CONSOLE_VERSION=2.8.0
   ```

5. Start the Docker Compose environment, which includes Redpanda, MinIO, Spark, and Jupyter Notebook:

   ```bash
   docker compose build && docker compose up
   ```

6. Create and switch to a new `rpk` profile that connects to your Redpanda broker:

   ```bash
   rpk profile create docker-compose-iceberg \
     --set=admin_api.addresses=localhost:19644 \
     --set=brokers=localhost:19092 \
     --set=schema_registry.addresses=localhost:18081
   ```

7. Create two topics with Iceberg enabled:

   ```bash
   rpk topic create key_value --topic-config=redpanda.iceberg.mode=key_value
   rpk topic create value_schema_id_prefix --topic-config=redpanda.iceberg.mode=value_schema_id_prefix
   ```

8. Produce data to the `key_value` topic:

   ```bash
   echo "hello world" | rpk topic produce key_value --format='%k %v\n'
   ```

9. Open Redpanda Console at http://localhost:8081/topics to confirm that the topics exist in Redpanda.

10. Open the Jupyter Notebook server at http://localhost:8888. The notebook guides you through querying Iceberg tables created from Redpanda topics.

11. Create a schema in the Schema Registry:

    ```bash
    rpk registry schema create value_schema_id_prefix-value --schema schema.avsc
    ```

12. Produce data to the `value_schema_id_prefix` topic:

    ```bash
    echo '{"user_id":2324,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:23:59.380Z"}\n{"user_id":3333,"event_type":"SCROLL","ts":"2024-11-25T20:24:14.774Z"}\n{"user_id":7272,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:24:34.552Z"}' | rpk topic produce value_schema_id_prefix --format='%v\n' --schema-id=topic
    ```

When the data is committed, it should be available in Iceberg format, and you can query the table `lab.redpanda.value_schema_id_prefix` in the Jupyter Notebook.
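For reference, a query against this table from a notebook cell might look like the following. This is a minimal sketch rather than the notebook's exact contents: it assumes the Spark session in the Jupyter environment is already configured with the Iceberg catalog that backs the `lab.redpanda` namespace, which is what the bundled notebook relies on.

```python
from pyspark.sql import SparkSession

# Reuse (or create) the Spark session provided by the lab's Jupyter environment.
# The Iceberg catalog configuration is assumed to come from the container's
# Spark defaults, so no catalog settings are repeated here.
spark = SparkSession.builder.appName("redpanda-iceberg-lab").getOrCreate()

# Query the Iceberg table that Redpanda created from the
# value_schema_id_prefix topic.
events = spark.sql("SELECT * FROM lab.redpanda.value_schema_id_prefix LIMIT 10")
events.show(truncate=False)

# Inspect the schema that was derived from schema.avsc.
spark.table("lab.redpanda.value_schema_id_prefix").printSchema()
```

As noted above, records only become queryable after Redpanda commits them to the Iceberg table, so rerun the query if it initially returns no rows.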
## Alternative query interfaces

While the notebook server is running, you can also query the Iceberg tables directly using Spark's CLI tools instead of the Jupyter Notebook:

- Spark Shell:

  ```bash
  docker exec -it spark-iceberg spark-shell
  ```

- Spark SQL:

  ```bash
  docker exec -it spark-iceberg spark-sql
  ```

- PySpark (a sample query is sketched at the end of this page):

  ```bash
  docker exec -it spark-iceberg pyspark
  ```

## Clean up

To shut down and delete the containers along with all your cluster data:

```bash
docker compose down -v
```
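For example, while the containers are still running (before you run `docker compose down`), you can run the same query from the `pyspark` shell listed under the alternative interfaces. As with the notebook, this is a minimal sketch: it assumes the shell's preconfigured `spark` session already points at the Iceberg catalog behind the `lab.redpanda` namespace, and that `schema.avsc` defines the `user_id`, `event_type`, and `ts` fields used in the produce step.

```python
# Run inside: docker exec -it spark-iceberg pyspark
# The pyspark shell exposes a ready-made `spark` session; only the query is
# shown here, since the catalog settings are assumed to ship with the image.
events = spark.sql(
    "SELECT user_id, event_type, ts "
    "FROM lab.redpanda.value_schema_id_prefix "
    "ORDER BY ts"
)
events.show(truncate=False)

# A quick aggregate to confirm the three sample records landed.
events.groupBy("event_type").count().show()
```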