Docs Labs Iceberg Streaming on Kubernetes with Redpanda, MinIO, and Spark This lab provides a local Kubernetes environment to help you quickly get started with Redpanda and its integration with Apache Iceberg. It showcases how Redpanda, when paired with a Tiered Storage solution like MinIO, can write data in the Iceberg format, enabling seamless analytics workflows. The lab also includes a Spark environment configured for querying the Iceberg tables using SQL within a Jupyter Notebook interface. In this setup, you will: Produce data to Redpanda topics that are Iceberg-enabled. Observe how Redpanda writes this data in Iceberg format to MinIO as the Tiered Storage backend. Use Spark to query the Iceberg tables, demonstrating a complete pipeline from data production to querying. This environment is ideal for experimenting with Redpanda’s Iceberg and Tiered Storage capabilities, enabling you to test end-to-end workflows for analytics and data lake architectures. Prerequisites System requirements Make sure you have the following system requirements: Operating System: macOS, Linux, or Windows with WSL2 CPU: Minimum 4 cores Memory: Minimum 8 GB RAM Storage: Minimum 20 GB free disk space for Docker images and data Docker: At least 6 GB memory allocation for Docker Desktop Required tools Install the following tools: Docker (with sufficient memory allocation) kind (Kubernetes in Docker) kubectl (Kubernetes CLI) Helm (Kubernetes package manager) Run the lab Follow these steps to set up the lab environment: Clone the repository git clone https://github.com/redpanda-data/redpanda-labs.git cd redpanda-labs/kubernetes/iceberg Create the Kubernetes cluster and namespace kind create cluster --config kind.yaml kubectl create namespace iceberg-lab Configure Redpanda for Iceberg Configuration for Redpanda is provided in configmap.yaml and secret.yaml. This sets up the necessary configurations for Iceberg topics and Tiered Storage. kubectl apply -f configmap.yaml --namespace iceberg-lab kubectl apply -f secret.yaml --namespace iceberg-lab Deploy MinIO Deploy the MinIO Operator: kubectl apply -k "github.com/minio/operator?ref=v5.0.18" Wait for the MinIO Operator to be ready: kubectl wait --for=condition=available deployment --all --namespace minio-operator --timeout=120s Create a MinIO instance: helm repo add minio-operator https://operator.min.io helm repo update helm upgrade --install iceberg-minio minio-operator/tenant \ --namespace iceberg-lab \ --version 7.1.1 \ --values minio-tenant-values.yaml Wait for the MinIO instance to be ready: sleep 10 && kubectl wait --for=condition=ready pod -l v1.min.io/tenant=iceberg-minio --namespace iceberg-lab --timeout=300s If you see error: no matching resources found, it means the MinIO Operator has not yet created the Pods. Wait a few moments and try the command again. Set up MinIO bucket and permissions This step creates a job that sets up the MinIO bucket and permissions required for Redpanda to access the object store. kubectl apply -f minio-setup-job.yaml --namespace iceberg-lab kubectl wait --for=condition=complete job/minio-setup --namespace iceberg-lab --timeout=60s If the job times out, check the logs with kubectl logs job/minio-setup --namespace iceberg-lab iceberg-lab. If you see 400 Bad Request errors, MinIO might have reverted to TLS mode. See the troubleshooting section. Deploy the Iceberg REST catalog This step deploys the Iceberg REST catalog, which allows you to interact with Iceberg tables over HTTP. It uses an init container approach to automatically resolve DNS issues with bucket-style S3 URLs. kubectl apply -f iceberg-rest.yaml kubectl wait --for=condition=available deployment/iceberg-rest --namespace iceberg-lab --timeout=120s Prepare Spark Label and taint your dedicated Spark worker node: kubectl label node kind-worker node-role.kubernetes.io/spark-node=true kubectl taint nodes kind-worker dedicated=spark:NoSchedule Build and load the Spark Docker image: docker build -t spark-iceberg-jupyter:latest ./spark kind load docker-image spark-iceberg-jupyter:latest --name kind --nodes kind-worker This step builds the Spark image with Kubernetes-specific configurations for Iceberg and uploads it to the kind cluster. The Dockerfile automatically detects your system architecture (ARM64 or x86_64) and downloads the appropriate dependencies. Wait for the build to complete and the image to be loaded into the kind cluster. Verify the image is loaded: docker exec -it kind-worker crictl images | grep spark-iceberg-jupyter You should see output similar to the following: docker.io/library/spark-iceberg-jupyter latest 86f20b1213dd3 3.83GB Deploy Spark: kubectl apply -f spark.yaml kubectl wait --for=condition=available deployment/spark-iceberg --namespace iceberg-lab --timeout=120s Deploy Redpanda Install the Redpanda Operator custom resource definitions (CRDs): kubectl kustomize "https://github.com/redpanda-data/redpanda-operator//operator/config/crd?ref=v2.4.4" \ | kubectl apply --server-side -f - Deploy the Redpanda Operator: helm repo add jetstack https://charts.jetstack.io helm repo add redpanda https://charts.redpanda.com helm repo update helm install cert-manager jetstack/cert-manager \ --set crds.enabled=true \ --namespace cert-manager \ --create-namespace \ --version 1.17.4 helm upgrade --install redpanda-controller redpanda/operator \ --namespace iceberg-lab \ --create-namespace \ --version v2.4.4 Ensure that the Deployment is successfully rolled out: kubectl --namespace iceberg-lab rollout status --watch deployment/redpanda-controller-operator When you see deployment "redpanda-controller-operator" successfully rolled out, the operator is ready. Create the Redpanda cluster: kubectl apply -f redpanda.yaml --namespace iceberg-lab Wait for the Redpanda cluster to be ready: kubectl get redpanda --namespace iceberg-lab --watch When the Redpanda cluster is ready, the output should look similar to the following: NAME READY STATUS redpanda True Redpanda reconciliation succeeded Expose services In this step, you set up access to the MinIO UI, Spark Jupyter Notebook, and Redpanda Console. Set up MinIO console access For reliable access to the MinIO console, create a NodePort service: kubectl apply -f minio-nodeport.yaml --namespace iceberg-lab The NodePort service exposes MinIO console on port 32090 of all cluster nodes. In a kind cluster, you can access it directly at: http://localhost:32090 This approach avoids the known port-forwarding issues with MinIO console (see issue #2539). The MinIO Console UI requires websockets which don’t work reliably through kubectl port-forward tunnels. NodePort provides direct access without websocket connectivity issues. Set up port forwarding for other services For Spark Jupyter Notebook and Redpanda Console, use port forwarding: kubectl port-forward deploy/spark-iceberg 8888:8888 --namespace iceberg-lab & kubectl port-forward svc/redpanda-console 8080:8080 --namespace iceberg-lab & You can run these commands in separate terminals, or run them in the background by appending & as shown above. This way, all port-forward processes will run in the background in the same terminal. You can bring them to the foreground with fg or stop them with kill if needed. Create and validate Iceberg topics You can validate your setup by performing the following steps: Alias the Redpanda CLI: alias internal-rpk="kubectl --namespace iceberg-lab exec -i -t redpanda-0 -c redpanda -- rpk" This command allows you to run rpk commands directly against the Redpanda broker in the iceberg-lab namespace using the internal-rpk alias. You can also use kubectl exec -i -t redpanda-0 -c redpanda — rpk directly if you prefer not to set an alias. Create Iceberg topics: internal-rpk topic create key_value --topic-config=redpanda.iceberg.mode=key_value Produce sample data: echo "hello world" | internal-rpk topic produce key_value --format='%k %v\n' Open Redpanda Console at http://localhost:8080/topics to see that the topics exist in Redpanda. Open MinIO at http://localhost:32090 to view your data stored in the S3-compatible object store. Login credentials: Username: minio Password: minio123 Open the Jupyter Notebook server at http://localhost:8888. The notebook guides you through querying the Iceberg table created from your Redpanda topic. Clean up When you’re finished with the lab, you can clean up the resources: Stop all port forwarding processes: pkill -f "kubectl port-forward" You can also use Ctrl+C if the port forwarding is running in the foreground. Delete the MinIO NodePort service: kubectl delete service minio-nodeport -n iceberg-lab Delete the kind cluster (this removes everything): kind delete cluster Or, if you want to keep the cluster but remove just the lab resources: # Delete the namespace (removes all lab resources) kubectl delete namespace iceberg-lab # Delete the MinIO operator kubectl delete -k "github.com/minio/operator?ref=v5.0.18" # Delete cert-manager helm uninstall cert-manager --namespace cert-manager kubectl delete namespace cert-manager Troubleshoot Redpanda bucket access errors If Redpanda logs show bucket not found errors after setup: Verify the bucket exists: kubectl exec -n iceberg-lab iceberg-minio-pool-0-0 -c minio -- mc ls local/ Check that Redpanda can reach MinIO: kubectl exec -n iceberg-lab redpanda-0 -c redpanda -- curl -I http://iceberg-minio-hl.iceberg-lab.svc.cluster.local:9000/redpanda Iceberg REST catalog 500 errors If the Iceberg REST catalog shows UnknownHostException errors in the logs: Check the catalog logs for DNS resolution errors: kubectl logs -n iceberg-lab deployment/iceberg-rest | grep -i "unknownhost\|resolve" If you see errors, check the init container logs to see if DNS resolution failed: kubectl logs -n iceberg-lab deployment/iceberg-rest -c dns-resolver The init container automatically resolves MinIO’s IP and configures DNS mappings. If MinIO pods restart and get new IPs, restart the Iceberg REST catalog: kubectl rollout restart deployment/iceberg-rest -n iceberg-lab kubectl rollout status deployment/iceberg-rest -n iceberg-lab Suggested reading MinIO Kubernetes Operator installation Deploy MinIO tenant with Helm Iceberg Topics in Redpanda Back to top × Simple online edits For simple changes, such as fixing a typo, you can edit the content directly on GitHub. Edit on GitHub Or, open an issue to let us know about something that you want us to change. Open an issue Contribution guide For extensive content updates, or if you prefer to work locally, read our contribution guide . Was this helpful? thumb_up thumb_down group Ask in the community mail Share your feedback group_add Make a contribution 🎉 Thanks for your feedback!