Iceberg Streaming on Kubernetes with Redpanda, MinIO, and Spark

This lab provides a local Kubernetes environment to help you quickly get started with Redpanda and its integration with Apache Iceberg. It showcases how Redpanda, when paired with a Tiered Storage solution like MinIO, can write data in the Iceberg format, enabling seamless analytics workflows. The lab also includes a Spark environment configured for querying the Iceberg tables using SQL within a Jupyter Notebook interface.

In this setup, you will:

Produce data to Redpanda topics that are Iceberg-enabled.
Observe how Redpanda writes this data in Iceberg format to MinIO as the Tiered Storage backend.
Use Spark to query the Iceberg tables, demonstrating a complete pipeline from data production to querying.

This environment is ideal for experimenting with Redpanda’s Iceberg and Tiered Storage capabilities, enabling you to test end-to-end workflows for analytics and data lake architectures.

Prerequisites

System requirements

Make sure you have the following system requirements:

Operating System: macOS, Linux, or Windows with WSL2
CPU: Minimum 4 cores
Memory: Minimum 8 GB RAM
Storage: Minimum 20 GB free disk space for Docker images and data
Docker: At least 6 GB memory allocation for Docker Desktop

Required tools

Install the following tools:

Docker (with sufficient memory allocation)
kind (Kubernetes in Docker)
kubectl (Kubernetes CLI)
Helm (Kubernetes package manager)

Run the lab

Follow these steps to set up the lab environment:

Clone the repository

git clone https://github.com/redpanda-data/redpanda-labs.git
cd redpanda-labs/kubernetes/iceberg

Create the Kubernetes cluster and namespace

kind create cluster --config kind.yaml
kubectl create namespace iceberg-lab

Configure Redpanda for Iceberg

Configuration for Redpanda is provided in configmap.yaml and secret.yaml. This sets up the necessary configurations for Iceberg topics and Tiered Storage.

kubectl apply -f configmap.yaml --namespace iceberg-lab
kubectl apply -f secret.yaml --namespace iceberg-lab

Deploy MinIO

Deploy the MinIO Operator:

kubectl apply -k "github.com/minio/operator?ref=v5.0.18"

Wait for the MinIO Operator to be ready:

kubectl wait --for=condition=available deployment --all --namespace minio-operator --timeout=120s

Create a MinIO instance:

helm repo add minio-operator https://operator.min.io
helm repo update
helm upgrade --install iceberg-minio minio-operator/tenant \
  --namespace iceberg-lab \
  --version 7.1.1 \
  --values minio-tenant-values.yaml

Wait for the MinIO instance to be ready:
```
sleep 10 && kubectl wait --for=condition=ready pod -l v1.min.io/tenant=iceberg-minio --namespace iceberg-lab --timeout=300s
```
If you see error: no matching resources found, it means the MinIO Operator has not yet created the Pods. Wait a few moments and try the command again.

Set up MinIO bucket and permissions

This step creates a job that sets up the MinIO bucket and permissions required for Redpanda to access the object store.

kubectl apply -f minio-setup-job.yaml --namespace iceberg-lab
kubectl wait --for=condition=complete job/minio-setup --namespace iceberg-lab --timeout=60s

If the job times out, check the logs with kubectl logs job/minio-setup --namespace iceberg-lab iceberg-lab. If you see 400 Bad Request errors, MinIO might have reverted to TLS mode. See the troubleshooting section.

Deploy the Iceberg REST catalog

This step deploys the Iceberg REST catalog, which allows you to interact with Iceberg tables over HTTP. It uses an init container approach to automatically resolve DNS issues with bucket-style S3 URLs.

kubectl apply -f iceberg-rest.yaml
kubectl wait --for=condition=available deployment/iceberg-rest --namespace iceberg-lab --timeout=120s

Prepare Spark

Label and taint your dedicated Spark worker node:

kubectl label node kind-worker node-role.kubernetes.io/spark-node=true
kubectl taint nodes kind-worker dedicated=spark:NoSchedule

Build and load the Spark Docker image:
```
docker build -t spark-iceberg-jupyter:latest ./spark
kind load docker-image spark-iceberg-jupyter:latest --name kind --nodes kind-worker
```
This step builds the Spark image with Kubernetes-specific configurations for Iceberg and uploads it to the kind cluster. The Dockerfile automatically detects your system architecture (ARM64 or x86_64) and downloads the appropriate dependencies.

Wait for the build to complete and the image to be loaded into the kind cluster.

Verify the image is loaded:

docker exec -it kind-worker crictl images | grep spark-iceberg-jupyter

You should see output similar to the following:

docker.io/library/spark-iceberg-jupyter    latest      86f20b1213dd3    3.83GB

Deploy Spark:

kubectl apply -f spark.yaml
kubectl wait --for=condition=available deployment/spark-iceberg --namespace iceberg-lab --timeout=120s

Deploy Redpanda

Install the Redpanda Operator custom resource definitions (CRDs):

kubectl kustomize "https://github.com/redpanda-data/redpanda-operator//operator/config/crd?ref=v2.4.4" \
    | kubectl apply --server-side -f -

Deploy the Redpanda Operator:

helm repo add jetstack https://charts.jetstack.io
helm repo add redpanda https://charts.redpanda.com
helm repo update

helm install cert-manager jetstack/cert-manager \
  --set crds.enabled=true \
  --namespace cert-manager \
  --create-namespace \
  --version 1.17.4

helm upgrade --install redpanda-controller redpanda/operator \
  --namespace iceberg-lab \
  --create-namespace \
  --version v2.4.4

Ensure that the Deployment is successfully rolled out:
```
kubectl --namespace iceberg-lab rollout status --watch deployment/redpanda-controller-operator
```
When you see deployment "redpanda-controller-operator" successfully rolled out, the operator is ready.

Create the Redpanda cluster:

kubectl apply -f redpanda.yaml --namespace iceberg-lab

Wait for the Redpanda cluster to be ready:

kubectl get redpanda --namespace iceberg-lab --watch

When the Redpanda cluster is ready, the output should look similar to the following:

NAME       READY   STATUS
redpanda   True    Redpanda reconciliation succeeded

Expose services

In this step, you set up access to the MinIO UI, Spark Jupyter Notebook, and Redpanda Console.

Set up MinIO console access

For reliable access to the MinIO console, create a NodePort service:

kubectl apply -f minio-nodeport.yaml --namespace iceberg-lab

The NodePort service exposes MinIO console on port 32090 of all cluster nodes. In a kind cluster, you can access it directly at: http://localhost:32090

This approach avoids the known port-forwarding issues with MinIO console (see issue #2539). The MinIO Console UI requires websockets which don’t work reliably through kubectl port-forward tunnels. NodePort provides direct access without websocket connectivity issues.

Set up port forwarding for other services

For Spark Jupyter Notebook and Redpanda Console, use port forwarding:

kubectl port-forward deploy/spark-iceberg 8888:8888 --namespace iceberg-lab &
kubectl port-forward svc/redpanda-console 8080:8080 --namespace iceberg-lab &

You can run these commands in separate terminals, or run them in the background by appending & as shown above.

This way, all port-forward processes will run in the background in the same terminal. You can bring them to the foreground with fg or stop them with kill if needed.

Create and validate Iceberg topics

You can validate your setup by performing the following steps:

Alias the Redpanda CLI:
```
alias internal-rpk="kubectl --namespace iceberg-lab exec -i -t redpanda-0 -c redpanda -- rpk"
```
This command allows you to run rpk commands directly against the Redpanda broker in the iceberg-lab namespace using the internal-rpk alias. You can also use kubectl exec -i -t redpanda-0 -c redpanda — rpk directly if you prefer not to set an alias.

Create Iceberg topics:

internal-rpk topic create key_value --topic-config=redpanda.iceberg.mode=key_value

Produce sample data:

echo "hello world" | internal-rpk topic produce key_value --format='%k %v\n'

Open Redpanda Console at http://localhost:8080/topics to see that the topics exist in Redpanda.
Open MinIO at http://localhost:32090 to view your data stored in the S3-compatible object store.

Login credentials:
- Username: minio
- Password: minio123
Open the Jupyter Notebook server at http://localhost:8888. The notebook guides you through querying the Iceberg table created from your Redpanda topic.

Clean up

When you’re finished with the lab, you can clean up the resources:

Stop all port forwarding processes:
```
pkill -f "kubectl port-forward"
```
You can also use Ctrl+C if the port forwarding is running in the foreground.

Delete the MinIO NodePort service:

kubectl delete service minio-nodeport -n iceberg-lab

Delete the kind cluster (this removes everything):
```
kind delete cluster
```

Or, if you want to keep the cluster but remove just the lab resources:

# Delete the namespace (removes all lab resources)
kubectl delete namespace iceberg-lab

# Delete the MinIO operator
kubectl delete -k "github.com/minio/operator?ref=v5.0.18"

# Delete cert-manager
helm uninstall cert-manager --namespace cert-manager
kubectl delete namespace cert-manager

Troubleshoot

Redpanda bucket access errors

If Redpanda logs show bucket not found errors after setup:

Verify the bucket exists:

kubectl exec -n iceberg-lab iceberg-minio-pool-0-0 -c minio -- mc ls local/

Check that Redpanda can reach MinIO:

kubectl exec -n iceberg-lab redpanda-0 -c redpanda -- curl -I http://iceberg-minio-hl.iceberg-lab.svc.cluster.local:9000/redpanda

Iceberg REST catalog 500 errors

If the Iceberg REST catalog shows UnknownHostException errors in the logs:

Check the catalog logs for DNS resolution errors:

kubectl logs -n iceberg-lab deployment/iceberg-rest | grep -i "unknownhost\|resolve"

If you see errors, check the init container logs to see if DNS resolution failed:
```
kubectl logs -n iceberg-lab deployment/iceberg-rest -c dns-resolver
```
The init container automatically resolves MinIO’s IP and configures DNS mappings. If MinIO pods restart and get new IPs, restart the Iceberg REST catalog:
```
kubectl rollout restart deployment/iceberg-rest -n iceberg-lab
kubectl rollout status deployment/iceberg-rest -n iceberg-lab
```