Store the Redpanda Data Directory in hostPath Volumes

You can configure Redpanda to use Kubernetes hostPath volumes to store the Redpanda data directory. A hostPath volume mounts a file or directory from the host node’s file system into your Pod.

Use hostPath volumes only for development environments. If the Pod is deleted and recreated, it might be scheduled on another worker node and lose access to the data.

Prerequisites

You must have the following:

  • Kubernetes cluster: Ensure you have a running Kubernetes cluster, either locally, such as with minikube or kind, or remotely.

  • kubectl: Ensure you have the kubectl command-line tool installed and configured to communicate with your cluster.

  • Dedicated directory: Ensure you have a dedicated directory on the host worker node to prevent potential conflicts with other applications or system processes.

  • File system: Ensure that the chosen directory is on an ext4 or XFS file system.
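The file system prerequisite can be checked on the worker node before you deploy. The following is a minimal sketch, not part of the Redpanda tooling: the helper names `is_supported_fs` and `fs_type_of` are hypothetical, and `df --output` assumes GNU coreutils is available on the node.

```shell
# Hypothetical helper: succeeds only for the file systems
# named in the prerequisites (ext4 or XFS).
is_supported_fs() {
  case "$1" in
    ext4|xfs) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: print the type of the file system that backs
# a directory. Assumes GNU df, which supports --output.
fs_type_of() {
  df --output=fstype "$1" | tail -n 1
}
```

Run on the worker node, a check might look like `is_supported_fs "$(fs_type_of <absolute-path>)" && echo "supported"`, where `<absolute-path>` is the dedicated directory you chose.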

Configure Redpanda to use hostPath volumes

Both the Redpanda Helm chart and the Redpanda custom resource provide an interface for configuring hostPath volumes.

To store Redpanda data in hostPath volumes:

  • Helm + Operator

redpanda-cluster.yaml

apiVersion: cluster.redpanda.com/v1alpha2
kind: Redpanda
metadata:
  name: redpanda
spec:
  chartRef: {}
  clusterSpec:
    storage:
      hostPath: "<absolute-path>"
      persistentVolume:
        enabled: false
    statefulset:
      initContainers:
        setDataDirOwnership:
          enabled: true

kubectl apply -f redpanda-cluster.yaml --namespace <namespace>

  • Helm

    • --values

hostpath.yaml

storage:
  hostPath: "<absolute-path>"
  persistentVolume:
    enabled: false
statefulset:
  initContainers:
    setDataDirOwnership:
      enabled: true

helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
  --values hostpath.yaml --reuse-values

    • --set

helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
  --set storage.hostPath=<absolute-path> \
  --set storage.persistentVolume.enabled=false \
  --set statefulset.initContainers.setDataDirOwnership.enabled=true

The configuration uses the following settings:

  • storage.hostPath: Absolute path on the host to store the Redpanda data directory.

  • storage.persistentVolume.enabled: Determines whether a PersistentVolumeClaim (PVC) is created for the Redpanda data directory. When set to false, a PVC is not created.

  • statefulset.initContainers.setDataDirOwnership.enabled: Enables the init container that sets write permissions on the data directories.

    Pods that run Redpanda brokers must have read/write access to their data directories. The initContainer is responsible for setting write permissions on the data directories. By default, statefulset.initContainers.setDataDirOwnership is disabled because most storage drivers call SetVolumeOwnership to give Redpanda permissions to the root of the storage mount. However, some storage drivers, such as hostPath, do not call SetVolumeOwnership. In this case, you must enable the initContainer to set the permissions.

    To set permissions on the data directories, the initContainer must run as root. However, be aware that an initContainer running as root can introduce the following security risks:

    • Privilege escalation: If attackers gain access to the initContainer, they can escalate privileges to gain full control over the system. For example, attackers could use the initContainer to gain unauthorized access to sensitive data, tamper with the system, or start denial-of-service attacks.

    • Container breakouts: If the container is misconfigured or the container runtime has a vulnerability, attackers could escape from the initContainer and access the host operating system.

    • Image tampering: If attackers gain access to the container image of the initContainer, they could add malicious code or backdoors to it. Image tampering could compromise the security of the entire cluster.
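After applying the configuration above, one way to confirm that a broker Pod mounts the host directory is to list the Pod's volumes together with any hostPath they reference. This is a sketch under assumptions: the helper name `show_pod_volumes` is hypothetical, and the Pod name you pass (for example `redpanda-0`, assuming the release is named redpanda) depends on your deployment.

```shell
# Hypothetical helper: print each volume of a Pod as "<name><TAB><hostPath>".
# Volumes that are not hostPath volumes print an empty path.
# Usage: show_pod_volumes <pod> <namespace>
show_pod_volumes() {
  kubectl get pod "$1" --namespace "$2" \
    --output jsonpath='{range .spec.volumes[*]}{.name}{"\t"}{.hostPath.path}{"\n"}{end}'
}
```

For example, `show_pod_volumes redpanda-0 <namespace>` should include a line whose path matches the value you set for storage.hostPath.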

Next steps

Monitor disk usage to detect issues early, optimize performance, and plan capacity.