Node Pool Migration for Kubernetes Clusters Running Redpanda

This topic outlines a safe procedure for migrating your Redpanda cluster to a new node pool. This is especially useful when you need to change node instance types, scale vertically, or upgrade Kubernetes. The process keeps your cluster fault tolerant and protects your data throughout the migration.

Prerequisites

  • A staging environment in which to test the upgrade procedure before performing it in production.

  • Familiarity with your hosting platform (GKE, EKS, AKS, or self-managed) and any CLI tools your platform provides (for example, gcloud or eksctl).

  • Confirmation that the new version of Kubernetes is compatible with the version of Redpanda you are using. See Kubernetes Compatibility.

  • Confirmation that your new node pool configuration is compatible with Redpanda requirements.

  • A backup of your Redpanda data in case of unexpected data loss.

Pre-upgrade considerations

When replacing Kubernetes nodes under a running Redpanda cluster, consider the following:

  • Cost: Be aware that changes in broker count may impact infrastructure costs.

  • Data retention: Storage capacity and retention values depend largely on the local disk capacity across brokers.

  • Durability: Your broker count should be one more than your highest partition replication factor.

  • Partition count: This value is generally determined by the overall CPU core count of your cluster.

Upgrade the Redpanda Helm chart

Before upgrading Kubernetes, always upgrade your Redpanda Helm chart or Redpanda CRD to the latest version. This ensures that any deprecated Kubernetes resources (for example, the policy/v1beta1 PodDisruptionBudget API, which was removed in Kubernetes 1.25) are replaced with their supported equivalents. If you skip this step, you may see errors during or after the Kubernetes upgrade.

For more details, see Upgrade Redpanda in Kubernetes.
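
As a reference only, a minimal upgrade sketch might look like the following, assuming the public Redpanda chart repository at https://charts.redpanda.com and a release named redpanda (adjust the release name, namespace, and chart source to match your deployment):

  # Refresh the chart repository, then upgrade the existing release in place
  helm repo add redpanda https://charts.redpanda.com
  helm repo update
  helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --reuse-values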

Choose your storage strategy

When migrating to new nodes, Kubernetes replaces each node’s underlying virtual machine. This process impacts the Pods running on those nodes. Choose the strategy that best suits your storage backend:

Ephemeral local storage

When using ephemeral local storage, the broker Pod’s storage is tied to the Pod’s lifecycle. This means the data is lost when the Pod is terminated. To preserve data integrity, you must decommission each broker before deleting the Pod so that its data and replicas are migrated, and a new broker ID is allocated when the Pod is rescheduled.

flowchart TB
    A(("Start")) --> B["Ensure safe PDB & topic replication ≥ 3"]
    B --> C["Create new node pool"]
    C --> C1{"Broker count = highest topic replication factor?"}
    C1 -- Yes --> C2["Extra node available for temporary broker?"]
    C2 -- Yes --> C3["Scale up replicas by one (add buffer broker)"]
    C2 -- No --> C5["Provision extra node"]
    C3 --> H
    C5 --> C3
    C1 -- No --> G["Proceed without adding buffer broker"]
    G --> H["Set StatefulSet update strategy to OnDelete"]
    H --> I["Disable controllers"]
    I --> J["Ensure new Pods get scheduled on new node pool"]
    J --> K["For each broker on old node pool:"]
    K --> K1["Identify broker & corresponding node"]
    K1 --> K2["Decommission broker (migrate data)"]
    K2 --> K3["Drain node (evict Pod)"]
    K3 --> K4["Delete associated PVC (ephemeral storage)"]
    K4 --> K5["Wait for new broker Pod on new node pool"]
    K5 --> K6["Verify broker and cluster health"]
    K6 --> L{"More brokers to upgrade?"}
    L -- Yes --> K1
    L -- No --> M["Revert update strategy to RollingUpdate"]
    M --> N["If buffer broker exists, scale replicas down by one"]
    N --> O["Re-enable controllers"]
    O --> P["Final cluster health verification"]
    P --> Q["Delete old node pool (cleanup)"]
    Q --> R(("Process complete"))

    %% Define classes
    classDef userAction stroke:#374D7C, fill:#E2EBFF, font-weight:bold, rx:5, ry:5;
    classDef systemAction fill:#F6FBF6, stroke:#25855a, stroke-width:2px, color:#20293c, rx:5, ry:5;
    classDef decision fill:#FFF3CD, stroke:#FFEEBA, font-weight:bold, rx:5, ry:5;
    class A,R systemAction;
    class B,E,H,I,J,K,N,O,P,Q userAction;
    class C,D,L decision;

    %% Add clickable anchors to each pertinent step
    click B "#pdb" "Ensure safe PDB & topic replication ≥ 3"
    click C "#node-pool" "Create new node pool"
    click G "#update-strategy-ondelete" "Proceed without adding buffer broker"
    click H "#update-strategy-ondelete" "Set StatefulSet update strategy to OnDelete"
    click I "#disable-controllers" "Disable controllers"
    click J "#configure-nodepool" "Ensure new Pods get scheduled on new node pool"
    click K1 "#identify-broker" "Identify broker & corresponding node"
    click K2 "#decommission-broker" "Decommission broker (migrate data)"
    click K3 "#drain-node" "Drain node (evict Pod)"
    click K4 "#delete-pvc" "Delete associated PVC (ephemeral storage)"
    click K6 "#verify-broker-health" "Verify broker and cluster health"
    click L "#identify-broker" "More brokers to upgrade?"
    click M "#revert-strategy" "Revert update strategy to RollingUpdate"
    click N "#scale-down" "Scale replicas down (buffer broker removal)"
    click O "#re-enable-controllers" "Re-enable controllers"
    click P "#final-health" "Final cluster health verification"
    click Q "#delete-old-nodepool" "Delete old node pool (cleanup)"
  1. Ensure that a safe Pod Disruption Budget (PDB) is set:

    kubectl get pdb --namespace <namespace>

    Setting statefulset.budget.maxUnavailable to 1 in your Redpanda resource or Helm values ensures that only one broker Pod can be unavailable during the upgrade process. This setting helps maintain cluster availability and data integrity by preventing multiple Pods from being disrupted simultaneously.

    Example output:

    NAME       MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
    redpanda   N/A             1                 1                     2m36s
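
    If the budget is not already set, a minimal Helm values snippet that pins it to 1 looks like this (the same keys apply under spec.clusterSpec in a Redpanda resource):

    statefulset:
      budget:
        maxUnavailable: 1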
  2. Ensure each topic has a replication factor of at least 3:

    rpk topic describe <topic-name>
    rpk topic alter-config <topic-name> --set replication.factor=<replication-factor>

    A replication factor of 3 is recommended to ensure high availability and fault tolerance. With three replicas, the system can tolerate the failure of one broker without losing data or availability. This setup allows for one replica to be down for maintenance or due to failure, while still having two replicas available to serve read and write requests. This redundancy is crucial for maintaining the health and reliability of the cluster.
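
    To review every topic at once instead of describing them one by one, you can list them from any broker Pod; the REPLICAS column shows each topic's current replication factor:

    kubectl --namespace <namespace> exec -ti <pod-name> -c redpanda -- rpk topic list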

  3. Create a new node pool with the same number of nodes as your current pool.

    If you have the same number of brokers as your highest replication factor, add an extra node (buffer node).

    The recommended topic replication factor is 3. This means each partition within a topic must have three replicas to maintain cluster health. To avoid making your cluster unhealthy when you remove a broker, add a temporary one. This temporary broker is not needed if you have more brokers than your highest topic replication factor.
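
    How you create the pool depends on your hosting platform. As an illustration only (pool names, machine types, and node counts are placeholders), the new pool might be created like this on GKE or EKS:

    # GKE
    gcloud container node-pools create <new-pool-name> --cluster <cluster-name> \
      --machine-type <machine-type> --num-nodes <node-count>

    # EKS
    eksctl create nodegroup --cluster <cluster-name> --name <new-pool-name> \
      --node-type <instance-type> --nodes <node-count>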

  4. Set the update strategy for the StatefulSet to OnDelete.

    This change ensures that broker Pods are only replaced when explicitly deleted, giving you manual control over the upgrade process.

    • Helm + Operator

    • Helm

    For example, patch your Redpanda resource to include:

    cat <<EOF > patch.yaml
    spec:
      clusterSpec:
        statefulset:
          # replicas: <number>
          updateStrategy:
            type: OnDelete
    EOF
    kubectl patch redpanda redpanda --namespace <namespace> --type merge --patch-file patch.yaml
    If you created a buffer node, uncomment the statefulset.replicas line to add a buffer broker to your cluster.

    Update your Helm values file to include:

    statefulset.yaml
    statefulset:
      # replicas: <number>
      updateStrategy:
        type: OnDelete
    If you created a buffer node, uncomment the statefulset.replicas line to add a buffer broker to your cluster.

    Then, run:

    helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
      --values statefulset.yaml --reuse-values
  5. Disable controllers:

    • Helm + Operator

    • Helm

    Scale down the operator to 0 to temporarily disable it:

    kubectl scale deployment redpanda-controller-operator --replicas=0 --namespace <namespace>

    If you enabled the NodeWatcher or Decommission controller in the Redpanda Helm chart:

    1. Edit or remove the settings for the controllers sidecar in your Helm values overrides. For example, if you used a YAML values file:

      remove-controllers.yaml
      statefulset:
        sideCars:
          controllers:
            enabled: false
            run: []
      rbac:
        enabled: false
    2. Run helm upgrade to apply the changes.

  6. Make sure that your Pods won’t get rescheduled on the old node pool:

    • If you have a dedicated Kubernetes cluster for Redpanda, you don’t need to do anything. New Pods will be scheduled on the new node pool.

    • If you share the cluster with other applications, you need to configure tolerations and node selectors to ensure that the Pods are scheduled onto nodes in your new node pool.

      Example of setting taints and tolerations
      • Helm + Operator

      • Helm

      Patch your Redpanda resource with the following command:

      cat <<EOF > patch-nodepool.yaml
      spec:
        clusterSpec:
          tolerations:
            - effect: NoSchedule
              key: <taint>   # Replace with your node taint key
              operator: Equal
              value: "true"
          nodeSelector:
            nodetype: <label>   # Replace with your node label value
      EOF
      kubectl patch redpanda redpanda -n <namespace> --type merge --patch-file patch-nodepool.yaml

      Update your Helm values file with the following settings:

      tolerations.yaml
      tolerations:
        - effect: NoSchedule
          key: <taint>   # Replace with your node taint key
          operator: Equal
          value: "true"
      nodeSelector:
        nodetype: <label>   # Replace with your node label value

      Then run:

      helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
        --values tolerations.yaml --reuse-values
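
      The toleration and nodeSelector above assume that the nodes in the new pool carry a matching taint and label. If your platform does not apply them for you, you can add them manually, for example:

      # Label each node in the new pool so that the nodeSelector matches
      kubectl label nodes <new-node-name> nodetype=<label>

      # Taint each node in the new pool so that only Pods with a matching toleration are scheduled there
      kubectl taint nodes <new-node-name> <taint>=true:NoSchedule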
  7. For each broker running on the old node pool, perform these steps:

    1. List your brokers and their associated broker IDs. Replace <pod-name> with any broker Pod name.

      kubectl --namespace <namespace> exec -ti <pod-name> -c redpanda -- rpk cluster info
      Example output
      CLUSTER
      =======
      redpanda.8a18c0db-ff14-40c8-9fc0-ba29c277707d
      
      BROKERS
      =======
      ID    HOST                                             PORT
      0     redpanda-0.redpanda.redpanda.svc.cluster.local.  9093
      1*    redpanda-1.redpanda.redpanda.svc.cluster.local.  9093
      2     redpanda-2.redpanda.redpanda.svc.cluster.local.  9093
      3     redpanda-3.redpanda.redpanda.svc.cluster.local.  9093
      
      TOPICS
      ======
      NAME      PARTITIONS  REPLICAS
      _schemas  1           3

      Broker IDs are not guaranteed to match the StatefulSet ordinal, which appears in the hostname. In this example, redpanda-* in the hostname is the Pod name: the StatefulSet ordinal is the number at the end of the Pod name, and the broker ID is the number in the ID column.

    2. Find the node that your chosen broker is running on:

      kubectl get pods --namespace <namespace> -o wide | grep <pod-name>

      Example output (node name is kind-worker2):

      redpanda-0    2/2     Running     0    11m     10.244.7.3   kind-worker2
    3. Decommission the broker. This moves its data and replicas to other brokers.

      rpk redpanda admin brokers decommission <broker-id>

      The time required depends on the amount of data being migrated from the broker’s partitions. If the broker has a large amount of data, this process can take hours. For more details, see Decommission Brokers.

    4. Check the status of the decommission process:

      rpk redpanda admin brokers decommission-status <broker-id>

      Wait until you see this message before proceeding:

      Node <broker-id> is decommissioned successfully.
    5. Drain the node that the decommissioned broker is on.

      kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

      When the node is drained, the Pod is evicted, but remains in the Pending state until the associated PVC is deleted.

    6. Delete the PVC for the broker’s local PV. This allows Kubernetes to provision new storage for the rescheduled broker.

      kubectl get pvc --namespace <namespace>
      kubectl delete pvc <pvc-name> --namespace <namespace> --wait=false
      Deleting the PVC only sets a deletion timestamp; Kubernetes keeps the PVC and its PersistentVolume until the Pod that uses them terminates. The --wait=false flag makes the command return immediately instead of blocking on that removal, so the Redpanda broker can still shut down cleanly before its storage is released.
    7. Delete the Pod to trigger rescheduling on the new node pool:

      kubectl delete pod <pod-name> --namespace <namespace>

      The Pod is rescheduled on the new node pool with a new broker ID.

    8. Check that the Pod is now running on your new node pool:

      kubectl get pods --namespace <namespace> -o wide | grep <pod-name>

      If the Pod is not rescheduled, check the node’s status and events for any issues.

      kubectl describe node <node-name>
      kubectl get events --namespace <namespace>
      Save the events to a file so that you can review them later.
    9. List your brokers and their associated broker IDs. Replace <pod-name> with any broker Pod name.

      kubectl --namespace <namespace> exec -ti <pod-name> -c redpanda -- rpk cluster info

      You should see that you have the same number of brokers, but the broker ID for your new broker has changed. The broker ID is the number in the ID column.

      Example output
      CLUSTER
      =======
      redpanda.8a18c0db-ff14-40c8-9fc0-ba29c277707d
      
      BROKERS
      =======
      ID    HOST                                             PORT
      1*    redpanda-1.redpanda.redpanda.svc.cluster.local.  9093
      2     redpanda-2.redpanda.redpanda.svc.cluster.local.  9093
      3     redpanda-3.redpanda.redpanda.svc.cluster.local.  9093
      4     redpanda-0.redpanda.redpanda.svc.cluster.local.  9093
      
      TOPICS
      ======
      NAME      PARTITIONS  REPLICAS
      _schemas  1           3
    10. Verify cluster health:

      rpk cluster health

      Example output:

      CLUSTER HEALTH OVERVIEW
      =======================
      Healthy:                          true
      Unhealthy reasons:                []
      Controller ID:                    1
      All nodes:                        [1 2 3 4]
      Nodes down:                       []
      Leaderless partitions (0):        []
      Under-replicated partitions (0):  []
    11. Repeat these steps for each broker sequentially.

  8. Revert the StatefulSet updateStrategy back to RollingUpdate:

    • Helm + Operator

    • Helm

    cat <<EOF > patch-update-strategy.yaml
    spec:
      clusterSpec:
        statefulset:
          updateStrategy:
            type: RollingUpdate
    EOF
    kubectl patch redpanda redpanda -n <namespace> --type merge --patch-file patch-update-strategy.yaml
    cat <<EOF > statefulset.yaml
    statefulset:
      updateStrategy:
        type: RollingUpdate
    EOF
    helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace --values statefulset.yaml --reuse-values
  9. If you added a buffer broker, scale the StatefulSet replica count back down by one and delete the node that the broker was running on.
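
    For example, following the same patch pattern as the earlier steps (Helm users can lower statefulset.replicas in their values file and run helm upgrade instead; <original-replica-count> is your broker count before the buffer was added):

    cat <<EOF > scale-down.yaml
    spec:
      clusterSpec:
        statefulset:
          replicas: <original-replica-count>
    EOF
    kubectl patch redpanda redpanda --namespace <namespace> --type merge --patch-file scale-down.yaml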

  10. If you disabled the controllers, re-enable them. For example:

    • Helm + Operator

    • Helm

    Scale up the operator to re-enable it:

    kubectl scale deployment redpanda-controller-operator --replicas=1 --namespace <namespace>
    controllers.yaml
    statefulset:
      sideCars:
        controllers:
          enabled: true
          run:
            - "nodeWatcher"
            - "decommission"
    helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
      --values controllers.yaml --reuse-values
  11. Verify cluster health:

    rpk cluster health

    Ensure all brokers are online and that data replication is healthy.

    You may need to run this command a few times before consistently getting a healthy output. Each broker has its own view of the cluster state, and the controller leader may not have replicated the actual cluster state to the broker providing the results for this command. Eventually, all brokers should agree on the same cluster state.
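
    Depending on your rpk version, you may also be able to block until the cluster reports healthy rather than polling manually; this sketch assumes the --watch and --exit-when-healthy flags are available in your rpk release:

    kubectl --namespace <namespace> exec -ti <pod-name> -c redpanda -- rpk cluster health --watch --exit-when-healthy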
  12. Delete the old node pool according to your platform’s best practices.
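
    As an illustration only (command names and flags vary by platform and version), deleting the old pool might look like this on GKE or EKS:

    # GKE
    gcloud container node-pools delete <old-pool-name> --cluster <cluster-name>

    # EKS
    eksctl delete nodegroup --cluster <cluster-name> --name <old-pool-name>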

Networked (remountable) PersistentVolumes

When using network-backed PersistentVolumes (PVs), the data is stored on external storage systems such as EBS, Persistent Disks, or Azure Disks. Because the storage remains intact even if the node is replaced, the same Pod can be rescheduled on another node and the PV will be automatically remounted.
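
Before you begin, it can help to confirm that the broker PVCs are bound to a networked StorageClass rather than local disks (StorageClass and provisioner names vary by platform):

  kubectl get pvc --namespace <namespace>
  kubectl get storageclass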

  1. Confirm that a safe Pod Disruption Budget (PDB) exists:

    kubectl get pdb --namespace <namespace>

    Setting statefulset.budget.maxUnavailable to 1 in your Redpanda resource or Helm values ensures that only one broker Pod can be unavailable during the upgrade process. This setting helps maintain cluster availability and data integrity by preventing multiple Pods from being disrupted simultaneously.

    Example output:

    NAME       MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
    redpanda   N/A             1                 1                     2m36s
  2. Ensure each topic has a replication factor of at least 3:

    rpk topic describe <topic-name>
    rpk topic alter-config <topic-name> --set replication.factor=<replication-factor>

    A replication factor of 3 is recommended to ensure high availability and fault tolerance. With three replicas, the system can tolerate the failure of one broker without losing data or availability. This setup allows for one replica to be down for maintenance or due to failure, while still having two replicas available to serve read and write requests. This redundancy is crucial for maintaining the health and reliability of the cluster.

  3. Create a new node pool with the same number of nodes as your current pool.

  4. Make sure that your Pods won’t get rescheduled on the old node pool:

    • If you have a dedicated Kubernetes cluster for Redpanda, you don’t need to do anything. New Pods will be scheduled on the new node pool.

    • If you share the cluster with other applications, you need to configure tolerations and node selectors to ensure that the Pods are scheduled onto nodes in your new node pool.

      Example of setting taints and tolerations
      • Helm + Operator

      • Helm

      Patch your Redpanda resource with the following command:

      cat <<EOF > patch-nodepool.yaml
      spec:
        clusterSpec:
          tolerations:
            - effect: NoSchedule
              key: <taint>   # Replace with your node taint key
              operator: Equal
              value: "true"
          nodeSelector:
            nodetype: <label>   # Replace with your node label value
      EOF
      kubectl patch redpanda redpanda -n <namespace> --type merge --patch-file patch-nodepool.yaml

      Update your Helm values file with the following settings:

      tolerations.yaml
      tolerations:
        - effect: NoSchedule
          key: <taint>   # Replace with your node taint key
          operator: Equal
          value: "true"
      nodeSelector:
        nodetype: <label>   # Replace with your node label value

      Then run:

      helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
        --values tolerations.yaml --reuse-values
  5. Disable controllers:

    • Helm + Operator

    • Helm

    Scale down the operator to 0 to temporarily disable it:

    kubectl scale deployment redpanda-controller-operator --replicas=0 --namespace <namespace>

    If you enabled the NodeWatcher or Decommission controller in the Redpanda Helm chart:

    1. Edit or remove the settings for the controllers sidecar in your Helm values overrides. For example, if you used a YAML values file:

      remove-controllers.yaml
      statefulset:
        sideCars:
          controllers:
            enabled: false
            run: []
      rbac:
        enabled: false
    2. Run helm upgrade to apply the changes.

  6. Drain the nodes from the old node pool one at a time to force Pod rescheduling onto the new pool.

    kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

    Because the underlying networked PV is remountable, the data remains intact and the Pod will be rescheduled on the new node pool with the same broker ID.
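
    To find the nodes that belong to the old pool, filter on your platform's node-pool label; as an example, GKE sets cloud.google.com/gke-nodepool and EKS sets eks.amazonaws.com/nodegroup on each node:

    kubectl get nodes --selector cloud.google.com/gke-nodepool=<old-pool-name>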

  7. Verify that the new broker Pods are scheduled on the new node pool:

    kubectl get pods --namespace <namespace> -o wide | grep <pod-name>
  8. If you disabled the controllers, re-enable them. For example:

    • Helm + Operator

    • Helm

    Scale up the operator to re-enable it:

    kubectl scale deployment redpanda-controller-operator --replicas=1 --namespace <namespace>
    controllers.yaml
    statefulset:
      sideCars:
        controllers:
          enabled: true
          run:
            - "nodeWatcher"
            - "decommission"
    helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
      --values controllers.yaml --reuse-values
  9. Verify cluster health:

    rpk cluster health

    Ensure that all brokers are online and data replication is healthy.

  10. Delete the old node pool following your hosting platform’s best practices.