Scale Redpanda in Kubernetes

You can scale a cluster both vertically, by increasing or decreasing the resources available to existing brokers, and horizontally, by increasing or decreasing the number of brokers in the cluster.

Vertical scaling

Vertical scaling involves increasing the amount of resources available to Redpanda brokers (scaling up) or decreasing the amount of resources (scaling down). Resources include the amount of hardware available to Redpanda brokers, such as CPU cores, memory, and storage.

To scale a Redpanda cluster vertically, see Manage Pod Resources in Kubernetes.

You cannot decrease the number of CPU cores in a running cluster.

If your existing worker nodes have too many or too few resources, you may need to move Redpanda brokers to new worker nodes that meet your resource requirements. This process involves:

  • Making sure the new worker nodes are available.

  • Deleting the old worker nodes one at a time.

  • Deleting the Pod’s PersistentVolumeClaim (PVC).

  • Ensuring that the PersistentVolume’s (PV) reclaim policy is set to Retain to make sure that you can roll back to the original worker node without losing data.

As an emergency backstop, the Nodewatcher controller can automate the deletion of PVCs and set the reclaim policy of PVs to Retain.
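
If you handle the reclaim-policy step by hand instead of relying on the Nodewatcher controller, it can be sketched as follows. This is a minimal sketch, not the documented procedure: `retain_pv` is a hypothetical helper name, the `KUBECTL` override exists only so the command can be previewed without touching a cluster, and `<pv-name>` is a placeholder. The patch targets the standard `spec.persistentVolumeReclaimPolicy` field of a PersistentVolume.

```shell
# Hypothetical helper: set a PersistentVolume's reclaim policy to Retain so
# that deleting its PVC does not delete the underlying data.
# KUBECTL can be overridden (for example, KUBECTL=echo) to preview the call.
retain_pv() {
  ${KUBECTL:-kubectl} patch persistentvolume "$1" \
    --patch '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
}

# Usage (replace <pv-name> with the PV bound to the broker's PVC):
#   retain_pv <pv-name>
```

Run this before deleting the PVC, so that the volume is already protected when the claim goes away.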

Horizontal scaling

Horizontal scaling involves modifying the number of brokers in your cluster, either by adding new ones (scaling out) or removing existing ones (scaling in). In situations where the workload is variable, horizontal scaling allows for flexibility. You can scale out when demand is high and scale in when demand is low, optimizing resource usage and cost.

Redpanda does not support Kubernetes autoscalers. Autoscalers rely on CPU and memory metrics for scaling decisions, which do not fully capture the complexities involved in scaling Redpanda clusters. Improper scaling can lead to operational challenges. Always manually scale your Redpanda clusters as described in this topic.

Scale out

Scaling out is the process of adding more brokers to your Redpanda cluster. You may want to add more brokers for increased throughput, high availability, and fault tolerance. Adding more brokers allows for better distribution of data across the cluster. This can be particularly important when dealing with large data sets.

To add Redpanda brokers to your cluster:

  1. Ensure that you have one additional worker node for each Redpanda broker that you want to add. Each Redpanda broker requires its own dedicated worker node so that it has access to all resources. For more details, see Kubernetes Cluster Requirements and Recommendations.

  2. If you use local PersistentVolumes (PV), ensure that your additional worker nodes have local disks available that meet the requirements of the configured StorageClass. See Store the Redpanda Data Directory in PersistentVolumes.

  3. If you have external access enabled, make sure that each new worker node has the necessary node ports open to external clients. See Networking and Connectivity in Kubernetes.

  4. Verify that your cluster is in a healthy state:

    kubectl exec redpanda-0 --namespace <namespace> -- rpk cluster health
  5. Increase the number of replicas in the Helm values:

    • Helm + Operator:

      redpanda-cluster.yaml
      apiVersion: cluster.redpanda.com/v1alpha2
      kind: Redpanda
      metadata:
        name: redpanda
      spec:
        chartRef: {}
        clusterSpec:
          statefulset:
            replicas: <number-of-replicas>
      kubectl apply -f redpanda-cluster.yaml --namespace <namespace>

    • Helm, using --values:

      replicas.yaml
      statefulset:
        replicas: <number-of-replicas>
      helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
        --values replicas.yaml --reuse-values

    • Helm, using --set:

      helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
        --set statefulset.replicas=<number-of-replicas>
  6. Wait until the StatefulSet is rolled out:

    kubectl --namespace <namespace> rollout status statefulset redpanda --watch
  7. Verify that your cluster is in a healthy state:

    kubectl exec redpanda-0 --namespace <namespace> -- rpk cluster health
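
Steps 4 through 7 can be chained so that the replica count changes only when the cluster reports healthy. The following is a minimal sketch for the Helm-only path, assuming that the `rpk cluster health` overview contains a `Healthy:` line (check this against the output of your rpk version). `is_healthy`, `cluster_health`, and `scale_out` are hypothetical helper names, and `NAMESPACE` and `REPLICAS` stand in for your own values. The sketch adds `--reuse-values` to the `--set` form so that other settings are kept, which is an assumption about what you want.

```shell
# Reads `rpk cluster health` output on stdin and succeeds only if it reports
# "Healthy: true". The line format is an assumption about rpk's output.
is_healthy() {
  grep -Eq '^[[:space:]]*Healthy:[[:space:]]*true[[:space:]]*$'
}

# Fetch the health overview from the first broker Pod.
cluster_health() {
  kubectl exec redpanda-0 --namespace "$NAMESPACE" -- rpk cluster health
}

# Health check, scale, wait for the rollout, then check health again.
scale_out() {
  cluster_health | is_healthy || { echo "cluster is not healthy; aborting" >&2; return 1; }
  helm upgrade --install redpanda redpanda/redpanda --namespace "$NAMESPACE" \
    --reuse-values --set statefulset.replicas="$REPLICAS" || return 1
  kubectl --namespace "$NAMESPACE" rollout status statefulset redpanda --watch || return 1
  cluster_health | is_healthy
}

# Usage:
#   NAMESPACE=<namespace> REPLICAS=<number-of-replicas> scale_out
```

If the first health check fails, nothing is changed, which mirrors the intent of verifying health before increasing the replica count.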

Scale in

Scaling in is the process of removing brokers from your Redpanda cluster. You may want to remove brokers for cost reduction and resource optimization.

To scale in a Redpanda cluster, you must decommission the brokers that you want to remove before you update the statefulset.replicas setting in the Helm values. See Decommission Brokers in Kubernetes.
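
The ordering requirement can be made explicit by printing the intended sequence before running anything. This is an illustrative sketch only: `scale_in_plan` is a hypothetical helper, the broker ID is a value you look up yourself, and the linked decommission topic remains the authoritative procedure. It assumes that your rpk version provides the `decommission` and `decommission-status` subcommands and that you deploy with Helm alone.

```shell
# Print the scale-in sequence for one broker: decommission first, confirm the
# decommission has finished, and only then lower statefulset.replicas.
# Args: <namespace> <broker-id> <current-replicas>
scale_in_plan() {
  ns=$1; broker_id=$2; current=$3
  if [ "$current" -le 1 ]; then
    echo "refusing to plan: replicas would drop below 1" >&2
    return 1
  fi
  target=$((current - 1))
  echo "kubectl exec redpanda-0 --namespace $ns -- rpk redpanda admin brokers decommission $broker_id"
  echo "kubectl exec redpanda-0 --namespace $ns -- rpk redpanda admin brokers decommission-status $broker_id"
  # --reuse-values is an assumption: it keeps the other Helm values unchanged.
  echo "helm upgrade --install redpanda redpanda/redpanda --namespace $ns --reuse-values --set statefulset.replicas=$target"
}

# Usage:
#   scale_in_plan <namespace> <broker-id> <current-replicas>
```

Review the printed commands and run them one at a time, moving to the Helm step only after the decommission status shows that the broker has been fully removed.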

Install the Nodewatcher controller

The Nodewatcher controller maintains cluster operation during node failures by managing the lifecycle of PersistentVolumes (PVs) and PersistentVolumeClaims (PVCs) for Redpanda clusters. When the controller detects that a Node resource is not available, it sets the reclaim policy of the PV to Retain, helping to prevent data loss. Concurrently, it orchestrates the deletion of the PVC, which allows the Redpanda broker that was previously running on the deleted worker node to be rescheduled onto new, operational nodes.

The Nodewatcher controller is an emergency backstop to keep your Redpanda cluster running in case of unexpected node failures. Never use this controller as a routine method for removing brokers.

Using the Nodewatcher controller as a routine method for removing brokers can lead to unintended consequences, such as increased risk of data loss and inconsistent cluster states. The Nodewatcher is designed for emergency scenarios and not for managing the regular scaling, decommissioning, and rebalancing of brokers.

To safely scale in your Redpanda cluster, always use the decommission process, which ensures that brokers are removed in a controlled manner, with data properly redistributed across the remaining nodes, maintaining cluster health and data integrity.

  1. Install the Nodewatcher controller:

    • Helm + Operator

    You can install the Nodewatcher controller as part of the Redpanda Operator or as a sidecar on each Pod that runs a Redpanda broker. When you install the controller as part of the Redpanda Operator, the controller monitors all Redpanda clusters running in the same namespace as the Redpanda Operator. If you want the controller to manage only a single Redpanda cluster, install it as a sidecar on each Pod that runs a Redpanda broker, using the Redpanda resource.

    To install the Nodewatcher controller as part of the Redpanda Operator:

    1. Deploy the Redpanda Operator with the Nodewatcher controller:

      helm repo add redpanda https://charts.redpanda.com
      helm upgrade --install redpanda-controller redpanda/operator \
        --namespace <namespace> \
        --set image.tag=v2.2.2-24.2.4 \
        --create-namespace \
        --set additionalCmdFlags={--additional-controllers="nodeWatcher"} \
        --set rbac.createAdditionalControllerCRs=true
      • --additional-controllers="nodeWatcher": Enables the Nodewatcher controller.

      • rbac.createAdditionalControllerCRs=true: Creates the required RBAC rules for the Redpanda Operator to monitor the Node resources and update PVCs and PVs.

    2. Deploy a Redpanda resource:

      redpanda-cluster.yaml
      apiVersion: cluster.redpanda.com/v1alpha2
      kind: Redpanda
      metadata:
        name: redpanda
      spec:
        chartRef: {}
        clusterSpec: {}
      kubectl apply -f redpanda-cluster.yaml --namespace <namespace>

    To install the Nodewatcher controller as a sidecar:

    redpanda-cluster.yaml
    apiVersion: cluster.redpanda.com/v1alpha2
    kind: Redpanda
    metadata:
      name: redpanda
    spec:
      chartRef: {}
      clusterSpec:
        statefulset:
          sideCars:
            controllers:
              enabled: true
              run:
                - "nodeWatcher"
        rbac:
          enabled: true
    • statefulset.sideCars.controllers.enabled: Enables the controllers sidecar.

    • statefulset.sideCars.controllers.run: Enables the Nodewatcher controller.

    • rbac.enabled: Creates the required RBAC rules for the controller to monitor the Node resources and update PVCs and PVs.

    • Helm (without the Operator), using either --values or --set:

    nodewatcher-controller.yaml
    statefulset:
      sideCars:
        controllers:
          enabled: true
          run:
            - "nodeWatcher"
    rbac:
      enabled: true
    • statefulset.sideCars.controllers.enabled: Enables the controllers sidecar.

    • statefulset.sideCars.controllers.run: Enables the Nodewatcher controller.

    • rbac.enabled: Creates the required RBAC rules for the controller to monitor the Node resources and update PVCs and PVs.

    helm upgrade --install redpanda redpanda/redpanda \
      --namespace <namespace> \
      --create-namespace \
      --set statefulset.sideCars.controllers.enabled=true \
      --set statefulset.sideCars.controllers.run={"nodeWatcher"} \
      --set rbac.enabled=true
    • statefulset.sideCars.controllers.enabled: Enables the controllers sidecar.

    • statefulset.sideCars.controllers.run: Enables the Nodewatcher controller.

    • rbac.enabled: Creates the required RBAC rules for the controller to monitor the Node resources and update PVCs and PVs.

  2. Test the Nodewatcher controller by deleting a Node resource:

    kubectl delete node <node-name>
    This step is for testing purposes only.
  3. Monitor the logs of the Nodewatcher controller:

    • If you’re running the Nodewatcher controller as part of the Redpanda Operator:

      kubectl logs -l app.kubernetes.io/name=operator -c manager --namespace <namespace>
    • If you’re running the Nodewatcher controller as a sidecar:

      kubectl logs <pod-name> --namespace <namespace> -c redpanda-controllers

    You should see that the controller successfully deleted the PVC of the Pod that was running on the deleted Node resource. To confirm, list the remaining PVCs:

    kubectl get persistentvolumeclaim --namespace <namespace>
  4. Verify that the reclaim policy of the PV is set to Retain to allow you to recover the node, if necessary:

    kubectl get persistentvolume --namespace <namespace>
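
To make the check in step 4 easier to read, you can print only the name and reclaim policy of each PV and flag any that are not set to Retain. This is a minimal sketch: `not_retained` is a hypothetical helper, and the `custom-columns` expression relies on the standard `spec.persistentVolumeReclaimPolicy` field.

```shell
# Reads "NAME POLICY" lines on stdin and prints the names of PVs whose
# reclaim policy is anything other than Retain. Blank lines are ignored.
not_retained() {
  awk 'NF >= 2 && $2 != "Retain" { print $1 }'
}

# Usage:
#   kubectl get persistentvolume --no-headers \
#     -o custom-columns=NAME:.metadata.name,POLICY:.spec.persistentVolumeReclaimPolicy \
#     | not_retained
```

No output means that every listed PV is set to Retain.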

After the Nodewatcher controller has finished, decommission the broker that was removed from the node. This is necessary to prevent a potential loss of quorum and ensure cluster stability.

Make sure to use the --force flag when decommissioning the broker with rpk redpanda admin brokers decommission. This flag is required when the broker is no longer running.
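
The forced decommission can be sketched as follows. This is a minimal sketch under stated assumptions: `force_decommission` is a hypothetical helper, the `KUBECTL` override exists only to preview the command, `<pod-name>` must be a Pod whose broker is still running, and `<broker-id>` is the ID of the broker that ran on the deleted node (you can look it up with `rpk redpanda admin brokers list`).

```shell
# Hypothetical helper: force-decommission a broker that is no longer running.
# Args: <namespace> <running-pod-name> <broker-id>
force_decommission() {
  ns=$1; pod=$2; broker_id=$3
  # Reject anything that is not a non-negative integer before calling rpk.
  case "$broker_id" in
    ''|*[!0-9]*) echo "broker ID must be a non-negative integer" >&2; return 2 ;;
  esac
  ${KUBECTL:-kubectl} exec "$pod" --namespace "$ns" -- \
    rpk redpanda admin brokers decommission "$broker_id" --force
}

# Usage:
#   force_decommission <namespace> <pod-name> <broker-id>
```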