Decommission Brokers in Kubernetes

Decommissioning a broker is the safe and controlled way to remove a Redpanda broker from the cluster without risking data loss or causing instability. By decommissioning, you ensure that partition replicas are reallocated across the remaining brokers so that you can then safely shut down the broker.

You may want to decommission a broker in the following situations:

- You are removing a broker to decrease the size of the cluster, also known as scaling down.
- The broker has lost its storage and you need a new broker with a new node ID (broker ID).
- You are replacing a worker node, for example, by upgrading the Kubernetes cluster or replacing the hardware.

When a broker is decommissioned, it cannot rejoin the cluster. If a broker with the same ID tries to rejoin the cluster, it is rejected.

Prerequisites

You must have the following:

- Kubernetes cluster: Ensure you have a running Kubernetes cluster, either locally (for example, with minikube or kind) or remotely.
- Kubectl: Ensure you have the kubectl command-line tool installed and configured to communicate with your cluster.
- jq: This guide uses jq to make parsing JSON output easier.

What happens when a broker is decommissioned?

When a broker is decommissioned, the controller leader creates a reallocation plan for all partition replicas that are allocated to that broker. By default, this reallocation is done in batches of 50 to avoid overwhelming the remaining brokers with Raft recovery. See partition_autobalancing_concurrent_moves.

The reallocation of each partition is translated into a Raft group reconfiguration and executed by the controller leader. The partition leader then handles the reconfiguration for its Raft group. After the reallocation for a partition is complete, it is recorded in the controller log and the status is updated in the topic tables of each broker.
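As a rough illustration of the batching described above, the following shell sketch estimates how many rounds of replica moves a decommission needs, assuming moves proceed in fixed-size batches of partition_autobalancing_concurrent_moves (default 50). This is a simplification: the controller schedules moves continuously, and each move's duration depends on partition size and recovery bandwidth. The function name estimate_rounds and the replica count in the usage note are hypothetical.

```shell
#!/bin/sh
# estimate_rounds REPLICAS [CONCURRENCY]
# Rough estimate of how many batches of replica moves a decommission needs.
# Assumption: moves run in fixed-size batches; real scheduling is continuous.
estimate_rounds() {
  replicas=$1
  concurrency=${2:-50}  # default batch size described in this section
  # Integer ceiling division: ceil(replicas / concurrency)
  echo $(( (replicas + concurrency - 1) / concurrency ))
}
```

For example, estimate_rounds 1200 prints 24 for a hypothetical broker holding 1,200 replicas. To read the batch size your cluster actually uses, run rpk cluster config get partition_autobalancing_concurrent_moves inside a broker Pod.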
The decommissioning process is successful only when all partition reallocations have been completed. The controller leader polls for the status of all the partition-level reallocations to ensure that everything completes as expected.

During the decommissioning process, new partitions are not allocated to the broker that is being decommissioned. After all the reallocations have been completed successfully, the broker is removed from the cluster.

The decommissioning process is designed to tolerate controller leadership transfers.

Should you decommission brokers?

Deciding whether to decommission brokers requires careful evaluation of various factors that contribute to the overall health of your cluster. For the purposes of this section, the focus is on a cluster with seven brokers. In subsequent sections, the output from the given commands provides additional details to help you determine the minimum number of brokers required in a cluster before it’s safe to decommission brokers.

Availability

You should have enough brokers to span every rack or availability zone. Run the following command to determine whether rack awareness is enabled in your cluster:

rpk cluster config get enable_rack_awareness

When rack awareness is enabled, you can view which rack each broker is assigned to by running the following command:

rpk cluster info

Example output:

CLUSTER
=======
redpanda.560e2403-3fd6-448c-b720-7b456d0aa78c

BROKERS
=======
ID    HOST                          PORT   RACK
0     redpanda-0.testcluster.local  32180  A
1     redpanda-1.testcluster.local  32180  A
4     redpanda-3.testcluster.local  32180  B
5*    redpanda-2.testcluster.local  32180  B
6     redpanda-4.testcluster.local  32180  C
8     redpanda-6.testcluster.local  32180  C
9     redpanda-5.testcluster.local  32180  D

The output shows four racks (A/B/C/D), so you need at least four brokers to place a broker in every rack.

Rack awareness is just one aspect of availability. Refer to High Availability for more details on deploying Redpanda for high availability.
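If you want to check the rack count in a script rather than by eye, the following shell sketch counts the distinct values in the RACK column. It assumes the column layout shown in the example output above (ID, HOST, PORT, RACK); the function name count_racks is hypothetical, and if your rpk version formats the output differently, adjust the awk patterns.

```shell
#!/bin/sh
# count_racks: reads `rpk cluster info` output on stdin and prints the number
# of distinct racks in the BROKERS table.
# Assumes the header row is "ID HOST PORT RACK", as in the example output.
# If the RACK column is missing or empty, it prints 0.
count_racks() {
  awk '
    $1 == "ID" && $NF == "RACK" { in_table = 1; next }          # header of the BROKERS table
    in_table && NF == 4 && !($4 in seen) { seen[$4] = 1; n++ }  # count each rack once
    END { print n + 0 }
  '
}
```

For example, kubectl --namespace <namespace> exec redpanda-0 -c redpanda -- rpk cluster info | count_racks prints 4 for the example cluster. The -ti flags are omitted so that the output can be piped.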
Cost

Infrastructure costs increase with each broker because each broker requires a dedicated node (instance), so adding a broker means an additional instance cost. For example, if the instance cost is $1925 per month in a cluster with seven brokers, the instance cost for each broker is $275. Reducing the number of brokers from seven to five would save $550 per month ($275 x 2), and reducing it to three brokers would save $1100 per month ($275 x 4). You must also consider other costs, but they won’t be as affected by changing the broker count.

Data retention

Local data retention is determined by the storage capacity of each broker and by producer throughput, which is the amount of data produced over a given period. When decommissioning, the storage calculation must account for both the free storage space and the amount of space already in use by existing partitions.

Run the following command to determine how much storage is being used, in bytes, on each broker:

rpk cluster logdirs describe --aggregate-into broker

Example output:

BROKER  SIZE          ERROR
0       263882790656
1       256177979648
2       257698037504
3       259934992896
4       254087316992
5       258369126144
6       255227998208

This example shows that each broker has roughly 240GB of data. This means scaling in to five brokers would require each broker to have at least 337GB to store that same data.

Keep in mind that the actual space used on disk will be greater than the data size reported by Redpanda. Redpanda reserves some space on disk per partition and reserves less space per partition as available disk space decreases. Incoming data for each partition is then written to disk as segments (files). The time when segments are written to disk is based on a number of factors, including the topic’s segment configuration, broker restarts, and changes in Raft leadership.

Throughput is the primary measurement required to calculate future data storage requirements.
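To reproduce the cost and current-storage arithmetic above with your own figures, you can use a shell sketch like the following. The function names are hypothetical. The example's 240GB and 337GB figures match binary gibibytes (1 GiB = 2^30 bytes), which is the unit this sketch reports.

```shell
#!/bin/sh
# per_broker_cost TOTAL_MONTHLY_COST BROKERS
# Integer monthly instance cost of one broker.
per_broker_cost() {
  echo $(( $1 / $2 ))
}

# monthly_savings PER_BROKER_COST CURRENT_BROKERS TARGET_BROKERS
monthly_savings() {
  echo $(( $1 * ($2 - $3) ))
}

# data_per_broker_gib TARGET_BROKERS
# Reads `rpk cluster logdirs describe --aggregate-into broker` output on stdin,
# sums the SIZE column (bytes), and prints the average per target broker in GiB.
data_per_broker_gib() {
  awk -v target="$1" '
    NR > 1 && $2 ~ /^[0-9]+$/ { total += $2 }   # skip the header, sum SIZE values
    END { printf "%.1f\n", total / target / 1073741824 }
  '
}
```

With the example figures, per_broker_cost 1925 7 prints 275, monthly_savings 275 7 5 prints 550, and monthly_savings 275 7 3 prints 1100. Piping the example logdirs output into data_per_broker_gib 7 prints 240.2, and into data_per_broker_gib 5 prints 336.3, which the example rounds up to 337GB.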
For example, if the throughput is 200MB/sec, the application will generate 0.72TB/hour (17.28TB/day, or 120.96TB/week). Divide this amount by the target number of brokers to get an estimate of how much storage is needed to retain that much data for various periods of time. The following estimates include a 10% buffer (* 1.1):

Retention   Disk size (on each of the 5 brokers)
30 mins     (200MB/sec * 30 mins * 1.1) = 0.396TB / 5 brokers = 79.2GB
6 hrs       (200MB/sec * 6 hrs * 1.1) = 4.752TB / 5 brokers = 950.4GB
1 day       (200MB/sec * 1 day * 1.1) = 19.008TB / 5 brokers = 3.8TB
3 days      (200MB/sec * 3 days * 1.1) = 57.024TB / 5 brokers = 11.4TB

In the example cluster, only six hours of data must be retained locally. Any older data can be moved to Tiered Storage with a retention of one year. So each broker should have at least 1.2TB of storage available, taking into account both throughput and current data.

Cost and use case requirements determine how much to spend on local disk capacity. Tiered Storage can help to both decrease costs and expand data retention capabilities.

At this point in the example, it remains unclear whether it is safe to scale down to five brokers. Current calculations are based on five brokers. Additionally, some assumptions have been made regarding a constant throughput and perfect data balancing. Throughput fluctuates across all partitions, which causes data imbalance. The example calculations attempt to account for this by padding disk size by 10%. You can increase this buffer, for example, if you expect hot spot partitions. For details on sizing, see Sizing Guidelines.

Durability

The brokers in a Redpanda cluster are part of a Raft group that requires enough brokers to form a quorum-based majority (a minimum of three brokers). Each topic’s partitions are also Raft groups, so your cluster also needs at least as many brokers as the highest replication factor across all topics.
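The quorum requirement follows standard Raft majority arithmetic: a group of N voting members needs floor(N/2) + 1 members to agree, so it can lose at most floor((N - 1)/2) members and stay available. The shell sketch below makes this concrete for the controller group or for a partition with replication factor N; the helper names majority and tolerated_failures are hypothetical.

```shell
#!/bin/sh
# majority N: smallest number of members that forms a Raft majority.
majority() {
  echo $(( $1 / 2 + 1 ))
}

# tolerated_failures N: members that can fail while a majority remains.
tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

# Print a small table for common group sizes.
for n in 3 5 7; do
  echo "members=$n majority=$(majority "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

The loop prints majority=2, 3, and 4 and tolerated_failures=1, 2, and 3 for groups of three, five, and seven members. For example, a partition with a replication factor of five remains available with two replicas down.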
To find the maximum replication factor across all topics in a cluster, run the following command:

rpk topic list | tail -n +2 | awk '{print $3}' | sort -n | tail -1

Example output:

5

In this example, the highest replication factor is five, which means that at least five brokers are required in this cluster.

Generally, a cluster can withstand more brokers going down when it has more brokers. For details, see Raft consensus algorithm.

Partition count

It is best practice to make sure the total partition count does not exceed 1K per core. This maximum partition count depends on many other factors, such as memory per core, CPU performance, throughput, and latency requirements. Exceeding 1K partitions per core can lead to increased latency, an increased number of partition leadership elections, and generally reduced stability.

Run the following command to get the total partition count for your cluster:

curl -sk http://<broker-url>:<admin-api-port>/v1/partitions/local_summary | jq .count

Example output:

3018

Next, determine the number of cores that are available across the remaining brokers:

rpk redpanda admin brokers list

Example output:

NODE-ID  NUM-CORES  MEMBERSHIP-STATUS  IS-ALIVE  BROKER-VERSION
0        8          active             true      v23.1.8
1        8          active             true      v23.1.8
2        8          active             true      v23.1.8
3        8          active             true      v23.1.8
4        8          active             true      v23.1.8
5        8          active             true      v23.1.8
6        8          active             true      v23.1.8

In this example, each broker has eight cores available. If you plan to scale down to five brokers, you would have 40 cores available, which means that your cluster is limited by core count to 40K partitions, well above the current 3018 partitions.

To best ensure the stability of the cluster, keep fewer than 50K partitions per cluster.

Decommission assessment

The considerations tested above yield the following for the example case:

- At least four brokers are required based on availability.
- Cost is not a limiting factor in this example, but lower cost, and therefore a lower broker count, is always preferable.
- When data is spread across five brokers, each broker needs at least 1.2TB of storage. This falls within the 1.5TB of local storage available in the example.
- At least five brokers are required based on the highest replication factor across all topics.
- At 3018 partitions, the partition count is so low as to not be a determining factor in broker count (a single broker in this example environment could handle many more partitions).

So the primary limiting factor is the replication factor of five, meaning that you could scale down to five brokers at minimum.

Decommission a broker

To decommission a broker, you can use one of the following methods:

- Manually decommission a broker: Use rpk to decommission one broker at a time.
- Use the Decommission controller: Use the Decommission controller to automatically decommission brokers whenever you reduce the number of StatefulSet replicas.

Manually decommission a broker

Follow this workflow to manually decommission a broker before reducing the number of StatefulSet replicas:

flowchart TB
    %% Define classes
    classDef userAction stroke:#374D7C, fill:#E2EBFF, font-weight:bold,rx:5,ry:5

    A[Start Manual Scale-In]:::userAction --> B["Identify broker to remove (highest Pod ordinal)"]:::userAction
    B --> C[Decommission broker running on Pod with highest ordinal]:::userAction
    C --> D[Monitor decommission status]:::userAction
    D --> E{Is broker removed?}:::userAction
    E -- No --> D
    E -- Yes --> F[Decrease StatefulSet replicas by 1]:::userAction
    F --> G[Wait for rolling update and cluster health]:::userAction
    G --> H{More brokers to remove?}:::userAction
    H -- Yes --> B
    H -- No --> I[Done]:::userAction

List your brokers and their associated broker IDs:

kubectl --namespace <namespace> exec -ti redpanda-0 -c redpanda -- \
  rpk cluster info

Example output:

CLUSTER
=======
redpanda.560e2403-3fd6-448c-b720-7b456d0aa78c

BROKERS
=======
ID    HOST                          PORT   RACK
0     redpanda-0.testcluster.local  32180  A
1     redpanda-1.testcluster.local  32180  A
4     redpanda-3.testcluster.local  32180  B
5*    redpanda-2.testcluster.local  32180  B
6     redpanda-4.testcluster.local  32180  C
8     redpanda-6.testcluster.local  32180  C
9     redpanda-5.testcluster.local  32180  D

The output shows that the IDs don’t match the StatefulSet ordinal, which appears in the hostname. In this example, two brokers will be decommissioned: redpanda-6 (ID 8) and redpanda-5 (ID 9).

When scaling in a cluster, you cannot choose which broker is removed. Redpanda is deployed as a StatefulSet in Kubernetes. The StatefulSet controls which Pods are destroyed and always starts with the Pod that has the highest ordinal. So the first broker to be removed when updating the StatefulSet in this example is redpanda-6 (ID 8).

Decommission the broker with the highest Pod ordinal:

kubectl --namespace <namespace> exec -ti <pod-name> -c <container-name> -- \
  rpk redpanda admin brokers decommission <broker-id>

The following message is displayed before the decommission process is complete:

Success, broker <broker-id> has been decommissioned!

If the broker is not running, use the --force flag.

Monitor the decommissioning status:

kubectl --namespace <namespace> exec -ti <pod-name> -c <container-name> -- \
  rpk redpanda admin brokers decommission-status <broker-id>

The output uses cached cluster health data that is refreshed every 10 seconds. When the completion column for all rows is 100%, the broker is decommissioned.

Another way to verify that the decommission is complete is to run the following command:

kubectl --namespace <namespace> exec -ti <pod-name> -c <container-name> -- \
  rpk cluster health

Be sure to verify that the decommissioned broker’s ID does not appear in the list of IDs. In the following example output, ID 9 is missing, which means that the decommission of broker 9 is complete. When you decommission redpanda-6 (ID 8) first, as in this walkthrough, ID 8 is the one that should be missing.
CLUSTER HEALTH OVERVIEW
=======================
Healthy:                     true
Controller ID:               0
All nodes:                   [4 1 0 5 6 8]
Nodes down:                  []
Leaderless partitions:       []
Under-replicated partitions: []

Decrease the number of replicas by one to remove the Pod with the highest ordinal (the one you just decommissioned).

When scaling in (removing brokers), remove only one broker at a time. If you reduce the StatefulSet replicas by more than one, Kubernetes can terminate multiple Pods simultaneously, causing quorum loss and cluster unavailability.

Helm + Operator:

redpanda-cluster.yaml

apiVersion: cluster.redpanda.com/v1alpha2
kind: Redpanda
metadata:
  name: redpanda
spec:
  chartRef: {}
  clusterSpec:
    statefulset:
      replicas: <number-of-replicas>

Apply the Redpanda resource:

kubectl apply -f redpanda-cluster.yaml --namespace <namespace>

Helm with --values:

decommission.yaml

statefulset:
  replicas: <number-of-replicas>

helm upgrade redpanda redpanda/redpanda --namespace <namespace> --wait --reuse-values --values decommission.yaml

Helm with --set:

helm upgrade redpanda redpanda/redpanda --namespace <namespace> --wait --reuse-values --set statefulset.replicas=<number-of-replicas>

This process triggers a rolling restart of each Pod so that each broker has an up-to-date seed_servers configuration to reflect the new list of brokers.

You can repeat this procedure to continue to scale down.

Use the Decommission controller

The Decommission controller is responsible for monitoring the StatefulSet for changes in the number of replicas. When the number of replicas is reduced, the controller decommissions brokers, starting from the highest Pod ordinal, until the number of brokers matches the number of replicas.
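Because the StatefulSet always removes Pods starting from the highest ordinal, you can predict which Pods, and therefore which brokers, a scale-in will affect. The following shell sketch lists the Pod names removed when you scale a StatefulSet from one replica count to another. The function name pods_removed is hypothetical, and it assumes Pods are named <statefulset>-<ordinal>, as in the redpanda-0 to redpanda-6 example.

```shell
#!/bin/sh
# pods_removed STATEFULSET CURRENT_REPLICAS TARGET_REPLICAS
# Prints, in removal order, the Pods a StatefulSet scale-in deletes.
# Ordinals run from 0 to CURRENT_REPLICAS - 1; the highest goes first.
pods_removed() {
  name=$1
  i=$(( $2 - 1 ))
  while [ "$i" -ge "$3" ]; do
    echo "${name}-${i}"
    i=$(( i - 1 ))
  done
}
```

For example, pods_removed redpanda 7 5 prints redpanda-6 and then redpanda-5, which correspond to broker IDs 8 and 9 in the example cluster. To map each Pod to its broker ID, compare the hostnames in the rpk cluster info output.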
flowchart TB
    %% Define classes
    classDef userAction stroke:#374D7C, fill:#E2EBFF, font-weight:bold,rx:5,ry:5
    classDef systemAction fill:#F6FBF6,stroke:#25855a,stroke-width:2px,color:#20293c,rx:5,ry:5

    %% Legend
    subgraph Legend
        direction TB
        UA([User action]):::userAction
        SE([System event]):::systemAction
    end

    Legend ~~~ Workflow

    %% Main workflow
    subgraph Workflow
        direction TB
        A[Start automated scale-in]:::userAction --> B[Decrease StatefulSet replicas by 1]:::userAction
        B --> C[Decommission controller detects reduced replicas]:::systemAction
        C --> D[Controller marks highest ordinal Pod for removal]:::systemAction
        D --> E[Controller orchestrates broker decommission]:::systemAction
        E --> F[Partitions reallocate under controller supervision]:::systemAction
        F --> G[Check cluster health]:::systemAction
        G --> H{Broker fully removed?}:::systemAction
        H -- No --> F
        H -- Yes --> I["Done, or repeat if further scale-in needed"]:::userAction
    end

For example, you have a Redpanda cluster with the following brokers:

ID    HOST
0     redpanda-0.testcluster.local
1     redpanda-1.testcluster.local
4     redpanda-3.testcluster.local
5*    redpanda-2.testcluster.local
6     redpanda-4.testcluster.local
8     redpanda-6.testcluster.local
9     redpanda-5.testcluster.local

The IDs are the broker IDs. As the hostnames show, the broker IDs don’t match the StatefulSet ordinals. In this example, the Pod with the highest ordinal is redpanda-6 (ID 8).

You cannot choose which broker is decommissioned. Redpanda is deployed as a StatefulSet in Kubernetes. The StatefulSet controls which Pods are destroyed and always starts with the Pod that has the highest ordinal. So the first broker to be destroyed when the controller decommissions the brokers in this example is redpanda-6 (ID 8).

When you reduce the number of replicas, the controller terminates the Pod with the highest ordinal, removes its PVC, and then attempts to set the reclaim policy of the PV to Retain.
Finally, the controller waits for the cluster state to become healthy before committing to decommissioning the broker that was running in the terminated Pod.

Always decommission one broker at a time.

Install the Decommission controller:

Helm + Operator:

You can install the Decommission controller as part of the Redpanda Operator or as a sidecar on each Pod that runs a Redpanda broker. When you install the controller as part of the Redpanda Operator, it monitors all Redpanda clusters running in the same namespace as the Redpanda Operator. If you want the controller to manage only a single Redpanda cluster, install it as a sidecar on each Pod that runs a Redpanda broker, using the Redpanda resource.

To install the Decommission controller as part of the Redpanda Operator:

Deploy the Redpanda Operator with the Decommission controller:

helm repo add redpanda https://charts.redpanda.com
helm repo update
helm upgrade --install redpanda-controller redpanda/operator \
  --namespace <namespace> \
  --set image.tag=v2.3.6-24.3.3 \
  --create-namespace \
  --set additionalCmdFlags={--additional-controllers="decommission"} \
  --set rbac.createAdditionalControllerCRs=true

- --additional-controllers="decommission": Enables the Decommission controller.
- rbac.createAdditionalControllerCRs=true: Creates the required RBAC rules for the Redpanda Operator to monitor the StatefulSet and update PVCs and PVs.

Configure a Redpanda resource with seven Redpanda brokers:

redpanda-cluster.yaml

apiVersion: cluster.redpanda.com/v1alpha2
kind: Redpanda
metadata:
  name: redpanda
spec:
  chartRef: {}
  clusterSpec:
    statefulset:
      replicas: 7

- statefulset.replicas: This example starts with a seven-broker Redpanda cluster.
Apply the Redpanda resource:

kubectl apply -f redpanda-cluster.yaml --namespace <namespace>

To install the Decommission controller as a sidecar:

Configure a Redpanda resource with the sidecar controller enabled:

redpanda-cluster.yaml

apiVersion: cluster.redpanda.com/v1alpha2
kind: Redpanda
metadata:
  name: redpanda
spec:
  chartRef: {}
  clusterSpec:
    statefulset:
      replicas: 7
      sideCars:
        controllers:
          enabled: true
          run:
            - "decommission"
    rbac:
      enabled: true

- statefulset.replicas: This example starts with a seven-broker Redpanda cluster.
- statefulset.sideCars.controllers.enabled: Enables the controllers sidecar.
- statefulset.sideCars.controllers.run: Enables the Decommission controller.
- rbac.enabled: Creates the required RBAC rules for the controller to monitor the StatefulSet and update PVCs and PVs.

Apply the Redpanda resource:

kubectl apply -f redpanda-cluster.yaml --namespace <namespace>

Helm:

If you deploy the Redpanda Helm chart with Argo CD, you cannot use the Decommission controller.

Helm with --values:

decommission-controller.yaml

statefulset:
  replicas: 7
  sideCars:
    controllers:
      enabled: true
      run:
        - "decommission"
rbac:
  enabled: true

helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
  --values decommission-controller.yaml

- statefulset.replicas: This example starts with a seven-broker Redpanda cluster.
- statefulset.sideCars.controllers.enabled: Enables the controllers sidecar.
- statefulset.sideCars.controllers.run: Enables the Decommission controller.
- rbac.enabled: Creates the required RBAC rules for the controller to monitor the StatefulSet and update PVCs and PVs.

Helm with --set:

helm upgrade --install redpanda redpanda/redpanda \
  --namespace <namespace> \
  --create-namespace \
  --set statefulset.replicas=7 \
  --set statefulset.sideCars.controllers.enabled=true \
  --set statefulset.sideCars.controllers.run={"decommission"} \
  --set rbac.enabled=true

- statefulset.replicas: This example starts with a seven-broker Redpanda cluster.
- statefulset.sideCars.controllers.enabled: Enables the controllers sidecar.
- statefulset.sideCars.controllers.run: Enables the Decommission controller.
- rbac.enabled: Creates the required RBAC rules for the controller to monitor the StatefulSet and update PVCs and PVs.

Verify that your cluster is in a healthy state:

kubectl exec redpanda-0 --namespace <namespace> -- rpk cluster health

Decrease the number of replicas by one.

When scaling in (removing brokers), remove only one broker at a time. If you reduce the StatefulSet replicas by more than one, Kubernetes can terminate multiple Pods simultaneously, causing quorum loss and cluster unavailability.

Helm + Operator:

redpanda-cluster.yaml

apiVersion: cluster.redpanda.com/v1alpha2
kind: Redpanda
metadata:
  name: redpanda
spec:
  chartRef: {}
  clusterSpec:
    statefulset:
      replicas: 6
      sideCars:
        controllers:
          enabled: true
          run:
            - "decommission"
    rbac:
      enabled: true

kubectl apply -f redpanda-cluster.yaml --namespace <namespace>

Helm with --values:

replicas.yaml

statefulset:
  replicas: 6

helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
  --values replicas.yaml --reuse-values

Helm with --set:

helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
  --set statefulset.replicas=6 \
  --set statefulset.sideCars.controllers.enabled=true \
  --set statefulset.sideCars.controllers.run={"decommission"} \
  --set rbac.enabled=true

The Decommission controller detects when the number of replicas decreases and decommissions the brokers, starting from the Pod with the highest ordinal.

This process triggers a rolling restart of each Pod so that each broker has an up-to-date seed_servers configuration to reflect the new list of brokers.

Verify that your cluster is in a healthy state:

kubectl exec redpanda-0 --namespace <namespace> -- rpk cluster health

It may take some time for the Decommission controller to reconcile.
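While the controller reconciles, you can script the health check and the broker ID check from the manual workflow. The following shell sketch parses the rpk cluster health output format shown earlier (the "Healthy:" and "All nodes:" lines). The helper names is_healthy, broker_listed, and wait_for_healthy are hypothetical, and the polling loop assumes the same kubectl exec command used in this section.

```shell
#!/bin/sh
# is_healthy: reads `rpk cluster health` output on stdin and succeeds
# only if it contains "Healthy: true".
is_healthy() {
  grep -q '^[[:space:]]*Healthy:[[:space:]]*true'
}

# broker_listed ID: reads `rpk cluster health` output on stdin and succeeds
# if ID appears in the "All nodes:" list, for example "All nodes: [4 1 0 5 6 8]".
broker_listed() {
  awk -v id="$1" '
    /^[ \t]*All nodes:/ {
      # Split on anything that is not a digit, leaving just the broker IDs.
      n = split($0, ids, /[^0-9]+/)
      for (i = 1; i <= n; i++) if (ids[i] != "" && ids[i] == id) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# wait_for_healthy NAMESPACE [INTERVAL_SECONDS]: polls until the cluster is healthy.
wait_for_healthy() {
  until kubectl exec redpanda-0 --namespace "$1" -- rpk cluster health | is_healthy; do
    sleep "${2:-10}"
  done
}
```

For example, run wait_for_healthy <namespace>, then confirm that broker 8 is gone with kubectl exec redpanda-0 --namespace <namespace> -- rpk cluster health | broker_listed 8 || echo "broker 8 removed".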
You can check the progress by looking at the Decommission controller logs:

- If you’re running the Decommission controller as part of the Redpanda Operator:

  kubectl logs -l app.kubernetes.io/name=operator -c manager --namespace <namespace>

- If you’re running the Decommission controller as a sidecar:

  kubectl logs <pod-name> --namespace <namespace> -c redpanda-controllers

You can repeat this procedure to continue to scale down.

Troubleshooting

If the decommissioning process is not making progress, investigate the following potential issues:

- Absence of a controller leader or partition leader: The controller leader serves as the orchestrator for decommissioning. Additionally, if one of the partitions undergoing reconfiguration does not have a leader, the reconfiguration process may stall. Make sure that an elected leader is present for all partitions.
- Bandwidth limitations for partition recovery: Try increasing the value of raft_learner_recovery_rate, and monitor the status using the redpanda_raft_recovery_partition_movement_available_bandwidth metric.

If these steps do not allow the decommissioning process to complete, enable TRACE level logging in the Helm chart to investigate any other issues. For default values and documentation for configuration options, see the values.yaml file.

Next steps

If you have rack awareness enabled, you may want to reassign the remaining brokers to appropriate racks after the decommission process is complete. See Enable Rack Awareness in Kubernetes.

Suggested reading

- rpk-redpanda-admin-brokers-decommission
- Engineering a more robust Raft group reconfiguration