Skip to main content
Version: 23.1

Decommission Brokers

Loading...

When you decommission a broker, its partition replicas are reallocated across the remaining brokers and it is removed from the cluster. You may want to decommission a broker in the following circumstances:

  • The broker has lost its storage and you need a new broker with a new node ID (broker ID).
  • You are replacing a node, for example to upgrade the Linux kernel or to replace the hardware.
  • You are removing a broker to decrease the size of the cluster.
important

When a broker is decommissioned, it cannot rejoin the cluster. If a broker with the same ID tries to rejoin the cluster, it is rejected.

What happens when a broker is decommissioned

When a broker is decommissioned, the controller leader creates a reallocation plan for all partition replicas that are allocated to that broker. By default, this reallocation is done in batches of 50 to avoid overwhelming the remaining brokers with Raft recovery. See partition_autobalancing_concurrent_moves.

The reallocation of each partition is translated into a Raft group reconfiguration and executed by the controller leader. The partition leader then handles the reconfiguration for its Raft group. After the reallocation for a partition is complete, it is recorded in the controller log and the status is updated in the topic tables of each broker.

The decommissioning process is successful only when all partition reallocations have been completed successfully. The controller leader polls for the status of all the partition-level reallocations to ensure that everything completes as expected.

During the decommissioning process, new partitions are not allocated to the broker that is being decommissioned. After all the reallocations have been completed successfully, the broker is removed from the cluster.

note

The decommissioning process is designed to tolerate controller leadership transfers.

Decommission a broker

  1. List your brokers and their associated broker IDs:

    rpk cluster info \
    --brokers <broker-url>:<kafka-api-port>
  2. Decommission the broker with your selected broker ID:

    rpk redpanda admin brokers decommission <broker-id> \
    --hosts <broker-url>:<admin-api-port> \
    --force
    note

    The --force flag is required only if the broker is not running.

    If you see Success, broker <broker-id> has been decommissioned!, the broker is decommissioned. Otherwise, the decommissioning process is still in progress. You can monitor the decommissioning status to follow its progress.

  3. Monitor the decommissioning status:

    rpk redpanda admin brokers decommission-status <broker-id> \
    --api-urls <broker-url>:<admin-api-port>

    The output uses cached cluster health data that is refreshed every 10 seconds.

When the completion column for all rows is 100%, the broker is decommissioned.

important

If you add a new broker, make sure to give it a unique ID. Do not reuse the ID of the decommissioned broker.

Troubleshooting

If the decommissioning process is not making progress, investigate the following potential issues:

  • Absence of a controller leader or partition leader: The controller leader serves as the orchestrator for decommissioning. Additionally, if one of the partitions undergoing reconfiguration does not have a leader, the reconfiguration process may stall. Make sure that an elected leader is present for all partitions.

  • Bandwidth limitations for partition recovery: Try increasing the value of raft_learner_recovery_rate, and monitor the status using the redpanda_raft_recovery_partition_movement_available_bandwidth metric.

If these steps do not allow the decommissioning process to complete, enable TRACE level logging on the controller leader to investigate any other issues.

Suggested reading

What do you like about this doc?




Optional: Share your email address if we can contact you about your feedback.

Let us know what we do well: