Maintenance Mode

Maintenance mode lets you take a Redpanda broker offline temporarily while minimizing disruption to client operations. When a broker is in maintenance mode, you can safely perform operations that require the Redpanda process to be temporarily stopped; for example, system maintenance or a rolling upgrade.

A broker cannot be decommissioned while it’s in maintenance mode. Take the broker out of maintenance mode first by running rpk cluster maintenance disable <node-id>.

When a broker is placed in maintenance mode, if the replication factor is greater than one, it reassigns partition leadership to other brokers in the cluster. The broker is not eligible for partition leadership again until it is taken out of maintenance mode.

Maintenance mode only transfers leadership. It does not move any partitions to other brokers in the cluster.
If a broker hosts a partition with a replica count of one, the partition is unavailable when the Redpanda process is not running.
Maintenance mode operations only handle a single broker in maintenance at a time. If you attempt to place more than one broker in maintenance mode, you will get an error.

The amount of time it takes to drain a broker and reassign leadership depends on the number of partitions and the health of the cluster. For healthy clusters, draining leadership should take less than a minute. For unhealthy clusters (for example, when follower is not in sync with the leader), draining the broker can take longer. Note that the draining process won’t start until the cluster is healthy.

When a broker is in maintenance mode, Redpanda continues to replicate updates to that broker. When the broker is taken offline, partitions with replicas on the broker could become out of sync until the broker is brought back online. When the broker is available again, data is copied to under-replicated replicas on the broker until all affected partitions are in sync with the leader.

Place a broker in maintenance mode

Before shutting down a broker, you may want to temporarily disable or ignore alerts related to under-replicated partitions. These alerts likely point to false positives as a result of the broker being taken offline and replicas being temporarily unreachable. To prevent under-replicated partitions altogether, you can move all partitions to other brokers.

To place a broker into maintenance mode, run:

rpk cluster maintenance enable <node-id> --wait

The --wait option ensures that rpk waits until leadership is drained from the broker before responding.

To remove a broker from maintenance mode (and thus enable the broker to start taking leadership of partitions), run:

rpk cluster maintenance disable <node-id>

To see the maintenance status of brokers in the cluster, run:

rpk cluster maintenance status

The output of this command identifies which brokers in the cluster are in the process of draining leadership, which brokers are finished with that process, and whether any brokers had errors.