Node maintenance mode enables you to take a Redpanda node offline temporarily while minimizing disruption to client operations. When a node is in maintenance mode, you can safely perform operations that require the Redpanda process to be temporarily stopped; for example, performing hardware or system maintenance, or performing a rolling upgrade.
A node cannot be decommissioned while it is in maintenance mode. Take the node out of maintenance mode first by running rpk cluster maintenance disable <node-id>.
When a node is placed in maintenance mode, it reassigns partition leadership to other nodes in the cluster. The node is not eligible for partition leadership again until it is taken out of maintenance mode. Note that maintenance mode only transfers leadership; it does not move any partitions to other nodes in the cluster.
If a node hosts a partition with a replica count of 1, you have three options:
- Manually reassign the partition to another node by changing its replica set.
- Increase the number of replicas.
- Do nothing, but be aware that the partition will not be available while the redpanda process is shut down temporarily.
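As a rough sketch of the second option, the helper below raises a topic's replication factor before maintenance begins. It assumes that your Redpanda version lets you change the replication.factor topic property through rpk topic alter-config. The function name raise_replication_factor and its default factor of 3 are hypothetical choices for illustration.

```shell
# Sketch: raise a topic's replication factor before node maintenance.
# Assumption: replication.factor can be changed with `rpk topic alter-config`
# in your Redpanda version. The function name is hypothetical.
raise_replication_factor() {
  local topic="$1"
  local factor="${2:-3}"
  # A factor of 1 would leave each partition on a single node
  if [ "$factor" -lt 2 ]; then
    echo "replication factor must be at least 2" >&2
    return 1
  fi
  rpk topic alter-config "$topic" --set "replication.factor=${factor}"
}
```

For example, raise_replication_factor my-topic 3 would request three replicas for a hypothetical topic named my-topic.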
The amount of time it takes to drain a node and reassign leadership depends on the number of partitions and how healthy the cluster is. For healthy clusters, draining leadership should take less than a minute. If the cluster is unhealthy (for example, a follower is not in sync with the leader), draining the node can take longer. Note that the draining process does not start until the cluster is healthy.
When a node is in maintenance mode, Redpanda continues to replicate updates to that node. When the node is taken offline, partitions with replicas on the node could become out of sync until the node is brought back online. Once the node is available again, data is copied to under-replicated replicas on the node until all affected partitions are in sync with the leader.
Placing a node in maintenance mode
Before placing a node in maintenance mode, you may want to temporarily disable or ignore alerts related to under-replicated partitions. To prevent under-replicated partitions altogether, you can move all partitions to other nodes.
To place a node into maintenance mode:
rpk cluster maintenance enable <node-id> --wait
The --wait option ensures that rpk waits until leadership is drained from the node before responding.
To remove a node from maintenance mode (and thus enable the node to start taking leadership of partitions):
rpk cluster maintenance disable <node-id>
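The enable and disable commands above can be combined into one maintenance cycle. The following is a minimal sketch, not an official procedure: the function name with_maintenance_mode is hypothetical, and the choice to leave the node in maintenance mode when the task fails is an assumption about what an operator would want.

```shell
# Sketch: run a maintenance task on one node while it is in maintenance mode.
# Usage: with_maintenance_mode <node-id> <command> [args...]
# The function name and failure policy are hypothetical, not part of rpk.
with_maintenance_mode() {
  local node_id="$1"
  shift
  # Drain leadership; --wait blocks until the drain has finished
  rpk cluster maintenance enable "$node_id" --wait || return 1
  # Run the caller-supplied task (for example, restarting the Redpanda service)
  if ! "$@"; then
    echo "maintenance task failed; node $node_id left in maintenance mode" >&2
    return 1
  fi
  # Make the node eligible for partition leadership again
  rpk cluster maintenance disable "$node_id"
}
```

For example, with_maintenance_mode 1 sudo systemctl restart redpanda would drain node 1, restart the service, and then re-enable leadership, assuming the node runs Redpanda as a systemd unit named redpanda.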
To see the maintenance status of nodes in the cluster:
rpk cluster maintenance status
The output of this command identifies which nodes in the cluster are in the process of draining leadership, which nodes are finished with that process, and whether any nodes had resulting errors.
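To act on this output in a script, you could filter it for nodes that have started draining but not finished. The sketch below assumes the status output is a whitespace-separated table whose header includes NODE-ID, FINISHED, and either ENABLED or DRAINING columns holding true or false values. Those column names are an assumption and may differ between rpk versions, so check them against your own output. The function name nodes_still_draining is hypothetical.

```shell
# Sketch: print the IDs of nodes that are in maintenance mode but still draining.
# Reads `rpk cluster maintenance status` output on stdin.
# Assumption: the header has NODE-ID, FINISHED, and ENABLED (or DRAINING) columns.
nodes_still_draining() {
  awk '
    NR == 1 {
      # Locate columns by header name rather than by fixed position
      for (i = 1; i <= NF; i++) col[$i] = i
      state = ("ENABLED" in col) ? col["ENABLED"] : col["DRAINING"]
      fin = col["FINISHED"]
      id = col["NODE-ID"]
      if (!state || !fin || !id) {
        print "unrecognized status header" > "/dev/stderr"
        exit 2
      }
      next
    }
    # A node is still draining if maintenance is on and the drain is not finished
    $state == "true" && $fin == "false" { print $id }
  '
}
```

A typical use would be rpk cluster maintenance status | nodes_still_draining, which prints one node ID per line and prints nothing when no node is mid-drain.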