Rolling Upgrades
One of the primary uses for maintenance mode is to perform a rolling upgrade on each node in the cluster. This process involves putting a node into maintenance mode, upgrading the node, waiting to see if there are any issues, taking the node out of maintenance mode, and then repeating the process on the next node in the cluster. Placing nodes into maintenance mode ensures a smooth upgrade of your cluster while reducing the risk of interruption or degradation in service.
If you are using the Redpanda Kubernetes Operator to trigger an upgrade, then node maintenance mode is automatically incorporated into the upgrade process. |
The upgrade process cannot be applied to downgrades between major versions, since earlier versions might not support some data formats in the current version. For example, you can’t downgrade from version 22.2.1 to version 22.1.7. However, downgrades are supported between minor versions; for example, you can downgrade from version 22.2.2 to version 22.2.1.
Performing a rolling upgrade
To perform a rolling upgrade:
-
Check to ensure all nodes are healthy by running
rpk cluster health
. -
Select a node that has not been upgraded yet and place it into maintenance mode using
rpk cluster maintenance enable <node-id> --wait
. The--wait
option ensures that the cluster is healthy before putting the node into maintenance mode. -
Verify that the node is in maintenance mode by running
rpk cluster maintenance status
and confirming thatFINISHED
istrue
. -
Validate the health of the cluster by running
rpk cluster health
.If the cluster is healthy, the output shows
Healthy: true
. You can also evaluate external metrics to determine cluster health. If the cluster has any issues, take the node out of maintenance mode by runningrpk cluster maintenance disable <node-id>
before proceeding with other operations, such as decommissioning or retrying the rolling upgrade. -
If there are no issues, perform a node upgrade by following the version upgrade process specific to your deployment model. The basic steps include node shutdown, upgrade, and restart.
-
Take the node out of maintenance mode by running
rpk cluster maintenance disable <node-id>
. -
Run
rpk cluster maintenance status
to ensure that the node is no longer in maintenance mode. -
Restart the node.
-
Repeat this process on another node in the cluster.
You can stop the upgrade process and roll back to the original version as long as you have not upgraded every node and restarted the cluster. |