Rolling Upgrades

One of the primary uses for maintenance mode is to perform a rolling upgrade on each node in the cluster. This process involves putting a node into maintenance mode, upgrading the node, waiting to see if there are any issues, taking the node out of maintenance mode, and then repeating the process on the next node in the cluster. Placing nodes into maintenance mode ensures a smooth upgrade of your cluster while reducing the risk of interruption or degradation in service.

If you are using the Redpanda Kubernetes Operator to trigger an upgrade, then node maintenance mode is automatically incorporated into the upgrade process.

The upgrade process cannot be applied to downgrades between major versions, since earlier versions might not support some data formats in the current version. For example, you can’t downgrade from version 22.2.1 to version 22.1.7. However, downgrades are supported between minor versions; for example, you can downgrade from version 22.2.2 to version 22.2.1.

Performing a rolling upgrade

To perform a rolling upgrade:

  1. Check to ensure all nodes are healthy by running rpk cluster health.

  2. Select a node that has not been upgraded yet and place it into maintenance mode using rpk cluster maintenance enable <node-id> --wait. The --wait option ensures that the cluster is healthy before putting the node into maintenance mode.

  3. Verify that the node is in maintenance mode by running rpk cluster maintenance status and confirming that FINISHED is true.

  4. Validate the health of the cluster by running rpk cluster health.

    If the cluster is healthy, the output shows Healthy: true. You can also evaluate external metrics to determine cluster health. If the cluster has any issues, take the node out of maintenance mode by running rpk cluster maintenance disable <node-id> before proceeding with other operations, such as decommissioning or retrying the rolling upgrade.

  5. If there are no issues, perform a node upgrade by following the version upgrade process specific to your deployment model. The basic steps include node shutdown, upgrade, and restart.

  6. Take the node out of maintenance mode by running rpk cluster maintenance disable <node-id>.

  7. Run rpk cluster maintenance status to ensure that the node is no longer in maintenance mode.

  8. Restart the node.

  9. Repeat this process on another node in the cluster.

You can stop the upgrade process and roll back to the original version as long as you have not upgraded every node and restarted the cluster.