Upgrade
To benefit from Redpanda’s new features and enhancements, upgrade to the latest version. On production clusters, Redpanda recommends a rolling upgrade, in which each broker is placed into maintenance mode, upgraded, and restarted separately, one after the other.
Redpanda version numbers follow the convention AB.C.D, where AB is the two-digit year, C is the feature release, and D is the patch release. For example, version 22.3.1 indicates the first patch release of the third feature release of the year 2022.
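As an illustration of the numbering convention, the following sketch splits a version string into its year, feature release, and patch release using only POSIX shell parameter expansion. The function name parse_version is hypothetical and is not part of rpk.

```shell
# Hypothetical helper: split a version such as "v22.3.1" into AB, C and D.
parse_version() {
  v=${1#v}              # drop an optional leading "v"
  v=${v%% *}            # drop anything after a space, such as "(rev 9eefb90)"
  year=${v%%.*}         # AB: two-digit year
  rest=${v#*.}
  feature=${rest%%.*}   # C: feature release
  patch=${rest#*.}      # D: patch release
  printf 'year=%s feature=%s patch=%s\n' "$year" "$feature" "$patch"
}

parse_version v22.3.1   # prints: year=22 feature=3 patch=1
```

The helper only splits on dots. It does not validate that the input is a well-formed version.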
Prerequisites
- A running Redpanda cluster.
- jq for listing available versions.
- An understanding of the impact of broker restarts on clients, node CPU, and any alerting systems you use.
Find a new version
Before you upgrade, find out which Redpanda version you are currently running, whether you can upgrade straight to the new version, and what’s changed since your original version. To find your current version, run:
- Linux
  rpk redpanda admin brokers list
  For all available flags, see the rpk redpanda admin brokers list command reference.
- Docker
  Running Redpanda directly on Docker is not supported for production usage. This platform should only be used for testing.
  docker exec -it <container_name> rpk version
  Remember to replace <container_name> with the name of your Redpanda container. The tag of the image that the container runs determines which version of rpk is used. The release process bundles rpk and Redpanda into the same container tag with the same version.
- macOS
  brew list --versions | grep redpanda
Example output
v22.3.11 (rev 9eefb90)
If your current version is more than one feature release behind the latest Redpanda version, you must first upgrade to an intermediate version. To list all available versions, run:
Check the release notes to find information about what has changed between Redpanda versions.
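A hedged sketch of both checks follows. list_redpanda_versions assumes that published versions can be read from the Docker Hub tags API for the redpandadata/redpanda repository. That source is an assumption, since this page does not name one, and the function needs curl and jq. feature_gap compares the feature release of two versions from the same year. Both function names are hypothetical.

```shell
# Hypothetical helper: list recently published image tags.
# Assumption: tags are served by the Docker Hub v2 API for redpandadata/redpanda.
list_redpanda_versions() {
  curl -s 'https://hub.docker.com/v2/repositories/redpandadata/redpanda/tags/?ordering=last_updated&page_size=50' |
    jq -r '.results[].name'
}

# Print how many feature releases separate two versions of the same year.
# For versions from different years, print "unknown": the number of feature
# releases per year is not stated here, so consult the release notes instead.
feature_gap() {
  cur=${1#v}; new=${2#v}
  cur_year=${cur%%.*}; new_year=${new%%.*}
  cur_rest=${cur#*.}; new_rest=${new#*.}
  cur_feature=${cur_rest%%.*}; new_feature=${new_rest%%.*}
  if [ "$cur_year" -ne "$new_year" ]; then
    echo unknown
  else
    echo $((new_feature - cur_feature))
  fi
}

feature_gap v22.1.4 v22.3.11   # prints 2
```

A result greater than 1 means the target is more than one feature release ahead, so an intermediate upgrade is needed first.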
Impact of broker restarts
When brokers restart, clients may experience higher latency, nodes may experience CPU spikes when the broker becomes available again, and you may receive alerts about under-replicated partitions.
Temporary increase in latency on clients (producers and consumers)
When you restart one or more brokers in a cluster, clients (consumers and producers) may experience higher latency due to partition leadership reassignment. Because clients must communicate with the leader of a partition, they may send a request to a broker whose leadership has been transferred, and receive NOT_LEADER_FOR_PARTITION. In this case, clients must request metadata from the cluster to find out the address of the new leader. Clients refresh their metadata periodically, or when they receive a retryable error that indicates that the metadata may be stale. For example:
- Broker A shuts down.
- Client sends a request to broker A, and receives NOT_LEADER_FOR_PARTITION.
- Client requests metadata, and learns that the new leader is broker B.
- Client sends the request to broker B.
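The sequence above can be modeled with a toy simulation. This is only a sketch of the retry logic, not real client code: LEADER, CACHED, send_request, fetch_metadata, and produce are illustrative names, and no network traffic is involved. It shows where the extra round trips, and therefore the added latency, come from.

```shell
# Toy model of a client with stale metadata. All names are illustrative.
LEADER=B   # broker A shut down, so leadership moved to broker B
CACHED=A   # the client still believes broker A is the leader

send_request() {   # $1 = broker the client contacts
  if [ "$1" = "$LEADER" ]; then
    echo OK
  else
    echo NOT_LEADER_FOR_PARTITION
  fi
}

fetch_metadata() { echo "$LEADER"; }   # the cluster reports the current leader

produce() {
  reply=$(send_request "$CACHED")
  if [ "$reply" = NOT_LEADER_FOR_PARTITION ]; then
    CACHED=$(fetch_metadata)           # refresh the stale metadata
    reply=$(send_request "$CACHED")    # retry against the new leader
  fi
  echo "$reply via broker $CACHED"
}

produce   # prints: OK via broker B
```

The first attempt costs one failed request plus one metadata request before the retry succeeds. Later requests go straight to broker B.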
CPU spikes upon broker restart
When a restarted broker becomes available again, you may see your nodes' CPU usage increase temporarily. This temporary increase in CPU usage is due to the cluster rebalancing the partition replicas.
Under-replicated partitions
When a broker is in maintenance mode, Redpanda continues to replicate updates to that broker. When a broker is taken offline during a restart, partitions with replicas on the broker could become out of sync until it is brought back online. Once the broker is available again, data is copied to its under-replicated replicas until all affected partitions are in sync with the partition leader.
Perform a rolling upgrade
A rolling upgrade involves putting a broker into maintenance mode, upgrading the broker, taking the broker out of maintenance mode, and then repeating the process on the next broker in the cluster. Placing brokers into maintenance mode ensures a smooth upgrade of your cluster while reducing the risk of interruption or degradation in service.
When a broker is placed into maintenance mode, it reassigns its partition leadership to other brokers for all topics that have a replication factor greater than one. Reassigning partition leadership involves draining leadership from the broker and transferring that leadership to another broker. If you have topics with replication.factor=1, and if you have sufficient disk space, Redpanda recommends temporarily increasing the replication factor. This can help limit outages for these topics during the rolling upgrade. Do this before the upgrade to make sure there’s time for the data to replicate to other brokers. For more information, see Change topic replication factor.
To ensure that all brokers are active before upgrading, run:
rpk redpanda admin brokers list
All brokers should show active for MEMBERSHIP-STATUS and true for IS-ALIVE:
Example output
NODE-ID NUM-CORES MEMBERSHIP-STATUS IS-ALIVE BROKER-VERSION
0 1 active true v22.3.11
1 1 active true v22.3.11
2 1 active true v22.3.11
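This check can be scripted. The sketch below reads the command output on standard input and fails if any broker is not active and alive. It assumes the column order of the example output above (MEMBERSHIP-STATUS third, IS-ALIVE fourth), which may differ in other rpk versions. The function name all_brokers_ready is hypothetical.

```shell
# Hypothetical helper: exit non-zero if any broker row is not "active" and "true".
# Assumes the column layout of the example output: $3 = MEMBERSHIP-STATUS, $4 = IS-ALIVE.
all_brokers_ready() {
  awk 'BEGIN { bad = 0 }
       NR > 1 && NF && ($3 != "active" || $4 != "true") {
         bad = 1
         print "not ready: node " $1
       }
       END { exit bad }'
}

# Usage (requires rpk and a running cluster):
#   rpk redpanda admin brokers list | all_brokers_ready && echo "all brokers ready"
```

The header row is skipped with NR > 1, and blank lines are ignored.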
New features in a version are enabled after all brokers in the cluster are upgraded. If problems occur, the upgrade is not committed.
Redpanda supports consumer offsets starting in version 22.1. When upgrading from version 21.11 to 22.1, after all brokers are upgraded, Redpanda starts to migrate consumer group topics to __consumer_offsets. This takes some time, depending on the data size. Until it finishes, all consumer group-related operations (consume, offset commit, coordinator election) are blocked. The migration to consumer offsets is complete when you see consumer offset feature enabled in all brokers.
Enable maintenance mode
- Check that all brokers are healthy:
  rpk cluster health
- Select a broker that has not been upgraded yet and place it into maintenance mode:
  rpk cluster maintenance enable <node-id> --wait
  The --wait option tells the command to wait until the given broker finishes draining all partitions it originally served. After the partition draining completes, the command completes.
- Verify that the broker is in maintenance mode:
  rpk cluster maintenance status
- Validate the health of the cluster again:
  rpk cluster health
You can also evaluate external metrics to determine cluster health. If the cluster has any issues, take the broker out of maintenance mode by running the following command before proceeding with other operations, such as decommissioning or retrying the rolling upgrade:
rpk cluster maintenance disable <node-id>
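The steps in this section can be wrapped in a small function. This is a sketch that uses only the rpk commands shown above. The name enter_maintenance is hypothetical, and the function assumes that rpk returns a non-zero exit status when a command fails. Because rpk cluster health may report an unhealthy cluster only in its output, still read the printed output before continuing.

```shell
# Hypothetical wrapper: drain one broker before upgrading it. $1 is the node ID.
# Assumption: rpk exits non-zero on failure; health output must still be reviewed.
enter_maintenance() {
  node_id=$1
  rpk cluster health || return 1                    # check health before starting
  if ! rpk cluster maintenance enable "$node_id" --wait; then
    rpk cluster maintenance disable "$node_id"      # back out if draining fails
    return 1
  fi
  rpk cluster maintenance status                    # confirm maintenance mode
  rpk cluster health                                # validate health again
}

# Usage (requires rpk and a running cluster):
#   enter_maintenance 0
```

Backing out with maintenance disable mirrors the advice above: take the broker out of maintenance mode before retrying.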
Upgrade your version
- Linux
  For Linux distributions, the process depends on the distribution:
  - Fedora/RedHat
    On the terminal, run:
    sudo yum update redpanda
  - Debian/Ubuntu
    On the terminal, run:
    sudo apt update
    sudo apt install redpanda
- Docker
  Running Redpanda directly on Docker is not supported for production usage. This platform should only be used for testing.
  To perform an upgrade, you must replace the current image with a new one.
  First, check which image is currently running in Docker:
  docker ps
  Stop and remove Redpanda’s container(s):
  docker stop <container_id>
  ...
  docker rm <container_id>
  Remove the current images:
  docker rmi <image_id>
  Pull the desired Redpanda version, or use latest as the <version> tag:
  docker pull docker.redpanda.com/redpandadata/redpanda:<version>
  After it completes, restart the cluster:
  docker restart <container_name>
  For more information, see Redpanda Quickstart for Docker.
- macOS
  If you previously installed Redpanda with brew, run:
  brew upgrade redpanda-data/tap/redpanda
  For installations from binary files, download the preferred version from the release list and then overwrite the current rpk file in the installed location.
Check metrics
Check the following metrics before continuing with the upgrade:
Metric | Description |
---|---|
 | If this shows any non-zero value, then replication cannot catch up, and the upgrade should be paused. |
 | Before restart, wait for this to show zero unavailable partitions. |
 | Before restart, the produce and consume rate for each broker should recover to the pre-upgrade value. |
 | Before restart, the p99 histogram should recover to the pre-upgrade value. |
 | Before restart, the p99 histogram should recover to the pre-upgrade value. |
 | Check the CPU utilization. The derivative gives you a 0.0-1.0 value for how much time the core was busy in a given second. |
Restart broker
Restart the broker’s Redpanda service with rpk redpanda stop, then rpk redpanda start.
Disable maintenance mode
After you’ve successfully upgraded the broker:
- Take the broker out of maintenance mode:
  rpk cluster maintenance disable <node-id>
  Example output
  Successfully disabled maintenance mode for node 0
- Ensure that the broker is no longer in maintenance mode:
  rpk cluster maintenance status
  Example output
  NODE-ID DRAINING FINISHED ERRORS PARTITIONS ELIGIBLE TRANSFERRING FAILED
  0 false false false 0 0 0 0
  1 false false false 0 0 0 0
  2 false false false 0 0 0 0
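The final check can also be scripted. The sketch below reads the status output on standard input and prints the node ID of every broker whose DRAINING column is still true. It assumes the column order shown in the example output (NODE-ID first, DRAINING second). The function name brokers_in_maintenance is hypothetical.

```shell
# Hypothetical helper: print node IDs whose DRAINING column is "true".
# Assumes the layout of the example output: $1 = NODE-ID, $2 = DRAINING.
brokers_in_maintenance() {
  awk 'NR > 1 && $2 == "true" { print $1 }'
}

# Usage (requires rpk and a running cluster):
#   rpk cluster maintenance status | brokers_in_maintenance
```

Empty output means that no broker is reported as draining, which is the expected state before moving on to the next broker.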
Suggested reading
To set up a real-time dashboard to monitor your cluster health, see Monitor Redpanda.