Node-wise Partition Recovery

Multi-broker or entire availability zone (AZ) failures (especially in cloud environments), along with some forms of human error, can result in ‘stuck’ partitions that have fewer replicas than are required to form a quorum. In such failure scenarios, some data loss may be unavoidable. Node-wise partition recovery provides a way to unsafely recover at least a portion of your data using the remaining replicas, which are moved off the target brokers and allocated to healthy ones. In one step, this process repairs partitions while draining the target brokers of all partition replicas. This topic helps admins understand what they can and cannot recover using node-wise partition recovery.

Only use this operation as a last resort, when all other recovery options have failed. In some cases, there may be no remaining replicas for the partitions on the dead brokers. This recovery method is intended for scenarios where you have already experienced data loss, and its goal is to prevent the loss of additional data.

Perform the recovery operation

To start node-wise partition recovery, run rpk cluster partitions unsafe-recover. For example:

rpk cluster partitions unsafe-recover --from-nodes 1,3,5

This command prompts you to confirm the generated recovery plan, because it is a destructive operation. When you run node-wise partition recovery, the partitions on the specified brokers are rebuilt on a best-effort basis. When no replicas of a partition survive, for example in a topic with a replication factor of 1 (RF=1), partition recovery rebuilds the partition as an empty partition. Producers can then continue writing to it, but its earlier data cannot be recovered from local storage (although you may be able to recover the partition from Tiered Storage).

The --from-nodes flag accepts a comma-separated list of node IDs for the brokers you want to recover from. This example performs recovery operations on nodes 1, 3, and 5. Redpanda assesses these brokers to identify which partitions lack a majority. It then creates a plan to recover the impacted partitions and prompts you for confirmation. You must respond yes to continue with recovery.
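The majority check described above can be illustrated with a short shell sketch. This is not Redpanda's implementation: the function name lacks_majority and its space-separated argument format are hypothetical, and it assumes the usual Raft rule that a quorum is more than half of the replica set.

```shell
# lacks_majority REPLICAS DEAD   (hypothetical helper, for illustration only)
# REPLICAS: space-separated node IDs that host the partition's replicas.
# DEAD:     space-separated node IDs passed to --from-nodes.
# Exits 0 when the surviving replicas are fewer than a strict majority.
lacks_majority() {
  total=0
  alive=0
  for r in $1; do
    total=$((total + 1))
    is_dead=0
    for d in $2; do
      if [ "$r" = "$d" ]; then
        is_dead=1
      fi
    done
    if [ "$is_dead" -eq 0 ]; then
      alive=$((alive + 1))
    fi
  done
  # Strict majority: more than half of the replica set.
  quorum=$((total / 2 + 1))
  [ "$alive" -lt "$quorum" ]
}
```

For example, lacks_majority "1 2 3" "1 3" succeeds because only one of three replicas survives, while lacks_majority "1 2 3" "1" fails because the two remaining replicas still form a majority.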

The --dry flag performs a dry run and allows you to view the recovery plan with no risk to your cluster.
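One cautious workflow is to preview the plan before running the real operation. The wrapper below is a minimal sketch: the function name preview_recovery is hypothetical, and it uses only the --from-nodes and --dry flags described above.

```shell
# preview_recovery NODE_IDS   (hypothetical helper name)
# NODE_IDS is a comma-separated list of broker node IDs, for example 1,3,5.
# Runs the recovery command as a dry run so the plan is shown but not applied.
preview_recovery() {
  rpk cluster partitions unsafe-recover --from-nodes "$1" --dry
}
```

Call preview_recovery 1,3,5, review the listed partitions, and only then run the command again without --dry.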

When you run node-wise partition recovery, more recent data (a higher offset) may be available in Tiered Storage if:

Raft replication was stuck or slow before the node failure
Zero live replicas remain in the cluster (because the partition had a replication factor of one, RF=1)

For topics configured to use Tiered Storage, Redpanda also attempts to recover partition data from object storage, recovering the latest offset available for a partition in either storage tier (local or object storage). This allows for the maximum amount of data to be recovered in all cases, even for topics with a replication factor of 1, where no replicas remain in local storage.
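The choice between storage tiers can be pictured as taking whichever tier holds the higher offset. The sketch below only illustrates that rule and is not how Redpanda implements it. The function name pick_recovery_source, its arguments, and the printed labels are all hypothetical, and an empty argument stands for a tier that holds no copy of the partition.

```shell
# pick_recovery_source LOCAL_OFFSET OBJECT_OFFSET   (hypothetical helper)
# Prints which tier holds the latest offset for a partition.
# An empty argument means that tier has no data for the partition.
pick_recovery_source() {
  local_off=${1:--1}    # default to -1 when no local replica survives
  object_off=${2:--1}   # default to -1 when nothing is in object storage
  if [ "$local_off" -lt 0 ] && [ "$object_off" -lt 0 ]; then
    echo "none"
  elif [ "$object_off" -gt "$local_off" ]; then
    echo "object-storage"
  else
    echo "local"
  fi
}
```

With an RF=1 topic whose only replica was lost, pick_recovery_source "" 80 prints object-storage, which matches the case where Tiered Storage is the only remaining source of data.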

The recovery operation can take some time to complete, especially for a large amount of data. To monitor the status of the recovery operation in real time, run:

rpk cluster partitions balancer-status

Example recovery operations

The following example shows the node-wise partition recovery process in action:

$ rpk cluster partitions unsafe-recover --from-nodes 1
NAMESPACE  TOPIC  PARTITION  REPLICA-CORE  DEAD-NODES
kafka      bar    0          [1-1]         [1]
? Confirm recovery from these nodes? Yes
Executing recovery plan...
Successfully queued the recovery plan, you may check the status by running 'rpk cluster partitions balancer-status'

$ rpk cluster partitions balancer-status
Status:                               ready
Seconds Since Last Tick:              26
Current Reassignment Count:           0
Partitions Pending Recovery (1):      [kafka/bar/0]
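To wait for recovery to finish from a script, the pending count can be read from the "Partitions Pending Recovery (N)" line of the status output. The following is a minimal sketch: the function names are hypothetical, and it assumes the line keeps the format shown above and is either absent or reports 0 once nothing is pending.

```shell
# pending_recovery_count   (hypothetical helper)
# Reads balancer-status output on stdin and prints N from the line
# "Partitions Pending Recovery (N): ...", or 0 if that line is absent.
pending_recovery_count() {
  count=$(sed -n 's/^Partitions Pending Recovery (\([0-9][0-9]*\)).*/\1/p')
  echo "${count:-0}"
}

# wait_for_recovery [INTERVAL_SECONDS]   (hypothetical helper)
# Polls balancer-status until no partitions are pending recovery.
wait_for_recovery() {
  interval=${1:-10}
  while [ "$(rpk cluster partitions balancer-status | pending_recovery_count)" -gt 0 ]; do
    sleep "$interval"
  done
}
```

Running wait_for_recovery 30 checks the status every 30 seconds and returns when the pending count reaches zero.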

The following example shows the status of moved partitions:

$ rpk cluster partitions move-status
PARTITION MOVEMENTS
===================
NAMESPACE-TOPIC   PARTITION  MOVING-FROM MOVING-TO  COMPLETION-%  PARTITION-SIZE BYTES-MOVED BYTES-REMAINING
kafka/prod_tests  4          [045]       [045]      0             56204032205    0           56204032205
kafka/prod_tests  7          [045]       [045]      0             64607340009    0           64607340009
kafka/prod_tests  12         [014]       [014]      0             29074311639    0           29074311639
kafka/prod_tests  20         [014]       [014]      0             29673620476    0           29673620476
kafka/prod_tests  22         [045]       [045]      0             28471089141    0           28471089141
kafka/prod_tests  23         [045]       [045]      0             29692435312    0           29692435312
kafka/prod_tests  31         [014]       [014]      0             66982232299    0           66982232299
kafka/prod_tests  33         [014]       [014]      0             46329276747    0           46329276747
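To gauge how much data is still waiting to move, the BYTES-REMAINING column can be totaled. The following is a minimal sketch: the function name total_bytes_remaining is hypothetical, and it assumes the eight-column layout shown above, with BYTES-REMAINING as the last column.

```shell
# total_bytes_remaining   (hypothetical helper)
# Reads move-status output on stdin and prints the sum of the last column.
# Title and header lines are skipped because their last field is not numeric.
total_bytes_remaining() {
  awk 'NF >= 8 && $NF ~ /^[0-9]+$/ { sum += $NF } END { printf "%.0f\n", sum }'
}
```

For example, rpk cluster partitions move-status | total_bytes_remaining prints a single byte count that should shrink as the movements progress.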