Forced partition recovery

You can use the Redpanda Admin API to recover a partition that is unavailable and has lost a majority of its replicas. This can occur when the partition replicas have lost Raft consensus, for instance if brokers in a Raft group fail, preventing the group from reaching a majority and electing a new leader.

Redpanda performs forced partition recovery by promoting the best available replica to leader, making the partition available for produce and consume. Typically, Redpanda chooses the replica with the highest offset.

Forced partition recovery allows some potential data loss on the partition if the best available replica is out of sync, ending up in an unclean leader election. Use this operation with caution, and only when brokers have failed beyond recovery to the point that the remaining replicas cannot form a majority.
If you want to instead force recover all partitions in bulk from a set of failed brokers, use nodewise recovery.

Use the Admin API to recover a partition

The following examples assume that partition 0 in topic test is unavailable and its replicas cannot form a majority to elect a leader.

  1. Call the /partitions/kafka/<topic> endpoint to determine the current broker and shard assignments of the partition replicas.

    curl http://localhost:9644/v1/partitions/kafka/test
    [
      {
        "ns": "kafka",
        "topic": "test",
        "partition_id": 0,
        "status": "done",
        "leader_id": 1,
        "raft_group_id": 1,
        "replicas": [
          {
            "node_id": 1,
            "core": 1
          },
          {
            "node_id": 2,
            "core": 1
          },
          {
            "node_id": 3,
            "core": 1
          }
        ]
      }
    ]
  2. In this scenario, brokers 2 and 3 have failed and you want to move the replicas from those brokers to brokers 4 and 5, which are healthy.

    Make a POST request to the /debug/partitions/kafka/<topic>/<partition>/force_replicas endpoint:

    curl -X POST http://localhost:9644/v1/debug/partitions/kafka/test/0/force_replicas \
      -H 'Content-Type: application/json'
      -d '{
        [
          {
            "node_id": 1,
            "core": 1
          },
          {
            "node_id": 4,
            "core": 1
          },
          {
            "node_id": 5,
            "core": 0
          }
        ]
      }'

    The request body includes the broker ID and the CPU core (shard ID) for the replica on broker 1, to hydrate the new replicas assigned to brokers 4 and 5 on cores 1 and 0 respectively.

    If there are n CPU cores on the machine, the value of core can be within the range [0, n-1]. You may use a random value within the range, or the least loaded shard. See the public metrics reference for metrics regarding CPU usage, and Redpanda Admin API for additional detail.

Suggested reading