# Cluster Balancing

> For the complete documentation index, see [llms.txt](https://docs.redpanda.com/llms.txt). Component-specific: [streaming-full.txt](https://docs.redpanda.com/streaming-full.txt)

---
title: Cluster Balancing
latest-operator-version: v26.1.4
# EOL = End-of-Life (support lifecycle status)
page-is-nearing-eol: "false"
page-is-past-eol: "true"
page-eol-date: December 22, 2024
latest-console-tag: v3.7.3
latest-connect-version: 4.93.0
docname: cluster-maintenance/cluster-balancing
page-component-name: streaming
page-version: "23.3"
page-component-version: "23.3"
page-component-title: Streaming
page-relative-src-path: cluster-maintenance/cluster-balancing.adoc
page-edit-url: https://github.com/redpanda-data/docs/edit/v/23.3/modules/manage/pages/cluster-maintenance/cluster-balancing.adoc
description: Learn about the different tools Redpanda provides for balanced clusters.
page-git-created-date: "2023-06-02"
page-git-modified-date: "2024-04-02"
support-status: past end-of-life
---

<!-- Source: https://docs.redpanda.com/streaming/23.3/manage/cluster-maintenance/cluster-balancing.md -->

When a topic is created, Redpanda evenly distributes its partitions by sequentially allocating them to the cluster broker with the least number of partitions. For existing topics, Redpanda automatically provides leadership balancing and partition rebalancing when brokers are added or decommissioned.

With an Enterprise license, you can additionally enable _Continuous Data Balancing_ to continuously monitor broker and rack availability and disk usage. This enables self-healing clusters that dynamically balance partitions. It also continuously maintains adherence to rack-aware replica placement policy and self-heals after rack (or availability zone) failure or replacement. See [Configure Continuous Data Balancing](https://docs.redpanda.com/streaming/23.3/manage/cluster-maintenance/continuous-data-balancing/).

Cluster balancing protects you from unbalanced systems that saturate resources on one or more brokers. This can affect throughput and latency. Furthermore, a cluster with replicas on a down broker risks availability loss if more brokers fail, and a cluster that keeps losing brokers without healing eventually risks data loss.

The following table summarizes the various balancing options in Redpanda:

| Balancer | License | Description |
| --- | --- | --- |
| Partition leadership balancing | Community Edition, Enterprise Edition | Topic-aware balancer designed to distribute partition leadership across available nodes. This helps avoid topic leadership hotspots on one or a few specific nodes in your cluster. |
| Partition replica balancing | Community Edition, Enterprise Edition | Topic-aware balancer designed to distribute partition replicas when adding nodes to the cluster. This helps avoid topic replica hotspots on one or a few specific nodes in your cluster. |
| Continuous Data Balancing | Enterprise Edition | Balancer designed to monitor free disk space and proactively move data between nodes to avoid exhausting available disk space on any given node. This balancer does not keep the relative fullness of each node within a defined range, it just prevents hitting the fullness threshold of each individual node. |

## [](#partition-leadership-balancing)Partition leadership balancing

Every Redpanda topic-partition forms a Raft group with a single elected leader. All reads and writes are handled by the partition leader. Raft uses a heartbeat mechanism to maintain leadership authority and to trigger leader elections. The partition leader sends a periodic heartbeat ([`raft_heartbeat_interval_ms`](https://docs.redpanda.com/streaming/23.3/reference/tunable-properties/#raft_heartbeat_interval_ms)) to all followers to assert its leadership. If a follower does not receive a heartbeat within a timeout duration ([`raft_heartbeat_timeout_ms`](https://docs.redpanda.com/streaming/23.3/reference/tunable-properties/#raft_heartbeat_timeout_ms)), then it triggers an election to choose a new partition leader. For more information, see [Raft consensus algorithm](https://docs.redpanda.com/streaming/23.3/get-started/architecture/#raft-consensus-algorithm) and [partition leadership elections](https://docs.redpanda.com/streaming/23.3/get-started/architecture/#partition-leadership-elections).

Leadership balancing is enabled by default with the [`enable_leader_balancer`](https://docs.redpanda.com/streaming/23.3/reference/cluster-properties/#enable_leader_balancer) property. Automatic partition leadership balancing improves cluster performance by transferring the leadership of a broker’s partitions to other replicas. It changes where data is read from and written to first, but leadership transfer does not involve any data transfer.

The [`leader_balancer_mode`](https://docs.redpanda.com/streaming/23.3/reference/cluster-properties/#leader_balancer_mode) property ensures that each shard in a cluster has an equal number of partitions. It determines the movement of leadership for the replica set of a partition. It supports two modes:

-   `random_hill_climbing`: This mode randomly searches for potential leadership movements. If one is found that better balances the number of leaders per shard and the leaders of a given topic per broker, then the movement is applied to the cluster. This is the default.

-   `greedy_balanced_shards`: This mode uses a heuristic to search for leadership movements that better balance leaders per shard. It applies any movement it finds.


> 📝 **NOTE**
>
> In addition to the periodic heartbeat, leadership balancing can also occur when a [broker restarts](https://docs.redpanda.com/streaming/23.3/upgrade/rolling-upgrade/#impact-of-broker-restarts) or when the controller leader changes (such as when a controller partition changes leader). The _controller leader_ manages the entire cluster. For example, when a broker is decommissioned, the controller leader creates a reallocation plan for all partition replicas allocated to that broker. The _partition leader_ then handles the reconfiguration for its Raft group.

### [](#manually-change-leadership)Manually change leadership

Despite an even distribution of leaders, sometimes the write pattern is not even across topics, and a set of traffic-heavy partitions could land on one broker and cause a latency spike. For information about metrics to monitor, see [Partition health](https://docs.redpanda.com/streaming/23.3/manage/monitoring/#partition-health).

To manually change leadership, use the Admin API:

```bash
curl -X POST http://<broker_address>:9644/v1/partitions/kafka/<topic>/<partition>/transfer_leadership?target=<destination-broker-id>
```

For example, to change leadership to broker 2 for partition 0 on topic `test`:

```bash
curl -X POST "http://localhost:9644/v1/partitions/kafka/test/0/transfer_leadership?target=2"
```

> 📝 **NOTE**
>
> In Kubernetes, run the `transfer_leadership` request on the Pod that is running the current partition leader.

## [](#partition-replica-balancing)Partition replica balancing

While leadership balancing doesn’t move any data, partition balancing does move partition replicas to alleviate disk pressure and to honor the configured replication factor across brokers and the additional redundancy across failure domains (such as racks). Depending on the amount of data being transferred, this may take some time. Partition balancing is invoked periodically, determined by the [`partition_autobalancing_tick_interval_ms`](https://docs.redpanda.com/streaming/23.3/reference/tunable-properties/#partition_autobalancing_tick_interval_ms) property.

For predictable and stable performance, Redpanda ensures that a topic’s partitions are evenly distributed across all brokers in a cluster. It allocates partitions to random healthy brokers, to avoid topic hotspots, without needing to wait for a batch of moves to finish before it schedules the next batch.

Redpanda supports flexible use of network bandwidth for replicating under-replicated partitions. For example, if only one partition is moving, it can use the entire bandwidth for the broker. Redpanda detects which shards are idle, so other shards can essentially steal bandwidth from them. Total bandwidth is controlled by the [`raft_learner_recovery_rate`](https://docs.redpanda.com/streaming/23.3/reference/cluster-properties/#raft_learner_recovery_rate) property.

Redpanda’s default partition balancing includes the following:

-   When a broker is added to the cluster, some replicas are moved from other brokers to the new broker to take advantage of the additional capacity.

-   When a broker is down for a configured timeout, existing online replicas are used to construct a replacement replica on a new broker.

-   When a broker’s free storage space drops below its low disk space threshold, some of the replicas from the broker with low disk space are moved to other brokers.


Monitoring unavailable brokers lets Redpanda self-heal clusters by moving partitions from a failed broker to a healthy broker. Monitoring low disk space lets Redpanda distribute partitions across brokers with enough disk space. If free disk space reaches a critically low level, Redpanda blocks clients from producing. For information about the disk space threshold and alert, see [Handle full disks](https://docs.redpanda.com/streaming/23.3/manage/cluster-maintenance/disk-utilization/#handle-full-disks).

### [](#partition-balancing-settings)Partition balancing settings

Select your partition balancing setting with the [`partition_autobalancing_mode`](https://docs.redpanda.com/streaming/23.3/reference/cluster-properties/#partition_autobalancing_mode) property.

| Setting | Description |
| --- | --- |
| node_add | Partition balancing happens when brokers (nodes) are added. To avoid hotspots, Redpanda allocates brokers to random healthy brokers.This is the default setting. |
| continuous | In this mode, Redpanda continuously monitors the cluster for broker failures and high disk usage. It uses this information to automatically redistribute partitions across the cluster to maintain optimal performance and availability. It also monitors rack availability after failures, and for a given partition, it tries to move excess replicas from racks that have more than one replica to racks where there are none. See Configure Continuous Data Balancing.This option requires an Enterprise license. |
| off | All partition balancing from Redpanda is turned off.This mode is not recommended for production clusters. Only set to off if you need to move partitions manually. |

## [](#manually-move-partitions)Manually move partitions

As an alternative to Redpanda partition balancing, you can change partition assignments explicitly with `rpk cluster partitions move`.

To reassign partitions with `rpk`:

1.  Set the `partition_autobalancing_mode` property to `off`. If Redpanda partition balancing is enabled, Redpanda may change partition assignments regardless of what you do with `rpk`.

    ```bash
    rpk cluster config set partition_autobalancing_mode off
    ```

2.  Show initial replica sets. For example, for topic `test`:

    ```bash
    rpk topic describe test -p
    PARTITION  LEADER  EPOCH  REPLICAS  LOG-START-OFFSET  HIGH-WATERMARK
    0          1       1      [1 2 3]   0                 645
    1          1       1      [0 1 2]   0                 682
    2          3       1      [0 1 3]   0                 672
    ```

3.  Change partition assignments. For example, to change the replica set of partition 1 from `[0 1 2]` to `[3 1 2]`, and to change the replica set of partition 2 from `[0 1 3]` to `[2 1 3]`, run:

    ```bash
    rpk cluster partitions move test -p 1:3,1,2 -p 2:2,1,3
    NAMESPACE  TOPIC  PARTITION  OLD-REPLICAS     NEW-REPLICAS      ERROR
    kafka      test   1          [0-1, 1-1, 2-0]  [1-1, 2-0, 3-0]
    kafka      test   2          [0-0, 1-0, 3-1]  [1-0, 2-0, 3-1]

    Successfully began 2 partition movement(s).

    Check the movement status with 'rpk cluster partitions move-status' or see new assignments with 'rpk topic describe -p TOPIC'.
    ```

    or

    ```bash
    rpk cluster partitions move -p test/1:3,1,2 -p test/2:2,1,3
    ```

4.  Verify that the reassignment is complete with `move-status`:

    ```bash
    rpk cluster partitions move-status
    ONGOING PARTITION MOVEMENTS
    ===========================
    NAMESPACE-TOPIC  PARTITION  MOVING-FROM  MOVING-TO  COMPLETION-%  PARTITION-SIZE  BYTES-MOVED  BYTES-REMAINING
    kafka/test       1          [0 1 2]      [1 2 3]    57            87369012        50426326     36942686
    kafka/test       2          [0 1 3]      [1 2 3]    52            83407045        43817575     39589470
    ```

    Alternatively, run `rpk topic describe` again to show your reassigned replica sets:

    ```bash
    rpk topic describe test -p
    PARTITION  LEADER  EPOCH  REPLICAS  LOG-START-OFFSET  HIGH-WATERMARK
    0          1       2      [1 2 3]   0                 645
    1          1       2      [1 2 3]   0                 682
    2          3       1      [1 2 3]   0                 672
    ```

    To cancel all in-progress partition reassignments, run `move-cancel`:

    ```bash
    rpk cluster partitions move-cancel
    ```

    To cancel specific movements to or from a given node, run:

    ```bash
    rpk cluster partitions move-cancel --node 2
    ```


> 📝 **NOTE**
>
> If you prefer, Redpanda also supports the use of the `AlterPartitionAssignments` Kafka API and using standard kafka tools such as `kafka-reassign-partitions.sh`.

## [](#differences-in-partition-balancing-between-redpanda-and-kafka)Differences in partition balancing between Redpanda and Kafka

-   In a partition reassignment, you must provide the broker ID for each replica. Kafka validates the broker ID for any new replica that wasn’t in the previous replica set against the list of alive brokers. Redpanda validates all replicas against the list of alive brokers.

-   When there are two identical partition reassignment requests, Kafka cancels the first one without returning an error code, while Redpanda rejects the second one with `Partition configuration update in progress` or `update_in_progress`.

-   In Kafka, attempts to add partitions to a topic during in-progress reassignments result in a `reassignment_in_progress` error, while Redpanda successfully adds partitions to the topic.

-   Kafka doesn’t support shard-level (core) partition assignments, but Redpanda does. For help specifying a shard for partition assignments, see `rpk cluster partitions move --help`.


## [](#assign-partitions-at-topic-creation)Assign partitions at topic creation

To manually assign partitions at topic creation, run:

```bash
kafka-topics.sh --create --bootstrap-server 127.0.0.1:9092 --topic custom-assignment --replica-assignment 0:1:2,0:1:2,0:1:2
```

## Suggested labs

-   [Redpanda Iceberg Docker Compose Example](https://docs.redpanda.com/labs/docker-compose/iceberg/)
-   [Enable Unified Identity with Azure Entra ID for Redpanda and Redpanda Console](https://docs.redpanda.com/labs/docker-compose/oidc/)
-   [Owl Shop Example Application in Docker](https://docs.redpanda.com/labs/docker-compose/owl-shop/)
-   [Migrate Data with Redpanda Migrator](https://docs.redpanda.com/labs/docker-compose/redpanda-migrator/)
-   [Start a Single Redpanda Broker with Redpanda Console in Docker](https://docs.redpanda.com/labs/docker-compose/single-broker/)
-   [Start a Cluster of Redpanda Brokers with Redpanda Console in Docker](https://docs.redpanda.com/labs/docker-compose/three-brokers/)
-   [Iceberg Streaming on Kubernetes with Redpanda, MinIO, and Spark](https://docs.redpanda.com/labs/kubernetes/iceberg/)

See more

[Search all labs](https://docs.redpanda.com/labs)