Configure Continuous Data Balancing

This feature requires an enterprise license. To get a trial license key or extend your trial period, generate a new trial license key. To purchase a license, contact Redpanda Sales.

If Redpanda has enterprise features enabled and it cannot find a valid license, restrictions apply.

Continuous Data Balancing continuously monitors your node and rack availability and disk usage, dynamically balancing partitions to maintain smooth operations and optimal cluster performance.

Continuous Data Balancing also maintains the configured replication level, even after infrastructure failure. Node availability has the highest priority in data balancing. If a rack becomes unavailable, along with all of the nodes that belong to it, Redpanda moves partition replicas to the remaining nodes, even though this violates the rack awareness constraint. After the rack (or a replacement rack) becomes available, Redpanda repairs the constraint by moving excess replicas from racks that hold more than one replica to the newly available rack.

After reading this page, you will be able to:

  • Enable Continuous Data Balancing on a Redpanda cluster

  • Check data balancing status using rpk

  • Cancel partition balancing moves for a specific node

Set Continuous Data Balancing properties

To enable Continuous Data Balancing, set the partition_autobalancing_mode property to continuous. Customize the following properties to monitor node availability and disk usage.

Property Description

partition_autobalancing_node_availability_timeout_sec

When a node is unreachable for the specified amount of time, Redpanda acts as if the node had been decommissioned: rebalancing begins, re-creating all of its replicas on other nodes in the cluster.

The node remains part of the cluster and can rejoin when it comes back online. A node that was actually decommissioned is removed from the cluster.

Default is 900 seconds (15 minutes).

partition_autobalancing_node_autodecommission_timeout_sec

When a node is unavailable for this timeout duration, Redpanda automatically and permanently decommissions the node. This property only applies when partition_autobalancing_mode is set to continuous. Unlike partition_autobalancing_node_availability_timeout_sec, which moves partitions while keeping the node in the cluster, this property removes the node from the cluster entirely. A decommissioned node cannot rejoin the cluster.

Only one node is decommissioned at a time. If a decommission is already in progress, automatic decommission does not trigger until it completes. If the decommission stalls (for example, because the node holds the only replica of a partition), manual intervention is required. See Node-wise Partition Recovery.

By default, this property is null and automatic decommission is disabled.

partition_autobalancing_max_disk_usage_percent

When a node fills up to this disk usage percentage, Redpanda starts moving replicas off the node to other nodes with disk utilization below the percentage.

Default is 80%.

For the other partition_autobalancing_mode options, see Cluster balancing.
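As a rough illustration of the settings above, the following sketch applies the mode and the two monitoring properties with rpk cluster config set. The enable_continuous_balancing function and the RPK variable are hypothetical helpers introduced for this example, not part of rpk. The script defaults to a dry run that only prints the commands, and the values shown are the documented defaults.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set RPK=rpk to apply them (assumes rpk is already configured to reach the cluster).
# $RPK is left unquoted on purpose so that "echo rpk" splits into two words (sh/bash).
RPK="${RPK:-echo rpk}"

# enable_continuous_balancing is a hypothetical helper, not an rpk command.
#   $1: node availability timeout in seconds (documented default: 900)
#   $2: maximum disk usage percentage (documented default: 80)
enable_continuous_balancing() {
  timeout_sec="${1:-900}"
  max_disk_pct="${2:-80}"
  $RPK cluster config set partition_autobalancing_mode continuous &&
  $RPK cluster config set partition_autobalancing_node_availability_timeout_sec "$timeout_sec" &&
  $RPK cluster config set partition_autobalancing_max_disk_usage_percent "$max_disk_pct"
}

enable_continuous_balancing 900 80
```

The automatic decommission property is deliberately left out of this sketch because it permanently removes nodes. Set it separately after deciding on a timeout.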

Use data balancing commands

Use the following rpk commands to monitor and control data balancing.

Check data balancing status

To see the status, run:

rpk cluster partitions balancer-status

This shows the time since the last data balancing run, the number of replica movements in progress, the nodes that are unavailable, and the nodes that are over the disk usage threshold (80% by default).

It also returns a data balancing status: off, ready, starting, in-progress, or stalled. If the command reports a stalled status, verify:

  • Are there enough healthy nodes? For example, in a three-node cluster, no movements are possible for partitions with three replicas, because every node already hosts a replica.

  • Does the cluster have sufficient space? Partitions are not moved if every node in the cluster is above its disk usage threshold.

  • Do all partitions have quorum? A partition is not moved if the majority of its replicas are down.

  • Are any nodes in maintenance mode? Partitions are not moved if a node is in maintenance mode.
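To act on the status from a script, one option is to extract the status value from the command output and check whether it is stalled. The sketch below is a minimal example under one loud assumption: that the output contains a line whose first field is "Status:" followed by the status value. Check this against the output of your rpk version before relying on it. The balancer_state and needs_attention functions are hypothetical helpers, not rpk commands.

```shell
#!/bin/sh
# balancer_state is a hypothetical helper. It reads balancer-status output on
# stdin and prints the second field of the first line whose first field is "Status:".
# ASSUMPTION: the output contains a line shaped like "Status: <value>".
balancer_state() {
  awk '$1 == "Status:" { print $2; exit }'
}

# needs_attention succeeds (exit code 0) only when the reported status is "stalled".
needs_attention() {
  [ "$(balancer_state)" = "stalled" ]
}

# Example usage against a live cluster (not run here):
#   rpk cluster partitions balancer-status | balancer_state
#   rpk cluster partitions balancer-status | needs_attention && echo "check the list above"
```

A stalled result only says that balancing cannot make progress. The checklist above is still needed to find the cause.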

Cancel data balancing moves

To cancel the current partition balancing moves, run:

rpk cluster partitions movement-cancel

To cancel partition moves on a specific node, use the --node flag. For example:

rpk cluster partitions movement-cancel --node 1

If continuous balancing is still enabled and the cluster remains unbalanced, Redpanda schedules another partition balancing round. To stop all balancing, first set partition_autobalancing_mode to off, then cancel the current data balancing moves.
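The order of those two steps matters, so a small wrapper can help keep them together. The following is a sketch only: stop_all_balancing and the RPK variable are hypothetical names introduced for this example, and the script defaults to a dry run that prints the commands instead of executing them.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set RPK=rpk to run them (assumes rpk is already configured to reach the cluster).
# $RPK is left unquoted on purpose so that "echo rpk" splits into two words (sh/bash).
RPK="${RPK:-echo rpk}"

# stop_all_balancing is a hypothetical helper, not an rpk command.
# It turns the balancer off first, so no new round is scheduled,
# and only then cancels the moves that are already in flight.
stop_all_balancing() {
  $RPK cluster config set partition_autobalancing_mode off &&
  $RPK cluster partitions movement-cancel
}

stop_all_balancing
```

Because the two commands are chained with &&, the cancel step is skipped if the configuration change fails.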