Skip to main content
Version: 22.2

Configuring Continuous Data Balancing

info

This feature requires an Enterprise license. To upgrade, contact Redpanda sales.

Continuous Data Balancing continuously monitors your node availability and disk usage. This enables self-healing clusters that dynamically balance partitions, ensuring smooth operations and optimal cluster performance.

Set Continuous Data Balancing properties

To enable Continuous Data Balancing, set the partition_autobalancing_mode property to continuous. You can then customize properties for monitoring your node availability and disk usage.

PropertyDescription
partition_autobalancing_node_availability_timeout_secWhen a node is unreachable for the specified amount of time, Redpanda acts as if the node had been decommissioned: rebalancing begins, re-creating all of its replicas on other nodes in the cluster.

Note: The node remains part of the cluster, and it can re-join when it comes back online. A node that was actually decommissioned is removed from the cluster.

Default is 900 seconds (15 minutes).
partition_autobalancing_max_disk_usage_percentWhen a node fills up to this disk usage percentage, Redpanda starts moving replicas off the node to other nodes with disk utilization below the percentage.

Default is 80%.

For information about other modes with partition_autobalancing_mode, see Cluster Balancing.

Use Data Balancing commands

Check data balancing status

To see the status, run:

rpk cluster partitions balancer-status

This shows the time since the last data balancing, the number of replica movements in progress, the nodes that are unavailable, and the nodes that are over the disk space threshold (default = 80%).

It also returns a data balancing status: off, ready, starting, in-progress, or stalled. If the command reports a stalled status, check the following:

  • Are there are enough healthy nodes? For example, in a three node cluster, no movements are possible for partitions with three replicas.
  • Does the cluster have sufficient space? Partitions are not moved if all nodes in the cluster are utilizing more than their disk space threshold.
  • Do all partitions have quorum? Partitions are not moved if the majority of its replicas are down.
  • Are any nodes in maintenance mode? Partitions are not moved if a node is in maintenance mode.

Cancel data balancing moves

To cancel the current partition balancing moves, run:

rpk cluster partitions movement-cancel

To cancel the partition moves in a specific node, add --node. For example:

rpk cluster partitions movement-cancel --node 1

note

If continuous balancing hasn't been turned off, and if the system is still unbalanced, then it schedules another partition balancing. To stop all balancing, first set partition_autobalancing_mode to off. Then cancel current data balancing moves.