Fast Commission and Decommission Brokers

Tiered Storage gives you the option to boost the speed and reduce the impact of broker operations, particularly when resizing a cluster. For instance, adding a new broker to increase capacity for an overburdened cluster could introduce additional stress as partition replicas are transferred to the new broker. For cloud deployments, commissioning or decommissioning a broker can be a slow and expensive process, especially across multiple availability zones (AZs).

Tiered Storage can help make the cluster resize process faster and more cost-efficient by leveraging data that has already been uploaded to object storage. Instead of transferring all data from local storage to the reassigned replicas, the replicas can instead be initialized to rely more heavily on Tiered Storage for read requests.

To accomplish this, Redpanda takes an on-demand snapshot at an offset that is already uploaded to object storage. This is a later offset compared to the default log start offset managed by Raft. When an empty replica is initialized, the later offset uploaded to object storage can be used as the start offset source. Reads that access data before the uploaded offset can be executed as remote reads. Therefore, the only data that needs to be sent over the network to replicas is the data locally retained after the uploaded offset. This also results in less time taken overall to grow or shrink the cluster.

Configure fast commission and decommission

To activate fast commission and decommission when brokers enter and leave the cluster:

  1. Make sure to configure topics for Tiered Storage.

  2. Configure at least one of the following cluster configuration properties. These properties limit the size of data replicated across brokers to local storage using Raft:

    • initial_retention_local_target_bytes_default: Initial local retention size target for partitions of topics with Tiered Storage enabled. The default is null.

      • Use the initial.retention.local.target.bytes topic configuration property to override on the topic level.

    • initial_retention_local_target_ms_default: Initial local retention time target for partitions of topics with Tiered Storage enabled. The default is null.

      • Use the initial.retention.local.target.ms topic configuration property to override on the topic level.

    If no values are set for the cluster configuration properties, all locally-retained data is delivered by default to the new broker (learner) when joining a partition replica set.

Because topics become more reliant on object storage to serve data, you may experience higher latency reads for data beyond the range of the configured local retention target. Carefully weigh the tradeoffs between faster broker commission/decommission and the increased read latency due to less data being available in local storage.

Monitor fast commission and decommission

Use the following to monitor fast commission and decommission:

  • The vectorized_cluster_partition_start_offset metric on newly-joined brokers should be greater than on existing brokers.

  • Raft protocol logs taking an on-demand snapshot are logged at the debug level, for example:

    DEBUG 2023-11-20 09:19:21,287 [shard 0:raft] raft - [follower: {id: {5}, revision: {51}}] [group_id:33, {kafka/topic-epohgytrdz/32}] - recovery_stm.cc:460 - creating on demand snapshot with last included offset: 2447

    A snapshot request is also logged at the trace level on followers:

    TRACE 2023-11-20 09:19:21,225 [shard 1:raft] raft - [group_id:8, {kafka/topic-epohgytrdz/7}] consensus.cc:2260 - received install_snapshot request: {term: 1, group: 8, target_node_id: {id: {5}, revision: {42}}, node_id: {id: {2}, revision: {30}}, last_included_index: 2269, file_offset: 0, chunk_size: 211, done: true, dirty_offset: 2523}
  • Depending on retention settings, disk usage on a newly-joined broker should generally be much lower than on existing brokers.