High Availability

Redpanda is designed to ensure data integrity and high availability (HA), even at high-throughput levels.

Deployment strategies

Consider the following Redpanda deployment strategies for the most common types of failures.

  • Broker failure
    Impact: Loss of function for an individual broker or for any virtual machine (VM) that hosts the broker.
    Mitigation strategy: Multi-broker deployment.

  • Rack or switch failure
    Impact: Loss of brokers/VMs hosted within that rack, or loss of connectivity to them.
    Mitigation strategy: Multi-broker deployment spread across multiple racks or network failure domains.

  • Data center failure
    Impact: Loss of brokers/VMs hosted within that data center, or loss of connectivity to them.
    Mitigation strategy: Multi-AZ or replicated deployment.

  • Region failure
    Impact: Loss of brokers/VMs hosted within that region, or loss of connectivity to them.
    Mitigation strategy: Geo-stretch (latency dependent) or replicated deployment.

  • Global, systemic outage (DNS failures, routing failures)
    Impact: Complete outage for all systems and services, impacting customers and staff.
    Mitigation strategy: Offline backups (Tiered Storage), replicas in 3rd-party domains.

  • Data loss or corruption (accidental or malicious)
    Impact: Corrupt or unavailable data that also affects synchronous replicas.
    Mitigation strategy: Offline backups (Tiered Storage).

HA deployment options

This section explains the trade-offs with different HA configurations.

Multi-broker deployment

Redpanda is designed to be deployed in a cluster that consists of at least three brokers. Although clusters with a single broker are convenient for development and testing, they aren’t resilient to failure. Adding brokers to a cluster provides a way to handle individual broker failures. You can also use Rack awareness to assign brokers to different racks, which allows Redpanda to tolerate the loss of a rack or failure domain.
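
For example, a minimal three-broker cluster points every broker at the same set of seed servers in redpanda.yaml (a sketch; the addresses are illustrative, and 33145 is the default RPC port):

redpanda:
  seed_servers:
    - host:
        address: 10.0.1.1
        port: 33145
    - host:
        address: 10.0.1.2
        port: 33145
    - host:
        address: 10.0.1.3
        port: 33145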

Figure: Single-AZ deployment

Multi-AZ deployment

An availability zone (AZ) consists of one or more data centers served by high-bandwidth links with low latency (and typically within a close distance of one another). All AZs have discrete failure domains (power, cooling, fire, and network), but they also have common-cause failure domains, such as catastrophic events, that affect their geographical location. To safeguard against such possibilities, a cluster can be deployed across multiple AZs by configuring each AZ as a rack using rack awareness.

Redpanda’s internal implementation of Raft lets it tolerate losing a minority of replicas for a given topic or for controller groups. For this to translate to a multi-AZ deployment, however, it’s necessary to deploy to at least three AZs (affording the loss of one zone). In a typical multi-AZ deployment, cluster performance is constrained by inter-AZ bandwidth and latency.
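
In practice, with rack awareness enabled, each broker in a multi-AZ deployment is typically given its AZ name as its rack ID. For example, on AWS (zone names below are illustrative):

rpk redpanda config set redpanda.rack us-east-1a   # on brokers in us-east-1a
rpk redpanda config set redpanda.rack us-east-1b   # on brokers in us-east-1b
rpk redpanda config set redpanda.rack us-east-1c   # on brokers in us-east-1c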

Figure: Multi-AZ deployment

Multi-region deployment

A multi-region deployment is similar to a multi-AZ deployment, in that it needs at least three regions to counter the loss of a single region. Note that this deployment strategy increases latency due to the physical distance between regions. Consider the following strategies to mitigate this problem:

  • Manually configure leadership of each partition to ensure that leaders are congregated in the primary region (closest to the producers and consumers).

  • Configure producers to have acks=1 instead of acks=all; however, this introduces the possibility of losing messages if the primary region is lost or becomes unavailable.

Multi-cluster deployment

In a multi-cluster deployment, each cluster is configured using one of the other HA deployments, along with standby clusters or Remote Read Replica clusters in one or more remote locations. A standby cluster is a fully functional cluster that can handle producers and consumers. A remote read replica is a read-only cluster that can act as a backup for topics. To replicate data across clusters in a multi-cluster deployment, use a cross-cluster replication tool such as MirrorMaker 2.

Alternatively, you could dual-feed clusters in multiple regions. Dual feeding is the process of having producers connect to clusters in multiple regions and produce the same data to each. However, this introduces additional complexity into the producing application. It also requires consumers that have sufficient deduplication logic built in to handle offsets, since offsets won’t be the same across each cluster.

HA features in Redpanda

Redpanda includes the following high-availability features:

Replica synchronization

A cluster’s availability is directly tied to replica synchronization. Brokers can be either leaders or replicas (followers) for a partition. A cluster’s replica brokers must be consistent with the leader to be available for consumers and producers. Replication proceeds as follows:

  1. The leader writes data to its disk and, in parallel with the disk flush, dispatches append-entries requests to the followers.

  2. The replicas receive messages written to the partition of the leader. They send acknowledgments to the leader after successfully replicating the message to their internal partition.

  3. The leader sends an acknowledgment to the producer of the message, as determined by that producer’s acks value. Redpanda considers the group consistent after a majority has formed consensus; that is, a majority of participants acknowledged the write.

While Apache Kafka® uses in-sync replicas, Redpanda uses a quorum-based majority with the Raft replication protocol. Kafka performance is negatively impacted when any "in-sync" replica is running slower than other replicas in the In-Sync Replica (ISR) set.

Monitor the health of your cluster with the rpk cluster health command, which tells you if any brokers are down, and if you have any leaderless partitions.
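
For example, on a healthy three-broker cluster the output looks similar to the following (exact fields vary by rpk version):

rpk cluster health

Example output:

CLUSTER HEALTH OVERVIEW
=======================
Healthy:                     true
Controller ID:               0
All nodes:                   [0 1 2]
Nodes down:                  []
Leaderless partitions:       []
Under-replicated partitions: []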

Rack awareness

Rack awareness is one of the most important features for HA. It lets Redpanda spread partition replicas across available brokers in different failure zones. Rack awareness ensures that no more than a minority of replicas are placed on a single rack, even during cluster balancing.

Make sure you assign separate rack IDs that actually correspond to a physical separation of brokers.

Partition leadership

Raft uses a heartbeat mechanism to maintain leadership authority and to trigger leader elections. The partition leader sends a periodic heartbeat to all followers to assert its leadership. If a follower does not receive a heartbeat over a period of time, then it triggers an election to choose a new partition leader.
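
Both the heartbeat interval and the timeout after which followers call an election are exposed as cluster properties. For example, assuming the property names below match your Redpanda version, you can inspect them with rpk:

rpk cluster config get raft_heartbeat_interval_ms
rpk cluster config get raft_heartbeat_timeout_ms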

Producer acknowledgment

Producer acknowledgment defines how producer clients and broker leaders communicate their status while transferring data. The acks value determines producer and broker behavior when writing data to the event bus.
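
For example, with rpk the acknowledgment level can be set per produce request (a sketch, assuming your rpk version supports the --acks flag on rpk topic produce; the topic name is illustrative):

rpk topic produce my-topic --acks -1   # equivalent to acks=all: acknowledge after a majority of replicas commit the write (default)
rpk topic produce my-topic --acks 1    # acknowledge after the leader writes the data (lower latency, weaker durability)
rpk topic produce my-topic --acks 0    # do not wait for any acknowledgment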

Partition rebalancing

By default, Redpanda rebalances partition distribution when brokers are added or decommissioned. Continuous Data Balancing additionally rebalances partitions when brokers become unavailable or when disk space usage exceeds a threshold.
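
For example, Continuous Data Balancing is turned on through a cluster property (assuming the property name below matches your Redpanda version):

rpk cluster config set partition_autobalancing_mode continuous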

See also: Cluster Balancing

Tiered Storage and disaster recovery

In a disaster, your secondary cluster may still be available, but you need to quickly restore the original level of redundancy by bringing up a new primary cluster. In a containerized environment such as Kubernetes, all state is lost from pods that use only local storage. HA deployments with Tiered Storage address both of these problems, since Tiered Storage offers long-term data retention and topic recovery.
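
As a sketch, enabling Tiered Storage amounts to pointing the cluster at an object storage bucket (property names are assumed to match your Redpanda version; the bucket name and region are illustrative, and credential settings depend on your provider):

rpk cluster config set cloud_storage_enabled true
rpk cluster config set cloud_storage_bucket redpanda-archive
rpk cluster config set cloud_storage_region us-east-1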

See also: Tiered Storage

Single-AZ deployments

When deploying a cluster for high availability into a single AZ or data center, you need to ensure that, within the AZ, single points of failure are minimized and that Redpanda is configured to be aware of any discrete failure domains within the AZ. This is achieved with Redpanda’s rack awareness, which deploys n Redpanda brokers across three or more racks (or failure domains) within the AZ.

Single-AZ deployments in the cloud have lower network costs than multi-AZ deployments, and you can leverage resilient power supplies and networking infrastructure within the AZ to mitigate all but total-AZ failure scenarios. You can balance the benefits of increased availability and fault tolerance against the impact on cost, performance, and complexity:

  • Cost: Redpanda operates the same Raft consensus algorithm whether it’s in HA mode or not. There may be infrastructure costs when deploying across multiple racks, but these are normally amortized across a wider datacenter operations program.

  • Performance: Spreading Redpanda replicas across racks and switches increases the number of network hops between Redpanda brokers; however, normal intra-data center network latency should be measured in microseconds rather than milliseconds. Ensure that there’s sufficient bandwidth between brokers to handle replication traffic.

  • Complexity: A benefit of Redpanda is the simplicity of deployment. Because Redpanda is deployed as a single binary with no external dependencies, it doesn’t need any infrastructure for ZooKeeper or for a Schema Registry. Redpanda also includes cluster balancing, so there’s no need to run Cruise Control.

Single-AZ infrastructure

In a single-AZ deployment, ensure that brokers are spread across at least three failure domains. This generally means separate racks, under separate switches, ideally powered by separate electrical feeds or circuits. Also, ensure that there’s sufficient network bandwidth between brokers, particularly considering shared uplinks, which could be subject to high throughput intra-cluster replication traffic. In an on-premises network, this HA configuration refers to separate racks or data halls within a data center.

Cloud providers support various HA configurations:

  • AWS partition placement groups allow spreading hosts across multiple partitions (or failure domains) within an AZ. The default number of partitions is three, with a maximum of seven. This can be combined with Redpanda’s replication factor setting, so each topic partition replica is guaranteed to be isolated from the impact of hardware failure.

  • Microsoft Azure flexible scale sets let you assign VMs to specific fault domains. Each scale set can have up to five fault domains, depending on your region. Not all VM types support flexible orchestration; for example, Lsv2-series only supports uniform scale sets.

  • Google Cloud instance placement policies let you specify how many availability domains you can have (up to eight) when using the Spread Instance Placement Policy.

    Google Cloud doesn’t divulge which availability domain an instance has been placed into, so you must have an availability domain for each Redpanda broker. Essentially, this means operating without rack awareness, but it’s the only option for clusters with more than three brokers.

You can automate this using Terraform or a similar infrastructure-as-code (IaC) tool. See AWS, Azure, and GCP.
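
For example, on AWS a partition placement group can be declared in Terraform with something like the following (a minimal sketch; the resource name and partition count are illustrative):

resource "aws_placement_group" "redpanda" {
  name            = "redpanda-ha"
  strategy        = "partition"
  partition_count = 3
}

Instances launched for Redpanda brokers then reference this placement group so that each broker lands in a separate partition.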

Single-AZ rack awareness

To make Redpanda aware of the topology it’s running on, configure the cluster to enable rack awareness, then configure each broker with the identifier of the rack.

Set the enable_rack_awareness cluster property either in /etc/redpanda/.bootstrap.yaml or with rpk:

rpk cluster config set enable_rack_awareness true
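
If you set the property at bootstrap time instead, add it to /etc/redpanda/.bootstrap.yaml as a top-level key:

enable_rack_awareness: true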

For each broker, set the rack ID in the /etc/redpanda/redpanda.yaml file or with rpk:

rpk redpanda config set redpanda.rack <rackid>
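
In redpanda.yaml, the rack ID is a broker-level property (the rack name below is illustrative):

redpanda:
  rack: rack-a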

Redpanda deployment automation can provision public cloud infrastructure with discrete failure domains (-var=ha=true) and use the resulting inventory to provision rack-aware clusters with Ansible. The Ansible playbooks take a per-instance rack variable from the Terraform output and use it to set the relevant cluster and broker configuration.

Single-AZ example

The following example deploys an HA cluster into AWS, Azure, or GCP using Terraform and Ansible.

  1. Clone the deployment-automation repository:

    git clone https://github.com/redpanda-data/deployment-automation
  2. Install all prerequisites, including all Ansible requirements:

    cd deployment-automation
    ansible-galaxy install -r ansible/requirements.yml
  3. Initialize a private key, if you haven’t done so already:

    ssh-keygen -f ~/.ssh/id_rsa
  4. Initialize Terraform for your cloud provider (aws, azure, or gcp):

    cd aws
    terraform init
  5. Deploy the infrastructure (this assumes you have cloud credentials available):

    terraform apply -var=ha=true
  6. Verify that the racks have been correctly specified in the hosts.ini file:

    cd ..
    cat hosts.ini
    [redpanda]
    35.166.210.85 ansible_user=ubuntu ansible_become=True private_ip=172.31.7.173 rack=1
    18.237.173.220 ansible_user=ubuntu ansible_become=True private_ip=172.31.2.138 rack=2
    54.218.103.91 ansible_user=ubuntu ansible_become=True private_ip=172.31.2.93 rack=3
  7. Provision the cluster with Ansible:

    ansible-playbook --private-key ~/.ssh/id_rsa ansible/playbooks/provision-node.yml -i hosts.ini
  8. Verify that rack awareness is enabled:

    1. Get connection details for the first Redpanda broker from the hosts.ini file:

      grep -A1 '\[redpanda]' hosts.ini

      Example output:

      [redpanda]
      35.166.210.85 ansible_user=ubuntu ansible_become=True private_ip=172.31.7.173 rack=1
    2. SSH into a cluster host with the username and hostname of that Redpanda broker:

      ssh -i ~/.ssh/id_rsa <username>@<hostname of redpanda broker>
    3. Verify that rack awareness is enabled:

      rpk cluster config get enable_rack_awareness

      Example output:

      true
    4. Check the rack assigned to this specific broker:

      rpk cluster status

      Example output:

      CLUSTER
      =======
      redpanda.807d59af-e033-466a-98c3-bb0be15c255d

      BROKERS
      =======
      ID    HOST      PORT  RACK
      0*    10.0.1.7  9092  1
      1     10.0.1.4  9092  2
      2     10.0.1.8  9092  3