Skip to main content
Version: 22.3

High Availability

Redpanda is designed to ensure data integrity and high availability (HA), even at high-throughput levels. This section explains the trade-offs with different configurations.

Failure scenarios​

The following table lists common types of failures and the likely impact on system availability:

FailureImpactMitigation strategy
Node failureLoss of function for an individual node or for any virtual machine (VM) hosted on that nodeMulti-node cluster
Rack or switch failureLoss of nodes/VMs hosted within that rack, or loss of connectivity to themMulti-node cluster spread across multiple racks or network failure domains
Data center failureLoss of nodes/VMs hosted within that data center, or loss of connectivity to themMulti-AZ cluster or replicated deployment
Region failureLoss of nodes/VMs hosted within that region, or loss of connectivity to themGeo-stretch cluster (latency dependent) or replicated deployment
Global, systemic outage (DNS failures, routing failures)Complete outage for all systems and services impacting customers and staffOffline backups (Tiered Storage), replicas in 3rd-party domains
Data loss or corruption (accidental or malicious)Corrupt or unavailable data that also affects synchronous replicasOffline backups (Tiered Storage)

HA deployments​

Mission-critical deployments typically employ the following strategies:

Multi-node deployment​

Redpanda is designed to be deployed in a cluster, and although clusters with a single node work fine, they aren’t resilient to failure. Adding nodes to a cluster provides a way to handle individual node failure. You can also use rack awareness to assign nodes to different racks, which allows Redpanda to tolerate the loss of a rack or failure domain.

Multi-availability zone deployment​

An availability zone (AZ) consists of one or more data centers served by high-bandwidth links with low latency (and typically within a close distance of one another). All AZs have discrete failure domains (power, cooling, fire, and network), but they also have common-cause failure domains, such as catastrophic events, that affect their geographical location. To safeguard against such possibilities, a cluster can be deployed across multiple AZs by configuring each AZ as a rack using rack awareness.

Redpanda’s internal implementation of Raft lets it tolerate losing a minority of replicas for a given topic or for controller groups. For this to translate to a multi-AZ deployment, however, it’s necessary to deploy to at least three AZs (affording the loss of one zone). In a typical multi-AZ deployment, cluster performance is constrained by inter-AZ bandwidth and latency.

Multi-region deployment​

A multi-region deployment is similar to a multi-AZ deployment, in that it needs at least three regions to counter the loss of a single region. Note that this deployment strategy increases latency due to the physical distance between regions. Consider the following strategies to mitigate this problem:

  • Manually configure leadership of each partition to ensure that leaders are congregated in the primary region (closest to the producers and consumers).
  • Configure producers to have acks=1 instead of acks=all; however, this introduces the possibility of losing messages if the primary region becomes lost or unavailable.

Multi-cluster deployment​

In a multi-cluster deployment, each cluster is configured using one of the other HA deployments, along with standby clusters or Remote Read Replica clusters in one or more remote locations. A standby cluster is a fully functional cluster that can handle producers and consumers. A remote read replica is a read-only cluster that can act as a backup for topics.

To replicate data across clusters in a multi-cluster deployment, use one of the following options:

Alternatively, you could dual-feed clusters in multiple regions. Dual feeding is the process of having producers connect to your cluster across multiple regions. However, this introduces additional complexity onto the producing application. It also requires consumers that have sufficient deduplication logic built in to handle offsets, since they won’t be the same across each cluster.

HA features​

Redpanda configurations, as well as configurations of system components like Kubernetes, can differ greatly. Consider the following cluster features for high availability:

Replica synchronization​

A cluster’s availability is directly tied to replica synchronization. Brokers can be either leaders or replicas (followers) for a partition. A cluster’s replica brokers must be consistent with the leader to be available for consumers and producers.

  1. The leader writes data to the disk. It then dispatches append entry requests to the followers in parallel with the disk flush.
  2. The replicas receive messages written to the partition of the leader. They send acknowledgments to the leader after successfully replicating the message to their internal partition.
  3. The leader sends an acknowledgment to the producer of the message, as determined by that producer’s acks value. Redpanda considers the group consistent after a majority has formed consensus; that is, a majority of participants acknowledged the write.

While Apache Kafka® uses in-sync replicas, Redpanda uses a quorum-based majority with the Raft replication protocol. Kafka performance is negatively impacted when any replica is out-of-sync, but a Redpanda cluster tolerates replica failures without any performance degradation.

Monitor the health of your cluster with the rpk cluster health command. This tells you how many nodes are down (if any), and if you have any leaderless partitions.

Rack awareness​

Rack awareness is one of the most important features for HA. It lets Redpanda spread partition replicas across available brokers in different failure zones.

tip

Make sure you assign separate rack IDs that actually correspond to a physical separation of brokers.

See also: Rack Awareness

Partition leadership​

Raft uses a heartbeat mechanism to maintain leadership authority and to trigger leader elections. The partition leader sends a periodic heartbeat to all followers to assert its leadership. If a follower does not receive a heartbeat over a period of time, then it triggers an election to choose a new partition leader.

Producer acknowledgment​

Producer acknowledgment defines how producer clients and broker leaders communicate their status while transferring data. The following acks values determine producer and broker behavior when writing data to the event bus.

  • acks=0: The producer doesn’t wait for acknowledgments from the leader and doesn’t retry sending messages. This increases throughput and lowers latency of the system at the expense of durability.
  • acks=1: The producer waits for an acknowledgment from the leader, but it doesn’t wait for the leader to get acknowledgments from replicas. This setting doesn’t prioritize throughput, latency, or durability. Instead, acks=1 attempts to provide a balance between all of them.
  • acks=all: The producer receives an acknowledgment after the leader and the majority of (and therefore, implicitly, all) replicas acknowledge the message. This increases durability at the expense of lower throughput and increased latency.

Partition rebalancing​

By default, Redpanda rebalances partition distribution when nodes are added or decommissioned. Continuous Data Balancing additionally rebalances partitions when nodes become unavailable or when disk space usage exceeds a threshold.

See also: Cluster Balancing

Tiered Storage and disaster recovery​

In a disaster, your secondary cluster may still be available, but you need to quickly restore the original level of redundancy by bringing up a new primary cluster. In a containerized environment such as Kubernetes, all state is lost from pods that use only local storage. HA deployments with Tiered Storage address both these problems, since it offers infinite data retention and topic recovery.

See also: Tiered Storage


Suggested reading​

What do you like about this doc?




Optional: Share your email address if we can contact you about your feedback.

Let us know what we do well: