High Availability
Redpanda is designed to ensure data integrity and high availability (HA), even at high-throughput levels. This section explains the trade-offs with different configurations.
Failure scenarios
The following table lists common types of failures and the likely impact on system availability:
| Failure | Impact | Mitigation strategy |
|---|---|---|
| Node failure | Loss of function for an individual node or for any virtual machine (VM) hosted on that node | Multi-node cluster |
| Rack or switch failure | Loss of nodes/VMs hosted within that rack, or loss of connectivity to them | Multi-node cluster spread across multiple racks or network failure domains |
| Data center failure | Loss of nodes/VMs hosted within that data center, or loss of connectivity to them | Multi-AZ cluster or replicated deployment |
| Region failure | Loss of nodes/VMs hosted within that region, or loss of connectivity to them | Geo-stretch cluster (latency dependent) or replicated deployment |
| Global, systemic outage (DNS failures, routing failures) | Complete outage for all systems and services impacting customers and staff | Offline backups (Tiered Storage), replicas in third-party domains |
| Data loss or corruption (accidental or malicious) | Corrupt or unavailable data that also affects synchronous replicas | Offline backups (Tiered Storage) |
HA deployments
Mission-critical deployments typically employ the following strategies:
Multi-node deployment
Redpanda is designed to be deployed in a cluster, and although clusters with a single node work fine, they aren’t resilient to failure. Adding nodes to a cluster provides a way to handle individual node failure. You can also use rack awareness to assign nodes to different racks, which allows Redpanda to tolerate the loss of a rack or failure domain.
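For example, here is a minimal sketch using the confluent-kafka Python client (the bootstrap address and topic name are placeholders, not part of any real deployment) that creates a topic with a replication factor of 3, so each partition keeps copies on three different brokers and survives the loss of one node:

```python
from confluent_kafka.admin import AdminClient, NewTopic

# Placeholder bootstrap address; point this at your own cluster.
admin = AdminClient({"bootstrap.servers": "redpanda-0:9092"})

# With replication_factor=3, each partition keeps three replicas on
# different brokers, so a single node failure does not lose data.
futures = admin.create_topics(
    [NewTopic("orders", num_partitions=6, replication_factor=3)]
)

for topic, future in futures.items():
    try:
        future.result()  # Raises if topic creation failed.
        print(f"Created topic {topic}")
    except Exception as exc:
        print(f"Failed to create {topic}: {exc}")
```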
Multi-availability zone deployment
An availability zone (AZ) consists of one or more data centers served by high-bandwidth links with low latency (and typically within a close distance of one another). All AZs have discrete failure domains (power, cooling, fire, and network), but they also have common-cause failure domains, such as catastrophic events, that affect their geographical location. To safeguard against such possibilities, a cluster can be deployed across multiple AZs by configuring each AZ as a rack using rack awareness.
Redpanda’s internal implementation of Raft lets it tolerate losing a minority of replicas for a given topic or for controller groups. For this to translate to a multi-AZ deployment, however, it’s necessary to deploy to at least three AZs (affording the loss of one zone). In a typical multi-AZ deployment, cluster performance is constrained by inter-AZ bandwidth and latency.
Multi-region deployment
A multi-region deployment is similar to a multi-AZ deployment, in that it needs at least three regions to counter the loss of a single region. Note that this deployment strategy increases latency due to the physical distance between regions. Consider the following strategies to mitigate this problem:
- Manually configure the leadership of each partition to ensure that leaders are congregated in the primary region (closest to the producers and consumers).
- Configure producers to use `acks=1` instead of `acks=all` (see the sketch after this list); however, this introduces the possibility of losing messages if the primary region is lost or becomes unavailable.
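As an illustration, here is a minimal producer sketch with the confluent-kafka Python client (the broker address and topic name are placeholders) that sets `acks=1`, so the producer waits only for the regional leader and trades some durability for lower cross-region latency:

```python
from confluent_kafka import Producer

# Placeholder bootstrap address for the primary region's brokers.
producer = Producer({
    "bootstrap.servers": "redpanda-primary:9092",
    "acks": "1",  # Wait only for the partition leader, not for replicas.
})

def delivery_report(err, msg):
    # Called once per message to report delivery success or failure.
    if err is not None:
        print(f"Delivery failed: {err}")

producer.produce("orders", value=b"example-payload", callback=delivery_report)
producer.flush()
```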
Multi-cluster deployment
In a multi-cluster deployment, each cluster is configured using one of the other HA deployments, along with standby clusters or Remote Read Replica clusters in one or more remote locations. A standby cluster is a fully functional cluster that can handle producers and consumers. A remote read replica is a read-only cluster that can act as a backup for topics.
To replicate data across clusters in a multi-cluster deployment, use a cross-cluster replication tool (for example, MirrorMaker 2) to copy topic data from one cluster to another.
Alternatively, you can dual-feed clusters in multiple regions. Dual feeding is the process of having producers write the same data to clusters in multiple regions. However, this introduces additional complexity into the producing application. It also requires consumers to have sufficient deduplication logic built in to handle offsets, since offsets won't be the same across clusters.
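Here is a minimal dual-feed sketch with the confluent-kafka Python client (cluster addresses and the topic name are placeholders): the producer writes each record to both regions, and consumers must deduplicate on their own because offsets differ between clusters:

```python
from confluent_kafka import Producer

# Placeholder bootstrap addresses for the two regional clusters.
producers = [
    Producer({"bootstrap.servers": "redpanda-us-east:9092", "acks": "all"}),
    Producer({"bootstrap.servers": "redpanda-eu-west:9092", "acks": "all"}),
]

def send_everywhere(topic: str, key: bytes, value: bytes) -> None:
    # Write the same record to every cluster. Offsets will differ, so
    # consumers need deduplication logic (for example, keyed on the key
    # plus a producer-assigned sequence number).
    for producer in producers:
        producer.produce(topic, key=key, value=value)

send_everywhere("orders", key=b"order-42", value=b"example-payload")

for producer in producers:
    producer.flush()
```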
HA features
Redpanda configurations, as well as configurations of system components like Kubernetes, can differ greatly. Consider the following cluster features for high availability:
Replica synchronization
A cluster’s availability is directly tied to replica synchronization. Brokers can be either leaders or replicas (followers) for a partition. A cluster’s replica brokers must be consistent with the leader to be available for consumers and producers.
- The leader writes data to its disk, and it dispatches append-entry requests to the followers in parallel with the disk flush.
- The followers receive the messages written to the leader's partition. They send acknowledgments to the leader after successfully replicating the messages to their own copy of the partition.
- The leader sends an acknowledgment to the producer of the message, as determined by that producer's `acks` value. Redpanda considers the group consistent after a majority has formed consensus; that is, a majority of participants have acknowledged the write.
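To make the majority rule concrete, here is a small sketch of the quorum arithmetic (the replication factors shown are just examples):

```python
def quorum_size(replicas: int) -> int:
    # A Raft group needs a strict majority of replicas to acknowledge a write.
    return replicas // 2 + 1

def tolerable_failures(replicas: int) -> int:
    # How many replicas can fail while a majority remains available.
    return replicas - quorum_size(replicas)

for rf in (3, 5):
    print(f"replication factor {rf}: quorum {quorum_size(rf)}, "
          f"tolerates {tolerable_failures(rf)} failure(s)")
# replication factor 3: quorum 2, tolerates 1 failure(s)
# replication factor 5: quorum 3, tolerates 2 failure(s)
```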
While Apache Kafka® uses in-sync replicas, Redpanda uses a quorum-based majority with the Raft replication protocol. Kafka performance is negatively impacted when any replica is out-of-sync, but a Redpanda cluster tolerates replica failures without any performance degradation.
Monitor the health of your cluster with the `rpk cluster health` command. This tells you how many nodes are down (if any) and whether you have any leaderless partitions.
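If you prefer to poll health programmatically, the sketch below uses Python's `requests` library against the Redpanda Admin API; the port, endpoint path, and response fields are assumptions based on the default Admin API listener, so check the API reference for your version:

```python
import requests

# Assumed defaults: Admin API on port 9644 with a cluster health overview
# endpoint; verify the path and fields against your version's API reference.
ADMIN_URL = "http://localhost:9644/v1/cluster/health_overview"

overview = requests.get(ADMIN_URL, timeout=5).json()
print("healthy:", overview.get("is_healthy"))
print("nodes down:", overview.get("nodes_down"))
print("leaderless partitions:", overview.get("leaderless_partitions"))
```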
Rack awareness
Rack awareness is one of the most important features for HA. It lets Redpanda spread partition replicas across available brokers in different failure zones.
Make sure you assign separate rack IDs that actually correspond to a physical separation of brokers.
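To verify that replicas actually span brokers in different racks, here is a sketch with the confluent-kafka Python client (the broker address and topic name are placeholders) that prints each partition's replica placement; cross-reference the broker IDs with the rack IDs in your node configuration:

```python
from confluent_kafka.admin import AdminClient

admin = AdminClient({"bootstrap.servers": "redpanda-0:9092"})  # placeholder

metadata = admin.list_topics(timeout=10)
topic = metadata.topics["orders"]  # placeholder topic name

for pid, partition in sorted(topic.partitions.items()):
    # replicas lists broker IDs; with rack awareness, these should map to
    # brokers configured with different rack IDs.
    print(f"partition {pid}: leader={partition.leader} replicas={partition.replicas}")
```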
See also: Rack Awareness
Partition leadership
Raft uses a heartbeat mechanism to maintain leadership authority and to trigger leader elections. The partition leader sends a periodic heartbeat to all followers to assert its leadership. If a follower does not receive a heartbeat over a period of time, then it triggers an election to choose a new partition leader.
Producer acknowledgment
Producer acknowledgment defines how producer clients and broker leaders communicate their status while transferring data. The following `acks` values determine producer and broker behavior when writing data to the event bus:
- `acks=0`: The producer doesn't wait for acknowledgments from the leader and doesn't retry sending messages. This increases throughput and lowers latency at the expense of durability.
- `acks=1`: The producer waits for an acknowledgment from the leader, but it doesn't wait for the leader to get acknowledgments from replicas. This setting doesn't prioritize throughput, latency, or durability; instead, it attempts to strike a balance between them.
- `acks=all`: The producer receives an acknowledgment after the leader and a majority of the replicas acknowledge the message (because Redpanda uses Raft, a majority acknowledgment is sufficient). This increases durability at the expense of lower throughput and increased latency.
Partition rebalancing
By default, Redpanda rebalances partition distribution when nodes are added or decommissioned. Continuous Data Balancing additionally rebalances partitions when nodes become unavailable or when disk space usage exceeds a threshold.
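One way to observe the current distribution is to count partition leaders per broker; here is a sketch with the confluent-kafka Python client (the broker address is a placeholder):

```python
from collections import Counter
from confluent_kafka.admin import AdminClient

admin = AdminClient({"bootstrap.servers": "redpanda-0:9092"})  # placeholder

metadata = admin.list_topics(timeout=10)
leaders = Counter()
for topic in metadata.topics.values():
    for partition in topic.partitions.values():
        leaders[partition.leader] += 1

# A heavily skewed count suggests a leadership or partition imbalance.
for broker_id, count in sorted(leaders.items()):
    print(f"broker {broker_id}: leads {count} partition(s)")
```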
See also: Cluster Balancing
Tiered Storage and disaster recovery
In a disaster, your secondary cluster may still be available, but you need to quickly restore the original level of redundancy by bringing up a new primary cluster. In a containerized environment such as Kubernetes, all state is lost from pods that use only local storage. HA deployments with Tiered Storage address both of these problems, because Tiered Storage offers long-term data retention and topic recovery.
See also: Tiered Storage