Safety and Reliability
Safety, reliability, and security are a top priority at Redpanda and an important part of the product development lifecycle. Redpanda continuously performs chaos testing to check for data inconsistency, liveness, and availability issues during adverse events. It checks for losing brokers, network partition or packet drops, or approaching system limits in terms of disk, CPU, network, or memory utilization.
To test and ensure Redpanda Cloud adheres to consistency guarantees, Redpanda has undergone Jepsen validation and testing.
Additionally, the Redpanda Cloud, SRE, and Security teams run periodic game day testing to simulate a failure or event to test systems, processes, and team responses. This game day testing of Redpanda Cloud is designed to verify safety, reliability, observability, and security of features, and to identify any regressions or new gaps in the system, mental models, alerts, or runbooks. The Redpanda Cloud cluster infrastructure is periodically reconciled to prevent state drift from building up and causing incidents.
Redpanda Cloud cluster software artifacts (also known as the meta-package or Install Pack) are packaged and tested together with each release. Install Packs undergo a comprehensive certification process on each cloud provider that Redpanda Cloud supports, and include the testing of upgrades from the latest two Install Pack patch releases.
One output of the Install Pack certification process is a Redpanda configuration for different limits and quotas, tailored to each supported cloud provider, machine, and storage type. These limits and quotas help Redpanda to configure back pressure mechanisms on behalf of customers.