This is documentation for Self-Managed v24.1. To view the latest available version of the docs, see v24.2.

Sizing Use Cases

The following scenarios provide estimates and advice for sizing Redpanda clusters for different throughput and retention use cases in your data center and in object storage. For details about sizing considerations, see Sizing Guidelines.

These use cases assume a happy path with known metrics and expected outputs, but many other factors can influence performance, such as batch size and other sources of network traffic.

Low throughput

| Metric | Value |
| --- | --- |
| Producer throughput | 75 MB/sec (600 Mbps) |
| Producer rate | 300 messages per second |
| Consumer throughput | 75 MB/sec (600 Mbps) |
| Consumer rate | 300 messages per second |
| Data retention | 3 days |
| Average message size | 250 KB |
| Failure tolerance | 1 node |

In this use case, despite the relatively low throughput of 150 MB/sec (producer plus consumer), it’s important to calculate the expected bandwidth utilization and to use a network testing tool like iPerf to verify that the bandwidth is available and sustainable.

With a single topic with a replication factor of three, producing 75 MB/sec generates an additional 150 MB/sec of data transmitted over the network for replication, and it generates a further 75 MB/sec for the consumers. The 150 MB/sec of bandwidth for replication is full duplex (where each byte sent by a broker is received by some other broker). The 75 MB/sec producer and consumer flows, however, are half-duplex, because the client endpoint in each case is outside of the cluster. Therefore, the cluster needs 225 MB/sec of bandwidth in each direction, incoming and outgoing:

- 150 MB/sec of intra-cluster full duplex bandwidth
- 75 MB/sec of ingress from producers
- 75 MB/sec of egress to consumers

Three nodes satisfy Redpanda’s minimum deployment requirement (so Raft can form quorums) and also the single node failure tolerance.
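The bandwidth arithmetic for this low throughput scenario can be sketched in a few lines of Python. This is a minimal illustration, not a Redpanda tool: the function name `cluster_bandwidth` and its parameters are hypothetical, and it assumes every produced byte is copied to `replication_factor - 1` followers and that load is spread evenly across nodes.

```python
def cluster_bandwidth(producer_mb_s, consumer_mb_s, replication_factor, nodes):
    """Estimate aggregate and per-node bandwidth in MB/sec (hypothetical helper)."""
    # Each produced byte is copied to (replication_factor - 1) follower replicas.
    replication = producer_mb_s * (replication_factor - 1)
    # Replication is full duplex: it counts once as incoming and once as outgoing.
    incoming = producer_mb_s + replication  # producer ingress + replication received
    outgoing = consumer_mb_s + replication  # consumer egress + replication sent
    return {
        "replication": replication,
        "incoming": incoming,
        "outgoing": outgoing,
        # Assumes an even spread of partitions and leaders across nodes.
        "incoming_per_node": incoming / nodes,
        "outgoing_per_node": outgoing / nodes,
    }


low = cluster_bandwidth(producer_mb_s=75, consumer_mb_s=75, replication_factor=3, nodes=3)
for name, value in low.items():
    print(f"{name}: {value} MB/sec")
```

With the figures from this use case, the sketch gives 225 MB/sec in each direction, or 75 MB/sec per node in each direction across three nodes.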
Divide the bandwidth total by the node count (3) to get the per-node bandwidth requirements. The throughput is not high enough to warrant any more than two cores and a single NVMe SSD disk. Be mindful of predicted growth of CPU and disk usage, and estimate when the cluster might need to scale up or scale out.

With an average producer throughput of 75 MB/sec and a replication factor of three, every node in a three-node cluster stores a copy of all data, so each node writes about 270 GB of data each hour (75 MB/sec × 3,600 seconds) and about 6.5 TB of data each day. For three days of data retention, each node needs at least 20 TB of storage. This assumes that each node could be a leader or a follower, and that there are enough partitions for good distribution. A typical node is the leader for 1/Nth of the partitions in a cluster with N nodes and a follower for 2/Nths of the partitions. However, the per-node bandwidth could vary if distribution is uneven. You may have an inexact distribution of load during Redpanda partition balancing or when the client library doesn’t write to each partition evenly.

The following machine specifications provide a minimum for a bare metal cluster or its cloud-based equivalent.

| Specification | Bare Metal | AWS | GCP | Azure |
| --- | --- | --- | --- | --- |
| Instance Type | - | m5.large | n2-standard-2 | F2s_v2 |
| Nodes | 3 | 3 | 3 | 3 |
| Cores | 2 | 2 | 2 | 2 |
| Memory | 4 GB | 8 GB | 8 GB | 4 GB |
| Instance Storage | 20 TB (NVMe) | - | - | 16 GB (SSD) |
| Persistent Storage | - | 20 TB (gp3) | 20 TB (Zonal SSD PD) | 20 TB (Standard SSD) |
| Network | 4 Gbps | Up to 10 Gbps | 10 Gbps | 5 Gbps |
| Tiered Storage | False | False | False | False |

Medium throughput

| Metric | Value |
| --- | --- |
| Producer throughput | ~500 MB/sec (~4,000 Mbps) |
| Producer rate | 2,000 messages per second |
| Consumer throughput | ~1,000 MB/sec (~8,000 Mbps) |
| Consumer rate | 4,000 messages per second |
| Data retention | 24 hours |
| Average message size | 250 KB |
| Failure tolerance | 1 node |

Producing an average of 500 MB/sec and consuming an average of 1,000 MB/sec, plus 1,000 MB/sec of replication traffic assuming a replication factor of three, equates to 2,500 MB/sec (20 Gbps) of network bandwidth.
This is attainable but expensive with cloud providers, and these speeds are not as prevalent within a typical data center.

With at least one partition for each core, the 500 MB/sec of data from producers is evenly distributed between the nodes. For example, with three nodes, each node receives approximately 167 MB/sec. However, that bandwidth value increases with data replication.

| Producer MB/sec | Consumer MB/sec | Avg. Replication Factor | Nodes | Writes per node MB/sec | Reads per node MB/sec |
| --- | --- | --- | --- | --- | --- |
| 500 | 1,500 | 3 | 3 | 500/3 × 3 = 500 | 1500/3 = 500 |
| 500 | 1,500 | 3 | 5 | 500/5 × 3 = 300 | 1500/5 = 300 |
| 500 | 1,500 | 3 | 7 | 500/7 × 3 ≈ 215 | 1500/7 ≈ 215 |
| 500 | 1,500 | 5 | 7 | 500/7 × 5 ≈ 358 | 1500/7 ≈ 215 |

The additional 500 MB/sec of consumer throughput in the table (1,500 MB/sec rather than 1,000 MB/sec) accounts for Tiered Storage: the bandwidth required to archive log segments to object storage. When Tiered Storage is enabled on a topic, it essentially adds another consumer’s worth of bandwidth on the network.

To balance the available local disk, consider exactly how many reads can be serviced from local storage. Different instance types or locally attached NVMe SSD disks provide different amounts of local storage, and therefore different amounts of available data without going back to object storage. A topic with Tiered Storage enabled can write data to faster local storage managed by local retention settings, and at the same time, it can write data to object storage managed by different retention settings, or left to grow for a longer period.

Consumers that generally keep up with producers stream from local storage, but at this velocity that window of opportunity is narrower. The object store enables a consumer to read from an older offset when necessary.
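The per-node figures in the table above, and the time that local storage lasts at a given write rate, can be estimated with a short Python sketch. The helper names are hypothetical, and the calculation assumes evenly distributed partitions, a constant write rate, and decimal units (1 TB = 1,000,000 MB); real clusters will deviate from these idealized numbers.

```python
def per_node_throughput(producer_mb_s, consumer_mb_s, replication_factor, nodes):
    """Return (writes, reads) per node in MB/sec, assuming even distribution."""
    # Every produced byte is written replication_factor times across the cluster.
    writes = producer_mb_s / nodes * replication_factor
    # Consumer reads (including Tiered Storage uploads) are spread across nodes.
    reads = consumer_mb_s / nodes
    return writes, reads


def local_retention_hours(local_storage_tb, writes_per_node_mb_s):
    """Hours until local storage fills at a constant write rate (decimal units)."""
    storage_mb = local_storage_tb * 1_000_000
    return storage_mb / writes_per_node_mb_s / 3600


# Same (replication factor, nodes) combinations as the table rows.
for rf, nodes in [(3, 3), (3, 5), (3, 7), (5, 7)]:
    writes, reads = per_node_throughput(500, 1500, rf, nodes)
    print(f"RF={rf} nodes={nodes}: writes={writes:.1f} MB/sec, reads={reads:.1f} MB/sec")

writes, _ = per_node_throughput(500, 1500, 3, 3)
print(f"30 TB of local storage lasts about {local_retention_hours(30, writes):.1f} hours")
```

The table rounds fractional results up (for example, 214.3 becomes 215), which is the conservative choice for capacity planning. For 30 TB of local storage at 500 MB/sec per node, the estimate is about 16.7 hours, in line with the 17 hours of available local retention listed for the bare metal configuration.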
| Specification | Bare Metal | AWS | GCP | Azure |
| --- | --- | --- | --- | --- |
| Instance Type | - | i3en.6xlarge | n2-standard-32 | F48s_v2 |
| Nodes | 3 | 3 | 3 | 3 |
| Cores | 24 | 24 | 32 | 48 |
| Memory | 192 GB | 192 GB | 128 GB | 96 GB |
| Instance Storage | 30 TB (NVMe) | 15 TB (NVMe) | 9 TB (SSD) | 384 TB (SSD) |
| Persistent Storage | - | - | - | 20 TB (Standard SSD) |
| Available Local Retention | 17 hrs | 8 hrs | 5 hrs | 9 days |
| Network | 25 Gbps | 25 Gbps | 32 Gbps | 21 Gbps |
| Tiered Storage | True | True | True | True |

High throughput

| Metric | Value |
| --- | --- |
| Producer throughput | 1,000 MB/sec (8,000 Mbps) |
| Producer rate | 4,000 messages per second |
| Consumer throughput | 2,000 MB/sec (16,000 Mbps) |
| Consumer rate | 8,000 messages per second |
| Data retention | 24 hours |
| Average message size | 250 KB |
| Failure tolerance | 2 nodes |

This use case has many topics, hundreds of partitions, and a high throughput. The combined producer and replication data equates to 8 Gbps of network traffic, plus 16 Gbps for the consumers and 8 Gbps for Tiered Storage. In total, that’s at least 32 Gbps of network bandwidth required to sustain this level of throughput.

Writing at 1,000 MB/sec is near the upper limit of what a single NVMe disk can sustain. At this scale, you get significant performance gains by distributing the writes over many cores and disks to better leverage Redpanda’s thread-per-core model. For example, given five nodes with 24 cores each, start with at least one partition for each core (120 partitions in total) and scale up.

Redpanda generates over 3 TB of writes each hour and over 80 TB each day. Local storage is going to fill up quickly, and the window of opportunity for consumers to read from local storage is going to be shorter than in the other scenarios. In this use case, Tiered Storage is essential.
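As a rough check on the numbers in this scenario, the following Python sketch converts the throughput figures into network bandwidth, a starting partition count, and write volume. The function names are hypothetical, the 1,000 MB/sec Tiered Storage rate is derived from the 8 Gbps quoted in the text, and the conversions assume decimal units (1 MB/sec = 8 Mbps, 1 TB = 1,000,000 MB).

```python
def mb_s_to_gbps(mb_per_sec):
    """Convert MB/sec to Gbps using decimal units."""
    return mb_per_sec * 8 / 1000


def write_volume_tb(mb_per_sec, hours):
    """Data written over a period, in TB, at a constant rate."""
    return mb_per_sec * 3600 * hours / 1_000_000


def starting_partitions(nodes, cores_per_node):
    """At least one partition for each core in the cluster."""
    return nodes * cores_per_node


# Throughput figures from this use case, in MB/sec.
producer, consumer, tiered_storage = 1000, 2000, 1000

total_gbps = sum(mb_s_to_gbps(rate) for rate in (producer, consumer, tiered_storage))
print(f"Network bandwidth: {total_gbps:.0f} Gbps")
print(f"Starting partitions: {starting_partitions(nodes=5, cores_per_node=24)}")
print(f"Writes per hour: {write_volume_tb(producer, 1):.1f} TB")
print(f"Writes per day: {write_volume_tb(producer, 24):.1f} TB")
```

The results (32 Gbps, 120 partitions, 3.6 TB each hour, and 86.4 TB each day) agree with the "at least 32 Gbps", "over 3 TB", and "over 80 TB" figures quoted for this use case.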
| Specification | Bare Metal | AWS | GCP | Azure |
| --- | --- | --- | --- | --- |
| Instance Type | - | i3en.12xlarge | n2-standard-48 | F48s_v2 |
| Nodes | 5 | 5 | 5 | 5 |
| Cores | 24 | 48 | 48 | 48 |
| Memory | 192 GB | 384 GB | 192 GB | 96 GB |
| Instance Storage | 30 TB (NVMe) | 30 TB (NVMe) | 9 TB (SSD) | 384 TB (SSD) |
| Persistent Storage | - | - | - | 30 TB (Ultra SSD) |
| Available Local Retention | 14 hrs | 7 hrs | 4 hrs | 7 days |
| Network | 25 Gbps | 25 Gbps | 32 Gbps | 21 Gbps |
| Tiered Storage | True | True | True | True |