Sizing Use Cases
The following scenarios provide estimates and advice for sizing Redpanda clusters for different throughput and retention use cases in your data center and in the cloud. For details about sizing considerations, see Sizing Guidelines.
These use cases assume a happy path with known metrics and expected outputs, but many other factors can influence performance, such as batch size and other sources of network traffic.
Low throughput
| Metric | Value |
|---|---|
| Producer throughput | 75 MB/sec (600 Mbps) |
| Producer rate | 300 messages per second |
| Consumer throughput | 75 MB/sec (600 Mbps) |
| Consumer rate | 300 messages per second |
| Data retention | 3 days |
| Average message size | 250 KB |
| Failure tolerance | 1 node |
In this use case, despite the relatively low throughput of 150 MB/sec (producer plus consumer), it’s important to calculate the expected bandwidth utilization and to use a network testing tool like iPerf to verify that the bandwidth is available and sustainable. With a single topic with a replication factor of three, producing 75 MB/sec generates an additional 150 MB/sec of data transmitted over the network for replication, and it generates a further 75 MB/sec for the consumers.
The 150 MB/sec of bandwidth for replication is full duplex (each byte sent by a broker is received by some other broker). The 75 MB/sec producer and consumer flows, however, are half duplex, because the client endpoint in each case is outside of the cluster. Therefore, the cluster handles roughly 225 MB/sec each of incoming and outgoing traffic:
- 150 MB/sec of intra-cluster full duplex bandwidth
- 75 MB/sec of ingress from producers
- 75 MB/sec of egress to consumers
Three nodes satisfy Redpanda’s minimum deployment requirement (so Raft can form quorums) and also provide single-node failure tolerance. Divide the total bandwidth by the node count (3) to get the per-node bandwidth requirements. The throughput is not high enough to warrant more than two cores and a single NVMe SSD disk per node. Be mindful of the predicted growth of CPU and disk usage, and estimate when the cluster might need to scale up or scale out.
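To make the arithmetic concrete, the following sketch reproduces the bandwidth estimate in code. It is a rough model that assumes traffic is spread evenly across the brokers; the script and its variable names are illustrative only.

```python
# Rough sketch: per-node network bandwidth for the low-throughput use case,
# assuming traffic is spread evenly across all brokers.

producer_mb_s = 75        # MB/sec written by producers
consumer_mb_s = 75        # MB/sec read by consumers
replication_factor = 3
nodes = 3

# Each produced byte is copied to (replication_factor - 1) followers.
replication_mb_s = producer_mb_s * (replication_factor - 1)   # 150 MB/sec

# Incoming: producer ingress plus replication data received by followers.
ingress_mb_s = producer_mb_s + replication_mb_s                # 225 MB/sec
# Outgoing: replication data sent by leaders plus consumer egress.
egress_mb_s = replication_mb_s + consumer_mb_s                 # 225 MB/sec

per_node_in = ingress_mb_s / nodes                             # 75 MB/sec
per_node_out = egress_mb_s / nodes                             # 75 MB/sec

print(f"Per node: ~{per_node_in:.0f} MB/sec in, ~{per_node_out:.0f} MB/sec out")
```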
With an average producer throughput of 75 MB/sec and a replication factor of three, each node writes 254 GB of data each hour and 6.4 TB of data each day. For three days of data retention, each node needs at least 20 TB of storage.
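The storage estimate follows the same pattern. The sketch below uses decimal units and the same even-distribution assumption, so it lands in the same range as the figures above; the exact numbers depend on rounding and unit conventions.

```python
# Rough sketch: per-node storage needed for 3 days of retention,
# assuming replicas are spread evenly across the brokers.

producer_mb_s = 75
replication_factor = 3
nodes = 3
retention_days = 3

# Total bytes written across the cluster per second, including replica copies.
cluster_write_mb_s = producer_mb_s * replication_factor      # 225 MB/sec
per_node_write_mb_s = cluster_write_mb_s / nodes              # 75 MB/sec

gb_per_hour = per_node_write_mb_s * 3600 / 1000               # ~270 GB/hour per node
tb_per_day = gb_per_hour * 24 / 1000                          # ~6.5 TB/day per node
tb_retained = tb_per_day * retention_days                     # ~19.4 TB per node

print(f"~{gb_per_hour:.0f} GB/hour, ~{tb_per_day:.1f} TB/day, "
      f"~{tb_retained:.1f} TB for {retention_days} days per node")
```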
This assumes that each node can act as a leader or a follower and that there are enough partitions for good distribution. In a cluster with N nodes and a replication factor of three, a typical node is the leader for 1/N of the partitions and a follower for 2/N of them. However, the per-node bandwidth can vary if the distribution is uneven, for example during Redpanda partition balancing or when the client library doesn’t write to each partition evenly.
The following machine specifications provide a minimum for a bare metal cluster or its cloud-based equivalent.
| | Bare Metal | AWS | GCP | Azure |
|---|---|---|---|---|
| Instance Type | - | m5.large | n2-standard-2 | F2s_v2 |
| Nodes | 3 | 3 | 3 | 3 |
| Cores | 2 | 2 | 2 | 2 |
| Memory | 4 GB | 8 GB | 8 GB | 4 GB |
| Instance Storage | 20 TB (NVMe) | - | - | 16 GB (SSD) |
| Persistent Storage | - | 20 TB (gp3) | 20 TB (Zonal SSD PD) | 20 TB (Standard SSD) |
| Network | 4 Gbps | Up to 10 Gbps | 10 Gbps | 5 Gbps |
| Tiered Storage | False | False | False | False |
Medium throughput
| Metric | Value |
|---|---|
| Producer throughput | ~500 MB/sec (~4,000 Mbps) |
| Producer rate | 2,000 messages per second |
| Consumer throughput | ~1,000 MB/sec (~8,000 Mbps) |
| Consumer rate | 4,000 messages per second |
| Data retention | 24 hours |
| Average message size | 250 KB |
| Failure tolerance | 1 node |
Producing an average of 500 MB/sec and consuming an average of 1,000 MB/sec equates to 2,500 MB/sec (20 Gbps) of network bandwidth once replication traffic (an additional 1,000 MB/sec at a replication factor of three) is included. This is attainable but expensive with cloud providers, and these speeds are not as prevalent within a typical data center.
With at least one partition for each core, the 500 MB/sec of data from producers is distributed evenly across the nodes. For example, with three nodes, each node receives approximately 167 MB/sec. However, that per-node bandwidth increases with data replication, as the following table shows.
| Producer MB/sec | Consumer MB/sec | Avg. Replication Factor | Nodes | Writes per node MB/sec | Reads per node MB/sec |
|---|---|---|---|---|---|
| 500 | 1,500 | 3 | 3 | 500/3 * 3 = 500 | 1500/3 = 500 |
| 500 | 1,500 | 3 | 5 | 500/5 * 3 = 300 | 1500/5 = 300 |
| 500 | 1,500 | 3 | 7 | 500/7 * 3 = 215 | 1500/7 = 215 |
| 500 | 1,500 | 5 | 7 | 500/7 * 5 = 358 | 1500/7 = 215 |
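The per-node values in this table follow from two simple formulas: writes per node = producer throughput × replication factor ÷ node count, and reads per node = total read throughput ÷ node count. The sketch below reproduces the rows under the same even-distribution assumption, rounding up as the table does.

```python
import math

# Rough sketch: reproduce the writes/reads-per-node figures from the table above.

def per_node_bandwidth(producer_mb_s, consumer_mb_s, replication_factor, nodes):
    # Every produced byte is written replication_factor times across the cluster.
    writes = math.ceil(producer_mb_s * replication_factor / nodes)
    # Reads (consumers plus Tiered Storage uploads) are served once per byte.
    reads = math.ceil(consumer_mb_s / nodes)
    return writes, reads

for rf, nodes in [(3, 3), (3, 5), (3, 7), (5, 7)]:
    writes, reads = per_node_bandwidth(500, 1500, rf, nodes)
    print(f"RF={rf}, nodes={nodes}: ~{writes} MB/sec writes, ~{reads} MB/sec reads per node")
```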
The additional 500 MB/sec of consumer throughput in the table (1,500 MB/sec rather than the workload’s 1,000 MB/sec) accounts for Tiered Storage and the bandwidth required to archive log segments to object storage. When Tiered Storage is enabled on a topic, it effectively adds another consumer’s worth of bandwidth to the network.
To balance the available local disk against the read workload, consider how many reads can be serviced from local storage. Different instance types and locally attached NVMe SSD disks provide different amounts of local storage, and therefore different amounts of data that can be read without going back to object storage.
A topic with Tiered Storage enabled can write data to faster local storage managed by local retention settings, and at the same time, it can write data to object storage managed by different retention settings, or left to grow for a longer period. Consumers that generally keep up with producers stream from local storage, but at this velocity that window of opportunity is narrower. The object store enables a consumer to read from an older offset when necessary.
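As a rough rule of thumb, the available local retention shown in the specification table below is the local storage per node divided by the per-node write rate. The following sketch illustrates the calculation under the same even-distribution assumption; the storage figures are taken from the table, and the result is an estimate rather than an exact sizing formula.

```python
# Rough sketch: hours of data that fit in local storage before Tiered Storage
# must serve older offsets, assuming writes are spread evenly across brokers.

producer_mb_s = 500
replication_factor = 3
nodes = 3

per_node_write_mb_s = producer_mb_s * replication_factor / nodes   # 500 MB/sec

# Local storage per node, in TB, from the specification table below.
local_storage_tb = {
    "bare metal (30 TB NVMe)": 30,
    "i3en.6xlarge (15 TB NVMe)": 15,
    "n2-standard-32 (9 TB SSD)": 9,
}

for name, tb in local_storage_tb.items():
    hours = tb * 1_000_000 / per_node_write_mb_s / 3600   # TB -> MB, seconds -> hours
    print(f"{name}: ~{hours:.0f} hours of local retention")
```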
| | Bare Metal | AWS | GCP | Azure |
|---|---|---|---|---|
| Instance Type | - | i3en.6xlarge | n2-standard-32 | F48s_v2 |
| Nodes | 3 | 3 | 3 | 3 |
| Cores | 24 | 24 | 32 | 48 |
| Memory | 192 GB | 192 GB | 128 GB | 96 GB |
| Instance Storage | 30 TB (NVMe) | 15 TB (NVMe) | 9 TB (SSD) | 384 TB (SSD) |
| Persistent Storage | - | - | - | 20 TB (Standard SSD) |
| Available Local Retention | 17 hrs | 8 hrs | 5 hrs | 9 days |
| Network | 25 Gbps | 25 Gbps | 32 Gbps | 21 Gbps |
| Tiered Storage | True | True | True | True |
High throughput
| Metric | Value |
|---|---|
| Producer throughput | 1,000 MB/sec (8,000 Mbps) |
| Producer rate | 4,000 messages per second |
| Consumer throughput | 2,000 MB/sec (16,000 Mbps) |
| Consumer rate | 8,000 messages per second |
| Data retention | 24 hours |
| Average message size | 250 KB |
| Failure tolerance | 2 nodes |
This use case has many topics, hundreds of partitions, and high throughput. The producer data equates to 8 Gbps of network traffic, plus 16 Gbps for the consumers and 8 Gbps for Tiered Storage; replication adds further intra-cluster traffic on top of that. In total, that’s at least 32 Gbps of network bandwidth required to sustain this level of throughput. Writing at 1,000 MB/sec is near the upper limit of what a single NVMe disk can sustain.
At this scale, you get significant performance gains by distributing the writes over many cores and disks to better leverage Redpanda’s thread-per-core model. For example, given five nodes with 24 cores each, start with at least one partition for each core (120 partitions in total) and scale up from there. Redpanda generates over 3 TB of writes each hour and over 80 TB each day. Local storage fills up quickly, and the window for consumers to read from local storage is shorter than in the other scenarios. In this use case, Tiered Storage is essential.
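As a back-of-the-envelope check, the sketch below recomputes the headline figures for this workload. It assumes even distribution, decimal units, and that the per-hour and per-day volumes refer to producer data before replication, which is consistent with the numbers above; the script is illustrative only.

```python
# Rough sketch: network and storage totals for the high-throughput use case.

producer_mb_s = 1000
consumer_mb_s = 2000
tiered_storage_mb_s = producer_mb_s   # archiving everything produced adds roughly
                                      # one producer's worth of read bandwidth
nodes = 5
cores_per_node = 24

# External bandwidth, converting MB/sec to Gbps (MB/sec * 8 / 1000).
total_gbps = (producer_mb_s + consumer_mb_s + tiered_storage_mb_s) * 8 / 1000   # 32 Gbps

# Producer data volume before replication.
tb_per_hour = producer_mb_s * 3600 / 1_000_000   # ~3.6 TB/hour
tb_per_day = tb_per_hour * 24                    # ~86 TB/day

# Starting point: at least one partition per core across the cluster.
starting_partitions = nodes * cores_per_node     # 120

print(f"~{total_gbps:.0f} Gbps total, ~{tb_per_hour:.1f} TB/hour, "
      f"~{tb_per_day:.0f} TB/day, {starting_partitions} partitions to start")
```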
| | Bare Metal | AWS | GCP | Azure |
|---|---|---|---|---|
| Instance Type | - | i3en.12xlarge | n2-standard-48 | F48s_v2 |
| Nodes | 5 | 5 | 5 | 5 |
| Cores | 24 | 48 | 48 | 48 |
| Memory | 192 GB | 384 GB | 192 GB | 96 GB |
| Instance Storage | 30 TB (NVMe) | 30 TB (NVMe) | 9 TB (SSD) | 384 TB (SSD) |
| Persistent Storage | - | - | - | 30 TB (Ultra SSD) |
| Available Local Retention | 14 hrs | 7 hrs | 4 hrs | 7 days |
| Network | 25 Gbps | 25 Gbps | 32 Gbps | 21 Gbps |
| Tiered Storage | True | True | True | True |