Kubernetes Cluster Requirements and Recommendations

This topic provides the requirements and recommendations for provisioning Kubernetes clusters and worker nodes for running Redpanda in production.

Operating system

RHEL/CentOS: minimum version 8. Recommended: 9 or later.
Ubuntu: minimum version 20.04 LTS. Recommended: 22.04 or later.

Recommendation: Linux kernel 4.19 or later for better performance.

Kubernetes version

Minimum required Kubernetes version: 1.27.0-0

Make sure to do the following:

Helm version

Minimum required Helm version: 3.10.0

Install Helm.

Helm v3.18.0 is not supported due to a bug that causes errors such as:

Error: INSTALLATION FAILED: execution error at (redpanda/templates/entry-point.yaml:17:4): invalid Quantity expected string or float64 got: json.Number (1)

To avoid similar errors, upgrade to a later version. For more details, see the Helm GitHub issue.
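
The Helm version constraints above (minimum 3.10.0, with 3.18.0 excluded) can be checked before an install. The following is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes the version string looks like the output of `helm version --short` (for example, `v3.17.2+g1234567`).

```python
import re

# Constraints taken from this section.
MIN_HELM = (3, 10, 0)
UNSUPPORTED_HELM = {(3, 18, 0)}


def parse_version(text):
    """Extract (major, minor, patch) from a string such as 'v3.17.2+g1234567'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no semantic version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def helm_version_ok(text):
    """Return True if the version meets the minimum and is not excluded."""
    version = parse_version(text)
    return version >= MIN_HELM and version not in UNSUPPORTED_HELM
```

Tuples compare element by element, so `(3, 9, 4) < (3, 10, 0)` behaves as a version comparison should, which a plain string comparison would get wrong.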

Number of nodes

Provision one physical node or virtual machine (VM) for each Redpanda broker that you plan to deploy in your Redpanda cluster. Each Redpanda broker requires its own dedicated node for the following reasons:

Resource isolation: Redpanda brokers are designed to make full use of available system resources, including CPU and memory. By dedicating a node to each broker, you ensure that these resources aren’t shared with other applications or processes, avoiding potential performance bottlenecks or contention.
External networking: External clients must connect directly to the broker that is the leader of the partition they're interested in, so each broker must be individually addressable. Assigning each broker to its own dedicated node makes this direct addressing feasible, because each node has a unique address. See External networking.
Fault tolerance: Ensuring each broker operates on a separate node enhances fault tolerance. If one node experiences issues, it won’t directly impact the other brokers.

The Redpanda Helm chart configures podAntiAffinity rules to make sure that each Redpanda broker runs on its own node.

Recommendations: Deploy at least three Pod replicas.
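
The placement rules above (one broker per node, at least three brokers) can be expressed as a simple check. This is an illustrative sketch only: `placement_problems` is a hypothetical helper, and the Pod-to-node mapping is assumed to have been collected separately, for example from the NODE column of `kubectl get pods -o wide`.

```python
from collections import Counter


def placement_problems(pod_to_node, min_brokers=3):
    """Return a list of human-readable problems with a broker placement.

    pod_to_node maps each Redpanda Pod name to the node it is scheduled on.
    """
    problems = []
    if len(pod_to_node) < min_brokers:
        problems.append(
            f"only {len(pod_to_node)} brokers, at least {min_brokers} recommended"
        )
    # Each node should host at most one broker.
    brokers_per_node = Counter(pod_to_node.values())
    for node, count in sorted(brokers_per_node.items()):
        if count > 1:
            problems.append(f"node {node} hosts {count} brokers")
    return problems
```

An empty list means the placement satisfies both rules.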

Prevent automatic node upgrades

Ensure that node and operating system (OS) upgrades are manually managed when running Redpanda in production. Manual control avoids unplanned reboots or replacements that disrupt Redpanda brokers, causing service downtime, data loss, or quorum instability.

Common issues with automatic node upgrades include:

Hard timeouts for graceful shutdowns that do not allow Redpanda brokers enough time to complete decommissioning or leadership transitions.
Replacements or reboots without ensuring data has been safely migrated or replicated, risking data loss.
Parallel upgrades across multiple nodes, which can disrupt quorum or reduce cluster availability.

Requirements:

Disable automatic node maintenance or upgrades. To prevent managed Kubernetes services from automatically rebooting or upgrading nodes:
- Azure AKS: Set the OS upgrade channel to None.
- Google GKE: Disable GKE auto-upgrades for node pools.
- Amazon EKS: Disable EKS node auto-upgrades.

CPU and memory

Requirements:

Each production node must have at least two physical CPU cores.
x86_64 (Westmere or newer) and AWS Graviton processors are supported.
Each Redpanda Pod requires at least 2 GiB of memory per core.
- Request a minimum of 2.22 GiB per core to meet Redpanda’s memory allocation strategy.
  
  See Manage Pod Resources in Kubernetes for detailed guidance and examples.
Each Redpanda broker must have at least 2 GB of memory per core.
Each Redpanda broker must have at least 2 MB of memory for each topic partition replica.

The memory available for partition replicas is a percentage of the cluster's total memory, set by topic_partitions_memory_allocation_percent. Each partition replica consumes topic_memory_per_partition bytes from this pool, and topic operations fail when the pool has insufficient memory. Changing topic_partitions_memory_allocation_percent is not recommended, because lowering it may lead to instability or degraded performance.
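
The sizing arithmetic in this section can be sketched as follows. The function names are hypothetical, and the percentage and per-partition values passed in the usage note are illustrative inputs, not confirmed Redpanda defaults. Read the actual values of topic_partitions_memory_allocation_percent and topic_memory_per_partition from your own cluster configuration.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def min_memory_request_gib(cores, gib_per_core=2.22):
    """Minimum memory request for a Pod, using the 2.22 GiB per core figure."""
    return cores * gib_per_core


def max_partition_replicas(total_memory_bytes, allocation_percent,
                           memory_per_partition_bytes):
    """Number of partition replicas that fit in the partition memory pool.

    The pool is allocation_percent of total memory, and each replica
    consumes memory_per_partition_bytes from it.
    """
    pool_bytes = total_memory_bytes * allocation_percent // 100
    return pool_bytes // memory_per_partition_bytes
```

For example, `min_memory_request_gib(4)` gives the request for a four-core Pod, and `max_partition_replicas(8 * GIB, 10, 2 * MIB)` estimates how many replicas fit if 10 percent of 8 GiB were reserved at 2 MiB per replica.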

Recommendations:

Four physical cores for each node are strongly recommended.

Pod resource configuration

To ensure stable performance and predictable scheduling in Kubernetes, configure Redpanda Pods with appropriate CPU and memory requests and limits:

Set resources.requests.memory and resources.limits.memory to the same value.
- Request at least 2.22 GiB of memory per core to meet Redpanda’s heap and overhead requirements.
Set resources.cpu.cores to an even integer (for example, 4, 6, or 8) to align with the Kubernetes static CPU manager policy.
Match CPU and memory resource settings for all containers in the Pod, including init containers and sidecars, to receive the Guaranteed QoS class.
Enable memory locking with the --lock-memory flag to prevent paging and improve performance.

This configuration:

Grants Redpanda exclusive access to CPU cores and memory
Reduces the risk of throttling, eviction, and OOM kills
Provides predictable and isolated runtime performance

See Manage Pod Resources in Kubernetes for configuration examples using both Helm and the Redpanda Operator.
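
The resource rules above can be checked against a Pod spec before it is deployed. This is a minimal sketch with hypothetical helper names. It assumes each container is a dictionary shaped like the `resources` section of a Kubernetes container spec, and it compares quantities as plain strings, so equivalent spellings such as "4" and "4000m" are treated as different.

```python
def is_guaranteed(containers):
    """Return True if every container has cpu and memory limits equal to requests.

    Pass all containers in the Pod, including init containers and sidecars.
    """
    for container in containers:
        resources = container.get("resources", {})
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        for name in ("cpu", "memory"):
            if name not in limits:
                return False
            # Kubernetes sets a missing request to the limit, so only an
            # explicit, different request breaks the Guaranteed class.
            if requests.get(name, limits[name]) != limits[name]:
                return False
    return True


def cpu_is_even_integer(cpu):
    """Return True for whole, even core counts such as 4, '6', or 8."""
    text = str(cpu)
    return text.isdigit() and int(text) > 0 and int(text) % 2 == 0
```

A Pod passes only when every container passes, which is why a single sidecar without limits is enough to lose the Guaranteed QoS class.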

Storage

Requirements:

NVMe (Non-Volatile Memory Express) drives are required for production deployments. NVMe drives provide the high throughput and low latency needed for optimal Redpanda performance.

See also: Disk and network self-test benchmarks.
An XFS or ext4 file system.

The Redpanda data directory (/var/lib/redpanda/data) and the Tiered Storage cache must be mounted on an XFS or ext4 file system.

For information about supported volume types for different data in Redpanda, see Supported Volume Types for Data in Redpanda.

The Network File System (NFS) is unsupported for use as the storage mechanism for the Redpanda data directory or for the Tiered Storage cache.
A default StorageClass that can provision PersistentVolumes with at least 20Gi of storage.
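
To confirm the file system requirement on a worker node, you can look up which mount backs the data directory. The sketch below is illustrative: the helper names are hypothetical, and it assumes the input text is in the format of the Linux `/proc/mounts` file (device, mount point, file system type, options). It does not handle mount points that contain escaped spaces.

```python
from pathlib import PurePosixPath

SUPPORTED_FILE_SYSTEMS = {"xfs", "ext4"}


def filesystem_for(path, mounts_text):
    """Return the file system type of the deepest mount point containing path."""
    target = PurePosixPath(path)
    best_mount, best_type = None, None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = PurePosixPath(fields[1]), fields[2]
        if target == mount_point or mount_point in target.parents:
            # Prefer the deepest mount; on ties, the later line wins.
            if best_mount is None or len(mount_point.parts) >= len(best_mount.parts):
                best_mount, best_type = mount_point, fs_type
    return best_type


def data_directory_supported(path, mounts_text):
    """Return True if path is on XFS or ext4 (NFS and others are rejected)."""
    return filesystem_for(path, mounts_text) in SUPPORTED_FILE_SYSTEMS
```

On a node, the input could be read with `open("/proc/mounts").read()` and the path set to `/var/lib/redpanda/data` or the Tiered Storage cache directory.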

Recommendations:

Use an XFS file system for its enhanced performance with Redpanda workloads.
For setups with multiple disks, use a RAID-0 (striped) array. RAID-0 increases throughput but provides no redundancy, so a single disk failure can lead to data loss.
Use local PersistentVolumes backed by NVMe disks.

Security

Recommendations:

If you’re using a cloud platform, use IAM roles to restrict access to resources in your cluster.
Secure your Redpanda cluster with TLS encryption and SASL authentication.

External networking

For external access, each node in your cluster must have a static, externally accessible IP address.
Minimum 10 GigE (10 Gigabit Ethernet) connection to ensure:
- High data throughput
- Reduced data transfer latency
- Scalability for increased network traffic

Recommendations: Use a NodePort Service for external access.

Tuning

Before deploying Redpanda to production, each node that runs Redpanda must be tuned to optimize the Linux kernel for Redpanda processes.

See Tune Kubernetes Worker Nodes for Production.

Object storage providers for Tiered Storage

Redpanda supports the following storage providers for Tiered Storage:

Amazon Simple Storage Service (S3)
Google Cloud Storage (GCS), using the Google Cloud Platform S3 API
Azure Blob Storage (ABS)

Cloud instance types

Recommendations:

Use a cloud instance type that supports locally attached NVMe devices with an XFS file system. NVMe devices offer high I/O operations per second (IOPS) and minimal latency, while XFS offers enhanced performance with Redpanda workloads.

Amazon

EKS defaults to the ext4 file system. Use XFS instead where possible.

General purpose: General-purpose instances provide a balance of compute, memory, and networking resources, and can be used for a variety of workloads.
- M5d
- M5ad
- M5dn
- M6gd
- M7gd
Memory optimized: Memory-optimized instances are designed to deliver fast performance for workloads that process large data sets in memory.
- R5ad
- R5d
- R5dn
- R6gd
- R6id
- R6idn
- R7gd
- X2gd
- X2idn
- X2iedn
- z1d
Storage optimized: Storage-optimized instances are designed for workloads that require high, sequential read and write access to very large data sets on local storage. They are optimized to deliver tens of thousands of low-latency, random IOPS to applications.
- I4g, Is4gen, Im4gn
- I4i
- I3
- I3en
Compute optimized: Compute-optimized instances deliver cost-effective high performance at a low price-per-compute ratio for running advanced compute-intensive workloads.
- C5d
- C5ad

Azure

AKS often defaults to the ext4 file system. Use XFS instead where possible.

General purpose: General-purpose VM sizes provide a balanced CPU-to-memory ratio. They are ideal for testing and development, small to medium databases, and low to medium traffic web servers.

Google

GKE often defaults to the ext4 file system. Use XFS instead where possible.

General purpose: The general-purpose machine family has the best price-performance with the most flexible vCPU to memory ratios, and provides features that target most standard and cloud-native workloads.
Memory optimized: The memory-optimized machine family provides the most compute and memory resources of any Compute Engine machine family offering. It is ideal for workloads that require higher memory-to-vCPU ratios than the high-memory machine types in the general-purpose N1 machine series.
- M3 machine series
Compute optimized: Compute-optimized VM instances are ideal for compute-intensive and high-performance computing (HPC) workloads.
- C2D machine series
- C2 machine series

Next steps

After meeting these requirements, proceed to:

Deploy Redpanda for production
Validate production readiness with the comprehensive checklist
