Kubernetes Cluster Requirements and Recommendations

This topic provides the requirements and recommendations for provisioning Kubernetes clusters and worker nodes for running Redpanda in production.

Operating system

  • RHEL/CentOS: version 8 or later is required. Version 9 or later is recommended.

  • Ubuntu: version 20.04 LTS or later is required. Version 22.04 or later is recommended.

Kubernetes version

Minimum required Kubernetes version: 1.21

Make sure to do the following:

Helm version

Minimum required Helm version: 3.6.0
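
The Kubernetes and Helm minimums above can be checked programmatically. The following is a minimal sketch, not an official Redpanda tool: the function names are hypothetical, and it assumes you have already obtained the version strings yourself (for example, from the output of your Kubernetes and Helm clients). It drops a leading "v" and any vendor suffix after "-" or "+" before comparing.

```python
from typing import Tuple

# Minimums taken from this topic.
MIN_KUBERNETES = "1.21"
MIN_HELM = "3.6.0"


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn 'v1.27.3' or '3.6.0' into a tuple of integers.

    A leading 'v' and anything after '-' or '+' (such as a vendor
    build suffix) are ignored.
    """
    core = text.strip().lstrip("v")
    for separator in ("-", "+"):
        core = core.split(separator, 1)[0]
    return tuple(int(part) for part in core.split("."))


def meets_minimum(found: str, minimum: str) -> bool:
    """Return True if `found` is at least `minimum`."""
    a, b = parse_version(found), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '3.6' equals '3.6.0'.
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b
```

For example, meets_minimum("v1.20.15", MIN_KUBERNETES) is False, while meets_minimum("v3.6.0", MIN_HELM) is True.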

Number of worker nodes

Provision one physical node or virtual machine (VM) for each Redpanda broker that you plan to deploy in your Redpanda cluster. Each Redpanda broker requires its own dedicated worker node for the following reasons:

  • Resource isolation: Redpanda brokers are designed to make full use of available system resources, including CPU and memory. By dedicating a worker node to each broker, you ensure that these resources aren’t shared with other applications or processes, avoiding potential performance bottlenecks or contention.

  • External networking: External clients must connect directly to the broker that leads the partition they want to read from or write to, so each broker must be individually addressable. Giving each broker its own dedicated worker node makes this direct addressing feasible because each worker node has a unique address. See External networking.

  • Fault tolerance: Running each broker on a separate node improves fault tolerance. If one node fails, the brokers on the other nodes are not directly affected.

The Redpanda Helm chart configures podAntiAffinity rules to make sure that each Redpanda broker runs on its own worker node.
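
To audit the one-broker-per-node rule on an existing cluster, you could compare broker Pods against the nodes they are scheduled on. The following is a minimal sketch under stated assumptions: the Pod-to-node mapping is supplied by you (for example, collected from your cluster tooling), and the function name and the Pod and node names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def nodes_with_multiple_brokers(pod_to_node: Dict[str, str]) -> Dict[str, List[str]]:
    """Return every node that hosts more than one broker Pod.

    `pod_to_node` maps a broker Pod name to the worker node it runs on.
    An empty result means each broker has its own worker node.
    """
    pods_by_node = defaultdict(list)
    for pod, node in pod_to_node.items():
        pods_by_node[node].append(pod)
    return {
        node: sorted(pods)
        for node, pods in pods_by_node.items()
        if len(pods) > 1
    }


# Hypothetical placement: two brokers share node-a.
placement = {
    "redpanda-0": "node-a",
    "redpanda-1": "node-b",
    "redpanda-2": "node-a",
}
violations = nodes_with_multiple_brokers(placement)
```

Here, violations lists node-a together with the two brokers that share it. With the Helm chart's podAntiAffinity rules in place, the result is expected to be empty.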

CPU and memory

Requirements:

  • Two physical, not virtual, cores for each worker node.

  • x86_64 (Westmere or newer) and AWS Graviton family processors are supported.

  • 2 GB or more of memory per core.

  • 4 MB of memory for each topic partition replica. You can enforce this requirement in the tunable topic_memory_per_partition property.
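
As a worked example of the memory requirements above, the following sketch estimates two separate lower bounds for a single broker: one from its core count and one from the partition replicas it hosts. It is an illustration, not a sizing tool. It assumes that replicas are spread evenly across brokers and that 1 GB equals 1024 MB; the function names are hypothetical.

```python
MB_PER_PARTITION_REPLICA = 4   # from the requirement above
MB_PER_CORE = 2 * 1024         # 2 GB per core, assuming 1 GB = 1024 MB


def replicas_per_broker(partitions: int, replication_factor: int, brokers: int) -> int:
    """Partition replicas hosted by one broker, assuming an even spread.

    Rounds up, because an uneven split leaves some broker with the
    larger share.
    """
    total_replicas = partitions * replication_factor
    return -(-total_replicas // brokers)  # ceiling division


def partition_memory_mb(partitions: int, replication_factor: int, brokers: int) -> int:
    """Memory (MB) one broker needs for the partition replicas it hosts."""
    return replicas_per_broker(partitions, replication_factor, brokers) * MB_PER_PARTITION_REPLICA


def core_memory_mb(cores: int) -> int:
    """Memory (MB) implied by the per-core requirement."""
    return cores * MB_PER_CORE
```

For example, 3,000 partitions with a replication factor of 3 on 3 brokers gives 3,000 replicas per broker, so partition_memory_mb(3000, 3, 3) is 12,000 MB. A two-core node needs at least core_memory_mb(2), or 4,096 MB, under the per-core requirement. In this example the partition count, not the core count, is the larger of the two bounds.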

Recommendations:

Storage

Requirements:

  • An XFS or ext4 file system.

    The Redpanda data directory (/var/lib/redpanda/data) and the Tiered Storage cache must be mounted on an XFS or ext4 file system.

    For information about supported volume types for different data in Redpanda, see Supported Volume Types for Data in Redpanda.

    Avoid using NFS (Network File System) for the Redpanda data directory or the Tiered Storage cache.

  • A default StorageClass that can provision PersistentVolumes with at least 20Gi of storage.
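
One way to confirm the file system requirement is to look up the mount that backs the data directory in the Linux mount table. The following is a minimal sketch, assuming you run it where the volume is mounted and pass it the text of /proc/mounts. The function names are hypothetical, and the sketch does not handle mount points that contain escaped characters such as spaces.

```python
from typing import Optional

SUPPORTED_FILE_SYSTEMS = {"xfs", "ext4"}  # from the requirement above


def filesystem_for(path: str, mounts_text: str) -> Optional[str]:
    """Return the file system type of the mount that contains `path`.

    `mounts_text` uses the /proc/mounts layout:
    device, mount point, file system type, options, dump, pass.
    The longest matching mount point wins; on a tie, the later entry wins.
    """
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        contains_path = path == mount_point or path.startswith(prefix)
        if contains_path and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def is_supported(path: str, mounts_text: str) -> bool:
    """True if `path` sits on an XFS or ext4 file system."""
    return filesystem_for(path, mounts_text) in SUPPORTED_FILE_SYSTEMS


# Hypothetical mount table for illustration.
SAMPLE_MOUNTS = """\
/dev/root / ext4 rw,relatime 0 0
/dev/nvme1n1 /var/lib/redpanda/data xfs rw,noatime 0 0
nfs-server:/export /mnt/share nfs4 rw,relatime 0 0
"""
```

With this sample table, is_supported("/var/lib/redpanda/data", SAMPLE_MOUNTS) is True, while a cache directory placed under /mnt/share would be reported as unsupported because it resolves to nfs4. On a real node, you could pass the contents of /proc/mounts instead of SAMPLE_MOUNTS.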

Recommendations:

  • Use an XFS file system for its enhanced performance with Redpanda workloads.

  • For setups with multiple disks, use a RAID-0 (striped) array. Striping improves throughput but provides no redundancy, so the failure of a single disk can lead to data loss.

  • Use local PersistentVolumes backed by NVMe disks.

Security

Recommendations:

  • If you’re using a cloud platform, use IAM roles to restrict access to resources in your cluster.

  • Secure your Redpanda cluster with TLS encryption and SASL authentication.

External networking

  • For external access, each worker node in your cluster must have a static, externally accessible IP address.

  • Minimum 10 GigE (10 Gigabit Ethernet) connection to ensure:

    • High data throughput

    • Reduced data transfer latency

    • Scalability for increased network traffic

Tuning

Before deploying Redpanda to production, each worker node that runs Redpanda must be tuned to optimize the Linux kernel for Redpanda processes.

Object storage providers for Tiered Storage

Redpanda supports the following storage providers for Tiered Storage:

  • Amazon Simple Storage Service (S3)

  • Google Cloud Storage (GCS), using the Google Cloud Platform S3 API

  • Azure Blob Storage (ABS)

Cloud instance types

Recommendations:

  • Use a cloud instance type that supports locally attached NVMe devices with an XFS file system. NVMe devices offer high I/O operations per second (IOPS) and minimal latency, while XFS offers enhanced performance with Redpanda workloads.

Amazon

EKS defaults to the ext4 file system. Use XFS instead where possible.

  • General purpose: General-purpose instances provide a balance of compute, memory, and networking resources, and they can be used for a variety of diverse workloads.

  • Memory optimized: Memory-optimized instances are designed to deliver fast performance for workloads that process large data sets in memory.

  • Storage optimized: Storage-optimized instances are designed for workloads that require high, sequential read and write access to very large data sets on local storage. They are optimized to deliver tens of thousands of low-latency, random IOPS to applications.

  • Compute optimized: Compute-optimized instances deliver cost-effective high performance, at a low price-per-compute ratio, for running advanced compute-intensive workloads.

Azure

AKS often defaults to the ext4 file system. Use XFS instead where possible.

Google

GKE often defaults to the ext4 file system. Use XFS instead where possible.

  • General purpose: The general-purpose machine family has the best price-performance with the most flexible vCPU to memory ratios, and provides features that target most standard and cloud-native workloads.

  • Memory optimized: The memory-optimized machine family provides the most compute and memory resources of any Compute Engine machine family offering. It is ideal for workloads that require higher memory-to-vCPU ratios than the high-memory machine types in the general-purpose N1 machine series.

  • Compute optimized: Compute-optimized VM instances are ideal for compute-intensive and high-performance computing (HPC) workloads.