Designed for Performance
Great news! Redpanda comes with an autotuner that detects the optimal settings for your hardware. To get the best performance, set Redpanda to production mode. In production mode, Redpanda identifies your hardware configuration and tunes itself accordingly.
The performance settings listed here are just for general reference.
Disk
Redpanda uses DMA (Direct Memory Access) for all its disk IO. To get the best IO performance, we recommend that you place the data directory (/var/lib/redpanda/data) on an XFS partition on a local NVMe SSD. Redpanda can drive your SSD at maximum throughput at all times. Redpanda relies on XFS because of its sparse file support, which Redpanda uses to flush concurrent, non-overlapping pages. Although other file systems might work, they may have limitations that prevent you from getting the most value out of your hardware.
We recommend not using networked block devices because of their inherent performance limitations.
For multi-disk setups, we recommend using RAID-0 with XFS on top. Future releases will manage multi-disk virtualization without user involvement.
While monitoring, you might notice that file sizes on the file system jump around. This is expected behavior: Redpanda uses internal heuristics to expand the file system metadata eagerly when it determines that doing so would improve performance for a sequence of operations or amortize the cost of synchronization events.
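To confirm that a data directory actually sits on XFS, you can look up its mount point in /proc/mounts. The following is a minimal Python sketch, not part of Redpanda or rpk: the function name filesystem_type_for and the sample mount table are illustrative assumptions, and the parser ignores the octal escapes (such as \040 for a space) that /proc/mounts uses in unusual mount-point names.

```python
import os


def filesystem_type_for(path, mounts_text):
    """Return (mount_point, fs_type) for the deepest mount point containing path.

    mounts_text is expected to look like the contents of /proc/mounts:
    one mount per line, fields separated by whitespace, with the mount
    point in the second field and the file system type in the third.
    Returns None if no mount point matches.
    """
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        # Compare with trailing slashes so "/var/lib" does not match "/var/libx".
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            # Prefer the longest mount point; on ties the later line wins,
            # because a later mount shadows an earlier one.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best


# Hypothetical mount table, used here instead of the real /proc/mounts.
SAMPLE_MOUNTS = """\
/dev/nvme0n1p1 / ext4 rw,relatime 0 0
/dev/md0 /var/lib/redpanda xfs rw,noatime 0 0
"""

result = filesystem_type_for("/var/lib/redpanda/data", SAMPLE_MOUNTS)
print(result)
```

On a real Linux host, pass the text read from /proc/mounts instead of SAMPLE_MOUNTS and check that the reported type is xfs.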
Network
Modern NICs can drive multi-gigabit traffic to hosts. rpk probes the hardware (taking into account the number of CPUs, among other factors) and automatically chooses the best setting to drive high-throughput traffic to the machine. The modes are all but cpu0, cpu0 + Hyper Thread sibling, or distributed across all cores. rpk also tunes other settings, such as backlog and max sockets, regardless of whether the NIC is bonded. You never need to manage any of these low-level settings yourself. In most production scenarios, interrupt handling is distributed across all cores, which spreads the cost of interrupt processing evenly among them.
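To see how a NIC's interrupts are spread across cores on a Linux host, you can total the per-CPU counters in /proc/interrupts. The sketch below is only an observation aid and does not reproduce what rpk does: the function name irq_totals_per_cpu, the device name eth0, and the sample table are assumptions made for illustration.

```python
def irq_totals_per_cpu(interrupts_text, device):
    """Sum interrupt counts per CPU for IRQ lines whose description mentions device.

    interrupts_text is expected to look like /proc/interrupts: a header row
    naming the CPUs, then one row per IRQ with a label, one counter per CPU,
    and a free-form description.
    """
    lines = interrupts_text.splitlines()
    if not lines:
        return []
    cpu_count = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = [0] * cpu_count
    for line in lines[1:]:
        fields = line.split()
        # Need the IRQ label, one counter per CPU, and at least one description word.
        if len(fields) < cpu_count + 2:
            continue
        counts = fields[1:1 + cpu_count]
        description = " ".join(fields[1 + cpu_count:])
        if device not in description:
            continue
        if not all(count.isdigit() for count in counts):
            continue
        for cpu, count in enumerate(counts):
            totals[cpu] += int(count)
    return totals


# Hypothetical two-core sample in the /proc/interrupts layout.
SAMPLE_INTERRUPTS = (
    "           CPU0       CPU1\n"
    "  24:        100          0   PCI-MSI 524288-edge      eth0-TxRx-0\n"
    "  25:          0        250   PCI-MSI 524289-edge      eth0-TxRx-1\n"
    "  26:          7          3   PCI-MSI 1048576-edge     nvme0q0\n"
)

eth0_totals = irq_totals_per_cpu(SAMPLE_INTERRUPTS, "eth0")
print(eth0_totals)
```

A roughly even set of totals suggests interrupts are distributed across cores, while a single dominant entry suggests they are pinned to one core.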
Cgroups
To run at peak performance for extended periods, we leverage cgroups to isolate the Redpanda processes. This shields Redpanda from “noisy neighbors”: processes running alongside Redpanda that compete for shared resources and adversely affect its performance.
We also leverage systemd slices. We instruct the kernel to strongly prefer evicting other processes before evicting Redpanda's memory, and to reserve IO quotas and CPU time. This way, even when other processes are competing for resources, we still deliver predictable latency and throughput to end users.
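To check which cgroup (and therefore which systemd slice) a running process belongs to, you can parse /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path. This is a minimal sketch under stated assumptions: the helper names and the slice and unit names redpanda.slice and redpanda.service in the sample are illustrative and may not match your installation.

```python
def parse_proc_cgroup(text):
    """Parse the contents of /proc/<pid>/cgroup.

    Returns a list of (hierarchy_id, controllers, path) tuples. On cgroup v2
    there is a single line with hierarchy ID 0 and an empty controller list.
    """
    entries = []
    for line in text.splitlines():
        parts = line.strip().split(":", 2)
        if len(parts) != 3 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        hierarchy_id, controllers, path = parts
        controller_list = controllers.split(",") if controllers else []
        entries.append((int(hierarchy_id), controller_list, path))
    return entries


def in_slice(text, slice_name):
    """Return True if any cgroup path has slice_name as one of its components."""
    return any(
        slice_name in path.split("/")
        for _, _, path in parse_proc_cgroup(text)
    )


# Hypothetical cgroup v2 entry; the slice and unit names are assumptions.
SAMPLE_CGROUP = "0::/redpanda.slice/redpanda.service\n"

print(parse_proc_cgroup(SAMPLE_CGROUP))
print(in_slice(SAMPLE_CGROUP, "redpanda.slice"))
```

On a live system, read /proc/<pid>/cgroup for the process you care about and pass its contents to in_slice.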
CPU
Frequently, the default CPU configuration is optimized for typical end-user use cases, such as non-CPU-intensive desktop applications, and for low power usage. Redpanda disables all power-saving modes and ensures that the CPU is configured for predictable latency at all times. We designed Redpanda to drive machines at around 90% utilization and still give the user predictable, low-latency results.
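One visible power-saving setting on Linux is the cpufreq scaling governor, which is exposed per CPU under /sys/devices/system/cpu. The sketch below lists CPUs whose governor is not performance. It assumes that the performance governor is the setting of interest, which the text above does not state explicitly, and the function names are illustrative.

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return a {cpu_name: governor} mapping read from sysfs.

    CPUs without a cpufreq directory (common in virtual machines) are
    simply absent from the result.
    """
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for governor_file in sorted(Path(cpu_root).glob(pattern)):
        try:
            cpu_name = governor_file.parent.parent.name  # for example "cpu0"
            governors[cpu_name] = governor_file.read_text().strip()
        except OSError:
            continue  # the file vanished or is unreadable; skip this CPU
    return governors


def cpus_not_in_performance_mode(governors):
    """Return the sorted names of CPUs whose governor is not 'performance'."""
    return sorted(
        cpu for cpu, governor in governors.items() if governor != "performance"
    )


# Hypothetical readings, standing in for read_governors() on a real host.
sample = {"cpu0": "performance", "cpu1": "powersave", "cpu2": "schedutil"}
print(cpus_not_in_performance_mode(sample))
```

On a real host, call cpus_not_in_performance_mode(read_governors()); an empty list means every CPU that reports a governor is in performance mode.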
Memory
Swapping is prevented so that Redpanda is never swapped out of memory. By design, Redpanda allocates nearly all of the available memory upfront, partitions the allocated memory between all cores, and pins each partition to the specified NUMA domain (a specific CPU socket). This ensures predictable memory allocations and, in turn, predictable latency.
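The per-core partitioning described above comes down to simple arithmetic. The following sketch assumes a hypothetical reservation left for the operating system; the function name, the 4 GiB reserve, and the machine size are illustrative only and do not reflect Redpanda's actual allocation policy.

```python
GIB = 1024 ** 3


def per_core_memory(total_bytes, reserved_bytes, cores):
    """Split the memory left after a reservation evenly between cores.

    Returns the number of whole bytes available to each core.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    if reserved_bytes < 0 or reserved_bytes >= total_bytes:
        raise ValueError("reserved_bytes must be in [0, total_bytes)")
    return (total_bytes - reserved_bytes) // cores


# Hypothetical machine: 64 GiB of RAM, 4 GiB left for the OS, 16 cores.
share = per_core_memory(64 * GIB, 4 * GIB, 16)
print(share / GIB)  # 60 GiB / 16 cores = 3.75 GiB per core
```

Because each core works only within its own share, allocation cost does not depend on what other cores are doing, which is one way to read the predictable-latency claim above.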