Production Readiness Checklist

Before running a production workload on Redpanda in Kubernetes, follow this readiness checklist.

By completing this checklist, you will be able to:

  • Validate a Kubernetes-deployed Redpanda cluster against production readiness standards

For Linux deployments, see the Production Readiness Checklist for Linux.

Critical requirements

The Critical requirements checklist helps ensure that:

  • You have specified all required defaults and configuration items.

  • You have the optimal hardware setup.

  • You have enabled security.

  • You are set up to run in production.

Redpanda license

If you use Enterprise features, verify that the cluster has a valid Enterprise license:

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster license info -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>
Output
LICENSE INFORMATION
===================
Organization:      Your Company Name
Type:              enterprise
Expires:           Dec 31 2026

Production deployments that use Enterprise features (such as Tiered Storage, Schema Registry, or Continuous Data Balancing) must have a valid Enterprise license with an expiration date that covers your planned production timeline.
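
If you need to load or update a license, one approach (a sketch, assuming the license file is available on your local machine; check rpk cluster license set --help for the exact flags in your rpk version) is to copy the file into a broker Pod and apply it with rpk:

Input
# Copy the license file into a broker Pod, then load it with rpk.
kubectl cp <path-to-license-file> <namespace>/<pod-name>:/tmp/redpanda.license -c redpanda
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster license set --path /tmp/redpanda.license -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>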

See also: Redpanda Licensing

SASL authentication flags

The rpk commands throughout this checklist include SASL authentication flags (-X user, -X pass, -X sasl.mechanism). If your cluster does not use SASL authentication, you can omit these flags from all commands. For example:

Input
# With SASL authentication
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster health -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>

# Without SASL authentication
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster health

Common SASL mechanisms include SCRAM-SHA-256 and SCRAM-SHA-512. Update these values as needed for your deployment.
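
To avoid repeating these flags, rpk can also read most -X options from environment variables. A sketch (the variable names RPK_USER, RPK_PASS, and RPK_SASL_MECHANISM are the documented mappings, but confirm them with rpk -X help for your rpk version):

Input
# Set credentials once for the exec'd command instead of repeating -X flags.
kubectl exec -n <namespace> <pod-name> -c redpanda -- env \
  RPK_USER=<sasl-username> RPK_PASS=<sasl-password> RPK_SASL_MECHANISM=SCRAM-SHA-256 \
  rpk cluster health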

Cluster health

Check that all brokers are connected and healthy by running rpk cluster health. No brokers should be down, and there should be no leaderless or under-replicated partitions.

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster health -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>
Output
CLUSTER HEALTH OVERVIEW
=======================
Healthy:                          true
Unhealthy reasons:                []
Controller ID:                    0
All nodes:                        [0 1 2]
Nodes down:                       []
Leaderless partitions (0):        []
Under-replicated partitions (0):  []

Minimum broker count

You must have at least three brokers running to ensure production-level fault tolerance.

Production clusters should have an odd number of brokers (3, 5, 7, etc.) for optimal consensus behavior.

Verify the running broker count:

Input
kubectl get pods -n <namespace> -l app.kubernetes.io/component=redpanda-statefulset
Output
NAME         READY   STATUS    RESTARTS   AGE
redpanda-0   2/2     Running   0          10d
redpanda-1   2/2     Running   0          10d
redpanda-2   2/2     Running   0          10d

Verify the configured replica count in your deployment:

  • Helm

  • Operator

Input
helm get values redpanda -n <namespace> | grep -A 1 "statefulset:"
Output
statefulset:
  replicas: 3
Input
kubectl get redpanda redpanda -n <namespace> -o jsonpath='{.spec.clusterSpec.statefulset.replicas}'
Output
3

Active broker membership

Verify that all brokers are in active state and not being decommissioned.

Decommissioning is used to permanently remove a broker from the cluster, such as during node pool migrations or cluster downsizing. Brokers in a decommissioned state should not be present in production clusters unless actively performing a planned migration.

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk redpanda admin brokers list -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>
Output
NODE-ID  NUM-CORES  MEMBERSHIP-STATUS  IS-ALIVE  BROKER-VERSION
0        4          active             true      v24.2.4
1        4          active             true      v24.2.4
2        4          active             true      v24.2.4

All brokers must show active status. If any broker shows the status draining or decommissioned, investigate immediately.

No brokers in maintenance mode

Check that no brokers are in maintenance mode during normal operations.

Maintenance mode is used when modifying brokers that will remain as members of the cluster, such as during rolling upgrades or hardware maintenance. While necessary during planned maintenance windows, brokers should not remain in maintenance mode during normal operations.

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster maintenance status -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>
Output
NODE-ID  ENABLED  FINISHED  ERRORS  PARTITIONS  ELIGIBLE  TRANSFERRING  FAILED
0        false    -         -       -           -         -             -
1        false    -         -       -           -         -             -
2        false    -         -       -           -         -             -

All brokers should show ENABLED: false. If any broker shows ENABLED: true outside of a planned maintenance window, investigate immediately.
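
If a broker was unexpectedly left in maintenance mode, you can take it out with rpk (a sketch; <node-id> is the broker's node ID from the status output):

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster maintenance disable <node-id> -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>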

See also: Maintenance Mode

Consistent Redpanda version

Check that Redpanda is running the latest point release for the major version you’re on and that all brokers run the same version.

Verify Redpanda broker versions:

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk redpanda admin brokers list -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>
Output
NODE-ID  NUM-CORES  MEMBERSHIP-STATUS  IS-ALIVE  BROKER-VERSION
0        4          active             true      v25.2.4
1        4          active             true      v25.2.4
2        4          active             true      v25.2.4

All brokers must show the same BROKER-VERSION. Version mismatches between brokers can cause compatibility issues and must be resolved before advancing to production.

Verify Helm Chart or Operator version compatibility:

For Kubernetes deployments, you must also verify that your deployment tool (Helm Chart or Operator) version is compatible with your Redpanda version. The Helm Chart or Operator version must be within one minor version of the Redpanda version.

For example, if running Redpanda v25.2.x, the Helm Chart or Operator version must be v25.1.x, v25.2.x, or v25.3.x.

  • Helm

  • Operator

Input
helm list -n <namespace>
Output
NAME     NAMESPACE  REVISION  UPDATED                               STATUS    CHART            APP VERSION
redpanda redpanda   1         2024-01-15 10:30:00.123456 -0800 PST deployed  redpanda-5.2.4   v25.2.4

The CHART column shows the Helm Chart version (for example, redpanda-5.2.4), which should be compatible with the APP VERSION (Redpanda version).

Input
kubectl get deployment redpanda-controller-manager -n <namespace> -o jsonpath='{.spec.template.spec.containers[0].image}'
Output
docker.redpanda.com/redpandadata/redpanda-operator:v25.2.4

The Operator version is shown in the image tag (for example, v25.2.4), which should be compatible with your Redpanda broker version.

You can also check the Operator version using:

Input
kubectl get redpanda redpanda -n <namespace> -o jsonpath='{.metadata.annotations.redpanda\.com/operator-version}'

Version compatibility requirements:

  • All Redpanda brokers must run the same version

  • The Helm Chart or Operator version must be within ±1 minor version of Redpanda version

  • Example: Redpanda v25.2.x requires Helm/Operator v25.1.x, v25.2.x, or v25.3.x

  • Running incompatible versions can lead to deployment failures or cluster instability.

Version pinning

Verify that versions are explicitly pinned in your deployment configuration:

  • Helm

  • Operator

image:
  tag: v24.2.4  # Pin specific Redpanda version

console:
  enabled: true
  image:
    tag: v2.4.5  # Pin specific Console version

connectors:
  enabled: true
  image:
    tag: v1.0.15  # Pin specific Connectors version

Verify pinned versions:

Input
helm get values redpanda -n <namespace>
Output
image:
  tag: v24.2.4
console:
  image:
    tag: v2.4.5
connectors:
  image:
    tag: v1.0.15
apiVersion: cluster.redpanda.com/v1alpha2
kind: Redpanda
metadata:
  name: redpanda
spec:
  clusterSpec:
    image:
      tag: v24.2.4  # Pin specific Redpanda version

  console:
    enabled: true
    image:
      tag: v2.4.5  # Pin specific Console version

  connectors:
    enabled: true
    image:
      tag: v1.0.15  # Pin specific Connectors version

Verify pinned versions:

Input
kubectl get redpanda redpanda -n <namespace> -o yaml | grep -A 1 "tag:"

Pin specific versions for Redpanda and all related components (Console, Connectors). This ensures all environments (dev/staging/prod) run the same tested versions, allows controlled upgrade testing before production rollout, and provides rollback capability to known-good versions.

Avoid using the latest tag, version ranges (for example, v24.2.x), or unspecified tags, as these can result in unexpected upgrades that introduce breaking changes or cause downtime.
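
When you apply pinned versions with Helm, reference the chart by an explicit version as well. A sketch (assuming the Redpanda chart repository is already added as redpanda):

Input
# Upgrade (or install) with pinned image tags from values.yaml and a pinned chart version.
helm upgrade --install redpanda redpanda/redpanda -n <namespace> --values values.yaml --version <chart-version>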

Default topic replication factor

Check that the default replication factor (≥3) is set appropriately for production.

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster config get default_topic_replications -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>
Output
3

Setting default_topic_replications to 3 or greater ensures new topics are created with adequate fault tolerance.
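
If the value is lower than 3, raise it with rpk cluster config set (this affects only topics created after the change):

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster config set default_topic_replications 3 -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>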

Existing topics replication factor

Check that all existing topics have adequate replication (default is 3).

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk topic list -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>
Output
NAME              PARTITIONS  REPLICAS
_schemas          1           3
orders            12          3
payments          8           3
user-events       16          3

All production topics should have REPLICAS of three or greater. Topics with a single replica are at risk of data loss if a broker fails.
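
When creating new production topics, set the replication factor explicitly rather than relying on defaults. A sketch (topic name and partition count are placeholders):

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk topic create <topic-name> -p 12 -r 3 -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>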

Persistent storage configuration

Verify that you have configured persistent storage (not hostPath or emptyDir) for data persistence.

Input
kubectl get pvc -n <namespace>
Output
NAME                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
datadir-redpanda-0      Bound    pvc-a1b2c3d4-e5f6-7890-abcd-ef1234567890   100Gi      RWO            fast-ssd       10d
datadir-redpanda-1      Bound    pvc-b2c3d4e5-f6g7-8901-bcde-fg2345678901   100Gi      RWO            fast-ssd       10d
datadir-redpanda-2      Bound    pvc-c3d4e5f6-g7h8-9012-cdef-gh3456789012   100Gi      RWO            fast-ssd       10d

Verify the StatefulSet uses PersistentVolumeClaims:

Input
kubectl describe statefulset -n <namespace> redpanda | grep -A 5 "Volume Claims"
Output
Volume Claims:
  Name:          datadir
  StorageClass:  fast-ssd
  Labels:        <none>
  Annotations:   <none>
  Capacity:      100Gi

HostPath and emptyDir storage are not suitable for production as they lack durability guarantees.
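
In a Helm-based deployment, persistent storage is typically configured through the chart's storage values. A minimal sketch (key names reflect recent chart versions; verify them against your chart's values.yaml):

storage:
  persistentVolume:
    enabled: true
    storageClass: fast-ssd  # Use a production-grade StorageClass
    size: 100Gi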

See also: Persistent Storage

RAID/LVM stripe configuration (multiple disks only)

If using multiple physical disks, verify they are configured to stripe data across the disks as RAID-0 or LVM stripe (not linear/concat). Striping distributes data across multiple disks in parallel for improved I/O performance.

Input
# Check block device configuration on nodes
kubectl debug node/<node-name> -it -- chroot /host /bin/bash
lsblk -o NAME,TYPE,SIZE,MOUNTPOINT,FSTYPE
lvs -o lv_name,stripes,stripe_size
mdadm --detail /dev/md*  # if using software RAID
Output
# lsblk output
NAME          TYPE  SIZE   MOUNTPOINT        FSTYPE
nvme0n1       disk  1.8T
nvme1n1       disk  1.8T
vg0-data      lvm   3.6T   /var/lib/redpanda xfs

# lvs output - note stripes > 1 indicates striping
LV    #Stripes StripeSize
data  2        256.00k
Output
# mdadm output
/dev/md0:
    Raid Level : raid0
    Array Size : 3515625472 (3.27 TiB)
  Raid Devices : 2

    Number   Major   Minor   RaidDevice State
       0     259        0        0      active sync   /dev/nvme0n1
       1     259        1        1      active sync   /dev/nvme1n1

Using LVM linear/concat or JBOD instead of stripe/RAID-0 across multiple disks will severely degrade performance because data writes are serialized rather than parallelized. For optimal I/O throughput, configure multiple disks in a striped array that writes data across all disks simultaneously. Single disk configurations do not require striping.

See also: Storage

Storage performance requirements

Ensure storage classes provide adequate IOPS and throughput for your workload by using the following specifications when selecting a storage class:

Performance specifications:

  • Use NVMe-based storage classes for production deployments

  • Specify a minimum 16,000 IOPS (Input/Output Operations Per Second)

  • Consider provisioned IOPS where available to meet or exceed the minimum

  • Enable write caching to help Redpanda perform better in environments with disks that don’t meet the recommended IOPS

  • NFS (Network File System) is not supported

  • Test storage performance under load

Avoid cloud instance types that use multi-tenant or shared disks, as these can lead to unpredictable performance due to noisy neighbor effects. Examples of instances with shared/multi-tenant storage include AWS is4gen.xlarge and similar instance types across cloud providers. Instead, use instances with dedicated local NVMe storage or provisioned IOPS volumes that guarantee consistent performance.

Multi-tenant disks can experience:

  • Unpredictable latency spikes from other tenants' workloads

  • Inconsistent throughput that varies based on neighbor activity

  • IOPS throttling that impacts Redpanda’s performance

  • Difficulty troubleshooting performance issues due to external factors
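
For cloud block storage, one way to guarantee consistent performance is a StorageClass with provisioned IOPS. A sketch for the AWS EBS CSI driver (provisioner and parameter values are AWS-specific assumptions; adapt them for your provider):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "16000"      # Meets the minimum IOPS recommendation
  throughput: "500"  # MiB/s
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true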


CPU and memory resource limits

Verify Pods have resource requests and limits configured.

Input
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[?(@.name=="redpanda")].resources}' | jq
Output
{
  "limits": {
    "cpu": "4",
    "memory": "8Gi"
  },
  "requests": {
    "cpu": "4",
    "memory": "8Gi"
  }
}

All Redpanda Pods must have:

  • Identical CPU requests and limits (requests.cpu == limits.cpu)

  • Identical memory requests and limits (requests.memory == limits.memory)

Setting requests equal to limits ensures the Pod receives the Guaranteed QoS class, which prevents CPU throttling and reduces the risk of Pod eviction.
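
You can confirm the resulting QoS class directly from the Pod status:

Input
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.qosClass}'
Output
Guaranteed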

CPU to memory ratio

Ensure adequate memory allocation relative to CPU for optimal performance.

Production deployments should provision at least 2 GiB of memory per CPU core, which corresponds to a CPU-to-memory ratio of at least 1:2.

Verify the CPU to memory ratio in your configuration:

  • Helm

  • Operator

Input
helm get values redpanda -n <namespace> | grep -A 2 "resources:"
Output
resources:
  cpu:
    cores: 4
  memory:
    container:
      min: 8Gi
      max: 8Gi
Input
kubectl get redpanda redpanda -n <namespace> -o jsonpath='{.spec.clusterSpec.resources}' | jq
Output
{
  "cpu": {
    "cores": 4
  },
  "memory": {
    "container": {
      "min": "8Gi",
      "max": "8Gi"
    }
  }
}

In the preceding examples, 4 CPU cores with 8 GiB memory provides a 1:2 ratio (2 GiB per core).

See also: Memory

No fractional CPU requests

Ensure CPU requests use whole numbers for consistent performance.

Fractional CPUs can lead to performance variability in production. Use whole integer values (4, 8, or 16 are acceptable, while 3.5 or 7.5 are not).

Verify CPU configuration:

Input
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[?(@.name=="redpanda")].resources.requests.cpu}'
Output
4

Authorization enabled

Verify Kafka authorization is enabled for access control.

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster config get kafka_enable_authorization -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>
Output
true

Without authorization enabled, any client that can connect to the Kafka API can perform any operation on the cluster.
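
If authorization is disabled, enable it with rpk cluster config set. Make sure a superuser and ACLs are in place first so that existing clients are not locked out:

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster config set kafka_enable_authorization true -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>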

See also: Authorization

Production mode enabled

Verify that developer mode and overprovisioned mode are disabled for production stability.

Check developer mode:

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- grep developer_mode /etc/redpanda/redpanda.yaml
Output
developer_mode: false

Developer mode should never be enabled in production environments. Developer mode disables fsync and bypasses safety checks designed for production workloads.

Check overprovisioned mode:

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- grep overprovisioned /etc/redpanda/redpanda.yaml
Output
overprovisioned: false

Overprovisioned mode bypasses critical resource checks and should never be enabled in production. This mode is intended only for development environments with constrained resources.

Verify in Helm values that resources.cpu.overprovisioned is not explicitly set to true (it’s automatically calculated based on CPU allocation).

TLS enabled

Configure TLS encryption for all client and inter-broker communication. TLS prevents eavesdropping and man-in-the-middle attacks on network traffic.

Verify TLS is enabled on all listeners:

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- grep -A 10 "kafka_api" /etc/redpanda/redpanda.yaml
Output
  kafka_api:
    - address: 0.0.0.0
      port: 9093
      name: internal
      authentication_method: sasl
  kafka_api_tls:
    - name: internal
      enabled: true
      cert_file: /etc/tls/certs/tls.crt
      key_file: /etc/tls/certs/tls.key

Required TLS listeners include:

  • kafka_api - Client connections to Kafka API

  • admin_api - Administrative REST API access

  • rpc_server - Inter-broker communication

  • schema_registry - Schema Registry API (if used)

Verify certificates are properly mounted:

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- ls -la /etc/tls/certs/
Output
total 16
-rw-r--r-- 1 redpanda redpanda 1234 Dec 15 10:00 ca.crt
-rw-r--r-- 1 redpanda redpanda 1675 Dec 15 10:00 tls.crt
-rw------- 1 redpanda redpanda 1704 Dec 15 10:00 tls.key
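
In a Helm-based deployment, TLS is controlled through the chart's tls values. A minimal sketch (recent chart versions can issue self-signed certificates through cert-manager by default; verify the exact keys and certificate issuers for your chart version):

tls:
  enabled: true  # Enables TLS on the configured listeners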

See also: TLS Encryption

Authentication enabled

Configure appropriate authentication mechanisms to control access to Redpanda resources.

Verify SASL users are configured:

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk acl user list -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>
Output
USERNAME
admin
app-producer
app-consumer
monitoring

Be sure to adhere to the following authentication requirements:

  • Set up SASL authentication for client connections

  • Configure TLS certificates for encryption (see preceding TLS configuration guidance)

  • Implement proper user management with principle of least privilege

  • Configure ACLs (Access Control Lists) for resource authorization

Verify ACLs are configured:

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk acl list -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>
Output
PRINCIPAL          HOST  RESOURCE-TYPE  RESOURCE-NAME     OPERATION  PERMISSION
User:app-producer  *     TOPIC          orders.*          WRITE      ALLOW
User:app-consumer  *     TOPIC          orders.*          READ       ALLOW
User:app-consumer  *     GROUP          consumer-group-1  READ       ALLOW
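
To add a user and grant it least-privilege access, a sketch using rpk (user name, password, and topic are placeholders):

Input
# Create a SASL user.
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk acl user create app-producer -p '<password>' -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>

# Allow that user to write to a specific topic.
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk acl create --allow-principal User:app-producer --operation write --topic orders -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>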


Network security

Secure network access to the cluster using Kubernetes-native controls.

Verify NetworkPolicies are configured:

Input
kubectl get networkpolicy -n <namespace>
Output
NAME                          POD-SELECTOR                        AGE
redpanda-allow-internal       app.kubernetes.io/name=redpanda    10d
redpanda-allow-clients        app.kubernetes.io/name=redpanda    10d
redpanda-deny-all-ingress     app.kubernetes.io/name=redpanda    10d

Check NetworkPolicy rules:

Input
kubectl describe networkpolicy -n <namespace>

Be sure to satisfy the following network security requirements:

  • Configure NetworkPolicies to restrict pod-to-pod communication

  • Use TLS for all client connections (see TLS configuration)

  • Secure admin API endpoints with authentication and authorization

  • Limit ingress traffic to only necessary ports and sources

  • Use Kubernetes Services to control external access

Verify services and exposed ports:

Input
kubectl get svc -n <namespace>
Output
NAME               TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)
redpanda           ClusterIP      None            <none>        9093/TCP,9644/TCP,8082/TCP
redpanda-external  LoadBalancer   10.100.200.50   <pending>     9093:30001/TCP
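
A minimal NetworkPolicy sketch that admits client traffic to the Kafka listener only from a designated namespace (the labels, namespace, and port 9093 match the examples above and are assumptions to adapt):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: redpanda-allow-clients
  namespace: <namespace>
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: redpanda
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: <client-namespace>
      ports:
        - protocol: TCP
          port: 9093  # Kafka API listener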

Pod Disruption Budget

Set up PDBs to control voluntary disruptions during maintenance.

Input
kubectl get pdb -n <namespace>
Output
NAME       MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
redpanda   N/A             1                 1                     10d

Production deployments must have a PodDisruptionBudget with maxUnavailable: 1 to prevent simultaneous broker disruptions during voluntary operations like node drains, upgrades, or autoscaler actions.
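
In Helm-based deployments, the chart manages the PodDisruptionBudget. A sketch of the relevant values (the budget key path reflects recent chart versions; confirm it against your chart's values.yaml):

statefulset:
  budget:
    maxUnavailable: 1  # Allow at most one broker to be disrupted at a time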

Rack awareness and topology spread

Configure topology spread constraints to distribute brokers across availability zones. For configuration instructions, see Multi-AZ deployment.

Production deployments require each Redpanda broker to run in a different availability zone to ensure that a single zone failure does not cause loss of quorum. For a three-broker cluster, brokers must be distributed across three separate zones.

To verify zone distribution, check your cluster configuration:

  • Verify topologySpreadConstraints are configured in your Helm values or Redpanda CR

  • Confirm nodes have zone labels (typically topology.kubernetes.io/zone)

  • Check that brokers are scheduled on nodes in different zones
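
A quick way to run these checks (assuming nodes carry the standard topology.kubernetes.io/zone label):

Input
# Show each node's zone label.
kubectl get nodes -L topology.kubernetes.io/zone

# Show which node (and therefore zone) each broker Pod landed on.
kubectl get pods -n <namespace> -l app.kubernetes.io/component=redpanda-statefulset -o wide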

See also: Rack Awareness

Operator CRDs (Operator deployments only)

If your deployment uses the Redpanda Operator, all required Custom Resource Definitions (CRDs) must be installed with compatible versions. Without correct CRDs, the Operator cannot manage the cluster, leading to configuration drift, failed updates, and potential data loss.

The required CRDs are:

  • clusters.cluster.redpanda.com - Manages Redpanda cluster configuration

  • topics.cluster.redpanda.com - Manages topic lifecycle

  • users.cluster.redpanda.com - Manages SASL users

  • schemas.cluster.redpanda.com - Manages Schema Registry schemas

If any CRDs are missing or incompatible with your Operator version, the Operator will fail to reconcile resources.

Verify all required CRDs are installed:

Input
kubectl get crd | grep redpanda.com
Output
clusters.cluster.redpanda.com
topics.cluster.redpanda.com
users.cluster.redpanda.com
schemas.cluster.redpanda.com
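
To check which API versions a CRD serves (useful when confirming compatibility with your Operator version), inspect the CRD spec; the versions listed depend on your Operator release:

Input
kubectl get crd clusters.cluster.redpanda.com -o jsonpath='{.spec.versions[*].name}'
Output
v1alpha1 v1alpha2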

Run Redpanda tuners

Check that you have configured tuners for optimal performance. Tuners can significantly impact latency and throughput. In Kubernetes, tuners are configured through the Helm chart or may need to be run on worker nodes themselves. For details, see Tune Kubernetes Worker Nodes for Production.
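
In Helm-based deployments, tuners are exposed under the chart's tuning values. A minimal sketch (assuming your chart version exposes tuning.tune_aio_events; check your chart's values.yaml for the full list of tuner keys):

tuning:
  tune_aio_events: true  # Raise AIO event limits on the worker node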

Recommended requirements

The Recommended requirements checklist ensures that you can monitor and support your environment on a sustained basis. It includes the following checks:

  • You have adhered to day-2 operations best practices.

  • You can diagnose and recover from backup issues or failures.

  • You have configured monitoring, backup, and security scanning.

Deployment method

Verify that the deployment method (Helm or Operator) is correctly identified for your cluster. Understanding your deployment method is important for troubleshooting, upgrades, and configuration management.

  • Helm

  • Operator

Input
helm list -n <namespace>
Output
NAME     NAMESPACE  REVISION  UPDATED                               STATUS    CHART            APP VERSION
redpanda redpanda   1         2024-01-15 10:30:00.123456 -0800 PST deployed  redpanda-5.0.0   v24.1.1

The presence of a Helm release (CHART displays redpanda-5.0.0) indicates a Helm-managed deployment.

Input
kubectl get redpanda -n <namespace>
Output
NAME       READY   STATUS
redpanda   True    Redpanda reconciliation succeeded

The presence of a Redpanda custom resource indicates an Operator-managed deployment.

Knowing your deployment method helps determine which configuration approach to use (Helm values vs. Redpanda CR), how to perform upgrades and rollbacks, where to find deployment logs and troubleshooting information, and which documentation sections apply to your environment. See Production Deployment Workflow for the complete deployment process.

XFS filesystem

Verify that data directories use XFS filesystem for optimal performance.

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- df -khT /var/lib/redpanda/data
Output
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/nvme0n1   xfs   1.8T   14G  1.8T   1% /var/lib/redpanda/data

XFS provides better performance characteristics for Redpanda workloads compared to ext4. While ext4 is supported, XFS is strongly recommended for production deployments.

Pod anti-affinity

Configure Pod anti-affinity to spread brokers across nodes.

Input
kubectl get statefulset redpanda -n <namespace> -o jsonpath='{.spec.template.spec.affinity}' | jq
Output
{
  "podAntiAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": [
      {
        "labelSelector": {
          "matchLabels": {
            "app.kubernetes.io/name": "redpanda"
          }
        },
        "topologyKey": "kubernetes.io/hostname"
      }
    ]
  }
}

This prevents single node failures from affecting multiple brokers by ensuring each Redpanda Pod runs on a different node.

See also: Pod Anti-Affinity

Node isolation

Configure taints/tolerations or nodeSelector for workload isolation.

Input
kubectl get statefulset redpanda -n <namespace> -o jsonpath='{.spec.template.spec.nodeSelector}' | jq
Output
{
  "workload-type": "redpanda"
}

Isolating Redpanda workloads on dedicated nodes improves performance predictability by preventing resource contention with other applications.
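
One way to set this up is to taint and label dedicated nodes, then point the chart (or Redpanda CR) at them. A sketch (the taint key, label, and values are illustrative):

Input
# Reserve nodes for Redpanda.
kubectl taint nodes <node-name> dedicated=redpanda:NoSchedule
kubectl label nodes <node-name> workload-type=redpanda

With matching values in your Helm configuration:

nodeSelector:
  workload-type: redpanda
tolerations:
  - key: dedicated
    operator: Equal
    value: redpanda
    effect: NoSchedule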

Partition balancing

Configure automatic partition balancing across brokers and CPU cores.

Continuous Data Balancing

Continuous Data Balancing helps you manage production deployments by automatically rebalancing partition replicas across brokers in response to disk usage and node changes, reducing the need for manual intervention and helping prevent performance degradation.

You should enable Continuous Data Balancing for all licensed production clusters.

Verify that Continuous Data Balancing is configured:

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster config get partition_autobalancing_mode -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>
Output
continuous

The continuous setting enables automatic partition rebalancing based on:

  • Node additions or removals

  • High disk usage conditions

  • Broker availability changes

Without Continuous Data Balancing, partition distribution becomes skewed over time, leading to hotspots and manual rebalancing operations.
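
If the mode is not set to continuous, enable it with rpk cluster config set (this requires an Enterprise license):

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster config set partition_autobalancing_mode continuous -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>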

Core Balancing

Intra-broker partition balancing distributes partition replicas across CPU cores within individual brokers.

Check core balancing for CPU core partition distribution:

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster config get core_balancing_on_core_count_change -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>
Output
true

When enabled, Redpanda continuously rebalances partitions between CPU cores on a broker for optimal resource utilization, which is especially beneficial after broker restarts or configuration changes.

System requirements

Run system checks to get more details regarding your system configuration.

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk redpanda check -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>
Output
CONDITION                         REQUIRED       CURRENT   SEVERITY  PASSED
Data directory is writable        true           true      Fatal     true
Free memory per CPU [MB]          >= 2048        8192      Warning   true
NTP Synced                        true           true      Warning   true
Swappiness                        1              1         Warning   true

Review any failed checks and remediate before proceeding to production. See rpk redpanda check for details on each validation.

Debug bundle

Verify that you can successfully generate and collect a debug bundle from your cluster. This proactive check ensures that if an issue occurs and you need to contact Redpanda support, you won’t face permission issues or silent collection failures that could delay troubleshooting.

Generate a debug bundle:

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk debug bundle -o /tmp/bundle.zip

For additional options and arguments, see rpk debug bundle.

Output
Creating bundle file...
Collecting cluster info...
Collecting logs...
Collecting configuration...
Debug bundle saved to '/tmp/bundle.zip'

Debug bundles collect critical diagnostic information including cluster configuration and metadata, Redpanda logs from all brokers, system resource usage and performance metrics, and Kubernetes resource definitions.

When testing bundle generation, watch for permission errors preventing log collection, insufficient disk space for bundle creation, network policies blocking bundle transfer, or RBAC restrictions on accessing Pod logs or exec. Testing bundle generation early ensures this critical troubleshooting tool works when you need it most. Debug bundles are often required by Redpanda support to diagnose production issues efficiently.
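
After generating the bundle, copy it out of the Pod so it can be attached to a support ticket:

Input
kubectl cp <namespace>/<pod-name>:/tmp/bundle.zip ./bundle.zip -c redpanda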

Tiered Storage

Configure Tiered Storage for extended data retention using object storage. Tiered Storage automatically offloads older data to cloud storage (S3, GCS, Azure Blob), enabling extended retention without expanding local disk capacity.

Verify Tiered Storage configuration:

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster config get cloud_storage_enabled -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>
Output
true

Benefits of Tiered Storage

  • Reduced local storage costs from offloading cold data to cheaper object storage

  • Longer data retention periods without provisioning additional disk

  • Required for advanced features like Remote Read Replicas and Iceberg integration

  • Disaster recovery capabilities through cloud-backed data

To verify your Tiered Storage configuration:

Input
# Check bucket configuration
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster config get cloud_storage_bucket -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>

# Check region/endpoint
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster config get cloud_storage_region -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>
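
Tiered Storage can also be confirmed or enabled per topic through the redpanda.remote.write and redpanda.remote.read topic properties. A sketch (the topic name is a placeholder):

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk topic alter-config <topic-name> --set redpanda.remote.write=true --set redpanda.remote.read=true -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>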

See also: Tiered Storage

Security scanning

Regularly scan container images and configurations for vulnerabilities to maintain security.

Container image scanning

Verify that container images are scanned before deployment:

Input
# Check current image in use
kubectl get statefulset redpanda -n <namespace> -o jsonpath='{.spec.template.spec.containers[?(@.name=="redpanda")].image}'
Output
docker.redpanda.com/redpandadata/redpanda:v24.2.4
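
For example, with Trivy installed locally, you can scan the exact image in use (substitute the image reported by the previous command):

Input
trivy image docker.redpanda.com/redpandadata/redpanda:v24.2.4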

Security scanning best practices

Security scanning best practices include:

  • Scan images using tools like Trivy, Snyk, or cloud-native scanners before deployment

  • Set up automated scanning in CI/CD pipelines

  • Monitor for CVE announcements and security advisories

  • Keep Redpanda and related components up-to-date with security patches (see Rolling Upgrades)

  • Review Kubernetes RBAC policies and ServiceAccount permissions (see Role Controller)

Configuration scanning

Input
# Scan Kubernetes manifests
kubectl get redpanda,statefulset,deployment -n <namespace> -o yaml > cluster-config.yaml
# Use kubesec, kube-bench, or similar tools to scan cluster-config.yaml

Establish a regular cadence for security scanning (for example, weekly or with each deployment).

Backup and recovery

Implement and test backup and recovery processes to ensure business continuity.

Backup strategy with Tiered Storage

Tiered Storage provides built-in backup capabilities by storing data in object storage. Verify Tiered Storage is configured:

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster config get cloud_storage_enabled -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>

Recovery testing

Regularly test recovery procedures to validate RTO/RPO targets:

Input
# Test topic restoration from Tiered Storage
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk topic describe <topic-name> -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>

For mission-critical workloads requiring active disaster recovery, consider implementing Shadowing to asynchronously replicate data to a standby cluster. Shadowing provides offset-preserving replication that maintains consumer positions, enabling faster recovery with lower RTO compared to restoration from backups. This Enterprise feature (available in Redpanda v25.3 or later) supports cross-region or cross-cloud disaster recovery with automatic failover capabilities.

Configure and validate Tiered Storage for automatic data backup to object storage. Document and regularly test recovery procedures for different failure scenarios in non-production environments. Establish clear Recovery Time Objective (RTO) and Recovery Point Objective (RPO) targets, and maintain runbooks for disaster recovery scenarios. For Shadowing deployments, use the Shadowing Failover Runbook as a starting point. Verify that IAM roles and permissions for object storage access are correctly configured and tested.


Audit logging

Enable and configure audit logging for compliance and security monitoring requirements.

Verify your audit log configuration:

Input
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster config get audit_enabled -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>
Output
true

Check to ensure you know where audit logs are being written:

Input
# Check audit log topic
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk topic list -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism> | grep audit
Output
_redpanda.audit_log    1    3

The output values of 1 and 3 indicate the number of partitions and replicas, respectively, for the audit log topic.

For production environments with compliance requirements (SOC 2, HIPAA, PCI DSS, GDPR), forward audit logs to your SIEM system and configure retention policies according to your regulatory obligations. Ensure the audit log topic has adequate replication and retention settings.

See also: Audit Logging

Monitoring

Check that monitoring is configured with Prometheus and Grafana to scrape metrics from all Redpanda brokers.

Verify ServiceMonitor is configured:

Input
kubectl get servicemonitor -n <namespace>
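
If no ServiceMonitor exists, Helm-based deployments can create one through the chart's monitoring values. A sketch (this assumes the Prometheus Operator CRDs are installed in the cluster):

monitoring:
  enabled: true  # Creates a ServiceMonitor that Prometheus Operator can discover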

System log retention

Check that Redpanda logs are being captured and stored for an appropriate period of time (at least seven days). Configure log forwarding using tools like Fluentd or your cloud provider’s logging solution to send logs to a central location for troubleshooting and compliance purposes.

Environment configuration

Check that you have a development or test environment configured to evaluate upgrades and configuration changes before applying them to production.

Upgrade policy

Check that you have an upgrade policy defined and implemented. Redpanda supports rolling upgrades, so upgrades do not require downtime. However, make sure that upgrades are scheduled on a regular basis, ideally using automation with Helm or GitOps workflows.

Advanced requirements

The Advanced requirements checklist ensures full enterprise readiness: your system operates at the highest level of availability and can prevent or recover from the most serious incidents. It confirms the following:

  • You are proactively monitoring mission-critical workloads.

  • You have business continuity solutions in place.

  • You have integrated into enterprise security and operational systems.

  • Your enterprise is ready to run mission-critical workloads.

Configure alerts

A standard set of alerts for Grafana or Prometheus is provided in the GitHub Redpanda observability repo. Customize these alerts for your specific needs.

See also: Monitoring Metrics

Deployment automation

Review your deployment automation. Ensure that cluster configuration is managed using Helm or GitOps workflows, and that all configuration is saved in source control.

Monitor security settings

Regularly review your cluster’s security settings using the /v1/security/report Admin API endpoint. Investigate and address any issues identified in the alerts section.

Input
curl 'http://localhost:9644/v1/security/report'
View output
{
  "interfaces": {
    "kafka": [
      {
        "name": "test_kafka_listener",
        "host": "0.0.0.0",
        "port": 9092,
        "advertised_host": "0.0.0.0",
        "advertised_port": 9092,
        "tls_enabled": false,
        "mutual_tls_enabled": false,
        "authentication_method": "None",
        "authorization_enabled": false
      }
    ],
    "rpc": {
      "host": "0.0.0.0",
      "port": 33145,
      "advertised_host": "127.0.0.1",
      "advertised_port": 33145,
      "tls_enabled": false,
      "mutual_tls_enabled": false
    },
    "admin": [
      {
        "name": "test_admin_listener",
        "host": "0.0.0.0",
        "port": 9644,
        "tls_enabled": false,
        "mutual_tls_enabled": false,
        "authentication_methods": [],
        "authorization_enabled": false
      }
    ]
  },
  "alerts": [
    {
      "affected_interface": "kafka",
      "listener_name": "test_kafka_listener",
      "issue": "NO_TLS",
      "description": "\"kafka\" interface \"test_kafka_listener\" is not using TLS. This is insecure and not recommended."
    },
    {
      "affected_interface": "kafka",
      "listener_name": "test_kafka_listener",
      "issue": "NO_AUTHN",
      "description": "\"kafka\" interface \"test_kafka_listener\" is not using authentication. This is insecure and not recommended."
    },
    {
      "affected_interface": "kafka",
      "listener_name": "test_kafka_listener",
      "issue": "NO_AUTHZ",
      "description": "\"kafka\" interface \"test_kafka_listener\" is not using authorization. This is insecure and not recommended."
    },
    {
      "affected_interface": "rpc",
      "issue": "NO_TLS",
      "description": "\"rpc\" interface is not using TLS. This is insecure and not recommended."
    },
    {
      "affected_interface": "admin",
      "listener_name": "test_admin_listener",
      "issue": "NO_TLS",
      "description": "\"admin\" interface \"test_admin_listener\" is not using TLS. This is insecure and not recommended."
    },
    {
      "affected_interface": "admin",
      "listener_name": "test_admin_listener",
      "issue": "NO_AUTHZ",
      "description": "\"admin\" interface \"test_admin_listener\" is not using authorization. This is insecure and not recommended."
    },
    {
      "affected_interface": "admin",
      "listener_name": "test_admin_listener",
      "issue": "NO_AUTHN",
      "description": "\"admin\" interface \"test_admin_listener\" is not using authentication. This is insecure and not recommended."
    }
  ]
}