Production Readiness Checklist

Before running a production workload on Redpanda in Kubernetes, follow this readiness checklist. By completing this checklist, you will be able to: Validate a Kubernetes-deployed Redpanda cluster against production readiness standards For Linux deployments, see the Production Readiness Checklist for Linux. Critical requirements The Critical requirements checklist helps ensure that: You have specified all required defaults and configuration items. You have the optimal hardware setup. You have enabled security. You are set up to run in production. Redpanda license If using Enterprise features, validate that you are using a valid Enterprise license: Input kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster license info -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism> Output LICENSE INFORMATION =================== Organization: Your Company Name Type: enterprise Expires: Dec 31 2026 Production deployments using Enterprise features (such as Tiered Storage, Schema Registry, or Continuous Data Balancing) must have a valid Enterprise license with a sufficient expiration date. See also: Redpanda Licensing SASL authentication flags The rpk commands throughout this checklist include SASL authentication flags (-X user, -X pass, -X sasl.mechanism). If your cluster does not use SASL authentication, you can omit these flags from all commands. For example: Input # With SASL authentication kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster health -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism> # Without SASL authentication kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster health Common SASL mechanisms are SCRAM-SHA-256 or SCRAM-SHA-512. Update these values as needed for your deployment. Cluster health Check that all brokers are connected and running. Run rpk cluster health to check the health of the cluster. No nodes should be down, and there should be zero leaderless or under-replicated partitions. Input kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster health -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism> Output CLUSTER HEALTH OVERVIEW ======================= Healthy: true Unhealthy reasons: [] Controller ID: 0 All nodes: [0 1 2] Nodes down: [] Leaderless partitions (0): [] Under-replicated partitions (0): [] Minimum broker count You must have at least three brokers running to ensure production-level fault tolerance. Production clusters should have an odd number of brokers (3, 5, 7, etc.) for optimal consensus behavior. Verify the running broker count: Input kubectl get pods -n <namespace> -l app.kubernetes.io/component=redpanda-statefulset Output NAME READY STATUS RESTARTS AGE redpanda-0 2/2 Running 0 10d redpanda-1 2/2 Running 0 10d redpanda-2 2/2 Running 0 10d Verify the configured replica count in your deployment: Helm Operator Input helm get values redpanda -n <namespace> | grep -A 1 "statefulset:" Output statefulset: replicas: 3 Input kubectl get redpanda redpanda -n <namespace> -o jsonpath='{.spec.clusterSpec.statefulset.replicas}' Output 3 See also: Default Topic Replication Factor Active broker membership Verify that all brokers are in active state and not being decommissioned.
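To automate this check, you can parse the tabular output of the command shown below and fail if any broker is not active and alive. This is a minimal sketch, not official tooling, and it assumes the column layout matches the sample output in this section:

# Flag any broker whose MEMBERSHIP-STATUS is not "active" or whose IS-ALIVE is not "true"
# (column positions follow the sample output below).
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk redpanda admin brokers list \
  -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism> \
  | awk 'NR > 1 && ($3 != "active" || $4 != "true") { print "Broker " $1 ": " $3 " (alive=" $4 ")"; bad=1 } END { exit bad }'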
Decommissioning is used to permanently remove a broker from the cluster, such as during node pool migrations or cluster downsizing. Brokers in a decommissioned state should not be present in production clusters unless actively performing a planned migration. Input kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk redpanda admin brokers list -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism> Output NODE-ID NUM-CORES MEMBERSHIP-STATUS IS-ALIVE BROKER-VERSION 0 4 active true v24.2.4 1 4 active true v24.2.4 2 4 active true v24.2.4 All brokers must show active status. If any broker shows the status draining or decommissioned, investigate immediately. See also: Decommission Brokers No brokers in maintenance mode Check that no brokers are in maintenance mode during normal operations. Maintenance mode is used when modifying brokers that will remain as members of the cluster, such as during rolling upgrades or hardware maintenance. While necessary during planned maintenance windows, brokers should not remain in maintenance mode during normal operations. Input kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster maintenance status -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism> Output NODE-ID ENABLED FINISHED ERRORS PARTITIONS ELIGIBLE TRANSFERRING FAILED 0 false - - - - - - 1 false - - - - - - 2 false - - - - - - All brokers should show ENABLED: false. If any broker shows ENABLED: true outside of a planned maintenance window, investigate immediately. See also: Maintenance Mode Consistent Redpanda version Check that Redpanda is running the latest point release for the major version you’re on and that all brokers run the same version. Verify Redpanda broker versions: Input kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk redpanda admin brokers list -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism> Output NODE-ID NUM-CORES MEMBERSHIP-STATUS IS-ALIVE BROKER-VERSION 0 4 active true v25.2.4 1 4 active true v25.2.4 2 4 active true v25.2.4 All brokers must show the same BROKER-VERSION. Version mismatches between brokers can cause compatibility issues and must be resolved before advancing to production. Verify Helm Chart or Operator version compatibility: For Kubernetes deployments, you must also verify that your deployment tool (Helm Chart or Operator) version is compatible with your Redpanda version. The Helm Chart or Operator version must be within one minor version of the Redpanda version. For example, if running Redpanda v25.2.x, the Helm Chart or Operator version must be v25.1.x, v25.2.x, or v25.3.x. Helm Operator Input helm list -n <namespace> Output NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION redpanda redpanda 1 2024-01-15 10:30:00.123456 -0800 PST deployed redpanda-5.2.4 v25.2.4 The CHART column shows the Helm Chart version (for example, redpanda-5.2.4), which should be compatible with the APP VERSION (Redpanda version). Input kubectl get deployment redpanda-controller-manager -n <namespace> -o jsonpath='{.spec.template.spec.containers[0].image}' Output docker.redpanda.com/redpandadata/redpanda-operator:v25.2.4 The Operator version is shown in the image tag (for example, v25.2.4), which should be compatible with your Redpanda broker version. 
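As a complementary check at the container level, you can list the image each Redpanda Pod is actually running; every Pod should report the same tag. This is a sketch that reuses the label selector from earlier in this checklist:

# Print each Pod name with the image of its redpanda container; all tags should match.
kubectl get pods -n <namespace> -l app.kubernetes.io/component=redpanda-statefulset \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[?(@.name=="redpanda")].image}{"\n"}{end}'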
You can also check the Operator version using: Input kubectl get redpanda redpanda -n <namespace> -o jsonpath='{.metadata.annotations.redpanda\.com/operator-version}' Version compatibility requirements: All Redpanda brokers must run the same version The Helm Chart or Operator version must be within ±1 minor version of the Redpanda version Example: Redpanda v25.2.x requires Helm/Operator v25.1.x, v25.2.x, or v25.3.x Running incompatible versions can lead to deployment failures or cluster instability. Version pinning Verify that versions are explicitly pinned in your deployment configuration: Helm Operator image: tag: v24.2.4 # Pin specific Redpanda version console: enabled: true image: tag: v2.4.5 # Pin specific Console version connectors: enabled: true image: tag: v1.0.15 # Pin specific Connectors version Verify pinned versions: Input helm get values redpanda -n <namespace> Output image: tag: v24.2.4 console: image: tag: v2.4.5 connectors: image: tag: v1.0.15 apiVersion: cluster.redpanda.com/v1alpha2 kind: Redpanda metadata: name: redpanda spec: clusterSpec: image: tag: v24.2.4 # Pin specific Redpanda version console: enabled: true image: tag: v2.4.5 # Pin specific Console version connectors: enabled: true image: tag: v1.0.15 # Pin specific Connectors version Verify pinned versions: Input kubectl get redpanda redpanda -n <namespace> -o yaml | grep -A 1 "tag:" Pin specific versions for Redpanda and all related components (Console, Connectors). This ensures all environments (dev/staging/prod) run the same tested versions, allows controlled upgrade testing before production rollout, and provides rollback capability to known-good versions. Avoid using the latest tag, version ranges (for example, v24.2.x), or unspecified tags, as these can result in unexpected upgrades that introduce breaking changes or cause downtime. Default topic replication factor Check that the default replication factor (≥3) is set appropriately for production. Input kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster config get default_topic_replications -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism> Output 3 Setting default_topic_replications to 3 or greater ensures new topics are created with adequate fault tolerance. See also: Choose the Replication Factor Existing topics replication factor Check that all existing topics have adequate replication (default is 3). Input kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk topic list -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism> Output NAME PARTITIONS REPLICAS _schemas 1 3 orders 12 3 payments 8 3 user-events 16 3 All production topics should have REPLICAS of three or greater. Topics with a replication factor of 1 are at risk of data loss if a broker fails. See also: Change Topic Replication Factor Persistent storage configuration Verify that you have configured persistent storage (not hostPath or emptyDir) for data persistence.
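Before inspecting individual PersistentVolumeClaims, you can confirm at the StatefulSet level that the data directory is backed by a volumeClaimTemplate rather than emptyDir or hostPath. This is a minimal sketch that assumes the StatefulSet is named redpanda and uses the default data path, as in the other examples in this checklist:

# The data volume should come from a volumeClaimTemplate (expect "datadir").
kubectl get statefulset redpanda -n <namespace> -o jsonpath='{.spec.volumeClaimTemplates[*].metadata.name}{"\n"}'

# Cross-check which volume the redpanda container mounts at the data path.
kubectl get statefulset redpanda -n <namespace> -o json \
  | jq -r '.spec.template.spec.containers[] | select(.name == "redpanda") | .volumeMounts[] | select(.mountPath == "/var/lib/redpanda/data") | .name'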
Input kubectl get pvc -n <namespace> Output NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE datadir-redpanda-0 Bound pvc-a1b2c3d4-e5f6-7890-abcd-ef1234567890 100Gi RWO fast-ssd 10d datadir-redpanda-1 Bound pvc-b2c3d4e5-f6g7-8901-bcde-fg2345678901 100Gi RWO fast-ssd 10d datadir-redpanda-2 Bound pvc-c3d4e5f6-g7h8-9012-cdef-gh3456789012 100Gi RWO fast-ssd 10d Verify the StatefulSet uses PersistentVolumeClaims: Input kubectl describe statefulset -n <namespace> redpanda | grep -A 5 "Volume Claims" Output Volume Claims: Name: datadir StorageClass: fast-ssd Labels: <none> Annotations: <none> Capacity: 100Gi HostPath and emptyDir storage are not suitable for production as they lack durability guarantees. See also: Persistent Storage RAID/LVM stripe configuration (multiple disks only) If using multiple physical disks, verify they are configured to stripe data across the disks as RAID-0 or LVM stripe (not linear/concat). Striping distributes data across multiple disks in parallel for improved I/O performance. Input # Check block device configuration on nodes kubectl debug node/<node-name> -it -- chroot /host /bin/bash lsblk -o NAME,TYPE,SIZE,MOUNTPOINT,FSTYPE lvs -o lv_name,stripes,stripe_size mdadm --detail /dev/md* # if using software RAID Output # lsblk output NAME TYPE SIZE MOUNTPOINT FSTYPE nvme0n1 disk 1.8T nvme1n1 disk 1.8T vg0-data lvm 3.6T /var/lib/redpanda xfs # lvs output - note stripes > 1 indicates striping LV #Stripes StripeSize data 2 256.00k Output # mdadm output /dev/md0: Raid Level : raid0 Array Size : 3515625472 (3.27 TiB) Raid Devices : 2 Number Major Minor RaidDevice State 0 259 0 0 active sync /dev/nvme0n1 1 259 1 1 active sync /dev/nvme1n1 Using LVM linear/concat or JBOD instead of stripe/RAID-0 across multiple disks will severely degrade performance because data writes are serialized rather than parallelized. For optimal I/O throughput, configure multiple disks in a striped array that writes data across all disks simultaneously. Single disk configurations do not require striping. See also: Storage Storage performance requirements Ensure storage classes provide adequate IOPS and throughput for your workload by using the following specifications when selecting a storage class: Performance specifications: Use NVMe-based storage classes for production deployments Specify a minimum 16,000 IOPS (Input/Output Operations Per Second) Consider provisioned IOPS where available to meet or exceed the minimum Enable write caching to help Redpanda perform better in environments with disks that don’t meet the recommended IOPS NFS (Network File System) is not supported Test storage performance under load Avoid cloud instance types that use multi-tenant or shared disks, as these can lead to unpredictable performance due to noisy neighbor effects. Examples of instances with shared/multi-tenant storage include AWS is4gen.xlarge and similar instance types across cloud providers. Instead, use instances with dedicated local NVMe storage or provisioned IOPS volumes that guarantee consistent performance. Multi-tenant disks can experience: Unpredictable latency spikes from other tenants' workloads Inconsistent throughput that varies based on neighbor activity IOPS throttling that impacts Redpanda’s performance Difficulty troubleshooting performance issues due to external factors See also: Storage requirements Cloud Instance Types CPU and memory resource limits Verify Pods have resource requests and limits configured.
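In addition to inspecting the resource block directly (shown next), a quick way to confirm that requests equal limits on every broker is to check the QoS class Kubernetes assigned to each Pod. This is a sketch that reuses the label selector from earlier in this checklist:

# Every Redpanda Pod should report QOS = Guaranteed (requests equal limits for CPU and memory).
kubectl get pods -n <namespace> -l app.kubernetes.io/component=redpanda-statefulset \
  -o custom-columns=NAME:.metadata.name,QOS:.status.qosClass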
Input kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[?(@.name=="redpanda")].resources}' | jq Output { "limits": { "cpu": "4", "memory": "8Gi" }, "requests": { "cpu": "4", "memory": "8Gi" } } All Redpanda Pods must have: Identical CPU requests and limits (requests.cpu == limits.cpu) Identical memory requests and limits (requests.memory == limits.memory) Setting requests equal to limits ensures the Pod receives the Guaranteed QoS class, which prevents CPU throttling and reduces the risk of Pod eviction. See also: Manage Pod Resources CPU to memory ratio Ensure adequate memory allocation relative to CPU for optimal performance. Production deployments should provision at least 2 GiB of memory per CPU core. The ratio should be at least 1:2 (2 GiB per core). Verify the CPU to memory ratio in your configuration: Helm Operator Input helm get values redpanda -n <namespace> | grep -A 2 "resources:" Output resources: cpu: cores: 4 memory: container: min: 8Gi max: 8Gi Input kubectl get redpanda redpanda -n <namespace> -o jsonpath='{.spec.clusterSpec.resources}' | jq Output { "cpu": { "cores": 4 }, "memory": { "container": { "min": "8Gi", "max": "8Gi" } } } In the preceding examples, 4 CPU cores with 8 GiB memory provides a 1:2 ratio (2 GiB per core). See also: Memory No fractional CPU requests Ensure CPU requests use whole numbers for consistent performance. Fractional CPUs can lead to performance variability in production. Use whole integer values (4, 8, or 16 are acceptable, while 3.5 or 7.5 are not). Verify CPU configuration: Input kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[?(@.name=="redpanda")].resources.requests.cpu}' Output 4 Authorization enabled Verify Kafka authorization is enabled for access control. Input kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster config get kafka_enable_authorization -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism> Output true Without authorization enabled, any client can access Kafka APIs without authentication. See also: Authorization Production mode enabled Verify that developer mode and overprovisioned mode are disabled for production stability. Check developer mode: Input kubectl exec -n <namespace> <pod-name> -c redpanda -- grep developer_mode /etc/redpanda/redpanda.yaml Output developer_mode: false Developer mode should never be enabled in production environments. Developer mode disables fsync and bypasses safety checks designed for production workloads. Check overprovisioned mode: Input kubectl exec -n <namespace> <pod-name> -c redpanda -- grep overprovisioned /etc/redpanda/redpanda.yaml Output overprovisioned: false Overprovisioned mode bypasses critical resource checks and should never be enabled in production. This mode is intended only for development environments with constrained resources. Verify in Helm values that resources.cpu.overprovisioned is not explicitly set to true (it’s automatically calculated based on CPU allocation). TLS enabled Configure TLS encryption for all client and inter-broker communication. TLS prevents eavesdropping and man-in-the-middle attacks on network traffic. 
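Besides confirming that TLS is enabled on each listener (next), check that the serving certificate is not close to expiry. The sketch below decodes the certificate from its Kubernetes Secret and inspects it locally with openssl; the Secret name placeholder and the tls.crt key are assumptions based on common cert-manager conventions, so adjust them to your setup:

# Decode the broker certificate from its Secret and print its subject and expiry date.
kubectl get secret <tls-secret-name> -n <namespace> -o jsonpath='{.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -subject -enddate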
Verify TLS is enabled on all listeners: Input kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster config export -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism> | grep -A 10 "kafka_api:" Output redpanda: kafka_api: - address: 0.0.0.0 port: 9093 name: internal authentication_method: sasl kafka_api_tls: - name: internal enabled: true cert_file: /etc/tls/certs/tls.crt key_file: /etc/tls/certs/tls.key Required TLS listeners include: kafka_api - Client connections to Kafka API admin_api - Administrative REST API access rpc_server - Inter-broker communication schema_registry - Schema Registry API (if used) Verify certificates are properly mounted: Input kubectl exec -n <namespace> <pod-name> -c redpanda -- ls -la /etc/tls/certs/ Output total 16 -rw-r--r-- 1 redpanda redpanda 1234 Dec 15 10:00 ca.crt -rw-r--r-- 1 redpanda redpanda 1675 Dec 15 10:00 tls.crt -rw------- 1 redpanda redpanda 1704 Dec 15 10:00 tls.key See also: TLS Encryption Authentication enabled Configure appropriate authentication mechanisms to control access to Redpanda resources. Verify SASL users are configured: Input kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk acl user list -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism> Output USERNAME admin app-producer app-consumer monitoring Be sure to adhere to the following authentication requirements: Set up SASL authentication for client connections Configure TLS certificates for encryption (see preceding TLS configuration guidance) Implement proper user management with principle of least privilege Configure ACLs (Access Control Lists) for resource authorization Verify ACLs are configured: Input kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk acl list -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism> Output PRINCIPAL HOST RESOURCE-TYPE RESOURCE-NAME OPERATION PERMISSION User:app-producer * TOPIC orders.* WRITE ALLOW User:app-consumer * TOPIC orders.* READ ALLOW User:app-consumer * GROUP consumer-group-1 READ ALLOW See also: Authentication Authorization Network security Secure network access to the cluster using Kubernetes-native controls. Verify NetworkPolicies are configured: Input kubectl get networkpolicy -n <namespace> Output NAME POD-SELECTOR AGE redpanda-allow-internal app.kubernetes.io/name=redpanda 10d redpanda-allow-clients app.kubernetes.io/name=redpanda 10d redpanda-deny-all-ingress app.kubernetes.io/name=redpanda 10d Check NetworkPolicy rules: Input kubectl describe networkpolicy -n <namespace> Be sure to satisfy the following network security requirements: Configure NetworkPolicies to restrict pod-to-pod communication Use TLS for all client connections (see TLS configuration) Secure admin API endpoints with authentication and authorization Limit ingress traffic to only necessary ports and sources Use Kubernetes Services to control external access Verify services and exposed ports: Input kubectl get svc -n <namespace> Output NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) redpanda ClusterIP None <none> 9093/TCP,9644/TCP,8082/TCP redpanda-external LoadBalancer 10.100.200.50 <pending> 9093:30001/TCP See also: Listener Configuration Pod Disruption Budget Set up PDBs to control voluntary disruptions during maintenance. 
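To check the budget value itself rather than just its presence, read maxUnavailable directly from the PodDisruptionBudget; as explained below, it should be 1. A minimal sketch, assuming the PDB is named redpanda as in the sample output that follows:

# Expect this to print 1 so that only one broker can be voluntarily disrupted at a time.
kubectl get pdb redpanda -n <namespace> -o jsonpath='{.spec.maxUnavailable}'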
Input kubectl get pdb -n <namespace> Output NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE redpanda N/A 1 1 10d Production deployments must have a PodDisruptionBudget with maxUnavailable: 1 to prevent simultaneous broker disruptions during voluntary operations like node drains, upgrades, or autoscaler actions. See also: Kubernetes Pod Disruption Budgets Rack awareness and topology spread Configure topology spread constraints to distribute brokers across availability zones. For configuration instructions, see Multi-AZ deployment. Production deployments require each Redpanda broker to run in a different availability zone to ensure that a single zone failure does not cause loss of quorum. For a three-broker cluster, brokers must be distributed across three separate zones. To verify zone distribution, check your cluster configuration: Verify topologySpreadConstraints are configured in your Helm values or Redpanda CR Confirm nodes have zone labels (typically topology.kubernetes.io/zone) Check that brokers are scheduled on nodes in different zones See also: Rack Awareness Operator CRDs (Operator deployments only) If your deployment uses the Redpanda Operator, all required Custom Resource Definitions (CRDs) must be installed with compatible versions. Without correct CRDs, the Operator cannot manage the cluster, leading to configuration drift, failed updates, and potential data loss. The required CRDs are below: clusters.cluster.redpanda.com - Manages Redpanda cluster configuration topics.cluster.redpanda.com - Manages topic lifecycle users.cluster.redpanda.com - Manages SASL users schemas.cluster.redpanda.com - Manages Schema Registry schemas If any CRDs are missing or incompatible with your Operator version, the Operator will fail to reconcile resources. Verify all required CRDs are installed: Input kubectl get crd | grep redpanda.com Output clusters.cluster.redpanda.com topics.cluster.redpanda.com users.cluster.redpanda.com schemas.cluster.redpanda.com Run Redpanda tuners Check that you have configured tuners for optimal performance. Tuners can significantly impact latency and throughput. In Kubernetes, tuners are configured through the Helm chart or may need to be run on worker nodes themselves. For details, see Tune Kubernetes Worker Nodes for Production. Recommended requirements The Recommended requirements checklist ensures that you can monitor and support your environment on a sustained basis. It includes the following checks: You have adhered to day-2 operations best practices. You can diagnose and recover from backup issues or failures. You have configured monitoring, backup, and security scanning. Deployment method Verify that the deployment method (Helm or Operator) is correctly identified for your cluster. Understanding your deployment method is important for troubleshooting, upgrades, and configuration management. Helm Operator Input helm list -n <namespace> Output NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION redpanda redpanda 1 2024-01-15 10:30:00.123456 -0800 PST deployed redpanda-5.0.0 v24.1.1 The presence of a Helm release (CHART displays redpanda-5.0.0) indicates a Helm-managed deployment. Input kubectl get redpanda -n <namespace> Output NAME READY STATUS redpanda True Redpanda reconciliation succeeded The presence of a Redpanda custom resource indicates an Operator-managed deployment. Knowing your deployment method helps determine which configuration approach to use (Helm values vs. 
Redpanda CR), how to perform upgrades and rollbacks, where to find deployment logs and troubleshooting information, and which documentation sections apply to your environment. See Production Deployment Workflow for the complete deployment process. XFS filesystem Verify that data directories use XFS filesystem for optimal performance. Input kubectl exec -n <namespace> <pod-name> -c redpanda -- df -khT /var/lib/redpanda/data Output Filesystem Type Size Used Avail Use% Mounted on /dev/nvme0n1 xfs 1.8T 14G 1.8T 1% /var/lib/redpanda/data XFS provides better performance characteristics for Redpanda workloads compared to ext4. While ext4 is supported, XFS is strongly recommended for production deployments. See also: Storage Requirements Pod anti-affinity Configure Pod anti-affinity to spread brokers across nodes. Input kubectl get statefulset redpanda -n <namespace> -o jsonpath='{.spec.template.spec.affinity}' | jq Output { "podAntiAffinity": { "requiredDuringSchedulingIgnoredDuringExecution": [ { "labelSelector": { "matchLabels": { "app.kubernetes.io/name": "redpanda" } }, "topologyKey": "kubernetes.io/hostname" } ] } } This prevents single node failures from affecting multiple brokers by ensuring each Redpanda Pod runs on a different node. See also: Pod Anti-Affinity Node isolation Configure taints/tolerations or nodeSelector for workload isolation. Input kubectl get statefulset redpanda -n <namespace> -o jsonpath='{.spec.template.spec.nodeSelector}' | jq Output { "workload-type": "redpanda" } Isolating Redpanda workloads on dedicated nodes improves performance predictability by preventing resource contention with other applications. Partition balancing Configure automatic partition balancing across brokers and CPU cores. Continuous Data Balancing Continuous Data Balancing can help you manage production deployments by automatically rebalancing partition replicas across brokers based on disk usage and node changes. It also eliminates manual intervention and prevents performance degradation. You should enable Continuous Data Balancing for all licensed production clusters. Verify that Continuous Data Balancing is configured: Input kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster config get partition_autobalancing_mode -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism> Output continuous The continuous setting enables automatic partition rebalancing based on: Node additions or removals High disk usage conditions Broker availability changes Without Continuous Data Balancing, partition distribution becomes skewed over time, leading to hotspots and manual rebalancing operations. Core Balancing Intra-broker partition balancing distributes partition replicas across CPU cores within individual brokers. Check core balancing for CPU core partition distribution: Input kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster config get core_balancing_on_core_count_change -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism> Output true When enabled, Redpanda continuously rebalances partitions between CPU cores on a broker for optimal resource utilization, which is especially beneficial after broker restarts or configuration changes. System requirements Run system checks to get more details regarding your system configuration. 
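Because results can differ per node, consider running the check on every broker Pod rather than a single one. A minimal loop sketch (add the -X authentication flags shown below if your cluster uses SASL):

# Run rpk redpanda check on each broker Pod and print the results per Pod.
for pod in $(kubectl get pods -n <namespace> -l app.kubernetes.io/component=redpanda-statefulset -o name); do
  echo "=== ${pod} ==="
  kubectl exec -n <namespace> "${pod}" -c redpanda -- rpk redpanda check
done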
Input kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk redpanda check -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism> Output CONDITION REQUIRED CURRENT SEVERITY PASSED Data directory is writable true true Fatal true Free memory per CPU [MB] >= 2048 8192 Warning true NTP Synced true true Warning true Swappiness 1 1 Warning true Review any failed checks and remediate before proceeding to production. See rpk redpanda check for details on each validation. Debug bundle Verify that you can successfully generate and collect a debug bundle from your cluster. This proactive check ensures that if an issue occurs and you need to contact Redpanda support, you won’t face permission issues or silent collection failures that could delay troubleshooting. Generate a debug bundle: Input kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk debug bundle -o /tmp/bundle.zip For additional options and arguments, see rpk debug bundle. Output Creating bundle file... Collecting cluster info... Collecting logs... Collecting configuration... Debug bundle saved to '/tmp/bundle.zip' Debug bundles collect critical diagnostic information including cluster configuration and metadata, Redpanda logs from all brokers, system resource usage and performance metrics, and Kubernetes resource definitions. When testing bundle generation, watch for permission errors preventing log collection, insufficient disk space for bundle creation, network policies blocking bundle transfer, or RBAC restrictions on accessing Pod logs or exec. Testing bundle generation early ensures this critical troubleshooting tool works when you need it most. Debug bundles are often required by Redpanda support to diagnose production issues efficiently. See also: Diagnostics Bundles in Kubernetes Tiered Storage Configure Tiered Storage for extended data retention using object storage. Tiered Storage automatically offloads older data to cloud storage (S3, GCS, Azure Blob), enabling extended retention without expanding local disk capacity. Verify Tiered Storage configuration: Input kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster config get cloud_storage_enabled -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism> Output true Benefits of Tiered Storage Reduced local storage costs from offloading cold data to cheaper object storage Longer data retention periods without provisioning additional disk Required for advanced features like Remote Read Replicas and Iceberg integration Disaster recovery capabilities through cloud-backed data To verify your Tiered Storage configuration: Input # Check bucket configuration kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster config get cloud_storage_bucket -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism> # Check region/endpoint kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster config get cloud_storage_region -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism> See also: Tiered Storage Security scanning Regularly scan container images and configurations for vulnerabilities to maintain security. 
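For example, with Trivy installed locally you could scan the exact image the cluster is running before each rollout; any of the scanners mentioned below works similarly. A sketch assuming the StatefulSet is named redpanda:

# Look up the image currently deployed, then scan it for high and critical CVEs.
IMAGE=$(kubectl get statefulset redpanda -n <namespace> \
  -o jsonpath='{.spec.template.spec.containers[?(@.name=="redpanda")].image}')
trivy image --severity HIGH,CRITICAL "${IMAGE}"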
Container image scanning Verify that container images are scanned before deployment: Input # Check current image in use kubectl get statefulset redpanda -n <namespace> -o jsonpath='{.spec.template.spec.containers[?(@.name=="redpanda")].image}' Output docker.redpanda.com/redpandadata/redpanda:v24.2.4 Security scanning best practices Security scanning best practices include: Scan images using tools like Trivy, Snyk, or cloud-native scanners before deployment Set up automated scanning in CI/CD pipelines Monitor for CVE announcements and security advisories Keep Redpanda and related components up-to-date with security patches (see Rolling Upgrades) Review Kubernetes RBAC policies and ServiceAccount permissions (see Role Controller) Configuration scanning Input # Scan Kubernetes manifests kubectl get redpanda,statefulset,deployment -n <namespace> -o yaml > cluster-config.yaml # Use kubesec, kube-bench, or similar tools to scan cluster-config.yaml Establish a regular cadence for security scanning (for example, weekly or with each deployment). Backup and recovery Implement and test backup and recovery processes to ensure business continuity. Backup strategy with Tiered Storage Tiered Storage provides built-in backup capabilities by storing data in object storage. Verify Tiered Storage is configured: Input kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster config get cloud_storage_enabled -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism> Recovery testing Regularly test recovery procedures to validate RTO/RPO targets: Input # Test topic restoration from Tiered Storage kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk topic describe <topic-name> -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism> For mission-critical workloads requiring active disaster recovery, consider implementing Shadowing to asynchronously replicate data to a standby cluster. Shadowing provides offset-preserving replication that maintains consumer positions, enabling faster recovery with lower RTO compared to restoration from backups. This Enterprise feature (available in Redpanda v25.3 or later) supports cross-region or cross-cloud disaster recovery with automatic failover capabilities. Configure and validate Tiered Storage for automatic data backup to object storage. Document and regularly test recovery procedures for different failure scenarios in non-production environments. Establish clear Recovery Time Objective (RTO) and Recovery Point Objective (RPO) targets, and maintain runbooks for disaster recovery scenarios. For Shadowing deployments, use the Shadowing Failover Runbook as a starting point. Verify that IAM roles and permissions for object storage access are correctly configured and tested. See also: Whole Cluster Restore Configure Shadowing Shadowing Failover Runbook Audit logging Enable and configure audit logging for compliance and security monitoring requirements. 
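Once audit logging is enabled (verified next), you can confirm that events are actually being recorded by consuming a single record from the audit log topic. A sketch; the topic name matches the output shown later in this section, and the user must be authorized to read the audit topic:

# Read one record from the audit log topic to confirm events are flowing.
kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk topic consume _redpanda.audit_log -n 1 \
  -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism>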
Verify your audit log configuration: Input kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk cluster config get audit_enabled -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism> Output true Check to ensure you know where audit logs are being written: Input # Check audit log topic kubectl exec -n <namespace> <pod-name> -c redpanda -- rpk topic list -X user=<sasl-username> -X pass=<sasl-password> -X sasl.mechanism=<sasl-mechanism> | grep audit Output _redpanda.audit_log 1 3 The output values of 1 and 3 indicate the number of partitions and replicas, respectively, for the audit log topic. For production environments with compliance requirements (SOC 2, HIPAA, PCI DSS, GDPR), forward audit logs to your SIEM system and configure retention policies according to your regulatory obligations. Ensure the audit log topic has adequate replication and retention settings. See also: Audit Logging Monitoring Check that monitoring is configured with Prometheus and Grafana to scrape metrics from all Redpanda brokers. Verify ServiceMonitor is configured: Input kubectl get servicemonitor -n <namespace> System log retention Check that Redpanda logs are being captured and stored for an appropriate period of time (minimally, seven days). Configure log forwarding using tools like Fluentd or your cloud provider’s logging solution to send logs to a central location for troubleshooting and compliance purposes. See also: Diagnostics Bundles in Kubernetes Environment configuration Check that you have a development or test environment configured to evaluate upgrades and configuration changes before applying them to production. Upgrade policy Check that you have an upgrade policy defined and implemented. Redpanda supports rolling upgrades, so upgrades do not require downtime. However, make sure that upgrades are scheduled on a regular basis, ideally using automation with Helm or GitOps workflows. Advanced requirements The Advanced requirements checklist ensures full enterprise readiness, indicates that your system is operating at the highest level of availability, and can prevent or recover from the most serious incidents. The Advanced requirements checklist confirms the following: You are proactively monitoring mission-critical workloads. You have business continuity solutions in place. You have integrated into enterprise security and operational systems. Your enterprise is ready to run mission-critical workloads. Configure alerts A standard set of alerts for Grafana or Prometheus is provided in the GitHub Redpanda observability repo. Customize these alerts for your specific needs. See also: Monitoring Metrics Deployment automation Review your deployment automation. Ensure that cluster configuration is managed using Helm or GitOps workflows, and that all configuration is saved in source control. Monitor security settings Regularly review your cluster’s security settings using the /v1/security/report Admin API endpoint. Investigate and address any issues identified in the alerts section. 
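To focus on actionable findings, you can filter the report down to its alerts; the full response format is shown below. This sketch assumes the Admin API is reachable without TLS or authentication on localhost, as in the example that follows, so adjust the curl options otherwise:

# Summarize only the issues flagged in the alerts section of the security report.
curl -s 'http://localhost:9644/v1/security/report' \
  | jq -r '.alerts[] | "\(.affected_interface)\t\(.listener_name // "-")\t\(.issue)"'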
Input curl 'http://localhost:9644/v1/security/report' Output { "interfaces": { "kafka": [ { "name": "test_kafka_listener", "host": "0.0.0.0", "port": 9092, "advertised_host": "0.0.0.0", "advertised_port": 9092, "tls_enabled": false, "mutual_tls_enabled": false, "authentication_method": "None", "authorization_enabled": false } ], "rpc": { "host": "0.0.0.0", "port": 33145, "advertised_host": "127.0.0.1", "advertised_port": 33145, "tls_enabled": false, "mutual_tls_enabled": false }, "admin": [ { "name": "test_admin_listener", "host": "0.0.0.0", "port": 9644, "tls_enabled": false, "mutual_tls_enabled": false, "authentication_methods": [], "authorization_enabled": false } ] }, "alerts": [ { "affected_interface": "kafka", "listener_name": "test_kafka_listener", "issue": "NO_TLS", "description": "\"kafka\" interface \"test_kafka_listener\" is not using TLS. This is insecure and not recommended." }, { "affected_interface": "kafka", "listener_name": "test_kafka_listener", "issue": "NO_AUTHN", "description": "\"kafka\" interface \"test_kafka_listener\" is not using authentication. This is insecure and not recommended." }, { "affected_interface": "kafka", "listener_name": "test_kafka_listener", "issue": "NO_AUTHZ", "description": "\"kafka\" interface \"test_kafka_listener\" is not using authorization. This is insecure and not recommended." }, { "affected_interface": "rpc", "issue": "NO_TLS", "description": "\"rpc\" interface is not using TLS. This is insecure and not recommended." }, { "affected_interface": "admin", "listener_name": "test_admin_listener", "issue": "NO_TLS", "description": "\"admin\" interface \"test_admin_listener\" is not using TLS. This is insecure and not recommended." }, { "affected_interface": "admin", "listener_name": "test_admin_listener", "issue": "NO_AUTHZ", "description": "\"admin\" interface \"test_admin_listener\" is not using authorization. This is insecure and not recommended." }, { "affected_interface": "admin", "listener_name": "test_admin_listener", "issue": "NO_AUTHN", "description": "\"admin\" interface \"test_admin_listener\" is not using authentication. This is insecure and not recommended." } ] } Suggested reading Deploy for Production Customize the Helm Chart