Best Practices for Redpanda in Kubernetes

This topic provides Redpanda's tips and recommendations for Kubernetes deployments.

Use separate YAML files to override Helm chart defaults

Redpanda recommends creating a separate YAML file for each configuration block that you need to override in the Helm chart. The Redpanda documentation follows this best practice. This way, you can tell from the helm command itself which defaults you've overridden. For example, if you override the storage.persistentVolume.storageClass configuration in a file called custom-storage-class.yaml, the helm command might look something like the following:

helm upgrade --install redpanda redpanda/redpanda \
  --namespace redpanda --create-namespace \
  --values custom-storage-class.yaml --reuse-values

The custom-storage-class.yaml filename gives you a hint about how the Helm chart has been customized.

You can pass more than one --values option in the same command. For example, if you also wanted to override the TLS configuration, you could put those overrides in a separate file called enable-tls.yaml and run the following:

helm upgrade --install redpanda redpanda/redpanda \
  --namespace redpanda --create-namespace \
  --values custom-storage-class.yaml \
  --values enable-tls.yaml --reuse-values

Deploy at least three Pod replicas

Redpanda recommends at least three Pod replicas (Redpanda brokers) to use as seed servers. Seed servers are used to bootstrap the gossip process for new brokers joining a cluster. When a new broker joins, it connects to the seed servers to find out the topology of the Redpanda cluster. A larger number of seed servers makes consensus more robust and minimizes the chance of unwanted clusters forming when brokers are restarted without any data.

By default, the Helm chart deploys a StatefulSet with three Redpanda brokers. You can specify the number of Redpanda brokers in the statefulset.replicas configuration.
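
For example, to deploy five brokers, you could put the following override in a file (such as replicas.yaml, a filename chosen for this example) and pass it to helm with --values:

statefulset:
  replicas: 5   # number of Redpanda brokers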

Use PersistentVolumes

Redpanda recommends using PersistentVolumes that are backed by NVMe SSD disks. For scalability, it’s best to use a StorageClass with a dynamic provisioner. Dynamic provisioners make it easier to scale your Redpanda clusters because they provision new volumes when new Redpanda brokers are added to the cluster. Without a dynamic provisioner, you would need to create each PersistentVolume manually in order to bind it to the corresponding PersistentVolumeClaim.

PersistentVolumes should be one of the following types:

  • Local (recommended) - Storage is located on the worker node. Best for high throughput and low latency.

  • Remote - Storage is mounted to the node through the network. Best for high availability.

Local PersistentVolumes can be accessed only by the worker nodes to which they are physically connected. To delay the binding of the PersistentVolumeClaim until the Pod is scheduled, set volumeBindingMode to WaitForFirstConsumer on the StorageClass.

PersistentVolumeClaims are not deleted when Redpanda brokers are removed from a cluster. It is your responsibility to delete PersistentVolumeClaims when they are no longer needed. As a result, it’s important to consider your volume’s reclaim policy when creating your StorageClass:

  • Delete - The volume is deleted when you delete the PersistentVolumeClaim.

  • Retain - The volume is not deleted when you delete the PersistentVolumeClaim. The volume is released, and you can manually reclaim it and make it available to a new PersistentVolumeClaim.
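
For example, you can check the reclaim policy of your PersistentVolumes and, if needed, change it on an existing volume. Replace <pv-name> with the name of your PersistentVolume:

kubectl get pv
kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'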

Local PersistentVolumes

Local PersistentVolumes store Redpanda data on a local disk that’s attached to the worker node. Redpanda reads and writes data faster on a local disk. However, local disks are tied to the lifecycle of the worker node. If the worker node fails for any reason, you may lose access to the Redpanda data.

Because Redpanda uses the Raft protocol to replicate data, it is safe to store data on local disks as long as your Redpanda cluster consists of the following:

  • At least three brokers

  • Topics configured with a replication factor of at least 3

This way, even if a worker node fails and you lose its local disk, the data still exists on at least two other worker nodes.
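
For example, assuming rpk is configured to connect to your cluster, the following command creates a topic (my-topic is a placeholder name) with a replication factor of three:

rpk topic create my-topic --replicas 3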

For greater reliability, Redpanda recommends enabling rack awareness to minimize data loss in the event of a rack failure.
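
As a sketch, the Helm chart exposes rack awareness settings similar to the following; confirm the exact keys against your chart version and make sure the annotation matches the topology labels or annotations on your worker nodes:

rackAwareness:
  enabled: true
  nodeAnnotation: topology.kubernetes.io/zone   # must match your nodes' topology annotation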

Redpanda recommends the Rancher Local Path Provisioner, which provides a dynamic provisioner for local volumes. For example, the following StorageClass uses it:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-path
provisioner: rancher.io/local-path
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
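
You can then reference this StorageClass from the Helm chart, for example in the custom-storage-class.yaml file shown earlier:

storage:
  persistentVolume:
    storageClass: local-path   # must match metadata.name of the StorageClass above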

Remote storage

Remote storage persists data outside of the worker node, so the data is not tied to a specific worker node and can be recovered if that node fails.

To avoid losing Redpanda brokers in cloud deployments, it’s best to host your remote storage in the same availability zone as the Pods that are running the Redpanda brokers. This way, if an availability zone suffers an outage, any Redpanda brokers outside of that availability zone do not lose access to their remote storage and can continue running.

Amazon Web Services (AWS)

Redpanda recommends the io2 or gp3 EBS volume types due to their performance characteristics. For details about Elastic Block Store (EBS) volume types, see the AWS documentation.

If you use Amazon Elastic Kubernetes Service (EKS) and you want to use EBS, make sure to install the Amazon EBS CSI driver. This driver is required to allow EKS to create PersistentVolumes.
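
For example, a StorageClass backed by gp3 volumes and provisioned through the EBS CSI driver might look like the following sketch. The name ebs-gp3 is a placeholder, and you can tune gp3 parameters such as IOPS and throughput for your workload:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3                   # placeholder name
provisioner: ebs.csi.aws.com      # requires the Amazon EBS CSI driver
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer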

Google Cloud Platform (GCP)

Redpanda recommends the Balanced (pd-balanced), SSD (pd-ssd), or Extreme (pd-extreme) persistent disk types due to their performance characteristics. For details about Google persistent disks, see the Google documentation.
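
For example, a StorageClass backed by SSD persistent disks and provisioned through the Compute Engine persistent disk CSI driver might look like the following sketch; the name gcp-pd-ssd is a placeholder:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gcp-pd-ssd                     # placeholder name
provisioner: pd.csi.storage.gke.io     # Compute Engine persistent disk CSI driver
parameters:
  type: pd-ssd
volumeBindingMode: WaitForFirstConsumer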

Use a NodePort Service for external networking

Redpanda recommends using a NodePort Service to expose worker nodes and to give external clients access to the Redpanda brokers running on them. The NodePort Service provides the lowest latency of all the Services because it does not include any unnecessary routing or middleware. Client connections go to the Redpanda brokers in the most direct way possible, through the worker nodes.
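
As a sketch, external access is controlled through the Helm chart's external settings, and NodePort is the chart's default Service type for external listeners; verify the keys against your chart version:

external:
  enabled: true
  type: NodePort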

Depending on your deployment and security policies, you may not be able to access worker nodes through a NodePort Service. If you choose to use another Service, consider the impact on the cost and performance of your deployment:

  • LoadBalancer Service - To make each Redpanda broker accessible with LoadBalancer Services, you need one LoadBalancer Service for each Redpanda broker so that requests can be routed to specific brokers rather than balanced across all brokers. Load balancers are expensive, introduce latency and occasional packet loss, and add an unnecessary layer of complexity.

  • Ingress - To make each Redpanda broker accessible with Ingress, you need to run an Ingress controller and set up routing to each Redpanda broker. Routing adds latency and can be a throughput bottleneck.

Secure your cluster

To protect your Kubernetes cluster, do the following:

  • Deploy Redpanda in a separate namespace to protect your data from other resources in your Kubernetes cluster.

    kubectl create namespace redpanda
  • If you’re using a cloud platform, use IAM roles to restrict access to resources in your cluster.

To protect your Redpanda cluster, enable and configure security features in the Helm chart, such as TLS for encrypted communication and SASL for authentication.
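
As a sketch, TLS and SASL are toggled through Helm values similar to the following; check the exact keys, and the additional user and certificate configuration they require, against your chart version:

tls:
  enabled: true      # enable TLS on the chart's listeners
auth:
  sasl:
    enabled: true    # also configure SASL users, for example through auth.sasl.users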

Set resource requests and limits for memory and CPU

In a production cluster, the resources you allocate to Redpanda should be proportionate to your machine type. Redpanda recommends that you determine and set these values before deploying the cluster, but you can also update the values on a running cluster.

In a running Redpanda cluster, you cannot decrease the number of CPU cores. You can only increase the number of CPU cores.

Redpanda recommends that you allocate the following memory and CPU resources:

  • At least 4 CPU cores.

  • At least 2GiB (2Gi) of memory per core for Redpanda.

  • Memory min and max configurations set to the same values.

    Setting the min and max configurations to the same values makes sure that Kubernetes assigns a Guaranteed Quality of Service (QoS) class to your Pods. Kubernetes uses QoS classes to decide which Pods to evict from a node that runs out of resources. When a node runs out of resources, Kubernetes evicts Pods with a Guaranteed QoS last. For more details about QoS, see the Kubernetes documentation.

For example:

resources:
  cpu:
    cores: 4
  memory:
    container:
      min: 8Gi
      max: 8Gi

If you omit the resources.memory.container.min configuration, it is set to the same value as the resources.memory.container.max configuration.

To determine how many resources are available on your worker nodes, run:

kubectl describe nodes

For instructions on setting Pod resources, see Manage Pod Resources in Kubernetes.