Deploy a Redpanda Cluster in Amazon Elastic Kubernetes Service

Deploy a secure Redpanda cluster and Redpanda Console in Amazon Elastic Kubernetes Service (EKS). Then, use rpk both as an internal client and an external client to interact with your Redpanda cluster from the command line. Your Redpanda cluster has the following security features:

  • SASL for authenticating users' connections.

  • TLS with self-signed certificates for secure communication between the cluster and clients.

Prerequisites

Before you begin, make sure that you have the following.

IAM user

You need an IAM user with at least the following policies. See the AWS documentation for help creating IAM users or for help troubleshooting IAM.

Policies

Replace <account-id> with your own account ID.

AmazonEC2FullAccess
{
  "Version": "2012-10-17",
  "Statement": [
    {
     "Action": "ec2:*",
     "Effect": "Allow",
     "Resource": "*"
    },
    {
     "Effect": "Allow",
     "Action": "elasticloadbalancing:*",
     "Resource": "*"
    },
    {
     "Effect": "Allow",
     "Action": "cloudwatch:*",
     "Resource": "*"
    },
    {
     "Effect": "Allow",
     "Action": "autoscaling:*",
     "Resource": "*"
    },
    {
     "Effect": "Allow",
     "Action": "iam:CreateServiceLinkedRole",
     "Resource": "*",
     "Condition": {
      "StringEquals": {
        "iam:AWSServiceName": [
           "autoscaling.amazonaws.com",
           "ec2scheduled.amazonaws.com",
           "elasticloadbalancing.amazonaws.com",
           "spot.amazonaws.com",
           "spotfleet.amazonaws.com",
           "transitgateway.amazonaws.com"
        ]
      }
     }
    }
  ]
}
AWSCloudFormationFullAccess
{
  "Version": "2012-10-17",
  "Statement": [
    {
     "Effect": "Allow",
     "Action": [
      "cloudformation:*"
     ],
     "Resource": "*"
    }
  ]
}
EksAllAccess
{
  "Version": "2012-10-17",
  "Statement": [
    {
     "Effect": "Allow",
     "Action": "eks:*",
     "Resource": "*"
    },
    {
     "Action": [
      "ssm:GetParameter",
      "ssm:GetParameters"
     ],
     "Resource": [
      "arn:aws:ssm:*:<account-id>:parameter/aws/*",
      "arn:aws:ssm:*::parameter/aws/*"
     ],
     "Effect": "Allow"
    },
    {
     "Action": [
       "kms:CreateGrant",
       "kms:DescribeKey"
     ],
     "Resource": "*",
     "Effect": "Allow"
    },
    {
     "Action": [
       "logs:PutRetentionPolicy"
     ],
     "Resource": "*",
     "Effect": "Allow"
    }
  ]
}
IamLimitedAccess
{
  "Version": "2012-10-17",
  "Statement": [
    {
     "Effect": "Allow",
     "Action": [
      "iam:CreateInstanceProfile",
      "iam:DeleteInstanceProfile",
      "iam:GetInstanceProfile",
      "iam:RemoveRoleFromInstanceProfile",
      "iam:GetRole",
      "iam:CreateRole",
      "iam:DeleteRole",
      "iam:AttachRolePolicy",
      "iam:PutRolePolicy",
      "iam:ListInstanceProfiles",
      "iam:AddRoleToInstanceProfile",
      "iam:ListInstanceProfilesForRole",
      "iam:PassRole",
      "iam:DetachRolePolicy",
      "iam:DeleteRolePolicy",
      "iam:GetRolePolicy",
      "iam:GetOpenIDConnectProvider",
      "iam:CreateOpenIDConnectProvider",
      "iam:DeleteOpenIDConnectProvider",
      "iam:TagOpenIDConnectProvider",
      "iam:ListAttachedRolePolicies",
      "iam:TagRole",
      "iam:GetPolicy",
      "iam:CreatePolicy",
      "iam:DeletePolicy",
      "iam:ListPolicyVersions"
     ],
     "Resource": [
      "arn:aws:iam::<account-id>:instance-profile/eksctl-*",
      "arn:aws:iam::<account-id>:role/eksctl-*",
      "arn:aws:iam::<account-id>:policy/eksctl-*",
      "arn:aws:iam::<account-id>:oidc-provider/*",
      "arn:aws:iam::<account-id>:role/aws-service-role/eks-nodegroup.amazonaws.com/AWSServiceRoleForAmazonEKSNodegroup",
      "arn:aws:iam::<account-id>:role/eksctl-managed-*",
      "arn:aws:iam::<account-id>:role/AmazonEKS_EBS_CSI_DriverRole"
     ]
    },
    {
     "Effect": "Allow",
     "Action": [
      "iam:GetRole"
     ],
     "Resource": [
      "arn:aws:iam::<account-id>:role/*"
     ]
    },
    {
     "Effect": "Allow",
     "Action": [
      "iam:CreateServiceLinkedRole"
     ],
     "Resource": "*",
     "Condition": {
      "StringEquals": {
        "iam:AWSServiceName": [
           "eks.amazonaws.com",
           "eks-nodegroup.amazonaws.com",
           "eks-fargate.amazonaws.com"
        ]
      }
     }
    }
  ]
}

AWS CLI

You need the AWS CLI to configure kubeconfig and get information about your EC2 instances.

After you’ve installed the AWS CLI, make sure to configure it with credentials for your IAM user.

If your account uses an identity provider in the IAM Identity Center (previously AWS SSO), authenticate with the IAM Identity Center (aws sso login).

For troubleshooting, see the AWS CLI documentation.

eksctl

You need eksctl to create an EKS cluster from the command line.

jq

You need jq to parse JSON results and store the value in environment variables.

kubectl

You must have kubectl with the following minimum required Kubernetes version: 1.21

To check if you have kubectl installed:

kubectl version --client

Helm

You must have the following minimum required version of Helm: 3.10.0

To check if you have Helm installed:

helm version
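The minimum versions above can also be checked in a script. The following is a minimal sketch, not part of any official tooling: version_ge is a hypothetical helper name, and the comparison assumes plain dotted numeric versions such as 3.10.0, with any leading v already removed.

```shell
# version_ge A B: succeeds when dotted version A is >= dotted version B.
# Hypothetical helper for illustration; assumes purely numeric components.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      # Missing components count as 0; "+ 0" forces a numeric comparison.
      if ((x[i] + 0) > (y[i] + 0)) exit 0
      if ((x[i] + 0) < (y[i] + 0)) exit 1
    }
    exit 0
  }'
}

# Example: compare a sample Helm version string against the 3.10.0 minimum.
if version_ge "3.12.1" "3.10.0"; then
  echo "Helm version OK"
else
  echo "Helm 3.10.0 or later is required"
fi
```

To check the installed Helm instead of the sample string, you could pass in the output of helm version --template '{{.Version}}' after stripping its leading v.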

Create an EKS cluster

Your EKS cluster must have one worker node available for each Redpanda broker that you plan to deploy in your Redpanda cluster. You also need to run the worker nodes on an EC2 instance type that supports the requirements and recommendations for production deployments.

In this step, you create an EKS cluster with three nodes on c5d.2xlarge instance types. Deploying three nodes allows your EKS cluster to support a Redpanda cluster with three brokers. The c5d.2xlarge instance type comes with:

  • 2 cores per worker node, which is a requirement for production.

  • Local NVMe disks, which are recommended for best performance.

The Helm chart configures podAntiAffinity rules to make sure that only one Pod running a Redpanda broker is scheduled on each worker node.
  1. Create an EKS cluster and give it a unique name. If your account is configured with OIDC, add the --with-oidc flag to the create cluster command.

    eksctl create cluster \
      --name <cluster-name> \
      --nodegroup-name nvme-workers \
      --node-type c5d.2xlarge \
      --nodes 3 \
      --external-dns-access

    To see all options:

    eksctl create cluster --help

    For more help creating an EKS cluster, see Creating and managing clusters in the eksctl documentation.

  2. Make sure that your local kubeconfig file points to your EKS cluster:

    kubectl get service

    You should see a ClusterIP Service called kubernetes.

    If the kubectl command cannot connect to your cluster, update your local kubeconfig file to point to your EKS cluster.

    Your default region is in the ~/.aws/config file.

    aws eks update-kubeconfig --region <region> --name <cluster-name>
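Because each Redpanda broker needs its own worker node, it can help to confirm that all three nodes are ready before continuing. The following is a minimal sketch under stated assumptions: count_ready_nodes is a hypothetical helper, and the sample input (made-up node names) stands in for real kubectl get nodes --no-headers output, whose second column is the node status.

```shell
# count_ready_nodes: reads `kubectl get nodes --no-headers` output on stdin
# and prints how many nodes report a STATUS of exactly "Ready".
# Nodes with a compound status such as "Ready,SchedulingDisabled" are not counted.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Sample input standing in for real kubectl output (node names are made up).
count_ready_nodes <<'EOF'
ip-192-0-2-10.ec2.internal   Ready      <none>   5m   v1.27.1
ip-192-0-2-11.ec2.internal   Ready      <none>   5m   v1.27.1
ip-192-0-2-12.ec2.internal   NotReady   <none>   1m   v1.27.1
EOF
```

Against a live cluster, kubectl get nodes --no-headers | count_ready_nodes should print 3 for the cluster created in this step.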

Create a StorageClass for your local NVMe disks

When you provisioned the Kubernetes cluster, you selected an instance type that comes with local NVMe disks. However, these disks are not automatically mounted or formatted upon creation. To use these local NVMe disks, you must mount and format them, and you must create the necessary PersistentVolumes (PVs). To automate this process, you can use a Container Storage Interface (CSI) driver.

In this step, you install the recommended logical volume manager (LVM) CSI driver. Then, you create a StorageClass that references the LVM CSI driver and specifies the recommended XFS file system.

  1. Install the LVM CSI driver:

    helm repo add metal-stack https://helm.metal-stack.io
    helm repo update
    helm install csi-driver-lvm metal-stack/csi-driver-lvm \
      --namespace csi-driver-lvm \
      --create-namespace \
      --set lvm.devicePattern='/dev/nvme[1-9]n[0-9]'

    The lvm.devicePattern property specifies the pattern that the CSI driver uses to identify available NVMe volumes on your worker nodes.

  2. Create the StorageClass:

    csi-driver-lvm-striped-xfs.yaml
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: csi-driver-lvm-striped-xfs
    provisioner: lvm.csi.metal-stack.io
    reclaimPolicy: Retain
    volumeBindingMode: WaitForFirstConsumer
    allowVolumeExpansion: true
    parameters:
      type: "striped"
      csi.storage.k8s.io/fstype: xfs
    • provisioner: The LVM CSI driver responsible for provisioning the volume.

    • reclaimPolicy: The Retain policy ensures that the underlying volume is not deleted when the corresponding PVC is deleted.

    • volumeBindingMode: The WaitForFirstConsumer mode delays the binding and provisioning of a PersistentVolume until a Pod that uses the PVC is created. This mode is important for ensuring that the PV is created on the same node where the Pod will run because the PV will use the node’s local NVMe volumes.

    • allowVolumeExpansion: Allows the volume to be expanded after it has been provisioned.

    • parameters.type: Combines multiple physical volumes to create a single logical volume. In a striped setup, data is spread across the physical volumes in a way that distributes the I/O load evenly, improving performance by allowing parallel disk I/O operations.

    • parameters.csi.storage.k8s.io/fstype: Formats the volumes with the XFS file system. Redpanda Data recommends XFS for its enhanced performance with Redpanda workloads.

  3. Apply the StorageClass:

    kubectl apply -f csi-driver-lvm-striped-xfs.yaml

    After applying this StorageClass, any PVC that references it will attempt to provision storage using the LVM CSI driver and the provided parameters.
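To see which device names the lvm.devicePattern value from step 1 would select, you can test names against the same pattern. This is a rough sketch that assumes the driver interprets the pattern as a shell-style glob; matches_device_pattern is a hypothetical helper, and the device names are examples only. On many EC2 instances the root EBS volume appears as /dev/nvme0n1, which this pattern excludes, but treat that as an assumption and confirm the layout on your own nodes.

```shell
# matches_device_pattern DEVICE: succeeds when DEVICE matches the glob
# /dev/nvme[1-9]n[0-9], the value passed to lvm.devicePattern.
matches_device_pattern() {
  case "$1" in
    /dev/nvme[1-9]n[0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Example device names; partitions such as nvme1n1p1 do not match
# because the pattern must match the whole name.
for dev in /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme1n1p1; do
  if matches_device_pattern "$dev"; then
    echo "$dev: matched"
  else
    echo "$dev: skipped"
  fi
done
```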

Configure external access

In this step, you configure your EKS cluster to allow external access to the node ports on which the Redpanda deployment will be exposed. You use these node ports in later steps to configure external access to your Redpanda cluster.

  1. Get the ID of the security group that’s associated with the nodes in your EKS cluster:

    AWS_SECURITY_GROUP_ID=$(aws eks describe-cluster --name <cluster-name> | jq -r '.cluster.resourcesVpcConfig.clusterSecurityGroupId')

  2. Add inbound firewall rules to your EC2 instances so that external traffic can reach the node ports exposed on all Kubernetes worker nodes in the cluster:

    aws ec2 authorize-security-group-ingress \
      --group-id ${AWS_SECURITY_GROUP_ID} \
      --ip-permissions '[
      {
        "IpProtocol": "tcp",
        "FromPort": 30081,
        "ToPort": 30081,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}]
      },
      {
        "IpProtocol": "tcp",
        "FromPort": 30082,
        "ToPort": 30082,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}]
      },
      {
        "IpProtocol": "tcp",
        "FromPort": 31644,
        "ToPort": 31644,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}]
      },
      {
        "IpProtocol": "tcp",
        "FromPort": 31092,
        "ToPort": 31092,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}]
      }
      ]'
    If you use 0.0.0.0/0, you enable all IPv4 addresses to access your instances on those node ports. In production, you should authorize only a specific IP address or range of addresses to access your instances.

    For help creating firewall rules, see the Amazon EC2 documentation.
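To restrict access to a specific address range, as recommended for production, you can generate the --ip-permissions value instead of editing the JSON by hand. The following is a minimal sketch: build_ip_permissions is a hypothetical helper, and 198.51.100.0/24 is a placeholder range that you would replace with your own.

```shell
# build_ip_permissions CIDR PORT...: prints a JSON array with one TCP rule
# per port, each limited to CIDR. The function body runs in a subshell
# (parentheses) so its variables do not leak into the calling shell.
build_ip_permissions() (
  cidr="$1"
  shift
  sep=""
  printf '['
  for port in "$@"; do
    printf '%s{"IpProtocol":"tcp","FromPort":%s,"ToPort":%s,"IpRanges":[{"CidrIp":"%s"}]}' \
      "$sep" "$port" "$port" "$cidr"
    sep=","
  done
  printf ']\n'
)

# The four node ports used in this guide, limited to a placeholder range.
build_ip_permissions "198.51.100.0/24" 30081 30082 31644 31092
```

You could then pass the result to the command shown above as --ip-permissions "$(build_ip_permissions <cidr> 30081 30082 31644 31092)".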

Deploy Redpanda and Redpanda Console

In this step, you deploy Redpanda with SASL authentication and self-signed TLS certificates. Redpanda Console is included as a subchart in the Redpanda Helm chart.

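Follow one of the two procedures below. The first uses Helm with the Redpanda Operator, and the second uses Helm only.

To deploy with Helm and the Redpanda Operator: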

  1. Make sure that you have permission to install custom resource definitions (CRDs):

    kubectl auth can-i create CustomResourceDefinition --all-namespaces

    You should see yes in the output.

    You need these cluster-level permissions to install cert-manager and Redpanda Operator CRDs in the next steps.

  2. Install cert-manager using Helm:

    helm repo add jetstack https://charts.jetstack.io
    helm repo update
    helm install cert-manager jetstack/cert-manager \
      --set installCRDs=true \
      --namespace cert-manager  \
      --create-namespace

    The Redpanda Helm chart uses cert-manager to enable TLS and manage TLS certificates by default.

  3. Install the Redpanda Operator custom resource definitions (CRDs):

    kubectl kustomize "https://github.com/redpanda-data/redpanda-operator//src/go/k8s/config/crd?ref=v2.2.5-24.2.7" \
        | kubectl apply -f -
  4. Deploy the Redpanda Operator:

    helm repo add redpanda https://charts.redpanda.com
    helm upgrade --install redpanda-controller redpanda/operator \
      --namespace <namespace> \
      --set image.tag=v2.2.5-24.2.7 \
      --create-namespace \
      --timeout 1h
    If you already have Flux installed and you want it to continue managing resources across the entire cluster, use the --set additionalCmdFlags="{--enable-helm-controllers=false}" flag. This flag prevents the Redpanda Operator from deploying its own set of Helm controllers that may conflict with those installed with Flux.
  5. Ensure that the Deployment is successfully rolled out:

    kubectl --namespace <namespace> rollout status --watch deployment/redpanda-controller-operator
    deployment "redpanda-controller-operator" successfully rolled out
  6. Install a Redpanda custom resource in the same namespace as the Redpanda Operator:

    redpanda-cluster.yaml
    apiVersion: cluster.redpanda.com/v1alpha1
    kind: Redpanda
    metadata:
      name: redpanda
    spec:
      chartRef: {}
      clusterSpec:
        external:
          domain: customredpandadomain.local
        auth:
          sasl:
            enabled: true
            users:
              - name: superuser
                password: secretpassword
        storage:
          persistentVolume:
            enabled: true
            storageClass: csi-driver-lvm-striped-xfs

    Apply the custom resource:

    kubectl apply -f redpanda-cluster.yaml --namespace <namespace>

    • external.domain: The custom domain that each broker will advertise to clients externally. This domain is added to the internal and external TLS certificates so that you can connect to the cluster using this domain.

    • auth.sasl.users: Creates a superuser called superuser that can grant permissions to new users in your cluster using access control lists (ACLs).

    • storage.persistentVolume.storageClass: Points each PVC associated with the Redpanda brokers to the csi-driver-lvm-striped-xfs StorageClass. This StorageClass allows the LVM CSI driver to provision the appropriate local PersistentVolumes backed by NVMe disks for each Redpanda broker.

  7. Wait for the Redpanda Operator to deploy Redpanda using the Helm chart:

    kubectl get redpanda --namespace <namespace> --watch
    NAME       READY   STATUS
    redpanda   True    Redpanda reconciliation succeeded

    This step may take a few minutes. You can watch for new Pods to make sure that the deployment is progressing:

    kubectl get pod --namespace <namespace>

    If it’s taking too long, see Troubleshoot.

Alternatively, to deploy with Helm only, without the Redpanda Operator:

  1. Install cert-manager using Helm:

    helm repo add jetstack https://charts.jetstack.io
    helm repo update
    helm install cert-manager jetstack/cert-manager \
      --set installCRDs=true \
      --namespace cert-manager \
      --create-namespace

    TLS is enabled by default. The Redpanda Helm chart uses cert-manager to manage TLS certificates by default.

  2. Install Redpanda with SASL enabled:

    helm repo add redpanda https://charts.redpanda.com
    helm repo update
    helm install redpanda redpanda/redpanda \
      --namespace <namespace> --create-namespace \
      --set auth.sasl.enabled=true \
      --set "auth.sasl.users[0].name=superuser" \
      --set "auth.sasl.users[0].password=secretpassword" \
      --set external.domain=customredpandadomain.local \
      --set "storage.persistentVolume.storageClass=csi-driver-lvm-striped-xfs" \
      --wait \
      --timeout 1h
    • external.domain: The custom domain that each broker advertises to clients externally. This domain is added to the internal and external TLS certificates so that you can connect to the cluster using this domain.

    • auth.sasl.users: Creates a superuser called superuser that can grant permissions to new users in your cluster using access control lists (ACLs).

    • storage.persistentVolume.storageClass: Points each PVC associated with the Redpanda brokers to the csi-driver-lvm-striped-xfs StorageClass. This StorageClass allows the LVM CSI driver to provision the appropriate local PersistentVolumes backed by NVMe disks for each Redpanda broker.

The installation displays some tips for getting started.

If the installation is taking a long time, see Troubleshoot.

Verify the deployment

When the Redpanda Helm chart is deployed, you should have:

  • Three Redpanda brokers. Each Redpanda broker runs inside a separate Pod and is scheduled on a separate worker node.

  • One PVC bound to a PV for each Redpanda broker. These PVs are what the Redpanda brokers use to store the Redpanda data directory with all your topics and metadata.

  1. Verify that each Redpanda broker is scheduled on only one Kubernetes node:

    kubectl get pod --namespace <namespace>  \
    -o=custom-columns=NODE:.spec.nodeName,POD_NAME:.metadata.name -l \
    app.kubernetes.io/component=redpanda-statefulset

    Example output:

    NODE              POD_NAME
    example-worker3   redpanda-0
    example-worker2   redpanda-1
    example-worker    redpanda-2
  2. Verify that each Redpanda broker has a bound PVC:

    kubectl get persistentvolumeclaim \
      --namespace <namespace> \
      -o custom-columns=NAME:.metadata.name,STATUS:.status.phase,STORAGECLASS:.spec.storageClassName

    Example output:

    NAME                 STATUS   STORAGECLASS
    datadir-redpanda-0   Bound    csi-driver-lvm-striped-xfs
    datadir-redpanda-1   Bound    csi-driver-lvm-striped-xfs
    datadir-redpanda-2   Bound    csi-driver-lvm-striped-xfs
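The one-broker-per-node check in step 1 can be automated by looking for node names that appear more than once. The following is a minimal sketch: find_shared_nodes is a hypothetical helper, and the sample node names are taken from the example output above.

```shell
# find_shared_nodes: reads one node name per line (one line per broker Pod)
# on stdin and prints any node name that appears more than once.
find_shared_nodes() {
  sort | uniq -d
}

# Sample input: the NODE column from the example output above.
shared=$(printf '%s\n' example-worker3 example-worker2 example-worker | find_shared_nodes)
if [ -z "$shared" ]; then
  echo "Each broker is on its own node"
else
  echo "Nodes running more than one broker: $shared"
fi
```

Against a live cluster, you could pipe in the output of the command from step 1 with only the NODE column and the --no-headers flag.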

Create a user

In this step, you use rpk to create a new user. Then, you authenticate to Redpanda with the superuser to grant permissions to the new user. You’ll authenticate to Redpanda with this new user to create a topic in the next steps.

As a security best practice, you should use the superuser only to grant permissions to new users through ACLs. Never delete the superuser. You need the superuser to grant permissions to new users.
  1. Create a new user called redpanda-twitch-account with the password changethispassword:

    kubectl --namespace <namespace> exec -ti redpanda-0 -c redpanda -- \
    rpk security user create redpanda-twitch-account \
    -p changethispassword

    Example output:

    Created user "redpanda-twitch-account".
  2. Use the superuser to grant the redpanda-twitch-account user permission to execute all operations only for a topic called twitch-chat.

    kubectl exec --namespace <namespace> -c redpanda redpanda-0 -- \
      rpk security acl create --allow-principal User:redpanda-twitch-account \
      --operation all \
      --topic twitch-chat \
      -X user=superuser -X pass=secretpassword -X sasl.mechanism=SCRAM-SHA-512

    Example output:

    PRINCIPAL                     RESOURCE-TYPE  RESOURCE-NAME  OPERATION  PERMISSION
    User:redpanda-twitch-account  TOPIC          twitch-chat    ALL        ALLOW

Start streaming

In this step, you authenticate to Redpanda with the redpanda-twitch-account user to create a topic called twitch-chat. This topic is the only one that the redpanda-twitch-account user has permission to access. Then, you produce messages to the topic, and consume messages from it.

  1. Create an alias to simplify the rpk commands:

    alias internal-rpk="kubectl --namespace <namespace> exec -i -t redpanda-0 -c redpanda -- rpk -X user=redpanda-twitch-account -X pass=changethispassword -X sasl.mechanism=SCRAM-SHA-256"
  2. Create a topic called twitch-chat:

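    Follow one of the two options below. If you deployed with Helm and the Redpanda Operator, create the topic as a Topic resource: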

    1. Create a Secret in which to store your user’s password:

      kubectl create secret generic redpanda-secret --from-literal=password='changethispassword' --namespace <namespace>
    2. Create a Topic resource:

      topic.yaml
      apiVersion: cluster.redpanda.com/v1alpha1
      kind: Topic
      metadata:
        name: twitch-chat
      spec:
        kafkaApiSpec:
          brokers:
            - "redpanda-0.redpanda.<namespace>.svc.cluster.local:9093"
            - "redpanda-1.redpanda.<namespace>.svc.cluster.local:9093"
            - "redpanda-2.redpanda.<namespace>.svc.cluster.local:9093"
          tls:
            caCertSecretRef:
              name: "redpanda-default-cert"
              key: "ca.crt"
          sasl:
            username: redpanda-twitch-account
            mechanism: SCRAM-SHA-256
            passwordSecretRef:
              name: redpanda-secret
              key: password
    3. Apply the Topic resource in the same namespace as your Redpanda cluster:

      kubectl apply -f topic.yaml --namespace <namespace>
    4. Check the logs of the Redpanda Operator to confirm that the topic was created:

      kubectl logs -l app.kubernetes.io/name=operator -c manager --namespace <namespace>

      You should see that the Redpanda Operator reconciled the Topic resource. For example:

      Example output
      {
        "level":"info",
        "ts":"2023-09-25T16:20:09.538Z",
        "logger":"TopicReconciler.Reconcile",
        "msg":"Starting reconcile loop",
        "controller":"topic",
        "controllerGroup":"cluster.redpanda.com",
        "controllerKind":"Topic",
        "Topic":
        {
          "name":"twitch-chat",
          "namespace":"<namespace>"
        },
        "namespace":"<namespace>",
        "name":"twitch-chat",
        "reconcileID":"c0cf9abc-a553-48b7-9b6e-2de3cdfb4432"
      }
      {
        "level":"info",
        "ts":"2023-09-25T16:20:09.581Z",
        "logger":"TopicReconciler.Reconcile",
        "msg":"reconciliation finished in 43.436125ms, next run in 3s",
        "controller":"topic",
        "controllerGroup":"cluster.redpanda.com",
        "controllerKind":"Topic",
        "Topic":
        {
          "name":"twitch-chat",
          "namespace":"<namespace>"
        },
        "namespace":"<namespace>",
        "name":"twitch-chat",
        "reconcileID":"c0cf9abc-a553-48b7-9b6e-2de3cdfb4432",
        "result":
        {
          "Requeue":false,
          "RequeueAfter":3000000000
        }
      }

    Alternatively, if you deployed with Helm only, create the topic with rpk:

    internal-rpk topic create twitch-chat

    Example output:

    TOPIC        STATUS
    twitch-chat  OK
  3. Describe the topic:

    internal-rpk topic describe twitch-chat
    Expected output:
    SUMMARY
    =======
    NAME        twitch-chat
    PARTITIONS  1
    REPLICAS    1
    
    CONFIGS
    =======
    KEY                     VALUE                          SOURCE
    cleanup.policy          delete                         DYNAMIC_TOPIC_CONFIG
    compression.type        producer                       DEFAULT_CONFIG
    message.timestamp.type  CreateTime                     DEFAULT_CONFIG
    partition_count         1                              DYNAMIC_TOPIC_CONFIG
    redpanda.datapolicy     function_name:  script_name:   DEFAULT_CONFIG
    redpanda.remote.read    false                          DEFAULT_CONFIG
    redpanda.remote.write   false                          DEFAULT_CONFIG
    replication_factor      1                              DYNAMIC_TOPIC_CONFIG
    retention.bytes         -1                             DEFAULT_CONFIG
    retention.ms            604800000                      DEFAULT_CONFIG
    segment.bytes           1073741824                     DEFAULT_CONFIG
  4. Produce a message to the topic:

    internal-rpk topic produce twitch-chat
  5. Type a message, then press Enter:

    Pandas are fabulous!

    Example output:

    Produced to partition 0 at offset 0 with timestamp 1663282629789.
  6. Press Ctrl+C to finish producing messages to the topic.

  7. Consume one message from the topic:

    internal-rpk topic consume twitch-chat --num 1
    Expected output:
    {
      "topic": "twitch-chat",
      "value": "Pandas are fabulous!",
      "timestamp": 1663282629789,
      "partition": 0,
      "offset": 0
    }

Explore your topic in Redpanda Console

Redpanda Console is a developer-friendly web UI for managing and debugging your Redpanda cluster and your applications.

In this step, you use port-forwarding to access Redpanda Console on your local network.

Because you’re using the Community Edition of Redpanda Console, you should not expose Redpanda Console outside your local network. The Community Edition of Redpanda Console does not provide authentication, and it connects to the Redpanda cluster as superuser. To use the Enterprise Edition, you need a license key. See Redpanda Licensing.

  1. Expose Redpanda Console to your localhost:

    kubectl --namespace <namespace> port-forward svc/redpanda-console 8080:8080

    The kubectl port-forward command actively runs in the command-line window. To execute other commands while the command is running, open another command-line window.

  2. Open Redpanda Console on http://localhost:8080.

    All your Redpanda brokers are listed along with their IP addresses and IDs.

  3. Go to Topics > twitch-chat.

    The message that you produced to the topic is displayed along with some other details about the topic.

  4. Press Ctrl+C in the command line to stop the port-forwarding process.

Configure external access to Redpanda

If you want to connect to the Redpanda cluster with external clients, Redpanda brokers must advertise an externally accessible address that external clients can connect to. External clients are common in Internet of Things (IoT) environments, or if you use external services that do not implement VPC peering in your network.

When you created the cluster, you set the external.domain configuration to customredpandadomain.local, which means that your Redpanda brokers are advertising the following addresses:

  • redpanda-0.customredpandadomain.local

  • redpanda-1.customredpandadomain.local

  • redpanda-2.customredpandadomain.local

To access your Redpanda brokers externally, you can map your worker nodes' IP addresses to these domains.

IP addresses can change. If the IP addresses of your worker nodes change, you must update your /etc/hosts file with the new mappings.

In a production environment, it’s a best practice to use ExternalDNS to manage DNS records for your brokers. See Use ExternalDNS for external access.
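The mapping between worker node addresses and broker domains can be sketched as follows. This is an illustration only: hosts_entries is a hypothetical helper, the addresses are documentation placeholders, and it assumes that you pass the address of the node running redpanda-0 first, then redpanda-1, and so on.

```shell
# hosts_entries DOMAIN IP...: prints one /etc/hosts line per IP address,
# pairing the Nth address (counting from 0) with redpanda-N.DOMAIN.
# The function body runs in a subshell so its variables stay local.
hosts_entries() (
  domain="$1"
  shift
  i=0
  for ip in "$@"; do
    printf '%s redpanda-%d.%s\n' "$ip" "$i" "$domain"
    i=$((i + 1))
  done
)

# Placeholder addresses; replace them with your worker nodes' external IPs.
hosts_entries customredpandadomain.local 203.0.113.3 203.0.113.5 203.0.113.7
```

The helper only prints the lines. Appending them to /etc/hosts is left as a separate, deliberate step.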

  1. Add mappings in your /etc/hosts file between your worker nodes' IP addresses and their custom domain names:

    sudo true && kubectl --namespace <namespace> get endpoints,node -A -o go-template='{{ range $_ := .items }}{{ if and (eq .kind "Endpoints") (eq .metadata.name "redpanda-external") }}{{ range $_ := (index .subsets 0).addresses }}{{ $nodeName := .nodeName }}{{ $podName := .targetRef.name }}{{ range $node := $.items }}{{ if and (eq .kind "Node") (eq .metadata.name $nodeName) }}{{ range $_ := .status.addresses }}{{ if eq .type "ExternalIP" }}{{ .address }} {{ $podName }}.customredpandadomain.local{{ "\n" }}{{ end }}{{ end }}{{ end }}{{ end }}{{ end }}{{ end }}{{ end }}' | envsubst | sudo tee -a /etc/hosts
    /etc/hosts
    203.0.113.3 redpanda-0.customredpandadomain.local
    203.0.113.5 redpanda-1.customredpandadomain.local
    203.0.113.7 redpanda-2.customredpandadomain.local
  2. Save the root certificate authority (CA) to your local file system outside Kubernetes:

    kubectl --namespace <namespace> get secret redpanda-external-root-certificate -o go-template='{{ index .data "ca.crt" | base64decode }}' > ca.crt
  3. Install rpk on your local machine, not on a Pod:

    Follow the steps for your operating system.

    On Linux:

    1. Download the rpk archive for Linux, and make sure the version matches your Redpanda version.

      • To download the latest version of rpk:

        curl -LO https://github.com/redpanda-data/redpanda/releases/latest/download/rpk-linux-amd64.zip
      • To download a version other than the latest:

        curl -LO https://github.com/redpanda-data/redpanda/releases/download/v<version>/rpk-linux-amd64.zip
    2. Ensure that you have the folder ~/.local/bin:

      mkdir -p ~/.local/bin
    3. Add it to your $PATH:

      export PATH="$HOME/.local/bin:$PATH"
    4. Unzip the rpk files to your ~/.local/bin/ directory:

      unzip rpk-linux-amd64.zip -d ~/.local/bin/
    5. Run rpk version to display the version of the rpk binary:

      rpk version
      23.3.1 (rev b5ade3f40)

    On macOS:

    1. If you don’t have Homebrew installed, install it.

    2. Install rpk:

      brew install redpanda-data/tap/redpanda
    3. Run rpk version to display the version of the rpk binary:

      rpk version
      23.3.1 (rev b5ade3f40)
      This method installs the latest version of rpk, which is supported only with the latest version of Redpanda.
  4. Configure rpk to connect to your cluster using the pre-configured profile:

    rpk profile create --from-profile <(kubectl get configmap --namespace <namespace> redpanda-rpk -o go-template='{{ .data.profile }}') <profile-name>

    Replace <profile-name> with the name that you want to give this rpk profile.

  5. Test the connection:

    rpk cluster info -X user=redpanda-twitch-account -X pass=changethispassword -X sasl.mechanism=SCRAM-SHA-256

Explore the default Kubernetes components

By default, the Redpanda Helm chart deploys the following Kubernetes components:

StatefulSet

Redpanda is a stateful application. Each Redpanda broker needs to store its own state (topic partitions) in its own storage volume. As a result, the Helm chart deploys a StatefulSet to manage the Pods in which the Redpanda brokers are running.

kubectl get statefulset --namespace <namespace>

Example output:

NAME       READY   AGE
redpanda   3/3     3m11s

StatefulSets ensure that the state associated with a particular Pod replica is always the same, no matter how often the Pod is recreated. Each Pod is also given a unique ordinal number in its name such as redpanda-0. A Pod with a particular ordinal number is always associated with a PersistentVolumeClaim with the same number. When a Pod in the StatefulSet is deleted and recreated, it is given the same ordinal number and so it mounts the same storage volume as the deleted Pod that it replaced.

kubectl get pod --namespace <namespace>
Expected output:
NAME                              READY   STATUS      RESTARTS        AGE
redpanda-0                        1/1     Running     0               6m9s
redpanda-1                        1/1     Running     0               6m9s
redpanda-2                        1/1     Running     0               6m9s
redpanda-console-5ff45cdb9b-6z2vs 1/1     Running     0               5m
redpanda-configuration-smqv7      0/1     Completed   0               6m9s
The redpanda-configuration job updates the Redpanda runtime configuration.
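The pairing between a Pod's ordinal and its PersistentVolumeClaim can be sketched as follows. This is an illustration only: pvc_for_pod is a hypothetical helper, and the datadir- prefix is an assumption taken from the example PVC names shown in this guide.

```shell
# pvc_for_pod POD: prints the PVC name expected for a broker Pod,
# assuming the datadir-<pod-name> convention seen in this guide's output.
pvc_for_pod() {
  printf 'datadir-%s\n' "$1"
}

# A recreated Pod keeps its ordinal, so it maps to the same PVC name.
for ordinal in 0 1 2; do
  pod="redpanda-$ordinal"
  echo "$pod -> $(pvc_for_pod "$pod")"
done
```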

PersistentVolumeClaim

Redpanda brokers must be able to store their data on disk. By default, the Helm chart uses the default StorageClass in the Kubernetes cluster to create a PersistentVolumeClaim for each Pod. The default StorageClass in your Kubernetes cluster depends on the Kubernetes platform that you are using.

kubectl get persistentvolumeclaims --namespace <namespace>
Expected output:
NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
datadir-redpanda-0   Bound    pvc-3311ade3-de84-4027-80c6-3d8347302962   20Gi       RWO            standard       75s
datadir-redpanda-1   Bound    pvc-4ea8bc03-89a6-41e4-b985-99f074995f08   20Gi       RWO            standard       75s
datadir-redpanda-2   Bound    pvc-45c3555f-43bc-48c2-b209-c284c8091c45   20Gi       RWO            standard       75s

Service

The clients writing to or reading from a given partition have to connect directly to the leader broker that hosts the partition. As a result, clients need to be able to connect directly to each Pod. To allow internal and external clients to connect to each Pod that hosts a Redpanda broker, the Helm chart configures two Services:

kubectl get service --namespace <namespace>
Expected output:
NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                                       AGE
redpanda            ClusterIP   None            <none>        <none>                                                        5m37s
redpanda-console    ClusterIP   10.0.251.204    <none>        8080                                                          5m
redpanda-external   NodePort    10.96.137.220   <none>        9644:31644/TCP,9094:31092/TCP,8083:30082/TCP,8080:30081/TCP   5m37s

Headless ClusterIP Service

The headless Service associated with a StatefulSet gives the Pods their network identity in the form of a fully qualified domain name (FQDN). Both Redpanda brokers in the same Redpanda cluster and clients within the same Kubernetes cluster use this FQDN to communicate with each other.

An important requirement of distributed applications such as Redpanda is peer discovery: The ability for each broker to find other brokers in the same cluster. When each Pod is rolled out, its seed_servers field is updated with the FQDN of each Pod in the cluster so that they can discover each other.

kubectl --namespace <namespace> exec redpanda-0 -c redpanda -- cat /etc/redpanda/redpanda.yaml
redpanda:
  data_directory: /var/lib/redpanda/data
  empty_seed_starts_cluster: false
  seed_servers:
  - host:
      address: redpanda-0.redpanda.<namespace>.svc.cluster.local.
      port: 33145
  - host:
      address: redpanda-1.redpanda.<namespace>.svc.cluster.local.
      port: 33145
  - host:
      address: redpanda-2.redpanda.<namespace>.svc.cluster.local.
      port: 33145
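
The seed server addresses above follow the pattern <pod-name>.<service-name>.<namespace>.svc.cluster.local. As a minimal sketch, assuming the StatefulSet and the headless Service are both named redpanda as in the output above, the following prints the FQDN expected for each broker. The function name and the example namespace are illustrative and are not part of the Helm chart.

```shell
# Print the FQDN that the headless Service gives each broker Pod.
# Assumption: the StatefulSet and the headless Service are both named "redpanda".
# Usage: broker_fqdns <namespace> <replicas>
broker_fqdns() (
  namespace="$1"
  replicas="$2"
  i=0
  while [ "$i" -lt "$replicas" ]; do
    printf 'redpanda-%s.redpanda.%s.svc.cluster.local.\n' "$i" "$namespace"
    i=$((i + 1))
  done
)

# Example with a hypothetical namespace and three brokers.
broker_fqdns my-namespace 3
```

Comparing this list with the seed_servers entries in redpanda.yaml is a quick way to confirm that every broker knows about its peers.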

NodePort Service

External access is made available by a NodePort service that opens the following ports by default:

Listener          Node Port   Container Port
Schema Registry   30081       8081
HTTP Proxy        30082       8083
Kafka API         31092       9094
Admin API         31644       9644

TLS Certificates

By default, TLS is enabled in the Redpanda Helm chart. The Helm chart uses cert-manager to generate four Certificate resources that provide Redpanda with self-signed certificates for internal and external connections.

Having separate certificates for internal and external connections provides security isolation. If an external certificate or its corresponding private key is compromised, it doesn’t affect the security of internal communications.

kubectl get certificate --namespace <namespace>
NAME                                 READY
redpanda-default-cert                True
redpanda-default-root-certificate    True
redpanda-external-cert               True
redpanda-external-root-certificate   True
  • redpanda-default-cert: Self-signed certificate for internal communications.

  • redpanda-default-root-certificate: Root certificate authority for the internal certificate.

  • redpanda-external-cert: Self-signed certificate for external communications.

  • redpanda-external-root-certificate: Root certificate authority for the external certificate.

By default, all listeners are configured with the same certificate. To configure separate TLS certificates for different listeners, see TLS for Redpanda in Kubernetes.

The Redpanda Helm chart provides self-signed certificates for convenience. In a production environment, it’s best to use certificates from a trusted Certificate Authority (CA) or integrate with your existing CA infrastructure.

Uninstall Redpanda

When you’ve finished testing Redpanda, you can uninstall it from your cluster and delete any Kubernetes resources that the Helm chart created.

  • Helm + Operator:

    kubectl delete -f redpanda-cluster.yaml --namespace <namespace>
    helm uninstall redpanda-controller --namespace <namespace>
    kubectl delete pod --all --namespace <namespace>
    kubectl delete pvc --all --namespace <namespace>
    kubectl delete secret --all --namespace <namespace>

  • Helm:

    helm uninstall redpanda --namespace <namespace>
    kubectl delete pod --all --namespace <namespace>
    kubectl delete pvc --all --namespace <namespace>
    kubectl delete secret --all --namespace <namespace>

To remove the internal-rpk alias:

unalias internal-rpk

Delete the cluster

To delete your Kubernetes cluster:

eksctl delete cluster --name <cluster-name>

Troubleshoot

Before troubleshooting your cluster, make sure that you have all the prerequisites.

HelmRelease is not ready

If you are using the Redpanda Operator, you may see the following message while waiting for a Redpanda custom resource to be deployed:

NAME       READY   STATUS
redpanda   False   HelmRepository 'redpanda/redpanda-repository' is not ready
redpanda   False   HelmRelease 'redpanda/redpanda' is not ready

While the deployment process can sometimes take a few minutes, a prolonged 'not ready' status may indicate an issue. Follow the steps below to investigate:

  1. Check the status of the HelmRelease:

    kubectl describe helmrelease <redpanda-resource-name> --namespace <namespace>
  2. Review the Redpanda Operator logs:

    kubectl logs -l app.kubernetes.io/name=operator -c manager --namespace <namespace>

HelmRelease retries exhausted

The HelmRelease retries exhausted error occurs when the Helm Controller has tried to reconcile the HelmRelease a number of times, but these attempts have failed consistently.

The Helm Controller watches for changes in HelmRelease objects. When changes are detected, it tries to reconcile the state defined in the HelmRelease with the state in the cluster. The process of reconciliation includes installation, upgrade, testing, rollback or uninstallation of Helm releases.

You may see this error due to:

  • Incorrect configuration in the HelmRelease.

  • Issues with the chart, such as a non-existent chart version or the chart repository not being accessible.

  • Missing dependencies or prerequisites required by the chart.

  • Issues with the underlying Kubernetes cluster, such as insufficient resources or connectivity issues.

To debug this error, do the following:

  1. Check the status of the HelmRelease:

    kubectl describe helmrelease <cluster-name> --namespace <namespace>
  2. Review the Redpanda Operator logs:

    kubectl logs -l app.kubernetes.io/name=operator -c manager --namespace <namespace>

After you find and fix the error, use the Flux CLI, flux, to suspend and then resume the reconciliation process:

  1. Install Flux CLI.

  2. Suspend the HelmRelease:

    flux suspend helmrelease <cluster-name> --namespace <namespace>
  3. Resume the HelmRelease:

    flux resume helmrelease <cluster-name> --namespace <namespace>

Crash loop backoffs

If a broker crashes after startup, or gets stuck in a crash loop, it could produce progressively more stored state that uses additional disk space and takes more time for each restart to process.

To prevent infinite crash loops, the Redpanda Helm chart sets the crash_loop_limit node property to 5. The crash loop limit is the number of consecutive crashes that can happen within one hour of each other. After Redpanda reaches this limit, it will not start until its internal consecutive crash counter is reset to zero. In Kubernetes, the Pod running Redpanda remains in a CrashLoopBackOff state until that counter is reset.

To troubleshoot a crash loop backoff:

  1. Check the Redpanda logs from the most recent crashes:

    kubectl logs <pod-name> --namespace <namespace>
    Kubernetes retains logs only for the current and the previous instance of a container. This limitation makes it difficult to access logs from earlier crashes, which may contain vital clues about the root cause of the issue. Given these log retention limitations, setting up a centralized logging system is crucial. Systems such as Loki or Datadog can capture and store logs from all containers, ensuring you have access to historical data.
  2. Resolve the issue that led to the crash loop backoff.

  3. Reset the crash counter to zero to allow Redpanda to restart. You can do any of the following to reset the counter:

    • Update the redpanda.yaml configuration file. You can make changes to any of the following sections in the Redpanda Helm chart to trigger an update:

      • config.cluster

      • config.node

      • config.tunable

    • Delete the startup_log file in the broker’s data directory.

      kubectl exec <pod-name> --namespace <namespace> -- rm /var/lib/redpanda/data/startup_log
      Running this command can be difficult while the Pod is in a CrashLoopBackOff state, because the container is available only briefly before it restarts. Wrapping the command in a retry loop might work.
    • Wait one hour since the last crash. The crash counter resets after one hour.
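
For the option of deleting the startup_log file, a retry loop along these lines is one way to catch the short window in which the container is running. This is a minimal sketch rather than a tested procedure: the retry function, the attempt count, and the delay are illustrative, and the Pod name and namespace in the example are hypothetical.

```shell
# Retry a command until it succeeds or the attempt budget runs out.
# Usage: retry <max-attempts> <delay-seconds> <command> [args...]
retry() (
  max="$1"
  delay="$2"
  shift 2
  attempt=1
  while :; do
    # Stop as soon as the command succeeds.
    "$@" && exit 0
    # Give up after the last allowed attempt.
    [ "$attempt" -ge "$max" ] && exit 1
    attempt=$((attempt + 1))
    sleep "$delay"
  done
)

# Example (requires access to the cluster):
# retry 60 1 kubectl exec redpanda-0 --namespace my-namespace -- \
#   rm -f /var/lib/redpanda/data/startup_log
```

The example uses rm -f so that a successful exit means the command ran inside the container, whether or not the file still existed at that moment.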

To avoid future crash loop backoffs and manage the accumulation of small segments effectively:

  • Monitor the size and number of segments regularly.

  • Optimize your Redpanda configuration for segment management.

  • Consider implementing Tiered Storage to manage data more efficiently.

StatefulSet never rolls out

If the StatefulSet Pods remain in a Pending state, they are waiting for resources to become available.

To identify the Pods that are pending, use the following command:

kubectl get pod --namespace <namespace>

The response includes a list of Pods in the StatefulSet and their status.

To view logs for a specific Pod, use the following command:

kubectl logs -f <pod-name> --namespace <namespace>

You can use the output to debug your deployment.
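
To pick out only the Pending Pods from the listing, you can filter on the STATUS column. The following is a minimal sketch that reads the table output of kubectl get pod on standard input. The function name and the example namespace are illustrative.

```shell
# Print the names of Pods whose STATUS column is "Pending".
# Reads the table output of `kubectl get pod` on standard input.
pending_pods() {
  awk 'NR > 1 && $3 == "Pending" { print $1 }'
}

# Example (requires access to the cluster):
# kubectl get pod --namespace my-namespace | pending_pods
```

A Pod that has not been scheduled has no container logs yet. For each name printed, kubectl describe pod <pod-name> --namespace <namespace> lists the scheduling events and is usually the more informative command.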

Unable to mount volume

If you see volume mounting errors in the Pod events or in the Redpanda logs, ensure that each of your Pods has a volume available in which to store data.

  • If you’re using StorageClasses with dynamic provisioners (default), ensure they exist:

    kubectl get storageclass
  • If you’re using PersistentVolumes, ensure that you have one PersistentVolume available for each Redpanda broker, and that each one has the storage capacity that’s set in storage.persistentVolume.size:

    kubectl get persistentvolume

To learn how to configure different storage volumes, see Configure Storage.

Failed to pull image

When deploying the Redpanda Helm chart, you may encounter Docker rate limit issues because the default registry URL is not recognized as a Docker Hub URL. The domain docker.redpanda.com is used for statistical purposes, such as tracking the number of downloads. It mirrors Docker Hub’s content while providing specific analytics for Redpanda.

Failed to pull image "docker.redpanda.com/redpandadata/redpanda:v<version>": rpc error: code = Unknown desc = failed to pull and unpack image "docker.redpanda.com/redpandadata/redpanda:v<version>": failed to copy: httpReadSeeker: failed open: unexpected status code 429 Too Many Requests - Server message: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit

To fix this error, do one of the following:

  • Replace the image.repository value in the Helm chart with docker.io/redpandadata/redpanda. Switching to Docker Hub avoids the rate limit issues associated with docker.redpanda.com.

    • Helm + Operator:

      redpanda-cluster.yaml
      apiVersion: cluster.redpanda.com/v1alpha1
      kind: Redpanda
      metadata:
        name: redpanda
      spec:
        chartRef: {}
        clusterSpec:
          image:
            repository: docker.io/redpandadata/redpanda
      kubectl apply -f redpanda-cluster.yaml --namespace <namespace>

    • Helm, using --values:

      docker-repo.yaml
      image:
        repository: docker.io/redpandadata/redpanda
      helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
        --values docker-repo.yaml --reuse-values

    • Helm, using --set:

      helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
        --set image.repository=docker.io/redpandadata/redpanda
  • Authenticate to Docker Hub by logging in with your Docker Hub credentials. The docker.redpanda.com site acts as a mirror of Docker Hub, so logging in with your Docker Hub credentials avoids the rate limit issues.

Dig not defined

This error means that you are using an unsupported version of Helm:

Error: parse error at (redpanda/templates/statefulset.yaml:203): function "dig" not defined

To fix this error, upgrade Helm to version 3.10.0 or later, which is the minimum required version. To check the version that you have installed:

helm version
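
To compare the installed version against the 3.10.0 minimum in a script, a version-aware sort is one option. This is a minimal sketch with two assumptions: sort -V is available locally, and helm version --short prints a string such as v3.12.0+gc9f554d. Both function names are illustrative.

```shell
# Strip the leading "v" and any "+<build>" suffix from a Helm version string.
normalize_version() {
  printf '%s\n' "$1" | sed -e 's/^v//' -e 's/+.*$//'
}

# Succeed when version $1 is greater than or equal to version $2.
# Uses `sort -V` (version sort), which is assumed to be available locally.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example (requires Helm to be installed):
# current="$(normalize_version "$(helm version --short)")"
# version_ge "$current" 3.10.0 || echo "Helm $current is older than 3.10.0" >&2
```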

Repository name already exists

If you see this error, remove the redpanda chart repository, then try installing it again.

helm repo remove redpanda
helm repo add redpanda https://charts.redpanda.com
helm repo update

Fatal error during checker "Data directory is writable" execution

This error appears when Redpanda does not have write access to your configured storage volume under storage in the Helm chart.

Error: fatal error during checker "Data directory is writable" execution: open /var/lib/redpanda/data/test_file: permission denied

To fix this error, set statefulset.initContainers.setDataDirOwnership.enabled to true so that the initContainer can set the correct permissions on the data directories.

Cannot patch "redpanda" with kind StatefulSet

This error appears when you run helm upgrade with the --values flag but do not include all your previous overrides.

Error: UPGRADE FAILED: cannot patch "redpanda" with kind StatefulSet: StatefulSet.apps "redpanda" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden

To fix this error, do one of the following:

  • Include all the value overrides from the previous installation or upgrade using either the --set or the --values flags.

  • Use the --reuse-values flag.

    Do not use the --reuse-values flag to upgrade from one version of the Helm chart to another. This flag stops Helm from using any new values in the upgraded chart.
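
One way to collect the overrides from the previous installation is to export them from the release before you upgrade. The following is a minimal sketch, assuming the release is named redpanda as in this guide. The function name, the example namespace, and the file names are illustrative.

```shell
# Export the user-supplied values of the current "redpanda" release to a file
# that can be passed back with --values on the next upgrade.
# Usage: save_release_values <namespace> <output-file>
save_release_values() (
  namespace="$1"
  outfile="$2"
  helm get values redpanda --namespace "$namespace" --output yaml > "$outfile"
)

# Example (requires access to the cluster):
# save_release_values my-namespace previous-values.yaml
# helm upgrade redpanda redpanda/redpanda --namespace my-namespace \
#   --values previous-values.yaml --values new-overrides.yaml
```

When several --values files are given, Helm applies them in order and later files take precedence, so the new overrides go last.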

Cannot patch "redpanda-console" with kind Deployment

This error appears if you try to upgrade your deployment and you already have console.enabled set to true.

Error: UPGRADE FAILED: cannot patch "redpanda-console" with kind Deployment: Deployment.apps "redpanda-console" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/instance":"redpanda", "app.kubernetes.io/name":"console"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable

To fix this error, set console.enabled to false so that Helm doesn’t try to deploy Redpanda Console again.

Helm is in a pending-rollback state

An interrupted Helm upgrade process can leave your Helm release in a pending-rollback state. This state prevents further actions like upgrades, rollbacks, or deletions through standard Helm commands. To fix this:

  1. Identify the Helm release that’s in a pending-rollback state:

    helm list --namespace <namespace> --all

    Look for releases with a status of pending-rollback. These are the ones that need intervention.

  2. Verify the Secret’s status to avoid affecting the wrong resource:

    kubectl --namespace <namespace> get secret --show-labels

    Identify the Secret associated with your Helm release by its pending-rollback status in the labels.

    Ensure you have correctly identified the Secret to avoid unintended consequences. Deleting the wrong Secret could impact other deployments or services.
  3. Delete the Secret to clear the pending-rollback state:

    kubectl --namespace <namespace> delete secret -l status=pending-rollback
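
The label selector in step 3 matches on status alone, so it applies to every Secret in the namespace that carries that label. Helm 3 also labels its release Secrets with owner=helm and name=<release-name>, which allows a narrower selector. The following is a minimal sketch. The helper name and the example namespace are illustrative, and the release name redpanda is assumed from this guide.

```shell
# Build a label selector that matches only the Secrets that Helm created for
# one release in one status.
# Usage: helm_secret_selector <release-name> <status>
helm_secret_selector() {
  printf 'owner=helm,name=%s,status=%s\n' "$1" "$2"
}

helm_secret_selector redpanda pending-rollback

# Example (requires access to the cluster):
# kubectl --namespace my-namespace get secret --show-labels \
#   -l "$(helm_secret_selector redpanda pending-rollback)"
```

Listing the Secrets with the narrower selector first, and deleting them only after checking the result, reduces the risk of removing a Secret that belongs to another release.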

After clearing the pending-rollback state:

  • Retry the upgrade: Restart the upgrade process. You should investigate the initial failure to avoid getting into the pending-rollback state again.

  • Perform a rollback: If you need to roll back to a previous release, use helm rollback <release-name> <revision> to revert to a specific, stable release version.

For more troubleshooting steps, see Troubleshoot Redpanda in Kubernetes.

Next steps

When you’re ready to use a registered domain, make sure to remove your entries from the /etc/hosts file, and see Configure External Access through a NodePort Service.