Deploy Kafka Connect in Kubernetes

This topic describes how to use the Redpanda Helm chart to configure and deploy Kafka Connect in Kubernetes.

The Redpanda Connectors Docker image is a community-supported artifact. Redpanda Data does not provide enterprise support for this image. For support, reach out to the Redpanda team in Redpanda Community Slack.

The Redpanda Connectors Helm chart includes a pre-configured instance of Kafka Connect that works with Redpanda. The underlying Docker image contains only the MirrorMaker2 connectors, but you can build a custom image to install additional connectors.

Try Redpanda Connect for a faster way to build streaming data pipelines. It’s fully compatible with the Kafka API but eliminates the complex setup and maintenance of Kafka Connect. Redpanda Connect also comes with built-in connectors to support AI integrations.

Built-in connectors:

  • MirrorSourceConnector: A source connector that replicates records between multiple Kafka clusters. It is part of Kafka’s MirrorMaker, which provides capabilities for mirroring data across Kafka clusters.

  • MirrorCheckpointConnector: A source connector that ensures the mirroring process can resume from where it left off in case of failures. It tracks and emits checkpoints that mirror the offsets of the source and target clusters.

  • MirrorHeartbeatConnector: A source connector that emits heartbeats to target topics at a defined interval, enabling MirrorMaker to track active topics on the source cluster and synchronize consumer groups across clusters.

If you want to use other connectors, you must create a custom Docker image that includes them as plugins. See Install a new connector.

Prerequisites

  • A Kubernetes cluster. You must have kubectl with at least version 1.25.0-0.

    To check if you have kubectl installed:

    kubectl version --client
  • Helm installed with at least version 3.10.0.

    To check if you have Helm installed:

    helm version
  • jq, which the examples in this topic use to format JSON responses from the Kafka Connect REST API.

  • An understanding of Kafka Connect.

Limitations

No TLS or SASL support for the Kafka Connect REST API: All incoming traffic to Kafka Connect, such as from Redpanda Console, is unauthenticated and sent in plain text. Although Kafka Connect supports TLS for network encryption and SASL for authentication, the Redpanda Connectors subchart does not. Outgoing traffic from Kafka Connect to Redpanda brokers does support TLS and SASL.

Deploy the Helm chart

The Redpanda Helm chart includes Kafka Connect (the Redpanda Connectors Helm chart) as a subchart so that you can deploy a Redpanda cluster, Kafka Connect, and Redpanda Console using a single chart. You can enable and configure the subchart in the connectors section of the Helm values.

The subchart includes a Pod that runs Kafka Connect and the built-in connectors. The Pod is managed by a Deployment that you can configure in the Helm values under connectors.deployment. Redpanda Console connects to Kafka Connect through the default redpanda-connectors Service. Kafka Connect connects to the Redpanda brokers through the default redpanda Service.

Redpanda Connectors deployed in a Kubernetes cluster with three worker nodes.
Do not schedule Pods that run Kafka Connect on the same nodes as Redpanda brokers. Redpanda brokers require access to all node resources. See Tolerations and Affinity rules.

When deploying Kafka Connect with Helm, you can choose between two modes:

  • Automatic mode: When connectors.deployment.create is false, the chart automatically configures and creates the Deployment resource with the following:

    • The URLs of Redpanda brokers that Kafka Connect should connect to

    • TLS settings for Redpanda brokers that have TLS enabled

    • SASL authentication settings for Redpanda brokers that have SASL enabled

    Recommended for: A streamlined deployment with the option to modify specific configurations.

  • Manual mode: When connectors.deployment.create is true, you are responsible for configuring all aspects of the Deployment resource using the Helm values.

    Recommended for: Full control over the Deployment resource and its configurations.

Automatic mode

In automatic mode, the subchart is automatically configured using the values in the Redpanda Helm chart. You don’t need to add any additional configuration. The chart automatically configures the Deployment resource with the values needed for Kafka Connect to communicate with your Redpanda cluster and for Redpanda Console to communicate with Kafka Connect.

All incoming traffic to Kafka Connect, such as from Redpanda Console, is unauthenticated (no SASL) and sent in plain text (no TLS). See Limitations.
  1. Deploy the Redpanda Helm chart with connectors enabled.

    Configure any additional Helm values that you want to override in the clusterSpec settings. See Configuration advice for details.

    Helm + Operator:

    redpanda-cluster.yaml
    apiVersion: cluster.redpanda.com/v1alpha2
    kind: Redpanda
    metadata:
      name: redpanda
    spec:
      chartRef: {}
      clusterSpec:
        connectors:
          enabled: true

    kubectl apply -f redpanda-cluster.yaml --namespace <namespace>

    Helm, using --values:

    redpanda-connectors.yaml
    connectors:
      enabled: true

    helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
      --values redpanda-connectors.yaml --reuse-values

    Helm, using --set:

    helm upgrade --install redpanda redpanda/redpanda \
      --namespace <namespace> \
      --create-namespace \
      --set connectors.enabled=true
  2. Verify the deployment using Redpanda Console or the Kafka Connect REST API.
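
Before opening Redpanda Console, it can help to wait until the Deployment has finished rolling out. The following is a minimal sketch, not part of the chart. It assumes the Deployment is named redpanda-connectors, matching the default Service name mentioned earlier; confirm the actual name with kubectl get deployments. The function name wait_for_connectors is illustrative.

```shell
# Sketch: wait for the Kafka Connect Deployment to become ready.
# Assumption: the Deployment is named "redpanda-connectors"; pass a different
# name as the second argument if yours differs.
wait_for_connectors() {
  namespace="$1"
  deployment="${2:-redpanda-connectors}"
  kubectl rollout status "deployment/${deployment}" \
    --namespace "${namespace}" --timeout=180s
}

# Usage: wait_for_connectors <namespace>
```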

Manual mode

In manual mode, you are responsible for configuring the subchart using the connectors.connectors and connectors.deployment settings in the Helm values.

In this mode, you have full control over the Deployment resource and its configurations. However, no configurations are provided for you automatically.

  1. Deploy the Redpanda Helm chart with connectors enabled.

    Make sure to configure the following:

    • connectors.connectors.bootstrapServers: Kafka API endpoints on the Redpanda brokers for Kafka Connect to connect to.

    • connectors.connectors.brokerTLS (if tls.enabled is true): The brokers' TLS settings.

    • connectors.auth.sasl (if auth.sasl.enabled is true): The brokers' SASL authentication settings.

    See Configuration advice for details.

    Helm + Operator:

    redpanda-cluster.yaml
    apiVersion: cluster.redpanda.com/v1alpha2
    kind: Redpanda
    metadata:
      name: redpanda
    spec:
      chartRef: {}
      clusterSpec:
        connectors:
          enabled: true
          deployment:
            create: true
          connectors:
            bootstrapServers: ""
            #brokerTLS:
          #auth:
            #sasl:

    kubectl apply -f redpanda-cluster.yaml --namespace <namespace>

    Helm, using --values:

    redpanda-connectors.yaml
    connectors:
      enabled: true
      deployment:
        create: true
      connectors:
        bootstrapServers: ""
        #brokerTLS:
      #auth:
        #sasl:

    helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
      --values redpanda-connectors.yaml --reuse-values

    Helm, using --set:

    helm upgrade --install redpanda redpanda/redpanda \
      --namespace <namespace> \
      --create-namespace \
      --set connectors.enabled=true \
      --set connectors.deployment.create=true \
      --set connectors.connectors.bootstrapServers=""
  2. Verify the deployment using Redpanda Console or the Kafka Connect REST API.

Configuration advice

This section provides advice for configuring the Redpanda Connectors subchart. All settings are nested in the connectors property of the Redpanda Helm chart. For all available settings, see Redpanda Connectors Helm Chart Specification.

Name overrides

Deploying multiple instances of the same Helm chart in a Kubernetes cluster can lead to naming conflicts. Using nameOverride and fullnameOverride helps differentiate between them. If you have a production and staging environment, different names help to avoid confusion.

  • Use nameOverride to customize:

    • The default labels app.kubernetes.io/component=<nameOverride> and app.kubernetes.io/name=<nameOverride>

    • The suffix in the name of the resources redpanda-<nameOverride>

  • Use fullnameOverride to customize the full name of the resources such as the Deployment and Services.

connectors:
  nameOverride: 'redpanda-connector-production'
  fullnameOverride: 'redpanda-connector-instance-prod'

For all available settings, see the Helm specification.

Labels

Kubernetes labels help you to organize, query, and manage your resources. Use labels to categorize Kubernetes resources in different deployments by environment, purpose, or team.

connectors:
  commonLabels:
    env: 'production'

For all available settings, see the Helm specification.

Tolerations

Taints repel Pods from nodes, and tolerations allow specific Pods to be scheduled onto tainted nodes. If you have nodes dedicated to Kafka Connect with a taint dedicated=redpanda-connectors:NoSchedule, the following toleration allows the Pods to be scheduled on them.

connectors:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "redpanda-connectors"
    effect: "NoSchedule"

For all available settings, see the Helm specification.
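
The taint that this toleration matches can be applied with kubectl taint. A minimal sketch follows, where the node name is a placeholder that you supply and the helper name taint_connectors_node is illustrative:

```shell
# Sketch: dedicate a node to Kafka Connect by applying the taint that the
# toleration in this section matches. The helper name is illustrative.
taint_connectors_node() {
  node="$1"
  kubectl taint nodes "${node}" dedicated=redpanda-connectors:NoSchedule
}

# Usage: taint_connectors_node <node-name>
```

A toleration only permits scheduling on the tainted node. To keep the Pods on those nodes, you would likely combine it with a node selector or affinity rule.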

Docker image

You can specify the image tag to deploy a known version of the Docker image. Avoid using the latest tag, which can lead to unexpected changes.

If you’re using a private repository, always ensure your nodes have the necessary credentials to pull the image.

connectors:
  image:
    repository: "redpandadata/connectors"
    tag: "1.2.3"

For all available settings, see the Helm specification.

Kafka Connect

You can configure Kafka Connect using the connectors settings.

Change the default REST API port only if it conflicts with an existing port.

The bootstrapServers setting should point to the Kafka API endpoints on your Redpanda brokers.

If you install the chart in automatic mode, bootstrapServers is set automatically.

If you want to use Schema Registry, ensure the URL is set to the IP address or domain name of a Redpanda broker and that it includes the Schema Registry port.

connectors:
  connectors:
    restPort: 8082
    bootstrapServers: "redpanda-broker-0:9092"
    schemaRegistryURL: "http://schema-registry.default.svc.cluster.local:8081"

For all available settings, see the Helm specification.
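
If you set bootstrapServers by hand, one way to assemble the value is from the per-Pod DNS names that Kubernetes gives StatefulSet Pods behind a headless Service. The sketch below rests on assumptions that are not confirmed by this topic: that broker Pods are named <release>-0 to <release>-(N-1), that the headless Service is named <release>, and that you know the port of your Kafka API listener. The function name bootstrap_servers is illustrative.

```shell
# Sketch: build a comma-separated bootstrapServers value from StatefulSet-style
# Pod DNS names (<pod>.<service>.<namespace>.svc.cluster.local).
# Assumptions: Pods are named <release>-0..N-1 behind a headless Service named
# <release>; the port is whatever your Kafka API listener uses.
bootstrap_servers() {
  release="$1"; namespace="$2"; replicas="$3"; port="$4"
  list=""
  i=0
  while [ "$i" -lt "$replicas" ]; do
    host="${release}-${i}.${release}.${namespace}.svc.cluster.local:${port}"
    # Prefix a comma only when the list already has an entry.
    list="${list:+${list},}${host}"
    i=$((i + 1))
  done
  printf '%s\n' "$list"
}

# Usage: bootstrap_servers redpanda <namespace> 3 <kafka-port>
```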

Authentication

If your Redpanda cluster has SASL enabled, configure SASL authentication for secure communication with your Kafka connectors. If you install the Redpanda Helm chart in automatic mode, SASL authentication is configured automatically.

connectors:
  auth:
    sasl:
      enabled: true
      mechanism: "SCRAM-SHA-512"
      userName: "admin"
      secretRef: "sasl-password-secret"

For all available settings, see the Helm specification.

Container resources

Specify resource requests and limits. Ensure that javaMaxHeapSize is not greater than container.resources.limits.memory.

connectors:
  container:
    resources:
      requests:
        cpu: 1
        memory: 1Gi
      limits:
        cpu: 2
        memory: 2Gi
      javaMaxHeapSize: 2G
    javaGCLogEnabled: false

For all available settings, see the Helm specification.
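
The heap size and the memory limit use different unit conventions, so comparing them by eye is error-prone. The following sketch converts both to bytes and compares them. It assumes that javaMaxHeapSize is passed to the JVM as a -Xmx value, where K, M, and G are binary multiples, while the memory limit is a Kubernetes quantity, where Ki, Mi, and Gi are binary and k, M, and G are decimal. The function names are illustrative.

```shell
# Sketch: compare a JVM heap size with a Kubernetes memory limit.
# Assumption: javaMaxHeapSize is passed to the JVM as -Xmx, where K/M/G are
# binary multiples (1G = 1024^3 bytes).
jvm_to_bytes() {
  num="${1%[kKmMgG]}"
  case "$1" in
    *[kK]) echo $((num * 1024)) ;;
    *[mM]) echo $((num * 1024 * 1024)) ;;
    *[gG]) echo $((num * 1024 * 1024 * 1024)) ;;
    *)     echo "$1" ;;
  esac
}

# Kubernetes quantities: Ki/Mi/Gi are binary, k/M/G are decimal.
k8s_to_bytes() {
  case "$1" in
    *Ki) echo $(( ${1%Ki} * 1024 )) ;;
    *Mi) echo $(( ${1%Mi} * 1024 * 1024 )) ;;
    *Gi) echo $(( ${1%Gi} * 1024 * 1024 * 1024 )) ;;
    *k)  echo $(( ${1%k} * 1000 )) ;;
    *M)  echo $(( ${1%M} * 1000 * 1000 )) ;;
    *G)  echo $(( ${1%G} * 1000 * 1000 * 1000 )) ;;
    *)   echo "$1" ;;
  esac
}

# Succeeds when the heap (first argument) is not greater than the limit.
heap_fits() {
  [ "$(jvm_to_bytes "$1")" -le "$(k8s_to_bytes "$2")" ]
}

# Usage: heap_fits 2G 2Gi && echo "heap fits within the limit"
```

With the values in the example above, a 2G heap equals a 2Gi limit exactly, so the check passes. However, the JVM also uses memory outside the heap, so a smaller heap leaves more headroom.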

Logging

In production, use the info logging level to avoid overwhelming the storage. For debugging purposes, temporarily change the logging level to debug.

connectors:
  logging:
    level: "info"

For all available settings, see the Helm specification.

Monitoring

If you have the Prometheus Operator, enable monitoring to deploy a PodMonitor resource for Kafka Connect. Observability is essential in production environments.

connectors:
  monitoring:
    enabled: true

For all available settings, see the Helm specification.

Number of replicas

You can scale the Kafka Connect Pods by modifying the deployment.replicas parameter in the Helm values. This parameter allows you to handle varying workloads by increasing or decreasing the number of running instances.

connectors:
  enabled: true
  deployment:
    create: true
    replicas: 3

The replicas: 3 setting ensures that three instances of the Kafka Connect Pod will be running. You can adjust this number based on your needs.

Redpanda Data recommends using an autoscaler such as KEDA to increase the number of Pod replicas automatically when certain conditions, such as high CPU or memory usage, are met.

Deployment strategy

For smooth and uninterrupted updates, use the default RollingUpdate strategy. Additionally, set a budget to ensure a certain number of Pod replicas remain available during the update.

connectors:
  deployment:
    strategy:
      type: "RollingUpdate"
    updateStrategy:
      type: "RollingUpdate"
    budget:
      maxUnavailable: 1

For all available settings, see the Helm specification.

Probes

Probes determine the health and readiness of your Pods. Configure them based on the startup behavior of your connectors.

connectors:
  deployment:
    livenessProbe:
      initialDelaySeconds: 60
      periodSeconds: 10
    readinessProbe:
      initialDelaySeconds: 30
      periodSeconds: 10

For all available settings, see the Helm specification.

Deployment history

Keeping track of your deployment’s history is beneficial for rollback scenarios. Adjust the revisionHistoryLimit according to your storage considerations.

connectors:
  deployment:
    progressDeadlineSeconds: 600
    revisionHistoryLimit: 10

For all available settings, see the Helm specification.

Affinity rules

Affinities control Pod placement in the cluster based on various conditions. Set these according to your high availability and infrastructure needs.

connectors:
  deployment:
    podAntiAffinity:
      topologyKey: kubernetes.io/hostname
      type: hard
      weight: 100
      custom:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: "app"
              operator: "In"
              values:
              - "redpanda-connector"
          topologyKey: "kubernetes.io/hostname"
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
              - key: "app"
                operator: "In"
                values:
                - "redpanda-connector"
            topologyKey: "topology.kubernetes.io/zone"

In this example:

  • The requiredDuringSchedulingIgnoredDuringExecution section ensures that the Kubernetes scheduler doesn’t place two Pods with the same app: redpanda-connector label on the same node due to the topologyKey: kubernetes.io/hostname.

  • The preferredDuringSchedulingIgnoredDuringExecution section is a soft rule that tries to ensure the Kubernetes scheduler doesn’t place two Pods with the same app: redpanda-connector label in the same zone. However, if it’s not possible, the scheduler can still place the Pods in the same zone.

For all available settings, see the Helm specification.

Graceful shutdown

If your connectors require additional time for a graceful shutdown, modify the terminationGracePeriodSeconds.

connectors:
  deployment:
    terminationGracePeriodSeconds: 30

For all available settings, see the Helm specification.

Service account

Restricting permissions is a best practice. Assign a dedicated service account for each deployment or app.

connectors:
  serviceAccount:
    create: true
    name: "redpanda-connector-service-account"

For all available settings, see the Helm specification.

Producers

When a source connector retrieves data from an external system for Redpanda, it assumes the role of a producer:

  • The source connector is responsible for transforming the external data into Kafka-compatible messages.

  • It then produces (writes) these messages to a specified Kafka topic.

The producerBatchSize and producerLingerMS settings specify how Kafka Connect groups messages before producing them.

connectors:
  connectors:
    producerBatchSize: 131072
    producerLingerMS: 1

For all available settings, see the Helm specification.

Topics

Kafka Connect leverages internal topics to track processed data, enhancing its fault tolerance:

  • The offset topic logs the last processed position from the external data source.

  • After a failure or restart, the connector resumes from this logged position, which limits duplicated or missed records.

connectors:
  connectors:
    storage:
      topic:
        offset: _internal_connectors_offsets

Here, _internal_connectors_offsets is the dedicated Kafka topic where Kafka Connect persists the offsets of the source connector.

For all available settings, see the Helm specification.

Verify the deployment

To verify that the deployment was successful, you can use either Redpanda Console or the Kafka Connect REST API:

Verify in Redpanda Console

  1. Expose Redpanda Console to your localhost:

    kubectl --namespace <namespace> port-forward svc/redpanda-console 8080:8080

    This command runs in the foreground and blocks the command-line window. To execute other commands while it is running, open another command-line window.

  2. Open Redpanda Console on http://localhost:8080.

  3. Go to Connectors.

You should see:

  • A list of available connectors (types)

  • The address of your Kafka Connect cluster

  • The version of Kafka Connect that you are running

From here, you can create and configure instances of your connectors.

Verify with the Kafka Connect REST API

  1. Get the name of the Pod that’s running Kafka Connect:

    kubectl get pod -l app.kubernetes.io/name=connectors --namespace <namespace>
  2. View the version of Kafka Connect:

    kubectl exec <pod-name> --namespace <namespace> -- curl localhost:8083 | jq
    Example output
    {
      "version": "3.5.1",
      "commit": "2c6fb6c54472e90a",
      "kafka_cluster_id": "redpanda.58d6bd99-7f7c-4732-a398-b44bf892979a"
    }
  3. View the list of available connectors:

    kubectl exec <pod-name> --namespace <namespace> -- curl localhost:8083/connector-plugins | jq
    Example output
    [
      {
        "class": "org.apache.kafka.connect.mirror.MirrorCheckpointConnector",
        "type": "source",
        "version": "3.5.1"
      },
      {
        "class": "org.apache.kafka.connect.mirror.MirrorHeartbeatConnector",
        "type": "source",
        "version": "3.5.1"
      },
      {
        "class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
        "type": "source",
        "version": "3.5.1"
      }
    ]
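
After the plugins are listed, a connector instance can be created by sending a definition to the POST /connectors endpoint of the Kafka Connect REST API. The following is a minimal sketch for MirrorSourceConnector. The connector name, cluster aliases, bootstrap addresses, and topic pattern are placeholders, and the function names are illustrative. Clusters that use TLS or SASL need additional connector properties that are not shown here.

```shell
# Sketch: print a connector definition for MirrorSourceConnector.
# All names and addresses passed in are placeholders.
mirror_source_json() {
  name="$1"; source_bootstrap="$2"; target_bootstrap="$3"; topics="$4"
  cat <<EOF
{
  "name": "${name}",
  "config": {
    "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
    "source.cluster.alias": "source",
    "target.cluster.alias": "target",
    "source.cluster.bootstrap.servers": "${source_bootstrap}",
    "target.cluster.bootstrap.servers": "${target_bootstrap}",
    "topics": "${topics}"
  }
}
EOF
}

# Submit the definition to POST /connectors from inside the Kafka Connect Pod.
# "kubectl exec -i" forwards stdin, and "--data @-" makes curl read it.
create_mirror_source() {
  pod="$1"; namespace="$2"; shift 2
  mirror_source_json "$@" |
    kubectl exec -i "${pod}" --namespace "${namespace}" -- \
      curl -s -X POST -H "Content-Type: application/json" \
      --data @- localhost:8083/connectors
}

# Usage:
# create_mirror_source <pod-name> <namespace> <connector-name> \
#   <source-bootstrap> <target-bootstrap> <topic-pattern>
```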

Install a new connector

To install new connectors other than the ones included in the Redpanda Connectors Docker image, you must:

  1. Prepare a JAR (Java archive) file for the connector.

  2. Mount the JAR file into the plugin directory of the Redpanda Connectors Docker image.

  3. Use that Docker image in the Helm chart.

Prepare a JAR file

Kafka Connect is written in Java. As such, connectors are also written in Java and packaged into JAR files. JAR files are used to distribute Java classes and associated metadata and resources in a single file. You can get JAR files for connectors in many ways, including:

  • Build from source: If you have the source code for a Java project, you can compile and package it into a JAR using build tools, such as:

    • Maven: Using the mvn package command.

    • Gradle: Using the gradle jar or gradle build command.

    • Java Development Kit (JDK): Using the jar command-line tool that comes with the JDK.

  • Maven Central Repository: If you’re looking for a specific Java library or framework, it may be available in the Maven Central Repository. From here, you can search for the library and download the JAR directly.

  • Vendor websites: If you are looking for commercial Java software or libraries, the vendor’s official website is a good place to check.

To avoid security risks, always verify the source of the JAR files. Do not download JAR files from unknown websites. Malicious JAR files can present a security risk to your execution environment.

Add the connector to the Docker image

The Redpanda Connectors Docker image is configured to find connectors in the /opt/kafka/connect-plugins directory. You must copy your connector’s JAR file into this directory in the Docker image.

  1. Create a new Dockerfile:

    Dockerfile
    FROM redpandadata/connectors:<version>
    
    COPY <path-to-jar-file> /opt/kafka/connect-plugins/<connector-name>/<jar-filename>

    Replace the following placeholders:

    • <version>: The version of the Redpanda Connectors Docker image that you want to use. For all available versions, see DockerHub.

    • <path-to-jar-file>: The path to the JAR file on your local system.

    • <connector-name>: A unique directory name in which to mount your JAR files.

    • <jar-filename>: The name of your JAR file, including the .jar file extension.

  2. Change into the directory where you created the Dockerfile and run:

    docker build -t <repo>/connectors:<version> .

    Replace <repo> with the name of your Docker repository and <version> with your desired version or tag for the image.

  3. Push the image to your Docker repository:

    docker push <repo>/connectors:<version>

Deploy the Helm chart with your custom Docker image

  1. Modify the Helm values in the Redpanda Helm chart to use your new Docker image to deploy the Redpanda Connectors Helm chart:

    connectors:
      image:
        repository: <repo>/connectors
        tag: <version>
        pullPolicy: IfNotPresent

    Kafka Connect should discover the new connector automatically on startup.

  2. Get the name of the Pod that’s running Kafka Connect:

    kubectl get pod -l app.kubernetes.io/name=connectors --namespace <namespace>
  3. View all available connectors:

    kubectl exec <pod-name> --namespace <namespace> -- curl localhost:8083/connector-plugins | jq

You should see your new connector in the list.
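
To check for the connector in a script rather than by eye, you could filter the response for its class name. The following is a minimal sketch. The helper name has_plugin and the class name com.example.MyConnector are hypothetical; substitute the fully qualified class name of your connector.

```shell
# Sketch: succeed if the /connector-plugins response on stdin lists a class.
# Whitespace is stripped first so both raw and jq-formatted JSON match.
has_plugin() {
  class="$1"
  tr -d '[:space:]' | grep -F -q "\"class\":\"${class}\""
}

# Usage (com.example.MyConnector is a hypothetical class name):
# kubectl exec <pod-name> --namespace <namespace> -- \
#   curl -s localhost:8083/connector-plugins | has_plugin com.example.MyConnector
```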