Get Started with the Redpanda Connect Helm Chart

This guide explains how to deploy and configure Redpanda Connect on Kubernetes using the Helm chart. It covers the available deployment modes and walks you through building your own pipelines.

What is Redpanda Connect?

Redpanda Connect is a powerful stream processor that integrates data across various sources (inputs) and sinks (outputs), enabling seamless data flows between systems. It supports complex data processing tasks such as data enrichment, transformation, filtering, and routing, making it an ideal solution for data pipelines that require high performance and resilience.

Redpanda Connect includes Bloblang, a flexible mapping language for processing data with built-in functions for transformations, random data generation, and more. These functions allow for easy customization of data pipelines.
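
For example, a minimal Bloblang mapping (a standalone sketch, assuming the input document has a message field, not part of the Helm configuration itself) adds an ID and a timestamp and uppercases the message:

root.id = uuid_v4()
root.message = this.message.uppercase()
root.created_at = now()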

Common use cases for Redpanda Connect include:

  • Real-time data ingestion: Ingesting data from various sources into Redpanda or other data platforms.

  • Data transformation: Enriching and transforming data to match business requirements before forwarding it to a destination.

  • Data filtering and routing: Filtering data and routing it to the appropriate sinks based on predefined conditions.

Prerequisites

Before you begin, make sure you have:

  • A running Kubernetes cluster.

  • The kubectl command-line tool, configured to communicate with your cluster.

  • Helm installed on your local machine.
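
To confirm that these tools are available, you can run the following commands (the output varies by environment):

kubectl version --client
helm version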

Install with Helm

To deploy Redpanda Connect on Kubernetes, use the official Helm chart.

helm repo add redpanda https://charts.redpanda.com (1)
helm repo update (2)
helm install redpanda-connect redpanda/connect --namespace <namespace> --create-namespace (3)
1 Adds the Redpanda Helm repository.
2 Updates your local Helm repository cache.
3 Installs Redpanda Connect in the given namespace. You can customize this deployment by configuring values in the Helm chart.
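
To see which values you can override, inspect the chart's defaults:

helm show values redpanda/connect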

Deployment modes

Redpanda Connect can run in two different modes: standalone mode and streams mode.

  • Standalone mode: Runs a single pipeline at a time, making it suitable for simpler use cases that don’t require multiple pipelines running concurrently.

  • Streams mode: Supports multiple pipelines running simultaneously, with each pipeline managed through a ConfigMap. Streams mode is ideal for complex data processing use cases.
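
The two modes map to different sections of the Helm values file. As a rough sketch based on the examples later in this guide, a standalone deployment defines its pipeline inline under the top-level config section, while a streams deployment points at a ConfigMap that bundles one YAML file per pipeline:

# Standalone mode: one pipeline defined inline.
config:
  input: {}
  pipeline: {}
  output: {}

# Streams mode: multiple pipelines bundled in a ConfigMap.
streams:
  enabled: true
  streamsConfigMap: "connect-streams"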

Run pipelines in standalone mode

In standalone mode, you can configure a single pipeline within the config section of your Helm values file.

Hello world

In this example, you’ll produce a simple message and convert it to uppercase using a Bloblang method.

  1. Create a pipeline.yaml file.

    This file will be used to override the default Helm chart values for Redpanda Connect.

  2. Add the following configuration to pipeline.yaml:

    config:
      input:
        generate:
          mapping: |
            root.message = "Hello, Redpanda Connect!" (1)
          count: 1 (2)
      pipeline:
        processors:
          - mapping: |
              root.message = this.message.uppercase() (3)
      output:
        stdout: {} (4)
    1 Use Bloblang to create a JSON object with the message "Hello, Redpanda Connect!" stored in root.message.
    2 Generate the message only once.
    3 Convert the message field in the input to uppercase using the uppercase() method in Bloblang.
    4 Output the processed data to stdout, making it viewable in logs.
  3. Deploy the pipeline:

    helm upgrade --install redpanda-connect redpanda/connect --namespace <namespace> --values pipeline.yaml
  4. Check the logs:

    export POD_NAME=$(kubectl get pods --namespace <namespace> -l "app.kubernetes.io/name=redpanda-connect,app.kubernetes.io/instance=redpanda-connect" -o jsonpath="{.items[0].metadata.name}")
    kubectl logs $POD_NAME --namespace <namespace>

    You should see the message converted to uppercase in the output:

    {"message":"HELLO, REDPANDA CONNECT!"}
  5. Check the Pod’s status:

    kubectl get pods --namespace <namespace> -l app.kubernetes.io/name=redpanda-connect --watch

    The Pod enters a CrashLoopBackOff state because containers are expected to run continuously. When Redpanda Connect finishes processing the pipeline, the Pod exits, causing Kubernetes to restart it repeatedly.

    To prevent this status, you can configure Redpanda Connect to continue processing data indefinitely.

Produce continuous data

To produce data continuously, you can set input.generate.count to 0.

  1. Update the pipeline.yaml file to produce a message every second, indefinitely:

    config:
      input:
        generate:
          interval: 1s
          count: 0  # Setting count to 0 ensures it generates data indefinitely.
          mapping: |
            root.message = "Hello, Redpanda Connect!"
      pipeline:
        processors:
          - mapping: |
              root.message = this.message.uppercase()
      output:
        stdout: {}
  2. Deploy the updated configuration:

    helm upgrade --install redpanda-connect redpanda/connect --namespace <namespace> --values pipeline.yaml
  3. Watch the logs:

    export POD_NAME=$(kubectl get pods --namespace <namespace> -l "app.kubernetes.io/name=redpanda-connect,app.kubernetes.io/instance=redpanda-connect" -o jsonpath="{.items[0].metadata.name}")
    kubectl logs $POD_NAME --namespace <namespace> -f

    You should see in the logs that Redpanda Connect produces the same message every second and converts it to uppercase:

    {"message": "HELLO, REDPANDA CONNECT!"}
    {"message": "HELLO, REDPANDA CONNECT!"}
    {"message": "HELLO, REDPANDA CONNECT!"}
  4. Check the Pod’s status:

    kubectl get pods --namespace <namespace> -l app.kubernetes.io/name=redpanda-connect --watch

    The Pod should now be running without entering a CrashLoopBackOff state, as the generate input continuously feeds new data to the pipeline, preventing it from terminating.

Simulate realistic data streams

To make the output more realistic, use Bloblang functions to generate varied data such as unique IDs, random names, and timestamps. An example after these steps shows how to generate other data types, such as email addresses.

  1. Update the pipeline.yaml file to generate realistic user data:

    config:
      input:
        generate:
          interval: 1s
          count: 0
          mapping: |
            # Store the generated names in variables
            let first_name = fake("first_name")
            let last_name = fake("last_name")
    
            # Build the message
            root.user_id = counter()
            root.name = ($first_name + " " + $last_name)
            root.timestamp = now()
      pipeline:
        processors:
          - mapping: |
              root.name = this.name.uppercase()
      output:
        stdout: {}

    This configuration generates a JSON object with:

    • user_id: A unique identifier for each record, generated using the counter() function.

    • name: A randomly generated first and last name, using the fake() function. The first and last names are stored in variables and referenced using the $<variable-name> syntax.

    • timestamp: The current timestamp at the time of generation, using the now() function.

  2. Deploy the updated configuration:

    helm upgrade --install redpanda-connect redpanda/connect --namespace <namespace> --values pipeline.yaml
  3. Watch the logs:

    export POD_NAME=$(kubectl get pods --namespace <namespace> -l "app.kubernetes.io/name=redpanda-connect,app.kubernetes.io/instance=redpanda-connect" -o jsonpath="{.items[0].metadata.name}")
    kubectl logs $POD_NAME --namespace <namespace> -f

    You should see logs showing JSON objects similar to the following. Because the mapping processor builds a new document containing only the fields you assign, only the uppercased name field appears in the output:

    {"name":"ZOIE SIPES"}
    {"name":"LORENA KERTZMANN"}
    {"name":"DALLAS BOYER"}
    {"name":"LOUIE WILDERMAN"}
    {"name":"EMILIA KOEPP"}
    {"name":"KALEIGH PACOCHA"}

Process data from a file input

To configure a pipeline that reads data from a file, first store the data in a ConfigMap. This ConfigMap will be mounted into the Redpanda Connect Pod, allowing it to read the file directly.

  1. Create a ConfigMap to provide the input data that Redpanda Connect will read. This example ConfigMap contains a JSON object with example user data:

    kubectl create configmap connect-input --from-literal=input-data='{"name": "Redpanda Connect", "email": "rp.connect@example.com"}' --namespace <namespace>

    This ConfigMap will act as the source for the file-based input in Redpanda Connect, allowing the pipeline to read and process this structured JSON data.

  2. Update the pipeline.yaml file to read data from the file mounted by the ConfigMap:

    pipeline.yaml
    extraVolumes:
      - name: input-config
        configMap:
          name: connect-input
    extraVolumeMounts:
      - name: input-config
        mountPath: /input (1)
        subPath: input-data
    config:
      input:
        file:
          paths:
            - "/input" (1)
      pipeline:
        processors:
          - mapping: |
              root.name = this.name.uppercase()
      output:
        stdout: {}
    1 For the input, use the contents of the file at the path where the ConfigMap data is mounted.
  3. Deploy the pipeline:

    helm upgrade --install redpanda-connect redpanda/connect --namespace <namespace> --values pipeline.yaml
  4. Check the logs:

    export POD_NAME=$(kubectl get pods --namespace <namespace> -l "app.kubernetes.io/name=redpanda-connect,app.kubernetes.io/instance=redpanda-connect" -o jsonpath="{.items[0].metadata.name}")
    kubectl logs $POD_NAME --namespace <namespace>

    You should see the username converted to uppercase in the output:

    {"name":"REDPANDA CONNECT"}

Run multiple pipelines in streams mode

In streams mode, you define each pipeline in a separate YAML file, and all pipelines run simultaneously, making this mode ideal for high-throughput applications. Bundle all the pipeline YAML files into a single ConfigMap that you pass to Redpanda Connect.

  1. Define your pipeline configurations in the following separate YAML files:

    woof.yaml
    input:
      generate:
        mapping: root = "woof" # Generates a message with the word "woof" at regular intervals.
        interval: 5s
        count: 0
    output:
      stdout:
        codec: lines # Outputs each message as a new line in stdout.
    meow.yaml
    input:
      generate:
        mapping: root = "meow" # Generates a message with the word "meow" at regular intervals.
        interval: 2s
        count: 0
    output:
      stdout:
        codec: lines # Outputs each message as a new line in stdout.
  2. Bundle the configuration files into a ConfigMap, which Redpanda Connect will reference:

    kubectl create configmap connect-streams --from-file=woof.yaml --from-file=meow.yaml --namespace <namespace>
  3. Configure Redpanda Connect in streams mode and specify the name of the ConfigMap to use:

    connect.yaml
    streams:
      enabled: true (1)
      streamsConfigMap: "connect-streams" (2)
    1 Enable streams mode in Redpanda Connect.
    2 Use the given ConfigMap as the pipeline configuration.
  4. Deploy the chart:

    helm upgrade --install redpanda-connect redpanda/connect --namespace <namespace> --values connect.yaml
  5. Watch the logs:

    export POD_NAME=$(kubectl get pods --namespace <namespace> -l "app.kubernetes.io/name=redpanda-connect,app.kubernetes.io/instance=redpanda-connect" -o jsonpath="{.items[0].metadata.name}")
    kubectl logs $POD_NAME --namespace <namespace> -f

    You should see logs showing a combination of outputs from both pipelines:

    woof
    meow
    meow
    meow
    woof
    meow
    meow

Update the pipeline in streams mode

To update a pipeline in streams mode:

  1. Modify one of the configuration files locally.

    woof.yaml
    # Updated woof.yaml
    input:
      generate:
        mapping: root = "bark"  # Updated to generate a message with the word "bark" instead of "woof."
        interval: 5s
        count: 0
    output:
      stdout:
        codec: lines
  2. Update the ConfigMap with the modified file:

    kubectl create configmap connect-streams --from-file=woof.yaml --from-file=meow.yaml --namespace <namespace> --dry-run=client -o yaml | kubectl apply -f -
  3. Restart the Deployment:

    kubectl rollout restart deployment/redpanda-connect --namespace <namespace>
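
    After the rollout completes, you can watch the logs again to confirm that the updated pipeline now produces "bark" instead of "woof":

    export POD_NAME=$(kubectl get pods --namespace <namespace> -l "app.kubernetes.io/name=redpanda-connect,app.kubernetes.io/instance=redpanda-connect" -o jsonpath="{.items[0].metadata.name}")
    kubectl logs $POD_NAME --namespace <namespace> -f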

Global configuration

When deploying Redpanda Connect in streams mode, you can configure global metrics, tracing, and logging settings that apply across all pipelines. Specify these in your values.yaml overrides under the metrics, tracing, and logger sections.

metrics:
  prometheus: {} # Enable Prometheus metrics collection.

tracing:
  openTelemetry:
    http: [] # Configure OpenTelemetry HTTP tracing.
    grpc: [] # Configure OpenTelemetry gRPC tracing.
    tags: {} # Add custom tags to emitted traces.

logger:
  level: INFO # Set logging level (e.g., INFO, DEBUG).
  static_fields:
    '@service': redpanda-connect # Add static fields to logs for better traceability.
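
With Prometheus metrics enabled, you can verify that metrics are being exposed by port-forwarding the Service (described in the next section) and querying the HTTP server. This assumes the default /metrics path:

kubectl port-forward svc/redpanda-connect 8080:80 --namespace <namespace>
# In a separate terminal:
curl http://localhost:8080/metrics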

Access the HTTP server on Redpanda Connect

To manage and monitor Redpanda Connect, you can use its HTTP server, which provides useful endpoints for version checking, pipeline management, and more. By default, Redpanda Connect exposes this server using a Kubernetes ClusterIP Service, accessible only within the cluster.

  1. Forward the ports of the ClusterIP Service to your local device:

    kubectl port-forward svc/redpanda-connect 8080:80 --namespace <namespace>
  2. Access the HTTP server locally. For example, to check the Redpanda Connect version, run:

    curl http://localhost:8080/version

    Example output:

    {
      "version": "v4.38.0",
      "built": "2024-10-17T09:27:42Z"
    }
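
    The HTTP server exposes other endpoints as well. For example, assuming the default endpoints, you can check liveness and, in streams mode, list the running pipelines:

    curl http://localhost:8080/ping
    curl http://localhost:8080/streams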

You can also configure external access using a LoadBalancer Service or an Ingress. See the Helm values for more details.

Uninstall Redpanda Connect

To remove Redpanda Connect and all related resources from your Kubernetes cluster, uninstall the Helm chart:

helm uninstall redpanda-connect --namespace <namespace>

This command deletes all resources created by the Helm chart, including Deployments and Services.

Uninstalling the chart does not delete the ConfigMaps that you created manually outside of the Helm chart. To delete them, run:

kubectl delete configmap connect-streams connect-input --namespace <namespace>

Next steps

  • Learn more about Bloblang, the mapping language for processing data in Redpanda Connect.

  • Try more hands-on examples with one of the Cookbooks.