Redpanda Connect Quickstart

Learn how to build, run, and update a data pipeline on a Redpanda Cloud cluster using Redpanda Connect.

This quickstart introduces you to Redpanda Connect for Redpanda Cloud. Using a single YAML configuration file, you can quickly build streaming data pipelines from scratch. No third-party connectors are required.

Redpanda Connect is currently in a limited availability (LA) release for BYOC and Dedicated clusters. It is also available as a beta feature for Serverless Standard and Serverless Pro clusters, but on those clusters it is not suitable for production deployments.

Prerequisites

A Redpanda Cloud account for Serverless Standard, Serverless Pro, Dedicated, or standard BYOC (not BYOVPC). If you don’t already have an account, sign up for a free trial.

Before you start

Create the cluster, topic, and user you need to build and run your data pipeline.

Currently, there is a limit of 10 pipelines per cluster.

Follow the steps for your cluster type: Serverless Standard, Serverless Pro, Dedicated, or BYOC.

Serverless Standard:

  1. Log in to Redpanda Cloud.

  2. On the Clusters page, click Create cluster, then under Serverless Standard, click Create.

  3. For cluster settings, enter connect-quickstart for the cluster name.

  4. Select a resource group. If you don’t have an existing resource group, go to the Resource groups page to create one, and then return to step 2.

  5. Select the default cloud provider, then click Create.

Serverless Pro:

  1. Log in to Redpanda Cloud.

  2. On the Clusters page, click Create cluster, then under Serverless Pro, click Create.

  3. For cluster settings, enter connect-quickstart for the cluster name.

  4. Select a resource group. If you don’t have an existing resource group, go to the Resource groups page to create one, and then return to step 2.

  5. Select the default cloud provider, then click Create.

Dedicated:

  1. Log in to Redpanda Cloud.

  2. On the Clusters page, click Create cluster, then under Dedicated, click Create.

  3. On the Cluster settings page, enter connect-quickstart for the cluster name.

  4. Select a resource group. If you don’t have an existing resource group, go to the Resource groups page to create one, and then return to step 2.

  5. Select your cloud provider, then use the default values for the remaining properties and click Next.

  6. On the Networking page, use the default Public connection type, and click Create.

    Wait while your cluster is created.

BYOC:

  1. Log in to Redpanda Cloud.

  2. On the Clusters page, click Create cluster, then under Bring Your Own Cloud, click Create.

  3. On the Cluster settings page, enter connect-quickstart for the cluster name.

  4. Select a resource group. If you don’t have an existing resource group, go to the Resource groups page to create one, and then return to step 2.

  5. Select your cloud provider, then use the default values for the remaining properties and click Next.

  6. On the Networking page, select a Private connection type and choose a CIDR range that does not overlap with your existing VPCs or your Redpanda network.

  7. Click Next.

  8. On the Deploy page, follow the steps to log in to Redpanda Cloud and deploy the agent.

To complete your setup:

  1. Go to the Topics page, click Create topic and enter processed-emails for the topic name. Use default values for the remaining properties and click Create and then Close.

  2. Go to the Security page and click Create user. Enter connect as the username and make a note of the password, because you need it later to configure your pipeline. Use the default values for the remaining properties.

  3. Click Create, then click Done.

  4. Stay on the Access control page and click the ACLs tab.

  5. Select the connect user you just created. Click Allow all operations, then scroll down and click OK.

Build your data pipeline

Configure your first data pipeline on the connect-quickstart cluster.

All Redpanda Connect configurations use a YAML file split into three sections. In this data pipeline, the sections are:

  • input (the data source): A generate input that creates a fake email message every second, populated with an ID, a user name, an email address, and a paragraph of content.

  • pipeline (one or more processors): A mutation processor that adds a title to every email message it processes.

  • output (the data sink): A kafka_franz output that writes messages to the processed-emails topic on your cluster.
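As a rough illustration of this three-part layout, the following sketch contains only the top-level sections, with deliberately trivial components. It is a hypothetical minimal configuration, not part of this quickstart: the message and processed fields are made up, and the drop output (which discards every message) stands in for a real data sink.

```yaml
# Hypothetical minimal configuration showing only the three top-level sections.
input:
  generate:                # data source: creates a new message on each interval
    interval: 1s
    mapping: |
      root.message = "hello"

pipeline:
  processors:              # zero or more processors, applied in order
    - mutation: |
        root.processed = true

output:
  drop: {}                 # placeholder data sink: discards every message
```

The quickstart pipeline follows the same shape, with a richer generate mapping and a kafka_franz output in place of drop.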

  1. Go to the Connect page on your cluster and click Create pipeline.

  2. In Pipeline name, enter emailprocessor-pipeline and add a short description, for example, Transforms email data using a mutation processor.

  3. In the Tasks box, leave the default value of 1. Tasks are used to allocate resources to a pipeline. One task is equivalent to 0.1 CPU and 400 MB of memory, and provides a message throughput of approximately 1 MB/sec.

  4. In the Configuration box, paste the following configuration.

    input:
      generate:
        interval: 1s
        mapping: |
          root.id = uuid_v4()
          root.user.name = fake("name")
          root.user.email = fake("email")
          root.content = fake("paragraph")

    pipeline:
      processors:
        - mutation: |
            root.title = "PRIVATE AND CONFIDENTIAL"

    output:
      kafka_franz:
        seed_brokers:
          - ${REDPANDA_BROKERS}
        sasl:
          - mechanism: SCRAM-SHA-256
            username: connect
            password: <cluster-password>
        topic: processed-emails
        tls:
          enabled: true

    • Replace <cluster-password> with the password of the connect user you set up in Before you start. To avoid exposing secrets, Redpanda Connect also supports secret variables. For more information, see Manage Secrets.

    • ${REDPANDA_BROKERS} is a contextual variable that Redpanda Cloud automatically sets to the bootstrap server address of your cluster, so you can use it in any of your pipelines.

  5. Click Create. Your pipeline details are displayed and the pipeline state changes from Starting to Running, which may take a few minutes. If you don’t see this state change, refresh your page.

    Redpanda Connect starts to ingest, process, and write transformed email messages to the processed-emails topic.

  6. After a few seconds, select the pipeline and click Stop.
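If you would rather not paste the connect user’s password into the configuration in step 4, you can reference a secret instead. The following output section is a sketch that assumes you have already created a secret named CONNECT_PASSWORD (a hypothetical name) and that secrets are referenced with the ${secrets.NAME} syntax; check Manage Secrets for the exact naming rules and syntax on your cluster.

```yaml
output:
  kafka_franz:
    seed_brokers:
      - ${REDPANDA_BROKERS}
    sasl:
      - mechanism: SCRAM-SHA-256
        username: connect
        # Assumption: CONNECT_PASSWORD is a hypothetical secret created beforehand,
        # referenced with the ${secrets.NAME} syntax described in Manage Secrets.
        password: ${secrets.CONNECT_PASSWORD}
    topic: processed-emails
    tls:
      enabled: true
```

Only the password line differs from the output section in step 4, so the password no longer appears in plain text in the pipeline configuration.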

Explore the processed data and logs

Take a look at the data your pipeline has already processed, and the logs that are available for troubleshooting.

To see the pipeline output:

  1. Go to the Topics page and select the processed-emails topic.

  2. Select any message to see the email message fields generated by the pipeline input, along with a title field added by the mutation processor. All messages have the following structure:

    {
        "content": "Aliquam quidem tempore expedita debitis ab. Officiis optio eveniet ab magni commodi...",
        "id": "35522c66-6fcd-47da-b97b-857b983477d1",
        "title": "PRIVATE AND CONFIDENTIAL",
        "user": {
            "email": "oCcXPTh@RrKHZRQ.info",
            "name": "King Francis Torphy"
        }
    }

To view the logs:

  1. Return to the Connect page on your cluster and select the emailprocessor-pipeline.

  2. Click the Logs tab and select each of the four log messages. You can see the sequence of events that start the data pipeline. For example, you can see when Redpanda Connect starts to write data to the topic:

    {
        "instance_id": "cr3j2rab2tks83v3gbh0",
        "label": "",
        "level": "INFO",
        "message": "Output type kafka_franz is now active",
        "path": "root.output",
        "pipeline_id": "cr3j2r6hqokqcph9p4b0",
        "time": "2024-08-22T12:39:09.729899336Z"
    }

Update your pipeline

Now try adding custom logging and an extra data transformation step to your configuration. You can make the updates while your data pipeline is running.

  1. Select the Configuration tab of your data pipeline.

  2. Click Start and wait for your pipeline to start running.

  3. Click Edit and overwrite the processors section of your configuration with the following snippet.

      processors:
        - mutation: |
            root.title = "PRIVATE AND CONFIDENTIAL"
            root.user.name = root.user.name.uppercase()
        - log:
            level: INFO
            message: 'Processed email for ${!this.user.name}'
            fields_mapping: |
              root.reason = "SUCCESS"
              root.id = this.id

    The snippet includes new configuration to:

    • Convert the name of each email sender to uppercase.

    • Log a message at the INFO level for every email processed.

    • Use the log processor to write a summary message, along with the reason and the message ID, for every email message processed.

  4. Click Update.

  5. Once the pipeline is running again, click the Logs tab and select the most recent log message. You can see the custom logging fields, along with the user’s name in uppercase.

    {
        "id": "f64d1f1a-2d76-47ad-a215-52410ab4e22f",
        "instance_id": "cr3ncrvom8ofl3bn3rk0",
        "label": "",
        "level": "INFO",
        "message": "Processed email for MISS IMELDA REICHERT",
        "path": "root.pipeline.processors.1",
        "pipeline_id": "cr3me2uhqokqcph9p4bg",
        "reason": "SUCCESS",
        "time": "2024-08-22T17:33:46.676903284Z"
    }
  6. Click Stop.
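In the log entry above, the label field is empty and the log processor is identified only by its path, root.pipeline.processors.1. As an optional variation, you can give a processor a label so that its log entries are easier to find. In the following sketch, the label name email_logger is hypothetical, and it is an assumption that the label then appears in the label field of that processor’s log entries.

```yaml
pipeline:
  processors:
    - mutation: |
        root.title = "PRIVATE AND CONFIDENTIAL"
        root.user.name = root.user.name.uppercase()
    # Hypothetical label; assumed to populate the "label" field in this processor's logs.
    - label: email_logger
      log:
        level: INFO
        message: 'Processed email for ${!this.user.name}'
        fields_mapping: |
          root.reason = "SUCCESS"
          root.id = this.id
```

The processing logic is unchanged; only the label key is added to the second processor.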

Clean up

When you’ve finished experimenting with your data pipeline, you can delete the pipeline, topic, and cluster you created for this quickstart.

  1. On the Connect page, click the delete icon next to the emailprocessor-pipeline.

  2. Confirm your deletion to remove the data pipeline and associated logs.

  3. On the Topics page, delete the processed-emails topic.

  4. Go back to the Clusters page and delete the connect-quickstart cluster.