Get Started
This topic explains how to get started with Redpanda Connect.
Install
You can install Redpanda Connect using the rpk command-line tool (CLI). The rpk CLI allows you to create and manage data pipelines with Redpanda Connect as well as interact with Redpanda clusters.
Also interacting with a Redpanda cluster?
If you want to use rpk to also communicate with a Redpanda cluster, ensure the version of rpk that you install matches the version of Redpanda running in your cluster.
Linux
- Download the rpk archive for Linux:
  - To download the latest version of rpk:
    curl -LO https://github.com/redpanda-data/redpanda/releases/latest/download/rpk-linux-amd64.zip
  - To download a version other than the latest:
    curl -LO https://github.com/redpanda-data/redpanda/releases/download/v<version>/rpk-linux-amd64.zip
- Ensure that you have the folder ~/.local/bin:
  mkdir -p ~/.local/bin
- Add it to your $PATH:
  export PATH="$HOME/.local/bin:$PATH"
- Unzip the rpk files to your ~/.local/bin/ directory:
  unzip rpk-linux-amd64.zip -d ~/.local/bin/
- Run rpk --version to display the version of the rpk binary:
  rpk --version
  The output is similar to:
  rpk version 24.1.12 (rev e2bfc05)
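The two download commands above differ only in the URL path: latest/download for the newest release, or download/v<version> for a specific one. As a sketch, assuming bash, the hypothetical helper rpk_archive_url below builds either URL, which can help when the rpk version must match your Redpanda cluster. It only prints URLs and is not part of rpk; the os and arch values follow the archive names used on this page (linux-amd64, darwin-amd64, darwin-arm64).

```shell
#!/usr/bin/env bash
# Sketch: build the rpk download URLs used in the curl commands on this page.
# rpk_archive_url is a hypothetical helper, not part of rpk.
# Usage: rpk_archive_url <os> <arch> [version]
#   os:      linux or darwin
#   arch:    amd64 or arm64
#   version: optional release such as 24.1.12 (no leading "v");
#            omit it to get the latest release.
rpk_archive_url() {
  local os="$1" arch="$2" version="${3:-}"
  local base="https://github.com/redpanda-data/redpanda/releases"
  local file="rpk-${os}-${arch}.zip"
  if [[ -z "$version" ]]; then
    # Latest release: GitHub redirects /releases/latest/download/<file>.
    printf '%s/latest/download/%s\n' "$base" "$file"
  else
    # Specific release: release tags are prefixed with "v".
    printf '%s/download/v%s/%s\n' "$base" "$version" "$file"
  fi
}

# Example: download a version that matches your Redpanda cluster.
# curl -LO "$(rpk_archive_url linux amd64 24.1.12)"
```

The version is passed without the leading v so it can be copied straight from the rpk version output (for example, 24.1.12).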
macOS
Homebrew
- If you don’t have Homebrew installed, install it.
- Install rpk:
  brew install redpanda-data/tap/redpanda
- Run rpk --version to display the version of the rpk binary:
  rpk --version
  The output is similar to:
  rpk version 24.1.12 (rev e2bfc05)
This method installs the latest version of rpk, which is supported only with the latest version of Redpanda.
Manual download
To install rpk through a manual download, choose the option that corresponds to your system architecture. For example, if you have an M1 or M2 chip, use the Apple Silicon instructions.
Intel macOS
- Download the rpk archive for macOS:
  - To download the latest version of rpk:
    curl -LO https://github.com/redpanda-data/redpanda/releases/latest/download/rpk-darwin-amd64.zip
  - To download a version other than the latest:
    curl -LO https://github.com/redpanda-data/redpanda/releases/download/v<version>/rpk-darwin-amd64.zip
- Ensure that you have the folder ~/.local/bin:
  mkdir -p ~/.local/bin
- Add it to your $PATH:
  export PATH=$PATH:~/.local/bin
- Unzip the rpk files to your ~/.local/bin/ directory:
  unzip rpk-darwin-amd64.zip -d ~/.local/bin/
- Run rpk --version to display the version of the rpk binary:
  rpk --version
  The output is similar to:
  rpk version 24.1.12 (rev e2bfc05)
Apple Silicon
- Download the rpk archive for macOS:
  - To download the latest version of rpk:
    curl -LO https://github.com/redpanda-data/redpanda/releases/latest/download/rpk-darwin-arm64.zip
  - To download a version other than the latest:
    curl -LO https://github.com/redpanda-data/redpanda/releases/download/v<version>/rpk-darwin-arm64.zip
- Ensure that you have the folder ~/.local/bin:
  mkdir -p ~/.local/bin
- Add it to your $PATH:
  export PATH=$PATH:~/.local/bin
- Unzip the rpk files to your ~/.local/bin/ directory:
  unzip rpk-darwin-arm64.zip -d ~/.local/bin/
- Run rpk --version to display the version of the rpk binary:
  rpk --version
  The output is similar to:
  rpk version 24.1.12 (rev e2bfc05)
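If you are unsure which manual download applies to your Mac, uname -m reports the CPU architecture: x86_64 on Intel Macs and arm64 on Apple Silicon. The sketch below, assuming bash, maps that value to the archive names used in the instructions above; rpk_darwin_archive is a hypothetical helper, not part of rpk.

```shell
#!/usr/bin/env bash
# Sketch: pick the macOS rpk archive that matches the machine's CPU.
# rpk_darwin_archive is a hypothetical helper, not part of rpk.
# With no argument it inspects the current machine via `uname -m`.
rpk_darwin_archive() {
  local machine="${1:-$(uname -m)}"
  case "$machine" in
    x86_64) echo "rpk-darwin-amd64.zip" ;;  # Intel macOS
    arm64)  echo "rpk-darwin-arm64.zip" ;;  # Apple Silicon (M1, M2, ...)
    *)
      echo "unsupported architecture: $machine" >&2
      return 1
      ;;
  esac
}

# Example (on a Mac):
# archive="$(rpk_darwin_archive)" &&
#   curl -LO "https://github.com/redpanda-data/redpanda/releases/latest/download/${archive}" &&
#   unzip "$archive" -d ~/.local/bin/
```

One caveat: a terminal running under Rosetta 2 on Apple Silicon reports x86_64, so if the result looks wrong, check the chip listed in About This Mac.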
Run
A Redpanda Connect stream pipeline is configured with a single config file. You can generate a fresh one with:
rpk connect create > connect.yaml
For Docker installations:
docker run --rm docker.redpanda.com/redpandadata/connect create > ./connect.yaml
The main sections that make up a config are input, pipeline, and output. When you generate a fresh config, it simply pipes stdin to stdout, like this:
input:
  stdin: {}
pipeline:
  processors: []
output:
  stdout: {}
Eventually we’ll want to configure a more useful input and output, but for now this is useful for quickly testing processors. You can execute this config with:
rpk connect run connect.yaml
For Docker installations:
docker run --rm -it -v $(pwd)/connect.yaml:/connect.yaml docker.redpanda.com/redpandadata/connect run
Anything you write to stdin is written unchanged to stdout. Cool! Resist the temptation to play with this for hours; there’s more stuff to try out.
Next, let’s add some processing steps to mutate messages. The most powerful one is the mapping processor, which allows us to perform mappings. Let’s add a mapping that uppercases our messages:
input:
  stdin: {}
pipeline:
  processors:
    - mapping: root = content().uppercase()
output:
  stdout: {}
Now your messages should come out in all caps.
You can add as many processing steps as you like, and since processors are what make Redpanda Connect powerful, they are worth experimenting with. Let’s create a more advanced pipeline that works with JSON documents:
input:
  stdin: {}
pipeline:
  processors:
    - sleep:
        duration: 500ms
    - mapping: |
        root.doc = this
        root.first_name = this.names.index(0).uppercase()
        root.last_name = this.names.index(-1).hash("sha256").encode("base64")
output:
  stdout: {}
First, we sleep for 500 milliseconds just to keep the suspense going. Next, we restructure our input JSON document by nesting it within a field doc, then map the upper-cased first element of names to a new field first_name. Finally, we map the SHA-256 hash of the last element of names, encoded as base64, to a new field last_name.
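To check the mapping by hand, the sketch below recomputes first_name and last_name for a document such as {"id":"1","names":["celine","dion"]} using command-line tools instead of Bloblang. It assumes bash and GNU coreutils (sha256sum, base64); on macOS you would use shasum -a 256 instead of sha256sum. The key detail is that hash("sha256") yields the raw 32-byte digest and encode("base64") encodes those bytes rather than a hex string, so last_name is always 44 characters ending in a single =.

```shell
#!/usr/bin/env bash
# Sketch: reproduce first_name and last_name outside Redpanda Connect.
# Assumes bash (printf \x escapes) and GNU coreutils (sha256sum, base64).

# uppercase mirrors .uppercase() for plain ASCII input.
uppercase() {
  printf '%s' "$1" | tr '[:lower:]' '[:upper:]'
}

# sha256_base64 mirrors .hash("sha256").encode("base64"):
# hash the exact bytes (printf '%s' adds no trailing newline),
# then base64-encode the raw digest, not its hex representation.
sha256_base64() {
  local hex
  hex="$(printf '%s' "$1" | sha256sum | cut -c1-64)"
  # Turn "ab12..." into "\xab\x12..." so printf emits the raw digest bytes.
  printf "$(printf '%s' "$hex" | sed 's/../\\x&/g')" | base64
}

echo "first_name: $(uppercase celine)"
echo "last_name:  $(sha256_base64 dion)"
```

If the pipeline behaves as described, the last_name printed here should match the one in the pipeline output for the first document.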
Try running that config with some sample documents:
echo '{"id":"1","names":["celine","dion"]}
{"id":"2","names":["chad","robert","kroeger"]}' | rpk connect run connect.yaml
For Docker installations:
echo '{"id":"1","names":["celine","dion"]}
{"id":"2","names":["chad","robert","kroeger"]}' | docker run --rm -i -v $(pwd)/connect.yaml:/connect.yaml docker.redpanda.com/redpandadata/connect run
You should see this output on stdout:
{"doc":{"id":"1","names":["celine","dion"]},"first_name":"CELINE","last_name":"1VvPgCW9sityz5XAMGdI2BTA7/44Wb3cANKxqhiCo50="}
{"doc":{"id":"2","names":["chad","robert","kroeger"]},"first_name":"CHAD","last_name":"uXXg5wCKPjpyj/qbivPbD9H9CZ5DH/F0Q1Twytnt2hQ="}