Filter Messages into a New Topic using a Regex

This is an example of how to filter messages from one topic into another using regular expressions (regex) and Redpanda data transforms. If a record in the source topic has a key or value that matches the regex, the record is produced to the sink topic.

Regexes are implemented using Go’s regexp library, which uses the same syntax as RE2. See the RE2 wiki for help with syntax. The regex used in this example matches the typical email address pattern.

Prerequisites

You must have the following:

- Go version 1.20 or later installed on your host machine.
- rpk installed on your host machine.
- Docker and Docker Compose installed on your host machine.

Run the lab

Clone this repository:

    git clone https://github.com/redpanda-data/redpanda-labs.git

Change into the data-transforms/go/regex/ directory:

    cd redpanda-labs/data-transforms/go/regex

Set the REDPANDA_VERSION environment variable to version 23.3.1 or later. Data transforms were introduced in version 23.3.1. For all available versions, see the GitHub releases. For example:

    export REDPANDA_VERSION=24.2.10

Set the REDPANDA_CONSOLE_VERSION environment variable to the version of Redpanda Console that you want to run. For all available versions, see the GitHub releases. For example:

    export REDPANDA_CONSOLE_VERSION=2.7.2

Start Redpanda in Docker by running the following command:

    docker compose up -d --wait

Set up your rpk profile:

    rpk profile create regex --from-profile profile.yml

Create the required topics:

    rpk topic create src sink

Build the transforms function:

    rpk transform build

Deploy the transforms function:

    ./deploy-transform.sh

See the file deploy-transform.sh to understand the regex used in the transform. Only records that match the regular expression are produced to the sink topic.

This example accepts the following environment variables:

- PATTERN (required): The regex to match against records. Here, the regex finds messages containing email addresses.
- MATCH_VALUE: By default, the regex matches record keys. If set to true, the regex matches record values instead.

Run rpk topic produce:

    rpk topic produce src

Paste the following into the prompt, then press Ctrl+C to exit:

    Hello, please contact us at help@example.com.
    Hello, please contact us at support.example.com.
    Hello, please contact us at help@example.edu.

Consume the sink topic to see that the input lines containing email addresses were produced to the sink topic:

    rpk topic consume sink --num 2

    {
      "topic": "sink",
      "value": "Hello, please contact us at help@example.com.",
      "timestamp": 1714525578013,
      "partition": 0,
      "offset": 0
    }
    {
      "topic": "sink",
      "value": "Hello, please contact us at help@example.edu.",
      "timestamp": 1714525579192,
      "partition": 0,
      "offset": 1
    }

The second input line, Hello, please contact us at support.example.com., is not in the sink topic because it does not match the regex that identifies email addresses.

You can also see the sink topic contents in Redpanda Console. Switch to the src topic to see all of the events, including the one that does not match the regex and is not in the sink topic.

Clean up

To shut down and delete the containers along with all your cluster data:

    docker compose down -v
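
To make the matching step of this lab more concrete, here is a minimal standalone Go sketch of the decision the transform makes for each record: test the key (or the value, when value matching is enabled) against a compiled regex and forward the record only on a match. It uses only Go's standard regexp package and does not call the Redpanda transform SDK. The email pattern, the filter type, and the newFilter and matches names are assumptions made for illustration. The lab's real pattern is defined in deploy-transform.sh and may differ, and in the lab the pattern and the key-or-value choice arrive through the PATTERN and MATCH_VALUE environment variables rather than being hard-coded.

```go
package main

import (
	"fmt"
	"regexp"
)

// emailPattern is an assumed, simplified email regex in RE2 syntax.
// The pattern the lab actually deploys is set in deploy-transform.sh.
const emailPattern = `[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`

// filter (hypothetical type) holds the compiled regex and whether to
// test record values instead of record keys.
type filter struct {
	re         *regexp.Regexp
	matchValue bool
}

// newFilter compiles the pattern once so it can be reused for every record.
func newFilter(pattern string, matchValue bool) (*filter, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	return &filter{re: re, matchValue: matchValue}, nil
}

// matches reports whether a record with this key and value would be
// forwarded to the sink topic.
func (f *filter) matches(key, value []byte) bool {
	if f.matchValue {
		return f.re.Match(value)
	}
	return f.re.Match(key)
}

func main() {
	// Match on values, mirroring MATCH_VALUE=true in the lab.
	f, err := newFilter(emailPattern, true)
	if err != nil {
		panic(err)
	}

	// The three sample lines from the lab, produced without keys.
	values := []string{
		"Hello, please contact us at help@example.com.",
		"Hello, please contact us at support.example.com.",
		"Hello, please contact us at help@example.edu.",
	}
	for _, v := range values {
		if f.matches(nil, []byte(v)) {
			fmt.Println("sink <-", v)
		}
	}
}
```

Run with go run, this sketch prints only the lines containing help@example.com and help@example.edu. Compiling the regex once, outside the per-record path, keeps the per-record work to a single match call.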