Filter Messages into a New Topic using a Regex

This is an example of how to filter messages from one topic into another using regular expressions (regex) and Redpanda data transforms. If a record in the source topic has a key or value that matches the regex, the record is produced to the sink topic. Whether the key or the value is checked depends on how the transform is configured.

Regexes are implemented using Go’s regexp library, which uses the same syntax as RE2. See the RE2 wiki for help with syntax.
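A quick way to try out RE2 syntax before deploying a pattern is Go's own regexp package. The sketch below uses hypothetical helper names (`firstEmail`, `compiles`) and a deliberately simple email pattern that is not a full address validator. It extracts an address from the sample message used later in this lab and shows that a backreference such as `\1`, which RE2 does not support, is rejected when the pattern is compiled.

```go
package main

import (
	"fmt"
	"regexp"
)

// emailRe is a simple, illustrative email pattern (not RFC 5322 complete).
var emailRe = regexp.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`)

// firstEmail returns the leftmost email-like substring of s, or "" if none.
func firstEmail(s string) string {
	return emailRe.FindString(s)
}

// compiles reports whether pattern is valid RE2 syntax as accepted by Go.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	fmt.Println(firstEmail("Hello, please contact us at help@example.com.")) // help@example.com
	fmt.Println(compiles(`(\w+)@example\.com`))                              // true
	fmt.Println(compiles(`(\w+) \1`))                                        // false: RE2 has no backreferences
}
```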

Prerequisites

You must have the following:

Run the lab

  1. Clone this repository:

    git clone https://github.com/redpanda-data/redpanda-labs.git
  2. Change into the data-transforms/regex/ directory:

    cd redpanda-labs/data-transforms/regex
  3. Set the REDPANDA_VERSION environment variable to version 23.3.1 or later, the version in which data transforms were introduced. For all available versions, see the GitHub releases.

    For example:

    export REDPANDA_VERSION=23.3.12
  4. Set the REDPANDA_CONSOLE_VERSION environment variable to the version of Redpanda Console that you want to run. For all available versions, see the GitHub releases.

    For example:

    export REDPANDA_CONSOLE_VERSION=2.4.6
  5. Start Redpanda in Docker by running the following command:

    docker compose up -d --wait
  6. Set up your rpk profile:

    rpk profile create regex --from-profile profile.yml
  7. Create the required topics:

    rpk topic create src sink
  8. Build the transform function:

    rpk transform build
  9. Deploy the transform function:

    ./deploy-transform.sh

    This example accepts the following environment variables:

    • PATTERN (required): The regex to match against records. In this lab, the regex matches messages that contain email addresses.

    • MATCH_VALUE: By default, the regex is matched against record keys. Set this variable to true to match against record values instead.

  10. Run rpk topic produce:

    rpk topic produce src
  11. Paste the following into the prompt, press Enter to produce the message, and then press Ctrl+C to exit:

    Hello, please contact us at help@example.com.
  12. Consume the sink topic to confirm that the message containing the email address was produced to it:

    rpk topic consume sink --num 1
    {
      "topic": "sink",
      "value": "Hello, please contact us at help@example.com.",
      "timestamp": 1707749921393,
      "partition": 0,
      "offset": 0
    }
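
The two environment variables described in the deployment step could be read and validated in Go roughly as follows. This is a hypothetical sketch rather than the lab's source: `loadConfig` and `config` are illustrative names, and treating only the exact string true as enabling MATCH_VALUE is an assumption based on the variable's description. Passing the lookup function in, instead of calling os.Getenv directly, lets the logic be tested without touching the real environment.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"regexp"
)

// config holds the hypothetical settings parsed from the environment.
type config struct {
	pattern    *regexp.Regexp
	matchValue bool
}

// loadConfig reads PATTERN (required) and MATCH_VALUE (optional) through
// getenv, which is os.Getenv in production and a stub in tests.
func loadConfig(getenv func(string) string) (config, error) {
	raw := getenv("PATTERN")
	if raw == "" {
		return config{}, errors.New("PATTERN is required")
	}
	re, err := regexp.Compile(raw)
	if err != nil {
		return config{}, fmt.Errorf("invalid PATTERN: %w", err)
	}
	// Assumption: only the literal "true" switches matching from keys to values.
	return config{pattern: re, matchValue: getenv("MATCH_VALUE") == "true"}, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("pattern=%s matchValue=%t\n", cfg.pattern, cfg.matchValue)
}
```

Failing fast on a missing or invalid PATTERN surfaces configuration mistakes at startup rather than as silently empty output in the sink topic.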

You can also view the message in the sink topic in Redpanda Console.

Clean up

To shut down and delete the containers along with all your cluster data:

docker compose down -v