Transform JSON Messages into a New Topic using JQ

This lab contains a reusable data transform built on jaq, a Rust implementation of the popular jq command-line JSON processor.

See the jq manual for more information on how to write a filter: https://jqlang.github.io/jq/manual/

Prerequisites

You must have the following:

  • At least version 1.75 of Rust installed on your host machine.

  • The Wasm target for Rust installed. To install this target, run the following:

    rustup target add wasm32-wasi
  • rpk installed on your host machine.

  • Docker and Docker Compose installed on your host machine.

Run the lab

  1. Clone this repository:

    git clone https://github.com/redpanda-data/redpanda-labs.git
  2. Change into the data-transforms/jq/ directory:

    cd redpanda-labs/data-transforms/jq
  3. Set the REDPANDA_VERSION environment variable to version 23.3.1 or later. Data transforms were introduced in version 23.3.1. For all available versions, see the GitHub releases.

    For example:

    export REDPANDA_VERSION=24.1.8
  4. Set the REDPANDA_CONSOLE_VERSION environment variable to the version of Redpanda Console that you want to run. For all available versions, see the GitHub releases.

    For example:

    export REDPANDA_CONSOLE_VERSION=2.6.0
  5. Start Redpanda in Docker by running the following command:

    docker compose up -d --wait
  6. Set up your rpk profile:

    rpk profile create jq --from-profile profile.yml
  7. Create the required topics:

    rpk topic create src sink
  8. Build and deploy the transform function:

    rpk transform build
    rpk transform deploy --var=FILTER='del(.email)' --input-topic=src --output-topic=sink

    The transform accepts the following environment variable, which you set with the --var flag:

    • FILTER (required): The jq expression that will run on each record’s value.

  9. Start producing records to the src topic:

    rpk topic produce src
  10. Paste the following record into the prompt, press Enter to send it, and then press Ctrl+D to exit:

    {"foo":42,"email":"help@example.com"}
  11. Consume the sink topic to confirm that the transform removed the email field before writing the record to sink:

    rpk topic consume sink --num 1
    {
      "topic": "sink",
      "value": "{\"foo\":42}",
      "timestamp": 1707749921393,
      "partition": 0,
      "offset": 0
    }

You can also view the transformed record in Redpanda Console.
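Before deploying a different FILTER, it can help to preview the expression against a sample record on your own machine. The following is a minimal sketch, assuming the jq command-line tool is installed locally (it is not one of this lab's prerequisites). The preview_filter helper is a hypothetical name used only for illustration. Because the transform runs jaq rather than jq, results for less common filters may differ, so treat this as a quick check and not a guarantee.

```shell
# Assumption: the jq CLI is installed locally; it is not a prerequisite of this lab.
# preview_filter is a hypothetical helper, not part of rpk or the lab repository.
preview_filter() {
  # $1 is the jq expression; the JSON record is read from stdin.
  # -c prints the result as compact, single-line JSON.
  jq -c "$1"
}

# Apply the same filter that the lab deploys to the same sample record.
echo '{"foo":42,"email":"help@example.com"}' | preview_filter 'del(.email)'
# prints {"foo":42}
```

The printed value matches the "value" field of the record consumed from the sink topic in the last step.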

Clean up

To shut down and delete the containers along with all your cluster data:

docker compose down -v