Flatten JSON Messages

This example uses Redpanda data transforms to take JSON messages in an input topic and flatten them using a customizable delimiter.

Example message in the input topic
{
  "content": {
    "id": 123,
    "name": {
      "first": "Dave",
      "middle": null,
      "last": "Voutila"
    },
    "data": [1, "fish", 2, "fish"]
  }
}
Example flattened message in the output topic
{
  "content.id": 123,
  "content.name.first": "Dave",
  "content.name.middle": null,
  "content.name.last": "Voutila",
  "content.data": [1, "fish", 2, "fish"]
}

Prerequisites

You must have the following:

Limitations

  • Arrays of objects are currently untested.

  • Providing a series of objects as input, rather than an array, may result in a series of flattened objects as output.

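  • Because JSON numbers are handled as floating-point values, numbers such as 1.0 that can be represented as integers lose their decimal point. For example, 1.0 becomes 1. Large numbers may also be written in scientific notation, as the timestamp 1.707743943e+09 in the output of step 11 shows.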

Run the lab

  1. Clone this repository:

    git clone https://github.com/redpanda-data/redpanda-labs.git
  2. Change into the data-transforms/flatten/ directory:

    cd redpanda-labs/data-transforms/flatten
  3. Set the REDPANDA_VERSION environment variable to version 23.3.1 or later. Data transforms were introduced in version 23.3.1. For all available versions, see the GitHub releases.

    For example:

    export REDPANDA_VERSION=23.3.11
  4. Set the REDPANDA_CONSOLE_VERSION environment variable to the version of Redpanda Console that you want to run. For all available versions, see the GitHub releases.

    For example:

    export REDPANDA_CONSOLE_VERSION=2.4.6
  5. Start Redpanda in Docker by running the following command:

    docker compose up -d --wait
  6. Set up your rpk profile:

    rpk profile create flatten --from-profile profile.yml
  7. Create the required topics src and sink:

    rpk topic create src sink
  8. Build and deploy the transform function:

    rpk transform build
    rpk transform deploy --input-topic=src --output-topic=sink

    This example accepts the following environment variables:

    • RP_FLATTEN_DELIM: The delimiter to use when flattening the JSON fields. Defaults to a period (.).

      For example:

      rpk transform deploy --input-topic=src --output-topic=sink --var "RP_FLATTEN_DELIM=<delimiter>"
  9. Produce a JSON message to the source topic:

    rpk topic produce src
  10. Paste the following message into the prompt, press Enter to produce it, and then press Ctrl+C to exit:

    {"message": "success", "timestamp": 1707743943, "iss_position": {"latitude": "-28.5723", "longitude": "-149.4612"}}
  11. Consume the sink topic to see the flattened result:

    rpk topic consume sink --num 1

    Example output:

    {
      "topic": "sink",
      "value": "{\n  \"message\": \"success\",\n  \"timestamp\": 1.707743943e+09,\n  \"iss_position.latitude\": \"-28.5723\",\n  \"iss_position.longitude\": \"-149.4612\"\n}\n",
      "timestamp": 1707744765541,
      "partition": 0,
      "offset": 0
    }

You can also view the flattened message in Redpanda Console.

Clean up

To shut down and delete the containers along with all your cluster data:

docker compose down -v