Flatten JSON Messages
This example uses Redpanda data transforms to take JSON messages in an input topic and flatten them using a customizable delimiter. Nested objects are collapsed into top-level keys, while arrays and null values are kept as they are.
For example, with the default delimiter (.), this input message:
{
  "content": {
    "id": 123,
    "name": {
      "first": "Dave",
      "middle": null,
      "last": "Voutila"
    },
    "data": [1, "fish", 2, "fish"]
  }
}
is flattened into:
{
  "content.id": 123,
  "content.name.first": "Dave",
  "content.name.middle": null,
  "content.name.last": "Voutila",
  "content.data": [1, "fish", 2, "fish"]
}
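The flattening rule shown in this example can be sketched in Go, the language this lab's transform is written in. This is a minimal sketch, not the lab's actual implementation: the `flatten` function and its signature are hypothetical, and it assumes each message decodes into a `map[string]any` with `encoding/json`. Nested objects are recursed into and their keys joined with the delimiter; arrays, scalars, and null are copied unchanged.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flatten is a hypothetical helper that copies src into dst, recursing into
// nested objects and joining their keys with delim. Arrays, scalars, and null
// values are copied unchanged. In this sketch an empty nested object
// produces no keys at all.
func flatten(prefix, delim string, src map[string]any, dst map[string]any) {
	for key, value := range src {
		fullKey := key
		if prefix != "" {
			fullKey = prefix + delim + key
		}
		if nested, ok := value.(map[string]any); ok {
			flatten(fullKey, delim, nested, dst)
			continue
		}
		dst[fullKey] = value
	}
}

func main() {
	input := []byte(`{"content": {"id": 123, "name": {"first": "Dave", "middle": null, "last": "Voutila"}, "data": [1, "fish", 2, "fish"]}}`)

	// Decode into generic Go values; objects become map[string]any.
	var doc map[string]any
	if err := json.Unmarshal(input, &doc); err != nil {
		panic(err)
	}

	flat := make(map[string]any)
	flatten("", ".", doc, flat)

	// MarshalIndent sorts map keys, so the output order is deterministic.
	out, err := json.MarshalIndent(flat, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Recursion keeps the sketch short; for very deeply nested input an explicit stack would avoid deep call chains.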
Prerequisites
You must have the following:
- At least version 1.20 of Go installed on your host machine.
- rpk installed on your host machine.
- Docker and Docker Compose installed on your host machine.
Limitations
- Arrays of objects are currently untested.
- Providing a series of objects as input, rather than an array, may result in a series of flattened objects as output.
- Due to how JSON treats floating point values, values such as 1.0 that can be converted to an integer lose the decimal point. For example, 1.0 becomes 1.
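In Go, this kind of loss typically comes from decoding JSON numbers into an `any` value with `encoding/json`: every number becomes a `float64`, so `1.0` and `1` are indistinguishable once decoded. The sketch below demonstrates that general Go behavior and shows `json.Decoder.UseNumber` as one possible way to keep the original number text. It assumes, but does not confirm, that the lab's transform decodes messages this way; `roundTrip` and `roundTripUseNumber` are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// roundTrip decodes JSON into generic Go values and encodes it again.
// The default decoder turns every JSON number into a float64, so 1.0 is
// re-encoded as 1.
func roundTrip(input string) (string, error) {
	var v any
	if err := json.Unmarshal([]byte(input), &v); err != nil {
		return "", err
	}
	out, err := json.Marshal(v)
	return string(out), err
}

// roundTripUseNumber does the same, but tells the decoder to keep numbers
// as json.Number, which is re-encoded using its original text.
func roundTripUseNumber(input string) (string, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	dec.UseNumber()
	var v any
	if err := dec.Decode(&v); err != nil {
		return "", err
	}
	out, err := json.Marshal(v)
	return string(out), err
}

func main() {
	in := `{"price": 1.0}`

	asFloat, err := roundTrip(in)
	if err != nil {
		panic(err)
	}
	asNumber, err := roundTripUseNumber(in)
	if err != nil {
		panic(err)
	}

	fmt.Println(asFloat)  // {"price":1}
	fmt.Println(asNumber) // {"price":1.0}
}
```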
Run the lab
- Clone this repository:
  git clone https://github.com/redpanda-data/redpanda-labs.git
- Change into the data-transforms/go/flatten/ directory:
  cd redpanda-labs/data-transforms/go/flatten
- Set the REDPANDA_VERSION environment variable to at least version v23.3.1, the version that introduced data transforms. For all available versions, see the GitHub releases. For example:
  export REDPANDA_VERSION=v26.1.9
- Set the REDPANDA_CONSOLE_VERSION environment variable to the version of Redpanda Console that you want to run. You must use at least version v3.0.0 of Redpanda Console to deploy this lab. For all available versions, see the GitHub releases. For example:
  export REDPANDA_CONSOLE_VERSION=v3.7.3
- Start Redpanda in Docker by running the following command:
  docker compose up -d --wait
- Set up your rpk profile:
  rpk profile create flatten --from-profile profile.yml
- Create the required topics src and sink:
  rpk topic create src sink
- Build and deploy the transforms function:
  rpk transform build
  rpk transform deploy --input-topic=src --output-topic=sink
  This example accepts the following environment variables:
  - RP_FLATTEN_DELIM: The delimiter to use when flattening the JSON fields. Defaults to . (a period). For example:
    rpk transform deploy --var "RP_FLATTEN_DELIM=<delimiter>"
- Produce a JSON message to the source topic:
  rpk topic produce src
- Paste the following message at the prompt, press Enter to produce it, then press Ctrl+C to exit:
  {"message": "success", "timestamp": 1707743943, "iss_position": {"latitude": "-28.5723", "longitude": "-149.4612"}}
- Consume the sink topic to see the flattened result:
  rpk topic consume sink --num 1
  {
    "topic": "sink",
    "value": "{\n \"message\": \"success\" \"timestamp\": 1.707743943e+09 \"iss_position.latitude\": \"-28.5723\",\n \"iss_position.longitude\": \"-149.4612\"\n}\n",
    "timestamp": 1707744765541,
    "partition": 0,
    "offset": 0
  }
  The integer timestamp 1707743943 appears as 1.707743943e+09 because numbers are handled as floating point values (see Limitations).
You can also see this in Redpanda Console.
Clean up
To shut down and delete the containers along with all your cluster data:
docker compose down -v