Deploy Data Transforms

Learn how to build, deploy, share, and troubleshoot data transforms in Redpanda.

Prerequisites

Before you begin, ensure that you have the following:

Build the Wasm binary

To build a Wasm binary:

  1. Ensure your project directory contains a transform.yaml file.

  2. Build the Wasm binary using the rpk transform build command.

rpk transform build

You should now have a Wasm binary named <transform-name>.wasm, where <transform-name> is the name specified in your transform.yaml file. This binary is your data transform function, ready to be deployed to a Redpanda cluster or hosted on a network for others to use.

Deploy the Wasm binary

You can deploy your transform function using the rpk transform deploy command.

  1. Validate your setup against the pre-deployment checklist:

    • Do you meet the Prerequisites?

    • Does your transform function access any environment variables? If so, make sure to set them in the transform.yaml file or on the command line when you deploy the binary.

    • Do your configured input and output topics already exist? Input and output topics must exist in your Redpanda cluster before you deploy the Wasm binary.

  2. Deploy the Wasm binary:

    rpk transform deploy

When the transform function reaches Redpanda, it starts processing new records that are written to the input topic.
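
The checklist above can be sketched as a small shell helper. This is a minimal sketch under stated assumptions: the helper name deploy_with_topics, the topic names, and the LOG_LEVEL variable are hypothetical, and the RPK variable exists only so that the commands can be previewed (RPK=echo) without a cluster. It relies on rpk topic create and on the --var=KEY=VALUE flag of rpk transform deploy for passing an environment variable at deploy time.

```shell
# Assumption: RPK defaults to the real rpk binary; set RPK=echo to preview the commands.
RPK="${RPK:-rpk}"

# Hypothetical helper: create the input and output topics, then deploy the
# transform with one environment variable supplied on the command line.
#   $1 = input topic, $2 = output topic, $3 = KEY=VALUE pair
deploy_with_topics() {
  # Topics must exist before deployment; skip this line if they already do.
  "$RPK" topic create "$1" "$2"
  # Run from the project directory that contains transform.yaml.
  "$RPK" transform deploy --var="$3"
}
```

For example, deploy_with_topics src-topic dst-topic LOG_LEVEL=debug runs the two commands in order. Topic creation is kept as a separate, explicit step so that a missing topic is noticed before the deploy is attempted.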

Reprocess records

In some cases, you may need to reprocess records from an input topic that already contains data. Processing existing records can be useful, for example, to process historical data into a different format for a new consumer, to re-create lost data from a deleted topic, or to resolve issues with a previous version of a transform that processed data incorrectly.

To reprocess records, you can specify the starting point from which the transform function should process records in each partition of the input topic. The starting point can be either a partition offset or a timestamp.

The --from-offset flag is only effective the first time you deploy a transform function. On subsequent deployments of the same function, Redpanda resumes processing from the last committed offset. To reprocess existing records using an existing function, delete the function and redeploy it with the --from-offset flag.

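To deploy a transform function and start processing records from a specific partition offset, use the following syntax, where +<offset> counts forward from the start of the partition and -<offset> counts back from the end: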

rpk transform deploy --from-offset +/-<offset>

In this example, the transform function will start processing records from the beginning of each partition of the input topic:

rpk transform deploy --from-offset +0

To deploy a transform function and start processing records from a specific timestamp, use the following syntax:

rpk transform deploy --from-timestamp @<unix-timestamp>

In this example, the transform function will start processing from the first record in each partition of the input topic that was committed after the given timestamp:

rpk transform deploy --from-timestamp @1617181723
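
If you have a calendar date rather than a Unix timestamp, you can convert it before deploying. The following is a minimal sketch, assuming GNU date (the -d option is not available in BSD or macOS date) and assuming the flag expects Unix seconds, as the 10-digit example above suggests. The start_time value is only an illustration. The script prints the deploy command instead of running it.

```shell
# Assumption: GNU date. With -u, the date string is interpreted as UTC.
start_time='2021-03-31 09:08:43'

# Convert the calendar time to seconds since the Unix epoch.
ts="$(date -u -d "$start_time" +%s)"

# Print the command to run from the transform project directory.
echo "rpk transform deploy --from-timestamp @${ts}"
```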

Share Wasm binaries

You can also deploy data transforms on a Redpanda cluster by providing an addressable path to the Wasm binary. This is useful for sharing transform functions across multiple clusters or teams within your organization.

For example, if the Wasm binary is hosted at https://my-site/my-transform.wasm, use the following command to deploy it:

rpk transform deploy --file=https://my-site/my-transform.wasm

Edit existing transform functions

To make changes to an existing transform function:

  1. Make your changes to the code.

  2. Rebuild the Wasm binary.

  3. Redeploy the Wasm binary to the same Redpanda cluster.

When you redeploy a Wasm binary with the same name, it resumes processing from the last offset it previously processed. If you need to reprocess existing records, you must delete the transform function and redeploy it with the --from-offset flag.

Deploy-time configuration overrides must be provided each time you redeploy a Wasm binary. Otherwise, they will be overwritten by default values or the configuration file’s contents.
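
The edit cycle, including the need to repeat overrides, can be sketched as two shell helpers. This is a sketch under stated assumptions, not a prescribed workflow: the helper names, the transform name, and the variable names are hypothetical, and the RPK variable exists only so that the commands can be previewed with RPK=echo. Overrides are passed with the --var=KEY=VALUE flag of rpk transform deploy.

```shell
# Assumption: RPK defaults to the real rpk binary; set RPK=echo to preview the commands.
RPK="${RPK:-rpk}"

# Hypothetical helper: rebuild and redeploy, passing every override again.
# Each argument is one KEY=VALUE pair, forwarded as --var=KEY=VALUE.
redeploy_with_overrides() {
  "$RPK" transform build || return 1
  # Rewrite the argument list in place: KEY=VALUE becomes --var=KEY=VALUE.
  for pair in "$@"; do
    set -- "$@" "--var=$pair"
    shift
  done
  "$RPK" transform deploy "$@"
}

# Hypothetical helper: delete the transform, then redeploy it so that it
# processes each input partition from the beginning.
#   $1 = transform name; remaining arguments are passed through to deploy
reprocess_from_start() {
  name="$1"
  shift
  "$RPK" transform delete "$name" || return 1
  "$RPK" transform deploy --from-offset +0 "$@"
}
```

For example, redeploy_with_overrides LOG_LEVEL=debug REGION=eu rebuilds the binary and redeploys it with both variables, and reprocess_from_start my-transform --var=LOG_LEVEL=debug deletes the function before redeploying it from offset +0.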

Delete a transform function

To delete a transform function, use the following command:

rpk transform delete <transform-name>

For more details about this command, see rpk transform delete.

Troubleshoot

This section provides guidance on how to diagnose and troubleshoot issues with building or deploying data transforms.

Invalid transform environment

This error means that one or more of your configured custom environment variables are invalid.

Check your custom environment variables against the list of limitations.

Invalid WebAssembly

This error indicates that the binary is missing a required callback function:

Invalid WebAssembly - the binary is missing required transform functions. Check the broker support for the version of the data transforms SDK being used.

All transform functions must register a callback with the OnRecordWritten() method. For more details, see Develop Data Transforms.

Next steps

Set up monitoring for data transforms.