Configure Data Transforms

Learn how to configure data transforms in Redpanda, including editing the transform.yaml file, environment variables, and memory settings. This topic covers both the configuration of transform functions and the WebAssembly (Wasm) engine’s environment.

Configure transform functions

This section covers how to configure transform functions using the transform.yaml configuration file, command-line overrides, and environment variables.

Transform configuration file

When you initialize a data transforms project, a transform.yaml file is generated in the provided directory. You can use this configuration file to configure the transform function with settings, including input and output topics, the language used for the data transform, and any environment variables.

  • name: The name of the transform function.

  • description: A description of what the transform function does.

  • input-topic: The topic from which data is read.

  • output-topics: A list of up to eight topics to which the transformed data is written.

  • language: The language used for the transform function. The language is set to the one you defined during initialization.

  • env: A dictionary of custom environment variables that are passed to the transform function. Do not prefix keys with REDPANDA_. Check the list of all limitations.

Here is an example of a transform.yaml file:

name: redpanda-example
description: |
  This transform function is an example to demonstrate how to configure data transforms in Redpanda.
input-topic: example-input-topic
output-topics:
  - example-output-topic-1
  - example-output-topic-2
language: tinygo-no-goroutines
env:
  DATA_TRANSFORMS_ARE_AWESOME: 'true'

Override configurations with command-line options

You can set the name of the transform function, environment variables, and input and output topics on the command-line when you deploy the transform. These command-line settings take precedence over those specified in the transform.yaml file.

Built-In environment variables

As well as custom environment variables set in either the command-line or the configuration file, Redpanda makes some built-in environment variables available to your transform functions. These variables include:

  • REDPANDA_INPUT_TOPIC: The input topic specified.

  • REDPANDA_OUTPUT_TOPIC_0..REDPANDA_OUTPUT_TOPIC_N: The output topics in the order specified on the command line or in the configuration file. For example, REDPANDA_OUTPUT_TOPIC_0 is the first variable, REDPANDA_OUTPUT_TOPIC_1 is the second variable, and so on.

Transform functions are isolated from the broker’s internal environment variables to maintain security and encapsulation. Each transform function only uses the environment variables explicitly provided to it.

Configure the Wasm engine

This section covers how to configure the Wasm engine environment using Redpanda cluster configuration properties.

Enable data transforms

To use data transforms, you must enable it for a Redpanda cluster using the data_transforms_enabled property.

Configure memory resources for data transforms

Redpanda reserves memory for each transform function within the broker. You need enough memory for your input record and output record to be in memory at the same time.

Set the following properties based on the number of functions you have and the amount of memory you anticipate needing.

  • data_transforms_per_core_memory_reservation: Increase this setting if you plan to deploy a large number of data transforms or if your transforms are memory-intensive. Reducing it may limit the number of concurrent transforms.

  • data_transforms_per_function_memory_limit: Adjust this setting if individual transform functions require more memory to process records efficiently. Reducing it may cause memory errors in complex transforms.

The maximum number of functions that can be deployed to a cluster is equal to data_transforms_per_core_memory_reservation / data_transforms_per_function_memory_limit. When that limit is hit, Redpanda cannot allocate memory for the VM and the transforms stay in errored states.

Configure maximum binary size

You can set the maximum size for a deployable Wasm binary that the broker can store using the data_transforms_binary_max_size property.

Increase this setting if your Wasm binaries are larger than the default limit. Setting it too low may prevent deployment of valid transform functions.

Configure commit interval

You can set the interval at which data transforms commit their progress using the data_transforms_commit_interval_ms property.

Adjust this setting to control how frequently the transform function’s progress is committed. Shorter intervals may provide more frequent progress updates but can increase load. Longer intervals reduce load but may delay progress updates.

Configure transform logging

Redpanda provides several properties to configure logging for data transforms:

  • data_transforms_logging_buffer_capacity_bytes: Increase this value if your transform logs are large or if you need to buffer more log data before flushing. Reducing this value may cause more frequent log flushing.

  • data_transforms_logging_flush_interval_ms: Adjust this value to control how frequently logs are flushed to the transform_logs topic. Shorter intervals provide more frequent log updates but can increase load. Longer intervals reduce load but may delay log updates.

  • data_transforms_logging_line_max_bytes: Increase this value if your log messages are frequently truncated. Setting this value too low may truncate important log information.

Configure runtime limits

You can set the maximum runtime for starting up a data transform and the time it takes for a single record to be transformed using the data_transforms_runtime_limit_ms property.

Adjust this value only if your transform functions need more time to process each record or to start up.