Monitor Data Transforms
This topic provides guidelines on how to monitor the health of your data transforms and view logs.
Prerequisites
Set up monitoring for your cluster.
Performance
You can identify performance bottlenecks by monitoring latency and CPU usage.
If latency is high, look for inefficiencies in the transform logic or consider scaling your resources. High CPU usage can indicate that the code needs optimization or that the transform needs more allocated CPU resources.
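If your monitoring system exposes transform execution latency as a Prometheus-style histogram, a high quantile such as p99 usually tells you more than an average. The minimal sketch below estimates a quantile from cumulative (upper_bound, count) buckets using the same linear interpolation as the Prometheus histogram_quantile() function. The bucket boundaries and counts in the example are made up for illustration. They are not real Redpanda output.

```python
import math
from typing import Sequence, Tuple


def histogram_quantile(q: float, buckets: Sequence[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound with cumulative counts, and the last
    bucket must be the open-ended one (upper bound math.inf, like "+Inf").
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("the last bucket must have an upper bound of math.inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # lower edge of the first bucket is 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile lies in the open-ended bucket: report the highest
                # finite bound, as Prometheus does.
                return prev_bound
            if count == prev_count:
                return bound
            # Interpolate linearly between the bucket's lower and upper bounds.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound  # not reached: the +Inf bucket always holds the total


if __name__ == "__main__":
    # Hypothetical cumulative latency buckets, in seconds.
    example = [(0.005, 600), (0.01, 900), (0.05, 990), (0.1, 1000), (math.inf, 1000)]
    print(f"p99 latency ~ {histogram_quantile(0.99, example):.3f} s")
```

Compare the resulting p99 against a latency target you define for the transform. A p99 that keeps rising while the median stays flat often points to occasional expensive records rather than uniformly slow logic.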
Reliability
Tracking execution errors and error states helps you maintain the reliability of your data transforms.
Implement robust error handling and logging in your transform functions to make troubleshooting easier.
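The following is a language-neutral illustration of that error-handling pattern, written in Python. It does not use the Redpanda transform SDK. The hypothetical process_record function parses a record value as JSON and uppercases a hypothetical "name" field. It catches the failures it expects and logs each one with enough context to find the bad record, rather than letting an exception escape. Python's logging module stands in for whatever logging facility your transform language provides. Skipping malformed records is one design choice. Depending on your pipeline, failing loudly or routing them elsewhere may be more appropriate.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger("example-transform")  # hypothetical logger name


def process_record(key: Optional[bytes], value: bytes) -> Optional[bytes]:
    """Hypothetical transform step: uppercase the "name" field of a JSON record.

    Returns the transformed value, or None if the record should be skipped.
    """
    try:
        doc = json.loads(value)
        doc["name"] = doc["name"].upper()
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        # Log the key and the parse error, not the full (possibly large) payload.
        logger.error("skipping non-JSON record key=%r: %s", key, exc)
        return None
    except (KeyError, TypeError, AttributeError) as exc:
        # Valid JSON, but not the shape this transform expects.
        logger.error("skipping record with unexpected shape key=%r: %r", key, exc)
        return None
    return json.dumps(doc).encode("utf-8")
```

Keeping expected failures in narrow except clauses means that genuinely unexpected bugs still surface, instead of being silently swallowed.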
Resource usage
Monitoring memory usage and total execution time helps you confirm that the Wasm engine stays within its allocated resources and manage those resources efficiently.
If memory usage is consistently high or exceeds the maximum allocated memory:
- Review and optimize your transform functions to reduce memory consumption. This can involve optimizing data structures, reducing memory allocations, and handling records efficiently.
- Consider increasing the memory allocated to the Wasm engine. Adjust the data_transforms_per_core_memory_reservation and data_transforms_per_function_memory_limit settings to provide more memory to the Wasm engine and to each function, respectively.
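The two conditions above are memory usage that is consistently high and memory usage that reaches the maximum. You can check both from periodic samples of the Wasm engine's memory usage and its configured maximum. This sketch assumes you have already collected those values from your monitoring system. The 80% threshold and the rule that "consistently" means at least 90% of samples are arbitrary example choices, not Redpanda defaults.

```python
from dataclasses import dataclass
from typing import Sequence

HIGH_UTILIZATION = 0.8     # example threshold: 80% of the maximum
CONSISTENT_FRACTION = 0.9  # example rule: "consistently" = 90% of samples


@dataclass
class MemoryAssessment:
    peak_utilization: float  # highest sample as a fraction of the maximum
    consistently_high: bool  # enough samples at or above HIGH_UTILIZATION
    reached_max: bool        # at least one sample at or above the maximum


def assess_memory(samples_bytes: Sequence[int], max_bytes: int) -> MemoryAssessment:
    """Summarize Wasm engine memory samples against the maximum allocated memory."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    if not samples_bytes:
        raise ValueError("need at least one sample")
    ratios = [s / max_bytes for s in samples_bytes]
    high = sum(1 for r in ratios if r >= HIGH_UTILIZATION)
    peak = max(ratios)
    return MemoryAssessment(
        peak_utilization=peak,
        consistently_high=high / len(ratios) >= CONSISTENT_FRACTION,
        reached_max=peak >= 1.0,
    )
```

If either flag is set, apply the remedies in the list above: first optimize the functions, then raise the memory settings if needed.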
Throughput
Tracking read and write bytes and processor lag helps you understand how data flows through your transforms, which supports capacity planning and scaling decisions.
If lag is significant or throughput is low, investigate potential bottlenecks in the data flow or consider scaling your infrastructure to handle higher throughput.
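Byte counts like these are typically exported as monotonically increasing counters. Throughput is then the difference between two samples divided by the time between them. The sketch below assumes you have two such samples for one transform. The Sample field names are hypothetical and are not Redpanda metric names. Lag that grew between the samples is treated as a sign that the transform is falling behind its input. A counter that goes backwards, for example after a process restart, is treated as a reset.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Sample:
    timestamp_s: float  # when the sample was taken, in seconds
    read_bytes: int     # cumulative bytes read by the transform (counter)
    write_bytes: int    # cumulative bytes written by the transform (counter)
    lag: int            # processor lag at sample time (gauge)


def counter_rate(prev: int, curr: int, elapsed_s: float) -> float:
    """Per-second rate of a monotonically increasing counter."""
    if elapsed_s <= 0:
        raise ValueError("samples must be in chronological order")
    # A counter that went backwards was reset; count from zero instead.
    delta = curr - prev if curr >= prev else curr
    return delta / elapsed_s


def summarize(prev: Sample, curr: Sample) -> Dict[str, Union[float, bool]]:
    elapsed = curr.timestamp_s - prev.timestamp_s
    return {
        "read_bytes_per_s": counter_rate(prev.read_bytes, curr.read_bytes, elapsed),
        "write_bytes_per_s": counter_rate(prev.write_bytes, curr.write_bytes, elapsed),
        # Growing lag means input arrives faster than the transform processes it.
        "lag_growing": curr.lag > prev.lag,
    }
```

Lag that is large but stable usually calls for a one-time catch-up. Lag that grows steadily suggests the transform cannot keep up with the input rate and needs optimization or more resources.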
View logs for data transforms
Runtime logs for transform functions are written to an internal topic called _redpanda.transform_logs. You can read these logs by using the rpk transform logs command.
rpk transform logs <transform-name>
Replace <transform-name> with the configured name of the transform function.
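To read transform logs from a script, for example to archive them or scan them for errors, you can wrap the same rpk transform logs command. This minimal sketch assumes that rpk is on your PATH and already configured to reach your cluster. It passes only the transform name and no other flags. The timeout guards against the command not exiting on its own, and the helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def build_logs_command(transform_name: str) -> List[str]:
    """Return the argv for `rpk transform logs <transform-name>`."""
    if not transform_name:
        raise ValueError("transform_name must not be empty")
    return ["rpk", "transform", "logs", transform_name]


def read_transform_logs(transform_name: str, timeout_s: float = 30.0) -> List[str]:
    """Run rpk and return its output lines. Assumes rpk is installed and configured."""
    if shutil.which("rpk") is None:
        raise FileNotFoundError("rpk not found on PATH")
    result = subprocess.run(
        build_logs_command(transform_name),
        capture_output=True,
        text=True,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired if exceeded
        check=True,         # raises CalledProcessError on a non-zero exit
    )
    return result.stdout.splitlines()


def lines_containing(lines: List[str], needle: str) -> List[str]:
    """Case-insensitive substring filter, for example to find lines mentioning 'error'."""
    needle = needle.lower()
    return [line for line in lines if needle in line.lower()]
```

Passing the arguments as a list, without a shell, means the transform name is never interpreted by a shell.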
You can also view logs in Redpanda Console.
Redpanda provides several settings to manage logging for data transforms, such as buffer capacity, flush interval, and maximum log line length. Their default values are intended to keep logging efficient without overwhelming the system, but you may need to adjust them for your specific requirements and workloads. For information on how to configure logging, see the Configure transform logging section of the configuration guide.