Telemetry

From version 4.38.0, the default Redpanda Connect configuration includes a telemetry service that sends anonymized usage statistics to Redpanda. Redpanda analyzes the data to find out how often each Redpanda Connect component is used in production environments, and common usage patterns. This data analysis aims to help:

  • Identify gaps in functionality.

  • Prioritize the delivery of new features, enhancements, and bug fixes.

For example, if usage data shows that most aws_s3 outputs are paired with a mutation processor, then embedding a mutation field in the component might be a useful enhancement.

Data collection method

When you execute a configuration, the Redpanda Connect instance sends a JSON payload to the collection server. The payload contains a high-level, anonymized summary of the contents of the configuration file. Field values are never transmitted, nor are decorations of the configuration, such as label names.

For example, if you execute this configuration:

input:
  label: message_input
  generate:
    interval: 1s
    mapping: 'root.input_field = "string"'

output:
  label: message_output
  aws_s3:
    bucket: bucket_name
    path: filename.txt

Redpanda extracts the following details from the data collected:

  • A unique identifier for the Redpanda Connect instance.

  • How long the configuration has been running for.

  • Components used in the configuration. In this case, a generate input and an aws_s3 output.

  • The IP address of the running Redpanda Connect instance.

The code responsible for extracting usage data is available in the connect repository in the file: ./payload.go.

Data collection frequency

To avoid analyzing instances used for experimentation or testing, a Redpanda Connect instance must have been running for at least 5 minutes before any usage data is collected. Once usage data starts to be emitted, it is sent to the collection server once every 24 hours.

Disable the telemetry service

The telemetry service configuration is only included in the build artifacts released through GitHub or Redpanda’s official Docker images. This includes the rpk connect plugin, which is used to manage installations and upgrades of Redpanda Connect. Custom builds do not send usage data.

To continue to use the official Redpanda artifacts or images but disable the telemetry service, you can either block internet traffic or run the following command:

rpk connect run --disable-telemetry

Disabling telemetry does not affect the normal operation of Redpanda Connect.