Monitor Connectors

You can monitor the health of your Redpanda managed connectors with metrics that Redpanda exports through a Prometheus HTTPS endpoint. You can use Grafana to visualize the metrics and set up alerts.

The most important metrics to monitor with alerts are:

  • connector failed tasks

  • connector lag / connector lag rate

Limitations

The connectors dashboard renders metrics exported by managed connectors. However, when a connector does not create any tasks (for example, because its topic list is empty), the dashboard does not show metrics for that connector.

Configure Prometheus

Prometheus is a system monitoring and alerting tool. It collects and stores metrics as time-series data identified by a metric name and key/value pairs.

You can quickly get Prometheus and Grafana running locally, but a local setup is not suitable for production. For production, deploy Prometheus and Grafana as a standalone or managed service, as described below.

To configure and use Prometheus to monitor Redpanda managed connector metrics:

  1. In Redpanda Cloud, go to Overview > How to connect > Prometheus. Click the Copy icon for Prometheus YAML to copy its content into your clipboard.

  2. Edit the prometheus.yml file in the Prometheus root folder to add the Redpanda configuration under scrape_configs.

    scrape_configs:
    - job_name: redpandaCloud
      static_configs:
      - targets:
        - ...
      metrics_path: /api/cloud/prometheus/public_metrics
      basic_auth:
        username: prometheus
        password: ...
      scheme: https
  3. Save the configuration file, and restart Prometheus to apply changes.

  4. Verify in Prometheus that metrics from the Redpanda endpoint are being scraped. You can also alert on scrape failures, as sketched after these steps.
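
To alert on scrape failures, you can use the standard Prometheus up metric, which is 1 when the last scrape of a job succeeded and 0 when it failed. The following rule file is a minimal sketch: the group name, alert name, severity label, and 5-minute duration are assumptions, and the job name matches the redpandaCloud job configured in step 2. Save the rules in a separate file and reference it from prometheus.yml under rule_files.

  groups:
  - name: redpanda-cloud-scrape          # group name is an assumption
    rules:
    - alert: RedpandaCloudScrapeFailing  # alert name is an assumption
      # Fires when Prometheus has been unable to scrape the redpandaCloud job for 5 minutes.
      expr: up{job="redpandaCloud"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: Prometheus cannot scrape the Redpanda Cloud metrics endpoint.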

Import the Grafana dashboard

You can use Grafana to query, visualize, and generate alerts for metrics. Redpanda provides a Grafana dashboard for connectors.

To use the Grafana dashboard to gather telemetry for your managed connectors, import the connectors dashboard JSON file (Connectors.json).
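
If you host Grafana yourself, one way to load the dashboard is Grafana's file-based dashboard provisioning (you can also import the JSON file through the Grafana UI). The following provisioning file is a minimal sketch: the file location, provider name, and dashboards directory are assumptions, and Connectors.json must be copied into that directory.

  # Example location: provisioning/dashboards/redpanda-connectors.yaml (an assumption)
  apiVersion: 1
  providers:
  - name: redpanda-connectors            # arbitrary provider name
    type: file
    options:
      # Directory containing the downloaded Connectors.json
      path: /var/lib/grafana/dashboards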

Managed connector metrics

You can monitor the following metrics for your Redpanda managed connectors.

Connector tasks

Number of tasks for a specific connector, grouped by status:

  • running - Tasks that are healthy and running.

  • paused - Tasks that were paused by a user request.

  • failed - Tasks that failed during execution.

Expect only running and paused tasks. Create an alert for failed tasks.
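
For example, a Prometheus alerting rule for failed tasks could look like the following sketch. The metric name connector_task_count and the status and connector labels are hypothetical placeholders, not the exact names exported by the metrics endpoint; substitute the task-status metric and labels that you see in Prometheus or on the connectors dashboard.

  groups:
  - name: redpanda-connector-tasks       # group name is an assumption
    rules:
    - alert: ConnectorTasksFailed        # alert name is an assumption
      # connector_task_count and its labels are hypothetical; use the task-status
      # metric actually exported by your metrics endpoint.
      expr: sum by (connector) (connector_task_count{status="failed"}) > 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: 'Connector {{ $labels.connector }} has failed tasks.'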


Sink connector lag

The number of records still to be processed by a connector, calculated as last_offset - current_offset. This metric is emitted for sink connectors only.

For newly created connectors, the metric is high until the connector sinks all historical data.

Expect the lag not to increase over time.


MM2 replication latency

Age of the last record written to the target cluster by the MirrorMaker 2 connector. This metric is emitted for each partition.

For newly created connectors, the metric is high until the connector processes all historical data.

Expect the latency not to increase over time.


Count of the records sent to target (by topic)

Count of records sent to the cluster by source connectors for each topic.


Redpanda consumer latency

The Redpanda consumer fetch latency for sink connectors.


Redpanda producer latency

The Redpanda producer request latency for source connectors.


Bytes in

Bytes per second (throughput) of data from Redpanda to managed connectors.


Bytes out

Bytes per second (throughput) of data from managed connectors to Redpanda.


Record error rate

  • record errors - Total number of record errors seen in connector tasks.

  • record failures - Total number of record failures seen in connector tasks.

  • record skipped - Total number of records skipped by connector tasks.


Producer record rate

  • record sent - Total number of records sent by connector producers.

  • record retry - Total number of record send retries by connector producers.


Producer record error rate

Rate of producer errors when producing records to Redpanda.

Connectors support

Redpanda Support monitors managed connectors 24/7 to ensure the service is available. If an incident occurs, Redpanda Support follows an incident response process to quickly mitigate it.

Consumer lag

A connector generally performs worse than expected when it is underprovisioned.

Increase Max Tasks (tasks.max) in the connector configuration for a given number of instances and instance types. For more information, see Sizing Connectors.

Additional reasons for increasing consumer lag include:

  • Available memory for the connector is too low.

  • Insufficient number of instances. Autoscaling is based on the total running task count for connectors.

Sink connector lag rate metric

The sink connector lag rate metric shows the difference between the topic max offset rate and the sink connector committed offset rate. When the message rate for the topic is greater than the sink connector's consume rate, the lag rate metric is positive. Expect the metric to drop below 0 regularly, which means the connector is making progress and can keep up with the produce rate.
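
For example, an alert that fires when the lag rate has stayed positive for a sustained period (that is, the connector has not caught up at any point in the window) could look like the following sketch. The metric name sink_connector_lag_rate, the 30-minute window, and the alert and group names are hypothetical placeholders; substitute the lag rate metric name that you see in Prometheus or on the connectors dashboard.

  groups:
  - name: redpanda-connector-lag         # group name is an assumption
    rules:
    - alert: SinkConnectorFallingBehind  # alert name is an assumption
      # sink_connector_lag_rate is hypothetical; substitute the actual lag rate metric.
      # min_over_time(...) > 0 means the lag rate never dropped to or below zero during
      # the last 30 minutes, so the connector never caught up in that window.
      expr: min_over_time(sink_connector_lag_rate[30m]) > 0
      labels:
        severity: warning
      annotations:
        summary: Sink connector lag rate has been positive for 30 minutes.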

Contact Redpanda support to align connector instances with your needs.

Connector in a failed state

If a connector is in a failed state, first check the connector configuration and logs. Failures typically occur immediately after a configuration change.

  • Check the exception details and stack trace by clicking Show Error.

  • Check connector logs in the Logs tab.

  • Restart the connector by clicking Restart.

The following are the most frequent connector configuration issues that cause a failed status, along with the actions to take for each:

External system connectivity issue

  • Check that the external system is up and running.

  • Check that the external system is available.

  • Check the connector configuration to confirm that external system properties are correct (URL, table name, bucket name).

External system authentication issue

  • Check that the given account exists in the external system.

  • Check the credentials defined in the connector configuration.

Incorrect topic name or topic name pattern

  • Check that the expected topic is created.

  • Check that the given topic name pattern matches at least one topic name.

Out Of Memory error

  • Change the connector configuration: lower the connector cache buffer size and decrease the maximum number of records allowed in a batch.

  • Limit the number of topics set in the Topics to export (topics) or Topics regex (topics.regex) properties.

  • Decrease Max Tasks (tasks.max) in the connector configuration.

  • Contact Redpanda support.