Monitor Connectors in Kubernetes
You can monitor the health of your Redpanda Connectors with metrics that are exported through a Prometheus endpoint at the default port 9404. You can use Grafana to visualize the metrics and set up alerts.
Prerequisites
-
A Kubernetes cluster. You must have kubectl version 1.21 or later. To check if you have kubectl installed:
kubectl version --short --client
-
Helm version 3.6.0 or later. To check if you have Helm installed:
helm version
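The minimum versions above can also be checked in a script. The following is a minimal sketch, not an official tool: the helper name version_at_least is hypothetical, and it assumes a sort implementation that supports -V (version sort), such as GNU coreutils. The Helm check runs only if helm is on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when version $2 is >= required version $1.
# Assumes `sort -V` (version sort) is available.
version_at_least() {
  # If the required version sorts first (or both are equal), the actual version is new enough.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Check Helm against the 3.6.0 minimum, but only when helm is installed.
if command -v helm >/dev/null 2>&1; then
  helm_version="$(helm version --template '{{.Version}}' | sed 's/^v//')"
  if version_at_least 3.6.0 "$helm_version"; then
    echo "Helm $helm_version meets the 3.6.0 minimum"
  else
    echo "Helm $helm_version is older than 3.6.0"
  fi
fi
```

Version sort matters here because a plain string comparison would rank 3.12.1 below 3.6.0.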
Limitations
The connectors dashboard renders metrics that are exported by managed connectors. However, when a connector does not create a task (for example, because its topic list is empty), the dashboard does not show metrics for that connector.
Configure Prometheus
Prometheus is a system monitoring and alerting tool. It collects and stores metrics as time-series data identified by a metric name and key/value pairs.
To configure Prometheus to monitor Redpanda metrics in Kubernetes, you can use the Prometheus Operator:
-
Follow the steps to deploy the Prometheus Operator.
Make sure to configure the Prometheus resource to target your Pods that are running Kafka Connect:
prometheus.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  podMonitorNamespaceSelector:
    matchLabels:
      name: <namespace>
  podMonitorSelector:
    matchLabels:
      app.kubernetes.io/name: connectors
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: false
-
podMonitorNamespaceSelector.matchLabels.name: The namespace in which Redpanda is deployed. The Prometheus Operator looks for PodMonitor resources in this namespace.
-
podMonitorSelector.matchLabels.app.kubernetes.io/name: The value of fullnameOverride in your Redpanda Helm chart. The default is connectors. The Redpanda Helm chart creates the PodMonitor resource with this label.
-
Deploy the Redpanda Connectors subchart with monitoring enabled to deploy the PodMonitor resource:
-
Helm + Operator
redpanda-cluster.yaml
apiVersion: cluster.redpanda.com/v1alpha1
kind: Redpanda
metadata:
  name: redpanda
spec:
  chartRef: {}
  clusterSpec:
    connectors:
      enabled: true
      monitoring:
        enabled: true
        scrapeInterval: 30s
kubectl apply -f redpanda-cluster.yaml --namespace <namespace>
-
Helm
Using --values:
prometheus-monitoring.yaml
connectors:
  enabled: true
  monitoring:
    enabled: true
    scrapeInterval: 30s
helm upgrade --install redpanda redpanda/redpanda \
  --namespace <namespace> \
  --create-namespace \
  --values prometheus-monitoring.yaml --reuse-values
Using --set:
helm upgrade --install redpanda redpanda/redpanda \
  --namespace <namespace> \
  --create-namespace \
  --set connectors.enabled=true \
  --set connectors.monitoring.enabled=true \
  --set connectors.monitoring.scrapeInterval="30s"
-
Wait until all Pods are running:
kubectl -n <namespace> rollout status statefulset redpanda --watch
-
Ensure that the PodMonitor was deployed:
kubectl get podmonitor --namespace <namespace>
-
Ensure that you’ve exposed the Prometheus Service.
-
Expose the Prometheus server to your localhost:
kubectl port-forward svc/prometheus 9090
-
Open Prometheus, and verify that Prometheus is scraping metrics from your endpoints.
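Scraping can also be verified from the command line. The sketch below assumes the port-forward from the previous step is still running on localhost:9090. It relies on the Prometheus HTTP API endpoint /api/v1/targets, which reports a health field for each active target. The helper name count_up_targets is hypothetical, and the pattern match is a rough text-based count, not a full JSON parse.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count scrape targets that Prometheus reports as healthy.
# Reads the JSON body of GET /api/v1/targets from stdin and counts "health":"up" entries.
count_up_targets() {
  grep -o '"health"[[:space:]]*:[[:space:]]*"up"' | wc -l | tr -d ' '
}

# Usage (requires the port-forward to be active):
#   curl -s http://localhost:9090/api/v1/targets | count_up_targets
```

A result of 0 suggests that Prometheus has not discovered the connectors Pods, in which case the PodMonitor labels and namespace selector are the first things to check.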
Import the Grafana dashboard
You can use Grafana to query, visualize, and generate alerts for metrics. Redpanda provides a Grafana dashboard for connectors.
To create and use the Grafana dashboard to gather telemetry for your managed connectors, import the connectors dashboard JSON file (Connectors.json).
Managed connector metrics
You can monitor the following metrics for your Redpanda managed connectors.
Connector tasks
Number of tasks for a specific connector, grouped by status:
-
running: Tasks that are healthy and running.
-
paused: Tasks that were paused by a user request.
-
failed: Tasks that failed during execution.
Expect only running and paused tasks. Create an alert for failed tasks.
Sink connector lag
The number of records still to be processed by a connector, calculated as last_offset - current_offset. This metric is emitted for sink connectors only.
For newly created connectors, the metric is high until the connector sinks all historical data.
Expect the lag not to increase over time.
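As a worked example of the lag arithmetic, the sketch below computes last_offset - current_offset for two samples and compares them. The function name sink_lag and the offset values are hypothetical and chosen only for illustration. In practice the values come from the exported metrics.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lag = last_offset - current_offset.
sink_lag() {
  echo $(( $1 - $2 ))
}

# Two illustrative samples taken some time apart (made-up offsets).
lag_before="$(sink_lag 1500 1200)"   # 300 records behind
lag_after="$(sink_lag 1800 1700)"    # 100 records behind

if [ "$lag_after" -gt "$lag_before" ]; then
  echo "lag is increasing: $lag_before -> $lag_after"
else
  echo "lag is not increasing: $lag_before -> $lag_after"
fi
```

In this example the topic grew by 300 records between samples while the connector processed 500, so the lag shrank.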
MM2 replication latency
Age of the last record written to the target cluster by the MirrorMaker 2 connector. This metric is emitted for each partition.
For newly created connectors, the metric is high until the connector processes all historical data.
Expect the latency not to increase over time.
Count of the records sent to target (by topic)
Count of records sent to the cluster by source connectors for each topic.
Record error rate
-
record errors: Total number of record errors seen in connector tasks.
-
record failures: Total number of record failures seen in connector tasks.
-
record skipped: Total number of records skipped by connector tasks.