Docs Self-Managed Reference Monitoring Metrics Internal Metrics This is documentation for Self-Managed v23.3. To view the latest available version of the docs, see v24.2. Internal Metrics This page describes the original metrics available on <node ip>:9644/metrics. These are provided for internal support, troubleshooting, and backward compatibility. For information about the latest metrics available on :9644/public_metrics, see Monitoring. Configure Prometheus: internal metrics Prometheus is a systems monitoring and alerting tool. It collects and stores metrics as time-series data identified by metric name and key/value pairs. Redpanda exports Prometheus metrics on :9644/public_metrics, which has been available since v22.2. To generate the Redpanda configuration on an existing Prometheus instance for both the /public_metrics endpoint as well as the /metrics endpoint, run: rpk generate prometheus-config --job-name redpanda-metrics-test --node-addrs 'localhost:9644' --internal-metrics The output is a YAML object that you can add to the scrape_configs list. - job_name: redpanda-node static_configs: - targets: - 172.31.18.239:9644 - 172.31.18.238:9643 - 172.31.18.237:9642 Edit the prometheus.yml file in the Prometheus root folder to add the Redpanda configuration under the scrape_configs: scrape_configs: … - job_name: redpanda static_configs: - targets: - 172.31.18.239:9644 - 172.31.18.238:9643 - 172.31.18.237:9642 … The number of targets may change depending on the total number of running nodes. Save the configuration file, and restart Prometheus to apply changes. You can use this sample Prometheus configuration file. See rpk details in the rpk command reference for Prometheus. Configure Grafana: internal metrics Now that you have the metrics, you need a tool to query, visualize, and generate alerts. Grafana talks to Prometheus and lets you create dashboards with multiple graphic components. To generate a comprehensive Grafana dashboard, run: rpk generate grafana-dashboard --datasource <name> --metrics-endpoint <url> where: <name> is the name of the Prometheus data source configured in your Grafana instance. <url> is the address to a Redpanda node’s metrics endpoint. The default is <node ip>:9644/metrics. See details in the rpk command reference for Grafana. Out of the box, Grafana generates panels tracking latency for 50%, 95%, and 99% (based on the maximum latency set), throughput, and error segmentation by type. Pipe the command’s output to a file, and import it into Grafana. rpk generate grafana-dashboard \ --datasource prometheus \ --metrics-endpoint 172.31.18.237:9642/metrics > redpanda-dashboard.json In Grafana, import this generated JSON file. The default Grafana dashboard shows information about available Redpanda nodes, partitions, latency, and throughput graphics. You can use the imported dashboard to create new panels. Click + in the left pane, and select Add a new panel. On the Query tab, select Prometheus data source. Decide which metric you want to monitor, click Metrics browser, and start typing vectorized to show available metrics from the Redpanda cluster. Another way to see available metrics with their description is to access the cluster using a web browser on port 9644. (The port can change depending on your configuration.) You can dynamically set queries to calculate the values in the format you want. For example, to display the total number of partitions in your cluster, run this query on Grafana: count(count by (topic,partition) (vectorized_storage_log_partition_size{namespace="kafka"})) To show the number of partitions for a specific topic, run: count(count by (topic,partition) (vectorized_storage_log_partition_size{topic="<topic_name>"})) Monitor internal metrics Most metrics are useful for debugging, but the following metrics can be useful to measure system health: Metric Definition Diagnostics vectorized_application_uptime Redpanda uptime in milliseconds vectorized_cluster_partition_last_stable_offset Last stable offset If this is the last record received by the cluster, then the cluster is up-to-date and ready for maintenance vectorized_io_queue_delay Total delay time in the queue Can indicate latency caused by disk operations in seconds vectorized_io_queue_queue_length Number of requests in the queue Can indicate latency caused by disk operations vectorized_kafka_rpc_active_connections kafka_rpc: Currently active connections Shows the number of clients actively connected vectorized_kafka_rpc_connects kafka_rpc: Number of accepted connections Compare to the value at a previous time to derive the rate of accepted connections vectorized_kafka_rpc_received_bytes kafka_rpc: Number of bytes received from the clients in valid requests Compare to the value at a previous time to derive the throughput in Kafka layer in bytes/sec received vectorized_kafka_rpc_requests_completed kafka_rpc: Number of successful requests Compare to the value at a previous time to derive the messages per second per shard vectorized_kafka_rpc_requests_pending kafka_rpc: Number of requests being processed by server vectorized_kafka_rpc_sent_bytes kafka_rpc: Number of bytes sent to clients vectorized_kafka_rpc_service_errors kafka_rpc: Number of service errors vectorized_raft_leadership_changes Number of leadership changes High value can indicate nodes failing and causing leadership changes vectorized_reactor_utilization CPU utilization Shows the true utilization of the CPU by Redpanda process vectorized_storage_log_compacted_segment Number of compacted segments vectorized_storage_log_log_segments_created Number of created log segments vectorized_storage_log_partition_size Current size of partition in bytes vectorized_storage_log_read_bytes Total number of bytes read vectorized_storage_log_written_bytes Total number of bytes written Back to top × Simple online edits For simple changes, such as fixing a typo, you can edit the content directly on GitHub. Edit on GitHub Or, open an issue to let us know about something that you want us to change. Open an issue Contribution guide For extensive content updates, or if you prefer to work locally, read our contribution guide . Was this helpful? thumb_up thumb_down group Ask in the community mail Share your feedback group_add Make a contribution Internal Metrics Reference rpk Commands