Skip to main content
Version: 22.2


You can monitor the health of your system to optimize performance. The most useful metrics are available on <node ip>:9644/public_metrics. This section describes how to use Prometheus queries to monitor your system, for example, to check consumer group lag or Kafka producer request latency.

For information about the original metrics available on :9644/metrics, see Internal Metrics.

Configure Prometheus

Prometheus is a systems monitoring and alerting tool. It collects and stores metrics as time-series data identified by metric name and key/value pairs. Redpanda exports Prometheus metrics on :9644/public_metrics. To generate the configuration on an existing Prometheus instance, run:

rpk generate prometheus-config

The output is a YAML object that you can add to the scrape_configs list.

- job_name: redpanda
- targets:
metrics_path: /public_metrics

When you run the command on a node where Redpanda is running, it displays other available nodes. If Redpanda is not running, you can set the flag --seed-addr to specify a remote Redpanda node to discover the additional ones, or set --node-addrs with a comma-separated list of all known cluster node addresses. For example:

rpk generate prometheus-config --job-name redpanda-metrics-test --node-addrs,,

Edit the prometheus.yml file in the Prometheus root folder to add the Redpanda configuration under the scrape_configs:


- job_name: redpanda-metrics-test
- targets:
metrics_path: /public_metrics

The number of targets may change depending on the total running nodes.

Save the configuration file, and restart Prometheus to apply changes.

After you have the metrics, you can use a tool like Grafana to query, visualize, and generate alerts. For example, to display the total number of partitions in your cluster, run this query on Grafana:


To show the number of partitions for a specific topic, run:

count(count by (redpanda_topic, redpanda_partition)

Enable statistics

Redpanda ships with an additional systemd service that runs periodically and reports resource usage and configuration data to Redpanda's metrics API. This is enabled by default, and the data is anonymous.

To identify your cluster's data, so Redpanda can monitor it and alert you to possible issues, set the organization (your company domain) and cluster_id (usually your team or project name) configuration fields. For example:

rpk config set organization ''
rpk config set cluster_id 'us-west-2'

To opt out of all metrics reporting, set rpk.enable_usage_stats to false:

rpk config set rpk.enable_usage_stats false

Monitor metrics


Labels associate a metric with a Redpanda object. For example: redpanda_kafka_request_bytes_total{redpanda_topic="topic1", redpanda_request="consume"} 100 indicates that 100 total bytes were consumed for topic1.

Cluster-level metrics

redpanda_cluster_brokersNumber of configured brokers in the cluster (that is, the size of the cluster).nonegauge
redpanda_cluster_partitionsNumber of partitions managed by the cluster. This includes partitions of the controller topic, but not replicas.nonegauge
redpanda_cluster_topicsNumber of topics in the cluster.nonegauge
redpanda_cluster_unavailable_partitionsNumber of unavailable partitions in the cluster (that is, partitions that lack quorum among replicants).nonegauge

Cluster-level queries

Brokers in a cluster


Topics in a cluster


Partitions in a cluster


Unavailable partitions in a cluster


Kafka consumer rate

sum(rate(redpanda_kafka_request_bytes_total{redpanda_request="consume"} [5m] )) by (redpanda_request)

Kafka producer rate

sum(rate(redpanda_kafka_request_bytes_total{redpanda_request="produce"} [5m] )) by (redpanda_request)

Infrastructure-level metrics

redpanda_io_queue_total_read_opsTotal read operations passed in the queue.class ("default" | "compaction" | "raft"), ioshard, mountpoint, shardcounter
redpanda_io_queue_total_write_opsTotal write operations passed in the queue.class ("default" | "compaction" | "raft"), ioshard, mountpoint, shardcounter
redpanda_storage_disk_free_bytesDisk storage bytes free.nonecounter
redpanda_storage_disk_total_bytesTotal size of attached storage, in bytes.nonecounter
redpanda_storage_disk_free_space_alertStatus of low storage space alert: 0-OK, 1-low space 2-degradednonegauge
redpanda_memory_allocated_memoryAllocated memory size in bytes.shardgauge
redpanda_memory_free_memoryFree memory size in bytes.shardgauge
redpanda_rpc_request_errors_totalNumber of RPC errors.redpanda_server ("kafka" | "internal")counter
redpanda_rpc_request_latency_secondsRPC latency.redpanda_server ("kafka" | "internal")histogram

Infrastructure-level queries

Redpanda CPU


Redpanda memory

redpanda_memory_allocated_memory / (redpanda_memory_free_memory + redpanda_memory_allocated_memory)

Redpanda disk usage

1 - (redpanda_storage_disk_free_bytes / redpanda_storage_disk_total_bytes)

Redpanda I/O queue


Redpanda RPC latency

histogram_quantile(0.99, (sum(rate(redpanda_rpc_request_latency_seconds_bucket[5m])) by (le, provider, region, instance, namespace, pod, redpanda_server)))

Redpanda RPC request rate


Redpanda RPC error rate


Cloud storage errors


Service-level metrics

redpanda_pandaproxy_request_latency_secondsLatency of the request indicated by the label in HTTP Proxy. The measurement includes the time spent waiting for resources to become available, processing the request, and dispatching the response.shard, operationhistogram
redpanda_schema_registry_request_errors_totalTotal number of Schema Registry server errors.redpanda_status ("5xx" | "4xx" | "3xx")counter
redpanda_schema_registry_request_latency_secondsLatency of the request indicated by the label in the Schema Registry. The measurement includes the time spent waiting for resources to become available, processing the request, and dispatching the response.nonehistogram

Service-level queries

Schema Registry request latency

histogram_quantile(0.99, (sum(rate(redpanda_schema_registry_request_latency_seconds_bucket[5m])) by (le, provider, region, instance, namespace, pod)))

Schema Registry request rate

rate(redpanda_schema_registry_request_latency_seconds_count[5m]) + sum without(redpanda_status)(rate(redpanda_schema_registry_request_errors_total[5m]))

Schema Registry request error rate


Partition-level metrics

redpanda_kafka_max_offsetLatest committed offset for the partition (that is, the offset of the last message safely persisted on most replicas).redpanda_namespace, redpanda_topic, redpanda_partitiongauge
redpanda_kafka_under_replicated_replicasNumber of under-replicated replicas for the partition (that is, replicas that are live but are not at the latest offset.)redpanda_namespace, redpanda_topic, redpanda_partitiongauge

Partition-level queries

Max offset


Under-replicated replicas

redpanda_kafka_under_replicated_replicas > 0

Topic-level metrics

redpanda_kafka_replicasNumber of configured replicas per topic.redpanda_namespace, redpanda_topicgauge
redpanda_kafka_request_bytes_totalTotal number of bytes produced or consumed per topic.redpanda_namespace, redpanda_topic, redpanda_request ("produce" | "consume")counter

Topic-level queries

Kafka consumer rate

sum by(redpanda_namespace,redpanda_topic)(rate(redpanda_kafka_request_bytes_total{redpanda_request="consume"}[5m]))

Kafka producer rate

sum by(redpanda_namespace,redpanda_topic)(rate(redpanda_kafka_request_bytes_total{redpanda_request="produce"}[5m]))

Partition count

sum(group(redpanda_kafka_max_offset) by (redpanda_namespace, redpanda_topic, redpanda_partition, redpanda_topic)) by (redpanda_namespace, redpanda_topic)

Broker-level metrics

redpanda_kafka_request_latency_secondsLatency of produce/consume requests per broker. This measures from the moment a request is initiated on the partition to the moment the response is fulfilled.request ("produce" | "consume")histogram

Broker-level queries

Kafka consumer request latency

histogram_quantile(0.99, sum(rate(redpanda_kafka_request_latency_seconds_bucket{redpanda_request="consume"}[5m])) by (le, provider, region, instance, namespace, pod))

Kafka producer request latency

histogram_quantile(0.99, sum(rate(redpanda_kafka_request_latency_seconds_bucket{redpanda_request="produce"}[5m])) by (le, provider, region, instance, namespace, pod))

Rate of Kafka consumer requests


Rate of Kafka producer requests


Rate of Kafka consumers

sum(rate(redpanda_kafka_request_bytes_total{redpanda_request="consume"} [5m])) by (provider, region, instance, namespace, pod)

Rate of Kafka producers

sum(rate(redpanda_kafka_request_bytes_total{redpanda_request="produce"}[5m])) by (instance, provider, region, namespace, pod)

Consumer group metrics

redpanda_kafka_consumer_group_committed_offsetConsumer group committed, topic, partitiongauge
redpanda_kafka_consumer_group_consumersNumber of consumers in a group.groupgauge
redpanda_kafka_consumer_group_topicsNumber of topics in a group.groupgauge

Consumer group queries

Consumers per consumer group

sum by(redpanda_group)(redpanda_kafka_consumer_group_consumers)

Topics per consumer group

sum by(redpanda_group)(redpanda_kafka_consumer_group_topics)

Consumer group lag

max by(redpanda_namespace, redpanda_topic, redpanda_partition)(redpanda_kafka_max_offset{redpanda_namespace="kafka"}) - on(redpanda_topic, redpanda_partition) group_right max by(redpanda_group, redpanda_topic, redpanda_partition)(redpanda_kafka_consumer_group_committed_offset)

REST proxy metrics

redpanda_rest_proxy_request_errors_totalTotal number of rest_proxy server errors.redpanda_status ("5xx" | "4xx" | "3xx")counter
redpanda_rest_proxy_request_latency_secondsInternal latency of request for rest_proxy.nonehistogram

REST proxy queries

REST proxy request latency

histogram_quantile(0.99, (sum(rate(redpanda_rest_proxy_request_latency_seconds_bucket[5m])) by (le, provider, region, instance, namespace, pod)))

REST proxy request rate

rate(redpanda_rest_proxy_request_latency_seconds_count[5m]) + sum without(redpanda_status)(rate(redpanda_rest_proxy_request_errors_total[5m]))

REST proxy request error rate