Public Metrics Reference
This section provides reference descriptions about the public metrics exported from Redpanda’s /public_metrics
endpoint.
Use /public_metrics for your primary dashboards for system health. Use /metrics for detailed analysis and debugging. |
In a live system, Redpanda metrics are exported only for features that are in use. For example, a metric for consumer groups is not exported when no groups are registered. To see the available public metrics in your system, query the
|
Cluster metrics
redpanda_cluster_brokers
Number of configured, fully commissioned brokers in a cluster.
Type: gauge
How to monitor: create an alert for when this gauge dips below a steady-state threshold, as a node(s) has become unresponsive.
redpanda_cluster_controller_log_limit_requests_available_rps
Limit on the available requests per second (RPS) for a cluster controller log.
Type: gauge
Labels:
-
redpanda_cmd_group=("move_operations" | "topic_operations" | "configuration_operations" | "node_management_operations" | "acls_and_users_operations")
redpanda_cluster_controller_log_limit_requests_dropped
Number of requests dropped by a cluster controller log due to exceeding redpanda_cluster_controller_log_limit_requests_available_rps.
Type: counter
Labels:
-
redpanda_cmd_group=("move_operations" | "topic_operations" | "configuration_operations" | "node_management_operations" | "acls_and_users_operations")
Usage: when this counter increases, it indicates that the controller is dropping requests.
redpanda_cluster_partition_moving_from_node
Number of partition replicas in the cluster that are currently being removed from a node.
Type: gauge
Usage: when this gauge is non-zero, determine whether there is an expected or unexpected reassignment of partitions that is causing movement of partition replicas.
redpanda_cluster_partition_moving_to_node
Number of partition replicas in the cluster that are currently being added or moved to a node.
Type: gauge
Usage: when this gauge is non-zero, determine whether there is an expected or unexpected reassignment of partitions that is causing movement of partition replicas.
redpanda_cluster_partition_node_cancelling_movements
During a partition movement cancellation operation, the number of partition replicas that were being moved that now need to be cancelled.
Type: gauge
How to monitor: when this gauge is non-zero, determine whether there is an expected or unexpected reassignment of partitions that is causing movement of partition replicas.
redpanda_cluster_partitions
Number of partitions managed by a cluster. Includes partitions of the controller topic, but not replicas.
Type: gauge
redpanda_cluster_unavailable_partitions
Number of unavailable partitions (the partitions that lack quorum among their replica group) in the cluster.
Type: gauge
Usage: when this gauge is non-zero, it indicates that a partition(s) doesn’t have quorum and thus doesn’t have an active leader. To mitigate, consider increasing the number of brokers and the partition replication factor.
redpanda_io_queue_total_read_ops
Total read operations passed in the I/O queue.
Type: counter
Labels:
-
class=("default" | "compaction" | "raft")
-
ioshard
-
mountpoint
-
shard
redpanda_io_queue_total_write_ops
Total write operations passed in the I/O queue.
Type: counter
Labels:
-
class=("default" | "compaction" | "raft")
-
ioshard
-
mountpoint
-
shard
redpanda_memory_available_memory
Total memory potentially available (free plus reclaimable memory) to a CPU shard (core), in bytes.
Type: gauge
Labels:
-
shard
redpanda_memory_available_memory_low_water_mark
The low watermark for available memory at process start.
Type: gauge
Labels:
-
shard
redpanda_rpc_active_connections
Number of currently active connections.
Type: gauge
Labels:
-
redpanda_server=("kafka" | "internal")
redpanda_rpc_request_errors_total
Number of RPC errors.
Type: counter
Labels:
-
redpanda_server=("kafka" | "internal")
Usage: when this counter increases, analyze the logged errors.
redpanda_rpc_request_latency_seconds
RPC latency in seconds.
Type: histogram
Labels:
-
redpanda_server=("kafka" | "internal")
redpanda_scheduler_runtime_seconds_total
Accumulated runtime of the task queue associated with a scheduling group.
Type: counter
Labels:
-
redpanda_scheduling_group=("admin" | "archival_upload" | "cache_background_reclaim" | "cluster" | "coproc" | "kafka" | "log_compaction" | "main" | "node_status" | "raft" | "raft_learner_recovery")
-
shard
redpanda_storage_disk_free_space_alert
Alert for low disk storage: 0-OK
, 1-low space
, 2-degraded
.
Type: gauge
redpanda_pandaproxy_request_latency_seconds
Latency in seconds of the request indicated by the label in HTTP Proxy. The measurement includes the time spent waiting for resources to become available, processing the request, and dispatching the response.
Type: histogram
redpanda_schema_registry_request_errors_total
Total number of Schema Registry errors.
Type: counter
Labels:
-
redpanda_status=("5xx" | "4xx" | "3xx")
redpanda_schema_registry_request_latency_seconds
Latency of the request indicated by the label in the Schema Registry. The measurement includes the time spent waiting for resources to become available, processing the request, and dispatching the response.
Type: histogram
redpanda_kafka_max_offset
Offset of the last message committed for the partition.
Type: gauge
Labels:
-
redpanda_namespace
-
redpanda_partition
-
redpanda_topic
redpanda_kafka_under_replicated_replicas
Number of replicas in the partition that are live but not at the latest offset, redpanda_kafka_max_offset.
Type: gauge
Labels:
-
redpanda_namespace
-
redpanda_partition
-
redpanda_topic
redpanda_raft_recovery_partition_movement_available_bandwidth
Bandwidth available for partition movement, in bytes per sec.
Type: gauge
Labels:
-
shard
redpanda_kafka_partitions
Configured number of partitions for a topic.
Type: gauge
Labels:
-redpanda_namespace
-redpanda_topic
redpanda_kafka_replicas
The number of configured replicas per topic.
Type: gauge
Labels:
-
redpanda_namespace
-
redpanda_topic
redpanda_kafka_request_bytes_total
Total number of bytes produced or consumed per topic.
Type: counter
Labels:
-
redpanda_namespace
-
redpanda_topic
-
redpanda_request=("produce" | "consume")
redpanda_raft_leadership_changes
Number of leadership changes across all partitions of a given topic.
Type: counter
Labels:
-
redpanda_namespace
-
redpanda_topic
redpanda_kafka_request_latency_seconds
Latency of produce/consume requests per broker. This duration measures from when a request is initiated on the partition to when the response is fulfilled.
Type: histogram
Labels:
-
redpanda_request=("produce" | "consume")
redpanda_kafka_consumer_group_committed_offset
Committed offset of a consumer group.
Type: gauge
Labels:
-
group
-
topic
-
partition
redpanda_kafka_consumer_group_consumers
Number of consumers in a consumer group.
Type: gauge
Labels:
-
group
redpanda_kafka_consumer_group_topics
Number of topics in a consumer group.
Type: gauge
Labels:
-
group
redpanda_rest_proxy_request_errors_total
Total number of REST proxy server errors.
Type: counter
Labels:
-
redpanda_status("5xx" | "4xx" | "3xx")
redpanda_rest_proxy_request_latency_seconds
Internal latency of REST proxy requests.
Type: histogram