Public Metrics Reference

This section provides reference descriptions about the public metrics exported from Redpanda’s /public_metrics endpoint.

Use /public_metrics for your primary dashboards for system health.

Use /metrics for detailed analysis and debugging.

In a live system, Redpanda metrics are exported only for features that are in use. For example, a metric for consumer groups is not exported when no groups are registered.

To see the available public metrics in your system, query the /public_metrics endpoint:

curl http://<node-addr>:9644/public_metrics | grep "[HELP|TYPE]"

Cluster metrics

redpanda_cluster_brokers

Number of configured, fully commissioned brokers in a cluster.

Type: gauge

How to monitor: create an alert for when this gauge dips below a steady-state threshold, as a node(s) has become unresponsive.

redpanda_cluster_controller_log_limit_requests_available_rps

Limit on the available requests per second (RPS) for a cluster controller log.

Type: gauge

Labels:

redpanda_cmd_group=("move_operations" | "topic_operations" | "configuration_operations" | "node_management_operations" | "acls_and_users_operations")

redpanda_cluster_controller_log_limit_requests_dropped

Number of requests dropped by a cluster controller log due to exceeding redpanda_cluster_controller_log_limit_requests_available_rps.

Type: counter

Labels:

redpanda_cmd_group=("move_operations" | "topic_operations" | "configuration_operations" | "node_management_operations" | "acls_and_users_operations")

Usage: when this counter increases, it indicates that the controller is dropping requests.

redpanda_cluster_partition_moving_from_node

Number of partition replicas in the cluster that are currently being removed from a node.

Type: gauge

Usage: when this gauge is non-zero, determine whether there is an expected or unexpected reassignment of partitions that is causing movement of partition replicas.

redpanda_cluster_partition_moving_to_node

Number of partition replicas in the cluster that are currently being added or moved to a node.

Type: gauge

Usage: when this gauge is non-zero, determine whether there is an expected or unexpected reassignment of partitions that is causing movement of partition replicas.

redpanda_cluster_partition_node_cancelling_movements

During a partition movement cancellation operation, the number of partition replicas that were being moved that now need to be cancelled.

Type: gauge

How to monitor: when this gauge is non-zero, determine whether there is an expected or unexpected reassignment of partitions that is causing movement of partition replicas.

redpanda_cluster_partitions

Number of partitions managed by a cluster. Includes partitions of the controller topic, but not replicas.

Type: gauge

redpanda_cluster_topics

Number of topics in a cluster.

Type: gauge

redpanda_cluster_unavailable_partitions

Number of unavailable partitions (the partitions that lack quorum among their replica group) in the cluster.

Type: gauge

Usage: when this gauge is non-zero, it indicates that a partition(s) doesn’t have quorum and thus doesn’t have an active leader. To mitigate, consider increasing the number of brokers and the partition replication factor.

Infrastructure metrics

redpanda_cpu_busy_seconds_total

Total CPU busy time in seconds.

Type: counter

redpanda_io_queue_total_read_ops

Total read operations passed in the I/O queue.

Type: counter

Labels:

class=("default" | "compaction" | "raft")
ioshard
mountpoint
shard

redpanda_io_queue_total_write_ops

Total write operations passed in the I/O queue.

Type: counter

Labels:

class=("default" | "compaction" | "raft")
ioshard
mountpoint
shard

redpanda_memory_allocated_memory

Total allocated memory in bytes.

Type: gauge

Labels:

shard

redpanda_memory_available_memory

Total memory potentially available (free plus reclaimable memory) to a CPU shard (core), in bytes.

Type: gauge

Labels:

shard

redpanda_memory_available_memory_low_water_mark

The low watermark for available memory at process start.

Type: gauge

Labels:

shard

redpanda_memory_free_memory

Available memory in bytes.

Type: gauge

Labels:

shard

redpanda_rpc_active_connections

Number of currently active connections.

Type: gauge

Labels:

redpanda_server=("kafka" | "internal")

redpanda_rpc_request_errors_total

Number of RPC errors.

Type: counter

Labels:

redpanda_server=("kafka" | "internal")

Usage: when this counter increases, analyze the logged errors.

redpanda_rpc_request_latency_seconds

RPC latency in seconds.

Type: histogram

Labels:

redpanda_server=("kafka" | "internal")

redpanda_scheduler_runtime_seconds_total

Accumulated runtime of the task queue associated with a scheduling group.

Type: counter

Labels:

redpanda_scheduling_group=("admin" | "archival_upload" | "cache_background_reclaim" | "cluster" | "coproc" | "kafka" | "log_compaction" | "main" | "node_status" | "raft" | "raft_learner_recovery")
shard

redpanda_storage_disk_free_bytes

Available disk storage in bytes.

Type: gauge

redpanda_storage_disk_free_space_alert

Alert for low disk storage: 0-OK, 1-low space, 2-degraded.

Type: gauge

redpanda_storage_disk_total_bytes

Total size in bytes of attached storage.

Type: gauge

redpanda_uptime_seconds_total

Total CPU runtime (uptime) in seconds.

Type: gauge

Raft metrics

redpanda_node_status_rpcs_received

Number of node status RPCs received by a node.

Type: gauge

redpanda_node_status_rpcs_sent

Number of node status RPCs sent by a node.

Type: gauge

redpanda_node_status_rpcs_timed_out

Number of timed out node status RPCs from a node.

Type: gauge

Service metrics

redpanda_pandaproxy_request_latency_seconds

Latency in seconds of the request indicated by the label in HTTP Proxy. The measurement includes the time spent waiting for resources to become available, processing the request, and dispatching the response.

Type: histogram

redpanda_schema_registry_request_errors_total

Total number of Schema Registry errors.

Type: counter

Labels:

redpanda_status=("5xx" | "4xx" | "3xx")

redpanda_schema_registry_request_latency_seconds

Latency of the request indicated by the label in the Schema Registry. The measurement includes the time spent waiting for resources to become available, processing the request, and dispatching the response.

Type: histogram

Partition metrics

redpanda_kafka_max_offset

Offset of the last message committed for the partition.

Type: gauge

Labels:

redpanda_namespace
redpanda_partition
redpanda_topic

redpanda_kafka_under_replicated_replicas

Number of replicas in the partition that are live but not at the latest offset, redpanda_kafka_max_offset.

Type: gauge

Labels:

redpanda_namespace
redpanda_partition
redpanda_topic

redpanda_raft_recovery_partition_movement_available_bandwidth

Bandwidth available for partition movement, in bytes per sec.

Type: gauge

Labels:

shard

Topic metrics

redpanda_kafka_partitions

Configured number of partitions for a topic.

Type: gauge

Labels: -redpanda_namespace -redpanda_topic

redpanda_kafka_replicas

The number of configured replicas per topic.

Type: gauge

Labels:

redpanda_namespace
redpanda_topic

redpanda_kafka_request_bytes_total

Total number of bytes produced or consumed per topic.

Type: counter

Labels:

redpanda_namespace
redpanda_topic
redpanda_request=("produce" | "consume")

redpanda_raft_leadership_changes

Number of leadership changes across all partitions of a given topic.

Type: counter

Labels:

redpanda_namespace
redpanda_topic

Broker metrics

redpanda_kafka_request_latency_seconds

Latency of produce/consume requests per broker. This duration measures from when a request is initiated on the partition to when the response is fulfilled.

Type: histogram

Labels:

redpanda_request=("produce" | "consume")

group

REST proxy metrics

redpanda_rest_proxy_request_errors_total

Total number of REST proxy server errors.

Type: counter

Labels:

redpanda_status("5xx" | "4xx" | "3xx")

redpanda_rest_proxy_request_latency_seconds

Internal latency of REST proxy requests.

Type: histogram

Application metrics

redpanda_application_build

Redpanda build information.

Type: gauge

Labels:

redpanda_revision=<redpanda-revision-ID>
redpanda_version=<redpanda-version-number>

redpanda_application_uptime_seconds_total

Type: counter

Labels:

S3
- redpanda_endpoint
- redpanda_region
Azure Blob Storage (ABS)
- redpanda_endpoint
- redpanda_storage_account

Was this helpful?

Ask in the community

Share your feedback

Make a contribution

What do you like about this doc?

Let us know what we do well:

Let us contact you about your feedback:

What did you not like about this doc?

Let us know what we can improve:

Let us contact you about your feedback:

Public Metrics Reference

Cluster metrics

redpanda_cluster_brokers

redpanda_cluster_controller_log_limit_requests_available_rps

redpanda_cluster_controller_log_limit_requests_dropped

redpanda_cluster_partition_moving_from_node

redpanda_cluster_partition_moving_to_node

redpanda_cluster_partition_node_cancelling_movements

redpanda_cluster_partitions

redpanda_cluster_topics

redpanda_cluster_unavailable_partitions

Infrastructure metrics

redpanda_cpu_busy_seconds_total

redpanda_io_queue_total_read_ops

redpanda_io_queue_total_write_ops

redpanda_memory_allocated_memory

redpanda_memory_available_memory

redpanda_memory_available_memory_low_water_mark

redpanda_memory_free_memory

redpanda_rpc_active_connections

redpanda_rpc_request_errors_total

redpanda_rpc_request_latency_seconds

redpanda_scheduler_runtime_seconds_total

redpanda_storage_disk_free_bytes

redpanda_storage_disk_free_space_alert

redpanda_storage_disk_total_bytes

redpanda_uptime_seconds_total

Raft metrics

redpanda_node_status_rpcs_received

redpanda_node_status_rpcs_sent

redpanda_node_status_rpcs_timed_out

Service metrics

redpanda_pandaproxy_request_latency_seconds

redpanda_schema_registry_request_errors_total

redpanda_schema_registry_request_latency_seconds

Partition metrics

redpanda_kafka_max_offset

redpanda_kafka_under_replicated_replicas

redpanda_raft_recovery_partition_movement_available_bandwidth

Topic metrics

redpanda_kafka_partitions

redpanda_kafka_replicas

redpanda_kafka_request_bytes_total

redpanda_raft_leadership_changes

Broker metrics

redpanda_kafka_request_latency_seconds

Consumer group metrics

redpanda_kafka_consumer_group_committed_offset

redpanda_kafka_consumer_group_consumers

redpanda_kafka_consumer_group_topics

REST proxy metrics

redpanda_rest_proxy_request_errors_total

redpanda_rest_proxy_request_latency_seconds

Application metrics

redpanda_application_build

redpanda_application_uptime_seconds_total

Cloud metrics

redpanda_cloud_client_backoff

redpanda_cloud_client_download_backoff

redpanda_cloud_client_downloads

redpanda_cloud_client_not_found

redpanda_cloud_client_upload_backoff

redpanda_cloud_client_uploads

Related topics

Simple online edits

Contribution guide