Internal Metrics

This section provides reference descriptions of the internal metrics exported from Redpanda’s /metrics endpoint. Use /public_metrics for the primary dashboards you use to monitor system health. Use /metrics for detailed analysis and debugging.

In a live system, Redpanda metrics are exported only for features that are in use. For example, a metric for consumer groups is not exported when no groups are registered.

To see the available internal metrics in your system, query the /metrics endpoint:

curl http://<node-addr>:9644/metrics | grep -E "^# (HELP|TYPE)"

Internal metrics

Most internal metrics are useful for debugging. The following subset of internal metrics can be useful to monitor system health.

vectorized_application_uptime
Redpanda uptime in milliseconds.

vectorized_cloud_storage_read_bytes
Number of bytes Redpanda has read from cloud storage. This is tracked on a per-topic and per-partition basis, and increments each time a cloud storage read operation successfully completes.

vectorized_cluster_partition_last_stable_offset
Last stable offset. If this is the last record received by the cluster, then the cluster is up to date and ready for maintenance.

vectorized_cluster_partition_schema_id_validation_records_failed
Number of records that failed schema ID validation.

vectorized_cluster_partition_start_offset
Raft snapshot start offset.

vectorized_io_queue_delay
Total delay time in the queue, in seconds. Can indicate latency caused by disk operations.

vectorized_io_queue_queue_length
Number of requests in the queue. Can indicate latency caused by disk operations.

vectorized_kafka_quotas_balancer_runs
Number of times the throughput quota balancer has executed.
Type: counter

vectorized_kafka_quotas_quota_effective
Current effective quota for the quota balancer, in bytes per second.
Type: counter

vectorized_kafka_quotas_client_quota_throttle_time
Client quota throttling delay, in seconds, per rule and quota type, based on client throughput limits.
Type: histogram
Labels:
- quota_rule=("not_applicable" | "kafka_client_default" | "cluster_client_default" | "kafka_client_prefix" | "cluster_client_prefix" | "kafka_client_id")
- quota_type=("produce_quota" | "fetch_quota" | "partition_mutation_quota")

vectorized_kafka_quotas_client_quota_throughput
Client quota throughput per rule and quota type.
Type: histogram
Labels:
- quota_rule=("not_applicable" | "kafka_client_default" | "cluster_client_default" | "kafka_client_prefix" | "cluster_client_prefix" | "kafka_client_id")
- quota_type=("produce_quota" | "fetch_quota" | "partition_mutation_quota")

vectorized_kafka_quotas_throttle_time
Histogram of throttle times, in seconds, based on broker-wide throughput limits.
Type: histogram

vectorized_kafka_quotas_traffic_intake
Total amount of Kafka traffic, in bytes, taken in from clients for processing that was considered by the throttler.
Type: counter

vectorized_kafka_quotas_traffic_egress
Total amount of Kafka traffic, in bytes, published to clients that was considered by the throttler.
Type: counter

vectorized_kafka_rpc_active_connections
Number of currently active Kafka RPC connections, or clients.

vectorized_kafka_rpc_connects
Number of accepted Kafka RPC connections. Compare to the value at a previous time to derive the rate of accepted connections, as in the sketch below.
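The following is a minimal sketch of deriving such a rate by hand, outside of a monitoring system. It assumes <node-addr> is replaced with a reachable broker address, that a 10-second sampling interval is acceptable for your environment, and that matching series are summed, because internal metrics are typically reported per shard:

#!/usr/bin/env bash
# Sketch: approximate the rate of accepted Kafka RPC connections per second
# by sampling the counter twice and dividing the difference by the interval.
NODE="<node-addr>"        # placeholder: replace with a broker address
METRIC="vectorized_kafka_rpc_connects"
INTERVAL=10               # seconds between samples

sample() {
  # Sum the counter across all matching series (for example, across shards).
  curl -s "http://${NODE}:9644/metrics" |
    awk -v m="$METRIC" '$1 ~ m { total += $NF } END { printf "%.0f\n", total + 0 }'
}

first=$(sample)
sleep "$INTERVAL"
second=$(sample)

awk -v a="$first" -v b="$second" -v t="$INTERVAL" \
  'BEGIN { printf "accepted connections/sec: %.2f\n", (b - a) / t }'

In a Prometheus-based setup, you would typically compute the same thing with a rate() query over this counter rather than sampling by hand.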
vectorized_kafka_rpc_produce_bad_create_time
An incrementing counter for the number of times a producer created a message with a timestamp skewed from the broker’s date and time. This metric is related to the following properties:
- log_message_timestamp_alert_before_ms: The counter increments when the create_timestamp on a message is too far in the past compared to the broker’s time.
- log_message_timestamp_alert_after_ms: The counter increments when the create_timestamp on a message is too far in the future compared to the broker’s time.

vectorized_kafka_rpc_received_bytes
Number of bytes received from Kafka RPC clients in valid requests. Compare to the value at a previous time to derive the received throughput at the Kafka layer, in bytes per second.

vectorized_kafka_rpc_requests_completed
Number of successful Kafka RPC requests. Compare to the value at a previous time to derive the number of messages per second per shard.

vectorized_kafka_rpc_requests_pending
Number of Kafka RPC requests being processed by a server.

vectorized_kafka_rpc_sent_bytes
Number of bytes sent to Kafka RPC clients.

vectorized_kafka_rpc_service_errors
Number of Kafka RPC service errors.

vectorized_ntp_archiver_compacted_replaced_bytes
Number of bytes removed from cloud storage by compaction operations. This is tracked on a per-topic and per-partition basis, and resets every time partition leadership changes. It tracks whether compaction is performing operations on cloud storage.
The namespace label supports the following options for this metric:
- kafka: User topics
- kafka_internal: Internal Kafka topics, such as consumer groups
- redpanda: Redpanda-only internal data
Labels:
- namespace=("kafka" | "kafka_internal" | "redpanda")
- topic
- partition

vectorized_ntp_archiver_pending
The difference between the last committed offset and the last offset uploaded to Tiered Storage for each partition. A value of zero indicates that all data for a partition has been uploaded to Tiered Storage.
This metric is affected by the cloud_storage_segment_max_upload_interval_sec tunable property. If this interval is set to 5 minutes, the archiver uploads committed segments to Tiered Storage every 5 minutes or less. If this metric keeps growing for longer than the configured interval, it can indicate a network issue with the upload path for that partition.
The namespace label supports the following options for this metric:
- kafka: User topics
- kafka_internal: Internal Kafka topics, such as consumer groups
- redpanda: Redpanda-only internal data
Labels:
- namespace=("kafka" | "kafka_internal" | "redpanda")
- topic
- partition

vectorized_raft_leadership_changes
Number of leadership changes. A high value can indicate that nodes are failing and causing leadership changes.

vectorized_reactor_utilization
Redpanda process utilization. Shows the true utilization of the CPU by a Redpanda process. This metric has per-broker and per-shard granularity. If a shard (CPU core) stays at 100% utilization for a sustained period of real-time processing, for example more than a few seconds, you will likely observe high latency for partitions assigned to that shard. Use topic-aware intra-broker partition balancing to balance partition assignments and alleviate load on individual shards.
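To see which shards are running hot, you can inspect the per-shard samples of this metric directly from the endpoint. The following is a minimal sketch, assuming <node-addr> is replaced with a reachable broker address and that the value is reported on a 0-100 percentage scale; a single run is only a point-in-time snapshot, so look for shards that stay high across repeated samples:

#!/usr/bin/env bash
# Sketch: list per-shard reactor utilization and flag shards above a threshold.
NODE="<node-addr>"   # placeholder: replace with a broker address
THRESHOLD=90         # assumption: the metric is reported as a 0-100 percentage

curl -s "http://${NODE}:9644/metrics" |
  awk -v t="$THRESHOLD" '
    /^vectorized_reactor_utilization[ {]/ {
      # $1 holds the metric name and its labels (including the shard); $NF holds the value.
      status = ($NF + 0 >= t) ? "BUSY" : "ok"
      print status, $0
    }'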
vectorized_storage_log_compacted_segment
Number of compacted segments.

vectorized_storage_log_compaction_removed_bytes
Number of bytes removed from local storage by compaction operations. This is tracked on a per-topic and per-partition basis. It tracks whether compaction is performing operations on local storage.
The namespace label supports the following options for this metric:
- kafka: User topics
- kafka_internal: Internal Kafka topics, such as consumer groups
- redpanda: Redpanda-only internal data
Labels:
- namespace=("kafka" | "kafka_internal" | "redpanda")
- topic
- partition

vectorized_storage_log_log_segments_created
Number of created log segments.

vectorized_storage_log_partition_size
Current size of the partition, in bytes.

vectorized_storage_log_read_bytes
Total number of bytes read.

vectorized_storage_log_written_bytes
Total number of bytes written.

Related topics
- Learn how to monitor Redpanda
- Public metrics reference