Internal Metrics Reference
This section describes the internal metrics exported from Redpanda's /metrics endpoint.
Use /public_metrics for your primary dashboards for system health.
Use /metrics for detailed analysis and debugging.
In a live system, Redpanda metrics are exported only for features that are in use. For example, a metric for consumer groups is not exported when no groups are registered.
To see the available internal metrics in your system, query the /metrics endpoint:
curl -s <node-addr>:9644/metrics | grep -E "^# (HELP|TYPE)"
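The HELP and TYPE comment lines returned by the endpoint follow the Prometheus text exposition format, so they can be collected into a small catalog of the metrics a node currently exports. The following is a minimal sketch, not an official tool: `list_metrics` is a hypothetical helper, and the `sample` text is an illustrative stand-in for real endpoint output.

```python
def list_metrics(exposition_text):
    """Map each metric name to its HELP text and TYPE, as found in the
    '# HELP' and '# TYPE' comment lines of Prometheus-format text."""
    metrics = {}
    for line in exposition_text.splitlines():
        if not line.startswith("# "):
            continue  # skip sample lines and blank lines
        parts = line.split(" ", 3)  # "#", keyword, metric name, remainder
        if len(parts) < 3 or parts[1] not in ("HELP", "TYPE"):
            continue
        keyword, name = parts[1], parts[2]
        rest = parts[3] if len(parts) > 3 else ""
        entry = metrics.setdefault(name, {"help": "", "type": ""})
        entry[keyword.lower()] = rest
    return metrics


# Illustrative input only; real output depends on the features in use.
sample = """\
# HELP vectorized_application_uptime Redpanda uptime in milliseconds
# TYPE vectorized_application_uptime gauge
vectorized_application_uptime{shard="0"} 123456
"""

for name, info in list_metrics(sample).items():
    print(name, info["type"], info["help"])
```

Because metrics are exported only for features in use, a name missing from the returned dictionary means the metric is not currently exported, not that it is zero.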
Internal metrics
Most internal metrics are useful for debugging. The following subset of internal metrics can be useful to monitor system health.
vectorized_application_uptime
Redpanda uptime in milliseconds.
vectorized_cluster_partition_last_stable_offset
Last stable offset.
If this is the last record received by the cluster, then the cluster is up-to-date and ready for maintenance.
vectorized_io_queue_delay
Total delay time in the queue, in seconds.
Can indicate latency caused by disk operations.
vectorized_io_queue_queue_length
Number of requests in the queue.
Can indicate latency caused by disk operations.
vectorized_kafka_rpc_active_connections
Number of currently active Kafka RPC connections, or clients.
vectorized_kafka_rpc_connects
Number of accepted Kafka RPC connections.
Compare to the value at a previous time to derive the rate of accepted connections.
vectorized_kafka_rpc_received_bytes
Number of bytes received from Kafka RPC clients in valid requests.
Compare to the value at a previous time to derive the receive throughput of the Kafka layer in bytes per second.
vectorized_kafka_rpc_requests_completed
Number of successful Kafka RPC requests.
Compare to the value at a previous time to derive the rate of completed requests per second per shard.
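Several of the counters above (connects, received bytes, requests completed) become useful only as rates between two readings. A minimal sketch of that calculation follows. `counter_rate` is a hypothetical helper, the sample values are invented, and treating a decrease as a counter reset follows the common Prometheus convention rather than anything stated in this reference.

```python
def counter_rate(prev_value, prev_time, curr_value, curr_time):
    """Per-second rate of a monotonically increasing counter,
    computed from two samples taken at different times (in seconds)."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    delta = curr_value - prev_value
    if delta < 0:
        # Assumption: the counter went backwards because the process
        # restarted, so count only the growth since the reset.
        delta = curr_value
    return delta / elapsed


# Hypothetical readings of vectorized_kafka_rpc_received_bytes, 15 s apart.
rate = counter_rate(1_000_000, 0.0, 1_600_000, 15.0)
print(rate)  # 40000.0 bytes per second
```

In practice a monitoring system such as Prometheus performs this calculation for you; the sketch only shows what "compare to the value at a previous time" amounts to.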
vectorized_kafka_rpc_requests_pending
Number of Kafka RPC requests being processed by a server.
vectorized_kafka_rpc_sent_bytes
Number of bytes sent to Kafka RPC clients.
vectorized_kafka_rpc_service_errors
Number of Kafka RPC service errors.
vectorized_raft_leadership_changes
Number of leadership changes.
A high value can indicate nodes failing and causing leadership changes.
vectorized_reactor_utilization
CPU utilization.
Shows the true utilization of the CPU by the Redpanda process.
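If the endpoint reports this gauge once per shard (an assumption here, with a `shard` label on each sample), a single process-level figure can be obtained by averaging the samples. The sketch below parses Prometheus-format sample lines with a simple regular expression. `mean_metric` is a hypothetical helper, the sample text is invented, and the pattern does not handle label values that contain a closing brace.

```python
import re

# Metric name, optional {labels}, then the sample value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)


def mean_metric(exposition_text, metric_name):
    """Average all samples of metric_name; None if it is not exported."""
    values = []
    for line in exposition_text.splitlines():
        if line.startswith("#"):
            continue  # HELP/TYPE comments carry no sample values
        match = SAMPLE_RE.match(line)
        if match and match.group("name") == metric_name:
            values.append(float(match.group("value")))
    if not values:
        return None
    return sum(values) / len(values)


# Illustrative input only: two shards with different utilization.
sample = """\
# TYPE vectorized_reactor_utilization gauge
vectorized_reactor_utilization{shard="0"} 20.0
vectorized_reactor_utilization{shard="1"} 60.0
"""

print(mean_metric(sample, "vectorized_reactor_utilization"))  # 40.0
```

An average can hide a single saturated shard, so it may be worth looking at the per-shard maximum as well.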
vectorized_storage_log_compacted_segment
Number of compacted segments.
vectorized_storage_log_log_segments_created
Number of created log segments.
vectorized_storage_log_partition_size
Current size of partition in bytes.
vectorized_storage_log_read_bytes
Total number of bytes read.
vectorized_storage_log_written_bytes
Total number of bytes written.