Internal Metrics Reference
This section provides reference descriptions of the internal metrics exported from Redpanda’s /metrics endpoint.
Use /public_metrics for your primary system-health dashboards. Use /metrics for detailed analysis and debugging.
In a live system, Redpanda exports metrics only for features that are in use. For example, a metric for consumer groups is not exported when no groups are registered. To see the available internal metrics in your system, query the /metrics endpoint.
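As a rough sketch of that query, the Python below lists the metric names present in a /metrics response body. It assumes you have already fetched the body as text, for example with curl against the broker's admin API (port 9644 by default). The function name `list_metric_names` and the `SAMPLE` text are hypothetical and for illustration only; the sample is not captured from a real broker.

```python
import re

# A Prometheus metric name starts with a letter, '_' or ':', followed by
# letters, digits, '_' or ':'.
_METRIC_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def list_metric_names(exposition_text: str, prefix: str = "") -> list[str]:
    """Return the sorted, de-duplicated metric names found in sample lines."""
    names: set[str] = set()
    for raw_line in exposition_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        match = _METRIC_NAME.match(line)
        if match and match.group(1).startswith(prefix):
            names.add(match.group(1))
    return sorted(names)


# Illustrative input only (hypothetical values).
SAMPLE = """\
# HELP vectorized_raft_leadership_changes Number of leadership changes
vectorized_raft_leadership_changes{shard="0"} 3
vectorized_raft_leadership_changes{shard="1"} 1
vectorized_kafka_rpc_active_connections{shard="0"} 5
"""

if __name__ == "__main__":
    for name in list_metric_names(SAMPLE, prefix="vectorized_"):
        print(name)
```

Because Redpanda exports only metrics for features in use, running this against two brokers, or against one broker at different times, can return different lists. Histogram metrics appear as several series (with `_bucket`, `_sum`, and `_count` suffixes) in the Prometheus text format.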
Internal metrics
Most internal metrics are useful for debugging. The following subset of internal metrics can be useful to monitor system health.
vectorized_cluster_partition_last_stable_offset
Last stable offset.
If this offset matches the last record received by the cluster, the cluster is up to date and ready for maintenance.
vectorized_cluster_partition_schema_id_validation_records_failed
Number of records that failed schema ID validation.
vectorized_io_queue_delay
Total delay time in the queue, in seconds.
Can indicate latency caused by disk operations.
vectorized_io_queue_queue_length
Number of requests in the queue.
Can indicate latency caused by disk operations.
vectorized_kafka_rpc_active_connections
Number of currently active Kafka RPC connections, or clients.
vectorized_kafka_rpc_connects
Number of accepted Kafka RPC connections.
Compare to the value at a previous time to derive the rate of accepted connections.
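The comparison described above amounts to dividing the counter's increase by the time between two scrapes. The following is a minimal Python sketch of that arithmetic; the names `CounterSample` and `counter_rate` are hypothetical, and treating a decrease as a counter reset (for example, after a broker restart) is an assumption borrowed from how Prometheus handles counters.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """One scrape of a counter such as vectorized_kafka_rpc_connects."""
    timestamp_s: float  # scrape time, in seconds
    value: float        # counter value at that time


def counter_rate(prev: CounterSample, curr: CounterSample) -> float:
    """Average per-second increase of a counter between two scrapes."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("curr must be scraped after prev")
    delta = curr.value - prev.value
    if delta < 0:
        # The counter went backwards, which usually means the broker restarted
        # and the counter started again from zero; count only the new value.
        delta = curr.value
    return delta / elapsed


if __name__ == "__main__":
    before = CounterSample(timestamp_s=0.0, value=1200.0)
    after = CounterSample(timestamp_s=60.0, value=1500.0)
    print(f"{counter_rate(before, after):.1f} connections/sec")  # prints "5.0 connections/sec"
```

If Prometheus already scrapes the endpoint, PromQL's `rate()` function performs this calculation, including counter-reset handling, over a range such as `[5m]`.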
vectorized_kafka_rpc_produce_bad_create_time
An incrementing counter for the number of times a producer created a message with a timestamp skewed from the broker’s date and time. This metric is related to the following properties:
- log_message_timestamp_alert_before_ms: The counter is incremented when the create_timestamp on a message is too far in the past compared to the broker’s time.
- log_message_timestamp_alert_after_ms: The counter is incremented when the create_timestamp on a message is too far in the future compared to the broker’s time.
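To make the two properties above concrete, here is a simplified Python model of the skew check. It is not Redpanda's implementation: the function names are hypothetical, and the strict comparisons and the treatment of both thresholds as plain millisecond integers are assumptions made for illustration.

```python
from enum import Enum


class TimestampSkew(Enum):
    OK = "ok"
    TOO_FAR_IN_PAST = "too_far_in_past"      # relates to ..._alert_before_ms
    TOO_FAR_IN_FUTURE = "too_far_in_future"  # relates to ..._alert_after_ms


def classify_create_timestamp(
    create_timestamp_ms: int,
    broker_now_ms: int,
    alert_before_ms: int,
    alert_after_ms: int,
) -> TimestampSkew:
    """Classify a message's create_timestamp against the broker's clock."""
    if create_timestamp_ms < broker_now_ms - alert_before_ms:
        return TimestampSkew.TOO_FAR_IN_PAST
    if create_timestamp_ms > broker_now_ms + alert_after_ms:
        return TimestampSkew.TOO_FAR_IN_FUTURE
    return TimestampSkew.OK


def count_skewed(
    create_timestamps_ms: list[int],
    broker_now_ms: int,
    alert_before_ms: int,
    alert_after_ms: int,
) -> int:
    """Number of timestamps outside the allowed window (what the counter would add)."""
    return sum(
        classify_create_timestamp(ts, broker_now_ms, alert_before_ms, alert_after_ms)
        is not TimestampSkew.OK
        for ts in create_timestamps_ms
    )
```

For example, with the broker clock at 1,000,000 ms and both thresholds at 60,000 ms, a message stamped 900,000 ms counts as too far in the past, and one stamped 1,100,000 ms counts as too far in the future.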
vectorized_kafka_rpc_received_bytes
Number of bytes received from Kafka RPC clients in valid requests.
Compare to the value at a previous time to derive the Kafka-layer receive throughput in bytes/sec.
vectorized_kafka_rpc_requests_completed
Number of successful Kafka RPC requests.
Compare to the value at a previous time to derive the rate of completed requests per second, per shard.
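A per-shard version of the comparison above needs the counter broken out by shard in each scrape. The Python sketch below assumes that every `vectorized_kafka_rpc_requests_completed` series carries a `shard` label (an assumption; check your own /metrics output). The functions `values_by_shard` and `rate_by_shard` are hypothetical helpers, and the scrape text in the usage example is illustrative.

```python
import re

# Assumption: series look like
#   vectorized_kafka_rpc_requests_completed{shard="0"} 1234
_SERIES = re.compile(r"^vectorized_kafka_rpc_requests_completed\{([^}]*)\}\s+(\S+)")
_SHARD_LABEL = re.compile(r'(?:^|,)\s*shard="([^"]*)"')


def values_by_shard(exposition_text: str) -> dict[str, float]:
    """Sum the counter per shard, in case other labels split a shard's series."""
    totals: dict[str, float] = {}
    for line in exposition_text.splitlines():
        series = _SERIES.match(line.strip())
        if series is None:
            continue
        shard = _SHARD_LABEL.search(series.group(1))
        if shard is None:
            continue
        key = shard.group(1)
        totals[key] = totals.get(key, 0.0) + float(series.group(2))
    return totals


def rate_by_shard(prev_text: str, curr_text: str, elapsed_s: float) -> dict[str, float]:
    """Completed requests per second for each shard present in both scrapes."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    prev, curr = values_by_shard(prev_text), values_by_shard(curr_text)
    rates: dict[str, float] = {}
    for shard, value in curr.items():
        if shard not in prev:
            continue  # new series: no baseline yet
        delta = value - prev[shard]
        # A negative delta suggests a restart; treat the counter as reset to 0.
        rates[shard] = (delta if delta >= 0 else value) / elapsed_s
    return rates


if __name__ == "__main__":
    earlier = ('vectorized_kafka_rpc_requests_completed{shard="0"} 100\n'
               'vectorized_kafka_rpc_requests_completed{shard="1"} 50\n')
    later = ('vectorized_kafka_rpc_requests_completed{shard="0"} 200\n'
             'vectorized_kafka_rpc_requests_completed{shard="1"} 80\n')
    print(rate_by_shard(earlier, later, elapsed_s=10.0))
```

Keeping the rates keyed by shard, rather than summing them, makes it easier to spot a single hot shard that a broker-wide total would hide.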
vectorized_raft_leadership_changes
Number of leadership changes.
A high value can indicate that nodes are failing and triggering leadership changes.
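Because this metric is a running total, "high" is easiest to judge as the increase over a recent window. The following Python sketch sums the increase across time-ordered samples and flags churn above a threshold. The function names and the `max_changes` threshold are hypothetical; choose a threshold from your cluster's normal baseline. Treating a decrease as a counter reset is an assumption.

```python
def counter_increase(samples: list[tuple[float, float]]) -> float:
    """Total increase across time-ordered (timestamp_s, value) counter samples."""
    total = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        delta = curr - prev
        # A drop means the counter restarted (e.g. after a broker restart),
        # so the current value is the increase since the reset.
        total += delta if delta >= 0 else curr
    return total


def leadership_churn_alert(
    samples: list[tuple[float, float]], window_s: float, max_changes: float
) -> bool:
    """True if leadership changes within the last window_s exceed max_changes."""
    if not samples:
        return False
    window_end = samples[-1][0]
    recent = [s for s in samples if s[0] >= window_end - window_s]
    return counter_increase(recent) > max_changes
```

With samples `[(0, 10), (60, 12), (120, 30), (180, 31)]` and a 120-second window, the sketch counts 19 leadership changes (from 12 to 31), so a threshold of 5 would raise an alert.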