Internal Metrics Reference

This section describes the internal metrics exported from Redpanda’s /metrics endpoint.

Use /public_metrics for your primary system-health dashboards.

Use /metrics for detailed analysis and debugging.

In a live system, Redpanda metrics are exported only for features that are in use. For example, a metric for consumer groups is not exported when no groups are registered.

To see the available internal metrics in your system, query the /metrics endpoint:

curl -s http://<node-addr>:9644/metrics | grep -E "^# (HELP|TYPE)"
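For scripted checks, the same HELP and TYPE lines can be parsed in code. The following is a minimal Python sketch, assuming /metrics returns the standard Prometheus text exposition format. The function names `fetch_metrics` and `list_metric_metadata` are hypothetical helpers, not part of any Redpanda API.

```python
import urllib.request


def fetch_metrics(node_addr: str) -> str:
    """Download the raw text from a node's /metrics endpoint on port 9644."""
    with urllib.request.urlopen(f"http://{node_addr}:9644/metrics") as resp:
        return resp.read().decode("utf-8")


def list_metric_metadata(exposition: str) -> dict:
    """Map each metric family name to its HELP text and TYPE."""
    metadata = {}
    for line in exposition.splitlines():
        # Metadata lines have the form "# HELP <name> <text>" or "# TYPE <name> <type>".
        parts = line.split(" ", 3)
        if len(parts) >= 3 and parts[0] == "#" and parts[1] in ("HELP", "TYPE"):
            value = parts[3] if len(parts) == 4 else ""
            metadata.setdefault(parts[2], {})[parts[1].lower()] = value
    return metadata
```

For example, `list_metric_metadata(fetch_metrics("localhost"))` returns a dictionary keyed by metric name, where each entry holds the `help` and `type` strings reported by that node.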

Internal metrics

Most internal metrics are intended for debugging. The following subset can also be useful for monitoring system health.


vectorized_application_uptime

Redpanda uptime in milliseconds.


vectorized_cluster_partition_last_stable_offset

Last stable offset.

If this offset matches the offset of the last record received by the cluster, the cluster is up-to-date and ready for maintenance.


vectorized_cluster_partition_schema_id_validation_records_failed

Number of records that failed schema ID validation.


vectorized_cluster_partition_start_offset

Raft snapshot start offset.


vectorized_io_queue_delay

Total delay time in the queue, in seconds.

Can indicate latency caused by disk operations.


vectorized_io_queue_queue_length

Number of requests in the queue.

Can indicate latency caused by disk operations.


vectorized_kafka_rpc_active_connections

Number of currently active Kafka RPC connections, or clients.


vectorized_kafka_rpc_connects

Number of accepted Kafka RPC connections.

Compare to the value at a previous time to derive the rate of accepted connections.
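The comparison can be expressed as a small helper. This Python sketch assumes two samples of the same counter taken at known times (in seconds). It treats a decrease as a counter reset, for example after a broker restart, on the assumption that the counter restarted from zero. The function name `counter_rate` is hypothetical.

```python
def counter_rate(prev_value: float, prev_time_s: float,
                 curr_value: float, curr_time_s: float) -> float:
    """Return the per-second increase of a monotonic counter between two samples."""
    elapsed_s = curr_time_s - prev_time_s
    if elapsed_s <= 0:
        raise ValueError("the second sample must be taken after the first")
    increase = curr_value - prev_value
    if increase < 0:
        # The counter went backwards, so assume it was reset to zero in between
        # and count only what accumulated since the reset.
        increase = curr_value
    return increase / elapsed_s
```

For example, 100 accepted connections at t=0 s and 160 at t=30 s give `counter_rate(100, 0, 160, 30)`, which is 2.0 connections per second.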


vectorized_kafka_rpc_produce_bad_create_time

An incrementing counter for the number of times a producer created a message with a timestamp skewed from the broker’s date and time. This metric is related to the following properties:

  • log_message_timestamp_alert_before_ms: The counter increments when the create_timestamp on a message is too far in the past compared to the broker’s time.

  • log_message_timestamp_alert_after_ms: The counter increments when the create_timestamp on a message is too far in the future compared to the broker’s time.
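The check these two properties describe can be illustrated with a short Python sketch. It is not Redpanda's implementation. The function names, the strict greater-than comparisons, and the convention that `None` disables a threshold are assumptions made for illustration. Timestamps and thresholds are in milliseconds, matching the `_ms` suffix of the properties.

```python
from typing import Optional


def classify_timestamp_skew(create_timestamp_ms: int, broker_time_ms: int,
                            alert_before_ms: Optional[int],
                            alert_after_ms: Optional[int]) -> Optional[str]:
    """Return which threshold a message's create_timestamp exceeds, if any.

    In this illustration, a non-None result corresponds to one increment of
    vectorized_kafka_rpc_produce_bad_create_time.
    """
    skew_ms = create_timestamp_ms - broker_time_ms  # negative means in the past
    if alert_before_ms is not None and -skew_ms > alert_before_ms:
        return "too_far_in_past"    # relates to log_message_timestamp_alert_before_ms
    if alert_after_ms is not None and skew_ms > alert_after_ms:
        return "too_far_in_future"  # relates to log_message_timestamp_alert_after_ms
    return None


def count_bad_create_times(timestamps_ms, broker_time_ms,
                           alert_before_ms, alert_after_ms) -> int:
    """Count how many messages would be flagged, mimicking the counter."""
    return sum(
        1 for ts in timestamps_ms
        if classify_timestamp_skew(ts, broker_time_ms,
                                   alert_before_ms, alert_after_ms) is not None
    )
```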


vectorized_kafka_rpc_received_bytes

Number of bytes received from Kafka RPC clients in valid requests.

Compare to the value at a previous time to derive the Kafka-layer receive throughput in bytes/sec.
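Internal metrics are typically reported per shard, so a node-level throughput figure needs the per-shard samples summed before taking the difference. The Python sketch below assumes two raw /metrics scrapes taken `elapsed_s` seconds apart. The helper names are hypothetical, and the regular expression handles only the simple `name{labels} value` sample form.

```python
import re

# Matches 'metric_name{label="value",...} 123' or 'metric_name 123'.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)

METRIC = "vectorized_kafka_rpc_received_bytes"


def total_across_shards(exposition: str, metric: str = METRIC) -> float:
    """Sum every sample of `metric` in a scrape, regardless of its labels."""
    total = 0.0
    for line in exposition.splitlines():
        match = SAMPLE_RE.match(line)  # comment lines starting with '#' never match
        if match and match.group("name") == metric:
            total += float(match.group("value"))
    return total


def receive_throughput(before: str, after: str, elapsed_s: float) -> float:
    """Node-wide Kafka receive throughput in bytes/sec between two scrapes."""
    return (total_across_shards(after) - total_across_shards(before)) / elapsed_s
```

Summing across shards before subtracting gives the same result as summing the per-shard differences, as long as no shard's counter reset between the two scrapes.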


vectorized_kafka_rpc_requests_completed

Number of successful Kafka RPC requests.

Compare to the value at a previous time to derive the number of completed requests per second per shard.
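Computing this rate separately for each shard can reveal uneven load that a node-wide total would hide. This Python sketch assumes each sample carries a `shard` label and that the two scrapes were taken `elapsed_s` seconds apart. The helper names are hypothetical, and the parsing handles only simple sample lines.

```python
import re

SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

METRIC = "vectorized_kafka_rpc_requests_completed"


def per_shard_values(exposition: str, metric: str = METRIC) -> dict:
    """Map shard label -> counter value; samples sharing a shard are summed."""
    values = {}
    for line in exposition.splitlines():
        match = SAMPLE_RE.match(line)
        if not match or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        shard = labels.get("shard", "")
        values[shard] = values.get(shard, 0.0) + float(match.group("value"))
    return values


def per_shard_rate(before: str, after: str, elapsed_s: float) -> dict:
    """Completed requests per second for each shard between two scrapes."""
    prev = per_shard_values(before)
    curr = per_shard_values(after)
    return {shard: (value - prev.get(shard, 0.0)) / elapsed_s
            for shard, value in curr.items()}
```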


vectorized_kafka_rpc_requests_pending

Number of Kafka RPC requests being processed by a server.


vectorized_kafka_rpc_sent_bytes

Number of bytes sent to Kafka RPC clients.


vectorized_kafka_rpc_service_errors

Number of Kafka RPC service errors.


vectorized_raft_leadership_changes

Number of leadership changes.

A high value can indicate failing nodes that cause leadership changes.
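Because the raw count only grows, a more useful signal is often how much it rose over a recent window. One way to watch for bursts is to track the increase over a sliding time window, sketched here in Python. The class name, the window length, and the `max_changes` threshold are hypothetical choices; suitable values depend on the cluster. The sketch expects one counter value per scrape, for example the value summed across shards.

```python
from collections import deque


class LeadershipChangeMonitor:
    """Flags when vectorized_raft_leadership_changes rises too quickly."""

    def __init__(self, window_s: float, max_changes: float):
        self.window_s = window_s
        self.max_changes = max_changes  # hypothetical alert threshold
        self.samples = deque()  # (timestamp_s, counter_value) pairs, oldest first

    def observe(self, timestamp_s: float, value: float) -> bool:
        """Record a sample; return True if the increase over the window exceeds the threshold."""
        self.samples.append((timestamp_s, value))
        # Drop older samples, keeping the newest one at or before the window
        # start as the baseline for the increase.
        while len(self.samples) > 1 and self.samples[1][0] <= timestamp_s - self.window_s:
            self.samples.popleft()
        increase = value - self.samples[0][1]
        # A negative increase means the counter reset (for example, after a
        # restart); it never exceeds the threshold, so no alert is raised.
        return increase > self.max_changes
```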


vectorized_reactor_utilization

Redpanda process utilization.

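Shows the true CPU utilization of the Redpanda process. Because the Seastar reactor busy-polls for work, OS-level CPU metrics can report a core as fully busy even when Redpanda is mostly idle.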
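If utilization is reported per shard, a single busy shard can be masked by a healthy node-wide average. This Python sketch assumes `vectorized_reactor_utilization` samples carry a `shard` label and returns the busiest shard alongside the mean. The helper name is hypothetical, and values are compared as reported, without assuming a particular scale.

```python
import re

SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

METRIC = "vectorized_reactor_utilization"


def utilization_summary(exposition: str, metric: str = METRIC):
    """Return (busiest_shard, busiest_value, mean_value) across shards."""
    per_shard = {}
    for line in exposition.splitlines():
        match = SAMPLE_RE.match(line)
        if not match or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        per_shard[labels.get("shard", "")] = float(match.group("value"))
    if not per_shard:
        raise ValueError(f"{metric} not found in scrape")
    busiest = max(per_shard, key=per_shard.get)
    mean = sum(per_shard.values()) / len(per_shard)
    return busiest, per_shard[busiest], mean
```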


vectorized_storage_log_compacted_segment

Number of compacted segments.


vectorized_storage_log_log_segments_created

Number of created log segments.


vectorized_storage_log_partition_size

Current size of partition in bytes.
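Summing this gauge over a topic's partitions gives that topic's size on one node. The Python sketch below assumes each sample carries `namespace`, `topic`, and `partition` labels; that label set and the helper name are assumptions to verify against your own /metrics output. Because a scrape covers only one node, the result reflects only the partition replicas that node hosts.

```python
import re
from collections import defaultdict

SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

METRIC = "vectorized_storage_log_partition_size"


def size_by_topic(exposition: str, metric: str = METRIC) -> dict:
    """Map (namespace, topic) -> total bytes across that topic's partitions."""
    totals = defaultdict(float)
    for line in exposition.splitlines():
        match = SAMPLE_RE.match(line)
        if not match or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        key = (labels.get("namespace", ""), labels.get("topic", ""))
        totals[key] += float(match.group("value"))
    return dict(totals)
```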


vectorized_storage_log_read_bytes

Total number of bytes read.


vectorized_storage_log_written_bytes

Total number of bytes written.