Metrics Reference
This section provides reference descriptions for the public metrics exported from Redpanda’s /public_metrics endpoint.
In a live system, Redpanda exports metrics only for features that are in use. For example, Redpanda does not export metrics for consumer groups if no groups are registered. To see the public metrics available in your system, query the /public_metrics endpoint.
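As a minimal sketch, the following Python fetches the Prometheus text exposition from /public_metrics and lists the metric names declared on its # TYPE lines. The host and port (localhost:9644, assumed here to be the broker's Admin API address) and the helper names fetch_public_metrics and metric_names are illustrative assumptions, not part of Redpanda.

```python
import urllib.request


def fetch_public_metrics(host: str = "localhost", port: int = 9644) -> str:
    """Download the raw Prometheus text exposition from /public_metrics.

    Assumes the endpoint is reachable over plain HTTP at host:port.
    """
    url = f"http://{host}:{port}/public_metrics"
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def metric_names(exposition: str) -> list[str]:
    """Return the sorted, de-duplicated metric names declared on '# TYPE' lines."""
    names = set()
    for line in exposition.splitlines():
        parts = line.split()
        # The Prometheus text format declares types as: "# TYPE <metric_name> <type>"
        if len(parts) >= 4 and parts[0] == "#" and parts[1] == "TYPE":
            names.add(parts[2])
    return sorted(names)
```

For example, calling metric_names(fetch_public_metrics()) against a running broker returns the sorted list of exported metric names. Because Redpanda exports metrics only for features in use, the list can change over time.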
Cluster metrics
redpanda_cluster_brokers
Total number of fully commissioned brokers configured in the cluster.
Type: gauge
Usage: Create an alert if this gauge falls below a steady-state threshold, which may indicate that a broker has become unresponsive.
Available in Serverless: No
redpanda_cluster_controller_log_limit_requests_available_rps
The upper limit on the requests per second (RPS) that the cluster controller log is allowed to process, segmented by command group.
Type: gauge
Labels:
-
redpanda_cmd_group=("move_operations" | "topic_operations" | "configuration_operations" | "node_management_operations" | "acls_and_users_operations")
Available in Serverless: No
redpanda_cluster_controller_log_limit_requests_dropped
The cumulative number of requests dropped by the controller log because the incoming rate exceeded the available RPS limit.
Type: counter
Labels:
-
redpanda_cmd_group=("move_operations" | "topic_operations" | "configuration_operations" | "node_management_operations" | "acls_and_users_operations")
Usage: A rising counter indicates that requests are being dropped, which could signal overload or misconfiguration.
Available in Serverless: No
redpanda_cluster_features_enterprise_license_expiry_sec
Number of seconds remaining until the Enterprise Edition license expires.
Type: gauge
Usage:
-
A value of -1 indicates that no license is present.
-
A value of 0 signifies an expired license.
Use this metric to proactively monitor license status and trigger alerts for timely renewal.
Available in Serverless: No
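The value encoding above can be turned into an alert state with a small helper. This is a sketch only: the function name license_status and the 30-day warning threshold are hypothetical choices, not Redpanda defaults.

```python
SECONDS_PER_DAY = 86_400


def license_status(expiry_sec: float, warn_days: float = 30.0) -> str:
    """Classify redpanda_cluster_features_enterprise_license_expiry_sec.

    -1 means no license is present, 0 means the license has expired, and a
    positive value is the number of seconds left before expiry.
    warn_days is a hypothetical renewal-warning threshold.
    """
    if expiry_sec < 0:
        return "no_license"
    if expiry_sec == 0:
        return "expired"
    if expiry_sec < warn_days * SECONDS_PER_DAY:
        return "expiring_soon"
    return "ok"
```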
redpanda_cluster_latest_cluster_metadata_manifest_age
The amount of time, in seconds, since Redpanda last uploaded metadata files to Tiered Storage for your cluster. A value of 0 indicates that metadata has not yet been uploaded.
When you perform a whole cluster restore, the metadata for the new cluster will not include any changes made to the source cluster more recently than this age.
Type: gauge
Usage: On a healthy system, this value should not exceed cloud_storage_cluster_metadata_upload_interval_ms (converted from milliseconds to seconds). Consider setting an alert if this value remains 0 for longer than 1.5 * cloud_storage_cluster_metadata_upload_interval_ms, as that may indicate a configuration issue.
Available in Serverless: No
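The following sketch applies the guidance above. It assumes you track how long the metric has continuously reported 0 (the hypothetical zero_for_sec parameter), and it converts cloud_storage_cluster_metadata_upload_interval_ms from milliseconds to seconds before comparing it with the metric. The function name and return labels are hypothetical.

```python
def manifest_age_status(age_sec: float, upload_interval_ms: float,
                        zero_for_sec: float = 0.0) -> str:
    """Classify redpanda_cluster_latest_cluster_metadata_manifest_age.

    age_sec: current value of the metric, in seconds.
    upload_interval_ms: value of cloud_storage_cluster_metadata_upload_interval_ms.
    zero_for_sec: how long the metric has continuously reported 0 (tracked by the caller).
    """
    # The cluster property is in milliseconds; the metric is in seconds.
    interval_sec = upload_interval_ms / 1000.0
    if age_sec == 0:
        # 0 means no upload yet; only alert if that persists past 1.5 intervals.
        return "not_uploaded" if zero_for_sec > 1.5 * interval_sec else "pending"
    if age_sec > interval_sec:
        return "stale"
    return "ok"
```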
redpanda_cluster_members_backend_queued_node_operations
The number of node operations queued per shard that are awaiting processing by the backend.
Type: gauge
Labels:
-
shard
Available in Serverless: No
redpanda_cluster_non_homogenous_fips_mode
Count of brokers whose FIPS mode configuration differs from the rest of the cluster.
Type: gauge
Usage: Indicates inconsistencies in security configurations that might affect compliance or interoperability.
Available in Serverless: No
redpanda_cluster_partition_moving_from_node
Number of partition replicas that are in the process of being removed from a broker.
Type: gauge
Usage: A non-zero value can indicate ongoing or unexpected partition reassignments. Investigate if this metric remains elevated.
Available in Serverless: No
redpanda_cluster_partition_moving_to_node
Number of partition replicas in the cluster currently being added or moved to a broker.
Type: gauge
Usage: When this gauge is non-zero, determine whether an expected or unexpected partition reassignment is causing the partition replica movement.
Available in Serverless: No
redpanda_cluster_partition_node_cancelling_movements
During a partition movement cancellation operation, the number of partition replicas whose in-progress movements must be canceled.
Type: gauge
Usage: Track this metric to verify that partition reassignments are proceeding as expected; persistent non-zero values may warrant further investigation.
Available in Serverless: No
redpanda_cluster_partition_num_with_broken_rack_constraint
Number of partitions whose replica placement does not satisfy the rack awareness constraint.
Type: gauge
Usage: A non-zero value indicates that replicas of some partitions are not spread across racks as rack awareness requires, which may need attention.
Available in Serverless: No
redpanda_cluster_partitions
Total number of logical partitions managed by the cluster. This includes partitions for the controller topic but excludes replicas.
Type: gauge
Available in Serverless: Yes
redpanda_cluster_topics
The total number of topics configured within the cluster.
Type: gauge
Available in Serverless: Yes
redpanda_cluster_unavailable_partitions
Number of partitions that are unavailable due to a lack of quorum among their replica set.
Type: gauge
Usage: A non-zero value indicates that some partitions do not have an active leader. Consider increasing the number of brokers or the replication factor if this persists.
Available in Serverless: No
Debug bundle metrics
redpanda_debug_bundle_failed_generation_count
Running count of debug bundle generation failures, reported per shard.
Type: counter
Labels:
-
shard
Available in Serverless: No
redpanda_debug_bundle_last_failed_bundle_timestamp_seconds
Unix epoch timestamp of the last failed debug bundle generation, per shard.
Type: gauge
Labels:
-
shard
Available in Serverless: No
Iceberg metrics
redpanda_iceberg_rest_client_active_gets
Number of active GET requests.
Type: gauge
Labels:
-
role
Available in Serverless: No
redpanda_iceberg_rest_client_active_puts
Number of active PUT requests.
Type: gauge
Labels:
-
role
Available in Serverless: No
redpanda_iceberg_rest_client_active_requests
Number of active HTTP requests (includes PUT and GET).
Type: gauge
Labels:
-
role
Available in Serverless: No
redpanda_iceberg_rest_client_num_commit_table_update_requests
Total number of requests sent to the commit_table_update endpoint.
Type: counter
Labels:
-
role
Available in Serverless: No
redpanda_iceberg_rest_client_num_commit_table_update_requests_failed
Number of requests sent to the commit_table_update endpoint that failed.
Type: counter
Labels:
-
role
Available in Serverless: No
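Each Iceberg REST client endpoint exposes a total counter and a matching _failed counter, so a failure ratio over a scrape interval can be derived from the pair (for example, redpanda_iceberg_rest_client_num_commit_table_update_requests and redpanda_iceberg_rest_client_num_commit_table_update_requests_failed). The sketch below assumes the _failed counter counts a subset of the requests in the total counter, and treats a counter that decreases as having been reset. The helper names are hypothetical; in Prometheus, increase() over the two series serves the same purpose.

```python
def counter_increase(previous: float, current: float) -> float:
    """Increase of a monotonic counter between two samples.

    If the counter went down, it was reset (for example, by a broker restart),
    so the current value is taken as the increase since the reset.
    """
    return current - previous if current >= previous else current


def failure_ratio(total_prev: float, total_now: float,
                  failed_prev: float, failed_now: float) -> float:
    """Share of requests to one endpoint that failed between two scrapes."""
    total = counter_increase(total_prev, total_now)
    failed = counter_increase(failed_prev, failed_now)
    # No requests in the window means nothing failed.
    return failed / total if total > 0 else 0.0
```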
redpanda_iceberg_rest_client_num_create_namespace_requests
Total number of requests sent to the create_namespace endpoint.
Type: counter
Labels:
-
role
Available in Serverless: No
redpanda_iceberg_rest_client_num_create_namespace_requests_failed
Number of requests sent to the create_namespace endpoint that failed.
Type: counter
Labels:
-
role
Available in Serverless: No
redpanda_iceberg_rest_client_num_create_table_requests
Total number of requests sent to the create_table endpoint.
Type: counter
Labels:
-
role
Available in Serverless: No
redpanda_iceberg_rest_client_num_create_table_requests_failed
Number of requests sent to the create_table endpoint that failed.
Type: counter
Labels:
-
role
Available in Serverless: No
redpanda_iceberg_rest_client_num_drop_table_requests
Total number of requests sent to the drop_table endpoint.
Type: counter
Labels:
-
role
Available in Serverless: No
redpanda_iceberg_rest_client_num_drop_table_requests_failed
Number of requests sent to the drop_table endpoint that failed.
Type: counter
Labels:
-
role
Available in Serverless: No
redpanda_iceberg_rest_client_num_get_config_requests
Total number of requests sent to the config endpoint.
Type: counter
Labels:
-
role
Available in Serverless: No
redpanda_iceberg_rest_client_num_get_config_requests_failed
Number of requests sent to the config endpoint that failed.
Type: counter
Labels:
-
role
Available in Serverless: No
redpanda_iceberg_rest_client_num_load_table_requests
Total number of requests sent to the load_table endpoint.
Type: counter
Labels:
-
role
Available in Serverless: No
redpanda_iceberg_rest_client_num_load_table_requests_failed
Number of requests sent to the load_table endpoint that failed.
Type: counter
Labels:
-
role
Available in Serverless: No
redpanda_iceberg_rest_client_num_oauth_token_requests
Total number of requests sent to the oauth_token endpoint.
Type: counter
Labels:
-
role
Available in Serverless: No
redpanda_iceberg_rest_client_num_oauth_token_requests_failed
Number of requests sent to the oauth_token endpoint that failed.
Type: counter
Labels:
-
role
Available in Serverless: No
redpanda_iceberg_rest_client_num_request_timeouts
Total number of catalog requests that could no longer be retried because they timed out. This may occur if the catalog is not responding.
Type: counter
Labels:
-
role
Available in Serverless: No
redpanda_iceberg_rest_client_num_transport_errors
Total number of transport errors (TCP and TLS).
Type: counter
Labels:
-
role
Available in Serverless: No
redpanda_iceberg_rest_client_total_gets
Number of completed GET requests.
Type: counter
Labels:
-
role
Available in Serverless: No
redpanda_iceberg_rest_client_total_inbound_bytes
Total number of bytes received from the Iceberg REST catalog.
Type: counter
Labels:
-
role
Available in Serverless: No
redpanda_iceberg_rest_client_total_outbound_bytes
Total number of bytes sent to the Iceberg REST catalog.
Type: counter
Labels:
-
role
Available in Serverless: No
redpanda_iceberg_rest_client_total_puts
Number of completed PUT requests.
Type: counter
Labels:
-
role
Available in Serverless: No
redpanda_iceberg_rest_client_total_requests
Number of completed HTTP requests (includes PUT and GET).
Type: counter
Labels:
-
role
Available in Serverless: No
redpanda_iceberg_translation_decompressed_bytes_processed
Number of bytes, measured after decompression, consumed for translation, whether or not processing succeeds. For example, if Redpanda fails to communicate with the coordinator and a batch therefore cannot be processed, this metric still increases.
Type: counter
Labels:
-
redpanda_namespace
-
redpanda_topic
Available in Serverless: No
redpanda_iceberg_translation_dlq_files_created
Number of created Parquet files for the dead letter queue (DLQ) table.
Type: counter
Labels:
-
redpanda_namespace
-
redpanda_topic
Available in Serverless: No
redpanda_iceberg_translation_files_created
Number of created Parquet files (not counting the DLQ table).
Type: counter
Labels:
-
redpanda_namespace
-
redpanda_topic
Available in Serverless: No
redpanda_iceberg_translation_invalid_records
Number of invalid records handled by translation.
Type: counter
Labels:
-
redpanda_cause
-
redpanda_namespace
-
redpanda_topic
Available in Serverless: No
redpanda_iceberg_translation_parquet_bytes_added
Number of bytes in created Parquet files (not counting the DLQ table).
Type: counter
Labels:
-
redpanda_namespace
-
redpanda_topic
Available in Serverless: No
redpanda_iceberg_translation_parquet_rows_added
Number of rows in created Parquet files (not counting the DLQ table).
Type: counter
Labels:
-
redpanda_namespace
-
redpanda_topic
Available in Serverless: No
redpanda_iceberg_translation_raw_bytes_processed
Number of raw (potentially compressed) bytes consumed for translation, whether or not processing succeeds. For example, if Redpanda fails to communicate with the coordinator and a batch therefore cannot be processed, this metric still increases.
Type: counter
Labels:
-
redpanda_namespace
-
redpanda_topic
Available in Serverless: No
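Because redpanda_iceberg_translation_raw_bytes_processed counts bytes before decompression and redpanda_iceberg_translation_decompressed_bytes_processed counts them after, the ratio of their increases over the same window approximates the compression ratio of the data being translated. A minimal sketch, assuming both counters cover the same batches for a topic and were sampled over the same window; the function name is hypothetical.

```python
def effective_compression_ratio(raw_bytes: float, decompressed_bytes: float) -> float:
    """Approximate compression ratio of translated data over one window.

    raw_bytes: increase of redpanda_iceberg_translation_raw_bytes_processed.
    decompressed_bytes: increase of
        redpanda_iceberg_translation_decompressed_bytes_processed.
    A result near 1.0 suggests the records were not compressed.
    """
    if raw_bytes <= 0:
        raise ValueError("raw_bytes must be positive")
    return decompressed_bytes / raw_bytes
```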
Infrastructure metrics
redpanda_cpu_busy_seconds_total
Total time (in seconds) the CPU has been actively processing tasks.
Type: counter
Usage: Useful for tracking overall CPU utilization.
Labels:
-
shard
Available in Serverless: No
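Because this counter accumulates busy seconds per shard, its increase divided by elapsed wall-clock time gives the fraction of time that shard was busy; in Prometheus, rate(redpanda_cpu_busy_seconds_total[5m]) expresses the same idea. A minimal sketch from two scrapes of one shard, with a hypothetical function name:

```python
def cpu_busy_fraction(busy_prev: float, busy_now: float,
                      t_prev: float, t_now: float) -> float:
    """Fraction of wall-clock time one shard spent busy between two scrapes.

    busy_*: samples of redpanda_cpu_busy_seconds_total for the same shard.
    t_*: scrape timestamps, in seconds.
    """
    elapsed = t_now - t_prev
    if elapsed <= 0:
        raise ValueError("t_now must be later than t_prev")
    busy = busy_now - busy_prev
    if busy < 0:
        busy = busy_now  # counter reset, for example after a restart
    # Each shard runs on a single core, so clamp small overshoots from scrape jitter.
    return min(busy / elapsed, 1.0)
```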
redpanda_io_queue_total_read_ops
Cumulative count of read operations processed by the I/O queue.
Type: counter
Labels:
-
class=("default" | "compaction" | "raft")
-
iogroup
-
mountpoint
-
shard
Available in Serverless: No
redpanda_io_queue_total_write_ops
Cumulative count of write operations processed by the I/O queue.
Type: counter
Labels:
-
class=("default" | "compaction" | "raft")
-
iogroup
-
mountpoint
-
shard
Available in Serverless: No
redpanda_memory_allocated_memory
Total memory allocated (in bytes) per CPU shard. This includes all memory currently held by Redpanda on that shard, including memory in the batch cache that has been allocated but could be reclaimed if needed.
Type: gauge
Labels:
-
shard
Usage: This metric counts all allocated memory, including reclaimable batch cache memory, so it may appear high even when the system is not under memory pressure. To monitor for memory exhaustion, use redpanda_memory_available_memory instead, which deducts reclaimable memory and gives a more accurate view of how much memory is actually free.
To see raw per-shard values, query the metric directly:
redpanda_memory_allocated_memory
To see total allocated memory across all shards on a broker:
sum by (instance) (redpanda_memory_allocated_memory)
Available in Serverless: No
redpanda_memory_available_memory
Total memory (in bytes) available to a CPU shard, including both free and reclaimable memory.
Type: gauge
Labels:
-
shard
Usage: This metric is more useful than redpanda_memory_allocated_memory for monitoring memory pressure, as it accounts for reclaimable memory in the batch cache. A low value indicates the system is approaching memory exhaustion.
Available in Serverless: No
redpanda_memory_available_memory_low_water_mark
The lowest recorded available memory (in bytes) per CPU shard since the process started.
Type: gauge
Labels:
-
shard
Usage: This metric helps identify the closest the system has come to memory exhaustion. Useful for capacity planning and understanding historical memory pressure patterns.
Available in Serverless: No
redpanda_memory_free_memory
Total unallocated (free) memory in bytes available for each CPU shard.
Type: gauge
Labels:
-
shard
Available in Serverless: No
redpanda_rpc_active_connections
Current number of active RPC client connections on a shard.
Type: gauge
Labels:
-
redpanda_server=("kafka" | "internal")
Available in Serverless: No
redpanda_rpc_received_bytes
Number of bytes received from clients in valid requests.
The redpanda_server label supports the following options for this metric:
-
kafka: Data sent over the Kafka API
-
internal: Inter-broker traffic
Type: counter
Labels:
-
redpanda_server
Available in Serverless: No
redpanda_rpc_request_errors_total
Cumulative count of RPC errors encountered, segmented by server type.
Type: counter
Labels:
-
redpanda_server=("kafka" | "internal")
Usage: Use this metric to diagnose potential issues in RPC communication.
Available in Serverless: No
redpanda_rpc_request_latency_seconds
Histogram capturing the latency (in seconds) for RPC requests.
Type: histogram
Labels:
-
redpanda_server=("kafka" | "internal")
Available in Serverless: No
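Histogram metrics such as redpanda_rpc_request_latency_seconds are exposed as cumulative _bucket series keyed by an le (upper bound) label. The sketch below estimates a quantile from one set of buckets by linear interpolation inside the matching bucket, the same approach Prometheus's histogram_quantile() uses. It is simplified and assumes the buckets are sorted, the last bound is +Inf, and all bounds are non-negative (true for latencies), so the lowest bucket starts at 0. The function signature is hypothetical.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper_bound and
    ending with a +Inf bucket, as in the *_bucket series with an "le" label.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket and a final +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, count_below = 0.0, 0.0
    for upper_bound, cumulative in buckets[:-1]:
        if cumulative >= rank:
            in_bucket = cumulative - count_below
            if in_bucket == 0:
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            return lower_bound + (upper_bound - lower_bound) * (rank - count_below) / in_bucket
        lower_bound, count_below = upper_bound, cumulative
    # The rank falls in the +Inf bucket: report the highest finite bound.
    return buckets[-2][0]
```

Interpolation means the result is an estimate whose precision depends on how closely the bucket bounds bracket the real latencies.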
redpanda_rpc_sent_bytes
Number of bytes sent to clients.
The redpanda_server label supports the following options for this metric:
-
kafka: Data sent over the Kafka API
-
internal: Inter-broker traffic
Type: counter
Labels:
-
redpanda_server
Available in Serverless: No
redpanda_scheduler_runtime_seconds_total
Total accumulated runtime (in seconds) for the task queue associated with each scheduling group per shard.
Type: counter
Labels:
-
redpanda_scheduling_group=("admin" | "archival_upload" | "cache_background_reclaim" | "cluster" | "coproc" | "kafka" | "log_compaction" | "main" | "node_status" | "raft" | "raft_learner_recovery")
-
shard
Available in Serverless: No
redpanda_storage_cache_disk_free_bytes
Amount of free disk space (in bytes) available on the cache storage.
Type: gauge
Usage: Monitor this to ensure sufficient cache storage capacity.
Available in Serverless: No
redpanda_storage_cache_disk_free_space_alert
Alert indicator for cache storage free space, where:
-
0 = OK
-
1 = Low space
-
2 = Degraded
Type: gauge
Available in Serverless: No
redpanda_storage_cache_disk_total_bytes
Total size of attached storage, in bytes.
Type: gauge
Available in Serverless: No
redpanda_storage_disk_free_bytes
Amount of free disk space (in bytes) available on attached storage.
Type: gauge
Available in Serverless: No
redpanda_storage_disk_free_space_alert
Alert indicator for overall disk storage free space, where:
-
0 = OK
-
1 = Low space
-
2 = Degraded
Type: gauge
Available in Serverless: No
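Both redpanda_storage_disk_free_space_alert and redpanda_storage_cache_disk_free_space_alert use the same 0/1/2 encoding, so one decoder can serve both. A minimal sketch; the enum and function names are hypothetical.

```python
from enum import IntEnum


class DiskSpaceAlert(IntEnum):
    """Values reported by the *_disk_free_space_alert gauges."""
    OK = 0
    LOW_SPACE = 1
    DEGRADED = 2


def decode_disk_alert(value: float) -> DiskSpaceAlert:
    """Map a scraped gauge value to its alert state.

    Raises ValueError for anything other than 0, 1, or 2.
    """
    if value != int(value):
        raise ValueError(f"unexpected alert value: {value}")
    return DiskSpaceAlert(int(value))
```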
Raft metrics
Redpanda Connect metrics
input_connection_failed
Number of input connections to the Redpanda Connect pipeline that have failed.
Type: counter
Available in Serverless: Yes
input_connection_lost
Number of times a connection to the upstream system is lost in a Redpanda Connect pipeline.
Type: counter
Available in Serverless: Yes
input_connection_up
Number of active input connections to the Redpanda Connect pipeline.
Type: counter
Available in Serverless: Yes
input_latency_ns
Input latency (in nanoseconds) for the Redpanda Connect pipeline.
Type: summary
Available in Serverless: Yes
input_received
Number of records received by the Redpanda Connect pipeline.
Type: counter
Available in Serverless: Yes
output_batch_sent
Number of batches produced by the Redpanda Connect pipeline.
Type: counter
Available in Serverless: Yes
output_connection_failed
Number of output connections from the Redpanda Connect pipeline that have failed.
Type: counter
Available in Serverless: Yes
output_connection_lost
Number of times the connection to the downstream system is lost in a Redpanda Connect pipeline.
Type: counter
Available in Serverless: Yes
output_connection_up
Number of active output connections from the Redpanda Connect pipeline.
Type: counter
Available in Serverless: Yes
output_error
Number of errors encountered in the Redpanda Connect pipeline output.
Type: counter
Available in Serverless: Yes
output_latency_ns
Output latency (in nanoseconds) for the Redpanda Connect pipeline.
Type: summary
Available in Serverless: Yes
output_sent
Number of records sent by the Redpanda Connect pipeline.
Type: counter
Available in Serverless: Yes
processor_batch_received
Number of record batches received as input in a Redpanda Connect pipeline processor.
Type: counter
Available in Serverless: Yes
processor_batch_sent
Number of record batches produced as output by a Redpanda Connect pipeline processor.
Type: counter
Available in Serverless: Yes
processor_error
Number of errors encountered by a Redpanda Connect pipeline processor.
Type: counter
Available in Serverless: Yes
processor_latency_ns
Processing time in nanoseconds of a Redpanda Connect pipeline processor.
Type: summary
Available in Serverless: Yes
Serverless metrics
redpanda_serverless_ingress_bytes_total
Total raw bytes sent by clients to the Serverless cluster.
Type: counter
Available in Serverless: Yes
redpanda_serverless_egress_bytes_total
Total raw bytes sent by the Serverless cluster to clients.
Type: counter
Available in Serverless: Yes
redpanda_serverless_connections_active
Number of active client connections.
Type: gauge
Available in Serverless: Yes
redpanda_serverless_connections_created_total
Total number of client connections created.
Type: counter
Available in Serverless: Yes
redpanda_serverless_connections_duration_seconds
Total duration (in seconds) of client connections.
Type: summary
Available in Serverless: Yes
redpanda_serverless_resource_limit
Resource limits for the Serverless cluster:
-
Partition quota
-
Topic quota
-
Ingress quota
-
Egress quota
-
Connection quota
To increase resource limits, contact Redpanda Support.
Type: gauge
Available in Serverless: Yes
Service metrics
redpanda_authorization_result
Cumulative count of authorization results, categorized by result type.
Type: counter
Labels:
-
type
Available in Serverless: No
redpanda_kafka_rpc_sasl_session_expiration_total
Total number of SASL session expirations observed.
Type: counter
Available in Serverless: No
redpanda_kafka_rpc_sasl_session_reauth_attempts_total
Total number of SASL reauthentication attempts made by clients.
Type: counter
Available in Serverless: No
redpanda_kafka_rpc_sasl_session_revoked_total
Total number of SASL sessions that have been revoked.
Type: counter
Available in Serverless: No
redpanda_rest_proxy_request_latency_seconds
Histogram capturing the latency (in seconds) for REST proxy requests. The measurement includes waiting for resource availability, processing, and response dispatch.
Type: histogram
Available in Serverless: No
redpanda_schema_registry_cache_schema_count
Total number of schemas currently stored in the Schema Registry cache.
Type: gauge
Available in Serverless: Yes
redpanda_schema_registry_cache_schema_memory_bytes
Memory usage (in bytes) by schemas stored in the Schema Registry cache.
Type: gauge
Available in Serverless: No
redpanda_schema_registry_cache_subject_count
Count of subjects stored in the Schema Registry cache.
Type: gauge
Labels:
-
deleted
Available in Serverless: No
redpanda_schema_registry_cache_subject_version_count
Count of versions available for each subject in the Schema Registry cache.
Type: gauge
Labels:
-
deleted
-
subject
Available in Serverless: No
redpanda_schema_registry_inflight_requests_memory_usage_ratio
Ratio of memory used by in-flight requests in the Schema Registry, reported per shard.
Type: gauge
Labels:
-
shard
Available in Serverless: No
redpanda_schema_registry_inflight_requests_usage_ratio
Usage ratio for in-flight Schema Registry requests, reported per shard.
Type: gauge
Labels:
-
shard
Available in Serverless: No
redpanda_schema_registry_queued_requests_memory_blocked
Count of Schema Registry requests queued due to memory constraints, reported per shard.
Type: gauge
Labels:
-
shard
Available in Serverless: No
Partition metrics
redpanda_kafka_max_offset
High watermark offset for a partition, used to calculate consumer group lag.
Type: gauge
Labels:
-
redpanda_namespace
-
redpanda_partition
-
redpanda_topic
Available in Serverless: No
redpanda_kafka_request_bytes_total
Total number of bytes read from or written to the partitions of a topic. The total may include fetched bytes that are not returned to the client.
Type: counter
Labels:
-
redpanda_namespace
-
redpanda_topic
-
redpanda_request=("produce" | "consume")
Available in Serverless: Yes
redpanda_kafka_under_replicated_replicas
Number of partition replicas that are live yet lag behind the latest offset, redpanda_kafka_max_offset.
Type: gauge
Labels:
-
redpanda_namespace
-
redpanda_partition
-
redpanda_topic
Available in Serverless: No
redpanda_raft_leadership_changes
Total count of leadership changes (such as successful leader elections) across all partitions for a given topic.
Type: counter
Labels:
-
redpanda_namespace
-
redpanda_topic
Available in Serverless: No
redpanda_raft_learners_gap_bytes
Total number of bytes that must be delivered to learner replicas to bring them up to date.
Type: gauge
Labels:
-
shard
Available in Serverless: No
redpanda_raft_recovery_offsets_pending
Sum of offsets across partitions on a broker that still need to be recovered.
Type: gauge
Available in Serverless: No
redpanda_raft_recovery_partition_movement_available_bandwidth
Available network bandwidth (in bytes per second) for partition movement operations.
Type: gauge
Labels:
-
shard
Available in Serverless: No
redpanda_raft_recovery_partition_movement_consumed_bandwidth
Network bandwidth (in bytes per second) currently being consumed for partition movement.
Type: gauge
Labels:
-
shard
Available in Serverless: No
Topic metrics
redpanda_cluster_partition_schema_id_validation_records_failed
Count of records that failed schema ID validation during ingestion.
Type: counter
Available in Serverless: No
redpanda_kafka_partitions
Configured number of partitions for a topic.
Type: gauge
Labels:
-
redpanda_namespace
-
redpanda_topic
Available in Serverless: Yes
redpanda_kafka_records_fetched_total
Total number of records fetched from a topic.
Type: counter
Labels:
-
redpanda_namespace
-
redpanda_topic
Available in Serverless: Yes
redpanda_kafka_records_produced_total
Total number of records produced to a topic.
Type: counter
Labels:
-
redpanda_namespace
-
redpanda_topic
Available in Serverless: Yes
redpanda_kafka_replicas
Configured number of replicas for a topic.
Type: gauge
Labels:
-
redpanda_namespace
-
redpanda_topic
Available in Serverless: Yes
Broker metrics
redpanda_kafka_handler_latency_seconds
Histogram capturing the latency for processing Kafka requests at the broker level.
Type: histogram
Available in Serverless: No
redpanda_kafka_request_latency_seconds
Histogram capturing the latency (in seconds) for produce/consume requests at the broker. This duration spans from request initiation to response fulfillment.
Type: histogram
Labels:
-
redpanda_request=("produce" | "consume")
Available in Serverless: No
redpanda_kafka_quotas_client_quota_throttle_time
Histogram of client quota throttling delays (in seconds) per quota rule and type.
Type: histogram
Labels:
-
quota_rule=("not_applicable" | "kafka_client_default" | "cluster_client_default" | "kafka_client_prefix" | "cluster_client_prefix" | "kafka_client_id")
-
quota_type=("produce_quota" | "fetch_quota" | "partition_mutation_quota")
Available in Serverless: No
redpanda_kafka_quotas_client_quota_throughput
Histogram of client quota throughput per quota rule and type.
Type: histogram
Labels:
-
quota_rule=("not_applicable" | "kafka_client_default" | "cluster_client_default" | "kafka_client_prefix" | "cluster_client_prefix" | "kafka_client_id")
-
quota_type=("produce_quota" | "fetch_quota" | "partition_mutation_quota")
Available in Serverless: No
Consumer group metrics
redpanda_kafka_consumer_group_committed_offset
Committed offset for a consumer group, segmented by topic and partition.
To enable this metric, you must include the partition option in the enable_consumer_group_metrics cluster property.
Type: gauge
Labels:
-
redpanda_group
-
redpanda_partition
-
redpanda_topic
-
shard
Available in Serverless: No
redpanda_kafka_consumer_group_consumers
Number of active consumers within a consumer group.
To enable this metric, you must include the group option in the enable_consumer_group_metrics cluster property.
Type: gauge
Labels:
-
redpanda_group
-
shard
Available in Serverless: Yes
redpanda_kafka_consumer_group_lag_max
Maximum consumer group lag across topic partitions. This metric is useful for identifying the most delayed partition in the consumer group.
To enable this metric, you must include the consumer_lag option in the enable_consumer_group_metrics cluster property.
Type: gauge
Labels:
-
redpanda_group
Available in Serverless: Yes
redpanda_kafka_consumer_group_lag_sum
Sum of consumer group lag for all topic partitions. This metric is useful for tracking the total lag across all partitions.
To enable this metric, you must include the consumer_lag option in the enable_consumer_group_metrics cluster property.
Type: gauge
Labels:
-
redpanda_group
Available in Serverless: Yes
redpanda_kafka_consumer_group_topics
Number of topics being consumed by a consumer group.
To enable this metric, you must include the group option in the enable_consumer_group_metrics cluster property.
Type: gauge
Labels:
-
redpanda_group
-
shard
Available in Serverless: Yes
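Per-partition lag is the difference between the high watermark (redpanda_kafka_max_offset) and the group's committed offset (redpanda_kafka_consumer_group_committed_offset). The sketch below computes that lag for one group, then the maximum and the sum across partitions, which should roughly correspond to redpanda_kafka_consumer_group_lag_max and redpanda_kafka_consumer_group_lag_sum. It assumes the offsets have already been collected into dictionaries keyed by (topic, partition); the helper names are hypothetical, and negative differences (possible when the two metrics are sampled at slightly different times) are clamped to 0.

```python
Partition = tuple[str, int]  # (topic, partition)


def partition_lag(max_offsets: dict[Partition, int],
                  committed: dict[Partition, int]) -> dict[Partition, int]:
    """Per-partition lag for one consumer group."""
    lag: dict[Partition, int] = {}
    for tp, committed_offset in committed.items():
        high_watermark = max_offsets.get(tp)
        if high_watermark is None:
            continue  # no redpanda_kafka_max_offset sample for this partition
        # Clamp at 0: the two metrics may be scraped at slightly different times.
        lag[tp] = max(high_watermark - committed_offset, 0)
    return lag


def lag_max_and_sum(lag: dict[Partition, int]) -> tuple[int, int]:
    """Maximum and total lag across partitions."""
    return max(lag.values(), default=0), sum(lag.values())
```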
REST proxy metrics
redpanda_rest_proxy_inflight_requests_memory_usage_ratio
Ratio of memory used by in-flight REST proxy requests, measured per shard.
Type: gauge
Labels:
-
shard
Available in Serverless: No
redpanda_rest_proxy_inflight_requests_usage_ratio
Usage ratio for in-flight REST proxy requests, measured per shard.
Type: gauge
Labels:
-
shard
Available in Serverless: No
redpanda_rest_proxy_queued_requests_memory_blocked
Count of REST proxy requests queued due to memory limitations, measured per shard.
Type: gauge
Labels:
-
shard
Available in Serverless: No
Application metrics
redpanda_application_build
Build information for Redpanda, including the revision and version details.
Type: gauge
Labels:
-
redpanda_revision
-
redpanda_version
Available in Serverless: Yes
Cloud metrics
redpanda_cloud_client_backoff
Total number of object storage requests that experienced backoff delays.
Type: counter
Labels:
-
For S3 and GCP: redpanda_endpoint, redpanda_region
-
For Azure Blob Storage (ABS): redpanda_endpoint, redpanda_storage_account
Available in Serverless: No
redpanda_cloud_client_client_pool_utilization
Utilization of the object storage client pool, from 0 (unused) to 100 (fully utilized).
Type: gauge
Labels:
-
redpanda_endpoint
-
redpanda_region
-
shard
Available in Serverless: No
redpanda_cloud_client_download_backoff
Total number of object storage download requests that experienced backoff delays.
Type: counter
Labels:
-
For S3 and GCP: redpanda_endpoint, redpanda_region
-
For Azure Blob Storage (ABS): redpanda_endpoint, redpanda_storage_account
Available in Serverless: No
redpanda_cloud_client_downloads
Total number of successful download requests from object storage.
Type: counter
Labels:
-
For S3 and GCP: redpanda_endpoint, redpanda_region
-
For Azure Blob Storage (ABS): redpanda_endpoint, redpanda_storage_account
Available in Serverless: No
redpanda_cloud_client_lease_duration
Histogram representing the lease duration for object storage clients.
Type: histogram
Available in Serverless: No
redpanda_cloud_client_not_found
Total number of object storage requests that resulted in a "not found" error.
Type: counter
Labels:
-
For S3 and GCP: redpanda_endpoint, redpanda_region
-
For Azure Blob Storage (ABS): redpanda_endpoint, redpanda_storage_account
Available in Serverless: No
redpanda_cloud_client_num_borrows
Count of instances where a shard borrowed an object storage client from another shard.
Type: counter
Labels:
-
redpanda_endpoint
-
redpanda_region
-
shard
Available in Serverless: No
TLS metrics
redpanda_tls_certificate_expires_at_timestamp_seconds
Unix epoch timestamp for the expiration of the shortest-lived installed TLS certificate.
Type: gauge
Labels:
-
area
-
detail
Usage: Useful for proactive certificate renewal by indicating the next certificate set to expire.
Available in Serverless: No
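To turn this timestamp (or redpanda_tls_truststore_expires_at_timestamp_seconds) into a renewal alert, compare it with the current time. A sketch with a hypothetical function name; the alerting threshold you apply to the result is up to you.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def days_until_expiry(expires_at: float, now: Optional[float] = None) -> float:
    """Days until a Unix epoch expiry timestamp reported by a TLS gauge.

    A negative result means the certificate has already expired.
    """
    if now is None:
        now = time.time()
    return (expires_at - now) / SECONDS_PER_DAY
```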
redpanda_tls_certificate_serial
The least significant 4 bytes of the serial number for the certificate that will expire next.
Type: gauge
Labels:
-
area
-
detail
Usage: Provides a quick reference to identify the certificate in question.
Available in Serverless: No
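To match this value to a certificate, mask the certificate's full serial number down to its least significant 4 bytes (32 bits). The full serial can come from, for example, the serial_number attribute of a certificate loaded with the Python cryptography package. The helpers below are hypothetical, use only integer arithmetic, and assume the gauge reports the 4 bytes as an unsigned value.

```python
def serial_low_bytes(serial_number: int) -> int:
    """Least significant 4 bytes (32 bits) of a certificate serial number."""
    return serial_number & 0xFFFFFFFF


def serial_matches(metric_value: float, serial_number: int) -> bool:
    """True if a redpanda_tls_certificate_serial sample matches the certificate."""
    # A 32-bit integer is represented exactly by a float64 gauge value.
    return int(metric_value) == serial_low_bytes(serial_number)
```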
redpanda_tls_certificate_valid
Indicator of whether a resource has at least one valid TLS certificate installed. Returns 1 if a valid certificate is present and 0 if not.
Type: gauge
Labels:
-
area
-
detail
Usage: Aids in continuous monitoring of certificate validity across resources.
Available in Serverless: No
redpanda_tls_loaded_at_timestamp_seconds
Unix epoch timestamp marking the last time a TLS certificate was loaded for a resource.
Type: gauge
Labels:
-
area
-
detail
Usage: Indicates recent certificate updates across resources.
Available in Serverless: No
redpanda_tls_truststore_expires_at_timestamp_seconds
Unix epoch timestamp representing the expiration time of the shortest-lived certificate authority (CA) in the installed truststore.
Type: gauge
Labels:
-
area
-
detail
Usage: Helps identify when any CA in the chain is nearing expiration.
Available in Serverless: No
Data transforms metrics
redpanda_transform_execution_errors
Counter for the number of errors encountered during the invocation of data transforms.
Type: counter
Labels:
-
function_name
Available in Serverless: No
redpanda_transform_execution_latency_sec
Histogram tracking the execution latency (in seconds) for processing a single record using data transforms.
Type: histogram
Labels:
-
function_name
Available in Serverless: No
redpanda_transform_failures
Number of failures encountered by a data transform processor.
Type: counter
Labels:
-
function_name
Available in Serverless: No
redpanda_transform_processor_lag
Number of records pending processing in the input topic for a data transform.
Type: gauge
Labels:
-
function_name
Available in Serverless: No
redpanda_transform_read_bytes
Cumulative count of bytes read as input to data transforms.
Type: counter
Labels:
- function_name
Available in Serverless: No
redpanda_transform_state
Current count of transform processors in a specific state (running, inactive, or errored).
Type: gauge
Labels:
- function_name
- state=("running" | "inactive" | "errored")
Available in Serverless: No
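A common check is the number of processors in each state across all functions, for example to alert when any processor is `errored`. The sketch below parses lines in the Prometheus text format, as returned by the `/public_metrics` endpoint, and sums `redpanda_transform_state` by its `state` label. The parser is a deliberately simplified assumption: it does not handle every edge case of the exposition format (such as `}` inside label values), and the function names in the sample data are invented.

```python
import re
from collections import defaultdict
from typing import Dict, Iterator, Tuple

# Simplified matcher for "name{labels} value [timestamp]" lines. It assumes label
# values contain no "}" characters, which the format does not guarantee.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str) -> Iterator[Tuple[str, Dict[str, str], float]]:
    """Yield (metric_name, labels, value) for each sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        yield match.group("name"), labels, float(match.group("value"))


def transforms_by_state(text: str) -> Dict[str, float]:
    """Sum redpanda_transform_state across all functions, grouped by the state label."""
    totals: Dict[str, float] = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == "redpanda_transform_state":
            totals[labels.get("state", "")] += value
    return dict(totals)


# Hypothetical scrape output; the function names are invented for illustration.
EXAMPLE = """\
# TYPE redpanda_transform_state gauge
redpanda_transform_state{function_name="to_upper",state="running"} 3
redpanda_transform_state{function_name="to_upper",state="errored"} 1
redpanda_transform_state{function_name="redact",state="running"} 2
"""

print(transforms_by_state(EXAMPLE))  # {'running': 5.0, 'errored': 1.0}
```

In PromQL, the same aggregation is `sum by (state) (redpanda_transform_state)`.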
redpanda_transform_write_bytes
Cumulative count of bytes output from data transforms.
Type: counter
Labels:
- function_name
Available in Serverless: No
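`redpanda_transform_read_bytes` and `redpanda_transform_write_bytes` are cumulative counters, so throughput comes from the rate of change between scrapes. The sketch below computes a per-second rate from timestamped samples and treats any drop in value as a counter reset (for example, after a broker restart). This is a simplification: PromQL's `rate()` applies a similar reset rule but also extrapolates to the edges of the query window, so its results can differ slightly. The sample values are hypothetical.

```python
from typing import List, Tuple


def counter_rate(samples: List[Tuple[float, float]]) -> float:
    """Per-second rate from (timestamp_seconds, counter_value) samples, tolerating resets."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A decrease means the counter restarted from zero, so the new value
        # is the increase since the reset.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    return increase / elapsed


# Hypothetical scrapes taken 10 seconds apart; read_bytes resets between the last two.
read_bytes = [(0, 1_000), (10, 6_000), (20, 500)]
write_bytes = [(0, 800), (10, 4_800), (20, 8_800)]
print(counter_rate(read_bytes))   # 275.0 bytes/s in
print(counter_rate(write_bytes))  # 400.0 bytes/s out
```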
redpanda_wasm_binary_executable_memory_usage
Amount of memory (in bytes) used by executable WebAssembly binaries.
Type: gauge
Available in Serverless: No
redpanda_wasm_engine_cpu_seconds_total
Total CPU time (in seconds) consumed by WebAssembly functions.
Type: counter
Labels:
- function_name
Available in Serverless: No
Object storage metrics
redpanda_cloud_storage_active_segments
Number of remote log segments that are currently hydrated and available for read operations.
Type: gauge
Available in Serverless: No
redpanda_cloud_storage_anomalies
Count of missing partition manifest anomalies detected for the topic.
Type: gauge
Available in Serverless: No
redpanda_cloud_storage_cache_op_hit
Total number of successful get requests that found the requested object in the cache.
Type: counter
Available in Serverless: No
redpanda_cloud_storage_cache_op_in_progress_files
Number of files currently being written to the cache.
Type: gauge
Available in Serverless: No
redpanda_cloud_storage_cache_op_miss
Total count of get requests that did not find the requested object in the cache.
Type: counter
Available in Serverless: No
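`redpanda_cloud_storage_cache_op_hit` and `redpanda_cloud_storage_cache_op_miss` together give the cache hit ratio for a time window. Because both are counters, the ratio should use their increases over the same window rather than their raw totals. The sketch below shows this arithmetic; the sample values and function names are hypothetical.

```python
import math


def counter_delta(previous: float, current: float) -> float:
    """Increase of a counter between two scrapes, treating a drop as a reset."""
    return current - previous if current >= previous else current


def cache_hit_ratio(hit_prev: float, hit_cur: float,
                    miss_prev: float, miss_cur: float) -> float:
    """Fraction of cache lookups in the window that found the object in the cache."""
    hits = counter_delta(hit_prev, hit_cur)
    misses = counter_delta(miss_prev, miss_cur)
    lookups = hits + misses
    return hits / lookups if lookups else math.nan


# Hypothetical window: 700 hits and 300 misses.
print(cache_hit_ratio(1_200, 1_900, 300, 600))  # 0.7
```

A persistently low ratio means most Tiered Storage reads have to download objects instead of being served from the local cache.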
redpanda_cloud_storage_cache_op_put
Total number of objects successfully written into the cache.
Type: counter
Available in Serverless: No
redpanda_cloud_storage_cache_space_files
Current number of objects stored in the cache.
Type: gauge
Available in Serverless: No
redpanda_cloud_storage_cache_space_hwm_files
High watermark for the number of objects stored in the cache.
Type: gauge
Available in Serverless: No
redpanda_cloud_storage_cache_space_hwm_size_bytes
High watermark for the total size (in bytes) of cached objects.
Type: gauge
Available in Serverless: No
redpanda_cloud_storage_cache_space_size_bytes
Total size (in bytes) of objects currently stored in the cache.
Type: gauge
Available in Serverless: No
redpanda_cloud_storage_cache_space_tracker_size
Current count of entries in the cache access tracker.
Type: gauge
Available in Serverless: No
redpanda_cloud_storage_cache_space_tracker_syncs
Total number of times the cache access tracker was synchronized with disk data.
Type: counter
Available in Serverless: No
redpanda_cloud_storage_cache_trim_carryover_trims
Count of times the cache trim operation was invoked using a carryover strategy.
Type: counter
Available in Serverless: No
redpanda_cloud_storage_cache_trim_exhaustive_trims
Count of instances where a fast cache trim was insufficient and an exhaustive trim was required.
Type: counter
Available in Serverless: No
redpanda_cloud_storage_cache_trim_failed_trims
Count of cache trim operations that failed to free the expected amount of space, possibly indicating a bug or misconfiguration.
Type: counter
Available in Serverless: No
redpanda_cloud_storage_cache_trim_fast_trims
Count of successful fast cache trim operations.
Type: counter
Available in Serverless: No
redpanda_cloud_storage_cache_trim_in_mem_trims
Count of cache trim operations performed using the in-memory access tracker.
Type: counter
Available in Serverless: No
redpanda_cloud_storage_cloud_log_size
Total size (in bytes) of user-visible log data stored in Tiered Storage. This value increases with every segment offload and decreases when segments are deleted due to retention or compaction.
Type: gauge
Labels:
- redpanda_namespace (for example, kafka, kafka_internal, or redpanda)
- redpanda_topic
- redpanda_partition
Available in Serverless: No
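Because this gauge is reported per partition, per-topic Tiered Storage usage is the sum across partitions. This section does not say whether a partition is reported once or by several brokers, so the sketch below takes a defensive approach as an assumption: it keeps the largest value per partition before summing, which avoids double counting. The sample data is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple

Sample = Tuple[Mapping[str, str], float]


def log_size_by_topic(samples: Iterable[Sample]) -> Dict[Tuple[str, str], float]:
    """Total Tiered Storage log size per (namespace, topic).

    If the same partition is reported more than once, its largest value is kept
    instead of being counted twice.
    """
    per_partition: Dict[Tuple[str, str, str], float] = {}
    for labels, value in samples:
        key = (labels["redpanda_namespace"], labels["redpanda_topic"], labels["redpanda_partition"])
        per_partition[key] = max(value, per_partition.get(key, 0.0))

    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for (namespace, topic, _partition), size in per_partition.items():
        totals[(namespace, topic)] += size
    return dict(totals)


# Hypothetical samples: partition 0 of "orders" appears twice.
samples = [
    ({"redpanda_namespace": "kafka", "redpanda_topic": "orders", "redpanda_partition": "0"}, 1_000.0),
    ({"redpanda_namespace": "kafka", "redpanda_topic": "orders", "redpanda_partition": "0"}, 1_000.0),
    ({"redpanda_namespace": "kafka", "redpanda_topic": "orders", "redpanda_partition": "1"}, 2_500.0),
    ({"redpanda_namespace": "kafka", "redpanda_topic": "payments", "redpanda_partition": "0"}, 400.0),
]
print(log_size_by_topic(samples))
```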
redpanda_cloud_storage_deleted_segments
Count of log segments that have been deleted from object storage due to retention policies or compaction processes.
Type: counter
Available in Serverless: No
redpanda_cloud_storage_errors_total
Cumulative count of errors encountered during object storage operations, segmented by direction.
Type: counter
Labels:
- redpanda_direction
Available in Serverless: No
redpanda_cloud_storage_housekeeping_drains
Count of times the object storage upload housekeeping queue was fully drained.
Type: gauge
Available in Serverless: No
redpanda_cloud_storage_housekeeping_jobs_completed
Total number of successfully executed object storage housekeeping jobs.
Type: counter
Available in Serverless: No
redpanda_cloud_storage_housekeeping_jobs_failed
Total number of object storage housekeeping jobs that failed.
Type: counter
Available in Serverless: No
redpanda_cloud_storage_housekeeping_jobs_skipped
Count of object storage housekeeping jobs that were skipped during execution.
Type: counter
Available in Serverless: No
redpanda_cloud_storage_housekeeping_pauses
Count of times object storage upload housekeeping was paused.
Type: gauge
Available in Serverless: No
redpanda_cloud_storage_housekeeping_requests_throttled_average_rate
Average rate (per shard) of requests that were throttled during object storage operations.
Type: gauge
Available in Serverless: No
redpanda_cloud_storage_housekeeping_resumes
Count of instances when object storage upload housekeeping resumed after a pause.
Type: gauge
Available in Serverless: No
redpanda_cloud_storage_housekeeping_rounds
Total number of rounds executed by the object storage upload housekeeping process.
Type: counter
Available in Serverless: No
redpanda_cloud_storage_jobs_cloud_segment_reuploads
Count of log segments reuploaded from object storage sources (either from the cache or via direct download).
Type: gauge
Available in Serverless: No
redpanda_cloud_storage_jobs_local_segment_reuploads
Count of log segments reuploaded from the local data directory.
Type: gauge
Available in Serverless: No
redpanda_cloud_storage_jobs_manifest_reuploads
Total number of partition manifest reuploads performed by housekeeping jobs.
Type: gauge
Available in Serverless: No
redpanda_cloud_storage_jobs_metadata_syncs
Total number of archival configuration updates (metadata synchronizations) executed by housekeeping jobs.
Type: gauge
Available in Serverless: No
redpanda_cloud_storage_jobs_segment_deletions
Total count of log segments deleted by housekeeping jobs.
Type: gauge
Available in Serverless: No
redpanda_cloud_storage_limits_downloads_throttled_sum
Cumulative time (in milliseconds) during which downloads were throttled.
Type: counter
Available in Serverless: No
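Because this counter accumulates milliseconds of throttling, its increase over a window divided by the window length gives the fraction of that time during which downloads were throttled. The sketch below shows the arithmetic for a single series, with simple reset handling. If the metric is reported separately per shard or per broker, compute the fraction for each series, because summing across series can exceed 1.0. The sample values are hypothetical.

```python
def throttled_fraction(prev_ms: float, cur_ms: float, window_seconds: float) -> float:
    """Fraction of a window spent throttled, from two samples of a cumulative ms counter."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    # A decrease means the counter reset, so cur_ms is the time accrued since then.
    delta_ms = cur_ms - prev_ms if cur_ms >= prev_ms else cur_ms
    return delta_ms / (window_seconds * 1000.0)


# Hypothetical 60-second window: 15,000 ms of throttling.
print(throttled_fraction(12_000, 27_000, 60))  # 0.25
```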
redpanda_cloud_storage_partition_manifest_uploads_total
Total number of successful partition manifest uploads to object storage.
Type: counter
Available in Serverless: No
redpanda_cloud_storage_partition_readers
Number of active partition reader instances (fetch/timequery operations) reading from Tiered Storage.
Type: gauge
Available in Serverless: No
redpanda_cloud_storage_partition_readers_delayed
Count of partition read operations delayed due to reaching the reader limit, suggesting potential saturation of Tiered Storage reads.
Type: counter
Available in Serverless: No
redpanda_cloud_storage_paused_archivers
Number of paused archivers.
Type: gauge
Labels:
- redpanda_namespace
- redpanda_topic
Available in Serverless: No
redpanda_cloud_storage_readers
Total number of segment read cursors for hydrated remote log segments.
Type: gauge
Available in Serverless: No
redpanda_cloud_storage_segment_index_uploads_total
Total number of successful segment index uploads to object storage.
Type: counter
Available in Serverless: No
redpanda_cloud_storage_segment_materializations_delayed
Count of segment materialization operations that were delayed because of reader limit constraints.
Type: counter
Available in Serverless: No
redpanda_cloud_storage_segment_readers_delayed
Count of segment reader operations delayed due to reaching the reader limit. A rising count can indicate that the cluster is saturated with Tiered Storage reads.
Type: counter
Available in Serverless: No
redpanda_cloud_storage_segment_uploads_total
Total number of successful data segment uploads to object storage.
Type: counter
Available in Serverless: No
redpanda_cloud_storage_segments
Total number of log segments accounted for in object storage for the topic.
Type: gauge
Labels:
- redpanda_namespace
- redpanda_topic
Available in Serverless: No
redpanda_cloud_storage_segments_pending_deletion
Total number of log segments pending deletion from object storage for the topic.
Type: gauge
Labels:
- redpanda_namespace
- redpanda_topic
Available in Serverless: No
redpanda_cloud_storage_spillover_manifest_uploads_total
Total number of successful spillover manifest uploads to object storage.
Type: counter
Available in Serverless: No
redpanda_cloud_storage_spillover_manifests_materialized_bytes
Total bytes of memory used by spilled manifests that are currently cached in memory.
Type: gauge
Available in Serverless: No
Shadow link metrics
redpanda_shadow_link_shadow_lag
The lag of the shadow partition against the source partition, calculated as source partition last stable offset (LSO) minus shadow partition high watermark (HWM). Monitor this metric to understand replication lag for each partition and ensure your recovery point objective (RPO) requirements are being met.
Type: gauge
Labels:
- shadow_link_name: Name of the shadow link
- topic: Topic name
- partition: Partition identifier
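Shadow lag is measured in offsets (source LSO minus shadow HWM), while an RPO is often stated in time. The sketch below computes the lag, flags partitions that exceed an offset budget, and gives a rough time-based estimate by dividing the lag by the source produce rate. The offset budget, produce rate, and helper names are hypothetical. The time conversion is only approximate, because offsets do not map one-to-one to records (for example, transaction control markers also consume offsets).

```python
from typing import Dict, List, Tuple

PartitionKey = Tuple[str, int]  # (topic, partition)


def shadow_lag(source_lso: int, shadow_hwm: int) -> int:
    """Lag in offsets: source last stable offset minus shadow high watermark."""
    return source_lso - shadow_hwm


def partitions_over_budget(lags: Dict[PartitionKey, int], max_lag_offsets: int) -> List[PartitionKey]:
    """Partitions whose lag exceeds a hypothetical per-partition offset budget."""
    return sorted(key for key, lag in lags.items() if lag > max_lag_offsets)


def approx_lag_seconds(lag_offsets: int, produce_rate_per_second: float) -> float:
    """Rough time-based lag: offsets behind divided by the source produce rate."""
    if produce_rate_per_second <= 0:
        raise ValueError("produce rate must be positive")
    return lag_offsets / produce_rate_per_second


lags = {
    ("orders", 0): shadow_lag(10_500, 10_200),
    ("orders", 1): shadow_lag(42_000, 40_500),
    ("payments", 0): shadow_lag(7_000, 7_000),
}
print(lags)
print(partitions_over_budget(lags, max_lag_offsets=1_000))  # [('orders', 1)]
print(approx_lag_seconds(lags[("orders", 1)], 500.0))        # 3.0
```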
redpanda_shadow_link_shadow_topic_state
Number of shadow topics in the respective states. Monitor this metric to track the health and status distribution of shadow topics across your shadow links.
Type: gauge
Labels:
- shadow_link_name: Name of the shadow link
- state: Topic state (active, failed, paused, failing_over, failed_over, promoting, promoted)
redpanda_shadow_link_client_errors
Total number of errors encountered by the Kafka client during shadow link operations. Monitor this metric to identify connection issues, authentication failures, or other client-side problems affecting shadow link replication.
Type: counter
Labels:
- shadow_link_name: Name of the shadow link
redpanda_shadow_link_total_bytes_fetched
Total number of bytes fetched by a sharded replicator (bytes received by the client). Use this metric to track data transfer volume from the source cluster.
Type: counter
Labels:
- shadow_link_name: Name of the shadow link
- shard: Shard identifier
redpanda_shadow_link_total_bytes_written
Total number of bytes written by a sharded replicator (bytes written to the write_at_offset_stm). Use this metric to monitor data written to the shadow cluster.
Type: counter
Labels:
- shadow_link_name: Name of the shadow link
- shard: Shard identifier
redpanda_shadow_link_total_records_fetched
Total number of records fetched by the sharded replicator (records received by the client). Monitor this metric to track message throughput from the source cluster.
Type: counter
Labels:
- shadow_link_name: Name of the shadow link
- shard: Shard identifier
redpanda_shadow_link_total_records_written
Total number of records written by a sharded replicator (records written to the write_at_offset_stm). Use this metric to monitor message throughput to the shadow cluster.
Type: counter
Labels:
- shadow_link_name: Name of the shadow link
- shard: Shard identifier
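The byte and record counters above are reported per shard, so link-level figures come from summing the per-shard increases. The sketch below assumes you have already computed each shard's counter increase over a 60-second window. It derives the link's fetch throughput and the average fetched record size. The values and helper names are hypothetical.

```python
import math
from typing import Mapping


def link_total(per_shard: Mapping[int, float]) -> float:
    """Sum a per-shard counter increase across all shards of one shadow link."""
    return float(sum(per_shard.values()))


def avg_record_size(bytes_by_shard: Mapping[int, float],
                    records_by_shard: Mapping[int, float]) -> float:
    """Average bytes per record for the link, or NaN if no records moved."""
    records = link_total(records_by_shard)
    return link_total(bytes_by_shard) / records if records else math.nan


# Hypothetical increases over a 60-second window for a link with two shards.
bytes_fetched = {0: 4_000_000, 1: 2_000_000}
records_fetched = {0: 4_000, 1: 2_000}
print(link_total(bytes_fetched) / 60)                   # 100000.0 bytes/s fetched
print(avg_record_size(bytes_fetched, records_fetched))  # 1000.0 bytes per record
```

Running the same calculation on the `_written` counters and comparing the results with the `_fetched` figures is one way to check whether the shadow cluster keeps pace with what it receives.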