# Cluster Diagnostics

> For the complete documentation index, see [llms.txt](https://docs.redpanda.com/llms.txt). Component-specific: [streaming-full.txt](https://docs.redpanda.com/streaming-full.txt)

---
title: Cluster Diagnostics
latest-operator-version: v26.1.4
# EOL = End-of-Life (support lifecycle status)
page-is-nearing-eol: "false"
page-is-past-eol: "true"
page-eol-date: April 30, 2025
latest-console-tag: v3.7.3
latest-connect-version: 4.93.0
docname: cluster-maintenance/cluster-diagnostics
page-component-name: streaming
page-version: "24.1"
page-component-version: "24.1"
page-component-title: Streaming
page-relative-src-path: cluster-maintenance/cluster-diagnostics.adoc
page-edit-url: https://github.com/redpanda-data/docs/edit/v/24.1/modules/manage/pages/cluster-maintenance/cluster-diagnostics.adoc
description: Use tools and tests to help diagnose and debug a Redpanda cluster.
page-git-created-date: "2023-08-03"
page-git-modified-date: "2024-02-29"
support-status: past end-of-life
---

<!-- Source: https://docs.redpanda.com/streaming/24.1/manage/cluster-maintenance/cluster-diagnostics.md -->

This topic provides guides for using tools and tests to help diagnose and debug a Redpanda cluster.

## [](#self-test)Disk and network self-test benchmarks

When anomalous behavior arises in a cluster and you’re trying to figure out whether it’s caused by faulty hardware (disks, NICs) of a cluster’s machines, run [rpk cluster self-test](https://docs.redpanda.com/streaming/24.1/reference/rpk/rpk-cluster/rpk-cluster-self-test/) (self-test) to characterize their performance and compare it with their expected, vendor-specified performance.

Self-test runs a set of benchmarks to determine the maximum performance of a machine’s disks and network connections. For disks, it runs throughput and latency tests by performing concurrent sequential operations. For networks, it selects unique pairs of Redpanda nodes as client/server pairs, then it runs throughput tests between them. Self-test runs each benchmark for a configurable duration, and it returns IOPS, throughput, and latency metrics.

### [](#self-test-command-examples)Self-test command examples

To begin using self-test, run the `self-test start` command.

rpk cluster self-test start

For command help, run `rpk cluster self-test start -h`. For additional command flags, see the [rpk cluster self-test start](https://docs.redpanda.com/streaming/24.1/reference/rpk/rpk-cluster/rpk-cluster-self-test-start/) reference.

Before it starts, `self-test start` asks for your confirmation to run its potentially large workload.

Example start output:

? Redpanda self-test will run benchmarks of disk and network hardware that will consume significant system resources. Do not start self-test if large workloads are already running on the system. (Y/n)
Redpanda self-test has started, test identifier: "031be460-246b-46af-98f2-5fc16f03aed3", To check the status run:
rpk cluster self-test status

The `self-test start` command returns immediately, and self-test runs its benchmarks asynchronously.

To check on the status of self-test, run the `self-test status` command.

```bash
rpk cluster self-test status
```

For command help, run `rpk cluster self-test status -h`. For additional command flags, see the [rpk cluster self-test status](https://docs.redpanda.com/streaming/24.1/reference/rpk/rpk-cluster/rpk-cluster-self-test-status/) reference.

If benchmarks are currently running, `self-test status` returns a test-in-progress message.

Example status output:

$ rpk cluster self-test status
Nodes \[0 1 2\] are still running jobs

> 💡 **TIP**
>
> To automate checking the status of self-test, the `status` command can output its results in JSON format by using the `--format=json` option:
>
> ```bash
> rpk cluster self-test status --format=json
> ```

If benchmarks have completed, `self-test status` returns their results.

Example status output: test results

Test results are grouped by node ID. Each test returns the following:

-   **NAME**: Description of the test.

-   **INFO**: Detail about the test run attached by Redpanda itself.

-   **TYPE**: Either `disk` or `network` test.

-   **TEST ID**: Unique identifier given to jobs of a run. All IDs in a test should match. If they don’t match, then newer and/or older test results have been included erroneously.

-   **TIMEOUTS**: Number of timeouts incurred during the test.

-   **DURATION**: Duration of the test.

-   **IOPS**: Number of operations per second. For disk, it’s `seastar::dma_read` and `seastar::dma_write`. For network, it’s `rpc.send()`

-   **THROUGHPUT**: For disk, it’s throughput rate in bytes per second. For network, it’s throughput rate in bits per second in. (Note: GiB vs. Gib is the correct notation displayed by the UI.)

-   **LATENCY**: 50th, 90th, etc. percentiles of operation latency, reported in microseconds.


```none
$ rpk cluster self-test status
NODE ID: 1 | STATUS: IDLE
=========================
NAME        512K sequential r/w throughput disk test
INFO        write run
TYPE        disk
TEST ID     5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS    0
DURATION    5001ms
IOPS        1590 req/sec
THROUGHPUT  795.2MiB/sec
LATENCY     P50    P90     P99      P999     MAX
            831us  5887us  11263us  24575us  507903us

NAME        512K sequential r/w throughput disk test
INFO        read run
TYPE        disk
TEST ID     5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS    0
DURATION    5001ms
IOPS        4504 req/sec
THROUGHPUT  2.2GiB/sec
LATENCY     P50    P90     P99     P999    MAX
            703us  1599us  4351us  6399us  10239us

NAME        4k sequential r/w latency/iops disk test
INFO        write run
TYPE        disk
TEST ID     5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS    0
DURATION    5031ms
IOPS        289 req/sec
THROUGHPUT  144.7MiB/sec
LATENCY     P50    P90      P99      P999     MAX
            543us  34815us  69631us  77823us  77823us

NAME        4k sequential r/w latency/iops disk test
INFO        read run
TYPE        disk
TEST ID     5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS    0
DURATION    5000ms
IOPS        8275 req/sec
THROUGHPUT  4.041GiB/sec
LATENCY     P50    P90    P99    P999    MAX
            191us  447us  831us  2175us  278527us

NAME        8K Network Throughput Test
INFO        Test performed against node: 0
TYPE        network
TEST ID     5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS    0
DURATION    5000ms
IOPS        61254 req/sec
THROUGHPUT  3.74Gib/sec
LATENCY     P50    P90    P99    P999   MAX
            159us  207us  303us  415us  1087us

NAME        8K Network Throughput Test
INFO        Test performed against node: 2
TYPE        network
TEST ID     5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS    0
DURATION    5000ms
IOPS        54814 req/sec
THROUGHPUT  3.35Gib/sec
LATENCY     P50    P90    P99    P999   MAX
            167us  255us  367us  511us  25599us

NODE ID: 0 | STATUS: IDLE
=========================
NAME        512K sequential r/w throughput disk test
INFO        write run
TYPE        disk
TEST ID     5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS    0
DURATION    5002ms
IOPS        1593 req/sec
THROUGHPUT  796.8MiB/sec
LATENCY     P50    P90     P99      P999     MAX
            735us  5887us  11263us  69631us  507903us

NAME        512K sequential r/w throughput disk test
INFO        read run
TYPE        disk
TEST ID     5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS    0
DURATION    5000ms
IOPS        4372 req/sec
THROUGHPUT  2.135GiB/sec
LATENCY     P50    P90     P99     P999    MAX
            735us  1599us  4351us  7423us  9215us

NAME        4k sequential r/w latency/iops disk test
INFO        write run
TYPE        disk
TEST ID     5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS    0
DURATION    5026ms
IOPS        286 req/sec
THROUGHPUT  143.1MiB/sec
LATENCY     P50    P90      P99      P999     MAX
            543us  34815us  69631us  77823us  77823us

NAME        4k sequential r/w latency/iops disk test
INFO        read run
TYPE        disk
TEST ID     5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS    0
DURATION    5000ms
IOPS        8269 req/sec
THROUGHPUT  4.038GiB/sec
LATENCY     P50    P90    P99    P999    MAX
            191us  447us  831us  2175us  278527us

NAME        8K Network Throughput Test
INFO        Test performed against node: 1
TYPE        network
TEST ID     5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS    0
DURATION    5000ms
IOPS        61612 req/sec
THROUGHPUT  3.76Gib/sec
LATENCY     P50    P90    P99    P999   MAX
            159us  207us  303us  431us  1151us

NAME        8K Network Throughput Test
INFO        Test performed against node: 2
TYPE        network
TEST ID     5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS    0
DURATION    5000ms
IOPS        60306 req/sec
THROUGHPUT  3.68Gib/sec
LATENCY     P50    P90    P99    P999   MAX
            159us  215us  351us  495us  11263us

NODE ID: 2 | STATUS: IDLE
=========================
NAME        512K sequential r/w throughput disk test
INFO        write run
TYPE        disk
TEST ID     5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS    0
DURATION    5001ms
IOPS        1580 req/sec
THROUGHPUT  790MiB/sec
LATENCY     P50    P90     P99      P999     MAX
            671us  5887us  12287us  47103us  507903us

NAME        512K sequential r/w throughput disk test
INFO        read run
TYPE        disk
TEST ID     5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS    0
DURATION    5000ms
IOPS        3932 req/sec
THROUGHPUT  1.92GiB/sec
LATENCY     P50    P90     P99     P999    MAX
            831us  1791us  4351us  7167us  9215us

NAME        4k sequential r/w latency/iops disk test
INFO        write run
TYPE        disk
TEST ID     5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS    0
DURATION    5027ms
IOPS        280 req/sec
THROUGHPUT  140.1MiB/sec
LATENCY     P50    P90      P99      P999     MAX
            575us  34815us  73727us  86015us  86015us

NAME        4k sequential r/w latency/iops disk test
INFO        read run
TYPE        disk
TEST ID     5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS    0
DURATION    5000ms
IOPS        8699 req/sec
THROUGHPUT  4.248GiB/sec
LATENCY     P50    P90    P99    P999    MAX
            183us  367us  831us  2175us  278527us

NAME        8K Network Throughput Test
INFO        Test performed against node: 0
TYPE        network
TEST ID     5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS    0
DURATION    5000ms
IOPS        60027 req/sec
THROUGHPUT  3.66Gib/sec
LATENCY     P50    P90    P99    P999   MAX
            159us  223us  351us  511us  11775us

NAME        8K Network Throughput Test
INFO        Test performed against node: 1
TYPE        network
TEST ID     5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS    0
DURATION    5000ms
IOPS        63090 req/sec
THROUGHPUT  3.85Gib/sec
LATENCY     P50    P90    P99    P999   MAX
            151us  207us  319us  463us  17407us
```

> 📝 **NOTE**
>
> If self-test returns write results that are unexpectedly and significantly lower than read results, it may be because the Redpanda `rpk` client hardcodes the `DSync` option to `true`. When `DSync` is enabled, files are opened with the `O_DSYNC` flag set, and this represents the actual setting that Redpanda uses when it writes to disk.

To stop a running self-test, run the `self-test stop` command.

rpk cluster self-test stop

Example stop output:

$ rpk cluster self-test stop
All self-test jobs have been stopped

For command help, run `rpk cluster self-test stop -h`. For additional command flags, see the [rpk cluster self-test stop](https://docs.redpanda.com/streaming/24.1/reference/rpk/rpk-cluster/rpk-cluster-self-test-stop/) reference.

For more details about self-test, including command flags, see [rpk cluster self-test](https://docs.redpanda.com/streaming/24.1/reference/rpk/rpk-cluster/rpk-cluster-self-test/).