rpk debug bundle

In Kubernetes, you must run the rpk debug bundle command inside a container that’s running a Redpanda broker.

Concept

The rpk debug bundle command collects environment data that can help debug and diagnose issues with a Redpanda cluster, a broker, or the machine it’s running on. It then bundles the collected data into a ZIP file, called a diagnostics bundle.

Diagnostic bundle files

The files and directories in the diagnostics bundle differ depending on the environment in which Redpanda is running:

Common files

  • Kafka metadata: Broker configs, topic configs, start/committed/end offsets, groups, group commits.

  • Controller logs: The controller logs directory, up to the limit set by the --controller-logs-size-limit flag.

  • Data directory structure: A file describing the data directory’s contents.

  • Redpanda configuration: The Redpanda configuration file (redpanda.yaml; SASL credentials are stripped).

  • /proc/cpuinfo: CPU information like make, core count, cache, frequency.

  • /proc/interrupts: IRQ distribution across CPU cores.

  • Resource usage data: CPU usage percentage, free memory available for the redpanda process.

  • Clock drift: The NTP clock delta (using pool.ntp.org as a reference) and round trip time.

  • Admin API calls: Cluster and broker configurations, cluster health data, CPU profiles, and license key information.

  • Broker metrics: The broker’s Prometheus metrics, fetched through its admin API (/metrics and /public_metrics).

Bare-metal

  • Kernel: The kernel logs ring buffer (syslog) and parameters (sysctl).

  • DNS: The DNS info as reported by 'dig', using the hosts in /etc/resolv.conf.

  • Disk usage: The disk usage for the data directory, as output by 'du'.

  • Redpanda logs: The broker’s Redpanda logs written to journald since yesterday (00:00:00 of the previous day based on systemd.time). If --logs-since or --logs-until is passed, only the logs within the resulting time frame are included.

  • Socket info: The active sockets data output by 'ss'.

  • Running process info: As reported by 'top'.

  • Virtual memory stats: As reported by 'vmstat'.

  • Network config: As reported by 'ip addr'.

  • lspci: List the PCI buses and the devices connected to them.

  • dmidecode: The DMI table contents. Only included if this command is run as root.
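Because the dmidecode output is only included when the command runs as root, it can help to check the effective user ID before generating a bundle. The following is a minimal sketch, assuming sudo is available on the host (an assumption, not something this page states). It prints the command to run rather than executing it.

```shell
# Decide whether the bundle command needs elevated privileges so that
# dmidecode data is included. Assumes sudo is installed on the host.
if [ "$(id -u)" -eq 0 ]; then
  bundle_cmd="rpk debug bundle"
else
  bundle_cmd="sudo rpk debug bundle"
fi

# Print the command for review instead of running it.
echo "$bundle_cmd"
```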

Extra requests for partitions

You can provide a list of partitions to save additional admin API requests specifically for those partitions.

The --partition flag accepts the format <namespace>/[topic]/[partitions…] where the namespace is optional. If the namespace is not provided, rpk assumes 'kafka'. For example:

Topic 'foo', partitions 1, 2 and 3:

--partition foo/1,2,3

Namespace _redpanda-internal, topic 'bar', partition 2:

--partition _redpanda-internal/bar/2
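When the partition list comes from a script, a small helper can assemble the flag value in the format described above. This is only a sketch: partition_value is a hypothetical function name, and the flag is spelled --partition as in the flags table. The commands are printed, not executed.

```shell
# Hypothetical helper: prints a value for the --partition flag.
# Usage: partition_value <topic> <comma-separated-partitions> [namespace]
partition_value() {
  topic=$1
  parts=$2
  ns=${3:-}
  if [ -n "$ns" ]; then
    # Namespace given explicitly: namespace/topic/partitions
    printf '%s/%s/%s\n' "$ns" "$topic" "$parts"
  else
    # Namespace omitted: rpk assumes 'kafka'
    printf '%s/%s\n' "$topic" "$parts"
  fi
}

# Print the resulting commands so they can be reviewed before running.
echo "rpk debug bundle --partition $(partition_value foo 1,2,3)"
echo "rpk debug bundle --partition $(partition_value bar 2 _redpanda-internal)"
```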

If you have an upload URL from the Redpanda support team, provide it in the --upload-url flag to upload your diagnostics bundle to Redpanda.

Kubernetes

  • Kubernetes Resources: Kubernetes manifests for all resources in the given Kubernetes namespace using --namespace, or the shorthand version -n.

  • Redpanda logs: Logs of each Pod in the given Kubernetes namespace. If --logs-since is passed, only the logs within the given timeframe are included.
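In Kubernetes, the collected resources are scoped by --namespace and can be narrowed further with --label-selector, which takes comma-separated <label>=<value> pairs (see Flags). The sketch below joins several selectors into that format. join_selectors is a hypothetical helper, and the second label is an assumed example rather than a label this page documents.

```shell
# Hypothetical helper: joins its arguments with commas, the format
# the --label-selector flag expects.
join_selectors() {
  ( IFS=,; printf '%s\n' "$*" )
}

# app.kubernetes.io/name=redpanda is the documented default selector;
# app.kubernetes.io/instance=redpanda is an assumed extra label.
selectors=$(join_selectors "app.kubernetes.io/name=redpanda" "app.kubernetes.io/instance=redpanda")

# Print the command for review instead of running it.
echo "rpk debug bundle --namespace redpanda --label-selector $selectors"
```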

Usage

rpk debug bundle [flags]

Flags

Flag Type Description

--controller-logs-size-limit

string

Sets the limit of the controller log size that can be stored in the bundle. Multipliers are also supported, e.g. 3MB, 1GiB (default 20MB).

--cpu-profiler-wait

duration

Specifies the duration for collecting samples for the CPU profiler (for example, 30s, 1.5m). Must be higher than 15s (default 30s).

-h, --help

-

Display documentation for rpk debug bundle.

-l, --label-selector

stringArray

Comma-separated label selectors to filter your resources, for example <label>=<value>,<label>=<value> (default [app.kubernetes.io/name=redpanda]). Kubernetes only.

--logs-since

string

Include logs dated from specified date onward. This flag accepts a journalctl date format such as YYYY-MM-DD, yesterday, or today. Refer to the journalctl documentation for more options (default yesterday).

--logs-size-limit

string

Read the logs until the given size is reached. Multipliers are also supported, e.g. 3MB, 1GiB (default 100MiB).

--logs-until

string

Include logs older than the specified date. This flag accepts a journalctl date format such as YYYY-MM-DD, yesterday, or today. Refer to the journalctl documentation for more options (default yesterday).
Not supported in Kubernetes.

--metrics-interval

duration

The amount of time to wait before capturing the second snapshot of the metrics endpoints, for example 30s (30 seconds) or 1.5m (90 seconds). This interval is useful because some metrics are counters that need values at two points in time. Default: 12s.

--metrics-samples

int

Number of metrics samples to take (at the interval of --metrics-interval). Must be at least 2 (default 2).

-n, --namespace

string

The Kubernetes namespace in which the Redpanda cluster is running. Default: redpanda
Kubernetes only.

-o, --output

string

The file path where the debug file will be written (default ./<timestamp>-bundle.zip).

-p, --partition

stringArray

Comma-separated partition IDs; when provided, rpk saves extra admin API requests for those partitions. Check help for extended usage.

--timeout

duration

The amount of time to wait for child commands to execute, for example 30s (30 seconds) or 1.5m (90 seconds) (default 31s).

--upload-url

string

If provided, where to upload the bundle in addition to creating a copy on disk.

--config

string

Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml.

-X, --config-opt

stringArray

Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail.

--profile

string

Profile to use. See rpk profile for more details.

-v, --verbose

-

Enable verbose logging.
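Some of the flags above carry constraints: --cpu-profiler-wait must be higher than 15s and --metrics-samples must be at least 2. The sketch below checks both before a bundle is requested and estimates how long metrics sampling waits. check_bundle_flags is a hypothetical helper, values are given in whole seconds, and the estimate assumes that N samples are separated by N-1 intervals, which this page implies but does not state outright.

```shell
# Values to check; these are the documented defaults, in seconds.
CPU_PROFILER_WAIT_S=30   # --cpu-profiler-wait
METRICS_INTERVAL_S=12    # --metrics-interval
METRICS_SAMPLES=2        # --metrics-samples

# Hypothetical helper: validates the constraints and prints the
# estimated metrics sampling wait in seconds.
check_bundle_flags() {
  wait_s=$1
  interval_s=$2
  samples=$3
  if [ "$wait_s" -le 15 ]; then
    echo "cpu-profiler-wait must be higher than 15s" >&2
    return 1
  fi
  if [ "$samples" -lt 2 ]; then
    echo "metrics-samples must be at least 2" >&2
    return 1
  fi
  # Assumption: N samples are separated by N-1 intervals.
  echo $(( (samples - 1) * interval_s ))
}

extra=$(check_bundle_flags "$CPU_PROFILER_WAIT_S" "$METRICS_INTERVAL_S" "$METRICS_SAMPLES") \
  && echo "metrics sampling waits about ${extra}s"
```

With the defaults, the estimate is a single 12-second interval between the two samples.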

Result

The files and directories in the diagnostics bundle differ depending on the environment in which Redpanda is running (Linux or Kubernetes).

File or Directory Description

/admin

Cluster and broker configurations, cluster health data, and license key information.

/controller

Binary-encoded replicated logs that contain the history of configuration changes as well as internal settings.
Redpanda can replay the events that took place in the cluster to arrive at a similar state.

data-dir.txt

Metadata for the Redpanda data directory of the broker on which the rpk debug bundle command was executed.

kafka.json

Kafka metadata, such as broker configuration, topic configuration, offsets, groups, and group commits.

redpanda.log

Redpanda logs for the broker.
If --logs-since is passed, only the logs within the given timeframe are included.

/metrics

Prometheus metrics from both the /metrics endpoint and the /public_metrics endpoint.

/proc

CPU details of the broker on which the rpk debug bundle command was executed.
The directory includes a cpuinfo file with CPU information such as processor model, core count, cache size, frequency, as well as an interrupts file that contains IRQ distribution across CPU cores.

redpanda.yaml

The Redpanda configuration file of the broker on which the rpk debug bundle command was executed.
Sensitive data is removed and replaced with (REDACTED).

resource-usage.json

Redpanda resource usage data, such as CPU usage and free memory available.

/utils

Data from the node on which the broker is running. This directory includes:

  • du.txt: The disk usage of the data directory of the broker on which the rpk debug bundle command was executed, as output by the du command.

  • ntp.txt: The NTP clock delta (using pool.ntp.org as a reference) and round trip time of the broker on which the rpk debug bundle command was executed.

  • uname.txt: System information, such as the kernel version, hostname, and architecture, as output by the uname command.

  • dig.txt: The DNS resolution information for the node, as output by the dig command.

  • dmidecode.txt: System hardware information from the node, as output by the dmidecode command. Requires root privileges.

  • free.txt: The amount of free and used memory on the node, as output by the free command.

  • ip.txt: Network interface information, including IP addresses and network configuration, as output by the ip command.

  • lspci.txt: Information about PCI devices on the node, as output by the lspci command.

  • ss.txt: Active socket connections, as output by the ss command, showing network connections, listening ports, and more.

  • sysctl.txt: Kernel parameters of the system, as output by the sysctl command.

  • top.txt: The top processes by CPU and memory usage, as output by the top command.

  • vmstat.txt: Virtual memory statistics, including CPU usage, memory, and IO operations, as output by the vmstat command.
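Since the bundle is a ZIP file, its contents can be listed before it is shared. The following sketch assumes the unzip utility is installed. inspect_bundle is a hypothetical helper, and the file name in the example call is a placeholder for your own bundle path.

```shell
# Hypothetical helper: lists the files in a diagnostics bundle.
# Assumes the unzip utility is installed.
inspect_bundle() {
  bundle=$1
  if [ ! -f "$bundle" ]; then
    echo "bundle not found: $bundle" >&2
    return 1
  fi
  unzip -l "$bundle"
}

# Example call; replace the placeholder path with your bundle file.
inspect_bundle ./example-bundle.zip || echo "generate a bundle first with: rpk debug bundle"
```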

Examples

Collect Redpanda logs from a specific timeframe

rpk debug bundle --logs-since "2022-02-01" --logs-size-limit 3MiB
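To bound the log window on both sides, --logs-since can be combined with --logs-until (not supported in Kubernetes). The sketch below checks that both values use one of the formats this page lists before printing the command. is_log_date is a hypothetical helper that is stricter than journalctl, which accepts more formats, and the end date is an assumed example value.

```shell
# Hypothetical helper: accepts YYYY-MM-DD, "yesterday", or "today",
# the formats this page lists for --logs-since and --logs-until.
is_log_date() {
  case $1 in
    yesterday|today) return 0 ;;
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

since="2022-02-01"
until_date="2022-02-03"   # assumed end of the window

if is_log_date "$since" && is_log_date "$until_date"; then
  # Print the command for review instead of running it.
  echo "rpk debug bundle --logs-since $since --logs-until $until_date --logs-size-limit 3MiB"
else
  echo "unsupported date format" >&2
fi
```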

Use a custom Kubernetes namespace

rpk debug bundle --namespace <namespace>
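In Kubernetes, the command has to run inside a container that is running a Redpanda broker, which is typically done through kubectl exec. The sketch below assembles such a command. The Pod name and container name are hypothetical and depend on how the cluster was deployed; the namespace is the documented default. The command is printed for review rather than executed.

```shell
# Assumed values; adjust them to your deployment.
NAMESPACE="redpanda"   # documented default for --namespace
POD="redpanda-0"       # hypothetical Pod name
CONTAINER="redpanda"   # hypothetical container name

# kubectl exec runs the bundle command inside the broker container.
bundle_cmd="kubectl exec -n $NAMESPACE $POD -c $CONTAINER -- rpk debug bundle --namespace $NAMESPACE"

# Print the command for review instead of running it.
echo "$bundle_cmd"
```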