rpk debug bundle

In Kubernetes, you must run the rpk debug bundle command inside a container that’s running a Redpanda broker.

Concept

The rpk debug bundle command collects environment data that can help debug and diagnose issues with a Redpanda cluster, a broker, or the machine it’s running on. It then bundles the collected data into a ZIP file, called a diagnostics bundle.

Diagnostics bundle files

The files and directories in the diagnostics bundle differ depending on the environment in which Redpanda is running:

Common files

  • Kafka metadata: Broker configs, topic configs, start/committed/end offsets, groups, group commits.

  • Controller logs: The controller logs directory, up to a limit set by the --controller-logs-size-limit flag.

  • Data directory structure: A file describing the data directory’s contents.

  • redpanda configuration: The redpanda configuration file (redpanda.yaml; SASL credentials are stripped).

  • /proc/cpuinfo: CPU information like make, core count, cache, frequency.

  • /proc/interrupts: IRQ distribution across CPU cores.

  • Resource usage data: CPU usage percentage, free memory available for the redpanda process.

  • Clock drift: The NTP clock delta (using pool.ntp.org as a reference) and round trip time.

  • Admin API calls: Cluster and broker configurations, cluster health data, and license key information.

  • Broker metrics: The broker’s Prometheus metrics, fetched through its admin API (/metrics and /public_metrics).

Bare-metal

  • Kernel: The kernel logs ring buffer (syslog) and parameters (sysctl).

  • DNS: The DNS info as reported by 'dig', using the hosts in /etc/resolv.conf.

  • Disk usage: The disk usage for the data directory, as output by 'du'.

  • redpanda logs: The node’s redpanda logs written to journald. If --logs-since or --logs-until is passed, only the logs within the resulting time frame are included.

  • Socket info: The active sockets data output by 'ss'.

  • Running process info: As reported by 'top'.

  • Virtual memory stats: As reported by 'vmstat'.

  • Network config: As reported by 'ip addr'.

  • lspci: The PCI buses and the devices connected to them.

  • dmidecode: The DMI table contents. Only included if this command is run as root.

Extra requests for partitions

You can provide a list of partitions to save additional admin API requests specifically for those partitions.

The partition flag accepts the format <namespace>/[topic]/[partitions…], where the namespace is optional. If the namespace is not provided, rpk assumes 'kafka'. For example:

Topic 'foo', partitions 1, 2 and 3:

--partition foo/1,2,3

Namespace _redpanda-internal, topic 'bar', partition 2:

--partition _redpanda-internal/bar/2
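To make the accepted format concrete, the following shell sketch splits a partition specifier into its parts and applies the default namespace. The function name split_partition_spec is hypothetical, and this is not how rpk parses the flag internally; it only mirrors the rules described above.

```shell
# Hypothetical helper: split a partition specifier into namespace, topic,
# and partition IDs, defaulting the namespace to "kafka" when it is omitted.
split_partition_spec() (
  spec=$1
  partitions=${spec##*/}   # text after the last slash, for example "1,2,3"
  rest=${spec%/*}          # text before the last slash: "foo" or "ns/foo"
  case $rest in
    */*) namespace=${rest%%/*}; topic=${rest#*/} ;;
    *)   namespace=kafka;       topic=$rest ;;
  esac
  printf 'namespace=%s topic=%s partitions=%s\n' "$namespace" "$topic" "$partitions"
)

split_partition_spec foo/1,2,3
split_partition_spec _redpanda-internal/bar/2
```

The first call reports the namespace as kafka because none was given; the second keeps the explicit _redpanda-internal namespace.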

If you have an upload URL from the Redpanda support team, provide it in the --upload-url flag to upload your diagnostics bundle to Redpanda.

Kubernetes

  • Kubernetes Resources: Kubernetes manifests for all resources in the given Kubernetes namespace (via --namespace).

  • redpanda logs: Logs of each Pod in the given Kubernetes namespace. If --logs-since is passed, only the logs within the given timeframe are included.

Usage

rpk debug bundle [flags]

Flags

Value Type Description

--controller-logs-size-limit

string

Sets the limit of the controller log size that can be stored in the bundle. Multipliers are also supported, e.g. 3MB, 1GiB (default "20MB").

-h, --help

-

Display documentation for rpk debug bundle.

-l, --label-selector

stringArray

Comma-separated label selectors to filter your resources, for example <label>=<value>,<label>=<value>. Default: [app.kubernetes.io/name=redpanda].
Kubernetes only.

--logs-since

string

Include log entries on or newer than the specified date in journalctl date format, for example YYYY-MM-DD.

--logs-size-limit

string

Read the logs until the given size is reached. Multipliers are also supported, e.g. 3MB, 1GiB (default "100MiB").

--logs-until

string

Include log entries on or older than the specified date in journalctl date format, for example YYYY-MM-DD.
Not supported in Kubernetes.

--metrics-interval

duration

The amount of time to wait before capturing the second snapshot of the metrics endpoints, for example 30s (30 seconds) or 1.5m (90 seconds). This interval is useful because some metrics are counters that need values at two points in time. Default: 12s.
Kubernetes only.

--namespace

string

The Kubernetes namespace in which the Redpanda cluster is running. Default: redpanda.
Kubernetes only.

-o, --output

string

The file path where the diagnostics bundle is written (default ./<timestamp>-bundle.zip).

-p, --partition

stringArray

Comma-separated partition IDs; when provided, rpk saves extra admin API requests for those partitions. Check help for extended usage.

--timeout

duration

The amount of time to wait for child commands to execute, for example 30s (30 seconds) or 1.5m (90 seconds). Default: 10s.

--upload-url

string

If provided, where to upload the bundle in addition to creating a copy on disk.

--config

string

Redpanda or rpk config file; default search paths are /var/lib/redpanda/.config/rpk/rpk.yaml, $PWD/redpanda.yaml, and /etc/redpanda/redpanda.yaml.

-X, --config-opt

stringArray

Override rpk configuration settings. See rpk -X or execute rpk -X help for inline detail or rpk -X list for terser detail.

--profile

string

Profile to use. See rpk profile for more details.

-v, --verbose

-

Enable verbose logging.
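The --metrics-interval description notes that counters need values at two points in time. The sketch below shows why: a single counter reading says little on its own, but two readings taken a known interval apart give a rate. The function name counter_rate and the sample values are illustrative assumptions, not output from a real bundle.

```shell
# Hypothetical helper: per-second rate of a counter, computed from two
# snapshots taken interval_seconds apart (integer arithmetic).
counter_rate() (
  first=$1 second=$2 interval_seconds=$3
  echo $(( (second - first) / interval_seconds ))
)

# Illustrative values only: the counter reads 1200 in the first snapshot
# and 1800 in the second, captured 12 seconds apart.
counter_rate 1200 1800 12   # prints 50
```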

Result

The files and directories in the diagnostics bundle differ depending on the environment in which Redpanda is running.

  • Linux

  • Kubernetes

Linux

For some data, Redpanda requires the rpk debug bundle command to be run with root privileges. The names of the files or directories that are generated only with root privileges are labeled Requires root privileges.

File or Directory Description

data-dir.txt

Metadata for the Redpanda data directory of the broker on which the rpk debug bundle command was executed.

dig.txt

The DNS information, as output by the dig command, using the hosts in the /etc/resolv.conf file.

dmidecode.txt

The contents of the DMI table (system management BIOS or SMBIOS).
Requires root privileges

du.txt

The disk usage of the data directory of the broker on which the rpk debug bundle command was executed, as output by the du command.

ip.txt

Network configuration, as output by the ip addr command.

kafka.json

Kafka metadata, such as broker configuration, topic configuration, offsets, groups, and group commits.

lspci.txt

PCI buses and the devices connected to them.

ntp.txt

The NTP clock delta (using pool.ntp.org as a reference) and round trip time of the broker on which the rpk debug bundle command was executed.

/proc

CPU details of the broker on which the rpk debug bundle command was executed.
The directory includes a cpuinfo file with CPU information such as processor model, core count, cache size, and frequency, as well as an interrupts file that contains IRQ distribution across CPU cores.

redpanda.log

The Redpanda logs written to journald. If --logs-since and/or --logs-until are passed, then only the logs within the given timeframe are included.

prometheus-metrics.txt

The local broker’s Prometheus metrics, fetched through its admin API.

redpanda.yaml

The Redpanda configuration file of the broker on which the rpk debug bundle command was executed.
Sensitive data is removed and replaced with (REDACTED).

resource-usage.json

Redpanda resource usage data, such as CPU usage and free memory available.

ss.txt

Data about active sockets, as output by the ss command.

syslog.txt

The kernel logs ring buffer, as output by the syslog command.

top.txt

Information about the running processes, as output by the top command.

vmstat.txt

Virtual memory statistics, as output by the vmstat command.

Kubernetes

Redpanda collects some data from the Kubernetes API. To communicate with the Kubernetes API, Redpanda requires a ClusterRole attached to the default ServiceAccount for the Pods. The files and directories that are generated only when the ClusterRole exists are labeled Requires ClusterRole.

File or Directory Description

/admin

Cluster and broker configurations, cluster health data, and license key information.
Requires ClusterRole.

/controller

Binary-encoded replicated logs that contain the history of configuration changes as well as internal settings.
Redpanda can replay the events that took place in the cluster to arrive at a similar state.

data-dir.txt

Metadata for the Redpanda data directory of the broker on which the rpk debug bundle command was executed.

du.txt

The disk usage of the data directory of the broker on which the rpk debug bundle command was executed, as output by the du command.

/k8s

Kubernetes manifests for all resources in the given Kubernetes namespace.
Requires ClusterRole.

kafka.json

Kafka metadata, such as broker configuration, topic configuration, offsets, groups, and group commits.

/logs

Logs of each Pod in the given Kubernetes namespace.
If --logs-since is passed, only the logs within the given timeframe are included.
Requires ClusterRole.

/metrics

Prometheus metrics from both the /metrics endpoint and the /public_metrics endpoint.
One directory for each broker’s metrics.
Requires ClusterRole.

ntp.txt

The NTP clock delta (using pool.ntp.org as a reference) and round trip time of the broker on which the rpk debug bundle command was executed.

/proc

CPU details of the broker on which the rpk debug bundle command was executed.
The directory includes a cpuinfo file with CPU information such as processor model, core count, cache size, and frequency, as well as an interrupts file that contains IRQ distribution across CPU cores.

redpanda.yaml

The Redpanda configuration file of the broker on which the rpk debug bundle command was executed.
Sensitive data is removed and replaced with (REDACTED).

resource-usage.json

Redpanda resource usage data, such as CPU usage and free memory available.
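After extracting a bundle, it can be useful to confirm that the files listed for both environments are present. The sketch below reports which of those files are missing from a directory. The function name missing_bundle_files is hypothetical, and it assumes the files sit at the top level of the extracted directory, which may not match the layout of your bundle.

```shell
# Hypothetical helper: print the name of each expected file that is missing
# from an extracted bundle directory. The list covers only files that the
# tables above describe for both Linux and Kubernetes.
# Assumption: the files are at the top level of the extracted directory.
missing_bundle_files() (
  dir=$1
  for name in data-dir.txt du.txt kafka.json ntp.txt redpanda.yaml resource-usage.json; do
    [ -e "$dir/$name" ] || echo "$name"
  done
)

# Example call (the directory name is a placeholder):
# missing_bundle_files ./extracted-bundle
```

An empty result means every listed file was found.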

Examples

Collect Redpanda logs from a specific timeframe

rpk debug bundle --logs-since "2022-02-01" --logs-size-limit 3MiB

Use a custom Kubernetes namespace

rpk debug bundle --namespace <namespace>
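Run the command inside a broker container with kubectl

Because the command must run inside a container that is running a Redpanda broker, one way to invoke it in Kubernetes is through kubectl exec. The sketch below only prints the command so that you can review it before running it. The function name bundle_in_pod and the Pod name redpanda-0 are assumptions, and the sketch assumes the Pod's default container is the Redpanda broker.

```shell
# Hypothetical helper: print a kubectl exec command that runs
# rpk debug bundle inside the given Pod. Nothing is executed here;
# copy the printed command (or remove "echo") to run it.
bundle_in_pod() (
  namespace=$1 pod=$2
  echo kubectl exec --namespace "$namespace" "$pod" -- \
    rpk debug bundle --namespace "$namespace"
)

# The namespace and Pod name are placeholders for your own values.
bundle_in_pod redpanda redpanda-0
```

The bundle is written inside the container, so you still need to copy it out of the Pod or pass --upload-url if you have an upload URL from the Redpanda support team.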