rpk debug bundle
In Kubernetes, you must run the rpk debug bundle command inside a container that’s running a Redpanda broker.
Concept
The rpk debug bundle command collects environment data that can help debug and diagnose issues with a Redpanda cluster, a broker, or the machine it’s running on. It then bundles the collected data into a ZIP file, called a diagnostics bundle.
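Because the diagnostics bundle is a regular ZIP file, its contents can be inspected with standard tooling before you share it. The following is a minimal sketch using only the Python standard library; the function name `list_bundle` is hypothetical and not part of rpk.

```python
import zipfile


def list_bundle(path):
    """Return (entry name, uncompressed size in bytes) for each entry in a bundle ZIP."""
    with zipfile.ZipFile(path) as bundle:
        # infolist() preserves the order in which entries were written.
        return [(info.filename, info.file_size) for info in bundle.infolist()]
```

Calling `list_bundle` with the path passed to `--output` returns one tuple per file, which makes it easy to confirm that the expected files were collected.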
Diagnostic bundle files
The files and directories in the diagnostics bundle differ depending on the environment in which Redpanda is running:
Common files
- Kafka metadata: Broker configs, topic configs, start/committed/end offsets, groups, group commits.
- Controller logs: The controller logs directory, up to a limit set by the --controller-logs-size-limit flag.
- Data directory structure: A file describing the data directory’s contents.
- redpanda configuration: The redpanda configuration file (redpanda.yaml; SASL credentials are stripped).
- /proc/cpuinfo: CPU information like make, core count, cache, frequency.
- /proc/interrupts: IRQ distribution across CPU cores.
- Resource usage data: CPU usage percentage, free memory available for the redpanda process.
- Clock drift: The NTP clock delta (using pool.ntp.org as a reference) and round trip time.
- Admin API calls: Cluster and broker configurations, cluster health data, CPU profiles, and license key information.
- Broker metrics: The broker’s Prometheus metrics, fetched through its admin API (/metrics and /public_metrics).
Bare-metal
- Kernel: The kernel logs ring buffer (syslog) and parameters (sysctl).
- DNS: The DNS info as reported by 'dig', using the hosts in /etc/resolv.conf.
- Disk usage: The disk usage for the data directory, as output by 'du'.
- redpanda logs: The node’s redpanda logs written to journald. If --logs-since or --logs-until are passed, then only the logs within the resulting time frame will be included.
- Socket info: The active sockets data output by 'ss'.
- Running process info: As reported by 'top'.
- Virtual memory stats: As reported by 'vmstat'.
- Network config: As reported by 'ip addr'.
- lspci: List the PCI buses and the devices connected to them.
- dmidecode: The DMI table contents. Only included if this command is run as root.
Extra requests for partitions
You can provide a list of partitions to save additional admin API requests specifically for those partitions.
The partition flag accepts the format <namespace>/[topic]/[partitions…], where the namespace is optional. If the namespace is not provided, rpk assumes 'kafka'. For example:
Topic 'foo', partitions 1, 2, and 3:
--partition foo/1,2,3
Namespace _redpanda-internal, topic 'bar', partition 2:
--partition _redpanda-internal/bar/2
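To make the partition format concrete, here is a small sketch that splits a flag value into namespace, topic, and partition IDs, applying the 'kafka' default described above. It is an illustration only, not rpk's actual parser: the function name `parse_partition_flag` is hypothetical, and it assumes that topic names contain no '/' and that only the two forms shown in the examples are used.

```python
def parse_partition_flag(value, default_namespace="kafka"):
    """Split 'topic/ids' or 'namespace/topic/ids' into (namespace, topic, [ids])."""
    parts = value.split("/")
    if len(parts) == 2:
        # Namespace omitted: fall back to the default, as the text describes.
        namespace = default_namespace
        topic, ids = parts
    elif len(parts) == 3:
        namespace, topic, ids = parts
    else:
        raise ValueError(f"unsupported partition value: {value!r}")
    # Partition IDs are a comma-separated list of integers.
    partitions = [int(p) for p in ids.split(",")]
    return namespace, topic, partitions
```

With this sketch, `foo/1,2,3` resolves to namespace 'kafka', topic 'foo', and partitions 1, 2, and 3, while `_redpanda-internal/bar/2` keeps its explicit namespace.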
If you have an upload URL from the Redpanda support team, provide it in the --upload-url flag to upload your diagnostics bundle to Redpanda.
Kubernetes
- Kubernetes Resources: Kubernetes manifests for all resources in the given Kubernetes namespace (via --namespace).
- redpanda logs: Logs of each Pod in the given Kubernetes namespace. If --logs-since is passed, only the logs within the given timeframe are included.
Flags
Value | Type | Description |
---|---|---|
--controller-logs-size-limit | string | Sets the limit of the controller log size that can be stored in the bundle. Multipliers are also supported, e.g. 3MB, 1GiB (default "20MB"). |
--cpu-profiler-wait | duration | Specifies the duration for collecting samples for the CPU profiler (for example, 30s, 1.5m). Must be higher than 15s (default is 30s). |
-h, --help | - | Display documentation for |
-l, --label-selector | stringArray | Comma-separated label selectors to filter your resources. e.g: <label>=<value>,<label>=<value> (k8s only) (default [app.kubernetes.io/name=redpanda]). |
--logs-since | string | Include log entries on or newer than the specified date in journalctl date format, for example YYYY-MM-DD. |
--logs-size-limit | string | Read the logs until the given size is reached. Multipliers are also supported, e.g. 3MB, 1GiB (default "100MiB"). |
--logs-until | string | Include log entries on or older than the specified date in journalctl date format, for example YYYY-MM-DD. |
--metrics-interval | duration | The amount of time to wait before capturing the second snapshot of the metrics endpoints, for example |
--namespace | string | The Kubernetes namespace in which the Redpanda cluster is running. Default: |
-o, --output | string | The file path where the debug file will be written (default ./<timestamp>-bundle.zip). |
-p, --partition | stringArray | Comma-separated partition IDs; when provided, |
--timeout | duration | The amount of time to wait for child commands to execute, for example |
--upload-url | string | If provided, where to upload the bundle in addition to creating a copy on disk. |
--config | string | Redpanda or |
-X, --config-opt | stringArray | Override |
--profile | string | Profile to use. See |
-v, --verbose | - | Enable verbose logging. |
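When the command is run from automation, it can help to assemble the argument list programmatically. The following sketch builds an argument vector from a few of the flags in the table above. The helper name `bundle_command` and its parameters are hypothetical, and repeating --partition once per value is an assumption based on the flag's stringArray type.

```python
def bundle_command(output=None, logs_since=None, logs_until=None,
                   partitions=(), upload_url=None):
    """Build the argv list for 'rpk debug bundle' from optional settings."""
    argv = ["rpk", "debug", "bundle"]
    if output:
        argv += ["--output", output]
    if logs_since:
        argv += ["--logs-since", logs_since]
    if logs_until:
        argv += ["--logs-until", logs_until]
    for partition in partitions:
        # Assumption: a stringArray flag may be repeated, once per value.
        argv += ["--partition", partition]
    if upload_url:
        argv += ["--upload-url", upload_url]
    return argv
```

The returned list can be passed to `subprocess.run(argv, check=True)` on a host where rpk is installed. Building a list rather than a shell string avoids quoting problems with values such as `foo/1,2,3`.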
Result
The files and directories in the diagnostics bundle differ depending on the environment in which Redpanda is running.
- Linux
- Kubernetes
For some data, Redpanda requires the rpk debug bundle command to be run with root privileges.
The names of the files or directories that are generated only with root privileges are labeled Requires root privileges.
File or Directory | Description |
---|---|
| | Metadata for the Redpanda data directory of the broker on which the |
| | The DNS information, as output by the |
| | The contents of the DMI table (system management BIOS or SMBIOS). |
| | The disk usage of the data directory of the broker on which the |
| | Network configuration, as output by the |
| | Kafka metadata, such as broker configuration, topic configuration, offsets, groups, and group commits. |
| | PCI buses and the devices connected to them. |
| | The NTP clock delta (using |
| | CPU details of the broker on which the |
| | The Redpanda logs written to journald. If |
| | The local broker’s Prometheus metrics, fetched through its admin API. |
| | The Redpanda configuration file of the broker on which the |
| | Redpanda resource usage data, such as CPU usage and free memory available. |
| | Data about active sockets, as output by the |
| | The kernel logs ring buffer, as output by the |
| | Information about the running processes, as output by the Check system processes. |
Redpanda collects some data from the Kubernetes API. To communicate with the Kubernetes API, Redpanda requires a ClusterRole attached to the default ServiceAccount for the Pods. The files and directories that are generated only when the ClusterRole exists are labeled Requires ClusterRole.
File or Directory | Description |
---|---|
| | Cluster and broker configurations, cluster health data, and license key information. |
| | Binary-encoded replicated logs that contain the history of configuration changes as well as internal settings. |
| | Metadata for the Redpanda data directory of the broker on which the |
| | The disk usage of the data directory of the broker on which the |
| | Kubernetes manifests for all resources in the given Kubernetes namespace. |
| | Kafka metadata, such as broker configuration, topic configuration, offsets, groups, and group commits. |
| | Logs of each Pod in the given Kubernetes namespace. |
| | Prometheus metrics from both the |
| | The NTP clock delta (using |
| | CPU details of the broker on which the |
| | The Redpanda configuration file of the broker on which the |
| | Redpanda resource usage data, such as CPU usage and free memory available. |