Generate a Debug Bundle in Kubernetes

Use rpk or Redpanda Console to generate a debug bundle. You can use the bundle to diagnose issues yourself, or send it to the Redpanda support team to help resolve your issue.

Generate a debug bundle with rpk

To generate a debug bundle with rpk, you can run the rpk debug bundle command on each broker in the cluster.

  1. Create a ClusterRole to allow Redpanda to collect information from the Kubernetes API:

    • Helm + Operator

    • Helm

    redpanda-cluster.yaml
    apiVersion: cluster.redpanda.com/v1alpha2
    kind: Redpanda
    metadata:
      name: redpanda
    spec:
      chartRef: {}
      clusterSpec:
        serviceAccount:
          create: true
        rbac:
          enabled: true
    kubectl apply -f redpanda-cluster.yaml --namespace <namespace>
    You must deploy the Redpanda Operator with the --set rbac.createRPKBundleCRs=true flag to give it the required ClusterRoles.
    • --values

    • --set

    serviceaccount.yaml
    serviceAccount:
      create: true
    rbac:
      enabled: true
    helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
      --values serviceaccount.yaml --reuse-values
    helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
      --set serviceAccount.create=true \
      --set rbac.enabled=true

    If you aren’t using the Helm chart, you can manually bind the built-in view ClusterRole to the default ServiceAccount in the namespace where Redpanda is deployed:

    kubectl create clusterrolebinding redpanda --clusterrole=view --serviceaccount=<namespace>:default
  2. Execute the rpk debug bundle command on a broker:

    kubectl exec -it --namespace <namespace> redpanda-0 -c redpanda -- rpk debug bundle --namespace <namespace>

    If you have an upload URL from the Redpanda support team, provide it in the --upload-url flag to upload your debug bundle to Redpanda.

    kubectl exec -it --namespace <namespace> redpanda-0 -c redpanda -- rpk debug bundle \
      --upload-url <url> \
      --namespace <namespace>

    Example output:

    Creating bundle file...
    
    Debug bundle saved to "/var/lib/redpanda/1675440652-bundle.zip"
  3. On your host machine, make a directory in which to save the debug bundle:

    mkdir debug-bundle
  4. Copy the debug bundle ZIP file to the debug-bundle directory on your host machine.

    Replace <bundle-name> with the name of your ZIP file, without the .zip extension.

    kubectl cp <namespace>/redpanda-0:/var/lib/redpanda/<bundle-name>.zip debug-bundle/<bundle-name>.zip
  5. Unzip the file on your host machine.

    cd debug-bundle
    unzip <bundle-name>.zip
  6. Remove the debug bundle from the Redpanda broker:

    kubectl exec redpanda-0 -c redpanda --namespace <namespace> -- rm /var/lib/redpanda/<bundle-name>.zip
    To avoid manually deleting debug bundles, you can configure the debug_bundle_auto_removal_seconds property to automatically remove them after a period of time. See Automatically remove debug bundles.

When you’ve finished troubleshooting, remove the debug bundle from your host machine:

rm -r debug-bundle

For a description of the files and directories, see Contents of the debug bundle.
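
Because each bundle is generated on a single broker, collecting bundles from the whole cluster means repeating step 2 for every Pod. The following is a minimal sketch that only prints the commands so that you can review them before running anything. It assumes the Pods are named redpanda-0, redpanda-1, and so on, as in the steps above. The function name print_bundle_commands and its arguments are illustrative and not part of rpk or kubectl. The -it flags are left out because a loop does not need an interactive terminal.

```shell
# Print (do not run) one `rpk debug bundle` command per broker Pod.
# $1: Kubernetes namespace, $2: number of brokers.
# Assumption: Pods are named redpanda-0 .. redpanda-(N-1).
print_bundle_commands() {
  namespace="$1"
  brokers="$2"
  i=0
  while [ "$i" -lt "$brokers" ]; do
    echo "kubectl exec --namespace $namespace redpanda-$i -c redpanda -- rpk debug bundle --namespace $namespace"
    i=$((i + 1))
  done
}

# Example: show the commands for a three-broker cluster.
print_bundle_commands "<namespace>" 3
```

After reviewing the output, you could pipe it to sh to run the commands, then copy each bundle off its broker as described in steps 3 to 6.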

Generate a debug bundle with Redpanda Console

Automatically remove debug bundles

To avoid manually deleting debug bundles, you can configure the debug_bundle_auto_removal_seconds property. This cluster configuration property automatically deletes debug bundles after the specified number of seconds. By default, this property is not set, meaning debug bundles are retained indefinitely.

Only one debug bundle can exist at a time. If you generate a new debug bundle, any existing bundle from a previous run will be automatically deleted.

Changes to this property take effect immediately and do not require a cluster restart.
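
The property takes its value in seconds, so retention periods expressed in days need converting first. As a small worked example, the helper below (days_to_seconds is a hypothetical name used only for illustration) does the arithmetic:

```shell
# Convert a retention period in whole days to seconds.
days_to_seconds() {
  echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 1   # prints 86400
days_to_seconds 7   # prints 604800
```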

To set this property, use the config.cluster.debug_bundle_auto_removal_seconds field:

  • Helm + Operator

  • Helm

redpanda-cluster.yaml
apiVersion: cluster.redpanda.com/v1alpha2
kind: Redpanda
metadata:
  name: redpanda
spec:
  chartRef: {}
  clusterSpec:
    config:
      cluster:
        debug_bundle_auto_removal_seconds: <seconds>

For example, to retain debug bundles for 1 day:

redpanda-cluster.yaml
apiVersion: cluster.redpanda.com/v1alpha2
kind: Redpanda
metadata:
  name: redpanda
spec:
  chartRef: {}
  clusterSpec:
    config:
      cluster:
        debug_bundle_auto_removal_seconds: 86400

Apply the changes with:

kubectl apply -f redpanda-cluster.yaml --namespace <namespace>

Update the values.yaml file or use the --set flag to specify the property:

  • --values

  • --set

values.yaml
config:
  cluster:
    debug_bundle_auto_removal_seconds: <seconds>

For example, to retain debug bundles for 1 day:

values.yaml
config:
  cluster:
    debug_bundle_auto_removal_seconds: 86400

Apply the changes with:

helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
  --values values.yaml --reuse-values
helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
  --set config.cluster.debug_bundle_auto_removal_seconds=<seconds>

For example, to retain debug bundles for 1 day:

helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
  --set config.cluster.debug_bundle_auto_removal_seconds=86400

Choose where the debug bundle is saved

The debug_bundle_storage_dir property allows you to specify a custom directory for storing debug bundles. By default, debug bundles are stored in the Redpanda data directory. Configuring a custom storage directory can help manage storage capacity and isolate debug data from operational data.

Changes to this property take effect immediately and do not require a cluster restart.

Before you change this property:

  • Ensure that your chosen directory has sufficient storage capacity to handle debug bundles.

    Factors such as the volume of logs can increase the bundle size. While it is difficult to define an exact storage requirement due to variability in bundle size, 200 MB should be sufficient for most cases.

  • Verify the directory’s permissions to ensure Redpanda can write to it. By default, Redpanda operates as the redpanda user within the redpanda group.
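
The two checks above can be sketched as small shell helpers. This is an illustration only: the function names has_free_mb and is_writable_dir are hypothetical, and the checks are meaningful only when run where the directory exists (for example, inside the broker container) and as the user that Redpanda runs as.

```shell
# Succeed if the filesystem holding directory $1 has at least $2 MB available.
has_free_mb() {
  dir="$1"
  required_mb="$2"
  # df -P prints one POSIX-format line per filesystem; -k reports 1024-byte blocks.
  # Field 4 of the second line is the available space.
  available_kb="$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')"
  [ "$available_kb" -ge $(( required_mb * 1024 )) ]
}

# Succeed if $1 is an existing directory that the current user can write to.
is_writable_dir() {
  [ -d "$1" ] && [ -w "$1" ]
}

# Example usage, with the 200 MB guideline mentioned above:
#   has_free_mb /var/log/redpanda/debug_bundles 200 && echo "enough space"
#   is_writable_dir /var/log/redpanda/debug_bundles && echo "writable"
```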

To set this property, use the config.cluster.debug_bundle_storage_dir field:

  • Helm + Operator

  • Helm

redpanda-cluster.yaml
apiVersion: cluster.redpanda.com/v1alpha2
kind: Redpanda
metadata:
  name: redpanda
spec:
  chartRef: {}
  clusterSpec:
    config:
      cluster:
        debug_bundle_storage_dir: <path-to-directory>

For example:

apiVersion: cluster.redpanda.com/v1alpha2
kind: Redpanda
metadata:
  name: redpanda
spec:
  chartRef: {}
  clusterSpec:
    config:
      cluster:
        debug_bundle_storage_dir: /var/log/redpanda/debug_bundles

Apply the changes with:

kubectl apply -f redpanda-cluster.yaml --namespace <namespace>

Update the values.yaml file or use the --set flag to specify the property:

  • --values

  • --set

config:
  cluster:
    debug_bundle_storage_dir: <path-to-directory>

For example, to store debug bundles in /var/log/redpanda/debug_bundles:

config:
  cluster:
    debug_bundle_storage_dir: /var/log/redpanda/debug_bundles

Apply the changes with:

helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
  --values values.yaml --reuse-values
helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
  --set config.cluster.debug_bundle_storage_dir=<path-to-directory>

For example:

helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
  --set config.cluster.debug_bundle_storage_dir=/var/log/redpanda/debug_bundles

Inspect the debug bundle

After downloading the debug bundle files, you can inspect the contents to debug your cluster. This section provides some useful data points to check while troubleshooting.

Most files in the debug bundle are JSON files. To make it easier to read these files, this section uses jq. To install jq, see the jq downloads page.

View the version of Redpanda on all brokers

cat admin/brokers.json | jq '.[] | .version'

Example output:

"v24.3.1"
"v24.3.1"
"v24.3.1"

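One reason to list versions is to confirm that every broker reports the same one. The sketch below reads one version per line and succeeds only if they are all identical. The function name all_versions_match is hypothetical, and the sample input repeats the example output above.

```shell
# Read one version string per line on stdin; succeed only if all lines are identical.
all_versions_match() {
  [ "$(sort -u | wc -l)" -eq 1 ]
}

# Sample input taken from the example output above.
printf '"v24.3.1"\n"v24.3.1"\n"v24.3.1"\n' | all_versions_match \
  && echo "all brokers report the same version"

# With a real bundle, the input would come from jq:
#   jq '.[] | .version' admin/brokers.json | all_versions_match
```
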
View the maintenance status of all brokers

cat admin/brokers.json | jq '.[] | .node_id, .maintenance_status'
Example output
0
{
  "draining": false,
  "finished": false,
  "errors": false,
  "partitions": 0,
  "eligible": 0,
  "transferring": 0,
  "failed": 0
}
1
{
  "draining": false,
  "finished": false,
  "errors": false,
  "partitions": 0,
  "eligible": 0,
  "transferring": 0,
  "failed": 0
}
2
{
  "draining": false,
  "finished": false,
  "errors": false,
  "partitions": 0,
  "eligible": 0,
  "transferring": 0,
  "failed": 0
}

View the cluster configuration

cat admin/cluster_config.json | jq
Example output
{
  "abort_index_segment_size": 50000,
  "abort_timed_out_transactions_interval_ms": 10000,
  "admin_api_require_auth": false,
  "aggregate_metrics": false,
  "alter_topic_cfg_timeout_ms": 5000,
  "append_chunk_size": 16384,
  "auto_create_topics_enabled": false,
  "cloud_storage_access_key": null,
  "cloud_storage_api_endpoint": null,
  "cloud_storage_api_endpoint_port": 443,
  "cloud_storage_azure_container": null,
  "cloud_storage_azure_shared_key": null,
  "cloud_storage_azure_storage_account": null,
  "cloud_storage_bucket": null,
  ...
  "target_quota_byte_rate": 2147483648,
  "tm_sync_timeout_ms": 10000,
  "topic_fds_per_partition": 5,
  "topic_memory_per_partition": 1048576,
  "topic_partitions_per_shard": 1000,
  "topic_partitions_reserve_shard0": 2,
  "transaction_coordinator_cleanup_policy": "delete",
  "transaction_coordinator_delete_retention_ms": 604800000,
  "transaction_coordinator_log_segment_size": 1073741824,
  "transactional_id_expiration_ms": 604800000,
  "tx_log_stats_interval_s": 10,
  "tx_timeout_delay_ms": 1000,
  "wait_for_leader_timeout_ms": 5000,
  "zstd_decompress_workspace_bytes": 8388608
}

Check Enterprise Edition license keys

cat admin/license.json | jq
Example output
{
  "loaded": false,
  "license": {
    "format_version": 0,
    "org": "",
    "type": "",
    "expires": 0,
    "sha256": ""
  }
}

View metadata about the Redpanda data directory

You can inspect the size of directories in the Redpanda data directory and identify anomalies using a pre-generated report. This is useful for troubleshooting issues such as imbalances in partition sizes or unexpected data growth.

To check the size of the directories and look for anomalies:

cat utils/du.txt

The du.txt file provides information about the size of each directory. Anomalies to look for include:

  • One partition of a topic being significantly larger than others.

  • Directories containing more data than expected for a specific topic or partition.

Example output
33M	/var/lib/redpanda/data/redpanda/kvstore/0_0
33M	/var/lib/redpanda/data/redpanda/kvstore
33M	/var/lib/redpanda/data/redpanda/controller/0_0
33M	/var/lib/redpanda/data/redpanda/controller
65M	/var/lib/redpanda/data/redpanda
65M	/var/lib/redpanda/data
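
To surface the largest directories quickly, you can sort the report by size. The following sketch assumes du.txt uses human-readable sizes such as 33M, as in the example output. The function name largest_dirs is hypothetical.

```shell
# Print the N largest entries (default 5) from a du-style report.
# sort -h compares human-readable sizes such as 33M or 1G
# (available in GNU coreutils and recent BSD sort).
largest_dirs() {
  sort -rh "$1" | head -n "${2:-5}"
}

# Example usage from the root of the unzipped bundle:
#   largest_dirs utils/du.txt 10
```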

To check the file permissions, file size, and last modification date of the files:

cat data-dir.txt | jq
Example output
{
  "/var/lib/redpanda/data": {
    "size": "4.096kB",
    "mode": "dgrwxrwxrwx",
    "modified": "2023-02-02 15:21:12.430878371 +0000 UTC",
    "user": "",
    "group": "redpanda"
  },
  "/var/lib/redpanda/data/config_cache.yaml": {
    "size": "340B",
    "mode": "-rw-r--r--",
    "modified": "2023-02-02 15:21:22.434878593 +0000 UTC",
    "user": "",
    "group": "redpanda"
  },
  "/var/lib/redpanda/data/pid.lock": {
    "size": "2B",
    "mode": "-rw-r--r--",
    "modified": "2023-02-02 15:21:10.502878322 +0000 UTC",
    "user": "",
    "group": "redpanda"
  },
  "/var/lib/redpanda/data/redpanda": {
    "size": "4.096kB",
    "mode": "dgrwxr-xr-x",
    "modified": "2023-02-02 15:21:10.650878326 +0000 UTC",
    "user": "",
    "group": "redpanda"
  },
  "/var/lib/redpanda/data/redpanda/controller": {
    "size": "4.096kB",
    "mode": "dgrwxr-xr-x",
    "modified": "2023-02-02 15:21:10.650878326 +0000 UTC",
    "user": "",
    "group": "redpanda"
  },
  "/var/lib/redpanda/data/redpanda/controller/0_0": {
    "size": "4.096kB",
    "mode": "dgrwxr-xr-x",
    "modified": "2023-02-02 15:21:12.346878368 +0000 UTC",
    "user": "",
    "group": "redpanda"
  },
  "/var/lib/redpanda/data/redpanda/controller/0_0/0-1-v1.log": {
    "size": "4.096kB",
    "mode": "-rw-r--r--",
    "modified": "2023-02-02 15:21:32.450878771 +0000 UTC",
    "user": "",
    "group": "redpanda"
  },
  "/var/lib/redpanda/data/redpanda/kvstore": {
    "size": "4.096kB",
    "mode": "dgrwxr-xr-x",
    "modified": "2023-02-02 15:21:10.590878324 +0000 UTC",
    "user": "",
    "group": "redpanda"
  },
  "/var/lib/redpanda/data/redpanda/kvstore/0_0": {
    "size": "4.096kB",
    "mode": "dgrwxr-xr-x",
    "modified": "2023-02-02 15:21:10.602878325 +0000 UTC",
    "user": "",
    "group": "redpanda"
  },
  "/var/lib/redpanda/data/redpanda/kvstore/0_0/0-0-v1.log": {
    "size": "8.192kB",
    "mode": "-rw-r--r--",
    "modified": "2023-02-02 15:21:32.458878772 +0000 UTC",
    "user": "",
    "group": "redpanda"
  },
  "/var/lib/redpanda/data/startup_log": {
    "size": "26B",
    "mode": "-rw-r--r--",
    "modified": "2023-02-02 15:21:10.510878323 +0000 UTC",
    "user": "",
    "group": "redpanda"
  }
}

View cluster metadata

cat kafka.json | jq '.[0]'
Example output
{
  "Name": "metadata",
  "Response": {
    "Cluster": "redpanda.14a3f9b6-1c74-4ffd-806a-4ab48db78120",
    "Controller": 0,
    "Brokers": [
      {
        "NodeID": 0,
        "Port": 9093,
        "Host": "redpanda-0.redpanda.<namespace>.svc.cluster.local.",
        "Rack": null
      },
      {
        "NodeID": 1,
        "Port": 9093,
        "Host": "redpanda-1.redpanda.<namespace>.svc.cluster.local.",
        "Rack": null
      },
      {
        "NodeID": 2,
        "Port": 9093,
        "Host": "redpanda-2.redpanda.<namespace>.svc.cluster.local.",
        "Rack": null
      }
    ],
    "Topics": {}
  },
  "Error": null
}

View topic and broker configurations

cat kafka.json | jq '.[1:]'
Example output
[
  {
    "Name": "topic_configs",
    "Response": null,
    "Error": null
  },
  {
    "Name": "broker_configs",
    "Response": [
      {
        "Name": "0",
        "Configs": [
          {
            "Key": "listeners",
            "Value": "internal://0.0.0.0:9093,default://0.0.0.0:9094",
            "Sensitive": false,
            "Source": "STATIC_BROKER_CONFIG",
            "Synonyms": [
              {
                "Key": "kafka_api",
                "Value": "internal://0.0.0.0:9093,default://0.0.0.0:9094",
                "Source": "STATIC_BROKER_CONFIG"
              },
              {
                "Key": "kafka_api",
                "Value": "plain://127.0.0.1:9092",
                "Source": "DEFAULT_CONFIG"
              }
            ]
          },
          {
            "Key": "advertised.listeners",
            "Value": "internal://redpanda-0.redpanda.<namespace>.svc.cluster.local.:9093,default://203.0.113.3:31092",
            "Sensitive": false,
            "Source": "STATIC_BROKER_CONFIG",
            "Synonyms": [
              {
                "Key": "advertised_kafka_api",
                "Value": "internal://redpanda-0.redpanda.<namespace>.svc.cluster.local.:9093,default://203.0.113.3:31092",
                "Source": "STATIC_BROKER_CONFIG"
              },
              {
                "Key": "advertised_kafka_api",
                "Value": "",
                "Source": "DEFAULT_CONFIG"
              }
            ]
          },
          {
            "Key": "log.segment.bytes",
            "Value": "134217728",
            "Sensitive": false,
            "Source": "DEFAULT_CONFIG",
            "Synonyms": [
              {
                "Key": "log_segment_size",
                "Value": "134217728",
                "Source": "DEFAULT_CONFIG"
              }
            ]
          },
          {
            "Key": "log.retention.bytes",
            "Value": "18446744073709551615",
            "Sensitive": false,
            "Source": "DEFAULT_CONFIG",
            "Synonyms": [
              {
                "Key": "retention_bytes",
                "Value": "18446744073709551615",
                "Source": "DEFAULT_CONFIG"
              }
            ]
          },
          {
            "Key": "log.retention.ms",
            "Value": "604800000",
            "Sensitive": false,
            "Source": "DEFAULT_CONFIG",
            "Synonyms": [
              {
                "Key": "delete_retention_ms",
                "Value": "604800000",
                "Source": "DEFAULT_CONFIG"
              }
            ]
          },
          {
            "Key": "num.partitions",
            "Value": "1",
            "Sensitive": false,
            "Source": "DEFAULT_CONFIG",
            "Synonyms": [
              {
                "Key": "default_topic_partitions",
                "Value": "1",
                "Source": "DEFAULT_CONFIG"
              }
            ]
          },
          {
            "Key": "default.replication.factor",
            "Value": "1",
            "Sensitive": false,
            "Source": "DEFAULT_CONFIG",
            "Synonyms": [
              {
                "Key": "default_topic_replications",
                "Value": "1",
                "Source": "DEFAULT_CONFIG"
              }
            ]
          },
          {
            "Key": "log.dirs",
            "Value": "/var/lib/redpanda/data",
            "Sensitive": false,
            "Source": "STATIC_BROKER_CONFIG",
            "Synonyms": [
              {
                "Key": "data_directory",
                "Value": "/var/lib/redpanda/data",
                "Source": "STATIC_BROKER_CONFIG"
              }
            ]
          },
          {
            "Key": "auto.create.topics.enable",
            "Value": "false",
            "Sensitive": false,
            "Source": "DEFAULT_CONFIG",
            "Synonyms": [
              {
                "Key": "auto_create_topics_enabled",
                "Value": "false",
                "Source": "DEFAULT_CONFIG"
              }
            ]
          }
        ],
        "Err": null
      },
      {
        "Name": "1",
        "Configs": [
          ...
        ]
        ...
      },
      {
        "Name": "2",
        "Configs": [
          ...
        ]
        ...
      }
    ],
    "Error": null
  },
  {
    "Name": "log_start_offsets",
    "Response": {},
    "Error": null
  },
  {
    "Name": "last_stable_offsets",
    "Response": {},
    "Error": null
  },
  {
    "Name": "high_watermarks",
    "Response": {},
    "Error": null
  },
  {
    "Name": "groups",
    "Response": null,
    "Error": null
  }
]

View the Redpanda logs

cat logs/redpanda-0.txt # logs/redpanda-1.txt logs/redpanda-2.txt

Check for clock drift

cat utils/ntp.txt | jq

Use the output to check for clock drift. For details about how NTP works, see the NTP documentation.

Example output
{
  "host": "pool.ntp.org",
  "roundTripTimeMs": 3,
  "remoteTimeUTC": "2023-02-02T15:22:51.763175934Z",
  "localTimeUTC": "2023-02-02T15:22:51.698044603Z",
  "precisionMs": 0,
  "offset": -458273
}
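
One way to read this output is to compare the two timestamps directly. The sketch below does that and makes no assumption about the units of the offset field. It requires GNU date (for -d and the %N nanoseconds format), and the function name clock_delta_ms is hypothetical.

```shell
# Print remote time minus local time, in whole milliseconds.
# $1: remoteTimeUTC, $2: localTimeUTC (both as they appear in ntp.txt).
clock_delta_ms() {
  remote_ns="$(date -u -d "$1" +%s%N)"
  local_ns="$(date -u -d "$2" +%s%N)"
  echo $(( (remote_ns - local_ns) / 1000000 ))
}

# Timestamps from the example output above; prints 65.
clock_delta_ms "2023-02-02T15:22:51.763175934Z" "2023-02-02T15:22:51.698044603Z"
```

A positive result means the local clock is behind the reference server. Keep the round trip time in mind when judging small differences.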

View Kubernetes manifests

tree k8s
Example output
k8s
├── configmaps.json
├── endpoints.json
├── events.json
├── limitranges.json
├── persistentvolumeclaims.json
├── pods.json
├── replicationcontrollers.json
├── resourcequotas.json
├── serviceaccounts.json
└── services.json

Contents of the debug bundle

The debug bundle includes the following files and directories:

Redpanda collects some data from the Kubernetes API. To communicate with the Kubernetes API, Redpanda requires a ClusterRole attached to the default ServiceAccount for the Pods. The files and directories that are generated only when the ClusterRole exists are labeled Requires ClusterRole.
File or Directory Description

/admin

Cluster and broker configurations, cluster health data, and license key information.
Requires ClusterRole.

/controller

Binary-encoded replicated logs that contain the history of configuration changes as well as internal settings.
Redpanda can replay the events that took place in the cluster to arrive at a similar state.

data-dir.txt

Metadata for the Redpanda data directory of the broker on which the rpk debug bundle command was executed.

/k8s

Kubernetes manifests for all resources in the given Kubernetes namespace.
Requires ClusterRole.

kafka.json

Kafka metadata, such as broker configuration, topic configuration, offsets, groups, and group commits.

/logs

Logs from the Pods that run Redpanda in the given Kubernetes namespace.
If --logs-since is passed, only the logs within the given timeframe are included.
Requires ClusterRole.

/metrics

Prometheus metrics from both the /metrics endpoint and the /public_metrics endpoint.
Requires ClusterRole.

/proc

CPU details of the broker on which the rpk debug bundle command was executed.
The directory includes a cpuinfo file with CPU information such as processor model, core count, cache size, and frequency, as well as an interrupts file that contains the IRQ distribution across CPU cores.

redpanda.yaml

The Redpanda configuration file of the broker on which the rpk debug bundle command was executed.
Sensitive data is removed and replaced with (REDACTED).

resource-usage.json

Redpanda resource usage data, such as CPU usage and free memory available.

/utils

Data from the node on which the broker is running. This directory includes:

  • du.txt: The disk usage of the data directory of the broker on which the rpk debug bundle command was executed, as output by the du command.

  • ntp.txt: The NTP clock delta (using pool.ntp.org as a reference) and round trip time of the broker on which the rpk debug bundle command was executed.

  • uname.txt: System information, such as the kernel version, hostname, and architecture, as output by the uname command.

Suggested reading