This is documentation for Self-Managed v23.3, which is no longer supported. To view the latest available version of the docs, see v24.3.

Recovery Mode in Kubernetes

Recovery mode allows you to repair and restore a failed cluster that cannot start normally due to issues such as system crashes or out-of-memory (OOM) errors. In recovery mode, Redpanda limits functionality to cluster configuration changes and other manual administrative actions so that you can repair the cluster.

Enabled functionality

In recovery mode, Redpanda enables the following functionality so that you can repair the cluster:

Kafka API
- Modify topic properties
- Delete topics
- Add and remove access control lists (ACLs)
- Edit consumer group metadata

Admin API
- Edit cluster configuration properties
- Add and remove users
- Add new brokers to the cluster
- Delete WASM transforms

Disabled functionality

In recovery mode, Redpanda disables the following functionality to provide a more stable environment for troubleshooting issues and restoring the cluster to a usable state.

The following APIs are disabled because some connections, especially malicious ones, can disrupt availability for all users, including admin users:
- Kafka API (fetch and produce requests)
- HTTP Proxy
- Schema Registry

The following node-wide and cluster-wide processes are disabled as they may disrupt recovery operations:
- Partition and leader balancers
- Tiered Storage housekeeping
- Tiered Storage cache management
- Compaction

Redpanda does not load user-managed partitions on disk to prevent triggering partition leadership elections and replication that may occur on startup.

Prerequisites

You must have the following:
- A running Redpanda deployment on a Kubernetes cluster.
- If you are using the Redpanda Helm chart, you need permission to upgrade the Helm release in the namespace where it's deployed.
- If you are using the Redpanda Operator, you need access to the Redpanda resource manifest.

Start Redpanda in recovery mode

A broker can only enter recovery mode as it starts up, and not while it is already running. You first set the broker configuration property to enable recovery mode, and then do a broker restart. When you enable recovery mode in the Redpanda Helm chart or Redpanda resource, the Helm chart triggers a restart automatically.

Enable recovery mode:

Helm + Operator

redpanda-cluster.yaml

    apiVersion: cluster.redpanda.com/v1alpha1
    kind: Redpanda
    metadata:
      name: redpanda
    spec:
      chartRef: {}
      clusterSpec:
        config:
          node:
            recovery_mode_enabled: true

    kubectl apply -f redpanda-cluster.yaml --namespace <namespace>

Helm

With --values:

recovery-mode.yaml

    config:
      node:
        recovery_mode_enabled: true

    helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
      --values recovery-mode.yaml --reuse-values

With --set:

    helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
      --set config.node.recovery_mode_enabled=true

Check whether the cluster has entered recovery mode:

    kubectl --namespace <namespace> exec -i -t <pod-name> -c redpanda -- \
      rpk cluster health

You should see a list of brokers that are in recovery mode. For example:

    CLUSTER HEALTH OVERVIEW
    =======================
    Healthy:                          true
    Unhealthy reasons:                []
    Controller ID:                    0
    All nodes:                        [0 1 2]
    Nodes down:                       []
    Nodes in recovery mode:           [0 1 2]
    Leaderless partitions (0):        []
    Under-replicated partitions (0):  []

In recovery mode, all private Redpanda topics such as __consumer_offsets are accessible. Data in user-created topics is not available, but you can still manage the metadata for these topics.
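
If you want to script the health check rather than read the output by eye, the following is a minimal sketch. It assumes the output contains a "Nodes in recovery mode" line in the form shown in the example above; the function name recovery_nodes is hypothetical and is not part of rpk.

```shell
# Hypothetical helper (not part of rpk): read `rpk cluster health` output on
# stdin and print the broker IDs from the "Nodes in recovery mode" line.
# Prints an empty line if the list is empty, and nothing if the line is absent.
recovery_nodes() {
  sed -n 's/^[[:space:]]*Nodes in recovery mode:[[:space:]]*\[\(.*\)\][[:space:]]*$/\1/p'
}

# Sample input taken from the example output in this section.
sample='CLUSTER HEALTH OVERVIEW
=======================
Healthy:                          true
Controller ID:                    0
All nodes:                        [0 1 2]
Nodes down:                       []
Nodes in recovery mode:           [0 1 2]'

# Extract the list of brokers in recovery mode from the sample.
printf '%s\n' "$sample" | recovery_nodes
```

Against a live cluster, you could pipe the output of the kubectl exec command shown above into recovery_nodes. Omitting the -i -t flags when piping avoids terminal control characters in the captured output.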

Exit recovery mode

Exit recovery mode by disabling it on all brokers:

Helm + Operator

redpanda-cluster.yaml

    apiVersion: cluster.redpanda.com/v1alpha1
    kind: Redpanda
    metadata:
      name: redpanda
    spec:
      chartRef: {}
      clusterSpec:
        config:
          node:
            recovery_mode_enabled: false

    kubectl apply -f redpanda-cluster.yaml --namespace <namespace>

Helm

With --values:

recovery-mode.yaml

    config:
      node:
        recovery_mode_enabled: false

    helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
      --values recovery-mode.yaml --reuse-values

With --set:

    helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
      --set config.node.recovery_mode_enabled=false

Disable partitions

Problems that prevent normal cluster startup may be isolated to certain partitions or topics. You can use rpk or the Admin API to disable these partitions at the topic level, or individual partition level. A disabled partition or topic returns a Replica Not Available error code for Kafka API requests.

To disable a partition, you need a healthy controller in the cluster, so you must start the cluster in recovery mode if a problematic partition is affecting cluster startup. If you disable a partition while in recovery mode, starting Redpanda again in non-recovery mode leaves the partition in a deactivated state. You must explicitly re-enable the partition.

You can also disable a partition outside of recovery mode, if the issue is localized to the partition and does not interfere with cluster startup.

The following examples show you how to use the Admin API to enable or disable partitions. The examples are based on the assumption that the Admin API port is 9644. For help connecting to the Admin API, see Connect to Redpanda in Kubernetes.

Use kafka as the partition-namespace when making API calls to manage partitions in user topics.
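
To make the URL layout used by the Admin API examples in this section concrete, here is a small sketch that assembles the endpoint for a topic or a single partition. The helper name partition_url and the topic name orders are hypothetical, chosen only for illustration; the path layout and port 9644 are taken from the examples in this section.

```shell
# Hypothetical helper: build the Admin API URL for a topic or one partition.
# Arguments: <base-url> <partition-namespace> <topic-name> [partition-id]
partition_url() {
  base=$1 ns=$2 topic=$3 partition=${4:-}
  url="$base/cluster/partitions/$ns/$topic"
  # Append the partition ID only when one was given.
  if [ -n "$partition" ]; then
    url="$url/$partition"
  fi
  printf '%s\n' "$url"
}

# Partition 0 of a user topic named "orders" (example name), kafka namespace.
partition_url http://localhost:9644 kafka orders 0

# All partitions of the same topic.
partition_url http://localhost:9644 kafka orders
```

The printed URL can then be passed to the curl commands in this section, for example as "$(partition_url http://localhost:9644 kafka orders 0)", assuming the Admin API is reachable at that address.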

Disable a specific partition of a topic

rpk:

    rpk cluster partitions disable <topic-name> --partitions <comma-delimited-partition-id>

curl:

    curl -X POST -d '{"disabled": true}' http://localhost:9644/cluster/partitions/<partition-namespace>/<topic-name>/<partition-id>

Enable a specific partition of a topic

rpk:

    rpk cluster partitions enable <topic-name> --partitions <comma-delimited-partition-id>

curl:

    curl -X POST -d '{"disabled": false}' http://localhost:9644/cluster/partitions/<partition-namespace>/<topic-name>/<partition-id>

Disable all partitions of a specific topic

rpk:

    rpk cluster partitions disable <topic-name> --all

curl:

    curl -X POST -d '{"disabled": true}' http://localhost:9644/cluster/partitions/<partition-namespace>/<topic-name>

Enable all partitions of a specific topic

rpk:

    rpk cluster partitions enable <topic-name> --all

curl:

    curl -X POST -d '{"disabled": false}' http://localhost:9644/cluster/partitions/<partition-namespace>/<topic-name>

List all disabled partitions

rpk:

    rpk cluster partitions list --all --disabled-only

curl (the URL is quoted so that the shell does not interpret the ? character):

    curl "http://localhost:9644/cluster/partitions?disabled=true"

List all disabled partitions of a specific topic

rpk:

    rpk cluster partitions list <topic-names> --disabled-only

curl:

    curl "http://localhost:9644/cluster/partitions/<partition-namespace>/<topic-name>?disabled=true"

Suggested reading

- Admin API
- Perform a Rolling Restart