Configure Failover

Failover is the process of converting shadow topics or an entire shadow cluster from read-only replicas to fully writable resources, and ceasing replication from the source cluster. You can fail over individual topics for selective workload migration or fail over the entire cluster for comprehensive disaster recovery. This critical operation transforms your shadow resources into operational production assets, allowing you to redirect application traffic when the source cluster becomes unavailable.

You can fail over a shadow link using the Redpanda Cloud UI, rpk, or the Data Plane API.

See Failover Runbook for immediate step-by-step disaster procedures.

Shadowing is supported on BYOC and Dedicated clusters running Redpanda version 25.3 and later.

Failover behavior

When you initiate failover, Redpanda performs the following operations:

- Stops replication: Halts all data fetching from the source cluster for the specified topics or the entire shadow link.
- Fails over topics: Converts read-only shadow topics into regular, writable topics.
- Updates topic state: Changes topic status from ACTIVE to FAILING_OVER, then to FAILED_OVER.

Topic failover is irreversible. Once failed over, topics cannot return to shadow mode, and automatic fallback to the original source cluster is not supported.

To avoid a split-brain scenario after failover, ensure that all clients are reconfigured to point to the shadow cluster before resuming write activity.
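
One way to check that clients have been repointed before writes resume is to search client configuration files for the source cluster's bootstrap address. The following is a minimal sketch, assuming that client settings are stored in files under a single directory. The function name find_stale_clients, the example address, and the example path are hypothetical and are not part of Redpanda tooling.

```shell
# find_stale_clients SOURCE_ADDR CONFIG_DIR
# Hypothetical helper: lists files under CONFIG_DIR that still mention the
# source cluster's bootstrap address. Exits 0 if any are found, 1 if none.
find_stale_clients() {
  source_addr="$1"
  config_dir="$2"
  # -r: recurse; -l: print matching file names only; -F: literal string match.
  grep -rlF -- "$source_addr" "$config_dir"
}

# Example (hypothetical address and path):
#   if find_stale_clients "source.example.com:9092" /etc/myapp; then
#     echo "These files still point to the source cluster" >&2
#   fi
```

A check like this only covers configuration that lives in files. Clients that receive their bootstrap address from environment variables or a service registry need a separate review.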
Failover commands

You can perform failover at different levels of granularity to match your disaster recovery needs.

Get Data Plane API URL

If you use the Data Plane API, run the following command to get the Data Plane API URL of the shadow cluster:

    export DATAPLANE_API_URL=`curl https://api.cloud.redpanda.com/v1/clusters/<shadow-cluster-id> \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" | jq -r .cluster.dataplane_api`

Individual topic failover

To fail over a specific shadow topic while leaving the other topics in the shadow link replicating, use one of the following methods.

Cloud UI:

1. On the Shadow Link page, select your shadow link.
2. For each topic that you want to fail over, click the corresponding Failover button.
3. Click to confirm the failover action.

The failover process promotes the selected topics to writable status.

rpk:

    rpk shadow failover <shadow-link-name> --topic <topic-name>

For detailed command options, see rpk shadow failover.

Data Plane API:

Send a POST /shadowlink/{shadow_link_name}/failover request to the Data Plane API. Specify the name of the shadow topic in the request body:

    curl -X POST "$DATAPLANE_API_URL/v1/shadowlink/<shadow-link-name>/failover" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer ${RP_CLOUD_TOKEN}" \
      -d '{ "shadowTopicName": "<shadow-topic-name>" }'

Use this approach when you need to selectively fail over specific workloads or when testing failover procedures.

Complete shadow link failover (cluster failover)

To fail over all shadow topics associated with the shadow link simultaneously, use one of the following methods.

Cloud UI:

1. On the Shadow Link page, select your shadow link.
2. Click Failover All Topics.
3. Click to confirm the failover action.

The failover process promotes all topics to writable status.

rpk:

    rpk shadow failover <shadow-link-name> --all

Data Plane API:

Send a POST /shadowlink/{shadow_link_name}/failover request to the Data Plane API.
If you do not specify a shadow topic in the request body, this command requests a failover of all shadow topics associated with the shadow link:

    curl -X POST "$DATAPLANE_API_URL/v1/shadowlink/<shadow-link-name>/failover" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer ${RP_CLOUD_TOKEN}"

Use this approach during a complete regional disaster when you need to activate the entire shadow cluster as your new production environment.

Force delete shadow link (emergency failover)

Cloud UI:

All failover actions in the Cloud UI include force delete functionality by default. When you fail over a shadow link, all topics are immediately promoted to writable status.

rpk:

In Redpanda Cloud, rpk shadow delete force deletes the shadow link by default:

    rpk shadow delete <shadow-link-name>

Control Plane API:

Use the Control Plane API to force delete a shadow link:

    curl -X DELETE 'https://api.redpanda.com/v1/shadow-links/<shadow-link-id>' \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer ${RP_CLOUD_TOKEN}"

Force deleting a shadow link is irreversible and immediately fails over all topics in the link, bypassing the normal failover state transitions. Use this action only as a last resort when topics are stuck in transitional states and you need immediate access to all replicated data.

Failover states

Shadow link states

The shadow link itself has a simple state model:

- ACTIVE: The shadow link is operating normally and replicating data.
- PAUSED: Shadow link replication is temporarily halted by user action.

Shadow links do not have dedicated failover states. Instead, the link’s operational status is determined by the collective state of its shadow topics.
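
Because link-level status is derived from topic states, a per-state tally can give a quick view of where a link stands. The sketch below is a hypothetical helper, not a Redpanda command. It assumes you have already extracted one topic state name per line. How to extract them, for example from rpk shadow status output, depends on the output format and is not shown.

```shell
# summarize_topic_states: reads one state name per line on stdin and prints
# "STATE COUNT" for each distinct state, sorted by state name.
summarize_topic_states() {
  sort | uniq -c | awk '{ print $2, $1 }'
}

# Example:
#   printf 'ACTIVE\nFAILED_OVER\nFAILED_OVER\n' | summarize_topic_states
```

A tally that shows only FAILED_OVER suggests that failover has finished for every topic in the link. Any other state in the tally means that at least one topic still needs attention.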
Shadow topic states

Individual shadow topics progress through specific states during failover:

- ACTIVE: Normal replication state before failover.
- FAULTED: The shadow topic has encountered an error and is not replicating.
- FAILING_OVER: Failover initiated, replication stopping.
- FAILED_OVER: Failover completed successfully, topic fully writable.
- PAUSED: Replication temporarily halted by user action.

Monitor failover progress

To monitor failover progress, use one of the following methods.

Cloud UI:

Track the progress of failover operations from the Shadow Link page in the Cloud UI.

rpk:

    rpk shadow status <shadow-link-name>

The output shows individual topic states and any issues encountered during the failover process. For detailed command options, see rpk shadow status.

Data Plane API:

    curl "$DATAPLANE_API_URL/v1/shadowlinks/<shadow-link-name>" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer ${RP_CLOUD_TOKEN}"

Task states during monitoring:

- ACTIVE: The task is operating normally and replicating data.
- FAULTED: The task encountered an error and requires attention.
- NOT_RUNNING: The task is not currently executing.
- LINK_UNAVAILABLE: The task cannot communicate with the source cluster.

For detailed information about shadow link tasks and their roles, see Shadow link tasks.

Post-failover cluster behavior

After successful failover, your shadow cluster exhibits the following characteristics:

Topic accessibility:

- Failed-over topics become fully writable and readable.
- Applications can produce and consume messages normally.
- All Kafka APIs are available for failed-over topics.
- Original offsets and timestamps are preserved.

Shadow link status:

- The shadow link remains but stops replicating data.
- Link status shows topics in FAILED_OVER state.
- You can safely delete the shadow link after successful failover.

Operational limitations:

- No automatic fallback mechanism to the original source cluster.
- Data transforms remain disabled until you manually re-enable them.
- Audit log history from the source cluster is not available (new audit logs begin immediately).

Failover considerations and limitations

Before implementing failover procedures, understand these key considerations that affect your disaster recovery strategy and operational planning.

Data consistency:

- Some data loss may occur due to replication lag at the time of failover.
- Consumer group offsets are preserved, allowing applications to resume from their last committed position.
- In-flight transactions at the source cluster are not replicated and will be lost.

Recovery point objective (RPO): The amount of potential data loss depends on the replication lag when disaster occurs. Monitor lag metrics to understand your effective RPO.

Network partitions: If the source cluster becomes accessible again after failover, do not attempt to write to both clusters simultaneously. This creates a split-brain scenario with potential data inconsistencies, since metadata starts to diverge.

Testing requirements: Regularly test failover procedures in non-production environments to validate your disaster recovery processes and measure your recovery time objective (RTO).

Next steps

After completing failover:

- Update your application connection strings to point to the shadow cluster.
- Verify that applications can produce and consume messages normally.
- Consider deleting the shadow link if failover was successful and permanent.

For emergency situations, see Failover Runbook.
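
Before deleting a shadow link after a normal failover, it can help to confirm that every topic has reached FAILED_OVER. The following sketch is a hypothetical guard, not a Redpanda command. It assumes one topic state per line on stdin. Extracting those states from the status output is left out because the output format is not described here.

```shell
# all_failed_over: reads one topic state per line on stdin.
# Succeeds only if at least one state was read and every state is FAILED_OVER.
all_failed_over() {
  seen=0
  # The "|| [ -n ... ]" clause also handles a final line without a newline.
  while IFS= read -r state || [ -n "$state" ]; do
    [ -n "$state" ] || continue   # skip blank lines
    seen=1
    [ "$state" = "FAILED_OVER" ] || return 1
  done
  [ "$seen" -eq 1 ]
}
```

The function treats empty input as a failure, so a status query that returns nothing does not count as a completed failover.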