Disaster Recovery with Envoy and Shadowing

This lab demonstrates a disaster recovery setup that combines Redpanda Shadowing with Envoy proxy.

  • Shadowing provides offset-preserving, byte-for-byte data replication between clusters

  • Envoy provides transparent client routing without requiring client reconfiguration

Envoy is a high-performance proxy that can route traffic based on backend health. In this setup, Envoy routes Kafka clients to the active cluster and automatically fails over to the shadow cluster when the source becomes unavailable, so you do not need to reconfigure clients during disaster recovery.

In this lab, you will:

  • Set up Shadowing for offset-preserving data replication

  • Configure Envoy for automatic client routing during failover

  • Execute a complete disaster recovery failover

Prerequisites

Run the lab

  1. Clone this repository:

    git clone https://github.com/redpanda-data/redpanda-labs.git
    cd redpanda-labs/docker-compose/envoy-shadowing

  2. Start the environment:

    docker compose up -d --wait

  3. Verify that both clusters are healthy:

    docker exec redpanda-source rpk cluster health
    docker exec redpanda-shadow rpk cluster health

  4. Create a topic on the source cluster:

    docker exec redpanda-source rpk topic create demo-topic --partitions 3 --replicas 1

  5. Create a shadow link to replicate data from the source cluster to the shadow cluster:

    docker exec redpanda-shadow rpk shadow create \
      --config-file /config/shadow-link.yaml \
      --no-confirm \
      -X admin.hosts=redpanda-shadow:9644

  6. Verify that the shadow link is active:

    docker exec redpanda-shadow rpk shadow status demo-shadow-link -X admin.hosts=redpanda-shadow:9644

  7. Produce messages through Envoy, which routes them to the source cluster:

    docker exec python-client python3 /scripts/test-producer.py

  8. Verify that the data replicated to the shadow cluster (lag should be 0):

    docker exec redpanda-shadow rpk shadow status demo-shadow-link -X admin.hosts=redpanda-shadow:9644 | grep -A5 "demo-topic"
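
If you want to script the health check in step 3 instead of rerunning it by hand, a small polling helper can do the waiting. The following is a minimal sketch and is not part of the lab repository: `retry_until` and `cluster_is_healthy` are hypothetical helper names, and the check assumes that `rpk cluster health` prints a line of the form `Healthy: true`.

```shell
# retry_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND...
# Re-runs COMMAND until it exits 0 or TIMEOUT_SECONDS have elapsed.
# Returns 0 on success and 1 on timeout.
retry_until() {
  timeout_s=$1
  interval_s=$2
  shift 2
  deadline=$(( $(date +%s) + timeout_s ))
  until "$@" >/dev/null 2>&1; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval_s"
  done
  return 0
}

# cluster_is_healthy CONTAINER
# Assumption: `rpk cluster health` prints a line such as "Healthy: true".
# Adjust the pattern if your rpk version formats its output differently.
cluster_is_healthy() {
  docker exec "$1" rpk cluster health | grep -Eq 'Healthy:[[:space:]]+true'
}

# Usage (requires the lab containers to be running):
#   retry_until 60 2 cluster_is_healthy redpanda-source
#   retry_until 60 2 cluster_is_healthy redpanda-shadow
```

The 60-second timeout and 2-second interval in the usage lines are arbitrary illustrative values. Pick ones that suit your machine.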

Simulate disaster and failover

  1. Stop the source cluster to simulate a disaster:

    docker stop redpanda-source

    Envoy detects the failure within 10-15 seconds and then routes traffic to the shadow cluster.

  2. Read replicated data from shadow through Envoy:

    docker exec python-client python3 /scripts/test-consumer.py

    Consumers can read from shadow topics immediately after Envoy fails over.

  3. Execute shadow failover to enable writes:

    docker exec redpanda-shadow rpk shadow failover demo-shadow-link --all --no-confirm \
      -X admin.hosts=redpanda-shadow:9644

    Shadow topics are read-only until you run the failover command. This prevents split-brain scenarios where both clusters accept writes.

  4. Produce new messages to the failed-over shadow cluster:

    docker exec python-client python3 /scripts/test-producer.py
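
The four steps above can also be collected into one script, for example to rehearse the drill repeatedly. This is a hedged sketch rather than an official runbook: `run`, `failover_drill`, `DRY_RUN`, and `ENVOY_DETECT_SECONDS` are hypothetical names introduced here. The fixed wait of 15 seconds is an assumption taken from the upper end of the 10-15 second detection window mentioned in step 1.

```shell
# run COMMAND...
# Prints the command, then executes it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# failover_drill
# Replays the failover steps from this section in order and stops at the
# first step that fails.
failover_drill() {
  # 1. Simulate the disaster.
  run docker stop redpanda-source || return 1
  # Give Envoy time to notice the failure (assumed default: 15 seconds).
  run sleep "${ENVOY_DETECT_SECONDS:-15}" || return 1
  # 2. Read replicated data from the shadow cluster through Envoy.
  run docker exec python-client python3 /scripts/test-consumer.py || return 1
  # 3. Enable writes on the shadow topics.
  run docker exec redpanda-shadow rpk shadow failover demo-shadow-link \
    --all --no-confirm -X admin.hosts=redpanda-shadow:9644 || return 1
  # 4. Produce new messages to the failed-over shadow cluster.
  run docker exec python-client python3 /scripts/test-producer.py || return 1
}

# Usage:
#   DRY_RUN=1 failover_drill   # print the commands without running them
#   failover_drill             # run the drill against the lab containers
```

Running with `DRY_RUN=1` first lets you review the exact commands before stopping any container.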

Clean up

Stop and remove the demo environment:

    docker compose down -v

What you explored

In this lab, you:

  • Set up Shadowing between source and shadow clusters with offset-preserving replication

  • Configured Envoy for automatic client routing based on cluster health

  • Simulated a disaster by stopping the source cluster

  • Verified that consumers can read replicated data through Envoy immediately after failover

  • Executed rpk shadow failover to enable writes on the shadow cluster

  • Produced new messages to the failed-over cluster without client reconfiguration

The following table summarizes the role of each component in this disaster recovery setup:

Component            Role                                     Automatic?
-------------------  ---------------------------------------  -----------
Shadowing            Data replication with preserved offsets  Yes
Envoy                Client routing to healthy cluster        Yes
rpk shadow failover  Enable writes on shadow topics           No (manual)