Monitor Kubernetes Shadow Links

This feature requires an enterprise license. To get a trial license key or extend your trial period, generate a new trial license key. To purchase a license, contact Redpanda Sales.

If Redpanda has enterprise features enabled and it cannot find a valid license, restrictions apply.

Monitor your shadow links to ensure proper replication performance and understand your disaster recovery readiness. For Kubernetes deployments, you can monitor shadow links using the Redpanda Operator’s ShadowLink resource status or by using rpk commands directly.

For step-by-step procedures to follow during a disaster, see Kubernetes Failover Runbook.

Status commands

Operator:

To list existing shadow links:

kubectl get shadowlink --namespace <shadow-namespace>
Example output
NAME   SYNCED
link   True

A healthy shadow link shows True for SYNCED. If SYNCED is False, use kubectl describe to investigate the issue.
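To act on the SYNCED column from a script, the table output can be filtered. The following is a minimal sketch that assumes the two-column NAME/SYNCED layout shown in the example output; the function name `unsynced_links` is hypothetical.

```shell
# Print the NAME of every shadow link whose SYNCED column is not "True".
# Reads `kubectl get shadowlink` table output on stdin and skips the header row.
# Assumes the two-column layout (NAME, SYNCED) shown in the example output.
unsynced_links() {
  awk 'NR > 1 && $2 != "True" { print $1 }'
}

# Usage (replace the placeholder with your namespace):
#   kubectl get shadowlink --namespace <shadow-namespace> | unsynced_links
```

An empty result means every listed shadow link reports SYNCED as True.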

To view detailed shadow link status and configuration:

kubectl describe shadowlink --namespace <shadow-namespace> <shadowlink-name>
Example output
Name:         link
Namespace:    redpanda-system
API Version:  cluster.redpanda.com/v1alpha2
Kind:         ShadowLink
Status:
  Conditions:
    Status:                True
    Type:                  Synced
    Message:               Shadow link is synced
  Shadow Topics:
    Name:   orders
    State:  active
    Name:   inventory
    State:  active
  Tasks:
    Name:    Source Topic Sync
    State:   active
    Name:    Consumer Group Shadowing
    State:   active
    Name:    Security Migrator
    State:   active

The kubectl describe output shows:

  • Shadow link state: Overall operational state in the Status section

  • Individual topic states: Current state of each replicated topic under Shadow Topics

  • Task status: Health of replication tasks under Tasks

  • Sync status: Whether the resource is properly synced (Synced: True in conditions)

  • Configuration: Complete shadow link configuration including connection settings and filters

Look for Synced: True in Conditions and active state for topics and tasks.
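If you only need the Synced condition rather than the full describe output, a JSONPath query can extract it. This sketch assumes the condition is stored under `.status.conditions` with type `Synced`, as the example output suggests; the function names are hypothetical.

```shell
# Print the status of the Synced condition ("True", "False", or empty if unset).
# Arguments: <namespace> <shadowlink-name>
# Assumes the condition lives at .status.conditions[] with type "Synced".
shadowlink_synced_status() {
  kubectl get shadowlink --namespace "$1" "$2" \
    -o jsonpath='{.status.conditions[?(@.type=="Synced")].status}'
}

# Succeed only when the Synced condition is exactly "True".
shadowlink_is_synced() {
  [ "$(shadowlink_synced_status "$1" "$2")" = "True" ]
}

# Usage:
#   shadowlink_is_synced redpanda-system link || echo "link is not synced"
```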

For more detailed monitoring or troubleshooting, you can also use the rpk commands shown for Helm deployments.

Helm:

To list existing shadow links:

kubectl exec --namespace <shadow-namespace> <shadow-pod-name> --container redpanda -- \
  rpk shadow list
Example output
NAME                    UID                                   STATE
disaster-recovery-link  70f25b41-9bad-4e31-9f81-d302c8676397  ACTIVE

To view shadow link configuration details:

kubectl exec --namespace <shadow-namespace> <shadow-pod-name> --container redpanda -- \
  rpk shadow describe <shadow-link-name>

The rpk shadow describe command shows the complete configuration of the shadow link, including connection settings, filters, and synchronization options. For detailed command options, see rpk shadow list and rpk shadow describe.

To check your shadow link status and ensure proper operation:

kubectl exec --namespace <shadow-namespace> <shadow-pod-name> --container redpanda -- \
  rpk shadow status <shadow-link-name>
Example output
OVERVIEW
===
NAME   disaster-recovery-link
UID    70f25b41-9bad-4e31-9f81-d302c8676397
STATE  ACTIVE

TASKS
===
NAME                      BROKER_ID  SHARD  STATE   REASON
Source Topic Sync         0          0      ACTIVE  Source Topic Sync has started
Consumer Group Shadowing  0          0      ACTIVE  Group mirroring task finished successfully
Security Migrator Task    0          0      ACTIVE  Security Migrator Task has started

TOPICS
===
Name: orders, State: ACTIVE
      PARTITION  SRC_LSO  SRC_HWM  DST_HWM  LAG
      0          1000     1234     1230     4
      1          2000     2456     2450     6

Name: inventory, State: ACTIVE
      PARTITION  SRC_LSO  SRC_HWM  DST_HWM  LAG
      0          500      789      789      0

Key indicators:

  • STATE: ACTIVE: Shadow link is replicating

  • Tasks: ACTIVE: All replication tasks are running

  • LAG: Difference in message count between the source and shadow partitions (lower is better). In the example output, LAG equals SRC_HWM minus DST_HWM.

For troubleshooting specific issues, you can use command options to show individual status sections. See rpk shadow status for available status options.

The status output includes the following:

  • Shadow link state: Overall operational state (ACTIVE, PAUSED).

  • Individual topic states: Current state of each replicated topic (ACTIVE, FAULTED, FAILING_OVER, FAILED_OVER, PAUSED).

  • Task status: Health of replication tasks across brokers (ACTIVE, FAULTED, NOT_RUNNING, LINK_UNAVAILABLE). For details about shadow link tasks, see Shadow link tasks.

  • Lag information: Replication lag per partition showing source vs shadow high watermarks (HWM).
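To reduce the per-partition lag figures to a single number for a script or dashboard, the status output can be parsed. The sketch below assumes partition rows look like the example output (five numeric columns ending in LAG); the function name `max_shadow_lag` and the threshold in the usage comment are hypothetical.

```shell
# Print the largest LAG value found in `rpk shadow status` output on stdin.
# Assumes partition rows have exactly five numeric columns, as in the example:
#   PARTITION  SRC_LSO  SRC_HWM  DST_HWM  LAG
# Prints 0 when no partition rows are found.
max_shadow_lag() {
  awk '
    NF == 5 && $1 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ {
      if ($5 + 0 > max) max = $5 + 0
    }
    END { print max + 0 }
  '
}

# Usage (1000 is an arbitrary example threshold):
#   lag="$(kubectl exec --namespace <shadow-namespace> <shadow-pod-name> --container redpanda -- \
#     rpk shadow status <shadow-link-name> | max_shadow_lag)"
#   [ "$lag" -le 1000 ] || echo "max lag is $lag"
```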

Troubleshoot

Topics in FAULTED state

When monitoring shadow links, you may see topics showing FAULTED state in status output.

Check shadow cluster logs for specific error messages:

kubectl logs --namespace <shadow-namespace> <shadow-pod-name> --container redpanda | grep -iE "shadow|error"

Common causes include:

  • Source topic deleted: topic no longer exists on source cluster

  • Permission denied: shadow link service account lacks required permissions

  • Network interruption: temporary connectivity issues

If the source topic still exists and should be replicated, delete and recreate the shadow link to reset the faulted state.
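To find which topics need attention before you read the logs, you can filter the status output for topics that are not ACTIVE. This is a sketch that assumes topic header lines have the form `Name: <topic>, State: <STATE>` as in the example output; the function name is hypothetical.

```shell
# List every topic whose state is not ACTIVE, as "<topic>: <STATE>".
# Reads `rpk shadow status` output on stdin.
# Assumes topic header lines look like: Name: orders, State: ACTIVE
non_active_topics() {
  sed -n 's/^Name: \(.*\), State: \([A-Z_]*\)[[:space:]]*$/\1 \2/p' |
    awk '$2 != "ACTIVE" { print $1 ": " $2 }'
}

# Usage:
#   kubectl exec --namespace <shadow-namespace> <shadow-pod-name> --container redpanda -- \
#     rpk shadow status <shadow-link-name> | non_active_topics
```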

High replication lag

When monitoring shadow links, you may see LAG values continuously increasing in rpk shadow status.

Check the following:

  • Source cluster load: a high produce rate may exceed replication capacity

  • Shadow cluster resources: CPU, memory, or disk constraints

  • Network bandwidth: verify that bandwidth between the clusters is sufficient

To resolve:

  • Scale shadow cluster resources if constrained

  • Verify network connectivity and bandwidth

  • Review topic configuration for optimization opportunities

Tasks in LINK_UNAVAILABLE state

When monitoring shadow links, you may see tasks in the LINK_UNAVAILABLE state with a "No brokers available" message.

Common causes include:

  • Source cluster requires SASL authentication but shadow link not configured for it

  • Source cluster unreachable from shadow cluster

  • Network policy blocking traffic between clusters

To resolve:

  • Verify SASL configuration if source cluster requires authentication

  • Test network connectivity: use kubectl exec to run a command in a shadow cluster pod and try to connect to the source cluster's broker address

  • Check Kubernetes NetworkPolicies and firewall rules
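The connectivity test can be sketched as a plain TCP probe run from inside the shadow pod. This example assumes that `bash` and `timeout` are available in the redpanda container, which may not hold for every image, and the function name is hypothetical. It only confirms that the port is reachable; it does not validate SASL or TLS settings.

```shell
# Probe a TCP port on the source cluster from inside a shadow cluster pod.
# Arguments: <namespace> <shadow-pod-name> <source-host> <source-port>
# Assumes bash and timeout exist in the redpanda container.
check_source_reachable() {
  ns="$1"; pod="$2"; host="$3"; port="$4"
  if kubectl exec --namespace "$ns" "$pod" --container redpanda -- \
       timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port"; then
    echo "reachable: $host:$port"
  else
    echo "unreachable: $host:$port"
    return 1
  fi
}

# Usage (host and port are placeholders for a source cluster broker):
#   check_source_reachable <shadow-namespace> <shadow-pod-name> <source-host> 9092
```

If the port is reachable but tasks still report LINK_UNAVAILABLE, the authentication settings are the more likely cause.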

Metrics

Shadowing exposes metrics on the public_metrics endpoint so that you can track replication performance and health.

Each entry lists the metric name, its type, and a description:

  • redpanda_shadow_link_shadow_lag (Gauge): The lag of the shadow partition against the source partition, calculated as source partition LSO (Last Stable Offset) minus shadow partition HWM (High Watermark). Monitor by shadow_link_name, topic, and partition to understand replication lag for each partition.

  • redpanda_shadow_link_total_bytes_fetched (Counter): The total number of bytes fetched by a sharded replicator (bytes received by the client). Labeled by shadow_link_name and shard to track data transfer volume from the source cluster.

  • redpanda_shadow_link_total_bytes_written (Counter): The total number of bytes written by a sharded replicator (bytes written to the write_at_offset_stm). Labeled by shadow_link_name and shard to monitor data written to the shadow cluster.

  • redpanda_shadow_link_client_errors (Counter): The number of errors seen by the client. Track by shadow_link_name and shard to identify connection or protocol issues between clusters.

  • redpanda_shadow_link_shadow_topic_state (Gauge): The number of shadow topics in each state. Labeled by shadow_link_name and state to monitor topic state distribution across your shadow links.

  • redpanda_shadow_link_total_records_fetched (Counter): The total number of records fetched by a sharded replicator (records received by the client). Monitor by shadow_link_name and shard to track message throughput from the source.

  • redpanda_shadow_link_total_records_written (Counter): The total number of records written by a sharded replicator (records written to the write_at_offset_stm). Labeled by shadow_link_name and shard to monitor message throughput to the shadow cluster.

See also: Public Metrics
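For a quick look at the lag metric without a full monitoring stack, you can filter the text returned by the metrics endpoint. The sketch below assumes the endpoint is served at `/public_metrics` on the Admin API port (9644 by default) and returns Prometheus text format without timestamps; the function name and threshold are hypothetical.

```shell
# Print every redpanda_shadow_link_shadow_lag series whose value exceeds
# the threshold passed as the first argument.
# Reads Prometheus text format on stdin; the value is taken from the last field,
# which assumes the lines carry no trailing timestamp.
shadow_lag_over() {
  awk -v limit="$1" '
    /^redpanda_shadow_link_shadow_lag[{ ]/ && $NF + 0 > limit + 0 { print }
  '
}

# Usage (host is a placeholder; 9644 is assumed to be the Admin API port):
#   curl -s http://<broker-host>:9644/public_metrics | shadow_lag_over 1000
```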

Monitoring best practices

Health check procedures

Establish regular monitoring workflows to ensure shadow link health:

Operator:

# Check all shadow links are synced and healthy
kubectl get shadowlink --namespace <shadow-namespace>

# View detailed status for a specific shadow link
kubectl describe shadowlink --namespace <shadow-namespace> <shadowlink-name>

# Check for any shadow links with issues (not synced)
kubectl get shadowlink --namespace <shadow-namespace> -o json | \
  jq '.items[] | select(.status.conditions[] | select(.type=="Synced" and .status!="True")) | .metadata.name'

Helm:

# Check all shadow links are active (skip the header row so that it is not reported as a problem)
kubectl exec --namespace <shadow-namespace> <shadow-pod-name> --container redpanda -- \
  rpk shadow list | tail -n +2 | grep -v "ACTIVE" || echo "All shadow links healthy"

# Monitor lag for critical topics
kubectl exec --namespace <shadow-namespace> <shadow-pod-name> --container redpanda -- \
  rpk shadow status <shadow-link-name> | grep -E "LAG|Lag"
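For scheduled checks, it can help to turn the list output into an exit status that a cron job or CI step can act on. This sketch assumes the NAME/UID/STATE layout shown in the rpk shadow list example, with STATE as the last column; the function name is hypothetical.

```shell
# Print every row of `rpk shadow list` output (stdin) whose STATE is not ACTIVE
# and exit non-zero if any such row exists. The first line is treated as the header.
# Assumes STATE is the last column, as in the example output.
inactive_links() {
  awk '
    NR > 1 && NF > 0 && $NF != "ACTIVE" { print; bad = 1 }
    END { exit (bad ? 1 : 0) }
  '
}

# Usage:
#   kubectl exec --namespace <shadow-namespace> <shadow-pod-name> --container redpanda -- \
#     rpk shadow list | inactive_links || echo "some shadow links are not ACTIVE"
```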

Alert conditions

Configure monitoring alerts for the following conditions, which indicate problems with Shadowing:

  • High replication lag: When redpanda_shadow_link_shadow_lag exceeds your RPO requirements

  • Connection errors: When redpanda_shadow_link_client_errors increases rapidly

  • Topic state changes: When topics move to FAULTED state

  • Task failures: When replication tasks enter FAULTED or NOT_RUNNING states

  • Throughput drops: When the rate of redpanda_shadow_link_total_bytes_fetched or redpanda_shadow_link_total_records_fetched drops significantly

  • Link unavailability: When tasks show LINK_UNAVAILABLE indicating source cluster connectivity issues

For more information about shadow link tasks and their states, see Shadow link tasks.
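The counter-based conditions above depend on how fast a value changes, not on its absolute value. Without a monitoring system, a rough approximation is to sample the metric twice and compare. The sketch below sums one metric across all of its series; the function name, the endpoint address, and the 60-second interval are assumptions for illustration.

```shell
# Sum all series of one metric from Prometheus text format on stdin.
# Argument: metric name, for example redpanda_shadow_link_client_errors.
# Matches the bare metric name or the name followed by a label set.
sum_metric() {
  awk -v name="$1" '
    index($0, name "{") == 1 || index($0, name " ") == 1 { total += $NF }
    END { print total + 0 }
  '
}

# Usage (host is a placeholder; 9644 is assumed to be the Admin API port):
#   url="http://<broker-host>:9644/public_metrics"
#   before="$(curl -s "$url" | sum_metric redpanda_shadow_link_client_errors)"
#   sleep 60
#   after="$(curl -s "$url" | sum_metric redpanda_shadow_link_client_errors)"
#   echo "client errors in the last minute: $((after - before))"
```

A production setup would normally express the same condition as a rate-based alert rule in the monitoring system instead of polling by hand.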