Monitor Shadowing
Monitor your shadow links to verify replication performance and assess your disaster recovery readiness. Use rpk commands, metrics, and status information to track shadow link health and troubleshoot issues.
See Failover Runbook for immediate, step-by-step disaster procedures.
Status commands
To list existing shadow links:
Cloud UI:
At the organization level of the Cloud UI, navigate to Shadow Link.

rpk:
rpk shadow list

Control Plane API:
curl 'https://api.redpanda.com/v1/shadow-links' \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${RP_CLOUD_TOKEN}"
To view shadow link configuration details:
Cloud UI:
- From the Shadow Link page, select the shadow link you want to view.
- Click the Tasks tab to view all tasks and their status.

rpk:
rpk shadow describe <shadow-link-name>
This command shows the complete configuration of the shadow link, including connection settings, filters, and synchronization options. For detailed command options, see rpk shadow list and rpk shadow describe.

Control Plane API:
curl 'https://api.redpanda.com/v1/shadow-links/<shadow-link-id>' \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${RP_CLOUD_TOKEN}"
To check your shadow link status and ensure proper operation:
Cloud UI:
- From the Shadow Link page, select the shadow link you want to view.
- Click the Tasks tab to view all tasks and their status.

rpk:
rpk shadow status <shadow-link-name>
To troubleshoot specific issues, use command options to show individual status sections. See rpk shadow status for the available options.

Cloud API:
# Get the Data Plane API URL of the shadow cluster.
# jq -r prints the raw string (no JSON quotes); sed strips any leading https://
# so the variable can be prefixed with https:// in the requests below.
export DATAPLANE_API_URL=$(curl -s https://api.redpanda.com/v1/clusters/<shadow-cluster-id> \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${RP_CLOUD_TOKEN}" | jq -r '.cluster.dataplane_api.url' | sed 's|^https://||')
# View shadow link status
curl "https://$DATAPLANE_API_URL/v1/shadowlinks/<shadow-link-name>" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${RP_CLOUD_TOKEN}"
# View topic state
curl "https://$DATAPLANE_API_URL/v1/shadowlinks/<shadow-link-name>/topic" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${RP_CLOUD_TOKEN}"
The status includes the following:
- Shadow link state: Overall operational state (ACTIVE, PAUSED).
- Individual topic states: Current state of each replicated topic (ACTIVE, FAULTED, FAILING_OVER, FAILED_OVER, PAUSED).
- Task status: Health of replication tasks across brokers (ACTIVE, FAULTED, NOT_RUNNING, LINK_UNAVAILABLE). For details about shadow link tasks, see Shadow link tasks.
- Lag information: Replication lag per partition, comparing source and shadow high watermarks (HWM).
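The lag arithmetic is simple but easy to misread. The sketch below applies the definition used by the shadow lag metric (described under Metrics): source partition LSO minus shadow partition HWM, per partition, plus a total. The input format, one `<partition> <source_lso> <shadow_hwm>` line per partition, is hypothetical and chosen for illustration; it is not the output format of `rpk shadow status`. The function name `shadow_lag_report` is also illustrative.

```shell
#!/bin/sh
# shadow_lag_report: compute per-partition replication lag.
# Input (hypothetical format): one "<partition> <source_lso> <shadow_hwm>" line per partition.
# Output: "<partition> <lag>" per partition, followed by "total <sum of lags>".
# Lag = source LSO - shadow HWM, mirroring the shadow lag metric definition.
shadow_lag_report() {
  awk '
    NF == 3 {
      lag = $2 - $3
      printf "%s %d\n", $1, lag
      total += lag
    }
    END { printf "total %d\n", total }
  '
}

# Example: partition 2 is 90 offsets behind its source partition.
printf '0 120 100\n1 50 50\n2 300 210\n' | shadow_lag_report
# 0 20
# 1 0
# 2 90
# total 110
```

A per-partition view matters because a healthy total can hide a single partition that has fallen far behind.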
Metrics
Shadowing exposes metrics on the public_metrics endpoint that track replication performance and health.
| Metric | Type | Description |
|---|---|---|
| redpanda_shadow_link_shadow_lag | Gauge | The lag of the shadow partition against the source partition, calculated as source partition LSO (Last Stable Offset) minus shadow partition HWM (High Watermark). Monitor by |
| | Count | The total number of bytes fetched by a sharded replicator (bytes received by the client). Labeled by |
| | Count | The total number of bytes written by a sharded replicator (bytes written to the write_at_offset_stm). Uses |
| redpanda_shadow_link_client_errors | Count | The number of errors seen by the client. Track by |
| | Gauge | Number of shadow topics in the respective states. Labeled by |
| | Count | The total number of records fetched by the sharded replicator (records received by the client). Monitor by |
| | Count | The total number of records written by a sharded replicator (records written to the write_at_offset_stm). Uses |
See also: Metrics Reference
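To check lag from a script rather than the UI, you can scrape the public_metrics endpoint and extract the lag gauge. The sketch below assumes the endpoint serves the standard Prometheus text exposition format and that label values contain no spaces; it reports the largest `redpanda_shadow_link_shadow_lag` sample across all series. The function name `max_shadow_lag` is illustrative.

```shell
#!/bin/sh
# max_shadow_lag: read Prometheus text-format metrics on stdin and print the
# largest redpanda_shadow_link_shadow_lag sample, or 0 if no such series exists.
# Assumes label values contain no spaces, so the sample value is the second field.
max_shadow_lag() {
  awk '
    # Match the metric name followed by a label set or a space. This skips
    # HELP/TYPE comment lines and longer metric names sharing the prefix.
    /^redpanda_shadow_link_shadow_lag[{ ]/ {
      v = $2 + 0
      if (!seen || v > max) { max = v; seen = 1 }
    }
    END { printf "%d\n", (seen ? max : 0) }
  '
}
```

Usage, where `<public-metrics-url>` is a placeholder for wherever your cluster exposes the public_metrics endpoint: `curl -s "<public-metrics-url>" | max_shadow_lag`. Taking the maximum rather than the sum keeps one lagging partition from being averaged away.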
Monitoring best practices
Health check procedures
Establish regular monitoring workflows to ensure shadow link health:
Cloud UI:
- From the Shadow Link page, select the shadow link you want to view.
- Click the Tasks tab to view all tasks and their status.

rpk:
# Check all shadow links are active
rpk shadow list | grep -v "ACTIVE" || echo "All shadow links healthy"
# Monitor lag for critical topics
rpk shadow status <shadow-link-name> | grep -E "LAG|Lag"

Cloud API:
# Check all shadow links are active
curl 'https://api.redpanda.com/v1/shadow-links' \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${RP_CLOUD_TOKEN}" | \
jq -r 'if all(.state == "SHADOW_LINK_STATE_ACTIVE") then "All shadow links healthy" else .[] | select(.state != "SHADOW_LINK_STATE_ACTIVE") end'
# Monitor lag for critical topics
curl "https://$DATAPLANE_API_URL/v1/shadowlinks/<shadow-link-name>/topic" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${RP_CLOUD_TOKEN}"
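These lag checks report lag in offsets, while an RPO is usually stated as time. One rough way to connect the two, assuming a roughly steady produce rate on the source topic, is to multiply the records-per-second rate by the RPO in seconds: for example, 2,000 records/s with a 30-second RPO gives a threshold of about 60,000 offsets. The helper names below are hypothetical, they accept integers only, and the conversion ignores bursts, so treat the result as a starting point for tuning alerts rather than a guarantee.

```shell
#!/bin/sh
# rpo_lag_threshold: convert a time-based RPO into an approximate offset threshold.
# Usage: rpo_lag_threshold <records_per_second> <rpo_seconds>   (integers only)
# Assumes a steady produce rate; bursty traffic needs extra headroom.
rpo_lag_threshold() {
  echo $(( $1 * $2 ))
}

# lag_within_rpo: succeed (exit 0) if the observed lag is at or below the threshold.
# Usage: lag_within_rpo <observed_lag_offsets> <threshold_offsets>
lag_within_rpo() {
  [ "$1" -le "$2" ]
}

threshold=$(rpo_lag_threshold 2000 30)   # 60000 offsets
if lag_within_rpo 45000 "$threshold"; then
  echo "lag within RPO (threshold: $threshold offsets)"
else
  echo "lag exceeds RPO (threshold: $threshold offsets)"
fi
```

Because offsets count records, not time, revisit the threshold whenever the source produce rate changes significantly.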
Alert conditions
Configure monitoring alerts for the following conditions, which indicate problems with Shadowing:
- High replication lag: When redpanda_shadow_link_shadow_lag exceeds your RPO requirements.
- Connection errors: When redpanda_shadow_link_client_errors increases rapidly.
- Topic state changes: When topics move to the FAULTED state.
- Task failures: When replication tasks enter the FAULTED or NOT_RUNNING state.
- Throughput drops: When the rate of bytes or records fetched drops significantly.
- Link unavailability: When tasks show LINK_UNAVAILABLE, indicating source cluster connectivity issues.
For more information about shadow link tasks and their states, see Shadow link tasks.
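Several of these conditions (client errors increasing rapidly, throughput dropping) concern the rate of change of a counter rather than its absolute value. The sketch below computes a per-second rate from two samples of the same counter taken a known number of seconds apart. It treats a decrease as a counter reset (for example, after a broker restart) and counts the new value as the whole increase, which is the same convention Prometheus `rate()` uses. The function names, sample values, and the 0.1 errors/s threshold are made up for illustration; the same function applies to bytes or records fetched when watching for throughput drops.

```shell
#!/bin/sh
# counter_rate: per-second rate of a monotonically increasing counter
# (for example, redpanda_shadow_link_client_errors) from two samples.
# Usage: counter_rate <previous_value> <current_value> <interval_seconds>
# If the current value is lower than the previous one, assume the counter
# was reset to zero and count the current value as the whole increase.
counter_rate() {
  awk -v prev="$1" -v curr="$2" -v secs="$3" 'BEGIN {
    if (secs + 0 <= 0) exit 1      # the interval must be positive
    delta = curr - prev
    if (delta < 0) delta = curr    # counter reset
    printf "%.2f\n", delta / secs
  }'
}

# rate_above: succeed if the rate ($1) is strictly greater than the threshold ($2).
rate_above() {
  awk -v r="$1" -v t="$2" 'BEGIN { exit !((r + 0) > (t + 0)) }'
}

# Example: 30 new client errors over a 60-second window is 0.50 errors/s.
rate=$(counter_rate 120 150 60)
if rate_above "$rate" 0.1; then
  echo "client error rate ${rate}/s exceeds 0.1/s"
fi
```

For a throughput-drop alert, invert the comparison: compute the current fetch rate and alert when it falls well below a baseline rate you measured during normal operation.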