Remote Read Replicas
This feature requires an enterprise license. To get a trial license key or extend your trial period, generate a new trial license key. To purchase a license, contact Redpanda Sales. If Redpanda has enterprise features enabled and it cannot find a valid license, restrictions apply.
A Remote Read Replica topic is a read-only topic that mirrors a topic on a different cluster. Remote Read Replicas work with both Tiered Storage and archival storage.
When a topic has object storage enabled, you can create a separate remote cluster just for consumers of this topic, and populate its topics from remote storage. A read-only topic on a remote cluster can serve any consumer, without increasing the load on the origin cluster. Use cases for Remote Read Replicas include data analytics, offline model training, and development clusters.
You can create Remote Read Replica topics in a Redpanda cluster that directly accesses data stored in object storage. Because these read-only topics read from object storage instead of from the origin cluster, they do not affect the origin cluster’s performance. Topic data can be consumed within a region of your choice, regardless of the region where it was produced.
Prerequisites
You need the following:
-
An origin cluster with Tiered Storage set up. Multi-region buckets or containers are not supported.
-
A topic on the origin cluster, which you can use as a Remote Read Replica topic on the remote cluster.
-
A separate remote cluster.
-
AWS: The remote cluster can be in the same or a different region as the origin cluster’s S3 bucket. For cross-region Remote Read Replica topics, see Create a cross-region Remote Read Replica topic on AWS.
-
GCP: The remote cluster can be in the same or a different region as the bucket/container.
-
Azure: Remote Read Replicas are not supported.
-
This feature requires an enterprise license. To get a trial license key or extend your trial period, generate a new trial license key. To purchase a license, contact Redpanda Sales.
If Redpanda has enterprise features enabled and it cannot find a valid license, restrictions apply.
To check if you already have a license key applied to your cluster:
rpk cluster license info
Configure object storage for the remote cluster
You must configure access to the same object storage as the origin cluster.
To set up a Remote Read Replica topic on a separate remote cluster:
-
Create a remote cluster for the Remote Read Replica topic. For GCP, the remote cluster can be in the same or a different region as the bucket/container. For AWS, the remote cluster can be in the same or a different region, but cross-region Remote Read Replica topics require additional configuration. See Create a cross-region Remote Read Replica topic on AWS.
-
Run rpk cluster config edit, and then specify the properties for your object storage provider. The cluster requires a restart after any change to these properties.
Amazon S3:
cloud_storage_enabled: true
cloud_storage_access_key: <your-access-key>
cloud_storage_secret_key: <your-secret-key>
cloud_storage_region: <your-region>
cloud_storage_bucket: <your-bucket> # Optional. Should not be changed after writing data to it.
Modifying the cloud_storage_bucket property after writing data to a bucket could cause data loss.
Google Cloud Storage:
cloud_storage_enabled: true
cloud_storage_access_key: <your-access-key>
cloud_storage_secret_key: <your-secret-key>
cloud_storage_region: <your-region>
cloud_storage_api_endpoint: storage.googleapis.com
cloud_storage_bucket: <your-bucket> # Optional. Should not be changed after writing data to it.
Modifying the cloud_storage_bucket property after writing data to a bucket could cause data loss.
Azure Blob Storage:
cloud_storage_enabled: true
cloud_storage_azure_container: <your-container>
cloud_storage_azure_storage_account: <your-storage-account>
cloud_storage_azure_shared_key: <your-shared-key>
-
For a complete reference on object storage properties, see Object Storage Properties.
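As an alternative to editing the configuration interactively, the same Amazon S3 properties can be applied one at a time with rpk cluster config set. The following is a minimal sketch, not an official procedure: the run and configure_s3 helpers and the DRY_RUN variable are hypothetical names introduced here, and by default the functions only print the commands so that they can be reviewed first.

```shell
# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Apply the Amazon S3 properties listed above to the remote cluster.
# Arguments: access key, secret key, region, bucket (the origin cluster's bucket).
# Caution: secrets passed as arguments can show up in shell history and
# process listings.
configure_s3() {
  run rpk cluster config set cloud_storage_enabled true
  run rpk cluster config set cloud_storage_access_key "$1"
  run rpk cluster config set cloud_storage_secret_key "$2"
  run rpk cluster config set cloud_storage_region "$3"
  run rpk cluster config set cloud_storage_bucket "$4"
}
```

Calling configure_s3 with your access key, secret key, region, and bucket prints five commands. Setting DRY_RUN=0 executes them instead, after which the cluster still needs the restart described above.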
Create a Remote Read Replica topic
To create the Remote Read Replica topic, run:
rpk topic create <topic_name> -c redpanda.remote.readreplica=<bucket_name>
-
For <topic_name>, use the same name as the original topic.
-
For <bucket_name>, use the bucket/container specified in the cloud_storage_bucket or cloud_storage_azure_container properties for the origin cluster.
Create a cross-region Remote Read Replica topic on AWS
Use this configuration only when the remote cluster is in a different AWS region than the origin cluster’s S3 bucket. For same-region AWS or GCP deployments, use the standard topic creation command.
Prerequisites
You must explicitly set the cloud_storage_url_style cluster property to virtual_host or path on the remote cluster. The default value does not support cross-region Remote Read Replicas.
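A small guard can catch an unsupported value before it is applied. This is a sketch under assumptions: is_explicit_url_style and url_style_command are hypothetical helper names, and the function only prints the rpk cluster config set command rather than running it.

```shell
# Succeeds only for the two explicit values named in the prerequisite.
is_explicit_url_style() {
  case "$1" in
    virtual_host|path) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the command that sets cloud_storage_url_style on the remote cluster,
# or fail with a message if the value is not one of the explicit styles.
url_style_command() {
  if is_explicit_url_style "$1"; then
    printf 'rpk cluster config set cloud_storage_url_style %s\n' "$1"
  else
    printf 'unsupported cloud_storage_url_style: %s\n' "$1" >&2
    return 1
  fi
}
```

For example, url_style_command virtual_host prints the command to run against the remote cluster, while an empty or misspelled value is rejected.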
Create the topic
To create a cross-region Remote Read Replica topic, append region and endpoint query-string parameters to the bucket name.
In the following example, replace the placeholders:
-
<topic_name>: The name of the topic in the cluster hosting the Remote Read Replica.
-
<bucket_name>: The S3 bucket configured on the origin cluster (cloud_storage_bucket).
-
<origin_bucket_region>: The AWS region of the origin cluster’s S3 bucket (not the remote cluster’s region).
rpk topic create <topic_name> \
-c "redpanda.remote.readreplica=<bucket_name>?region=<origin_bucket_region>&endpoint=s3.<origin_bucket_region>.amazonaws.com"
For example, if the origin cluster stores data in a bucket called my-bucket in us-east-1:
rpk topic create my-topic \
-c "redpanda.remote.readreplica=my-bucket?region=us-east-1&endpoint=s3.us-east-1.amazonaws.com"
The endpoint value must not include the bucket name. When using virtual_host URL style, Redpanda automatically prepends the bucket name to the endpoint. When using path URL style, Redpanda appends the bucket name as a path segment.
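Because the region appears twice in the value (once as the region parameter and once inside the endpoint host), it can help to build the string from one variable. The sketch below assumes the standard s3.<region>.amazonaws.com endpoint pattern used in the example above. readreplica_value and topic_create_command are hypothetical helper names, and the second function only prints the command.

```shell
# Build the value for redpanda.remote.readreplica.
# With only a bucket, this returns the same-region form. With a region, it
# appends the region and endpoint query-string parameters.
readreplica_value() {
  local bucket="$1" region="${2:-}"
  if [ -z "$region" ]; then
    printf '%s\n' "$bucket"
  else
    printf '%s?region=%s&endpoint=s3.%s.amazonaws.com\n' \
      "$bucket" "$region" "$region"
  fi
}

# Print (do not run) the full command. The -c argument is quoted because an
# unquoted & would make the shell run the command in the background and
# treat the rest of the line as a separate command.
topic_create_command() {
  local topic="$1"
  shift
  printf "rpk topic create %s -c 'redpanda.remote.readreplica=%s'\n" \
    "$topic" "$(readreplica_value "$@")"
}
```

Running topic_create_command my-topic my-bucket us-east-1 prints the cross-region command for the example bucket. Omitting the region prints the standard same-region form.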
Limits
Each unique combination of region and endpoint creates a separate object storage target on the remote cluster. A cluster supports a maximum of 10 targets.
How targets are counted depends on cloud_storage_url_style:
-
virtual_host: Each unique combination of bucket, region, and endpoint counts as one target. You can create up to 10 distinct cross-region Remote Read Replica topics for each cluster.
-
path: Each unique combination of region and endpoint counts as one target (the bucket name is not part of the key). You can create cross-region Remote Read Replica topics for multiple buckets using the same region/endpoint combination, with a maximum of 10 distinct region/endpoint combinations for each cluster.
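The counting rules above can be checked for a planned set of topics before creating them. This sketch is only an illustration of those rules, not a Redpanda tool: count_targets is a hypothetical helper that reads one "bucket region endpoint" triple per line on standard input and prints how many targets that set would use under the given URL style.

```shell
# Count distinct object storage targets for the given cloud_storage_url_style.
# Input: lines of "bucket region endpoint" on stdin.
count_targets() {
  local style="$1"
  case "$style" in
    virtual_host|path) ;;
    *) printf 'unknown url style: %s\n' "$style" >&2; return 1 ;;
  esac
  awk -v style="$style" '
    NF == 3 {
      # path: key is region + endpoint; virtual_host: bucket + region + endpoint
      key = (style == "path") ? ($2 FS $3) : ($1 FS $2 FS $3)
      if (!(key in seen)) { seen[key] = 1; n++ }
    }
    END { print n + 0 }
  '
}
```

For three buckets where two share a region and endpoint, the function reports 3 targets under virtual_host and 2 under path. Compare the result with the limit of 10.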
Reduce lag in data availability
When object storage is enabled on a topic, Redpanda copies closed log segments to the configured object store.
A log segment is closed when it reaches the configured segment size.
A topic’s object store can therefore lag behind the local copy by up to one segment: the cluster’s log_segment_size or, if set, the topic’s segment.bytes value. To reduce this lag in data availability for the Remote Read Replica:
-
You can lower the value of segment.bytes. This lets Redpanda archive smaller log segments more frequently, at the cost of increasing I/O and file count.
-
Redpanda Streaming deployments can set an idle timeout with cloud_storage_segment_max_upload_interval_sec to force Redpanda to periodically archive the contents of open log segments to object storage. This is useful if a topic’s write rate is low and log segments are kept open for long periods of time. The appropriate interval may depend on your total partition count: a system with fewer partitions can handle a higher number of segments per partition.
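To get a feel for the size of the lag, the time a segment takes to fill can be estimated from the segment size and the partition's write rate. This is a rough sketch that assumes a steady write rate and ignores upload time. segment_fill_seconds is a hypothetical helper, and the numbers in the comments are examples, not recommended settings.

```shell
# Estimate how many seconds a segment stays open before it reaches its size
# limit, given a steady per-partition write rate (ceiling division).
# Arguments: segment size in bytes, write rate in bytes per second.
segment_fill_seconds() {
  local segment_bytes="$1" bytes_per_sec="$2"
  if [ "$bytes_per_sec" -le 0 ]; then
    printf 'write rate must be positive\n' >&2
    return 1
  fi
  echo $(( (segment_bytes + bytes_per_sec - 1) / bytes_per_sec ))
}

# The two settings discussed above could then be applied on the origin
# cluster with commands of this shape (placeholders left unfilled):
#   rpk topic alter-config <topic_name> --set segment.bytes=<bytes>
#   rpk cluster config set cloud_storage_segment_max_upload_interval_sec <seconds>
```

For example, a 128 MiB segment (134217728 bytes) written at 1 MiB/s fills in about 128 seconds, while at 10 KiB/s it stays open for more than three and a half hours, which is the situation where an upload interval helps most.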