Skip to main content
Version: 22.3

Remote Read Replicas

info

This feature requires an Enterprise license. To upgrade, contact Redpanda sales.

A Remote Read Replica topic is a read-only topic that mirrors a topic on a different cluster.

When a topic has cloud storage enabled, you can create a separate remote cluster just for consumers of this topic, and populate its topics from cloud storage. A read-only topic on a remote cluster can serve any consumer, without increasing the load on the origin cluster.

You can create Remote Read Replica topics in a Redpanda cluster that directly accesses data stored in cloud object storage. Because these read-only topics access data directly from object storage instead of the topics' origin cluster, there's no impact to the performance of the cluster. Furthermore, topic data can be consumed within a region of your choice, regardless of the region where it was produced. If you plan to create a Remote Read Replica topic in another region, consider using a multi-region bucket to simplify deployment and optimize performance. Sample use cases for Remote Read Replicas include data analytics, offline model training, and development clusters.

Creating a topic with archival storage or Tiered Storage

Before you can create a Remote Read Replica, you must create a topic on the origin cluster, and set up an S3 bucket for the topic's cloud storage. Remote Read Replicas work with both archival storage and Tiered Storage.

  1. Create an S3 bucket to store the original data.

  2. Create the original topic by running rpk topic create <topic_name>, and specifying the number of partitions and the number of replicas.

  3. Enable cloud storage on the origin cluster by running rpk cluster config edit, and then specify the following cluster configuration properties:

    PropertyDescription
    cloud_storage_enabled: trueMust be set to true to enable the cloud storage feature.
    cloud_storage_bucket: “<bucket_name>”The S3 bucket where the original data is stored.
    cloud_storage_access_key: “<access_key>”The access key used for authentication. When using IAM roles, you do not have to specify this property.
    cloud_storage_secret_key: “<secret_key>”The secret key used for authentication. When using IAM roles, you do not have to specify this property.
    cloud_storage_region: “<region>”The region where the S3 bucket is located. This must be a supported region.
    cloud_storage_enable_remote_writeWhen using archival storage or Tiered Storage on the origin cluster, set to true to enable data to be uploaded from Redpanda and written to cloud storage for all topics. To only enable data upload for a specific topic, set cloud_storage_enable_remote_write: false and run rpk topic create <topic_name> -c redpanda.remote.write=true when you create the topic.
    cloud_storage_enable_remote_readWhen using Tiered Storage on the origin cluster, set to true to enable consumers to read from all topics in cloud storage. To enable consumers to only read from one topic, set cloud_storage_enable_remote_read: false and run rpk topic create <topic_name> -c redpanda.remote.read=true when you create the topic.
  4. If you're using Google Cloud Storage, set the cloud_storage_api_endpoint property to storage.googleapis.com.

Setting up a Remote Read Replica

To set up a Remote Read Replica topic on a separate remote cluster:

  1. Create a remote cluster for the Remote Read Replica topic in the same region as the S3 bucket used for the origin cluster. If the S3 bucket is a multi-region bucket, you can create the read replica cluster in any region that has that bucket. If the S3 bucket is a single-region bucket, the remote cluster must be in the same region as the S3 bucket.

  2. Run rpk cluster config edit, and then specify the following cluster configuration properties:

    PropertyDescription
    cloud_storage_enabled: trueMust be set to true to enable the cloud storage feature.
    cloud_storage_bucket: “none”No S3 bucket is needed for the remote cluster.
    cloud_storage_access_key: “<access_key>”The access key that allows read access to the S3 bucket. When using IAM roles, you do not have to specify this property.
    cloud_storage_secret_key: “<secret_key>”The secret key that allows read access to the S3 bucket. When using IAM roles, you do not have to specify this property.
    cloud_storage_region: “<region>”The region where the remote cluster is created.
  3. If you're using Google Cloud Storage, set the cloud_storage_api_endpoint property to storage.googleapis.com.

  4. Create the Remote Read Replica topic by running rpk topic create <topic_name> -c redpanda.remote.readreplica=<bucket_name>. For <topic_name>, use the same name as the original topic. For <bucket_name>, use the S3 bucket specified in the cloud_storage_bucket property for the origin cluster.


Suggested reading