Version: 23.1

Remote Read Replicas in Linux

Important

This feature requires an Enterprise license. To upgrade, contact Redpanda sales.

A Remote Read Replica topic is a read-only topic that mirrors a topic on a different cluster. Remote Read Replicas work with both Tiered Storage and archival storage.

When a topic has cloud storage enabled, you can create a separate remote cluster just for consumers of that topic and populate the remote cluster's topics from cloud storage. A read-only topic on the remote cluster can serve any consumer without increasing the load on the origin cluster. Use cases for Remote Read Replicas include data analytics, offline model training, and development clusters.

You can create Remote Read Replica topics in a Redpanda cluster that reads data directly from cloud object storage. Because these read-only topics fetch data from object storage instead of from the topics' origin cluster, they have no impact on the origin cluster's performance. Furthermore, topic data can be consumed in a region of your choice, regardless of the region where it was produced, subject to the bucket region requirements in Set up a Remote Read Replica.

Tip

To create a Remote Read Replica topic in another region, consider using a multi-region bucket to simplify deployment and optimize performance.

Create a topic with remote storage

Before you can create a Remote Read Replica, you must create a topic on the origin cluster, and set up a bucket or container for the topic's cloud storage.

  1. Create a bucket or container to store the original data.

  2. Create the original topic by running rpk topic create <topic_name> -p <partitions> -r <replicas>, specifying the number of partitions and the number of replicas.

  3. Enable cloud storage on the origin cluster by running rpk cluster config edit, and then specify the following cluster configuration properties:

    cloud_storage_enabled: Must be set to true to enable cloud storage.
    cloud_storage_bucket: AWS or GCS bucket name where the original data is stored. Required for AWS and GCS.
    cloud_storage_access_key: AWS or GCS access key. Required for AWS and GCS authentication with access keys.
    cloud_storage_secret_key: AWS or GCS secret key. Required for AWS and GCS authentication with access keys.
    cloud_storage_region: Cloud storage region. Required for AWS and GCS.
    cloud_storage_api_endpoint: AWS or GCS API endpoint.
      - For AWS, this can be left blank. It's generated automatically from the region and bucket.
      - For GCS, use storage.googleapis.com.
    cloud_storage_azure_container: Azure container name. Required for Azure Blob Storage (ABS).
    cloud_storage_azure_storage_account: Azure storage account name. Required for ABS.
    cloud_storage_azure_shared_key: Azure shared key. Required for ABS.
    cloud_storage_enable_remote_write: When using archival storage or Tiered Storage on the origin cluster, set to true to upload data from all topics to cloud storage. To enable uploads for only a specific topic, set cloud_storage_enable_remote_write: false and run rpk topic create <topic_name> -c redpanda.remote.write=true when you create the topic.
    cloud_storage_enable_remote_read: When using Tiered Storage on the origin cluster, set to true to let consumers read all topics from cloud storage. To let consumers read only one topic from cloud storage, set cloud_storage_enable_remote_read: false and run rpk topic create <topic_name> -c redpanda.remote.read=true when you create the topic.
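The origin-cluster steps above can be sketched in Python. This is a minimal, hypothetical helper, not part of rpk or Redpanda: the provider labels aws, gcs, and abs, the function names, and the example bucket, region, and topic names are illustrative assumptions, and the checks assume access-key authentication for AWS and GCS. config_problems compares a dictionary of cluster properties with the requirements listed above, and topic_create_args builds an rpk topic create argument list using rpk's -p (partitions), -r (replicas), and -c (topic property) flags.

```python
"""Hypothetical helpers for preparing an origin cluster for Remote Read Replicas.

Not part of rpk or Redpanda; provider labels and example values are illustrative.
"""
from typing import Dict, List, Optional

# Properties the table above marks as required, per provider.
# Assumes access-key authentication for AWS and GCS.
REQUIRED: Dict[str, List[str]] = {
    "aws": ["cloud_storage_enabled", "cloud_storage_bucket", "cloud_storage_region",
            "cloud_storage_access_key", "cloud_storage_secret_key"],
    "gcs": ["cloud_storage_enabled", "cloud_storage_bucket", "cloud_storage_region",
            "cloud_storage_access_key", "cloud_storage_secret_key",
            "cloud_storage_api_endpoint"],
    "abs": ["cloud_storage_enabled", "cloud_storage_azure_container",
            "cloud_storage_azure_storage_account", "cloud_storage_azure_shared_key"],
}


def config_problems(provider: str, config: Dict[str, object]) -> List[str]:
    """Return human-readable problems with an origin cluster's cloud storage config."""
    problems = [f"missing {key}" for key in REQUIRED[provider]
                if config.get(key) in (None, "")]
    # Present but not true (for example, False) is also a problem.
    if config.get("cloud_storage_enabled") not in (None, True):
        problems.append("cloud_storage_enabled must be true")
    endpoint = config.get("cloud_storage_api_endpoint")
    if provider == "gcs" and endpoint not in (None, "", "storage.googleapis.com"):
        problems.append("cloud_storage_api_endpoint should be storage.googleapis.com for GCS")
    return problems


def topic_create_args(topic: str, partitions: int, replicas: int,
                      remote_write: Optional[bool] = None,
                      remote_read: Optional[bool] = None) -> List[str]:
    """Build an `rpk topic create` argument list with optional per-topic overrides."""
    args = ["rpk", "topic", "create", topic,
            "-p", str(partitions), "-r", str(replicas)]
    # Per-topic overrides, for when the cluster-wide default is false.
    if remote_write is not None:
        args += ["-c", f"redpanda.remote.write={str(remote_write).lower()}"]
    if remote_read is not None:
        args += ["-c", f"redpanda.remote.read={str(remote_read).lower()}"]
    return args


if __name__ == "__main__":
    origin = {
        "cloud_storage_enabled": True,
        "cloud_storage_bucket": "example-origin-bucket",  # placeholder
        "cloud_storage_region": "us-east-1",              # placeholder
        "cloud_storage_access_key": "<access_key>",
        "cloud_storage_secret_key": "<secret_key>",
    }
    print(config_problems("aws", origin))  # []
    print(" ".join(topic_create_args("orders", 3, 3, remote_write=True)))
    # rpk topic create orders -p 3 -r 3 -c redpanda.remote.write=true
```

On a host where rpk is installed and configured for the origin cluster, you could pass the returned list to subprocess.run to execute it. The remote_write and remote_read overrides are only needed when the corresponding cluster-wide property is false.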

Set up a Remote Read Replica

To set up a Remote Read Replica topic on a separate remote cluster:

  1. Create a remote cluster for the Remote Read Replica topic.

    • If the origin cluster's bucket or container is multi-region, you can create the remote cluster in any region that the bucket or container covers.
    • If the bucket or container is single-region, the remote cluster must be in the same region as the bucket or container.
  2. Run rpk cluster config edit, and then specify the following cluster configuration properties:

    cloud_storage_enabled: Must be set to true to enable cloud storage.
    cloud_storage_bucket: Set to "none". No AWS or GCS bucket is needed for the remote cluster.
    cloud_storage_access_key: AWS or GCS access key. Required for AWS and GCS authentication with access keys.
    cloud_storage_secret_key: AWS or GCS secret key. Required for AWS and GCS authentication with access keys.
    cloud_storage_region: Cloud storage region of the remote cluster. Required for AWS and GCS.
    cloud_storage_api_endpoint: AWS or GCS API endpoint.
      - For AWS, this can be left blank. It's generated automatically from the region and bucket.
      - For GCS, use storage.googleapis.com.
    cloud_storage_azure_container: Azure container name. Required for ABS.
    cloud_storage_azure_storage_account: Azure storage account name. Required for ABS.
    cloud_storage_azure_shared_key: Azure shared key. Required for ABS.
  3. Create the Remote Read Replica topic by running rpk topic create <topic_name> -c redpanda.remote.readreplica=<bucket_name>.

    For <topic_name>, use the same name as the original topic. For <bucket_name>, use the bucket specified in the cloud_storage_bucket property for the origin cluster.

    Note

    Do not use redpanda.remote.read or redpanda.remote.write with redpanda.remote.readreplica. Redpanda ignores the values for remote read and remote write properties on read replica topics.
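The remote-cluster steps can also be sketched in Python. This is a hypothetical helper rather than a Redpanda API: it models a bucket's location as a set of region names (an assumption; cloud providers describe multi-region locations in their own terms), rejects redpanda.remote.read and redpanda.remote.write as the note advises, and builds the rpk topic create command with redpanda.remote.readreplica. Any other keys in topic_config are passed through to rpk unchanged. The helper cannot verify that the topic name and bucket actually match the origin cluster; the names used here are placeholders.

```python
"""Hypothetical helpers for creating a Remote Read Replica topic with rpk.

Not part of rpk or Redpanda; region and bucket names are placeholders.
"""
from typing import Dict, List, Optional, Set

# Redpanda ignores these properties on read replica topics, so reject them early.
IGNORED_ON_READ_REPLICA = ("redpanda.remote.read", "redpanda.remote.write")


def can_place_remote_cluster(bucket_regions: Set[str], cluster_region: str) -> bool:
    """True if the remote cluster's region is one the bucket or container covers.

    A single-region bucket is modeled as a one-element set; a multi-region
    bucket as the set of regions it spans (an assumption of this sketch).
    """
    return cluster_region in bucket_regions


def read_replica_args(topic: str, origin_bucket: str,
                      topic_config: Optional[Dict[str, str]] = None) -> List[str]:
    """Build `rpk topic create <topic> -c redpanda.remote.readreplica=<bucket>`.

    `topic` must match the origin topic's name, and `origin_bucket` must match the
    origin cluster's cloud_storage_bucket; this helper cannot check either.
    """
    config = dict(topic_config or {})
    conflicting = [key for key in IGNORED_ON_READ_REPLICA if key in config]
    if conflicting:
        raise ValueError(
            f"do not combine {', '.join(conflicting)} with redpanda.remote.readreplica")
    config["redpanda.remote.readreplica"] = origin_bucket
    args = ["rpk", "topic", "create", topic]
    for key, value in config.items():  # other keys are passed through unchanged
        args += ["-c", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(can_place_remote_cluster({"us-east-1"}, "eu-west-1"))  # False
    print(" ".join(read_replica_args("orders", "example-origin-bucket")))
    # rpk topic create orders -c redpanda.remote.readreplica=example-origin-bucket
```

Raising an error, rather than silently dropping redpanda.remote.read or redpanda.remote.write, surfaces the mistake before rpk is ever invoked.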

Reduce lag in data availability

When cloud storage is enabled on a topic, Redpanda copies closed log segments to the configured object store. A log segment is closed when it reaches the configured segment size. As a result, a topic's data in the object store can lag behind the local copy by up to one segment per partition, where the segment size is the cluster's log_segment_size or, if set, the topic's segment.bytes value. To reduce this lag in data availability for the Remote Read Replica:

  • You can lower the value of segment.bytes. This lets Redpanda archive smaller log segments more frequently, at the cost of increasing I/O and file count.
  • Self-hosted deployments running version 22.3 or later can set a maximum upload interval with cloud_storage_segment_max_upload_interval_sec to force Redpanda to periodically archive the contents of open log segments to object storage. This is useful when a topic's write rate is low and log segments stay open for long periods. The appropriate interval may depend on your total partition count: a system with fewer partitions can handle a higher number of segments per partition.
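A back-of-the-envelope estimate can make this trade-off concrete. The sketch below is a hypothetical model, not Redpanda's actual upload logic: it assumes a steady per-partition write rate and ignores upload duration and how quickly the remote cluster picks up new segments. Under those assumptions, the oldest unarchived data in a partition waits at most segment_bytes / write rate seconds, capped by cloud_storage_segment_max_upload_interval_sec when that property is set, and each partition uploads about one segment per wait period. The workload numbers are placeholders.

```python
"""Rough, hypothetical estimate of archival lag for Remote Read Replicas.

Assumes a steady write rate per partition and ignores upload duration and the
time the remote cluster takes to notice new segments.
"""
from typing import Optional


def estimated_max_lag_sec(segment_bytes: int, write_bytes_per_sec: float,
                          max_upload_interval_sec: Optional[float] = None) -> float:
    """Longest time newly written data waits before its segment is archived."""
    if write_bytes_per_sec <= 0:
        raise ValueError("write rate must be positive")
    fill_time = segment_bytes / write_bytes_per_sec  # time for a segment to fill and close
    if max_upload_interval_sec is None:
        return fill_time
    # The upload interval forces an earlier upload of the open segment.
    return min(fill_time, max_upload_interval_sec)


def uploads_per_sec(partitions: int, segment_bytes: int, write_bytes_per_sec: float,
                    max_upload_interval_sec: Optional[float] = None) -> float:
    """Cluster-wide segment uploads per second if all partitions behave alike."""
    return partitions / estimated_max_lag_sec(
        segment_bytes, write_bytes_per_sec, max_upload_interval_sec)


if __name__ == "__main__":
    MIB = 1024 * 1024
    # Placeholder workload: 128 MiB segments, 10 KiB/s per partition, 1,000 partitions.
    print(estimated_max_lag_sec(128 * MIB, 10 * 1024))       # 13107.2 (about 3.6 hours)
    print(estimated_max_lag_sec(128 * MIB, 10 * 1024, 600))  # 600
    print(uploads_per_sec(1000, 128 * MIB, 10 * 1024, 600))  # about 1.67
```

With these placeholder numbers, a 600-second interval cuts the worst-case wait from about 3.6 hours to 10 minutes, but 1,000 such partitions then upload roughly 1.7 segments per second, which illustrates why the appropriate interval depends on the total partition count. To lower segment.bytes on an existing topic instead, you could run rpk topic alter-config <topic_name> --set segment.bytes=<bytes>.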
