Remote Read Replicas in Kubernetes
This feature requires an enterprise license. To get a trial license key or extend your trial period, generate a new trial license key. To purchase a license, contact Redpanda Sales. If Redpanda has enterprise features enabled and it cannot find a valid license, restrictions apply.
A Remote Read Replica topic is a read-only topic that mirrors a topic on a different cluster. Remote Read Replicas work with both Tiered Storage and archival storage.
When a topic has object storage enabled, you can create a separate remote cluster just for consumers of this topic, and populate its topics from remote storage. A read-only topic on a remote cluster can serve any consumer, without increasing the load on the origin cluster. Use cases for Remote Read Replicas include data analytics, offline model training, and development clusters.
You can create Remote Read Replica topics in a Redpanda cluster that directly accesses data stored in object storage. Because these read-only topics access data directly from object storage instead of the topics' origin cluster, there's no impact to the performance of the origin cluster. Topic data can be consumed within a region of your choice, regardless of the region where it was produced.
For default values and documentation for configuration options, see the values.yaml file.
Prerequisites
You need the following:
- An origin cluster with Tiered Storage set up. Multi-region buckets or containers are not supported.
- A topic on the origin cluster to mirror as a Remote Read Replica topic on the remote cluster.
- A separate remote cluster.
  - AWS: The remote cluster must be in the same region as the origin cluster's storage bucket/container.
  - GCP: The remote cluster can be in the same or a different region as the bucket/container.
  - Azure: Remote Read Replicas are not supported.
- An enterprise license. To get a trial license key or extend your trial period, generate a new trial license key. To purchase a license, contact Redpanda Sales. If Redpanda has enterprise features enabled and it cannot find a valid license, restrictions apply.
To check if you already have a license key applied to your cluster:
rpk cluster license info
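If no license is applied yet, one way to upload a license file is with rpk. This is a minimal sketch; the file path is a placeholder for wherever you saved your license:

# Apply an enterprise license from a local file
rpk cluster license set --path /path/to/redpanda.license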
Configure object storage for the remote cluster
You must configure access to the same object storage as the origin cluster.
Configure access for the provider that holds the origin cluster's data: Amazon S3, Google Cloud Storage, or Azure Blob Storage.
Amazon S3
You can configure access to Amazon S3 with either an IAM role attached to the instance or with access keys.
Use IAM roles
To configure access to an S3 bucket with an IAM role:
- Configure an IAM role with read permissions for the S3 bucket.
- Override the following required cluster properties in the Helm chart, either with a values file (--values) or with --set flags:

cloud-storage.yaml
storage:
  tiered:
    config:
      cloud_storage_enabled: true
      cloud_storage_credentials_source: aws_instance_metadata
      cloud_storage_region: <region>
      cloud_storage_bucket: "none"

helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
  --values cloud-storage.yaml

Or, using --set flags instead of a values file:

helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
  --set storage.tiered.config.cloud_storage_enabled=true \
  --set storage.tiered.config.cloud_storage_credentials_source=aws_instance_metadata \
  --set storage.tiered.config.cloud_storage_region=<region> \
  --set storage.tiered.config.cloud_storage_bucket="none"

Replace <region> with the region of your S3 bucket.
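To confirm that the remote cluster picked up these settings, you can query the relevant cluster properties with rpk. This is an optional sanity check, assuming rpk is already configured to reach the cluster:

# Each command prints the current value of one cluster property
rpk cluster config get cloud_storage_enabled
rpk cluster config get cloud_storage_region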
Use access keys
To configure access to an S3 bucket with access keys instead of an IAM role:
- Grant a user the following permissions to read objects on the bucket to be used with the cluster (or on all buckets):
  - GetObject
  - ListBucket
- Create a Secret in which to store the access key and secret key. For a kubectl alternative to writing this manifest by hand, see the sketch after these steps.

apiVersion: v1
kind: Secret
metadata:
  name: storage-secrets
  namespace: <namespace>
type: Opaque
data:
  access-key: <base64-encoded-access-key>
  secret-key: <base64-encoded-secret-key>

Replace <base64-encoded-access-key> with your base64-encoded access key, and <base64-encoded-secret-key> with your base64-encoded secret key.
- Override the following required cluster properties in the Helm chart, either with a values file (--values) or with --set flags:

cloud-storage.yaml
storage:
  tiered:
    credentialsSecretRef:
      accessKey:
        name: storage-secrets
        key: access-key
      secretKey:
        name: storage-secrets
        key: secret-key
    config:
      cloud_storage_enabled: true
      cloud_storage_credentials_source: config_file
      cloud_storage_region: <region>
      cloud_storage_bucket: "none"

helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
  --values cloud-storage.yaml

Or, using --set flags instead of a values file:

helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
  --set storage.tiered.config.cloud_storage_enabled=true \
  --set storage.tiered.credentialsSecretRef.accessKey.name=storage-secrets \
  --set storage.tiered.credentialsSecretRef.accessKey.key=access-key \
  --set storage.tiered.credentialsSecretRef.secretKey.name=storage-secrets \
  --set storage.tiered.credentialsSecretRef.secretKey.key=secret-key \
  --set storage.tiered.config.cloud_storage_credentials_source=config_file \
  --set storage.tiered.config.cloud_storage_region=<region> \
  --set storage.tiered.config.cloud_storage_bucket="none"

Replace <region> with the region of your S3 bucket.
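As an alternative to base64-encoding the keys and writing the Secret manifest by hand, you can create the same Secret with kubectl. This is a minimal sketch; the key values shown are placeholders:

# Create the storage-secrets Secret from literal values;
# kubectl handles the base64 encoding for you
kubectl create secret generic storage-secrets \
  --namespace <namespace> \
  --from-literal=access-key=<access-key> \
  --from-literal=secret-key=<secret-key>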
Google Cloud Storage
You can configure access to Google Cloud Storage with either an IAM role attached to the instance or with access keys.
Use IAM roles
To configure access to Google Cloud Storage with an IAM role, override the following required cluster properties in the Helm chart:
Either pass a values file with --values or use --set flags.

cloud-storage.yaml
storage:
tiered:
config:
cloud_storage_enabled: true
cloud_storage_credentials_source: gcp_instance_metadata
cloud_storage_region: <region>
cloud_storage_bucket: "none"
helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
--values cloud-storage.yaml

Or, using --set flags instead of a values file:

helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
--set storage.tiered.config.cloud_storage_enabled=true \
--set storage.tiered.config.cloud_storage_credentials_source=gcp_instance_metadata \
--set storage.tiered.config.cloud_storage_region=<region> \
--set storage.tiered.config.cloud_storage_bucket="none"
Replace <region> with the region of your bucket.
Use access keys
To configure access to Google Cloud Storage with access keys instead of an IAM role:
- Create a Secret in which to store the access key and secret key.

apiVersion: v1
kind: Secret
metadata:
  name: storage-secrets
  namespace: <namespace>
type: Opaque
data:
  access-key: <base64-encoded-access-key>
  secret-key: <base64-encoded-secret-key>

Replace <base64-encoded-access-key> with your base64-encoded access key, and <base64-encoded-secret-key> with your base64-encoded secret key.
- Override the following required cluster properties in the Helm chart, either with a values file (--values) or with --set flags:

cloud-storage.yaml
storage:
  tiered:
    credentialsSecretRef:
      accessKey:
        name: storage-secrets
        key: access-key
      secretKey:
        name: storage-secrets
        key: secret-key
    config:
      cloud_storage_enabled: true
      cloud_storage_credentials_source: config_file
      cloud_storage_api_endpoint: storage.googleapis.com
      cloud_storage_region: <region>
      cloud_storage_bucket: "none"

helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
  --values cloud-storage.yaml

Or, using --set flags instead of a values file:

helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
  --set storage.tiered.config.cloud_storage_enabled=true \
  --set storage.tiered.credentialsSecretRef.accessKey.name=storage-secrets \
  --set storage.tiered.credentialsSecretRef.accessKey.key=access-key \
  --set storage.tiered.credentialsSecretRef.secretKey.name=storage-secrets \
  --set storage.tiered.credentialsSecretRef.secretKey.key=secret-key \
  --set storage.tiered.config.cloud_storage_credentials_source=config_file \
  --set storage.tiered.config.cloud_storage_api_endpoint=storage.googleapis.com \
  --set storage.tiered.config.cloud_storage_region=<region> \
  --set storage.tiered.config.cloud_storage_bucket="none"

Replace <region> with the region of your bucket.
Azure Blob Storage
To configure access to Azure Blob Storage (ABS):
- Create a Secret in which to store the access key.

apiVersion: v1
kind: Secret
metadata:
  name: storage-secrets
  namespace: <namespace>
type: Opaque
data:
  access-key: <base64-encoded-access-key>

Replace <base64-encoded-access-key> with your base64-encoded access key.
- Override the following required cluster properties in the Helm chart, either with a values file (--values) or with --set flags:

cloud-storage.yaml
storage:
  tiered:
    credentialsSecretRef:
      secretKey:
        configurationKey: cloud_storage_azure_shared_key
        name: storage-secrets
        key: access-key
    config:
      cloud_storage_enabled: true
      cloud_storage_azure_storage_account: <account-name>
      cloud_storage_azure_container: "none"

helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
  --values cloud-storage.yaml

Or, using --set flags instead of a values file:

helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
  --set storage.tiered.config.cloud_storage_enabled=true \
  --set storage.tiered.credentialsSecretRef.secretKey.configurationKey=cloud_storage_azure_shared_key \
  --set storage.tiered.credentialsSecretRef.secretKey.name=storage-secrets \
  --set storage.tiered.credentialsSecretRef.secretKey.key=access-key \
  --set storage.tiered.config.cloud_storage_azure_storage_account=<account-name> \
  --set storage.tiered.config.cloud_storage_azure_container="none"

Replace <account-name> with the name of your Azure storage account.
Create a Remote Read Replica topic
To create the Remote Read Replica topic, run:
rpk topic create <topic_name> -c redpanda.remote.readreplica=<bucket_name>
- For <topic_name>, use the same name as the original topic on the origin cluster.
- For <bucket_name>, use the bucket/container specified in the cloud_storage_bucket or cloud_storage_azure_container property for the origin cluster.
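To confirm the replica is serving data, you can inspect and read from the new topic with rpk. This is an optional check, not a required step; <topic_name> is the placeholder from the command above:

# Show the configuration of the read-only topic
rpk topic describe <topic_name>

# Read a single record to verify data is available from object storage
rpk topic consume <topic_name> --num 1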
Reduce lag in data availability
When object storage is enabled on a topic, Redpanda copies closed log segments to the configured object store.
Log segments are closed when they reach the configured segment size. The data available to the Remote Read Replica therefore lags behind the origin cluster's local copy by up to the log_segment_size or, if set, the topic's segment.bytes value. To reduce this lag in data availability for the Remote Read Replica (a sketch of both options follows this list):
- Lower the value of segment.bytes. This lets Redpanda archive smaller log segments more frequently, at the cost of increased I/O and file count.
- In Redpanda Self-Managed deployments, set an idle timeout with storage.tiered.config.cloud_storage_segment_max_upload_interval_sec to force Redpanda to periodically archive the contents of open log segments to object storage. This is useful if a topic's write rate is low and log segments stay open for long periods of time. The appropriate interval may depend on your total partition count: a system with fewer partitions can handle a higher number of segments per partition.
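Here is a minimal sketch of both adjustments. The topic name, segment size, and interval are placeholder values, not recommendations; both commands target the origin cluster, which is the cluster that uploads segments to object storage. The helm command assumes you want to keep your previously deployed values, so it uses --reuse-values:

# On the origin cluster: archive smaller segments more often (example: 64 MiB)
rpk topic alter-config <topic_name> --set segment.bytes=67108864

# On the origin cluster's Helm deployment: upload open segments at least every 5 minutes,
# keeping all other previously set values
helm upgrade redpanda redpanda/redpanda --namespace <namespace> --reuse-values \
  --set storage.tiered.config.cloud_storage_segment_max_upload_interval_sec=300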