Create a GCS Sink Connector
The Google Cloud Storage (GCS) Sink connector stores Redpanda messages in a Google Cloud Storage bucket.
Prerequisites
Before you can create a GCS Sink connector in Redpanda Cloud, you must:

1. Create a Google Cloud account.
2. Create a service account that will be used to connect to the GCS service.
3. Create a service account key and download it.
4. Create a custom role with the following permissions:
   - `storage.objects.create` to create items in the GCS bucket
   - `storage.objects.delete` to overwrite items in the GCS bucket
5. Create a GCS bucket to which to send data.
6. Grant permissions on the bucket you created to your service account. Use the role created in step 4.
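The prerequisite steps above can be sketched with the gcloud CLI. This is an illustrative sketch only: the project ID, service account name, role ID, and bucket name below are placeholders, so substitute your own values.

```shell
# Assumptions: gcloud is installed and authenticated; "my-project",
# "redpanda-gcs-sink", "redpandaGcsSink", and "my-redpanda-sink-bucket"
# are placeholder names.

# Step 2: create the service account used to connect to GCS.
gcloud iam service-accounts create redpanda-gcs-sink \
    --project=my-project

# Step 3: create and download a service account key.
gcloud iam service-accounts keys create key.json \
    --iam-account=redpanda-gcs-sink@my-project.iam.gserviceaccount.com

# Step 4: create a custom role with the two required permissions.
gcloud iam roles create redpandaGcsSink \
    --project=my-project \
    --permissions=storage.objects.create,storage.objects.delete

# Step 5: create the destination bucket.
gcloud storage buckets create gs://my-redpanda-sink-bucket \
    --project=my-project

# Step 6: grant the custom role to the service account on the bucket.
gcloud storage buckets add-iam-policy-binding gs://my-redpanda-sink-bucket \
    --member=serviceAccount:redpanda-gcs-sink@my-project.iam.gserviceaccount.com \
    --role=projects/my-project/roles/redpandaGcsSink
```

These commands require an authenticated gcloud session with permission to manage IAM and storage in the project.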
Limitations
The GCS Sink connector has the following limitations:

- You can use only the `STRING` and `BYTES` input formats with the `CSV` output format.
- You can use the `PARQUET` output format only when your messages contain a schema.
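The format restrictions above can be captured in a small lookup. This is my own illustrative sketch, not connector code, and the assumption that `JSON` and `AVRO` are the schema-bearing inputs for `PARQUET` is mine:

```python
# Sketch (my own illustration) of the limitations above.
CSV_INPUTS = {"STRING", "BYTES"}
SCHEMA_INPUTS = {"JSON", "AVRO"}  # assumption: these inputs carry a schema


def is_supported(output_format: str, input_format: str) -> bool:
    """Return True when the input/output format pair is allowed."""
    if output_format == "CSV":
        return input_format in CSV_INPUTS
    if output_format == "PARQUET":
        return input_format in SCHEMA_INPUTS
    return True  # other combinations are not restricted by these rules


print(is_supported("CSV", "STRING"))    # True
print(is_supported("CSV", "JSON"))      # False
print(is_supported("PARQUET", "AVRO"))  # True
```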
Create a GCS Sink connector
To create the GCS Sink connector:
1. In Redpanda Cloud, click **Connectors** in the navigation menu, and then click **Create Connector**.
2. Select **Export to Google Cloud Storage**.
3. On the **Create Connector** page, specify the following required connector configuration options:

   | Property | Description |
   |---|---|
   | `Topics to export` | Comma-separated list of the cluster topics you want to replicate to GCS. |
   | `Topics regex` | Java regular expression of topics to replicate. For example: specify `.*` to replicate all available topics in the cluster. Applicable only when **Use regular expressions** is selected. |
   | `GCS Credentials JSON` | JSON object with GCS credentials. |
   | `GCS bucket name` | Name of an existing GCS bucket to store output files in. |
   | `Kafka message key format` | Format of the key in the Kafka topic. Use `BYTES` for no conversion. |
   | `Kafka message value format` | Format of the value in the Kafka topic. Use `BYTES` for no conversion. |
   | `GCS file format` | Format of the files created in GCS: `CSV` (the default), `JSON`, `JSONL`, `AVRO`, or `PARQUET`. You can use the `CSV` output format only with `BYTES` and `STRING`. |
   | `Avro codec` | The Avro compression codec to be used for Avro output files. Available values: `null` (the default), `deflate`, `snappy`, and `bzip2`. |
   | `Max Tasks` | Maximum number of tasks to use for this connector. The default is `1`. Each task replicates an exclusive set of partitions assigned to it. |
   | `Connector name` | Globally-unique name to use for this connector. |

4. Click **Next**. Review the connector properties specified, then click **Create**.
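Connector settings like those above are commonly expressed as a JSON configuration in Kafka Connect deployments. The fragment below is a sketch only: the property keys (for example `gcs.bucket.name` and `format.output.type`) are assumptions based on common GCS sink connector conventions, so verify them against the names shown in the Redpanda Cloud UI.

```json
{
  "name": "gcs-sink-orders",
  "topics": "orders,payments",
  "gcs.bucket.name": "my-redpanda-sink-bucket",
  "gcs.credentials.json": "<service account key JSON>",
  "format.output.type": "jsonl",
  "tasks.max": "1"
}
```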
Advanced GCS Sink connector configuration
In most instances, the preceding basic configuration properties are sufficient. If you require any additional property settings, then specify any of the following optional advanced connector configuration properties by selecting Show advanced options on the Create Connector page:
| Property | Description |
|---|---|
|  | The template for file names on GCS. Supports |
|  | The prefix to be added to the name of each file put in GCS. |
|  | Fields to place into output files. Supported values are: `key`, `value`, `offset`, `timestamp`, and `headers`. |
|  | The type of encoding to be used for the value field. Supported values are: `none` and `base64`. |
|  | The compression type to be used for files put into GCS. Supported values are: `none`, `gzip`, `snappy`, and `zstd`. |
|  | The maximum number of records to put in a single file. Must be a non-negative number. `0` (the default) is interpreted as "unlimited"; in this case, files are flushed only on the flush interval. |
|  | The time interval at which to periodically flush files and commit offsets. Must be a non-negative number. The default is 60 seconds. `0` disables interval-based flushing; in this case, files are flushed only after reaching the maximum number of records per file. |
|  | If set to |
|  | Initial retry delay in milliseconds. The default value is 1000. |
|  | Maximum retry delay in milliseconds. The default value is 32000. |
|  | Retry delay multiplier. The default value is 2.0. |
|  | Maximum number of retry attempts. The default value is 6. |
|  | Total retry timeout in milliseconds. The default value is 50000. |
|  | Retry backoff in milliseconds. Useful for performing recovery from transient exceptions. The maximum value is 86400000 (24 hours). |
|  | Error tolerance response during connector operation. The default value is |
|  | The name of the topic to be used as the dead letter queue (DLQ) for messages that result in an error when processed by this sink connector, or by its transformations or converters. The topic name is blank by default, which means that no messages are recorded in the DLQ. |
|  | Replication factor used to create the dead letter queue topic when it doesn't already exist. |
|  | When |
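To make the interaction between the record-count limit and the flush interval concrete, here is a small sketch of when a file flush would be triggered. This is my own illustration of the rules described above, not connector code:

```python
def should_flush(records_in_file: int, seconds_since_flush: float,
                 max_records: int = 0, flush_interval_s: float = 60.0) -> bool:
    """Sketch of the flush rules described above.

    max_records == 0 means "unlimited": only the time interval flushes.
    flush_interval_s == 0 disables time-based flushing: only the record
    limit flushes.
    """
    if max_records > 0 and records_in_file >= max_records:
        return True  # record limit reached
    if flush_interval_s > 0 and seconds_since_flush >= flush_interval_s:
        return True  # flush interval elapsed
    return False


print(should_flush(10, 5.0))                       # False: under both limits
print(should_flush(10, 61.0))                      # True: interval elapsed
print(should_flush(1000, 5.0, max_records=1000))   # True: record limit hit
```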
Map data
Use the appropriate key or value converter (input data format) for your data as follows:
- `JSON` when your messages are JSON-encoded. Select **Message JSON contains schema** when your messages include the `schema` and `payload` fields.
- `AVRO` when your messages contain AVRO-encoded messages, with the schema stored in the Schema Registry.
- `STRING` when your messages contain textual data.
- `BYTES` when your messages contain arbitrary data.
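With the `JSON` converter and **Message JSON contains schema** selected, each message is expected to carry `schema` and `payload` fields in the standard Kafka Connect JSON envelope. A minimal illustrative message (the field names `id` and `name` are placeholders) might look like:

```json
{
  "schema": {
    "type": "struct",
    "fields": [
      { "field": "id", "type": "int32" },
      { "field": "name", "type": "string" }
    ]
  },
  "payload": { "id": 42, "name": "example" }
}
```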
You can also select the output data format for your GCS files as follows:
- `CSV` to produce data in the `CSV` format. For `CSV` only, you can use the `STRING` and `BYTES` input formats.
- `JSON` to produce data in the `JSON` format as an array of record objects.
- `JSONL` to produce data in the `JSON` format with each message as a separate JSON object, one per line.
- `PARQUET` to produce data in the `PARQUET` format when your messages contain a schema.
- `AVRO` to produce data in the `AVRO` format when your messages contain a schema.
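The difference between the `JSON` and `JSONL` output layouts can be sketched in a few lines of Python. This is my own illustration of the two file layouts, not connector code:

```python
import json

records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

# JSON output: one array of record objects per file.
json_file = json.dumps(records)

# JSONL output: one JSON object per line.
jsonl_file = "\n".join(json.dumps(r) for r in records)

print(json_file)   # [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
print(jsonl_file)  # two lines, one JSON object each
```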
Test the connection
After the connector is created, check the GCS bucket for a new file. Files should appear after the file flush interval (default is 60 seconds).
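One way to check for new files from the command line, assuming the Google Cloud CLI is installed and authenticated (`my-redpanda-sink-bucket` is a placeholder bucket name):

```shell
# List objects in the bucket; new files should appear after the flush
# interval (default 60 seconds) once messages arrive on the topics.
gcloud storage ls gs://my-redpanda-sink-bucket/
```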
Troubleshoot
If there are any connection issues, an error message is returned. Depending on the `GCS bucket check` property value, the error results in a failed connector (`GCS bucket check = true`) or a failed task (`GCS bucket check = false`).
Additional errors and corrective actions follow.
| Message | Action |
|---|---|
| `Failed to read credentials from JSON string` | The credentials given as JSON file in the |
| `The specified bucket does not exist` | Create the bucket if it does not exist, or correct the bucket name if the bucket exists but the specified |
| No files in the GCS bucket | Wait until the connector performs the first file flush (default is 60 seconds). |