gcp_cloud_storage

Downloads objects from a Google Cloud Storage bucket, optionally filtered by a prefix.

# Common config fields, showing default values
input:
  label: ""
  gcp_cloud_storage:
    bucket: "" # No default (required)
    prefix: ""
    credentials_json: "" # No default (optional)
    scanner:
      to_the_end: {}

# All config fields, showing default values
input:
  label: ""
  gcp_cloud_storage:
    bucket: "" # No default (required)
    prefix: ""
    credentials_json: "" # No default (optional)
    scanner:
      to_the_end: {}
    delete_objects: false

Metadata

This input adds the following metadata fields to each message:

- gcs_key
- gcs_bucket
- gcs_last_modified
- gcs_last_modified_unix
- gcs_content_type
- gcs_content_encoding
- All user defined metadata

You can access these metadata fields using function interpolation.
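As an illustration of reading these fields, the following sketch logs the key and bucket of each downloaded object using function interpolation. The bucket name and prefix are hypothetical, and the example assumes a Redpanda Connect version in which the `metadata` function and the `log` processor are available; older versions may expose the same values through `meta` instead.

```yaml
input:
  gcp_cloud_storage:
    bucket: example-bucket # hypothetical bucket name
    prefix: logs/          # hypothetical prefix

pipeline:
  processors:
    # Write one log line per message, interpolating two of the
    # metadata fields that this input attaches to each message.
    - log:
        level: INFO
        message: 'Downloaded ${! metadata("gcs_key") } from ${! metadata("gcs_bucket") }'

output:
  stdout: {}
```

The same interpolation syntax can be used in any other field that supports interpolation functions.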

Credentials

By default, Redpanda Connect uses a shared credentials file when connecting to GCP services. For more information, see the Google Cloud Platform guide.

Fields

bucket

The name of the bucket from which to download objects.

Type: string

prefix

Optional path prefix. If set, only objects whose names begin with the prefix are consumed.

Type: string

Default: ""
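A minimal sketch of prefix filtering, using a hypothetical bucket and prefix. With this configuration an object named `exports/2024/report.csv` would be consumed, while `exports/2023/report.csv` would be skipped.

```yaml
input:
  gcp_cloud_storage:
    bucket: example-bucket # hypothetical bucket name
    prefix: exports/2024/  # hypothetical prefix; only matching objects are consumed
```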

credentials_json

This field contains sensitive information. Review your cluster security before adding it to your configuration.

Type: string

Default: ""

scanner

The scanner by which the stream of bytes consumed will be broken out into individual messages. Scanners are useful for processing large sources of data without holding the entirety of it within memory. For example, the csv scanner allows you to process individual CSV rows without loading the entire CSV file in memory at once.

Type: scanner

Default: {"to_the_end":{}}
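As a sketch of swapping the default scanner, the configuration below selects the csv scanner mentioned above so that each CSV row becomes its own message. The bucket name and prefix are hypothetical, and the scanner is shown with its own default settings; check the csv scanner reference for the options available in your version.

```yaml
input:
  gcp_cloud_storage:
    bucket: example-bucket # hypothetical bucket name
    prefix: exports/       # hypothetical prefix
    scanner:
      # Replaces the default to_the_end scanner, which emits each
      # object as a single message.
      csv: {}
```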

delete_objects

Whether to delete downloaded objects from the bucket once they are processed.

Type: bool

Default: false