aws_s3
Uploads messages to an Amazon S3 bucket as objects, using the path specified in the path field.
Introduced in version 3.36.0.
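The first block below lists the common configuration fields; the second lists all fields, including the advanced ones. Both show default values.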
output:
  label: ""
  aws_s3:
    bucket: "" # No default (required)
    path: ${!counter()}-${!timestamp_unix_nano()}.txt
    tags: {}
    content_type: application/octet-stream
    metadata:
      exclude_prefixes: []
    max_in_flight: 64
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
output:
  label: ""
  aws_s3:
    bucket: "" # No default (required)
    path: ${!counter()}-${!timestamp_unix_nano()}.txt
    tags: {}
    content_type: application/octet-stream
    content_encoding: ""
    cache_control: ""
    content_disposition: ""
    content_language: ""
    website_redirect_location: ""
    metadata:
      exclude_prefixes: []
    storage_class: STANDARD
    kms_key_id: ""
    checksum_algorithm: ""
    server_side_encryption: ""
    force_path_style_urls: false
    max_in_flight: 64
    timeout: 5s
    object_canned_acl: ""
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
    region: "" # No default (optional)
    endpoint: "" # No default (optional)
    tcp:
      connect_timeout: 0s
      keep_alive:
        idle: 15s
        interval: 15s
        count: 9
      tcp_user_timeout: 0s
    credentials:
      profile: "" # No default (optional)
      id: "" # No default (optional)
      secret: "" # No default (optional)
      token: "" # No default (optional)
      from_ec2_role: "" # No default (optional)
      role: "" # No default (optional)
      role_external_id: "" # No default (optional)
To use a different path for each object, use function interpolation, which is evaluated for each message in a batch.
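As a minimal sketch of per-message paths, the fragment below groups objects under a prefix taken from message metadata. It assumes the messages come from a Kafka input that sets the kafka_topic metadata key; the bucket name is hypothetical.

```yaml
output:
  aws_s3:
    bucket: my-bucket # hypothetical bucket name
    # The interpolations are resolved for each message, so every message
    # in a batch is written to its own object key.
    path: ${!meta("kafka_topic")}/${!counter()}-${!timestamp_unix_nano()}.json
```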
Metadata
Redpanda Connect sends metadata fields as headers. To mutate or remove these values, see the metadata docs.
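One way to keep unwanted metadata off the uploaded objects is the metadata.exclude_prefixes field. The sketch below assumes the messages carry metadata keys beginning with kafka_ that should not be attached; both the prefix and the bucket name are illustrative.

```yaml
output:
  aws_s3:
    bucket: my-bucket # hypothetical bucket name
    path: ${!uuid_v4()}.json
    metadata:
      exclude_prefixes:
        - kafka_ # assumed prefix; metadata keys starting with it are not sent
```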
Tags
The tags field accepts key/value pairs to attach to objects as tags, and the values support interpolation functions:
output:
  aws_s3:
    bucket: TODO
    path: ${!counter()}-${!timestamp_unix_nano()}.tar.gz
    tags:
      Key1: Value1
      Timestamp: ${!meta("Timestamp")}
Credentials
By default, Redpanda Connect uses a shared credentials file when connecting to AWS services. You can also set credentials explicitly at the component level to transfer data across accounts. You can find out more in AWS credentials.
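A component-level credentials block might look like the following sketch. It assumes the key pair is supplied through the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, and that a role exists in the target account; the role ARN, region, and bucket name are placeholders.

```yaml
output:
  aws_s3:
    bucket: my-bucket # hypothetical bucket name
    path: ${!uuid_v4()}.json
    region: us-east-1 # assumed region
    credentials:
      # Read from the environment rather than hard-coding secrets
      id: ${AWS_ACCESS_KEY_ID}
      secret: ${AWS_SECRET_ACCESS_KEY}
      # Placeholder ARN of a role to assume in the other account
      role: arn:aws:iam::123456789012:role/example-upload-role
```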
Batching
Messages are often uploaded to S3 as batched archives. The easiest way to do this is to batch your messages at the output level and join each batch with an archive or compress processor.
For example, the following configuration uploads messages as a .tar.gz archive of documents:
output:
  aws_s3:
    bucket: TODO
    path: ${!counter()}-${!timestamp_unix_nano()}.tar.gz
    batching:
      count: 100
      period: 10s
      processors:
        - archive:
            format: tar
        - compress:
            algorithm: gzip
This configuration uploads JSON documents as a single large document containing an array of objects:
output:
  aws_s3:
    bucket: TODO
    path: ${!counter()}-${!timestamp_unix_nano()}.json
    batching:
      count: 100
      processors:
        - archive:
            format: json_array
Bucket name format
The bucket field accepts a bucket name only, not an ARN. For example, use my-bucket, not arn:aws:s3:::my-bucket.
S3-compatible storage
The endpoint and force_path_style_urls fields let you connect to S3-compatible storage services such as Cloudflare R2, MinIO, or DigitalOcean Spaces.
For Cloudflare R2, set endpoint to your account endpoint URL and enable force_path_style_urls:
output:
  aws_s3:
    bucket: r2-bucket
    path: ${!uuid_v4()}.json
    endpoint: https://<account-id>.r2.cloudflarestorage.com
    force_path_style_urls: true
    region: auto
    credentials:
      id: <r2-access-key-id>
      secret: <r2-secret-access-key>
Find your account ID in the Cloudflare dashboard under R2 > Overview > Account Details. Generate API credentials under R2 > Manage R2 API Tokens.
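The same two fields should apply to other S3-compatible services. As a rough sketch for a locally running MinIO server, assuming it listens on its default API port 9000 and uses the us-east-1 region label; the bucket name and credential placeholders are illustrative.

```yaml
output:
  aws_s3:
    bucket: local-bucket # hypothetical bucket name
    path: ${!uuid_v4()}.json
    endpoint: http://localhost:9000 # assumed local MinIO address
    force_path_style_urls: true
    region: us-east-1 # assumed; adjust to match the server's configured region
    credentials:
      id: <minio-access-key>
      secret: <minio-secret-key>
```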
Performance
This output performs better when multiple messages are sent in parallel. You can tune the maximum number of in-flight messages (or message batches) with the max_in_flight field.
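A sketch of raising the parallelism alongside output-level batching is shown below. The value 128 is illustrative only, not a recommendation; the bucket name is hypothetical.

```yaml
output:
  aws_s3:
    bucket: my-bucket # hypothetical bucket name
    path: ${!counter()}-${!timestamp_unix_nano()}.json
    max_in_flight: 128 # illustrative; the default is 64
    batching:
      count: 100
      period: 10s
      processors:
        - archive:
            format: json_array
```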
Fields
batching
Configure a batching policy.
Type: object
# Examples:
batching:
  byte_size: 5000
  count: 0
  period: 1s
# ---
batching:
  count: 10
  period: 1s
# ---
batching:
  check: this.contains("END BATCH")
  count: 0
  period: 1m
batching.byte_size
The number of bytes at which the batch is flushed. Set to 0 to disable size-based batching.
Type: int
Default: 0
batching.check
A Bloblang query that should return a boolean value indicating whether a message should end a batch.
Type: string
Default: ""
# Examples:
check: this.type == "end_of_transaction"
batching.count
The number of messages after which the batch is flushed. Set to 0 to disable count-based batching.
Type: int
Default: 0
batching.period
A period in which an incomplete batch should be flushed regardless of its size.
Type: string
Default: ""
# Examples:
period: 1s
# ---
period: 1m
# ---
period: 500ms
batching.processors[]
A list of processors to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. All resulting messages are flushed as a single batch, so using these processors to split the batch into smaller batches has no effect.
Type: processor
# Examples:
processors:
  - archive:
      format: concatenate
# ---
processors:
  - archive:
      format: lines
# ---
processors:
  - archive:
      format: json_array
cache_control
The cache control to set for each object. This field supports interpolation functions.
Type: string
Default: ""
checksum_algorithm
The algorithm used to validate each object during its upload to the Amazon S3 bucket.
Type: string
Default: ""
Options: CRC32, CRC32C, SHA1, SHA256
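For illustration, enabling upload validation might look like this. SHA256 is simply one of the listed options, and the bucket name is hypothetical.

```yaml
output:
  aws_s3:
    bucket: my-bucket # hypothetical bucket name
    path: ${!uuid_v4()}.json
    checksum_algorithm: SHA256 # one of CRC32, CRC32C, SHA1, SHA256
```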
content_disposition
The content disposition to set for each object. This field supports interpolation functions.
Type: string
Default: ""
content_encoding
An optional content encoding to set for each object. This field supports interpolation functions.
Type: string
Default: ""
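As a sketch of how content_encoding might pair with output-level compression, the fragment below gzips each batch and labels the object accordingly, so that clients that honor the Content-Encoding header can decompress it. The bucket name and batch sizes are illustrative.

```yaml
output:
  aws_s3:
    bucket: my-bucket # hypothetical bucket name
    path: ${!counter()}-${!timestamp_unix_nano()}.json.gz
    content_type: application/json
    content_encoding: gzip # matches the compress processor below
    batching:
      count: 100
      period: 10s
      processors:
        - archive:
            format: json_array
        - compress:
            algorithm: gzip
```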
content_language
The content language to set for each object. This field supports interpolation functions.
Type: string
Default: ""
content_type
The content type to set for each object. This field supports interpolation functions.
Type: string
Default: application/octet-stream
credentials
Optional manual configuration of AWS credentials to use. More information can be found in Amazon Web Services.
Type: object
credentials.from_ec2_role
Use the credentials of a host EC2 machine configured to assume an IAM role associated with the instance.
Requires version 4.2.0 or later.
Type: bool
credentials.secret
The secret for the credentials being used.
This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see Secrets.
Type: string
credentials.token
The token for the credentials being used. Required when using short-term credentials.
Type: string
force_path_style_urls
Forces the client API to use path style URLs, which helps when connecting to custom endpoints.
Type: bool
Default: false
max_in_flight
The maximum number of messages to have in flight at a given time. Increase this to improve throughput.
Type: int
Default: 64
metadata
Specify criteria for which metadata values are attached to objects as headers.
Type: object
metadata.exclude_prefixes[]
Provide a list of explicit metadata key prefixes to be excluded when adding metadata to sent messages.
Type: array
Default: []
object_canned_acl
The object canned ACL value. Leave empty to omit the ACL from upload requests, which is required for buckets that have ACLs disabled (the AWS default since 2023).
Type: string
Default: ""
Options: "" (empty string), private, public-read, public-read-write, authenticated-read, aws-exec-read, bucket-owner-read, bucket-owner-full-control
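A sketch of setting a canned ACL, assuming the target bucket still has ACLs enabled. For buckets with ACLs disabled, leave the field empty as described above. The bucket name is hypothetical.

```yaml
output:
  aws_s3:
    bucket: my-acl-enabled-bucket # hypothetical; assumes ACLs are enabled on it
    path: ${!uuid_v4()}.json
    object_canned_acl: bucket-owner-full-control
```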
path
The path of each message to upload. This field supports interpolation functions.
Type: string
Default: ${!counter()}-${!timestamp_unix_nano()}.txt
# Examples:
path: ${!counter()}-${!timestamp_unix_nano()}.txt
# ---
path: ${!meta("kafka_key")}.json
# ---
path: ${!json("doc.namespace")}/${!json("doc.id")}.json
server_side_encryption
An optional server-side encryption algorithm.
Requires version 3.63.0 or later.
Type: string
Default: ""
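This page does not list the accepted values. The sketch below assumes the field takes the S3 API's own encryption identifiers (AES256 or aws:kms) and pairs it with the kms_key_id field from the configuration listing; the key ID is a placeholder and the bucket name is hypothetical.

```yaml
output:
  aws_s3:
    bucket: my-bucket # hypothetical bucket name
    path: ${!uuid_v4()}.json
    server_side_encryption: "aws:kms" # assumed value, taken from the S3 API
    kms_key_id: <kms-key-id> # placeholder for the KMS key to encrypt with
```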
storage_class
The storage class to set for each object. This field supports interpolation functions.
Type: string
Default: STANDARD
Options: STANDARD, REDUCED_REDUNDANCY, GLACIER, STANDARD_IA, ONEZONE_IA, INTELLIGENT_TIERING, DEEP_ARCHIVE
tags
Key/value pairs to store with the object as tags. This field supports interpolation functions.
Type: string
Default: {}
# Examples:
tags:
  Key1: Value1
  Timestamp: ${!meta("Timestamp")}
tcp
Configure TCP socket-level settings to optimize network performance and reliability. These low-level controls are useful for:
- High-latency networks: Increase connect_timeout to allow more time for connection establishment
- Long-lived connections: Configure keep_alive settings to detect and recover from stale connections
- Unstable networks: Tune keep-alive probes to balance between quick failure detection and avoiding false positives
- Linux systems with specific requirements: Use tcp_user_timeout (Linux 2.6.37+) to control data acknowledgment timeouts
Most users should keep the default values. Only modify these settings if you’re experiencing connection stability issues or have specific network requirements.
Type: object
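For illustration only, a tuned tcp block might look like the following. All durations are assumed values for an unstable network, not recommendations, and keep_alive.idle is kept above tcp_user_timeout as the field reference requires. The bucket name is hypothetical.

```yaml
output:
  aws_s3:
    bucket: my-bucket # hypothetical bucket name
    path: ${!uuid_v4()}.json
    tcp:
      connect_timeout: 10s # give slow links more time to connect
      keep_alive:
        idle: 30s # must stay greater than tcp_user_timeout
        interval: 10s
        count: 5
      tcp_user_timeout: 20s # Linux-only; ignored on other platforms
```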
tcp.connect_timeout
Maximum amount of time a dial waits for a connection to complete. Zero disables the timeout.
Type: string
Default: 0s
tcp.keep_alive.count
Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9.
Type: int
Default: 9
tcp.keep_alive.idle
Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes.
Type: string
Default: 15s
tcp.keep_alive.interval
Duration between keep-alive probes. Zero defaults to 15s.
Type: string
Default: 15s
tcp.tcp_user_timeout
Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep_alive.idle must be greater than this value per RFC 5482. Zero disables.
Type: string
Default: 0s
timeout
The maximum period to wait on an upload before abandoning it and reattempting.
Type: string
Default: 5s
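Large archived batches may need longer than the default to upload. A sketch with a longer timeout follows; the 30s value and batch size are illustrative assumptions, and the bucket name is hypothetical.

```yaml
output:
  aws_s3:
    bucket: my-bucket # hypothetical bucket name
    path: ${!counter()}-${!timestamp_unix_nano()}.tar.gz
    timeout: 30s # illustrative; the default is 5s
    batching:
      count: 1000
      period: 30s
      processors:
        - archive:
            format: tar
        - compress:
            algorithm: gzip
```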
website_redirect_location
The website redirect location to set for each object. This field supports interpolation functions.
Type: string
Default: ""