aws_s3

Available in: Cloud, Self-Managed

Sends message parts as objects to an Amazon S3 bucket. Each object is uploaded to the path specified by the path field.

Common configuration fields, with default values:

output:
  label: ""
  aws_s3:
    bucket: "" # No default (required)
    path: ${!counter()}-${!timestamp_unix_nano()}.txt
    tags: {}
    content_type: application/octet-stream
    metadata:
      exclude_prefixes: []
    max_in_flight: 64
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)

Advanced configuration fields, with default values:

output:
  label: ""
  aws_s3:
    bucket: "" # No default (required)
    path: ${!counter()}-${!timestamp_unix_nano()}.txt
    tags: {}
    content_type: application/octet-stream
    content_encoding: ""
    cache_control: ""
    content_disposition: ""
    content_language: ""
    content_md5: ""
    website_redirect_location: ""
    metadata:
      exclude_prefixes: []
    storage_class: STANDARD
    kms_key_id: ""
    checksum_algorithm: ""
    server_side_encryption: ""
    force_path_style_urls: false
    max_in_flight: 64
    timeout: 5s
    object_canned_acl: private
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
    region: "" # No default (optional)
    endpoint: "" # No default (optional)
    tcp:
      connect_timeout: 0s
      keep_alive:
        idle: 15s
        interval: 15s
        count: 9
      tcp_user_timeout: 0s
    credentials:
      profile: "" # No default (optional)
      id: "" # No default (optional)
      secret: "" # No default (optional)
      token: "" # No default (optional)
      from_ec2_role: "" # No default (optional)
      role: "" # No default (optional)
      role_external_id: "" # No default (optional)

To give each object a different path, use the function interpolations described in Bloblang queries. Interpolations are evaluated for each message of a batch.
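
As a sketch of per-message paths, the following fragment builds the object key from a metadata value and a JSON field. The bucket name, the `kafka_topic` metadata key, and the `id` field are assumptions for illustration; substitute whatever your messages actually carry.

```yaml
output:
  aws_s3:
    bucket: example-bucket # hypothetical bucket name
    # Evaluated once per message, so each object gets its own key.
    path: ${!meta("kafka_topic")}/${!json("id")}-${!timestamp_unix_nano()}.json
```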

Metadata

Metadata fields on messages are sent as headers. To mutate or remove these values, see the metadata docs.
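
A minimal sketch of filtering metadata before upload, assuming messages carry metadata keys with a `kafka_` prefix (for example, from a Kafka input) that should not be attached to the uploaded objects. The bucket name is hypothetical.

```yaml
output:
  aws_s3:
    bucket: example-bucket # hypothetical bucket name
    path: ${!counter()}-${!timestamp_unix_nano()}.txt
    metadata:
      # Metadata keys starting with any of these prefixes are not sent as headers.
      exclude_prefixes:
        - kafka_
```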

Batching

It’s common to want to upload messages to S3 as batched archives. The easiest way to do this is to batch your messages at the output level and join the batch of messages with an archive or compress processor.

For example, the following configuration uploads messages as a .tar.gz archive of documents:

output:
  aws_s3:
    bucket: TODO
    path: ${!counter()}-${!timestamp_unix_nano()}.tar.gz
    batching:
      count: 100
      period: 10s
      processors:
        - archive:
            format: tar
        - compress:
            algorithm: gzip

Alternatively, this configuration uploads JSON documents as a single large document containing an array of objects:

output:
  aws_s3:
    bucket: TODO
    path: ${!counter()}-${!timestamp_unix_nano()}.json
    batching:
      count: 100
      processors:
        - archive:
            format: json_array

Performance

This output performs better when multiple messages are sent in parallel. You can tune the maximum number of in-flight messages (or message batches) with the max_in_flight field.
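
For example, a throughput-oriented fragment might raise max_in_flight and combine it with output-level batching. The values below are illustrative assumptions, not recommendations; suitable settings depend on object size and network conditions.

```yaml
output:
  aws_s3:
    bucket: example-bucket # hypothetical bucket name
    path: ${!counter()}-${!timestamp_unix_nano()}.json
    # Allow up to 128 batches to be uploaded concurrently (default is 64).
    max_in_flight: 128
    batching:
      count: 500
      period: 5s
      processors:
        - archive:
            format: json_array
```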

Fields

`batching`

Allows you to configure a batching policy.

Type: object

# Examples:
batching:
  byte_size: 5000
  count: 0
  period: 1s

# ---

batching:
  count: 10
  period: 1s

# ---

batching:
  check: this.contains("END BATCH")
  count: 0
  period: 1m

`batching.byte_size`

The number of bytes at which the batch is flushed. Set to 0 to disable size-based batching.

Type: int

Default: 0

`batching.check`

A Bloblang query that should return a boolean value indicating whether a message should end a batch.

Type: string

Default: ""

# Examples:
check: this.type == "end_of_transaction"

`batching.count`

The number of messages at which the batch is flushed. Set to 0 to disable count-based batching.

Type: int

Default: 0

`batching.period`

A period after which an incomplete batch is flushed regardless of its size.

Type: string

Default: ""

# Examples:
period: 1s

# ---

period: 1m

# ---

period: 500ms

`batching.processors[]`

A list of processors to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op.

Type: processor

# Examples:
processors:
  - archive:
      format: concatenate


# ---

processors:
  - archive:
      format: lines


# ---

processors:
  - archive:
      format: json_array

`bucket`

The bucket to upload messages to.

Type: string

`cache_control`

The cache control to set for each object. This field supports interpolation functions.

Type: string

Default: ""

`checksum_algorithm`

The algorithm used to validate each object during its upload to the Amazon S3 bucket.

Type: string

Default: ""

Options: CRC32, CRC32C, SHA1, SHA256
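
A short sketch of enabling upload validation with one of the listed options. The bucket name is hypothetical.

```yaml
output:
  aws_s3:
    bucket: example-bucket # hypothetical bucket name
    path: ${!counter()}-${!timestamp_unix_nano()}.txt
    # One of CRC32, CRC32C, SHA1, SHA256; the empty default leaves this unset.
    checksum_algorithm: SHA256
```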

`content_disposition`

The content disposition to set for each object. This field supports interpolation functions.

Type: string

Default: ""

`content_encoding`

An optional content encoding to set for each object. This field supports interpolation functions.

Type: string

Default: ""

`content_language`

The content language to set for each object. This field supports interpolation functions.

Type: string

Default: ""

`content_md5`

The content MD5 to set for each object. This field supports interpolation functions.

Type: string

Default: ""

`content_type`

The content type to set for each object. This field supports interpolation functions.

Type: string

Default: application/octet-stream
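
As an illustration of how content_type and content_encoding can be combined with batching, the sketch below uploads each batch as a gzip-compressed JSON array and labels the object accordingly. The bucket name and batch sizes are assumptions; whether a downstream client decompresses the object transparently depends on that client honoring the content encoding header.

```yaml
output:
  aws_s3:
    bucket: example-bucket # hypothetical bucket name
    path: ${!counter()}-${!timestamp_unix_nano()}.json
    content_type: application/json
    content_encoding: gzip
    batching:
      count: 100
      period: 10s
      processors:
        # Join the batch into one JSON array, then compress it.
        - archive:
            format: json_array
        - compress:
            algorithm: gzip
```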

`credentials`

Optional manual configuration of AWS credentials to use. More information can be found in Amazon Web Services.

Type: object

`credentials.from_ec2_role`

Use the credentials of a host EC2 machine configured to assume an IAM role associated with the instance.

Type: bool

`credentials.id`

The ID of credentials to use.

Type: string

`credentials.profile`

A profile from ~/.aws/credentials to use.

Type: string

`credentials.role`

A role ARN to assume.

Type: string

`credentials.role_external_id`

An external ID to provide when assuming a role.

Type: string

`credentials.secret`

The secret for the credentials being used.

This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see Manage Secrets before adding it to your configuration.

Type: string

`credentials.token`

The token for the credentials being used, required when using short term credentials.

Type: string
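
A hedged sketch of manual credentials that assume a role. It reads the key pair from environment variables rather than writing it into the configuration, which assumes those variables are set where the pipeline runs. The region, role ARN, account number, and external ID are placeholders.

```yaml
output:
  aws_s3:
    bucket: example-bucket # hypothetical bucket name
    path: ${!counter()}-${!timestamp_unix_nano()}.txt
    region: us-east-1 # placeholder region
    credentials:
      # Resolved from the environment when the configuration is loaded.
      id: ${AWS_ACCESS_KEY_ID}
      secret: ${AWS_SECRET_ACCESS_KEY}
      # Placeholder role and external ID.
      role: arn:aws:iam::123456789012:role/example-uploader
      role_external_id: example-external-id
```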

`endpoint`

Allows you to specify a custom endpoint for the AWS API.

Type: string

`force_path_style_urls`

Forces the client API to use path style URLs, which helps when connecting to custom endpoints.

Type: bool

Default: false
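
For instance, when targeting an S3-compatible service at a custom endpoint, a configuration might look like the sketch below. The local endpoint URL, region, and bucket name are assumptions for illustration.

```yaml
output:
  aws_s3:
    bucket: example-bucket # hypothetical bucket name
    path: ${!counter()}-${!timestamp_unix_nano()}.txt
    region: us-east-1 # placeholder; some services ignore it
    endpoint: http://localhost:9000 # hypothetical S3-compatible endpoint
    # Address the bucket as <endpoint>/<bucket>/<key> instead of <bucket>.<endpoint>/<key>.
    force_path_style_urls: true
```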

`kms_key_id`

An optional server-side encryption key.

Type: string

Default: ""

`max_in_flight`

The maximum number of messages to have in flight at a given time. Increase this to improve throughput.

Type: int

Default: 64

`metadata`

Specify criteria for which metadata values are attached to objects as headers.

Type: object

`metadata.exclude_prefixes[]`

Provide a list of explicit metadata key prefixes to be excluded when adding metadata to sent messages.

Type: array

Default: []

`object_canned_acl`

The object canned ACL value.

Type: string

Default: private

Options: private, public-read, public-read-write, authenticated-read, aws-exec-read, bucket-owner-read, bucket-owner-full-control
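
As a sketch, a pipeline that writes into a bucket owned by another AWS account might grant the bucket owner full control of each object. The bucket name is hypothetical, and whether this ACL is needed depends on how the bucket's ownership settings are configured.

```yaml
output:
  aws_s3:
    bucket: partner-owned-bucket # hypothetical bucket in another account
    path: ${!counter()}-${!timestamp_unix_nano()}.txt
    object_canned_acl: bucket-owner-full-control
```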

`path`

The path of each message to upload. This field supports interpolation functions.

Type: string

Default: ${!counter()}-${!timestamp_unix_nano()}.txt

# Examples:
path: ${!counter()}-${!timestamp_unix_nano()}.txt

# ---

path: ${!meta("kafka_key")}.json

# ---

path: ${!json("doc.namespace")}/${!json("doc.id")}.json

`region`

The AWS region to target.

Type: string

`server_side_encryption`

An optional server-side encryption algorithm.

Type: string

Default: ""
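
A minimal sketch of encrypting objects with a KMS key. It assumes this field accepts the algorithm names defined by the S3 API (`AES256` or `aws:kms`); the key ARN and bucket name are placeholders.

```yaml
output:
  aws_s3:
    bucket: example-bucket # hypothetical bucket name
    path: ${!counter()}-${!timestamp_unix_nano()}.txt
    # Assumed to follow the S3 API values: AES256 or aws:kms.
    server_side_encryption: "aws:kms"
    kms_key_id: "arn:aws:kms:us-east-1:123456789012:key/example-key-id" # placeholder ARN
```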

`storage_class`

The storage class to set for each object. This field supports interpolation functions.

Type: string

Default: STANDARD

Options: STANDARD, REDUCED_REDUNDANCY, GLACIER, STANDARD_IA, ONEZONE_IA, INTELLIGENT_TIERING, DEEP_ARCHIVE

`tags`

Key/value pairs to store with the object as tags. This field supports interpolation functions.

Type: object

Default: {}

# Examples:
tags:
  Key1: Value1
  Timestamp: ${!meta("Timestamp")}

`tcp`

Configure TCP socket-level settings to optimize network performance and reliability. These low-level controls are useful for:

- High-latency networks: Increase connect_timeout to allow more time for connection establishment.
- Long-lived connections: Configure keep_alive settings to detect and recover from stale connections.
- Unstable networks: Tune keep-alive probes to balance quick failure detection against false positives.
- Linux systems with specific requirements: Use tcp_user_timeout (Linux 2.6.37+) to control data acknowledgment timeouts.

Most users should keep the default values. Only modify these settings if you’re experiencing connection stability issues or have specific network requirements.

Type: object

`tcp.connect_timeout`

Maximum amount of time a dial waits for a connection to complete. Zero disables the timeout.

Type: string

Default: 0s

`tcp.keep_alive`

TCP keep-alive probe configuration.

Type: object

`tcp.keep_alive.count`

Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9.

Type: int

Default: 9

`tcp.keep_alive.idle`

Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes.

Type: string

Default: 15s

`tcp.keep_alive.interval`

Duration between keep-alive probes. Zero defaults to 15s.

Type: string

Default: 15s

`tcp.tcp_user_timeout`

Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep_alive.idle must be greater than this value per RFC 5482. Zero disables.

Type: string

Default: 0s
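
The fragment below sketches a TCP configuration for an unreliable network. The durations are illustrative assumptions; note that keep_alive.idle (30s) is kept greater than tcp_user_timeout (20s), as the field description requires.

```yaml
output:
  aws_s3:
    bucket: example-bucket # hypothetical bucket name
    path: ${!counter()}-${!timestamp_unix_nano()}.txt
    tcp:
      connect_timeout: 10s
      keep_alive:
        idle: 30s     # wait 30s of inactivity before the first probe
        interval: 10s # then probe every 10s
        count: 3      # drop the connection after 3 unanswered probes
      tcp_user_timeout: 20s # Linux only; ignored on other platforms
```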

`timeout`

The maximum period to wait on an upload before abandoning it and reattempting.

Type: string

Default: 5s

`website_redirect_location`

The website redirect location to set for each object. This field supports interpolation functions.

Type: string

Default: ""
