# aws_s3

---
title: aws_s3
latest-operator-version: v26.1.4
latest-console-tag: v3.7.3
latest-connect-version: 4.93.0
latest-redpanda-tag: v26.1.9
docname: connect/components/inputs/aws_s3
page-component-name: cloud-data-platform
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/inputs/aws_s3.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/aws_s3.adoc
page-git-created-date: "2024-09-09"
page-git-modified-date: "2026-05-26"
---

<!-- Source: https://docs.redpanda.com/cloud-data-platform/develop/connect/components/inputs/aws_s3.md -->

**Type:** Input

**Available in:** Cloud, [Self-Managed](https://docs.redpanda.com/connect/components/inputs/aws_s3/ "View the Self-Managed version of this component")

Downloads objects within an Amazon S3 bucket, optionally filtered by a prefix, either by walking the items in the bucket or by streaming upload notifications in real time.

#### Common

```yml
input:
  label: ""
  aws_s3:
    bucket: ""
    prefix: ""
    scanner:
      to_the_end: {}
    sqs:
      url: ""
      endpoint: ""
      key_path: Records.*.s3.object.key
      bucket_path: Records.*.s3.bucket.name
      envelope_path: ""
      delay_period: ""
      max_messages: 10
      wait_time_seconds: 0
      nack_visibility_timeout: 0
```

#### Advanced

```yml
input:
  label: ""
  aws_s3:
    bucket: ""
    prefix: ""
    region: "" # No default (optional)
    endpoint: "" # No default (optional)
    tcp:
      connect_timeout: 0s
      keep_alive:
        idle: 15s
        interval: 15s
        count: 9
      tcp_user_timeout: 0s
    credentials:
      profile: "" # No default (optional)
      id: "" # No default (optional)
      secret: "" # No default (optional)
      token: "" # No default (optional)
      from_ec2_role: "" # No default (optional)
      role: "" # No default (optional)
      role_external_id: "" # No default (optional)
    force_path_style_urls: false
    delete_objects: false
    scanner:
      to_the_end: {}
    sqs:
      url: ""
      endpoint: ""
      key_path: Records.*.s3.object.key
      bucket_path: Records.*.s3.bucket.name
      envelope_path: ""
      delay_period: ""
      max_messages: 10
      wait_time_seconds: 0
      nack_visibility_timeout: 0
```

## [](#stream-objects-on-upload-with-sqs)Stream objects on upload with SQS

A common pattern for consuming S3 objects is to emit upload notification events from the bucket either directly to an SQS queue, or to an SNS topic that is consumed by an SQS queue, and then have your consumer listen for events that prompt it to download the newly uploaded objects. More information about this pattern and how to set it up can be found in the [Amazon S3 docs](https://docs.aws.amazon.com/AmazonS3/latest/dev/ways-to-add-notification-config-to-bucket.html).

Redpanda Connect follows this pattern when you configure `sqs.url`: it consumes events from SQS and downloads only the objects whose keys appear in those events. For this to work, Redpanda Connect needs to know where in the event the key and bucket name can be found. Specify these locations as [dot paths](https://docs.redpanda.com/cloud-data-platform/develop/connect/configuration/field_paths/) in the `sqs.key_path` and `sqs.bucket_path` fields. The default values for these fields should already be correct if you followed the guide above.

If your notification events are being routed to SQS via an SNS topic, the events are enveloped by SNS, in which case you also need to specify the field `sqs.envelope_path`, which in the case of SNS to SQS will usually be `Message`.
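As a minimal sketch of the SNS-to-SQS case, the following config reads notifications from a queue and unwraps the SNS envelope before looking up the key and bucket. The queue URL is a placeholder, and `key_path` and `bucket_path` are shown at their default values only for clarity.

```yaml
input:
  aws_s3:
    # bucket is omitted because sqs.url is set and the bucket name
    # is read from each notification event.
    sqs:
      url: https://sqs.<region>.amazonaws.com/<account-id>/<queue-name> # placeholder
      envelope_path: Message # SNS wraps the S3 event in a Message field
      key_path: Records.*.s3.object.key # default
      bucket_path: Records.*.s3.bucket.name # default
```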

When using SQS, make sure you have sensible values for `sqs.max_messages` and for the visibility timeout of the queue itself. When Redpanda Connect consumes an S3 object, the SQS message that triggered it is not deleted until the S3 object has been sent onwards. This ensures at-least-once crash resiliency, but it also means that if an S3 object takes longer to process than the visibility timeout of your queue, the same object might be processed multiple times.
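For example, the following sketch lowers the number of messages fetched per receive request and enables long polling. The values are illustrative assumptions, not recommendations, and the queue URL is a placeholder. The queue's own visibility timeout is configured in AWS rather than in this component.

```yaml
input:
  aws_s3:
    sqs:
      url: https://sqs.<region>.amazonaws.com/<account-id>/<queue-name> # placeholder
      max_messages: 5 # at most 5 SQS messages per receive request
      wait_time_seconds: 20 # long polling; valid values are 0 to 20
```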

## [](#download-large-files)Download large files

When downloading large files, process them in streamed parts to avoid loading the entire file into memory at once. To do this, specify a [`scanner`](#scanner) that determines how to break the input into smaller individual messages.
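A minimal sketch, assuming the objects are line-delimited text (the bucket name and prefix are placeholders): the `lines` scanner emits one message per line instead of one message per object.

```yaml
input:
  aws_s3:
    bucket: my-bucket
    prefix: logs/
    scanner:
      lines: {} # split each object into one message per line
```

Other scanners, such as `csv`, follow the same shape. Choose the one that matches the format of your objects.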

## [](#bucket-and-prefix)Bucket and prefix

The `bucket` field accepts a bucket name only, not an ARN. For example, use `my-bucket`, not `arn:aws:s3:::my-bucket`.

The `prefix` field accepts a single string. To consume from multiple prefixes in the same bucket, use multiple `aws_s3` inputs in a [`broker` input](https://docs.redpanda.com/cloud-data-platform/develop/connect/components/inputs/broker/):

```yaml
input:
  broker:
    inputs:
      - aws_s3:
          bucket: my-bucket
          prefix: logs/app1/
      - aws_s3:
          bucket: my-bucket
          prefix: logs/app2/
```

## [](#credentials)Credentials

By default, Redpanda Connect uses a shared credentials file when connecting to AWS services. You can also set credentials explicitly at the component level to transfer data across accounts. You can find out more in [AWS credentials](https://docs.redpanda.com/cloud-data-platform/develop/connect/guides/cloud/aws/).
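As an illustration of component-level credentials for cross-account access, the following sketch assumes a role in the account that owns the bucket. The role ARN, external ID, and region are placeholders.

```yaml
input:
  aws_s3:
    bucket: my-bucket
    region: us-east-1 # example region
    credentials:
      role: arn:aws:iam::<account-id>:role/<role-name> # placeholder role ARN
      role_external_id: <external-id> # only if the role's trust policy requires one
```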

## [](#s3-compatible-storage)S3-compatible storage

The `endpoint` and `force_path_style_urls` fields let you connect to S3-compatible storage services such as Cloudflare R2, MinIO, or DigitalOcean Spaces.

For Cloudflare R2, set `endpoint` to your account endpoint URL and enable `force_path_style_urls`:

```yaml
input:
  aws_s3:
    bucket: r2-bucket
    endpoint: https://<account-id>.r2.cloudflarestorage.com
    force_path_style_urls: true
    region: auto
    credentials:
      id: <r2-access-key-id>
      secret: <r2-secret-access-key>
```

Find your account ID in the Cloudflare dashboard under **R2 > Overview > Account Details**. Generate API credentials under **R2 > Manage R2 API Tokens**.

## [](#metadata)Metadata

This input adds the following metadata fields to each message:

-   `s3_key`
-   `s3_bucket`
-   `s3_last_modified_unix`
-   `s3_last_modified` (RFC3339)
-   `s3_content_type`
-   `s3_content_encoding`
-   `s3_version_id`
-   All user-defined metadata

You can access these metadata fields using [function interpolation](https://docs.redpanda.com/cloud-data-platform/develop/connect/configuration/interpolation/#bloblang-queries). User-defined metadata is case-insensitive in AWS, so keys are often received in capitalized form. To normalize them, map all metadata keys to lowercase or uppercase using a Bloblang mapping such as `meta = meta().map_each_key(key -> key.lowercase())`.
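The normalization mapping mentioned above could be placed in a `mapping` processor, as in the following sketch. Note that Bloblang lambdas use the ASCII arrow `->`. The message payload is left unchanged and only the metadata keys are rewritten.

```yaml
pipeline:
  processors:
    - mapping: |
        # Lowercase every metadata key, including user-defined ones.
        meta = meta().map_each_key(key -> key.lowercase())
```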

## [](#fields)Fields

### [](#bucket)`bucket`

The bucket to consume from. If the field `sqs.url` is specified, this field is optional.

**Type**: `string`

**Default**: `""`

### [](#credentials-2)`credentials`

Optional manual configuration of AWS credentials to use. More information can be found in [Amazon Web Services](https://docs.redpanda.com/cloud-data-platform/develop/connect/guides/cloud/aws/).

**Type**: `object`

### [](#credentials-from_ec2_role)`credentials.from_ec2_role`

Use the credentials of a host EC2 machine configured to assume [an IAM role associated with the instance](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html).

**Type**: `bool`

### [](#credentials-id)`credentials.id`

The ID of credentials to use.

**Type**: `string`

### [](#credentials-profile)`credentials.profile`

A profile from `~/.aws/credentials` to use.

**Type**: `string`

### [](#credentials-role)`credentials.role`

A role ARN to assume.

**Type**: `string`

### [](#credentials-role_external_id)`credentials.role_external_id`

An external ID to provide when assuming a role.

**Type**: `string`

### [](#credentials-secret)`credentials.secret`

The secret for the credentials being used.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](https://docs.redpanda.com/cloud-data-platform/develop/connect/configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

### [](#credentials-token)`credentials.token`

The token for the credentials being used. Required when using short-term credentials.

**Type**: `string`

### [](#delete_objects)`delete_objects`

Whether to delete downloaded objects from the bucket once they are processed.

**Type**: `bool`

**Default**: `false`

### [](#endpoint)`endpoint`

Allows you to specify a custom endpoint for the AWS API.

**Type**: `string`

### [](#force_path_style_urls)`force_path_style_urls`

Forces the client API to use path style URLs for downloading keys, which is often required when connecting to custom endpoints.

**Type**: `bool`

**Default**: `false`

### [](#prefix)`prefix`

An optional path prefix. If set, only objects with the prefix are consumed when walking a bucket.

**Type**: `string`

**Default**: `""`

### [](#region)`region`

The AWS region to target.

**Type**: `string`

### [](#scanner)`scanner`

The [scanner](https://docs.redpanda.com/cloud-data-platform/develop/connect/components/scanners/about/) by which the stream of bytes consumed will be broken out into individual messages. Scanners are useful for processing large sources of data without holding the entirety of it within memory. For example, the `csv` scanner allows you to process individual CSV rows without loading the entire CSV file in memory at once.

**Type**: `scanner`

**Default**:

```yaml
to_the_end: {}
```

### [](#sqs)`sqs`

Consume SQS messages in order to trigger key downloads.

**Type**: `object`

### [](#sqs-bucket_path)`sqs.bucket_path`

A [dot path](https://docs.redpanda.com/cloud-data-platform/develop/connect/configuration/field_paths/) to the bucket name within SQS messages.

**Type**: `string`

**Default**: `Records.*.s3.bucket.name`

### [](#sqs-delay_period)`sqs.delay_period`

An optional period of time to wait from when a notification was originally sent to when the target key download is attempted.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
delay_period: 10s

# ---

delay_period: 5m
```

### [](#sqs-endpoint)`sqs.endpoint`

A custom endpoint to use when connecting to SQS.

**Type**: `string`

**Default**: `""`

### [](#sqs-envelope_path)`sqs.envelope_path`

A [dot path](https://docs.redpanda.com/cloud-data-platform/develop/connect/configuration/field_paths/) to a field that contains an enveloped JSON payload. When set, the key and bucket are extracted from that payload instead of from the top level of the SQS message. This is useful when subscribing an SQS queue to an SNS topic that receives bucket events.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
envelope_path: Message
```

### [](#sqs-key_path)`sqs.key_path`

A [dot path](https://docs.redpanda.com/cloud-data-platform/develop/connect/configuration/field_paths/) to the object keys within SQS messages.

**Type**: `string`

**Default**: `Records.*.s3.object.key`

### [](#sqs-max_messages)`sqs.max_messages`

The maximum number of SQS messages to consume in each request.

**Type**: `int`

**Default**: `10`

### [](#sqs-nack_visibility_timeout)`sqs.nack_visibility_timeout`

Custom visibility timeout, in seconds, to apply to an SQS message when it is nacked.

**Type**: `int`

**Default**: `0`

### [](#sqs-url)`sqs.url`

An optional SQS URL to connect to. When specified this queue will control which objects are downloaded.

**Type**: `string`

**Default**: `""`

### [](#sqs-wait_time_seconds)`sqs.wait_time_seconds`

The wait time, in seconds, for each SQS receive request. Setting a value greater than zero enables long polling. Valid values: 0 to 20.

**Type**: `int`

**Default**: `0`

### [](#tcp)`tcp`

TCP socket configuration.

**Type**: `object`

### [](#tcp-connect_timeout)`tcp.connect_timeout`

Maximum amount of time a dial will wait for a connect to complete. Zero disables.

**Type**: `string`

**Default**: `0s`

### [](#tcp-keep_alive)`tcp.keep_alive`

TCP keep-alive probe configuration.

**Type**: `object`

### [](#tcp-keep_alive-count)`tcp.keep_alive.count`

Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9.

**Type**: `int`

**Default**: `9`

### [](#tcp-keep_alive-idle)`tcp.keep_alive.idle`

Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes.

**Type**: `string`

**Default**: `15s`

### [](#tcp-keep_alive-interval)`tcp.keep_alive.interval`

Duration between keep-alive probes. Zero defaults to 15s.

**Type**: `string`

**Default**: `15s`

### [](#tcp-tcp_user_timeout)`tcp.tcp_user_timeout`

Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, `keep_alive.idle` must be greater than this value per RFC 5482. Zero disables.

**Type**: `string`

**Default**: `0s`
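The relationship between the `tcp` fields can be sketched as follows. The durations are illustrative assumptions. The one constraint taken from the field descriptions is that `keep_alive.idle` is greater than `tcp_user_timeout` when the latter is enabled.

```yaml
input:
  aws_s3:
    bucket: my-bucket
    tcp:
      connect_timeout: 5s
      keep_alive:
        idle: 30s # must exceed tcp_user_timeout
        interval: 15s
        count: 9
      tcp_user_timeout: 10s # Linux-only; ignored on other platforms
```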