# kafka

---
title: kafka
latest-operator-version: v26.1.4
latest-console-tag: v3.7.3
latest-connect-version: 4.93.0
latest-redpanda-tag: v26.1.9
docname: connect/components/inputs/kafka
page-component-name: cloud-data-platform
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/inputs/kafka.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/inputs/kafka.adoc
page-git-created-date: "2024-09-09"
page-git-modified-date: "2026-05-26"
---

<!-- Source: https://docs.redpanda.com/cloud-data-platform/develop/connect/components/inputs/kafka.md -->

**Type:** Input

**Available in:** Cloud, [Self-Managed](https://docs.redpanda.com/connect/components/inputs/kafka/)

> ⚠️ **WARNING: Deprecated in 4.68.0**
>
> This component is deprecated and will be removed in the next major version release. Please consider moving onto the unified [`redpanda` input](https://docs.redpanda.com/cloud-data-platform/develop/connect/components/inputs/redpanda/) and [`redpanda` output](https://docs.redpanda.com/cloud-data-platform/develop/connect/components/outputs/redpanda/) components.

Connects to Kafka brokers and consumes one or more topics.

#### Common

```yml
input:
  label: ""
  kafka:
    addresses: [] # No default (required)
    topics: [] # No default (required)
    target_version: "" # No default (optional)
    consumer_group: ""
    checkpoint_limit: 1024
    auto_replay_nacks: true
```

#### Advanced

```yml
input:
  label: ""
  kafka:
    addresses: [] # No default (required)
    topics: [] # No default (required)
    target_version: "" # No default (optional)
    tls:
      enabled: false
      skip_cert_verify: false
      enable_renegotiation: false
      root_cas: ""
      root_cas_file: ""
      client_certs: []
    sasl:
      mechanism: none
      user: ""
      password: ""
      access_token: ""
      token_cache: ""
      token_key: ""
    consumer_group: ""
    client_id: benthos
    instance_id: "" # No default (optional)
    rack_id: ""
    start_from_oldest: true
    checkpoint_limit: 1024
    auto_replay_nacks: true
    timely_nacks_maximum_wait: "" # No default (optional)
    commit_period: 1s
    max_processing_period: 100ms
    extract_tracing_map: "" # No default (optional)
    group:
      session_timeout: 10s
      heartbeat_interval: 3s
      rebalance_timeout: 60s
    fetch_buffer_cap: 256
    multi_header: false
    batching:
      count: 0
      byte_size: 0
      period: ""
      check: ""
      processors: [] # No default (optional)
```

Offsets are managed within Kafka under the specified consumer group, and partitions for each topic are automatically balanced across members of the consumer group.

The Kafka input allows parallel processing of messages from different topic partitions, and messages of the same topic partition are processed with a maximum parallelism determined by the field [`checkpoint_limit`](#checkpoint_limit).

To enforce ordered processing of partition messages, set the [`checkpoint_limit`](#checkpoint_limit) to `1`, which makes sure that a message is only processed after the previous message is delivered.

Batching messages before processing can be enabled using the [`batching`](#batching) field, and this batching is performed per-partition such that messages of a batch will always originate from the same partition. This batching mechanism is capable of creating batches of greater size than the [`checkpoint_limit`](#checkpoint_limit), in which case the next batch will only be created upon delivery of the current one.
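
As a minimal sketch of per-partition batching, the following fragment keeps the default `checkpoint_limit` and adds a batching policy. The broker address, topic, and consumer group name are placeholders, and the batch sizes are illustrative rather than recommended values.

```yaml
input:
  kafka:
    addresses: ["localhost:9092"]   # placeholder broker address
    topics: ["foo"]                 # placeholder topic name
    consumer_group: example_group   # placeholder group name
    checkpoint_limit: 1024          # default: up to 1024 in-flight messages per partition
    batching:
      count: 100   # flush once 100 messages from the same partition are collected
      period: 1s   # or flush an incomplete batch after one second
```

Each batch produced this way contains messages from a single partition only.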

## [](#metadata)Metadata

This input adds the following metadata fields to each message:

-   `kafka_key`
-   `kafka_topic`
-   `kafka_partition`
-   `kafka_offset`
-   `kafka_lag`
-   `kafka_timestamp_ms`
-   `kafka_timestamp_unix`
-   `kafka_tombstone_message`
-   All existing message headers (version 0.11+)


The field `kafka_lag` is the calculated difference between the high water mark offset of the partition at the time of ingestion and the current message offset.

You can access these metadata fields using [function interpolation](https://docs.redpanda.com/cloud-data-platform/develop/connect/configuration/interpolation/#bloblang-queries).
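
For illustration, metadata can also be read inside a Bloblang mapping with the `@` prefix. The sketch below copies three of the fields listed above into the message body. It assumes the payload is a JSON object, and the `source` field name is a hypothetical choice for this example.

```yaml
pipeline:
  processors:
    - mapping: |
        # Assumes a structured (JSON object) payload; `source` is a made-up field name.
        root = this
        root.source.topic = @kafka_topic
        root.source.partition = @kafka_partition
        root.source.offset = @kafka_offset
```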

## [](#ordering)Ordering

By default, messages of a topic partition can be processed in parallel, up to a limit determined by the field `checkpoint_limit`. If strictly ordered processing is required, set this value to `1` so that the messages of each partition are processed in lock-step. When doing so, it is recommended that you perform batching at this component for performance, because lock-stepped messages cannot be batched at the output level.
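
A sketch of a strictly ordered configuration might look like the following. The connection values are placeholders and the batch sizes are illustrative. The point is only that `checkpoint_limit` is `1` and batching happens at the input.

```yaml
input:
  kafka:
    addresses: ["localhost:9092"]   # placeholder broker address
    topics: ["foo"]                 # placeholder topic name
    consumer_group: example_group   # placeholder group name
    checkpoint_limit: 1             # one in-flight message (or batch) per partition
    batching:
      count: 50      # illustrative batch size
      period: 500ms  # illustrative flush interval
```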

## [](#troubleshooting)Troubleshooting

If you’re seeing issues writing to or reading from Kafka with this component then it’s worth trying out the newer [`kafka_franz` input](https://docs.redpanda.com/cloud-data-platform/develop/connect/components/inputs/kafka_franz/).

-   I’m seeing logs that report `Failed to connect to kafka: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)`, but the brokers are definitely reachable.


Unfortunately, this error message appears for a wide range of connection problems, even when the broker endpoint can be reached. Double-check your authentication configuration, and make sure that you have [enabled TLS](#tls-enabled) if applicable.
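
When checking the authentication and TLS settings, it can help to compare against a small known-good shape. The fragment below is a sketch that assumes a broker requiring PLAIN authentication over TLS. The hostname, topic, group name, and the `KAFKA_USER` and `KAFKA_PASSWORD` variable names are placeholders.

```yaml
input:
  kafka:
    addresses: ["broker.example.com:9093"]   # placeholder TLS listener address
    topics: ["foo"]                          # placeholder topic name
    consumer_group: example_group            # placeholder group name
    tls:
      enabled: true          # required if the listener expects TLS
    sasl:
      mechanism: PLAIN       # must match a mechanism the broker accepts
      user: ${KAFKA_USER}            # hypothetical variable name
      password: ${KAFKA_PASSWORD}    # hypothetical variable name
```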

## [](#fields)Fields

### [](#addresses)`addresses[]`

A list of broker addresses to connect to. If an item of the list contains commas it will be expanded into multiple addresses.

**Type**: `array`

```yaml
# Examples:
addresses:
  - "localhost:9092"

# ---

addresses:
  - "localhost:9041,localhost:9042"

# ---

addresses:
  - "localhost:9041"
  - "localhost:9042"
```

### [](#auto_replay_nacks)`auto_replay_nacks`

Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation.

**Type**: `bool`

**Default**: `true`

### [](#batching)`batching`

Allows you to configure a [batching policy](https://docs.redpanda.com/cloud-data-platform/develop/connect/configuration/batching/).

**Type**: `object`

```yaml
# Examples:
batching:
  byte_size: 5000
  count: 0
  period: 1s

# ---

batching:
  count: 10
  period: 1s

# ---

batching:
  check: this.contains("END BATCH")
  count: 0
  period: 1m
```

### [](#batching-byte_size)`batching.byte_size`

The number of bytes at which the batch should be flushed. Set to `0` to disable size-based batching.

**Type**: `int`

**Default**: `0`

### [](#batching-check)`batching.check`

A [Bloblang query](https://docs.redpanda.com/cloud-data-platform/develop/connect/guides/bloblang/about/) that should return a boolean value indicating whether a message should end a batch.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
check: this.type == "end_of_transaction"
```

### [](#batching-count)`batching.count`

The number of messages at which the batch should be flushed. Set to `0` to disable count-based batching.

**Type**: `int`

**Default**: `0`

### [](#batching-period)`batching.period`

A period in which an incomplete batch should be flushed regardless of its size.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
period: 1s

# ---

period: 1m

# ---

period: 500ms
```

### [](#batching-processors)`batching.processors[]`

A list of [processors](https://docs.redpanda.com/cloud-data-platform/develop/connect/components/processors/about/) to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op.

**Type**: `processor`

```yaml
# Examples:
processors:
  - archive:
      format: concatenate

# ---

processors:
  - archive:
      format: lines

# ---

processors:
  - archive:
      format: json_array
```

### [](#checkpoint_limit)`checkpoint_limit`

The maximum number of messages of the same topic and partition that can be processed at a given time. Increasing this limit enables parallel processing and batching at the output level to work on individual partitions. To preserve at-least-once delivery guarantees, an offset is not committed until all messages under that offset have been delivered.

**Type**: `int`

**Default**: `1024`

### [](#client_id)`client_id`

An identifier for the client connection.

**Type**: `string`

**Default**: `benthos`

### [](#commit_period)`commit_period`

The period of time between each commit of the current partition offsets. Offsets are always committed during shutdown.

**Type**: `string`

**Default**: `1s`

### [](#consumer_group)`consumer_group`

An identifier for the consumer group of the connection. This field can be explicitly made empty in order to disable stored offsets for the consumed topic partitions.

**Type**: `string`

**Default**: `""`
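
As a sketch of running without stored offsets, the fragment below leaves `consumer_group` empty and names explicit partitions using the syntax described under [`topics`](#topics). Pairing an empty group with explicit partitions is an assumption made for this example, and the address and topic are placeholders.

```yaml
input:
  kafka:
    addresses: ["localhost:9092"]   # placeholder broker address
    topics: ["foo:0-3"]             # partitions 0 through 3 of placeholder topic foo
    consumer_group: ""              # empty: no offsets are stored for these partitions
    start_from_oldest: true         # no saved offset exists, so start from the oldest
```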

### [](#extract_tracing_map)`extract_tracing_map`

EXPERIMENTAL: A [Bloblang mapping](https://docs.redpanda.com/cloud-data-platform/develop/connect/guides/bloblang/about/) that attempts to extract an object containing tracing propagation information, which will then be used as the root tracing span for the message. The specification of the extracted fields must match the format used by the service wide tracer.

**Type**: `string`

```yaml
# Examples:
extract_tracing_map: root = @

# ---

extract_tracing_map: root = this.meta.span
```

### [](#fetch_buffer_cap)`fetch_buffer_cap`

The maximum number of unprocessed messages to fetch at a given time.

**Type**: `int`

**Default**: `256`

### [](#group)`group`

Tuning parameters for consumer group synchronization.

**Type**: `object`

### [](#group-heartbeat_interval)`group.heartbeat_interval`

A period in which heartbeats should be sent out.

**Type**: `string`

**Default**: `3s`

### [](#group-rebalance_timeout)`group.rebalance_timeout`

A period after which rebalancing is abandoned if unresolved.

**Type**: `string`

**Default**: `60s`

### [](#group-session_timeout)`group.session_timeout`

The period after which a consumer is removed from the group if no heartbeats are received from it.

**Type**: `string`

**Default**: `10s`

### [](#instance_id)`instance_id`

When you specify a [`consumer_group`](#consumer_group), assign a unique value to `instance_id` to help brokers identify each input after restarts and prevent unnecessary rebalances.

**Type**: `string`
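
A minimal sketch of pairing the two fields follows. The group name and instance identifier are placeholders. What matters is that each running copy of the input uses a different `instance_id` and keeps it across restarts.

```yaml
input:
  kafka:
    addresses: ["localhost:9092"]    # placeholder broker address
    topics: ["foo"]                  # placeholder topic name
    consumer_group: example_group    # placeholder group name
    instance_id: example-consumer-1  # placeholder; unique per input, stable across restarts
```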

### [](#max_processing_period)`max_processing_period`

A maximum estimate for the time taken to process a message. This is used for tuning consumer group synchronization.

**Type**: `string`

**Default**: `100ms`
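
For pipelines whose processing is slower than the defaults assume, the synchronization fields can be raised together. The values below are illustrative assumptions, not recommendations. They follow the common Kafka guidance of keeping the heartbeat interval at no more than a third of the session timeout.

```yaml
input:
  kafka:
    addresses: ["localhost:9092"]   # placeholder broker address
    topics: ["foo"]                 # placeholder topic name
    consumer_group: example_group   # placeholder group name
    max_processing_period: 500ms    # illustrative: slower per-message processing
    group:
      session_timeout: 30s          # illustrative
      heartbeat_interval: 10s       # at most one third of session_timeout
      rebalance_timeout: 120s       # illustrative
```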

### [](#multi_header)`multi_header`

Decode headers into lists to allow handling of multiple values with the same key.

**Type**: `bool`

**Default**: `false`

### [](#rack_id)`rack_id`

A rack identifier for this client.

**Type**: `string`

**Default**: `""`

### [](#sasl)`sasl`

Enables SASL authentication.

**Type**: `object`

### [](#sasl-access_token)`sasl.access_token`

A static OAUTHBEARER access token.

**Type**: `string`

**Default**: `""`

### [](#sasl-mechanism)`sasl.mechanism`

The SASL authentication mechanism to use. If set to `none` (the default), SASL authentication is not used.

**Type**: `string`

**Default**: `none`

| Option | Summary |
| --- | --- |
| OAUTHBEARER | OAuth Bearer based authentication. |
| PLAIN | Plain text authentication. NOTE: When using plain text auth it is extremely likely that you’ll also need to enable TLS. |
| SCRAM-SHA-256 | Authentication using the SCRAM-SHA-256 mechanism. |
| SCRAM-SHA-512 | Authentication using the SCRAM-SHA-512 mechanism. |
| none | Default, no SASL authentication. |

### [](#sasl-password)`sasl.password`

A PLAIN password. It is recommended that you use environment variables to populate this field.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](https://docs.redpanda.com/cloud-data-platform/develop/connect/configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
password: ${PASSWORD}
```

### [](#sasl-token_cache)`sasl.token_cache`

The [`cache`](https://docs.redpanda.com/cloud-data-platform/develop/connect/components/caches/about/) resource to query for OAUTHBEARER tokens, as an alternative to a static `access_token`.

**Type**: `string`

**Default**: `""`

### [](#sasl-token_key)`sasl.token_key`

Required when using a `token_cache`, the key to query the cache with for tokens.

**Type**: `string`

**Default**: `""`
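
The two token fields are meant to be used together. The sketch below assumes a `memory` cache resource labeled `oauth_tokens` and a key named `kafka_token`. Both names are hypothetical, and some other part of the configuration would need to write a valid token into the cache under that key.

```yaml
input:
  kafka:
    addresses: ["broker.example.com:9093"]   # placeholder broker address
    topics: ["foo"]                          # placeholder topic name
    consumer_group: example_group            # placeholder group name
    sasl:
      mechanism: OAUTHBEARER
      token_cache: oauth_tokens   # label of the cache resource defined below
      token_key: kafka_token      # hypothetical key under which the token is stored

cache_resources:
  - label: oauth_tokens           # hypothetical label
    memory:
      default_ttl: 5m             # illustrative expiry for cached tokens
```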

### [](#sasl-user)`sasl.user`

A PLAIN username. It is recommended that you use environment variables to populate this field.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
user: ${USER}
```

### [](#start_from_oldest)`start_from_oldest`

Determines whether to consume from the oldest available offset, otherwise messages are consumed from the latest offset. The setting is applied when creating a new consumer group or the saved offset no longer exists.

**Type**: `bool`

**Default**: `true`

### [](#target_version)`target_version`

The version of the Kafka protocol to use. This limits the capabilities used by the client and should ideally match the version of your brokers. Defaults to the oldest supported stable version.

**Type**: `string`

```yaml
# Examples:
target_version: 2.1.0

# ---

target_version: 3.1.0
```

### [](#timely_nacks_maximum_wait)`timely_nacks_maximum_wait`

EXPERIMENTAL: Specify a maximum period of time in which each message can be consumed and awaiting either acknowledgement or rejection before rejection is instead forced. This can be useful for avoiding situations where certain downstream components can result in blocked confirmation of delivery that exceeds SLAs. Accepts Go duration format strings such as `100ms`, `1s`, or `5s`.

**Type**: `string`

### [](#tls)`tls`

Custom TLS settings can be used to override system defaults.

**Type**: `object`

### [](#tls-client_certs)`tls.client_certs[]`

A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both.

**Type**: `object`

**Default**: `[]`

```yaml
# Examples:
client_certs:
  - cert: foo
    key: bar

# ---

client_certs:
  - cert_file: ./example.pem
    key_file: ./example.key
```

### [](#tls-client_certs-cert)`tls.client_certs[].cert`

A plain text certificate to use.

**Type**: `string`

**Default**: `""`

### [](#tls-client_certs-cert_file)`tls.client_certs[].cert_file`

The path of a certificate to use.

**Type**: `string`

**Default**: `""`

### [](#tls-client_certs-key)`tls.client_certs[].key`

A plain text certificate key to use.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](https://docs.redpanda.com/cloud-data-platform/develop/connect/configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

**Default**: `""`

### [](#tls-client_certs-key_file)`tls.client_certs[].key_file`

The path of a certificate key to use.

**Type**: `string`

**Default**: `""`

### [](#tls-client_certs-password)`tls.client_certs[].password`

A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format.

Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](https://docs.redpanda.com/cloud-data-platform/develop/connect/configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
password: foo

# ---

password: ${KEY_PASSWORD}
```

### [](#tls-enable_renegotiation)`tls.enable_renegotiation`

Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you’re seeing the error message `local error: tls: no renegotiation`.

**Type**: `bool`

**Default**: `false`

### [](#tls-enabled)`tls.enabled`

Whether custom TLS settings are enabled.

**Type**: `bool`

**Default**: `false`

### [](#tls-root_cas)`tls.root_cas`

An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate.

> ⚠️ **CAUTION**
>
> This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see [Manage Secrets](https://docs.redpanda.com/cloud-data-platform/develop/connect/configuration/secret-management/) before adding it to your configuration.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
root_cas: |-
  -----BEGIN CERTIFICATE-----
  ...
  -----END CERTIFICATE-----
```

### [](#tls-root_cas_file)`tls.root_cas_file`

An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate.

**Type**: `string`

**Default**: `""`

```yaml
# Examples:
root_cas_file: ./root_cas.pem
```

### [](#tls-skip_cert_verify)`tls.skip_cert_verify`

Whether to skip server side certificate verification.

**Type**: `bool`

**Default**: `false`

### [](#topics)`topics[]`

A list of topics to consume from. Multiple comma separated topics can be listed in a single element. Partitions are automatically distributed across consumers of a topic. Alternatively, it’s possible to specify explicit partitions to consume from with a colon after the topic name, e.g. `foo:0` would consume the partition 0 of the topic foo. This syntax supports ranges, e.g. `foo:0-10` would consume partitions 0 through to 10 inclusive.

**Type**: `array`

```yaml
# Examples:
topics:
  - foo
  - bar

# ---

topics:
  - "foo,bar"

# ---

topics:
  - "foo:0"
  - "bar:1"
  - "bar:3"

# ---

topics:
  - "foo:0,bar:1,bar:3"

# ---

topics:
  - "foo:0-5"
```