# csv


---
title: csv
latest-connect-version: 4.93.0
latest-operator-version: v26.1.4
latest-console-tag: v3.7.3
latest-redpanda-tag: v26.1.9
docname: inputs/csv
page-component-name: connect
page-version: master
page-component-version: master
page-component-title: Connect
page-relative-src-path: inputs/csv.adoc
page-edit-url: https://github.com/redpanda-data/rp-connect-docs/edit/main/modules/components/pages/inputs/csv.adoc
page-git-created-date: "2024-05-24"
page-git-modified-date: "2026-05-26"
---

<!-- Source: https://docs.redpanda.com/connect/components/inputs/csv.md -->

**Type:** Input


**Available in:** Self-Managed

Reads one or more CSV files as structured records following the format described in RFC 4180.

#### Common

```yml
input:
  label: ""
  csv:
    paths: [] # No default (required)
    parse_header_row: true
    delimiter: ,
    lazy_quotes: false
    auto_replay_nacks: true
```

#### Advanced

```yml
input:
  label: ""
  csv:
    paths: [] # No default (required)
    parse_header_row: true
    delimiter: ,
    lazy_quotes: false
    delete_on_finish: false
    batch_count: 1
    auto_replay_nacks: true
```

This input offers more control over CSV parsing than the [`file` input](https://docs.redpanda.com/connect/components/inputs/file/).

When parsing with a header row, each line of the file is consumed as a structured object whose key names are taken from the header row. For example, the following CSV file:

```csv
foo,bar,baz
first foo,first bar,first baz
second foo,second bar,second baz
```

Would produce the following messages:

```json
{"foo":"first foo","bar":"first bar","baz":"first baz"}
{"foo":"second foo","bar":"second bar","baz":"second baz"}
```

If, however, the field `parse_header_row` is set to `false`, arrays are produced instead, as follows:

```json
["first foo","first bar","first baz"]
["second foo","second bar","second baz"]
```

## [](#metadata)Metadata

This input adds the following metadata fields to each message:

```text
- header
- path
- mod_time_unix
- mod_time (RFC3339)
```

You can access these metadata fields using [function interpolation](https://docs.redpanda.com/connect/configuration/interpolation/#bloblang-queries).

Note: The `header` field is only set when `parse_header_row` is `true`.
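For example, the metadata fields listed above can be copied into the message body with a `mapping` processor. This is a sketch only: the target field names `source_path` and `source_mod_time` are hypothetical, and the `@` prefix is the same Bloblang metadata syntax used elsewhere on this page.

```yaml
input:
  csv:
    paths:
      - ./foo.csv
    parse_header_row: true

pipeline:
  processors:
    - mapping: |
        root = this
        # Hypothetical field names; the metadata keys come from the list above.
        root.source_path = @path
        root.source_mod_time = @mod_time

output:
  stdout: {}
```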

### [](#output-csv-column-order)Output CSV column order

When [creating CSV](https://docs.redpanda.com/connect/guides/bloblang/advanced/#creating-csv) from Redpanda Connect messages, the columns must be sorted lexicographically to make the output deterministic. Alternatively, when using the `csv` input, one can leverage the `header` metadata field to retrieve the column order:

```yaml
input:
  csv:
    paths:
      - ./foo.csv
      - ./bar.csv
    parse_header_row: true

  processors:
    - mapping: |
        map escape_csv {
          root = if this.re_match("[\"\n,]+") {
            "\"" + this.replace_all("\"", "\"\"") + "\""
          } else {
            this
          }
        }

        let header = if count(@path) == 1 {
          @header.map_each(c -> c.apply("escape_csv")).join(",") + "\n"
        } else { "" }

        root = $header + @header.map_each(c -> this.get(c).string().apply("escape_csv")).join(",")

output:
  file:
    path: ./output/${! @path.filepath_split().index(-1) }
```

## [](#fields)Fields

### [](#auto_replay_nacks)`auto_replay_nacks`

Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation.

**Type**: `bool`

**Default**: `true`

### [](#batch_count)`batch_count`

Optionally process records in batches. This can help to speed up the consumption of exceptionally large CSV files. When the end of the file is reached the remaining records are processed as a (potentially smaller) batch.

**Type**: `int`

**Default**: `1`
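A minimal sketch of batched consumption follows. The file path and the batch size of `500` are illustrative assumptions, not recommendations.

```yaml
input:
  csv:
    paths:
      - ./large.csv # hypothetical large file
    parse_header_row: true
    batch_count: 500 # illustrative value
```

Under this configuration, a file of 1,200 records would be expected to yield two batches of 500 records followed by a final batch of 200.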

### [](#delete_on_finish)`delete_on_finish`

Whether to delete input files from the disk once they are fully consumed.

**Type**: `bool`

**Default**: `false`

### [](#delimiter)`delimiter`

The delimiter to use for splitting values in each record. It must be a single character.

**Type**: `string`

**Default**: `,`
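For instance, a semicolon-separated file could be read with a configuration along these lines. The file path is a hypothetical placeholder.

```yaml
input:
  csv:
    paths:
      - ./export.csv # hypothetical semicolon-separated file
    delimiter: ";"
```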

### [](#lazy_quotes)`lazy_quotes`

If set to `true`, a quote may appear in an unquoted field and a non-doubled quote may appear in a quoted field.

Requires version 4.1.0 or later.

**Type**: `bool`

**Default**: `false`
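As an illustration, consider a hypothetical file `./messy.csv` containing a row such as `1,a 6" ruler`, where a bare quote appears inside an unquoted field. Going by the description above, such a row does not meet the default strict parsing rules, and enabling `lazy_quotes` relaxes them. A sketch of that configuration:

```yaml
input:
  csv:
    paths:
      - ./messy.csv # hypothetical file with unescaped quotes
    parse_header_row: true
    lazy_quotes: true
```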

### [](#parse_header_row)`parse_header_row`

Whether to treat the first row as a header row. If set to `true`, the output structure for each message is an object whose field keys are determined by the header row. Otherwise, each message consists of an array of values from the corresponding CSV row.

**Type**: `bool`

**Default**: `true`

### [](#paths)`paths[]`

A list of file paths to read from. Each file will be read sequentially until the list is exhausted, at which point the input will close. Glob patterns are supported, including super globs (double star).

**Type**: `array`

```yaml
# Examples:
paths:
  - /tmp/foo.csv
  - /tmp/bar/*.csv
  - /tmp/data/**/*.csv
```