# file


---
title: file
latest-connect-version: 4.93.0
latest-operator-version: v26.1.4
latest-console-tag: v3.7.3
latest-redpanda-tag: v26.1.9
docname: inputs/file
page-component-name: connect
page-version: master
page-component-version: master
page-component-title: Connect
page-relative-src-path: inputs/file.adoc
page-edit-url: https://github.com/redpanda-data/rp-connect-docs/edit/main/modules/components/pages/inputs/file.adoc
page-git-created-date: "2024-05-24"
page-git-modified-date: "2026-05-26"
---

<!-- Source: https://docs.redpanda.com/connect/components/inputs/file.md -->

**Type:** Input

**Available in:** Self-Managed

Consumes data from files on disk, emitting messages according to the configured scanner.

#### Common

```yml
input:
  label: ""
  file:
    paths: [] # No default (required)
    scanner:
      lines: {}
    auto_replay_nacks: true
```

#### Advanced

```yml
input:
  label: ""
  file:
    paths: [] # No default (required)
    scanner:
      lines: {}
    delete_on_finish: false
    auto_replay_nacks: true
```

## Metadata

This input adds the following metadata fields to each message:

```text
- path
- mod_time_unix
- mod_time (RFC3339)
```

You can access these metadata fields using [function interpolation](https://docs.redpanda.com/connect/configuration/interpolation/#bloblang-queries).
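
As a minimal sketch of how these fields might be read, the config below copies them into each message with a `mapping` processor and also interpolates `path` into a log line. The path and the output field names (`line`, `source_file`, and so on) are hypothetical and chosen only for illustration.

```yaml
input:
  file:
    paths: [ ./data/*.log ] # hypothetical path

pipeline:
  processors:
    - mapping: |
        # Wrap each line in a document that records where it came from.
        root.line = content().string()
        root.source_file = @path
        root.modified_at = @mod_time
        root.modified_unix = @mod_time_unix
    - log:
        # Interpolation functions are evaluated per message.
        message: 'read a line from ${! metadata("path") }'
```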

## Fields

### `auto_replay_nacks`

Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false`, these messages are deleted instead. Disabling auto replays can greatly improve the memory efficiency of high-throughput streams, because the original shape of the data can be discarded immediately upon consumption and mutation.

**Type**: `bool`

**Default**: `true`
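
A minimal sketch of disabling replays, assuming a pipeline where losing rejected messages is acceptable in exchange for lower memory use (the path is hypothetical):

```yaml
input:
  file:
    paths: [ ./data/*.log ] # hypothetical path
    auto_replay_nacks: false # rejected messages are deleted rather than replayed
```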

### `delete_on_finish`

Whether to delete input files from the disk once they are fully consumed.

**Type**: `bool`

**Default**: `false`
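
For illustration, a sketch that treats a directory as a drop folder and removes each file after it has been read. The `./inbox` directory is hypothetical. Because deletion is permanent, this setting suits cases where the files are disposable copies.

```yaml
input:
  file:
    paths: [ ./inbox/*.jsonl ] # hypothetical drop directory
    scanner:
      lines: {}
    delete_on_finish: true # remove each file once it has been fully consumed
```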

### `paths[]`

A list of paths to consume sequentially. Glob patterns are supported, including super globs (double star, `**`), which match across directory levels.

**Type**: `array`
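
A short sketch combining a literal path, a single-level glob, and a super glob. All file and directory names here are hypothetical.

```yaml
input:
  file:
    paths:
      - ./data/orders.csv    # a single file
      - ./data/archive/*.csv # every CSV directly under archive/
      - ./logs/**/*.log      # every .log file at any depth under logs/
```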

### `scanner`

The [scanner](https://docs.redpanda.com/connect/components/scanners/about/) used to break the consumed stream of bytes into individual messages. Scanners are useful for processing large sources of data without holding all of it in memory. For example, the `csv` scanner lets you process individual CSV rows without loading the entire CSV file into memory at once.

Requires version 4.25.0 or later.

**Type**: `scanner`

**Default**:

```yaml
lines: {}
```
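
As a sketch of a non-default scanner configuration, the example below tunes the `lines` scanner. The `custom_delimiter` and `max_buffer_size` field names are assumptions based on the `lines` scanner reference and should be checked against your installed version; the file path is hypothetical.

```yaml
input:
  file:
    paths: [ ./data/records.txt ] # hypothetical file
    scanner:
      lines:
        custom_delimiter: "\n---\n" # assumed field: split on this string instead of newlines
        max_buffer_size: 1048576    # assumed field: largest single message, in bytes
```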

## Examples

### Read a Bunch of CSVs

To consume a directory of CSV files as structured documents, use a glob pattern and the `csv` scanner:

```yaml
input:
  file:
    paths: [ ./data/*.csv ]
    scanner:
      csv: {}
```