# parquet


---
title: parquet
latest-connect-version: 4.93.0
latest-operator-version: v26.1.4
latest-console-tag: v3.7.3
latest-redpanda-tag: v26.1.9
docname: inputs/parquet
page-component-name: connect
page-version: master
page-component-version: master
page-component-title: Connect
page-relative-src-path: inputs/parquet.adoc
page-edit-url: https://github.com/redpanda-data/rp-connect-docs/edit/main/modules/components/pages/inputs/parquet.adoc
page-git-created-date: "2024-05-24"
page-git-modified-date: "2026-05-26"
---


**Type:** Input

**Available in:** Self-Managed

Reads and decodes [Parquet files](https://parquet.apache.org/docs/) into a stream of structured messages.

Introduced in version 4.8.0.

#### Common

```yml
input:
  label: ""
  parquet:
    paths: [] # No default (required)
    auto_replay_nacks: true
```

#### Advanced

```yml
input:
  label: ""
  parquet:
    paths: [] # No default (required)
    batch_count: 1
    auto_replay_nacks: true
```

This input uses [https://github.com/parquet-go/parquet-go](https://github.com/parquet-go/parquet-go), which is itself experimental. The behavior of this input may therefore change outside of major version releases.

By default, any BYTE\_ARRAY or FIXED\_LEN\_BYTE\_ARRAY value is extracted as a byte slice (`[]byte`) unless its logical type is UTF8, in which case it is extracted as a string (`string`).

When a document containing a byte slice value is later serialized as JSON, that value is base64-encoded into a string, which is the default for arbitrary data fields. You can convert these binary values to strings (or other data types) using Bloblang transformations such as `root.foo = this.foo.string()` or `root.foo = this.foo.encode("hex")`.
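As an illustration of the conversion described above, the following is a minimal sketch of a pipeline that reads a Parquet file and converts two binary columns before writing each record to stdout. The field names `foo` and `bar` and the file path are hypothetical; substitute the columns in your own schema.

```yaml
input:
  parquet:
    paths:
      - /tmp/foo.parquet # hypothetical file path

pipeline:
  processors:
    - mapping: |
        # Start from the decoded record, then override the binary fields.
        root = this
        # Hypothetical column `foo`: interpret the bytes as a string.
        root.foo = this.foo.string()
        # Hypothetical column `bar`: hex-encode the bytes instead of base64.
        root.bar = this.bar.encode("hex")

output:
  stdout: {}
```

Without the mapping, `foo` and `bar` would appear as base64 strings in the JSON written to stdout.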

## [](#fields)Fields

### [](#auto_replay_nacks)`auto_replay_nacks`

Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation.

**Type**: `bool`

**Default**: `true`
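A minimal sketch of disabling automatic replays, assuming a one-off bulk load where dropping rejected messages is acceptable in exchange for lower memory use. The file path is hypothetical.

```yaml
input:
  parquet:
    paths:
      - /tmp/bar/*.parquet # hypothetical file path
    # Rejected (nacked) messages are deleted rather than replayed.
    auto_replay_nacks: false
```

Keep the default of `true` when every record must eventually reach the output.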

### [](#batch_count)`batch_count`

Optionally process records in batches. This can help to speed up the consumption of exceptionally large files. When the end of the file is reached the remaining records are processed as a (potentially smaller) batch.

**Type**: `int`

**Default**: `1`
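A minimal sketch of batched consumption for large files. The batch size of `500` and the file path are illustrative assumptions, not recommended values; the final batch of each file may contain fewer records.

```yaml
input:
  parquet:
    paths:
      - /tmp/data/**/*.parquet # hypothetical file path
    # Emit up to 500 records per batch instead of one at a time.
    batch_count: 500
```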

### [](#paths)`paths[]`

A list of file paths to read from. Each file will be read sequentially until the list is exhausted, at which point the input will close. Glob patterns are supported, including super globs (double star).

**Type**: `array`

```yaml
# Examples:
paths:
  - /tmp/foo.parquet

# ---

paths:
  - /tmp/bar/*.parquet

# ---

paths:
  - /tmp/data/**/*.parquet
```