# parquet

> For the complete documentation index, see [llms.txt](https://docs.redpanda.com/llms.txt). Component-specific: [connect-full.txt](https://docs.redpanda.com/connect-full.txt)

---
title: parquet
latest-connect-version: 4.93.0
latest-operator-version: v26.1.4
latest-console-tag: v3.7.3
latest-redpanda-tag: v26.1.9
docname: processors/parquet
page-component-name: connect
page-version: master
page-component-version: master
page-component-title: Connect
page-relative-src-path: processors/parquet.adoc
page-edit-url: https://github.com/redpanda-data/rp-connect-docs/edit/main/modules/components/pages/processors/parquet.adoc
page-git-created-date: "2024-05-24"
page-git-modified-date: "2026-05-26"
---

<!-- Source: https://docs.redpanda.com/connect/components/processors/parquet.md -->

**Type:** Processor ▼

[Processor](https://docs.redpanda.com/connect/components/processors/parquet/)[Input](https://docs.redpanda.com/connect/components/inputs/parquet/)

**Available in:** Self-Managed

> ⚠️ **WARNING: Deprecated**
>
> Deprecated
>
> This component is deprecated and will be removed in the next major version release. Please consider moving onto [alternative components](#alternatives).

Converts batches of documents to or from [Parquet files](https://parquet.apache.org/docs/).

Introduced in version 3.62.0.

```yml
# Config fields, showing default values
label: ""
parquet:
  operator: "" # No default (required)
  compression: snappy
  schema_file: schemas/foo.json # No default (optional)
  schema: |- # No default (optional)
    {
      "Tag": "name=root, repetitiontype=REQUIRED",
      "Fields": [
        {"Tag":"name=name,inname=NameIn,type=BYTE_ARRAY,convertedtype=UTF8, repetitiontype=REQUIRED"},
        {"Tag":"name=age,inname=Age,type=INT32,repetitiontype=REQUIRED"}
      ]
    }
```

## [](#alternatives)Alternatives

This processor is now deprecated, it’s recommended that you use the new [`parquet_decode`](https://docs.redpanda.com/connect/components/processors/parquet_decode/) and [`parquet_encode`](https://docs.redpanda.com/connect/components/processors/parquet_encode/) processors as they provide a number of advantages, the most important of which is better error messages for when schemas are mismatched or files could not be consumed.

## [](#troubleshooting)Troubleshooting

This processor is experimental and the error messages that it provides are often vague and unhelpful. An error message of the form `interface \{} is nil, not <value type>` implies that a field of the given type was expected but not found in the processed message when writing parquet files.

Unfortunately the name of the field will sometimes be missing from the error, in which case it’s worth double checking the schema you provided to make sure that there are no typos in the field names, and if that doesn’t reveal the issue it can help to mark fields as OPTIONAL in the schema and gradually change them back to REQUIRED until the error returns.

## [](#define-the-schema)Define the schema

The schema must be specified as a JSON string, containing an object that describes the fields expected at the root of each document. Each field can itself have more fields defined, allowing for nested structures:

```json
{
  "Tag": "name=root, repetitiontype=REQUIRED",
  "Fields": [
    {"Tag": "name=name, inname=NameIn, type=BYTE_ARRAY, convertedtype=UTF8, repetitiontype=REQUIRED"},
    {"Tag": "name=age, inname=Age, type=INT32, repetitiontype=REQUIRED"},
    {"Tag": "name=id, inname=Id, type=INT64, repetitiontype=REQUIRED"},
    {"Tag": "name=weight, inname=Weight, type=FLOAT, repetitiontype=REQUIRED"},
    {
      "Tag": "name=favPokemon, inname=FavPokemon, type=LIST, repetitiontype=OPTIONAL",
      "Fields": [
        {"Tag": "name=name, inname=PokeName, type=BYTE_ARRAY, convertedtype=UTF8, repetitiontype=REQUIRED"},
        {"Tag": "name=coolness, inname=Coolness, type=FLOAT, repetitiontype=REQUIRED"}
      ]
    }
  ]
}
```

A schema can be derived from a source file using [https://github.com/xitongsys/parquet-go/tree/master/tool/parquet-tools](https://github.com/xitongsys/parquet-go/tree/master/tool/parquet-tools):

```sh
./parquet-tools -cmd schema -file foo.parquet
```

## [](#fields)Fields

### [](#compression)`compression`

The type of compression to use when writing parquet files, this field is ignored when consuming parquet files.

**Type**: `string`

**Default**: `snappy`

**Options**: `uncompressed`, `snappy`, `gzip`, `lz4`, `zstd`

### [](#operator)`operator`

Determines whether the processor converts messages into a parquet file or expands parquet files into messages. Converting into JSON allows subsequent processors and mappings to convert the data into any other format.

**Type**: `string`

| Option | Summary |
| --- | --- |
| from_json | Compress a batch of JSON documents into a file. |
| to_json | Expand a file into one or more JSON messages. |

### [](#schema)`schema`

A schema used to describe the parquet files being generated or consumed, the format of the schema is a JSON document detailing the tag and fields of documents. The schema can be found at: [https://pkg.go.dev/github.com/xitongsys/parquet-go#readme-json](https://pkg.go.dev/github.com/xitongsys/parquet-go#readme-json). Either a `schema_file` or `schema` field must be specified when creating Parquet files via the `from_json` operator.

**Type**: `string`

```yaml
# Examples:
schema: |-
  {
    "Tag": "name=root, repetitiontype=REQUIRED",
    "Fields": [
      {"Tag":"name=name,inname=NameIn,type=BYTE_ARRAY,convertedtype=UTF8, repetitiontype=REQUIRED"},
      {"Tag":"name=age,inname=Age,type=INT32,repetitiontype=REQUIRED"}
    ]
  }
```

### [](#schema_file)`schema_file`

A file path containing a schema used to describe the parquet files being generated or consumed, the format of the schema is a JSON document detailing the tag and fields of documents. The schema can be found at: [https://pkg.go.dev/github.com/xitongsys/parquet-go#readme-json](https://pkg.go.dev/github.com/xitongsys/parquet-go#readme-json). Either a `schema_file` or `schema` field must be specified when creating Parquet files via the `from_json` operator.

**Type**: `string`

```yaml
# Examples:
schema_file: schemas/foo.json
```