# parquet_encode

> For the complete documentation index, see [llms.txt](https://docs.redpanda.com/llms.txt). Component-specific: [connect-full.txt](https://docs.redpanda.com/connect-full.txt)

---
title: parquet_encode
latest-connect-version: 4.93.0
latest-operator-version: v26.1.4
latest-console-tag: v3.7.3
latest-redpanda-tag: v26.1.9
docname: processors/parquet_encode
page-component-name: connect
page-version: master
page-component-version: master
page-component-title: Connect
page-relative-src-path: processors/parquet_encode.adoc
page-git-created-date: "2024-05-24"
page-git-modified-date: "2026-05-26"
---

<!-- Source: https://docs.redpanda.com/connect/components/processors/parquet_encode.md -->

**Available in:** [Cloud](https://docs.redpanda.com/cloud-data-platform/develop/connect/components/processors/parquet_encode/ "View the Cloud version of this component"), Self-Managed

Encodes [Parquet files](https://parquet.apache.org/docs/) from a batch of structured messages.

Introduced in version 4.4.0.

#### Common

```yml
processors:
  - label: ""
    parquet_encode:
      schema: [] # No default (optional)
      schema_metadata: ""
      default_compression: uncompressed
```

#### Advanced

```yml
processors:
  - label: ""
    parquet_encode:
      schema: [] # No default (optional)
      schema_metadata: ""
      default_compression: uncompressed
      default_encoding: DELTA_LENGTH_BYTE_ARRAY
      default_timestamp_unit: NANOSECOND
```

## [](#fields)Fields

### [](#default_compression)`default_compression`

The default compression type to use for fields.

**Type**: `string`

**Default**: `uncompressed`

**Options**: `uncompressed`, `snappy`, `gzip`, `brotli`, `zstd`, `lz4raw`

### [](#default_encoding)`default_encoding`

The default encoding type to use for fields. Changing it is only necessary when the resulting files are read by libraries that do not support `DELTA_LENGTH_BYTE_ARRAY`.

Requires version 4.11.0 or later.

**Type**: `string`

**Default**: `DELTA_LENGTH_BYTE_ARRAY`

**Options**: `DELTA_LENGTH_BYTE_ARRAY`, `PLAIN`
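
A minimal sketch of a processor configuration that falls back to `PLAIN`, assuming the files are consumed by a reader that cannot decode `DELTA_LENGTH_BYTE_ARRAY`. The column names are illustrative only.

```yaml
parquet_encode:
  schema:
    - name: id        # illustrative column
      type: INT64
    - name: content   # illustrative byte array column
      type: BYTE_ARRAY
  # Override the DELTA_LENGTH_BYTE_ARRAY default for readers that lack support for it
  default_encoding: PLAIN
```

`PLAIN` is the baseline encoding in the Parquet format specification, so it is the safer choice when a reader's capabilities are unknown.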

### [](#default_timestamp_unit)`default_timestamp_unit`

The precision used when encoding `TIMESTAMP` logical types. The default, `NANOSECOND`, preserves the historical behaviour, but `TIMESTAMP(NANOS)` is not readable by Apache Spark (Databricks), AWS Athena, or DuckDB. Set this field to `MICROSECOND` (or `MILLISECOND`) when writing Parquet files intended for those engines.

Requires version 4.89.0 or later.

**Type**: `string`

**Default**: `NANOSECOND`

**Options**: `NANOSECOND`, `MICROSECOND`, `MILLISECOND`
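
The following sketch targets one of the engines listed above. The `created_at` column is hypothetical; the relevant line is `default_timestamp_unit`.

```yaml
parquet_encode:
  schema:
    - name: id
      type: INT64
    - name: created_at   # hypothetical timestamp column
      type: TIMESTAMP
  # Write TIMESTAMP(MICROS) rather than the default TIMESTAMP(NANOS)
  default_timestamp_unit: MICROSECOND
```

Note that microsecond precision cannot represent sub-microsecond detail in the source timestamps.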

### [](#schema)`schema[]`

The Parquet schema, defined as a list of columns. Use either this field or `schema_metadata` to supply the schema.

**Type**: `object`

### [](#schema-fields)`schema[].fields[]`

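A list of child fields. A column with child fields is a group (nested) column; column types are set on its leaf fields instead.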

**Type**: `array`

```yaml
# Examples:
fields:
  - name: foo
    type: INT64
  - name: bar
    type: BYTE_ARRAY
```
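
A sketch of a nested column, using hypothetical names: because `user` declares child `fields`, it is a group, and the `type` values belong to its leaf columns.

```yaml
parquet_encode:
  schema:
    - name: id
      type: INT64
    - name: user        # group column: has child fields, so no type of its own
      fields:
        - name: name
          type: UTF8
        - name: age
          type: INT32
```

This assumes input messages carry the group as a nested object, for example `{"id": 1, "user": {"name": "alice", "age": 30}}`.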

### [](#schema-name)`schema[].name`

The name of the column.

**Type**: `string`

### [](#schema-optional)`schema[].optional`

Whether the field is optional, meaning it may hold null values.

**Type**: `bool`

**Default**: `false`
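
A sketch of an optional column, with hypothetical names:

```yaml
parquet_encode:
  schema:
    - name: id
      type: INT64
    - name: nickname    # hypothetical column that some messages leave empty
      type: UTF8
      optional: true
```

This assumes the encoder writes a Parquet null when a message such as `{"id": 7}` omits `nickname` or sets it to `null`.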

### [](#schema-repeated)`schema[].repeated`

Whether the field is repeated, meaning it holds a list of values rather than a single value.

**Type**: `bool`

**Default**: `false`
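
A sketch of a repeated column, with hypothetical names:

```yaml
parquet_encode:
  schema:
    - name: id
      type: INT64
    - name: tags        # hypothetical column holding a list of strings
      type: UTF8
      repeated: true
```

This assumes each message supplies the repeated column as a JSON array, for example `{"id": 7, "tags": ["red", "blue"]}`.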

### [](#schema-type)`schema[].type`

The type of the column. This applies only to leaf columns, which have no child fields. Some logical types, such as `UTF8`, can also be specified here.

**Type**: `string`

**Options**: `BOOLEAN`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BYTE_ARRAY`, `UTF8`, `TIMESTAMP`, `BSON`, `ENUM`, `JSON`, `UUID`
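
A sketch mixing physical and logical types in one schema, using hypothetical column names. In the Parquet format, `UTF8` and `JSON` are logical types that annotate a byte array, telling readers to interpret the bytes as a string or a JSON document; a plain `BYTE_ARRAY` carries no such annotation.

```yaml
parquet_encode:
  schema:
    - name: score
      type: DOUBLE      # physical type
    - name: raw
      type: BYTE_ARRAY  # physical type: uninterpreted bytes
    - name: label
      type: UTF8        # logical type: byte array read as a string
    - name: attributes
      type: JSON        # logical type: byte array read as a JSON document
```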

### [](#schema_metadata)`schema_metadata`

The name of a metadata field containing a schema definition to use for encoding, instead of a statically defined `schema`. Within a batch, the schema from the first message is applied to every message in the batch.

**Type**: `string`

**Default**: `""`
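
A minimal sketch, assuming an upstream component has already stored a schema definition in a metadata field named `parquet_schema`. That field name is hypothetical, and the expected format of the stored definition is not specified on this page.

```yaml
parquet_encode:
  # Hypothetical metadata field; the first message of each batch supplies the schema
  schema_metadata: parquet_schema
  default_compression: snappy
```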

## [](#examples)Examples

### [](#writing-parquet-files-to-aws-s3)Writing Parquet Files to AWS S3

This example uses the batching mechanism of an `aws_s3` output to collect a batch of messages in memory, convert the batch into a Parquet file, and upload that file to S3.

```yaml
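output:
  aws_s3:
    bucket: TODO # replace with the name of your bucket
    path: 'stuff/${! timestamp_unix() }-${! uuid_v4() }.parquet' # unique object key per file
    batching:
      # Flush a batch after 1000 messages or 10 seconds, whichever comes first
      count: 1000
      period: 10s
      processors:
        # Encode the whole batch as a single Parquet file
        - parquet_encode:
            schema:
              - name: id
                type: INT64
              - name: weight
                type: DOUBLE
              - name: content
                type: BYTE_ARRAY
            default_compression: zstd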
```