# parquet_decode


---
title: parquet_decode
latest-connect-version: 4.93.0
latest-operator-version: v26.1.4
latest-console-tag: v3.7.3
latest-redpanda-tag: v26.1.9
docname: processors/parquet_decode
page-component-name: connect
page-version: master
page-component-version: master
page-component-title: Connect
page-relative-src-path: processors/parquet_decode.adoc
page-edit-url: https://github.com/redpanda-data/rp-connect-docs/edit/main/modules/components/pages/processors/parquet_decode.adoc
page-git-created-date: "2024-05-24"
page-git-modified-date: "2026-05-26"
---

<!-- Source: https://docs.redpanda.com/connect/components/processors/parquet_decode.md -->

**Available in:** [Cloud](https://docs.redpanda.com/cloud-data-platform/develop/connect/components/processors/parquet_decode/ "View the Cloud version of this component"), Self-Managed

Decodes [Parquet files](https://parquet.apache.org/docs/) into a batch of structured messages.

Introduced in version 4.4.0.

```yml
# Configuration fields, showing default values
label: ""
parquet_decode:
  handle_logical_types: v1
```

## Fields

### `handle_logical_types`

Set to `v2` to enable enhanced decoding of logical types, or keep the default value (`v1`) to ignore logical type metadata when decoding values.

In the Parquet format, logical types are stored as standard physical types, with additional schema metadata that describes the logical type. For example, UUIDs are stored as a `FIXED_LEN_BYTE_ARRAY` physical type, but the schema metadata identifies them as UUIDs. When set to `v2`, this processor uses that metadata to decode values into their logical representation, such as an RFC 3339 timestamp string or a hyphenated UUID string, instead of the raw physical value.

> 📝 **NOTE**
>
> For backward compatibility, this field enables the logical-type handling of the specified version and all earlier versions. When creating new pipelines, Redpanda recommends that you use the newest documented version.

**Type**: `string`

**Default**: `v1`

| Option | Summary |
| --- | --- |
| `v1` | No special handling of logical types. Values are decoded directly from their physical types. |
| `v2` | `TIMESTAMP`: decodes as an RFC 3339 string describing the time. If the `isAdjustedToUTC` flag is set to `true` in the Parquet file, the time zone is set to UTC; if it is set to `false`, the time zone is set to local time.<br>`UUID`: decodes as a string, for example `00112233-4455-6677-8899-aabbccddeeff`. |

```yaml
# Examples:
handle_logical_types: v2
```
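To show where the field sits in a pipeline, here is a minimal sketch that decodes with `v2` and then post-processes a timestamp column with a `mapping` processor. It assumes the Parquet file contains a `TIMESTAMP` column named `created_at`; that column name and the output field `created_at_unix` are hypothetical. Because `v2` decodes the timestamp as an RFC 3339 string, the Bloblang `ts_unix()` method can parse it directly.

```yaml
pipeline:
  processors:
    # Decode each Parquet file into a batch of messages, using logical-type metadata
    - parquet_decode:
        handle_logical_types: v2
    # Hypothetical follow-up step: created_at is assumed to be a TIMESTAMP column,
    # which v2 decodes as an RFC 3339 string that ts_unix() accepts
    - mapping: |
        root = this
        root.created_at_unix = this.created_at.ts_unix()
```

With the default `v1`, the same column would instead be decoded from its physical type (`INT64` for `TIMESTAMP`), so a mapping like this one would need to account for the raw representation.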

## Examples

### Reading Parquet Files from AWS S3

In this example, we consume files from AWS S3 as they're written by listening to an SQS queue for upload events. We use the `to_the_end` scanner, which reads each file into memory in full; this allows the `parquet_decode` processor to expand each file into a batch of messages. Finally, we write the data out to local files as newline-delimited JSON.

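```yaml
input:
  aws_s3:
    bucket: TODO          # name of the S3 bucket to read from
    prefix: foos/         # object key prefix
    scanner:
      to_the_end: {}      # read each object into memory as a single message
    sqs:
      url: TODO           # SQS queue that receives S3 upload event notifications
  processors:
    # Expand each Parquet file into a batch of structured messages
    - parquet_decode: {}

output:
  file:
    codec: lines          # write each message followed by a newline (NDJSON)
    path: './foos/${! meta("s3_key") }.jsonl'
```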
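A similar pipeline can read Parquet files that are already on local disk. The following sketch swaps the `aws_s3` input for the `file` input, again using the `to_the_end` scanner so that each file reaches `parquet_decode` whole, and enables `v2` logical-type handling. The glob `./data/*.parquet` and the output path `./decoded/rows.jsonl` are placeholders chosen for illustration, not defaults.

```yaml
input:
  file:
    paths: [ './data/*.parquet' ]   # placeholder glob for local Parquet files
    scanner:
      to_the_end: {}                # read each file into a single message
  processors:
    # Expand each file into a batch of messages, decoding logical types
    - parquet_decode:
        handle_logical_types: v2

output:
  file:
    codec: lines                    # one JSON document per line
    path: './decoded/rows.jsonl'    # placeholder output file
```

Because the output path is the same for every message, the decoded messages from all input files are appended to a single newline-delimited JSON file. Use an interpolated path, as in the S3 example, if you want one output file per input file.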