Docs Connect Components Processors parquet_encode parquet_encode Available in: Cloud, Self-Managed Encodes Parquet files from a batch of structured messages. Introduced in version 4.4.0. Common Advanced # Common configuration fields, showing default values label: "" parquet_encode: schema: [] # No default (required) default_compression: uncompressed # All configuration fields, showing default values label: "" parquet_encode: schema: [] # No default (required) default_compression: uncompressed default_encoding: DELTA_LENGTH_BYTE_ARRAY Fields schema Parquet schema. Type: array schema[].name The name of the column you want to encode. Type: string schema[].type The data type of the column to encode. This field is only applicable for leaf columns with no child fields. The following options include logical types. Type: string Options: BOOLEAN , INT32 , INT64 , FLOAT , DOUBLE , BYTE_ARRAY , TIMESTAMP , BSON , ENUM , JSON , UUID schema[].repeated Whether a field is repeated. Type: bool Default: false schema[].optional Whether a field is optional. Type: bool Default: false schema[].fields A list of child fields. Type: array # Examples fields: - name: foo type: INT64 - name: bar type: BYTE_ARRAY default_compression The default compression type to use for fields. Type: string Default: uncompressed Options: uncompressed , snappy , gzip , brotli , zstd , lz4raw default_encoding The default encoding type to use for fields. A custom default encoding is only necessary when consuming data with libraries that do not support DELTA_LENGTH_BYTE_ARRAY. Type: string Default: DELTA_LENGTH_BYTE_ARRAY Requires version 4.11.0 or newer Options: DELTA_LENGTH_BYTE_ARRAY , PLAIN Examples Writing Parquet files to AWS S3 In this example, a pipeline uses an aws_s3 output as a batching mechanism. Messages are collected in memory and encoded into a Parquet file, which is then uploaded to an AWS S3 bucket. output: aws_s3: bucket: TODO path: 'stuff/${! timestamp_unix() }-${! uuid_v4() }.parquet' batching: count: 1000 period: 10s processors: - parquet_encode: schema: - name: id type: INT64 - name: weight type: DOUBLE - name: content type: BYTE_ARRAY default_compression: zstd Back to top × Simple online edits For simple changes, such as fixing a typo, you can edit the content directly on GitHub. Edit on GitHub Or, open an issue to let us know about something that you want us to change. Open an issue Contribution guide For extensive content updates, or if you prefer to work locally, read our contribution guide . Was this helpful? thumb_up thumb_down group Ask in the community mail Share your feedback group_add Make a contribution parquet_decode parse_log