parquet_encode

Available in: Cloud, Self-Managed

Encodes Parquet files from a batch of structured messages.

Common

```yaml
# Common config fields, showing default values
label: ""
parquet_encode:
  schema: [] # No default (required)
  default_compression: uncompressed
```

Advanced

```yaml
# All config fields, showing default values
label: ""
parquet_encode:
  schema: [] # No default (required)
  default_compression: uncompressed
  default_encoding: DELTA_LENGTH_BYTE_ARRAY
```

This processor uses https://github.com/parquet-go/parquet-go, which is itself experimental. Changes could therefore be made to how this processor functions outside of major version releases.

Examples

Writing Parquet Files to AWS S3

This example uses the batching mechanism of an aws_s3 output to collect a batch of messages in memory. The batch is then converted to a Parquet file by the parquet_encode processor and uploaded.

```yaml
output:
  aws_s3:
    bucket: TODO
    path: 'stuff/${! timestamp_unix() }-${! uuid_v4() }.parquet'
    batching:
      count: 1000
      period: 10s
      processors:
        - parquet_encode:
            schema:
              - name: id
                type: INT64
              - name: weight
                type: DOUBLE
              - name: content
                type: BYTE_ARRAY
            default_compression: zstd
```

Fields

schema

Parquet schema.

Type: array

schema[].name

The name of the column.

Type: string

schema[].type

The type of the column. This is only applicable for leaf columns with no child fields. Some logical types, such as UTF8, can also be specified here.

Type: string
Options: BOOLEAN, INT32, INT64, FLOAT, DOUBLE, BYTE_ARRAY, UTF8.

schema[].repeated

Whether the field is repeated.

Type: bool
Default: false

schema[].optional

Whether the field is optional.

Type: bool
Default: false

schema[].fields

A list of child fields.

Type: array

```yaml
# Examples
fields:
  - name: foo
    type: INT64
  - name: bar
    type: BYTE_ARRAY
```

default_compression

The default compression type to use for fields.

Type: string
Default: "uncompressed"
Options: uncompressed, snappy, gzip, brotli, zstd, lz4raw.

default_encoding

The default encoding type to use for fields. A custom default encoding is only necessary when consuming data with libraries that do not support DELTA_LENGTH_BYTE_ARRAY and is therefore best left unset where possible.

Type: string
Default: "DELTA_LENGTH_BYTE_ARRAY"
Options: DELTA_LENGTH_BYTE_ARRAY, PLAIN.
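
The schema fields described on this page can be combined in a single processor config. The following is a minimal sketch, not a tested configuration: the column names (`id`, `nickname`, `tags`, `location`, `lat`, `lon`) are hypothetical, and the sketch assumes that a column with child `fields` omits `type`, as the description of `schema[].type` suggests. It also sets `default_encoding` to `PLAIN`, which is only worth doing when a downstream reader cannot handle DELTA_LENGTH_BYTE_ARRAY.

```yaml
# Illustrative only: column names are assumptions, not part of the reference.
parquet_encode:
  schema:
    - name: id            # required leaf column
      type: INT64
    - name: nickname      # leaf column that may be absent
      type: UTF8
      optional: true
    - name: tags          # leaf column holding zero or more values
      type: UTF8
      repeated: true
    - name: location      # group column: no type, only child fields
      fields:
        - name: lat
          type: DOUBLE
        - name: lon
          type: DOUBLE
  default_compression: snappy
  # Override the default encoding only for readers that lack
  # DELTA_LENGTH_BYTE_ARRAY support.
  default_encoding: PLAIN
```

Leaving `default_encoding` unset keeps the documented default, which the field description recommends where possible.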