# parquet_decode

Available in: Cloud, Self-Managed

Decodes Parquet files into a batch of structured messages.

```yaml
# Config fields, showing default values
label: ""
parquet_decode: {}
```

This processor uses https://github.com/parquet-go/parquet-go, which is itself experimental. Therefore changes could be made to how this processor functions outside of major version releases.

## Examples

### Reading Parquet Files from AWS S3

In this example we consume files from AWS S3 as they're written, by listening to an SQS queue for upload events. We use the `to_the_end` scanner so that each file is read into memory in full, which allows the `parquet_decode` processor to expand each file into a batch of messages. Finally, we write the data out to local files as newline-delimited JSON.

```yaml
input:
  aws_s3:
    bucket: TODO
    prefix: foos/
    scanner:
      to_the_end: {}
    sqs:
      url: TODO
  processors:
    - parquet_decode: {}

output:
  file:
    codec: lines
    path: './foos/${! meta("s3_key") }.jsonl'
```
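For quickly inspecting a Parquet file on disk, the same processor can be dropped into a minimal local pipeline. The following is a sketch rather than an official example: it assumes a hypothetical `./data/*.parquet` path and that your Redpanda Connect version supports the `to_the_end` scanner on the `file` input.

```yaml
# Minimal local sketch (assumed paths): read Parquet files from disk,
# decode each file into a batch of structured messages, and print each
# row as a JSON document on stdout.
input:
  file:
    paths: [ ./data/*.parquet ]   # hypothetical location of your Parquet files
    scanner:
      to_the_end: {}              # read each file fully before decoding

pipeline:
  processors:
    - parquet_decode: {}

output:
  stdout: {}
```

Saved as a config file, this would print one JSON document per Parquet row, which is a convenient way to check that a file decodes as expected before wiring up the S3/SQS pipeline shown above.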