Docs Connect Components Inputs parquet parquet Type: InputProcessor Available in: Self-Managed Reads and decodes Parquet files into a stream of structured messages. Introduced in version 4.8.0. Common Advanced # Common config fields, showing default values input: label: "" parquet: paths: [] # No default (required) auto_replay_nacks: true # All config fields, showing default values input: label: "" parquet: paths: [] # No default (required) batch_count: 1 auto_replay_nacks: true This input uses https://github.com/parquet-go/parquet-go, which is itself experimental. Therefore changes could be made into how this processor functions outside of major version releases. By default any BYTE_ARRAY or FIXED_LEN_BYTE_ARRAY value will be extracted as a byte slice ([]byte) unless the logical type is UTF8, in which case they are extracted as a string (string). When a value extracted as a byte slice exists within a document which is later JSON serialized by default it will be base 64 encoded into strings, which is the default for arbitrary data fields. It is possible to convert these binary values to strings (or other data types) using Bloblang transformations such as root.foo = this.foo.string() or root.foo = this.foo.encode("hex"), etc. Fields paths A list of file paths to read from. Each file will be read sequentially until the list is exhausted, at which point the input will close. Glob patterns are supported, including super globs (double star). Type: array # Examples paths: /tmp/foo.parquet paths: /tmp/bar/*.parquet paths: /tmp/data/**/*.parquet batch_count Optionally process records in batches. This can help to speed up the consumption of exceptionally large files. When the end of the file is reached the remaining records are processed as a (potentially smaller) batch. Type: int Default: 1 auto_replay_nacks Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to false these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation. Type: bool Default: true Back to top × Simple online edits For simple changes, such as fixing a typo, you can edit the content directly on GitHub. Edit on GitHub Or, open an issue to let us know about something that you want us to change. Open an issue Contribution guide For extensive content updates, or if you prefer to work locally, read our contribution guide . Was this helpful? thumb_up thumb_down group Ask in the community mail Share your feedback group_add Make a contribution ockam_kafka postgres_cdc