Scanners

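Some inputs, such as the file input, consume a stream of source bytes rather than data that arrives already divided into messages. For such inputs it's necessary to define a mechanism by which the stream can be chopped into smaller logical messages, which are then processed and output continuously while the stream is still being read. Because the whole stream never has to be held in memory at once, this dramatically reduces the memory usage of Redpanda Connect as a whole and results in a more fluid flow of data.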

We define this chopping mechanism through scanners, which are configured as a field on each input that requires one. For example, if we wished to consume files line-by-line, with each individual line being processed as a discrete message, we could use the lines scanner with our file input:


input:
  file:
    paths: [ "./*.txt" ]
    scanner:
      lines: {}

# Instead of newlines, use a custom delimiter:
input:
  file:
    paths: [ "./*.txt" ]
    scanner:
      lines:
        custom_delimiter: "---END---"
        max_buffer_size: 100_000_000 # 100MB line buffer
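
To make the idea concrete, the following is a minimal Python sketch of what a delimiter-based scanner does conceptually: it reads the source in small chunks, emits each complete message as soon as its delimiter is seen, and refuses to buffer an unterminated message beyond a size limit. This is an illustration only, not Redpanda Connect's implementation. The function name `scan_messages`, the `chunk_size` parameter, and the default values are all assumptions made for this example.

```python
import io
from typing import BinaryIO, Iterator


def scan_messages(
    stream: BinaryIO,
    delimiter: bytes = b"\n",
    max_buffer_size: int = 65536,  # arbitrary default chosen for this sketch
    chunk_size: int = 4096,  # hypothetical read size, not a real config field
) -> Iterator[bytes]:
    """Yield delimiter-separated messages without reading the whole stream."""
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Emit every complete message currently held in the buffer.
        while True:
            index = buffer.find(delimiter)
            if index < 0:
                break
            yield buffer[:index]
            buffer = buffer[index + len(delimiter):]
        # Only an incomplete message remains; do not let it grow unbounded.
        if len(buffer) > max_buffer_size:
            raise ValueError("message exceeds max_buffer_size")
    # Emit trailing data that was not terminated by a delimiter.
    if buffer:
        yield buffer


source = io.BytesIO(b"first---END---second---END---third")
messages = list(scan_messages(source, delimiter=b"---END---", chunk_size=5))
print(messages)  # [b'first', b'second', b'third']
```

Because the function is a generator, a caller can process each message as it is yielded, so memory use is bounded by the largest single message rather than by the size of the source.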

A scanner is a plugin, just like any other core Redpanda Connect component (inputs, processors, outputs, etc.), which means it's possible to define your own scanners for use by the inputs that need them.
