ollama_embeddings

Available in: Self-Managed

Ollama connectors are currently only available on BYOC GCP clusters.

When Redpanda Connect runs a data pipeline that contains an Ollama processor, Redpanda Cloud deploys a GPU-powered instance for the exclusive use of that pipeline. Because pricing is based on resource consumption, this can have cost implications.

Generates vector embeddings from text, using the Ollama API.

Common

processors:
  label: ""
  ollama_embeddings:
    model: "" # No default (required)
    text: "" # No default (optional)
    runner:
      context_size: "" # No default (optional)
      batch_size: "" # No default (optional)
      gpu_layers: "" # No default (optional)
      threads: "" # No default (optional)
      use_mmap: "" # No default (optional)
    server_address: "" # No default (optional)

Advanced

processors:
  label: ""
  ollama_embeddings:
    model: "" # No default (required)
    text: "" # No default (optional)
    runner:
      context_size: "" # No default (optional)
      batch_size: "" # No default (optional)
      gpu_layers: "" # No default (optional)
      threads: "" # No default (optional)
      use_mmap: "" # No default (optional)
    server_address: "" # No default (optional)
    cache_directory: "" # No default (optional)
    download_url: "" # No default (optional)

This processor sends text to your chosen Ollama large language model (LLM) and creates vector embeddings, using the Ollama API. Vector embeddings are long arrays of numbers that represent values or objects, in this case text.

By default, the processor starts and runs a locally installed Ollama server. Alternatively, to use an already running Ollama server, add your server details to the server_address field. You can download and install Ollama from the Ollama website.

For more information, see the Ollama documentation.
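
As a rough illustration of the second option, the fragment below points the processor at an Ollama server that is already running. It is a minimal sketch rather than a recommended setup: the `embed_text` label is a made-up name, and the address is the example value from the `server_address` field reference, which assumes a server listening locally on that port.

```yaml
pipeline:
  processors:
    - label: embed_text # hypothetical label, chosen for illustration
      ollama_embeddings:
        model: nomic-embed-text
        # Assumes an Ollama server is already listening at this address
        server_address: http://127.0.0.1:11434
```

If `server_address` is removed from this fragment, the processor falls back to starting and running its own local Ollama server.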

Fields

`cache_directory`

If `server_address` is not set, the directory in which to download the Ollama binary and to use as a model cache.

Type: string

# Examples:
cache_directory: /opt/cache/connect/ollama

`download_url`

If `server_address` is not set, the URL from which to download the Ollama binary. Defaults to the official Ollama GitHub release for this platform.

Type: string
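
Both `cache_directory` and `download_url` only matter when the processor manages its own Ollama server. The sketch below shows one way that might look, reusing the example cache path from above. `download_url` is left out deliberately so that the default release is used.

```yaml
pipeline:
  processors:
    - ollama_embeddings:
        model: nomic-embed-text
        # No server_address: the processor starts and runs a local Ollama server
        # and keeps the binary and downloaded models in this directory.
        cache_directory: /opt/cache/connect/ollama
```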

`model`

The name of the Ollama LLM to use. For a full list of models, see the Ollama website.

Type: string

# Examples:
model: nomic-embed-text

# ---

model: mxbai-embed-large

# ---

model: snowflake-arctic-embed

# ---

model: all-minilm

`runner`

Options for the model runner that are used when the model is first loaded into memory.

Type: object

`runner.batch_size`

The maximum number of requests to process in parallel.

Type: int

`runner.context_size`

Sets the size of the context window used to generate the next token. A larger context window uses more memory and takes longer to process.

Type: int

`runner.gpu_layers`

Sets the number of model layers to offload to the GPU for computation. Offloading layers generally increases performance. By default, the runtime decides the number of layers dynamically.

Type: int

`runner.threads`

Set the number of threads to use during generation. For optimal performance, it is recommended to set this value to the number of physical CPU cores your system has. By default, the runtime decides the optimal number of threads.

Type: int

`runner.use_mmap`

Maps the model into memory. This is only supported on Unix systems, and it allows the runner to load only the necessary parts of the model as needed.

Type: bool
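
To show how the `runner` options fit together, here is a hedged sketch of a tuned configuration. Every number is an illustrative assumption, not a recommendation: it imagines a host with 8 physical CPU cores and enough GPU memory for 20 layers. Any option you omit is left for the runtime to decide.

```yaml
pipeline:
  processors:
    - ollama_embeddings:
        model: mxbai-embed-large
        runner:
          context_size: 2048 # assumed value; larger windows use more memory
          batch_size: 8      # assumed value
          gpu_layers: 20     # assumed value; omit to let the runtime choose
          threads: 8         # assumes 8 physical CPU cores
          use_mmap: true     # only supported on Unix systems
```

Because these options are applied when the model is first loaded into memory, it seems safest to treat them as startup settings rather than something to adjust per message.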

`server_address`

The address of the Ollama server to use. Leave this field blank to have the processor start and run a local Ollama server, or specify the address of your own local or remote server.

Type: string

# Examples:
server_address: http://127.0.0.1:11434

`text`

The text you want to create vector embeddings for. By default, the processor submits the entire payload as a string. This field supports interpolation functions.

Type: string
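
Since `text` supports interpolation functions, you can embed a single field instead of the whole payload. The sketch below assumes incoming messages are JSON documents with a `description` field. That field name is hypothetical and used only for illustration.

```yaml
pipeline:
  processors:
    - ollama_embeddings:
        model: all-minilm
        # Embed only the (hypothetical) "description" field of each JSON message
        text: '${! json("description") }'
```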
