# ollama_chat

---
title: ollama_chat
latest-connect-version: 4.93.0
latest-operator-version: v26.1.4
latest-console-tag: v3.7.3
latest-redpanda-tag: v26.1.9
docname: processors/ollama_chat
page-component-name: connect
page-version: master
page-component-version: master
page-component-title: Connect
page-relative-src-path: processors/ollama_chat.adoc
page-edit-url: https://github.com/redpanda-data/rp-connect-docs/edit/main/modules/components/pages/processors/ollama_chat.adoc
page-git-created-date: "2024-08-15"
page-git-modified-date: "2026-05-26"
---

<!-- Source: https://docs.redpanda.com/connect/components/processors/ollama_chat.md -->

**Available in:** Self-Managed

Generates responses to messages in a chat conversation using the Ollama API and external tools.

Introduced in version 4.32.0.

#### Common

```yml
processors:
  label: ""
  ollama_chat:
    model: "" # No default (required)
    prompt: "" # No default (optional)
    image: "" # No default (optional)
    response_format: text
    max_tokens: "" # No default (optional)
    temperature: "" # No default (optional)
    save_prompt_metadata: false
    history: "" # No default (optional)
    tools: []
    runner:
      context_size: "" # No default (optional)
      batch_size: "" # No default (optional)
      gpu_layers: "" # No default (optional)
      threads: "" # No default (optional)
      use_mmap: "" # No default (optional)
    server_address: "" # No default (optional)
```

#### Advanced

```yml
processors:
  label: ""
  ollama_chat:
    model: "" # No default (required)
    prompt: "" # No default (optional)
    system_prompt: "" # No default (optional)
    image: "" # No default (optional)
    response_format: text
    max_tokens: "" # No default (optional)
    temperature: "" # No default (optional)
    num_keep: "" # No default (optional)
    seed: "" # No default (optional)
    top_k: "" # No default (optional)
    top_p: "" # No default (optional)
    repeat_penalty: "" # No default (optional)
    presence_penalty: "" # No default (optional)
    frequency_penalty: "" # No default (optional)
    stop: [] # No default (optional)
    save_prompt_metadata: false
    history: "" # No default (optional)
    max_tool_calls: 3
    tools: []
    runner:
      context_size: "" # No default (optional)
      batch_size: "" # No default (optional)
      gpu_layers: "" # No default (optional)
      threads: "" # No default (optional)
      use_mmap: "" # No default (optional)
    server_address: "" # No default (optional)
    cache_directory: "" # No default (optional)
    download_url: "" # No default (optional)
```

This processor sends prompts to your chosen Ollama large language model (LLM) through the Ollama API and returns the generated responses. The model can also invoke external tools that you define to help answer a prompt.

By default, the processor starts and runs a locally installed Ollama server. To use an Ollama server that is already running instead, set the `server_address` field to that server's address. You can [download and install Ollama from the Ollama website](https://ollama.com/download).

For more information, see the [Ollama documentation](https://github.com/ollama/ollama/tree/main/docs) and [examples](#examples).
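
As a minimal sketch of the second option, the following configuration points the processor at an Ollama server that is already running. The model name and address are taken from the field examples on this page and are assumptions about your environment, not requirements.

```yaml
pipeline:
  processors:
    - ollama_chat:
        # Assumed model name; use any model available on your server.
        model: llama3.1
        # Submit the whole message payload as the prompt.
        prompt: "${! content().string() }"
        # Assumed address of an existing Ollama server. Omit this field
        # to let the processor start and manage its own local server.
        server_address: http://127.0.0.1:11434
```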

## [](#fields)Fields

### [](#cache_directory)`cache_directory`

The directory in which to download the Ollama binary and to use as a model cache. This field applies only when `server_address` is not set.

**Type**: `string`

```yaml
# Examples:
cache_directory: /opt/cache/connect/ollama
```

### [](#download_url)`download_url`

The URL from which to download the Ollama binary. This field applies only when `server_address` is not set. Defaults to the official Ollama GitHub release for this platform.

**Type**: `string`

### [](#frequency_penalty)`frequency_penalty`

Positive values penalize new tokens based on the frequency of their appearance in the text so far. This decreases the model’s likelihood to repeat the same line verbatim.

**Type**: `float`

### [](#history)`history`

Include historical messages in a chat request. You must use a Bloblang query to create an array of objects in the form of `[{"role": "", "content":""}]` where:

-   `role` is the sender of the original messages, either `system`, `user`, `assistant`, or `tool`.

-   `content` is the text of the original messages.


**Type**: `string`
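
The following sketch shows one way to build the `history` array with a Bloblang mapping. It assumes each input message is a JSON document with hypothetical `question` and `previous_messages` fields, where every entry in `previous_messages` has hypothetical `role` and `text` keys. Adapt the field names to your own data.

```yaml
pipeline:
  processors:
    - ollama_chat:
        model: llama3.1
        # Hypothetical input field holding the new user question.
        prompt: "${! this.question }"
        # Reshape the hypothetical previous_messages array into the
        # [{"role": "", "content": ""}] form that this field expects.
        history: |
          root = this.previous_messages.map_each(m -> {
            "role": m.role,
            "content": m.text
          })
```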

### [](#image)`image`

An optional image to submit along with the [`prompt`](#prompt) value. This field is a Bloblang mapping, and its result must be a byte array.

Requires version 4.38.0 or later.

**Type**: `string`

```yaml
# Examples:
image: root = this.image.decode("base64") # decode base64 encoded image
```

### [](#max_tokens)`max_tokens`

The maximum number of tokens to predict and output. Limiting the amount of output means that requests are processed faster and have a fixed limit on the cost.

**Type**: `int`

### [](#max_tool_calls)`max_tool_calls`

The maximum number of sequential calls the LLM can make to external tools to retrieve additional information to answer a prompt.

**Type**: `int`

**Default**: `3`

### [](#model)`model`

The name of the Ollama LLM to use. For a full list of models, see the [Ollama website](https://ollama.com/models).

**Type**: `string`

```yaml
# Examples:
model: llama3.1

# ---

model: gemma2

# ---

model: qwen2

# ---

model: phi3
```

### [](#num_keep)`num_keep`

Specify the number of tokens from the initial prompt to retain when the model resets its internal context. By default, this value is set to `4`. Use `-1` to retain all tokens from the initial prompt.

**Type**: `int`

### [](#presence_penalty)`presence_penalty`

Positive values penalize new tokens if they have appeared in the text so far. This increases the model’s likelihood to talk about new topics.

**Type**: `float`

### [](#prompt)`prompt`

The prompt you want to generate a response for. By default, the processor submits the entire payload as a string. This field supports [interpolation functions](https://docs.redpanda.com/connect/configuration/interpolation/#bloblang-queries).

**Type**: `string`

### [](#repeat_penalty)`repeat_penalty`

Sets how strongly to penalize repetitions. A higher value, for example 1.5, will penalize repetitions more strongly. A lower value, for example 0.9, will be more lenient.

**Type**: `float`

### [](#response_format)`response_format`

The format of the response the Ollama model generates. If you set this field to `json`, also instruct the model in the `prompt` to produce JSON output.

**Type**: `string`

**Default**: `text`

**Options**: `text`, `json`
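
As an illustration, the sketch below requests JSON output and repeats that requirement in the prompt itself. The classification task and the `sentiment` key are made up for this example.

```yaml
pipeline:
  processors:
    - ollama_chat:
        model: llama3.1
        response_format: json
        # The prompt also asks for JSON, as recommended above.
        prompt: |
          Classify the sentiment of the following text as positive, negative, or neutral.
          Respond with a JSON object that contains a single "sentiment" key.
          Text: ${! content().string() }
```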

### [](#runner)`runner`

Options for the model runner that are used when the model is first loaded into memory.

**Type**: `object`

### [](#runner-batch_size)`runner.batch_size`

The maximum number of requests to process in parallel.

**Type**: `int`

### [](#runner-context_size)`runner.context_size`

Sets the size of the context window used to generate the next token. Using a larger context window uses more memory and takes longer to process.

**Type**: `int`

### [](#runner-gpu_layers)`runner.gpu_layers`

This option allows offloading some layers to the GPU for computation. This generally results in increased performance. By default, the runtime decides the number of layers dynamically.

**Type**: `int`

### [](#runner-threads)`runner.threads`

Set the number of threads to use during generation. For optimal performance, it is recommended to set this value to the number of physical CPU cores your system has. By default, the runtime decides the optimal number of threads.

**Type**: `int`

### [](#runner-use_mmap)`runner.use_mmap`

Map the model into memory. This is only supported on Unix systems and allows loading only the necessary parts of the model as needed.

**Type**: `bool`
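
The `runner` options can be combined in a single block, as in the sketch below. The values are illustrative assumptions for a machine with eight physical CPU cores and a GPU, not recommended settings. Suitable values depend on your hardware and model.

```yaml
pipeline:
  processors:
    - ollama_chat:
        model: llama3.1
        runner:
          # Larger context windows use more memory and take longer to process.
          context_size: 4096
          # Offload an assumed 20 layers to the GPU. Omit this field to let
          # the runtime decide dynamically.
          gpu_layers: 20
          # Match the assumed number of physical CPU cores.
          threads: 8
          # Memory-map the model (Unix systems only).
          use_mmap: true
```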

### [](#save_prompt_metadata)`save_prompt_metadata`

Set to `true` to save the prompt value to a metadata field (`@prompt`) on the corresponding output message. If you use the `system_prompt` field, its value is also saved to an `@system_prompt` metadata field on each output message.

**Type**: `bool`

**Default**: `false`
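
For example, a later `mapping` processor could read the saved metadata and combine it with the model's response. This is a sketch, and the output field names `response`, `prompt`, and `system_prompt` are arbitrary choices for illustration.

```yaml
pipeline:
  processors:
    - ollama_chat:
        model: llama3.1
        system_prompt: "You are a concise assistant."
        prompt: "${! content().string() }"
        save_prompt_metadata: true
    # Build a JSON document from the response and the saved metadata.
    - mapping: |
        root.response = content().string()
        root.prompt = @prompt
        root.system_prompt = @system_prompt
```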

### [](#seed)`seed`

Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt.

**Type**: `int`

```yaml
# Examples:
seed: 42
```

### [](#server_address)`server_address`

The address of the Ollama server to use. Leave this field blank to have the processor start and run a local Ollama server, or specify the address of your own local or remote server.

**Type**: `string`

```yaml
# Examples:
server_address: http://127.0.0.1:11434
```

### [](#stop)`stop[]`

Sets the stop sequences to use. When the LLM encounters one of these sequences, it stops generating text and returns the final response.

**Type**: `array`

### [](#system_prompt)`system_prompt`

The system prompt to submit to the Ollama LLM. This field supports [interpolation functions](https://docs.redpanda.com/connect/configuration/interpolation/#bloblang-queries).

**Type**: `string`

### [](#temperature)`temperature`

The temperature of the model. Increasing the temperature makes the model answer more creatively.

**Type**: `int`

### [](#tools)`tools[]`

The external tools the LLM can invoke, such as functions, APIs, or web browsing. You can build a series of processors that include definitions of these tools, and the specified LLM can choose when to invoke them to help answer a prompt. For more information, see [examples](#examples).

**Type**: `object`

**Default**: `[]`

### [](#tools-description)`tools[].description`

A description of this tool. The LLM uses this description to decide whether to use the tool.

**Type**: `string`

### [](#tools-name)`tools[].name`

The name of this tool.

**Type**: `string`

### [](#tools-parameters)`tools[].parameters`

The parameters the LLM needs to provide to invoke this tool.

**Type**: `object`

### [](#tools-parameters-properties)`tools[].parameters.properties`

The properties for the processor’s input data.

**Type**: `object`

### [](#tools-parameters-properties-description)`tools[].parameters.properties.description`

A description of this parameter.

**Type**: `string`

### [](#tools-parameters-properties-enum)`tools[].parameters.properties.enum[]`

Specifies that this parameter is an enum and only these specific values should be used.

**Type**: `array`

**Default**: `[]`

### [](#tools-parameters-properties-type)`tools[].parameters.properties.type`

The type of this parameter.

**Type**: `string`

### [](#tools-parameters-required)`tools[].parameters.required[]`

The required parameters for this pipeline.

**Type**: `array`

**Default**: `[]`

### [](#tools-processors)`tools[].processors[]`

The pipeline to execute when the LLM uses this tool.

**Type**: `processor`
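
The sketch below puts the `tools` fields together, including an `enum` parameter and a `max_tool_calls` limit. The `ConvertTemperature` tool, its parameter names, and the `number` parameter type are hypothetical and only illustrate the shape of a tool definition. The tool pipeline reads the arguments supplied by the LLM from the message, in the same way as `this.city` in the weather example on this page.

```yaml
pipeline:
  processors:
    - ollama_chat:
        model: llama3.2
        prompt: "${! content().string() }"
        # Allow at most two sequential tool calls per prompt.
        max_tool_calls: 2
        tools:
          # Hypothetical tool, defined only for illustration.
          - name: ConvertTemperature
            description: "Convert a temperature between Celsius and Fahrenheit"
            parameters:
              required: ["value", "target_unit"]
              properties:
                value:
                  # Assumed type name; the example on this page uses string.
                  type: number
                  description: the temperature to convert
                target_unit:
                  type: string
                  description: the unit to convert the temperature to
                  # Restrict the LLM to these two values.
                  enum: ["celsius", "fahrenheit"]
            processors:
              # Convert the value and return the result as text.
              - mapping: |
                  root = if this.target_unit == "celsius" {
                    ((this.value - 32) * 5 / 9).string()
                  } else {
                    (this.value * 9 / 5 + 32).string()
                  }
```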

### [](#top_k)`top_k`

Reduces the probability of generating nonsense. A higher value, for example `100`, will give more diverse answers. A lower value, for example `10`, will be more conservative.

**Type**: `int`

### [](#top_p)`top_p`

Works together with `top_k`. A higher value, for example 0.95, will lead to more diverse text. A lower value, for example 0.5, will generate more focused and conservative text.

**Type**: `float`
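
The sampling fields (`seed`, `top_k`, `top_p`, `repeat_penalty`, and `stop`) are often set together. The following sketch aims for focused, repeatable output and reuses the example values from the field descriptions on this page. The stop sequence is an assumption chosen for illustration.

```yaml
pipeline:
  processors:
    - ollama_chat:
        model: llama3.1
        prompt: "${! content().string() }"
        # A fixed seed makes the model generate the same text for the same prompt.
        seed: 42
        # Lower values give more conservative answers.
        top_k: 10
        top_p: 0.5
        # Penalize repetitions more strongly.
        repeat_penalty: 1.5
        # Assumed stop sequence: stop generating at the first blank line.
        stop: ["\n\n"]
```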

## [](#examples)Examples

### [](#use-llava-to-analyze-an-image)Use Llava to analyze an image

This example reads image URLs from stdin, fetches each image over HTTP, and has a multimodal LLM describe it.

```yaml
input:
  stdin:
    scanner:
      lines: {}
pipeline:
  processors:
    - http:
        verb: GET
        url: "${!content().string()}"
    - ollama_chat:
        model: llava
        prompt: "Describe the following image"
        image: "root = content()"
output:
  stdout:
    codec: lines
```

### [](#use-subpipelines-as-tool-calls)Use subpipelines as tool calls

This example allows llama3.2 to execute a subpipeline as a tool call to get more data.

```yaml
input:
  generate:
    count: 1
    mapping: |
      root = "What is the weather like in Chicago?"
pipeline:
  processors:
    - ollama_chat:
        model: llama3.2
        prompt: "${!content().string()}"
        tools:
          - name: GetWeather
            description: "Retrieve the weather for a specific city"
            parameters:
              required: ["city"]
              properties:
                city:
                  type: string
                  description: the city to lookup the weather for
            processors:
              - http:
                  verb: GET
                  url: 'https://wttr.in/${!this.city}?T'
                  headers:
                    # Spoof the curl User-Agent to get a plain-text response
                    User-Agent: curl/8.11.1
output:
  stdout: {}
```