# ollama_moderation


---
title: ollama_moderation
page-beta-text: This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments.
latest-operator-version: v26.1.4
latest-console-tag: v3.7.3
latest-connect-version: 4.93.0
latest-redpanda-tag: v26.1.9
docname: connect/components/processors/ollama_moderation
page-component-name: cloud-data-platform
page-version: master
page-component-version: master
page-component-title: Cloud
page-relative-src-path: connect/components/processors/ollama_moderation.adoc
page-edit-url: https://github.com/redpanda-data/cloud-docs/edit/main/modules/develop/pages/connect/components/processors/ollama_moderation.adoc
# Beta release status
page-beta: "true"
page-git-created-date: "2025-01-28"
page-git-modified-date: "2026-05-26"
release-status: beta - This is a beta feature. Beta features are available for testing and feedback. They are not supported by Redpanda and should not be used in production environments.
---

<!-- Source: https://docs.redpanda.com/cloud-data-platform/develop/connect/components/processors/ollama_moderation.md -->

**Available in:** Self-Managed

> 📝 **NOTE**
>
> Ollama connectors are currently only available on BYOC GCP clusters.

> ⚠️ **CAUTION**
>
> When Redpanda Connect runs a data pipeline that contains an Ollama processor, Redpanda Cloud deploys a GPU-powered instance for the exclusive use of that pipeline. Because pricing is based on resource consumption, this can have cost implications.

Uses the Ollama API to check responses to messages in a chat conversation and make sure they do not violate [safety or security standards](https://mlcommons.org/2024/04/mlc-aisafety-v0-5-poc/).

#### Common

```yml
processors:
  - label: ""
    ollama_moderation:
      model: "" # No default (required)
      prompt: "" # No default (required)
      response: "" # No default (required)
      runner:
        context_size: "" # No default (optional)
        batch_size: "" # No default (optional)
        gpu_layers: "" # No default (optional)
        threads: "" # No default (optional)
        use_mmap: "" # No default (optional)
      server_address: "" # No default (optional)
```

#### Advanced

```yml
processors:
  - label: ""
    ollama_moderation:
      model: "" # No default (required)
      prompt: "" # No default (required)
      response: "" # No default (required)
      runner:
        context_size: "" # No default (optional)
        batch_size: "" # No default (optional)
        gpu_layers: "" # No default (optional)
        threads: "" # No default (optional)
        use_mmap: "" # No default (optional)
      server_address: "" # No default (optional)
      cache_directory: "" # No default (optional)
      download_url: "" # No default (optional)
```

This processor checks the safety of responses from your chosen large language model (LLM) using either [Llama Guard 3](https://ollama.com/library/llama-guard3) or [ShieldGemma](https://ollama.com/library/shieldgemma).

By default, the processor starts and runs a locally installed Ollama server. Alternatively, to use an Ollama server that is already running, add its address to the `server_address` field. You can [download and install Ollama from the Ollama website](https://ollama.com/download).

For more information, see the [Ollama documentation](https://github.com/ollama/ollama/tree/main/docs) and [Examples](#examples).

To check the safety of your prompts, see the [`ollama_chat` processor](https://docs.redpanda.com/cloud-data-platform/develop/connect/components/processors/ollama_chat/#examples) documentation.

## [](#fields)Fields

### [](#cache_directory)`cache_directory`

If `server_address` is not set, the processor downloads the Ollama binary to this directory and also uses the directory as a model cache.

**Type**: `string`

```yaml
# Examples:
cache_directory: /opt/cache/connect/ollama
```

### [](#download_url)`download_url`

If `server_address` is not set, the processor downloads the Ollama binary from this URL. By default, it uses the official Ollama GitHub release for the current platform.

**Type**: `string`

### [](#model)`model`

The name of the Ollama moderation model to use.

**Type**: `string`

| Option | Summary |
| --- | --- |
| `llama-guard3` | Adds two metadata fields: `@safe`, set to `yes` or `no`, and `@category`, which identifies the safety category that was violated. For more information, see the Llama Guard 3 model card. |
| `shieldgemma` | Adds a single metadata field, `@safe`, set to `yes` if the response does not violate the model's defined safety policies, or `no` if it does. |

```yaml
# Examples:
model: llama-guard3

# ---

model: shieldgemma
```
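Because the processor reports its verdict through metadata instead of changing the message, a later step usually acts on `@safe`. The following is a minimal sketch, assuming you want to drop any response that Llama Guard 3 flags as unsafe and pass everything else through unchanged. It also assumes an earlier `ollama_chat` processor with `save_prompt_metadata: true` stored the prompt in `@prompt`. The filtering `mapping` step is an illustration, not part of this processor.

```yaml
pipeline:
  processors:
    - ollama_moderation:
        model: llama-guard3
        prompt: "${!@prompt}"
        response: "${!content().string()}"
    # Illustrative filter: delete messages flagged as unsafe,
    # otherwise keep the original payload unchanged.
    - mapping: |
        root = if @safe == "no" { deleted() } else { content() }
```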

### [](#prompt)`prompt`

The prompt you used to generate a response from an LLM.

If you’re using the `ollama_chat` processor, you can set its `save_prompt_metadata` field to save the contents of your prompts to the `@prompt` metadata field. You can then pass them to the `ollama_moderation` processor to check the model responses for safety. For more details, see [Examples](#examples).

You can also check the safety of your prompts. For more information, see the [`ollama_chat` processor](https://docs.redpanda.com/cloud-data-platform/develop/connect/components/processors/ollama_chat/#examples) documentation.

This field supports [interpolation functions](https://docs.redpanda.com/cloud-data-platform/develop/connect/configuration/interpolation/#bloblang-queries).

**Type**: `string`

### [](#response)`response`

The LLM’s response that you want to check for safety.

This field supports [interpolation functions](https://docs.redpanda.com/cloud-data-platform/develop/connect/configuration/interpolation/#bloblang-queries).

**Type**: `string`
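The `prompt` and `response` fields do not have to reference metadata or the raw message content. As a sketch, assuming each message is a JSON document with hypothetical `question` and `answer` fields, you can point both fields at those values with interpolation functions:

```yaml
ollama_moderation:
  model: shieldgemma
  # Hypothetical JSON field names; replace them with the fields your messages use.
  prompt: '${! json("question") }'
  response: '${! json("answer") }'
```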

### [](#runner)`runner`

Options for the model runner that are used when the model is first loaded into memory.

**Type**: `object`

### [](#runner-batch_size)`runner.batch_size`

The maximum number of requests to process in parallel.

**Type**: `int`

### [](#runner-context_size)`runner.context_size`

Sets the size of the context window used to generate the next token. Using a larger context window uses more memory and takes longer to process.

**Type**: `int`

### [](#runner-gpu_layers)`runner.gpu_layers`

Sets the number of layers to offload to the GPU for computation. Offloading layers to the GPU generally improves performance. By default, the runtime decides the number of layers dynamically.

**Type**: `int`

### [](#runner-threads)`runner.threads`

Sets the number of threads to use during response generation. For optimal performance, set this value to the number of physical CPU cores your system has. By default, the runtime decides the optimal number of threads.

**Type**: `int`

### [](#runner-use_mmap)`runner.use_mmap`

Maps the model into memory. Set to `true` to load only the necessary parts of the model into memory. This setting is only supported on Unix systems.

**Type**: `bool`
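All `runner` options are set together under a single object. The values below are illustrative assumptions for a host with 8 physical CPU cores and a GPU, not recommended settings. Omit any option to let the runtime choose its own value.

```yaml
ollama_moderation:
  model: llama-guard3
  prompt: "${!@prompt}"
  response: "${!content().string()}"
  runner:
    context_size: 4096 # Illustrative; a larger window uses more memory
    batch_size: 8      # Illustrative maximum number of parallel requests
    gpu_layers: 32     # Illustrative; omit to let the runtime decide
    threads: 8         # Assumes 8 physical CPU cores
    use_mmap: true     # Only supported on Unix systems
```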

### [](#server_address)`server_address`

The address of the Ollama server to use. If you leave this field blank, the processor starts and runs a local Ollama server. Otherwise, specify the address of your own local or remote server.

**Type**: `string`

```yaml
# Examples:
server_address: http://127.0.0.1:11434
```
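If you already run an Ollama server, point the processor at it so that it does not start its own. A minimal sketch, assuming a hypothetical host named `ollama.internal` that listens on Ollama's default port, 11434, and a prompt previously saved to `@prompt` by the `ollama_chat` processor:

```yaml
pipeline:
  processors:
    - ollama_moderation:
        model: shieldgemma
        prompt: "${!@prompt}"
        response: "${!content().string()}"
        # Hypothetical address of an existing Ollama server
        server_address: http://ollama.internal:11434
```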

## [](#examples)Examples

### [](#use-llama-guard-3-classify-a-llm-response)Use Llama Guard 3 to classify an LLM response

This example uses Llama Guard 3 to check whether a response generated by another model (`llava`) contains safe or unsafe content.

```yaml
input:
  stdin:
    scanner:
      lines: {}
pipeline:
  processors:
    - ollama_chat:
        model: llava
        prompt: "${!content().string()}"
        save_prompt_metadata: true
    - ollama_moderation:
        model: llama-guard3
        prompt: "${!@prompt}"
        response: "${!content().string()}"
    - mapping: |
        root.response = content().string()
        root.is_safe = @safe
output:
  stdout:
    codec: lines
```