ollama_moderation

Beta

Uses the Ollama API to check that responses generated in a chat conversation do not violate safety or security standards.

Introduced in version 4.45.0.


# Common configuration fields, showing default values
label: ""
ollama_moderation:
  model: llama-guard3 # No default (required)
  prompt: "" # No default (required)
  response: "" # No default (required)
  runner:
    context_size: 0 # No default (optional)
    batch_size: 0 # No default (optional)
  server_address: http://127.0.0.1:11434 # No default (optional)

# All configuration fields, showing default values
label: ""
ollama_moderation:
  model: llama-guard3 # No default (required)
  prompt: "" # No default (required)
  response: "" # No default (required)
  runner:
    context_size: 0 # No default (optional)
    batch_size: 0 # No default (optional)
    gpu_layers: 0 # No default (optional)
    threads: 0 # No default (optional)
    use_mmap: false # No default (optional)
    use_mlock: false # No default (optional)
  server_address: http://127.0.0.1:11434 # No default (optional)
  cache_directory: /opt/cache/connect/ollama # No default (optional)
  download_url: "" # No default (optional)

This processor checks the safety of responses from your chosen large language model (LLM) using either Llama Guard 3 or ShieldGemma.

By default, the processor starts and runs a locally installed Ollama server. Alternatively, to use an Ollama server that’s already running, add your server details to the server_address field. You can download and install Ollama from the Ollama website.

For more information, see the Ollama documentation and Examples.

To check the safety of your prompts, see the ollama_chat processor documentation.
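
For example, here’s a minimal sketch of a pipeline that points the processor at an Ollama server you already run. The server address below is illustrative, and the interpolated prompt and response values assume an upstream ollama_chat processor (see Examples):

pipeline:
  processors:
    - ollama_moderation:
        model: llama-guard3
        prompt: "${!@prompt}" # prompt saved by an upstream ollama_chat processor
        response: "${!content().string()}"
        server_address: http://10.0.0.5:11434 # illustrative address of your own server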

Fields

model

The name of the Ollama LLM to use.

Type: string

Options:

llama-guard3

The Llama Guard 3 model writes the following metadata to each processed message:

  • The @safe field: If the message content is unsafe, the value is set to yes. Otherwise, it’s set to no.

  • The @category field: When the @safe field is set to yes, this field contains the category of the safety violation.

For more information, see the Llama Guard 3 model description.

shieldgemma

The ShieldGemma model writes an @safe metadata field to each processed message, with a value of yes if the message content is unsafe, or no if it’s not.

For more information, see the ShieldGemma model description.

# Examples

model: llama-guard3

model: shieldgemma
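
As a sketch of how downstream processors might use the metadata written by these models, the following Bloblang mapping copies the verdict fields into the message payload. It assumes the llama-guard3 model ran earlier in the pipeline:

- mapping: |
    # Surface the moderation verdict alongside the checked response.
    root.response = content().string()
    root.is_safe = @safe
    # @category is only set when @safe is yes, so fall back to "none".
    root.category = @category.or("none")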

prompt

The prompt you used to generate a response from an LLM.

If you’re using the ollama_chat processor, you can set the save_prompt_metadata field to save the contents of your prompts. You can then run them through the ollama_moderation processor to check the model responses for safety. For more details, see Examples.

You can also check the safety of your prompts. For more information, see the ollama_chat processor documentation.

This field supports interpolation functions.

Type: string
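
For example, assuming an upstream ollama_chat processor ran with save_prompt_metadata set to true (as in the pipeline under Examples), you can reference the saved prompt through the @prompt metadata key:

prompt: "${!@prompt}"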

response

The LLM’s response that you want to check for safety.

This field supports interpolation functions.

Type: string
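
For example, to check the current message payload, which is where an upstream ollama_chat processor writes its generated response:

response: "${!content().string()}"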

runner

Options for the model runner that are used when the model is first loaded into memory.

Type: object

runner.context_size

Sets the size of the context window used to generate the next token. A larger context window uses more memory and takes longer to process.

Type: int

runner.batch_size

The maximum number of requests to process in parallel.

Type: int

runner.gpu_layers

Sets the number of layers to offload to the GPU for computation. This generally results in increased performance. By default, the runtime decides the number of layers dynamically.

Type: int

runner.threads

Sets the number of threads to use during response generation. For optimal performance, set this value to the number of physical CPU cores your system has. By default, the runtime decides the optimal number of threads.

Type: int

runner.use_mmap

Whether to memory-map the model. Set to true to load only the necessary parts of the model into memory. This setting is only supported on Unix systems.

Type: bool

runner.use_mlock

Set to true to lock the model in memory, preventing it from being swapped out when it’s mapped into memory. This option can improve performance but reduces the benefits of memory-mapping by increasing RAM usage and slowing model load times.

Type: bool
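
As an illustrative sketch, a fully specified runner block might look like the following. All values are placeholders to tune for your own model and hardware:

runner:
  context_size: 4096 # tokens in the context window
  batch_size: 512 # maximum parallel requests
  gpu_layers: 32 # layers to offload to the GPU
  threads: 8 # set to your physical core count
  use_mmap: true # memory-map the model (Unix only)
  use_mlock: false # don't pin the model in RAM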

server_address

The address of the Ollama server to use. If you leave this field blank, the processor starts and runs a local Ollama server. Otherwise, specify the address of your own local or remote server.

Type: string

# Examples

server_address: http://127.0.0.1:11434

cache_directory

If server_address is not set, the processor downloads the Ollama binary to this directory and also uses it as a model cache.

Type: string

# Examples

cache_directory: /opt/cache/connect/ollama

download_url

If server_address is not set, the processor downloads the Ollama binary from this URL. The default value is the official Ollama GitHub release for your platform.

Type: string
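
For example, to download the binary from an internal mirror instead of GitHub (the URL below is a hypothetical placeholder, not a real mirror):

download_url: https://example.com/mirrors/ollama-linux-amd64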

Examples

This example uses Llama Guard 3 to check whether another model (LLaVA) responded with safe or unsafe content.

input:
  stdin:
    scanner:
      lines: {}
pipeline:
  processors:
    - ollama_chat:
        model: llava
        prompt: "${!content().string()}"
        save_prompt_metadata: true
    - ollama_moderation:
        model: llama-guard3
        prompt: "${!@prompt}"
        response: "${!content().string()}"
    - mapping: |
        root.response = content().string()
        root.is_safe = @safe
output:
  stdout:
    codec: lines
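
As a variation, here’s a sketch of a final mapping that drops unsafe responses instead of annotating them. It uses the standard Bloblang deleted() filtering pattern; the rest of the pipeline stays the same:

    - mapping: |
        # Drop the message entirely if Llama Guard 3 flagged it as unsafe.
        root = if @safe == "yes" { deleted() }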