ollama_moderation
beta
Ollama connectors are currently only available on BYOC GCP clusters.
When Redpanda Connect runs a data pipeline with an Ollama processor in it, Redpanda Cloud deploys a GPU-powered instance for the exclusive use of that pipeline. As pricing is based on resource consumption, this can have cost implications.
Checks responses generated by a large language model (LLM) in a chat conversation, using the Ollama API, to make sure they do not violate safety or security standards.
# Common configuration fields, showing default values
label: ""
ollama_moderation:
  model: llama-guard3 # No default (required)
  prompt: "" # No default (required)
  response: "" # No default (required)
  runner:
    context_size: 0 # No default (optional)
    batch_size: 0 # No default (optional)
  server_address: http://127.0.0.1:11434 # No default (optional)
# All configuration fields, showing default values
label: ""
ollama_moderation:
  model: llama-guard3 # No default (required)
  prompt: "" # No default (required)
  response: "" # No default (required)
  runner:
    context_size: 0 # No default (optional)
    batch_size: 0 # No default (optional)
    gpu_layers: 0 # No default (optional)
    threads: 0 # No default (optional)
    use_mmap: false # No default (optional)
    use_mlock: false # No default (optional)
  server_address: http://127.0.0.1:11434 # No default (optional)
  cache_directory: /opt/cache/connect/ollama # No default (optional)
  download_url: "" # No default (optional)
This processor checks the safety of responses from your chosen large language model (LLM) using either Llama Guard 3 or ShieldGemma.
By default, the processor starts and runs a locally installed Ollama server. Alternatively, to use an already-running Ollama server, add your server details to the server_address field. You can download and install Ollama from the Ollama website.
For more information, see the Ollama documentation and Examples.
To check the safety of your prompts, see the ollama_chat
processor documentation.
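For example, here is a minimal sketch (not a complete pipeline) that points the processor at an Ollama server you run yourself. It assumes an upstream ollama_chat processor has saved the original prompt as @prompt metadata (see Examples), and the address shown is just the value used elsewhere on this page; replace it with whatever your server listens on.

pipeline:
  processors:
    - ollama_moderation:
        model: llama-guard3
        prompt: "${!@prompt}"
        response: "${!content().string()}"
        server_address: http://127.0.0.1:11434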
Fields
model
The name of the Ollama LLM to use.
Type: string
Options:

llama-guard3
The Llama Guard 3 model writes the following metadata to each processed message:
- @safe: whether the checked response is safe (yes) or unsafe (no).
- @category: the hazard category reported by the model when the response is unsafe.
For more information, see the Llama Guard 3 model description.

shieldgemma
The ShieldGemma model writes an @safe metadata key that reports whether the model considers the response safe. For more information, see the ShieldGemma model description.
# Examples
model: llama-guard3
model: shieldgemma
prompt
The prompt you used to generate a response from an LLM.
If you’re using the ollama_chat
processor, you can set the save_prompt_metadata
field to save the contents of your prompts. You can then run them through the ollama_moderation
processor to check the model responses for safety. For more details, see Examples.
You can also check the safety of your prompts. For more information, see the ollama_chat
processor documentation.
This field supports interpolation functions.
Type: string
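For example (as in the pipeline under Examples), if an upstream ollama_chat processor has saved the original prompt with save_prompt_metadata, you can pass it in through an interpolation function:

prompt: "${!@prompt}"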
response
The LLM’s response that you want to check for safety.
This field supports interpolation functions.
Type: string
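For example, to check the response held in the message payload produced by the previous processor in the pipeline:

response: "${!content().string()}"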
runner
Options for the model runner that are used when the model is first loaded into memory.
Type: object
runner.context_size
Sets the size of the context window used to generate the next token. Using a larger context window uses more memory and takes longer to process.
Type: int
runner.gpu_layers
Sets the number of layers to offload to the GPU for computation. This generally results in increased performance. By default, the runtime decides the number of layers dynamically.
Type: int
runner.threads
Sets the number of threads to use during response generation. For optimal performance, set this value to the number of physical CPU cores your system has. By default, the runtime decides the optimal number of threads.
Type: int
runner.use_mmap
Set to true
to map the model into memory, so that only the necessary parts of the model are loaded as needed. This setting is only supported on Unix systems.
Type: bool
runner.use_mlock
Set to true
to lock the model in memory, preventing it from being swapped out when it’s mapped into memory. This option can improve performance but reduces the benefits of memory-mapping by increasing RAM usage and slowing model load times.
Type: bool
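As an illustrative sketch only (the values below are placeholders, not tuning recommendations), the runner options sit under the runner object alongside the other processor fields:

ollama_moderation:
  model: llama-guard3
  prompt: "${!@prompt}"
  response: "${!content().string()}"
  runner:
    context_size: 4096
    threads: 8
    use_mmap: true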
server_address
The address of the Ollama server to use. If you leave this field blank, the processor starts and runs a local Ollama server. Otherwise, specify the address of your own local or remote server.
Type: string
# Examples
server_address: http://127.0.0.1:11434
Examples
This example uses Llama Guard 3 to check whether another model (LLaVA) responded with safe or unsafe content.
input:
  stdin:
    scanner:
      lines: {}
pipeline:
  processors:
    - ollama_chat:
        model: llava
        prompt: "${!content().string()}"
        save_prompt_metadata: true
    - ollama_moderation:
        model: llama-guard3
        prompt: "${!@prompt}"
        response: "${!content().string()}"
    - mapping: |
        root.response = content().string()
        root.is_safe = @safe
output:
  stdout:
    codec: lines
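If you want to act on the moderation result instead of only exposing it, one possible follow-up (a sketch, not part of the original example; it assumes the @safe metadata value is the string no for unsafe content) is to drop unsafe responses with a Bloblang mapping:

pipeline:
  processors:
    # ... ollama_chat and ollama_moderation configured as above ...
    - mapping: |
        # Delete messages that the moderation model flagged as unsafe
        root = if @safe == "no" { deleted() } else { content().string() }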