# ollama_moderation

Beta

Available in: Self-Managed

License: This component requires an enterprise license. You can either upgrade to an Enterprise Edition license, or generate a trial license key that's valid for 30 days.

Generates responses to messages in a chat conversation using the Ollama API, and checks the responses to make sure they do not violate safety or security standards.

Introduced in version 4.45.0.

Common:

```yaml
# Common configuration fields, showing default values
label: ""
ollama_moderation:
  model: llama-guard3 # No default (required)
  prompt: "" # No default (required)
  response: "" # No default (required)
  runner:
    context_size: 0 # No default (optional)
    batch_size: 0 # No default (optional)
  server_address: http://127.0.0.1:11434 # No default (optional)
```

Advanced:

```yaml
# All configuration fields, showing default values
label: ""
ollama_moderation:
  model: llama-guard3 # No default (required)
  prompt: "" # No default (required)
  response: "" # No default (required)
  runner:
    context_size: 0 # No default (optional)
    batch_size: 0 # No default (optional)
    gpu_layers: 0 # No default (optional)
    threads: 0 # No default (optional)
    use_mmap: false # No default (optional)
    use_mlock: false # No default (optional)
  server_address: http://127.0.0.1:11434 # No default (optional)
  cache_directory: /opt/cache/connect/ollama # No default (optional)
  download_url: "" # No default (optional)
```

This processor checks the safety of responses from your chosen large language model (LLM) using either Llama Guard 3 or ShieldGemma.

By default, the processor starts and runs a locally installed Ollama server. Alternatively, to use an Ollama server that's already running, add your server details to the `server_address` field. You can download and install Ollama from the Ollama website.

For more information, see the Ollama documentation and Examples. To check the safety of your prompts, see the `ollama_chat` processor documentation.
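For instance, a minimal configuration that points the processor at an Ollama server you're already running might look like the following sketch. The interpolation values are illustrative; adjust them to wherever your pipeline stores the original prompt and the model's reply.

```yaml
pipeline:
  processors:
    - ollama_moderation:
        model: llama-guard3
        # The prompt that produced the response, and the response to check.
        prompt: "${!@prompt}"
        response: "${!content().string()}"
        # Use an existing Ollama server instead of starting a local one.
        server_address: http://127.0.0.1:11434
```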
## Fields

### model

The name of the Ollama LLM to use.

Type: `string`

Options:

- `llama-guard3`: The Llama Guard 3 model writes the following metadata to each processed message:
  - The `@safe` field: If the message content is safe, the value is set to `yes`. Otherwise, it's set to `no`.
  - The `@category` field: When the `@safe` field is set to `no`, this field returns the category of safety violation.

  For more information, see the Llama Guard 3 model description.

- `shieldgemma`: The ShieldGemma model writes an `@safe` metadata field to each processed message, with the value of `yes` if the message content is safe, or `no` if it's not. For more information, see the ShieldGemma model description.

```yaml
# Examples
model: llama-guard3

model: shieldgemma
```

### prompt

The prompt you used to generate a response from an LLM. If you're using the `ollama_chat` processor, you can set the `save_prompt_metadata` field to save the contents of your prompts. You can then run them through the `ollama_moderation` processor to check the model responses for safety. For more details, see Examples.

You can also check the safety of your prompts. For more information, see the `ollama_chat` processor documentation.

This field supports interpolation functions.

Type: `string`

### response

The LLM's response that you want to check for safety.

This field supports interpolation functions.

Type: `string`

### runner

Options for the model runner that are used when the model is first loaded into memory.

Type: `object`

### runner.context_size

Sets the size of the context window used to generate the next token. A larger context window uses more memory and takes longer to process.

Type: `int`

### runner.batch_size

The maximum number of requests to process in parallel.

Type: `int`

### runner.gpu_layers

Sets the number of layers to offload to the GPU for computation, which generally results in increased performance. By default, the runtime decides the number of layers dynamically.

Type: `int`

### runner.threads

Sets the number of threads to use during response generation. For optimal performance, set this value to the number of physical CPU cores your system has. By default, the runtime decides the optimal number of threads.

Type: `int`

### runner.use_mmap

Set to `true` to map the model into memory, loading only the necessary parts of the model as needed. This setting is only supported on Unix systems.

Type: `bool`

### runner.use_mlock

Set to `true` to lock the model in memory, preventing it from being swapped out when it's mapped into memory. This option can improve performance but reduces the benefits of memory-mapping by increasing RAM usage and slowing model load times.

Type: `bool`

### server_address

The address of the Ollama server to use. Leave this field blank to have the processor start and run a local Ollama server, or specify the address of your own local or remote server.

Type: `string`

```yaml
# Examples
server_address: http://127.0.0.1:11434
```

### cache_directory

If `server_address` is not set, download the Ollama binary to this directory and use it as a model cache.

Type: `string`

```yaml
# Examples
cache_directory: /opt/cache/connect/ollama
```

### download_url

If `server_address` is not set, download the Ollama binary from this URL. The default value is the official Ollama GitHub release for your platform.

Type: `string`

## Examples

This example uses Llama Guard 3 to check whether another model (LLaVA) responded with safe or unsafe content.

```yaml
input:
  stdin:
    scanner:
      lines: {}
pipeline:
  processors:
    - ollama_chat:
        model: llava
        prompt: "${!content().string()}"
        save_prompt_metadata: true
    - ollama_moderation:
        model: llama-guard3
        prompt: "${!@prompt}"
        response: "${!content().string()}"
    - mapping: |
        root.response = content().string()
        root.is_safe = @safe
output:
  stdout:
    codec: lines
```
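As a follow-up sketch (not part of the component itself), you can also act on the `@safe` metadata instead of just recording it. The mapping below assumes the same pipeline as above and simply deletes any response that Llama Guard 3 flags as unsafe, so only responses that pass the safety check reach your output.

```yaml
pipeline:
  processors:
    - ollama_moderation:
        model: llama-guard3
        prompt: "${!@prompt}"
        response: "${!content().string()}"
    # Illustrative follow-up step: drop unsafe responses, pass safe ones through.
    - mapping: |
        root = if @safe == "no" { deleted() } else { content().string() }
```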