cohere_embeddings

Beta

Generates vector embeddings to represent input text, using the Cohere API.

Introduced in version 4.37.0.

# Config fields, showing default values
label: ""
cohere_embeddings:
  base_url: https://api.cohere.com
  auth_token: "" # No default (required)
  model: embed-english-v3.0 # No default (required)
  text_mapping: "" # No default (optional)
  input_type: search_document

This processor sends text strings to your chosen large language model (LLM), which generates vector embeddings for them using the Cohere API. By default, the processor submits the entire payload of each message as a string, unless you use the text_mapping field to customize it.

To learn more about vector embeddings, see the Cohere API documentation.
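As a minimal sketch of the default behavior described above, the following configuration omits text_mapping, so the processor would submit each message's entire payload as the text to embed. The environment variable name COHERE_AUTH_TOKEN is an assumption; use whatever name holds your token.

```yaml
pipeline:
  processors:
    - cohere_embeddings:
        # COHERE_AUTH_TOKEN is an assumed environment variable name.
        auth_token: "${COHERE_AUTH_TOKEN}"
        model: embed-english-v3.0
        # No text_mapping: the whole message payload is sent as a string.
```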

Examples

  • Store embedding vectors in Qdrant

Compute embeddings for some generated data and store them in Qdrant.

input:
  generate:
    interval: 1s
    mapping: |
      root = {"text": fake("paragraph")}
pipeline:
  processors:
  - cohere_embeddings:
      model: embed-english-v3.0
      auth_token: "${COHERE_AUTH_TOKEN}"
      text_mapping: "root = this.text"
output:
  qdrant:
    grpc_host: localhost:6334
    collection_name: "example_collection"
    id: "root = uuid_v4()"
    vector_mapping: "root = this"
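In the example above, the Qdrant output's vector_mapping of "root = this" suggests that the processor's result replaces the message content with the embedding vector. If you want to keep the original document alongside its vector, one possible approach is to wrap the processor in a branch processor. This is a sketch under that assumption, not a configuration taken from this page; the embedding field name is hypothetical.

```yaml
pipeline:
  processors:
    - branch:
        processors:
          - cohere_embeddings:
              model: embed-english-v3.0
              auth_token: "${COHERE_AUTH_TOKEN}"
              text_mapping: "root = this.text"
        # Copy the branch result (assumed to be the vector) into a
        # hypothetical field, leaving the original message intact.
        result_map: "root.embedding = this"
```

With this layout, downstream components would see both the original text field and the new embedding field on each message.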

Fields

base_url

The base URL to use for API requests.

Type: string

auth_token

Your authentication token for the Cohere API.

This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see Secrets.

Type: string

model

The name of the Cohere LLM you want to use.

Type: string

# Examples
model: embed-english-v3.0
model: embed-english-light-v3.0
model: embed-multilingual-v3.0
model: embed-multilingual-light-v3.0

text_mapping

The text you want to generate a vector embedding for. By default, the processor submits the entire payload as a string.

Type: string
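For illustration, a mapping can also build the text from several fields rather than a single one. The sketch below assumes each message is a JSON object with string fields named title and body, which are hypothetical names chosen for this example.

```yaml
pipeline:
  processors:
    - cohere_embeddings:
        model: embed-english-v3.0
        auth_token: "${COHERE_AUTH_TOKEN}"
        # title and body are hypothetical field names; the mapping
        # fails if either is missing or not a string.
        text_mapping: |
          root = this.title + "\n\n" + this.body
```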

input_type

The type of text input passed to the model.

Type: string

Default: search_document

Option            Summary
classification    For embeddings passed through a text classifier.
clustering        For embeddings run through a clustering algorithm.
search_document   For embeddings stored in a vector database for search use cases.
search_query      For embeddings of search queries run against a vector database to find relevant documents.
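As a sketch of how the options pair up in a search setup: documents are typically embedded with search_document when they are indexed, and incoming queries with search_query at lookup time, using the same model on both sides. The fragment below shows the query side only. It sets the field under the key input_type, which mirrors the Cohere API parameter of the same name; check the key against your installed version. The query field name is hypothetical.

```yaml
pipeline:
  processors:
    - cohere_embeddings:
        model: embed-english-v3.0
        auth_token: "${COHERE_AUTH_TOKEN}"
        # query is a hypothetical field holding the user's search text.
        text_mapping: "root = this.query"
        input_type: search_query
```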