cohere_embeddings

Available in: Cloud, Self-Managed

Generates vector embeddings to represent input text, using the Cohere API.

Introduced in version 4.37.0.

# Configuration fields, showing default values
label: ""
cohere_embeddings:
  base_url: https://api.cohere.com
  api_key: "" # No default (required)
  model: embed-english-v3.0 # No default (required)
  text_mapping: "" # No default (optional)
  input_type: search_document
  dimensions: 0 # No default (optional)

This processor sends text strings to your chosen large language model (LLM), which generates vector embeddings for them using the Cohere API. By default, the processor submits the entire payload of each message as a string, unless you use the text_mapping field to customize it.

To learn more about vector embeddings, see the Cohere API documentation.

Examples

Store embedding vectors in Qdrant

Compute embeddings for some generated data and store them in Qdrant.

input:
  generate:
    interval: 1s
    mapping: |
      root = {"text": fake("paragraph")}
pipeline:
  processors:
  - cohere_embeddings:
      model: embed-english-v3.0
      api_key: "${COHERE_API_KEY}"
      text_mapping: "root = this.text"
output:
  qdrant:
    grpc_host: localhost:6334
    collection_name: "example_collection"
    id: "root = uuid_v4()"
    vector_mapping: "root = this"
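
The example above passes only the embedding vector to the output. A minimal sketch of one way to keep the original text alongside the vector is to wrap the processor in a branch processor. This assumes the processor replaces the message content with the embedding vector, as `vector_mapping: "root = this"` above suggests. The `embedding` field name is hypothetical.

```yaml
pipeline:
  processors:
  - branch:
      processors:
      - cohere_embeddings:
          model: embed-english-v3.0
          api_key: "${COHERE_API_KEY}"
          text_mapping: "root = this.text"
      # Assumption: the branch result is the embedding vector.
      # It is written to a hypothetical `embedding` field on the original message.
      result_map: "root.embedding = this"
```

With this layout, each message would keep its `text` field and gain an `embedding` field, so an output's vector mapping would need to reference `this.embedding` instead of `this`.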

Fields

`api_key`

The API key for the Cohere API.

This field contains sensitive information that usually shouldn’t be added to a configuration directly. For more information, see Secrets.

Type: string

`base_url`

The base URL to use for API requests.

Type: string

Default: https://api.cohere.com

`dimensions`

The number of dimensions (numerical values) in each vector embedding generated by this processor. This parameter is supported only by embed-v4.0 and newer models.

Type: int
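
As a sketch of how this field might be set, the fragment below requests shorter vectors from embed-v4.0. The value 1024 is an assumption chosen for illustration. Check the Cohere documentation for the sizes your model accepts.

```yaml
pipeline:
  processors:
  - cohere_embeddings:
      model: embed-v4.0
      api_key: "${COHERE_API_KEY}"
      # Assumption: 1024 is an output size accepted by embed-v4.0.
      dimensions: 1024
```

The vector size configured here should match the vector size of the collection or index that stores the embeddings.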

`input_type`

The type of text input passed to the model.

Type: string

Default: search_document

Option	Summary
`classification`	Used for embeddings passed through a text classifier.
`clustering`	Used for embeddings run through a clustering algorithm.
`search_document`	Used for embeddings stored in a vector database for search use cases.
`search_query`	Used for embeddings of search queries run against a vector DB to find relevant documents.
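
The options above usually come in pairs: documents are embedded with `search_document` when they are indexed, and queries are embedded with `search_query` when they are run. A minimal sketch of the query side, assuming incoming messages carry the search string in a hypothetical `query` field:

```yaml
pipeline:
  processors:
  - cohere_embeddings:
      model: embed-english-v3.0
      api_key: "${COHERE_API_KEY}"
      # Use the same model that embedded the stored documents.
      input_type: search_query
      # Assumption: the search string is in a `query` field.
      text_mapping: "root = this.query"
```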

`model`

The name of the Cohere LLM you want to use.

Type: string

# Examples:
model: embed-english-v3.0

# ---

model: embed-english-light-v3.0

# ---

model: embed-multilingual-v3.0

# ---

model: embed-multilingual-light-v3.0

`text_mapping`

The text you want to generate a vector embedding for. By default, the processor submits the entire payload as a string.

Type: string
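
As an illustration, the mapping can also build the text from several fields rather than selecting a single one. The sketch below assumes each message is a JSON document with hypothetical `title` and `body` string fields.

```yaml
pipeline:
  processors:
  - cohere_embeddings:
      model: embed-english-v3.0
      api_key: "${COHERE_API_KEY}"
      # Assumption: `title` and `body` exist and are strings.
      text_mapping: |
        root = this.title + "\n" + this.body
```

If either field can be missing, the mapping would need a fallback value, otherwise the concatenation fails for that message.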
