openai_transcription

Beta

Generates a transcription of spoken audio in the input language, using the OpenAI API.

# Common config fields, showing default values
label: ""
openai_transcription:
  server_address: https://api.openai.com/v1
  api_key: "" # No default (required)
  model: whisper-1 # No default (required)
  file: "" # No default (required)

# All config fields, showing default values
label: ""
openai_transcription:
  server_address: https://api.openai.com/v1
  api_key: "" # No default (required)
  model: whisper-1 # No default (required)
  file: "" # No default (required)
  language: en # No default (optional)
  prompt: "" # No default (optional)

This processor sends an audio file object, along with the input language, to the OpenAI API to generate a transcription. By default, the processor submits the entire payload of each message as a string, unless you use the file configuration field to customize it.

To learn more about audio transcription, see the OpenAI API documentation.
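As a rough sketch of how the fields above might fit together in a pipeline, the following fragment reads the API key from an environment variable and passes each message's raw payload as the audio file. The variable name OPENAI_API_KEY is hypothetical, and the file value assumes that the field accepts a Bloblang mapping, which this page does not state. Adjust both to match your deployment.

```yaml
pipeline:
  processors:
    - openai_transcription:
        server_address: https://api.openai.com/v1
        # Assumption: the key is supplied through an environment variable
        # named OPENAI_API_KEY (hypothetical name).
        api_key: "${OPENAI_API_KEY}"
        model: whisper-1
        # Assumption: `file` takes a Bloblang mapping; content() returns
        # the raw message payload.
        file: 'root = content()'
        language: en
```

Keeping the key out of the file follows the guidance given for the api_key field.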

Fields

server_address

The OpenAI API endpoint that the processor sends requests to. Update the default value to use another OpenAI-compatible service.

Type: string

Default: "https://api.openai.com/v1"

api_key

The API key for the OpenAI API.

This field contains sensitive information that usually shouldn’t be added to a configuration directly. Before adding it to your configuration, see Manage Secrets.

Type: string

model

The name of the OpenAI model to use.

Type: string

# Examples

model: whisper-1

file

The audio file object (not file name) to transcribe, in one of the following formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.

Type: string

language

The language of the input audio. Supplying the input language in ISO-639-1 format improves accuracy and latency. This field supports interpolation functions.

Type: string

# Examples

language: en

language: fr

language: de

language: zh
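Because this field supports interpolation functions, the language can also be resolved per message instead of being fixed. A minimal sketch, assuming an upstream component has stored an ISO-639-1 code in a metadata key named audio_language (a hypothetical key, not something this processor sets):

```yaml
openai_transcription:
  # Fragment only: the required fields (api_key, model, file) are omitted.
  # Assumption: each message carries its language code in the
  # hypothetical metadata key `audio_language`.
  language: '${! meta("audio_language") }'
```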

prompt

Optional text to guide the model’s style or continue a previous audio segment. The prompt should match the audio language. This field supports interpolation functions.

Type: string
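One possible use of the prompt, sketched here under the assumption that the model treats it as preceding context, is to list product names or acronyms that are likely to appear in the audio so that they are more likely to be spelled consistently. The terms below are illustrative only:

```yaml
openai_transcription:
  # Fragment only: the required fields (api_key, model, file) are omitted.
  language: en
  # Illustrative vocabulary hint, written in the same language as the audio.
  prompt: "The recording discusses Redpanda Connect, Bloblang, and Kafka."
```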