Create a voice with gpt-4o-mini-tts

POST /v1/audio/speech

GPT-4o mini TTS is a text-to-speech model built on GPT-4o mini, a fast and powerful language model. Use it to convert text to natural-sounding speech. The maximum input length is 2000 tokens.

For intelligent real-time applications, use the gpt-4o-mini-tts model, our newest and most reliable text-to-speech model. You can prompt the model to control aspects of speech, including:

Accent
Emotional range
Intonation
Impressions
Speed of speech
Tone
Whispering

Request

Body Params application/json

model

string

required

gpt-4o-mini-tts

input

string

required

The text to generate audio from. The maximum input length is 2000 tokens.

voice

string

required

The voice to use when generating audio. Supported voices are: alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer.
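
Because `voice` accepts only the names listed above, a client can reject an unsupported value before sending a request. A minimal sketch in Python; `SUPPORTED_VOICES` and `check_voice` are hypothetical names introduced for illustration and are not part of the API:

```python
# Voice names copied from the list of supported voices in this reference.
SUPPORTED_VOICES = frozenset({
    "alloy", "ash", "ballad", "coral", "echo",
    "fable", "nova", "onyx", "sage", "shimmer",
})


def check_voice(voice: str) -> str:
    """Return the voice unchanged if it is supported, else raise ValueError.

    Hypothetical client-side helper; the service may apply its own validation.
    """
    if voice not in SUPPORTED_VOICES:
        allowed = ", ".join(sorted(SUPPORTED_VOICES))
        raise ValueError(f"Unsupported voice {voice!r}; expected one of: {allowed}")
    return voice
```

Calling `check_voice("coral")` returns `"coral"`, while a name outside the list raises `ValueError` before any network call is made.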

instructions

string

required

Additional instructions that control the voice of the generated audio. You can prompt the model to control aspects of speech, including:

Accent
Emotional range
Intonation
Impressions
Speed of speech
Tone
Whispering

Example

{
    "model": "gpt-4o-mini-tts",
    "input": "Việt nam có đẹp không!",
    "voice": "coral",
    "instructions": "Speak in a cheerful and positive tone."
}
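
The same body can also be assembled programmatically. The sketch below uses only Python's standard library to build (but not send) the POST request. `BASE_URL` is a placeholder assumption, since this page gives only the path, and `build_speech_request` is a hypothetical helper. Any authentication header the service may require is not documented here and is therefore omitted.

```python
import json
import urllib.request

# Assumption: placeholder host. Replace with the real base URL of the service.
BASE_URL = "https://api.example.com"


def build_speech_request(text: str, voice: str, instructions: str,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /v1/audio/speech without sending it."""
    payload = {
        "model": "gpt-4o-mini-tts",
        "input": text,
        "voice": voice,
        "instructions": instructions,
    }
    # ensure_ascii=False keeps non-ASCII input readable; bytes are UTF-8.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; that step is left out so the sketch runs without network access.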

Request samples

Shell


curl --location --request POST '/v1/audio/speech' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "gpt-4o-mini-tts",
    "input": "Việt nam có đẹp không!",
    "voice": "coral",
    "instructions": "Speak in a cheerful and positive tone."
}'

Responses

🟢 200 Success

application/json

Body

object {0}

Example

{}

Modified at 2025-06-12 14:10:12
