Text-to-Speech (TTS) API Guide

Overview

The Audio API provides a speech endpoint, powered by TTS models, that you can use to:

📝 Read blog articles aloud.

🌍 Generate multilingual audio.

🎵 Output real-time audio streams.

Important Note: You must inform users that the voice they hear is AI-generated and not a human voice.

Basic Usage

Basic Example
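The example below is a minimal sketch, not an official client. It assumes an OpenAI-compatible deployment: the endpoint URL https://api.openai.com/v1/audio/speech, the JSON field names (model, voice, input, response_format), and the OPENAI_API_KEY environment variable are all assumptions to adjust for your provider. It uses only the Python standard library, checks the model, voice, and format against the options listed in this guide, and saves the returned audio to a file.

```python
import json
import os
import urllib.request

# ASSUMPTIONS (not confirmed by this guide): an OpenAI-compatible endpoint URL,
# these JSON field names, and an API key stored in the OPENAI_API_KEY variable.
API_URL = "https://api.openai.com/v1/audio/speech"

# Options taken from this guide's model, voice, and output format lists.
MODELS = {"tts-1", "tts-1-hd"}
VOICES = {"alloy", "echo", "fable", "nova", "shimmer", "onyx"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_payload(text: str, model: str = "tts-1", voice: str = "alloy",
                         response_format: str = "mp3") -> dict:
    """Check the options against the documented lists and return the request body."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    if response_format not in FORMATS:
        raise ValueError(f"unknown format: {response_format}")
    return {"model": model, "voice": voice, "input": text,
            "response_format": response_format}


def synthesize(text: str, out_path: str, **options: str) -> None:
    """POST the request and write the returned audio bytes to out_path."""
    payload = build_speech_payload(text, **options)
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # Read the whole response body and save it; nothing is played back here.
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

For instance, synthesize("Hello from the speech endpoint.", "hello.mp3", voice="nova") would write an MP3 file, provided the assumed endpoint and key are valid for your account.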

Features

Audio Quality Options

tts-1: Low latency, suitable for real-time applications.

tts-1-hd: Higher quality, with potentially fewer static artifacts.

Available Voices

alloy

echo

fable

nova

shimmer

onyx

Supported Output Formats

Format	Features	Use Case
MP3	Default format	General-purpose scenarios
Opus	Low latency	Streaming media and communication
AAC	Efficient compression	Playback on mobile devices
FLAC	Lossless compression	Audio archiving
WAV	Uncompressed	Low-latency applications
PCM	Raw samples, no header (24 kHz, 16-bit signed)	Applications that process raw samples directly
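Because PCM output has no header, most players cannot open it directly. The sketch below wraps the samples in a WAV container using Python's standard wave module and computes duration from the table's figures: at 24,000 samples per second and 2 bytes per sample, one second of audio is 48,000 bytes. It assumes mono audio and little-endian byte order; neither is stated in the table above.

```python
import io
import wave

# From the format table: 24 kHz sample rate, 16-bit (2-byte) signed samples.
# ASSUMPTIONS: a single (mono) channel and little-endian byte order.
SAMPLE_RATE = 24_000
SAMPLE_WIDTH = 2  # bytes per sample
CHANNELS = 1


def pcm_duration_seconds(pcm: bytes) -> float:
    """Duration of a raw PCM buffer: bytes / (rate * bytes per sample * channels)."""
    return len(pcm) / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)


def pcm_to_wav(pcm: bytes) -> bytes:
    """Prepend a standard WAV header to headerless PCM samples."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    # wave does not close a file object it did not open, so the buffer is still readable.
    return buffer.getvalue()
```

The WAV header adds 44 bytes; the sample data itself is unchanged.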

Real-Time Audio Streaming
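To start playback before the whole clip has been generated, read the HTTP response incrementally instead of all at once. The following is a minimal standard-library sketch; the endpoint URL, the JSON field names, and the OPENAI_API_KEY variable are assumptions modeled on an OpenAI-compatible API. It requests PCM output and passes each chunk to a callback you supply, for example one that writes to an audio device. The 4096-byte chunk size is an arbitrary choice, and whether audio actually arrives progressively depends on the server.

```python
import json
import os
import urllib.request
from typing import BinaryIO, Callable, Iterator

# ASSUMPTIONS: OpenAI-compatible endpoint, field names, and key variable.
API_URL = "https://api.openai.com/v1/audio/speech"
CHUNK_SIZE = 4096  # bytes per read; an arbitrary choice


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def stream_speech(text: str, on_chunk: Callable[[bytes], None],
                  voice: str = "alloy", response_format: str = "pcm") -> None:
    """Send a speech request and hand each audio chunk to on_chunk as it is read."""
    body = json.dumps({"model": "tts-1", "voice": voice, "input": text,
                       "response_format": response_format}).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        for chunk in iter_chunks(response):
            on_chunk(chunk)
```

The tts-1 model is used here because this guide describes it as the low-latency option.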

Supported Languages

The API supports multiple languages, including:

Asian Languages: Chinese, Japanese, Korean, etc.

European Languages: English, French, German, etc.

Other Languages: Arabic, Hindi, etc.

Note: The current voices are primarily optimized for English.

Frequently Asked Questions (FAQs)

Q: How can I control the emotion of the generated audio?

A: There is currently no direct mechanism to control emotions. Capitalization or punctuation might influence the output, but the effect is not guaranteed.

Q: Can I create custom voices?

A: Custom voice creation is not supported at this time.

Q: Who owns the generated audio?

A: The generated audio belongs to you as its creator, but you must still inform users that it is AI-generated.
