
Broadcast-quality voice synthesis with rich expressive prosody, 271+ voices across 15 languages, and real-time SSE streaming with per-word timestamps.
Also known as TTS Max, TTS-1.5-Max
Model ID: tts-1-5-max
Endpoints: POST /v1/audio/speech, POST /v1/audio/speech:stream, GET /v1/voices
Live pay-as-you-go rates from the EmpirioLabs catalog. You are billed only for what you use, with no monthly minimum.
TTS 1.5 Max serves speech through POST /v1/audio/speech and returns playable audio. Send the text to speak as input with the model id tts-1-5-max. Get an API key from the EmpirioLabs dashboard.
curl https://api.empiriolabs.ai/v1/audio/speech \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1-5-max",
    "input": "Welcome to EmpirioLabs. Your build just finished.",
    "output_format": "MP3"
  }' \
  --output speech.mp3

import requests
response = requests.post(
    "https://api.empiriolabs.ai/v1/audio/speech",
    headers={"Authorization": "Bearer YOUR_EMPIRIOLABS_API_KEY"},
    # output_format defaults to WAV, so request MP3 to match the file name.
    json={"model": "tts-1-5-max", "input": "Welcome to EmpirioLabs.", "output_format": "MP3"},
)
response.raise_for_status()  # fail loudly instead of saving an error body as audio
with open("speech.mp3", "wb") as f:
    f.write(response.content)

Request parameters supported by the TTS 1.5 Max API on EmpirioLabs. Defaults apply when a field is omitted.
| Parameter | Type | Default | Range / values | Description |
|---|---|---|---|---|
| input | string | - | max 2,000 characters | Text to synthesize. Requests are limited to 2,000 characters; chunk longer copy at sentence boundaries on the client. |
| voice | enum | Sarah | Sarah, Olivia, Elizabeth, Ashley, Wendy, Julia, Priya, Pixie,... | Voice preset. 20 hand-picked voices covering English + Spanish + Portuguese + Hindi + various accents. For the full 271-voice catalog (including cloned voices), use... |
| voice_id | string | - | - | Free-form voice ID. Overrides voice when set. Use this to address voices outside the curated 20-preset list — Inworld TTS 1.5 ships 271+ named voices across 15... |
| language | enum | en-US | en-US, en-GB, es-ES, es-MX, fr-FR, de-DE, it-IT, pt-BR, pt-PT... | BCP-47 language code. Inworld TTS 1.5 covers 15 languages. |
| output_format | enum | WAV | MP3, WAV, OGG, FLAC, PCM, ALAW, MULAW | Audio container/codec. WAV = LINEAR16 inside RIFF (ubiquitous). MP3 / OGG = compressed. PCM = headerless raw samples, useful for chunked real-time playback. FLAC = lossless. |
| sample_rate | enum | 24000 | 8000, 16000, 22050, 24000, 32000, 44100, 48000 | Output sample rate in Hz. 24000 is Inworld's default and the rate its voice models are trained at; raise to 48000 for broadcast quality. |
| speed | number | 1 | 0.5 to 1.5 | Speaking rate multiplier. 0.5 = half speed, 1.5 = 50% faster. |
| temperature | number | 1 | 0.1 to 2 | Voice expressiveness / variability. Lower = more consistent / "flat"; higher = more expressive but more variation between renders. |
| bit_rate | number | 128000 | 32000 to 320000 | Bitrate in bps for MP3 / OGG_OPUS. Ignored for other encodings. |
| apply_text_normalization | enum | ON | ON, OFF | When ON, Inworld expands numbers / abbreviations / dates into spoken form ("USD 5" → "five US dollars"). |
| timestamp_type | enum | NONE | NONE, WORD, CHARACTER | When set to WORD or CHARACTER, the response includes per-word or per-character timestamps in timestamp_info. Useful for caption and highlight UIs. |
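The 2,000-character limit on input means longer copy has to be split on the client. The sketch below shows one way to do that at sentence boundaries. The helper name `chunk_text` and the sentence rule (split after `.`, `!`, or `?` followed by whitespace) are assumptions made for illustration, not part of the EmpirioLabs API, and the rule will also split after abbreviations such as "Dr.".

```python
import re

MAX_CHARS = 2000  # per-request input limit from the parameter table


def chunk_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters at sentence ends.

    Hypothetical helper: the sentence rule here is deliberately naive.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-split by length.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        # Pack sentences greedily into the current chunk.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as the input of its own request, and the resulting audio segments played or concatenated in order.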
On EmpirioLabs, TTS 1.5 Max is billed pay as you go: Synthesis $29.75 (was $35.00) per 1M characters. The live rate card on this page always matches what the API charges.
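As a rough illustration of the rate above, the following sketch estimates the cost of a synthesis job from its character count. It assumes billing is strictly proportional to the number of input characters at $29.75 per 1M; how characters are counted (for example whitespace or markup) is not stated on this page, and the function name is hypothetical.

```python
from decimal import Decimal

# USD per 1,000,000 characters, taken from the rate quoted on this page.
RATE_PER_MILLION_CHARS = Decimal("29.75")


def estimate_cost_usd(num_chars: int) -> Decimal:
    """Estimate synthesis cost, assuming purely per-character billing."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return Decimal(num_chars) * RATE_PER_MILLION_CHARS / Decimal(1_000_000)
```

Under that assumption, one maximum-length request of 2,000 characters comes to 2,000 × 29.75 / 1,000,000 = $0.0595.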
TTS 1.5 Max is served through POST /v1/audio/speech on api.empiriolabs.ai with standard bearer-token authentication.
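A minimal sketch of assembling the bearer-token headers and JSON body for that endpoint, with client-side checks against the ranges listed in the parameter table. `build_speech_request` is a hypothetical helper, not an EmpirioLabs SDK function, and the server remains the authority on what it accepts.

```python
API_URL = "https://api.empiriolabs.ai/v1/audio/speech"
MODEL_ID = "tts-1-5-max"

# Limits copied from the parameter table on this page.
MAX_INPUT_CHARS = 2000
OUTPUT_FORMATS = {"MP3", "WAV", "OGG", "FLAC", "PCM", "ALAW", "MULAW"}
SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 44100, 48000}


def build_speech_request(api_key, text, *, output_format="WAV",
                         sample_rate=24000, speed=1.0, temperature=1.0):
    """Return (headers, payload) for POST /v1/audio/speech.

    Hypothetical helper: raises ValueError when a value falls outside
    the documented range, before any network call is made.
    """
    if not text or len(text) > MAX_INPUT_CHARS:
        raise ValueError("input must be 1 to 2000 characters")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    if not 0.5 <= speed <= 1.5:
        raise ValueError("speed must be between 0.5 and 1.5")
    if not 0.1 <= temperature <= 2:
        raise ValueError("temperature must be between 0.1 and 2")

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": MODEL_ID,
        "input": text,
        "output_format": output_format,
        "sample_rate": sample_rate,
        "speed": speed,
        "temperature": temperature,
    }
    return headers, payload
```

The returned pair can be passed to an HTTP client, for example `requests.post(API_URL, headers=headers, json=payload)`, and the response body written to a file whose extension matches `output_format`.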
Yes. The EmpirioLabs playground runs TTS 1.5 Max in the browser with the same parameters the API exposes, so you can test prompts before writing code.
Create an EmpirioLabs account, then generate a key under API Keys in the dashboard. Billing is pay-as-you-go credits, so you only pay for the requests you make.