TTS 1.5 Max API

Broadcast-quality voice synthesis with rich expressive prosody, 271+ voices across 15 languages, and real-time SSE streaming with per-word timestamps.

Inworld · Audio Generation · Proprietary Endpoint · New

About TTS 1.5 Max


Also known as TTS Max, TTS-1.5-Max

multi speaker, real time, streaming, word timestamps, character timestamps, multilingual, expressive prosody, broadcast quality

TTS 1.5 Max specs

Model ID: tts-1-5-max
Provider: Inworld
Category: Audio Generation
Input: text
Output: audio
Endpoints:
  • POST /v1/audio/speech
  • POST /v1/audio/speech:stream
  • GET /v1/voices

TTS 1.5 Max API pricing (save up to 15%)

Live pay-as-you-go rates from the EmpirioLabs catalog. You are billed only for what you use, with no monthly minimum.

Type | Spec | Rate
Synthesis | per 1M characters | $29.75 (was $35.00)
Compare on the full pricing page
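
As a rough illustration of the per-character billing above, the sketch below estimates the cost of a synthesis job from its character count. The two rates are the ones listed on this page. The function name `estimate_cost_usd` is hypothetical, and the assumption that every character of the input (whitespace included) is billed is not confirmed here.

```python
from decimal import Decimal

# Rates copied from the pricing table on this page (USD per 1M characters).
LIST_RATE = Decimal("35.00")
DISCOUNTED_RATE = Decimal("29.75")


def estimate_cost_usd(text: str, rate_per_million: Decimal = DISCOUNTED_RATE) -> Decimal:
    """Estimate synthesis cost, ASSUMING every character in `text` is billed."""
    return Decimal(len(text)) * rate_per_million / Decimal(1_000_000)


if __name__ == "__main__":
    # Cost of one maximum-length (2,000-character) request at the discounted rate.
    print(estimate_cost_usd("x" * 2000))
```

Decimal arithmetic is used instead of floats so that small per-request amounts do not pick up rounding noise when summed over many requests.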

How to call the TTS 1.5 Max API

TTS 1.5 Max serves speech through POST /v1/audio/speech and returns playable audio. Send the text to speak as input with the model id tts-1-5-max. Get an API key from the EmpirioLabs dashboard.

cURL
curl https://api.empiriolabs.ai/v1/audio/speech \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1-5-max",
    "input": "Welcome to EmpirioLabs. Your build just finished.",
    "output_format": "MP3"
  }' \
  --output speech.mp3
Python
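import os

import requests

# Read the key from the environment instead of hard-coding it.
api_key = os.environ["EMPIRIOLABS_API_KEY"]

response = requests.post(
    "https://api.empiriolabs.ai/v1/audio/speech",
    headers={"Authorization": f"Bearer {api_key}"},
    # output_format defaults to WAV, so request MP3 to match the file extension.
    json={"model": "tts-1-5-max", "input": "Welcome to EmpirioLabs.", "output_format": "MP3"},
    timeout=60,
)
response.raise_for_status()  # fail loudly instead of saving an error body as audio

with open("speech.mp3", "wb") as f:
    f.write(response.content)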
Full TTS 1.5 Max API reference
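
The specs also list a streaming endpoint, POST /v1/audio/speech:stream, and the overview describes it as SSE streaming. The payload of each event is not documented on this page, so the sketch below only handles generic Server-Sent Events framing (data: lines grouped by blank lines) and returns each payload as an opaque string. The helper name `iter_sse_data` is hypothetical, and it is an assumption that the endpoint follows standard text/event-stream framing.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete Server-Sent Event.

    `lines` is any iterable of already-decoded text lines.
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE framing strips one leading space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    # An event that was never closed by a blank line is dropped.


if __name__ == "__main__":
    sample = ["data: first", "", ": ping", "data: a", "data: b", ""]
    for payload in iter_sse_data(sample):
        print(repr(payload))
```

With the requests library, one way to feed this parser is to call `requests.post(..., stream=True)` and pass in the lines from `response.iter_lines()`, decoding each line as UTF-8 first. How the audio and timestamps are encoded inside each payload would need to be checked against the full API reference.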

TTS 1.5 Max API parameters

Request parameters supported by the TTS 1.5 Max API on EmpirioLabs. Defaults apply when a field is omitted.

Parameter | Type | Default | Range / values | Description
input | string | - | max 2000 | Text to synthesize. Max 2,000 characters per request; chunk longer copy at sentence boundaries on the client.
voice | enum | Sarah | Sarah, Olivia, Elizabeth, Ashley, Wendy, Julia, Priya, Pixie,... | Voice preset. 20 hand-picked voices covering English + Spanish + Portuguese + Hindi + various accents. For the full 271-voice catalog (including cloned voices), use...
voice_id | string | - | - | Free-form voice ID. Overrides voice when set. Use this to address voices outside the curated 20-preset list; Inworld TTS 1.5 ships 271+ named voices across 15...
language | enum | en-US | en-US, en-GB, es-ES, es-MX, fr-FR, de-DE, it-IT, pt-BR, pt-PT... | BCP-47 language code. Inworld TTS 1.5 covers 15 languages.
output_format | enum | WAV | MP3, WAV, OGG, FLAC, PCM, ALAW, MULAW | Audio container/codec. WAV = LINEAR16 inside RIFF (ubiquitous). MP3 / OGG = compressed. PCM = headerless raw, useful for chunked real-time playback. FLAC = lossless.
sample_rate | enum | 24000 | 8000, 16000, 22050, 24000, 32000, 44100, 48000 | Output sample rate in Hz. 24000 is Inworld's default and what their voice models train at; raise to 48000 for broadcast quality.
speed | number | 1 | 0.5 to 1.5 | Speaking rate multiplier. 0.5 = half speed, 1.5 = 50% faster.
temperature | number | 1 | 0.1 to 2 | Voice expressiveness / variability. Lower = more consistent / "flat"; higher = more expressive but more variation between renders.
bit_rate | number | 128000 | 32000 to 320000 | Bitrate in bps for MP3 / OGG_OPUS. Ignored for other encodings.
apply_text_normalization | enum | ON | ON, OFF | When ON, Inworld expands numbers / abbreviations / dates into spoken form ("USD 5" → "five US dollars").
timestamp_type | enum | NONE | NONE, WORD, CHARACTER | If non-NONE, the response includes per-word or per-character timestamps in timestamp_info. Useful for caption / highlight UIs.
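
To catch out-of-range values before a request is sent, the parameter table above can be mirrored in a small client-side check. The sketch below is one possible way to do that. The helper name `build_speech_payload` is hypothetical, the field names and ranges are taken from the table, and it is an assumption that the API expects enum values spelled exactly as listed. How the server itself reports invalid values is not covered on this page.

```python
MODEL_ID = "tts-1-5-max"

# Allowed values copied from the parameter table on this page.
OUTPUT_FORMATS = {"MP3", "WAV", "OGG", "FLAC", "PCM", "ALAW", "MULAW"}
SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 44100, 48000}
TIMESTAMP_TYPES = {"NONE", "WORD", "CHARACTER"}


def build_speech_payload(text, *, voice_id=None, output_format="WAV",
                         sample_rate=24000, speed=1.0, temperature=1.0,
                         timestamp_type="NONE"):
    """Validate arguments against the documented ranges and return a JSON-ready dict."""
    if not 0 < len(text) <= 2000:
        raise ValueError("input must be 1 to 2,000 characters")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate!r}")
    if not 0.5 <= speed <= 1.5:
        raise ValueError("speed must be between 0.5 and 1.5")
    if not 0.1 <= temperature <= 2:
        raise ValueError("temperature must be between 0.1 and 2")
    if timestamp_type not in TIMESTAMP_TYPES:
        raise ValueError(f"unsupported timestamp_type: {timestamp_type!r}")

    payload = {
        "model": MODEL_ID,
        "input": text,
        "output_format": output_format,
        "sample_rate": sample_rate,
        "speed": speed,
        "temperature": temperature,
        "timestamp_type": timestamp_type,
    }
    # voice_id overrides the voice preset when set, so only send it if provided.
    if voice_id is not None:
        payload["voice_id"] = voice_id
    return payload


if __name__ == "__main__":
    print(build_speech_payload("Hello there.", output_format="MP3", speed=1.25))
```

The returned dict can be passed as the `json=` argument of a `requests.post` call to POST /v1/audio/speech.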

Good to know

Limits

  • Max input: 2,000 characters per request (chunk longer text at sentence boundaries)
  • WebSocket: 20 concurrent connections, 5 contexts/connection
  • Per-WS message: 1,000 characters
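
The 2,000-character limit means longer copy has to be split on the client. A minimal sketch of sentence-boundary chunking follows. The function name `chunk_text` is hypothetical, the sentence splitter is a simple regex on ., ! and ? rather than a full tokenizer, and it assumes the limit counts characters the way Python's `len()` does (Unicode code points), which this page does not specify.

```python
import re

MAX_CHARS = 2000  # per-request input limit stated on this page


def chunk_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `limit` characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("One. Two! Three?", limit=10):
        print(repr(chunk))
```

Each chunk can then be sent as the input of its own request and the resulting audio played or concatenated in order.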

Latency

  • p90 TTFB: under 250 ms (Inworld benchmark)

Voices

  • 271+ named presets across 15 languages
  • 20 hand-picked presets exposed in the dropdown; pass any other voice ID via voice_id

TTS 1.5 Max API: common questions

How much does the TTS 1.5 Max API cost?

On EmpirioLabs, TTS 1.5 Max is billed pay-as-you-go: synthesis costs $29.75 (was $35.00) per 1M characters. The live rate card on this page always matches what the API charges.

Which endpoint does TTS 1.5 Max use?

TTS 1.5 Max is served through POST /v1/audio/speech on api.empiriolabs.ai with standard bearer-token authentication.

Can I try TTS 1.5 Max in the browser before integrating?

Yes. The EmpirioLabs playground runs TTS 1.5 Max in the browser with the same parameters the API exposes, so you can test input text and voice settings before writing code.

How do I get a TTS 1.5 Max API key?

Create an EmpirioLabs account, then generate a key under API Keys in the dashboard. Billing is pay-as-you-go credits, so you only pay for the requests you make.
