TTS 1.5 Mini API

Sub-130ms TTFB voice synthesis with 271+ voices across 15 languages, expressive prosody, and real-time SSE streaming for low-latency voice agents.

Inworld · Audio Generation · Proprietary Endpoint · New

About TTS 1.5 Mini

Also known as TTS Mini, TTS-1.5-Mini

multi-speaker, real-time, low latency, streaming, word timestamps, character timestamps, multilingual, expressive prosody

TTS 1.5 Mini specs

Model ID: tts-1-5-mini
Provider: Inworld
Category: Audio Generation
Input: text
Output: audio
Endpoints:
  • POST /v1/audio/speech
  • POST /v1/audio/speech:stream
  • GET /v1/voices

TTS 1.5 Mini API pricing (save up to 30%)

Live pay-as-you-go rates from the EmpirioLabs catalog. You are billed only for what you use, with no monthly minimum.

| Type | Spec | Rate |
| --- | --- | --- |
| Synthesis | per 1M characters | $17.50 (was $25.00) |

How to call the TTS 1.5 Mini API

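TTS 1.5 Mini serves speech through POST /v1/audio/speech and returns playable audio in the response body. Send the text to speak in the input field, with model set to tts-1-5-mini. Because output_format defaults to WAV, request MP3 explicitly if you save the result as an .mp3 file. Get an API key from the EmpirioLabs dashboard.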

cURL
curl https://api.empiriolabs.ai/v1/audio/speech \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1-5-mini",
    "input": "Welcome to EmpirioLabs. Your build just finished.",
    "output_format": "MP3"
  }' \
  --output speech.mp3
Python
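import os

import requests

api_key = os.environ["EMPIRIOLABS_API_KEY"]

response = requests.post(
    "https://api.empiriolabs.ai/v1/audio/speech",
    headers={"Authorization": f"Bearer {api_key}"},
    # output_format defaults to WAV, so request MP3 to match the file name.
    json={"model": "tts-1-5-mini", "input": "Welcome to EmpirioLabs.", "output_format": "MP3"},
    timeout=30,
)
response.raise_for_status()  # fail loudly instead of writing an error body to disk

with open("speech.mp3", "wb") as f:
    f.write(response.content)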
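
The specs also list POST /v1/audio/speech:stream for SSE streaming. The following is a minimal sketch of consuming it, not a confirmed client. It assumes the streaming endpoint accepts the same JSON body and bearer token as /v1/audio/speech and honours an Accept: text/event-stream header. The event payload schema is not documented on this page, so the sketch yields each event's raw data string and leaves decoding to the caller. iter_sse_data and stream_speech are hypothetical helper names.

```python
import json
import urllib.request
from typing import Iterable, Iterator, List

# Host taken from the examples on this page; path from the endpoint list.
STREAM_URL = "https://api.empiriolabs.ai/v1/audio/speech:stream"


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each server-sent event found in text lines."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one optional space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (":...") and other fields (event:, id:, retry:) are skipped.
    if buffer:
        # Lenient: flush a final event even if the closing blank line is missing.
        yield "\n".join(buffer)


def stream_speech(text: str, api_key: str) -> Iterator[str]:
    """POST text to the streaming endpoint and yield raw SSE data payloads."""
    body = json.dumps({"model": "tts-1-5-mini", "input": text}).encode("utf-8")
    request = urllib.request.Request(
        STREAM_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "text/event-stream",  # assumption, not confirmed by this page
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # The response object yields lines of bytes as they arrive.
        yield from iter_sse_data(line.decode("utf-8") for line in response)
```

Keeping the parser separate from the network call makes it possible to test the event handling against recorded lines before pointing it at the live endpoint.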

TTS 1.5 Mini API parameters

Request parameters supported by the TTS 1.5 Mini API on EmpirioLabs. Defaults apply when a field is omitted.

| Parameter | Type | Default | Range / values | Description |
| --- | --- | --- | --- | --- |
| input | string | - | max 2000 | Text to synthesize. Max 2,000 characters per request; chunk longer copy at sentence boundaries on the client. |
| voice | enum | Sarah | Sarah, Olivia, Elizabeth, Ashley, Wendy, Julia, Priya, Pixie,... | Voice preset. 20 hand-picked voices covering English + Spanish + Portuguese + Hindi + various accents. For the full 271-voice catalog (including cloned voices), use... |
| voice_id | string | - | - | Free-form voice ID. Overrides voice when set. Use this to address voices outside the curated 20-preset list; Inworld TTS 1.5 ships 271+ named voices across 15... |
| language | enum | en-US | en-US, en-GB, es-ES, es-MX, fr-FR, de-DE, it-IT, pt-BR, pt-PT... | BCP-47 language code. Inworld TTS 1.5 covers 15 languages. |
| output_format | enum | WAV | MP3, WAV, OGG, FLAC, PCM, ALAW, MULAW | Audio container/codec. WAV = LINEAR16 inside RIFF (ubiquitous). MP3 / OGG = compressed. PCM = headerless raw, useful for chunked real-time playback. FLAC = lossless. |
| sample_rate | enum | 24000 | 8000, 16000, 22050, 24000, 32000, 44100, 48000 | Output sample rate in Hz. 24000 is Inworld's default and what their voice models train at; raise to 48000 for broadcast quality. |
| speed | number | 1 | 0.5 to 1.5 | Speaking rate multiplier. 0.5 = half speed, 1.5 = 50% faster. |
| temperature | number | 1 | 0.1 to 2 | Voice expressiveness / variability. Lower = more consistent / "flat"; higher = more expressive but more variation between renders. |
| bit_rate | number | 128000 | 32000 to 320000 | Bitrate in bps for MP3 / OGG_OPUS. Ignored for other encodings. |
| apply_text_normalization | enum | ON | ON, OFF | When ON, Inworld expands numbers / abbreviations / dates into spoken form ("USD 5" → "five US dollars"). |
| timestamp_type | enum | NONE | NONE, WORD, CHARACTER | If non-NONE, the response includes per-word or per-character timestamps in timestamp_info. Useful for caption / highlight UIs. |
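
The ranges in the parameter list can be checked on the client before a request is sent. Below is a minimal sketch of that check. build_speech_payload is a hypothetical helper, not part of any SDK. It assumes the optional parameters are passed as top-level JSON keys next to model and input, which the request examples on this page suggest but do not confirm for every field.

```python
from typing import Any, Dict, Optional

# Limits copied from the parameter list on this page.
MAX_INPUT_CHARS = 2000
OUTPUT_FORMATS = {"MP3", "WAV", "OGG", "FLAC", "PCM", "ALAW", "MULAW"}
SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 44100, 48000}


def build_speech_payload(
    text: str,
    *,
    voice_id: Optional[str] = None,
    output_format: str = "WAV",
    sample_rate: int = 24000,
    speed: float = 1.0,
    temperature: float = 1.0,
) -> Dict[str, Any]:
    """Validate arguments against the documented ranges and return a JSON-ready dict."""
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input is {len(text)} characters; the limit is {MAX_INPUT_CHARS}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    if not 0.5 <= speed <= 1.5:
        raise ValueError("speed must be between 0.5 and 1.5")
    if not 0.1 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.1 and 2")

    payload: Dict[str, Any] = {
        "model": "tts-1-5-mini",
        "input": text,
        "output_format": output_format,
        "sample_rate": sample_rate,
        "speed": speed,
        "temperature": temperature,
    }
    if voice_id is not None:
        # Documented behaviour: voice_id overrides the voice preset when set.
        payload["voice_id"] = voice_id
    return payload
```

The returned dict can be passed as the json= argument of requests.post. Rejecting out-of-range values locally avoids spending a round trip on a request the API would refuse.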

Good to know

Limits

  • Max input: 2,000 characters per request (chunk longer text at sentence boundaries)
  • WebSocket: 20 concurrent connections, 5 contexts/connection
  • Per-WS message: 1,000 characters
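
Chunking at sentence boundaries under the 2,000-character limit can be sketched as follows. chunk_text is a hypothetical helper. Its sentence detection is a simple heuristic (split after ., ! or ? followed by whitespace) that will mis-split abbreviations such as "e.g.", and a single sentence longer than the limit is cut at the limit as a fallback.

```python
import re
from typing import List

MAX_CHARS = 2000  # per-request input limit stated above


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    # Heuristic sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Fallback for a single sentence that exceeds the limit: hard-split it.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request and the resulting audio segments played or concatenated in order.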

Latency

  • p90 TTFB: under 130 ms (Inworld benchmark)
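
To compare the benchmark figure with your own network path, time-to-first-byte can be measured on the client. This is a minimal sketch under stated assumptions: open_stream is any caller-supplied function that starts a request and returns an iterable of audio chunks, and p90 uses the nearest-rank definition, which may differ from the method behind the published number. Both helper names are hypothetical.

```python
import time
from typing import Callable, Iterable, Sequence


def time_to_first_chunk(open_stream: Callable[[], Iterable[bytes]]) -> float:
    """Return the milliseconds between starting a stream and receiving its first chunk."""
    start = time.perf_counter()
    for _ in open_stream():
        # Stop timing as soon as anything arrives.
        return (time.perf_counter() - start) * 1000.0
    raise RuntimeError("stream ended without producing any data")


def p90(samples_ms: Sequence[float]) -> float:
    """Nearest-rank 90th percentile of a non-empty list of measurements."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Integer form of ceil(0.9 * n), which avoids floating-point rounding.
    rank = (9 * len(ordered) + 9) // 10
    return ordered[rank - 1]
```

Client-side numbers include network round-trip time, so they will normally be higher than a provider-side benchmark.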

Voices

  • 271+ named presets across 15 languages
  • 20 hand-picked presets exposed in the dropdown; pass any other voice ID via voice_id
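
Voice IDs outside the 20 presets can presumably be discovered through the GET /v1/voices endpoint listed in the specs. The sketch below is hedged: it assumes the endpoint lives on the same host and uses the same bearer token as the speech endpoint, and it returns the parsed JSON untouched because the response schema is not documented on this page. build_voices_request and list_voices are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any

# Host taken from the examples on this page; path from the endpoint list.
VOICES_URL = "https://api.empiriolabs.ai/v1/voices"


def build_voices_request(api_key: str) -> urllib.request.Request:
    """Build an authenticated GET request for the voice catalog."""
    return urllib.request.Request(
        VOICES_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def list_voices(api_key: str) -> Any:
    """Fetch the voice catalog and return the decoded JSON as-is."""
    with urllib.request.urlopen(build_voices_request(api_key)) as response:
        return json.load(response)
```

Inspect the returned structure once, then pass the identifier you want as voice_id in the speech request.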

TTS 1.5 Mini API: common questions

How much does the TTS 1.5 Mini API cost?

On EmpirioLabs, TTS 1.5 Mini is billed pay-as-you-go: synthesis costs $17.50 (was $25.00) per 1M characters. The live rate card on this page always matches what the API charges.
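
As a worked example of the per-character billing, the sketch below estimates cost from a character count. It hard-codes the $17.50 per 1M characters rate quoted above, which may change, and assumes billing is linear in characters with no rounding or minimum charge per request. estimate_cost_usd is a hypothetical helper.

```python
from decimal import Decimal

# Rate quoted on this page at the time of writing; check the live rate card.
RATE_PER_MILLION_CHARS = Decimal("17.50")


def estimate_cost_usd(char_count: int) -> Decimal:
    """Estimate synthesis cost in USD for a given number of input characters."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return RATE_PER_MILLION_CHARS * char_count / Decimal(1_000_000)
```

Under these assumptions, a full 2,000-character request comes to $0.035, and one million characters to $17.50.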

Which endpoint does TTS 1.5 Mini use?

TTS 1.5 Mini is served through POST /v1/audio/speech on api.empiriolabs.ai with standard bearer-token authentication. The specs also list POST /v1/audio/speech:stream and GET /v1/voices.

Can I try TTS 1.5 Mini in the browser before integrating?

Yes. The EmpirioLabs playground runs TTS 1.5 Mini in the browser with the same parameters the API exposes, so you can test input text and voice settings before writing code.

How do I get a TTS 1.5 Mini API key?

Create an EmpirioLabs account, then generate a key under API Keys in the dashboard. Billing is pay-as-you-go credits, so you only pay for the requests you make.
