
LLM-based text-to-speech with zero-shot voice cloning from 3-10s of audio and emotion-expressive, controllable output via multi-reward RL.
Model ID: glm-tts

Endpoint: POST /v1/audio/speech

Live pay-as-you-go rates from the EmpirioLabs catalog. You are billed only for what you use, with no monthly minimum.

GLM TTS serves speech through POST /v1/audio/speech and returns playable audio. Send the text to speak as input with the model id glm-tts. Get an API key from the EmpirioLabs dashboard.

    curl https://api.empiriolabs.ai/v1/audio/speech \
      -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "glm-tts",
        "input": "Welcome to EmpirioLabs. Your build just finished."
      }' \
      --output speech.mp3

The equivalent call in Python:

    import requests

    response = requests.post(
        "https://api.empiriolabs.ai/v1/audio/speech",
        headers={"Authorization": "Bearer YOUR_EMPIRIOLABS_API_KEY"},
        json={"model": "glm-tts", "input": "Welcome to EmpirioLabs."},
    )
    # Stop on an HTTP error instead of saving an error body as audio.
    response.raise_for_status()
    # Write the returned audio bytes to disk.
    with open("speech.mp3", "wb") as f:
        f.write(response.content)

Request parameters supported by the GLM TTS API on EmpirioLabs. Defaults apply when a field is omitted.

| Parameter | Type | Default | Range / values | Description |
|---|---|---|---|---|
| input | string | - | - | Text to synthesize. For multi-speaker use [S1] / [S2] tags or 'Speaker N:' lines. |
| voice | enum | emma | emma, james, arthur, xiaomei, zhigang, custom | emma=English Female, james=US Male, arthur=US Male alt, xiaomei=Chinese Female, zhigang=Chinese Male, custom=upload reference via voice_audio_url. |
| voice_audio_url | string | - | - | Reference audio URL for custom voice cloning. The reference recording must contain the speaker reading this exact consent phrase aloud, in their own voice: "I... |
| output_format | enum | mp3 | mp3, wav | Output audio file format. |
| speed | number | 1 | 0.5 to 2 | Speaking rate multiplier. |
| model_quality | enum | quality | quality, fast | quality=FP16 (better), fast=INT8 (quicker) |
| sample_rate | enum | 24000 | 24000, 16000 | Output sample rate in Hz. |
| volume | number | 1 | 0.1 to 2 | Output gain multiplier. |
| use_cache | boolean | true | - | Speeds up repeated identical generations. |
| optimize_input | boolean | true | - | Auto-fix pronunciation of technical terms, acronyms, and special characters. |
| seed | number | - | - | Reproducibility seed. |
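
The parameter table above can be turned into a small client-side check before a request is sent. The following is a minimal sketch, not part of any EmpirioLabs SDK: `build_speech_payload` and its helpers are hypothetical names, the allowed values and ranges are copied from the table, and two rules are assumptions rather than documented behavior (rejecting empty text, and requiring `voice_audio_url` when `voice` is `custom`). The API may validate differently.

```python
# Allowed values copied from the parameter table.
VOICES = {"emma", "james", "arthur", "xiaomei", "zhigang", "custom"}
OUTPUT_FORMATS = {"mp3", "wav"}
MODEL_QUALITIES = {"quality", "fast"}
SAMPLE_RATES = {24000, 16000}


def _check_choice(name, value, allowed):
    if value not in allowed:
        raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")


def _check_range(name, value, low, high):
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value!r}")


def build_speech_payload(text, voice="emma", voice_audio_url=None,
                         output_format="mp3", speed=1.0,
                         model_quality="quality", sample_rate=24000,
                         volume=1.0, seed=None):
    """Hypothetical helper: validate arguments and return a JSON-ready dict."""
    if not text:
        # Assumption: empty input is treated as a client-side error here.
        raise ValueError("input text must not be empty")
    _check_choice("voice", voice, VOICES)
    _check_choice("output_format", output_format, OUTPUT_FORMATS)
    _check_choice("model_quality", model_quality, MODEL_QUALITIES)
    _check_choice("sample_rate", sample_rate, SAMPLE_RATES)
    _check_range("speed", speed, 0.5, 2)
    _check_range("volume", volume, 0.1, 2)
    if voice == "custom" and not voice_audio_url:
        # Assumption inferred from the table: custom voices need a reference URL.
        raise ValueError("voice='custom' requires voice_audio_url")

    payload = {
        "model": "glm-tts",
        "input": text,
        "voice": voice,
        "output_format": output_format,
        "speed": speed,
        "model_quality": model_quality,
        "sample_rate": sample_rate,
        "volume": volume,
    }
    # Optional fields are sent only when set, so server defaults still apply.
    if voice_audio_url is not None:
        payload["voice_audio_url"] = voice_audio_url
    if seed is not None:
        payload["seed"] = seed
    return payload
```

For example, `build_speech_payload("[S1] Hi there. [S2] Hello.", voice="james", speed=1.25, seed=7)` returns a dictionary that can be passed as the `json=` argument of the request, using the `[S1]` / `[S2]` multi-speaker tags described for `input`.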
On EmpirioLabs, GLM TTS is billed pay as you go: Fast (INT8) $0.20 per 1k characters; Quality (FP16) $0.21 per 1k characters. The live rate card on this page always matches what the API charges.
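
As a rough illustration of the per-character pricing, the sketch below estimates the cost of a piece of text from the two rates quoted above. `estimate_cost` is a hypothetical helper, and several things are assumptions: that characters are counted the way Python's `len()` counts them, that billing is prorated linearly with no rounding or minimum per request, and that the hard-coded rates still match the live rate card.

```python
from decimal import Decimal

# Rates in USD per 1,000 characters, copied from the page; they may change.
RATES_PER_1K_CHARS = {
    "fast": Decimal("0.20"),     # INT8
    "quality": Decimal("0.21"),  # FP16
}


def estimate_cost(text, model_quality="quality"):
    """Estimate the USD cost of synthesizing `text` (assumes linear proration)."""
    if model_quality not in RATES_PER_1K_CHARS:
        raise ValueError(f"unknown model_quality: {model_quality!r}")
    rate = RATES_PER_1K_CHARS[model_quality]
    # Assumption: one billed character per Python character (code point).
    return rate * Decimal(len(text)) / Decimal(1000)
```

Under these assumptions, 2,500 characters in `quality` mode come to 0.21 × 2.5 = $0.525, and the same text in `fast` mode comes to 0.20 × 2.5 = $0.50.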
GLM TTS is served through POST /v1/audio/speech on api.empiriolabs.ai with standard bearer-token authentication.
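
One way to keep the key out of source code is to read it from the environment when building the request. The sketch below uses only the Python standard library; `build_request` and `synthesize` are hypothetical helper names, the `EMPIRIOLABS_API_KEY` variable name is borrowed from the curl example, and the 60-second timeout is an arbitrary choice, not a documented limit.

```python
import json
import os
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/audio/speech"


def build_request(text, api_key=None):
    """Build a POST request with a bearer token; the key defaults to the env var."""
    key = api_key or os.environ.get("EMPIRIOLABS_API_KEY")
    if not key:
        raise RuntimeError("Set EMPIRIOLABS_API_KEY or pass api_key explicitly")
    body = json.dumps({"model": "glm-tts", "input": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, out_path, api_key=None, timeout=60):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_request(text, api_key)
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

Calling `synthesize("Your build just finished.", "speech.mp3")` with the environment variable set would write the audio to `speech.mp3`; a missing key fails before any network call is made.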
The EmpirioLabs playground runs GLM TTS in the browser with the same parameters the API exposes, so you can test prompts before writing code.
Create an EmpirioLabs account, then generate a key under API Keys in the dashboard. Billing is pay-as-you-go credits, so you only pay for the requests you make.