GLM TTS API

LLM-based text-to-speech with zero-shot voice cloning from 3-10s of audio and emotion-expressive, controllable output via multi-reward RL.

Z.ai, Audio Generation, Native Inference

About GLM TTS

voice cloning, emotion control

GLM TTS specs

Model ID: glm-tts
Provider: Z.ai
Category: Audio Generation
Input: text, audio
Output: audio
Endpoints: POST /v1/audio/speech

GLM TTS API pricing

Live pay-as-you-go rates from the EmpirioLabs catalog. You are billed only for what you use, with no monthly minimum.

Type | Spec | Rate
Fast (INT8) | per 1k characters | $0.20
Quality (FP16) | per 1k characters | $0.21
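
As a rough illustration of the rate card above, the sketch below estimates the cost of a request from its character count. The helper name `estimate_cost` is hypothetical, and the code assumes billing is pro-rated per character rather than rounded up to the next 1k; the source does not say which applies.

```python
from decimal import Decimal

# Rates in USD per 1,000 characters, copied from the rate card above.
RATES_PER_1K = {"fast": Decimal("0.20"), "quality": Decimal("0.21")}


def estimate_cost(text: str, model_quality: str = "quality") -> Decimal:
    """Estimate the USD cost of synthesizing `text`.

    Assumption: characters are billed pro rata, not in whole 1k blocks.
    """
    rate = RATES_PER_1K[model_quality]
    return rate * len(text) / 1000


# Example: a 2,500-character script at the Fast (INT8) rate.
print(estimate_cost("a" * 2500, "fast"))
```

Decimal is used instead of float so that small per-request amounts do not pick up binary rounding noise when summed.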

How to call the GLM TTS API

GLM TTS serves speech through POST /v1/audio/speech and returns playable audio. Send the text to speak as input with the model id glm-tts. Get an API key from the EmpirioLabs dashboard.

cURL
curl https://api.empiriolabs.ai/v1/audio/speech \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-tts",
    "input": "Welcome to EmpirioLabs. Your build just finished."
  }' \
  --output speech.mp3
Python
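import os

import requests

# Read the API key from the environment instead of hard-coding it.
api_key = os.environ["EMPIRIOLABS_API_KEY"]

response = requests.post(
    "https://api.empiriolabs.ai/v1/audio/speech",
    headers={"Authorization": f"Bearer {api_key}"},
    json={"model": "glm-tts", "input": "Welcome to EmpirioLabs."},
)
# Fail loudly on an HTTP error rather than writing an error body to disk.
response.raise_for_status()

# The response body is the audio itself (mp3 is the default output_format).
with open("speech.mp3", "wb") as f:
    f.write(response.content)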

GLM TTS API parameters

Request parameters supported by the GLM TTS API on EmpirioLabs. Defaults apply when a field is omitted.

Parameter | Type | Default | Range / values | Description
input | string | - | - | Text to synthesize. For multi-speaker use [S1] / [S2] tags or 'Speaker N:' lines.
voice | enum | emma | emma, james, arthur, xiaomei, zhigang, custom | emma=English Female, james=US Male, arthur=US Male alt, xiaomei=Chinese Female, zhigang=Chinese Male, custom=upload reference via voice_audio_url.
voice_audio_url | string | - | - | Reference audio URL for custom voice cloning. The reference recording must contain the speaker reading this exact consent phrase aloud, in their own voice: "I...
output_format | enum | mp3 | mp3, wav | Output media file format (mp3, wav, mp4, png, jpg, etc., depending on the endpoint).
speed | number | 1 | 0.5 to 2 | Speaking rate multiplier.
model_quality | enum | quality | quality, fast | quality=FP16 (better), fast=INT8 (quicker)
sample_rate | enum | 24000 | 24000, 16000 | Output sample rate in Hz.
volume | number | 1 | 0.1 to 2 | Output gain multiplier.
use_cache | boolean | true | - | Speeds up repeated identical generations.
optimize_input | boolean | true | - | Auto-fix pronunciation of technical terms, acronyms, and special characters.
seed | number | - | - | Reproducibility seed.
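
To make the ranges in the table concrete, here is a minimal sketch of a client-side helper that checks values before a request is sent. The function name `build_payload` is hypothetical, and the rule that `voice="custom"` requires `voice_audio_url` is an assumption inferred from the table, not confirmed API behavior; the server remains the authority on what it accepts.

```python
# Allowed values, copied from the parameter table above.
VOICES = {"emma", "james", "arthur", "xiaomei", "zhigang", "custom"}
FORMATS = {"mp3", "wav"}
SAMPLE_RATES = {24000, 16000}
QUALITIES = {"quality", "fast"}


def build_payload(text, voice="emma", speed=1.0, volume=1.0,
                  output_format="mp3", sample_rate=24000,
                  model_quality="quality", voice_audio_url=None, seed=None):
    """Return a JSON-ready request body, raising ValueError on bad values."""
    if not text:
        raise ValueError("input must be a non-empty string")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    # Assumption: a custom voice is unusable without a reference URL.
    if voice == "custom" and not voice_audio_url:
        raise ValueError("voice='custom' needs voice_audio_url")
    if not 0.5 <= speed <= 2:
        raise ValueError("speed must be between 0.5 and 2")
    if not 0.1 <= volume <= 2:
        raise ValueError("volume must be between 0.1 and 2")
    if output_format not in FORMATS:
        raise ValueError(f"unsupported output_format: {output_format}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    if model_quality not in QUALITIES:
        raise ValueError(f"unsupported model_quality: {model_quality}")

    payload = {
        "model": "glm-tts",
        "input": text,
        "voice": voice,
        "speed": speed,
        "volume": volume,
        "output_format": output_format,
        "sample_rate": sample_rate,
        "model_quality": model_quality,
    }
    # Optional fields are only sent when the caller sets them.
    if voice_audio_url:
        payload["voice_audio_url"] = voice_audio_url
    if seed is not None:
        payload["seed"] = seed
    return payload


print(build_payload("Your build just finished.", voice="james", speed=1.2))
```

The returned dictionary can be passed as the `json=` argument of a `requests.post` call to the endpoint.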

Good to know

Limits

  • Max input: 5,000 characters
  • Generation: 5-10 minutes
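
Scripts longer than the 5,000-character limit have to be sent as several requests. The sketch below shows one possible way to split text at sentence boundaries; the helper name `split_text` and the sentence-splitting rule are illustrative assumptions, not part of the API.

```python
import re

MAX_CHARS = 5000  # "Max input" limit listed above


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Chunks end at sentence boundaries where possible; a single sentence
    longer than `limit` is cut at the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any sentence that cannot fit in one chunk on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


script = "This is one sentence of the script. " * 400
parts = split_text(script)
print(len(parts), [len(p) for p in parts])
```

Each chunk can then be synthesized separately and the resulting audio files concatenated in order.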

Voice cloning

  • Reference audio: 3-10 seconds
  • Accepted formats: WAV, MP3, OGG, FLAC, AAC, M4A, WebM
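
Before uploading a reference clip, it can help to check it locally against the constraints above. The following is a minimal sketch using only the Python standard library; the helper name `check_reference` is hypothetical, and duration is verified for WAV files only because the standard library cannot read the duration of the other formats.

```python
import wave
from pathlib import Path

# File extensions for the accepted formats listed above.
ACCEPTED = {".wav", ".mp3", ".ogg", ".flac", ".aac", ".m4a", ".webm"}


def check_reference(path: str) -> None:
    """Raise ValueError if the file looks unsuitable as a cloning reference."""
    suffix = Path(path).suffix.lower()
    if suffix not in ACCEPTED:
        raise ValueError(f"unsupported format: {suffix or '(none)'}")
    if suffix == ".wav":
        # Duration = number of frames / frames per second.
        with wave.open(path, "rb") as wav:
            seconds = wav.getnframes() / wav.getframerate()
        if not 3 <= seconds <= 10:
            raise ValueError(f"reference is {seconds:.1f}s; expected 3-10s")
```

A call such as `check_reference("my_voice.wav")` returns silently when the file passes and raises `ValueError` otherwise.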

Preset voices

  • emma (English F)
  • james (US M)
  • arthur (UK M)
  • xiaomei (Chinese F)
  • zhigang (Chinese M)

GLM TTS API: common questions

How much does the GLM TTS API cost?

On EmpirioLabs, GLM TTS is billed pay-as-you-go: Fast (INT8) costs $0.20 per 1k characters and Quality (FP16) costs $0.21 per 1k characters. The live rate card on this page always matches what the API charges.

Which endpoint does GLM TTS use?

GLM TTS is served through POST /v1/audio/speech on api.empiriolabs.ai with standard bearer-token authentication.

Can I try GLM TTS in the browser before integrating?

Yes. The EmpirioLabs playground runs GLM TTS in the browser with the same parameters the API exposes, so you can test prompts before writing code.

How do I get a GLM TTS API key?

Create an EmpirioLabs account, then generate a key under API Keys in the dashboard. Billing is pay-as-you-go credits, so you only pay for the requests you make.
