GLM 5.2 API

Reasoning and coding model with a 1M token context, 128K output, adjustable reasoning effort, native web search, and tool calling.

Z.ai · Text Generation · 1M context · Proprietary Endpoint · New

About GLM 5.2

Notes:
- Context window: 1M tokens
- Maximum output: 128K tokens
- Adjustable reasoning effort from minimal to max (max recommended for complex coding)
- Built-in web search adds $0.033 per request when used
- Supports function calling, structured output, and streaming
- Run structured output with thinking disabled
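The last note, running structured output with thinking disabled, can be sketched as a request body plus a small parser. This is a minimal sketch, not a confirmed recipe: `enable_thinking` comes from the parameter table on this page, while `response_format` is an assumption borrowed from OpenAI's JSON mode and is not confirmed here. The helper names and the sample response are hypothetical.

```python
import json


def build_structured_request(prompt: str) -> dict:
    """Build a Chat Completions body that asks for JSON with thinking off."""
    return {
        "model": "glm-5-2",
        "messages": [
            {"role": "system", "content": "Reply with a single JSON object."},
            {"role": "user", "content": prompt},
        ],
        # From the parameter table: turn reasoning off for structured output.
        "enable_thinking": False,
        # Assumption: OpenAI-style JSON mode; not confirmed on this page.
        "response_format": {"type": "json_object"},
    }


def parse_json_reply(response: dict) -> dict:
    """Decode the JSON object carried in an OpenAI-shaped response."""
    content = response["choices"][0]["message"]["content"]
    return json.loads(content)


# Offline demonstration with a hand-written response, not real model output.
fake_response = {
    "choices": [
        {"message": {"role": "assistant",
                     "content": '{"sentiment": "positive", "score": 0.9}'}}
    ]
}

body = build_structured_request("Classify: 'The release went smoothly.'")
parsed = parse_json_reply(fake_response)
print(body["enable_thinking"], parsed["sentiment"])
```

Sending `body` as the JSON payload of `POST /v1/chat/completions` would exercise the real endpoint. Catching `json.JSONDecodeError` around the parser is a sensible precaution in case the reply is not valid JSON.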

reasoning, function calling, structured output, web search

GLM 5.2 specs

Model ID: glm-5-2
Provider: Z.ai
Category: Text Generation
Context window: 1M tokens
Max output: 131,072 tokens
Input: text
Output: text
Endpoints:
- POST /v1/chat/completions
- POST /v1/responses
- POST /v1/messages

GLM 5.2 API pricing

Live pay-as-you-go rates from the EmpirioLabs catalog. You are billed only for what you use, with no monthly minimum.

| Type | Spec | Rate |
| --- | --- | --- |
| Input | per 1M prompt tokens | $1.40 |
| Output | per 1M generated tokens | $4.40 |
| Web Search | per request | $0.033 |

How to call the GLM 5.2 API

GLM 5.2 serves the OpenAI-compatible Chat Completions API. Point any OpenAI SDK at https://api.empiriolabs.ai/v1 with your EmpirioLabs API key and use the model id glm-5-2. Get an API key from the EmpirioLabs dashboard.

cURL
curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5-2",
    "messages": [
      {"role": "user", "content": "Write a haiku about the ocean."}
    ]
  }'
Python (OpenAI SDK)
import os

from openai import OpenAI

# Read the key from the environment rather than hard-coding it.
client = OpenAI(
    base_url="https://api.empiriolabs.ai/v1",
    api_key=os.environ["EMPIRIOLABS_API_KEY"],
)

response = client.chat.completions.create(
    model="glm-5-2",
    messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
)
print(response.choices[0].message.content)

GLM 5.2 API parameters

Request parameters supported by the GLM 5.2 API on EmpirioLabs. Defaults apply when a field is omitted.

| Parameter | Type | Default | Range / values | Description |
| --- | --- | --- | --- | --- |
| max_tokens | integer | 65536 | 1 to 131072 | Maximum number of output tokens to generate. |
| temperature | number | 1 | 0 to 1 | Controls randomness. Lower values make responses more deterministic. |
| top_p | number | 0.95 | 0.01 to 1 | Nucleus sampling cutoff. |
| reasoning_effort | enum | max | none, minimal, low, medium, high, xhigh, max | GLM 5.2 reasoning effort. none disables thinking; minimal through max set how hard the model reasons before answering. max is recommended for complex coding. |
| enable_thinking | boolean | true | - | Allow the model to reason before answering. Turn off for the lowest-latency replies or strict structured output. |
| do_sample | boolean | true | - | Enable sampling. Turn off for greedy deterministic output (temperature and top_p are ignored). |
| tool_web_search | boolean | false | - | Enable built-in web search. Adds $0.033 per request when used. |
| search_recency_filter | enum | noLimit | oneDay, oneWeek, oneMonth, oneYear, noLimit | Limit web search results to a recency window. |
| count | integer | 10 | 1 to 50 | Number of web search results to retrieve when web search is enabled. |
| search_domain_filter | string | - | - | Restrict web search to a specific domain. |
| search_prompt | string | - | - | Optional prompt used to summarize retrieved web search results. |
| search_result | boolean | true | - | Return web search result metadata in the response when web search is enabled. |
| tool_stream | boolean | false | - | Stream function-call arguments incrementally when streaming. |
| tools | array | [] | - | OpenAI-compatible function calling tool definitions. |
3 more parameters in the docs
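
To show how the listed parameters might combine in one request, here is a minimal sketch of a request-body builder that checks values against the ranges in the table before anything is sent. The function `build_chat_request` is a hypothetical helper, not part of any SDK, and it assumes the extra fields sit at the top level of the JSON body next to `model` and `messages`.

```python
ALLOWED_EFFORT = ("none", "minimal", "low", "medium", "high", "xhigh", "max")
ALLOWED_RECENCY = ("oneDay", "oneWeek", "oneMonth", "oneYear", "noLimit")


def build_chat_request(messages, *, max_tokens=65536, temperature=1.0,
                       top_p=0.95, reasoning_effort="max", web_search=False,
                       search_recency_filter="noLimit", count=10):
    """Return a request body, rejecting values outside the documented ranges."""
    if not 1 <= max_tokens <= 131072:
        raise ValueError("max_tokens must be between 1 and 131072")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    if not 0.01 <= top_p <= 1:
        raise ValueError("top_p must be between 0.01 and 1")
    if reasoning_effort not in ALLOWED_EFFORT:
        raise ValueError(f"reasoning_effort must be one of {ALLOWED_EFFORT}")

    body = {
        "model": "glm-5-2",
        "messages": list(messages),
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "reasoning_effort": reasoning_effort,
    }
    if web_search:
        # Web search fields are only added when the feature is switched on.
        if search_recency_filter not in ALLOWED_RECENCY:
            raise ValueError(f"search_recency_filter must be one of {ALLOWED_RECENCY}")
        if not 1 <= count <= 50:
            raise ValueError("count must be between 1 and 50")
        body.update(
            tool_web_search=True,
            search_recency_filter=search_recency_filter,
            count=count,
        )
    return body


body = build_chat_request(
    [{"role": "user", "content": "What changed in Python packaging this week?"}],
    reasoning_effort="high",
    web_search=True,
    search_recency_filter="oneWeek",
    count=5,
)
print(body)
```

With the OpenAI Python SDK, fields the SDK does not model, such as `tool_web_search`, can be passed through the `extra_body` argument of `client.chat.completions.create`. Whether the endpoint accepts them in exactly this shape should be checked against the full docs.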

GLM 5.2 API: common questions

How much does the GLM 5.2 API cost?

On EmpirioLabs, GLM 5.2 is billed pay-as-you-go: $1.40 per 1M prompt tokens for input, $4.40 per 1M generated tokens for output, and $0.033 per request for web search. The live rate card on this page always matches what the API charges.
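
The arithmetic behind those rates can be sketched as a small estimator. The function name `estimate_cost` is hypothetical, the rates are copied from this page and may change, and the sketch assumes `completion_tokens` is whatever the API reports as generated tokens.

```python
from decimal import Decimal

# Rates copied from the pricing section of this page (USD).
INPUT_PER_MILLION = Decimal("1.40")
OUTPUT_PER_MILLION = Decimal("4.40")
WEB_SEARCH_PER_REQUEST = Decimal("0.033")
MILLION = Decimal(1_000_000)


def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  web_search_requests: int = 0) -> Decimal:
    """Estimate the USD cost of usage at the listed pay-as-you-go rates."""
    input_cost = Decimal(prompt_tokens) / MILLION * INPUT_PER_MILLION
    output_cost = Decimal(completion_tokens) / MILLION * OUTPUT_PER_MILLION
    search_cost = web_search_requests * WEB_SEARCH_PER_REQUEST
    return input_cost + output_cost + search_cost


# 200K prompt tokens, 10K generated tokens, one web search request.
cost = estimate_cost(200_000, 10_000, web_search_requests=1)
print(f"${cost:.4f}")  # prints $0.3570
```

`Decimal` is used instead of floats so that small per-request charges add up without rounding drift.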

What is the context window of GLM 5.2?

GLM 5.2 supports a 1M-token context window with up to 131,072 output tokens per response.
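
A rough way to pick `max_tokens` under these limits is sketched below. It rests on two assumptions that this page does not confirm: that "1M" means 1,000,000 tokens, and that prompt and output tokens share the same context window. The helper `plan_max_tokens` is hypothetical.

```python
CONTEXT_WINDOW = 1_000_000  # assumption: "1M" read as 1,000,000 tokens
MAX_OUTPUT = 131_072        # from the specs on this page


def plan_max_tokens(prompt_tokens: int, desired_output: int) -> int:
    """Clamp the requested output length to the output cap and remaining context."""
    if prompt_tokens < 0 or desired_output < 1:
        raise ValueError("prompt_tokens must be >= 0 and desired_output >= 1")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining < 1:
        raise ValueError("prompt does not leave room for any output")
    return min(desired_output, MAX_OUTPUT, remaining)


print(plan_max_tokens(10_000, 200_000))   # capped by the output limit
print(plan_max_tokens(950_000, 100_000))  # capped by the remaining context
```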

Is the GLM 5.2 API OpenAI-compatible?

Yes. GLM 5.2 serves the OpenAI-compatible Chat Completions API, so existing OpenAI SDKs work by pointing base_url at https://api.empiriolabs.ai/v1 and setting the model id to glm-5-2.
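
Since the endpoint is described as OpenAI-compatible and the notes list streaming support, a streamed reply would presumably arrive as server-sent events in the OpenAI chunk format. The sketch below parses such lines offline. The helper name `collect_stream_text` and the sample lines are hypothetical, and the sample is hand-written rather than captured from the API.

```python
import json


def collect_stream_text(lines) -> str:
    """Join the text deltas from OpenAI-style 'data: ...' stream lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel in the OpenAI format
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


# Hand-written sample lines, not real API output.
sample_lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Salt wind "}}]}',
    'data: {"choices": [{"delta": {"content": "on the waves"}}]}',
    "data: [DONE]",
]

text = collect_stream_text(sample_lines)
print(text)  # prints Salt wind on the waves
```

In practice the OpenAI SDK handles this parsing when `stream=True` is passed to `client.chat.completions.create`. The manual version is mainly useful for clients that talk to the endpoint over plain HTTP.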

Can I try GLM 5.2 in the browser before integrating?

Yes. The EmpirioLabs playground runs GLM 5.2 in the browser with the same parameters the API exposes, so you can test prompts before writing code.

How do I get a GLM 5.2 API key?

Create an EmpirioLabs account, then generate a key under API Keys in the dashboard. Billing uses pay-as-you-go credits, so you only pay for the requests you make.
