
Long-context Zhipu AI reasoning model with 202K context, 128K output, tool calling, structured output, and cache support.
Notes:
- Served by Alibaba Cloud Model Studio in China deployment mode
- Context window: 202K tokens
- Maximum output: 128K tokens
- Supports function calling, structured output, and context cache
- Structured output should run with enable_thinking=false
- Does not support web search, batches, prefix continuation, or fine-tuning
Model ID: glm-5-1

Endpoints:
- POST /v1/chat/completions
- POST /v1/responses
- POST /v1/messages

Live pay-as-you-go rates from the EmpirioLabs catalog. You are billed only for what you use, with no monthly minimum.
GLM-5.1 serves the OpenAI-compatible Chat Completions API. Point any OpenAI SDK at https://api.empiriolabs.ai/v1 with your EmpirioLabs API key and use the model id glm-5-1. Get an API key from the EmpirioLabs dashboard.
```bash
curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5-1",
    "messages": [
      {"role": "user", "content": "Write a haiku about the ocean."}
    ]
  }'
```

```python
from openai import OpenAI

# Point the OpenAI SDK at the EmpirioLabs endpoint.
client = OpenAI(
    base_url="https://api.empiriolabs.ai/v1",
    api_key="YOUR_EMPIRIOLABS_API_KEY",
)

response = client.chat.completions.create(
    model="glm-5-1",
    messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
)
print(response.choices[0].message.content)
```

Request parameters supported by the GLM-5.1 API on EmpirioLabs. Defaults apply when a field is omitted.
| Parameter | Type | Default | Range / values | Description |
|---|---|---|---|---|
| max_tokens | integer | 4096 | 1 to 128000 | Maximum number of output tokens to generate. |
| temperature | number | 1 | 0 to 2 | Controls randomness. Lower values make responses more deterministic. |
| top_p | number | 0.95 | 0 to 1 | Nucleus sampling cutoff. |
| top_k | integer | 20 | 1 to 100 | Limits sampling to the top K tokens. |
| repetition_penalty | number | 1 | 0.1 to 2 | Penalizes repeated tokens. |
| reasoning_effort | enum | medium | none, low, medium, high, max | Reasoning effort level. none disables thinking. low, medium, high, and max set bounded thinking budgets sized to the selected model. Sent as an OpenAI-style... |
| enable_thinking | boolean | true | - | Allow the model to reason before answering. Disable this for strict structured output. |
| thinking_budget | integer | 32768 | 1 to 38912 | Maximum tokens available for reasoning content when thinking is enabled. |
| tool_stream | boolean | false | - | Stream function-call arguments incrementally when streaming. |
| tools | array | [] | - | OpenAI-compatible function calling tool definitions. |
| tool_choice | object | - | - | OpenAI-compatible tool choice control. |
| parallel_tool_calls | boolean | true | - | Allow multiple tool calls in a single assistant turn when supported. |
| response_format | object | - | - | OpenAI-compatible JSON mode or JSON schema response format. Use non-thinking mode for strict schemas. |
| stop | array | - | - | Optional stop sequences. |
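
The ranges in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of any EmpirioLabs SDK: `build_chat_payload` and `PARAM_RANGES` are hypothetical names, and the sketch assumes the listed parameters travel as top-level fields of the JSON request body. It also applies the note about structured output by defaulting `enable_thinking` to false whenever a `response_format` is present.

```python
from typing import Any

# Inclusive (min, max) ranges copied from the parameter table above.
PARAM_RANGES = {
    "max_tokens": (1, 128000),
    "temperature": (0, 2),
    "top_p": (0, 1),
    "top_k": (1, 100),
    "repetition_penalty": (0.1, 2),
    "thinking_budget": (1, 38912),
}


def build_chat_payload(messages: list[dict[str, Any]], **params: Any) -> dict[str, Any]:
    """Build a request body for glm-5-1, checking numeric ranges first."""
    for name, value in params.items():
        if name in PARAM_RANGES:
            low, high = PARAM_RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low} to {high}")
    payload = {"model": "glm-5-1", "messages": messages, **params}
    # Structured output should run in non-thinking mode, so default
    # enable_thinking to False when a response_format is requested.
    if "response_format" in payload:
        payload.setdefault("enable_thinking", False)
    return payload


payload = build_chat_payload(
    [{"role": "user", "content": "Return the ocean's name as JSON."}],
    temperature=0.2,
    max_tokens=512,
    response_format={"type": "json_object"},
)
print(payload["enable_thinking"])  # prints False
```

With the OpenAI Python SDK, fields outside the standard method signature (such as `top_k` or `enable_thinking`) are typically passed through the `extra_body` argument of `client.chat.completions.create`; whether the endpoint expects them there is an assumption to verify against the live API.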
On EmpirioLabs, GLM-5.1 is billed pay as you go:

| Token type | <=32K | 32K-200K | Unit |
|---|---|---|---|
| Input | $0.825 (was $1.40) | $1.10 (was $1.40) | per 1M prompt tokens |
| Output | $3.301 (was $4.40) | $3.851 (was $4.40) | per 1M generated tokens |
| Implicit cache read | $0.165 (was $0.26) | $0.22 (was $0.26) | per 1M cached input tokens |

The live rate card on this page always matches what the API charges.
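
To make the tiered rates concrete, here is a rough cost estimator built from the discounted prices quoted above. It is a sketch under stated assumptions, none of which the page confirms: the tier is chosen by prompt length, 32K is read as 32,000 tokens, and cached prompt tokens are billed at the cache-read rate instead of the input rate. `estimate_cost` is a hypothetical helper name.

```python
# Discounted rates quoted above, in dollars per 1M tokens, keyed by tier.
RATES = {
    "<=32K": {"input": 0.825, "output": 3.301, "cache_read": 0.165},
    "32K-200K": {"input": 1.10, "output": 3.851, "cache_read": 0.22},
}


def estimate_cost(prompt_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the charge in dollars for a single request.

    Assumptions: tier selected by prompt length (32K = 32,000 tokens);
    cached tokens replace, rather than add to, input-rate billing.
    """
    tier = "<=32K" if prompt_tokens <= 32_000 else "32K-200K"
    rates = RATES[tier]
    uncached = prompt_tokens - cached_tokens
    total = (
        uncached * rates["input"]
        + cached_tokens * rates["cache_read"]
        + output_tokens * rates["output"]
    )
    return total / 1_000_000


# 10,000 prompt tokens and 2,000 generated tokens in the lower tier.
print(f"${estimate_cost(10_000, 2_000):.6f}")  # prints $0.014852
```

Treat the result as an estimate only; the live rate card remains the authority on what a request costs.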
GLM-5.1 supports a 202K-token context window.
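
A request can be sized against that window before it is sent. The sketch below assumes that "202K" means 202,000 tokens and that prompt and generated tokens share the same window; neither detail is stated on this page, and `fit_max_tokens` is a hypothetical helper.

```python
CONTEXT_WINDOW = 202_000  # "202K" read as 202,000 tokens (assumption)
MAX_OUTPUT = 128_000      # upper bound of max_tokens in the parameter table


def fit_max_tokens(prompt_tokens: int, requested: int) -> int:
    """Clamp a requested max_tokens so prompt plus output fits the window."""
    if prompt_tokens >= CONTEXT_WINDOW:
        raise ValueError("prompt alone fills the context window")
    room = CONTEXT_WINDOW - prompt_tokens
    # Never exceed the request, the output cap, or the remaining room.
    return max(1, min(requested, MAX_OUTPUT, room))


print(fit_max_tokens(150_000, 128_000))  # prints 52000
```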
GLM-5.1 serves the OpenAI-compatible Chat Completions API, so existing OpenAI SDKs work once base_url points at https://api.empiriolabs.ai/v1 and the model id is set to glm-5-1.
The EmpirioLabs playground runs GLM-5.1 in the browser with the same parameters the API exposes, so you can test prompts before writing code.
To get an API key, create an EmpirioLabs account, then generate a key under API Keys in the dashboard. Billing is pay-as-you-go credits, so you only pay for the requests you make.