GLM-5.1 API

A long-context Zhipu AI reasoning model with a 202K-token context window, up to 128K output tokens, tool calling, structured output, and context cache support.

Z.ai | Text Generation | 202K context | China | Proprietary Endpoint | New

About GLM-5.1

Notes:
- Served by Alibaba Cloud Model Studio in China deployment mode.
- Context window: 202K tokens.
- Maximum output: 128K tokens.
- Supports function calling, structured output, and context cache.
- Structured output should run with enable_thinking=false.
- Does not support web search, batches, prefix continuation, or fine-tuning.
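Because structured output is meant to run with thinking disabled, a request usually pairs enable_thinking=false with a response_format. The sketch below builds such a Chat Completions body and checks the reply. It assumes the OpenAI-style json_schema response format is accepted as-is; the schema, the name city_info, and both helper functions are hypothetical illustrations.

```python
import json

# Hypothetical schema used only for illustration.
CITY_SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "population": {"type": "integer"},
    },
    "required": ["city", "population"],
    "additionalProperties": False,
}


def structured_output_body(prompt: str) -> dict:
    """Build a Chat Completions body that asks for schema-shaped JSON."""
    return {
        "model": "glm-5-1",
        "messages": [{"role": "user", "content": prompt}],
        # Structured output should run without thinking, per the notes.
        "enable_thinking": False,
        # Assumed OpenAI-compatible json_schema response format.
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "city_info", "schema": CITY_SCHEMA, "strict": True},
        },
    }


def parse_structured_reply(response: dict) -> dict:
    """Decode the JSON object in the first choice and check required keys."""
    content = response["choices"][0]["message"]["content"]
    data = json.loads(content)
    missing = [key for key in CITY_SCHEMA["required"] if key not in data]
    if missing:
        raise ValueError(f"reply is missing required keys: {missing}")
    return data
```

Validating required keys on the client side is a cheap safety net even when the server enforces the schema.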

reasoning, function calling, structured output, cache

GLM-5.1 specs

Model ID: glm-5-1
Provider: Z.ai
Category: Text Generation
Context window: 202K tokens
Input: text
Output: text
Region: China
Endpoints:
- POST /v1/chat/completions
- POST /v1/responses
- POST /v1/messages
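Only the Chat Completions endpoint is demonstrated on this page. The standard-library sketch below shows how a request to each listed endpoint might be built. The body shapes for /v1/responses (OpenAI Responses-style "input") and /v1/messages (Anthropic Messages-style, with max_tokens required) are assumptions, as is Bearer authentication on all three; the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.empiriolabs.ai/v1"
MODEL_ID = "glm-5-1"


def build_request(endpoint: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build a POST request for one of the three listed endpoints.

    Assumed body shapes: OpenAI Chat Completions, OpenAI Responses-style
    "input", and Anthropic Messages-style with max_tokens.
    """
    messages = [{"role": "user", "content": prompt}]
    if endpoint == "chat/completions":
        body = {"model": MODEL_ID, "messages": messages}
    elif endpoint == "responses":
        body = {"model": MODEL_ID, "input": prompt}
    elif endpoint == "messages":
        body = {"model": MODEL_ID, "max_tokens": 1024, "messages": messages}
    else:
        raise ValueError(f"unknown endpoint: {endpoint}")
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=json.dumps(body).encode("utf-8"),
        # Bearer auth mirrors the cURL example; assumed to apply to every endpoint.
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request built above and decode the JSON reply (network call)."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, send(build_request("responses", "Write a haiku about the ocean.", api_key)) would post to /v1/responses, with api_key read from an environment variable such as EMPIRIOLABS_API_KEY.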

GLM-5.1 API pricing (save up to 41%)

Live pay-as-you-go rates from the EmpirioLabs catalog. You are billed only for what you use, with no monthly minimum.

Type | Unit | Rate
Input | per 1M prompt tokens | <=32K: $0.825; 32K-200K: $1.10 (list price $1.40)
Output | per 1M generated tokens | <=32K: $3.301; 32K-200K: $3.851 (list price $4.40)
Implicit cache read | per 1M cached input tokens | <=32K: $0.165; 32K-200K: $0.22 (list price $0.26)
Web Search (Linkup) | per call when invoked | $0.013
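To turn the per-million rates into a per-request figure, the small estimator below may help. It is a sketch under stated assumptions: the tier is chosen from the request's prompt size, "32K" and "200K" are read as 32,000 and 200,000 tokens (they could mean 32,768 and 204,800), and cached prompt tokens are billed at the cache-read rate instead of the input rate. The function name is hypothetical.

```python
# Discounted rates in USD per 1M tokens, copied from the pricing table.
RATES = {
    "<=32K": {"input": 0.825, "output": 3.301, "cache_read": 0.165},
    "32K-200K": {"input": 1.10, "output": 3.851, "cache_read": 0.22},
}
# Assumption: tier boundaries are decimal thousands.
TIER_LIMIT = 32_000
MAX_TIERED_PROMPT = 200_000


def estimate_cost(prompt_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of a single request.

    Assumes the tier follows prompt size and that cached prompt tokens are
    billed at the cache-read rate rather than the input rate.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    if prompt_tokens > MAX_TIERED_PROMPT:
        raise ValueError("prompt is larger than the highest listed tier")
    rates = RATES["<=32K"] if prompt_tokens <= TIER_LIMIT else RATES["32K-200K"]
    uncached = prompt_tokens - cached_tokens
    total = (
        uncached * rates["input"]
        + cached_tokens * rates["cache_read"]
        + output_tokens * rates["output"]
    )
    return total / 1_000_000
```

Under these assumptions, estimate_cost(10_000, 1_000, cached_tokens=2_000) comes to roughly $0.0102: 8,000 uncached tokens at $0.825, 2,000 cached tokens at $0.165, and 1,000 output tokens at $3.301, all per million.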

How to call the GLM-5.1 API

GLM-5.1 is available through the OpenAI-compatible Chat Completions API. Point any OpenAI SDK at https://api.empiriolabs.ai/v1, authenticate with your EmpirioLabs API key, and set the model ID to glm-5-1. You can get an API key from the EmpirioLabs dashboard.

cURL
curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5-1",
    "messages": [
      {"role": "user", "content": "Write a haiku about the ocean."}
    ]
  }'
Python (OpenAI SDK)
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.empiriolabs.ai/v1",
    # Read the key from the environment, matching the cURL example.
    api_key=os.environ["EMPIRIOLABS_API_KEY"],
)

response = client.chat.completions.create(
    model="glm-5-1",
    messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
)
print(response.choices[0].message.content)
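Parameters such as enable_thinking, thinking_budget, and top_k are not standard OpenAI arguments, so with the OpenAI Python SDK they can be passed through extra_body, which the SDK merges into the JSON request body. The helper below is a hypothetical sketch; the budget of 4096 and max_tokens of 8192 are illustrative values, not recommendations.

```python
def ask_with_thinking(client, prompt: str, thinking_budget: int = 4096, max_tokens: int = 8192) -> str:
    """Send one prompt to GLM-5.1 with thinking on and a capped reasoning budget.

    `client` is expected to be an openai.OpenAI instance pointed at EmpirioLabs.
    Non-standard fields travel in extra_body so the SDK forwards them unchanged.
    """
    # Range taken from the thinking_budget row of the parameter table.
    if not 1 <= thinking_budget <= 38912:
        raise ValueError("thinking_budget must be between 1 and 38912")
    response = client.chat.completions.create(
        model="glm-5-1",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        extra_body={
            "enable_thinking": True,
            "thinking_budget": thinking_budget,
            # 20 is the listed default; passed here only to show the mechanism.
            "top_k": 20,
        },
    )
    return response.choices[0].message.content
```

With the client from the Python example, ask_with_thinking(client, "Write a haiku about the ocean.") returns the answer text.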

GLM-5.1 API parameters

Request parameters supported by the GLM-5.1 API on EmpirioLabs. Defaults apply when a field is omitted.

Parameter | Type | Default | Range / values | Description
max_tokens | integer | 4096 | 1 to 128000 | Maximum number of output tokens to generate.
temperature | number | 1 | 0 to 2 | Controls randomness. Lower values make responses more deterministic.
top_p | number | 0.95 | 0 to 1 | Nucleus sampling cutoff.
top_k | integer | 20 | 1 to 100 | Limits sampling to the top K tokens.
repetition_penalty | number | 1 | 0.1 to 2 | Penalizes repeated tokens.
reasoning_effort | enum | medium | none, low, medium, high, max | Reasoning effort level. none disables thinking; low, medium, high, and max set bounded thinking budgets sized to the selected model. Sent as an OpenAI-style...
enable_thinking | boolean | true | - | Allow the model to reason before answering. Disable this for strict structured output.
thinking_budget | integer | 32768 | 1 to 38912 | Maximum tokens available for reasoning content when thinking is enabled.
tool_stream | boolean | false | - | Stream function-call arguments incrementally when streaming.
tools | array | [] | - | OpenAI-compatible function calling tool definitions.
tool_choice | object | - | - | OpenAI-compatible tool choice control.
parallel_tool_calls | boolean | true | - | Allow multiple tool calls in a single assistant turn when supported.
response_format | object | - | - | OpenAI-compatible JSON mode or JSON schema response format. Use non-thinking mode for strict schemas.
stop | array | - | - | Optional stop sequences.
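The tools and parallel_tool_calls parameters follow the OpenAI function-calling format, so a round trip can be sketched as: send tool definitions, run each returned tool call locally, and append the results as tool-role messages. The get_weather tool, its stub implementation, and the helper names are hypothetical; the message shapes assume the standard OpenAI layout, where arguments arrive as a JSON string.

```python
import json

# Hypothetical tool definition in OpenAI-compatible format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def get_weather(city: str) -> str:
    # Stub: a real tool would query a weather service.
    return json.dumps({"city": city, "condition": "unknown"})


LOCAL_TOOLS = {"get_weather": get_weather}


def tool_request_body(messages: list) -> dict:
    """Chat Completions body that offers the tool and allows parallel calls."""
    return {"model": "glm-5-1", "messages": messages,
            "tools": [WEATHER_TOOL], "parallel_tool_calls": True}


def run_tool_calls(assistant_message: dict) -> list:
    """Execute every tool call in an assistant message; return tool-role messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        fn = LOCAL_TOOLS[call["function"]["name"]]
        # Arguments are a JSON-encoded string in the OpenAI format.
        args = json.loads(call["function"]["arguments"] or "{}")
        results.append({"role": "tool", "tool_call_id": call["id"], "content": fn(**args)})
    return results
```

The follow-up request then sends the original messages, the assistant message with its tool_calls, and the list returned by run_tool_calls, so the model can write its final answer.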

GLM-5.1 API: common questions

How much does the GLM-5.1 API cost?

On EmpirioLabs, GLM-5.1 is billed pay-as-you-go. Input costs $0.825 per 1M prompt tokens in the <=32K tier and $1.10 in the 32K-200K tier (list price $1.40 for both). Output costs $3.301 per 1M generated tokens in the <=32K tier and $3.851 in the 32K-200K tier (list price $4.40). Implicit cache reads cost $0.165 per 1M cached input tokens in the <=32K tier and $0.22 in the 32K-200K tier (list price $0.26). The live rate card on this page always matches what the API charges.

What is the context window of GLM-5.1?

GLM-5.1 supports a 202K-token context window.

Is the GLM-5.1 API OpenAI-compatible?

Yes. GLM-5.1 is available through the OpenAI-compatible Chat Completions API, so existing OpenAI SDKs work once you point base_url at https://api.empiriolabs.ai/v1 and set the model ID to glm-5-1.

Can I try GLM-5.1 in the browser before integrating?

Yes. The EmpirioLabs playground runs GLM-5.1 in the browser with the same parameters the API exposes, so you can test prompts before writing code.

How do I get a GLM-5.1 API key?

Create an EmpirioLabs account, then generate a key under API Keys in the dashboard. Billing uses pay-as-you-go credits, so you pay only for the requests you make.
