GLM 5.1 API: Pricing, Playground & Docs

Q: How much does the GLM 5.1 API cost?

On EmpirioLabs, GLM 5.1 is billed pay as you go, per 1M tokens for input, output, and cached input, with no monthly minimum. The live rate card on this page always matches what the API charges.

Q: What is the context window of GLM 5.1?

GLM 5.1 supports a 202K-token context window.

Q: Is the GLM 5.1 API OpenAI-compatible?

Yes. GLM 5.1 is served through an OpenAI-compatible Chat Completions API, so existing OpenAI SDKs work once base_url points at https://api.empiriolabs.ai/v1 and the model id is set to glm-5-1.

Q: Can I try GLM 5.1 in the browser before integrating?

Yes. The EmpirioLabs playground runs GLM 5.1 in the browser with the same parameters the API exposes, so you can test prompts before writing code.

Q: How do I get a GLM 5.1 API key?

Create an EmpirioLabs account, then generate a key under API Keys in the dashboard. Billing uses pay-as-you-go credits, so you pay only for the requests you make.

About GLM 5.1

Long-context Zhipu AI reasoning model with 202K context, 128K output, tool calling, structured output, and cache support.

Notes:
- Served by Alibaba Cloud Model Studio in China deployment mode
- Context window: 202K tokens
- Maximum output: 128K tokens
- Supports function calling, structured output, and context cache
- Structured output should run with enable_thinking=false
- Does not support web search, batches, prefix continuation, or fine-tuning
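The structured-output note above can be sketched as a request body. This is a minimal illustration, not the documented request format: it assumes the endpoint accepts the OpenAI-style JSON Schema shape for response_format and reads enable_thinking as a top-level body field. The schema and the helper name build_structured_request are hypothetical.

```python
import json


def build_structured_request(prompt: str) -> dict:
    """Build a Chat Completions body that asks for schema-constrained JSON."""
    # Hypothetical schema, for illustration only.
    haiku_schema = {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "lines": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["title", "lines"],
        "additionalProperties": False,
    }
    return {
        "model": "glm-5-1",
        "messages": [{"role": "user", "content": prompt}],
        # Per the notes above, structured output should run with thinking off.
        "enable_thinking": False,
        # Assumption: the OpenAI-style json_schema wrapper is accepted as-is.
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "haiku", "schema": haiku_schema, "strict": True},
        },
    }


body = build_structured_request("Write a haiku about the ocean.")
print(json.dumps(body, indent=2))
```

With the OpenAI Python SDK, a non-standard field such as enable_thinking would typically be passed through the extra_body argument rather than as a keyword argument.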

Also known as Z.ai GLM 5.1, GLM-5.1, glm-5-1

reasoning, function calling, cache

GLM 5.1 specs

Model ID: glm-5-1
Provider: Z.ai
Category: Text Generation
Released: Apr 7, 2026
Context window: 202K tokens
Input: Text
Output: Text
Structured output: JSON Schema
Region: China
Endpoints: POST /v1/chat/completions, POST /v1/responses, POST /v1/messages, POST /v1beta/models/glm-5-1:generateContent
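The context and output limits in the specs can be turned into a simple pre-flight check. The sketch below is illustrative only: it assumes "202K" means 202,000 tokens and that prompt and output tokens share the context window, neither of which this page states. The helper name clamp_max_tokens is hypothetical.

```python
CONTEXT_WINDOW = 202_000  # assumption: "202K" read as 202,000 tokens
MAX_OUTPUT = 128_000      # upper bound of max_tokens on this page


def clamp_max_tokens(prompt_tokens: int, requested: int = 4096) -> int:
    """Return a max_tokens value that fits the remaining window.

    Assumes prompt and output tokens are counted against the same window.
    """
    if prompt_tokens >= CONTEXT_WINDOW:
        raise ValueError("prompt does not fit in the context window")
    remaining = CONTEXT_WINDOW - prompt_tokens
    return max(1, min(requested, MAX_OUTPUT, remaining))


print(clamp_max_tokens(1_000))             # default request fits unchanged
print(clamp_max_tokens(200_000, 128_000))  # clamped to the remaining window
```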

GLM 5.1 API pricing

Save up to 41%

Live pay-as-you-go rates from the EmpirioLabs catalog. You are billed only for what you use, with no monthly minimum.

Type	Spec	Tier	List rate	Rate
Input	per 1M prompt tokens	<=32K	$1.40	$0.825
Input	per 1M prompt tokens	32K-200K	$1.40	$1.10
Output	per 1M generated tokens	<=32K	$4.40	$3.301
Output	per 1M generated tokens	32K-200K	$4.40	$3.851
Implicit cache read	per 1M cached input tokens	<=32K	$0.26	$0.165
Implicit cache read	per 1M cached input tokens	32K-200K	$0.26	$0.22
Web Search (Linkup)	per call when invoked	-	-	$0.013
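To make the per-token rates concrete, here is a rough cost estimator. It is a sketch under several assumptions that this page does not confirm: that each tier shows a list price followed by the price actually billed, that the tier is chosen by prompt size with "32K" meaning 32,000 tokens, and that cached tokens are a subset of the prompt billed at the cache-read rate instead of the input rate. Web search calls are left out. The names RATES and estimate_cost are hypothetical.

```python
# USD per 1M tokens, read from the rate card above.
# Assumption: the lower figure in each tier is the rate actually billed.
RATES = {
    "<=32K": {"input": 0.825, "output": 3.301, "cache_read": 0.165},
    "32K-200K": {"input": 1.10, "output": 3.851, "cache_read": 0.22},
}
TIER_BOUNDARY = 32_000  # assumption: "32K" means 32,000 prompt tokens


def estimate_cost(prompt_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request under the assumptions above."""
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    tier = "<=32K" if prompt_tokens <= TIER_BOUNDARY else "32K-200K"
    rates = RATES[tier]
    uncached = prompt_tokens - cached_tokens
    total = (
        uncached * rates["input"]
        + cached_tokens * rates["cache_read"]
        + output_tokens * rates["output"]
    )
    return total / 1_000_000


small = estimate_cost(prompt_tokens=10_000, output_tokens=2_000)
large = estimate_cost(prompt_tokens=100_000, output_tokens=2_000, cached_tokens=50_000)
print(f"small request: ${small:.6f}")
print(f"large request: ${large:.6f}")
```

Treat the result as an estimate only; the usage figures returned by the API and the live rate card remain the source of truth for billing.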

Compare on the full pricing page

How to call the GLM 5.1 API

GLM 5.1 is served through an OpenAI-compatible Chat Completions API. Point any OpenAI SDK at https://api.empiriolabs.ai/v1, authenticate with your EmpirioLabs API key, and set the model id to glm-5-1. API keys are generated under API Keys in the EmpirioLabs dashboard.

cURL

curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5-1",
    "messages": [
      {"role": "user", "content": "Write a haiku about the ocean."}
    ]
  }'

Python (OpenAI SDK)

import os

from openai import OpenAI

# Read the key from the environment, as the cURL example does.
client = OpenAI(
    base_url="https://api.empiriolabs.ai/v1",
    api_key=os.environ["EMPIRIOLABS_API_KEY"],
)

# Send a single-turn chat request to GLM 5.1.
response = client.chat.completions.create(
    model="glm-5-1",
    messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
)
print(response.choices[0].message.content)
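Several parameters listed on this page (top_k, repetition_penalty, enable_thinking, thinking_budget) are not keyword arguments of the OpenAI Python SDK's create() method. One way to send them is the SDK's extra_body argument, which merges extra fields into the JSON request body. The sketch below assumes the API reads these as top-level body fields; the helper build_create_kwargs is hypothetical, and the ranges are copied from the parameter table on this page.

```python
# Ranges copied from the parameter table on this page.
EXTRA_PARAM_RANGES = {
    "top_k": (1, 100),
    "repetition_penalty": (0.1, 2),
    "thinking_budget": (1, 38912),
}


def build_create_kwargs(prompt: str, **extra) -> dict:
    """Return keyword arguments for client.chat.completions.create()."""
    for name, value in extra.items():
        if name == "enable_thinking":
            if not isinstance(value, bool):
                raise TypeError("enable_thinking must be a boolean")
        elif name in EXTRA_PARAM_RANGES:
            low, high = EXTRA_PARAM_RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}")
        else:
            raise KeyError(f"unexpected parameter: {name}")
    return {
        "model": "glm-5-1",
        "messages": [{"role": "user", "content": prompt}],
        # Fields the SDK does not know about are merged into the JSON body.
        "extra_body": dict(extra),
    }


kwargs = build_create_kwargs(
    "Write a haiku about the ocean.", top_k=40, thinking_budget=8192
)
print(kwargs["extra_body"])
# With a configured client: response = client.chat.completions.create(**kwargs)
```

Checking ranges locally is optional; it only surfaces out-of-range values before a request is sent.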

Full GLM 5.1 API reference

GLM 5.1 API parameters

Request parameters supported by the GLM 5.1 API on EmpirioLabs. Defaults apply when a field is omitted.

Parameter	Type	Default	Range / values	Description
max_tokens	integer	4096	1 to 128000	Maximum number of output tokens to generate.
temperature	number	1	0 to 2	Controls randomness. Lower values make responses more deterministic.
top_p	number	0.95	0 to 1	Nucleus sampling cutoff.
top_k	integer	20	1 to 100	Limits sampling to the top K tokens.
repetition_penalty	number	1	0.1 to 2	Penalizes repeated tokens.
reasoning_effort	enum	medium	none, low, medium, high, max	Reasoning effort level. none disables thinking. low, medium, high, and max set bounded thinking budgets sized to the selected model. Sent as an OpenAI-style...
enable_thinking	boolean	true	-	Allow the model to reason before answering. Disable this for strict structured output.
thinking_budget	integer	32768	1 to 38912	Maximum tokens available for reasoning content when thinking is enabled.
tool_stream	boolean	false	-	Stream function-call arguments incrementally when streaming.
tools	array	[]	-	OpenAI-compatible function calling tool definitions.
tool_choice	object	-	-	OpenAI-compatible tool choice control.
parallel_tool_calls	boolean	true	-	Allow multiple tool calls in a single assistant turn when supported.
stop	array	-	-	Optional stop sequences.
response_format	enum	-	-	Constrain the output to JSON. Use JSON mode for any valid JSON object, or JSON schema to force output that matches a schema you provide.

2 more parameters in the docs
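The tools and parallel_tool_calls parameters are described as OpenAI-compatible, so a function-calling round trip might look like the sketch below. It assumes the request and response follow the OpenAI tool-call shape; get_weather, REGISTRY, and the sample assistant message are hypothetical and run offline, without calling the API.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical local tool; a real one would call a weather service."""
    return f"Sunny in {city}"


# OpenAI-style function tool definition for the hypothetical tool above.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]
REGISTRY = {"get_weather": get_weather}


def build_tool_request(prompt: str) -> dict:
    """Build a request body that offers the tools to the model."""
    return {
        "model": "glm-5-1",
        "messages": [{"role": "user", "content": prompt}],
        "tools": TOOLS,
        "parallel_tool_calls": True,
    }


def run_tool_calls(message: dict) -> list[dict]:
    """Execute each tool call in an assistant message and build tool replies."""
    replies = []
    for call in message.get("tool_calls") or []:
        func = REGISTRY[call["function"]["name"]]
        # Arguments arrive as a JSON-encoded string in the OpenAI format.
        args = json.loads(call["function"]["arguments"])
        replies.append(
            {"role": "tool", "tool_call_id": call["id"], "content": func(**args)}
        )
    return replies


# Offline demonstration with a hand-written assistant message.
sample_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Lisbon"}'},
        }
    ],
}
print(run_tool_calls(sample_message))
```

In a real exchange, the tool replies would be appended to the messages list after the assistant message and sent back in a follow-up request so the model can produce its final answer.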

More Text Generation model APIs

GLM 5.2

Kimi K3

Kimi K2.7 Code

Muse Spark 1.1

Fugu Ultra v1.1

Qwen3.7 Plus
