GLM 4.7 Flash API

Free lightweight GLM-4.7 text model for coding, reasoning, long-context writing, and general chat.

Z.ai | Text Generation | 200K context | Singapore | Proprietary Endpoint | New

About GLM 4.7 Flash

Base token use is free. Built-in web search is optional through tool_web_search and adds $0.033 per request when enabled. Supports function tools, tool_choice auto, JSON response_format, thinking controls, tool_stream, temperature, top_p, do_sample, max_tokens, stop, and streaming.

Also known as GLM Flash, GLM-4.7-Flash

reasoning, function calling, structured output, web search

GLM 4.7 Flash specs

Model ID: glm-4-7-flash
Provider: Z.ai
Category: Text Generation
Context window: 200K tokens
Max output: 131,072 tokens
Input: text
Output: text
Region: Singapore
Endpoints: POST /v1/chat/completions, POST /v1/responses, POST /v1/messages

GLM 4.7 Flash API pricing

Live pay-as-you-go rates from the EmpirioLabs catalog. You are billed only for what you use, with no monthly minimum.

Type | Spec | Rate
Input | per 1M prompt tokens | Free
Output | per 1M generated tokens | Free
Implicit cache read | per 1M cached input tokens | Free
Web Search | per request when enabled | $0.033

How to call the GLM 4.7 Flash API

GLM 4.7 Flash serves the OpenAI-compatible Chat Completions API. Point any OpenAI SDK at https://api.empiriolabs.ai/v1 with your EmpirioLabs API key and use the model id glm-4-7-flash. Get an API key from the EmpirioLabs dashboard.

cURL
curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4-7-flash",
    "messages": [
      {"role": "user", "content": "Write a haiku about the ocean."}
    ]
  }'
Python (OpenAI SDK)
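import os

from openai import OpenAI

# Point the OpenAI SDK at the EmpirioLabs endpoint.
client = OpenAI(
    base_url="https://api.empiriolabs.ai/v1",
    # Read the key from the same variable the cURL example uses.
    api_key=os.environ["EMPIRIOLABS_API_KEY"],
)

response = client.chat.completions.create(
    model="glm-4-7-flash",
    messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
)
# The assistant's reply is on the first choice.
print(response.choices[0].message.content)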
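
Streaming is listed among the supported features. The sketch below shows one way to consume a streamed reply, assuming the endpoint emits chunks in the standard OpenAI Chat Completions streaming shape. The helper name stream_text is hypothetical, and client is expected to be an OpenAI SDK client configured as in the example above.

```python
from typing import Iterator

MODEL_ID = "glm-4-7-flash"


def stream_text(client, prompt: str) -> Iterator[str]:
    """Yield text fragments from a streamed chat completion.

    `client` is assumed to be an OpenAI SDK client whose base_url points
    at https://api.empiriolabs.ai/v1.
    """
    stream = client.chat.completions.create(
        model=MODEL_ID,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Some chunks may carry no choices or no text (for example a
        # role-only first chunk or a final usage chunk), so skip those.
        if not chunk.choices:
            continue
        text = getattr(chunk.choices[0].delta, "content", None)
        if text:
            yield text
```

A caller would typically loop over stream_text(client, "...") and print each fragment with end="" and flush=True so the reply appears as it is generated.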

GLM 4.7 Flash API parameters

Request parameters supported by the GLM 4.7 Flash API on EmpirioLabs. Defaults apply when a field is omitted.

Parameter | Type | Default | Range / values | Description
temperature | number | 1 | 0 to 1 | Sampling temperature. Lower values are more deterministic. GLM-4.7-Flash and GLM-4.6V-Flash default to 1.0; GLM-4.5-Flash defaults to 0.6.
top_p | number | 0.95 | 0.01 to 1 | Nucleus sampling probability mass. Z.AI documents a 0.95 default for the GLM-4.7, GLM-4.6, and GLM-4.5 series.
max_tokens | number | 4096 | 1 to 131072 | Maximum output tokens for GLM-4.7-Flash: 131072.
stop | array | - | - | Stop word list. Z.AI currently supports one stop string in array form.
do_sample | boolean | true | - | Enable sampling. When false, temperature and top_p do not affect generation.
enable_thinking | boolean | true | - | Controls Z.AI thinking mode. Enabled is the default and makes GLM-4.7-Flash think; disable it for simple low-latency turns.
thinking | object | - | - | Advanced thinking object. Use {"type":"enabled"} or {"type":"disabled"}. GLM-4.7-Flash thinks when enabled.
response_format | object | - | - | Set {"type":"json_object"} for JSON mode or {"type":"text"} for plain text.
tools | array | - | - | Function tools and the built-in web_search tool are supported.
tool_choice | enum | auto | auto | Controls whether the model may use tools. Z.AI documents auto tool selection; omit tools to disable tool use.
tool_stream | boolean | false | - | Stream function-call tool output when stream is true. Z.AI documents tool_stream for GLM-4.6 and newer models.
tool_web_search | boolean | false | - | Enable built-in web search. Adds $0.033 per request when enabled.
search_result | boolean | true | - | Return structured web search result metadata when web search is enabled.
search_prompt | string | - | - | Optional instruction for summarizing retrieved web search results.
3 more parameters in the docs
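
To make the ranges in the table concrete, here is a minimal sketch of a request-body builder that checks values before sending. The function build_payload and its keyword names are hypothetical. The field names, ranges, and object shapes come from the table above, and the sketch assumes each parameter is sent as a top-level field of the JSON body.

```python
from typing import Any, Dict, List, Optional

MODEL_ID = "glm-4-7-flash"
MAX_OUTPUT_TOKENS = 131072  # documented upper bound for max_tokens


def _check_range(name: str, value: float, low: float, high: float) -> None:
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


def build_payload(
    messages: List[Dict[str, Any]],
    *,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    max_tokens: Optional[int] = None,
    stop: Optional[str] = None,
    thinking: Optional[bool] = None,
    json_mode: bool = False,
    web_search: bool = False,
) -> Dict[str, Any]:
    """Build a chat completions body; omitted fields fall back to API defaults."""
    payload: Dict[str, Any] = {"model": MODEL_ID, "messages": messages}
    if temperature is not None:
        _check_range("temperature", temperature, 0, 1)
        payload["temperature"] = temperature
    if top_p is not None:
        _check_range("top_p", top_p, 0.01, 1)
        payload["top_p"] = top_p
    if max_tokens is not None:
        _check_range("max_tokens", max_tokens, 1, MAX_OUTPUT_TOKENS)
        payload["max_tokens"] = max_tokens
    if stop is not None:
        payload["stop"] = [stop]  # one stop string, in array form
    if thinking is not None:
        payload["thinking"] = {"type": "enabled" if thinking else "disabled"}
    if json_mode:
        payload["response_format"] = {"type": "json_object"}
    if web_search:
        payload["tool_web_search"] = True  # adds $0.033 per request
    return payload
```

With the OpenAI Python SDK, fields that are not part of the standard signature (for example thinking or tool_web_search) can usually be forwarded through the extra_body argument of chat.completions.create.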

Good to know

Base token use is free. Built-in web search is optional through tool_web_search and adds $0.033 per request when enabled.

GLM 4.7 Flash API: common questions

How much does the GLM 4.7 Flash API cost?

On EmpirioLabs, GLM 4.7 Flash is billed pay as you go. Input tokens, output tokens, and implicit cache reads are all free; built-in web search costs $0.033 per request when enabled. The live rate card on this page always matches what the API charges.
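
As a rough worked example of the rate card, the sketch below estimates spend from request counts. The function name estimate_cost_usd is hypothetical, and the sketch assumes web search is the only billable item, as the rates on this page indicate.

```python
from decimal import Decimal

# Per-request charge when tool_web_search is enabled (from the rate card).
WEB_SEARCH_RATE_USD = Decimal("0.033")


def estimate_cost_usd(total_requests: int, web_search_requests: int) -> Decimal:
    """Estimate spend in USD for a batch of requests."""
    if not 0 <= web_search_requests <= total_requests:
        raise ValueError("web_search_requests must be between 0 and total_requests")
    # Input, output, and cached tokens are listed as free, so only the
    # requests that enable web search contribute to the total.
    return WEB_SEARCH_RATE_USD * web_search_requests
```

For instance, 10,000 requests of which 1,500 enable web search come to 1,500 x $0.033 = $49.50 under these rates.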

What is the context window of GLM 4.7 Flash?

GLM 4.7 Flash supports a 200K-token context window with up to 131,072 output tokens per response.
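
The two limits interact when prompts get long. The sketch below picks a safe max_tokens value under two assumptions that this page does not state: that "200K" means 200,000 tokens, and that prompt and output tokens share the same context window. The function name output_budget is hypothetical.

```python
CONTEXT_WINDOW = 200_000  # assumption: "200K" read as 200,000 tokens
MAX_OUTPUT = 131_072      # documented per-response output cap


def output_budget(prompt_tokens: int, requested: int = MAX_OUTPUT) -> int:
    """Return a max_tokens value that fits both limits for a given prompt size."""
    if prompt_tokens < 0 or requested < 1:
        raise ValueError("prompt_tokens must be >= 0 and requested must be >= 1")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining < 1:
        raise ValueError("prompt leaves no room for output in the context window")
    # The tightest of: what the caller asked for, the per-response cap,
    # and whatever space the prompt leaves in the window.
    return min(requested, MAX_OUTPUT, remaining)
```

Under these assumptions a 10,000-token prompt can still request the full 131,072 output tokens, while a 150,000-token prompt leaves room for at most 50,000.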

Is the GLM 4.7 Flash API OpenAI-compatible?

Yes. GLM 4.7 Flash serves the OpenAI-compatible Chat Completions API, so existing OpenAI SDKs work by pointing base_url at https://api.empiriolabs.ai/v1 and setting the model id to glm-4-7-flash.
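
Because the endpoint is a plain HTTPS POST, an SDK is optional. Here is a minimal standard-library sketch that builds the same request as the cURL example. The helper names build_chat_request and send are hypothetical, and the response parsing assumes the standard Chat Completions response shape.

```python
import json
import os
import urllib.request
from typing import Optional

BASE_URL = "https://api.empiriolabs.ai/v1"
MODEL_ID = "glm-4-7-flash"


def build_chat_request(prompt: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST to /chat/completions."""
    key = api_key or os.environ.get("EMPIRIOLABS_API_KEY")
    if not key:
        raise RuntimeError("Set EMPIRIOLABS_API_KEY or pass api_key explicitly")
    body = {"model": MODEL_ID, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        reply = json.load(resp)
    # Assumes the OpenAI-style shape: choices[0].message.content
    return reply["choices"][0]["message"]["content"]
```

Calling send(build_chat_request("Write a haiku about the ocean.")) would perform the same request as the cURL example above.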

Can I try GLM 4.7 Flash in the browser before integrating?

Yes. The EmpirioLabs playground runs GLM 4.7 Flash in the browser with the same parameters the API exposes, so you can test prompts before writing code.

How do I get a GLM 4.7 Flash API key?

Create an EmpirioLabs account, then generate a key under API Keys in the dashboard. Billing is pay-as-you-go credits, so you only pay for the requests you make.
