GLM 4.6V Flash API: Pricing, Playground & Docs

About GLM 4.6V Flash

Free multimodal GLM-4.6V model for image, video, file, and text understanding with native function calling.

Base token use is free. Built-in web search is optional through tool_web_search and adds $0.033 per request when enabled. Supports text, image, video, and file input plus function tools, tool_choice auto, JSON response_format, thinking controls, tool_stream, temperature, top_p, do_sample, max_tokens, stop, and streaming.

Also known as Z.ai GLM 4.6V Flash, GLM-4.6V-Flash, glm-4-6v-flash

vision, video understanding, document understanding, function calling, web search, reasoning

GLM 4.6V Flash specs

Model ID: glm-4-6v-flash
Provider: Z.ai
Category: Text Generation
Released: Dec 8, 2025
Context window: 128K tokens
Max output: 32,768 tokens
Input: Text, Image, Video, File
Output: Text
Structured output: JSON Mode
Region: Singapore
Endpoints: POST /v1/chat/completions, POST /v1/responses, POST /v1/messages, POST /v1beta/models/glm-4-6v-flash:generateContent
Alternate model IDs: glm-4.6v-flash, zai/glm-4.6v-flash, zhipu/glm-4.6v-flash

GLM 4.6V Flash API pricing

Live pay-as-you-go rates from the EmpirioLabs catalog. You are billed only for what you use, with no monthly minimum.

Type	Spec	Rate
Input	per 1M prompt tokens	Free
Output	per 1M generated tokens	Free
Implicit cache read	per 1M cached input tokens	Free
Web search	per request when enabled	$0.033
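
Because tokens are free, the only variable cost in the rates listed above is the web search fee. The sketch below estimates spend from request counts. It assumes each request with web search enabled is billed exactly once at $0.033, whatever the number of results retrieved; the function name is illustrative only.

```python
from decimal import Decimal

# Rate taken from the pricing listed above; input, output, and cached tokens are free.
WEB_SEARCH_RATE_USD = Decimal("0.033")


def estimate_cost_usd(total_requests: int, web_search_requests: int) -> Decimal:
    """Estimate spend when only web-search-enabled requests are billed."""
    if web_search_requests < 0 or web_search_requests > total_requests:
        raise ValueError("web_search_requests must be between 0 and total_requests")
    # Assumption: one flat charge per request that enables web search.
    return WEB_SEARCH_RATE_USD * web_search_requests


# 10,000 requests, 1,500 of them with web search enabled.
print(estimate_cost_usd(total_requests=10_000, web_search_requests=1_500))
```

Decimal is used instead of float so that per-request fees add up without rounding drift.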


How to call the GLM 4.6V Flash API

GLM 4.6V Flash serves the OpenAI-compatible Chat Completions API. Point any OpenAI SDK at https://api.empiriolabs.ai/v1 with your EmpirioLabs API key and use the model id glm-4-6v-flash. Get an API key from the EmpirioLabs dashboard.

cURL

curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4-6v-flash",
    "messages": [
      {"role": "user", "content": "Write a haiku about the ocean."}
    ]
  }'

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    base_url="https://api.empiriolabs.ai/v1",
    api_key="YOUR_EMPIRIOLABS_API_KEY",
)

response = client.chat.completions.create(
    model="glm-4-6v-flash",
    messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
)
print(response.choices[0].message.content)
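
The calls above are text-only, while the model also accepts image input. The sketch below shows one way an image could be attached. It assumes the endpoint accepts the OpenAI-style `image_url` content part with a base64 data URL, which this page does not confirm; `image_message` and `describe_image` are hypothetical helper names.

```python
import base64
import os


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build one user message holding a text part and an inline image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # Assumption: OpenAI-style image_url part carrying a base64 data URL.
            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }


def describe_image(path: str, prompt: str = "Describe this image.") -> str:
    """Send a local image to glm-4-6v-flash and return the text reply."""
    from openai import OpenAI  # imported lazily so image_message works without the SDK

    client = OpenAI(
        base_url="https://api.empiriolabs.ai/v1",
        api_key=os.environ["EMPIRIOLABS_API_KEY"],
    )
    with open(path, "rb") as f:
        message = image_message(prompt, f.read())
    response = client.chat.completions.create(model="glm-4-6v-flash", messages=[message])
    return response.choices[0].message.content
```

Calling `describe_image("chart.png")` would send the file inline. For large images or video, a hosted URL may be a better fit if the API accepts one.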


GLM 4.6V Flash API parameters

Request parameters supported by the GLM 4.6V Flash API on EmpirioLabs. Defaults apply when a field is omitted.

Parameter	Type	Default	Range / values	Description
temperature	number	1	0 to 1	Sampling temperature. Lower values are more deterministic. GLM-4.7-Flash and GLM-4.6V-Flash default to 1.0; GLM-4.5-Flash defaults to 0.6.
top_p	number	0.95	0.01 to 1	Nucleus sampling probability mass. Z.AI documents a 0.95 default for the GLM-4.7, GLM-4.6, and GLM-4.5 series.
max_tokens	number	4096	1 to 32768	Maximum number of output tokens to generate. GLM-4.6V-Flash allows up to 32768.
stop	array	-	-	Stop word list. Z.AI currently supports one stop string in array form.
do_sample	boolean	true	-	Enable sampling. When false, temperature and top_p do not affect generation.
enable_thinking	boolean	true	-	Controls Z.AI thinking mode. Enabled is the default; GLM-4.6V-Flash automatically decides whether to think when enabled.
thinking	object	-	-	Advanced thinking object. Use {"type":"enabled"} or {"type":"disabled"}. GLM-4.6V-Flash automatically decides whether to think when enabled.
tools	array	-	-	Function tools and the built-in web_search tool are supported.
tool_choice	enum	auto	auto	Controls whether the model may use tools. Z.AI documents auto tool selection; omit tools to disable tool use.
tool_stream	boolean	false	-	Stream function-call tool output when stream is true. Z.AI documents tool_stream for GLM-4.6 and newer models.
tool_web_search	boolean	false	-	Enable built-in web search. Adds $0.033 per request when enabled.
search_result	boolean	true	-	Return structured web search result metadata when web search is enabled.
search_prompt	string	-	-	Optional instruction for summarizing retrieved web search results.
count	number	10	1 to 50	Number of web search results to retrieve.
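
As an illustration of how the parameters in the table combine, the sketch below builds a request body and rejects values outside the documented ranges before anything is sent. It assumes the provider-specific fields (`thinking`, `tool_web_search`, `count`) sit at the top level of the JSON body; `build_request` is a hypothetical helper, not part of any SDK.

```python
# Ranges copied from the parameter table above.
RANGES = {
    "temperature": (0, 1),
    "top_p": (0.01, 1),
    "max_tokens": (1, 32768),
    "count": (1, 50),
}


def build_request(prompt: str, **params) -> dict:
    """Return a chat-completions payload, rejecting out-of-range values."""
    for name, value in params.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low} to {high}")
    # The table states that only one stop string is supported, in array form.
    stop = params.get("stop")
    if stop is not None and (not isinstance(stop, list) or len(stop) != 1):
        raise ValueError("stop must be a list containing exactly one string")
    return {
        "model": "glm-4-6v-flash",
        "messages": [{"role": "user", "content": prompt}],
        **params,
    }


payload = build_request(
    "Summarize today's top AI news.",
    temperature=0.3,
    max_tokens=1024,
    thinking={"type": "disabled"},
    tool_web_search=True,  # adds $0.033 to this request
    count=5,
)
```

The payload can be posted as the JSON body of the cURL call shown earlier. With the OpenAI Python SDK, fields the SDK does not know, such as `tool_web_search` and `count`, would typically be passed through the `extra_body` argument of `client.chat.completions.create`.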

Three more parameters are listed in the full API documentation.


GLM 4.6V Flash API: common questions

How much does the GLM 4.6V Flash API cost?

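On EmpirioLabs, GLM 4.6V Flash is billed pay as you go with no monthly minimum. Input, output, and cached input tokens are free; the only charge is $0.033 per request when built-in web search is enabled. The rate card on this page reflects the live rates the API charges.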

What is the context window of GLM 4.6V Flash?

GLM 4.6V Flash supports a 128K-token context window with up to 32,768 output tokens per response.
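
To make the relationship between the two limits concrete, the sketch below computes how many tokens remain for the prompt once output space is reserved. It rests on two assumptions not stated on this page: that "128K" means 128,000 tokens, and that prompt and output tokens share the same context window.

```python
CONTEXT_WINDOW = 128_000  # assumption: "128K" read as 128,000; the exact figure may differ
MAX_OUTPUT = 32_768       # maximum output tokens from the specs above


def prompt_budget(max_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for the prompt after reserving max_tokens for the reply."""
    if not 1 <= max_tokens <= MAX_OUTPUT:
        raise ValueError(f"max_tokens must be between 1 and {MAX_OUTPUT}")
    # Assumption: input and output tokens count against one shared window.
    return context_window - max_tokens


print(prompt_budget(4096))        # budget with the default max_tokens
print(prompt_budget(MAX_OUTPUT))  # budget when reserving the full output limit
```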

Is the GLM 4.6V Flash API OpenAI-compatible?

Yes. GLM 4.6V Flash serves the OpenAI-compatible Chat Completions API, so existing OpenAI SDKs work by pointing base_url at https://api.empiriolabs.ai/v1 and setting the model id to glm-4-6v-flash.
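
Since the model supports function tools with `tool_choice` set to auto, OpenAI-style tool calling should carry over as well. The sketch below shows a single tool round trip under that assumption. The `get_weather` tool, its schema, and the helper names are hypothetical and exist only for illustration; `client` is an OpenAI SDK client configured as described above.

```python
import json

# Hypothetical tool definition in the OpenAI function-tool schema.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def get_weather(city: str) -> dict:
    """Stand-in implementation; a real tool would query a weather service."""
    return {"city": city, "forecast": "unknown"}


HANDLERS = {"get_weather": get_weather}


def run_tool_call(name: str, arguments: str) -> str:
    """Execute one tool call and return a JSON string for the tool message."""
    return json.dumps(HANDLERS[name](**json.loads(arguments)))


def ask_with_tools(client, prompt: str) -> str:
    """Ask the model, run any requested tools locally, then ask again."""
    messages = [{"role": "user", "content": prompt}]
    first = client.chat.completions.create(
        model="glm-4-6v-flash", messages=messages, tools=TOOLS, tool_choice="auto"
    )
    reply = first.choices[0].message
    if not reply.tool_calls:
        return reply.content
    messages.append(reply)  # keep the assistant turn that requested the tools
    for call in reply.tool_calls:
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": run_tool_call(call.function.name, call.function.arguments),
        })
    final = client.chat.completions.create(
        model="glm-4-6v-flash", messages=messages, tools=TOOLS
    )
    return final.choices[0].message.content
```

Keeping tool execution in `run_tool_call` separates the local dispatch logic from the network calls, so it can be tested without an API key.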

Can I try GLM 4.6V Flash in the browser before integrating?

Yes. The EmpirioLabs playground runs GLM 4.6V Flash in the browser with the same parameters the API exposes, so you can test prompts before writing code.

How do I get a GLM 4.6V Flash API key?

Create an EmpirioLabs account, then generate a key under API Keys in the dashboard. Billing is pay-as-you-go credits, so you only pay for the requests you make.

