
Free multimodal GLM-4.6V model for image, video, file, and text understanding with native function calling.
Base token use is free. Built-in web search is optional through tool_web_search and adds $0.033 per request when enabled. Supports text, image, video, and file input plus function tools, tool_choice auto, JSON response_format, thinking controls, tool_stream, temperature, top_p, do_sample, max_tokens, stop, and streaming.
Model ID: glm-4-6v-flash
Endpoints: POST /v1/chat/completions, POST /v1/responses, POST /v1/messages

Live pay-as-you-go rates from the EmpirioLabs catalog. You are billed only for what you use, with no monthly minimum.
GLM 4.6V Flash serves the OpenAI-compatible Chat Completions API. Point any OpenAI SDK at https://api.empiriolabs.ai/v1 with your EmpirioLabs API key and use the model id glm-4-6v-flash. Get an API key from the EmpirioLabs dashboard.
curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4-6v-flash",
    "messages": [
      {"role": "user", "content": "Write a haiku about the ocean."}
    ]
  }'

from openai import OpenAI

client = OpenAI(
    base_url="https://api.empiriolabs.ai/v1",
    api_key="YOUR_EMPIRIOLABS_API_KEY",
)
response = client.chat.completions.create(
    model="glm-4-6v-flash",
    messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
)
print(response.choices[0].message.content)

Request parameters supported by the GLM 4.6V Flash API on EmpirioLabs. Defaults apply when a field is omitted.
| Parameter | Type | Default | Range / values | Description |
|---|---|---|---|---|
| temperature | number | 1 | 0 to 1 | Sampling temperature. Lower values are more deterministic. GLM-4.7-Flash and GLM-4.6V-Flash default to 1.0; GLM-4.5-Flash defaults to 0.6. |
| top_p | number | 0.95 | 0.01 to 1 | Nucleus sampling probability mass. Z.AI documents a 0.95 default for the GLM-4.7, GLM-4.6, and GLM-4.5 series. |
| max_tokens | number | 4096 | 1 to 32768 | Maximum number of output tokens to generate. GLM-4.6V-Flash supports up to 32768. |
| stop | array | - | - | Stop word list. Z.AI currently supports one stop string in array form. |
| do_sample | boolean | true | - | Enable sampling. When false, temperature and top_p do not affect generation. |
| enable_thinking | boolean | true | - | Controls Z.AI thinking mode. Enabled is the default; GLM-4.6V-Flash automatically decides whether to think when enabled. |
| thinking | object | - | - | Advanced thinking object. Use {"type":"enabled"} or {"type":"disabled"}. GLM-4.6V-Flash automatically decides whether to think when enabled. |
| response_format | object | - | - | Set {"type":"json_object"} for JSON mode or {"type":"text"} for plain text. |
| tools | array | - | - | Function tools and the built-in web_search tool are supported. |
| tool_choice | enum | auto | auto | Controls whether the model may use tools. Z.AI documents auto tool selection; omit tools to disable tool use. |
| tool_stream | boolean | false | - | Stream function-call tool output when stream is true. Z.AI documents tool_stream for GLM-4.6 and newer models. |
| tool_web_search | boolean | false | - | Enable built-in web search. Adds $0.033 per request when enabled. |
| search_result | boolean | true | - | Return structured web search result metadata when web search is enabled. |
| search_prompt | string | - | - | Optional instruction for summarizing retrieved web search results. |
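
As a minimal sketch of how the parameters in the table fit together, the helper below assembles a Chat Completions request body and checks the documented ranges before anything is sent. The function name `build_payload` and its client-side validation are illustrative assumptions, not part of the EmpirioLabs API. Only the field names, defaults, and ranges come from the table, and the service itself may validate differently.

```python
import json

MODEL_ID = "glm-4-6v-flash"


def build_payload(prompt, temperature=1.0, top_p=0.95, max_tokens=4096,
                  stop=None, json_mode=False, thinking=True):
    """Build a request body using the ranges listed in the parameter table.

    Hypothetical helper for illustration only.
    """
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    if not 0.01 <= top_p <= 1:
        raise ValueError("top_p must be between 0.01 and 1")
    if not 1 <= max_tokens <= 32768:
        raise ValueError("max_tokens must be between 1 and 32768")
    if stop is not None and len(stop) != 1:
        # The table notes that only one stop string is currently supported.
        raise ValueError("stop must be a list with exactly one string")

    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled" if thinking else "disabled"},
    }
    if stop is not None:
        payload["stop"] = stop
    if json_mode:
        payload["response_format"] = {"type": "json_object"}
    return payload


if __name__ == "__main__":
    body = build_payload("List three ocean facts as JSON.",
                         temperature=0.2, json_mode=True, thinking=False)
    print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized as the `-d` body of the curl request. With the OpenAI Python SDK, standard fields are passed as keyword arguments, while non-standard fields such as `thinking` would typically go through the SDK's `extra_body` argument.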
On EmpirioLabs, GLM 4.6V Flash is billed pay-as-you-go. Input (per 1M prompt tokens), output (per 1M generated tokens), and implicit cache reads (per 1M cached input tokens) are all free. The live rate card on this page always matches what the API charges.
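Because tokens are free, the only variable cost on this page is the $0.033 fee on requests that enable tool_web_search. The sketch below estimates spend from request counts. The function name `estimate_cost` is hypothetical, and the code assumes the fee applies once to every request with web search enabled, which is how the pricing note reads.

```python
from decimal import Decimal

# USD per request when tool_web_search is enabled (from the pricing note).
WEB_SEARCH_FEE = Decimal("0.033")


def estimate_cost(total_requests, web_search_requests=0):
    """Estimate spend in USD: tokens are free, web search is a flat per-request fee."""
    if not 0 <= web_search_requests <= total_requests:
        raise ValueError("web_search_requests must be between 0 and total_requests")
    return WEB_SEARCH_FEE * web_search_requests


if __name__ == "__main__":
    # 10,000 requests, 1,000 of them with web search enabled.
    print(estimate_cost(10_000, web_search_requests=1_000))  # 33.000
```

Using `Decimal` rather than floats keeps the per-request fee exact when it is multiplied across large request counts.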
GLM 4.6V Flash supports a 128K-token context window with up to 32,768 output tokens per response.
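To illustrate how these two limits interact, the sketch below computes how many prompt tokens remain once max_tokens is reserved for the response. It rests on two assumptions that this page does not confirm: that "128K" means 128,000 tokens (it could also mean 131,072), and that output tokens count against the same context window. The name `prompt_budget` is hypothetical.

```python
MAX_OUTPUT_TOKENS = 32_768
# Assumption: "128K" is read as 128,000 tokens; adjust if the real limit is 131,072.
CONTEXT_WINDOW = 128_000


def prompt_budget(max_tokens, context_window=CONTEXT_WINDOW):
    """Return the prompt tokens left after reserving max_tokens for output.

    Assumes prompt and output share one context window.
    """
    if not 1 <= max_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(f"max_tokens must be between 1 and {MAX_OUTPUT_TOKENS}")
    return context_window - max_tokens


if __name__ == "__main__":
    print(prompt_budget(4096))    # 123904
    print(prompt_budget(32_768))  # 95232
```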
GLM 4.6V Flash is compatible with OpenAI SDKs. It serves the OpenAI-compatible Chat Completions API, so existing OpenAI clients work once base_url points at https://api.empiriolabs.ai/v1 and the model id is set to glm-4-6v-flash.
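Since the model accepts image input through an OpenAI-compatible API, an image request might be built as sketched below. This assumes the endpoint accepts OpenAI-style `image_url` content parts and base64 data URLs, which this page does not state explicitly. The helper name `image_message` is hypothetical.

```python
import base64


def image_message(prompt, image_bytes, mime_type="image/png"):
    """Build a user message with a text part and an inline base64 image part.

    Assumes OpenAI-style content parts; check the API reference before relying on it.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file read with open(path, "rb").
    message = image_message("Describe this image.", b"abc")
    print(message["content"][1]["image_url"]["url"])  # data:image/png;base64,YWJj
```

The returned dictionary would go in the `messages` list of a `client.chat.completions.create` call with the model id glm-4-6v-flash.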
You can try GLM 4.6V Flash without writing code. The EmpirioLabs playground runs the model in the browser with the same parameters the API exposes, so you can test prompts first.
To get an API key, create an EmpirioLabs account, then generate a key under API Keys in the dashboard. Billing uses pay-as-you-go credits, so you only pay for the requests you make.