Qwen3.5 Flash API

Vision-language model with hybrid linear-attention plus sparse MoE, 1M context, and fast multimodal text/image/video inference.

Alibaba Cloud · Text Generation · 1M context · Singapore · Proprietary Endpoint

About Qwen3.5 Flash


Capabilities: vision, web search, code interpreter, function calling

Qwen3.5 Flash specs

Model ID: qwen3-5-flash
Provider: Alibaba Cloud
Category: Text Generation
Context window: 1M tokens
Max output: 32,768 tokens
Input: text, image, video
Output: text
Region: Singapore
Endpoints:
  • POST /v1/chat/completions
  • POST /v1/responses
  • POST /v1/messages
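The two limits above interact when you choose max_tokens for a long prompt. The following is a minimal sketch that clamps a requested output budget to both limits. It assumes that "1M tokens" means 1,000,000 and that the context window covers prompt and generated tokens together. The spec sheet does not state either point. The helper name safe_max_tokens is hypothetical.

```python
# Sketch only: clamp max_tokens to the documented limits.
# ASSUMPTIONS: "1M" = 1,000,000 tokens, and the window counts prompt + output together.
CONTEXT_WINDOW = 1_000_000  # "Context window: 1M tokens"
MAX_OUTPUT = 32_768         # "Max output: 32,768 tokens"


def safe_max_tokens(prompt_tokens: int, requested: int = MAX_OUTPUT) -> int:
    """Return an output budget within the per-response cap and the remaining context."""
    if prompt_tokens < 0 or requested < 1:
        raise ValueError("prompt_tokens must be >= 0 and requested must be >= 1")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining < 1:
        raise ValueError("the prompt alone fills the context window")
    return min(requested, MAX_OUTPUT, remaining)


print(safe_max_tokens(10_000))           # 32768
print(safe_max_tokens(990_000, 20_000))  # 10000
```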

Qwen3.5 Flash API pricing (save up to 10%)

Live pay-as-you-go rates from the EmpirioLabs catalog. You are billed only for what you use, with no monthly minimum.

  • Input (per 1M prompt tokens): $0.090 (was $0.10)
  • Output (per 1M generated tokens): $0.368 (was $0.40)
  • Web Search (per call): $0.015
  • Image Search (per call): $0.012
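The following is a rough sketch for estimating the token part of a bill from the rates above. It copies the rates from this page, and those rates may change. It also follows this page's note that thinking tokens are billed as output tokens. The function name token_cost_usd is hypothetical, and the sketch does not claim to reproduce how EmpirioLabs actually meters usage.

```python
# Sketch only: rates copied from the rate card above (USD per 1M tokens).
INPUT_USD_PER_M = 0.090
OUTPUT_USD_PER_M = 0.368


def token_cost_usd(prompt_tokens: int, output_tokens: int, thinking_tokens: int = 0) -> float:
    """Estimate token charges; thinking tokens are billed at the output rate."""
    billed_output = output_tokens + thinking_tokens
    return (prompt_tokens * INPUT_USD_PER_M + billed_output * OUTPUT_USD_PER_M) / 1_000_000


# 200K prompt tokens, 2K answer tokens, 8K thinking tokens
print(f"${token_cost_usd(200_000, 2_000, thinking_tokens=8_000):.5f}")  # $0.02168
```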

How to call the Qwen3.5 Flash API

Qwen3.5 Flash is served through the OpenAI-compatible Chat Completions API. Point any OpenAI SDK at https://api.empiriolabs.ai/v1, authenticate with your EmpirioLabs API key, and set the model id to qwen3-5-flash. You can get an API key from the EmpirioLabs dashboard.

cURL
curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-5-flash",
    "messages": [
      {"role": "user", "content": "Write a haiku about the ocean."}
    ]
  }'
Python (OpenAI SDK)
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.empiriolabs.ai/v1",
    api_key=os.environ["EMPIRIOLABS_API_KEY"],  # same variable as the cURL example
)

response = client.chat.completions.create(
    model="qwen3-5-flash",
    messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
)
print(response.choices[0].message.content)
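Qwen3.5 Flash also accepts image input. This page does not show the request shape for it. The following is a sketch that assumes the endpoint accepts the standard OpenAI content-part format, where an image_url part is placed next to a text part. That is an assumption, not something this page confirms. The helper names build_image_messages and ask_about_image are hypothetical.

```python
import os


def build_image_messages(question: str, image_url: str) -> list:
    """Build one user message pairing an image URL with a text question.

    ASSUMPTION: the endpoint accepts OpenAI-style image_url content parts.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": question},
            ],
        }
    ]


def ask_about_image(question: str, image_url: str) -> str:
    """Send the image + question to qwen3-5-flash and return the text answer."""
    from openai import OpenAI  # imported lazily so the builder above works without the SDK

    client = OpenAI(
        base_url="https://api.empiriolabs.ai/v1",
        api_key=os.environ["EMPIRIOLABS_API_KEY"],
    )
    response = client.chat.completions.create(
        model="qwen3-5-flash",
        messages=build_image_messages(question, image_url),
    )
    return response.choices[0].message.content
```

For example, ask_about_image("Describe this image.", "https://example.com/photo.jpg") would send one request. The URL here is a placeholder and must point to a publicly reachable image.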

Qwen3.5 Flash API parameters

Request parameters supported by the Qwen3.5 Flash API on EmpirioLabs. Defaults apply when a field is omitted.

  • temperature (number, default 0.7, range 0 to 2): Sampling temperature. 0 = deterministic, 2 = maximum randomness.
  • top_p (number, default 0.9, range 0 to 1): Nucleus sampling probability mass. Lower = more focused.
  • max_tokens (number, default 4096, range 1 to 32768): Maximum tokens in the response.
  • enable_thinking (boolean, default true): Enable extended thinking mode. Slower but improves reasoning-heavy tasks.
  • vl_high_resolution_images (boolean, default true): Use higher resolution for input images. Better detail at higher cost.
  • max_pixels (number, default 2621440, range 1 to 99999999): Maximum pixels per input image. Larger = more detail but slower / more tokens.
  • tool_web_search (boolean, default false): Search the web for real-time information.
  • tool_web_extractor (boolean, default true): Extract and read content from URLs. Requires Web Search and Thinking.
  • tool_code_interpreter (boolean, default true): Run Python code in a sandbox. Requires Thinking.
  • tool_web_search_image (boolean, default true): Search the web for images from text descriptions.
  • tool_image_search (boolean, default true): Find similar images from an uploaded image.
  • video_fps (number, default 2, range 0.1 to 10): Frames per second sampled from input video for analysis.
  • treat_images_as_video (boolean, default false): Treat a sequence of input images as a video for temporal reasoning.
  • disable_formatting (boolean, default false): Skip the EmpirioLabs Markdown formatting (citation [[N]](url) rewriting + References block when web search / tools were used). The raw upstream answer with plain [N]...
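Most of the parameters above are not part of the standard OpenAI request schema. The OpenAI Python SDK's extra_body argument merges additional fields into the JSON request body. The following sketch uses it, and assumes the EmpirioLabs API reads fields such as enable_thinking and tool_web_search from the top level of that body. The page does not confirm this. The sketch also checks the documented ranges and the documented tool dependencies. The helper build_request_kwargs is hypothetical.

```python
# Sketch only. ASSUMPTION: provider-specific fields are read from the top level
# of the JSON body, so the OpenAI Python SDK can send them via extra_body.
STANDARD_RANGES = {"temperature": (0, 2), "top_p": (0, 1), "max_tokens": (1, 32768)}
EXTRA_RANGES = {"max_pixels": (1, 99_999_999), "video_fps": (0.1, 10)}
# Documented dependencies: extractor needs web search + thinking; code interpreter needs thinking.
REQUIRES = {
    "tool_web_extractor": ("tool_web_search", "enable_thinking"),
    "tool_code_interpreter": ("enable_thinking",),
}
# Documented defaults for the fields that other tools depend on.
DEFAULTS = {"enable_thinking": True, "tool_web_search": False}


def build_request_kwargs(**params) -> dict:
    """Return keyword arguments for client.chat.completions.create()."""
    kwargs, extra = {}, {}
    for name, value in params.items():
        bounds = STANDARD_RANGES.get(name) or EXTRA_RANGES.get(name)
        if bounds is not None and not bounds[0] <= value <= bounds[1]:
            raise ValueError(f"{name}={value!r} is outside {bounds[0]}..{bounds[1]}")
        if name in STANDARD_RANGES:
            kwargs[name] = value
        else:
            extra[name] = value
    # Only tools the caller explicitly enables are checked against their dependencies.
    for tool, needs in REQUIRES.items():
        if extra.get(tool):
            missing = [n for n in needs if not extra.get(n, DEFAULTS[n])]
            if missing:
                raise ValueError(f"{tool} requires {', '.join(missing)} to be enabled")
    if extra:
        kwargs["extra_body"] = extra
    return kwargs
```

A call would then look like client.chat.completions.create(model="qwen3-5-flash", messages=messages, **build_request_kwargs(temperature=0.2, enable_thinking=False)). The sketch keeps validation on the client side so that range mistakes fail before a billed request is sent.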

Good to know

Built-in tools (billed only when invoked)

  • Web search: $0.015/call
  • Web extractor: free
  • Code interpreter: free
  • Text-to-image search: $0.012/call
  • Image-to-image search: $0.012/call

Other

  • Thinking tokens are billed as output tokens

Text-to-Image Search and Image-to-Image Search use the Image Search pricing row. Each invoked image search is billed at that listed per-call rate.
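The following sketch totals tool charges using the per-call rates listed above. The dictionary keys are hypothetical labels, not API identifiers. This page does not say where the API reports how many times each tool was invoked, so the counts are assumed to come from your own records.

```python
# Sketch only: per-call rates (USD) copied from the "Built-in tools" list; they may change.
TOOL_RATES_USD = {
    "web_search": 0.015,
    "web_extractor": 0.0,
    "code_interpreter": 0.0,
    "text_to_image_search": 0.012,  # billed at the Image Search rate
    "image_to_image_search": 0.012,  # billed at the Image Search rate
}


def tool_charges_usd(invocations: dict) -> float:
    """Sum charges for {hypothetical_tool_label: call_count}; unknown labels raise KeyError."""
    return sum(TOOL_RATES_USD[name] * count for name, count in invocations.items())


print(f"${tool_charges_usd({'web_search': 3, 'web_extractor': 5, 'text_to_image_search': 1}):.3f}")  # $0.057
```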

Qwen3.5 Flash variants

Variants are alternate versions of Qwen3.5 Flash with their own model id. Depending on the variant, they can differ in serving region, pricing, or supported parameters; everything else works the same way.

Qwen3.5 Flash :variant1 (China, 1M context, save up to 68%)

Call it with the model id qwen3-5-flash:variant1.

  • Input (per 1M prompt tokens): <=128K $0.029; 128K-256K $0.115; 256K-1M $0.172 (default model id: $0.090)
  • Output (per 1M generated tokens): <=128K $0.287; 128K-256K $1.147; 256K-1M $1.72 (default model id: $0.368)
  • Web search (per query when enabled): $0.01
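The following sketch estimates :variant1 costs from the tiered rates above. This page lists the tiers but does not define how they apply. The sketch therefore makes two assumptions. First, it picks the tier from the request's prompt token count and bills the whole request at that tier's rates. Second, it treats "128K" and "256K" as 128,000 and 256,000 tokens. The name variant1_cost_usd is hypothetical.

```python
# Sketch only. ASSUMPTIONS: the tier is chosen by prompt tokens, the whole request
# is billed at that tier, and K = 1,000 tokens.
VARIANT1_TIERS = [
    # (max prompt tokens, input USD per 1M, output USD per 1M)
    (128_000, 0.029, 0.287),
    (256_000, 0.115, 1.147),
    (1_000_000, 0.172, 1.72),
]
VARIANT1_WEB_SEARCH_USD = 0.01  # per query when enabled


def variant1_cost_usd(prompt_tokens: int, output_tokens: int, web_searches: int = 0) -> float:
    """Estimate one :variant1 request's cost under the tiering assumption above."""
    for limit, input_rate, output_rate in VARIANT1_TIERS:
        if prompt_tokens <= limit:
            token_cost = (prompt_tokens * input_rate + output_tokens * output_rate) / 1_000_000
            return token_cost + web_searches * VARIANT1_WEB_SEARCH_USD
    raise ValueError("prompt exceeds the 1M-token context window")
```

Under these assumptions, a prompt just over the 128K boundary costs about four times as much per input token as one just under it. That jump is why the assumed tiering rule matters for cost planning.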

Qwen3.5 Flash API: common questions

How much does the Qwen3.5 Flash API cost?

On EmpirioLabs, Qwen3.5 Flash is billed pay-as-you-go: input $0.090 per 1M prompt tokens (was $0.10), output $0.368 per 1M generated tokens (was $0.40), Web Search $0.015 per call, and Image Search $0.012 per call. Thinking tokens are billed as output tokens. The live rate card on this page always matches what the API charges.

What is the context window of Qwen3.5 Flash?

Qwen3.5 Flash supports a 1M-token context window with up to 32,768 output tokens per response.

Is the Qwen3.5 Flash API OpenAI-compatible?

Yes. Qwen3.5 Flash is served through the OpenAI-compatible Chat Completions API, so existing OpenAI SDKs work once you point base_url at https://api.empiriolabs.ai/v1 and set the model id to qwen3-5-flash.

Which Qwen3.5 Flash variants are available?

Qwen3.5 Flash is available as two model ids: the default qwen3-5-flash and qwen3-5-flash:variant1 (China). Variants can differ in serving region, pricing, or supported parameters; the rate card for each is on this page.

Can I try Qwen3.5 Flash in the browser before integrating?

Yes. The EmpirioLabs playground runs Qwen3.5 Flash in the browser with the same parameters the API exposes, so you can test prompts before writing code.

How do I get a Qwen3.5 Flash API key?

Create an EmpirioLabs account, then generate a key under API Keys in the dashboard. Billing uses pay-as-you-go credits, so you pay only for the requests you make.
