
Qwen3.5 4B is a low-cost multimodal model with a 256K context window, image and video input, function tools, and structured output.
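It supports text, image, and video input, streaming, function tools, structured JSON output, seed control, and a thinking mode that is enabled by default. Use reasoning_effort or thinking_budget for bounded thinking, or set enable_thinking=false for direct answers. Automatic cache reads are billed at the cached-input rate when the model service reports them. Explicit cache controls are not supported.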
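
The thinking controls named above correspond to request fields listed in this page's parameter table. A minimal sketch of how the three modes might be expressed, assuming the fields are sent as top-level JSON properties of a Chat Completions request body; `build_payload` is a hypothetical helper for illustration, not part of any SDK:

```python
import json

MODEL_ID = "qwen3-5-4b"


def build_payload(prompt, *, enable_thinking=True,
                  thinking_budget=None, reasoning_effort=None):
    """Build a request body with optional thinking controls (hypothetical helper)."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    if not enable_thinking:
        # Direct answer: switch the reasoning channel off and skip the budgets.
        payload["enable_thinking"] = False
        return payload
    if reasoning_effort is not None:
        payload["reasoning_effort"] = reasoning_effort
    if thinking_budget is not None:
        payload["thinking_budget"] = thinking_budget
    return payload


direct = build_payload("What is 2 + 2?", enable_thinking=False)
bounded = build_payload("Plan a three-step migration.", thinking_budget=2048)
print(json.dumps(direct, indent=2))
print(json.dumps(bounded, indent=2))
```

With the OpenAI Python SDK, fields that are not named arguments of `chat.completions.create` (here `enable_thinking` and `thinking_budget`) can be passed through its `extra_body` argument, which merges them into the JSON body.
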
Also known as: Alibaba Cloud Qwen3.5 4B, Qwen3.5-4B, qwen3-5-4b
Model ID: qwen3-5-4b

Endpoints: POST /v1/chat/completions, POST /v1/responses, POST /v1/messages, POST /v1/completions

Live pay-as-you-go rates from the EmpirioLabs catalog. You are billed only for what you use, with no monthly minimum.
Qwen3.5 4B serves the OpenAI-compatible Chat Completions API. Point any OpenAI SDK at https://api.empiriolabs.ai/v1 with your EmpirioLabs API key and use the model id qwen3-5-4b. Get an API key from the EmpirioLabs dashboard.
curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-5-4b",
    "messages": [
      {"role": "user", "content": "Write a haiku about the ocean."}
    ]
  }'

from openai import OpenAI

client = OpenAI(
    base_url="https://api.empiriolabs.ai/v1",
    api_key="YOUR_EMPIRIOLABS_API_KEY",
)
response = client.chat.completions.create(
    model="qwen3-5-4b",
    messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
)
print(response.choices[0].message.content)

Request parameters supported by the Qwen3.5 4B API on EmpirioLabs. Defaults apply when a field is omitted.
| Parameter | Type | Default | Range / Values | Description |
|---|---|---|---|---|
| temperature | number | 0.7 | 0 to 2 | Sampling temperature. 0 is deterministic and 2 is maximum randomness. |
| top_p | number | 0.95 | 0 to 1 | Nucleus sampling probability mass. Lower values make outputs more focused. |
| max_tokens | integer | 4096 | 1 to 32768 | Maximum output tokens. |
| stop | string | - | - | Up to 4 strings where the model will stop generating further tokens. |
| reasoning_effort | enum | medium | none, low, medium, high, max | Reasoning effort. none disables thinking; low, medium, high, and max set bounded thinking budgets. |
| enable_thinking | boolean | true | - | Enable the model reasoning channel before final output. |
| thinking_budget | integer | 4096 | 1024 to 32768 | Maximum thinking tokens before the final answer. If max_tokens is lower, the service reserves room for the answer. |
| top_k | integer | 20 | 1 to 200 | Limit sampling to the top K candidate tokens when supported. |
| min_p | number | 0 | 0 to 1 | Minimum probability threshold for token sampling. |
| presence_penalty | number | 0 | -2 to 2 | Penalty for tokens that already appeared in the generated text. |
| frequency_penalty | number | 0 | -2 to 2 | Penalty based on how often a token has already appeared. |
| repetition_penalty | number | 1 | 0.1 to 2 | Penalty used by SGLang to reduce repeated text. |
| seed | integer | - | 0 to 2147483647 | Optional random seed for reproducible sampling. |
| logprobs | boolean | false | - | Return token log probabilities when supported. |
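
The ranges in the table can be checked on the client before a request is sent. The sketch below copies the numeric bounds and the `reasoning_effort` values from the table and assumes the bounds are inclusive, which the table does not state; `check_params` is a hypothetical helper, and the service may validate differently.

```python
# (min, max) bounds copied from the parameter table; inclusivity is an assumption.
NUMERIC_RANGES = {
    "temperature": (0, 2),
    "top_p": (0, 1),
    "max_tokens": (1, 32768),
    "thinking_budget": (1024, 32768),
    "top_k": (1, 200),
    "min_p": (0, 1),
    "presence_penalty": (-2, 2),
    "frequency_penalty": (-2, 2),
    "repetition_penalty": (0.1, 2),
    "seed": (0, 2147483647),
}
REASONING_EFFORTS = ("none", "low", "medium", "high", "max")


def check_params(params):
    """Return a list of problems found in params; an empty list means none."""
    problems = []
    for name, value in params.items():
        if name in NUMERIC_RANGES:
            low, high = NUMERIC_RANGES[name]
            if not low <= value <= high:
                problems.append(f"{name}={value} is outside {low} to {high}")
        elif name == "reasoning_effort" and value not in REASONING_EFFORTS:
            problems.append(
                f"reasoning_effort={value!r} is not one of {REASONING_EFFORTS}"
            )
    return problems


print(check_params({"temperature": 0.2, "top_k": 40}))
print(check_params({"temperature": 3, "reasoning_effort": "extreme"}))
```

Parameters the helper does not know about (for example `stop` or `logprobs`) pass through unchecked.
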
On EmpirioLabs, Qwen3.5 4B is billed pay as you go: input at $0.04 per 1M prompt tokens; output at $0.07 per 1M generated tokens; implicit cache reads at $0.02 per 1M cached input tokens. The live rate card on this page always matches what the API charges.
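
As a worked example of these rates, the sketch below estimates the cost of a single request. It assumes that cached tokens are counted inside the prompt token total and are billed at the cache rate instead of the prompt rate; the page does not spell out that accounting, so treat it as an illustration. `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# USD per 1M tokens, taken from the rates quoted above.
PROMPT_RATE = Decimal("0.04")
COMPLETION_RATE = Decimal("0.07")
CACHED_RATE = Decimal("0.02")
PER_MILLION = Decimal(1_000_000)


def estimate_cost(prompt_tokens, completion_tokens, cached_tokens=0):
    """Estimate the USD cost of one request.

    Assumption: cached_tokens is the part of prompt_tokens served from cache.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached = prompt_tokens - cached_tokens
    return (
        Decimal(uncached) * PROMPT_RATE
        + Decimal(cached_tokens) * CACHED_RATE
        + Decimal(completion_tokens) * COMPLETION_RATE
    ) / PER_MILLION


# 600,000 uncached prompt + 400,000 cached + 200,000 generated tokens:
# 0.024 + 0.008 + 0.014 = 0.046 USD
print(estimate_cost(1_000_000, 200_000, cached_tokens=400_000))
```

`Decimal` is used so that small per-token amounts do not pick up binary floating-point rounding error.
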
Qwen3.5 4B supports a 256K-token context window with up to 32,768 output tokens per response.
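
These two limits can be combined into a simple budget check. The sketch below assumes that "256K" means 262,144 tokens and that prompt and output tokens share the same context window; neither is confirmed on this page, and `clamp_max_tokens` is a hypothetical helper.

```python
CONTEXT_WINDOW = 256 * 1024   # assumption: "256K" read as 262,144 tokens
MAX_OUTPUT_TOKENS = 32_768    # output cap stated above


def clamp_max_tokens(prompt_tokens, requested_max_tokens):
    """Return the largest max_tokens that fits both limits.

    Raises ValueError if the prompt leaves no room for output.
    """
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining < 1:
        raise ValueError("prompt leaves no room for output tokens")
    return min(requested_max_tokens, MAX_OUTPUT_TOKENS, remaining)


print(clamp_max_tokens(10_000, 50_000))   # capped by the output limit
print(clamp_max_tokens(260_000, 4_096))   # capped by the remaining context
```
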
Yes. Qwen3.5 4B serves the OpenAI-compatible Chat Completions API, so existing OpenAI SDKs work by pointing base_url at https://api.empiriolabs.ai/v1 and setting the model id to qwen3-5-4b.
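
Because the endpoint follows the Chat Completions shape, streamed responses can be consumed the same way as with other OpenAI-compatible services. A minimal sketch, assuming the stream yields standard chunks whose text arrives in `choices[0].delta.content`; `collect_stream` is a hypothetical helper, and how thinking output is streamed is not covered here.

```python
def collect_stream(chunks):
    """Concatenate the text deltas from an iterable of streamed chat chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:      # some chunks may carry no choices
            continue
        delta = chunk.choices[0].delta
        if delta.content:          # content can be None on non-text deltas
            parts.append(delta.content)
    return "".join(parts)


# Usage with the OpenAI SDK (needs network access and a valid API key):
#
#   from openai import OpenAI
#   client = OpenAI(
#       base_url="https://api.empiriolabs.ai/v1",
#       api_key="YOUR_EMPIRIOLABS_API_KEY",
#   )
#   stream = client.chat.completions.create(
#       model="qwen3-5-4b",
#       messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
#       stream=True,
#   )
#   print(collect_stream(stream))
```
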
Yes. The EmpirioLabs playground runs Qwen3.5 4B in the browser with the same parameters the API exposes, so you can test prompts before writing code.
Create an EmpirioLabs account, then generate a key under API Keys in the dashboard. Billing is pay-as-you-go credits, so you only pay for the requests you make.