Pricing

Kling 3.0 Turbo

Kling AI | Released Jun 17, 2026 | Video Generation

Proprietary Endpoint

Text-to-video and image-to-video with synchronized native audio, at 720p or 1080p for 3 to 15 seconds, with aspect ratio and prompt control.

Chat API

Type

Spec

Rate

720p

per second

$0.18

1080p

per second

$0.225
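
As a rough illustration of per-second billing, the sketch below estimates the cost of one Kling 3.0 Turbo clip from the two rates listed above. The function name is hypothetical, and the assumption that cost is simply rate times duration (no rounding, minimum charge, or audio surcharge) is not confirmed by the listing.

```python
from decimal import Decimal

# Per-second rates (USD) copied from the Kling 3.0 Turbo table above.
KLING_TURBO_RATES = {"720p": Decimal("0.18"), "1080p": Decimal("0.225")}


def estimate_clip_cost(resolution: str, seconds: int) -> Decimal:
    """Estimate one clip's cost. ASSUMPTION: cost = rate * duration."""
    if not 3 <= seconds <= 15:
        raise ValueError("the listing describes clips of 3 to 15 seconds")
    return KLING_TURBO_RATES[resolution] * seconds


print(estimate_clip_cost("1080p", 10))  # 2.250
```

Decimal arithmetic is used here so that small per-second rates do not pick up floating-point noise when summed across many clips.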

GLM 5.2

Z.ai | Singapore | Released Jun 16, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Reasoning and coding model with a 1M token context, 128K output, adjustable reasoning effort, native web search, and tool calling.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$1.40

Output

per 1M generated tokens

$4.40

Web search

per request

$0.033
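
To make the per-1M-token rates concrete, here is a minimal sketch that estimates the cost of a single GLM 5.2 (Singapore) request from the rates above. The function name and parameters are hypothetical, and it assumes the three charges are simply added together.

```python
def estimate_request_cost(
    prompt_tokens: int,
    generated_tokens: int,
    web_searches: int = 0,
    input_per_m: float = 1.40,   # USD per 1M prompt tokens (from the table)
    output_per_m: float = 4.40,  # USD per 1M generated tokens (from the table)
    search_fee: float = 0.033,   # USD per web search request (from the table)
) -> float:
    """ASSUMPTION: token charges and search fees are purely additive."""
    token_cost = (
        prompt_tokens * input_per_m + generated_tokens * output_per_m
    ) / 1_000_000
    return token_cost + web_searches * search_fee


# 200K prompt tokens, 4K generated tokens, one web search.
print(f"${estimate_request_cost(200_000, 4_000, web_searches=1):.4f}")  # $0.3306
```

The same function can be reused for other flat-rate entries on this page by passing their rates as keyword arguments.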

GLM 5.2

Z.ai | Germany | Released Jun 16, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Save up to 21%

Reasoning and coding model with a 1M token context, 128K output, adjustable reasoning effort, native web search, and tool calling.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$1.10 (list price $1.40)

Output

per 1M generated tokens

$3.851 (list price $4.40)

Implicit cache read

per 1M cached input tokens

$0.275

Web Search (Linkup)

per call when invoked

$0.013
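
The "Save up to 21%" badge can be checked against the table. The sketch below assumes the input row pairs a $1.40 list price with a $1.10 discounted rate, and the output row pairs $4.40 with $3.851. The function name is hypothetical.

```python
def savings_percent(list_price: float, discounted_price: float) -> float:
    """Return the discount as a percentage of the list price."""
    return (list_price - discounted_price) / list_price * 100


# Prices from the GLM 5.2 (Germany) table above.
input_saving = savings_percent(1.40, 1.10)
output_saving = savings_percent(4.40, 3.851)
print(f"input {input_saving:.1f}%, output {output_saving:.1f}%")
# input 21.4%, output 12.5%
```

Under that reading, the badge appears to report the largest per-row saving, rounded down, rather than an average across rows.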

Kimi K3

Moonshot AI | International | Released Jul 15, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Kimi K3 is Moonshot's flagship reasoning model with a 1M token context, always-on thinking, native web search, and text, image, and video inputs.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$3.00

Output

per 1M generated tokens

$15.00

Web search

per call when invoked

$0.015

Kimi K2.7 Code

Moonshot AI | International | Released Jun 16, 2026 | Ctx 256K | Text Generation

Proprietary Endpoint

Kimi K2.7 Code is Moonshot's trillion-parameter agentic coding model with 256K context, always-on reasoning, and text, image, and video inputs.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.95

Output

per 1M generated tokens

$4.00

Web search

per call when invoked

$0.015

Kimi K2.7 Code

Moonshot AI | Germany | Released Jun 16, 2026 | Ctx 256K | Text Generation

Proprietary Endpoint

Save up to 7%

Kimi K2.7 Code is Moonshot's trillion-parameter agentic coding model with 256K context, always-on reasoning, and text, image, and video inputs.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.8939 (list price $0.95)

Output

per 1M generated tokens

$3.7131 (list price $4.00)

Implicit cache read

per 1M cached input tokens

$0.1788

Web Search (Linkup)

per call when invoked

$0.013

Muse Spark 1.1

Meta AI | Ctx 1M | Text Generation

Proprietary Endpoint

Meta frontier reasoning model with a 1M token context, image and video understanding, built-in web search with cited sources, and tool calling.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$1.25

Output

per 1M generated tokens

$4.25

Implicit cache read

per 1M cached input tokens

$1.00

Web search

per search query

$0.00825

Fugu Ultra v1.1

Sakana AI | Released Jul 23, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Updated multi-agent conductor for hard reasoning, coding, and research, with distinct max effort, 1M context, image input, and web search.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=272K: $5.00; >272K: $10.00

Output

per 1M generated tokens

<=272K: $30.00; >272K: $45.00

Implicit cache read

per 1M cached input tokens

<=272K: $0.50; >272K: $1.00
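
Tiered rates like these are easier to reason about with a worked example. The sketch below is one possible reading of the Fugu Ultra v1.1 table: it assumes "272K" means 272,000 tokens, that the total prompt length selects the tier, and that the whole request is then billed at that tier's rates. None of these billing rules are stated in the listing, and the names are hypothetical.

```python
from decimal import Decimal

# (input, output, cache read) in USD per 1M tokens, from the table above.
SHORT_TIER = (Decimal("5.00"), Decimal("30.00"), Decimal("0.50"))
LONG_TIER = (Decimal("10.00"), Decimal("45.00"), Decimal("1.00"))
TIER_BOUNDARY = 272_000  # ASSUMPTION: "272K" means 272,000 prompt tokens


def estimate_tiered_cost(
    prompt_tokens: int, generated_tokens: int, cached_tokens: int = 0
) -> Decimal:
    """ASSUMPTION: prompt length picks one tier for the entire request."""
    tier = SHORT_TIER if prompt_tokens <= TIER_BOUNDARY else LONG_TIER
    in_rate, out_rate, cache_rate = tier
    uncached_tokens = prompt_tokens - cached_tokens
    total = (
        uncached_tokens * in_rate
        + generated_tokens * out_rate
        + cached_tokens * cache_rate
    )
    return total / Decimal(1_000_000)


print(estimate_tiered_cost(100_000, 10_000))  # short-context tier
print(estimate_tiered_cost(300_000, 10_000))  # long-context tier
```

If the provider instead bills only the tokens beyond the boundary at the higher rate, the tier selection line would need to change; the listing does not say which rule applies.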

Qwen3.7 Plus

Alibaba Cloud | Singapore | Released Jun 1, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Cost-effective Qwen3.7 vision-language model for text, image, video, coding, tool use, GUI understanding, and 1M-context workflows.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=256K: $0.40; 256K-1M: $1.20

Output

per 1M generated tokens

<=256K: $1.60; 256K-1M: $4.80

Web search

per request when enabled

$0.03

Image Search

per call

$0.03

Qwen3.7 Plus

Alibaba Cloud | China | Released Jun 1, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Save up to 31%

Cost-effective Qwen3.7 vision-language model for text, image, video, coding, tool use, GUI understanding, and 1M-context workflows.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=256K: $0.276 (list price $0.40); 256K-1M: $0.826 (list price $1.20)

Output

per 1M generated tokens

<=256K: $1.101 (list price $1.60); 256K-1M: $3.301 (list price $4.80)

Implicit cache read

per 1M cached prompt tokens

<=256K: $0.056 (list price $0.08); 256K-1M: $0.166 (list price $0.24)

Web search

per request when enabled

$0.01

Image Search

per call

$0.01

Kimi K2.7 Code Highspeed

Moonshot AI | International | Released Jun 16, 2026 | Ctx 256K | Text Generation

Proprietary Endpoint

Kimi K2.7 Code Highspeed is the faster-serving tier of Moonshot's agentic coding model, with 256K context, always-on reasoning, and image and video input.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$1.90

Output

per 1M generated tokens

$8.00

Web search

per call when invoked

$0.015

Fugu Ultra v1.0

Sakana AI | Released Jun 21, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Original multi-agent conductor for hard reasoning, coding, and research, with 1M context, image input, function calling, and web search.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=272K: $7.50; >272K: $15.00

Output

per 1M generated tokens

<=272K: $45.00; >272K: $67.50

Implicit cache read

per 1M cached input tokens

<=272K: $1.50; >272K: $3.00

Qwen3.7 Flash

Alibaba Cloud | Singapore | Released Jul 15, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Fast Qwen3.7 vision-language model for text, image, video, tool use, and agentic tasks, with implicit caching and a 1M token context.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=32K: $0.03; 32K-256K: $0.10; 256K-1M: $0.20

Output

per 1M generated tokens

<=32K: $0.13; 32K-256K: $0.40; 256K-1M: $0.80

Implicit cache read

per 1M cached input tokens

<=32K: $0.006; 32K-256K: $0.02; 256K-1M: $0.04

Web search

per request when enabled

$0.03

Image Search

per call

$0.03

Qwen3.7 Flash

Alibaba Cloud | China | Released Jul 15, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Save up to 18%

Fast Qwen3.7 vision-language model for text, image, video, tool use, and agentic tasks, with implicit caching and a 1M token context.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=32K: $0.028 (list price $0.03); 32K-256K: $0.083 (list price $0.10); 256K-1M: $0.165 (list price $0.20)

Output

per 1M generated tokens

<=32K: $0.11 (list price $0.13); 32K-256K: $0.33 (list price $0.40); 256K-1M: $0.66 (list price $0.80)

Implicit cache read

per 1M cached input tokens

<=32K: $0.006; 32K-256K: $0.017 (list price $0.02); 256K-1M: $0.033 (list price $0.04)

Web search

per request when enabled

$0.01

Image Search

per call

$0.01

MiniMax M3

MiniMax | Singapore | Released Jun 1, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Save up to 25%

MiniMax M3 is a multimodal reasoning model for coding, agents, and long-context analysis with text, image, and video input.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=512K: $0.225 (list price $0.30); >512K: $0.45 (list price $0.60)

Output

per 1M generated tokens

<=512K: $0.90 (list price $1.20); >512K: $1.80 (list price $2.40)

Implicit cache read

per 1M cached input tokens

<=512K: $0.045 (list price $0.06); >512K: $0.09 (list price $0.12)

Web Search (Linkup)

per call when invoked

$0.013

MiniMax M3

MiniMax | Singapore | Released Jun 1, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Save up to 25%

MiniMax M3 is a multimodal reasoning model for coding, agents, and long-context analysis with text, image, and video input.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=512K: $0.3375 (list price $0.45); >512K: $0.675 (list price $0.90)

Output

per 1M generated tokens

<=512K: $1.35 (list price $1.80); >512K: $2.70 (list price $3.60)

Implicit cache read

per 1M cached input tokens

<=512K: $0.0675 (list price $0.09); >512K: $0.135 (list price $0.18)

Web Search (Linkup)

per call when invoked

$0.013

Qwen3.7 Max

Alibaba Cloud | Singapore | Released May 21, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Qwen3.7 Max is a flagship text model for coding, productivity, long-running agents, deep thinking, tools, and 1M-token context.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$2.50

Output

per 1M generated tokens

$7.50

Web search

per call when invoked

$0.02

Web extractor

per call when invoked

$0.02

Code interpreter

per call when invoked

$0.02

Qwen3.7 Max

Alibaba Cloud | China | Released May 21, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Save up to 34%

Qwen3.7 Max is a flagship text model for coding, productivity, long-running agents, deep thinking, tools, and 1M-token context.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$1.65 (list price $2.50)

Output

per 1M generated tokens

$4.951 (list price $7.50)

Web search

per call when invoked

$0.01

Web extractor

per call when invoked

$0.01

Code interpreter

per call when invoked

$0.01

ACE-Step 1.5 XL

ACE-Step | Released Apr 2, 2026 | Audio Generation

Native Inference

Save up to 17%

Open-source music generation model for text-to-song and lyric-guided audio, with fast 8-step XL Turbo inference for controllable song iteration.

Chat API

Type

Spec

Rate

Music generation

per generated second

$0.00025 (list price $0.0003)

FLUX.2 Klein 4B

Black Forest Labs | Released Jan 15, 2026 | Image Generation

Native Inference

Save up to 39%

Apache-licensed 4B FLUX.2 Klein image generation and editing model with text-to-image, reference-image editing, and creative workflow support.

Chat API

Type

Spec

Rate

Image generation

per image

$0.0085 (list price $0.014)

MiniMax M2.7 Highspeed

MiniMax | Singapore | Released Mar 18, 2026 | Ctx 200K | Text Generation

Proprietary Endpoint

Save up to 50%

High-speed M2.7 variant tuned for fast inference, with strong general-purpose performance and agentic capabilities.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.30 (list price $0.60)

Output

per 1M generated tokens

$1.20 (list price $2.40)

Implicit cache read

per 1M cached input tokens

$0.03 (list price $0.06)

Web Search (Linkup)

per call when invoked

$0.013

TTS 2

Inworld | Released May 5, 2026 | Audio Generation

Proprietary Endpoint

Save up to 12%

Realtime voice model with plain-English voice direction, one voice identity across 100+ languages, and sub-200ms streaming time-to-first-audio.

Chat API

Type

Spec

Rate

Synthesis

per 1M characters

$22.00 (list price $25.00)

TTS 1.5 Mini

Inworld | Released Jan 21, 2026 | Audio Generation

Proprietary Endpoint

Save up to 30%

Sub-130ms TTFB voice synthesis with 271+ voices across 15 languages, expressive prosody, and real-time SSE streaming for low-latency voice agents.

Chat API

Type

Spec

Rate

Synthesis

per 1M characters

$17.50 (list price $25.00)

TTS 1.5 Max

Inworld | Released Jan 21, 2026 | Audio Generation

Proprietary Endpoint

Save up to 15%

Broadcast-quality voice synthesis with rich expressive prosody, 271+ voices across 15 languages, and real-time SSE streaming with per-word timestamps.

Chat API

Type

Spec

Rate

Synthesis

per 1M characters

$29.75 (list price $35.00)

GLM 5.1

Z.ai | China | Released Apr 7, 2026 | Ctx 202K | Text Generation

Proprietary Endpoint

Save up to 41%

Long-context Zhipu AI reasoning model with 202K context, 128K output, tool calling, structured output, and cache support.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=32K: $0.825 (list price $1.40); 32K-200K: $1.10 (list price $1.40)

Output

per 1M generated tokens

<=32K: $3.301 (list price $4.40); 32K-200K: $3.851 (list price $4.40)

Implicit cache read

per 1M cached input tokens

<=32K: $0.165 (list price $0.26); 32K-200K: $0.22 (list price $0.26)

Web Search (Linkup)

per call when invoked

$0.013

Kimi K2.6

Moonshot AI | China | Released Apr 20, 2026 | Ctx 256K | Text Generation

Proprietary Endpoint

Save up to 7%

Kimi K2.6 is a Moonshot multimodal reasoning model with 256K context, strong coding, and text, image, and video inputs.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.8939 (list price $0.95)

Output

per 1M generated tokens

$3.7131 (list price $4.00)

Implicit cache read

per 1M cached input tokens

$0.1788

Web Search (Linkup)

per call when invoked

$0.013

MiniMax M2.7

MiniMax | Singapore | Released Mar 18, 2026 | Ctx 200K | Text Generation

Proprietary Endpoint

Save up to 50%

MiniMax M2.7 is a general-purpose reasoning chat model with interleaved thinking, function calling, and prompt caching.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.15 (list price $0.30)

Output

per 1M generated tokens

$0.60 (list price $1.20)

Implicit cache read

per 1M cached input tokens

$0.03 (list price $0.06)

Web Search (Linkup)

per call when invoked

$0.013

TRELLIS.2 4B

Microsoft | 3D Generation

Native Inference

Save up to 90%

TRELLIS.2 image-to-3D model that turns a reference image into a textured GLB asset with resolution, seed, mesh, texture, and export controls.

Chat API

Type

Spec

Rate

512 asset

per request

$0.025 (list price $0.25)

1024 asset

per request

$0.249 (list price $0.30)

1536 asset

per request

$0.499

Qwen3.5 122B-A10B

Alibaba Cloud | China | Released Feb 24, 2026 | Ctx 256K | Text Generation

Proprietary Endpoint

Save up to 71%

Qwen3.5 122B-A10B is a multimodal reasoning model with 256K context, efficient sparse MoE inference, and text, image, and video input.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=128K: $0.115 (list price $0.40); 128K-256K: $0.287 (list price $0.40)

Output

per 1M generated tokens

<=128K: $0.917 (list price $3.20); 128K-256K: $2.294 (list price $3.20)

Web search

per request when enabled

$0.01

Qwen3.5 397B-A17B

Alibaba Cloud | China | Released Feb 16, 2026 | Ctx 256K | Text Generation

Proprietary Endpoint

Save up to 71%

Qwen3.5 397B-A17B is a flagship multimodal reasoning model for language, code, agents, GUI tasks, and image and video understanding.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=128K: $0.172 (list price $0.60); 128K-256K: $0.43 (list price $0.60)

Output

per 1M generated tokens

<=128K: $1.032 (list price $3.60); 128K-256K: $2.58 (list price $3.60)

Web search

per request when enabled

$0.01

Qwen3.5 35B-A3B

Alibaba Cloud | China | Released Feb 24, 2026 | Ctx 256K | Text Generation

Proprietary Endpoint

Save up to 77%

Qwen3.5 35B-A3B is an efficient native vision-language model with sparse MoE routing, deep thinking, and text, image, and video input.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=128K: $0.057 (list price $0.25); 128K-256K: $0.229 (list price $0.25)

Output

per 1M generated tokens

<=128K: $0.459 (list price $2.00); 128K-256K: $1.835 (list price $2.00)

Web search

per request when enabled

$0.01

Qwen3.5 27B

Alibaba Cloud | China | Released Feb 24, 2026 | Ctx 256K | Text Generation

Proprietary Endpoint

Save up to 71%

Qwen3.5 27B is a dense multimodal reasoning model with fast responses, 256K context, and text, image, and video understanding.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=128K: $0.086 (list price $0.30); 128K-256K: $0.258 (list price $0.30)

Output

per 1M generated tokens

<=128K: $0.688 (list price $2.40); 128K-256K: $2.064 (list price $2.40)

Web search

per request when enabled

$0.01

Qwen3.6 27B

Alibaba Cloud | China | Released Apr 22, 2026 | Ctx 256K | Text Generation

Proprietary Endpoint

Save up to 31%

Qwen3.6 27B improves agentic coding, STEM reasoning, spatial vision, OCR, and text, image, and video understanding on 256K context.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.412564 (list price $0.60)

Output

per 1M generated tokens

$2.475384 (list price $3.60)

Web search

per request when enabled

$0.01

Qwen3.6 Flash

Alibaba Cloud | Singapore | Released Apr 16, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Fast Qwen3.6 vision-language model for agentic coding, math reasoning, spatial understanding, OCR, and text, image, and video input.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=256K: $0.25; 256K-1M: $1.00

Output

per 1M generated tokens

<=256K: $1.50; 256K-1M: $4.00

Web search

per query when enabled

$0.02

Qwen3.6 Flash

Alibaba Cloud | China | Released Apr 16, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Save up to 34%

Fast Qwen3.6 vision-language model for agentic coding, math reasoning, spatial understanding, OCR, and text, image, and video input.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=256K: $0.165 (list price $0.25); 256K-1M: $0.66 (list price $1.00)

Output

per 1M generated tokens

<=256K: $0.99 (list price $1.50); 256K-1M: $3.961 (list price $4.00)

Web search

per query when enabled

$0.01

Gemma 4 26B-A4B

Google | Released Mar 31, 2026 | Ctx 256K | Text Generation

Native Inference

Save up to 83%

Gemma 4 26B A4B is a Google open multimodal model with 256K context, text, image, and video input, tools, and structured output.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.05 (list price $0.15)

Output

per 1M generated tokens

$0.29 (list price $0.50)

Implicit cache read

per 1M cached input tokens

$0.025 (list price $0.15)

Web Search (Linkup)

per call when invoked

$0.013

Qwen3.5 9B

Alibaba Cloud | Released Feb 23, 2026 | Ctx 256K | Text Generation

Native Inference

Save up to 13%

Qwen3.5 9B is a compact multimodal reasoning model with 256K context, image and video input, function tools, and structured output.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.09 (list price $0.10)

Output

per 1M generated tokens

$0.13 (list price $0.15)

Implicit cache read

per 1M cached input tokens

$0.045

Web Search (Linkup)

per call when invoked

$0.013

Qwen3.5 4B

Alibaba Cloud | Released Mar 2, 2026 | Ctx 256K | Text Generation

Native Inference

Qwen3.5 4B is a low-cost multimodal reasoning model with 256K context, image and video input, function tools, and structured output.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.04

Output

per 1M generated tokens

$0.07

Implicit cache read

per 1M cached input tokens

$0.02

Web Search (Linkup)

per call when invoked

$0.013

Qwen3.6 35B A3B

Alibaba Cloud | Released Jul 29, 2026 | Ctx 128K | Text Generation

Native Inference

Save up to 72%

Qwen3.6 35B A3B is a 256-expert mixture-of-experts reasoning model with 128K context, function tools, and strict structured JSON output.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.07 (list price $0.248)

Output

per 1M generated tokens

$0.42 (list price $1.485)

Implicit cache read

per 1M cached input tokens

$0.035

Web Search (Linkup)

per call when invoked

$0.013

GLM 4.7 Flash

Z.ai | Singapore | Released Jan 19, 2026 | Ctx 200K | Text Generation

Proprietary Endpoint

Free lightweight GLM-4.7 text model for coding, reasoning, long-context writing, and general chat.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

Free

Output

per 1M generated tokens

Free

Implicit cache read

per 1M cached input tokens

Free

Web search

per request when enabled

$0.033

GLM 4.5 Flash

Z.ai | Singapore | Released Jul 28, 2025 | Ctx 200K | Text Generation

Proprietary Endpoint

Free lightweight GLM-4.5 text model for reasoning, coding, long-form chat, and general language tasks.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

Free

Output

per 1M generated tokens

Free

Implicit cache read

per 1M cached input tokens

Free

Web search

per request when enabled

$0.033

GLM 4.6V Flash

Z.ai | Singapore | Released Dec 8, 2025 | Ctx 128K | Text Generation

Proprietary Endpoint

Free multimodal GLM-4.6V model for image, video, file, and text understanding with native function calling.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

Free

Output

per 1M generated tokens

Free

Implicit cache read

per 1M cached input tokens

Free

Web search

per request when enabled

$0.033

Amazon Nova Canvas

Amazon | Released Dec 3, 2024 | Image Generation

Proprietary Endpoint

Image generation and editing model that creates and modifies images from text or image inputs, with inpainting, virtual try-on, and style controls.

Chat API

Type

Spec

Rate

Small Standard (≤1024×1024)

per image

$0.12

Small Premium (≤1024×1024)

per image

$0.18

Large Standard (≤2048×2048)

per image

$0.18

Large Premium (≤2048×2048)

per image

$0.24

Amazon Nova Reel 1.1

Amazon | Released Apr 7, 2025 | Video Generation

Proprietary Endpoint

Video generation model producing up to 2-minute multi-shot videos from text and optional image prompts with improved quality and consistency.

Chat API

Type

Spec

Rate

Per Second

per second

$0.14

Deepgram Nova 3

Deepgram | Released Feb 12, 2025 | Transcription

Proprietary Endpoint

Speech-to-text transcription using the Nova-3 model with multi-language support and advanced customizable settings for production workloads.

Chat API

Type

Spec

Rate

Transcription

per minute of audio

$0.014

DeepReasoning

WinFunc | Released Jan 26, 2025 | Text Generation

Proprietary Endpoint

Pairs DeepSeek R1 chain-of-thought reasoning with Anthropic Claude creative and code generation behind a unified, data-controlled interface.

Chat API

Type

Spec

Rate

R1-0528 + Claude Sonnet 4.5 (Default)

per 1K tokens

In $0.012 / Out $0.058

R1-0528 + Claude Haiku 4.5

per 1K tokens

In $0.0048 / Out $0.023

R1-0528 + Claude Opus 4.5

per 1K tokens

In $0.019 / Out $0.092

R1-0528 + Claude Sonnet 4

per 1K tokens

In $0.012 / Out $0.058

R1-0528 + Claude Opus 4.1

per 1K tokens

In $0.053 / Out $0.26

R1-0528 + Claude Opus 4

per 1K tokens

In $0.053 / Out $0.26

Web Search (Linkup)

per call when invoked

$0.013
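
DeepReasoning is quoted per 1K tokens, while most entries on this page are quoted per 1M tokens. A small conversion sketch makes the two comparable; the helper name is hypothetical.

```python
from decimal import Decimal


def per_1k_to_per_1m(rate_per_1k: str) -> Decimal:
    """Convert a USD-per-1K-token rate to USD per 1M tokens."""
    return Decimal(rate_per_1k) * 1000


# Default pairing (R1-0528 + Claude Sonnet 4.5) from the table above.
print(per_1k_to_per_1m("0.012"), per_1k_to_per_1m("0.058"))  # 12.000 58.000
```

On a per-1M basis the default pairing therefore works out to $12 input and $58 output.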

DeepSeek Prover V2

DeepSeek | Released Apr 30, 2025 | Text Generation

Proprietary Endpoint

Open-source LLM specialized in formal theorem proving in Lean 4, built on a recursive theorem-proving pipeline.

Chat API

Type

Spec

Rate

Per Message

fixed

$0.020

DeepSeek V3.2

DeepSeek | Singapore | Released Dec 1, 2025 | Ctx 128K | Text Generation

Proprietary Endpoint

Open-source Mixture-of-Experts LLM tuned for high-efficiency reasoning, coding, and general language tasks across long-form prompts.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.57

Output

per 1M generated tokens

$1.71

Web search

per request when enabled

$0.015

DeepSeek V4 Flash

DeepSeek | Germany | Released Apr 24, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Lightweight MoE model with 284B total / 13B active parameters and native 1M context, tuned for low-latency, cost-effective high-concurrency use.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.14

Output

per 1M generated tokens

$0.28

Web Search (Linkup)

per call when invoked

$0.013

DeepSeek V4 Flash

DeepSeek | Singapore | Released Apr 24, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Lightweight MoE model with 284B total / 13B active parameters and native 1M context, tuned for low-latency, cost-effective high-concurrency use.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.20

Output

per 1M generated tokens

$0.40

Web search

per request when enabled

$0.02

DeepSeek V4 Flash

DeepSeek | China | Released Apr 24, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Save up to 2%

Lightweight MoE model with 284B total / 13B active parameters and native 1M context, tuned for low-latency, cost-effective high-concurrency use.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.138 (list price $0.14)

Output

per 1M generated tokens

$0.275 (list price $0.28)

Implicit cache read

per 1M cached input tokens

$0.028

Web search

per request when enabled

$0.01

DeepSeek V4 Pro

DeepSeek | Germany | Released Apr 24, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Save up to 5%

Flagship MoE LLM with 1.6T total / 49B active parameters and native 1M context for advanced math, logical inference, and specialized coding.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$1.65 (list price $1.74)

Output

per 1M generated tokens

$3.30 (list price $3.48)

Web Search (Linkup)

per call when invoked

$0.013

DeepSeek V4 Pro

DeepSeek | Singapore | Released Apr 24, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Flagship MoE LLM with 1.6T total / 49B active parameters and native 1M context for advanced math, logical inference, and specialized coding.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$2.40

Output

per 1M generated tokens

$4.80

Web search

per request when enabled

$0.02

DeepSeek V4 Pro

DeepSeek | China | Released Apr 24, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Save up to 5%

Flagship MoE LLM with 1.6T total / 49B active parameters and native 1M context for advanced math, logical inference, and specialized coding.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$1.65 (list price $1.74)

Output

per 1M generated tokens

$3.301 (list price $3.48)

Implicit cache read

per 1M cached input tokens

$0.138

Web search

per request when enabled

$0.01

Exa Answer

Exa | Research & Search

Proprietary Endpoint

Quick LLM-style answer to a natural-language question, grounded in fresh Exa web search results with inline citations and source links.

Chat API

Type

Spec

Rate

Answer

per request

$0.01

Exa Search

Exa | Research & Search

Proprietary Endpoint

Web search engine for finding pages, retrieving similar pages, crawling, and dedicated code search across the open web for AI agents.

Chat API

Type

Spec

Rate

Search (1-25 results)

per search

$0.0060

Search (26-100 results)

per search

$0.030

Content (Text/Highlights/Summary)

per page/feature

$0.0060

Code Search

per 1k tokens

$0.0060

Gemini 2.5 Flash TTS

Google | Released May 20, 2025 | Audio Generation

Proprietary Endpoint

Low-latency text-to-speech with single- and multi-speaker voices and controllable style, accent, and expressive tone for production apps.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$1.50

Output

per 1M generated tokens

$30.00

Gemini 2.5 Pro TTS

Google | Released May 20, 2025 | Audio Generation

Proprietary Endpoint

High-quality TTS preview for podcasts, audiobooks, and customer support, with expressive multi-speaker voices across 23+ languages.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$3.00

Output

per 1M generated tokens

$60.00

Gemini 3.1 Flash TTS

Google | Released Apr 13, 2026 | Audio Generation

Proprietary Endpoint

Highly controllable TTS with new Audio Tags for precise style, tone, pace, and delivery across narration, assistants, and voice apps.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$2.60

Output

per 1M generated tokens

$52.00

Gemma 3 27B

Google | Released Mar 10, 2025 | Ctx 128K | Text Generation

Proprietary Endpoint

Open-source vision-language model with 128K context, 140+ languages, improved math/reasoning, structured outputs, and function calling.

Chat API

Type

Spec

Rate

Per Message

fixed

$0.0040

Web Search (Linkup)

per call when invoked

$0.013

GLM TTS

Z.ai | Released Dec 11, 2025 | Audio Generation

Native Inference

LLM-based text-to-speech with zero-shot voice cloning from 3-10s of audio and emotion-expressive, controllable output via multi-reward RL.

Chat API

Type

Spec

Rate

Fast (INT8)

per 1k characters

$0.20

Quality (FP16)

per 1k characters

$0.21

GPTZero

GPTZero | Tools & Agents

Proprietary Endpoint

Deep-learning detector that flags portions of text likely generated by AI versus human, classifying content as entirely human, AI, or mixed.

Chat API

Type

Spec

Rate

Text Scan

per 1,000 words

$0.39

HappyHorse 1.0

Alibaba Cloud | Singapore | Released May 6, 2026 | Video Generation

Proprietary Endpoint

Video model offering Text-to-Video, Image-to-Video, Reference-to-Video, and Video Edit modes with high-fidelity, motion-smooth output.

Chat API

Type

Spec

Rate

All Modes 720P

per second

$0.14

All Modes 1080P

per second

$0.24

Hunyuan Image 3

Tencent | Released Sep 28, 2025 | Image Generation

Proprietary Endpoint

Open-source text-to-image model on a multimodal Mixture-of-Experts architecture with photorealistic detail and strong multilingual text rendering.

Chat API

Type

Spec

Rate

Standard

per image

$0.13

Hunyuan Video 1.5

Tencent | Released Nov 20, 2025 | Video Generation

Native Inference

Save up to 19%

8.3B-parameter video model with native 720p output (upscalable to 1080p), strong motion coherence, and bilingual prompt understanding up to 10s.

Chat API

Type

Spec

Rate

480p

per second

$0.061 (list price $0.075)

720p

per second

$0.29

1080p (upscaled)

per second

$0.67

Janus-Pro DeepSeek

DeepSeek | Released Jan 27, 2025 | Image Generation

Proprietary Endpoint

Autoregressive framework on the Janus Pro 7B model that unifies multimodal understanding and image generation in one architecture.

Chat API

Type

Spec

Rate

Image Generation

per image

$0.030

Image Analysis

per uploaded image

$0.030

Kling O3

Kling AI | Released Feb 5, 2026 | Video Generation

Proprietary Endpoint

Video model in Standard or Pro modes with Text-to-Video, Image-to-Video, Reference-to-Video, editing, native sound, and multi-scene transitions.

Chat API

Type

Spec

Rate

Standard T2V/I2V

per second

$0.168

Standard T2V/I2V Sound

per second

$0.224

Standard Video Input

per second

$0.252

Pro T2V/I2V

per second

$0.224

Pro T2V/I2V Sound

per second

$0.280

Pro Video Input

per second

$0.336

4K T2V/I2V/Ref

per second

$0.525

Kling v3 Motion Control

Kling AI | Video Generation

Proprietary Endpoint

Kling 3.0 model that transfers motion from a reference video onto a character from a reference image, with Standard 720p and Pro 1080p tiers.

Chat API

Type

Spec

Rate

Standard (720p)

per second

$0.14

Pro (1080p)

per second

$0.18

Linkup Deep Search

Linkup | Ctx 100K | Research & Search

Proprietary Endpoint

Iterative AI search that keeps querying when initial results are insufficient, returning more comprehensive answers than Standard mode.

Chat API

Type

Spec

Rate

Per Message

fixed

$0.13

Linkup Standard

Linkup | Ctx 100K | Research & Search

Proprietary Endpoint

AI-powered web search with detailed overviews and answers, faster than Deep Search. Ranks #1 on the OpenAI SimpleQA benchmark.

Chat API

Type

Spec

Rate

Per Message

fixed

$0.013

Mistral Medium 3

Mistral AI | Released May 7, 2025 | Ctx 130K | Text Generation

Proprietary Endpoint

Cost-efficient language model offering strong reasoning and multimodal performance for general production workloads at competitive latency.

Chat API

Type

Spec

Rate

Per Message

fixed

$0.015

Web Search (Linkup)

per call when invoked

$0.013

Mistral Medium 3.1

Mistral AI | Released Aug 12, 2025 | Ctx 131K | Text Generation

Proprietary Endpoint

Enterprise-grade model with strong reasoning, coding, and STEM performance, supporting hybrid, on-prem, and in-VPC deployments.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.52

Output

per 1M generated tokens

$2.60

Web Search (Linkup)

per call when invoked

$0.013

Mistral Small 3.1

Mistral AI | Released Mar 17, 2025 | Ctx 128K | Text Generation

Proprietary Endpoint

24B-parameter multimodal model with 128K context for image analysis, programming, math, and multilingual tasks, tuned for efficient local inference.

Chat API

Type

Spec

Rate

Per Message

fixed

$0.0019

Web Search (Linkup)

per call when invoked

$0.013

Mistral Small 4

Mistral AI | Released Mar 16, 2026 | Ctx 256K | Text Generation

Proprietary Endpoint

Hybrid model unifying Instruct, Reasoning (Magistral), and Devstral families: 40% lower completion time and 3x throughput vs Small 3.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.15

Output

per 1M generated tokens

$0.60

Standard Web Search

per call

$0.084

Premium Web Search

per call

$0.140

Code Interpreter

per call

$0.084

Image Generation

per image

$0.280

MOSS Video and Audio

OpenMOSS | Released Jan 29, 2026 | Video Generation

Native Inference

Open-source 32B MoE foundation model that generates synchronized video and audio in one inference step with precise dual-tower lip-sync.

Chat API

Type

Spec

Rate

360p Video

per video

$0.17

720p Video

per video

$2.82

T2V Fast

additional fee

$0.065

T2V Quality

additional fee

$0.13

Nova Lite 1.0

Amazon | Released Dec 3, 2024 | Ctx 300K | Text Generation

Proprietary Endpoint

Low-cost multimodal foundation model for text, images, and video on a 300K context (up to ~30 min video), tuned for speed and affordability.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.069

Output

per 1M generated tokens

$0.28

Cached input

per 1M tokens

$0.0386

Web Search (Linkup)

per call when invoked

$0.013

Nova Lite 2

Amazon | Released Dec 2, 2025 | Ctx 1M | Text Generation

Proprietary Endpoint

Fast, cost-effective multimodal reasoning model for text, images, documents, and video on a 1M context (long docs and ~90 min clips).

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.38

Output

per 1M generated tokens

$3.16

Cached input

per 1M tokens

$0.2128

Web Search (Linkup)

per call when invoked

$0.013

Nova Micro 1.0

Amazon | Released Dec 3, 2024 | Ctx 128K | Text Generation

Proprietary Endpoint

Text-only foundation model tuned for ultra-low latency and cost on 128K context. Strong for summarization, translation, and chat, with a 44% cache discount.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.040

Output

per 1M generated tokens

$0.16

Cached input

per 1M tokens

$0.0224

Web Search (Linkup)

per call when invoked

$0.013
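
The 44% cache discount quoted for Nova Micro 1.0 can be reproduced from the input and cached-input rates in the table above. The helper name below is hypothetical; the arithmetic is simply one minus the ratio of the two rates.

```python
from decimal import Decimal


def cache_discount_percent(input_rate: str, cached_rate: str) -> Decimal:
    """Return how much cheaper cached input is than regular input, in percent."""
    return (1 - Decimal(cached_rate) / Decimal(input_rate)) * 100


# Nova Micro 1.0: $0.040 per 1M input tokens, $0.0224 per 1M cached tokens.
print(f"{cache_discount_percent('0.040', '0.0224'):.0f}%")  # 44%
```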

Nova Premier 1.0

Amazon | Released Apr 30, 2025 | Ctx 1M | Text Generation

Proprietary Endpoint

Most capable model in the family. Multimodal text/image/video on a 1M context with chain-of-thought reasoning across tools and data sources.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$3.00

Output

per 1M generated tokens

$15.00

Cached input

per 1M tokens

$1.68

Web Search (Linkup)

per call when invoked

$0.013

Nova Pro 1.0

Amazon | Released Dec 3, 2024 | Ctx 300K | Text Generation

Proprietary Endpoint

Multimodal foundation model balancing accuracy, speed, and cost for text, images, and video on 300K context (up to ~30 min video).

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$2.40

Output

per 1M generated tokens

$9.60

Latency Optimized Input

per 1M prompt tokens

$3.00

Latency Optimized Output

per 1M generated tokens

$12.00

Web Search (Linkup)

per call when invoked

$0.013

OpenAI Whisper 1

OpenAI | Released Sep 21, 2022 | Transcription

Proprietary Endpoint

Whisper-1 speech-to-text transcription trained on multilingual supervised audio, with a 25 MB upload limit per file.

Chat API

Type

Spec

Rate

Per Minute of Audio

per minute

$0.030

Perplexity Advanced Deep Research

Perplexity | Research & Search

Proprietary Endpoint

Institutional-grade research powered by Claude Opus 4.6 reasoning, with maximum depth, enhanced tool access, and extensive source coverage.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$12.00

Output

per 1M generated tokens

$60.00

Web Search Call

per call

$0.012

URL Fetch Call

per call

$0.0012

Perplexity Deep Research

Perplexity | Ctx 128K | Research & Search

Proprietary Endpoint

Research model for multi-step retrieval, synthesis, and reasoning, autonomously searching, reading, and evaluating sources across complex topics.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$4.80

Output

per 1M generated tokens

$19.00

Citation Tokens

per 1M tokens

$4.80

Reasoning Tokens

per 1M tokens

$7.20

Search Queries

per query

$0.012
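Deep Research is metered on five separate line items, so a single task's bill is a sum across them. The sketch below adds them up with the rates listed above; it assumes the meters are simply additive, and the function name and token counts are made up for illustration.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)

# Perplexity Deep Research rates from the table above (USD).
TOKEN_RATES = {  # per 1M tokens
    "input": Decimal("4.80"),
    "output": Decimal("19.00"),
    "citation": Decimal("4.80"),
    "reasoning": Decimal("7.20"),
}
SEARCH_QUERY_RATE = Decimal("0.012")  # per query


def deep_research_cost(tokens: dict[str, int], search_queries: int) -> Decimal:
    """Sum the line items. Assumes the meters are additive and independent."""
    token_charge = sum(
        (TOKEN_RATES[kind] * count for kind, count in tokens.items()),
        Decimal(0),
    ) / PER_MILLION
    return token_charge + SEARCH_QUERY_RATE * search_queries


# Hypothetical usage figures for one research task.
example = deep_research_cost(
    {"input": 10_000, "output": 5_000, "citation": 20_000, "reasoning": 100_000},
    search_queries=30,
)
print(example)
```

With these made-up counts, reasoning tokens and search queries account for most of the total, while the prompt and final answer contribute comparatively little.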

Perplexity Pro Search

Perplexity | Research & Search

Proprietary Endpoint

Sonar Pro as an agentic researcher: chains web searches, fetches full pages, and streams live reasoning, adapting strategy for complex queries.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$7.80

Output

per 1M generated tokens

$39.00

Base Fee (Low Context)

per request

$0.036

Base Fee (Medium Context)

per request

$0.047

Base Fee (High Context)

per request

$0.057

Perplexity Search

Perplexity | Research & Search

Proprietary Endpoint

Real-time web search with filtering by domain, language, date, and more. Returns search results, not LLM responses; no file uploads.

Chat API

Type

Spec

Rate

Search Request

per request

$0.0060

Perplexity Sonar

Perplexity | Ctx 127K | Research & Search

Proprietary Endpoint

Real-time web-connected search with accurate citations and customizable sources for up-to-date AI search integration in production apps.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$2.40

Output

per 1M generated tokens

$2.40

Base Fee (Low Context)

per request

$0.012

Base Fee (Medium Context)

per request

$0.019

Base Fee (High Context)

per request

$0.029
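For the Sonar models, a per-request base fee sits on top of the token charges. A minimal sketch of the combined cost, assuming the base fee is charged once per request and that the "context" level is chosen per request (neither is stated explicitly above); `sonar_cost` is a hypothetical helper.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)

# Perplexity Sonar rates from the table above (USD).
INPUT_RATE = Decimal("2.40")   # per 1M prompt tokens
OUTPUT_RATE = Decimal("2.40")  # per 1M generated tokens
BASE_FEE = {  # per request, keyed by context level
    "low": Decimal("0.012"),
    "medium": Decimal("0.019"),
    "high": Decimal("0.029"),
}


def sonar_cost(requests: int, prompt_tokens: int, output_tokens: int,
               context: str = "low") -> Decimal:
    """Total for `requests` identical calls.

    Assumption: each request pays the base fee once, plus its token charges.
    """
    token_charge = (
        prompt_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
    ) / PER_MILLION
    return requests * (BASE_FEE[context] + token_charge)


print(sonar_cost(1_000, prompt_tokens=500, output_tokens=300, context="low"))
print(sonar_cost(1_000, prompt_tokens=500, output_tokens=300, context="high"))
```

For short requests like these, the base fee makes up most of the bill, so the context level matters more than the token counts.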

Perplexity Sonar Pro

Perplexity | Ctx 200K | Research & Search

Proprietary Endpoint

Search-grounded model with double the citations and a larger context window, tuned for complex queries needing in-depth, nuanced answers.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$7.20

Output

per 1M generated tokens

$36.00

Base Fee (Low Context)

per request

$0.014

Base Fee (Medium Context)

per request

$0.024

Base Fee (High Context)

per request

$0.034

Perplexity Sonar Reasoning Pro

Perplexity | Ctx 128K | Research & Search

Proprietary Endpoint

Reasoning model built on the uncensored open-source R1-1776 with web search, outperforming leading search engines and LLMs on the SimpleQA benchmark.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$4.80

Output

per 1M generated tokens

$19.00

Base Fee (Low Context)

per request

$0.014

Base Fee (Medium Context)

per request

$0.024

Base Fee (High Context)

per request

$0.034

Pixverse v5

PixVerse | Released Aug 29, 2025 | Video Generation

Proprietary Endpoint

Cinematic video generation in Text-to-Video, Image-to-Video, and Transition modes with high detail, fluid motion, and lifelike animations.

Chat API

Type

Spec

Rate

360p/540p 5s

per video

$0.45

360p/540p 8s

per video

$0.90

720p 5s

per video

$0.60

720p 8s

per video

$1.20

1080p 5s

per video

$1.20

Pixverse v5.6

PixVerse | Released Jan 26, 2026 | Video Generation

Proprietary Endpoint

Generates videos from text or 1-2 frame image prompts up to 1080p, multiple aspect ratios, 5-10s durations, with optional synchronized audio.

Chat API

Type

Spec

Rate

360p/540p 5s no audio

per video

$0.40

360p/540p 5s audio

per video

$0.80

360p/540p 8s no audio

per video

$0.80

360p/540p 8s audio

per video

$1.60

360p/540p 10s no audio

per video

$0.88

360p/540p 10s audio

per video

$1.76

720p 5s no audio

per video

$0.65

720p 5s audio

per video

$1.30

720p 8s no audio

per video

$1.30

720p 8s audio

per video

$2.60

720p 10s no audio

per video

$1.43

720p 10s audio

per video

$2.86

1080p 5s no audio

per video

$0.75

1080p 5s audio

per video

$1.50

1080p 8s no audio

per video

$1.50

1080p 8s audio

per video

$3.00

Qwen Image 2.0

Alibaba Cloud | Singapore | Released Mar 3, 2026 | Image Generation

Proprietary Endpoint

Save up to 8%

Unified image generation and editing model with class-leading complex Chinese/English text rendering, realistic textures, and multi-image fusion.

Chat API

Type

Spec

Rate

Standard

per image

$0.0322 (was $0.035)

Pro

per image

$0.069 (was $0.075)

Qwen3.5 Flash

Alibaba Cloud | Singapore | Released Feb 24, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Save up to 10%

Vision-language model with hybrid linear-attention plus sparse MoE, 1M context, and fast multimodal text/image/video inference.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.090 (was $0.10)

Output

per 1M generated tokens

$0.368 (was $0.40)

Web search

per request when enabled

$0.015

Image Search

per call

$0.012

Qwen3.5 Flash

Alibaba Cloud | China | Released Feb 24, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Save up to 68%

Vision-language model with hybrid linear-attention plus sparse MoE, 1M context, and fast multimodal text/image/video inference.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=128K: $0.029 (was $0.090); 128K-256K: $0.115; 256K-1M: $0.172

Output

per 1M generated tokens

<=128K: $0.287 (was $0.368); 128K-256K: $1.147; 256K-1M: $1.72

Web search

per query when enabled

$0.01

Qwen3.5 Omni Flash

Alibaba Cloud | Singapore | Released Mar 30, 2026 | Ctx 256K | Text Generation

Proprietary Endpoint

Cost-efficient omni-modal model handling text, image, audio, and video, with up to 3 hours of audio and 1 hour of video across 90+ languages.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

per 1M prompt tokens $0.40; per 1M prompt tokens $3.00

Output

per 1M generated tokens

per 1M generated tokens $2.20; per 1M generated tokens $11.90

Web search

per request

$0.015

Qwen3.5 Omni Plus

Alibaba Cloud | Singapore | Released Mar 30, 2026 | Ctx 256K | Text Generation

Proprietary Endpoint

Flagship omni-modal model for text, image, audio, and video. 3h audio, 1h video, 90+ input and 30+ output languages, 55 voice timbres.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

per 1M prompt tokens $1.40; per 1M prompt tokens $11.00

Output

per 1M generated tokens

per 1M generated tokens $8.30; per 1M generated tokens $44.00

Web search

per request

$0.015

Qwen3.5 Plus

Alibaba Cloud | Singapore | Released Feb 16, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Save up to 10%

Multimodal model with hybrid architecture for efficient deep thinking and visual understanding across text, image, and video on a 1M context.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=256K: $0.36 (was $0.40); 256K-1M: $1.08 (was $1.20)

Output

per 1M generated tokens

<=256K: $2.21 (was $2.40); 256K-1M: $6.62 (was $7.20)

Web search

per request when enabled

$0.015

Image Search

per call

$0.012

Qwen3.5 Plus

Alibaba Cloud | China | Released Feb 16, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Save up to 69%

Multimodal model with hybrid architecture for efficient deep thinking and visual understanding across text, image, and video on a 1M context.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=128K: $0.115 (was $0.36); 128K-256K: $0.287 (was $0.36); 256K-1M: $0.573 (was $1.08)

Output

per 1M generated tokens

<=128K: $0.688 (was $2.21); 128K-256K: $1.72 (was $2.21); 256K-1M: $3.44 (was $6.62)

Web search

per query when enabled

$0.01

Qwen3.6 Max Preview

Alibaba Cloud | Singapore | Released Apr 20, 2026 | Ctx 256K | Text Generation

Proprietary Endpoint

Largest preview variant in the 3.6 series (text-only): improved coding agent execution, stronger front-end skills, and broader long-tail knowledge.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=128K: $1.31; 128K-256K: $1.97

Output

per 1M generated tokens

<=128K: $7.88; 128K-256K: $11.82

Web search

per request when enabled

$0.020

Qwen3.6 Plus

Alibaba Cloud | Singapore | Released Apr 2, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Vision-language model with major upgrades over 3.5: agentic and front-end coding, multimodal recognition, OCR, and object localization.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=256K: $0.50; 256K-1M: $2.00

Output

per 1M generated tokens

<=256K: $3.00; 256K-1M: $6.00

Web search

per request when enabled

$0.026

Image Search

per call

$0.0208

Qwen3.6 Plus

Alibaba Cloud | China | Released Apr 2, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Save up to 45%

Vision-language model with major upgrades over 3.5: agentic and front-end coding, multimodal recognition, OCR, and object localization.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=256K: $0.276 (was $0.50); 256K-1M: $1.101 (was $2.00)

Output

per 1M generated tokens

<=256K: $1.651 (was $3.00); 256K-1M: $6.602

Web search

per query when enabled

$0.01

Qwen3 Max

Alibaba Cloud | Singapore | Released Sep 23, 2025 | Ctx 256K | Text Generation

Proprietary Endpoint

Save up to 10%

256K-context flagship with major improvements in reasoning, instruction following, and multilingual support, plus higher coding/math accuracy.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=32K: $1.08 (was $1.20); 32K-128K: $2.16 (was $2.40); 128K-256K: $2.70 (was $3.00)

Output

per 1M generated tokens

<=32K: $5.52 (was $6.00); 32K-128K: $11.04 (was $12.00); 128K-256K: $13.80 (was $15.00)

Web search

per request

$0.015

Qwen3 Max Preview

Alibaba Cloud | Singapore | Released Sep 5, 2025 | Ctx 256K | Text Generation

Proprietary Endpoint

Save up to 20%

Preview release with major gains over the 2.5 series in Chinese-English understanding, complex instructions, multilingual ability, and tool use.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=32K: $1.08 (was $1.20); 32K-128K: $2.16 (was $2.40); 128K-256K: $2.70 (was $3.00)

Output

per 1M generated tokens

<=32K: $4.80 (was $6.00); 32K-128K: $9.60 (was $12.00); 128K-256K: $12.00 (was $15.00)

Qwen3 Max Thinking

Alibaba Cloud | Singapore | Released Sep 23, 2025 | Ctx 256K | Text Generation

Proprietary Endpoint

Save up to 10%

Reasoning model with adaptive tool use (search, memory, code interpreter) and test-time scaling for higher accuracy on complex tasks.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=32K: $1.08 (was $1.20); 32K-128K: $2.16 (was $2.40); 128K-256K: $2.70 (was $3.00)

Output

per 1M generated tokens

<=32K: $5.52 (was $6.00); 32K-128K: $11.04 (was $12.00); 128K-256K: $13.80 (was $15.00)

Web search

per request

$0.015

Qwen3 Rerank

Alibaba Cloud | Singapore | Released Jun 5, 2025 | Ctx 4000 | Rerankers

Proprietary Endpoint

Semantic document reranker. Sorts up to 500 candidates per query by relevance, supports 100+ languages, and accepts a custom sorting instruction.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.10

Seed 2.0 Code

ByteDance | Malaysia | Released Feb 14, 2026 | Ctx 256K | Text Generation

Proprietary Endpoint

Coding-tuned 256K-context model with strong front-end results and multilingual programming support for AI coding tools and agents.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=128K: $0.40; 128K-256K: $0.80

Output

per 1M generated tokens

<=128K: $2.40; 128K-256K: $4.80
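Several models on this page, Seed 2.0 Code among them, price by prompt-length tier. The sketch below shows one plausible reading of such a table. It rests on two assumptions the page does not confirm: the prompt length selects a single tier whose rates apply to every token in the request (rather than marginal, bracket-style billing), and "K" means 1,000 tokens. `tiered_cost` is a hypothetical name.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)

# Seed 2.0 Code tiers from the table above:
# (max prompt tokens, input rate, output rate), rates in USD per 1M tokens.
# Assumption: "K" means 1,000 tokens.
TIERS = [
    (128_000, Decimal("0.40"), Decimal("2.40")),
    (256_000, Decimal("0.80"), Decimal("4.80")),
]


def tiered_cost(prompt_tokens: int, output_tokens: int) -> Decimal:
    """Price one request under whole-request tiering.

    Assumption: the prompt length picks one tier, and that tier's rates apply
    to all input and output tokens of the request.
    """
    for max_prompt, input_rate, output_rate in TIERS:
        if prompt_tokens <= max_prompt:
            return (
                prompt_tokens * input_rate + output_tokens * output_rate
            ) / PER_MILLION
    raise ValueError("prompt exceeds the 256K context window")


print(tiered_cost(10_000, 1_000))    # short prompt, first tier
print(tiered_cost(200_000, 4_000))   # long prompt, second tier
```

Under this reading, a prompt that crosses the 128K boundary roughly doubles the cost of the whole request, so trimming context just below a tier edge can be worthwhile.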

Seed 2.0 Lite

ByteDance | Malaysia | Released Feb 14, 2026 | Ctx 256K | Text Generation

Proprietary Endpoint

Balanced general-purpose model for high-frequency enterprise workloads: information processing, content, search, and data analysis.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=128K: $0.31; 128K-256K: $0.62

Output

per 1M generated tokens

<=128K: $2.50; 128K-256K: $5.00

Seed 2.0 Mini

ByteDance | Malaysia | Released Feb 14, 2026 | Ctx 256K | Text Generation

Proprietary Endpoint

Latency-focused multimodal model with 256K context, four reasoning effort modes, and image/video understanding for high-concurrency use.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=128K: $0.12; 128K-256K: $0.24

Output

per 1M generated tokens

<=128K: $0.50; 128K-256K: $1.00

Seed 2.0 Pro

ByteDance | Malaysia | Released Feb 14, 2026 | Ctx 256K | Text Generation

Proprietary Endpoint

Flagship general model with 256K context for complex reasoning, multimodal understanding, structured generation, and tool-augmented execution.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

<=128K: $0.63; 128K-256K: $1.26

Output

per 1M generated tokens

<=128K: $3.79; 128K-256K: $7.58

Seedance 2.0 Fast

ByteDance | Malaysia | Released Feb 12, 2026 | Video Generation

Proprietary Endpoint

Speed-optimized 2.0 video variant for cinematic clips with native audio sync, camera control, and stable motion at lower cost per render.

Chat API

Type

Spec

Rate

T2V/I2V 480P

per second

$0.122

T2V/I2V 720P

per second

$0.260

Video Input 480P

per second

$0.284

Video Input 720P

per second

$0.610

Seedance 2.0 Pro

ByteDance | Malaysia | Released Feb 12, 2026 | Video Generation

Proprietary Endpoint

Multimodal video model for cinematic output from text, image, audio, or video inputs, with stable motion and consistent characters.

Chat API

Type

Spec

Rate

T2V/I2V 480P

per second

$0.139

T2V/I2V 720P

per second

$0.300

T2V/I2V 1080P

per second

$0.749

T2V/I2V 4K

per second

$1.555

Video Input 480P

per second

$0.342

Video Input 720P

per second

$0.736

Video Input 1080P

per second

$1.841

Video Input 4K

per second

$3.732

Seedream 5.0 Lite

ByteDance | Malaysia | Released Feb 13, 2026 | Image Generation

Proprietary Endpoint

Unified multimodal image model that reasons through prompts before rendering, producing high-resolution and consistent edits and brand visuals.

Chat API

Type

Spec

Rate

Standard

per image

$0.0350

Seedream 5.0 Pro

ByteDance | Malaysia | Released Jul 8, 2026 | Image Generation

Proprietary Endpoint

Premium Seedream image model for detailed text-to-image, single-image edits, and multi-reference fusion with 1K and 2K output.

Chat API

Type

Spec

Rate

Output up to 2.36MP

per image

$0.075

Output above 2.36MP

per image

$0.150

Extra input image

per input image after the first

$0.005

SoulX Podcast

Soul AI Lab | Released Oct 29, 2025 | Audio Generation

Native Inference

Open-source voice model for long-form, multi-speaker podcast dialogue with paralinguistic control (laughter, sighs) and zero-shot voice cloning.

Chat API

Type

Spec

Rate

Base

per 1k characters

$0.015

Dialect

per 1k characters

$0.015

Stable Audio 2.0

Stability AI | Released Apr 3, 2024 | Audio Generation

Proprietary Endpoint

Generates audio up to 3 minutes from text prompts, supporting text-to-audio and audio-to-audio with adjustable duration, steps, and CFG scale.

Chat API

Type

Spec

Rate

Base Cost

per generation

$0.58

Per Step Cost

per step

$0.00

Stable Audio 2.5

Stability AI | Released Sep 10, 2025 | Audio Generation

Proprietary Endpoint

Up-to-3-minute audio from text with text-to-audio, audio-to-audio, and audio inpainting for music production, sound design, and remixing.

Chat API

Type

Spec

Rate

Generation

per generation

$0.68

SVI 2.0 Pro

VITA-Group / EPFL | Released Dec 26, 2025 | Video Generation

Native Inference

Stable Video Infinity 2.0 Pro on WAN 2.2: extends still images into theoretically infinite-length video while keeping consistent character IDs.

Chat API

Type

Spec

Rate

480p Video

per second

$0.057

720p Video

per second

$0.17

T2V Fast

additional fee

$0.065

T2V Quality

additional fee

$0.13

Tavily Research

Tavily | Research & Search

Proprietary Endpoint

Multi-search research assistant that explores a topic, analyzes sources, and produces a detailed research report with citations.

Chat API

Type

Spec

Rate

Mini

average per task

~$1.19

Pro

average per task

~$2.75

Tavily Search

Tavily | Research & Search

Proprietary Endpoint

Web search with crawl, extract, and URL mapping for fast, structured retrieval across pages and domains for downstream pipelines.

Chat API

Type

Spec

Rate

Search (Basic/Fast/Ultra-Fast)

per search

$0.0096

Search (Advanced)

per search

$0.019

Search (Advanced + Answer)

per search

$0.029

Extract (Basic)

per 5 URLs

$0.0096

Extract (Advanced)

per 5 URLs

$0.019

Crawl/Map (basic)

per 10 pages

$0.0096

Text Embedding v4

Alibaba Cloud | Singapore | Released Jun 4, 2025 | Ctx 8192 | Embeddings

Proprietary Endpoint

Multilingual text embedding with selectable output dimensions (64–2048). Up to 8,192 tokens per input.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.07

Tongyi Embedding Vision Flash

Alibaba Cloud | Singapore | Released Sep 23, 2025 | Ctx 1024 | Embeddings

Proprietary Endpoint

Speed-optimised multimodal embedding: same shape as Vision Plus, with image/video tokens at one-third the price.

Chat API

Type

Spec

Rate

Text input

per 1M tokens

$0.09

Image / video input

per 1M tokens

$0.03

Tongyi Embedding Vision Plus

Alibaba Cloud | Singapore | Released Sep 23, 2025 | Ctx 1024 | Embeddings

Proprietary Endpoint

Multimodal embedding producing independent vectors for text, image, and video inputs.

Chat API

Type

Spec

Rate

Text input

per 1M tokens

$0.09

Image / video input

per 1M tokens

$0.09

Wan 2.6

Alibaba Cloud | Singapore | Released Jan 12, 2026 | Video Generation

Proprietary Endpoint

Save up to 10%

Multimodal video generation model for cinematic, multi-shot stories with native audio-visual sync (lip-sync, dialogue, music, SFX).

Chat API

Type

Spec

Rate

Standard 720P

per second

$0.09 (was $0.10)

Standard 1080P

per second

$0.138 (was $0.15)

Flash 720P (audio)

per second

$0.045 (was $0.050)

Flash 720P (no audio)

per second

$0.0225 (was $0.0250)

Flash 1080P (audio)

per second

$0.069 (was $0.0750)

Flash 1080P (no audio)

per second

$0.0345 (was $0.03750)
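Each Wan 2.6 rate above carries two prices. Assuming the higher figure is the undiscounted price and the lower one is what is actually charged, the "Save up to 10%" badge can be checked with a few lines of arithmetic. The dictionary layout and the `savings` helper are illustrative only.

```python
from decimal import Decimal

# Wan 2.6 per-second rates from the table above as (higher price, lower price).
# Assumption: the higher figure is the undiscounted price.
RATES = {
    "Standard 720P": ("0.10", "0.09"),
    "Standard 1080P": ("0.15", "0.138"),
    "Flash 720P (audio)": ("0.050", "0.045"),
    "Flash 720P (no audio)": ("0.0250", "0.0225"),
    "Flash 1080P (audio)": ("0.0750", "0.069"),
    "Flash 1080P (no audio)": ("0.03750", "0.0345"),
}


def savings(undiscounted: Decimal, discounted: Decimal) -> Decimal:
    """Fractional saving relative to the undiscounted price."""
    return (undiscounted - discounted) / undiscounted


discounts = {
    name: savings(Decimal(high), Decimal(low)) for name, (high, low) in RATES.items()
}
best = max(discounts.values())

for name, fraction in discounts.items():
    print(f"{name}: {fraction:.0%}")
print(f"largest saving: {best:.0%}")
```

The 720P rows come out at 10% and the 1080P rows at 8%, so the badge reflects the best case rather than every row.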

Wan 2.7

Alibaba Cloud | Singapore | Released Apr 26, 2026 | Video Generation

Proprietary Endpoint

Multimodal video model supporting T2V, I2V, video editing, and reference-to-video, with high-fidelity output from text, image, or video inputs.

Chat API

Type

Spec

Rate

All Modes 720P

per second

$0.10

All Modes 1080P

per second

$0.150

Wan2.7 Image

Alibaba Cloud | Singapore | Released Apr 1, 2026 | Image Generation

Proprietary Endpoint

Image generation and editing companion model: text-to-image, bounding-box edits, and cohesive image sets, with up to 4K output on Pro.

Chat API

Type

Spec

Rate

Standard

per image

$0.030

Pro

per image

$0.075

Whisper Large v3 Turbo

OpenAI | Released Oct 1, 2024 | Transcription

Native Inference

Save up to 17%

Controlled Whisper Large v3 Turbo transcription with multilingual ASR, translation, VAD, timestamps, subtitles, hotwords, and decoder controls.

Chat API

Type

Spec

Rate

Controlled transcription

per minute of audio

$0.005 (was $0.006)

Manus

Manus | Tools & Agents

Proprietary Endpoint

Autonomous AI agent that turns a high-level prompt into subtasks, calls tools and APIs, and delivers end-to-end results without manual orchestration.

Chat API

Type

Spec

Rate

Adaptive - Manus 1.6 Lite

per task

$1.44 - $2.63

Adaptive - Manus 1.6

per task

$2.89 - $5.25

Adaptive - Manus 1.6 Max

per task

$5.25 - $9.19

Grok Imagine Video 1.5

xAI | Released Jun 16, 2026 | Video Generation

Proprietary Endpoint

Image-to-video model that animates a source image with prompt-guided motion, up to 15 seconds at 480p or 720p across seven aspect ratios.

Chat API

Type

Spec

Rate

Image input

per image

$0.05

480p

per second

$0.096

720p

per second

$0.168
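Grok Imagine Video 1.5 combines a flat image-input fee with a per-second rate. A minimal sketch of a per-clip estimate, assuming one image fee per source image plus duration times the resolution's rate; the 15-second cap comes from the model description, and `clip_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Grok Imagine Video 1.5 rates from the table above (USD).
IMAGE_INPUT_FEE = Decimal("0.05")  # per image
PER_SECOND = {"480p": Decimal("0.096"), "720p": Decimal("0.168")}
MAX_SECONDS = 15  # "up to 15 seconds" in the model description


def clip_cost(seconds: int, resolution: str, input_images: int = 1) -> Decimal:
    """Estimate one clip's cost.

    Assumption: the image fee is charged per source image, and the
    per-second rate applies to the full clip duration.
    """
    if not 0 < seconds <= MAX_SECONDS:
        raise ValueError(f"seconds must be between 1 and {MAX_SECONDS}")
    return input_images * IMAGE_INPUT_FEE + seconds * PER_SECOND[resolution]


print(clip_cost(10, "720p"))
print(clip_cost(15, "480p"))
```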

MiMo V2.5 Pro

Xiaomi | Released Apr 27, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Top-tier model for agentic workflows, complex software engineering, and long-horizon tasks, sustaining work across 1000+ tool calls on 1M context.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$2.175

Output

per 1M generated tokens

$4.35

Implicit cache read

per 1M cached input tokens

$0.018

Web search

per request when enabled

$0.015

MiMo V2.5

Xiaomi | Released Apr 22, 2026 | Ctx 1M | Text Generation

Proprietary Endpoint

Multimodal model with native visual and audio understanding on a 1M context, designed to reason and act across modalities in agentic workflows.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.70

Output

per 1M generated tokens

$1.40

Implicit cache read

per 1M cached input tokens

$0.014

Web search

per request when enabled

$0.015

HappyHorse 1.1

Alibaba Cloud | Singapore | Released Jun 22, 2026 | Video Generation

Proprietary Endpoint

Text, image, and reference-to-video in one model. Cinematic motion, character consistency across up to 9 references, and synchronized native audio.

Chat API

Type

Spec

Rate

720p

per second

$0.14

1080p

per second

$0.18

Seedance 2.0 Mini

ByteDance | Malaysia | Released Jun 15, 2026 | Video Generation

Proprietary Endpoint

The fastest, most affordable Seedance 2.0 tier for short cinematic clips with native audio, camera control, and image or video inputs at 480p and 720p.

Chat API

Type

Spec

Rate

T2V/I2V 480P

per second

$0.070

T2V/I2V 720P

per second

$0.150

Video Input 480P

per second

$0.167

Video Input 720P

per second

$0.359

Step 3.7 Flash

StepFun | International | Released May 28, 2026 | Ctx 256K | Text Generation

Proprietary Endpoint

StepFun multimodal reasoning model with image and video input, tool calling, adjustable reasoning effort, and 256K context.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.20

Output

per 1M generated tokens

$1.15

Implicit cache read

per 1M cached input tokens

$0.04

Web Search (Linkup)

per call when invoked

$0.013

Step 3.5 Flash

StepFun | International | Released Feb 12, 2026 | Ctx 256K | Text Generation

Proprietary Endpoint

StepFun text reasoning model for agents, coding, tool calling, and long-context analysis.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.10

Output

per 1M generated tokens

$0.30

Implicit cache read

per 1M cached input tokens

$0.02

Web Search (Linkup)

per call when invoked

$0.013

Step 3.5 Flash 2603

StepFun | International | Released Apr 2, 2026 | Ctx 256K | Text Generation

Proprietary Endpoint

Agent-optimized Step 3.5 Flash variant with low and high reasoning effort modes.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.10

Output

per 1M generated tokens

$0.30

Implicit cache read

per 1M cached input tokens

$0.02

Web Search (Linkup)

per call when invoked

$0.013

StepAudio 2.5 Chat

StepFun | International | Ctx 256K | Text Generation

Proprietary Endpoint

StepFun audio and text conversation model with text output and paralinguistic understanding.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$1.43

Output

per 1M generated tokens

$3.57

Implicit cache read

per 1M cached input tokens

$0.29

Web Search (Linkup)

per call when invoked

$0.013

Step Image Edit 2

StepFun | International | Released Apr 29, 2026 | Image Generation

Proprietary Endpoint

StepFun image generation and image editing model for text-to-image and single-image edits.

Chat API

Type

Spec

Rate

Output image

per generated image

$0.003

StepAudio 2.5 TTS

StepFun | International | Released Apr 21, 2026 | Audio Generation

Proprietary Endpoint

Contextual StepFun text-to-speech model with natural-language voice direction and expressive delivery.

Chat API

Type

Spec

Rate

Synthesis

per 10,000 characters

$0.85

Step TTS 2

StepFun | International | Audio Generation

Proprietary Endpoint

StepFun text-to-speech model with official voices, custom cloned voices, and voice tag controls.

Chat API

Type

Spec

Rate

Synthesis

per 10,000 characters

$0.40

StepAudio 2.5 ASR

StepFun | International | Released Apr 24, 2026 | Transcription

Proprietary Endpoint

StepFun streaming speech recognition model for Chinese and English audio transcription.

Chat API

Type

Spec

Rate

Transcription

per hour of audio

$0.022

Seed 2.1 Turbo

ByteDance | Malaysia | Released Jul 13, 2026 | Ctx 256K | Text Generation

Proprietary Endpoint

Next-generation coding and agent model with engineering-grade code delivery, long-horizon autonomy, and 256K multimodal understanding.

Chat API

Type

Spec

Rate

Input

per 1M prompt tokens

$0.63

Output

per 1M generated tokens

$3.13

Qwen Audio 3.0 TTS

Alibaba Cloud | Singapore | Released Jul 20, 2026 | Audio Generation

Proprietary Endpoint

Tiered speech synthesis with over 1,000 voices, 16 languages, 20 Chinese dialects, natural-language delivery direction, and inline emotion tags.

Chat API

Type

Spec

Rate

Plus synthesis

per 10,000 characters

$0.20

Flash synthesis

per 10,000 characters

$0.15
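Speech synthesis here is priced per 10,000 characters. The sketch below estimates the cost of a script on either tier, assuming billing is pro rata by character count and that Python's `len` approximates how the provider counts characters (the page does not say how multi-byte or markup characters are counted). `synthesis_cost` is a hypothetical name.

```python
from decimal import Decimal

# Qwen Audio 3.0 TTS rates from the table above, USD per 10,000 characters.
RATE_PER_10K = {"plus": Decimal("0.20"), "flash": Decimal("0.15")}
BLOCK = Decimal(10_000)


def synthesis_cost(text: str, tier: str) -> Decimal:
    """Estimate synthesis cost for `text`.

    Assumption: charges scale linearly with len(text), with no rounding up
    to whole 10,000-character blocks.
    """
    return len(text) * RATE_PER_10K[tier] / BLOCK


script = "a" * 25_000  # stand-in for a 25,000-character script
print(synthesis_cost(script, "plus"))
print(synthesis_cost(script, "flash"))
```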

Pay-as-you-go AI model pricing.

Metric

Specification

Price (per 1M Tokens)

Metric

Specification