How to Use the Kimi K2.7 Code API

Jun 12, 2026

EmpirioLabs AI

Disclosure: This article was written with AI assistance and reviewed by EmpirioLabs AI.

Kimi K2.7 Code is Moonshot AI's new agentic coding model, released on June 12, 2026. It is a trillion-parameter Mixture-of-Experts model that activates roughly 32B parameters per token and is tuned specifically for code generation, debugging, tool use, and long multi-step engineering workflows. On Moonshot's own coding benchmark its score rises from 50.9 (K2.6) to 62.0, and it scores 81.1 percent on MCPMark Verified, ahead of several frontier closed models on agentic tool use.

The model is live on EmpirioLabs today with a 262,144-token context window, always-on reasoning, function calling, JSON mode structured output, and text, image, and video inputs. You can try it in the playground or call it through the OpenAI-compatible API. The full spec lives on the Kimi K2.7 Code model page and the API docs.

Pricing

Billing is strictly usage-based, with no subscription. Input and output tokens are metered per token, and each web search adds a small per-call fee that is charged only when a search actually runs. Current per-token rates are listed on the model page. Reasoning is always on for this model and reasoning tokens are billed as output tokens, so set max_tokens with that in mind.
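As a rough illustration of the billing arithmetic above, the sketch below estimates the cost of one request from its usage counts. The function name and every rate in it are hypothetical placeholders, not EmpirioLabs prices; real rates come from the model page. It assumes the reported completion token count already includes reasoning tokens, since reasoning is billed as output.

```python
def estimate_request_cost(prompt_tokens, completion_tokens, searches,
                          input_rate_per_mtok, output_rate_per_mtok,
                          search_fee):
    """Estimate one request's cost, in the same currency as the rates.

    completion_tokens is assumed to include reasoning tokens, because the
    article states that reasoning is billed as output. All rates are
    supplied by the caller; none of them are real prices.
    """
    input_cost = prompt_tokens / 1_000_000 * input_rate_per_mtok
    output_cost = completion_tokens / 1_000_000 * output_rate_per_mtok
    search_cost = searches * search_fee  # charged only for searches that ran
    return input_cost + output_cost + search_cost


# Hypothetical rates, purely for illustration.
example = estimate_request_cost(
    prompt_tokens=2_000,
    completion_tokens=6_000,  # reasoning plus final answer
    searches=1,
    input_rate_per_mtok=1.0,
    output_rate_per_mtok=4.0,
    search_fee=0.01,
)
print(f"estimated cost: {example:.4f}")
```

Because reasoning shares the output budget, a short final answer can still carry a sizeable output cost; plugging in the token counts from a response's usage field gives a quick sanity check.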

Quickstart

The Kimi K2.7 Code API is OpenAI-compatible, so the official OpenAI SDKs work once you point the base URL at EmpirioLabs:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_EMPIRIOLABS_API_KEY",
    base_url="https://api.empiriolabs.ai/v1",
)

response = client.chat.completions.create(
    model="kimi-k2-7-code",
    messages=[
        {"role": "user", "content": "Write a Python function that merges overlapping intervals."}
    ],
)

print(response.choices[0].message.reasoning_content)  # the model's reasoning
print(response.choices[0].message.content)            # the final answer

Streaming, function calling, JSON mode, the Anthropic-style /v1/messages endpoint, and the /v1/responses endpoint all work out of the box.

Things to know before you build

A few operational details we confirmed while bringing the model up:

  • Thinking is always on. Every response includes reasoning_content ahead of the final answer, and it cannot be disabled. Reasoning counts toward output tokens and toward your max_tokens limit, so leave headroom: the API defaults to a generous output budget and accepts up to 131,072 output tokens per request.
  • Sampling is fixed. The model service runs pinned sampling settings, so temperature, top_p, and penalty overrides are accepted but ignored rather than rejected. Your existing OpenAI-style code works unchanged.
  • Web search is built in. Set "tool_web_search": true on any chat request and the model runs its hosted web search tool itself: it decides when to search, reads live results, and cites sources in the answer. Each invoked search adds a small per-search fee, billed only when a search actually runs and reported in usage.tool_usage.web_search.
  • Tool calls carry reasoning. When you run your own function-calling loops, replay the assistant message with its reasoning_content field intact; the model service requires the current turn's reasoning to stay in context during multi-step tool calling.
  • It is genuinely multimodal. Image and video inputs work through standard OpenAI content arrays, which makes it practical to debug from screenshots or screen recordings.
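The points above can be sketched as a few small request-building helpers. This is a minimal sketch under stated assumptions, not the platform's reference code: the helper names (chat_kwargs, image_message, append_tool_round) are hypothetical, the tool_web_search flag is passed through the OpenAI SDK's extra_body argument because it is not a standard OpenAI parameter, and the assistant message is assumed to be available as a plain dict.

```python
import json

MODEL = "kimi-k2-7-code"


def chat_kwargs(messages, web_search=False):
    """Build keyword arguments for client.chat.completions.create()."""
    kwargs = {"model": MODEL, "messages": messages}
    if web_search:
        # Non-standard request fields go through the SDK's extra_body.
        kwargs["extra_body"] = {"tool_web_search": True}
    return kwargs


def image_message(text, image_url):
    """Build a user message with text plus one image (OpenAI content array)."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


def append_tool_round(history, assistant_message, tool_results):
    """Return a new history with the assistant turn and its tool results.

    The assistant dict is copied whole so reasoning_content stays in
    context. tool_results maps each tool_call id to a JSON-serializable
    result produced by your own code.
    """
    new_history = list(history) + [dict(assistant_message)]
    for call in assistant_message.get("tool_calls") or []:
        new_history.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(tool_results[call["id"]]),
        })
    return new_history


# Offline demonstration with a hand-written assistant turn.
history = [{"role": "user", "content": "How many tests fail?"}]
assistant_turn = {
    "role": "assistant",
    "content": "",
    "reasoning_content": "I should run the test suite first.",
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "run_tests", "arguments": "{}"},
    }],
}
history = append_tool_round(history, assistant_turn, {"call_1": {"failed": 2}})
print(json.dumps(history, indent=2))
```

In a live loop, the assistant dict would typically come from the SDK response (for example via response.choices[0].message.model_dump(), assuming the extra reasoning_content field is preserved there), and the next request would be client.chat.completions.create(**chat_kwargs(history)).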

Summary

Kimi K2.7 Code brings frontier-level agentic coding to a usage-based, per-token API. Start in the playground, read the docs, or grab an API key and point your OpenAI SDK at https://api.empiriolabs.ai/v1.
