Kimi K2.7 Code: a new open source model of Moonshot AI for agent coding

◷ 12 min read 6/13/2026

Main chat

A chat for vibe coders: news, guides, live cases, marketplace, and finding executors.

Kimi K2.7 Code: a new open source model of Moonshot AI for agent coding - обложка

On June 12, 2026, the same week that Anthropic released and almost immediately recalled Claude Fable 5, China’s Moonshot AI released the Kimi K2.7 Code, an open-source model specialized in agent programming. The release states a 30% reduction in reasoning token consumption compared to the previous version and improvements on several internal benchmarks.

We understand what this model is, what its real characteristics are, and what independent tests say about it - there are nuances.

Context: Where did K2 come from? 7

Moonshot AI, founded in 2023 by Tsinghua University graduate Zhilin Yang, built a company around the Kimi chatbot. The transition to open weights began with the K2 series in mid-2025, and since then the pace of releases has remained high: the base model K2 was released in July 2025, K2 Thinking with improved reasoning in November 2025, K2.5 in January 2026, K2.6 in April 2026. K2.7-Code, released in June 2026, is the fifth major release in less than a year.

An important detail about the K2.6: When the model was released in April, it ranked first in the weekly OpenRouter rankings, a ranking based on real-world developer solutions for routing requests through APIs rather than self-declared benchmarks. This gives the K2.7 a certain amount of credibility at the start – the previous version was actually used, rather than just looking pretty in the press release.

Technical specifications

Kimi K2.7 Code is a Mixture-of-Experts model with 1 trillion total parameters and 32 billion active parameters per token. The architecture uses 384 experts, of which 8 are selected for the token plus 1 total, 61 layers (including 1 dense layer), MLA for attention and SwiGLU for feed-forward path. The MoonViT visual encoder adds 400 million parameters for image and video processing.

Context window – 256 thousand tokens. That’s noticeably less than the 1 million Claude flagships have — a difference worth considering when choosing a model for very long context tasks.

The model is available under the Modified MIT License – the license allows commercial use with authorship for large-scale deployments. The scales are placed on Hugging Face, and Moonshot claims support for deployment through vLLM, SGLang, and KTransformers.

K2.7-Code does not support non-thinking mode. Forced thinking with preserve_thinking enabled is a feature designed for multi-step agent workflow in coding, but it also means that for trivial queries you can not turn off “thinking” and save on tokens in this way.

What Moonshot Says: Benchmarks

The Moonshot team published six lines of benchmarks comparing the K2.7-Code to its predecessor K2.6 and its competitors.

Improvements regarding K2. 6

According to the company, K2.7-Code improves on all six published indicators relative to K2.6, with the largest gains in coding-specific tests:

+21.8% on Kimi Code Bench v2
+11.0%* on Program Bench
+31.5% on MLS Bench Lite
approximately -30% for reasoning tokens

All three benchmarks – Kimi Code Bench v2, Program Bench and MLS Bench Lite – are proprietary benchmarks of Moonshot AI.

Comparison with Frontier Models

The picture here is less clear. According to a table published by Moonshot, GPT-5.5 remains ahead of K2.7-Code on all six rows of the table. Claude Opus 4.8 leads the K2.7-Code in five of the six metrics.

But there are exceptions that Moonshot highlights separately: K2.7-Code outperforms Opus 4.8 on the MCP Mark Verified at 81.1 vs 76.4. On the MLS Bench Lite, the K2.7-Code is close to GPT-5.5 (35.1 vs. 35.5), though still behind Opus 4.8 at 42.8.

MCP Mark Verified is a benchmark that checks the correctness of calling tools through the Model Context Protocol: something at the intersection of CI checks, ticket updates and file editing in one loop. Excellence is logically combined with the positioning of the model as an agent.

Methodological reservation from Moonshot itself

Important: K2.7-Code and K2.6 were tested with thinking mode enabled, while GPT-5.5 was tested in Codex at xhigh, and Claude Opus 4.8 was tested in Claude Code at xhigh. These are primary numbers from the vendor, not from an independent leaderboard – that is, the test conditions for different models are not identical, and the comparison should be taken with this caveat.

What Independent Observers Say

Here begins the most interesting – and the most important for those who plan to use the model in production.

Benchmarks have not yet been independently verified

At the time of release, the model was not sent to DeepSWE, an independent benchmark for coding, which gives a spread of 70 points between models (versus a spread of 30 points in the SWE-Bench Pro), which makes it a more distinguishing signal for teams configuring model routing systems. In other words, the reported increases are confirmed only by Moonshot itself.

Reaction of practitioners to K2.6 as a guide

Since there are few independent tests of the K2.7, it is useful to see how the community reacted to the K2.6, a model with a similar architecture that was released in April. User Hacker News described it as "dirt cheap on OpenRouter for how good it is." Simon Willison conducted a live demonstration of the generation of animated SVG/HTML through OpenRouter and called the model practical and fast. According to unconfirmed reports from the discussions, the K2.6 was used as a backend for the composer-2 in Cursor — an integration that is harder to fabricate than the Vendor benchmark.

There was also a skeptical camp: one commentator on Hacker News wrote that, when used in real life, the impression was "so-so despite strong benchmarks" - a recurring complaint about underperformance in specific areas. According to BenchLM, the Claude Opus 4.7 scored 94 against 68 for the Kimi K2.5 overall.

This illustrates the common pattern with the Kimi K2 series models: strong in-house benchmarks and real price/throughput popularity on OpenRouter – but mixed experiences when compared to top-end closed models on specific tasks.

Prices and access

Through the Moonshot API

On the official API, Kimi K2.7-Code is valued at $0.95 per million input tokens, $4.00 per million output tokens, and $0.19 per million cache-hit tokens.

If you apply the stated reduction of reasoning tokens by 30% to the account output, for agent workflow with a large share of withdrawal, this is a direct savings on inferencing, provided that the statement is confirmed in practice.

Access methods

Moonshot API - primary, OpenAI-compatible API, base URL platform.moonshot.ai, model id kimi-k2.7-code
OpenRouter – routing like moonshotai/kimi-k2.7-code, handy if you already have centralized billing and fallbacks there
Cloudflare Workers AI - edge-inferencing as @cf/moonshotai/kimi-k2.7-code
Vercel AI Gateway - for teams standardized on routing via Vercel
Self-hosting – Hugging Face weights, deployment via vLLM, SGLang or KTransformers

Kimi Code is a separate product

In addition to the API, Moonshot offers Kimi Code, a terminal coding agent with subscription plans starting at $19 per month. This is a separate billing from using the API. Also announced is “6x High-Speed Mode” as an option that will appear later – the speed-tier pattern already familiar from closed models: for example, Anthropic’s fast mode on Opus 4.8 runs 2.5 times faster for double the price.

Strategically, it reads like this: Moonshot doesn’t just publish weights, it builds a subscription platform around them – the same model + plan playbook Anthropic uses with Claude Code.

Compatibility with existing agents

Moonshot documentation has historically supported the use of Kimi models inside third-party agents—including Claude Code, Cline, and Roo Code—through compatible API endpoints. This lowers the threshold for trial use to change the environment variable, without rewriting the infrastructure.

Strengths and weaknesses: an honest assessment

Strengths

Open weights under a commercially usable license – you can self-host, inspect, not depend on one closed vendor.

Compatibility with existing agent tools through an OpenAI-compatible API is a low trial run threshold.

The notable superiority in MCP tools is particularly useful for agent pipelines with CI, ticketing and multi-step editing.

The price is significantly lower than the flagship closed models – $0.95/$4.00 against, for example, $5/$30 for GPT-5.5.

Weaknesses

This is a specialized Code-model - at the start there is no corresponding general-purpose "Kimi K2.7" or Instruct-version, that is, the model is sharpened for engineering tasks, not for a wide chat.

The 256K context is noticeably smaller than 1M for the Claude flagships.

Standard third-party benchmarks have not been released at the time of release - performance statements are so far based only on Moonshot figures.

Hard for self-hosting - a trillion parameters require a serious infrastructure, even with 32B active.

Forced thinking without disabling means that you can’t run a model in cheap mode without reasoning for trivial calls – this may not be effective for simple tasks in a mixed flow of queries.

How it fits into the June 2026 picture

The release of K2.7-Code is also notable for timing. The same week began with the release of the Claude Fable 5, the most powerful public Anthropic model, which the U.S. government ordered turned off after three days on national security grounds.

Against this background, an open model with weights on Hugging Face, available for self-hosting without depending on the cloud API of one vendor, gets an additional practical argument: for teams who do not want to risk the sudden unavailability of the model by a third party decision, open weights are a form of infrastructure independence, even if the pure benchmark model does not lead.

What to do now if you are considering K2.7-Code

If your stack is already using Kimi K2.6 through OpenRouter or a direct API, switching to K2.7-Code is reduced to replacing the model ID, and it’s worth trying on the traffic part, comparing the actual consumption of tokens on your tasks with the stated decrease of 30%.

If you choose a model from scratch for agent coding and the price for the result is important - K2.7-Code should be included in the list of candidates for A/B testing, but not as the only option, especially before the emergence of independent benchmarks such as DeepSWE.

If the key criterion is deep work with MCP tools in long agent cycles, the advantage of K2.7-Code on MCP Mark Verified (81.1 vs. 76.4 for Opus 4.8 according to Moonshot) makes the model worthy of a separate test in this scenario.

If you want a context of more than 256K tokens or a general-purpose model without forced thinking – K2.7-Code is not suitable for architecture, and you should look at K2.6 (if you need an Instruct variant) or other flagships.

Outcome

Kimi K2.7-Code is a logical step in the rapid cycle of releases of Moonshot AI: specialization in agent coding, a stated reduction in the consumption of reasoning tokens by 30% and a clear focus on the MCP tool-use, where the model according to Moonshot’s own data overtakes even Claude Opus 4.8.

At the same time, the key figures for today are the first-party data of the vendor, not sent to independent benchmarks such as DeepSWE, with an obvious methodological reservation about the different conditions for testing different models. The K2.6 story shows that previous models in the series have had real popularity on OpenRouter with mixed direct-use experiences - it's reasonable to expect a similar picture here until independent data emerges.

The practical conclusion is that the model is worth testing, especially if you are already in the Kimi ecosystem or the price per token is critical for your agent pipeline – but don’t make a decision about migration based on the benchmark table from the press release.