MiniMax M3 – 1M tokens, multimodality and frontier coding in one open model

◷ 5 min read 6/2/2026 by: Alexey, VibeCode

Main chat

A chat for vibe coders: news, guides, live cases, marketplace, and finding executors.

MiniMax M3 – 1M tokens, multimodality and frontier coding in one open model - обложка

On June 1, 2026, the Shanghai-based MiniMax Lab officially released the MiniMax M3, the next model in the series after the M2.7. API is now operational, weights and technical report are promised within 10 days.

The main thesis of MiniMax: M3 is the first open model that simultaneously closes three frontier tasks: leader level coding, 1 million token context and native multimodal input. Up to this point, each of these three properties was either closed or individual.

Architecture: MSA instead of full attention

The central innovation of the M3 is **MSA (MiniMax Sparse Attention), a new sparse attention mechanism designed specifically for working with very long contexts.

The problem of standard full attention is well known: computational complexity increases quadratically with the length of the context. You doubled the context, you got four times more computations. At 1 million tokens, this makes inferens unacceptably slow and expensive.

MSA solves this at the operator level through the “KV outer gather Q” approach: KV cache blocks serve as an external loop, aggregating requests. Each block is read once, memory access is continuous, the arithmetic intensity is much higher. According to MiniMax, MSA is 4+ times faster than open implementations of sparse attention (Flash-Sparse-Attention, flash-moba) on the M3 head configuration.

In practice, this gives in the context of 1M tokens:

9.7x faster* prefill stage compared to M2
in 15.6x faster stage decoding compared to M2
computational costs per token – 1/20 from M2

On most benchmarks, MSA delivers results comparable to full attention – savings are achieved without a noticeable loss of quality.

Coding and agent tasks

That's the main focus of the M3. Results on agent benchmarks at the time of release:

Бенчмарк	MiniMax M3
SWE-Bench Pro	59.0%
Terminal Bench 2.1	66.0%
SWE-fficiency	34.8%
KernelBench Hard	28.8%
MCP Atlas	74.2%

On the SWE-Bench Pro, the M3 scores 59.0%, higher than the GPT-5.5’s 58.6%. For an open model, this is a notable result.

MCP Atlas (74.2%) is a benchmark for the agency use of tools through the MCP protocol. This is directly relevant for vibcoding: an agent with M3 must be better at multi-step tasks, tool calls, and error recovery.

Multimodality and computer management

The M3 accepts text, images and video input without additional adapters – natively, within a single architecture. A separate feature is the management of the desktop computer as an integrated function, not an external plugin.

The context of 1M tokens opens up practical scenarios that were previously inaccessible: download the entire repository, transfer several hours of video, work with large codebases without breaking down into chunks. The API supports up to 1 million context tokens, with a guaranteed minimum of 512K high quality tokens.

Availability and prices

M3 is available through MiniMax Code, token plans and a standard API. Open weights and technical report are promised in about 10 days on Hugging Face and GitHub.

Price via API: $0.60 per million input tokens is one of the lowest among frontier-class models.

Also available through OpenRouter (minimax/minimax-m3 model).

What this means for Vibcoders

Three practical implications of the M3 release:

1M tokens are ~750K words or a few hundred thousand lines of code. Most of the real projects are placed entirely. The agent sees the entire context at once, not a sliding window.

** Cheap frontier coding via API.** $0.60 per million tokens with results above GPT-5.5 on SWE-Bench – the price/quality ratio is noticeably better than most alternatives.

**Agent loops without degradation. ** One of the key problems agents have with long sessions is re-prefilling every time a tool is called. The MSA architecture is specifically optimized to address this problem – a “latency killer” in agent cycles where an agent repeatedly invokes tools in a growing context.

## Current status

The API is live, you can connect right now. Weights and technical report - within 10 days from June 1. When the weights come out, there will be the possibility of local start-up and fine tuning.

Model page: minimaxi.com OpenRouter: minimax/minimax-m3 Context: 1M tokens (512K warranty) Price: From $0.60 / 1M input tokens