GPT-5.5: OpenAI rolls out its smartest model, and it already works for you

◷ 10 min read 4/23/2026 by: Alexey, VibeCode

On April 23, 2026, OpenAI unveiled GPT-5.5, and this is not just another "plus 2% on benchmarks" release. According to the company, it is "a new class of intelligence for real work," and judging by the numbers, that is less a marketing slogan than a real generational change.

The model is already rolling out in ChatGPT and Codex for Plus, Pro, Business and Enterprise subscribers. GPT-5.5 Pro, a heavier version tuned for the hardest tasks, is also available to Pro, Business and Enterprise users. API access is promised "very soon."

What GPT-5.5 is and why people are talking about it

The key idea of GPT-5.5 is agency. Instead of explaining every step, you hand the model a messy, multi-part task ("fix this bug", "analyze 6 months of data", "build an application from this screen"), and it:

plans a solution,
uses tools (browser, terminal and others),
checks its own output,
works through ambiguities,
and does not give up until the task is closed.
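
The loop described above can be sketched in a few lines of Python. This is a minimal illustration of the plan, act, check pattern, not OpenAI's implementation: `run_agent`, the `propose_step` and `check` callables, and the toy `echo` tool are all hypothetical names introduced here.

```python
from typing import Callable, Optional

# Hypothetical types for illustration only; none of these names come from OpenAI.
Tool = Callable[[str], str]
Step = tuple[str, str]  # (tool name, tool argument)


def run_agent(
    task: str,
    propose_step: Callable[[str, list[str]], Optional[Step]],
    tools: dict[str, Tool],
    check: Callable[[str, list[str]], bool],
    max_steps: int = 10,
) -> list[str]:
    """Plan a step, call a tool, check the result, repeat until done."""
    history: list[str] = []
    for _ in range(max_steps):
        step = propose_step(task, history)  # plan the next action
        if step is None:                    # planner has nothing left to try
            break
        name, arg = step
        history.append(tools[name](arg))    # use a tool and record its output
        if check(task, history):            # verify the work so far
            break                           # task is closed
    return history


# Toy usage: a "planner" that just numbers its attempts.
def toy_planner(task: str, history: list[str]) -> Optional[Step]:
    return ("echo", f"attempt {len(history) + 1}")


result = run_agent(
    "fix this bug",
    toy_planner,
    tools={"echo": lambda s: s},
    check=lambda task, history: len(history) >= 3,
)
print(result)
```

In a real agent the planner and the checker would both be model calls; the `max_steps` cap is an assumption added here so the sketch always terminates.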

OpenAI reports particularly strong gains in four areas:

Agentic coding: writing and debugging code on long-running tasks.
Computer use: the model actually "sees the screen", clicks, types and switches between windows.
Knowledge work: documents, spreadsheets, presentations, discussion.
Scientific research: problems in biology, mathematics and bioinformatics.

What is especially nice is that per-token latency stayed at the GPT-5.4 level even though the model is noticeably smarter. And on the same task in Codex it spends fewer tokens, not more.

Benchmarks: figures that are really impressive

OpenAI compared GPT-5.5 and GPT-5.5 Pro with GPT-5.4, GPT-5.4 Pro, Claude Opus 4.7 and Gemini 3.1 Pro. Here are the key metrics:

Benchmark	GPT-5.5	GPT-5.4	GPT-5.5 Pro	GPT-5.4 Pro	Claude Opus 4.7	Gemini 3.1 Pro
Terminal-Bench 2.0	82.7%	75.1%	—	—	69.4%	68.5%
Expert-SWE (internal)	73.1%	68.5%	—	—	—	—
SWE-Bench Pro	58.6%	57.7%	—	—	64.3%*	54.2%
GDPval (wins or ties)	84.9%	83.0%	82.3%	82.0%	80.3%	67.3%
OSWorld-Verified	78.7%	75.0%	—	—	78.0%	—
BrowseComp	84.4%	82.7%	90.1%	89.3%	79.3%	85.9%
Toolathlon	55.6%	54.6%	—	—	—	48.8%
Tau2-bench Telecom	98.0%	92.8%	—	—	—	—
FrontierMath Tier 1–3	51.7%	47.6%	52.4%	50.0%	43.8%	36.9%
FrontierMath Tier 4	35.4%	27.1%	39.6%	38.0%	22.9%	16.7%
CyberGym	81.8%	79.0%	—	—	73.1%	—
BixBench	80.5%	74.0%	—	—	—	—
GeneBench	25.0%	19.0%	33.2%	25.6%	—	—

* Anthropic itself reported signs of memorization on a subset of SWE-Bench Pro tasks.

Particularly significant are Terminal-Bench 2.0 (+7.6 percentage points over GPT-5.4), FrontierMath Tier 4 (+8.3 percentage points) and Tau2-bench Telecom, where the model scores 98% without prompt tuning, that is, it understands the problem "from the first word".
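
The gains quoted above follow directly from the table. A small sketch that recomputes them, using only scores copied from the table (the function name `gain_pp` is introduced here for illustration):

```python
# (GPT-5.5 score, GPT-5.4 score) in percent, copied from the benchmark table.
scores = {
    "Terminal-Bench 2.0": (82.7, 75.1),
    "FrontierMath Tier 4": (35.4, 27.1),
    "Tau2-bench Telecom": (98.0, 92.8),
}


def gain_pp(new: float, old: float) -> float:
    """Absolute gain in percentage points, rounded to one decimal."""
    return round(new - old, 1)


gains = {name: gain_pp(new, old) for name, (new, old) in scores.items()}
for name, gain in gains.items():
    print(f"{name}: +{gain} percentage points")
```

Note that these are absolute differences in percentage points, not relative improvements.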

Agentic coding: the main feature of the release

This is the part where GPT-5.5 really pulls ahead. The model holds the context of large systems, reasons about ambiguous bugs, tests hypotheses with tools and carefully carries changes through the entire codebase.

A few testimonials from early testers:

Dan Shipper (Every): "The first coding model to have real conceptual clarity." He had a real post-launch bug that took one of his best engineers several days to untangle and ended with part of the system being rewritten. GPT-5.4 could not pull that off. GPT-5.5 produced the same refactoring as the engineer.
Pietro Schirano (MagicPath): GPT-5.5 merged a branch with hundreds of front-end and refactoring changes into main (which had also changed a lot) in one sitting, in about 20 minutes.
NVIDIA engineer: "Losing access to GPT-5.5 is like having a limb amputated."
Fabian Hedin (CTO, Lovable): auth flow, real-time sync, multi-file edits: everything started working on the first try, without endless iterations.

In Codex, the model actually plays the role of an engineer: it writes code, refactors, debugs, tests and validates, and it anticipates what will need to be checked or tested without explicit requests from the user.

Knowledge work: a model that can use a computer

The same abilities that make GPT-5.5 strong at code make it powerful in everyday computer work. The model grasps intent better, so it carries the whole knowledge-work cycle on its own: find → understand → do → check → deliver a finished artifact.

In Codex, it is better than GPT-5.4 at generating documents, spreadsheets and presentations. Combined with computer use, this creates the feeling that "the model is really using the computer with me": it sees the screen, clicks, types and switches between applications.

What's telling is that OpenAI is already using it internally:

More than 85% of employees use Codex every week (engineering, finance, communications, marketing, data science, product).
The Comms team used 6 months of speaking-request data to build a scoring framework and a Slack agent that handles low-risk requests.
Finance processed 24,771 K-1 forms (71,637 pages) and finished the work 2 weeks faster than last year.
A Go-to-Market employee automated weekly business reports, saving 5-10 hours a week.

Two modes are available in ChatGPT: GPT-5.5 Thinking (for complex tasks, but fast) and GPT-5.5 Pro (for the hardest work: law, business, data science, research).

GPT-5.5 as a co-scientist

This part is interesting too. The model doesn't just "answer complex questions"; it carries the entire research cycle: hypothesis → data → test → interpretation → next step.

Examples:

GeneBench (multi-stage scientific analysis in genetics and quantitative biology): GPT-5.5 is markedly ahead of GPT-5.4. Tasks that amount to multi-day projects for experts, the model tackles on its own.
BixBench (real-world bioinformatics): the leading score among published models.
Ramsey numbers: an internal version of GPT-5.5 with a custom harness found a new proof of an asymptotic fact about off-diagonal Ramsey numbers. The proof was then verified in Lean. This is no longer "coding"; it is a real mathematical contribution.
Derya Unutmaz (Jackson Laboratory): analyzed a gene expression dataset (62 samples, ~28,000 genes) and received a detailed research report with key questions and insights, work that would have taken a team months.
Bartosz Naskręcki (mathematician): with a single prompt, got an algebraic geometry application in 11 minutes that visualizes the intersection of quadric surfaces and converts it to a Weierstrass model.

Next-generation inference efficiency

To keep GPT-5.5 at the latency level of GPT-5.4, OpenAI rethought inference as a single integrated system rather than a set of isolated optimizations.

The model was co-designed for, trained on and served on NVIDIA GB200 and GB300 NVL72 systems. And the amusing part is that Codex and GPT-5.5 itself helped write the infrastructure that serves it. One specific case: Codex analyzed weeks of production traffic and wrote custom heuristics for partitioning and load balancing. The result is a 20% increase in token generation speed.
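
OpenAI has not published these heuristics, so the following is only a generic illustration of what "partitioning and load balancing" means in practice. It is a textbook greedy scheme (largest request first, always to the least-loaded worker); the function name `assign_least_loaded` and the cost numbers are hypothetical.

```python
import heapq


def assign_least_loaded(request_costs: list[int], num_workers: int) -> list[list[int]]:
    """Greedy balancing: largest requests first, each to the least-loaded worker."""
    # Min-heap of (current load, worker index); the lightest worker is popped first.
    heap = [(0, worker) for worker in range(num_workers)]
    heapq.heapify(heap)
    buckets: list[list[int]] = [[] for _ in range(num_workers)]
    for cost in sorted(request_costs, reverse=True):
        load, worker = heapq.heappop(heap)
        buckets[worker].append(cost)
        heapq.heappush(heap, (load + cost, worker))
    return buckets


# Hypothetical per-request costs (for example, expected token counts in thousands).
buckets = assign_least_loaded([8, 7, 6, 5, 4], num_workers=2)
print([sum(bucket) for bucket in buckets])
```

A production heuristic would be fitted to observed traffic patterns; this sketch only shows the basic shape of the problem.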

Prices and availability

Product	Available to	API price (per 1M tokens) / context
GPT-5.5 in ChatGPT	Plus, Pro, Business, Enterprise	—
GPT-5.5 Pro in ChatGPT	Pro, Business, Enterprise	—
GPT-5.5 in Codex	Plus, Pro, Business, Enterprise, Edu, Go	400K context
gpt-5.5 API	Responses and Chat Completions (soon)	$5 input / $30 output, 1M context
gpt-5.5-pro API	soon	$30 input / $180 output

Codex has a Fast mode: tokens are generated 1.5 times faster at 2.5x the price. For the API, Batch and Flex cost half the standard price, and Priority costs 2.5x the standard price.
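
To make the API tiers concrete, here is a small cost estimator built only from the per-1M-token prices and multipliers quoted above. The function name `request_cost` and the example token counts are assumptions made for illustration, and the Codex Fast mode is not modeled because it is not an API tier.

```python
# (input price, output price) in USD per 1M tokens, as quoted in this article.
PRICES = {"gpt-5.5": (5.00, 30.00), "gpt-5.5-pro": (30.00, 180.00)}
# Tier multipliers relative to the standard price, as quoted in this article.
TIER_MULTIPLIER = {"standard": 1.0, "batch": 0.5, "flex": 0.5, "priority": 2.5}


def request_cost(model: str, input_tokens: int, output_tokens: int,
                 tier: str = "standard") -> float:
    """Estimated cost in USD for one request."""
    price_in, price_out = PRICES[model]
    base = (input_tokens * price_in + output_tokens * price_out) / 1_000_000
    return base * TIER_MULTIPLIER[tier]


# Hypothetical request: 200K input tokens and 20K output tokens on gpt-5.5.
for tier in TIER_MULTIPLIER:
    print(tier, round(request_cost("gpt-5.5", 200_000, 20_000, tier), 2))
```

For this example the standard tier comes to $1.60, Batch and Flex to $0.80, and Priority to $4.00.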

Yes, GPT-5.5 is more expensive than GPT-5.4 per token. But it is also smarter and spends fewer tokens on the same task, so in Codex the difference is often offset.
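
The offset argument is simple arithmetic: a pricier model is cheaper per task whenever its token savings outweigh its price increase. The sketch below shows the break-even condition. The article does not state GPT-5.4 prices, so the old price and both token counts in the example are placeholders, not real figures.

```python
def break_even_token_ratio(old_price: float, new_price: float) -> float:
    """Largest new_tokens / old_tokens ratio at which the new model costs no more per task."""
    if old_price <= 0 or new_price <= 0:
        raise ValueError("prices must be positive")
    return old_price / new_price


def is_cheaper_per_task(old_price: float, new_price: float,
                        old_tokens: int, new_tokens: int) -> bool:
    """True if the new model's total cost for the task is strictly lower."""
    return new_tokens * new_price < old_tokens * old_price


NEW_PRICE = 30.0  # GPT-5.5 output price per 1M tokens, from this article
OLD_PRICE = 20.0  # placeholder only, NOT the actual GPT-5.4 price

print(round(break_even_token_ratio(OLD_PRICE, NEW_PRICE), 3))
print(is_cheaper_per_task(OLD_PRICE, NEW_PRICE, old_tokens=100_000, new_tokens=60_000))
```

With these placeholder numbers, the new model breaks even if it uses about two thirds of the tokens, and it comes out cheaper at 60% of the tokens.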

Safety: OpenAI's toughest set of safeguards to date

OpenAI rated GPT-5.5's biological/chemical and cybersecurity capabilities as High under its Preparedness Framework. The model did not reach the Critical level for cyber, but the increase over GPT-5.4 is noticeable.

What they added:

Stricter classifiers for potentially dangerous cyber queries (users may find them annoying at first; OpenAI promises to refine them over time).
Separate protections against repeated abuse.
Trusted Access for Cyber: verified defenders (e.g., organizations protecting critical infrastructure) can access cyber-permissive models like GPT-5.4-Cyber with fewer restrictions.
Working with government partners to protect critical infrastructure such as power grids, water supply, and tax data.

Before release, the model went through a full cycle of evaluations, red-teaming (internal and external), targeted tests in biology and cyber, and feedback from ~200 trusted early-access partners.

Bottom line: what does it mean in practice

GPT-5.5 is not just another step. This is the moment when agentic coding and computer use turn from demos into a working tool you can hand real multi-hour tasks.

Key takeaways:

State-of-the-art agentic coding: Terminal-Bench 82.7%, Expert-SWE 73.1%, and real-world reviews along the lines of "like having a limb amputated".
GPT-5.5 matches GPT-5.4 on latency but is noticeably smarter.
Higher token efficiency: in Codex, the same task often costs less than with GPT-5.4.
Real computer use: OSWorld-Verified 78.7%, and the model actually clicks through interfaces.
Scientific contribution: a new proof about Ramsey numbers, plus the BixBench and GeneBench results. This is a co-scientist, not a toy.
Pricing in line with capabilities: $5/$30 per 1M tokens for the base version, $30/$180 for Pro.

If you live in Codex or write agents, chances are you have already felt it. If not, it is time to see which parts of your routine GPT-5.5 can take off your hands.

Official announcement: openai.com/index/introducing-gpt-5-5