~/wiki / spravka / gpt-5-reasoning-effort-low-medium-high-xhigh

Reasoning effort in GPT-5.4 and GPT-5.5: when to use low, medium, high and xhigh

Main chat

A chat for vibe coders: news, guides, live cases, marketplace, and finding executors.

$ cd section/ $ join vibe dev
Reasoning effort in GPT-5.4 and GPT-5.5: when to use low, medium, high and xhigh - обложка

If you are working with GPT-5.4 or GPT-5.5 through an API, one of the parameters determines the result more strongly than the wording of the prompt is reasoning.effort. It depends on how deeply the model “thinks” before answering, how much it will cost and how much you have to wait.

Most developers either ignore this parameter (leaving default) or put high on everything just in case - and both approaches are equally suboptimal. In this article, we analyze each level separately, with real numbers for price and latency, and give clear selection criteria.


What is Reasoning Effort

reasoning.effort controls the computational budget that the model spends on “thinking” before forming a final answer. Reasoning models generate reasoning tokens – internal tokens that the model uses to “think”: break down a request into parts and consider different approaches before forming a response.

Key technical point: reasoning tokens are billed as part of output tokens. There is no separate markup – but the higher the effort, the more tokens are generated and the more you pay. A request with high effort can generate twice as many tokens as low for the same task – and you pay for all those tokens.


Five levels: What each means

Supported values are model-specific and may include none, minimal, low, medium, high and xhigh. Not all models support the entire set – it is worth checking the documentation of a particular model before choosing a setup.

none - without reasoning

The model behaves like an unreasoning one: responds immediately, without internal “thinking.” The fastest and cheapest option.

When to use: Tasks that do not require reasoning or chains of tool calls - easy voice cues, quick information search, classification.

low – effective reasoning

Minimum level of reasoning with emphasis on speed. The model is still “thinking”, but briefly.

When to use: for latency-sensitive tasks – but if tools, planning, searching or multi-step decisions are involved, evaluate low, not none, because none may be too limited for such scenarios.

medium – balanced point (default)

GPT-5.5 uses medium reasoning effort. This is the recommended starting point for the balance of quality, reliability, latency and cost.

If you didn’t specify reasoning.effort at all, the model uses this layer. For most tasks, the API is the correct default choice.

  • When to use:** Basic level for anything that is not explicitly low/none or high/xhigh.

high for complex agency tasks

A high level of reasoning designed for complex agent tasks requiring serious thinking in situations where latency is not critical.

When to use:** batch processing (code review, document analysis, data extraction) – here you do not block the user waiting, so additional latency does not matter, and the improvement in accuracy accumulates on hundreds of elements.

xhigh – maximum depth for the most complex tasks

xhigh is designed for the most complex asynchronous agent tasks or for evals that test the limits of model intelligence.

xhigh was added as a level of reasoning effort starting with models after the GPT-5.1 Codex Max - earlier models do not support it at all.

When to use: High-risk single queries – codebase security audit, complex migration planning, new algorithm development. This is where increased computing pays off.


Real numbers: latency and cost by level

Here begins the most important thing – concrete data, not general formulations.

Time to First Token (Time to First Token)

The time to the first token at xhigh for GPT-5.5 is about 115 seconds on the Responses API – this is not a typo. If the product interface is designed for a streaming response within five seconds, xhigh cannot be put on the main user path.

Cost

Calculations on the same benchmark (Artificial Analysis Intelligence Index) at different effort levels for GPT-5.5:

Уровень effort Сгенерировано токенов Стоимость прогона
medium (дефолт) ~23 773 (на задачу, ProfBench) базовая
high ~45 млн токенов суммарно $2 159
xhigh ~75 млн токенов суммарно $3 357

For comparison, GPT-5.4 at xhigh on the same benchmark generated about 120 million tokens in total at a cost of $ 2,851 - even more than GPT-5.5 on high.

A request for xhigh can cost 3-5 times more than the same request for low – this should be taken into account in calculating the budget for the project, especially when batch processing a large number of tasks.

Basic API prices (per 1M tokens)

Модель Вход Выход Кэшированный вход
GPT-5.4 $2.50 $15
GPT-5.5 $5 $30 $0.50 (скидка 90%)
GPT-5.5 Pro $30 $180 без скидки на кэш

An important nuance: the GPT-5.5 Pro does not have a discount on cached input. If your workflow has a stable long preamble (repeated system prompt or context) – this removes one of the main reasons to keep long prefixes. For queries with more repetitive context, it is wiser to either use regular GPT-5.5 or shorten the context.


The main mistake: “higher = better” is not true

This is a key thesis worth remembering: a higher reasoning effort does not automatically mean a better outcome.

If a task contains conflicting instructions, weak stop criteria, or open access to tools, a higher effort can lead to overthinking, excessive search, or even poor output.

This is counterintuitive, but logical: a model with a long “time to think” with fuzzy instructions begins to think for you – to generate additional steps, rechecks, alternative ways that were not needed and take away from the essence of the task.

What to do instead of increasing effort

From the practical experience of developers, before raising the reasoning effort, it is worth checking and improving the following – and often it gives a better result than switching to high or xhigh:

*Clarity of instructions. * Be clear about what you need to get out.

Few-shots. A few examples of a desired outcome often help more than additional “reflection.”.

Structured output. Use the response format to limit and direct the response format.

Verification steps. Ask the model to check their work before the final answer.

Decomposition. Break down a complex task into simpler subtasks – instead of one huge xhigh query, multiple medium queries.

The advice from the community discussions is: first improve completion rules, verification cycles, and tool usage rules – and then raise the reasoning effort.


verbosity is the second parameter that is often confused with effort

In addition to reasoning.effort, GPT-5.x has a separate parameter text.verbosity with values low, medium, high. These are different things: effort affects how much the model thinks, verbosity affects how much the model writes in the final answer.

The verbosity parameter consistently scales both the length and depth of the model output, maintaining the correctness and quality of reasoning - without changing the prompt itself.

In practice, the difference is as follows:

  • low verbosity - minimal, functional result without unnecessary comments and structure
  • medium verbosity - explanatory comments, function structure, reproducibility elements are added
  • high verbosity - a complex, production-ready result with analysis of arguments, several approaches, runtime checks, notes on use

An important nuance for GPT-5.5: on this model, low verbosity gives proportionally more concise answers compared to the same low value on GPT-5.4. That is, the same parameter on different models gives a different degree of output compression.

Effort and verbosity - independent axes

These are two different settings that can be combined:

low verbosity high verbosity
low effort Быстро, кратко, минимум размышлений Быстро по размышлению, но многословный вывод
high effort Глубокое размышление, краткий ответ Глубокое размышление + развёрнутый ответ (дороже всего)

For example, for the boundary case classification problem, medium effort + low verbosity makes sense: the model thinks enough to correctly classify, but does not spend tokens explaining each solution.


Practical table of choice by type of task

Тип задачи Рекомендуемый effort Обоснование
Автокомплит, чат в реальном времени none или low Латентность критична, рассуждение не добавляет ценности
Классификация, простой лукап none Задача не требует многошагового мышления
Голосовые реплики (voice UI) none Пользователь ждёт мгновенного ответа
Общие API-задачи без явной специфики medium (дефолт) Лучший баланс качества/цены/скорости для большинства случаев
Ревью кода в pipeline (не блокирует пользователя) high Латентность не важна, точность накапливается на множестве задач
Анализ документов batch-режимом high Аналогично — асинхронная обработка
Аудит безопасности кодовой базы xhigh Высокая цена ошибки оправдывает 12x вычислений
Планирование сложной миграции xhigh Разовая задача с большой ценой неправильного решения
Рефакторинг с тонкими архитектурными инвариантами xhigh или альтернативная модель См. раздел про конкурентов ниже
Eval / бенчмаркинг моделей xhigh Цель — проверить пределы возможностей модели

Computer Use and Web Search: Special Effort Requirements

A separate technical nuance: the web search tool through the API requires a reasoning model – GPT-5 non-reflective surfaces do not provide access to this tool in the same way through the API.

For Computer Use, GPT-5.4 scored 75% on the OSWorld benchmark, exceeding the human expert baseline of 72.4%. This mode is enabled by transferring the computer_use tool type – and here it is also important to test different levels of effort, since interface management tasks are often multi-step and benefit from reasoning above low.


How does GPT-5.5 compare to competitors on different efforts

If you’re choosing between models for a specific task, here are the landmarks in several directions that are relevant at the time of the release of GPT-5.5.

Against Claude Opus 4.7 on complex tasks with the code: GPT-5.5 is inferior to Opus 4.7 on the SWE-bench Pro - 58.6% against 64.3%. If your script is closest to “fixing a real GitHub bug in 40 files,” Opus may be the default choice, regardless of the effort level of GPT-5.5.

Against price-optimized models: Alternatives like the DeepSeek V4 Pro cost about 7 times less than the standard GPT-5.5 and remain competitive on several smart benchmarks. If cost is the main factor in your project, and GPT-5.5 is not for unique features, but for overall quality, it is worth testing cheaper alternatives before switching to high / xhigh.

GPT-5.5 Pro and xhigh should be reserved for really frontline tasks – research, complex mathematics, multifile refactorings with fine invariants. Do not place such requests on the hot path of a high-loaded product.


Technical details for developers

Parallel tool calls and minimal effort

Parallel tool calls are not supported if reasoning_effort is installed in minimal – this is important to consider if your agent relies on multiple tools simultaneously.

System and developer messages

Modern reasoning models support system messages to facilitate migration. It is not recommended to use both a developer message and a system message in the same query – this can lead to conflicts in the processing of instructions.

Chat Completions vs Responses API

Reasoning models work with the max_completion_tokens parameter when using the Chat Completions API, whereas max_output_tokens is used when working with the Responses API. This is especially important with the high/xhigh effort – without an explicit limitation, the model can generate many more reasoning tokens than planned, and you will get an unexpected score.

Preamble to reduce perceived delay

For applications that are sensitive to latency, you can ask the model to generate a short preamble before moving on to deeper reasoning – this gives the user a faster first visible token, even if the final answer is generated longer.

Adaptive reasoning

Models reason adaptively within a given effort level, using fewer tokens for simple query parts and more for complex ones. That is, even on the high effort, the model does not spend the same number of tokens on each subtask – simple parts are processed faster.


Checklist before changing reasoning effort

plaintext
Run benchmark on medium (default) is the baseline for comparison
● Completion criteria checked: whether the stop conditions in the prompt are clear
● Instructions for Contradictions – Are There Conflicting Claims
● Few-shot examples instead of increased effort
Structured output through response format
Added step of self-check model before the final answer
● Complex task decomposed into subtasks
● Only now, if the above does not give a result – tested high/xhigh
● For high/xhigh, max output tokens are set to avoid uncontrolled account growth.
● For latency-sensitive pathways, xhigh is excluded from the hot product path
Verbosity is configured separately from effort for the format of the desired answer

Outcome

Reasoning effort is not a “quality slider” that needs to be twisted to the maximum for better results. It’s a trade-off slider between speed, price, and depth of thought, and for most API tasks, the correct value is medium, set by default.

Increase to high is only for asynchronous batch problems, where latency is not critical, and the accumulation of accuracy on a set of requests justifies the increased cost. Prior to xhigh – only for rare, costly errors: security auditing, architectural planning, frontline research – and never on the hot path of a product, given latency up to 115 seconds.

Before increasing the effort, it is always worth checking the cheaper levers: the clarity of the prompt, few-shot examples, structured output, verification and decomposition of the task. Often they give a greater increase in quality than the transition from medium to high - at a multiple of lower cost.

$ cd ../ ← back to Reference