Updated 2026-06-21 · Version 1.0

What are Reasoning Models?

Reasoning models are language models trained to spend extra computation 'thinking' before they answer — generating internal reasoning steps to solve harder problems in math, code and logic. They trade latency and cost for accuracy on complex, multi-step tasks. The key idea is test-time compute: letting a model reason longer at inference, rather than only making the model bigger, can substantially improve results.

Evidence: Benchmark · Confidence: High · Source: Benchmark · Source: Paper


Definition

Reasoning models are language models optimized to perform extended step-by-step reasoning at inference time — using additional test-time compute — to improve accuracy on complex, multi-step problems.

Key takeaways

They 'think' before answering, using extra inference compute.
Test-time compute is a new scaling axis beyond model size.
Best for math, code, logic and multi-step planning.
They trade latency and token cost for accuracy.
Overkill for simple tasks — match the model to the problem.

Context

Standard models spend roughly the same compute on each output token and start answering immediately, whether the question is easy or hard. Reasoning models break that pattern: they first generate a chain of internal reasoning, which in effect spends more compute on harder questions and lifts performance on tasks that need multi-step deduction.

This introduced a second scaling axis. Beyond making models larger (train-time compute), you can let them reason longer at inference (test-time compute) — a major driver of recent progress on hard benchmarks.

Architecture

Reasoning models are typically trained to produce long internal reasoning before a final answer, often using reinforcement learning that rewards correct outcomes. At inference, more 'thinking' tokens generally improve answers on hard problems, although longer reasoning is not always more correct.

In agentic systems, reasoning models serve as strong planners and decision-makers, while cheaper, faster models can handle routine steps. Routing between them by task difficulty is a common cost-control pattern.
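The routing pattern described above can be sketched in a few lines. This is a minimal illustration, not a production router: the model identifiers, the cue words, the scoring weights and the threshold are all hypothetical, and a real system would more likely score difficulty with a trained classifier or a cheap model call.

```python
from dataclasses import dataclass

# Hypothetical model identifiers, not real product names.
FAST_MODEL = "fast-model"
REASONING_MODEL = "reasoning-model"

# Assumed cue words that hint at multi-step work. Matching is by substring,
# so "plan" also matches "planet"; tune or replace this on real traffic.
HARD_CUES = ("prove", "derive", "debug", "plan", "optimize", "step by step")


@dataclass(frozen=True)
class Route:
    model: str    # which model tier should handle the task
    score: float  # estimated difficulty in [0, 1]


def estimate_difficulty(task: str) -> float:
    """Crude difficulty score in [0, 1] built from cue words and task length."""
    text = task.lower()
    cue_hits = sum(cue in text for cue in HARD_CUES)
    length_score = min(len(text.split()) / 200, 1.0)
    return min(0.3 * cue_hits + 0.4 * length_score, 1.0)


def route(task: str, threshold: float = 0.3) -> Route:
    """Send tasks at or above the threshold to the reasoning model."""
    score = estimate_difficulty(task)
    model = REASONING_MODEL if score >= threshold else FAST_MODEL
    return Route(model=model, score=score)
```

With these assumed weights, a short factual question stays on the fast model, while a request containing one or more cue words crosses the threshold and goes to the reasoning model.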

Components

Extended reasoning (thinking tokens)
Test-time compute budget
RL-based training for reasoning
Final-answer extraction
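To make the first and last components concrete, here is a hedged sketch of final-answer extraction. It assumes the raw model output wraps its reasoning trace in <think>...</think> tags and places the final answer after them; that delimiter is an assumption for illustration, and hosted models often return the reasoning and the answer as separate fields instead. The word count is only a rough stand-in for a real tokenizer.

```python
import re
from typing import NamedTuple


class ParsedOutput(NamedTuple):
    reasoning: str  # internal reasoning trace (empty if none was found)
    answer: str     # final answer to show to the user


# Assumption: the reasoning trace is wrapped in <think>...</think> tags.
_THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> ParsedOutput:
    """Separate the reasoning trace from the final answer."""
    traces = _THINK_BLOCK.findall(raw)
    answer = _THINK_BLOCK.sub("", raw).strip()
    reasoning = "\n".join(trace.strip() for trace in traces)
    return ParsedOutput(reasoning=reasoning, answer=answer)


def thinking_token_estimate(reasoning: str) -> int:
    """Rough proxy for thinking tokens: whitespace-separated word count."""
    return len(reasoning.split())
```

Keeping the trace separate from the answer lets an application log or measure the reasoning without showing it to the user verbatim.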

Benefits

Higher accuracy on complex, multi-step problems.
Strong at math, coding and planning.
Reasoning effort can be scaled per query.
Good planners at the core of capable agents.

Risks

Higher latency and token cost.
Overkill — and wasteful — for simple tasks.
Longer reasoning is not always more correct.
Internal reasoning can be hard to audit or trust verbatim.
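The latency and token-cost risk can be estimated with simple arithmetic. The sketch below assumes that thinking tokens are billed and decoded like ordinary output tokens and ignores input tokens; the profile fields and any prices or speeds passed in are hypothetical placeholders, not real provider figures.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ModelProfile:
    name: str
    usd_per_1k_output_tokens: float  # assumed price, not a real quote
    tokens_per_second: float         # assumed decode speed


def estimate(profile: ModelProfile, answer_tokens: int,
             thinking_tokens: int = 0) -> Tuple[float, float]:
    """Return (cost in USD, latency in seconds) for one response.

    Assumes thinking tokens are charged and generated like output tokens.
    """
    total_tokens = answer_tokens + thinking_tokens
    cost = total_tokens / 1000 * profile.usd_per_1k_output_tokens
    latency = total_tokens / profile.tokens_per_second
    return cost, latency
```

For example, with a made-up reasoning profile at 0.01 USD per 1k tokens and 50 tokens per second, a 200-token answer preceded by 1,800 thinking tokens comes to about 0.02 USD and 40 seconds, compared with 0.0004 USD and 2 seconds for the same answer from a made-up fast profile at 0.002 USD per 1k tokens and 100 tokens per second.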

Tools & technologies

Reasoning model tiers from major providers
Adjustable reasoning-effort settings
Model routing by task difficulty
Evaluation suites

Examples

Solving a multi-step math or logic problem that trips up a standard model.
Planning a complex agent task before execution.
Routing only hard tickets to a reasoning model to control cost.

FAQs

How are reasoning models different from standard LLMs? They are trained and configured to reason at length before answering, spending more inference compute on hard problems instead of answering every question with roughly the same effort.
What is test-time compute? Computation spent at inference (the model 'thinking' longer), as opposed to train-time compute spent building the model. It is a distinct way to improve results.
Should I always use a reasoning model? No. They cost more and add latency. Use them for hard, multi-step problems and route simpler tasks to faster, cheaper models.
Do they eliminate hallucination? No. Reasoning improves accuracy on many tasks but does not guarantee correctness; grounding, tools and evaluation remain necessary.
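Because reasoning does not guarantee correctness, it helps to measure accuracy directly rather than assume it. The following is a minimal exact-match evaluation sketch; the model_fn callable stands in for whatever model call is under test, and the normalization rule is an assumption that suits short answers only.

```python
from typing import Callable, Iterable, Tuple


def normalize(text: str) -> str:
    """Lowercase, collapse whitespace and drop trailing periods."""
    return " ".join(text.lower().split()).rstrip(".")


def accuracy(model_fn: Callable[[str], str],
             dataset: Iterable[Tuple[str, str]]) -> float:
    """Fraction of questions where the model output matches the reference.

    model_fn is a hypothetical stand-in for a real model call.
    """
    items = list(dataset)
    if not items:
        raise ValueError("dataset must not be empty")
    correct = sum(
        normalize(model_fn(question)) == normalize(reference)
        for question, reference in items
    )
    return correct / len(items)
```

Running the same labeled set through a fast model and a reasoning model gives a direct check on whether the extra latency and cost buy any accuracy for a given workload.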