A reasoning model, also known as a Large Reasoning Model (LRM), is a type of artificial intelligence specifically trained to solve complex tasks that require multiple steps of logic. These models demonstrate superior performance in fields such as mathematics, coding, and science compared to standard Large Language Models.
OpenAI popularised this category in late 2024 with the release of the o1 series, describing these models as being designed to “spend more time thinking” before they respond. Unlike traditional models that generate answers immediately, reasoning models allocate additional compute time during inference to solve multi-step problems.
The core of a reasoning model’s operation is the generation of internal chains of intermediate steps. The model revisits and revises these steps to select and refine a final answer. Accuracy in these systems improves smoothly as the model is given more reinforcement learning during training and more compute time at inference. Commercial deployments now include controls for “reasoning effort,” which allows users to tune exactly how much thinking compute the model should allocate to a specific task. While this makes the models slower than ordinary chatbots, it enables much stronger performance on difficult exams and technical challenges.
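The "reasoning effort" control described above can be pictured as one extra field in an API request. The sketch below assumes an OpenAI-style chat-completions payload; the model name is a placeholder, and the exact parameter name and accepted values vary by provider.

```python
# Illustrative sketch of a "reasoning effort" control in an OpenAI-style
# request payload. "reasoning_effort" with low/medium/high follows the
# convention used for OpenAI's o-series models; other providers expose
# similar knobs under different names.

def build_request(prompt: str, effort: str = "medium") -> dict:
    """Assemble a chat-completion payload with a reasoning-effort hint."""
    if effort not in {"low", "medium", "high"}:
        raise ValueError(f"unsupported reasoning effort: {effort!r}")
    return {
        "model": "o3-mini",  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # higher effort => more thinking tokens, more latency
    }

request = build_request("Prove that the sum of two odd numbers is even.",
                        effort="high")
print(request["reasoning_effort"])
```

Raising the effort level trades latency and cost for accuracy; a production system might select it per task rather than globally.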
A central ingredient in the training of these models is process supervision. Traditionally, models were aligned using outcome-based rewards, where only the final answer was judged by human raters. Reasoning models, however, are rewarded for each correct intermediate step in their chain of thought. This approach significantly outperforms outcome-only supervision on challenging problems and improves interpretability because humans can check the logic of each individual step. This training teaches the model to recognise its own mistakes, break problems into simpler parts, and switch strategies when an approach fails.
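The difference between the two reward schemes can be shown with a toy example. The verifier below is a stand-in (real systems use trained reward models or human raters), but it captures the key property: process supervision localises the error to a step, while outcome supervision only says the final answer was wrong.

```python
# Toy contrast between outcome-based and process-based rewards.
# A chain with one flawed step gets zero signal under outcome supervision,
# but partial, step-localised credit under process supervision.

def outcome_reward(final_correct: bool) -> float:
    """Outcome supervision: only the final answer is judged."""
    return 1.0 if final_correct else 0.0

def process_reward(step_correct: list[bool]) -> float:
    """Process supervision: average credit over individual steps."""
    return sum(step_correct) / len(step_correct)

chain = ["2 + 2 = 4", "4 * 3 = 12", "12 - 5 = 8"]  # third step is wrong
print(outcome_reward(final_correct=False))          # no hint about which step failed
print(process_reward([True, True, False]))          # error localised to step 3
```

Because each step is graded, a human (or a reward model) can point at exactly where the chain went wrong, which is the interpretability benefit described above.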
However, this increased capability comes with distinct drawbacks, primarily in the form of higher inference costs. Research on mathematical benchmarks found that reasoning models can be 10 to 74 times more expensive to operate than their non-reasoning counterparts. This is due to the extended thinking time and the detailed, step-by-step outputs they produce, which are much longer than standard responses. Furthermore, increased reasoning depth has surprisingly been correlated with higher hallucination rates for specific person-based questions in some models, suggesting that multi-step logic creates new failure points where errors can compound. Despite these challenges, reasoning models represent a significant step toward achieving expert-level performance in complex technical domains.
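A back-of-the-envelope calculation shows how quickly the cost multiplier described above arises. The token counts and per-token prices below are made-up round numbers for illustration, not real pricing; the point is that hidden thinking tokens multiply output length, and reasoning models are often priced higher per token as well.

```python
# Rough cost comparison between a standard model and a reasoning model,
# using invented token counts and prices purely to illustrate the scaling.

def inference_cost(output_tokens: int, price_per_1k: float) -> float:
    """Cost of one response, given output length and price per 1k tokens."""
    return output_tokens / 1000 * price_per_1k

# Standard model: a short direct answer.
standard = inference_cost(output_tokens=500, price_per_1k=0.01)

# Reasoning model: hidden thinking tokens plus a longer answer,
# at a higher (hypothetical) per-token price.
reasoning = inference_cost(output_tokens=10_000, price_per_1k=0.02)

print(f"cost ratio: {reasoning / standard:.0f}x")  # -> cost ratio: 40x
```

A 40x multiplier from these invented numbers sits comfortably inside the 10-to-74x range reported on mathematical benchmarks.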
The team at Academii are always happy to discuss your training and education needs, help your organisation attract and train new talent, and build a resilient workforce. Please drop us a line to find out more.













































































