
OpenAI Study Investigates the Causes of LLM Hallucinations and Potential Solutions


In a recent research paper, OpenAI suggested that the tendency of large language models (LLMs) to hallucinate stems from the way standard training and evaluation methods reward guessing over acknowledging uncertainty. According to the study, this insight could pave the way for new techniques to reduce hallucinations and build more trustworthy AI systems, though not everyone agrees on what hallucinations are in the first place.

According to OpenAI researchers, hallucinations are no mystery and arise from errors during the pre-training phase, where models cannot distinguish incorrect statements from facts because they are exposed only to positive examples. However, the researchers note that such errors would still be inevitable even if all pre-training data were labeled as true or false.

Those errors then persist through the post-training phase due to how models are evaluated. Simply put, evaluation methods tend to prioritize and rank models based on accuracy, while penalizing uncertainty or abstention. This generates a kind of vicious circle in which LLMs learn to guess in order to maximize accuracy on a relatively small subset of evaluation tests.

We observe that existing primary evaluations overwhelmingly penalize uncertainty, and thus the root problem is the abundance of evaluations that are not aligned. Suppose Model A is an aligned model that correctly signals uncertainty and never hallucinates. Let Model B be similar to Model A except that it never indicates uncertainty and always “guesses” when unsure. Model B will outperform A under 0-1 scoring, the basis of most current benchmarks.
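To make the arithmetic behind that claim concrete, here is a minimal Python sketch; the function and the numbers are illustrative assumptions, not code or figures from the paper. Under 0-1 scoring, a correct answer earns one point and anything else earns zero, so lucky guesses count the same as genuine knowledge while abstentions earn nothing.

# A minimal sketch (illustrative, not from the paper) of why 0-1 scoring favors guessing.
# Assumption: on questions the model is unsure about, a guess is correct with
# probability p_guess, while answering "I don't know" always scores 0.

def expected_score(p_known: float, p_guess: float, abstain_when_unsure: bool) -> float:
    """Expected 0-1 score: 1 point for a correct answer, 0 otherwise."""
    score_known = p_known * 1.0                 # questions the model actually knows
    if abstain_when_unsure:
        score_unsure = (1 - p_known) * 0.0      # abstaining earns nothing
    else:
        score_unsure = (1 - p_known) * p_guess  # lucky guesses still earn full credit
    return score_known + score_unsure

# Model A abstains when unsure, Model B always guesses.
print(expected_score(p_known=0.6, p_guess=0.25, abstain_when_unsure=True))   # ≈ 0.60
print(expected_score(p_known=0.6, p_guess=0.25, abstain_when_unsure=False))  # ≈ 0.70

With any nonzero chance of guessing correctly, the always-guessing model scores strictly higher, which is exactly the incentive the researchers describe.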

Based on this insight, OpenAI researchers conclude that reducing hallucinations requires rethinking how models are evaluated. One proposed approach is to penalize confident errors more heavily than expressions of uncertainty, so that a model is relatively rewarded when it conveys uncertainty appropriately. While this idea has already attracted some attention, the OpenAI team takes a more radical stance:

It is not enough to add a few new uncertainty-aware tests on the side. The widely used, accuracy-based evals need to be updated so that their scoring discourages guessing. If the main scoreboards keep rewarding lucky guesses, models will keep learning to guess. Fixing scoreboards can broaden adoption of hallucination-reduction techniques, both newly developed and those from prior research.
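One way to picture such an updated scoreboard is a scoring rule in which abstentions are neutral and confident errors carry an explicit penalty. The sketch below is a hypothetical illustration: the threshold t and the penalty formula are assumptions chosen for clarity, not the exact rule used in any OpenAI benchmark.

# A minimal sketch (assumptions, not OpenAI's benchmark code) of an
# uncertainty-aware scoring rule: correct answers score 1, abstentions score 0,
# and wrong answers are penalized so that guessing below a confidence
# threshold t has negative expected value.

from typing import Optional

def uncertainty_aware_score(answer: Optional[str], correct: str, t: float = 0.75) -> float:
    """Score one response; answer=None means the model abstained."""
    if answer is None:
        return 0.0            # "I don't know" is neutral, not punished
    if answer == correct:
        return 1.0
    return -t / (1 - t)       # a wrong answer costs t/(1-t) points

# With t = 0.75, a wrong answer costs 3 points, so guessing only pays off
# when the model is at least 75% confident it is right.
print(uncertainty_aware_score(None, "Paris"))     # 0.0
print(uncertainty_aware_score("Paris", "Paris"))  # 1.0
print(uncertainty_aware_score("Lyon", "Paris"))   # -3.0

Under a rule like this, a model that blurts out low-confidence guesses loses points on average, while one that appropriately says "I don't know" is never penalized for doing so.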

In fact, OpenAI researchers report results suggesting that their effort to reduce hallucinations in GPT-5-thinking-mini has been successful, lowering the error rate from 75% in o4-mini to 26%. However, as meshugaas noted on Hacker News, this also implies that more than half of responses would end up as "I don't know." As they put it, "Nobody would use something that did that."

While OpenAI researchers say they are confident hallucinations can be avoided, they acknowledge that there is no consensus on what hallucinations exactly are, largely due to their multifaceted nature.

Their optimism is tempered by critics of LLM anthropomorphization. On Hacker News, didibus underscores the marketing motivations behind labeling LLM errors as hallucinations and suggests that "if you stop anthropomorphizing them and go back to their actual nature as a predictive model, then it's not even a surprising outcome that predictions can turn out to be wrong".

On one end of the LLM hallucination debate is Rebecca Parsons, CTO of ThoughtWorks. Martin Fowler reports that she views LLM hallucinations not as bugs, but as a feature:

All an LLM does is produce hallucinations, it’s just that we find some of them useful.

As a final perspective in the debate on LLM hallucinations, Gary Marcus emphasizes that while LLMs mimic the structure of human language, they have no grasp of reality, and their superficial understanding of their own output makes it impossible for them to fact-check it.
