Concept

Hallucination

concept hallucination reliability multi-source

Hallucination occurs when a large language model generates confident, plausible-sounding content that is false or fabricated. It follows directly from the training objective: the model learns to produce text that statistically resembles correct answers, not to verify facts before generating.


Why it happens

The model’s training set is full of questions answered confidently. The model learns the style of confident answers. When asked about something it does not know, it does not default to silence or uncertainty — it generates text that looks like the answer it expects to see, whether or not that answer exists.

Example: “Who is Orson Kovats?” The name does not refer to any real person. Older models (e.g., Falcon 7B) confabulate freely: “an American author… no, a TV character… no, a baseball player.” Modern models, after additional training, more often respond: “There’s no well-known public figure by that name.”
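
One way to see why the model never "defaults to silence" is to look at the sampling step in isolation. The sketch below is a toy, not a real language model: the candidate tokens and logits are made up for illustration. It shows that sampling from a softmax distribution returns some token whether the distribution is sharply peaked (the model "knows") or completely flat (the model "does not know"). The uncertainty is visible in the entropy, but nothing in the sampling step turns it into a refusal.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def sample(tokens, logits, rng):
    """Draw one token according to the softmax probabilities."""
    return rng.choices(tokens, weights=softmax(logits), k=1)[0]

# Hypothetical candidates and logits, chosen only for illustration.
tokens = ["author", "TV character", "baseball player", "senator"]
known = [8.0, 0.5, 0.2, 0.1]    # peaked: one answer dominates
unknown = [1.0, 1.0, 1.0, 1.0]  # flat: the model has no preference

rng = random.Random(0)
for name, logits in [("known", known), ("unknown", unknown)]:
    h = entropy(softmax(logits))
    print(f"{name}: entropy={h:.2f}, sampled={sample(tokens, logits, rng)!r}")
```

In both cases a token comes out. With the flat distribution, repeated sampling tends to produce a different answer each time, which matches the shifting answers in the example above.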


Two mitigations

1. Refusal training

Meta’s Llama 3 approach:

  1. Generate factual Q&A pairs from a known document.
  2. Interrogate the model with those questions many times.
  3. Compare its answers to the correct answers using an LLM judge.
  4. For questions the model consistently gets wrong, add a training example where the correct response is “I don’t know.”

This teaches the model to surface its internal uncertainty as an explicit refusal rather than a confident wrong answer. The uncertainty signal was plausibly already present inside the network (some neuron encoding "I don't know this" existed), but nothing connected it to the output.
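
The four steps above can be sketched as a small data-construction loop. This is a minimal illustration under stated assumptions, not Meta's actual pipeline: `ask_model` and `judge` are hypothetical callables standing in for the model under test and the LLM judge, `n_samples` and `max_accuracy` are made-up thresholds for "consistently gets wrong," and step 1 (generating Q&A pairs from a document) is assumed to have already happened.

```python
REFUSAL = "I don't know."

def build_refusal_examples(qa_pairs, ask_model, judge,
                           n_samples=5, max_accuracy=0.2):
    """Return training examples that map unknown questions to a refusal.

    qa_pairs:  list of (question, reference_answer) from a known document
    ask_model: hypothetical callable, question -> model answer
    judge:     hypothetical callable, (question, answer, reference) -> bool
    """
    examples = []
    for question, reference in qa_pairs:
        # Step 2: interrogate the model several times with the same question.
        answers = [ask_model(question) for _ in range(n_samples)]
        # Step 3: count how many answers the judge accepts as correct.
        correct = sum(bool(judge(question, a, reference)) for a in answers)
        # Step 4: consistently wrong -> teach the model to refuse instead.
        if correct / n_samples <= max_accuracy:
            examples.append({"prompt": question, "response": REFUSAL})
    return examples

# Toy stand-ins; a real setup would call a model and an LLM judge here.
def toy_model(question):
    known = {"What is the capital of France?": "Paris"}
    return known.get(question, "Springfield")  # confident guess when unknown

def toy_judge(question, answer, reference):
    return answer.strip().lower() == reference.strip().lower()

# Made-up Q&A pairs; the second question is about an invented place.
qa = [
    ("What is the capital of France?", "Paris"),
    ("Who founded the town of Zorvath?", "Elin Marsh"),
]
examples = build_refusal_examples(qa, toy_model, toy_judge)
print(examples)
```

Only the question the toy model always gets wrong ends up in the refusal set; questions it answers correctly are left alone, so the model is not taught to refuse things it actually knows.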

2. Web search

Knowledge in the parameters is vague probabilistic recall, like something read months ago. Knowledge in the context window is working memory, like something directly in front of the model right now. Giving the model access to web search lets it retrieve current, accurate information and place it in the context window before answering. This can substantially reduce hallucination on time-sensitive or obscure factual queries.
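
A minimal sketch of the "place it in the context window" step follows. It assumes the retrieval itself has already happened and that the results arrive as plain-text snippets; the function name `build_grounded_prompt`, the prompt wording, and the `max_chars` budget are all illustrative choices, not any particular product's API.

```python
def build_grounded_prompt(question, snippets, max_chars=2000):
    """Assemble a prompt that puts retrieved text in front of the model.

    snippets:  list of plain-text search results (assumed already retrieved)
    max_chars: rough budget for source text, a stand-in for a token limit
    """
    lines = [
        "Answer using only the sources below.",
        "If the sources do not contain the answer, say you don't know.",
        "",
        "Sources:",
    ]
    used = 0
    for i, snippet in enumerate(snippets, start=1):
        text = " ".join(snippet.split())  # collapse whitespace and newlines
        if used + len(text) > max_chars:
            break  # stop before exceeding the budget
        lines.append(f"[{i}] {text}")
        used += len(text)
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)

# Made-up snippets for illustration.
prompt = build_grounded_prompt(
    "When does the library close on Sundays?",
    ["The library is open 10:00 to 16:00 on Sundays.",
     "Weekday hours are 08:00 to 20:00."],
)
print(prompt)
```

Numbering the sources makes it easier to ask the model to cite which snippet supports each claim, which in turn makes the answer easier to check.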


Structural forms of hallucination

  • Knowledge cutoff hallucination. The model invents events that fall after its training cutoff and that it therefore cannot know. For example, Llama 3 (cutoff end-2023), asked about the 2024 US election, generated parallel-universe outcomes with plausible but wrong candidate pairings.
  • Reversal curse. Knowledge appears to be directional: the model can know A → B without knowing B → A. Asked who Tom Cruise’s mother is, it answers correctly; asked who that woman’s son is, it often fails.
  • Confabulated identity. Models without explicit identity training will hallucinate being ChatGPT by OpenAI, because that string pattern is overwhelmingly present in the training data.

Practical implications

  • Verify anything that matters, especially facts, citations, and numbers.
  • For time-sensitive queries, use a model with web search enabled.
  • Paste source documents into the context window rather than asking the model to recall from memory.
  • For generated code or analysis, read the code and check the results — the model is a capable but absent-minded analyst.
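
Part of the verification advice above can be automated when the source document is available. The sketch below is a rough first-pass check, not a substitute for reading the answer: it flags quoted spans and numbers in a model's answer that do not appear in the source text. The function names are hypothetical, it only recognizes straight double quotes, and the sample source and answer are made up.

```python
import re

NUMBER = r"\d+(?:\.\d+)?"

def normalize(text):
    """Lowercase and collapse whitespace so matching ignores layout."""
    return " ".join(text.lower().split())

def unsupported_quotes(answer, source):
    """Quoted spans in the answer that never occur in the source."""
    src = normalize(source)
    quotes = re.findall(r'"([^"]+)"', answer)
    return [q for q in quotes if normalize(q) not in src]

def unsupported_numbers(answer, source):
    """Numbers in the answer that never occur in the source."""
    src_numbers = set(re.findall(NUMBER, source))
    return [n for n in re.findall(NUMBER, answer) if n not in src_numbers]

# Made-up source and model answer for illustration.
source = "Revenue grew 12.5 percent in 2021, driven by new subscriptions."
answer = ('The report says "revenue grew 12.5 percent" and '
          '"driven by lower costs" in 2022.')

print(unsupported_quotes(answer, source))
print(unsupported_numbers(answer, source))
```

Anything the check flags deserves a manual look. An empty result does not prove the answer is correct, since a paraphrase can still misstate the source.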

See also