The Ghost in the Machine: What AI Hallucinations Reveal About Intelligence

When AI Gets It Wrong

AI hallucinations aren't bugs — they're a window into how these systems actually work, and what that means for intelligence itself

In 2023, a New York attorney named Steven Schwartz filed a legal brief that cited six court cases as precedent. The cases were convincing: plausible names, plausible citations, plausible summaries of holdings that fit the argument he was making. They were also entirely fictional. ChatGPT had invented them, complete with realistic-sounding docket numbers and judicial opinions that had never existed. The judge was not amused. Schwartz was sanctioned and publicly humiliated.

The incident prompted predictable hand-wringing about AI reliability. But the interesting question isn’t whether AI systems make things up — they do, and everyone who works with them knows this. The interesting question is why they make things up in this specific way: confident, plausible, detail-rich fabrications that have the texture of truth without its substance. Understanding the “why” reveals something important not just about AI systems but about the nature of knowledge and language itself.

Large language models are, at their core, extraordinarily sophisticated text prediction engines. They are trained to predict the next token (roughly, the next fragment of text) given everything that preceded it, and to do this accurately across an enormous corpus of human-generated text. The pretraining objective is not “tell the truth” or “accurately represent reality.” It is, in effect, “generate text that is statistically consistent with the kind of text that humans produce.”
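
To make the training objective concrete, here is a minimal sketch that assumes a toy bigram model (each token predicted only from the one before it) as a stand-in for a large language model. Real models condition on much longer contexts using neural networks, but the objective has the same shape: push down the average negative log-probability of the next token. Nothing in that quantity asks whether a sentence is true. The corpus and function names are illustrative.

```python
import math
from collections import Counter, defaultdict


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count how often each token follows each other token in the corpus."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]  # sentence start/end markers
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_token_probs(counts: dict[str, Counter], prev: str) -> dict[str, float]:
    """Estimate P(next token | previous token) from the raw counts."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()} if total else {}


def avg_negative_log_likelihood(counts: dict[str, Counter], sentence: str) -> float:
    """The quantity training pushes down: average -log P(next | prev).
    Note that nothing in it refers to whether the sentence is true."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    losses = []
    for prev, nxt in zip(tokens, tokens[1:]):
        p = next_token_probs(counts, prev).get(nxt, 0.0)
        losses.append(-math.log(p) if p > 0 else math.inf)
    return sum(losses) / len(losses)


corpus = [
    "the court held that the contract was void",
    "the court held that the statute applied",
]
counts = train_bigram(corpus)
print(next_token_probs(counts, "the"))  # {'court': 0.5, 'contract': 0.25, 'statute': 0.25}
```

The model's entire “knowledge” here is a table of which words tend to follow which; lowering the loss means matching those frequencies more closely, whatever the sentences happen to assert.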

This sounds like a subtle distinction, but it is a profound one. Humans generate text in a wide variety of modes: factual reporting, creative fiction, speculation, persuasion, conversation, analysis. These modes are not always cleanly distinguishable from each other in the training data. A model trained to predict human text learns to produce text that sounds like each of these modes, in the contexts where each is appropriate. When you ask it a factual question, it generates text that sounds like a factual answer — because that is the pattern associated with factual questions in human text. The problem is that generating text that sounds like a correct answer is not the same as generating a correct answer.

The mechanism that produces hallucination is the same mechanism that makes language models fluent and useful. A model trained to generate plausible human text will inevitably generate some text that is plausible but wrong, stated with the same confidence as text that is plausible and right. You cannot separate the capability from the failure mode, because they are the same capability operating under different conditions.

Consider what happens when you ask a language model about a court case that doesn’t exist. The model has learned that questions about court cases are typically answered in a specific format: case name, year, jurisdiction, holding, brief description. It has learned that this format is associated with correct answers in its training data. When you ask about a fictional case, it generates text in this format, because the format is correct and because, in the vast majority of examples in its training data, questions about specific cases have answers. Nothing in the training objective gives the model a reliable way to distinguish between “a case I have information about” and “a case that doesn’t exist but would fit this pattern.” The model generates the text that would be correct for a real case, because that is what the statistical structure of the question calls for.
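
A toy example shows how an invented case can fall out of pattern completion. The sketch below assumes a word-level bigram model trained on two made-up “precedents” (the case names are hypothetical). It enumerates every sentence the model could produce: each word-to-word step was seen in training, yet most of the resulting sentences describe cases that never existed.

```python
from collections import defaultdict
from typing import Iterator


def train_bigram(corpus: list[str]) -> dict[str, set[str]]:
    """Record which tokens have been seen following each token."""
    followers: dict[str, set[str]] = defaultdict(set)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            followers[prev].add(nxt)
    return followers


def all_completions(followers, prefix=("<s>",), max_len=20) -> Iterator[str]:
    """Yield every sentence the model can produce, i.e. every path in which
    each individual step was observed in training."""
    last = prefix[-1]
    if last == "</s>":
        yield " ".join(prefix[1:-1])  # drop the start/end markers
        return
    if len(prefix) > max_len:  # guard against cycles in larger corpora
        return
    for nxt in sorted(followers.get(last, ())):
        yield from all_completions(followers, prefix + (nxt,), max_len)


# Two hypothetical "precedents" written in a familiar legal register.
corpus = [
    "Alder v. Birch Airlines held the carrier liable",
    "Cedar v. Dunmore Airlines held the claim time-barred",
]
model = train_bigram(corpus)
invented = [s for s in all_completions(model) if s not in corpus]
print(len(invented))  # 6
print(invented[0])    # Alder v. Birch Airlines held the claim time-barred
```

Six of the eight possible outputs are new combinations, such as a holding attributed to the wrong parties. Nothing in the model marks them as different from the two sentences it was trained on.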

This is why hallucinations have the specific character they have: they are not random errors. They are plausible errors. They fit the pattern of correct answers. A language model hallucinating a court case doesn’t invent something that sounds like science fiction; it invents something that sounds exactly like a real court case, because it has learned what real court cases sound like. A model hallucinating a scientific paper doesn’t invent nonsense; it invents something that sounds like a real scientific paper, with plausible-sounding authors, a plausible-sounding journal, and findings that are consistent with the general state of knowledge in the field.

This plausibility is what makes hallucinations dangerous and what makes them structurally different from random errors. A random error is easy to catch because it doesn’t fit the pattern. A plausible error is hard to catch precisely because it does fit the pattern, and human cognition, which tends to trust pattern-consistent information, is inclined to accept it.

The philosophical dimension of hallucination goes deeper than the engineering. What hallucination reveals is the difference between knowledge and pattern completion. Human knowledge — real knowledge, as opposed to the ability to recall a fact — involves something like an internal model of the world that generates predictions, that can be updated by experience, and that supports counterfactual reasoning. If I know that Paris is the capital of France, this knowledge is connected to other things I know: it allows me to infer that French government agencies are located in Paris, that a flight to Paris is a flight to France, that a French president governs from Paris. The knowledge is not an isolated datum; it is a node in a web of inferential relationships.
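
The web of inferences described above can be caricatured symbolically. This is a sketch, not a claim about how human knowledge is stored: the facts are hypothetical (subject, relation, object) triples, and the rules are hand-written assumptions. The point is that derived facts follow from stored ones, so a flight to Paris becomes a flight to France by inference rather than by recall.

```python
Triple = tuple[str, str, str]


def infer(facts: set[Triple]) -> set[Triple]:
    """Apply a few hand-written rules until no new facts can be derived."""
    facts = set(facts)
    while True:
        derived: set[Triple] = set()
        for subj, rel, obj in facts:
            if rel == "capital_of":
                # Assumed rule: a capital is located in its country.
                derived.add((subj, "located_in", obj))
            if rel in ("located_in", "flies_to"):
                # Assumed rules: location is transitive, and a flight to a place
                # is also a flight to whatever contains that place.
                for inner, rel2, outer in facts:
                    if rel2 == "located_in" and inner == obj:
                        derived.add((subj, rel, outer))
        if derived <= facts:  # fixed point: nothing new was derived
            return facts
        facts |= derived


facts = {
    ("Paris", "capital_of", "France"),
    ("France", "located_in", "Europe"),
    ("flight_123", "flies_to", "Paris"),  # hypothetical flight identifier
}
known = infer(facts)
print(("flight_123", "flies_to", "France") in known)  # True
```

Because every conclusion is tied to the facts that produced it, changing one stored fact changes its consequences, which is the kind of connectedness the paragraph attributes to knowledge.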

Language models do something that superficially resembles this but is structurally different. They have learned statistical associations between tokens, which allows them to retrieve, recombine, and generate information in ways that often produce correct answers to factual questions. But the associations are not grounded in the same way that knowledge is grounded. There is no persistent internal model of the world that gets updated by experience; once training ends, the model’s parameters are fixed. Nor is there a dedicated mechanism for the model to notice that a generated claim is inconsistent with other things it “knows.” Whatever consistency checking occurs is probabilistic rather than logical, and it tends to operate at the level of surface linguistic coherence rather than deep semantic truth.
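
A bigram model, used here as an assumed stand-in for the probabilistic consistency checking described above, makes the gap between coherence and truth visible. In this sketch the model assigns a true sentence and a false one exactly the same probability, because both are assembled from transitions it has seen; only a sentence that breaks the surface pattern scores zero. The corpus is illustrative.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count how often each token follows each other token."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def sentence_probability(counts: dict[str, Counter], sentence: str) -> Fraction:
    """Product of P(next | prev) over the sentence, as an exact fraction."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = Fraction(1)
    for prev, nxt in zip(tokens, tokens[1:]):
        followers = counts.get(prev, Counter())
        total = sum(followers.values())
        if total == 0 or followers[nxt] == 0:
            return Fraction(0)  # a transition never seen in training
        prob *= Fraction(followers[nxt], total)
    return prob


corpus = [
    "paris is the capital of france",
    "rome is the capital of italy",
    "berlin is the capital of germany",
]
counts = train_bigram(corpus)
print(sentence_probability(counts, "paris is the capital of france"))  # 1/9
print(sentence_probability(counts, "paris is the capital of italy"))   # 1/9
print(sentence_probability(counts, "capital the of is paris france"))  # 0
```

The model penalizes word salad but is indifferent to the false claim, because at the level it operates on, the false sentence is perfectly well formed.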

This is what the phrase “stochastic parrot” — coined by linguist Emily Bender and colleagues — was trying to capture, though the phrase has been used polemically in ways that obscure more than they illuminate. The claim is not that language models produce random noise; they clearly don’t. The claim is that the mechanism of production is statistical pattern completion over text, and that this mechanism is importantly different from understanding in ways that have practical consequences — hallucination being the most salient.

Why might hallucination rates never reach zero? The answer has to do with an inescapable tension between fluency and factuality in probabilistic text generation. A model that refused to generate text whenever it was uncertain would be frustratingly unhelpful: it would be uncertain about many things, and the cases where it is uncertain but correct are hard to distinguish from those where it is uncertain and wrong. A model that generated text freely, as if it knew things it doesn’t, would sometimes confidently generate wrong things. Tuning this trade-off changes both the rate of hallucination and the usefulness of the model, but it cannot eliminate the underlying tension.
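
The trade-off can be sketched as a confidence threshold: answer when a confidence score clears it, abstain otherwise. The scores and correctness labels below are hypothetical toy values, not measurements of any model, and the assumption that a model exposes a usable confidence score is itself a simplification.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    confidence: float  # assumed model confidence score in [0, 1]
    correct: bool      # ground truth, known only when evaluating


def coverage_and_error(answers: list[Answer], threshold: float) -> tuple[float, float]:
    """Answer only when confidence >= threshold and abstain otherwise.
    Returns (fraction of questions answered, error rate among those answered)."""
    answered = [a for a in answers if a.confidence >= threshold]
    coverage = len(answered) / len(answers)
    if not answered:
        return coverage, 0.0
    errors = sum(1 for a in answered if not a.correct)
    return coverage, errors / len(answered)


# Hypothetical toy scores chosen for illustration, not measurements of any real model.
answers = [
    Answer(0.95, True), Answer(0.90, True), Answer(0.85, True), Answer(0.70, False),
    Answer(0.60, True), Answer(0.40, False), Answer(0.30, True), Answer(0.20, False),
]
for t in (0.0, 0.5, 0.8):
    cov, err = coverage_and_error(answers, t)
    print(f"threshold={t:.1f}  answered={cov:.1%}  wrong among answered={err:.1%}")
```

In this toy data, moving the threshold from 0.0 to 0.5 cuts the error rate among answered questions from 37.5% to 20% but withholds a correct low-confidence answer, and the confidently wrong answer at 0.70 survives any threshold at or below 0.70. Raising the threshold trades errors for silence rather than removing them.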

Retrieval-augmented generation — systems that retrieve relevant documents before generating answers — can substantially reduce factual hallucination by grounding generation in retrieved text. But this approach doesn’t eliminate hallucination; it shifts it. A model can still misinterpret retrieved documents, make incorrect inferences from correct facts, or hallucinate in domains where retrieval finds nothing relevant. It reduces the problem dramatically for certain factual question-answering tasks, while leaving other modes of error largely unchanged.
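
Here is a minimal sketch of the retrieve-then-answer pattern, assuming a naive word-overlap retriever in place of real ranking methods (such as BM25 or embeddings) and a stub in place of the language model. The document IDs, stopword list, and `min_overlap` threshold are hypothetical. It shows the branch the paragraph warns about: when retrieval finds nothing relevant, the system must either abstain or fall back to ungrounded generation.

```python
import re

# A tiny hypothetical stopword list; real retrievers weight terms instead.
STOPWORDS = {"a", "an", "the", "is", "in", "of", "where", "what", "who", "was"}


def content_words(text: str) -> set[str]:
    """Lowercased alphanumeric words, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def retrieve(query: str, documents: dict[str, str], min_overlap: int = 2):
    """Return (doc_id, text) for the document sharing the most content words
    with the query, or None if none shares at least `min_overlap` words."""
    query_words = content_words(query)
    best_id, best_score = None, 0
    for doc_id, text in documents.items():
        score = len(query_words & content_words(text))
        if score > best_score:
            best_id, best_score = doc_id, score
    if best_id is None or best_score < min_overlap:
        return None
    return best_id, documents[best_id]


def grounded_answer(query: str, documents: dict[str, str]) -> str:
    hit = retrieve(query, documents)
    if hit is None:
        # Nothing relevant was retrieved. Generating anyway would mean falling
        # back to ungrounded pattern completion, so this sketch abstains.
        return "No supporting document found."
    doc_id, text = hit
    # Stand-in for generation: a real system would pass `text` to the model as
    # context, and the model could still misread it.
    return f"{text} [source: {doc_id}]"


documents = {
    "doc-1": "The Eiffel Tower is located in Paris.",
    "doc-2": "The Colosseum is located in Rome.",
}
print(grounded_answer("Where is the Eiffel Tower located?", documents))
print(grounded_answer("Where is the Louvre located?", documents))
```

The first query returns the Paris sentence with its source; the second returns the abstention message, because neither document mentions the Louvre.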

What hallucination ultimately reveals is that we have built systems that are extraordinarily good at something that is not the same as knowing things. This is not a condemnation of those systems — they are extraordinarily useful, and the thing they are good at overlaps substantially with what we care about in practice. But it is a reason to be precise about what we are deploying and what we are depending on it for.

The practical implication for how we design AI-assisted workflows is more nuanced than “don’t trust AI.” It is: understand which kinds of tasks are susceptible to which kinds of errors, and build human review into the points where those errors matter. Hallucination is particularly dangerous in domains with high factual precision requirements and where errors are hard to catch because they look like correct answers: legal citations, scientific references, medical information, financial data. It is less dangerous in domains where outputs are inherently verifiable (code can be run and tested), where the cost of an error is low, or where human judgment naturally applies a correction.
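
For a high-precision domain such as legal citations, the review point can be as simple as refusing to treat any citation as verified until it is found in an authoritative source. The sketch below assumes such a trusted index is available as a list of strings; the case names are hypothetical, and exact matching after light normalization is a deliberate simplification.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewReport:
    verified: list[str] = field(default_factory=list)
    needs_human_review: list[str] = field(default_factory=list)


def normalize(citation: str) -> str:
    """Lowercase, drop commas, and collapse whitespace so trivial formatting
    differences do not cause false alarms."""
    return " ".join(citation.lower().replace(",", " ").split())


def check_citations(citations: list[str], trusted_index: list[str]) -> ReviewReport:
    """Sort citations into those found in the trusted index and those that a
    person must check before the document goes out."""
    known = {normalize(c) for c in trusted_index}
    report = ReviewReport()
    for citation in citations:
        if normalize(citation) in known:
            report.verified.append(citation)
        else:
            report.needs_human_review.append(citation)
    return report


# Hypothetical entries; a real index would come from an authoritative legal database.
trusted_index = ["Alder v. Birch Airlines, 2019"]
draft_citations = ["Alder v. Birch Airlines, 2019", "Cedar v. Dunmore Airlines, 2020"]
report = check_citations(draft_citations, trusted_index)
print(report.needs_human_review)  # ['Cedar v. Dunmore Airlines, 2020']
```

A match shows only that a cited case exists; whether it says what the draft claims still requires a human to read it.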

The attorney who submitted AI-generated case citations failed not because he used AI but because he didn’t understand what kind of tool he was using, and therefore didn’t build in the verification step that the tool’s error mode required. Understanding that language models are pattern completers, not knowers, would have led him to verify the cases before submitting them. Given what the tool is, its behavior was exactly what one should have expected. The failure was a mismatch between the tool’s actual properties and the user’s model of those properties.

That gap, between what AI systems actually are and what people imagine them to be, is where most AI deployment problems live. Hallucination is not a bug that will eventually be patched. It is a window into the nature of the system, and pretending that window isn’t there is not a safety strategy. It is the opposite.

The deeper implication of hallucination as a structural property is what it tells us about the relationship between language and truth. Language models learn from human text, and human text does not bear a simple relationship to truth. Humans write fiction, speculation, persuasion, and propaganda as well as factual reporting. They make errors, hold false beliefs, and deliberately mislead. A model trained to predict human text learns to produce text that is statistically consistent with all of these modes. When we ask it factual questions, we are asking it to be one kind of language user — the accurate reporter — in a context where it has learned to be many kinds simultaneously.

This is not merely a technical observation. It is a comment on what language is and what it does. Philosophers of language have long debated whether language primarily represents the world or primarily performs social functions: coordinating action, expressing relationships, managing impressions. Language models are trained largely on language as social performance, because much of the internet consists of exactly that. Expecting them to perform reliably as accurate reporters of external facts asks them to do something their training did not specifically optimize for. The remarkable thing is not that they hallucinate but that they are as accurate as they are, a consequence of the fact that a great deal of human text does try to represent the world accurately, and that accuracy is statistically correlated with the patterns they learn. The accuracy is real. So is its incompleteness. Both deserve to be understood clearly.