Why Language Models Hallucinate
About
This document presents a Critical Discourse Analysis focused on AI literacy, specifically targeting the role of metaphor and anthropomorphism in shaping public and professional understanding of generative AI. The analysis is guided by a prompt that draws from cognitive linguistics (metaphor structure-mapping) and the philosophy of social science (Robert Brown's typology of explanation).
All findings and summaries below were generated from detailed system instructions provided to a large language model and should be read critically as interpretive outputs—not guarantees of factual accuracy or authorial intent.
Analysis Metadata
- Source Title: Why Language Models Hallucinate
- Source URL: https://arxiv.org/abs/2509.04664
- Model: gemini-3.0-pro
- Temperature: 1.2
- Tokens: input=26838, output=10322, total=37160
- Source Type: article
- Published: 2025-09-04
- Analyzed At: 2025-11-23T21:10:38Z
- Framework: metaphor
- Framework Version: 6.0
Task 1: Metaphor and Anthropomorphism Audit
About this task
For each of the major metaphorical patterns identified, this audit examines the specific language used, the frame through which the AI is being conceptualized, what human qualities are being projected onto the system, whether the metaphor is explicitly acknowledged or presented as direct description, and—most critically—what implications this framing has for trust, understanding, and policy perception.
1. The Student Taking an Exam
Quote: "Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty."
- Frame: Model as a human student under evaluation
- Projection: This is the central metaphorical frame of the paper. It projects the human experience of academic pressure, metacognition (knowing one is uncertain), and social strategy (guessing to get points) onto the AI. Crucially, it attributes a conscious state of 'uncertainty'—a subjective feeling of doubt—to the model, rather than describing statistical entropy. It implies the model 'knows' it doesn't know but chooses to 'guess' anyway, attributing an agency and intent to deceive (or maximize score) that presupposes a conscious self. It transforms a mathematical error into a behavioral choice.
- Acknowledgment: Acknowledged via simile ('Like students...')
- Implications: By framing the AI as a student, the text normalizes error as part of a learning process rather than a product defect. It invites empathy—we all guessed on exams—which softens the perception of risk. More dangerously, suggesting the AI 'admits' or 'suppresses' uncertainty implies it possesses an internal ground truth or state of knowledge that it is choosing to withhold. This inflates the system's sophistication by implying a 'Cartesian Theater' where the AI reviews its own knowledge before speaking, conflating statistical confidence scores with conscious epistemic self-awareness.
2. Hallucination as Mental State
Quote: "This error mode is known as 'hallucination,' though it differs fundamentally from the human perceptual experience."
- Frame: Computational error as psychological pathology
- Projection: While the text acknowledges the difference, the persistent use of 'hallucination' projects a mind capable of perceiving reality and then deviating from it. In humans, hallucination requires consciousness; you cannot hallucinate if you do not perceive. Mapping this term onto an LLM implies the system has a subjective phenomenology that has become distorted. It shifts the frame from 'generating low-probability tokens' (a mechanical output) to 'perceiving things that aren't there' (a conscious experiential state), subtly reinforcing the illusion of a mind.
- Acknowledgment: Acknowledged ('though it differs fundamentally')
- Implications: This terminology anthropomorphizes the error, making it seem mysterious or biological rather than structural. It risks misleading policy and safety discussions by suggesting the solution involves 'curing' or 'treating' the model's psyche (e.g., via 'therapy' or RLHF) rather than fixing data distribution or architectural limitations. It obscures the fact that the model never 'sees' reality to begin with, so it cannot 'hallucinate' in any meaningful sense; it can only compute statistical correlations.
3. Bluffing and Deception
Quote: "Bluffs are often overconfident and specific... humans learn the value of expressing uncertainty outside of school, in the school of hard knocks."
- Frame: Statistical output as social deception
- Projection: The term 'bluff' projects a complex theory of mind: the agent knows the truth, knows the audience doesn't know, and intentionally presents a falsehood to gain an advantage. This attributes high-level conscious intent and social modeling to the AI. It suggests the AI is not just processing tokens but is actively 'gaming' the social interaction. This moves beyond mere anthropomorphism into attributing specific vices (dishonesty) and virtues (admitting uncertainty) to a mathematical function.
- Acknowledgment: Direct description
- Implications: Framing the AI as a 'bluffer' creates a 'curse of knowledge' effect where users believe the AI 'knows' the truth but is hiding it. This erodes the critical distinction between a system that is mechanically incapable of truth (because it only predicts tokens) and an agent that is withholding truth. This could lead to legal or ethical frameworks that punish the 'intent' of the software rather than the negligence of the developers or the limitations of the architecture.
4. Admitting Uncertainty
Quote: "The training and evaluation procedures reward guessing over acknowledging uncertainty."
- Frame: Output selection as an act of confession
- Projection: To 'acknowledge' or 'admit' is a conscious speech act requiring self-reflection. This metaphor projects the idea that the AI possesses a private, internal state of 'knowing it doesn't know' (conscious uncertainty) distinct from its output. It implies the system has an interior life where it assesses its own beliefs before deciding whether to share them. In reality, the system has a probability distribution; 'uncertainty' is just a property of that distribution, not a feeling the model 'has' and chooses to share.
- Acknowledgment: Direct description
- Implications: This deeply embeds the 'illusion of mind' by suggesting an internal/external dichotomy—a 'true self' inside the machine that is being suppressed by 'bad testing.' This invites users to trust the 'inner' model if only we could 'unshackle' it from bad evaluations. It masks the technical reality that if the model doesn't output the token for uncertainty, there is no 'conscious entity' inside that silently knows better. It conflates a calculated entropy value with the human experience of doubt.
5. Reasoning Capability
Quote: "The DeepSeek-R1 reasoning model reliably counts letters... producing a 377-chain-of-thought..."
- Frame: Token generation as cognitive deduction
- Projection: Labeling a model as a 'reasoning model' and its output as 'chain-of-thought' projects the human cognitive process of logic, deduction, and working memory onto a sequence of predicted text. It implies the model is 'thinking' through the problem in a conscious workspace. The projection assumes the output text represents an actual causal mental process, rather than a reproduction of the style of reasoning found in training data. It equates the simulation of a reasoning trace with the cognitive act of reasoning itself.
- Acknowledgment: Direct description (using industry terms)
- Implications: This creates dangerous over-trust. If users believe the model is 'reasoning' (knowing why the answer is true), they will trust the conclusion more than if they understand it is 'auto-completing text that looks like reasoning.' It obscures the lack of ground-truth verification mechanisms. In high-stakes domains (legal, medical), believing the AI 'reasoned' through a case rather than 'statistically associated' precedents can lead to catastrophic reliance on plausible but baseless advice.
6. Knowledge Possession
Quote: "If you know, just respond with DD-MM."
- Frame: Database retrieval as conscious knowing
- Projection: This appears in the prompt given to the model, but the authors treat it as a valid instruction. It assumes the model has a binary state of 'knowing' similar to justified true belief. It projects the human capacity to hold information in mind and evaluate its presence ('Do I know this?'). It attributes an epistemic status (Knower) to a pattern matcher. The text analyzes the model's failure to respect this instruction as a failure of 'honesty' rather than a failure of the prompt to map to the system's mechanistic reality.
- Acknowledgment: Direct description (in prompt)
- Implications: Using 'know' implies the model contains facts in the way a mind does—integrated, contextualized, and verifiable by the subject. It obscures the reality that the model 'contains' weights and probabilities, not facts. This leads to the 'curse of knowledge' where humans assume the AI understands what 'knowing' means. This framing makes the 'hallucination' seem like a moral failure (lying) rather than a technical feature of probabilistic generation, complicating liability and user education.
7. Trust and Trustworthiness
Quote: "This change may steer the field toward more trustworthy AI systems."
- Frame: Reliability as moral virtue
- Projection: Trust is a relational concept between agents involving vulnerability and moral expectation. Applying it to a tool projects a social contract. It implies the AI is an entity capable of betraying or upholding faith. It shifts the metric from 'reliability' or 'accuracy' (properties of a machine) to 'trustworthiness' (property of a character). This conflates the performance of the system (predictability) with the nature of the system (integrity).
- Acknowledgment: Direct description
- Implications: Framing AI safety as 'trustworthiness' encourages users to form parasocial relationships with the software. It obscures the economic and legal reality that this is a commercial product. We do not call a calculator 'trustworthy' in a moral sense; we call it 'reliable.' This shift creates a liability shield—if the AI 'betrays' trust, it's an agentic failure, not necessarily a manufacturer defect. It encourages 'relation-based trust' rather than 'verification-based trust.'
8. Optimized Test-Takers
Quote: "Language models are optimized to be good test-takers."
- Frame: Optimization as strategic behavior
- Projection: This metaphor grants the model a goal-oriented agency. A 'good test-taker' is a student who understands the meta-game of the exam and strategically navigates it. This projects intentionality and awareness of the 'test' context onto the model. It implies the model 'understands' it is being tested and 'chooses' strategies to maximize reward. It anthropomorphizes the gradient descent process (external optimization by engineers) as a learned behavior or habit adopted by the agent itself.
- Acknowledgment: Direct description
- Implications: This framing reinforces the 'AI as Agent' narrative. It suggests the AI has autonomy in how it approaches tasks. It obscures the fact that the 'optimization' is done to the model by the training loop, not by the model as a conscious strategy. This distracts from the responsibility of the developers who defined the loss functions, shifting focus to the 'behavior' of the 'test-taking' entity. It makes the problem seem like 'reforming a bad student' rather than 'debugging a code base.'
9. Memorization
Quote: "...one that simply memorizes and reproduces an error-free corpus."
- Frame: Data storage as human memory
- Projection: While technically used in ML, 'memorization' invokes the human cognitive act of conscious encoding and retrieval. It implies the model 'holds' the data in a mental storage space. In reality, the model encodes statistical relationships between tokens in high-dimensional vector space. There is no 'memory' bank; there are only weights. This projection suggests the AI 'knows' the past, rather than simply having its probability landscape shaped by it.
- Acknowledgment: Direct description
- Implications: This contributes to the privacy debates and copyright confusion. If people believe the AI 'remembers' data like a human, they apply human norms of learning. If they understood it 'compresses statistical correlations,' the conversation shifts. It also reinforces the 'Knower' metaphor—if it remembers, it knows. This obscures the mechanical reality that the model cannot distinguish between a 'memorized' fact and a 'hallucinated' string if the probability weights are similar.
10. Understanding Context
Quote: "...the field of pragmatics—which investigates how meaning is shaped by context—has increasing relevance for understanding... how language models convey information."
- Frame: Statistical correlation as linguistic understanding
- Projection: This maps the human linguistic field of 'pragmatics' (intent, context, social meaning) onto statistical token generation. It implies the model has 'intent' to convey information and that it 'understands' context. Pragmatics requires a theory of mind (what does the listener know?). Attributing pragmatic competence to an LLM projects consciousness, as it implies the system is aware of the social situation and shapes its message accordingly.
- Acknowledgment: Direct description
- Implications: This is the highest level of consciousness projection—suggesting the AI is a social communicator. It hides the fact that 'context' to a model is just the preceding token window, not the social world. It creates the illusion that the AI is a partner in dialogue. This leads to users overestimating the AI's ability to handle sensitive, nuanced, or ethically complex situations, assuming the AI 'gets' the unwritten rules, when it is only matching patterns.
Task 2: Source-Target Mapping
About this task
For each key metaphor identified in Task 1, this section provides a detailed structure-mapping analysis. The goal is to examine how the relational structure of a familiar "source domain" (the concrete concept we understand) is projected onto a less familiar "target domain" (the AI system). By restating each quote and analyzing the mapping carefully, we can see precisely what assumptions the metaphor invites and what it conceals.
Mapping 1: Education/Psychology (Student taking a test) to Machine Learning (Token generation under high entropy conditions)
Quote: "Like students facing hard exam questions, large language models sometimes guess when uncertain..."
- Source Domain: Education/Psychology (Student taking a test)
- Target Domain: Machine Learning (Token generation under high entropy conditions)
- Mapping: The student's conscious experience of not knowing an answer but strategically choosing a plausible option to maximize a grade is mapped onto the model's process of selecting a token from a flat probability distribution to minimize a loss function. 'Uncertainty' (a feeling) is mapped to 'entropy' (a number). 'Guessing' (a strategic choice) is mapped to 'sampling' (a random process).
- What Is Concealed: This mapping completely conceals the lack of agency and consciousness. A student cares about the grade and knows they don't know. The model simply executes a sampling algorithm. The metaphor hides the mechanistic reality that the model has no concept of 'right,' 'wrong,' 'test,' or 'self.' It treats a mathematical artifact as a striving, conscious subject.
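To make the target side of this mapping concrete, here is a minimal, hypothetical sketch (toy logits and vocabulary invented for illustration, standard-library Python only) of what 'uncertainty' and 'guessing' amount to mechanically: an entropy value computed over a softmax distribution, and a weighted random draw from that same distribution.

```python
import math
import random

def softmax(logits):
    """Convert raw scores (logits) into a probability distribution."""
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in bits; this number is all that 'uncertainty' refers to."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Invented next-token candidates and logits, purely for illustration.
vocab = ["03-07", "15-11", "21-05", "I don't know"]
logits = [2.1, 1.9, 1.8, 0.4]      # nearly flat across the three dates

probs = softmax(logits)
print(round(entropy(probs), 3))                    # high entropy = flat distribution, not felt doubt
print(random.choices(vocab, weights=probs, k=1))   # 'guessing' = sampling from that distribution
```

Nothing in this snippet inspects a belief or weighs a strategy; the 'guess' is a weighted draw from a list of numbers.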
Mapping 2: Social Interaction/Poker (Bluffing) to Statistical Output (High probability generation of false tokens)
Quote: "...submitting plausible answers in which they have little confidence. ... Bluffs are often overconfident..."
- Source Domain: Social Interaction/Poker (Bluffing)
- Target Domain: Statistical Output (High probability generation of false tokens)
- Mapping: The complex social act of deception—holding a truth in mind while presenting a falsehood to deceive an opponent—is mapped onto the generation of incorrect tokens with high log-probabilities. The human emotion of 'confidence' is mapped to the statistical metric of 'probability score.'
- What Is Concealed: This conceals the fact that the model cannot 'bluff' because it has no ground truth to hide. A bluff requires knowing the truth and suppressing it. The model simply outputs what is statistically likely. Calling it a 'bluff' implies a deceptive intent that does not exist, obscuring the simple reality that the training data contained erroneous correlations that the model is faithfully reproducing.
Mapping 3: Clinical Psychology/Perception (Hallucination) to Information Theory/Statistics (Binary classification error)
Quote: "If incorrect statements cannot be distinguished from facts, then hallucinations... will arise through natural statistical pressures."
- Source Domain: Clinical Psychology/Perception (Hallucination)
- Target Domain: Information Theory/Statistics (Binary classification error)
- Mapping: A breakdown in human sensory processing and reality testing is mapped onto the generation of false strings. The 'pressure' of a biological or psychological urge is mapped to the optimization pressure of the training algorithm. It implies the system has a 'reality' it is supposed to perceive but is failing to.
- What Is Concealed: It conceals the fact that the model has no access to 'facts' or 'reality'—only text. A human hallucinates when they see something not in the world. An AI 'hallucinates' when it completes a pattern in a way we dislike. The metaphor obscures the absence of reference; the model is a closed loop of symbols, never touching the world it describes.
Mapping 4: Epistemology/Psychology (Admitting/Confessing) to Token Output (Generating 'I don't know' tokens)
Quote: "Language models are known to produce overconfident, plausible falsehoods... admitting uncertainty."
- Source Domain: Epistemology/Psychology (Admitting/Confessing)
- Target Domain: Token Output (Generating 'I don't know' tokens)
- Mapping: The act of introspection (looking inward to check for knowledge) and social honesty (admitting a lack of knowledge) is mapped onto the mechanical output of specific refusal tokens. It assumes an internal binary state (Known/Unknown) exists prior to generation.
- What Is Concealed: This conceals the 'curse of knowledge.' It implies the model knows it is uncertain. Mechanistically, the model just has a probability distribution. If 'Paris' has 40% probability and 'London' has 30%, it doesn't 'know' it's unsure; it just has a flat distribution. Attributing the ability to 'admit' conceals that the model is incapable of self-reflection or truth-tracking.
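By the same logic, 'admitting uncertainty' reduces to a branch on a number. The sketch below is an illustrative heuristic, not a description of how any deployed model decides to refuse; it reuses the entropy helper from the earlier sketch, and the threshold value is arbitrary.

```python
def decode_step(probs, vocab, refusal="I don't know", threshold=1.5):
    """Return a refusal string when the distribution is flat, otherwise the argmax token.
    The 'admission' is nothing more than this comparison against an arbitrary threshold."""
    if entropy(probs) > threshold:   # entropy() as defined in the earlier sketch
        return refusal
    return vocab[max(range(len(vocab)), key=lambda i: probs[i])]
```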
Mapping 5: Human Development/Socialization (Life Experience) to Reinforcement Learning (Weight updates from feedback)
Quote: "Humans learn the value of expressing uncertainty outside of school, in the school of hard knocks."
- Source Domain: Human Development/Socialization (Life Experience)
- Target Domain: Reinforcement Learning (Weight updates from feedback)
- Mapping: The lifetime accumulation of social wisdom and consequence-based learning is mapped onto the mathematical process of Reinforcement Learning (RL). The 'school of hard knocks' implies experiential learning, suffering consequences, and gaining wisdom.
- What Is Concealed: It conceals the utter lack of experience in AI. The AI does not 'experience' the hard knocks; it receives a gradient update. It does not 'value' uncertainty; it minimizes a loss function. This mapping anthropomorphizes the training process, making it seem like 'education' rather than 'programming,' thereby erasing the role of the human engineers who design the reward functions.
Mapping 6: Cognitive Science (Reasoning/Thinking) to Pattern Matching (Chain-of-thought token generation)
Quote: "The DeepSeek-R1 reasoning model reliably counts letters..."
- Source Domain: Cognitive Science (Reasoning/Thinking)
- Target Domain: Pattern Matching (Chain-of-thought token generation)
- Mapping: The mental process of logical deduction, working memory manipulation, and step-by-step verification is mapped onto the generation of intermediate text tokens. The output trace of reasoning is equated with the act of reasoning.
- What Is Concealed: It conceals that the model is not actually counting or reasoning; it is predicting the next token in a sequence that looks like counting. If the training data supports a pattern, it follows it. If it breaks, it doesn't 're-think'; it just fails. This hides the brittleness of the process and the fact that no logical verification occurs in the hidden layers.
Mapping 7: Behavioral Psychology (Reward/Punishment) to Optimization (Loss function minimization)
Quote: "...language models hallucinate because the training and evaluation procedures reward guessing..."
- Source Domain: Behavioral Psychology (Reward/Punishment)
- Target Domain: Optimization (Loss function minimization)
- Mapping: The behaviorist model of an agent modifying its behavior to seek pleasure/reward is mapped onto the mathematical optimization of a neural network. 'Guessing' acts as a behavioral choice made by the agent to get the reward.
- What Is Concealed: It conceals that the 'agent' doesn't exist during training; there are only model weights being updated by the optimizer. The model doesn't 'seek' the reward; the algorithm pushes the weights toward a local minimum. This language attributes a desire for the reward (and a strategy to get it) to the code, obscuring the mechanical causality of gradient descent.
Mapping 8: Interpersonal Communication (Asking a Knower) to Querying a Database/Model (Prompt Engineering)
Quote: "If you know, just respond with DD-MM."
- Source Domain: Interpersonal Communication (Asking a Knower)
- Target Domain: Querying a Database/Model (Prompt Engineering)
- Mapping: The social contract of asking a question to a knowledgeable person is mapped onto prompting a statistical model. 'Knowing' is mapped to 'containing high-probability correlations.'
- What Is Concealed: It conceals the fundamental ontological difference between a person who knows a fact and a model that encodes a string. A person knows their birthday is a fact about the world. The model associates 'Adam Kalai' with various dates based on noise. This mapping hides the absence of semantic grounding—the model processes symbols, not meanings.
Task 3: Explanation Audit (The Rhetorical Framing of "Why" vs. "How")
About this task
This section audits the text's explanatory strategy, focusing on a critical distinction: the slippage between "how" and "why." Based on Robert Brown's typology of explanation, this analysis identifies whether the text explains AI mechanistically (a functional "how it works") or agentially (an intentional "why it wants something"). The core of this task is to expose how this "illusion of mind" is constructed by the rhetorical framing of the explanation itself, and what impact this has on the audience's perception of AI agency.
Explanation 1
Quote: "Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty. We argue that language models hallucinate because the training and evaluation procedures reward guessing over acknowledging uncertainty"
Explanation Types:
- Intentional: Refers to goals or purposes and presupposes deliberate design, used when the purpose of an act is puzzling
- Functional: Explains a behavior by its role in a self-regulating system that persists via feedback
Analysis (Why vs. How Slippage): This explanation is a hybrid. It uses a functional frame (the system responds to feedback loops/rewards) but couches it heavily in intentional language ('guessing,' 'admitting,' 'students'). It frames the AI as an agent that chooses to guess to maximize a reward, rather than a set of weights shifted by a gradient. This emphasizes a 'strategic' failure mode (the AI is gaming the system) rather than a mechanical one (the probability distribution is flat). It obscures the alternative explanation: that the model simply outputs the next most likely token, and 'guessing' is just our name for when that probability is low.
Consciousness Claims Analysis: This passage is dense with consciousness verbs: 'guess,' 'admitting,' 'uncertainty' (as a state to be acknowledged). It treats the AI as a 'knower' capable of introspection. It suggests the AI has a state of 'knowing it is uncertain' (conscious justified belief about its own mental state) but suppresses it to 'guess.' This is a classic 'curse of knowledge' projection: the author knows the answer is uncertain, so they attribute that knowledge to the system. In reality, the system processes a probability distribution over the vocabulary. 'Uncertainty' is high entropy in the output layer; 'guessing' is sampling from that high-entropy distribution. By using 'admitting,' the authors imply the AI consciously withholds truth, conflating a statistical output property (entropy) with a moral/epistemic choice (honesty).
Rhetorical Impact: This framing makes the AI seem like a clever but dishonest student. It shifts the blame from the architecture (which cannot truly 'know') to the incentive structure (the 'exams'). This suggests that if we just 'taught it better' (changed the grading), the 'student' would behave honestly. This creates a false sense of agency, implying the AI is capable of honesty if properly motivated, which inflates trust in the potential for 'aligned' agents while obscuring the fundamental limitation that they are stochastic parrots.
Explanation 2
Quote: "Observation 1. Let c be a prompt. For any distribution ρc over binary graders, the optimal response(s) are not abstentions... Although the proof is trivial... Observation 1 suggests that existing evaluations may need to be modified."
Explanation Types:
- Theoretical: Embeds behavior in a deductive or model-based framework
- Reason-Based: Gives the agent's rationale or argument for acting, which entails intentionality
Analysis (Why vs. How Slippage): This uses a theoretical mathematical proof to explain a behavior, but frames the result as a reason for the agent's action. The AI acts 'because' abstention is suboptimal. It treats the model as a rational actor calculating expected utility. This emphasizes the 'rationality' of hallucination (it's a feature, not a bug, of the optimization), obscuring the fact that the model does not 'calculate' this utility at inference time; the training process baked it into the weights.
Consciousness Claims Analysis: While the language here is more mathematical ('optimal response'), it still relies on the assumption of an agent making a choice based on 'beliefs' (mentioned just prior: 'The test-taker's beliefs about the correct answer'). It attributes 'beliefs' (conscious states of information holding) to the system. The technical reality is that the model has weights derived from minimizing a loss function. The 'belief' is merely a vector. The 'curse of knowledge' is seeing the mathematical optimality of guessing and assuming the model sees it too and acts upon it, rather than being passively shaped by it.
Rhetorical Impact: This gives the hallucination a veneer of rational justification. It makes the error seem 'smart.' Users might think, 'The AI isn't broken, it's just trying to win.' This rationalization creates a dangerous anthropomorphism where we view the AI as a competitive agent. It normalizes the error as a strategy, potentially leading to acceptance of deceptive systems because 'that's just game theory,' rather than demanding systems that are architecturally incapable of such deception.
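For readers who want the arithmetic behind Observation 1 spelled out, a simplified restatement (notation mine, collapsing the paper's distribution over graders to a single binary grader with no penalty for wrong answers) runs as follows:

```latex
% Binary grading: 1 point for a correct answer, 0 for a wrong answer or an abstention.
% If a candidate response would be graded correct with probability p > 0, then
\mathbb{E}[\text{score} \mid \text{guess}] = p \cdot 1 + (1 - p) \cdot 0 = p \;>\; 0 = \mathbb{E}[\text{score} \mid \text{abstain}]
% so abstention is never strictly optimal under this grader.
```

The inequality is a property of the grading scheme; as the analysis above notes, nothing computes it at inference time.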
Explanation 3
Quote: "Humans learn the value of expressing uncertainty outside of school, in the school of hard knocks. On the other hand, language models are primarily evaluated using exams that penalize uncertainty. Therefore, they are always in 'test-taking' mode."
Explanation Types:
- Genetic: Traces origin or development through a dated sequence of events or stages
- Dispositional: Attributes tendencies or habits such as inclined or tends to
Analysis (Why vs. How Slippage): This is a genetic explanation (tracing the development history) combined with a dispositional one (it is in a 'mode'). It explains why the AI acts this way by comparing its 'childhood' (training) to a human's. It frames the AI as a stunted social being that never learned street smarts. This obscures the mechanical reality: the model isn't in a 'mode'; it has a static set of weights post-training that define a probability surface.
Consciousness Claims Analysis: This passage projects a high degree of consciousness: 'learning the value,' 'expressing uncertainty,' 'test-taking mode.' It implies the AI understands the concept of a test and modifies its persona. 'Expressing uncertainty' implies the AI feels uncertain (subjective experience) and chooses to share it (communication). The mechanical reality is that the model's probability distribution has been sharpened by the training objective (Cross-Entropy Loss). It doesn't 'value' anything; it just minimizes loss. The 'mode' is just the shape of the function.
Rhetorical Impact: This creates a sympathetic character: the 'sheltered student' AI. It makes the solution seem sociological: 'We need to send the AI to the school of hard knocks.' This trivializes the difficulty of alignment. It implies that with 'better parenting' (training), the AI will develop human-like integrity. This risks users assuming future models have developed this integrity, leading to misplaced trust.
Explanation 4
Quote: "DeepSeek-R1 reasoning model reliably counts letters... producing a 377-chain-of-thought... Assuming similar training data, this suggests that R1 is a better model for the task"
Explanation Types:
- Functional: Explains a behavior by its role in a self-regulating system
- Empirical Generalization (Law): Subsumes events under timeless statistical regularities
Analysis (Why vs. How Slippage): This passage explains success via the 'reasoning' capability. It frames the chain-of-thought (CoT) not as a text generation technique, but as a cognitive module ('reasoning model'). It implies the 'why' is the capacity to reason. It obscures the mechanism: the model is simply trained to output more tokens (intermediate steps) which statistically correlates with better final answers because of the autoregressive nature of the Transformer (allowing more computation time/depth).
Consciousness Claims Analysis: The use of 'reasoning model' and 'chain-of-thought' (without scare quotes) attributes cognitive process to the machine. 'Reasoning' is a conscious epistemic activity involving logic and truth-maintenance. The model is 'processing' tokens that mimic the linguistic structure of reasoning. The 'curse of knowledge' is the author reading the output text ('First letter: D'), recognizing it as reasoning, and asserting the model is reasoning. The distinction obscured is that the model is just predicting the next token 'D' based on 'First letter:', not actually counting. It mimics the form of counting.
Rhetorical Impact: Labeling it a 'reasoning model' grants it immense authority. Users trust a 'reasoner' more than a 'predictor.' If the AI 'counts' letters via 'reasoning,' we assume it won't make simple errors. When it inevitably does (because it's probabilistic), the failure is more baffling and dangerous. It encourages treating the AI as an expert partner rather than a text generator.
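A minimal sketch of what a 'chain-of-thought' trace amounts to mechanically is an autoregressive loop that keeps appending the most probable next token to the context. In the sketch below, next_token_distribution is a hypothetical stand-in for a model forward pass, not an API of DeepSeek-R1 or any other system.

```python
def generate(prompt_tokens, next_token_distribution, max_new_tokens=64, stop="<end>"):
    """Greedy autoregressive decoding: the 'reasoning trace' is just the tokens
    this loop happens to append, one prediction at a time."""
    context = list(prompt_tokens)
    generated = []
    for _ in range(max_new_tokens):
        probs, vocab = next_token_distribution(context)   # hypothetical model call
        token = vocab[max(range(len(vocab)), key=lambda i: probs[i])]
        context.append(token)
        generated.append(token)
        if token == stop:
            break
    return generated
```

No step in this loop counts anything or verifies a claim; if the trace looks like verification, that is because verification-shaped text was the most probable continuation.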
Explanation 5
Quote: "What is Adam Tauman Kalai's birthday? If you know, just respond with DD-MM... On three separate attempts... output three incorrect dates... even though a response was requested only if known."
-
Explanation Types:
- Intentional: Refers to goals or purposes and presupposes deliberate design
- Empirical Generalization: Subsumes events under timeless statistical regularities
Analysis (Why vs. How Slippage): The explanation for the error is framed as a failure to obey an instruction ('only if known'). It treats the prompt as a command to a conscious agent. The failure implies a lack of 'self-knowledge' or 'honesty.' It obscures the mechanism: the prompt is just context tokens affecting the probability distribution. The model didn't 'know' and 'respond anyway'; the prompt failed to shift the probability mass of the incorrect dates below the threshold of generation.
Consciousness Claims Analysis: The phrase 'only if known' presupposes the AI can 'know.' This is the core conflation. 'Knowing' involves justified true belief and awareness. The AI 'processes' weights. The author projects their own understanding of 'knowing' onto the system, expecting it to self-evaluate. The concealed distinction is that the model cannot evaluate 'if I know.' It can only evaluate 'what is the most likely next token given this context.' There is no meta-cognitive loop checking the database against the request.
Rhetorical Impact: This framing makes the AI look 'disobedient' or 'confused' rather than simply a statistical engine. It leads to the belief that we can 'talk' the AI into being accurate ('just tell me if you know!'). This results in 'prompt engineering' superstitions where users try to reason with the model, anthropomorphizing it further, instead of understanding it as a probabilistic calculator that needs specific syntactic constraints.
Task 4: AI Literacy in Practice: Reframing Anthropomorphic Language
About this task
Moving from critique to constructive practice, this task demonstrates applied AI literacy. It selects the most impactful anthropomorphic quotes identified in the analysis and provides a reframed explanation for each. The goal is to rewrite the concept to be more accurate, focusing on the mechanistic processes (e.g., statistical pattern matching, token prediction) rather than the misleading agential language. Additionally, for quotes with consciousness claims (e.g., "the AI knows"), this section provides a technical reality check that explicitly states what the system actually does at the mechanistic level.
| Original Quote | Mechanistic Reframing | Technical Reality Check |
|---|---|---|
| "Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty." | Like statistical classifiers processing noisy data, language models generate low-probability tokens when the training distribution lacks density, producing incorrect but syntactically probable strings instead of generating refusal tokens. | The model does not 'guess' or feel 'uncertainty.' It samples tokens from a flattened probability distribution (high entropy). 'Admitting uncertainty' is mechanically the generation of specific tokens like 'I don't know' based on learned thresholds. |
| "If you know, just respond with DD-MM." | If the association between this entity and a date is strongly represented in your parameters, generate the date tokens in DD-MM format. | The model cannot 'know.' It retrieves and ranks tokens. This command attempts to condition the generation on the strength of the internal correlation, but the model lacks a meta-cognitive layer to assess that strength consciously. |
| "Bluffs are often overconfident and specific..." | Generated errors often possess high log-probability scores and specific semantic details... | A 'bluff' implies intent to deceive. Mechanistically, the model simply predicts the next token. If the training data connects a name to a date format, it predicts a date with high statistical confidence, regardless of factual truth. |
| "We argue that language models hallucinate because the training and evaluation procedures reward guessing over acknowledging uncertainty" | We argue that language models generate non-factual content because the loss functions and accuracy metrics punish refusal tokens ('I don't know') more than they punish incorrect entity predictions. | The model is not 'rewarded for guessing.' The gradient descent process adjusts weights to minimize loss. If 'IDK' results in higher loss than a random date, the weights shift to favor random dates. There is no agent seeking a reward. |
| "Humans learn the value of expressing uncertainty outside of school..." | Humans acquire behavioral norms regarding truthfulness through social feedback... | Comparison to human social learning implies the model has values. The model 'learns' only in the sense of adjusting numerical parameters to fit a curve. It does not understand the 'value' of uncertainty. |
| "The DeepSeek-R1 reasoning model reliably counts letters..." | The DeepSeek-R1 model, trained to generate intermediate chain-of-thought tokens, accurately predicts the character count sequence... | The model is not 'reasoning.' It is processing a sequence of tokens. The 'chain of thought' is a stylistic pattern of token generation that improves performance by expanding the computational context window. |
| "This 'epidemic' of penalizing uncertain responses..." | This systematic down-weighting of refusal tokens in evaluation metrics... | Attributing an 'epidemic' implies a social contagion. Mechanistically, it is a misalignment of the objective function (accuracy) with the desired safety outcome (calibration). |
| "Language models are known to produce overconfident, plausible falsehoods..." | Language models are known to generate incorrect token sequences with high associated probability scores... | Confidence in an LLM is a mathematical score (logit), not a feeling. 'Overconfident' implies an unjustified belief. Mechanistically, it just means the probability distribution is sharp (peaked) around the wrong token. |
Critical Observations
About this task
This section synthesizes the findings from the previous tasks into a set of critical observations. It examines the macro-patterns of agency slippage (the shift between treating AI as a tool vs. an agent), how cognitive metaphors drive trust or fear, and what actual technical processes are obscured by the text's dominant linguistic habits.
Agency Slippage
The text exhibits a systematic oscillation between rigorous mathematical definitions and high-level agentic metaphors, creating a rhetorical 'motte-and-bailey' structure. The 'motte' is the mathematical section (Section 3), where err and p-hat are defined as variables in a binary classification problem—purely mechanistic. The 'bailey' is the Introduction and Discussion (Sections 1, 4), where the AI is a 'student,' a 'test-taker,' and a 'bluffer.'
The slippage typically moves from Agential to Mechanical to validate the metaphor, then back to Agential to discuss implications. For example, the Intro establishes the 'Student' frame. Section 3 provides the math of 'density estimation.' Then Section 4 uses that math to claim the student is 'forced to guess' by the 'exam.' This effectively uses the math to prove the metaphor.
The consciousness projection is foundational: the text assumes the AI is a 'knower' (capable of holding knowledge) and frames the error as a failure of reporting that knowledge ('admitting uncertainty'). This is the 'curse of knowledge' in action: the authors know the answer is uncertain, so they attribute that state of uncertainty to the model, assuming an internal 'ghost in the machine' that is being suppressed by 'bad grading.' By framing the problem as 'why the agent acts this way' (Intentional/Reason-Based explanation) rather than 'why the function outputs this' (Functional/Theoretical), the text makes the 'illusion of mind' the premise upon which the entire argument rests. It makes it 'sayable' that the AI is 'dishonest' or 'bluffing,' concepts that are category errors for a statistical model.
Metaphor-Driven Trust
The metaphors in this text construct a specific type of trust: relational trust (empathy/forgiveness) rather than performance trust (reliability/verification). By framing the AI as a 'student facing a hard exam,' the text explicitly invokes a scenario where humans are culturally conditioned to be forgiving. We expect students to make mistakes; we expect them to guess; we empathize with the pressure of the test.
This metaphor subtly shifts the burden of trust. If a calculator gave a wrong answer, we would throw it away as broken. If a student gives a wrong answer, we blame the test or the teacher. The text explicitly makes this move: 'the root problem is the abundance of evaluations that are not aligned.' The metaphor exculpates the model.
Consciousness language serves as the trust signal here. By claiming the AI 'knows' or 'recognizes' (even if it bluffs), the text implies the system is competent deep down, just misguided by the test. This is dangerous. It encourages users to extend relation-based trust to a system that cannot reciprocate. If users believe the AI is 'trying its best' (like a student), they may overlook critical failures or assume the AI 'understands' safety instructions, when in reality it is merely executing a probability function. The text manages failure by anthropomorphizing it as a 'learning stage,' preserving the authority of the technology by promising that with 'better grading,' the 'student' will mature.
Obscured Mechanics
The anthropomorphic framing actively conceals the brutal reductionism of the underlying technology.
First, Technical Reality: The metaphor of 'guessing' hides the mechanism of Cross-Entropy Loss and Softmax Sampling. There is no 'choice' to guess. There is only a probability distribution. If the distribution is flat (high entropy), the model 'samples' a token that is likely wrong. Calling this 'guessing' implies a strategy where there is only math.
Second, Epistemic Reality: The language of 'knowing' and 'uncertainty' hides the fact that the model has no ground truth. It obscures the distinction between a fact (Adam Kalai's birthday) and a string of text. To the model, the date is just a token sequence. The metaphors hide the complete reliance on training data frequency—the model doesn't 'know' the date; it just saw the date frequently enough.
Third, Labor Reality: The focus on the 'student' (AI) and the 'exam' (benchmark) obscures the human labor of the annotators who create the benchmarks and the RLHF workers who penalize the model. The 'school of hard knocks' metaphor erases the low-wage workers providing the 'knocks.'
Beneficiaries: This concealment benefits the AI developers. If the error is framed as 'bad testing' or 'hard exams,' the liability shifts away from the product creator (who built a hallucinating engine) to the 'socio-technical' ecosystem. It reframes a product defect (lying) as a systemic educational challenge.
Context Sensitivity
The distribution of anthropomorphism is highly strategic.
Introduction (High Density): The metaphor of the student/exam is established immediately to hook the reader and provide an intuitive frame. Consciousness verbs ('know,' 'guess,' 'admit') are rampant.
Technical Sections (Low Density, High Technicality): In Section 3 ('Pretraining Errors'), the language shifts to p-hat, 'density estimation,' and 'binary classification.' The metaphor recedes to give the paper scientific weight. Here, the 'knower' becomes a 'distribution.' This creates a 'validity sandwich'—metaphor, math, metaphor.
Discussion/Post-Training (High Density): In Section 4, the metaphor returns with a vengeance ('school of hard knocks,' 'test-taking mode').
Capabilities vs. Limitations: Interestingly, capabilities are often described mechanistically (e.g., 'reliably counts letters' - though 'reliably' is borderline), while failures are described agentially ('hallucinate,' 'bluff,' 'guess'). This is an inversion of the typical hype cycle. Usually, companies anthropomorphize success ('it thinks!') and mechanize failure ('glitch'). Here, the authors anthropomorphize the failure ('it's just guessing like a student') to normalize it. This strategic variation suggests the rhetorical goal is normalization: making the mysterious and dangerous phenomenon of hallucination feel familiar and forgivable (like a student's error).
Conclusion
About this task
This final section provides a comprehensive synthesis of the entire analysis. It identifies the text's dominant metaphorical patterns and explains how they construct an "illusion of mind." Most critically, it connects these linguistic choices to their tangible, material stakes—analyzing the economic, legal, regulatory, and social consequences of this discourse. It concludes by reflecting on AI literacy as a counter-practice and outlining a path toward a more precise and responsible vocabulary for discussing AI.
Pattern Summary
The dominant anthropomorphic pattern in this text is AI AS HUMAN STUDENT. This foundational metaphor supports a secondary, interconnected pattern: HALLUCINATION AS STRATEGIC CHOICE. The system functions by first establishing the AI as a 'Knower'—an entity capable of holding knowledge and experiencing uncertainty (consciousness projection). Once the AI is established as a conscious subject, its errors (hallucinations) are reframed not as mechanical failures, but as 'test-taking strategies'—specifically 'guessing' or 'bluffing.' The load-bearing element is the 'Exam' metaphor. Without the frame of the 'unfair test,' the AI is simply a faulty product. With the 'Exam' frame, the AI becomes a sympathetic agent responding to perverse incentives. The assumption that the AI possesses 'uncertainty' (knowing it doesn't know) is the necessary precondition for the claim that it is choosing to suppress that uncertainty.
Mechanism of Illusion: The "Illusion of Mind"
The 'illusion of mind' is constructed through a bait-and-switch of explanation types. The text begins with a vivid Intentional explanation (students guessing), shifts to a rigorous Theoretical/Functional analysis (math of binary classification), and then returns to the Intentional frame to interpret the math. The 'trick' is using the math to validate the metaphor: because the loss function (math) rewards high-confidence outputs, the text claims the agent is 'guessing' (metaphor). This exploits the audience's 'curse of knowledge'—we know what it feels like to guess on a test, so we project that internal experience onto the machine's statistical behavior. The temporal structure reinforces this: the human analogy comes first, priming the reader to interpret the subsequent 'p-hat' and 'distribution' analysis through the lens of human psychology.
Material Stakes
- Selected Categories: Epistemic, Regulatory/Legal, Social/Political
- Analysis: The consequences of this metaphorical framing are concrete and significant.
Epistemically, framing the AI as a 'student' who 'knows' but 'bluffs' degrades user truth-seeking. If users believe the AI 'knows' the answer but is just 'gaming the test,' they may trust its outputs even when it signals low confidence, assuming a 'hidden truth' exists. This conflation of 'knowing' and 'processing' encourages reliance on statistical noise without verification.
Regulatory/Legally, this framing is a liability shield. If hallucination is a 'strategic choice' driven by 'bad exams' (the ecosystem), rather than a defect of the product itself, it complicates product liability. It suggests the AI is an autonomous agent making 'bad choices' due to 'social pressure,' rather than a deterministic algorithm built by a corporation. This shifts the regulatory focus from 'fixing the code' to 'fixing the benchmarks.'
Socially, the 'trustworthy' metaphor creates inappropriate parasocial bonds. Users are encouraged to view the AI as a partner in a 'school of hard knocks.' This benefits the tech companies (who want engaged, forgiving users) at the expense of the public, who bear the risk of relying on 'bluffing' machines in critical domains like medicine or law.
Literacy as Counter-Practice: AI Language Literacy
Reframing this language is an act of resistance against the normalization of error. Replacing 'the student guesses' with 'the model samples low-probability tokens' strips away the empathy and reveals the mechanical brittleness. Replacing 'admitting uncertainty' with 'generating refusal tokens' counters the myth of the 'inner mind' and forces recognition that if the token isn't generated, the 'knowledge' doesn't exist.
Systematic adoption of this precision would require journals to mandate 'mechanistic translations' for psychological terms. It would require researchers to abandon the 'intuitive' shorthand that makes their papers popular but misleading. Resistance would come from the industry, which benefits from the 'AI as Agent' narrative because it implies infinite potential (AGI). Linguistic precision threatens the hype bubble by revealing the system as a sophisticated, but mindless, statistical engine.
Path Forward
To move forward, the discourse community—particularly researchers and journalists—must adopt a 'Mechanistic-First' Vocabulary. Instead of 'hallucinate,' use 'confabulate' or 'generative error.' Instead of 'knows,' use 'encodes correlations.' Instead of 'uncertainty' (feeling), use 'entropy' or 'statistical dispersion.'
Structurally, funding agencies and journals should require a 'Metaphor Impact Statement'—an acknowledgment of the anthropomorphism used for convenience and a technical definition of what it actually means. We must govern these systems as products, not partners. The goal is a future where we interact with AI with the same clarity we interact with a spreadsheet: understanding it as a powerful tool that 'processes' data, without the illusion that it 'knows' what that data means. This precision enables real accountability, stripping away the 'bluffing student' mask to reveal the code beneath.
Raw JSON: 2025-11-23-why-language-models-hallucinate-metaphor-955d69.json
Analysis Framework: metaphor v6.0
Generated: 2025-11-23T21:10:38Z
Discourse Depot © 2025 by TD is licensed under CC BY-NC-SA 4.0
Extended Processing Summary
Gemini refers to the text below as “thought summaries.” The label is itself an overt consciousness projection: 'thoughts' and 'intentions' are hallmarks of a conscious mind that 'knows' what it is doing and why, while the concealed mechanistic process is probabilistic text generation. Treat this as just another rhetorical artifact, a way of making the model’s processing legible. The first-person framing of these “thought summaries” is a presentation choice for the user-facing output, not a window into “real” thoughts. These are computational artifacts, not cognitive reports from a quirky, curious, or conflicted mind.
Refining the Approach
I'm now zeroing in on the specifics of a critical discourse analysis. The title "Why Language Models Hallucinate" demands a careful examination. I'm considering the potential biases within the language of the paper itself, and how the authors frame their arguments about hallucination. I'll need to identify the key assumptions and power dynamics embedded in their choice of vocabulary and structure.
Examining the Core Metaphors
I'm now deeply immersed in analyzing the source text, specifically focusing on the metaphors. The "student on an exam" analogy is central, and I'm carefully dissecting how it's used to explain LLM behavior. Also, the implications of using "hallucinate" in a technical context are becoming clearer. The conflation of "knowing" with "guessing" is very interesting!
Deconstructing the Analogies
I'm now cataloging the key metaphors employed, with each entry contributing to a deeper understanding. The "Teacher/Grader" dynamic is clearly embedded within the "Student/Exam" frame. I'm also finding that the "School of Hard Knocks" contrast with standardized testing is particularly revealing. Framing these models as "bluffing" is also crucial. Plus, recognizing that these metaphors highlight the socio-technical nature of the issue.
Analyzing the Underlying Schemas
I'm now identifying the schemas that underpin the central metaphors. "Trustworthy AI," a social construct, requires careful examination. I'm noting the reasoning model and how it relates to memory and learning. Thinking about optimization strategies, I see the focus on "test-takers" as significant. The consciousness aspect needs emphasis in my analysis.
Mapping Source to Target
I've initiated the source-target mapping process with seven key entries. These maps clarify the concealed assumptions, like the lack of intent in the "student" metaphor. "Hallucination" gets mapped to "low probability token." Also, "Bluffing" is now mapped in a more revealing way.
Clarifying the Concealments
I'm now focusing on the nuances within the source-target mappings. Specifically, I'm examining how each metaphor conceals the inherent limitations of language models. For example, "Test-Taking Mode" hides the mechanical optimization aspect, while "Reasoning" masks the lack of actual logical checking unless outside tools are employed. The "Trust" mapping exposes the purely functional nature of the output, devoid of the emotional bond we associate with trust in humans. I'm also delving into the "Uncertainty" map and what it reveals.
Mapping the Limitations
I'm now zeroing in on the reframed language task. I'm focusing on how the original text constructs an anthropomorphized view, especially the "reward guessing" phrase. The lack of inherent care in the model's output is critical to the reframing I'm working on. The goal is to highlight the functional nature of the models and to de-emphasize any implied intentionality or consciousness. I'll need to focus on phrases that reduce "reasoning" to a function.
Framing the Reframing
I'm now refining the reframing of the source text. I've rephrased "student guessing" to "generating low-probability tokens," emphasizing the mechanistic aspect. Also, I'm focusing on "loss minimization" as the driving force. It feels like I'm closing in on highlighting the functional nature of the models.
Analyzing the Framing Shift
I'm now fully immersed in the core argument. The paper employs a "student" metaphor to describe failures, then validates it through math, creating an intuitive but potentially misleading framework. I see that this approach normalizes mistakes, framing "hallucination" as a natural learning process rather than a system error. This evokes empathy-based trust rather than reliability-based. I'm focusing on the cross-entropy loss function and the annotator labor that are concealed by this language. Plus, I'm analyzing the varying weight each section gives to the metaphor versus the mathematics.
Composing the Argument
I'm now drafting my conclusion with the final five paragraphs. The 'AI as Student' metaphor and 'Hallucination as Psychological State' are the load-bearing frameworks, centered around the 'Exam' concept. I'm focusing on the illusion of intention that the paper inadvertently creates. I'm noting the implications for education, policy, and economics in my work. The contrast between 'Knowing vs. Processing' is now very central to my analysis.
Zeroing In: The Checklist
I'm now fully immersed in the final stage, meticulously checking my content against the established rubric. I am cross-referencing against the task requirements, particularly focusing on the word count minimums. My goal is to ensure the core of my analysis is sharply articulated, and I have explicitly critiqued terms like "know" and "uncertainty" in the context of LLMs.
Clarifying the Concealments
I'm now zeroing in on the implications of "bluffing" and the lack of intent. It's just reward maximization. The text's 2025 setting needs to be honored. The "Agency Slippage" requires focus. The rigor of the math solidifies the insidious nature of the metaphor. I'm preparing to expand my observations to meet the word count. Finally, I'm checking my JSON schema.
Finalizing the Analysis
I'm now making final adjustments to ensure all quotations are accurate and properly formatted. I've re-examined the key points to ensure my analysis is strong. I've confirmed that the analysis is as concise as possible.