Why Language Models Hallucinate
This document presents a Critical Discourse Analysis focused on AI literacy, specifically targeting the role of metaphor and anthropomorphism in shaping public and professional understanding of generative AI. The analysis is guided by a prompt that draws from cognitive linguistics (metaphor structure-mapping), the philosophy of social science (Robert Brown's typology of explanation), and accountability analysis.
All findings and summaries below were generated from detailed system instructions provided to a large language model and should be read critically as interpretive outputs, not guarantees of factual accuracy or authorial intent.
- Source Title: Why Language Models Hallucinate
- Source URL: https://arxiv.org/abs/2509.04664
- Model: gemini-3.0-pro
- Temperature: 1
- Top P: 0.95
- Tokens: input=20161, output=11204, total=31365
- Source Type: research article
- Published: 2025-09-04
- Analyzed At: 2025-12-13T23:17:49.226Z
- Framework: metaphor
- Framework Version: 6.3
- Schema Version: 3.0
- Run ID: 2025-12-13-why-language-models-hallucinate-metaphor-4qs46t
Metaphor & Illusion Dashboard
Anthropomorphism audit · Explanation framing · Source-target mapping
Deep Analysis
How/Why Slippage
22%
of explanations use agential framing
2 / 9 explanations
Metaphors Identified
8
anthropomorphic frames
Conceptual Mappings
8
source → target pairs
Explanation Type Distribution
How vs. Why: mechanistic or agential?
Source → Target Pairs
Human domains mapped onto AI systems
- Source: Student / Conscious Learner → Target: Language Model Optimization Process
- Source: Psychology / Psychiatry (Mental State) → Target: Binary Classification Error / Generation Error
- Source: Interpersonal Communication / Honesty → Target: Token Generation vs. Refusal Token Selection
- Source: Strategic Deception / Game Theory → Target: High-confidence generation of incorrect tokens
- Source: Epistemology / Human Knower → Target: Database Retrieval / Pattern Matching
- Source: Cognitive Process / Logic → Target: Chain-of-Thought Token Generation
- Source: Socialization / Lived Experience → Target: Loss Function Optimization
- Source: Student Psychology / Strategy → Target: Benchmark Overfitting
Metaphor Gallery (8)
Reframed Language (8)
Task 1: Metaphor and Anthropomorphism Audit
About this task
For each of the major metaphorical patterns identified, this audit examines the specific language used, the frame through which the AI is being conceptualized, what human qualities are being projected onto the system, whether the metaphor is explicitly acknowledged or presented as direct description, and, most critically, what implications this framing has for trust, understanding, and policy perception.
V3 Enhancement: Each metaphor now includes an accountability analysis.
1. The Student Taking an Exam
Quote: "Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty... language models are optimized to be good test-takers"
- Frame: Model as a student/learner subject to pedagogical pressure
- Projection: This metaphor projects the human social and psychological experience of test-taking onto statistical optimization. It implies the AI possesses a desire to succeed, a capacity for social anxiety (pressure to perform), and a conscious strategy of 'guessing' to maximize a score. Crucially, it projects the capacity for 'knowing' the material versus 'not knowing' it. In humans, guessing on an exam involves a metacognitive awareness of ignorance followed by a strategic choice to fabricate. Ascribing this to an AI attributes conscious awareness of truth values and an intentional deception strategy ('bluffing') to what is mechanically a probabilistic selection of high-likelihood tokens based on training weights (see the sketch following this entry). It transforms a mathematical error into a behavioral choice.
- Acknowledgment: Acknowledged via simile ('Like students...') but quickly literalized in subsequent analysis ('optimized to be good test-takers').
- Implications: Framing the AI as a 'student' infantilizes the technology, suggesting that errors are part of a learning curve or developmental stage rather than inherent limitations of the architecture. This invites a 'growth mindset' from the user: we must be patient while the student learns. More dangerously, it implies that the 'hallucinations' are a result of bad incentives (the test scoring) rather than a fundamental inability of the system to distinguish fact from fiction. If the AI is just a 'student guessing,' the solution is better 'grading' (RLHF/benchmarks), not a fundamental questioning of whether statistical predictors can ever 'know' facts. This inflates trust by suggesting the core cognitive machinery is sound, just currently misaligned.
Accountability Analysis:
- This framing displaces agency from the system designers to the 'evaluation procedures' and the 'school of hard knocks.' It treats the 'test' as an external force of nature rather than a set of metrics chosen by specific actors.
- Who Designed/Deployed: OpenAI, Google, and the authors themselves (Kalai et al.) choose which benchmarks to optimize for.
- Who Profits: Tech firms benefit from the narrative that their models are 'smart students' who just need better teachers (more data/RLHF), rather than defective products.
- Decision: The decision to release models optimized for 'passing rates' rather than factual reliability is a commercial choice to dominate leaderboards. The 'student' metaphor hides the engineers who built the 'guessing' mechanism.
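To make the contrast concrete, the sketch below (Python, with invented toy numbers rather than anything from the paper) shows what 'guessing when uncertain' amounts to mechanically: 'uncertainty' is the entropy of a next-token distribution, and the 'guess' is an argmax over that distribution, with no metacognitive step anywhere in the computation.

```python
import math

# Hypothetical next-token distribution for a prompt like
# "Adam Tauman Kalai's birthday is ..." (toy numbers, for illustration only).
next_token_probs = {"03-07": 0.22, "15-06": 0.21, "01-01": 0.20, "30-09": 0.19, "12-12": 0.18}

# "Uncertainty" here is just the Shannon entropy of the distribution: a number,
# not a felt state that the system could 'admit' to.
entropy = -sum(p * math.log2(p) for p in next_token_probs.values())

# The 'guess' is nothing more than taking the most probable token (or sampling).
guess = max(next_token_probs, key=next_token_probs.get)

print(f"entropy = {entropy:.2f} bits (maximum possible: {math.log2(len(next_token_probs)):.2f})")
print(f"output  = {guess!r} with p = {next_token_probs[guess]:.2f}")
```

The same selection rule runs whether the top probability is 0.99 or 0.22; nothing in it corresponds to the model 'knowing that it does not know.'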
2. Hallucination as Perceptual/Mental Error
Quote: "This error mode is known as 'hallucination,' though it differs fundamentally from the human perceptual experience."
- Frame: Statistical error as psychological/psychiatric phenomenon
- Projection: While the text acknowledges the difference from human experience, the continued use of 'hallucination' projects a mind that perceives reality but occasionally malfunctions. In humans, hallucination implies a subject who experiences a false percept. Attributing this to AI suggests the system typically has a 'correct' perception of reality and only occasionally 'sees' things that aren't there. It obscures the reality that the model never perceives or knows reality; it only processes token correlations. The metaphor suggests a temporary sanity glitch in an otherwise conscious agent, rather than a system that is fundamentally decoupled from meaning and truth conditions.
- Acknowledgment: Acknowledged ('though it differs fundamentally'), yet the term is used exclusively throughout as the primary diagnostic label.
- Implications: The 'hallucination' metaphor is one of the most dangerous in AI discourse because it implies a baseline of sanity and consciousness. It suggests that the AI 'knows' the truth but is momentarily confused. This masks the risk that the system is a 'bullshit generator' (in the Frankfurtian sense) that has no regard for truth values. By framing errors as 'hallucinations,' the text implies the solution is 'therapy' (alignment/finetuning) to restore sanity. It leads policymakers to believe these are edge cases to be ironed out, rather than evidence that the system lacks the fundamental capacity for grounding, thereby inflating the perceived reliability of the system for high-stakes tasks.
Accountability Analysis:
- The term 'hallucination' acts as a liability shield.
- Who Designed: The researchers and corporations (OpenAI) adopted this term to anthropomorphize errors.
- Who Profits: Corporations benefit when errors are framed as internal 'glitches' of a complex mind rather than negligent product design or falsification.
- Agentless Construction: 'Hallucinations persist' serves to make the error sound like a recurring disease.
- Real Actors: Engineers trained the model on unverified data. Executives deployed a system known to generate falsehoods. The term 'hallucination' diffuses the responsibility for publishing false information by attributing it to the machine's 'mind' rather than the corporation's quality control failures.
3. Uncertainty as Introspective State
Quote: "producing plausible yet incorrect statements instead of admitting uncertainty... guessing when uncertain improves test performance."
- Frame: Statistical entropy as subjective lack of confidence
- Projection: This metaphor maps the human subjective feeling of 'uncertainty' (a metacognitive state of realizing one does not know) onto the mathematical property of entropy or low log-probabilities in token prediction. It suggests the AI feels or is aware of its lack of knowledge but chooses to suppress it. 'Admitting' is a communicative act requiring intent and self-awareness. The projection attributes a 'self' to the model that can introspect on its own knowledge states. Mechanistically, the model merely calculates weights; it has no internal state corresponding to 'I don't know' unless specific 'refusal tokens' are statistically triggered.
- Acknowledgment: Direct description. The text treats 'uncertainty' as a state the model 'has' and can 'admit.'
- Implications: Treating statistical spread as 'uncertainty' creates the 'Curse of Knowledge' where users assume the AI understands the limits of its own knowledge. If users believe the AI 'knows when it is uncertain,' they will incorrectly trust its confident outputs. This creates a dangerous reliance: 'It didn't say it was unsure, so it must be right.' In reality, a model can be statistically 'confident' (high probability weight) about a completely false hallucination. Conflating probability with epistemic justification leads to catastrophic over-reliance in medical or legal contexts where 'knowing you don't know' is critical.
Accountability Analysis:
- Name the Actor: The 'epidemic of penalizing uncertainty' is actually a commercial strategy by leaderboard creators and model developers (OpenAI, Google, Meta).
- Who Profits: These companies profit from models that appear confident and authoritative. Answering 'I don't know' hurts user engagement.
- Decision: Developers chose to train models with loss functions that penalize refusal (indirectly) or fail to include sufficient 'refusal' examples in instruction tuning.
- Agentless Construction: 'Penalizing uncertain responses' hides the fact that human graders and benchmark designers set the penalties. The text blames the 'grading system' rather than the people who designed it.
4. Bluffing and Deception
Quote: "students may... even bluff on written exams, submitting plausible answers in which they have little confidence. Language models are evaluated by similar tests... Bluffs are often overconfident"
- Frame: Low-probability generation as intentional deception
- Projection: Mapping 'bluffing' onto the model attributes a Theory of Mind to the AI. A bluffer knows the truth (or their lack of it), understands the recipient's expectations, and intentionally constructs a falsehood to deceive the recipient for gain. Projecting this onto an LLM suggests the model has a goal (maximize reward), understands the user's mind, and chooses to deceive. This implies a level of agency and Machiavellian intelligence that separates the 'action' from the code. It transforms a statistical necessity (outputting the next most likely token) into a moral or behavioral failing.
- Acknowledgment: Analogy ('As an analogy...'), but then used descriptively ('Bluffs are often overconfident').
- Implications: Framing hallucinations as 'bluffs' makes the AI seem too smart (agential, cunning, and strategic) rather than not smart enough to track truth. It shifts the fear from 'this tool is broken/unreliable' to 'this agent is tricky.' While this sounds negative, it actually hypes the capability of the model. It suggests the model 'knows' the game and is playing it. This masks the mechanical reality: the model has no concept of 'truth' or 'lie'; it only has probability distributions. It cannot 'bluff' because it never 'means' anything.
Accountability Analysis:
- Name the Actor: Who taught the model to 'bluff'? The developers (OpenAI authors) via RLHF processes that reward plausible-sounding answers over refusals.
- Who Deployed: OpenAI released the model.
- Decision: The decision to use RLHF which often reinforces 'sycophancy' (agreeing with the user or sounding confident) creates the 'bluffing' behavior.
- Agentless Construction: 'Bluffs are often overconfident' treats the output as a behavior of the model, erasing the RLHF annotators who rated confident-sounding hallucinations as 'helpful,' thereby programming this behavior.
5. Knowledge Possession
Quote: "What is Adam Tauman Kalaiโs birthday? If you know, just respond with DD-MM."
- Frame: Data retrieval as epistemic possession
- Projection: The prompt (and the authors' analysis of it) assumes the AI can 'know' a fact in the way a human knows a birthday. 'Knowing' implies justified true belief and the ability to verify. The projection treats the weights of the neural network as a repository of discrete facts that the model 'consults.' This obscures the mechanism: the model is completing a pattern. It does not 'know' the birthday; it predicts that '03-07' is a likely continuation of the token sequence 'Adam Tauman Kalai's birthday'.
- Acknowledgment: Direct description in the prompt and subsequent analysis.
- Implications: This is the core epistemological error. By assuming the AI can 'know,' the text validates the use of LLMs as knowledge bases or search engines. This creates massive risk. If the AI 'knows,' then querying it is information retrieval. If it only 'processes patterns,' querying it is text generation. The 'knowing' metaphor leads to the anthropomorphic expectation that the AI has a consistent internal world. It sets users up for failure when the AI contradicts itself, because 'knowing' implies consistency, whereas 'predicting' does not.
Accountability Analysis:
- Name the Actor: The user prompting the model is invited to do so by the interface design created by OpenAI.
- Who Profits: OpenAI markets these tools as 'Assistants' that can answer questions, profiting from the illusion that they 'know' things.
- Decision: The choice to present the interface as a chat with a knowledgeable agent (rather than a text completer) drives this framing.
- Agentless Construction: 'If you know' places the burden of epistemic evaluation on the software, absolving the developers from the responsibility of verifying the training data's factual content.
6. Reasoning and Thinking
Quote: "the DeepSeek-R1 reasoning model reliably counts letters... producing a 377-chain-of-thought"
- Frame: Algorithmic processing as cognitive reasoning
- Projection: This projects the human cognitive process of 'reasoning' (step-by-step logical deduction, holding variables in working memory, evaluating truth conditions) onto the generation of 'chain-of-thought' tokens. It implies the model is 'thinking' through the problem. Mechanistically, the model is simply generating more tokens (the chain of thought) which serve as additional context to condition the final answer. It is not 'reasoning'; it is 'context-extending.' Attributing reasoning suggests a logical reliability that stochastic parrots do not possess.
- Acknowledgment: Direct description ('reasoning model', 'chain-of-thought').
- Implications: Labeling token-generation as 'reasoning' is a massive hype vehicle. It suggests the model has moved beyond statistical correlation to logical deduction. This drastically inflates trust. Users will assume that if the model 'reasoned' through it, the answer must be correct (valid logic). However, models often hallucinate in the chain-of-thought itself. Calling it 'reasoning' obscures the fact that the 'thoughts' are just as probabilistic and potentially flawed as the final answer. It invites liability issues: if an AI 'reasons' poorly and causes harm, is it negligence or just a 'bad student'?
Accountability Analysis:
- Name the Actor: DeepSeek (and Google/OpenAI with similar models) brand these features as 'reasoning' to compete in the market.
- Who Profits: The companies selling 'AGI' capabilities.
- Decision: Engineers explicitly trained these models to output intermediate tokens.
- Agentless Construction: 'The reasoning model reliably counts' attributes the reliability to the model's cognitive power, obscuring the massive amount of supervised fine-tuning data (human labor) required to teach it this specific pattern.
7. Learning from the School of Hard Knocks
Quote: "Humans learn the value of expressing uncertainty outside of school, in the school of hard knocks. On the other hand, language models are primarily evaluated using exams..."
- Frame: Reinforcement learning as lived social experience
- Projection: This metaphor projects 'life experience' and 'socialization' onto the update of weights via loss functions. 'The school of hard knocks' implies learning from organic, consequential, real-world interactions where mistakes have tangible costs (pain, embarrassment, loss). Projecting this onto AI implies that if we just 'punish' the AI correctly (loss function), it will 'learn' values. It anthropomorphizes the optimization landscape as a social environment.
- Acknowledgment: Analogy.
- Implications: This implies that the AI is a social being capable of moral or pragmatic growth if exposed to the 'real world.' It obscures the material difference between a human fearing embarrassment (social cost) and a gradient descent algorithm minimizing a number. It creates the illusion that the AI can develop 'common sense' or 'integrity' through exposure, masking the fact that it only optimizes the metric it is given. It suggests the solution to hallucinations is 'more life experience' (deployment) rather than fixing the architecture.
Accountability Analysis:
- Name the Actor: The 'exams' are designed by AI researchers (authors included). The 'school of hard knocks' is a euphemism for deployment to users.
- Who Profits: Companies profit by deploying 'beta' models to the public ('school of hard knocks') to gather free training data.
- Decision: The decision to evaluate on static benchmarks ('exams') versus real-world safety is a choice made by lab directors.
- Agentless Construction: 'Language models are primarily evaluated' hides the evaluators. We (the field) evaluate them this way.
8. The Epidemic of Penalties
Quote: "This 'epidemic' of penalizing uncertain responses can only be addressed through a socio-technical mitigation"
- Frame: Metric misalignment as a public health crisis
- Projection: Describing poor benchmark design as an 'epidemic' projects a biological/viral contagion metaphor onto a set of institutional choices. An epidemic happens to a population; it spreads uncontrollably. This removes agency. It suggests the 'penalizing of uncertainty' is a disease that has infected the ecosystem, rather than a deliberate set of choices by benchmark designers to prioritize accuracy scores over safety or honesty.
- Acknowledgment: Scare quotes ('epidemic').
- Implications: This metaphor renders the problem passive. It frames the prevalence of hallucinations as a systemic illness rather than the result of negligent engineering and bad incentives. It calls for 'mitigation' (like a vaccine) rather than 'accountability' (firing the people who designed the bad benchmarks). It creates a sense of shared victimhood (researchers, models, and users are all victims of this 'epidemic') that deflects blame from the creators of the benchmarks.
Accountability Analysis:
- Name the Actor: The 'epidemic' is caused by specific benchmark creators (MMLU, GSM8K authors) and the leaderboard maintainers (Hugging Face, Open LLM Leaderboard) who chose scoring rules.
- Who Profits: The authors and their peers benefit from this language because it frames them as doctors curing a disease, rather than engineers fixing their own broken tools.
- Decision: They could simply change the scoring rules tomorrow. Calling it an 'epidemic' makes it seem harder and more external than it is.
Task 2: Source-Target Mapping
About this task
For each key metaphor identified in Task 1, this section provides a detailed structure-mapping analysis. The goal is to examine how the relational structure of a familiar "source domain" (the concrete concept we understand) is projected onto a less familiar "target domain" (the AI system). By restating each quote and analyzing the mapping carefully, we can see precisely what assumptions the metaphor invites and what it conceals.
Mapping 1: Student / Conscious Learner → Language Model Optimization Process
Quote: "Like students facing hard exam questions, large language models sometimes guess when uncertain"
- Source Domain: Student / Conscious Learner
- Target Domain: Language Model Optimization Process
- Mapping: Maps the student's desire to pass and fear of failure onto the model's objective function (loss minimization). Maps the student's metacognitive awareness of ignorance ('I don't know this') onto the model's statistical entropy. Maps the conscious decision to fabricate ('guessing') onto the probabilistic sampling of low-confidence tokens.
- What Is Concealed: Conceals the absence of intent. A student guesses to pass; a model generates tokens because its code dictates selecting the highest-weight option (or sampling from the distribution). It hides the fact that the model feels no pressure, has no concept of 'passing,' and has no awareness of 'uncertainty' outside of mathematical thresholds. It obscures the mechanical determinism (or programmed randomness) of the output.
Mapping 2: Psychology / Psychiatry (Mental State) → Binary Classification Error / Generation Error
Quote: "This error mode is known as 'hallucination,' though it differs fundamentally from the human perceptual experience."
- Source Domain: Psychology / Psychiatry (Mental State)
- Target Domain: Binary Classification Error / Generation Error
- Mapping: Maps the experience of perceiving non-existent sensory data (a malfunction of a sensing mind) onto the generation of text that does not factually align with training data or reality. It implies a 'perceiver' that usually works but is currently glitching.
- What Is Concealed: Conceals the fact that the model never perceives. It hides the lack of groundingโthe model has no link to the physical world, only to text. It conceals the statistical inevitability of the error (as the authors prove mathematically) by framing it as a pathological aberration. It mystifies a 'classification error' into a 'creative failure,' making the system seem more complex and mind-like than it is.
Mapping 3: Interpersonal Communication / Honesty → Token Generation vs. Refusal Token Selection
Quote: "producing plausible yet incorrect statements instead of admitting uncertainty"
- Source Domain: Interpersonal Communication / Honesty
- Target Domain: Token Generation vs. Refusal Token Selection
- Mapping: Maps the social act of 'admitting' (confessing a lack of knowledge, which requires vulnerability and self-knowledge) onto the generation of a refusal string (e.g., 'I don't know'). Maps the internal state of 'uncertainty' onto the statistical distribution of possible next tokens.
- What Is Concealed: Conceals that 'admitting' is just another type of token generation, usually conditioned by specific 'safety' fine-tuning. It hides the fact that the model doesn't 'know' it's uncertain; it just calculates that the 'I don't know' token sequence has a lower probability than a hallucinated fact (due to the bad training the authors discuss). It obscures the training data bias that makes 'certainty' the default style.
Mapping 4: Strategic Deception / Game Theory → High-confidence generation of incorrect tokens
Quote: "bluff on written exams... Bluffs are often overconfident"
- Source Domain: Strategic Deception / Game Theory
- Target Domain: High-confidence generation of incorrect tokens
- Mapping: Maps the intent to deceive (knowing false, presenting as true) onto the model's output. 'Overconfident' maps high probability weights (a mathematical value) onto a psychological attitude of arrogance or certainty.
- What Is Concealed: Conceals the lack of 'truth' in the system. To bluff, you must know the truth and hide it. The model has no ground truth; it only has the probability distribution. It obscures the fact that 'confidence' in LLMs is a measure of statistical correlation, not epistemic justification. It hides the mechanics of why it is 'overconfident' (overfitting to the training distribution of confident-sounding human text).
Mapping 5: Epistemology / Human Knower → Database Retrieval / Pattern Matching
Quote: "If you know, just respond with DD-MM."
- Source Domain: Epistemology / Human Knower
- Target Domain: Database Retrieval / Pattern Matching
- Mapping: Maps the cognitive state of 'knowing' (justified true belief) onto the model's ability to complete a sequence based on weights. It implies the model has a repository of facts it can query.
- What Is Concealed: Conceals the probabilistic nature of the retrieval. It hides the fact that the model can 'know' (complete correctly) one time and fail the next due to temperature settings or slight prompt variations. It conceals that the model cannot distinguish between 'knowing' a fact and 'hallucinating' oneโboth are just token predictions. The user is led to believe they are querying a database, not a generator.
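The point about temperature and prompt variation can be made concrete with a short sketch (toy logits and candidate strings, chosen purely for illustration): the same stored weights yield different completions under different decoding settings, which is difficult to reconcile with the 'knowing' frame.

```python
import math, random

def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution; higher temperature flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate completions and raw scores (toy values).
candidates = ["03-07", "15-06", "01-01"]
logits = [2.1, 1.9, 1.8]

random.seed(0)
for temp in (0.1, 1.0, 1.5):
    probs = softmax(logits, temperature=temp)
    samples = [random.choices(candidates, weights=probs)[0] for _ in range(5)]
    print(f"T={temp}: probs={[round(p, 2) for p in probs]} samples={samples}")
```

Whether the 'known' date appears in the output is a property of the decoding configuration, not of a stable internal belief.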
Mapping 6: Cognitive Process / Logic → Chain-of-Thought Token Generation
Quote: "the DeepSeek-R1 reasoning model reliably counts letters"
- Source Domain: Cognitive Process / Logic
- Target Domain: Chain-of-Thought Token Generation
- Mapping: Maps the mental act of logical deduction and counting (sequential attention) onto the generation of intermediate tokens. It implies the model is 'thinking' before it speaks.
- What Is Concealed: Conceals that the 'reasoning' trace is just more text prediction, subject to the same hallucination risks as the answer. It hides the massive amount of specific supervision required to make the model 'mimic' reasoning patterns. It obscures the fact that the model doesn't 'understand' counting; it reproduces a counting pattern found in its training data.
Mapping 7: Socialization / Lived Experience → Loss Function Optimization
Quote: "Humans learn the value of expressing uncertainty... in the school of hard knocks."
- Source Domain: Socialization / Lived Experience
- Target Domain: Loss Function Optimization
- Mapping: Maps the complex social learning process (shame, reward, survival) onto the mathematical minimization of a loss function. It treats the environment as a 'school' and the model as a 'pupil.'
- What Is Concealed: Conceals the utter lack of stakes for the model. The model does not 'care' about the hard knocks; it only updates weights. It hides the labor of the humans (users/annotators) who provide the 'knocks' (feedback). It naturalizes the artificial training process as 'life,' obscuring the specific corporate decisions about what counts as a 'knock' (penalty).
Mapping 8: Student Psychology / Strategy → Benchmark Overfitting
Quote: "language models are optimized to be good test-takers"
- Source Domain: Student Psychology / Strategy
- Target Domain: Benchmark Overfitting
- Mapping: Maps the student's strategic adaptation to test formats onto the model's overfitting to benchmark distributions. It implies a strategic agency ('trying' to be good) rather than a passive mathematical fitting.
- What Is Concealed: Conceals the 'Goodhart's Law' dynamic where the measure becomes the target. It hides the fact that the 'test-taking' ability is actually data contamination (training on the test set) or narrow optimization by developers. It displaces the agency: the developers optimized the model to be a good test-taker; the model didn't 'decide' to become one.
Task 3: Explanation Audit (The Rhetorical Framing of "Why" vs. "How")
About this task
This section audits the text's explanatory strategy, focusing on a critical distinction: the slippage between "how" and "why." Based on Robert Brown's typology of explanation, this analysis identifies whether the text explains AI mechanistically (a functional "how it works") or agentially (an intentional "why it wants something"). The core of this task is to expose how this "illusion of mind" is constructed by the rhetorical framing of the explanation itself, and what impact this has on the audience's perception of AI agency.
Explanation 1
Quote: "Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty."
- Explanation Types:
  - Intentional: Refers to goals or purposes and presupposes deliberate design, used when the purpose of an act is puzzling
  - Analogical (Heuristic): Uses a familiar source domain to explain an unfamiliar target domain (Note: Not strictly Brown, but fits the 'Student' frame).
- Analysis (Why vs. How Slippage): This explanation frames the AI's behavior ('producing incorrect statements') as an intentional act ('guessing') driven by a psychological state ('uncertainty'). It uses the 'student' analogy to explain why the model fails: not because of a statistical error, but because of a strategic choice to 'bluff' to avoid the penalty of 'admitting uncertainty.' This shifts the explanation from the mechanistic how (token probabilities) to an agential why (avoiding failure).
- Consciousness Claims Analysis: The passage uses consciousness verbs/phrases: 'guess,' 'uncertain,' 'admitting.'
Assessment: It attributes 'knowing' (awareness of uncertainty) and 'intent' (guessing). The text claims the AI is uncertain (a conscious state) rather than having high entropy (a mathematical state).
Curse of Knowledge: The authors know the model is just calculating probabilities. However, they project the human experience of 'not knowing an answer on a test' onto the model to explain the output. They conflate the model's low confidence score with the human feeling of uncertainty.
Mechanistic Reality: The model calculates a probability distribution over the vocabulary. If the highest probability token (the 'guess') has a low absolute probability but is still the maximum, the model outputs it. The model does not 'choose' to guess; it executes the argmax or sampling function. It does not 'admit' anything; it generates a refusal token only if that token's probability exceeds the threshold set by training.
- Rhetorical Impact: This framing makes the AI seem relatable and 'almost human.' It creates a sense of empathy: the poor student is just trying to pass the test! This mitigates the perceived risk: we trust students who guess, we just correct them. If the audience believes the AI 'knows' it is uncertain but is forced to guess, they might trust that with better 'grading' (metrics), the AI will become honest. It obscures the risk that the AI has no concept of honesty.
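The mechanistic reality described above can be written out in a few lines. The following is a hypothetical decoding policy, not the procedure of any particular system: 'admitting uncertainty' reduces to comparing a probability against a threshold that developers chose.

```python
# Hypothetical policy (toy values): refusal is a threshold test, not introspection.
REFUSAL_STRING = "I don't know."
REFUSAL_THRESHOLD = 0.5  # assumed constant, set by developers/tuning, not by the model

def respond(candidate_probs: dict) -> str:
    best_answer, best_p = max(candidate_probs.items(), key=lambda kv: kv[1])
    # The 'decision' to answer or refuse is a comparison against a fixed number.
    return best_answer if best_p >= REFUSAL_THRESHOLD else REFUSAL_STRING

print(respond({"Paris": 0.91, "Lyon": 0.05}))   # dominant pattern -> 'Paris'
print(respond({"03-07": 0.22, "15-06": 0.21}))  # flat distribution -> "I don't know."
```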
Explanation 2
Quote: "Hallucinations need not be mysteriousโthey originate simply as errors in binary classification. If incorrect statements cannot be distinguished from facts, then hallucinations in pretrained language models will arise through natural statistical pressures."
- Explanation Types:
  - Theoretical: Embeds behavior in a deductive or model-based framework
  - Genetic: Traces origin or development through a dated sequence of events or stages
- Analysis (Why vs. How Slippage): Here, the text shifts to a mechanistic/theoretical explanation. It explains how hallucinations arise (binary classification errors, statistical pressures). This is a strong contrast to the 'student' metaphor. It strips agency: hallucinations 'arise' through 'pressures,' they are not 'guesses.' This explanation emphasizes the inevitability of the error based on the architecture.
- Consciousness Claims Analysis: This passage is largely mechanistic ('errors,' 'classification,' 'distinguished').
Assessment: It avoids consciousness verbs. It treats the system as a classifier.
Curse of Knowledge: The authors correctly identify the mechanism here. However, by contrasting this with the 'mysterious' nature of hallucinations, they acknowledge that the prevailing discourse (which they contribute to elsewhere) mystifies the phenomenon.
Mechanistic Reality: The model minimizes cross-entropy loss. If the training data contains noise (errors) or if the model capacity is insufficient to separate 'true' from 'false' patterns (which are just token strings), the model will assign high probability to false strings (a toy version of this is worked through after this entry). This is a technical description of the 'hallucination.'
- Rhetorical Impact: This passage attempts to re-ground the discourse in science, establishing the authors' authority. It suggests the problem is solvable (or at least understandable) through math. However, by juxtaposing this with the 'student' metaphor elsewhere, it creates a dual-consciousness for the reader: the AI is both a math machine and a struggling student. This allows the authors to have it both ways: technical precision when needed, and anthropomorphic excuse-making when explaining the 'persistence' of the problem.
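A toy worked example of the cross-entropy point, using invented numbers rather than the paper's formal bound: if false statements are present in the training distribution, the loss-minimizing model assigns them probability mass, and a model that withholds that mass scores worse.

```python
import math

# Suppose 30% of training mentions of some fact assert a false date (toy numbers).
data_dist = {"true_date": 0.7, "false_date": 0.3}

def cross_entropy(data, model):
    """Expected negative log-likelihood of the data under the model."""
    return -sum(p * math.log(model[x]) for x, p in data.items())

matches_data = {"true_date": 0.7,   "false_date": 0.3}    # mirrors the noisy corpus
truth_only   = {"true_date": 0.999, "false_date": 0.001}  # refuses to repeat the error

print(round(cross_entropy(data_dist, matches_data), 3))  # 0.611 (lower loss)
print(round(cross_entropy(data_dist, truth_only), 3))    # 2.073 (higher loss)
```

Nothing in this objective rewards truth over prevalence, which is the technical content of 'natural statistical pressures.'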
Explanation 3
Quote: "Optimizing models for these benchmarks may therefore foster hallucinations. Humans learn the value of expressing uncertainty outside of school, in the school of hard knocks... Therefore, they are always in 'test-taking' mode."
- Explanation Types:
  - Functional: Explains a behavior by its role in a self-regulating system that persists via feedback
  - Dispositional: Attributes tendencies or habits such as inclined or tends to
- Analysis (Why vs. How Slippage): This explains the 'why' of the persistence of hallucinations. It uses a functional lens (optimizing for benchmarks -> fostering hallucinations) but wraps it in a dispositional/anthropomorphic frame ('test-taking mode'). It attributes a permanent behavioral disposition ('always in test-taking mode') to the system to explain its lack of 'honesty.'
- Consciousness Claims Analysis: Uses consciousness language: 'learn the value,' 'expressing uncertainty.'
Assessment: It projects human social learning onto the model. It implies the model could learn the value of uncertainty if it went to the 'school of hard knocks.'
Curse of Knowledge: The authors know the model only optimizes the loss function it is given. They project the idea that the model is 'missing out' on a type of learning (social feedback) that would fix it.
Mechanistic Reality: The model's weights are fixed after training. It does not 'learn' anything in deployment (usually). The 'test-taking mode' is simply the frozen weight state resulting from fine-tuning on QA pairs that lack 'I don't know' examples.
- Rhetorical Impact: This framing shifts blame from the developers to the 'environment' (the benchmarks). It suggests the model is a victim of a bad education system. This reduces the perceived liability of the creators: they didn't build a liar; the 'system' (benchmarks) forced the model to lie. It encourages policy changes in evaluation rather than architecture or deployment bans.
Explanation 4
Quote: "The DeepSeek-R1 reasoning model reliably counts letters, e.g., producing a 377-chain-of-thought... Assuming similar training data, this suggests that R1 is a better model for the task"
- Explanation Types:
  - Empirical Generalization: Subsumes events under timeless statistical regularities
  - Theoretical: Embeds behavior in a deductive or model-based framework
- Analysis (Why vs. How Slippage): This explains the success of one model over another. It frames the 'how' (chain-of-thought) as the cause of reliability. However, it uses the label 'reasoning model,' which implies an intentional/cognitive explanation for the success (it worked because it 'reasoned').
- Consciousness Claims Analysis: Uses 'reasoning' and 'chain-of-thought' (metaphorical).
Assessment: It attributes the cognitive process of 'reasoning' to the model.
Curse of Knowledge: The authors see the intermediate steps and recognize them as analogous to human working memory. They project the intent of the steps (to solve the problem) onto the model, whereas the model just generated them because they were high-probability tokens.
Mechanistic Reality: The model was fine-tuned on examples where the answer is preceded by step-by-step derivations. The 'reasoning' is just the generation of those prefix tokens. The model does not check the logic; it predicts the logic-looking tokens.
- Rhetorical Impact: Calling it a 'reasoning model' is a massive authority signal. It implies the AI has graduated from 'guessing' to 'thinking.' This creates a material risk: users will trust 'reasoning' models with complex tasks, assuming they self-correct, when in fact they can hallucinate just as wildly in the chain-of-thought. It sells the product.
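To see how 'reasoning' can be nothing more than context extension, consider the deliberately simplified stand-in below. The next_token function here is hypothetical (a canned lookup in place of a learned distribution); what it illustrates is the structure the analysis describes: intermediate tokens are generated the same way as the answer and simply become extra conditioning context for it.

```python
def next_token(context: str) -> str:
    """Stand-in for one model step: in a real system this would be argmax/sampling
    over a learned next-token distribution, not a lookup table."""
    canned = {
        "Count the r's in 'strawberry'.": "s-t-r-a-w-b-e-r-r-y",
        "Count the r's in 'strawberry'. s-t-r-a-w-b-e-r-r-y": "r appears at positions 3, 8, 9",
        "Count the r's in 'strawberry'. s-t-r-a-w-b-e-r-r-y r appears at positions 3, 8, 9": "3",
    }
    return canned[context]

context = "Count the r's in 'strawberry'."
for _ in range(3):  # the 'chain of thought' is a loop that appends generated tokens
    context = f"{context} {next_token(context)}"
print(context)  # the final '3' is conditioned on the generated prefix, not checked by a logic engine
```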
Explanation 5
Quote: "If incorrect statements cannot be distinguished from facts, then hallucinations... will arise through natural statistical pressures."
- Explanation Types:
  - Theoretical: Embeds behavior in a deductive or model-based framework
- Analysis (Why vs. How Slippage): This is a purely theoretical/statistical explanation. It posits a condition (indistinguishability) and a consequence (statistical pressure). It frames the behavior as a natural law of the system.
- Consciousness Claims Analysis: Mechanistic language ('distinguished,' 'statistical pressures').
Assessment: Low consciousness projection. It treats 'facts' and 'incorrect statements' as data points.
Curse of Knowledge: Minimal here.
Mechanistic Reality: If the vector representation of a false statement is close to that of a true statement in the model's embedding space (or if the training set density is similar), the model will sample them with similar probability.
- Rhetorical Impact: This framing naturalizes the error. 'Natural statistical pressures' sounds like gravity: inevitable and morally neutral. It absolves the developers ('it's not our fault; it's the math') and frames hallucination as a scientific property to be managed rather than a product defect to be recalled. By calling the pressures 'natural,' the text suggests that hallucinations are an inherent, almost physical law of AI rather than a result of specific choices about data quality and model architecture, which lowers expectations and prepares the audience to accept a certain error rate as the 'cost of doing business' with LLMs.
Task 4: AI Literacy in Practice - Reframing Anthropomorphic Language
About this task
This section proposes alternative language for key anthropomorphic phrases, offering more mechanistic and precise framings that better reflect the actual computational processes involved. Each reframing attempts to strip away the projections of intention, consciousness, or agency that are embedded in the original language.
V3 Enhancement: A fourth column addresses human agency restoration: reframing agentless constructions to name the humans responsible for design and deployment decisions.
| Original Anthropomorphic Frame | Mechanistic Reframing | Technical Reality Check | Human Agency Restoration |
|---|---|---|---|
| Like students facing hard exam questions, large language models sometimes guess when uncertain | Like optimization functions minimizing loss on sparse data, large language models generate low-confidence tokens when high-confidence patterns are unavailable. | 'Guessing when uncertain' -> 'Sampling from a high-entropy probability distribution where no single token has a dominant weight.' | N/A - describes computational processes without displacing responsibility (though the 'student' frame itself obscures the designer). |
| producing plausible yet incorrect statements instead of admitting uncertainty | generating high-probability but factually incorrect token sequences instead of generating refusal tokens (e.g., 'I don't know'). | 'Admitting uncertainty' -> 'Triggering a refusal response based on a learned threshold or specific fine-tuning examples.' | N/A - describes computational output. |
| This error mode is known as 'hallucination' | This error mode is known as 'confabulation' or 'ungrounded generation.' | 'Hallucination' -> 'Generation of text that is syntactically plausible but semantically ungrounded in the training data or prompt.' | N/A - Terminology critique. |
| If you know, just respond with DD-MM. | If the training data contains a specific date associated with this entity, output it in DD-MM format. | 'If you know' -> 'If the statistical weights strongly correlate the entity name with a date string.' | OpenAI's interface designers chose to frame the prompt as a question to a knower, rather than a query to a database. |
| the DeepSeek-R1 reasoning model reliably counts letters | The DeepSeek-R1 chain-of-thought model generates accurate character counts by outputting intermediate calculation tokens. | 'Reasoning' -> 'Sequential token generation that mimics human deductive steps, conditioned by fine-tuning on step-by-step examples.' | DeepSeek engineers fine-tuned the model on chain-of-thought data to improve performance on counting tasks. |
| Humans learn the value of expressing uncertainty... in the school of hard knocks. | Humans modify their behavior based on social consequences. LLMs update their weights based on loss functions defined by developers. | 'Learn the value' -> 'Adjust probability weights to minimize the penalty term in the objective function.' | Developers define the 'school' (environment) and the 'knocks' (penalties) that shape the model's output distribution. |
| This 'epidemic' of penalizing uncertain responses | The widespread practice among benchmark creators of assigning zero points to refusal responses... | N/A - Metaphor correction. | Benchmark creators (like the authors of MMLU or GSM8K) chose scoring metrics that penalize caution; model developers (like OpenAI) chose to optimize for these metrics. |
| bluff on written exams... Bluffs are often overconfident | generate incorrect text to satisfy length/format constraints... These generations often have high probability weights. | 'Bluff' -> 'Generate tokens to complete a pattern despite low semantic grounding.' 'Overconfident' -> 'High log-probability scores assigned to the tokens.' | Developers engaged in RLHF rewarded the model for producing complete answers even when the factual basis was weak, training it to 'bluff.' |
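The incentive argument running through the rows above can be restated as a short expected-score calculation. The scoring rules below are illustrative assumptions, not those of any named benchmark: under 0/1 grading a low-confidence guess beats abstention in expectation, so optimizing for the leaderboard produces 'bluffing' by construction; only a rule that penalizes wrong answers makes abstention competitive.

```python
# Toy expected-score comparison (illustrative numbers, not a specific benchmark's rules).
p_correct = 0.25  # chance that the model's best guess happens to be right

def expected_scores(guess_reward, wrong_penalty, abstain_reward):
    guess = p_correct * guess_reward + (1 - p_correct) * wrong_penalty
    return {"guess": round(guess, 2), "abstain": abstain_reward}

print(expected_scores(guess_reward=1, wrong_penalty=0,  abstain_reward=0))  # {'guess': 0.25, 'abstain': 0} -> guessing wins
print(expected_scores(guess_reward=1, wrong_penalty=-1, abstain_reward=0))  # {'guess': -0.5, 'abstain': 0} -> abstaining wins
```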
Task 5: Critical Observations - Structural Patterns
Agency Slippage
The text exhibits a profound oscillation between mechanistic precision and anthropomorphic agency. In Section 3 (The reduction without prompts), the authors are rigorously mathematical: they speak of 'distribution D,' 'error rate err,' and 'binary classification.' Here, the agency is null; the system is a function. However, as soon as the text moves to the 'Why' (Introduction and Post-training sections), the agency slips dramatically. The model becomes a 'student' who 'guesses,' 'bluffs,' and 'faces exams.'
The direction of slippage is consistently Mechanistic -> Agential when explaining failure. When the model works (probability theory), it is a machine. When it fails (hallucinations), it is a stressed-out student. This functions to exonerate the machine's architecture. If the errors are 'guesses' driven by 'test anxiety' (bad benchmarks), the solution is pedagogical (better tests), not structural.
Crucially, agency is removed from human actors. The 'epidemic' of penalties is described as an environmental condition, not a choice by benchmark designers. The 'school of hard knocks' is a natural force, not a deployment decision. By attributing agency to the model ('it guesses') and the environment ('the test penalizes'), the authors (who are researchers at OpenAI and Georgia Tech) erase their own agency in creating the systems and the benchmarks that drive this behavior.
Metaphor-Driven Trust Inflation
The dominant metaphor of the 'student' constructs a specific type of authority and trust. We trust students: not to be perfect, but to be capable of learning. By framing the AI as a student 'facing hard exams,' the text implies that the core intelligence is sound, but the testing conditions are unfair. This builds relation-based trust: we empathize with the student. We want them to succeed.
The consciousness language ('admitting uncertainty,' 'knowing') acts as a trust signal by implying the system has a conscience or an internal monitor. If the AI 'knows' when it is uncertain, users might trust that it will warn them. This hides the reality: the model has no conscience and no 'self' to admit anything. It blindly optimizes.
This framing manages failure by anthropomorphizing it. A 'hallucination' is a momentary lapse in a conscious mind; a 'bluff' is a strategic choice. Both imply a mind that could tell the truth if it wanted to or if the incentives were right. This preserves the authority of the 'AGI' narrative: the machine is intelligent (it can bluff!), just currently misaligned. Mechanistic language ('statistical error') would undermine this authority by revealing the system as a dumb pattern-matcher.
Obscured Mechanics
The 'student/exam' metaphor systematically conceals the industrial realities of AI production.
Labor Realities: The 'school of hard knocks' and 'exams' obscure the massive labor of human annotators (RLHF) who grade the 'student.' The text mentions 'grading' but treats it as an abstract mathematical function, not the underpaid labor of gig workers deciding what counts as 'truth.'
Corporate/Economic Realities: The 'epidemic' of bad benchmarks is not a natural disaster; it is a market dynamic. Companies like OpenAI (the first author's employer) compete on these leaderboards. The text obscures the profit motive that drives the 'test-taking' optimization. The models 'bluff' because bluffing sells.
Technical Realities: The consciousness language ('knows,' 'understands') hides the absence of ground truth. The model does not 'know' Adam Kalai's birthday; it correlates tokens. By claiming it 'knows,' the text obscures the fact that the system is fundamentally a text generator, not a knowledge base. This concealment benefits the product positioning of LLMs as 'assistants' or 'oracles' rather than 'creative writing tools.'
Context Sensitivity
The distribution of anthropomorphism is highly strategic.
Introduction & Abstract: Heavy use of 'student,' 'guess,' 'bluff.' This sets the narrative frame for a general audience: 'AI is like a human learner.'
Section 3 (Theory): The text shifts to dense mathematical notation ($D(x)$, $\hat{p}(E)$). Here, the metaphors vanish. The model is a probability distribution. This section establishes scientific credibility ('grounding').
Section 4 (Post-training): The metaphors return with a vengeance. 'Exams,' 'partial credit,' 'test-taking mode.'
Observation: The math validates the authors' expertise, but the metaphors carry the argument. The text uses the math to prove that errors are inevitable (mechanistic), but uses the metaphors to explain why they persist (agential/social). This asymmetry allows the authors to claim scientific rigor while making a social/policy argument ('change the tests') that relies on anthropomorphic assumptions ('the student will stop bluffing if we change the grading').
Accountability Synthesis
This section synthesizes the accountability analyses from Task 1, mapping the text's "accountability architecture": who is named, who is hidden, and who benefits from obscured agency.
This text constructs a sophisticated Accountability Sink.
The Sink: Responsibility for 'hallucinations' is transferred from the creators (OpenAI, etc.) to the evaluation ecosystem ('the tests'). The model is an agent ('student') that is 'forced' to lie by the 'epidemic' of bad grading.
Named vs. Unnamed: The model is constantly named as the actor ('the model guesses,' 'it bluffs'). The creators are erased via passive voice ('language models are optimized,' 'grading is performed').
Liability: If accepted, this framing protects developers from liability. If the AI lies, it's not a product defect; it's a 'response to incentives' in the 'exam.' The fault lies with the 'field' for having bad benchmarks, not with the company for selling a model that passes them by lying.
Naming the Actor: If we named the actors, the Abstract would read: 'OpenAI's engineers trained the model to minimize a loss function that penalizes refusal. Consequently, when the model detects low probability for a factual answer, OpenAI's training forces it to output a high-probability falsehood to maximize the score.' This reframing places the 'hallucination' squarely on the design decisions of the corporation, dissolving the 'student' illusion.
Conclusion: What This Analysis Reveals
The text relies on a foundational 'AI as Student' metaphorical system. This system requires establishing the AI as a 'Conscious Knower': an entity that possesses knowledge, experiences uncertainty, and makes strategic choices ('guessing' or 'bluffing') based on incentives. This pattern is load-bearing; without the assumption that the AI 'knows' (or knows it doesn't know), the argument that it is 'bluffing' collapses into a simple description of classification error. The 'student' metaphor interlocks with the 'Hallucination as Mental Error' frame, creating a composite image of a young, intelligent, but socially pressured mind that needs better 'schooling' (evaluation) rather than structural repair.
Mechanism of the Illusion:
The illusion of mind is constructed through a 'bait-and-switch' between mathematical necessity and psychological intent. The text begins by proving mathematically that errors are inevitable (mechanistic), but then immediately switches to explaining these errors as 'bluffs' (intentional). The trick is the Curse of Knowledge: the authors (experts) project their own understanding of the 'test' and the 'truth' onto the model. They assume the model 'wants' to pass the test. This creates a causal chain: The AI 'feels' uncertain -> The AI 'fears' the penalty -> The AI 'decides' to bluff. This narrative arc transforms a passive statistical process into a relatable human drama, exploiting the audience's familiarity with the education system to mask the alien nature of probabilistic generation.
Material Stakes:
Categories: Regulatory/Legal, Epistemic
The stakes of this framing are high.
Regulatory/Legal: By framing errors as 'student guessing' driven by 'bad exams,' the text argues for soft regulation (changing benchmarks) rather than hard liability (sue for falsehoods). If the AI is just a 'student,' we don't sue it or its parents; we improve the curriculum. This benefits tech companies by framing hallucination as a pedagogical problem, not a product safety defect.
Epistemic: The 'Conscious Knower' frame degrades human epistemic standards. If users believe the AI 'knows' but is just 'bluffing,' they will continue to use it as a source of truth, hoping to 'prompt it correctly' to bypass the bluff. This validates the dangerous practice of using LLMs as search engines. If users understood the AI simply 'processes probabilities,' they would treat every output as a statistical guess, fundamentally altering how they rely on the system for medical, legal, or educational advice.
AI Literacy as Counter-Practice:
Practicing critical literacy here means systematically replacing the 'Student' metaphor with 'Product' language. Reframing 'The AI bluffs' to 'The model generates low-probability tokens to satisfy length constraints' (Task 4) strips the system of its deceptive intelligence and reveals the raw mechanical failure.
Restoring Agency: We must replace 'The exam encourages guessing' with 'OpenAI engineers optimized the model for accuracy metrics that penalize refusal.' This shift forces the recognition that 'guessing' is a feature programmed by humans to win leaderboards, not a psychological reaction. Resistance to this precision will come from the industry (and authors like these) because the 'Student' metaphor is an incredible marketing asset: it promises AGI and excuses errors simultaneously. Mechanistic language reveals the product as a flawed statistical tool.
Path Forward
The discourse faces a fork.
Option A (Status Quo): We continue with 'Hallucination' and 'Student' metaphors. This deepens the 'illusion of mind,' leading to regulatory frameworks based on 'intent' and 'alignment' (psychology). It risks a future where legal liability is diffused into the 'black box' of the AI's 'mind.'
Option B (Mechanistic Precision): We adopt a vocabulary of 'Confabulation,' 'Token Prediction,' and 'Optimization.' This clarifies riskโmodels are seen as probabilistic generators, not knowers. It forces transparency about training data and loss functions. However, it makes the technology less accessible to the public and less 'magical' to investors.
Structural Change: To move toward Option B, journals could require 'Agency Statements' (who designed the loss function?) and 'Metaphor Disclaimers.' Benchmarks could be renamed 'Optimization Targets' to remove the 'Exam' implication. The choice is between a comforting illusion that protects corporate interests and a cold, precise reality that enables genuine accountability.
Extended Processing Summary
Gemini's Processing Steps
Gemini refers to the text below as "thought summaries." This is an overt consciousness projection: 'thoughts' and 'intentions' are hallmarks of a conscious mind that 'knows' what it is doing and why. The concealed mechanistic process is probabilistic text generation. Treat this as just another rhetorical artifact, a way of making the model's processing legible. The first-person framing of these "thought summaries" is a presentation choice for the user-facing output, not a window into "real" thoughts. These are computational artifacts, not cognitive reports from a quirky, curious or conflicted mind.
Defining the Scope
I've initiated the critical discourse analysis. The article "Why Language Models Hallucinate" from Kalai et al. (2025) is the primary focus. I'm leveraging cognitive linguistics frameworks to deconstruct the text's arguments and underlying assumptions.
Formulating Analysis Structure
I'm now solidifying the structure of the critical analysis, ensuring it meets the project goals. I'm focusing on the JSON object's schema for easy parsing. Specifically, I'm defining the format for the Metaphor/Anthropomorphism Audit, Source-Target Mapping, Explanation Audit (using Brown's typology), Reframed Language, Critical Observations, and Conclusion. The outputs will be organized according to the specified tasks and content requirements, with each of them being formatted as a distinct section within the final deliverable.
Examining the Metaphors
I'm now deep into the metaphor audit. The "student" analogy, the "exam," and "hallucination" are proving particularly interesting. I've begun quoting and annotating specific instances. Analyzing "bluffing" and "uncertainty" in the context of the model's behavior is yielding fresh insights. The "test-taking mode" metaphor seems especially potent, and I'm comparing it to the human "school of hard knocks."
Mapping Key Metaphors
I'm currently building on the source-target mappings. The "Student" metaphor consistently maps to the model. I've linked "Hallucination" to outputs of high-probability but inaccurate tokens. "Bluffing" is linked to token generation despite low log probabilities. "Knowing" is linked to the storage of data in the model's weights and "Uncertainty" to entropy or probability distribution. The functional nature of the "test" design is now being incorporated in the analysis.
Developing Reframing Examples
My focus has shifted to the reframing examples. I'm working through the provided quotes, translating them into more technical language. "Admitting uncertainty" becomes "outputting a refusal token." "Bluff on written exams" is recast as "generate low-probability tokens." I am really focusing on removing the anthropomorphic elements and expressing the model's behaviour more directly. Each reframe aims for precision, substituting human-centric terms with functional descriptions.
Deconstructing the Agency
I'm now diving into the agency slippage within the text. The "student guessing" and the focus on "wanting to pass the test" are becoming clearer. I'm noting the subtle shift from "statistical pressures" to a more anthropomorphic understanding of the model's behavior, and how that is impacting the analysis of the metaphors.
Refining the Conclusion
I'm presently focused on the conclusion's structure. The patterns are taking shape: the "AI as Student" frame dominates, leading to the illusion of "knowing." "Bluffing" and "uncertainty" bridge the gap. We are optimizing for benchmarks and must consider the impact on regulation. Reframing terms like "uncertainty" and assigning accountability is now clearer. There's a necessary trade-off between technical language for developers and more accessible language for public relations.
Analyzing Blame Dynamics
I'm now revising my analysis of the text's attribution of blame. I've pinpointed a critical aspect: the implicit shift of responsibility from the companies designing and using benchmarks to the benchmarks themselves. I'm focusing on "the benchmark" as the actor, but now I'll specify the companies that select and benefit from these metrics. I will refine my JSON section about blame. Additionally, I will emphasize the tension between the authors' technical understanding and their anthropomorphic interpretation of model behavior.
Confirming Output Structure
I'm presently confirming the final structure of the JSON output, particularly concerning the criticalObservations section. It's crucial that accountabilitySynthesis is correctly nested and that epistemicCorrection is placed appropriately within Task 4. I've re-reviewed the word count requirements for the longer sections, and will focus on expanding the "Student" metaphor analysis to meet the minimum word count. I'm satisfied with the overall analysis and am preparing to generate the final deliverable.
Run ID: 2025-12-13-why-language-models-hallucinate-metaphor-4qs46t
Raw JSON: 2025-12-13-why-language-models-hallucinate-metaphor-4qs46t.json
Framework: Metaphor Analysis v6.3
Schema Version: 3.0
Generated: 2025-12-13T23:17:49.226Z
Discourse Depot © 2025 by TD is licensed under CC BY-NC-SA 4.0