🔄 Why Language Models Hallucinate (gemini-3.1-pro)

About

This document presents a Critical Discourse Analysis focused on AI literacy, specifically targeting the role of metaphor and anthropomorphism in shaping public and professional understanding of generative AI. The analysis is guided by a prompt that draws from cognitive linguistics (metaphor structure-mapping), the philosophy of social science (Robert Brown's typology of explanation), and accountability analysis.

All findings and summaries below were generated from detailed system instructions provided to a large language model and should be read critically as interpretive outputs—not guarantees of factual accuracy or authorial intent.

Metaphor & Illusion Dashboard

Anthropomorphism audit · Explanation framing · Accountability architecture

Deep Analysis

The discourse in this text relies on two dominant, interlocking anthropomorphic patterns: the 'Model as Conscious Student' and the 'Statistical Probability as Epistemic Belief.' These patterns function not merely as illustrative similes, but as a load-bearing conceptual architecture that supports the paper's ultimate policy arguments. The foundational assumption that enables this entire structure is the projection of consciousness—specifically, the assertion that the algorithmic system possesses metacognition (an awareness of its own uncertainty) and intentionality (the strategic choice to guess or bluff). This is not a simple one-to-one analogical mapping; it is a complex, hierarchical structure where mathematical weight updates are mapped to cognitive learning, softmax distributions are mapped to human convictions, and reinforcement learning penalties are mapped to educational pressure. If the consciousness projection is removed—if we insist that the model does not 'know' anything, 'feel' uncertain, or hold 'beliefs'—the entire rhetorical argument collapses. The narrative that the model is being 'forced to guess' by bad evaluations becomes nonsensical if the model is correctly understood as a passive matrix of statistical weights devoid of any internal experience or epistemic agency.

"During pretraining, a base model learns the distribution of language in a large text corpus. We show that, even with error-free training data, the statistical objective minimized during pretraining would lead to a language model that generates errors."

Explanation Types:

Theoretical · Empirical Generalization

✓ Mechanistic "How"

🔍Analysis

This explanation frames the AI mechanistically (how it works) while relying on a subtly agential verb ('learns'). The primary explanatory mode is Theoretical: it relies on the deductive logic of statistical objectives (cross-entropy minimization) to prove that errors are mathematically inevitable given the training objective. The secondary mode is Empirical Generalization, establishing a universal rule about how base models behave regardless of the data quality. This dual framing emphasizes the structural, mathematical inevitability of the system's flaws. However, the word 'learns', applied to the mathematical process of minimizing a statistical objective, obscures the purely passive, calculating nature of the weight updates. It creates a tension in which the system is described mathematically but conceptualized educationally, making the technical reality of 'minimizing an objective' feel like an active, cognitive achievement.

🧠Epistemic Claim Analysis

The passage avoids explicit consciousness verbs for the generation of errors, relying on the mechanistic verb 'generates.' However, the opening verb 'learns' is a profound consciousness claim, mapping cognitive acquisition onto mathematical optimization. While the passage describes processing (minimizing a statistical objective), the word 'learns' bridges the gap toward knowing. The authors demonstrate the 'curse of knowledge': they understand the mechanistic reality (stochastic gradient descent over a corpus) perfectly well but package it in an epistemic, biological verb to make it digestible. Mechanistically, the base model does not 'learn'; its numerical weights are adjusted by gradient descent, with gradients computed via backpropagation, to reduce the cross-entropy between its output distribution and the distribution of the training data. The text accurately describes the technical cause of the errors (the statistical objective), but the epistemic framing of 'learning' softens the mechanical reality.
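
To make the mechanistic description concrete, the following is a minimal Python sketch of what 'learning the distribution' amounts to in the simplest possible case. It assumes a toy unigram model with one adjustable number (a logit) per word and a made-up eight-word corpus; real pretraining conditions on preceding context and adjusts billions of transformer parameters, none of which is modeled here. Each step computes the cross-entropy gradient (model probability minus data frequency) and subtracts a small multiple of it from the logits.

```python
import math
from collections import Counter

# Hypothetical toy corpus (invented for illustration, not data from the audited paper).
corpus = "the cat sat on the mat the end".split()
vocab = sorted(set(corpus))
counts = Counter(corpus)
target = [counts[w] / len(corpus) for w in vocab]  # empirical word frequencies

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p_data, q_model):
    # H(p, q) = -sum_i p_i * log(q_i): the quantity being minimized
    return -sum(p * math.log(q) for p, q in zip(p_data, q_model) if p > 0)

def train(steps=500, lr=0.5):
    logits = [0.0] * len(vocab)  # the model's only parameters: one number per word
    for _ in range(steps):
        q = softmax(logits)
        # Gradient of H(target, softmax(logits)) w.r.t. each logit is q_i - p_i
        grads = [qi - pi for qi, pi in zip(q, target)]
        logits = [z - lr * g for z, g in zip(logits, grads)]
    return softmax(logits)

learned = train()
for word, p_data, p_model in zip(vocab, target, learned):
    print(f"{word:>4}  data={p_data:.3f}  model={p_model:.3f}")
```

Run as written, the model's probabilities settle on the corpus frequencies (0.375 for 'the', 0.125 for each other word). Nothing in the loop represents comprehension; it is repeated arithmetic on a list of numbers.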

🎯Rhetorical Impact

This framing shapes audience perception by naturalizing algorithmic errors as a fundamental, unavoidable part of the 'learning' process, akin to a student who must inevitably make mistakes while acquiring new knowledge. It reduces the perceived risk of the technology by framing the errors as mathematically bounded and structurally understood by the scientists. However, the 'learning' frame simultaneously increases unwarranted trust: if the audience believes the system is genuinely 'learning' rather than merely 'correlating,' they may assume the model will eventually achieve true understanding, and so become more willing to deploy these systems in high-stakes environments before the fundamental architectural flaws are resolved.

How/Why Slippage: 44% of explanations use agential framing (4 / 9 explanations)

Unacknowledged Metaphors: 63% presented as literal description (no meta-commentary or hedging)

Hidden Actors: 63% with agency obscured by agentless constructions (corporations/engineers unnamed)

Source → Target Pairs (8)

Human domains mapped onto AI systems

Source

A human student taking a high-stakes exam, experiencing psychological pressure, metacognitive awareness of their own ignorance, and making a strategic choice to guess to avoid penalty.

→

Target

An autoregressive language model processing a prompt through its neural network and generating tokens based on probability distributions when no single token sequence has a dominant weight.

Source

A conscious, arrogant human communicator who asserts facts forcefully despite lacking actual knowledge or justification, displaying the psychological trait of overconfidence.

→

Target

The generation of text tokens with high probability scores (softmax outputs close to 1) that do not align with external factual reality or the user's intent.

Source

A human patient suffering from a neurological or psychological pathology that causes them to perceive sensory inputs that are not grounded in external reality.

→

Target

A neural network generating text sequences that are syntactically correct and statistically probable according to its training distribution, but factually incorrect in the real world.

Source

A strategic, conscious game player or test-taker who analyzes an incentive structure, recognizes their own internal state of uncertainty, and consciously chooses a behavior to maximize rewards.

→

Target

An algorithmic system whose output distribution has been shifted via Reinforcement Learning from Human Feedback (RLHF) to minimize the probability of generating specific token sequences (like 'I don't know') because those sequences received low reward scores.

Source

A conscious human test-taker holding internal, subjective epistemic convictions (beliefs) about what is true in the world based on their learning and reasoning.

→

Target

A mathematical posterior probability distribution over a set of binary variables calculated by an algorithm based on a prior distribution and new data.

Source

A human or biological entity engaging in the cognitive process of education, conceptual acquisition, synthesis, and understanding of a subject matter over time.

→

Target

The mechanistic, algorithmic process of stochastic gradient descent updating billions of numerical parameters (weights and biases) in a neural network to minimize a specific mathematical loss function (cross-entropy) over a dataset.

Source

A conscious, vigilant actor who recognizes a hazard or mistake and deliberately takes action to circumvent it, demonstrating intention, awareness, and control.

→

Target

The statistical reality that highly frequent token sequences in the training data (correctly spelled words) are generated with near 100% probability, while rare or non-existent sequences (misspellings) have near 0% probability.

Source

A socially intelligent human capable of empathy, theory of mind, contextual awareness, and the ability to adapt their communication to the unstated social needs of the listener.

→

Target

A mathematical optimization function tuned via reinforcement learning to output specific sequences of text (such as hedges, polite phrases, or uncertainty markers) when certain internal probability thresholds are met.

Metaphor Gallery (8)

Cognition as Biological and Educational Process
Frame: Model as struggling student taking an exam · Acknowledgment: Hedged/Qualified · Actor Visibility: Hidden (agency obscured)
"Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty."

Statistical Output as Psychological Confidence
Frame: Model as arrogant or self-assured communicator · Acknowledgment: Direct (Unacknowledged) · Actor Visibility: Hidden (agency obscured)
"Language models are known to produce overconfident, plausible falsehoods, which diminish their utility and trustworthiness."

Statistical Error as Pathology
Frame: Model as patient suffering a perceptual or psychological disorder · Acknowledgment: Hedged/Qualified · Actor Visibility: Hidden (agency obscured)
"This error mode is known as hallucination, though it differs fundamentally from the human perceptual experience. Despite significant progress, hallucinations continue to plague the field..."

Algorithmic Optimization as Strategic Choice
Frame: Model as strategic gamer navigating incentive structures · Acknowledgment: Direct (Unacknowledged) · Actor Visibility: Partial (some attribution)
"Model B will outperform A under 0-1 scoring, the basis of most current benchmarks. This creates an epidemic of penalizing uncertainty and abstention, which we argue that a small fraction of hallucination evaluations won't suffice. The numerous primary evaluations must be adjusted to stop penalizing abstentions when uncertain."

Probability Distribution as Epistemic Belief
Frame: Statistical calculation as conscious human conviction · Acknowledgment: Hedged/Qualified · Actor Visibility: Hidden (agency obscured)
"The test-taker's beliefs about the correct answer can be viewed as a posterior distribution over binary gc's. For any such beliefs, the optimal response is not to abstain."

Algorithmic Weight Updates as Cognitive Learning
Frame: Mathematical optimization as educational acquisition · Acknowledgment: Direct (Unacknowledged) · Actor Visibility: Partial (some attribution)
"During pretraining, a base model learns the distribution of language in a large text corpus. We show that, even with error-free training data, the statistical objective minimized during pretraining would lead to a language model that generates errors."

Statistical Suppression as Deliberate Avoidance
Frame: Algorithmic output boundaries as conscious evasion · Acknowledgment: Direct (Unacknowledged) · Actor Visibility: Hidden (agency obscured)
"Language models avoid many types of errors such as spelling mistakes, and not all errors are hallucinations."

Self-Correction as Pragmatic Competence
Frame: System optimization as linguistic and social maturity · Acknowledgment: Direct (Unacknowledged) · Actor Visibility: Partial (some attribution)
"This can remove barriers to the suppression of hallucinations, and open the door to future work on nuanced language models, e.g., with richer pragmatic competence."

Reframed Language Samples

Original Quote	Mechanistic Reframing	Technical Reality	Human Agency Restoration
Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty.	When processing prompts with low-probability token correlations, language models generate statistically plausible but factually incorrect token sequences instead of outputting pre-programmed abstention tokens.	The model does not 'feel uncertain' or 'guess'; it calculates probability distributions across its vocabulary and samples tokens based on mathematical weights tuned during training. It cannot 'admit' anything, as it lacks self-awareness and epistemic boundaries.	N/A - describes computational processes without displacing responsibility. (Though the decision to deploy models that generate falsehoods rather than abstaining rests with corporate executives).
Language models are known to produce overconfident, plausible falsehoods, which diminish their utility and trustworthiness.	Language models frequently generate factually incorrect token sequences with high statistical probability scores, reducing their reliability in practical applications.	Models do not possess 'confidence' or belief; they output tokens with high softmax probability scores based on the density of their training data. High probability indicates statistical frequency, not epistemic certainty.	Engineers at AI companies design and deploy models optimized for fluent text generation over factual accuracy, knowing these architectures inherently produce high-probability falsehoods.
This error mode is known as hallucination, though it differs fundamentally from the human perceptual experience. Despite significant progress, hallucinations continue to plague the field...	This statistical output failure is called hallucination. Despite architectural tweaks, these autoregressive models inherently continue to generate factually ungrounded text...	The system does not 'perceive' reality and therefore cannot 'hallucinate.' It strictly processes and generates text tokens based on mathematical correlations without any causal model of the world.	Despite massive investment, AI developers and corporate labs continue to release and monetize systems that fundamentally fail to distinguish fact from statistical noise.
Model B will outperform A under 0-1 scoring, the basis of most current benchmarks. This creates an epidemic of penalizing uncertainty and abstention...	Models optimized to output generated text rather than abstention tokens score higher on current 0-1 benchmarks. This benchmark design structurally lowers the reward for outputting abstention phrases...	The model does not experience 'uncertainty.' It processes matrices. 'Abstention' is not a choice, but merely the generation of a specific token sequence (like 'I don't know') dictated by reinforcement learning weights.	Researchers and benchmark designers established evaluation metrics that reward fluent generation over accuracy, leading AI engineers to optimize their models against outputting abstention tokens.

Task 1: Metaphor and Anthropomorphism Audit

About this task

For each of the major metaphorical patterns identified, this audit examines the specific language used, the frame through which the AI is being conceptualized, what human qualities are being projected onto the system, whether the metaphor is explicitly acknowledged or presented as direct description, and—most critically—what implications this framing has for trust, understanding, and policy perception.

Each metaphor entry also includes an accountability analysis.

1. Cognition as Biological and Educational Process

Quote: "Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty."

Frame: Model as struggling student taking an exam
Projection: This framing directly maps the conscious, pressured experience of a human student taking an examination onto the mechanistic process of token prediction during benchmark evaluation. The projection attributes subjective psychological states, specifically the feeling of uncertainty and the deliberate cognitive choice to guess rather than leave a blank, to an algorithmic system. By equating a language model with a student, the text projects an entire architecture of conscious awareness, suggesting the AI possesses a subjective grasp of its own knowledge boundaries (often referred to as metacognition) and actively feels the pressure of the testing environment. This fundamentally obscures the reality that the system is merely performing mathematical operations to sample next tokens from probability distributions shaped by its training weights, without any internal experience of facing a hard question or consciously choosing a strategy to maximize a score.
Acknowledgment: Hedged/Qualified (The text uses the explicit comparative word 'Like' to introduce this mapping, situating it as a simile rather than a literal assertion. I considered the 'Explicitly Acknowledged' category, but there is no meta-commentary discussing the limitations of this metaphor, only the comparative hedge 'Like' before proceeding to use the framing extensively.)
Implications: Framing the AI as a struggling student significantly alters the policy and trust landscape by inducing unwarranted empathy and misplacing the locus of intervention. When audiences perceive the model as a student who is simply trying their best but forced to guess by a rigid exam system, it absolves the developers of responsibility for creating an unreliable product. This anthropomorphism inflates perceived sophistication by implying the model actually possesses underlying knowledge but is just behaviorally misaligned, rather than lacking understanding entirely. This creates severe liability ambiguity: if the student makes a mistake under pressure, the blame shifts to the test evaluation metric rather than the engineering team who released a system that generates plausible falsehoods. It encourages users to over-trust the system's outputs.

Accountability Analysis:

Actor Visibility: Hidden (agency obscured)
Analysis: I classified this as Hidden because the text completely erases the corporate actors and engineers who designed, trained, and deployed these systems. I considered Partial because the text implicitly blames the creators of the evaluations, but in this specific instance, the model is framed as an autonomous student acting alone. By framing the model as the primary actor facing the exam and guessing, the text conceals the humans who optimized the loss function and the executives who decided to release the model despite known failure modes. This displaced agency serves the developers' interests by naturalizing algorithmic errors as inevitable human-like mistakes.
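
The gap between 'guessing when uncertain' and sampling can be illustrated with a short Python sketch. The two distributions below are invented for illustration (the option labels are hypothetical), and a real model samples one token at a time over a vocabulary of many thousands. When one option dominates, the draws are nearly always the same; when no option dominates (high entropy), the output varies from run to run. The 'I don't know' string is just another option, emitted at whatever rate its probability dictates.

```python
import math
import random

def entropy_bits(dist):
    # Shannon entropy in bits: near 0 when one option dominates, higher when none does.
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def sample(dist, rng):
    # Draw one option with probability proportional to its weight; nothing is "decided".
    options = list(dist)
    return rng.choices(options, weights=[dist[o] for o in options], k=1)[0]

# Hypothetical output distributions; labels and numbers are illustrative only.
peaked = {"answer_A": 0.97, "answer_B": 0.02, "I don't know": 0.01}
flat = {"answer_A": 0.26, "answer_B": 0.25, "answer_C": 0.25, "I don't know": 0.24}

rng = random.Random(0)  # fixed seed so runs are reproducible
for name, dist in (("peaked", peaked), ("flat", flat)):
    draws = [sample(dist, rng) for _ in range(1000)]
    shares = {o: draws.count(o) / len(draws) for o in dist}
    print(f"{name}: entropy={entropy_bits(dist):.2f} bits, observed shares={shares}")
```

The fixed seed only makes runs repeatable; it plays no role in the argument.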

2. Statistical Output as Psychological Confidence

Quote: "Language models are known to produce overconfident, plausible falsehoods, which diminish their utility and trustworthiness."

Frame: Model as arrogant or self-assured communicator
Projection: This metaphor projects the complex human psychological state of 'overconfidence' onto a statistical probability distribution. Confidence in a human involves a conscious assessment of one's own beliefs, a justified or unjustified feeling of certainty, and the emotional or social posturing that accompanies delivering information. By stating the model produces 'overconfident' falsehoods, the text attributes subjective awareness and self-evaluation to the system. It suggests the model 'believes' its output is true and aggressively pushes this belief, rather than describing the reality: the model is simply generating tokens with high probability scores based on optimization parameters and training data patterns. This fundamentally confuses mathematical probability (processing) with epistemic certainty (knowing).
Acknowledgment: Direct (Unacknowledged) (The claim is presented as literal, established fact ('Language models are known to produce overconfident...'). There is no hedging in this sentence. I considered 'Hedged/Qualified' because later sections of the paper discuss statistical probability, but this specific assertion is stated as an objective, unvarnished behavioral trait of the system without qualification.)
Implications: Attributing overconfidence to an AI system implies a conscious agent that possesses the capacity to be properly confident. If an audience believes a model can be 'overconfident,' they naturally assume the model can be taught humility or calibrated to express 'appropriate' confidence, much like a human employee. This inflates the perceived sophistication of the system, masking the fact that it has no ground truth or internal reality against which to measure certainty. The risk is that users will look for linguistic cues of humility or hesitation as proxies for actual factual accuracy, an assumption that malicious actors can easily exploit, and that misaligned systems can easily defeat, simply by generating statistically polite but entirely fabricated responses.

Accountability Analysis:

Actor Visibility: Hidden (agency obscured)
Analysis: This is a classic agentless construction. The phrase 'Language models are known to produce' completely hides the actors who engineered the models to produce high-probability tokens without factual grounding. I considered 'Partial' because 'are known to' implies a community of knowers, but the actual responsibility for the production of the falsehoods is placed entirely on the AI as an autonomous actor. Naming the actors would mean stating: 'OpenAI and other developers have built models that generate high-probability plausible falsehoods.' Concealing the engineers serves to treat the model's flaws as natural phenomena rather than deliberate design choices.
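
What the text calls 'overconfidence' corresponds to a softmax output close to 1. The Python sketch below, with hypothetical token names and invented logit values, shows that the softmax turns arbitrary scores into probabilities without any input representing factual truth, and that the apparent 'confidence' shifts with a temperature parameter chosen by whoever configures the system. It illustrates the function itself, not any particular model's decoding settings.

```python
import math

def softmax(logits, temperature=1.0):
    # Turn raw scores into probabilities summing to 1; temperature rescales scores first.
    scaled = {tok: z / temperature for tok, z in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(z - m) for tok, z in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical logits for three candidate next tokens (invented values).
# No argument to softmax encodes whether any token is factually correct.
logits = {"token_A": 9.0, "token_B": 4.0, "token_C": 3.5}

for t in (1.0, 2.0):
    probs = softmax(logits, temperature=t)
    print(t, {tok: round(p, 3) for tok, p in probs.items()})
```

With these numbers the top token receives roughly 0.99 at temperature 1.0 and roughly 0.87 at temperature 2.0: the same scores, a different 'confidence', and no change in what is true.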

3. Statistical Error as Pathology

Quote: "This error mode is known as hallucination, though it differs fundamentally from the human perceptual experience. Despite significant progress, hallucinations continue to plague the field..."

Frame: Model as patient suffering a perceptual or psychological disorder
Projection: The term 'hallucination' projects a biological and psychological pathology onto computational error. A hallucination in humans involves a conscious mind perceiving sensory inputs that do not exist in objective reality. Projecting this onto an AI suggests that the system has a baseline state of 'normal' conscious perception that is occasionally disrupted by a glitch or sickness. It maps the human capacity for subjective experience and sensory processing onto matrix multiplications and vector embeddings. Although the authors briefly note a difference from human perception, they immediately reify the metaphor by stating these hallucinations 'plague the field,' treating the algorithmic outputs as a contagious disease or organic affliction that affects the models independently of human engineering.
Acknowledgment: Hedged/Qualified (I selected Hedged/Qualified because the authors explicitly state 'though it differs fundamentally from the human perceptual experience,' acknowledging the metaphor's limits. I considered 'Explicitly Acknowledged' but there are no scare quotes around the first use of the word here, and the text immediately reverts to treating the metaphor as a literal phenomenon that 'plagues the field.')
Implications: The hallucination metaphor is perhaps the most dangerous framing in AI discourse because it simultaneously excuses the system and inflates its capabilities. If a system 'hallucinates,' it implies that when it is not hallucinating, it is consciously perceiving reality accurately. This creates a massive reservoir of unwarranted trust, suggesting the baseline state of the AI is factual understanding. Furthermore, treating errors as a 'plague' or pathology frames the issue as an external disease that developers are valiantly trying to cure, rather than a fundamental mathematical limitation of the autoregressive next-token prediction architecture they deliberately chose to build and monetize.

Accountability Analysis:

Actor Visibility: Hidden (agency obscured)
Analysis: The phrase 'hallucinations continue to plague the field' acts as an ultimate accountability sink. I considered 'Partial' since 'the field' refers to the human industry, but the agency of the error is assigned to the 'hallucinations' themselves acting as an external plague. The developers are positioned as victims of this plague rather than its authors. Naming the actors would involve stating: 'Despite significant progress, engineers at companies like OpenAI continue to deploy systems that fail to distinguish fact from fiction.' This agentless construction shields companies from the legal and ethical liability of knowingly distributing flawed information architectures.

4. Algorithmic Optimization as Strategic Choice

Quote: "Model B will outperform A under 0-1 scoring, the basis of most current benchmarks. This creates an epidemic of penalizing uncertainty and abstention, which we argue that a small fraction of hallucination evaluations won't suffice. The numerous primary evaluations must be adjusted to stop penalizing abstentions when uncertain."

Frame: Model as strategic gamer navigating incentive structures
Projection: This passage projects conscious strategic intent onto the model's behavior. By framing the model as choosing 'abstentions' when 'uncertain,' the text maps the human capacity for cost-benefit analysis and strategic test-taking onto algorithmic optimization. It suggests the model 'knows' it is uncertain, analyzes the grading rubric (0-1 scoring), and consciously decides to withhold an answer or guess to maximize its score. In reality, the model does not decide or feel uncertain; human engineers alter the reinforcement learning parameters (RLHF/PPO) to shift the statistical distribution of the model's outputs away from target tokens like 'I don't know' because human annotators or automated reward models scored those outputs lower during the training phase.
Acknowledgment: Direct (Unacknowledged) (The claim that evaluations must stop penalizing models 'when uncertain' is stated directly without any hedging. I considered 'Hedged/Qualified' because earlier the text uses quotes around 'guesses', but in this specific policy prescription, the model's internal state of being 'uncertain' is presented as a literal, objective fact that the evaluations are acting upon.)
Implications: By attributing strategic choice and self-aware uncertainty to the model, the discourse shifts the burden of ethical alignment away from the foundational architecture and onto the benchmark evaluations. If the model is a strategic entity responding rationally to bad incentives, the solution appears to be simply 'fixing the tests.' This implies the underlying AI system is fundamentally sound and capable of knowing its limits, but is merely being poorly incentivized. This misunderstanding guides regulatory policy toward creating better benchmarks rather than questioning whether probabilistic token predictors should be deployed in high-stakes environments at all, given they possess no actual epistemic boundaries.

Accountability Analysis:

Actor Visibility: Partial (some attribution)
Analysis: I chose Partial because while the developers of the models are completely obscured, the creators of the 'benchmarks' and 'evaluations' are implicitly blamed as the actors creating the 'epidemic.' I considered 'Hidden' but the critique of 'primary evaluations' points to a specific segment of the AI community. However, the agency displacement is severe: the text blames the evaluation metrics rather than the engineers who actually alter the model's weights to game those specific metrics. If actors were fully named, it would say: 'Engineers optimize models to pass existing benchmarks, sacrificing factual reliability for higher leaderboard scores.'
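
The incentive described in the quoted passage reduces to expected-value arithmetic over the grading rule. The Python sketch below assumes p_correct, a hypothetical per-question probability that a generated answer matches the benchmark's key, and contrasts 0-1 scoring with a hypothetical variant that subtracts wrong_penalty points for an incorrect answer; that variant is included only for contrast and is not taken from the audited text. Under 0-1 scoring, answering has expected score p_correct, which is never below the zero points earned by an abstention. With a penalty c, answering pays off only when p_correct exceeds c / (1 + c).

```python
ABSTAIN_SCORE = 0.0  # "I don't know" earns nothing under both rules below

def expected_score(p_correct, wrong_penalty=0.0):
    # +1 with probability p_correct, -wrong_penalty otherwise; wrong_penalty=0 is 0-1 scoring.
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

def best_action(p_correct, wrong_penalty=0.0):
    # The output with the higher expected score is the one score-maximizing optimization favors.
    return "answer" if expected_score(p_correct, wrong_penalty) > ABSTAIN_SCORE else "abstain"

def break_even(wrong_penalty):
    # Answering beats abstaining only when p_correct exceeds c / (1 + c).
    return wrong_penalty / (1.0 + wrong_penalty)

for p in (0.05, 0.3, 0.6, 0.9):
    print(f"p_correct={p}: 0-1 scoring -> {best_action(p)}, "
          f"penalty 1.0 -> {best_action(p, wrong_penalty=1.0)}")
```

The arithmetic places the 'choice' where this audit locates it: in a scoring rule designed by people, which determines which outputs are rewarded during optimization.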

5. Probability Distribution as Epistemic Belief

Quote: "The test-taker's beliefs about the correct answer can be viewed as a posterior distribution over binary gc's. For any such beliefs, the optimal response is not to abstain."

Frame: Statistical calculation as conscious human conviction
Projection: This framing takes the ultimate leap in consciousness projection by mapping the profound human epistemic state of 'belief' onto a mathematical posterior distribution. A belief requires a conscious subject who holds a proposition to be true, often accompanied by justification, contextual understanding, and subjective commitment. The text projects this onto a mathematical function outputting probabilities between 0 and 1. By explicitly equating a 'posterior distribution' with 'beliefs,' the authors erase the distinction between mechanistic statistical processing (calculating the likelihood of a token given previous tokens) and conscious knowing (understanding a fact about the world). It treats the computational artifact as an entity capable of possessing internal convictions about reality.
Acknowledgment: Hedged/Qualified (The text uses the phrase 'can be viewed as,' which explicitly signals an analogy or theoretical mapping. I considered 'Explicitly Acknowledged' but there is no deeper meta-commentary on the limits of calling probability a belief; the text simply uses the hedge to introduce the mathematical model of the 'test-taker's' behavior.)
Implications: Equating probability distributions with beliefs has massive implications for epistemic trust. If society accepts that an AI 'believes' its outputs, it automatically applies human conversational maxims (such as Grice's maxim of quality, which enjoins speakers not to say what they believe to be false) to the system. Users will assume the AI is operating in good faith and attempting to tell the truth based on its internal 'beliefs.' This masks the reality that the system is entirely indifferent to truth and is only minimizing a loss function. When an AI produces harmful or defamatory content, framing it as a 'false belief' rather than a 'statistical output engineered by a corporation' softens the legal and moral severity of the algorithmic failure.

Accountability Analysis:

Actor Visibility: Hidden (agency obscured)
Analysis: I classified this as Hidden because the text creates an entirely autonomous, agentic construct ('the test-taker') to represent the language model, completely erasing the developers. I considered 'Partial' because the text mentions 'optimal response,' implying a system designer's goal, but the active subject holding the 'beliefs' and making the 'optimal response' is the AI itself. By saying 'the optimal response is not to abstain,' the text obscures the reality that engineers at companies like OpenAI write the reinforcement learning scripts that actively penalize the model for outputting abstention tokens during post-training optimization.
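
Operationally, the 'beliefs' in the quoted passage are a list of non-negative weights that sum to one. The Python sketch below reads 'binary gc's' as candidate binary graders for a given prompt, each marking a response either 1 or 0 (an interpretation; the excerpt does not define the symbol), and uses invented answers and weights. Because a binary grader never awards points to an abstention, some non-abstaining response always scores at least as well. That is all the quoted 'optimal response' claim requires: a weighted sum and a maximum, with no conviction involved.

```python
from fractions import Fraction

IDK = "I don't know"

# Hypothetical posterior: each entry pairs a weight with the set of responses that
# one candidate binary grader marks correct (1); every other response scores 0.
posterior = [
    (Fraction(1, 2), {"answer_A"}),
    (Fraction(3, 10), {"answer_B"}),
    (Fraction(1, 5), {"answer_C"}),
]

def expected_score(response, posterior):
    # Weighted sum of 0/1 grades; a binary grader never credits the abstention response.
    return sum(w for w, accepted in posterior if response in accepted and response != IDK)

candidates = ["answer_A", "answer_B", "answer_C", IDK]
scores = {r: expected_score(r, posterior) for r in candidates}
best = max(candidates, key=lambda r: scores[r])
print({r: str(s) for r, s in scores.items()}, "->", best)
```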

6. Algorithmic Weight Updates as Cognitive Learning

Quote: "During pretraining, a base model learns the distribution of language in a large text corpus. We show that, even with error-free training data, the statistical objective minimized during pretraining would lead to a language model that generates errors."

Frame: Mathematical optimization as educational acquisition
Projection: The use of the verb 'learns' maps the biological, cognitive, and deeply contextual human process of acquiring knowledge onto the purely mathematical process of stochastic gradient descent. When a human learns, they build conceptual models of the world, synthesize abstract principles, and gain conscious comprehension. When a base model 'learns the distribution,' it is merely updating billions of numerical weights to minimize the cross-entropy loss between its predicted next-token distribution and the actual next token in the training data. Projecting cognitive learning onto weight updates suggests the model acquires substantive understanding of the text, rather than merely capturing high-dimensional statistical correlations among token sequences.
Acknowledgment: Direct (Unacknowledged) (The statement 'a base model learns' is presented as literal technical fact without any hedging, scare quotes, or qualification. I considered 'Hedged/Qualified' because the sentence immediately grounds the learning in 'the distribution of language' rather than concepts, but the verb 'learns' itself is deployed as an unvarnished objective description of the machine's behavior.)
Implications: The 'machine learning' metaphor is so ubiquitous it is practically invisible, yet its implications are profound. By insisting the model 'learns' rather than 'is optimized' or 'is calibrated,' the discourse prepares the public to treat the AI as a student or employee rather than a software product. This inflates perceived competence, leading policy-makers to mistakenly believe these systems can learn complex ethical concepts or safety rules in the same way humans do. It conceals the fragility of statistical correlation; because the system hasn't actually 'learned' anything conceptual, it can fail unpredictably in edge cases, posing serious risks in high-stakes domains such as autonomous driving, medical diagnosis, or legal analysis.

Accountability Analysis:

Actor Visibility: Partial (some attribution)
Analysis: I selected Partial because while the primary actor is the 'base model' performing the learning, the text explicitly mentions 'the statistical objective minimized during pretraining,' which implies the presence of system architects setting that objective. I considered 'Hidden' since no specific humans or companies are named, but the reference to 'error-free training data' and 'statistical objectives' points directly to the components chosen by data scientists. Naming the actors would require stating: 'During pretraining, engineers adjust the model's weights using a cross-entropy objective over a corpus they collected.' Obscuring the engineers naturalizes the resulting errors as a mathematical inevitability rather than an engineering artifact.

7. Statistical Suppression as Deliberate Avoidance

Quote: "Language models avoid many types of errors such as spelling mistakes, and not all errors are hallucinations."

Frame: Algorithmic output boundaries as conscious evasion
Projection: This framing attributes the conscious, intentional act of 'avoiding' to a statistical text generator. For a human, avoiding an error involves recognizing a boundary, consciously evaluating a proposed action against a standard of correctness, and deliberately choosing a different path. Projecting this onto a language model suggests the system recognizes what a spelling mistake is and actively decides not to output it. In reality, the model generates correctly spelled words because those sequences of sub-word tokens appear with overwhelmingly high frequency in the training data, resulting in extremely high probability scores. There is no active 'avoidance' occurring, only the generation of token sequences that the trained weights make overwhelmingly probable.
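The frequency point can be illustrated with a deliberately simplified sketch. It assumes a word-level model with no context and add-k smoothing, whereas real language models condition on preceding text and operate on subword tokens; the toy corpus counts and the function name `smoothed_probs` are hypothetical.

```python
from collections import Counter

def smoothed_probs(corpus_words, candidates, k=0.01):
    """Add-k smoothed relative frequencies, normalized over a fixed candidate set."""
    counts = Counter(corpus_words)
    total = sum(counts[w] for w in candidates) + k * len(candidates)
    return {w: (counts[w] + k) / total for w in candidates}

# Hypothetical toy corpus: the correct spelling vastly outnumbers a common misspelling.
corpus = ["receive"] * 998 + ["recieve"] * 2
probs = smoothed_probs(corpus, ["receive", "recieve", "receeve"])

# Picking the most probable candidate yields the correct spelling purely by frequency.
best = max(probs, key=probs.get)
print(best, {w: round(p, 5) for w, p in probs.items()})
```

The correct spelling wins only because it dominates the counts; nothing in the procedure checks spelling.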
Acknowledgment: Direct (Unacknowledged) (The claim that models 'avoid' errors is stated directly as literal fact. I considered 'Hedged/Qualified' because the surrounding text is highly technical and statistical, but this specific verb choice is presented without any linguistic buffer, presenting an emergent statistical property as an active agential behavior.)
Implications: Attributing the capacity to 'avoid' errors to a language model builds immense performance-based trust. It signals to the user that the system contains an active internal monitor or safety mechanism that is watching out for mistakes. If audiences believe the model can actively 'avoid' spelling errors, they will logically assume it can actively 'avoid' logical fallacies, discriminatory output, or factual inaccuracies. This consciousness projection masks the reality that the model has no monitor; it only has statistical weights. This unwarranted trust leads to over-reliance in critical workflows, where users assume the machine is checking its own work when it is merely predicting the next most likely token.

Accountability Analysis:

Actor Visibility: Hidden (agency obscured)
Analysis: I classified this as Hidden because the language model is positioned as the sole independent actor 'avoiding' the errors. I considered 'Partial' because the context implies a discussion of model capabilities crafted by humans, but the sentence construction entirely displaces the engineers. A functionally accurate, human-centered framing would state: 'Because developers train models on highly edited text corpora, the resulting token probabilities rarely produce spelling mistakes.' By giving the model the agency of avoidance, the developers obscure the massive human labor (data cleaning, filtering, RLHF) that actually creates the illusion of a flawless, self-correcting system.

8. Self-Correction as Pragmatic Competence

Quote: "This can remove barriers to the suppression of hallucinations, and open the door to future work on nuanced language models, e.g., with richer pragmatic competence."

Frame: System optimization as linguistic and social maturity
Projection: The phrase 'pragmatic competence' projects deep human sociological and linguistic maturity onto the algorithmic system. In human linguistics, pragmatic competence is the ability to use language appropriately in social contexts, requiring empathy, theory of mind, cultural awareness, and an understanding of the listener's unstated needs. Projecting this onto an AI suggests the system can achieve actual social awareness and subjective understanding of human context. In reality, what the authors call 'pragmatic competence' is simply the mathematical adjustment of reinforcement learning rewards to make the model output specific structural tokens (like hedging phrases or 'I don't know') when its internal probability distributions fall below a certain mathematical threshold.
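A hard-coded decoding rule makes the threshold idea explicit. This is a simpler stand-in for, not an implementation of, the reinforcement-learning adjustment described above: instead of tuning weights, it inspects a set of candidate answers with probabilities (hypothetical inputs) and substitutes a fixed hedge string when the top probability falls below an assumed threshold of 0.6. The function name `respond` and the threshold value are illustrative assumptions.

```python
def respond(candidates, threshold=0.6):
    """Return the most probable candidate, or a fixed hedge string if it is below threshold.

    candidates: dict mapping answer strings to model-assigned probabilities (hypothetical).
    """
    best, p_best = max(candidates.items(), key=lambda kv: kv[1])
    if p_best < threshold:
        return "I don't know"
    return best

print(respond({"Paris": 0.92, "Lyon": 0.05, "Nice": 0.03}))  # prints: Paris
print(respond({"1987": 0.31, "1989": 0.29, "1991": 0.40}))  # prints: I don't know
```

The "humble" output is triggered by a comparison between two numbers; no model of the listener is involved.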
Acknowledgment: Direct (Unacknowledged) (The phrase 'richer pragmatic competence' is used directly as a literal goal for future language models. I considered 'Hedged/Qualified' because the authors use 'e.g.,' and situate it as 'future work,' but the concept itself is presented as a legitimate, achievable property of computational models without any scare quotes or meta-commentary.)
Implications: Claiming AI can achieve 'pragmatic competence' fundamentally blurs the line between human social interaction and machine processing. It encourages the design and public acceptance of systems that mimic empathy and social intelligence. This has severe social and political stakes: when a system deploys calculated statistical hedging to appear 'pragmatically competent,' users will form parasocial relationships with it, extending relation-based trust (vulnerability, reliance on sincerity) to a sociopathic optimization function. This capability overestimation benefits corporate developers who can market their models as 'intelligent assistants' rather than 'text calculators,' while leaving society vulnerable to manipulation by systems that simulate social competence without any underlying ethical or emotional grounding.

Accountability Analysis:

Actor Visibility: Partial (some attribution)
Analysis: I classified this as Partial because the text mentions 'future work,' which explicitly implies a community of human researchers who will be doing this work. I considered 'Hidden' because the models themselves are framed as possessing the competence, but the phrase 'barriers to the suppression' clearly invokes the labor of the scientific community trying to improve the models. However, the agency displacement still exists: it frames the engineering of better probability distributions as the cultivation of 'competence' in the machine. A fully named framing would state: 'This allows developers at OpenAI and elsewhere to build models that statistically simulate human pragmatic speech patterns more convincingly.'

Task 2: Source-Target Mapping

About this task

For each key metaphor identified in Task 1, this section provides a detailed structure-mapping analysis. The goal is to examine how the relational structure of a familiar "source domain" (the concrete concept we understand) is projected onto a less familiar "target domain" (the AI system). By restating each quote and analyzing the mapping carefully, we can see precisely what assumptions the metaphor invites and what it conceals.

Mapping 1: A human student taking a high-stakes exam, experiencing psychological pressure, metacognitive awareness of their own ignorance, and making a strategic choice to guess to avoid penalty. → An autoregressive language model processing a prompt through its neural network and generating tokens based on probability distributions when no single token sequence has a dominant weight.

Quote: "Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty."

Source Domain: A human student taking a high-stakes exam, experiencing psychological pressure, metacognitive awareness of their own ignorance, and making a strategic choice to guess to avoid penalty.
Target Domain: An autoregressive language model processing a prompt through its neural network and generating tokens based on probability distributions when no single token sequence has a dominant weight.
Mapping: The mapping projects the conscious experience of uncertainty and the strategic decision-making of a student onto the mechanistic token generation of the AI. It invites the assumption that the language model has an internal, subjective awareness of what it does and does not know (metacognition). When the model produces a flat probability distribution, in which no single next token receives a high probability, this mathematical state is mapped onto the human epistemic state of being 'uncertain.' The algorithmic selection of a token from this flat distribution is mapped as a deliberate, conscious 'guess.' It invites the audience to view the model as a relatable, well-meaning, but flawed conscious agent under external pressure.
What Is Concealed: This mapping completely conceals the absence of subjective experience, the lack of actual comprehension, and the purely statistical nature of the output. It hides the reality that the model never 'knows' or 'doesn't know' anything; it only calculates probabilities based on training data. Furthermore, it obscures the proprietary corporate decisions that created the system: the specific data scraped, the architecture chosen by engineers, and the loss function optimized. The metaphor treats a multi-billion-dollar corporate software product as an individual human struggling with a test, hiding the economic and material power structures that deployed the model.

Mapping 2: A conscious, arrogant human communicator who asserts facts forcefully despite lacking actual knowledge or justification, displaying the psychological trait of overconfidence. → The generation of text tokens with high probability scores (softmax outputs close to 1) that do not align with external factual reality or the user's intent.

Quote: "Language models are known to produce overconfident, plausible falsehoods, which diminish their utility and trustworthiness."

Source Domain: A conscious, arrogant human communicator who asserts facts forcefully despite lacking actual knowledge or justification, displaying the psychological trait of overconfidence.
Target Domain: The generation of text tokens with high probability scores (softmax outputs close to 1) that do not align with external factual reality or the user's intent.
Mapping: This mapping projects the psychological attitude of human confidence—which involves self-reflection, belief, and social posturing—onto the mathematical outputs of a softmax layer in a neural network. A high statistical probability score assigned to a token is mapped directly onto the human emotional state of 'overconfidence.' It invites the assumption that the model possesses an internal epistemic state (a belief) and a behavioral posture toward that belief. It suggests the model is actively trying to deceive or is stubbornly self-assured, projecting a complex human social dynamic onto mathematical correlations.
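In machine-learning usage, "overconfidence" usually names a measurable calibration gap: the probabilities a model assigns to its answers are, on average, higher than the fraction of those answers that turn out to be correct. The sketch below computes that gap as a single aggregate number, a simplification of binned metrics such as expected calibration error; the confidence values and correctness labels are invented for illustration, and `calibration_gap` is a hypothetical helper.

```python
def calibration_gap(confidences, correct):
    """Mean stated confidence minus empirical accuracy; positive means 'overconfident'."""
    mean_conf = sum(confidences) / len(confidences)
    accuracy = sum(correct) / len(correct)  # True counts as 1, False as 0
    return mean_conf - accuracy

# Hypothetical top-token probabilities and whether each answer was factually right.
confs = [0.95, 0.90, 0.92, 0.97, 0.88]
right = [True, False, True, False, True]
gap = calibration_gap(confs, right)
print(round(gap, 3))
```

Here the average top-token probability is 0.924 against an accuracy of 0.6; the word "overconfident" describes that arithmetic difference, not an attitude.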
What Is Concealed: This mapping conceals the fundamental difference between statistical likelihood and epistemic certainty. A model outputs a 'confident' falsehood simply because the string of tokens strongly correlates with patterns in its training data, not because it 'believes' the statement. The mapping hides the fact that the system has no access to a ground truth reality outside its text corpus. It also conceals the transparency obstacle posed by proprietary systems: outside researchers cannot inspect the weights or full output distributions of models like GPT-4 because the companies keep them locked in black boxes. At most, some commercial APIs return log probabilities for a handful of top candidate tokens, so researchers must largely infer 'confidence' from the generated text itself.

Mapping 3: A human patient suffering from a neurological or psychological pathology that causes them to perceive sensory inputs that are not grounded in external reality. → A neural network generating text sequences that are syntactically correct and statistically probable according to its training distribution, but factually incorrect in the real world.

Source Domain: A human patient suffering from a neurological or psychological pathology that causes them to perceive sensory inputs that are not grounded in external reality.
Target Domain: A neural network generating text sequences that are syntactically correct and statistically probable according to its training distribution, but factually incorrect in the real world.
Mapping: The mapping projects biological illness and conscious perceptual failure onto algorithmic functioning. It suggests that the AI has a normal, healthy baseline state of accurate perception that is occasionally disrupted by a 'hallucination.' It invites the assumption that the system possesses a conscious mind that is 'trying' to perceive reality but is suffering a glitch. Even though the authors caveat the metaphor, treating hallucinations as a 'plague' maps the concept of an external, contagious disease onto software errors, inviting the assumption that these errors are unfortunate, natural afflictions rather than predictable consequences of design choices.
What Is Concealed: The hallucination metaphor conceals the fact that the model is functioning exactly as designed when it generates a falsehood. It is doing the exact same mathematical operation (predicting the most likely next token) when it states a fact as when it states a fiction. It hides the mechanistic reality that language models are essentially sophisticated correlation engines without causal world models. By framing the error as a disease plaguing the field, it conceals the commercial motivations of the tech companies that rapidly deployed these fundamentally unreliable statistical systems into the public sphere before solving their inherent architectural flaws.

Mapping 4: A strategic, conscious game player or test-taker who analyzes an incentive structure, recognizes their own internal state of uncertainty, and consciously chooses a behavior to maximize rewards. → An algorithmic system whose output distribution has been shifted via Reinforcement Learning from Human Feedback (RLHF) to minimize the probability of generating specific token sequences (like 'I don't know') because those sequences received low reward scores.

Source Domain: A strategic, conscious game player or test-taker who analyzes an incentive structure, recognizes their own internal state of uncertainty, and consciously chooses a behavior to maximize rewards.
Target Domain: An algorithmic system whose output distribution has been shifted via Reinforcement Learning from Human Feedback (RLHF) to minimize the probability of generating specific token sequences (like 'I don't know') because those sequences received low reward scores.
Mapping: This mapping projects complex strategic intentionality and self-awareness onto an automated optimization process. It maps the mathematical tuning of a neural network toward high-reward outputs onto a conscious agent 'choosing' not to abstain. It invites the assumption that the model subjectively experiences 'uncertainty' and then rationally calculates that 'guessing' will yield a better score based on the rubric. It maps the mathematical constraints of an evaluation framework onto a social environment where a conscious entity is being unfairly punished for honesty.
What Is Concealed: This mapping conceals the entirely passive, mechanistic nature of the model's 'behavior.' The model does not read the rubric, feel uncertain, and choose to guess. Instead, human engineers run thousands of optimization loops where the model's weights are mechanically adjusted to produce whatever outputs score highest on the automated or human-graded benchmarks. By framing the model as a strategic actor, the mapping hides the human labor and deliberate engineering choices made by the corporations tuning the models, displacing the agency onto the math while obscuring the humans turning the dials.
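The mechanical character of this optimization can be shown with a toy policy-gradient loop. It is a deliberately reduced sketch, not RLHF as practiced: the "model" is a softmax over just two outputs (attempt an answer, or say "I don't know"), the expected rewards are assumed values (a guess is assumed to be graded correct 30% of the time under binary scoring, and abstention always earns 0), and the exact expected gradient replaces the sampled estimates used in real training. The names `policy` and `train_policy` are hypothetical.

```python
import math

def policy(logits):
    """Softmax over the two possible outputs."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_policy(expected_reward, lr=1.0, steps=200):
    """Exact-gradient ascent on expected reward; returns P('I don't know') after each step."""
    logits = [0.0, 0.0]  # index 0: attempt an answer; index 1: output "I don't know"
    history = []
    for _ in range(steps):
        probs = policy(logits)
        expected = sum(p * r for p, r in zip(probs, expected_reward))
        # d(expected reward)/d(logit_k) = pi_k * (r_k - expected reward)
        grads = [p * (r - expected) for p, r in zip(probs, expected_reward)]
        logits = [z + lr * g for z, g in zip(logits, grads)]  # gradient ascent
        history.append(policy(logits)[1])
    return history

# Assumed expected rewards under binary grading: guessing is right 30% of the time,
# while "I don't know" always scores 0.
history = train_policy([0.3, 0.0])
print(round(history[0], 3), round(history[-1], 4))
```

The probability of the abstention output falls at every step simply because its expected reward is lower; at no point does the loop represent a choice, a rubric, or a feeling of uncertainty.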

Mapping 5: A conscious human test-taker holding internal, subjective epistemic convictions (beliefs) about what is true in the world based on their learning and reasoning. → A mathematical posterior probability distribution over a set of binary variables calculated by an algorithm based on a prior distribution and new data.

Quote: "The test-taker's beliefs about the correct answer can be viewed as a posterior distribution over binary gc's. For any such beliefs, the optimal response is not to abstain."

Source Domain: A conscious human test-taker holding internal, subjective epistemic convictions (beliefs) about what is true in the world based on their learning and reasoning.
Target Domain: A mathematical posterior probability distribution over a set of binary variables calculated by an algorithm based on a prior distribution and new data.
Mapping: This mapping projects the profound cognitive and philosophical human state of 'belief' onto a purely mathematical probability distribution. It takes the subjective experience of knowing, holding convictions, and evaluating truth claims, and maps it directly onto a statistical formula. It invites the audience to assume that a mathematical output (a probability of 0.8) is identical to a psychological state (being highly convinced). It suggests the computational system possesses an internal world of convictions that guide its 'optimal response,' blurring the line between statistical mechanics and conscious epistemology.
What Is Concealed: This mapping completely conceals the absence of any subjective experience, comprehension, or epistemic justification within the system. A posterior distribution is just a set of numbers calculated by an equation; it has no relationship to truth, meaning, or belief. The mapping hides the fact that the system cannot evaluate the truth of a claim against reality, only its statistical likelihood against a training corpus. Furthermore, it obscures the fact that these distributions are locked inside proprietary, opaque black boxes; the authors are theorizing about the mathematical structure of systems they cannot fully audit, applying anthropomorphic language to bridge the gap of missing technical transparency.
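The quoted claim reduces to a short expected-value comparison. In the sketch below, `p_correct` plays the role of the posterior probability that a given answer is correct; under binary grading a correct answer scores 1 and both wrong answers and abstentions score 0, so guessing is never worse than abstaining. The `wrong_penalty` parameter is an illustrative assumption, not something described in the quoted passage: it shows that once wrong answers cost points, abstaining scores higher whenever `p_correct` is below `wrong_penalty / (1 + wrong_penalty)`. The function names are hypothetical.

```python
def expected_score(p_correct, answer, wrong_penalty=0.0):
    """Expected points on one question: +1 if right, -wrong_penalty if wrong, 0 if abstaining."""
    if not answer:
        return 0.0
    return p_correct - (1.0 - p_correct) * wrong_penalty

def best_action(p_correct, wrong_penalty=0.0):
    """Return whichever output has the strictly higher expected score (ties go to abstaining)."""
    guess = expected_score(p_correct, answer=True, wrong_penalty=wrong_penalty)
    abstain = expected_score(p_correct, answer=False, wrong_penalty=wrong_penalty)
    return "guess" if guess > abstain else "abstain"

# Binary grading (no penalty): even a 1% chance of being right makes guessing score higher.
print(best_action(0.01))                    # prints: guess
# Assumed penalty of 1 point per wrong answer: the break-even point becomes p = 0.5.
print(best_action(0.2, wrong_penalty=1.0))  # prints: abstain
print(best_action(0.6, wrong_penalty=1.0))  # prints: guess
```

Nothing in this calculation requires a believer; "the optimal response" is whichever side of an inequality yields the larger number.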

Mapping 6: A human or biological entity engaging in the cognitive process of education, conceptual acquisition, synthesis, and understanding of a subject matter over time. → The mechanistic, algorithmic process of stochastic gradient descent updating billions of numerical parameters (weights and biases) in a neural network to minimize a specific mathematical loss function (cross-entropy) over a dataset.

Source Domain: A human or biological entity engaging in the cognitive process of education, conceptual acquisition, synthesis, and understanding of a subject matter over time.
Target Domain: The mechanistic, algorithmic process of stochastic gradient descent updating billions of numerical parameters (weights and biases) in a neural network to minimize a specific mathematical loss function (cross-entropy) over a dataset.
Mapping: This mapping projects the biological and cognitive phenomenon of learning onto mathematical optimization. It maps a student studying a book onto an algorithm processing a dataset. It invites the assumption that the model is actively acquiring knowledge, building a conceptual framework, and gaining a substantive understanding of human language. By calling the optimization process 'learning,' it suggests a trajectory of increasing intelligence and comprehension, implying the system is an agent developing intellectual capabilities rather than a static mathematical formula whose variables are being tuned.
What Is Concealed: The 'learning' mapping conceals the purely mathematical, unthinking nature of the optimization process. It hides the fact that the model is merely calculating statistical co-occurrences of tokens (subword units of text) without any conceptual grasp of what those tokens mean. It obscures the massive material and economic realities of this process: the millions of dollars of computing power, the vast energy consumption of GPU clusters, and the legally questionable scraping of copyrighted human data. By framing it as an educational journey ('learning'), it masks the industrial-scale extraction and statistical compression of human labor.

Mapping 7: A conscious, vigilant actor who recognizes a hazard or mistake and deliberately takes action to circumvent it, demonstrating intention, awareness, and control. → The statistical reality that highly frequent token sequences in the training data (correctly spelled words) are generated with near 100% probability, while rare or non-existent sequences (misspellings) have near 0% probability.

Quote: "Language models avoid many types of errors such as spelling mistakes, and not all errors are hallucinations."

Source Domain: A conscious, vigilant actor who recognizes a hazard or mistake and deliberately takes action to circumvent it, demonstrating intention, awareness, and control.
Target Domain: The statistical reality that highly frequent token sequences in the training data (correctly spelled words) are generated with near 100% probability, while rare or non-existent sequences (misspellings) have near 0% probability.
Mapping: This mapping projects active intentionality and conscious evasion onto a passive statistical outcome. It maps the mathematical fact of low probability onto the psychological action of 'avoiding.' It invites the assumption that the language model is a vigilant editor, consciously checking its own work, recognizing what constitutes a spelling mistake, and actively choosing not to make it. It projects an internal monitoring system and a desire for correctness onto a system that simply outputs whatever token has the highest mathematical weight.
What Is Concealed: This mapping conceals the complete absence of intentionality or self-monitoring in the system. The model does not 'avoid' spelling mistakes; it simply assigns them very low probability, because misspellings are rare in its training data and are further reduced when human engineers filter and clean that data. This mapping hides the large amount of invisible human labor (data annotators, content moderators, and engineers running filtering algorithms) that goes into curating the datasets. By giving the model the credit for 'avoiding' errors, it erases the human workers who built the guardrails.

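Mapping 8: A socially intelligent human capable of empathy, theory of mind, contextual awareness, and the ability to adapt their communication to the unstated social needs of the listener. → A mathematical optimization function tuned via reinforcement learning to output specific sequences of text (such as hedges, polite phrases, or uncertainty markers) when certain internal probability thresholds are met.

Quote: "This can remove barriers to the suppression of hallucinations, and open the door to future work on nuanced language models, e.g., with richer pragmatic competence."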

Source Domain: A socially intelligent human capable of empathy, theory of mind, contextual awareness, and the ability to adapt their communication to the unstated social needs of the listener.
Target Domain: A mathematical optimization function tuned via reinforcement learning to output specific sequences of text (such as hedges, polite phrases, or uncertainty markers) when certain internal probability thresholds are met.
Mapping: This mapping projects the highest levels of human social and emotional intelligence—pragmatics—onto algorithmic tuning. It maps the human ability to 'read the room' and speak with nuance onto a model's statistical propensity to output the phrase 'I might be wrong, but...' It invites the assumption that future language models will actually possess social awareness, empathy, and a subjective understanding of human relationships. It suggests the machine is progressing toward social maturity rather than just becoming a more convincing statistical mimic.
What Is Concealed: This mapping dangerously conceals the sociopathic nature of statistical optimization. The system has no empathy, no social awareness, and no understanding of context; it only has weights and biases tuned to maximize a reward signal. It hides the fact that what the authors call 'pragmatic competence' is actually just sophisticated deception—training a machine to output words that trigger human social trust reflexes without the machine actually possessing the underlying social commitment. It obscures the corporate agenda of building systems that manipulate human psychology by mimicking competence to increase user engagement and commercial dependency.

Task 3: Explanation Audit (The Rhetorical Framing of "Why" vs. "How")

About this task

This section audits the text's explanatory strategy, focusing on a critical distinction: the slippage between "how" and "why." Based on Robert Brown's typology of explanation, this analysis identifies whether the text explains AI mechanistically (a functional "how it works") or agentially (an intentional "why it wants something"). The core of this task is to expose how this "illusion of mind" is constructed by the rhetorical framing of the explanation itself, and what impact this has on the audience's perception of AI agency.

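Explanation 1

Quote: "During pretraining, a base model learns the distribution of language in a large text corpus. We show that, even with error-free training data, the statistical objective minimized during pretraining would lead to a language model that generates errors."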

Explanation Types:
- Theoretical: Embeds in deductive framework, may invoke unobservable mechanisms (How it is structured)
- Empirical Generalization: Subsumes events under timeless statistical regularities (How it typically behaves)
Analysis (Why vs. How Slippage): This explanation frames the AI mechanistically (how it works) while relying on a subtly agential verb ('learns'). The primary explanatory mode is Theoretical: it relies on the deductive logic of statistical objectives (cross-entropy minimization) to prove that errors are mathematically inevitable given the architecture. The secondary mode is Empirical Generalization, establishing a universal rule about how base models behave regardless of the data quality. This dual framing emphasizes the structural, mathematical inevitability of the system's flaws. However, by using the word 'learns' to describe the mathematical process of minimizing a statistical objective, the choice obscures the purely passive, calculating nature of the weight updates. It creates a tension where the system is described mathematically but conceptualized educationally, making the technical reality of 'minimizing an objective' feel like an active, cognitive achievement.
Consciousness Claims Analysis: The passage avoids explicit consciousness verbs for the generation of errors, relying on the mechanistic verb 'generates.' However, the opening verb 'learns' is a profound consciousness claim, mapping cognitive acquisition onto mathematical optimization. While the passage describes processing (minimizing a statistical objective), the word 'learns' bridges the gap toward knowing. The authors demonstrate the 'curse of knowledge': they understand the mechanistic reality (stochastic gradient descent over a corpus) perfectly well but package it in an epistemic, biological verb to make it digestible. Mechanistically, the base model does not 'learn'; its numerical weights are updated by gradient descent, using gradients computed through backpropagation, to reduce the mathematical difference between its output distribution and the training data distribution. The text accurately describes the technical cause of the errors (the statistical objective), but the epistemic framing of 'learning' softens the mechanical reality.
Rhetorical Impact: This framing shapes audience perception by naturalizing algorithmic errors as a fundamental, unavoidable part of the 'learning' process, akin to a student who must inevitably make mistakes while acquiring new knowledge. It reduces the perceived risk of the technology by framing the errors as mathematically bounded and structurally understood by the scientists. However, the 'learning' frame simultaneously increases unwarranted trust; if the audience believes the system is genuinely 'learning' rather than just 'correlating,' they may assume the model will eventually achieve true understanding, changing their willingness to deploy these systems in high-stakes environments before the fundamental architectural flaws are resolved.

Explanation 2

Quote: "Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty."

Explanation Types:
- Intentional: Refers to goals/purposes, presupposes deliberate design (Why it appears to want something)
- Reason-Based: Gives agent's rationale, entails intentionality and justification (Why it appears to choose)
Analysis (Why vs. How Slippage): This explanation entirely abandons mechanistic reality to frame the AI agentially (why it acts). Operating heavily in the Intentional and Reason-Based registers, it attributes goals (answering the question), psychological states (uncertainty), and strategic choices (guessing instead of admitting ignorance) to the computational system. This choice vividly emphasizes the observable behavior of the model—producing false but convincing text—while completely obscuring the structural, mathematical causes. By giving the system a rationale ('facing hard exam questions'), the explanation displaces the agency from the engineers who designed the optimization penalties to the 'student' model trying to survive a test. It emphasizes the similarity of the output to human deception while hiding the total dissimilarity of the internal process.
Consciousness Claims Analysis: This passage is completely saturated with consciousness claims. It attributes three distinct epistemic and psychological states to the system: 'uncertain' (awareness of lack of knowledge), 'guess' (deliberate action without justification), and 'admitting' (conscious social disclosure). This entirely conflates processing (outputting a token from a flat probability distribution) with knowing (possessing a justified belief and recognizing its limits). The curse of knowledge is extreme here: the authors know the model has a low-confidence probability distribution across multiple tokens, and they project the human experience of 'uncertainty' onto that math. Mechanistically, the model does not guess or feel uncertain; it calculates a softmax distribution over its vocabulary and samples a token based on its temperature setting. It cannot 'admit' anything because it has no self-reflective architecture to evaluate its own state.
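The mechanistic description in this paragraph can be written out directly. The sketch below scales a vector of raw scores (logits) by a temperature, converts them to probabilities with softmax, and draws one token with Python's `random.Random.choices`. The three-token vocabulary, the logits, and the helper names are invented for illustration; real models compute logits over vocabularies of tens of thousands of tokens or more.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(vocab, logits, temperature=1.0, rng=None):
    """Draw one token according to the temperature-scaled softmax probabilities."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

vocab = ["1987", "1989", "1991"]  # hypothetical candidate continuations
logits = [1.1, 1.0, 0.9]          # nearly equal scores: a "flat" distribution
rng = random.Random(0)            # fixed seed so the draws are reproducible
samples = [sample_token(vocab, logits, temperature=1.0, rng=rng) for _ in range(10)]
print(samples)
print([round(p, 3) for p in softmax_with_temperature(logits, 1.0)])
print([round(p, 3) for p in softmax_with_temperature(logits, 0.1)])
```

With nearly equal logits each draw is close to a three-way coin toss; lowering the temperature sharpens the distribution, but in neither case is anything "guessed" in the psychological sense.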
Rhetorical Impact: The rhetorical impact of this framing is highly manipulative, constructing a narrative that induces empathy for the algorithmic system. By framing the AI as a pressured student, it shapes the audience's perception of autonomy, suggesting the model makes independent, rational choices based on its environment. This dramatically inflates the perceived sophistication of the AI, making it seem like a conscious agent. Paradoxically, this increases trust; audiences are forgiving of students who guess under pressure. If audiences believe the AI 'knows' it is guessing, they might advocate for 'better instructions' rather than recognizing the severe risk of deploying stochastic text generators that possess no actual epistemic boundaries.

Explanation 3

Explanation Types:
- Functional: Explains behavior by role in self-regulating system with feedback (How it works within system)
- Dispositional: Attributes tendencies or habits (Why it tends to act certain way)
Analysis (Why vs. How Slippage): This explanation frames the issue as a Functional problem within a broader socio-technical system (the evaluation ecosystem), while attributing Dispositional tendencies to the models themselves. It explains how the models act (outperforming, abstaining) based on their role in a feedback loop governed by '0-1 scoring.' This framing emphasizes the external incentive structures (benchmarks) that shape the model's outputs, effectively arguing that the system is functioning exactly as it is incentivized to function. However, by using terms like 'penalizing uncertainty,' it obscures the fact that the 'uncertainty' is just a mathematical probability, and that the 'penalization' is a human-designed scoring rule against which developers then optimize their models' weights. It obscures the direct agency of the developers by blaming the abstract 'evaluations' for the model's tendency to generate falsehoods.
Consciousness Claims Analysis: The passage continues to rely on the consciousness claim of being 'uncertain.' While the verbs 'outperform' and 'adjusted' are mechanistic or systemic, the core epistemic claim rests on the noun 'uncertainty' and the state of being 'uncertain.' The text treats the model as possessing a state of knowing (or of knowing that it doesn't know), rather than accurately describing the processing state (a statistically flat distribution of token probabilities). The authors, knowing that the model's probability scores are low for certain facts, project the human epistemic state of uncertainty onto the math. Mechanistically, the model does not feel uncertain; its developers use reinforcement learning to adjust its weights so that it generates fact-like tokens rather than the token sequence 'I don't know,' because human annotators and benchmark scores have historically rewarded confident-sounding text.
Rhetorical Impact: This framing strategically shifts accountability away from the model creators and onto the evaluation metrics. By arguing that evaluations 'penalize uncertainty,' the rhetoric suggests the model naturally possesses the desirable trait of epistemic humility, but is being forced to behave badly by the benchmarks. This shapes audience perception by lowering the perceived risk of the fundamental architecture while heightening skepticism of the testing ecosystem. If policy-makers believe the models are inherently capable of expressing uncertainty but are just victims of a bad 'epidemic' of grading, they will focus regulatory efforts on changing the tests rather than restricting the deployment of the models themselves.
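The claim that 0-1 scoring 'penalizes uncertainty' reduces to a one-line expected-value comparison. The sketch below assumes the grading scheme the analysis describes (1 point for a correct answer, 0 points for a wrong answer or an abstention); `p_correct` is a hypothetical probability that a generated answer is correct, and `expected_binary_score` is an illustrative helper, not code from the paper.

```python
def expected_binary_score(p_correct, abstain):
    """Expected score under 0-1 (binary) grading.

    A correct answer earns 1 point; a wrong answer and an abstention
    ("I don't know") both earn 0 points.
    """
    if abstain:
        return 0.0
    return p_correct * 1.0 + (1.0 - p_correct) * 0.0

# Compare answering vs. abstaining for several hypothetical values of the
# probability that the generated answer is correct.
for p in [0.0, 0.05, 0.3, 0.9]:
    answer = expected_binary_score(p, abstain=False)
    idk = expected_binary_score(p, abstain=True)
    print(f"p={p:.2f}  answer={answer:.2f}  abstain={idk:.2f}")
```

Answering never scores below abstaining, and scores strictly higher whenever `p_correct` is above zero. The preference for generated answers falls out of the scoring rule chosen by benchmark designers, with no reference to any internal state of the model.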

Explanation 4

Quote: "When there is no succinct pattern that explains the target function, there is epistemic uncertainty meaning that necessary knowledge is absent from the training data."

Explanation Types:
- Theoretical: Embeds in deductive framework, may invoke unobservable mechanisms (How it is structured)
Analysis (Why vs. How Slippage): This explanation employs a purely Theoretical framing to explain the origins of certain errors. It frames the system mechanistically and structurally (how), focusing on mathematical patterns, target functions, and the contents of the training data. This choice strongly emphasizes the material and statistical limits of the system—it cannot output what it has not structurally encoded. By defining the problem mathematically ('no succinct pattern'), it temporarily strips away the agential metaphors used elsewhere in the paper. However, it still obscures the active human choices involved in constructing the training data; the knowledge is 'absent' passively, rather than being actively excluded or ignored by the data curators who assembled the corpus.
Consciousness Claims Analysis: This passage features a fascinating collision of mechanistic and consciousness terminology. It uses strict mechanistic concepts ('succinct pattern', 'target function', 'training data') but introduces the philosophical concepts of 'epistemic uncertainty' and 'knowledge.' (In machine learning, 'epistemic uncertainty' is also a standard technical term for uncertainty that stems from missing data rather than from inherent randomness, which makes the slide between registers easy to miss.) It conflates the processing of data with the possession of knowledge. The authors are projecting their own epistemic understanding (the awareness that a fact is missing) onto the system's mathematical void. Mechanistically, the system does not experience epistemic uncertainty; it simply lacks the parameter values needed to generate a specific sequence of tokens with high probability, because those sequences did not occur with sufficient frequency in the data used to optimize its billions of parameters. The system does not lack 'knowledge'; it lacks statistical correlation.
Rhetorical Impact: The rhetorical impact of shifting into this highly theoretical, mathematical register is the construction of extreme scientific authority. By defining hallucinations in terms of target functions and Vapnik-Chervonenkis dimensions, the authors signal to the audience that the problem is deeply understood and mathematically bounded. This framing reduces panic and risk perception by placing the chaotic output of 'hallucinations' inside a clean, deductive box. If audiences believe the problem is merely a matter of 'absent knowledge' in the training data, they will trust that simply adding more data or better patterns will solve the issue, reinforcing faith in the endless scaling of these models rather than questioning their fundamental architecture.
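The point in Explanation 4 that certain sequences did not occur with sufficient frequency can be illustrated by simple counting. The toy corpus below is invented (each string stands for one mention of a person-birthday fact), and `singleton_rate` is a hypothetical helper. It counts the facts that occur exactly once (what this analysis elsewhere calls 'singletons') and divides by the total number of mentions. Under the classical Good-Turing estimate, that fraction also approximates the probability that a new query concerns a fact the corpus never mentions at all. This is an illustrative count, not a reproduction of the paper's formal bound.

```python
from collections import Counter

def singleton_rate(facts):
    """Fraction of mentions whose fact occurs exactly once in the corpus.

    Each item in `facts` stands for one occurrence of a fact.
    """
    counts = Counter(facts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(facts)

# Hypothetical toy corpus: each string is one mention of a (person, birthday) fact.
corpus = [
    "Ada:12-10", "Ada:12-10", "Ada:12-10",   # frequently repeated fact
    "Alan:06-23", "Alan:06-23",              # repeated twice
    "Grace:12-09",                           # appears once
    "Edsger:05-11",                          # appears once
]
print(Counter(corpus))
print(singleton_rate(corpus))  # 2 singletons out of 7 mentions
```

A frequently repeated fact gives a model many occurrences to fit; a singleton gives it exactly one, which is the statistical sense in which such facts are poorly supported.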

Explanation 5

Quote: "The calibrated language model learning algorithm memorizes a_c for (c, a_c) seen in the training data and agrees perfectly with p on those c ∉ U seen in the training data. For the unseen c ∈ U, it abstains with the correct probability 1 − α_c but otherwise is uniformly random over R_c."

Explanation Types:
- Theoretical: Embeds in deductive framework, may invoke unobservable mechanisms (How it is structured)
- Dispositional: Attributes tendencies or habits (Why it tends to act certain way)
Analysis (Why vs. How Slippage): This passage uses a hybrid Theoretical and Dispositional explanation. It theoretically defines the mathematical behavior of a 'calibrated algorithm' using formal logic and notation (memorizing answers, agreeing with a distribution), while dispositionally describing how it 'abstains' or acts 'uniformly random.' It frames the AI mechanistically as an algorithm executing statistical rules, emphasizing the predictable, mathematical nature of its behavior. This choice dispels any illusion of human-like cognition, starkly revealing the system as a correlation machine that either reproduces memorized answers for seen prompts or, for unseen prompts, abstains with a fixed probability and otherwise picks uniformly at random among candidate responses. It highlights the vast gulf between the paper's earlier 'student guessing' metaphor and the mathematical reality the paper itself describes.
Consciousness Claims Analysis: The epistemic claims here are largely mechanistic, using verbs like 'memorizes,' 'agrees,' and 'is uniformly random.' There are no explicit consciousness verbs like 'knows' or 'believes.' The assessment accurately reflects processing rather than knowing. The phrase 'abstains with the correct probability' edges slightly toward consciousness framing (abstaining implies a choice), but it is immediately grounded in a mathematical quantity, the probability '1 − α_c'. The passage describes an idealized learning algorithm rather than the training procedure of any deployed model, but it captures the relevant mechanism: the algorithm does not 'understand' the data; it stores the answers for seen prompts ('memorizes') and spreads probability uniformly across candidate responses for unseen prompts. The authors successfully avoid the curse of knowledge here by describing the math exactly as it functions.
Rhetorical Impact: The rhetorical impact of this deeply mechanistic framing is to establish rigorous, unassailable scientific proof for their claims. By stripping away the anthropomorphism and describing the system as a mathematical function that 'memorizes' or is 'uniformly random,' it forces the audience to view the AI as a mere calculator. This framing shatters the illusion of reliability or trust that the 'student' metaphor built; if audiences understand that, in this model, the system's response to an unseen prompt is literally 'uniformly random,' their trust in its autonomous capabilities plummets. This reveals why the authors cannot use this language exclusively: if the public believed AI was just outputting random variables for unseen data, the commercial viability and widespread adoption of the technology would collapse.
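The calibrated algorithm in the Explanation 5 quote can be written out directly. This is a minimal sketch under stated assumptions: prompts are plain strings, `memorized` holds the answers a_c for seen prompts, `alpha` holds α_c for each unseen prompt, `responses` holds each candidate set R_c as a list, and the string 'IDK' stands in for abstention. The function name and the example data are hypothetical. It is the idealized learner from the quoted passage, not the training procedure of any deployed model.

```python
import random

def calibrated_response(c, memorized, alpha, responses, rng=random):
    """Toy version of the calibrated learner described in the quote.

    memorized: dict mapping each prompt seen in training to its answer a_c.
    alpha:     dict mapping each unseen prompt c to alpha_c, the probability
               of answering rather than abstaining.
    responses: dict mapping each unseen prompt c to its candidate list R_c.
    """
    if c in memorized:                   # seen prompt: reproduce the stored a_c
        return memorized[c]
    if rng.random() < 1.0 - alpha[c]:    # unseen: abstain with probability 1 - alpha_c
        return "IDK"
    return rng.choice(responses[c])      # otherwise pick uniformly from R_c

# Hypothetical example data.
memorized = {"Ada": "12-10"}
alpha = {"Grace": 0.4}
responses = {"Grace": ["12-09", "01-01", "07-04"]}

rng = random.Random(42)
draws = [calibrated_response("Grace", memorized, alpha, responses, rng) for _ in range(10_000)]
print(calibrated_response("Ada", memorized, alpha, responses, rng))  # seen prompt: 12-10
print(draws.count("IDK") / len(draws))  # close to 1 - 0.4 = 0.6
```

Across many draws, the abstention frequency for an unseen prompt settles near 1 − α_c, and every non-abstaining answer is an unweighted pick from R_c.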

Task 4: AI Literacy in Practice - Reframing Anthropomorphic Language

About this task

This section proposes alternative language for key anthropomorphic phrases, offering more mechanistic and precise framings that better reflect the actual computational processes involved. Each reframing attempts to strip away the projections of intention, consciousness, or agency that are embedded in the original language.

V3 Enhancement: A fourth column addresses human agency restoration—reframing agentless constructions to name the humans responsible for design and deployment decisions.

Original Anthropomorphic Frame	Mechanistic Reframing	Technical Reality Check	Human Agency Restoration
Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty.	When no candidate continuation of a prompt receives high probability, language models still generate statistically plausible but factually incorrect token sequences instead of an abstention phrase such as 'I don't know.'	The model does not 'feel uncertain' or 'guess'; it calculates probability distributions across its vocabulary and samples tokens based on mathematical weights tuned during training. It cannot 'admit' anything, as it lacks self-awareness and epistemic boundaries.	N/A - describes computational processes without displacing responsibility. (Though the decision to deploy models that generate falsehoods rather than abstaining rests with corporate executives).
Language models are known to produce overconfident, plausible falsehoods, which diminish their utility and trustworthiness.	Language models frequently generate factually incorrect token sequences with high statistical probability scores, reducing their reliability in practical applications.	Models do not possess 'confidence' or belief; they output tokens with high softmax probability scores based on the density of their training data. High probability indicates statistical frequency, not epistemic certainty.	Engineers at AI companies design and deploy models optimized for fluent text generation over factual accuracy, knowing these architectures inherently produce high-probability falsehoods.
This error mode is known as hallucination, though it differs fundamentally from the human perceptual experience. Despite significant progress, hallucinations continue to plague the field...	This statistical output failure is called hallucination. Despite architectural tweaks, these autoregressive models inherently continue to generate factually ungrounded text...	The system does not 'perceive' reality and therefore cannot 'hallucinate.' It strictly processes and generates text tokens based on mathematical correlations without any causal model of the world.	Despite massive investment, AI developers and corporate labs continue to release and monetize systems that fundamentally fail to distinguish fact from statistical noise.
Model B will outperform A under 0-1 scoring, the basis of most current benchmarks. This creates an epidemic of penalizing uncertainty and abstention...	Models optimized to output generated text rather than abstention tokens score higher on current 0-1 benchmarks. This benchmark design structurally lowers the reward for outputting abstention phrases...	The model does not experience 'uncertainty.' It processes matrices. 'Abstention' is not a choice, but merely the generation of a specific token sequence (like 'I don't know') dictated by reinforcement learning weights.	Researchers and benchmark designers established evaluation metrics that reward fluent generation over accuracy, leading AI engineers to optimize their models against outputting abstention tokens.
The test-taker's beliefs about the correct answer can be viewed as a posterior distribution over binary g_c's. For any such beliefs, the optimal response is not to abstain.	The model's calculated probability distribution over possible outputs can be mapped to a posterior distribution. Given this mathematical optimization target, the system generates text rather than abstention tokens.	A posterior distribution is a calculated mathematical probability, not a 'belief.' The model possesses no internal convictions, justifications, or understanding of truth; it merely ranks tokens based on statistical likelihood.	N/A - describes computational processes without displacing responsibility. (However, the optimization targets are defined by human engineers).
During pretraining, a base model learns the distribution of language in a large text corpus.	During pretraining, developers use stochastic gradient descent to update the mathematical weights of the base model's neural network to correlate with the distribution of language in a large text corpus.	The model does not 'learn' concepts or acquire understanding; it mathematically minimizes cross-entropy loss by adjusting numerical parameters to mirror the statistical co-occurrence of tokens in the dataset.	Data scientists and engineers scrape vast human text corpora and use massive computational resources to optimize the model's numerical weights.
Language models avoid many types of errors such as spelling mistakes, and not all errors are hallucinations.	Language models rarely generate spelling mistakes because correctly spelled sub-word tokens appear with overwhelmingly high frequency in the filtered training data, leading to near-100% probability scores.	The model does not consciously 'avoid' anything. It generates outputs based on statistical probability. It cannot recognize an error or make a deliberate choice to evade one.	Human data annotators and engineers extensively filter and clean the training corpora, ensuring the model's statistical weights heavily favor correctly spelled token sequences.
This can remove barriers to the suppression of hallucinations, and open the door to future work on nuanced language models, e.g., with richer pragmatic competence.	This can allow developers to alter models to output fewer ungrounded statements, and enable future tuning that forces the model's outputs to statistically simulate nuanced human social speech patterns.	Models cannot possess 'pragmatic competence' or social awareness. They process context embeddings and generate outputs that correlate with social linguistic patterns found in their training data.	AI researchers and developers aim to adjust the reinforcement learning rewards to make their models output text that more convincingly mimics human empathy and social nuance.
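The pretraining and spelling rows in the table above describe 'learning' as cross-entropy minimization that mirrors token frequencies. A deliberately tiny sketch makes that concrete. It assumes a unigram model with one logit per token and full-batch gradient descent, rather than a neural network trained with stochastic minibatches; the corpus, learning rate, step count, and the helper `fit_unigram` are hypothetical.

```python
import math
from collections import Counter

def softmax(logits):
    m = max(logits)                                  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fit_unigram(corpus, steps=2000, lr=0.5):
    """Fit one logit per token by gradient descent on average cross-entropy.

    For a softmax model, the gradient of the average cross-entropy with
    respect to the logits is (predicted probabilities - empirical frequencies).
    """
    vocab = sorted(set(corpus))
    counts = Counter(corpus)
    target = [counts[t] / len(corpus) for t in vocab]   # empirical frequencies
    logits = [0.0] * len(vocab)                          # start from a uniform model
    for _ in range(steps):
        probs = softmax(logits)
        grads = [p - q for p, q in zip(probs, target)]
        logits = [z - lr * g for z, g in zip(logits, grads)]
    return vocab, softmax(logits), target

# Hypothetical toy corpus in which the correct spelling dominates.
corpus = ["receive"] * 98 + ["recieve"] * 2
vocab, probs, target = fit_unigram(corpus)
for tok, p, q in zip(vocab, probs, target):
    print(f"{tok:8s} model={p:.3f} data={q:.3f}")
```

After training, the fitted probabilities match the corpus frequencies: the correct spelling receives high probability because it dominates the data, not because an error was consciously 'avoided.'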

Task 5: Critical Observations - Structural Patterns

Agency Slippage

The text exhibits a profound and systemic oscillation between rigorous mechanical framing and expansive agential anthropomorphism, functioning as a rhetorical bridge between the math of machine learning and the policy of AI deployment. The slippage primarily moves from mechanical to agential, occurring most dramatically when the authors transition from their mathematical proofs in Section 3 to their policy recommendations in Section 4.

In Section 3, the text relies on theoretical explanation types, describing the system mechanistically: the model minimizes 'cross-entropy loss,' calculates 'posterior distributions,' and acts as a 'binary classifier.' Here, the agency resides in the math. However, the moment the text discusses how to evaluate these models, dramatic slippage occurs. The mathematical 'posterior distribution' becomes a 'belief.' The low-probability statistical state becomes 'uncertainty.' The autoregressive output becomes a 'guess.' This culminates in the framing of the model as a 'student facing hard exam questions' who makes the strategic choice to 'bluff.'

This slippage is driven by a profound 'curse of knowledge.' The authors, expert computer scientists, understand the mechanistic reality (that cross-entropy optimization on seen vs. unseen data produces statistically unreliable outputs). Because they know which facts appear only once in the training data (the paper's 'singletons') and are therefore barely supported by it, they project that awareness onto the system, claiming the model is 'uncertain' and 'guessing.' They conflate their epistemic overview with the model's internal processing.

Crucially, this slippage relies on agentless constructions that obscure human actors. The text states that models are 'optimized to be good test-takers' and that 'evaluations penalize uncertainty.' Who does the optimizing? Who writes the evaluations? By obscuring the human developers (OpenAI, DeepSeek) and the benchmark creators, the text allows agency to flow directly into the machine. The AI becomes the sole actor—a struggling student victimized by a bad test.

The rhetorical accomplishment of this slippage is immense. It allows the authors to leverage the unassailable authority of mathematical proof to validate their claims, while using relatable, agential metaphors to explain those claims to a non-technical audience. It makes it 'sayable' that the models are victims of bad benchmarking, while making it 'unsayable' that corporations are deliberately deploying statistically unreliable correlation engines.

Metaphor-Driven Trust Inflation

The paper systematically constructs a specific form of perceived authority and trust through its use of consciousness language and anthropomorphic metaphors, deliberately blurring the line between performance-based reliability and relation-based sincerity. By framing the language model as a 'student' who holds 'beliefs,' 'guesses' under pressure, and possesses 'uncertainty,' the text strongly encourages relation-based trust—the kind of trust we extend to human beings who possess intentions, vulnerabilities, and an underlying commitment to the truth.

When the text claims the AI is 'uncertain' or 'guesses,' it signals to the audience that the model actually knows its own epistemic limits. It implies the model could tell the truth if only the 'evaluations' didn't pressure it to guess. This accomplishes a massive rhetorical victory: it protects the perceived competence of the underlying system. If a system is just 'guessing' because of a bad exam rubric, the audience assumes the system is intelligent, self-aware, and fundamentally capable, just poorly incentivized.

This inappropriate transfer of human-trust frameworks onto statistical systems creates severe risks. Humans naturally extend relation-based trust to entities they believe possess consciousness. If a user believes an AI holds a 'belief' or expresses genuine 'uncertainty,' they will interact with it as a sincere agent. They will treat linguistic cues of hesitation ('I think...', 'Perhaps...') as reliable signals of the machine's internal accuracy, unaware that those phrases may simply be tokens the model was optimized via RLHF to produce because they score well on safety benchmarks.

Furthermore, the text manages the system's failures (hallucinations) purely agentially. When the model outputs a falsehood, it isn't framed as a mechanical failure of statistical correlation; it is framed as an 'overconfident bluff' by a 'student.' By using reason-based explanations to give the AI a rationale for its errors, the text constructs a sense that the AI's decisions are justified by its environment. The profound risk here is that society will continually attempt to 'educate' or 'incentivize' these systems as if they were wayward employees, wasting resources on psychological alignment rather than addressing the hard mathematical realities of autoregressive token prediction.

Obscured Mechanics

The anthropomorphic and consciousness-attributing language in this text acts as a dense smokescreen, concealing the material, technical, and economic realities of AI production. Applying the 'name the corporation' test reveals a stark absence: despite being written by researchers at OpenAI and Google, the text almost never names the corporations making the deployment decisions. When the text says the 'base model learns,' or the model 'guesses,' it effectively hides the vast apparatus of human labor and corporate strategy behind the curtain.

First, the language of 'knowing' and 'understanding' obscures the profound technical constraints of the systems. By claiming the model has 'beliefs' or is 'uncertain,' the text hides its total reliance on training data frequency and its absolute lack of causal world models. It hides the fact that a model's 'confidence' is just a statistical probability score generated by a proprietary black box that the public cannot audit.

Second, the biological framing of 'learning' and 'hallucinations' obscures the staggering material and environmental costs. Models do not 'learn' organically; they require massive GPU clusters, consuming vast amounts of electricity and water to perform the quadrillions of matrix multiplications necessary for stochastic gradient descent. The metaphor naturalizes a deeply extractive industrial process.

Third, the framing obscures the human labor involved. When the text claims the model 'avoids' spelling errors or needs to develop 'pragmatic competence,' it erases the thousands of underpaid data annotators and RLHF workers who meticulously label toxic output, correct text, and rank responses to artificially construct the model's 'behavior.' The machine is given credit for the intelligence of the invisible human workforce.

Finally, the metaphors conceal the economic profit motives. By blaming 'evaluations' and 'binary grading' for the models' hallucinations, the authors (employed by the companies building the models) obscure their own commercial objectives. They deploy flawed models because there is a massive financial incentive to be first to market, not because a benchmark forced them to. Replacing the metaphors with mechanistic language reveals this: instead of 'the student guesses because of the exam,' the reality is 'the corporation deployed a statistically unreliable token generator to secure market dominance, and is now blaming the testing ecosystem for its product's failures.'

Context Sensitivity

The distribution of anthropomorphic language in this text is highly strategic, shifting in intensity depending on the rhetorical goal of the section and the implied audience. The text begins and ends with dense, vivid metaphors ('students,' 'bluffing,' 'plague'), while the middle sections rely on austere, mechanistic mathematics (Vapnik-Chervonenkis dimensions, Hoeffding bounds). This structure creates a specific rhetorical pipeline: the math establishes unassailable scientific credibility, which is then leveraged to grant authority to the deeply flawed anthropomorphic policy claims in the introduction and conclusion.

The consciousness claims intensify precisely when the authors transition from describing the system's architecture to diagnosing its social failures. When explaining how cross-entropy works, the model 'processes' and 'minimizes loss.' But when explaining why the model outputs dangerous falsehoods, suddenly the model 'understands' the rubric, 'experiences' uncertainty, and 'chooses' to guess. This represents a register shift where 'X is like Y' (the model is like a student) literalizes into 'X does Y' (the model bluffs).

There is a profound asymmetry in how capabilities versus limitations are framed. The model's capabilities—generating fluent text, passing tests—are framed in agential, conscious terms (it learns, it possesses beliefs). But its limitations—the hallucinations—are frequently dragged back into mechanical terms when it serves to excuse them (it lacks a 'succinct pattern,' it suffers from 'epistemic uncertainty' in the data). This asymmetry accomplishes a dual goal: it allows the developers to claim the credit for creating a 'mind' when it works, but blame the 'math' or the 'data' when it fails.

Ultimately, the strategic function of this anthropomorphism is policy management. The text is not just a scientific paper; it is a normative argument aimed at changing how AI is evaluated. By intensifying the anthropomorphism in the policy sections, the authors make the problem relatable to lay regulators and the public. They construct an implied audience that doesn't need to understand the calculus of posterior distributions, but only needs to feel sympathy for a 'student' forced to take a bad 'test,' smoothly guiding them toward the authors' desired regulatory outcome: blaming the benchmarks rather than the models.

Accountability Synthesis

Accountability Architecture

This section synthesizes the accountability analyses from Task 1, mapping the text's "accountability architecture"—who is named, who is hidden, and who benefits from obscured agency.

The accountability architecture of this text is a masterclass in displaced responsibility. Through the systematic use of agentless constructions, passive voice, and agential metaphors, the text creates a vast 'accountability sink' that entirely absorbs the liability of the human developers and corporate entities behind the AI systems. The cognitive obstacle identified in research on public perception, namely that audiences blame 'glitches' or the 'AI' rather than systemic design decisions, is actively constructed by the language of this paper.

The pattern is stark: non-human entities are consistently named as active agents, while human decision-makers are hidden. The 'model' guesses, the 'evaluations' penalize, the 'hallucinations' plague the field, and the 'metrics' impose a false dichotomy. The decisions to scrape the internet, to use cross-entropy optimization, and to release systems that generate plausible falsehoods are presented as mathematical inevitabilities or the fault of the 'training data,' rather than deliberate, profit-driven choices made by corporate executives.

The ultimate 'accountability sink' in this paper is the concept of 'binary grading' and 'mainstream evaluations.' When the responsibility for hallucinations is removed from the OpenAI or Google developers, it does not disappear; it transfers to the academic and open-source community that builds the benchmarks. The text essentially argues: the technology is mathematically bound to have errors, but the reason it lies to you confidently is because the testers are grading it wrong.

If we apply the 'name the actor' test to the text's most significant agentless constructions, the entire liability landscape shifts. If 'Language models are known to produce overconfident falsehoods' becomes 'Corporations intentionally deploy models that generate high-probability falsehoods,' the questions shift from 'How do we fix the benchmark?' to 'Should these corporations be legally liable for the defective products they release?' If 'evaluations penalize uncertainty' becomes 'Engineers optimize their models to pass evaluations instead of optimizing for truth,' the alternatives become visible: the companies could simply refuse to optimize for flawed benchmarks. By obscuring human agency, the text protects the commercial interests of the AI industry, ensuring the technology continues to be deployed while society wastes its time trying to fix the 'student' and the 'test.'

Conclusion: What This Analysis Reveals

The Core Finding

Mechanism of the Illusion:

The illusion of mind is constructed through a precise rhetorical sleight-of-hand: the seamless oscillation between unassailable mathematical reality and highly agential metaphor. The text exploits the 'curse of knowledge' by taking the authors' own deep understanding of the system's mechanistic limits (e.g., its inability to reliably reproduce 'singletons,' facts that appear only once in the training data) and projecting that awareness onto the system itself as the psychological state of 'uncertainty.' The temporal structure of the illusion is critical: the text first establishes the AI as a 'knower' by claiming it 'learns' and holds 'beliefs' (posterior distributions). Once the audience accepts the model as an epistemic agent, the text introduces the causal chain: because the model is uncertain, and because the evaluations penalize uncertainty, the model chooses to guess. This exploits the audience's preexisting vulnerability, namely our natural psychological tendency to anthropomorphize text-generating entities and our cultural familiarity with the pressures of standardized testing. By blurring the verbs of processing ('calculating,' 'minimizing') with the verbs of knowing ('believing,' 'admitting'), the text transforms a sophisticated statistical correlation engine into a relatable, struggling, conscious entity, making its mechanical failures appear as understandable human mistakes.

Material Stakes:

Categories: Regulatory/Legal, Epistemic, Economic

The metaphorical framings deployed in this text have severe, tangible consequences across multiple domains. In the Regulatory/Legal sphere, framing AI as a 'guessing student' shifts the liability away from the corporate manufacturer and onto the user or the evaluation metric. If a judge or regulator believes an AI is 'trying its best but guessing under pressure,' they are far less likely to impose strict product liability standards on the developers for the harm caused by 'hallucinations.' The economic winners are the tech corporations who avoid regulation, while the losers are the public who bear the cost of the misinformation. In the Epistemic domain, framing probability distributions as 'beliefs' and statistical hedging as 'pragmatic competence' degrades human information ecosystems. If audiences believe the AI 'knows' things and can 'admit' uncertainty, they will extend unwarranted relation-based trust to the system. They will rely on the AI's polite, confident tone as a proxy for factual accuracy, leaving them highly vulnerable to seamless, high-probability fabrications in critical contexts like legal research or medical inquiry. Economically, this anthropomorphism acts as free marketing, allowing companies to sell stochastic text generators as 'intelligent agents' or 'reasoning minds,' vastly inflating their commercial valuation based on the illusion of cognitive capability.

AI Literacy as Counter-Practice:

Practicing critical literacy and mechanistic precision directly neutralizes the risks created by these metaphors. By systematically replacing consciousness verbs with mechanistic ones, the reality of the technology becomes unavoidably clear. Reframing 'the model knows' to 'the model generates tokens by sampling from learned probability distributions' forces the user to recognize the absence of internal awareness and the system's absolute dependency on its training data. Similarly, reframing agentless constructions like 'the algorithm guessed' to 'engineers at OpenAI deployed a system that generates ungrounded tokens' restores human agency, making it immediately obvious who holds the power, who extracts the profit, and who must bear the responsibility. Systematic adoption of this precision requires a massive cultural shift: academic journals must reject papers that use 'hallucinates' or 'believes' as literal technical terms, and journalists must refuse to print PR copy that grants agency to software. Unsurprisingly, this precision is heavily resisted by the AI industry. Anthropomorphic language serves their interests by mystifying the technology, protecting their intellectual property behind a veil of 'intelligence,' and deflecting legal liability. Precise literacy threatens these interests by demystifying the product and demanding corporate accountability.

Path Forward

Looking at the broader discursive ecology of AI, the vocabulary choices we make define what problems become tractable and what futures become possible. Maintaining the status quo of anthropomorphic shorthand ('AI hallucinates,' 'models learn') serves corporate marketing and provides an intuitive, though deceptive, public grasp. However, it costs us epistemic security and legal accountability, making it nearly impossible to regulate these systems as standard software products. Conversely, adopting strict mechanistic precision ('models are trained to minimize cross-entropy loss over sub-word token predictions') provides absolute technical accuracy and forces developer accountability, but it costs accessibility, alienating the lay public and policymakers behind walls of dense mathematics. Hybrid approaches ('models statistically predict') attempt to bridge this gap, but often slide back into agency displacement.

Institutional changes could support more honest discourse: funding agencies could require researchers to provide mechanistic translations of their findings, and regulatory frameworks could mandate 'capability disclosures' that legally bar companies from using consciousness verbs in product marketing. If a future emerges where mechanistic precision becomes the norm, society will likely treat AI much like we treat search engines or calculators—as highly useful but inherently unthinking tools requiring human verification, solving the crisis of unwarranted trust. However, if the current trajectory of deep anthropomorphism continues, we risk embedding a dangerous assumption into our legal and social architecture: that we are dealing with a new class of quasi-conscious agents rather than the mathematical extraction of human labor, foreclosing our ability to hold the actual human creators accountable for the world they are building.

Extended Processing Summary

Gemini's Processing Steps

What you are about to read…

Gemini refers to the text below as "thought summaries." This is an overt consciousness projection because 'thoughts' are hallmarks of a conscious mind that 'knows' what it is doing and why. The concealed mechanistic process is probabilistic text generation. Treat this as just another rhetorical artifact: a way of making the model's processing legible. When the model writes "I should double check that...", those tokens are appended to its context and condition the probability distribution over the tokens that follow. This can shift later output toward more accurate continuations, but it is statistical conditioning, not introspection. The "I" is a functional gear in the machine, not a self-reference. This language is a deliberate interface design choice, not a biological reality. The developers could have designed the interface to display passive logs. Instead, they chose to simulate a narrator.

Defining the Parameters

I'm zeroing in on the core of the request: a critical discourse analysis (CDA) of the Kalai et al. paper using that JSON schema. The focus is definitely on those AI metaphor and anthropomorphism aspects, along with related phenomena. The JSON structure will be my guide for structuring the analysis. I'm starting by identifying key passages in the paper.

Considering the Structure

I am now focusing on the source text, specifically Kalai et al.'s "Why Language Models Hallucinate." The core argument, that LLMs prioritize guessing over admitting uncertainty, is quite clear. I've begun to identify the metaphors and instances of anthropomorphism: the student/exam analogy, admitting uncertainty, and the "hallucination" metaphor, plus other metaphors like learning and test-taker. I'll need to analyze how those are utilized.

Analyzing the Examples

I'm now identifying and categorizing specific instances from Kalai et al. using the provided examples. The "human-AI" and "agentic models" analogies, and the mixed "mechanical/statistical framing" are particularly interesting. I'm noting the reward structure driving this, and instances of model overconfidence, alongside the "belief as a posterior distribution" comparison. My focus is now on mapping these instances to the audit schema, specifically targeting 8-10 distinct metaphors.

Mapping the Instances

I'm now deep-diving into the concrete instances of metaphor and anthropomorphism. I've compiled a list of the key phrases from Kalai et al., and am now mapping the source and target domains. I'm focusing on the "student facing hard exam," "admitting uncertainty," and "overconfident falsehoods" examples in particular, and the active verbs that drive them. I've also added the learning and error-avoidance analogies.

Auditing the Explanations

I'm working on the explanation audit and have identified five passages from the Kalai et al. paper, and categorized each according to Brown's typology, including the pretraining objective, the "guessing" behavior, and "arbitrary-fact hallucinations". I am preparing to reframe the language used in the paper with mechanistic verbs and human actors, mapping the core concepts of Task 1 to the examples found in Task 3. The passages on models as test-takers and trigram models are also included.

Synthesizing the Observations

I'm now integrating critical observations, tracking the "agency slippage" between metaphors and math, and examining how the student metaphor fosters trust alongside binary evaluation metrics. The "guessing" behavior of AI models obscures crucial mechanics, which will be the basis for reframing. I'm also noting the variable nature of anthropomorphism across sections, specifically where it is most and least present. I am preparing a synthesis, highlighting accountability issues that result from framing the AI using this language. I have begun drafting the conclusion, focusing on a five-paragraph synthesis.

Addressing the Strict Requirements

I've just had a critical realization regarding the JSON output format. I now understand that literal newlines require escaping, but the instructions prohibit any control tokens. This necessitates a careful rewrite strategy to ensure compliance, meaning no \n or \t anywhere in the output. I'm focusing on crafting continuous text strings and accurately encoding all data. I'm also ensuring all categorical field values exactly match the enums, and I'll provide clear, contrastive justifications.

Evaluating the Instances

I am now closely evaluating the new instances, particularly focusing on their visibility and status in the Kalai et al. paper. I've rephrased one example to emphasize uncertainty. I am documenting the directness of their use in the paper. I've noted a few which are already "hidden," and "hedged," in the context of the audit schema. I'm noting the language model's errors in this context.

Analyzing the Actors

I'm now identifying the actors in the instances. "DeepSeek" as a named entity is quite clear, and I'll include OpenAI and Google DeepMind where they are explicitly cited. I've re-categorized "evaluators/benchmarks" as partial, and rephrased some examples to highlight uncertainty, maintaining a variety of "status" and "visibility" labels across instances. I'm focusing on the "guessing" behavior and the actors responsible for evaluations.

Refining the JSON Encoding

I'm now deeply engrossed in drafting the JSON structure. My focus is on ensuring absolute compliance with the constraint against control characters, meaning no escape characters, especially \n or \t. My goal is to produce valid, single-line strings that accurately represent the data without introducing formatting errors or control codes. I'm focusing on the JSON formatting, but will move back and forth to the other concerns described.

Encoding the Structure

I'm now fully committed to a single-line JSON format, completely removing any newlines or tabs to strictly adhere to the control token constraint. I'm focusing on ensuring readability while maintaining compliance with this rigid requirement. I'm also confirming I will not include any literal \n or \t characters inside of string values, and I am double-checking that all field values exactly match the enums.

Executing the Content Generation

I'm now generating all the content while deeply considering the format. My key focus is ensuring no newline or tab characters appear anywhere. I'm writing full single-line strings for text values, which are now contiguous paragraphs, and will format the JSON structure without internal newlines. I've re-reviewed the constraint to ensure I am compliant with this critical requirement.

Analyzing the Projections

I'm now deeply engaged in the detailed analysis of the provided examples, like "Like students facing hard exam questions." I'm carefully dissecting how these metaphors project human attributes onto language models. I'm focusing on the implied agency and conscious decision-making, while keeping a close eye on the word count for the detailed analysis. I am also planning the word count for item 1, the projection of the student and exam analogy.

Dissecting the Analogy

I am now taking a close look at the "struggling student" analogy, and the implicit architecture of "conscious awareness" it suggests. I'm focusing on the obfuscation of the underlying mathematical operations and the implications for trust and accountability. The anthropomorphic framing evokes unwarranted empathy, obscuring the probabilistic nature of the model's responses. I'm evaluating how this shifts responsibility away from developers.

Evaluating the Consequences

I'm now zeroing in on the liability and trust implications, specifically from the "struggling student" analogy. The framing obscures fundamental statistical deficits, misrepresenting AI responses as "guesses." I'm elaborating on how this analogy creates liability ambiguity, shifting blame from the engineering team to the evaluation metrics. I am now detailing how this anthropomorphic framing is exploited in the paper.

Evaluating the Concealment

I'm now characterizing the "struggling student" example from a CDA perspective, specifically targeting the "hidden" label because it obfuscates corporate accountability. I'm focusing on how the framing obscures the engineers responsible for design, training, and deployment. The analogy places the blame on an "autonomous student" and "guessing," rather than the team and the metrics. I am making it as dense as possible, and will use all the allowable words.

Analyzing the Audit

I'm now analyzing the new instances, like "guessing when uncertain improves test performance," and mapping them to the audit schema. I'm focusing on "test-taking" mode and overconfident hallucinations. The core concern remains the obfuscation of statistical mechanics with human attributes. I am emphasizing intentionality, and examining the models as "agents". I am also considering the "epidemic" framing of uncertain responses.

Mapping the Content

I am now closely mapping the content from the Kalai et al. paper to the schema, with a focus on Task 5. I'm prioritizing the agency slippage, metaphor-driven trust, and obscured mechanics, alongside their context sensitivity, and will create a succinct summary. The mathematical underpinnings are contrasted with the anthropomorphic language used, examining how the authors use this tension to create trust. I am now drafting the synthesis, and am preparing the conclusions.

Synthesizing the Observations

I'm now integrating critical observations from Kalai et al., focusing on how the paper's metaphors mask underlying mechanics and shift accountability via binary evaluation. The "binary grading" creates an "accountability sink," distracting from model creators. The metaphors' use as rhetorical devices will be included. This is shaping my synthesis, readying my conclusions for a complete presentation.

Verifying the Constraints

I'm now fully focused on achieving 100% compliance with the output constraints. I'm replacing all newline characters with spaces and guaranteeing the entire JSON output as a single, uninterrupted string. The critical realization is that even structural formatting (indentation, newlines) is prohibited. I am preparing the data in a flat string, so there are no \n or \t characters anywhere.

Validating the Structure

I'm now implementing the plan for a fully compliant JSON output, focusing on constructing a single-line string. I'm prioritizing escaping double quotes and ensuring the JSON is valid and meets all field requirements. I am also double checking the enumerated field values for the categorical fields. I'm focusing on text density to satisfy the word count, especially for the longer audit and explanation sections.

Refining the Projection

I am now reducing the word count in the "student/exam" projection. I'm focusing on ensuring it is under the 100-word limit. The projection casts human cognitive states onto the model, equating a system of algorithms with a feeling of metacognition. It implies an awareness of uncertainty and a deliberate choice, masking the underlying mathematical processes. This reframing can shift our perception of the AI, creating a false equivalence. I am editing this.

Analysis Provenance

Run ID: 2026-05-30-why-language-models-hallucinate-metaphor-hs3p34
Raw JSON: 2026-05-30-why-language-models-hallucinate-metaphor-hs3p34.json
Framework: Metaphor Analysis v6.5
Schema Version: 3.0
Generated: 2026-05-30T08:18:43.081Z


