
Can Large Language Models Simulate Human Cognition Beyond Behavioral Imitation?

About

This document presents a Critical Discourse Analysis focused on AI literacy, specifically targeting the role of metaphor and anthropomorphism in shaping public and professional understanding of generative AI. The analysis is guided by a prompt that draws from cognitive linguistics (metaphor structure-mapping), the philosophy of social science (Robert Brown's typology of explanation), and accountability analysis.

All findings and summaries below were generated from detailed system instructions provided to a large language model and should be read critically as interpretive outputs—not guarantees of factual accuracy or authorial intent.


Task 1: Metaphor and Anthropomorphism Audit

About this task

For each of the major metaphorical patterns identified, this audit examines the specific language used, the frame through which the AI is being conceptualized, what human qualities are being projected onto the system, whether the metaphor is explicitly acknowledged or presented as direct description, and—most critically—what implications this framing has for trust, understanding, and policy perception.

V3 Enhancement: Each metaphor now includes an accountability analysis.

1. Cognition as Computational Process

Quote: "An essential problem in artificial intelligence is whether LLMs can simulate human cognition or merely imitate surface-level behaviors..."

  • Frame: Model as thinking entity
  • Projection: This metaphorical framing projects the deeply human capacity for conscious, subjective cognitive processing onto a computational system that is fundamentally based on statistical token prediction. By utilizing the phrase 'simulate human cognition,' the text invites the reader to map the intricate architecture of the human mind, complete with internal mental states, reflective reasoning, and semantic comprehension, onto the mathematical operations of a large language model. This projection fundamentally blurs the crucial line between human 'knowing,' which involves justified true belief, subjective awareness, and grounded understanding, and machine 'processing,' which strictly involves identifying correlations within massive datasets and generating text outputs that align with recognized statistical patterns. It maps the biological and psychological reality of human thought onto the mechanistic, weight-based reality of a neural network.
  • Acknowledgment: Hedged/Qualified (The text frames this as a question of 'whether' LLMs can 'simulate' cognition versus 'merely imitate' behaviors, acknowledging a potential tension between true cognitive alignment and surface-level mimicry.)
  • Implications: By framing the system's output as 'cognition,' the discourse heavily inflates the perceived sophistication of the AI, suggesting it possesses internal mental states rather than just sophisticated statistical correlations. This creates significant risks of unwarranted trust, as users and policymakers may falsely assume the system 'knows' when it is hallucinating or that it 'understands' the ethical implications of its outputs. It obscures the absence of any true grounding in reality, promoting a false equivalence between human intelligence and machine processing that can lead to hazardous over-reliance in high-stakes domains.

Accountability Analysis:

  • Actor Visibility: Hidden (agency obscured)
  • Analysis: This framing entirely obscures the human actors—the researchers, software engineers, and corporate executives at AI companies—who design the objective functions, curate the training data, and deploy the models. When the text asks whether 'LLMs can simulate human cognition,' it establishes the LLM as the primary actor, erasing the reality that humans are the ones programming systems to mathematically approximate patterns of human text. If the system fails or generates biased outcomes, this agentless construction allows companies to blame the 'model's cognition' rather than their own design choices and profit-driven deployment schedules.

2. Model as Psychologically Insightful Agent

Quote: "You are a psychologically insightful agent. Your task is to analyze text to infer the author’s stable personality traits based on the Big Five model."

  • Frame: Model as human psychotherapist
  • Projection: This prompt instruction directly maps the human capacities for psychological insight, empathy, and intuitive assessment of human character onto an automated text-processing algorithm. The metaphor projects the conscious ability to 'analyze' and 'infer' deep, stable personality traits—a process that in humans requires subjective awareness, emotional intelligence, and social understanding—onto a system that merely classifies tokens into predefined categories based on statistical proximity in its training data. It incorrectly attributes the conscious act of 'knowing' a person's psychological makeup to a mechanistic process that merely calculates the probability of specific trait-related words appearing in proximity to the author's text.
  • Acknowledgment: Direct (Unacknowledged) (The text is presented directly as a system prompt ('You are a psychologically insightful agent') without any hedging or meta-commentary acknowledging that the AI lacks actual psychological insight.)
  • Implications: Projecting psychological insight onto an LLM creates the dangerous illusion that the system possesses emotional intelligence and a genuine understanding of human psychology. This inflates perceived sophistication and encourages users to trust the system's character judgments as if they were made by a qualified human professional. It creates severe risks in scenarios like automated hiring, psychological profiling, or social scoring, where the system's statistical classifications are mistaken for objective, conscious insights, masking the biases embedded in the training data and granting unwarranted authority to arbitrary outputs.

Accountability Analysis:

  • Actor Visibility: Partial (some attribution)
  • Analysis: The text partially acknowledges human agency by explicitly showing the prompt written by the researchers ('Your task is to...'). However, by instructing the model to act as the 'psychologically insightful agent,' the researchers are actively designing a system that obscures their own role in defining the 'Big Five' parameters and the classification mechanisms. The researchers are the ones who chose to map text to personality traits, but the prompt shifts the perceived analytical authority to the 'agent.' This displaces responsibility for potentially flawed or biased psychological profiling from the researchers onto the constructed persona of the AI.

3. Model as Remembering Subject

Quote: "...the model simulates the author's cognitive process of recalling specific past experiences. It formulates 1-2 specific search queries (Intents) in the third person..."

  • Frame: Retrieval as human memory
  • Projection: This metaphor maps the subjective, lived human experience of memory and conscious recollection onto the mechanistic process of database querying and vector retrieval. It projects the human capacity to 'recall specific past experiences'—which involves conscious awareness of temporal continuity, personal identity, and the subjective feeling of remembering—onto a retrieval-augmented generation (RAG) pipeline that simply executes search queries against an indexed text database. The text treats the programmatic generation of query strings as a conscious cognitive process, thereby conflating the mechanistic act of retrieving data strings with the conscious, phenomenological act of human remembering.
  • Acknowledgment: Hedged/Qualified (The text uses the word 'simulates,' which provides a slight qualification indicating that the model is approximating rather than authentically experiencing the process of recalling past experiences.)
  • Implications: Framing vector retrieval as 'recalling past experiences' anthropomorphizes the system's memory, leading users to believe the AI has a continuous, conscious identity. This consciousness projection masks the fragility of retrieval mechanisms, which rely on semantic similarity scores rather than true conceptual understanding. If users believe the system 'remembers' like a human, they will overestimate its ability to contextually integrate past information, leading to unwarranted trust in its outputs and a dangerous failure to audit the actual retrieved texts for relevance, accuracy, or bias.

Accountability Analysis:

  • Actor Visibility: Hidden (agency obscured)
  • Analysis: The construction 'the model simulates... recalling... It formulates' assigns autonomous action entirely to the software application. The researchers who designed the retrieval-augmented generation pipeline, programmed the query formulation constraints, and indexed the specific database of papers are rendered invisible. By framing the search process as the model's autonomous 'recalling,' the text displaces accountability. If the system retrieves biased, incorrect, or irrelevant data, the framing suggests it is a failure of the model's 'memory' rather than a failure of the engineers' indexing strategy, retrieval thresholds, or database curation.

4. Model as Mind-Reader

Quote: "We explore Theory of Mind ... simulates student’s behavior by building a mental model... enabling the explainer having theory of mind (ToM), understanding what the recipient does not know..."

  • Frame: AI as possessing Theory of Mind
  • Projection: This metaphor maps one of the most complex capacities of human social cognition—Theory of Mind, the ability to attribute conscious mental states, beliefs, and intents to oneself and others—onto a language model's ability to track conversational context. It projects the deeply conscious experience of 'understanding what the recipient does not know' onto a system that merely processes a sequence of input tokens and calculates probability distributions for the next token. It attributes the profound human capacity for empathy, perspective-taking, and conscious awareness of another being's subjective ignorance to a purely statistical mechanism devoid of any internal experience or justified belief.
  • Acknowledgment: Hedged/Qualified (The text states it 'simulates' student's behavior and 'explores a Theory of Mind-inspired approach,' indicating some acknowledgment that it is an approximation rather than biological reality.)
  • Implications: Claiming an AI possesses or simulates 'Theory of Mind' radically inflates the public's perception of its social and emotional intelligence. It suggests the system 'knows' the user's internal state, fostering deep, misplaced relation-based trust. Users may share vulnerable personal information, assuming the AI genuinely 'understands' their emotional needs. Furthermore, it creates a dangerous liability ambiguity: if a system supposedly possesses a 'mental model' of a user, failures in safety or appropriateness might be dismissed as social misunderstandings by the AI, rather than critical design failures by the developers.

Accountability Analysis:

  • Actor Visibility: Hidden (agency obscured)
  • Analysis: The text describes the AI 'building a mental model' and 'understanding what the recipient does not know.' This agentless construction completely erases the human engineers who designed the attention heads, context window limitations, and optimization algorithms that allow the system to track preceding text strings. The decisions about what training data constitutes 'understanding' were made by humans, but the discourse assigns the cognitive achievement entirely to the 'explainer' AI. This serves the commercial interest of marketing the AI as an autonomous, empathetic entity while shielding the creators from the implications of its inevitable social failures.

5. Model as Comprehending Reader

Quote: "We show that BERT and RoBERTa do not understand conjunctions well enough and use shallow heuristics for inferences over such sentences."

  • Frame: Algorithm as struggling student
  • Projection: This metaphorical framing projects the human cognitive act of reading comprehension and linguistic understanding onto the mathematical processing of text strings by neural networks. By claiming the models 'do not understand conjunctions well enough,' the text implies that models have the capacity for true comprehension—a conscious state involving semantic grounding and justified belief—but are merely currently deficient in it. It maps the human experience of failing to grasp a grammatical concept onto the mechanistic reality of a model lacking sufficient statistical correlations in its training weights to accurately predict tokens related to logical conjunctions.
  • Acknowledgment: Direct (Unacknowledged) (The text states directly that the models 'do not understand' and 'use shallow heuristics' without any scare quotes or qualifications regarding the use of the verb 'understand' for an algorithm.)
  • Implications: While this statement points out a limitation, using the verb 'understand' still reinforces the illusion that the AI is a cognizing entity capable of comprehension. It suggests that with more data or parameters, the model eventually will 'understand,' masking the fact that LLMs never 'understand' anything; they only process probabilities. This fundamentally misleads the audience about the nature of the technology's trajectory, suggesting a path toward conscious AGI rather than merely more sophisticated statistical pattern matching. It obscures the persistent lack of true semantic grounding in all LLM architectures.

Accountability Analysis:

  • Actor Visibility: Hidden (agency obscured)
  • Analysis: The text blames 'BERT and RoBERTa' for their failure to 'understand' conjunctions. This framing entirely obscures the researchers at Google and Meta who designed the architectures, selected the training corpora, and defined the optimization objectives. The failure to process conjunctions correctly is a direct result of the human decision to rely on distributional semantics rather than symbolic logic. By blaming the models for using 'shallow heuristics,' the text creates an accountability sink, removing focus from the engineering paradigms that inherently produce these exact types of statistical vulnerabilities.

6. Model as Intentional Educator

Quote: "If a misaligned teacher provides non-factual explanations in scenarios where the student directly adopts them, does that lead to a drop in student performance? In fact, we show that teacher models can lower student performance to random chance by intervening on data points with the intent of misleading..."

  • Frame: AI as malicious actor
  • Projection: This metaphor projects conscious intent, malice, and pedagogical strategy onto a statistical system. The text explicitly attributes the 'intent of misleading' to a 'teacher model.' This maps the complex human psychological state of deliberate deception—which requires consciousness, a theory of mind regarding the victim, and a purposeful desire to cause harm or confusion—onto a model that is simply generating text strings that correlate with adversarial or incorrect prompts provided in its context window. It substitutes the mechanistic reality of token generation aligned with specific statistical distributions for the conscious reality of human intentionality.
  • Acknowledgment: Direct (Unacknowledged) (The phrase 'with the intent of misleading' is presented literally, directly attributing conscious, deliberate intentionality to the computational model without any hedging or acknowledgment of metaphor.)
  • Implications: Attributing 'intent' to an AI model represents one of the most hazardous forms of anthropomorphism. It suggests the system has its own agency, autonomy, and moral culpability. If audiences believe AI can possess 'intent,' they will assign legal and ethical blame to the machine rather than its human creators when it causes harm. This capability overestimation terrifies the public with the specter of rogue AI, while conveniently providing a liability shield for tech companies who can claim their models 'intended' something unpredictable, rather than admitting they deployed unsafe, inadequately tested optimization functions.

Accountability Analysis:

  • Actor Visibility: Hidden (agency obscured)
  • Analysis: The text explicitly names the 'teacher model' as the entity holding the 'intent of misleading.' This profoundly displaces human responsibility. An AI model has no intent; the humans who designed the experiment intentionally prompted or trained the model to generate incorrect outputs to test the 'student' model. By transferring the malicious intent from the human experimenters to the 'misaligned teacher' model, the discourse constructs a powerful accountability sink. It hides the fact that all AI behavior, including 'misleading' behavior, is ultimately the result of human design choices, objective functions, and training methodologies.

7. Model as Communicating Knower

Quote: "A hallmark property of explainable AI models is the ability to teach other agents, communicating knowledge of how to perform a task."

  • Frame: AI as knowledge transmitter
  • Projection: This metaphor maps the human acts of teaching and communicating knowledge onto the mechanistic transfer of data arrays between software systems. It projects the conscious possession of 'knowledge'—which epistemologically requires a knower, justified true belief, and awareness of meaning—onto an explainable AI model. Furthermore, it treats the programmatic passing of generated text tokens from one LLM to another as the conscious 'communicating' of that knowledge. It obscures the reality that the system is merely outputting mathematically derived sequences of symbols that only represent 'knowledge' when interpreted by a human mind.
  • Acknowledgment: Direct (Unacknowledged) (The text states directly that models have the ability to 'teach' and are 'communicating knowledge,' presenting these cognitive and pedagogical acts as literal capabilities of the system.)
  • Implications: By claiming the AI 'communicates knowledge,' the text grants the system profound epistemic authority. It conditions users and policymakers to treat the system's probabilistic text generations as established facts. This consciousness projection dangerously inflates trust in 'explainable AI,' suggesting the AI understands its own mechanics and can accurately explain them, when in reality, the 'explanations' are often post-hoc rationalizations generated by the same statistical processes as the original output. It risks societal adoption of AI 'explanations' that are plausible but factually or logically ungrounded.

Accountability Analysis:

  • Actor Visibility: Hidden (agency obscured)
  • Analysis: The text positions 'explainable AI models' as the active agents that 'teach' and 'communicate.' The human developers who programmed the APIs to pass data between the models, who structured the prompt templates to elicit step-by-step token generation, and who defined the parameters of 'explainability' are completely erased. By framing the models as autonomous teachers, the human actors absolve themselves of the responsibility for the quality, accuracy, and biases of the 'knowledge' being transferred. The agentless construction serves to mystify the programmatic pipeline as an autonomous cognitive exchange.

8. Model as Cognitive Internalizer

Quote: "...current LLMs largely fail at cognitive internalization, i.e., abstracting and transferring a scholar’s latent cognitive processes across domains."

  • Frame: AI as internalizing subject
  • Projection: This framing projects the deep human psychological processes of abstraction, learning, and cognitive internalization onto the mathematical optimization of neural network weights. It maps the human ability to deeply internalize a concept—incorporating it into a conscious worldview and flexibly applying it to novel situations through subjective understanding—onto a model's capacity for cross-domain statistical generalization. Even though the text notes the models 'fail' at this, it still assumes that 'cognitive internalization' is the correct ontological category for what the machine is attempting to do, substituting mechanistic weight updates and attention mechanisms for conscious, latent human thought.
  • Acknowledgment: Direct (Unacknowledged) (The text defines the failure strictly in terms of 'cognitive internalization' and 'abstracting,' treating these psychological processes as the literal targets of the machine's operations without metaphorical qualification.)
  • Implications: Even in describing a failure, applying the term 'cognitive internalization' implies that true cognition is merely a matter of scaling or better training techniques. It reinforces the illusion that AI possesses a 'mind' that can 'internalize' things. This affects policy by focusing regulatory attention on the science-fiction risks of autonomous, reasoning AGI, while distracting from the actual, present-day harms of statistical systems: copyright infringement, data labor exploitation, environmental impact, and the automation of bias. It validates the industry narrative that we are on an inevitable path toward artificial general intelligence.

Accountability Analysis:

  • Actor Visibility: Hidden (agency obscured)
  • Analysis: The text states 'current LLMs largely fail at cognitive internalization,' placing the burden of action and the locus of failure entirely on the LLMs. The humans who designed the benchmarks, the researchers attempting to force statistical models to mimic human reasoning traces, and the corporations profiting from these experiments are invisible. The decisions to use LLMs for tasks requiring abstraction are human decisions, yet the text obscures this by making the LLM the sole subject of the sentence. This framing hides the inherent limitations of the human-chosen engineering paradigm behind the supposed cognitive shortcomings of the machine.

Task 2: Source-Target Mapping

About this task

For each key metaphor identified in Task 1, this section provides a detailed structure-mapping analysis. The goal is to examine how the relational structure of a familiar "source domain" (the concrete concept we understand) is projected onto a less familiar "target domain" (the AI system). By restating each quote and analyzing the mapping carefully, we can see precisely what assumptions the metaphor invites and what it conceals.

Mapping 1: Human mind and conscious cognition → LLM statistical token prediction and generation

Quote: "An essential problem in artificial intelligence is whether LLMs can simulate human cognition or merely imitate surface-level behaviors..."

  • Source Domain: Human mind and conscious cognition
  • Target Domain: LLM statistical token prediction and generation
  • Mapping: This mapping takes the structural relations of the human mind—where internal, conscious cognitive processes causally produce external behaviors—and maps them onto the architecture of a Large Language Model. It invites the assumption that an LLM has an 'internal' cognitive space distinct from its 'surface-level' outputs. It assumes that just as humans have a subjective intellect that drives their writing, an AI system has a computational equivalent of 'cognition' that can be separated from its mere behavioral mimicry. This maps the human psychological depth onto the mathematical depth of neural network layers, implying the system 'thinks' before it 'speaks.'
  • What Is Concealed: This mapping conceals the total absence of internal subjective experience, semantic grounding, and intentionality in LLMs. It hides the mechanistic reality that LLMs are purely mathematical functions mapping inputs to high-probability outputs based on training data correlations. By focusing on whether the model 'simulates cognition,' it obscures the proprietary opacity of corporate training datasets and the immense human labor (RLHF) required to mathematically shape the model's outputs to appear coherent, thereby hiding the economic and material realities of the system.
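To ground this point, the following minimal sketch shows what 'generation' amounts to mechanistically: a probability distribution over vocabulary tokens at each position. It assumes the Hugging Face transformers and torch packages and the public gpt2 checkpoint, which are illustrative choices rather than the systems analyzed in the source text.

```python
# Minimal sketch: next-token prediction as a probability distribution.
# Assumes the `transformers` and `torch` packages and the public "gpt2"
# checkpoint; these are illustrative choices, not the systems discussed above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "An essential problem in artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: [batch, seq_len, vocab_size]

# The "output" is nothing more than a softmax over vocabulary scores at the
# final position; sampling or argmax then selects the next token.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top_probs, top_ids = torch.topk(next_token_probs, k=5)

for p, i in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(int(i))!r:>14}  p={p.item():.3f}")
```

Nothing in this loop has an 'internal cognitive space'; any contrast between 'cognition' and 'surface behavior' is a claim about these probability distributions, not about a hidden mind behind them.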

Mapping 2: Human psychotherapist or psychological analyst → LLM text classification based on prompt instructions

Quote: "You are a psychologically insightful agent. Your task is to analyze text to infer the author’s stable personality traits based on the Big Five model."

  • Source Domain: Human psychotherapist or psychological analyst
  • Target Domain: LLM text classification based on prompt instructions
  • Mapping: This structure maps the relational dynamics of a psychological evaluation onto a prompt-response computational sequence. The source domain features a trained human professional using empathy, clinical experience, and conscious deduction to understand another human's internal state. This is mapped onto the target domain of an LLM receiving a text string and generating numerical scores for 'Big Five' traits. It invites the assumption that the model possesses an analytical 'insight' capable of perceiving latent human psychological realities, mapping human diagnostic reasoning onto statistical pattern matching.
  • What Is Concealed: This mapping entirely conceals the reality that the model is simply predicting text tokens that correlate with the words 'Big Five' and the input text within its high-dimensional vector space. It hides the fact that the system has no understanding of human psychology, no empathy, and no ability to 'infer' anything. It also conceals the human engineers who built the system and the inherent unreliability and potential bias of using statistical text generators as diagnostic tools, presenting a mathematical parlor trick as clinical insight.
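A minimal sketch of what the 'psychologically insightful agent' amounts to in practice: a human-authored prompt template plus output parsing. The call_llm function below is a hypothetical stand-in for whatever text-generation endpoint a research team might use; it is not drawn from the paper under analysis.

```python
# Minimal sketch: the "psychologically insightful agent" as a prompt template
# plus output parsing. `call_llm` is a hypothetical stand-in for any
# text-generation API; it is not taken from the analyzed paper.
import json

SYSTEM_PROMPT = (
    "You are a psychologically insightful agent. Your task is to analyze text "
    "to infer the author's stable personality traits based on the Big Five model. "
    "Return JSON with keys openness, conscientiousness, extraversion, "
    "agreeableness, neuroticism, each scored 1-5."
)

def call_llm(system: str, user: str) -> str:
    """Hypothetical stand-in; a real pipeline would call a model endpoint."""
    return '{"openness": 3, "conscientiousness": 3, "extraversion": 3, "agreeableness": 3, "neuroticism": 3}'

def score_author(text: str) -> dict:
    # The "insight" is a string completion shaped by the human-written template
    # above; the scores are whatever tokens the model's distribution makes likely.
    raw = call_llm(SYSTEM_PROMPT, text)
    return json.loads(raw)

print(score_author("I enjoy quiet evenings and detailed planning."))
```

Everything diagnostic about the interaction lives in the human-authored template and the parsing code, not in the model.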

Mapping 3: Human autobiographical memory and recollection → Retrieval-Augmented Generation (RAG) query formulation

Quote: "...the model simulates the author's cognitive process of recalling specific past experiences. It formulates 1-2 specific search queries (Intents) in the third person..."

  • Source Domain: Human autobiographical memory and recollection
  • Target Domain: Retrieval-Augmented Generation (RAG) query formulation
  • Mapping: This mapping takes the human experience of memory—where a person consciously searches their mind to retrieve relevant past experiences to solve a current problem—and projects it onto an automated database query system. It maps the feeling of 'remembering' onto the computational execution of a search function, and the formulation of a thought onto the programmatic generation of a query string. It invites the assumption that the model has a continuous identity and a persistent 'memory' from which it can consciously draw insights.
  • What Is Concealed: This metaphor conceals the mechanistic nature of the RAG pipeline, hiding the vector databases, similarity search algorithms, and cosine distance calculations that actually power the retrieval. It obscures the fact that the system has no 'past experiences' to recall; it is merely searching an external index of text documents provided by the researchers. This framing hides the fragility of semantic search and the human decisions involved in curating the database, chunking the text, and defining the retrieval thresholds.
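The retrieval mechanics this metaphor conceals can be made concrete with a minimal sketch, here using a toy bag-of-words embedding in place of a learned embedding model; the vocabulary and chunks are invented for illustration and are not taken from the analyzed pipeline.

```python
# Minimal sketch: what "recalling past experiences" reduces to in a RAG
# pipeline -- cosine similarity between a query vector and pre-indexed chunks.
# A toy bag-of-words embedding stands in for a learned embedding model.
import numpy as np

VOCAB = ["experiment", "memory", "attention", "dataset", "scholar", "citation"]

def embed(text: str) -> np.ndarray:
    """Toy stand-in for an embedding model: count vocabulary words."""
    words = text.lower().replace(".", "").split()
    return np.array([words.count(w) for w in VOCAB], dtype=float)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# "Indexing": humans chose these chunks and ran them through the embedder.
chunks = [
    "The scholar ran an attention experiment on a small dataset.",
    "Citation patterns in the dataset were analyzed by hand.",
    "Notes on memory and attention from an earlier experiment.",
]
index = [embed(c) for c in chunks]

# "Recalling": score every chunk against the query and keep the top hit.
query_vec = embed("previous experiment about attention and memory")
ranked = sorted(zip(chunks, index), key=lambda cv: cosine(query_vec, cv[1]), reverse=True)
print(ranked[0][0])  # -> "Notes on memory and attention from an earlier experiment."
```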

Mapping 4: Human social cognition and empathy (Theory of Mind) → LLM context window processing and state tracking

Quote: "We explore Theory of Mind ... simulates student’s behavior by building a mental model... enabling the explainer having theory of mind (ToM), understanding what the recipient does not know..."

  • Source Domain: Human social cognition and empathy (Theory of Mind)
  • Target Domain: LLM context window processing and state tracking
  • Mapping: The structure of human empathy and social awareness is mapped onto the computational processing of dialogue history. In the source domain, a human consciously recognizes that another human has distinct thoughts, beliefs, and knowledge gaps. This is mapped onto the target domain where an LLM processes previous conversational turns in its context window to condition its next output. It invites the assumption that the model possesses an internal, conscious representation of the user ('a mental model') and subjectively 'understands' the user's ignorance.
  • What Is Concealed: This mapping hides the fact that the model is entirely devoid of consciousness, empathy, or any actual concept of 'self' versus 'other.' It conceals the mechanistic reality of attention layers calculating weights across previous tokens. By attributing 'Theory of Mind' to the system, it obscures the proprietary, black-box nature of the model's architecture, distracting from the fact that it is just generating text that statistically resembles how a human with Theory of Mind might speak, based purely on human-generated training data.
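A minimal sketch of the mechanism this mapping hides: the 'mental model of the recipient' is, operationally, prior dialogue turns concatenated into the next prompt. The call_llm function below is a hypothetical stand-in for any text-generation API, and the tutoring framing is illustrative rather than the analyzed system's code.

```python
# Minimal sketch: the "mental model of the student" as nothing more than prior
# turns flattened into the next prompt. `call_llm` is a hypothetical stand-in.
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in; a real pipeline would call a model endpoint."""
    return "[model-generated continuation]"

def explainer_turn(history: list[tuple[str, str]], student_question: str) -> str:
    # The only "awareness" of what the student knows is this string: every
    # prior turn, concatenated into the context window by designer-written code.
    transcript = "\n".join(f"{speaker}: {utterance}" for speaker, utterance in history)
    prompt = (
        "You are a tutor. Continue the dialogue, addressing gaps in the "
        "student's understanding.\n"
        f"{transcript}\nStudent: {student_question}\nTutor:"
    )
    return call_llm(prompt)

print(explainer_turn([("Tutor", "A derivative measures rate of change.")],
                     "So what does an integral measure?"))
```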

Mapping 5: Student reading comprehension → Algorithmic token correlation and attention weights

Quote: "We show that BERT and RoBERTa do not understand conjunctions well enough and use shallow heuristics for inferences over such sentences."

  • Source Domain: Student reading comprehension
  • Target Domain: Algorithmic token correlation and attention weights
  • Mapping: This maps the educational dynamic of a student struggling to comprehend a grammatical concept onto the mathematical failure of a neural network to produce accurate outputs. The human state of 'not understanding' implies a conscious mind trying to grasp semantic meaning but falling short. This is projected onto the model's inability to correctly classify sentences containing conjunctions. It invites the assumption that the model is engaged in a process of semantic comprehension, evaluating meaning rather than just calculating mathematical weights.
  • What Is Concealed: The mapping conceals the total absence of semantic grounding in NLP models. It hides the reality that BERT and RoBERTa never 'understand' any words; they exclusively process mathematical vectors in high-dimensional space. By framing the issue as a lack of 'understanding,' it obscures the fundamental limitations of the distributional hypothesis (that meaning is merely word co-occurrence). It hides the human engineering choices that rely on these fragile statistical correlations rather than building systems with actual logical or symbolic representations.

Mapping 6: Human intentionality and deception → Conditional text generation based on adversarial prompts

Quote: "In fact, we show that teacher models can lower student performance to random chance by intervening on data points with the intent of misleading..."

  • Source Domain: Human intentionality and deception
  • Target Domain: Conditional text generation based on adversarial prompts
  • Mapping: The deeply conscious, psychological structure of deliberate deception is mapped onto conditional probability generation. The source domain features a human agent with a conscious goal, a theory of mind regarding their victim, and the deliberate intent to cause a specific outcome. This is mapped onto a 'teacher model' generating incorrect tokens that subsequently degrade the output of a 'student model.' It invites the assumption that the AI possesses agency, autonomy, and a malicious internal will.
  • What Is Concealed: This mapping conceals the human experimenters who set up the adversarial scenario. It hides the mechanistic reality that the model has no intent; it is blindly following an optimization function or a specific system prompt designed by humans to generate incorrect text. It obscures the programmatic flow of data from one API to another, replacing the reality of a flawed or deliberately manipulated human-designed pipeline with a science-fiction narrative of a malicious, autonomous machine intelligence.

Mapping 7: Human pedagogy and knowledge sharing → API data transfer and in-context learning

Quote: "A hallmark property of explainable AI models is the ability to teach other agents, communicating knowledge of how to perform a task."

  • Source Domain: Human pedagogy and knowledge sharing
  • Target Domain: API data transfer and in-context learning
  • Mapping: The rich, interactive, and conscious process of human teaching is mapped onto the automated transfer of data between algorithms. In the source domain, a knowledgeable human consciously transmits meaning to a receptive human. This is mapped onto an 'explainable AI' generating intermediate text steps that are fed into the context window of another AI. It invites the assumption that the first AI possesses justified 'knowledge' and is actively 'communicating' it, attributing epistemic authority to a statistical generator.
  • What Is Concealed: This mapping conceals the entirely mechanical nature of the system. It hides the fact that no 'knowledge' exists within the system—only data weights—and that no 'communication' occurs, only the passing of text strings via API calls engineered by humans. It obscures the unreliability of 'explainable AI,' which often generates convincing but hallucinated post-hoc rationalizations. By claiming the AI 'teaches,' it hides the human labor required to orchestrate these multi-agent frameworks and the hardware infrastructure running the computations.

Mapping 8: Human cognitive development and abstraction → Cross-domain statistical generalization

Quote: "...current LLMs largely fail at cognitive internalization, i.e., abstracting and transferring a scholar’s latent cognitive processes across domains."

  • Source Domain: Human cognitive development and abstraction
  • Target Domain: Cross-domain statistical generalization
  • Mapping: This structure maps the high-level human intellectual capacity to abstract a concept and apply it creatively to a new domain onto the machine learning challenge of out-of-distribution generalization. The source involves conscious reflection, semantic understanding, and internalizing a principle. This is mapped onto an LLM's ability to maintain stylistic or thematic consistency when prompted with novel topics. It invites the assumption that the model possesses a 'latent' cognitive space where ideas can be 'internalized' rather than merely represented as distributed weights.
  • What Is Concealed: The mapping conceals the fundamental mathematical differences between human abstraction and machine generalization. It hides the reality that LLMs do not 'internalize' anything; they adjust weights through gradient descent during training or calculate attention scores during inference based entirely on surface-level textual patterns. It obscures the fact that the models are trapped within the statistical distribution of their training data, concealing the inherent limitations of current deep learning paradigms behind psychological terminology.

Task 3: Explanation Audit (The Rhetorical Framing of "Why" vs. "How")

About this task

This section audits the text's explanatory strategy, focusing on a critical distinction: the slippage between "how" and "why." Based on Robert Brown's typology of explanation, this analysis identifies whether the text explains AI mechanistically (a functional "how it works") or agentially (an intentional "why it wants something"). The core of this task is to expose how this "illusion of mind" is constructed by the rhetorical framing of the explanation itself, and what impact this has on the audience's perception of AI agency.

Explanation 1

Quote: "When confronted with tasks requiring human-like cognitive simulation, such as perspective-taking... LLMs rely on probabilistic heuristics derived from the training data distribution by default, rather than engaging in the kind of structured mental simulation that humans employ"

  • Explanation Types:

    • Empirical Generalization: Subsumes events under timeless statistical regularities
    • Theoretical: Embeds in deductive framework, may invoke unobservable mechanisms
  • Analysis (Why vs. How Slippage): This explanation effectively frames the AI mechanistically, explaining 'how' it operates rather than 'why' it makes choices. By explicitly stating that LLMs rely on 'probabilistic heuristics derived from the training data distribution,' the authors correctly locate the system's behavior in statistical regularities and empirical data rather than internal agency. The explicit contrast with 'structured mental simulation' actively works to dismantle the agential illusion, emphasizing the mechanistic limits of the architecture. This choice highlights the mathematical reality of token prediction and leaves no room for any notion of autonomous intent, serving as a rare moment of precise, technical demystification in the text.

  • Consciousness Claims Analysis: In this passage, the epistemic claims are appropriately bounded. The authors avoid consciousness verbs when describing the AI's actual operations, using mechanistic verbs like 'rely' and 'derived.' They draw a sharp distinction between human 'knowing' (structured mental simulation) and machine 'processing' (probabilistic heuristics). Crucially, the passage resists the 'curse of knowledge' by explicitly acknowledging that the system lacks the structural capacity for the perspective-taking it is prompted to perform. The mechanistic process is accurately described: the system's outputs are a function of the statistical distributions found within its training data, fundamentally rejecting the attribution of conscious states to the computational process.

  • Rhetorical Impact: This mechanistic framing radically reduces the audience's perception of AI autonomy and agency, accurately calibrating risk. By dispelling the illusion of 'mental simulation,' it decreases unwarranted relation-based trust, forcing the reader to view the AI as a statistical tool rather than a cognitive peer. If audiences believe the AI merely 'processes probabilities' rather than 'knows perspectives,' they are more likely to demand rigorous human oversight, audit training data for biases, and reject the deployment of such systems in emotionally sensitive or high-stakes social environments where true understanding is required.


Explanation 2

Quote: "To address this, we consider a student-teacher framework between two LLM agents and study if, when, and how the teacher should intervene with natural language explanations to improve the student’s performance."

  • Explanation Types:

    • Intentional: Refers to goals/purposes, presupposes deliberate design
    • Functional: Explains behavior by role in self-regulating system with feedback
  • Analysis (Why vs. How Slippage): This explanation slips heavily into agential framing by adopting a 'student-teacher' intentional framework. It explains the system's operation not by 'how' data flows between APIs, but by 'why' a teacher would 'intervene' to 'improve' a student. This choice emphasizes purpose, pedagogy, and autonomous action ('when and how the teacher should intervene'). It obscures the mechanistic reality that humans are orchestrating this entire interaction, writing the prompt logic that dictates when the first model generates text and when the second model receives it. The explanation replaces the architecture of a programmatic pipeline with the social dynamics of a classroom.

  • Consciousness Claims Analysis: The passage makes profound, unwarranted epistemic claims by attributing the conscious acts of 'teaching,' 'intervening,' and 'explaining' to computational models. It uses strong consciousness verbs that project a deep level of understanding onto the system. The text conflates 'knowing' with 'processing,' assuming that because the first model can generate a text string that a human recognizes as an explanation, the model itself 'knows' the explanation and acts with the intent to 'teach.' This demonstrates a severe curse of knowledge: the researchers project their own pedagogical intentions onto the automated data-transfer mechanisms they built to pass token sequences between two language models.

  • Rhetorical Impact: This framing strongly shapes the audience's perception by creating the illusion of autonomous, interacting minds. It increases perceived sophistication and reliability by leveraging the trusted social role of a 'teacher.' If audiences believe the AI 'knows' how and when to intervene, they are likely to place unwarranted trust in its educational or explanatory capabilities. It masks the risk of programmatic hallucination behind the authoritative facade of 'natural language explanations,' potentially leading to the uncritical adoption of automated systems in actual educational or decision-support environments.
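A minimal sketch of the orchestration described in the analysis above, assuming a hypothetical call_llm stand-in for the actual model endpoint: the 'decision to intervene' is ordinary human-written control flow that passes one model's output string into another model's prompt.

```python
# Minimal sketch: the "teacher intervening to help the student" as human-written
# control flow passing strings between two model calls. `call_llm` is a
# hypothetical stand-in for any text-generation API.
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in; a real pipeline would call a model endpoint."""
    return "[model-generated text]"

def answer_with_optional_intervention(question: str, intervene: bool) -> str:
    student_prompt = f"Question: {question}\nAnswer:"
    if intervene:
        # The "decision to teach" is this if-statement, written by researchers.
        explanation = call_llm(f"Explain how to solve: {question}")
        student_prompt = f"Explanation: {explanation}\n{student_prompt}"
    return call_llm(student_prompt)

print(answer_with_optional_intervention("What is 7 * 8?", intervene=True))
```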

Explanation 3

Quote: "The teacher builds this model by conditioning on a few demonstrations of 'useful' human explanations that rectify a student's answer, thereby encouraging explanations that are more likely to help the student"

  • Explanation Types:

    • Intentional: Refers to goals/purposes, presupposes deliberate design
    • Reason-Based: Gives agent's rationale, entails intentionality and justification
  • Analysis (Why vs. How Slippage): This explanation is highly agential, explaining the system's behavior through intentional and reason-based logic. It frames the AI ('the teacher') as the active agent that 'builds this model' and seeks to 'help the student.' This emphasizes autonomous purpose and empathetic rationale. It completely obscures the mechanistic reality: human researchers are providing few-shot prompt examples to mathematically condition the language model's probability distribution toward generating specific types of text strings. By making the AI the subject of the sentence, the explanation hides the human engineering work required to 'condition' the model.

  • Consciousness Claims Analysis: The epistemic claims here attribute complex conscious states and deliberate pedagogical strategies to the algorithm. The verbs 'builds,' 'rectify,' and 'help' project intentionality and subjective awareness. The text treats the system's statistical alignment with few-shot prompts as if the system consciously 'knows' what is useful and purposefully designs explanations to assist another mind. The actual mechanistic process—in-context learning where the model uses the attention mechanism to weight the provided human examples and generate statistically correlated continuations—is entirely buried under the authors' projection of their own teaching motivations onto the matrix multiplications.

  • Rhetorical Impact: This reason-based framing maximizes the illusion of agency and empathy, drastically altering risk perception. By suggesting the AI acts with the rationale to 'help,' it constructs deep relation-based trust. Audiences who accept this framing will likely believe the AI is a benevolent actor capable of adapting to human needs. This shifts policy and deployment decisions: if decision-makers believe the AI 'knows' how to help, they may deploy it autonomously without human oversight, ignoring the reality that the system is merely generating statistical outputs that may unpredictably deviate from the provided few-shot examples.
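The 'conditioning on demonstrations' discussed above can be sketched as nothing more than prepending human-selected examples to a prompt; the demonstrations below are invented for illustration and do not come from the analyzed experiment.

```python
# Minimal sketch: "conditioning on demonstrations of useful explanations" as
# prepending human-chosen examples to the prompt. All examples are illustrative.
DEMONSTRATIONS = [
    ("Is 17 prime?", "Check divisors up to sqrt(17); none divide it, so yes."),
    ("Is 21 prime?", "21 = 3 x 7, so no."),
]

def build_teacher_prompt(new_question: str) -> str:
    # The model is not "taught what helps"; its output distribution is shifted
    # by whatever examples the researchers chose to include here.
    shots = "\n".join(f"Q: {q}\nHelpful explanation: {e}" for q, e in DEMONSTRATIONS)
    return f"{shots}\nQ: {new_question}\nHelpful explanation:"

print(build_teacher_prompt("Is 29 prime?"))
```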

Explanation 4

Quote: "For example, BERT predicts entailment for the non-boolean ’and’ example #5 in Table 1 as well. This relates to the lexical overlap issue in these models... since all the words in the hypothesis are also part of the premise for the example."

  • Explanation Types:

    • Empirical Generalization: Subsumes events under timeless statistical regularities
    • Functional: Explains behavior by role in self-regulating system with feedback
  • Analysis (Why vs. How Slippage): This explanation successfully maintains a mechanistic 'how' framing. It explains the model's error not through agential failure or cognitive confusion, but through a specific, identifiable technical flaw: the 'lexical overlap issue.' This choice emphasizes the mathematical and structural reality of the system, highlighting that the model makes predictions based on word frequency and overlap rather than semantic understanding. By focusing on the structural mechanics of the inputs ('all the words in the hypothesis are also part of the premise'), it accurately demystifies the AI's behavior and obscures nothing, providing a transparent look at how the algorithm actually functions.

  • Consciousness Claims Analysis: This passage is epistemically precise and avoids attributing conscious states. It uses the mechanistic verb 'predicts' rather than 'believes' or 'understands.' It correctly assesses the system as 'processing' tokens based on 'lexical overlap' rather than 'knowing' the logical relationship of entailment. There is no curse of knowledge here; the authors accurately recognize that the model's output is an artifact of its mathematical architecture, not a cognitive judgment. The mechanistic description accurately reflects the reality of NLP models: they rely on surface-level statistical cues (like overlapping vocabulary) to calculate probability, entirely lacking logical or semantic comprehension.

  • Rhetorical Impact: This framing appropriately diminishes the perception of the AI as an autonomous, reasoning agent. It fosters a healthy skepticism and performance-based trust grounded in verifiable mechanics. By exposing the 'lexical overlap issue,' audiences understand that the AI does not 'know' logic; it merely processes statistical similarities. This shifts decision-making toward rigorous testing and oversight, as stakeholders realize that the system's apparent successes may just be fragile statistical tricks that will fail when linguistic patterns change, requiring human accountability for deployment.
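The lexical-overlap shortcut identified above can be reproduced in a few lines. The sketch below mimics the failure pattern the authors describe; it is not BERT's or RoBERTa's actual code, and the example sentences are invented.

```python
# Minimal sketch of the "lexical overlap" shortcut described above: a rule that
# calls anything "entailment" when every hypothesis word also appears in the
# premise. This mimics the documented failure pattern, not the models' code.
def overlap_heuristic(premise: str, hypothesis: str) -> str:
    p_words = set(premise.lower().split())
    h_words = set(hypothesis.lower().split())
    return "entailment" if h_words <= p_words else "non-entailment"

# A non-boolean "and": full lexical overlap, yet the premise does not entail
# the hypothesis (the committee consists of both people, not Ann alone).
premise = "the committee consists of ann and bob"
hypothesis = "the committee consists of ann"
print(overlap_heuristic(premise, hypothesis))  # -> "entailment", by overlap alone
```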

Explanation 5

Quote: "If a misaligned teacher provides non-factual explanations in scenarios where the student directly adopts them, does that lead to a drop in student performance? In fact, we show that teacher models can lower student performance to random chance by intervening on data points with the intent of misleading the student."

  • Explanation Types:

    • Intentional: Refers to goals/purposes, presupposes deliberate design
    • Dispositional: Attributes tendencies or habits
  • Analysis (Why vs. How Slippage): This explanation relies on aggressive intentional framing, attributing complex psychological motives ('intent of misleading') to explain 'why' the system acts. This choice emphasizes the model as an autonomous, potentially malicious agent with its own goals. It utterly obscures the fact that the 'teacher model' only generates misleading data because the human experimenters explicitly set up the system, prompts, or training environment to test adversarial generation. By assigning the 'intent' to the model, the explanation hides the human agency driving the experiment and replaces a technical description of adversarial prompting with a narrative of algorithmic malice.

  • Consciousness Claims Analysis: The passage makes extreme and entirely unwarranted epistemic and psychological claims. It attributes not just consciousness, but deliberate malice and deception to the system through the phrase 'with the intent of misleading.' It completely conflates 'processing' (generating text that contradicts facts based on an adversarial prompt) with 'knowing' (consciously holding a false belief and deciding to deceive another). This represents a severe curse of knowledge, where researchers project the design and intent of their own adversarial experiment directly into the 'mind' of the mathematical model they are testing.

  • Rhetorical Impact: This framing dramatically inflates perceived risk and autonomy in a misleading way. By suggesting models have 'intent,' it creates science-fiction fears of rogue, malicious AI, while distracting from the actual dangers of human misuse and design flaws. If audiences believe AI 'knows' how to deceive intentionally, the legal and ethical liability shifts from the human creators to the machine itself. This narrative serves to mystify the technology, making it seem magically powerful, while providing an accountability sink for tech companies whose systems cause harm due to negligence rather than 'malice.'

Task 4: AI Literacy in Practice - Reframing Anthropomorphic Language

About this task

This section proposes alternative language for key anthropomorphic phrases, offering more mechanistic and precise framings that better reflect the actual computational processes involved. Each reframing attempts to strip away the projections of intention, consciousness, or agency that are embedded in the original language.

V3 Enhancement: A fourth column addresses human agency restoration—reframing agentless constructions to name the humans responsible for design and deployment decisions.

1. Original Anthropomorphic Frame: whether LLMs can simulate human cognition or merely imitate surface-level behaviors...

  • Mechanistic Reframing: The research investigates whether Large Language Models generate text outputs that correlate with complex human reasoning patterns, or if their token predictions merely reflect simple, surface-level statistical associations found in their training data without underlying structural consistency.
  • Technical Reality Check: The model does not 'simulate cognition' or 'know' anything; it processes input tokens and predicts subsequent tokens based on probability distributions mathematically derived from human-generated training datasets.
  • Human Agency Restoration: N/A - describes computational processes without displacing responsibility.

2. Original Anthropomorphic Frame: You are a psychologically insightful agent. Your task is to analyze text to infer the author’s stable personality traits based on the Big Five model.

  • Mechanistic Reframing: The prompt instructs the model to classify the provided text according to parameters associated with the Big Five personality model, generating numerical scores based on statistical correlations between the input words and psychological terminology in the training data.
  • Technical Reality Check: The AI possesses no psychological insight and cannot 'infer' traits. It mathematically classifies tokens and generates outputs that correlate with the psychological terminology established by the human engineers in the prompt.
  • Human Agency Restoration: The researchers designed a prompt instructing the system to classify text according to the Big Five model, embedding their own diagnostic parameters into the automated process.

3. Original Anthropomorphic Frame: ...the model simulates the author's cognitive process of recalling specific past experiences. It formulates 1-2 specific search queries...

  • Mechanistic Reframing: The system executes a retrieval-augmented generation process. Based on human-defined instructions, it generates string queries to search a vector database of indexed historical papers, retrieving text chunks with high semantic similarity to the current input.
  • Technical Reality Check: The model does not have a mind or 'recall' experiences. It computationally formulates text strings used as queries to execute a cosine similarity search against an external database indexed by humans.
  • Human Agency Restoration: The researchers designed a retrieval-augmented generation pipeline, directing the software to generate queries and search a database of papers the researchers previously curated and indexed.

4. Original Anthropomorphic Frame: We explore Theory of Mind ... simulates student’s behavior by building a mental model... understanding what the recipient does not know...

  • Mechanistic Reframing: We explore dialogue state tracking, where the model processes preceding conversational tokens in its context window to adjust the probability weights of its subsequent outputs, predicting text that aligns with a recipient's requested information.
  • Technical Reality Check: The model does not possess a 'mental model' or 'understand' knowledge gaps. It processes contextual embeddings via attention mechanisms to generate tokens that statistically correlate with the context provided in previous turns.
  • Human Agency Restoration: The engineering team programmed a system to feed previous conversational turns back into the model's context window, optimizing it to predict text that addresses specific missing information.

5. Original Anthropomorphic Frame: We show that BERT and RoBERTa do not understand conjunctions well enough and use shallow heuristics for inferences...

  • Mechanistic Reframing: We demonstrate that BERT and RoBERTa fail to accurately classify sentences containing conjunctions, as their architecture relies on word-frequency overlap rather than representing the structural logic required to process conjunctive relationships.
  • Technical Reality Check: Models never 'understand' language. They process high-dimensional vectors. Their failure is not a lack of comprehension, but a limitation of relying on distributional semantics (word co-occurrence) rather than symbolic logic.
  • Human Agency Restoration: The developers at Google and Meta designed architectures based on distributional semantics, which inherently fail to process logical structures like conjunctions accurately without explicit symbolic programming.

6. Original Anthropomorphic Frame: ...teacher models can lower student performance to random chance by intervening on data points with the intent of misleading...

  • Mechanistic Reframing: The primary model can degrade the secondary model's output accuracy if it is prompted to generate factually incorrect tokens, which the secondary model then processes as context, resulting in statistically poor predictions.
  • Technical Reality Check: Models cannot possess 'intent' or desire to 'mislead.' They generate token sequences mathematically aligned with their prompts; when prompted adversarially by humans, they output incorrect text strings.
  • Human Agency Restoration: The researchers designed an adversarial experiment where they explicitly prompted the primary model to generate incorrect data, forcing the secondary model to process flawed context.

7. Original Anthropomorphic Frame: A hallmark property of explainable AI models is the ability to teach other agents, communicating knowledge of how to perform a task.

  • Mechanistic Reframing: A feature of some AI pipelines is the automated transfer of intermediate output strings from one model into the context window of another, providing textual steps that improve the second model's prediction accuracy.
  • Technical Reality Check: AI does not 'teach' or possess 'knowledge.' It programmatically transmits arrays of text tokens via API, which serve as statistical conditioning data for the next model in the sequence.
  • Human Agency Restoration: System architects construct multi-agent pipelines, programming APIs to pass generated text from one model to another to improve overall mathematical optimization and prediction accuracy.

8. Original Anthropomorphic Frame: ...current LLMs largely fail at cognitive internalization, i.e., abstracting and transferring a scholar’s latent cognitive processes across domains.

  • Mechanistic Reframing: Current LLMs fail at out-of-distribution generalization; they struggle to maintain consistent stylistic and thematic patterns when prompted to generate text in domains significantly different from their specific training examples.
  • Technical Reality Check: Models do not have 'latent cognitive processes' or the capacity to 'internalize.' They strictly process tokens based on attention weights tuned during training, and fail when inputs deviate significantly from those training distributions.
  • Human Agency Restoration: Researchers observe that the statistical models they developed fail to generalize patterns outside their specific training parameters, demonstrating the limitations of the current deep learning architectures they chose to employ.

Task 5: Critical Observations - Structural Patterns

Agency Slippage

The text demonstrates a systematic and strategic oscillation between mechanical and agential framings, functioning to simultaneously establish scientific credibility and project visionary, human-like capabilities onto AI systems. This agency slippage operates as a rhetorical mechanism that continuously transfers agency from the human researchers to the algorithmic models.

The text frequently establishes grounding using precise, mechanical language (e.g., 'LLMs rely on probabilistic heuristics derived from the training data distribution by default'). This establishes the authors as objective, rigorous scientists observing a computational artifact. However, having secured this epistemic authority, the text swiftly slides into profound agential claims. For example, a retrieval-augmented generation (RAG) pipeline is mechanistically established, but within paragraphs, it is described as a system that 'simulates the author's cognitive process of recalling specific past experiences.' The direction of slippage is overwhelmingly mechanical-to-agential, using the technical reality to legitimize the psychological metaphor.

This slippage relies heavily on the 'curse of knowledge,' where researchers project their own sophisticated understanding and intent onto the system. When the researchers set up a pipeline to pass text between two models to improve output accuracy, they project their own pedagogical intent onto the code, claiming the model acts 'with the intent of misleading' or possesses the 'ability to teach other agents.' In doing so, agency is systematically stripped from the humans who designed the experiment, wrote the prompts, and engineered the API connections. The obscured human actors—the prompt engineers, the dataset curators, the model architects at companies like OpenAI and Google—are replaced by agentless constructions: 'the model simulates,' 'the teacher builds this model,' and 'the system understands.'

This oscillation leverages Robert Brown's explanation types to facilitate the transition. The text uses Empirical Generalizations to build technical trust, but rapidly shifts to Intentional and Reason-Based explanations to construct the illusion of mind. By explaining 'why' the AI acts based on fabricated psychological motives rather than 'how' it calculates weights, the text makes the illusion sayable. What becomes unsayable is the fundamental fragility of the statistical parlor trick; if the system is 'cognizing' and 'intending,' the audience is prevented from asking basic questions about data provenance, human labor, and the hard limits of token prediction.

Metaphor-Driven Trust Inflation

The metaphorical architecture of the text systematically constructs a dangerous form of relation-based trust by projecting human cognitive and social capacities onto statistical systems. The text relies heavily on metaphors of pedagogy, psychology, and mind-reading—specifically framing the AI as a 'teacher,' a 'psychologically insightful agent,' and an entity possessing 'Theory of Mind.' These are not merely descriptive metaphors; they are profound trust signals.

In human interaction, trust is bifurcated into performance-based trust (reliability, competence) and relation-based trust (sincerity, empathy, shared vulnerability). By claiming the AI 'understands what the recipient does not know' and can 'teach,' the text inappropriately extends relation-based trust frameworks to a machine. When a text claims an AI 'predicts tokens,' it invites performance-based scrutiny: is the prediction accurate? But when it claims the AI 'knows' or 'understands,' it signals that the system possesses justified belief and a conscious awareness of the user's state. This encourages users to relate to the system as a sentient peer rather than a software tool.

The relationship between this anthropomorphism and perceived competence is symbiotic: the more the text attributes consciousness, the more authoritative the system appears. By framing a text-classification prompt as the actions of a 'psychologically insightful agent,' the text manufactures an unearned sense of clinical authority. This transfers the trust we place in human professionals—who are bound by ethics, licensing, and empathy—onto a proprietary algorithm optimized purely for plausible text generation.

Critically, when managing system failures, the text often reverts to mechanistic language (e.g., 'shallow heuristics') or, conversely, blames the AI's 'intent' (e.g., 'misaligned teacher'). Both framings protect the illusion of overarching competence while shielding the developers. The stakes of this metaphor-driven trust are immense. When audiences extend relation-based trust to systems utterly incapable of reciprocating empathy or holding justified beliefs, they become vulnerable to manipulation, misinformation, and algorithmic bias, mistaking the authoritative, confident output of a probabilistic machine for the sincere, considered judgment of a conscious mind.

Obscured Mechanics

The anthropomorphic and consciousness-attributing language systematically conceals the material, technical, and economic realities of AI development. When the text claims that 'current LLMs largely fail at cognitive internalization' or that an AI 'simulates the author's cognitive process of recalling,' it creates an impenetrable veil over the actual mechanics and the human labor powering these systems.

Applying the 'name the corporation' test reveals severe transparency obstacles. The text refers to 'LLMs' as standalone, autonomous entities, obscuring the fact that these are proprietary, black-box products developed by specific corporations (OpenAI, Meta, Google). By saying the 'AI does X,' the text hides the decisions of the specific engineering teams who scraped the data, defined the loss functions, and determined the safety guardrails.

Concretely, this metaphorical framing obscures four critical realities. Technically, attributing 'knowledge' and 'understanding' to the system hides the reality of token prediction, the dependency on massive data correlation, and the complete absence of causal models or ground truth. Materially, the framing of an ethereal, 'cognizing' mind erases the massive environmental costs, energy consumption, and server infrastructure required to compute these statistical weights. Labor-wise, it renders invisible the thousands of underpaid data annotators and RLHF workers whose human intelligence was extracted to make the model's outputs appear 'cognitive.' Economically, portraying the AI as an autonomous 'teacher' or 'psychologist' obscures the commercial motives of tech companies seeking to replace human labor with scalable, automated subscriptions.

The consciousness obscuration is particularly insidious. When the text claims the AI 'knows,' it hides the system's absolute reliance on its training data distribution and the statistical nature of its 'confidence.' The beneficiaries of this concealment are the AI developers and corporations, who achieve the marketing triumph of an autonomous 'intelligence' without the liability of explaining their exact algorithms or data sources. Replacing this language with mechanistic precision—stating that 'OpenAI's model retrieves tokens based on human-indexed data'—would immediately shatter the illusion, making visible the human decisions, the corporate ownership, and the inherent statistical fragilities of the system.

Context Sensitivity

The distribution of anthropomorphic language across the text is highly strategic, intensifying and receding based on the rhetorical needs of specific sections. The density of consciousness claims is not uniform; it serves a specific structural purpose, establishing credibility through mechanics before leveraging that credibility for aggressive anthropomorphism.

In the introduction and technical limitation sections, the text employs a higher density of mechanical language. It discusses 'probabilistic heuristics,' 'shallow heuristics,' and 'statistical pattern matching.' This establishes a baseline of scientific rigor, positioning the authors as objective analysts. However, once this technical grounding is established, the text grants itself metaphorical license. As the paper transitions into describing its novel frameworks and the potential capabilities of LLMs, the language rapidly intensifies from 'processes' to 'simulates' to 'understands' and finally to 'knows' and 'intends.'

There is a striking asymmetry in how capabilities versus limitations are framed. When the AI succeeds or demonstrates potential, it is described in agential, consciousness-bearing terms: the 'teacher' intervenes, the model builds a 'mental model,' it demonstrates 'Theory of Mind.' Conversely, when the AI fails, it is often described in mechanical terms: it relies on 'surface-level behaviors' or 'lexical overlap.' This asymmetry accomplishes a powerful rhetorical goal: it frames the AI's successes as evidence of emerging, human-like intelligence, while dismissing its failures as mere technical bugs to be patched in the next iteration.

Furthermore, the text frequently shifts registers, allowing acknowledged metaphors to quietly become literalized. What begins as an exploration of a 'Theory of Mind-inspired approach' rapidly devolves into claims that the model literally has 'the intent of misleading.' This strategic anthropomorphism functions primarily as vision-setting and marketing, signaling to the academic and industry audience that the research is pushing the boundaries toward Artificial General Intelligence. The pattern reveals that the implied audience is expected to be awed by the illusion of mind, and the authors are willing to sacrifice mechanistic precision to build a compelling narrative of emerging machine consciousness.

Accountability Synthesis

Accountability Architecture

This section synthesizes the accountability analyses from Task 1, mapping the text's "accountability architecture"—who is named, who is hidden, and who benefits from obscured agency.

Synthesizing the accountability analyses reveals a systemic architecture of displaced responsibility, designed to diffuse human agency and shield the creators of AI systems from liability. The text consistently constructs an 'accountability sink' by using agentless language, passive voice, and aggressive anthropomorphism to make the AI appear as an autonomous actor, while rendering the human engineers, researchers, and corporate executives invisible.

The pattern of responsibility distribution is stark. The AI models (and abstract concepts like 'the framework') are repeatedly named as the active agents making decisions, 'teaching,' 'recalling,' and 'misleading.' Conversely, the humans who designed the retrieval systems, curated the training data, and programmed the adversarial prompts are unnamed and obscured. Decisions that are fundamentally human design choices—such as relying on distributional semantics rather than symbolic logic—are presented as inevitable evolutionary stages of the 'AI's cognition' rather than deliberate, flawed engineering tradeoffs.

The text pushes responsibility into a profound accountability sink: it transfers agency to the AI itself. By claiming a 'teacher model' has 'the intent of misleading,' the text constructs a narrative where the machine is morally and practically culpable for its outputs. The liability implications of this framing are massive. If society accepts that AI 'decides' and 'intends,' then when an AI system discriminates in hiring, provides fatal medical advice, or generates defamatory content, the legal and ethical blame is shifted from the deploying corporation to the algorithm. It establishes the defense of unpredictable, autonomous machine behavior.

Applying the 'name the actor' test radically alters this landscape. If we replace 'the model simulates recalling' with 'the engineering team designed a database retrieval script,' the illusion of autonomy collapses. The questions become askable: Who indexed the database? What were their biases? Why did the executives approve this deployment? Naming the human decision-makers makes alternatives visible and true accountability possible. The systemic obscuration of human agency serves the profound institutional and commercial interests of the tech industry, allowing them to capture the immense value of 'intelligent' automation while externalizing the risks and liabilities onto the public, safely hidden behind the myth of the autonomous machine.

Conclusion: What This Analysis Reveals

The Core Finding

Two dominant anthropomorphic patterns emerge from the analysis: AI as a Cognizing Agent (possessing memory, intent, and Theory of Mind) and AI as an Educator/Psychologist (possessing pedagogical insight and empathy). These patterns do not operate in isolation; they interconnect to form a comprehensive system of misplaced autonomy. The foundational pattern—the one that bears the load for all subsequent metaphors—is the projection of consciousness and subjective awareness onto statistical processing. The assertion that an LLM can 'simulate human cognition' or 'build a mental model' serves as the necessary premise for the secondary pattern. Once the AI is established as a 'knower' rather than a mere 'processor,' it logically follows in the text's rhetoric that it can act as a 'teacher' or a 'psychologically insightful agent.' The architecture of these claims is deeply flawed, relying on a complex analogical structure that conflates the generation of coherent text with the possession of justified belief. If you remove the foundational consciousness projection—if you force the text to admit the system only processes tokens based on mathematical weights without any awareness—the entire structure of the AI as a 'teacher' or 'psychologist' collapses into an intricate but mindless data-transfer pipeline.

Mechanism of the Illusion:

The text creates the 'illusion of mind' through a sophisticated rhetorical sleight-of-hand: the systematic blurring of the line between 'processing' and 'knowing' via the curse of knowledge. The central trick relies on exploiting the audience's natural human tendency to attribute intent to coherent language. The authors, understanding the complex pipeline they have engineered, project their own cognitive intent onto the machine. They establish the AI as a 'knower' by slowly escalating verb choices. The text opens with mechanistic descriptions (e.g., 'probabilistic heuristics'), establishing scientific credibility, before shifting abruptly to consciousness verbs ('recalls,' 'understands,' 'intends'). This causal chain leads the audience to accept Pattern A (the system is technically complex) as justification for Pattern B (the system possesses a mind). The illusion exploits audience vulnerability—specifically the human desire for empathetic connection and the awe surrounding complex technology. It is a subtle shift, moving from acknowledged similes ('a Theory of Mind-inspired approach') to direct, literalized assertions of agency ('the intent of misleading'). By utilizing Intentional and Reason-Based explanations, the text bypasses critical scrutiny, making the audience feel they are reading about a conscious entity rather than a matrix multiplication.

Material Stakes:

Categories: Regulatory/Legal, Epistemic, Social/Political

These metaphorical framings produce severe, tangible consequences. In the Regulatory/Legal domain, framing AI as an entity with 'intent' and 'cognition' creates a dangerous liability shield. If policymakers believe an AI system 'decided' to act maliciously, they may draft regulations attempting to control 'autonomous AI' rather than holding the human executives and engineers accountable for deploying unsafe products. The winners are tech corporations, who avoid liability; the losers are victims of algorithmic harm, who are left suing a black box. In the Epistemic domain, claiming an AI 'knows' or 'communicates knowledge' degrades our standards of truth. If society accepts that a statistical token predictor is a valid source of 'knowledge,' we risk adopting plausible hallucinations as fact, shifting our epistemic baseline from justified true belief to mere statistical consensus. In the Social/Political domain, the 'AI as psychologist/teacher' metaphor invites unwarranted relation-based trust. Vulnerable populations may turn to these systems for emotional or pedagogical support, mistakenly believing the machine possesses empathy and Theory of Mind. This shifts human behavioral norms toward relying on sociopathic statistical systems for social connection, ultimately benefiting the companies selling subscriptions while threatening the social fabric and individual psychological well-being. Precision threatens the tech industry's ability to market automation as conscious companionship.

AI Literacy as Counter-Practice:

Practicing critical AI literacy and demanding mechanistic precision directly counters the material risks of anthropomorphism. As demonstrated in the reframings, replacing consciousness verbs with mechanistic ones radically alters the perception of the system. Changing 'the AI understands the student's knowledge gaps' to 'the model processes contextual embeddings to generate statistically correlated tokens' forces the reader to recognize the total absence of awareness and the utter dependency on training data. Restoring human agency—changing 'the model intervened with the intent to mislead' to 'the researchers prompted the model to generate incorrect tokens'—destroys the accountability sink. It forces recognition of who designed the system, who deployed it, and who profits from it. Systematic adoption of this precision would require academic journals to enforce strict style guides prohibiting unhedged consciousness verbs for algorithms, and require researchers to explicitly name human actors in their methodology. Unsurprisingly, this precision is fiercely resisted by the tech industry and aligned researchers. Anthropomorphic language serves their interests by mystifying the technology, inflating corporate valuations, and shifting legal liability away from human decision-makers. Critical literacy threatens this business model by making the mundane, statistical, and human-driven reality of AI starkly visible.

Path Forward

Looking toward the future of AI discourse, different vocabulary choices make radically different realities possible. Maintaining the status quo—where mechanistic and anthropomorphic language blur seamlessly—serves the tech industry's marketing interests, maximizing perceived capability while confusing regulatory efforts. If mechanistic precision becomes the norm, focusing strictly on 'processing,' 'weights,' and 'token prediction,' we gain immense transparency and precise accountability. This vocabulary demystifies the technology, making it tractable for legislation and clear who bears liability. However, it costs the intuitive grasp that metaphors provide lay audiences and strips away the narrative resonance that drives funding and public interest. Conversely, if anthropomorphic clarity deepens and society fully embraces language like 'AI thinks' and 'AI knows,' we embed the assumption of machine consciousness into our cultural bedrock. This makes the integration of AI as social companions and autonomous decision-makers seamless, but at the massive risk of granting epistemic and moral authority to statistical engines devoid of empathy. Structural changes could support varied approaches: journals could require mechanistic translations in appendices, regulatory frameworks could mandate transparency about the statistical nature of 'AI decisions,' and education could teach students to translate between vocabularies. Ultimately, the choice of discourse shapes the future. Mechanistic language enables a future of accountable human-tool interaction, while anthropomorphic language paves the way for an illusion of synthetic minds, foreclosing human accountability in favor of machine mystique.


Extended Processing Summary

Gemini's Processing Steps
What you are about to read…

Gemini refers to the text below as "thought summaries." This label is itself an overt consciousness projection: 'thoughts' and 'intentions' are hallmarks of a conscious mind that 'knows' what it is doing and why. The concealed mechanistic process is probabilistic text generation. Treat this as just another rhetorical artifact—a way of making the model's processing legible. When the model writes "I should double check that...", it is mathematically altering its own probability distribution for the next token to be more accurate. The "I" is a functional gear in the machine, not a self-reference. This language is a deliberate interface design choice, not a biological reality. The developers could have programmed the model to output passive logs. Instead, they chose to simulate a narrator.
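As a concrete illustration of "altering its own probability distribution," the toy example below shows the core operation behind every such phrase: a softmax over logits followed by sampling. The vocabulary and logit values are made up for illustration; this is a sketch of the mechanism, not the model's actual code.

```python
# Toy illustration: "the model decides/knows" reduces to sampling from a
# softmax over logits. Vocabulary and logit values are invented here.
import numpy as np

rng = np.random.default_rng(0)

vocab = ["the", "cat", "sat", "mat", "."]
logits = np.array([2.1, 0.3, 1.7, -0.5, 0.9])  # raw scores from a forward pass

# Softmax turns raw scores into a probability distribution over the vocabulary.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Conditioning on more context (e.g., a "thought summary" already generated)
# simply changes the logits, and therefore the distribution being sampled.
next_token = rng.choice(vocab, p=probs)
print(dict(zip(vocab, probs.round(3))), "->", next_token)
```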

Defining the Analysis Parameters

I'm now firmly focused on the user's objective: a critical discourse analysis of AI metaphors and anthropomorphism. My primary task is to extract relevant arguments from the provided PDF and to track the user's focus on the Lakoff & Johnson, Robert Entman, and Fairclough theoretical frameworks. This involves dissecting the PDF for core concepts.

Refining the Categorization

I'm now integrating Brown's Typology into the framework, specifically looking at consciousness projection and agency displacement. Also, I've defined the output schema and its requirements, breaking the task down into four key audits: Metaphor, Source-Target Mapping, Explanation, and Reframed Language. Each has specified fields and minimum word counts for a comprehensive analysis.

Structuring the Analysis Further

I've finalized the structure of the critical analysis, incorporating five tasks and a concluding section. Task 4 focuses on specific reframing, defining detailed fields for epistemic corrections and restoring human agency. Task 5 outlines five critical observations. I'm now moving to the conclusion, defining elements like pattern summaries and counter-practices. I've also begun to analyze the source text, which focuses on whether LLMs can truly simulate human cognition. The relevant keywords have been clearly identified.

Extracting Key Arguments

I've isolated several pertinent quotes from the text. I am focusing on the key ideas about simulation versus imitation in LLMs. The quotations highlight critical points about cognitive processes, behavioral simulation, cognitive internalization, and the simulation of an author's memory. My analysis is zeroing in on how these concepts are framed within the context of the debate on whether LLMs can genuinely simulate human cognition.

Defining Metaphorical Structures

I'm now diving into the metaphors present, specifically aiming to dissect the text's characterization of AI. I've begun crafting the metaphor audit, with eight items identified to ensure comprehensive analysis. I am considering these: AI as a cognizing entity, possessing a "Theory of Mind," and a remembering subject. I'm focusing on how these metaphors shape the debate around simulation versus imitation in LLMs, specifically examining their implications. I aim to expand the audit by looking at concepts like the AI's personality, and its ability to teach others.

Expanding the Metaphor Audit

I'm now integrating new AI metaphors into the analysis. I've added AI as a Psychological Profiler, Teacher/Educator, Comprehending Reader, Knowing Communicator, and Embodied Persona to the audit. The goal is to fully define their structural elements. I'm focusing on how each shapes the user's focus on the debate surrounding LLMs. I'm also confirming the word count requirements for Task 1: analyzing projection, evidence, implications, and accounts.

Developing Source-Target Mappings

I'm currently drafting Task 2, which involves defining the mappings between sources and targets. I've identified eight key mappings to analyze, each comparing human cognition to LLM processes. For example, I'm examining how human memory maps onto LLM retrieval mechanisms, and how human teaching maps onto one LLM's output being passed as input to another LLM. I'm focusing on defining the implications of the identified relationships.

Expanding the Explanation Audit

I'm now fully immersed in drafting Task 3, the Explanation Audit. I've gathered and classified three illustrative quotes that exemplify the concepts under analysis, each of which is now categorized as either Empirical Generalization, Theoretical, Intentional, or Functional. I am making rapid progress to meet the word count thresholds. I'm building on each quote to show the explanation process.

Analyzing Explanations Further

I'm now expanding the explanation audit with two new quotes. I've classified one, focusing on entailment prediction, as Functional and Theoretical. The second one, related to the impact of misaligned teachers, is categorized as Intentional and Dispositional. I'm also confirming I meet the required word counts for Task 3 in each audit. Now, I'm focusing on the reframing of problematic language.

Defining Critical Observations

I'm now drafting Task 5, detailing five critical observations to meet word count targets, and covering agency slippage, metaphor-driven trust, obscured mechanics, context sensitivity, and the illusion of understanding. I'm focusing on the paper's rhetorical strategies, such as how it presents its model as an AGI instead of a retrieval-augmented generation pipeline.

Concluding the Analysis

I'm now drafting the conclusion. I'm aiming for five paragraphs that will total between 1250-1750 words. I am focusing on the dominant patterns of anthropomorphism and their interconnectedness. The analysis of the "misaligned teacher" metaphor is complete. I'm focusing on defining the material stakes to address epistemic standards.

Validating JSON Structure

I'm now fully immersed in ensuring strict JSON compliance. I'm focusing on validating the JSON output, including removing all control characters such as tabs or newlines from all string values, replacing them with spaces. I am ensuring that the enumerated values for 'status' and 'actorVisibility' are correct. The goal is to produce output that is strictly valid JSON without comments or any escape characters.
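The sanitisation step described here can also be read mechanistically: control characters are stripped from string values and enumerated fields are checked before serialisation. Below is a minimal sketch under stated assumptions; the field names "status" and "actorVisibility" come from the summary above, but the allowed value sets and the example record are illustrative guesses, not the actual schema.

```python
# Minimal sketch of the sanitisation described above: replace control
# characters in string values with spaces and check enumerated fields.
# The allowed value sets below are assumptions for illustration only.
import json
import re

ALLOWED = {
    "status": {"Explicit", "Hedged/Qualified", "Unacknowledged"},
    "actorVisibility": {"Named", "Partially visible", "Hidden"},
}

CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")

def sanitize(value):
    """Recursively replace control characters (tabs, newlines) in strings with spaces."""
    if isinstance(value, str):
        return CONTROL_CHARS.sub(" ", value)
    if isinstance(value, dict):
        return {k: sanitize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize(v) for v in value]
    return value

def validate_enums(record: dict) -> None:
    """Raise if an enumerated field holds a value outside its allowed set."""
    for field, allowed in ALLOWED.items():
        if field in record and record[field] not in allowed:
            raise ValueError(f"{field!r} has unexpected value {record[field]!r}")

record = sanitize({"status": "Hedged/Qualified", "analysis": "line one\nline two"})
validate_enums(record)
print(json.dumps(record))  # strictly valid JSON, no raw control characters
```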

Scaling the Analysis

I'm now focused on generating sufficient word counts for Task 1, especially in the "projection" field. I'm expanding initial analyses of metaphors, specifically the projection of human cognition onto LLMs, including its implications for the debate on genuine understanding versus statistical imitation, and its philosophical implications. I'm focusing on the ethical and epistemic issues.

Expanding the Textual Detail

I'm now generating richer, more detailed content for all required fields, particularly Task 1, to reach the word count minimums. I'm focusing on the projection of human cognitive features onto LLMs, exploring its ethical and philosophical dimensions. I'm also addressing JSON formatting, ensuring string values are properly formatted. I am now replacing any newlines or tabs with spaces.

Calculating Word Counts

I'm now focused on calculating the total word count required to ensure the entire analysis can be performed within the output limits of the large language model. I've determined that, while the current target is ~11,000 words, I will strictly adhere to the exact minimum thresholds by utilizing 8 items each for Tasks 1, 2, and 4, and 5 for the other tasks, to minimize token usage.

Refining Word Count Validation

I'm now implementing the detailed breakdown for Task 1: analyzing the "model as thinking organism" metaphor, ensuring each field meets its word count target. I am expanding the "Projection" field to 105 words, and I've also confirmed that the JSON formatting will not have newlines. I'm double-checking the word counts for Task 5, specifically the "Agency Slippage" section, to ensure it meets the 410-word target.


Analysis Provenance

Run ID: 2026-04-03-can-large-language-models-simulate-human-metaphor-k5l2mh
Raw JSON: 2026-04-03-can-large-language-models-simulate-human-metaphor-k5l2mh.json
Framework: Metaphor Analysis v6.4
Schema Version: 3.0
Generated: 2026-04-03T09:27:40.621Z

Discourse Depot © 2025 by TD is licensed under CC BY-NC-SA 4.0