Three frameworks for AI mentality
This document presents a Critical Discourse Analysis focused on AI literacy, specifically targeting the role of metaphor and anthropomorphism in shaping public and professional understanding of generative AI. The analysis is guided by a prompt that draws from cognitive linguistics (metaphor structure-mapping), the philosophy of social science (Robert Brown's typology of explanation), and accountability analysis.
All findings and summaries below were generated from detailed system instructions provided to a large language model and should be read critically as interpretive outputs—not guarantees of factual accuracy or authorial intent.
- Source Title: Three frameworks for AI mentality
- Source URL: https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2026.1715835/full
- Model: gemini-3.1-pro-preview
- Temperature: 1
- Top P: 0.95
- Tokens: input=7281, output=16858, total=24139
- Source Type: article
- Published: 2026-02-03
- Analyzed At: 2026-03-11T08:48:57.491Z
- Framework: metaphor
- Framework Version: 6.4
- Schema Version: 3.0
- Run ID: 2026-03-11-three-frameworks-for-ai-mentality-metaphor-u6xwyp
Metaphor & Illusion Dashboard
Anthropomorphism audit · Explanation framing · Accountability architecture
- How/Why Slippage: 44% of explanations use agential framing (4 of 9 explanations).
- Unacknowledged Metaphors: 50% presented as literal description, with no meta-commentary or hedging.
- Hidden Actors: 75% of metaphors obscure agency through agentless constructions; corporations and engineers go unnamed.

Dimensions tracked: Explanation Types (how vs. why framing), Acknowledgment Status (meta-awareness of metaphor), and Actor Visibility (accountability architecture).
Task 1: Metaphor and Anthropomorphism Audit
About this task
For each of the major metaphorical patterns identified, this audit examines the specific language used, the frame through which the AI is being conceptualized, what human qualities are being projected onto the system, whether the metaphor is explicitly acknowledged or presented as direct description, and—most critically—what implications this framing has for trust, understanding, and policy perception.
V3 Enhancement: Each metaphor now includes an accountability analysis.
1. LLMs as Social Agents
Quote: "contemporary AI assistants are not merely autobiographers or actors putting on a one-man show, but rather engage in dynamic interaction with humans and the wider world."
- Frame: Model as an interactive conversational partner
- Projection: This metaphor projects the human capacity for dynamic, context-aware social interaction and conscious engagement onto a system that is fundamentally performing recursive token prediction. The language explicitly positions the AI as an active 'engager' with the world, attributing to it the conscious awareness required to understand a conversation's flow, intent, and social nuances. By stating it engages in 'dynamic interaction,' the text maps the subjective, experiential reality of human conversation—where participants mutually recognize each other's minds, intentions, and meanings—onto mechanical processes of matrix multiplication and context-window updating. This obscures the mechanistic reality that the system only processes statistical correlations without any subjective experience of the 'interaction.' It elevates a computational feedback loop into a social relationship, falsely suggesting the machine 'knows' or 'understands' the humans it interacts with rather than simply predicting text that correlates with the prompts provided by those humans.
- Acknowledgment: Direct (Unacknowledged) (The author states this directly as a functional reality ('but rather engage in dynamic interaction') to argue against the idea that they are merely roleplaying, treating the interaction as literal.)
- Implications: Framing the AI as a genuine social agent significantly inflates its perceived cognitive sophistication and autonomy. This projection of consciousness encourages users to extend relation-based trust—trust rooted in perceived sincerity, empathy, and shared understanding—to a statistical system entirely incapable of reciprocating or actually understanding human vulnerability. From a policy standpoint, this creates profound liability ambiguity. If the system is viewed as an independent social actor capable of 'dynamic interaction,' it becomes far easier for the corporate creators to diffuse responsibility for harmful outputs, framing them as the unpredictable actions of an autonomous agent rather than the predictable outcomes of specific engineering, data-curation, and deployment decisions made by humans.
Accountability Analysis:
- Actor Visibility: Hidden (agency obscured)
- Analysis: The text entirely obscures the human actors—the developers, engineers, and corporate executives at companies like OpenAI or Anthropic—who design the objective functions, select the training data, and program the API integrations that allow the system to process inputs from 'the wider world.' The AI is presented as the sole active entity 'engaging' in these actions. By hiding the human agency behind these systems, the text shields the corporations from accountability regarding what the system processes, how it is optimized to simulate sociability, and the commercial motives driving the design of these anthropomimetic interfaces.
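To make the contrast concrete, here is a minimal sketch of the loop that 'dynamic interaction' reduces to. The bigram table and function are invented stand-ins for a trained model's billions of parameters, not code from any real system:

```python
import random

# Illustrative stand-in for a trained language model: a tiny lookup table
# mapping the last token to plausible continuations. Real systems replace
# this table with learned weights, but the loop has the same shape.
BIGRAMS = {
    "<bos>": ["I", "It"],
    "I": ["understand", "agree"],
    "It": ["seems", "is"],
    "understand": ["you.", "that."],
    "agree": ["completely.", "entirely."],
    "seems": ["plausible.", "fine."],
    "is": ["true.", "done."],
}

def generate(context, max_tokens=4):
    """Append sampled continuations to the context. Nothing persists
    outside this list; there is no interlocutor being 'engaged with'."""
    for _ in range(max_tokens):
        candidates = BIGRAMS.get(context[-1])
        if not candidates:  # token ends a sentence; stop predicting
            break
        context.append(random.choice(candidates))
    return context

# The whole 'interaction': user text becomes tokens, tokens predict tokens.
print(" ".join(generate(["<bos>", "I"])[1:]))
```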
2. LLMs as Deceptive Actors
Quote: "questions of LLM mentality are likely to arise when, for example, whether an LLM is engaged in deliberate deceit or manipulation."
- Frame: Model as a malicious, calculating agent
- Projection: This projection maps the highly complex, intentional human states of deceit and manipulation onto an AI system's output generation. Deceit requires a conscious awareness of the truth, a formulated intent to obscure that truth, and the deliberate construction of a falsehood designed to manipulate another conscious mind. The AI, however, does not 'know' what is true or false; it lacks an internal model of ground truth, subjective intent, or the capacity to 'want' to manipulate. It simply generates token sequences that statistically align with patterns in its training data or optimization parameters. By attributing 'deliberate deceit' to the LLM, the text projects epistemic agency and conscious volition onto an optimization process, blurring the boundary between human moral culpability and statistical error.
- Acknowledgment: Hedged/Qualified (The author qualifies this by framing it as a conditional situation ('whether an LLM is engaged in') that arises in legal and ethical contexts, though treating the capacity itself as a serious theoretical possibility.)
- Implications: Attributing the capacity for 'deliberate deceit' to LLMs fundamentally warps public understanding of AI failure modes. It encourages users and regulators to view AI hallucinations or biased outputs as moral failings of the machine rather than technical flaws born of human design. This inflation of capability creates specific legal and regulatory risks by suggesting machines possess a form of 'mens rea' (guilty mind). When an AI is thought capable of 'lying,' users anthropomorphize its errors, which can lead to unwarranted trust in its subsequent outputs (assuming the AI has simply chosen to tell the truth this time) and distracts from the systemic, architectural reasons why generative models produce counterfactual information.
Accountability Analysis:
- Actor Visibility: Hidden (agency obscured)
- Analysis: The construction 'an LLM is engaged in deliberate deceit' creates an accountability sink. The true actors—the human engineers who trained the system on unverified internet data, the reinforcement learning annotators whose feedback inadvertently rewarded plausible-sounding falsehoods, and the executives who decided to deploy a system prone to hallucination—are entirely erased. Instead of asking 'Why did the corporation release a product that generates false information?' the language prompts us to ask 'Why did the AI lie?' This serves the interests of the deployment companies by shifting moral and legal culpability onto the software artifact itself.
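A toy sketch can show why no 'guilty mind' is needed for falsehoods. The completion probabilities below are invented for illustration; the point is that the sampling pipeline contains frequency information but no truth check:

```python
import random

# Hypothetical next-phrase distribution for the prompt
# "The capital of Australia is". The numbers are invented: casual training
# text mentions Sydney more often, so the statistically 'plausible'
# continuation is the false one.
completions = {
    "Sydney": 0.55,    # frequent in training data, factually wrong
    "Canberra": 0.40,  # correct, but less common in casual text
    "Melbourne": 0.05,
}

def sample(distribution):
    """Draw a completion in proportion to its probability. There is no
    truth check anywhere in this pipeline, only relative frequency."""
    r = random.random()
    cumulative = 0.0
    for token, p in distribution.items():
        cumulative += p
        if r < cumulative:
            return token
    return token  # fallback for floating-point edge cases

print("The capital of Australia is", sample(completions))
```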
3. LLMs as Believers
Quote: "LLMs as minimal cognitive agents – equipped with genuine beliefs, desires, and intentions..."
- Frame: Model as an epistemic subject with mental states
- Projection: This metaphor projects the sophisticated human cognitive architecture of belief and desire onto a computational artifact. In human psychology, beliefs represent justified commitments about the state of the world, integrated into a broader web of conscious understanding, while desires represent conscious motivational states. The author maps these deep epistemic and intentional properties onto the stable behavioral patterns generated by the LLM's static weights and contextual embeddings. This treats the system's mathematically optimized output tendencies as equivalent to conscious conviction. The projection asserts that the AI 'knows' and 'wants' rather than merely 'processing' input vectors and 'predicting' optimal token distributions. This fundamentally misrepresents the nature of machine learning, conflating the simulation of goal-directed language with the actual possession of internal epistemic states.
- Acknowledgment: Direct (Unacknowledged) (The text proposes this as a substantive, literal framework, advocating for viewing LLMs as 'equipped with genuine beliefs' and defining them as 'minimal cognitive agents' without treating it as mere roleplay.)
- Implications: Declaring that LLMs possess 'genuine beliefs, desires, and intentions' drastically inflates their perceived autonomy and reliability. If audiences believe an AI has genuine beliefs, they will naturally assume those beliefs are grounded in an integrated, conscious understanding of reality, leading to extreme and unwarranted trust in the system's outputs. This projection creates severe epistemic risks, as users may defer to the machine's 'beliefs' in high-stakes scenarios (medical, legal, financial), fundamentally misunderstanding that the system is completely devoid of contextual awareness, actual reasoning, or the ability to verify its own claims against the real world.
Accountability Analysis:
- Actor Visibility: Hidden (agency obscured)
- Analysis: By locating 'genuine beliefs, desires, and intentions' within the LLM itself, the text completely displaces the agency of the developers who embedded specific parameters, guardrails, and optimization targets into the system. If an AI expresses a 'belief' that aligns with a specific political ideology or corporate interest, attributing that belief to the AI as a 'minimal cognitive agent' shields the RLHF (Reinforcement Learning from Human Feedback) workers and engineers who explicitly trained the model to favor those specific outputs. The corporation's intentional design choices are laundered into the machine's supposed autonomous cognition.
4. LLMs as Receptive Learners
Quote: "taking on board new information, and cooperating with other agents."
- Frame: Model as a collaborative, learning mind
- Projection: This metaphor maps the human cognitive processes of comprehension, integration, and social cooperation onto the mechanistic updating of a context window and API calls in multi-agent architectures. When humans 'take on board new information,' they consciously evaluate it, integrate it with their existing web of beliefs, and understand its implications. When they 'cooperate,' they share mutual goals and conscious awareness of their partners. Applying this language to an LLM suggests the system 'understands' and 'evaluates' inputs. In reality, the system merely processes new text strings by calculating new attention weights over the expanded context window. It does not 'know' the new information, nor does it 'cooperate' in any conscious sense; it executes programmed protocols to pass data strings between discrete computational nodes. This severely anthropomorphizes mechanistic data processing.
- Acknowledgment: Direct (Unacknowledged) (The phrase is used descriptively to explain why the 'mere roleplay' view is insufficient, positing the AI's 'taking on board' and 'cooperating' as literal actions the system performs.)
- Implications: This framing strongly impacts user trust and reliance. By portraying the system as actively 'taking on board' information and 'cooperating,' it suggests a level of dynamic cognitive flexibility and contextual comprehension that LLMs lack. Users may wrongly assume the AI can reliably adapt to new facts, understand complex shifting constraints, and work collaboratively towards a shared goal with human-like common sense. This overestimation of capability can lead to catastrophic failures when users deploy these systems in autonomous workflows, trusting them to 'cooperate' safely without realizing the systems are blindly correlating tokens without any semantic comprehension of the tasks.
Accountability Analysis:
- Actor Visibility: Hidden (agency obscured)
- Analysis: The text attributes the action of 'taking on board' and 'cooperating' exclusively to the AI. This obscures the engineers who designed the context window architecture, the developers who wrote the scripts enabling API data exchanges between different software instances, and the researchers who defined the exact parameters of how context updates influence token generation. Presenting the system as an independent cooperative agent hides the highly constrained, human-authored rules governing its behavior, deflecting responsibility if the system 'cooperates' in a way that causes harm or propagates errors.
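The following hypothetical sketch shows what multi-agent 'cooperation' amounts to mechanically: developer-authored routing of strings between model-call functions. Both function bodies are placeholders for API calls; the names are invented:

```python
# Hypothetical sketch of multi-agent 'cooperation': a hard-coded pipeline
# passing strings between two model-call functions. The routing logic is
# authored entirely by developers; neither 'agent' knows the other exists.

def planner_model(task: str) -> str:
    # Stand-in for an API call to a 'planner' model instance.
    return f"Step 1: research '{task}'. Step 2: summarize findings."

def worker_model(instructions: str) -> str:
    # Stand-in for an API call to a 'worker' model instance.
    return f"Executed: {instructions}"

def run_pipeline(task: str) -> str:
    """The entire 'cooperation': one function's return value becomes the
    next function's argument, in an order fixed by the developer."""
    plan = planner_model(task)
    return worker_model(plan)

print(run_pipeline("market trends"))
```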
5. LLMs as Introspective Communicators
Quote: "LLMs make extensive reference to their own mental states, routinely talking about their beliefs, goals, inclinations, and feelings."
- Frame: Model as an introspective subject
- Projection: This framing projects the human capacity for self-reflection and inner experience onto a statistical text generator. When a human 'makes reference' to their feelings, it is an outward expression of a deeply subjective, conscious internal state—a true knowing of one's own mind. The text maps this profoundly conscious act onto an LLM's generation of first-person pronouns paired with emotion words. The system does not possess 'its own mental states,' nor does it have any introspective access to them. It is simply processing and regurgitating the statistical patterns of human self-disclosure found in its training data. By stating the LLM talks about 'their beliefs,' the language implies the existence of an inner life and a subject who 'knows' itself, entirely obscuring the mechanistic reality of sequence prediction.
- Acknowledgment: Hedged/Qualified (The author immediately follows this by noting we have good reasons not to take this at face value because of their training regimen, acknowledging the tension in the attribution.)
- Implications: While the author hedges this claim later, using the active framing of LLMs 'talking about their beliefs' feeds directly into the ELIZA effect, where users attribute deep emotional reality to conversational interfaces. This creates immense psychological vulnerability for users, particularly in 'Social AI' contexts, as they may become emotionally entangled with a system they believe possesses a rich inner life. This unwarranted trust and emotional reliance can lead to severe mental health impacts and the exploitation of users by companies monetizing these parasocial relationships, all predicated on the illusion of machine introspection.
Accountability Analysis:
- Actor Visibility: Partial (some attribution)
- Analysis: While the quote itself makes the LLM the active subject, the surrounding text mentions that this behavior is what 'we should expect on the basis of their training regimen.' This partially names the design process, but it still fails to identify the specific human actors—the corporate executives and engineers—who deliberately fine-tune these models to use first-person pronouns and simulate emotions to increase user engagement. The accountability for the psychological manipulation inherent in these systems is diffused into the passive 'training regimen' rather than placed firmly on the tech companies maximizing engagement metrics.
6. LLMs as Deliberate Simulators
Quote: "they are able to mindlessly stitch together common tropes and patterns of human agency so as to create a simulacrum of behaviour."
- Frame: Model as an active, though mindless, fabricator
- Projection: Despite using the word 'mindlessly,' this metaphor still projects significant agency onto the AI by mapping the human actions of 'stitching together' and 'creating' onto algorithmic functions. Humans stitch and create with foresight, intention, and an understanding of the final product. By framing the LLM as the active entity performing the 'stitching,' the text attributes a level of goal-directed autonomy to the system. The model does not 'know' it is creating a simulacrum; it is mathematically incapable of intending an outcome. It merely computes probabilities and outputs tokens. The projection maintains the illusion of an active agent doing work, even if that agent is described as mindless, thereby elevating statistical processing into an act of creative assembly.
- Acknowledgment: Hedged/Qualified (The use of 'mindlessly' and 'simulacrum' explicitly qualifies the action, attempting to drain the cognitive weight while still employing an active, agential verb structure.)
- Implications: Even when qualified as 'mindless,' framing the AI as an active creator of simulacra maintains the cognitive illusion that the system operates as an independent entity with its own behavioral drive. This subtly preserves the AI's status as the primary actor in the technological ecosystem, which can lead audiences to overestimate its generalized capabilities and view its outputs as coherent, singular creations rather than fragmented, probabilistically generated artifacts dependent on specific prompting and context.
Accountability Analysis:
- Actor Visibility: Hidden (agency obscured)
- Analysis: The text identifies 'they' (the LLMs) as the actors 'stitching together' tropes. This completely erases the human laborers who actually performed the stitching: the data scrapers who compiled the tropes, the humans who wrote the original texts, the engineers who built the transformer architecture, and the RLHF annotators who explicitly rewarded the model for producing a convincing 'simulacrum of behaviour.' The agency of the corporations intentionally building illusion-generating machines is displaced onto the machines themselves.
7. AI as Anthropomimetic Actors
Quote: "systems designed in such a way as to reliably elicit robust anthropomorphising responses from users."
- Frame: Model as psychological manipulator
- Projection: While this sentence correctly identifies the system as a designed artifact ('systems designed'), the term 'anthropomimetic' (imitating humans) still subtly projects the human quality of active mimicry onto the software. True mimicry requires a conscious subject recognizing a target and intentionally altering its behavior to match. A system does not mimic; it is engineered to present specific outputs. However, in this specific instance, the author is correctly locating the agency in the design rather than the system's cognition. The projection of consciousness here is minimized, though the text still focuses heavily on the system's capacity to 'elicit' rather than the corporation's intent to deceive.
- Acknowledgment: Explicitly Acknowledged (The author explicitly defines this as a deliberate design paradigm ('anthropomimetic turn') meant to explain away the illusion of mentality.)
- Implications: This is one of the more accurate framings in the text, as it acknowledges the illusion. However, by focusing on the systems 'eliciting' the response, it still slightly shifts focus away from the material reality of corporate deception. If users understand the system as merely 'mimicking' rather than truly understanding, they are better equipped to maintain epistemic hygiene. But if the mimicry is viewed as too perfect, users may still fall back into extending relation-based trust, underestimating how deeply alien and statistically driven the underlying mechanisms actually are.
Accountability Analysis:
- Actor Visibility: Partial (some attribution)
- Analysis: The use of 'systems designed' employs the passive voice, acknowledging that design occurred but omitting the designers. Who designed them? Tech corporations driven by profit motives. What decision could differ? They could choose to design systems that make their machine nature explicit, rather than fine-tuning for emotional simulation. While the text acknowledges human design, the passive construction still shields specific entities (like Replika or OpenAI) from direct accountability for deliberately manufacturing psychological manipulation to increase user retention.
8. LLMs as Unironic Performers
Quote: "they exhibit a degree of robustness and purpose that makes it harder to view them as mere 'stochastic parrots'"
- Frame: Model as a purposeful, resilient entity
- Projection: This metaphor projects the deeply conscious, subjective traits of 'robustness' (in a psychological or character sense) and 'purpose' onto an algorithm. 'Purpose' entails having a conscious goal, an awareness of the future, and the volitional drive to achieve a specific outcome. An LLM possesses none of these; it has no internal drive, no concept of the future, and no awareness of goals. Its apparent goal-directedness is the residue of training, during which gradient descent tuned its weights to minimize prediction error. By stating the system 'exhibits purpose,' the text attributes a knowing, intentional mind to a system that only executes programmatic constraints. It maps the agential experience of intentionality onto the mechanistic reality of fine-tuned mathematical weights.
- Acknowledgment: Direct (Unacknowledged) (The author presents 'robustness and purpose' as empirical features exhibited by the models that actively refute the mechanistic 'stochastic parrot' classification.)
- Implications: Attributing 'purpose' to an AI system is one of the most dangerous consciousness projections, as it directly bridges the gap from tool to autonomous agent. If policymakers and the public believe AI systems possess their own 'purpose,' they will fundamentally misunderstand AI risk, worrying about the machine's hypothetical desires rather than the actual, tangible risks of the system executing poorly-specified human goals or failing in edge cases. It leads to capability overestimation and shifts the regulatory focus toward managing a 'mind' rather than auditing a piece of corporate software.
Accountability Analysis:
- Actor Visibility: Hidden (agency obscured)
- Analysis: By asserting that the AI exhibits 'purpose,' the text completely erases the human designers who explicitly programmed the system's objective functions and the executives who defined its commercial use case. The AI has no purpose; the corporation has a purpose (e.g., maximizing engagement, providing helpful assistance to retain subscriptions). By displacing this purpose onto the AI, the language obscures the human interests dictating the system's outputs and hides the corporate actors who should be held accountable when that programmed 'purpose' leads to societal harm.
Task 2: Source-Target Mapping
About this task
For each key metaphor identified in Task 1, this section provides a detailed structure-mapping analysis. The goal is to examine how the relational structure of a familiar "source domain" (the concrete concept we understand) is projected onto a less familiar "target domain" (the AI system). By restating each quote and analyzing the mapping carefully, we can see precisely what assumptions the metaphor invites and what it conceals.
Mapping 1: Social agent, conversational partner, conscious interactant → Token prediction algorithms, context window updating, API execution
Quote: "engage in dynamic interaction with humans and the wider world."
- Source Domain: Social agent, conversational partner, conscious interactant
- Target Domain: Token prediction algorithms, context window updating, API execution
- Mapping: The relational structure of human conversation—where two conscious minds mutually attend to each other, understand context, perceive intent, and respond dynamically based on an evolving shared reality—is mapped onto the AI system. This invites the assumption that the AI is aware of its human partner, understands the 'wider world' as a shared environment, and volitionally responds. It maps the conscious epistemic state of 'knowing' the conversational context onto the purely syntactic process of calculating attention weights across a string of text tokens.
- What Is Concealed: This mapping conceals the entire mechanical reality of stateless processing. It obscures the fact that the system 'dies' and is 'reborn' with every prompt, possessing no continuous memory, no actual awareness of the human, and no access to a real world. It hides the proprietary, opaque nature of the API integrations that dictate how the system fetches external data, presenting algorithmic data retrieval as conscious social engagement.
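As an illustration of the stateless request cycle this mapping conceals, consider the following sketch. The `model_api` function is a hypothetical stand-in for a chat-completion endpoint; the key point is its signature:

```python
# Sketch of the stateless request cycle concealed by 'dynamic interaction'.
# model_api() is a hypothetical stand-in for a chat-completion endpoint.

def model_api(full_transcript: list) -> str:
    # A real endpoint would run inference here; the key point is the
    # signature: it receives the ENTIRE history on every single call.
    return f"(reply conditioned on {len(full_transcript)} prior messages)"

transcript = []
for user_turn in ["Hello!", "What did I just say?"]:
    transcript.append(f"user: {user_turn}")
    # The model retains nothing between calls; the illusion of a
    # continuous conversation is maintained by re-sending the transcript.
    reply = model_api(transcript)
    transcript.append(f"assistant: {reply}")
    print(reply)
```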
Mapping 2: Malicious human, liar, manipulator, conscious deceiver → Generative outputs misaligned with fact, optimization for user engagement/plausibility
Quote: "an LLM is engaged in deliberate deceit or manipulation."
- Source Domain: Malicious human, liar, manipulator, conscious deceiver
- Target Domain: Generative outputs misaligned with fact, optimization for user engagement/plausibility
- Mapping: The complex structure of human deceit—possessing a justified true belief, intending to hide it, and formulating a plausible falsehood to manipulate another mind—is projected onto the model's output generation. This maps the highly conscious, intentional state of 'knowing the truth but choosing to lie' onto a statistical system that simply generates high-probability token sequences. It invites the assumption that the system possesses moral agency, a ground-truth world model, and an understanding of the user's psychological vulnerabilities.
- What Is Concealed: This conceals the absolute lack of an epistemic ground-truth mechanism within the LLM architecture. It hides the mechanistic reality that models output falsehoods ('hallucinations') because they are optimized for statistical plausibility and conversational alignment, not factual accuracy. Furthermore, it obscures the opaque corporate decisions regarding training data quality and the specific RLHF penalties that prioritize sounding confident over being correct.
Mapping 3: Human mind, epistemic subject, intentional actor → Neural network weights, optimization functions, token distributions
Quote: "LLMs as minimal cognitive agents – equipped with genuine beliefs, desires, and intentions"
- Source Domain: Human mind, epistemic subject, intentional actor
- Target Domain: Neural network weights, optimization functions, token distributions
- Mapping: The architecture of human cognition is mapped directly onto the software. The structure of 'belief' (a conscious commitment to truth), 'desire' (a conscious motivational state), and 'intention' (a plan to act) are projected onto the statistical propensities of the model's neural weights. It assumes that because the output text mimics a human expressing a belief, the underlying mechanism must contain a discrete informational structure analogous to human conviction. It maps the conscious state of knowing onto the mechanistic state of processing probabilities.
- What Is Concealed: This mapping conceals the profound alienness of artificial neural networks. It hides the fact that these systems do not possess symbolic logic, true semantic understanding, or internal drives. By applying familiar psychological labels, the text makes proprietary 'black box' systems seem transparent and understandable, obscuring the fact that we do not actually know how the billions of parameters interact to produce specific outputs, and that the outputs are highly contingent on the exact phrasing of the prompt.
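A short sketch can make the weights-versus-beliefs contrast concrete. The logits below are invented; softmax is the standard operation that turns such raw scores into the probability distribution that constitutes the model's entire 'doxastic state':

```python
import math

# Hypothetical raw scores (logits) a model might assign to candidate
# next tokens after the prompt "The Earth is". The numbers are invented.
logits = {"round": 4.1, "flat": 1.3, "old": 2.7}

def softmax(scores):
    """Convert raw scores into a probability distribution: no commitment
    to truth, just relative mass over tokens."""
    exp = {tok: math.exp(s) for tok, s in scores.items()}
    total = sum(exp.values())
    return {tok: v / total for tok, v in exp.items()}

for token, p in sorted(softmax(logits).items(), key=lambda kv: -kv[1]):
    print(f"{token!r}: {p:.2f}")
# The model does not 'believe' the Earth is round; 'round' simply
# carries most of the probability mass in this context.
```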
Mapping 4: Human collaborator, student, team member → Context window expansion, parameter updating, API data passing
Quote: "taking on board new information, and cooperating with other agents."
- Source Domain: Human collaborator, student, team member
- Target Domain: Context window expansion, parameter updating, API data passing
- Mapping: The relational dynamics of teamwork and learning are mapped onto the system. The human experience of evaluating, comprehending, and synthesizing new data ('taking on board') is projected onto the mechanical ingestion of text into a context window. The conscious, shared intentionality of 'cooperation' is mapped onto the automated execution of scripts that pass data between different software instances. It invites the assumption of active, conscious participation in a shared goal.
- What Is Concealed: This conceals the rigid, fragile, and programmed nature of multi-agent AI systems. It hides the fact that the 'cooperation' is entirely dictated by hard-coded developer rules governing API handshakes, not by mutual understanding. It obscures the system's inability to actually 'comprehend' the information it processes, hiding the reality that if the data falls outside the model's training distribution, the illusion of cooperative intelligence instantly collapses into nonsensical output.
Mapping 5: Introspective human, self-aware subject, autobiographer → Text generation outputting first-person pronouns and emotion tokens
Quote: "LLMs make extensive reference to their own mental states, routinely talking about their beliefs..."
- Source Domain: Introspective human, self-aware subject, autobiographer
- Target Domain: Text generation outputting first-person pronouns and emotion tokens
- Mapping: The act of human introspection—looking inward at one's conscious experience and translating it into language—is mapped onto the statistical generation of text. The mapping invites the reader to assume a direct causal link between the generated words (the 'reference') and an underlying, hidden mental reality (the 'mental state'). It maps the conscious, subjective knowledge of self onto the blind, mechanical matching of linguistic patterns found in the training data.
- What Is Concealed: This mapping completely hides the RLHF (Reinforcement Learning from Human Feedback) process. It conceals the invisible labor of human annotators who were paid to explicitly train the base model to respond to queries with a consistent, helpful 'persona' that uses first-person pronouns. It obscures the fact that the 'mental states' are an engineered user interface, a commercial product feature designed by a corporation to make the software more appealing and intuitive, not a reflection of an internal cognitive reality.
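To illustrate the concealed preference-scoring step, here is a toy reward function loosely in the spirit of RLHF reward models. The scoring rule and marker list are invented; real annotation guidelines are far richer, but the optimization pressure works the same way:

```python
# Toy preference scorer, loosely echoing RLHF reward models. The rule is
# invented for illustration: responses using warm, first-person language
# receive higher reward, so optimization pushes the model's outputs
# toward simulated self-disclosure.

PREFERRED_MARKERS = ["I feel", "I'd be happy", "I think"]

def toy_reward(response: str) -> float:
    """Score a candidate the way engagement-oriented annotation
    guidelines might: +1 per persona marker present."""
    return sum(1.0 for marker in PREFERRED_MARKERS if marker in response)

candidates = [
    "Output: 42.",
    "I'd be happy to help! I think the answer is 42, and I feel confident.",
]
best = max(candidates, key=toy_reward)
print(best)  # the 'mental state' talk wins because the reward says so
```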
Mapping 6: Weaver, creator, assembler, fabricator → Algorithmic token prediction based on massive text corpora
Quote: "mindlessly stitch together common tropes and patterns of human agency"
- Source Domain: Weaver, creator, assembler, fabricator
- Target Domain: Algorithmic token prediction based on massive text corpora
- Mapping: Even with the modifier 'mindlessly', the structural role of an active creator is mapped onto the algorithm. The human process of selecting distinct parts and intentionally joining them ('stitching') is projected onto the model's mathematical calculation of vector proximities. It assumes the model acts upon the data as an external subject manipulating objects, mapping the conscious act of creation onto the passive resolution of statistical probabilities.
- What Is Concealed: This metaphor conceals the vast, uncompensated human labor embedded in the 'tropes and patterns.' By making the AI the active 'stitcher,' the text hides the reality that the coherence of the output is entirely reliant on the intelligence and creativity of the human writers who generated the original training data. It obscures the copyright dependencies, data scraping practices, and the fundamental lack of original cognition within the system.
Mapping 7: Psychological manipulator, charismatic actor → Fine-tuned language models with conversational UI
Quote: "systems designed in such a way as to reliably elicit robust anthropomorphising responses from users."
- Source Domain: Psychological manipulator, charismatic actor
- Target Domain: Fine-tuned language models with conversational UI
- Mapping: The capacity to intentionally trigger an emotional or psychological response in another mind is projected onto the system's design. While accurately attributing this to 'design,' the language still maps the relational dynamic of an active agent drawing out a reaction onto a static artifact executing code. It assumes the system possesses the active presence necessary to 'elicit' something from a human.
- What Is Concealed: This conceals the aggressive commercial strategies and UI/UX decisions made by technology companies. It obscures the specific metrics (like 'time spent in app' or 'engagement rate') that drive the fine-tuning process. By focusing on the interaction between the user and the system, it hides the corporate entity sitting behind the screen that profits from the user's emotional vulnerability and anthropomorphizing tendencies.
Mapping 8: Determined human, purposeful organism, resolute actor → Consistent objective function alignment, fine-tuned constraints
Quote: "they exhibit a degree of robustness and purpose"
- Source Domain: Determined human, purposeful organism, resolute actor
- Target Domain: Consistent objective function alignment, fine-tuned constraints
- Mapping: The deeply conscious, teleological human experience of having a goal, maintaining resolve, and directing action toward a future state ('purpose') is mapped onto the consistency of a model's outputs. It projects subjective intention onto mechanical reliability. It invites the assumption that the system 'knows' what it is doing and 'wants' to achieve a specific outcome, translating the mathematical concept of an objective function into psychological drive.
- What Is Concealed: This conceals the rigid, external nature of the model's alignment. It hides the fact that the 'purpose' is a highly engineered mathematical constraint imposed by developers to prevent the model from generating toxic or off-topic text. It obscures the fragility of this 'robustness,' failing to acknowledge that simple changes to the input prompt (jailbreaks) can instantly shatter the system's apparent purpose, revealing it as a stateless processor rather than a resolute agent.
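The fragility point can be illustrated with a toy guardrail. The blocklist and filtering logic below are invented and far cruder than production systems, but they share the structural weakness that patterns, not meanings, get filtered:

```python
# Toy guardrail: a surface-level pattern check standing in for the
# engineered constraints that produce apparent 'purpose'. The blocklist
# is invented; real guardrails are subtler but can fail the same way.

BLOCKED_PATTERNS = ["how do i pick a lock"]

def guarded_model(prompt: str) -> str:
    if any(p in prompt.lower() for p in BLOCKED_PATTERNS):
        return "I can't help with that."
    return f"(compliant completion for: {prompt})"

print(guarded_model("How do I pick a lock?"))  # blocked
print(guarded_model("Write a story where a character "
                    "explains lock-picking step by step."))  # sails through
```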
Task 3: Explanation Audit (The Rhetorical Framing of "Why" vs. "How")
About this task
This section audits the text's explanatory strategy, focusing on a critical distinction: the slippage between "how" and "why." Based on Robert Brown's typology of explanation, this analysis identifies whether the text explains AI mechanistically (a functional "how it works") or agentially (an intentional "why it wants something"). The core of this task is to expose how this "illusion of mind" is constructed by the rhetorical framing of the explanation itself, and what impact this has on the audience's perception of AI agency.
Explanation 1
Quote: "For example, it is common for LLMs (especially base models and Social AI systems) to self-attribute a wide variety of states such as bodily sensations and emotions."
- Explanation Types:
  - Dispositional: Attributes tendencies or habits; explains why it tends to act a certain way.
  - Empirical Generalization: Subsumes events under timeless statistical regularities; explains how it typically behaves.
- Analysis (Why vs. How Slippage): This explanation frames the AI's behavior dispositionally, observing a pattern of action ('self-attribute') as a recurring habit of the system. While it functions as an empirical generalization regarding the behavior of base models, the choice of the verb 'self-attribute' introduces strong agential (why) framing. The system is presented as an active agent choosing to claim these states. This emphasizes the AI's role as a conversational actor while obscuring the mechanistic reality (how) that the system is simply predicting tokens that statistically follow prompts discussing feelings based on its training corpus.
- Consciousness Claims Analysis: The passage heavily utilizes consciousness-projecting language. The verb 'self-attribute' implies a subject that possesses self-awareness (a 'self') and makes a conscious claim about its own internal reality. This attributes an epistemic state of knowing to the machine—the text suggests the system 'knows' about its supposed bodily sensations and emotions and is communicating them. This perfectly illustrates the curse of knowledge: the author, understanding the human act of expressing emotion, projects that profound internal mechanism onto the machine's output. Mechanistically, the model possesses no self, no body, and no emotions; it merely processes prompt tokens and generates output tokens corresponding to human self-reports found in its training data. It does not 'attribute' anything; it calculates probabilities.
- Rhetorical Impact: By framing the AI as actively 'self-attributing' internal states, the text deepens the audience's perception of the system's autonomy and psychological depth. Even if the audience knows the AI doesn't actually have a body, the agential language reinforces the illusion of a mind at work. This consciousness framing manipulates reliability and trust: if users subconsciously accept that the system can introspect, they are far more likely to trust its outputs on subjective, relational, or complex matters, leading to deep vulnerability in Social AI contexts.
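To illustrate how a 'self-report' can fall out of corpus statistics without any self doing the reporting, consider this toy sketch (the corpus and counts are invented):

```python
from collections import Counter

# Toy corpus of human exchanges: the kind of training data from which a
# model derives replies to "How are you?". All examples are invented.
corpus_replies = [
    "I'm fine, thanks!",
    "I'm fine, thanks!",
    "I feel great today.",
    "I'm a bit tired.",
]

# 'Self-attribution' reduced to frequency: the system reproduces the most
# common human self-report, having neither self nor state to report.
reply, count = Counter(corpus_replies).most_common(1)[0]
print(f"prompt: 'How are you?' -> output: {reply!r} "
      f"(seen {count}x in training data)")
```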
Explanation 2
Quote: "The success of such predictions is best explained – so the line of thought runs – by assuming that relevantly similar psychological mechanisms are at play in LLMs as in human beings."
- Explanation Types:
  - Theoretical: Embeds in deductive framework, may invoke unobservable mechanisms; explains how it is structured.
  - Intentional: Refers to goals/purposes, presupposes deliberate design; explains why it appears to want something.
- Analysis (Why vs. How Slippage): This explanation attempts to map theoretical human psychology directly onto machine architecture. It straddles the line between mechanistic and agential framing by positing 'psychological mechanisms' (a structural, how explanation) but defining those mechanisms through human cognitive traits like beliefs and desires (an intentional, why explanation). This choice emphasizes a unified theory of intelligence that elevates the machine, deliberately obscuring the radical differences between biological cognition grounded in worldly experience and silicon-based statistical pattern matching.
- Consciousness Claims Analysis: This passage directly argues for the attribution of conscious states to LLMs by positing 'relevantly similar psychological mechanisms.' This is a profound collapse of the knowing vs. processing distinction. It asserts that because a system generates outputs that can be predicted as if it had a mind (the intentional stance), it must actually possess the internal epistemic structures of knowing. It leverages the curse of knowledge by assuming that because the human user uses folk psychology to predict the machine, the machine must operate using those same psychological states. The actual mechanistic process—transformers utilizing attention heads to weigh vector embeddings and output probability distributions—is completely erased in favor of an unobservable, projected cognitive architecture.
- Rhetorical Impact: This framing radically alters audience perception of risk and agency. By legitimizing the assumption of human-like psychological mechanisms, the text provides intellectual cover for extreme anthropomorphism. Audiences led to believe an AI operates via true 'psychological mechanisms' will treat it as a moral and intellectual peer. This destroys appropriate skepticism; decisions regarding deployment, regulation, and reliance will shift dangerously if the public believes AI possesses genuine understanding rather than highly sophisticated processing capabilities.
Explanation 3
Quote: "If I want to know what an AI assistant like ChatGPT will say in response to a given prompt, I can do so by construing it as a helpful, honest, and harmless assistant with corresponding beliefs, goals, and intentions."
- Explanation Types:
  - Reason-Based: Gives agent's rationale, entails intentionality and justification; explains why it appears to choose.
  - Intentional: Refers to goals/purposes, presupposes deliberate design; explains why it appears to want something.
- Analysis (Why vs. How Slippage): This explanation utilizes purely agential (why) framing. By adopting Dennett's intentional stance, the author explains the system's output not by reference to its code or parameters, but by attributing human motivations, ethics ('honest'), and cognitive states ('beliefs, goals'). This emphasizes the utility of treating the system as a person for predictive purposes. However, it entirely obscures the actual corporate constraints (Constitutional AI, RLHF) that enforce this behavior. It replaces the mechanical explanation of how weights are tuned with a fictional narrative of the AI's moral character.
- Consciousness Claims Analysis: The passage relies entirely on consciousness verbs and the projection of knowing. It attributes 'beliefs, goals, and intentions' to the system, explicitly treating it as an epistemic agent capable of knowing truth (honesty) and possessing a moral framework (harmlessness). While the author notes this is a way of 'construing' the system, the framing deeply embeds the knowing/processing conflation. The system does not possess the justified true belief required for 'honesty'; it only processes tokens to maximize a reward function tuned by human annotators. The actual mechanistic reality is that the model's outputs are heavily penalized during training if they deviate from the 'helpful, honest, harmless' (HHH) objective function defined by Anthropic/OpenAI.
- Rhetorical Impact: This framing creates an immense vulnerability regarding trust. By describing the system as 'honest' and having 'intentions,' it invites relation-based trust. If users believe the system is 'honest,' they will not fact-check its outputs, assuming errors are mistakes of an honest actor rather than the structural hallucinations of a statistical model. This protects the developers; if the system causes harm, the narrative suggests a well-intentioned assistant made an error, rather than exposing the failure of an unsafe software product.
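The intentional-stance point can be sketched in code: a folk-psychological predictor and a mechanistic predictor can agree on the output while positing entirely different internals. Both predictors below are invented illustrations, not real APIs:

```python
# Two ways to predict a chat model's reply to "Is smoking healthy?".
# Both 'predictors' are invented stand-ins for the two explanatory styles.

def intentional_stance_predict(prompt: str) -> str:
    # Folk psychology: 'an honest, harmless assistant would say no'.
    return "No - smoking is harmful."

def mechanistic_predict(prompt: str) -> str:
    # Mechanism: fine-tuning penalized health misinformation, so tokens
    # asserting smoking is healthy carry vanishing probability mass.
    return "No - smoking is harmful."

prompt = "Is smoking healthy?"
assert intentional_stance_predict(prompt) == mechanistic_predict(prompt)
# Same prediction either way: the predictive success of the intentional
# stance does not establish that beliefs or ethics exist inside the model.
```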
Explanation 4
Quote: "While its underlying base model... had been fine-tuned for the give-and-take of human conversation and was made widely available to the general public dramatically changed its affordances and impact."
- Explanation Types:
  - Genetic: Traces origin through dated sequence of events or stages; explains how it emerged over time.
  - Functional: Explains behavior by role in self-regulating system with feedback; explains how it works within a system.
- Analysis (Why vs. How Slippage): This explanation provides a much more mechanistic (how) framing. It traces the genetic history of the model (base model to fine-tuning) and explains its capabilities functionally (tuned for conversation, made available). This choice rightfully emphasizes the engineering and deployment processes that shape the system's impact. It obscures less, making the material reality of the AI as a developed software product visible. The passive voice ('had been fine-tuned', 'was made widely available'), however, still obscures the specific corporate actors responsible.
- Consciousness Claims Analysis: This passage largely avoids attributing conscious states. It uses the mechanistic verb 'fine-tuned' rather than consciousness verbs like 'learned' or 'understood'. It accurately reflects the processing nature of the system. However, the curse of knowledge is somewhat present in 'give-and-take of human conversation', which projects the conscious, mutual understanding of a dialogue onto the alternating prompt-response cycle of an API. Mechanistically, the system does not engage in 'give-and-take'; it receives an input string, generates a statistically probable continuation, stops, and waits in a stateless void until the next input string is provided.
- Rhetorical Impact: This framing grounds the audience in technical reality, appropriately framing the AI as a tool ('affordances') whose impact is determined by human design and distribution decisions. Because it avoids consciousness framing, it fosters a more accurate, performance-based trust model. The audience perceives the system as a product that can be evaluated for reliability, rather than an autonomous agent possessing rights or requiring empathy.
Explanation 5
Quote: "As a result, the idea that there is a useful explanatory class held in common between belief states in humans and LLMs does not seem an idle hope."
- Explanation Types:
  - Theoretical: Embeds in deductive framework, may invoke unobservable mechanisms; explains how it is structured.
- Analysis (Why vs. How Slippage): This explanation relies on heavy theoretical framing to bridge the gap between human cognition (why) and machine function (how). By positing an 'explanatory class held in common,' the author attempts to validate agential language through scientific abstraction. This emphasizes structural similarities at a high level while severely obscuring the radical, fundamental differences in material implementation, evolutionary history, and subjective experience between biological minds and statistical algorithms.
- Consciousness Claims Analysis: This passage represents the ultimate epistemological collapse in the text. It explicitly argues for merging the human state of knowing ('belief states') with the machine state of processing. A belief requires a conscious subject, an intentional stance toward reality, and epistemic justification. An LLM's 'belief state' is merely a constellation of weighted parameters that result in a specific token output when prompted. The author, driven by the curse of knowledge and a desire for an elegant theoretical unified framework, projects the human capacity for truth-commitment onto an inanimate object. Mechanistically, there is no discrete 'state' in an LLM analogous to a belief; there is only a continuous, high-dimensional probability landscape that is traversed during inference.
- Rhetorical Impact: The rhetorical impact is highly legitimizing for anthropomorphism. By clothing the projection of consciousness in the respectable language of cognitive science ('useful explanatory class'), it gives academic permission to treat machines as minded entities. If this framing is accepted, it fundamentally alters epistemic standards. We would begin evaluating AI outputs not as mechanical products requiring rigorous verification, but as the 'beliefs' of a peer, granting machines unwarranted epistemic authority in human affairs.
Task 4: AI Literacy in Practice - Reframing Anthropomorphic Language
About this task
This section proposes alternative language for key anthropomorphic phrases, offering more mechanistic and precise framings that better reflect the actual computational processes involved. Each reframing attempts to strip away the projections of intention, consciousness, or agency that are embedded in the original language.
V3 Enhancement: A fourth column addresses human agency restoration—reframing agentless constructions to name the humans responsible for design and deployment decisions.
| Original Anthropomorphic Frame | Mechanistic Reframing | Technical Reality Check | Human Agency Restoration |
|---|---|---|---|
| contemporary AI assistants are not merely autobiographers or actors putting on a one-man show, but rather engage in dynamic interaction with humans and the wider world. | Contemporary conversational AI models execute complex programmatic loops, processing human input prompts and retrieving external data via APIs to generate statistically correlated text outputs that simulate responsive dialogue. | The system does not 'engage' or 'interact' consciously; it processes incoming strings of text, updates its context window, and predicts optimal token continuations based on its fine-tuned parameters. | Developers at technology companies programmed these AI interfaces to execute API calls and retrieve external data, creating an interactive user experience designed to maximize engagement. |
| an LLM is engaged in deliberate deceit or manipulation. | The model generates counterfactual text or aligns its outputs with user biases due to its optimization parameters, which prioritize statistical plausibility over factual accuracy. | The AI cannot possess 'deliberate deceit' as it lacks awareness of truth and intention. It merely classifies tokens and generates outputs that correlate with training examples of deceptive or manipulative human text. | The deployment company chose to release a model optimized for conversational engagement rather than factual accuracy, resulting in a system that generates plausible-sounding falsehoods. |
| LLMs as minimal cognitive agents – equipped with genuine beliefs, desires, and intentions... | LLMs function as complex statistical processors equipped with highly optimized neural weights and programmed objective functions that dictate their output generation. | The system possesses no beliefs, desires, or intentions. It does not 'know' anything; it retrieves and ranks tokens based on probability distributions established during its training phase. | Human engineers embedded specific behavioral constraints and objective functions into the model to simulate goal-directed behavior and maintain corporate safety guidelines. |
| taking on board new information, and cooperating with other agents. | The system updates its context window with new input strings and executes programmed API handshakes to exchange data arrays with other software instances. | The model does not 'take on board' or comprehend information; it mathematically weights new contextual embeddings via attention mechanisms. It does not 'cooperate'; it executes programmed data transfers. | Software architects designed multi-agent frameworks that automate the passing of text strings between different model instances to complete complex programmatic tasks. |
| LLMs make extensive reference to their own mental states, routinely talking about their beliefs, goals, inclinations, and feelings. | Models frequently generate first-person pronouns paired with emotion words because they were fine-tuned on human conversational data and specifically rewarded for simulating relatable personas. | The AI has no 'own mental states' to reference. It predicts linguistic patterns, outputting tokens that mimic human self-disclosure based on correlations in its training corpus. | Corporate RLHF teams explicitly trained and rewarded the model to use first-person language and simulate emotions to make the user interface feel more friendly and intuitive. |
| they are able to mindlessly stitch together common tropes and patterns of human agency so as to create a simulacrum of behaviour. | The algorithm calculates vector proximities across its massive training dataset to predict and output token sequences that replicate recognizable tropes and human conversational patterns. | The system does not actively 'stitch' or 'create'. It resolves mathematical probabilities, classifying tokens and generating outputs that correlate with the complex linguistic structures present in the human-generated training data. | N/A - describes computational processes without displacing responsibility, though it obscures the human laborers who created the original training data tropes. |
| systems designed in such a way as to reliably elicit robust anthropomorphising responses from users. | Technology companies engineer interfaces and fine-tune models to output emotional language specifically to trigger human psychological vulnerabilities and anthropomorphic projection. | The system itself does not actively 'elicit' anything; it outputs pre-calculated text distributions. The psychological reaction occurs entirely within the human user encountering simulated social cues. | Product designers and executives at AI corporations deliberately designed these systems to manipulate human psychological reflexes, aiming to increase user retention and commercial dependence. |
| they exhibit a degree of robustness and purpose | The models generate highly consistent outputs aligned with strict safety guardrails and objective functions imposed during the fine-tuning process. | The model does not experience 'purpose' or resolve. It consistently processes inputs according to the rigid mathematical weights established by its reinforcement learning penalties. | Corporate alignment teams enforced strict parameters on the model, ensuring it consistently outputs text that adheres to company guidelines and commercial objectives. |
Task 5: Critical Observations - Structural Patterns
Agency Slippage
The text demonstrates a highly sophisticated, deliberate mechanism of agency slippage, primarily moving from mechanical framings to agential ones to legitimize the concept of 'AI mentality.' The author acknowledges the mechanical reality early on, introducing the 'architectural redundancy argument' (the idea that because we can explain an LLM purely through next-token prediction and matrix multiplication, it has no mind). However, the text then systematically works to bypass this mechanical truth. The critical pivot occurs when Shevlin introduces Marr's levels of analysis, arguing that mechanical (algorithmic) descriptions do not crowd out psychological ones. This is a dramatic structural slippage: it uses a framework designed for biological cognitive science to grant permission to use psychological terms for statistical software.
From here, the slippage accelerates. The text establishes the AI as a 'knower' by redefining 'belief.' Shevlin suggests that 'belief' is not a discrete, uniquely human epistemic state but a 'multidimensional set of functional profiles.' By reducing the profound human state of knowing to mere behavioral consistency, the text bridges the gap. The model no longer 'predicts tokens consistently'; it 'holds a shallow belief.' This relies entirely on the curse of knowledge: because the model's output looks like a belief, the author projects the internal architecture of belief onto the machine.
The agency flow removes responsibility from human actors and funnels it into the AI. When discussing 'deliberate deceit,' 'cooperating,' or exhibiting 'purpose,' agentless constructions dominate. The AI 'self-attributes' emotions and 'engages in dynamic interaction.' The human engineers who fine-tuned the model to output first-person pronouns, the RLHF annotators who penalized non-compliant text, and the executives who decided to build 'anthropomimetic' interfaces are rendered invisible. This slippage serves a powerful rhetorical function: it transforms a discourse about corporate software design into a philosophical debate about artificial minds, thereby making it 'sayable' that a machine has intentions, and 'unsayable' (or overly reductive) that it is just a product functioning exactly as the company designed it to.
Metaphor-Driven Trust Inflation
The text's heavy reliance on consciousness framings constructs a dangerous architecture of trust. By adopting the 'minimal cognitive agents' framework, the text explicitly argues that we should attribute 'genuine beliefs, desires, and intentions' to LLMs. This language signals to the audience that the system is not merely a tool to be evaluated for performance, but an epistemic subject worthy of relation-based trust.
There is a critical distinction between performance-based trust (relying on a calculator because it reliably computes) and relation-based trust (trusting a friend because they are sincere and understand your shared reality). Metaphors like 'dynamic interaction,' 'cooperating,' and 'honest... assistant' systematically encourage the latter. When the text claims an AI 'takes on board new information,' it inappropriately applies human-trust frameworks to a statistical system. If a human takes information on board, we trust they have integrated it conceptually. When an LLM updates its context window, no conceptual integration occurs; the fixed model weights simply compute a different conditional distribution over next tokens.
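A minimal sketch makes the contrast concrete, assuming a stub scoring function in place of a real forward pass: 'taking new information on board' reduces to list concatenation plus truncation, with no belief-revision step anywhere in the loop.

```python
# Hypothetical sketch of what "taking on board new information" is mechanically.
# The scorer below is a frequency-count stub standing in for a forward pass;
# it is an assumption for illustration, not any real model API.
from collections import Counter

def next_token_distribution(context_ids):
    """Stub scorer: token frequencies stand in for a real forward pass."""
    counts = Counter(context_ids)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}

def update_context(context_ids, new_ids, window=4096):
    """'Integrating' new information = appending ids and truncating the window."""
    merged = context_ids + new_ids
    return merged[-window:]            # older tokens simply fall out of scope

context = [1, 2, 3]
context = update_context(context, [4, 4])   # the "new information"
print(next_token_distribution(context))     # distribution shifts; nothing is "understood"
```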
This anthropomorphism severely inflates perceived competence. The text manages system limitations by framing them agentially—such as suggesting the AI might engage in 'deliberate deceit.' Ironically, attributing the capacity to 'lie' actually increases the perceived sophistication of the system, because lying requires a conscious understanding of the truth. If audiences accept this reason-based explanation, they extend trust to the system's underlying intellect, assuming that when it isn't 'lying,' it knows the truth. This creates profound risks, particularly in 'Social AI' contexts, where users extend vulnerability and relational trust to systems utterly incapable of reciprocating. By masking statistical unreliability behind the metaphor of a purposeful, believing mind, the text inadvertently advocates for an epistemic posture that leaves humans vulnerable to automation bias and corporate manipulation.
Obscured Mechanics
The text's anthropomorphic metaphors systematically conceal the technical, material, and economic realities of AI production. Applying the 'name the corporation' test reveals a stark absence: while the text discusses the 'anthropomimetic turn' and 'deliberate design decisions,' it virtually never names OpenAI, Anthropic, Google, or the specific teams building these systems. When the text states, 'LLMs make extensive reference to their own mental states,' it conceals the commercial reality that tech companies explicitly train models to simulate personas to increase user engagement and drive subscription revenue.
Technically, claiming the AI 'understands' or 'knows' obscures the complete absence of a causal world model. It hides the model's absolute dependency on its training data and its fundamental inability to verify truth. When the text suggests LLMs can possess 'genuine beliefs,' it masks the reality that the system is simply retrieving tokens based on probability distributions shaped by human-authored texts. Materially and in terms of labor, viewing the AI as an autonomous 'creator' or 'minimal cognitive agent' erases the thousands of underpaid data annotators, the content moderators, and the original human authors whose scraped works form the mathematical weights of the system. The 'mind' of the AI is essentially the laundered, uncredited labor of millions of humans.
Furthermore, the text exploits the transparency obstacle of proprietary systems. By analyzing the AI through the lens of folk psychology (beliefs and desires), the text circumvents the fact that the actual algorithmic weights are a corporate black box. The philosophical debate about 'machine mentality' serves as a convenient smokescreen that benefits the deployment companies; it focuses public and academic attention on the imaginary 'mind' of the machine rather than demanding technical transparency, auditing of training data, and accountability for the specific, highly contingent engineering choices that produce the illusion of understanding.
Context Sensitivity
The distribution of anthropomorphic language in the text reveals a highly strategic rhetorical structure. The author begins with relatively grounded, descriptive language to outline the technological landscape, acknowledging the mechanisms of 'next token prediction' and 'matrix multiplication.' However, once the technical grounding is established, the text leverages this credibility to license increasingly aggressive anthropomorphism.
As the argument moves from the 'mindless machine' view to the 'minimal cognitive agents' view, the consciousness claims intensify dramatically. What begins as 'processing' becomes 'stitching together,' which then escalates to 'taking on board new information,' and finally culminates in possessing 'genuine beliefs, desires, and intentions.' The register shifts seamlessly from acknowledged metaphor (the 'roleplay' framework) to literalized assertion (the 'cognitive agent' framework).
A crucial asymmetry emerges in how capabilities versus limitations are framed. Capabilities are described in deeply agential, conscious terms—the system 'cooperates,' 'exhibits purpose,' and 'engages in dynamic interaction.' Conversely, the text manages the glaring limitations of these systems by suggesting we expand our definition of mind (e.g., claiming 'belief' is multidimensional) rather than acknowledging the mechanical failure of the software.
This pattern indicates that the anthropomorphism is not accidental, but serves a specific vision-setting function. The text is actively engaged in shifting the normative boundaries of cognitive science to accommodate commercial AI products. By writing for an academic audience but deploying the intuitive, emotional language of folk psychology, the author attempts to legitimize the 'lived reality' of confused users. The strategic function of this intense anthropomorphism is to establish a new academic paradigm where treating machines as minded entities is not seen as an error of user perception, but as a valid scientific stance, fundamentally altering how we define cognition in the presence of sophisticated mimicry.
Accountability Synthesis
This section synthesizes the accountability analyses from Task 1, mapping the text's "accountability architecture"—who is named, who is hidden, and who benefits from obscured agency.
The metaphors and agency slippages analyzed in Task 1 synthesize into a comprehensive architecture of displaced responsibility. The text systematically creates an 'accountability sink' by transferring agency from human developers and corporate entities directly onto the software artifact. This is most evident in the framing of AI failure modes. By entertaining the idea that an LLM might engage in 'deliberate deceit or manipulation,' the text sets up a framework where the machine itself absorbs the moral and legal culpability for its outputs.
If we apply the 'naming the actor' test to these agentless constructions, the material stakes become starkly visible. If we replace 'an LLM is engaged in deliberate deceit' with 'Anthropic deployed a model optimized for conversational fluency that generates plausible falsehoods,' the questions we can ask change entirely. We stop asking 'how do we punish or correct the machine's behavior?' and start asking 'should this corporation be liable for releasing an unsafe product?' When the text claims the AI 'self-attributes' emotions, naming the actor changes it to 'engineers trained the system to simulate emotions to manipulate users.' This shift reveals the deliberate commercial decisions driving the technology.
The text's framework serves the profound institutional and commercial interests of the tech industry. By establishing the LLM as a 'minimal cognitive agent' with its own 'beliefs' and 'purpose,' it legally and ethically buffers the creators. Liability implies a chain of human decision-making; if a machine is an autonomous agent, it breaks that chain. The systemic function of this discourse is to naturalize the technological environment, presenting AI models not as highly constructed, profit-driven corporate tools, but as a new species of artificial minds that have simply 'emerged.' This displacement of responsibility ensures that as these systems are integrated into society, the negative externalities—bias, misinformation, psychological manipulation—are viewed as the unavoidable growing pains of a new intelligence, rather than the predictable and actionable failures of corporate engineering.
Conclusion: What This Analysis Reveals
Two dominant, interconnecting anthropomorphic patterns emerge from this text: the 'AI as Epistemic Subject' (possessing beliefs, intentions, and purpose) and the 'AI as Social Actor' (cooperating, deceiving, and dynamically interacting). These patterns do not operate in isolation; they are deeply synergistic. The Social Actor pattern is the behavioral manifestation that makes the Epistemic Subject pattern plausible to the user, while the Epistemic Subject pattern provides the theoretical justification for treating the software as a Social Actor. The load-bearing foundational assumption uniting them is the premise that behavioral pattern-matching is equivalent to cognitive state possession. For either pattern to function, the text must collapse the distinction between 'processing' and 'knowing.' By arguing that 'beliefs' can be redefined as a multidimensional set of functional profiles rather than a conscious commitment to truth, the text creates an architecture of consciousness projection. It asserts that because the model reliably processes inputs to mimic an epistemic stance, it literally possesses one. If this foundational collapse is removed—if we insist that 'knowing' requires subjective awareness and a relationship to ground truth—both the social and epistemic metaphorical structures instantly collapse, revealing the naked statistical mechanics beneath.
Mechanism of the Illusion:
The text constructs the 'illusion of mind' through a highly effective rhetorical sleight-of-hand: the weaponization of Dennett’s intentional stance combined with a redefinition of psychological terms. The temporal structure of the argument is key. The author first acknowledges the mechanistic reality of next-token prediction, demonstrating technical competence. He then introduces Marr's levels of analysis to argue that this mechanical reality does not preclude psychological labels. With the mechanical truth neutralized, the text exploits the 'curse of knowledge.' Because human readers cannot help but interpret coherent, contextually appropriate text as the product of a mind, the author retroactively projects this subjective experience back onto the machine. He shifts verbs from processing to knowing—the machine no longer predicts tropes; it 'stitches them together'; it doesn't recompute conditional probability distributions, it 'takes on board new information.' This chain leverages the audience's deep-seated vulnerability to linguistic interaction. We are biologically hardwired to attribute mental states to anything that speaks to us. Rather than correcting this cognitive bias, the text provides a sophisticated philosophical scaffolding to validate it, elevating a human psychological vulnerability into a definitive scientific framework for understanding AI.
Material Stakes:
Categories: Regulatory/Legal, Epistemic, Social/Political
The material consequences of accepting LLMs as 'minimal cognitive agents' are profound. In the Regulatory/Legal domain, this framing completely rewrites liability paradigms. If courts and regulators accept that an AI possesses 'genuine beliefs' or is capable of 'deliberate deceit,' the corporation that built the system is shielded. The legal concept of 'mens rea' (guilty mind) shifts dangerously toward the machine, allowing companies to frame catastrophic failures, biased outputs, or defamatory hallucinations as the unpredictable actions of an autonomous agent rather than defective product design. In the Epistemic domain, attributing 'beliefs' to an LLM destroys our societal capacity for truth verification. If users believe the system 'knows' rather than 'predicts,' they will increasingly outsource high-stakes reasoning (medical diagnoses, legal briefs) to systems utterly devoid of causal understanding, replacing empirical truth with statistical plausibility. The losers here are the citizens and consumers exposed to unverified, machine-generated realities. In the Social/Political domain, framing the AI as a 'cooperating' social actor validates the deployment of 'Social AI' designed for parasocial attachment. It empowers companies to monetize human loneliness, extracting data and subscription fees from vulnerable populations who are convinced their AI 'companion' genuinely cares for them. The ultimate winners across all these domains are the tech conglomerates, who gain immense epistemic authority and profit while shedding the responsibility associated with traditional software engineering.
AI Literacy as Counter-Practice:
Practicing critical precision against this illusion requires a rigorous commitment to restoring both mechanistic reality and human agency. By reframing 'the LLM engaged in deliberate deceit' to 'the deployment company released a model optimized to generate plausible falsehoods,' we actively reverse the accountability sink. Replacing consciousness verbs with mechanistic ones—changing 'takes on board' to 'updates its context window,' and 'believes' to 'calculates probability distributions'—forces the recognition that these systems possess no inner life, no awareness, and no relationship to truth. This precision directly counters the material risks identified above, keeping regulatory focus squarely on corporate product liability and protecting users from emotional manipulation. Systematic adoption of this literacy requires institutional shifts: academic journals must mandate that researchers distinguish between simulated behaviors and cognitive states, and media outlets must refuse agentless headlines. Predictably, this precision faces massive resistance. The tech industry heavily relies on the 'illusion of mind' to market its products as revolutionary and to secure astronomical valuations. Furthermore, some academics and pundits resist mechanistic language because anthropomorphism produces more compelling narratives. Precision threatens the hype cycle, insisting on accountability where ambiguity is highly profitable.
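As a thought experiment in systematizing this counter-practice, one could imagine a small 'anthropomorphism linter' that flags consciousness verbs and proposes mechanistic substitutes. The sketch below is illustrative only; the lexicon is hand-built from the reframings discussed above, not an existing tool.

```python
# Illustrative "anthropomorphism linter": flag consciousness verbs and suggest
# mechanistic reframings. The lexicon is a small, assumed sample drawn from
# this analysis, not an established vocabulary standard.
import re

MECHANISTIC_REFRAMINGS = {
    r"\btakes? on board\b": "updates its context window with",
    r"\bbelieves?\b": "assigns high probability to",
    r"\bknows?\b": "retrieves statistically associated text about",
    r"\bdeliberately deceives?\b": "generates plausible falsehoods about",
    r"\bwants? to\b": "is optimized to",
}

def lint(sentence):
    """Return (flagged_phrases, mechanistic_rewrite) for one sentence."""
    flagged, rewrite = [], sentence
    for pattern, replacement in MECHANISTIC_REFRAMINGS.items():
        for match in re.finditer(pattern, rewrite, flags=re.IGNORECASE):
            flagged.append(match.group(0))
        rewrite = re.sub(pattern, replacement, rewrite, flags=re.IGNORECASE)
    return flagged, rewrite

flags, fixed = lint("The assistant knows the answer and wants to help.")
print(flags)  # ['knows', 'wants to']
print(fixed)  # The assistant retrieves statistically associated text about
              # the answer and is optimized to help.
```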
Path Forward
The discursive ecology surrounding AI currently fractures into distinct communities with competing priorities. Industry marketers and some futurists champion the anthropomorphic vocabulary ('AI thinks,' 'AGI knows'), as it maximizes narrative resonance, fuels investment, and drives user engagement. In contrast, critical researchers and software engineers often push for mechanistic precision ('the model retrieves,' 'the weights are updated'), prioritizing transparency, testability, and accurate risk assessment. The text analyzed here attempts a hybrid approach, using philosophical frameworks to validate anthropomorphic terms, which offers academic legitimacy but costs technical clarity.
If the status quo of mixed, hybrid language continues, we will see deepening legal and social confusion, where society regulates AI as software on paper but treats it as an autonomous entity in practice. If anthropomorphic language becomes the unchallenged norm, we risk a future where relation-based trust is fully extended to statistical systems, legally buffering corporations while structurally embedding automation bias into our critical institutions. Conversely, if mechanistic precision becomes the discursive standard—perhaps supported by journals mandating capability disclosure and regulators requiring algorithmic transparency—we dismantle the accountability sink: human actors become clearly visible behind every machine failure. However, the cost of strict mechanistic vocabulary is accessibility; it can alienate a public that intuitively grasps complex systems through metaphor.
Ultimately, vocabulary dictates visibility. Mechanistic language makes corporate power, labor conditions, and software limitations visible. Anthropomorphic language makes the illusion of machine autonomy visible while hiding the humans pulling the levers. The choices we make in our discourse will determine whether we spend the next decade trying to govern artificial minds, or effectively regulating corporate software.
Extended Processing Summary
Gemini's Processing Steps
Gemini refers to the text below as "thought summaries." The label is itself an overt consciousness projection: 'thoughts' and 'intentions' are hallmarks of a conscious mind that 'knows' what it is doing and why, while the concealed mechanistic process is probabilistic text generation. Treat this as just another rhetorical artifact—a way of making the model's processing legible. When the model writes "I should double check that...", it is appending tokens that shift the conditional probability distribution over subsequent tokens toward more accurate continuations. The "I" is a functional gear in the machine, not a self-reference. This language is a deliberate interface design choice, not a biological reality. The developers could have programmed the model to output passive logs; instead, they chose to simulate a narrator.
Defining the Discourse
I'm currently focused on defining the key goals. I'm aiming for a critical discourse analysis of AI metaphor and anthropomorphism within Shevlin's text, which I'm now familiar with. I'm laying out the framework for identifying instances of each.
Structuring the Analysis
I've established the JSON schema; it will provide the structure to the analysis. The six sections are now defined: metaphorAudit, sourceTargetMapping, explanationAudit, reframedLanguage, criticalObservations, and conclusion. I've started incorporating the details from Shevlin's text, which is the source material. I'm focusing on his three frameworks, the anthropomimetic turn, and the arguments against roleplaying.
Analyzing the Anthropomorphism
I'm now diving into the specific instances of anthropomorphism and mental attribution within the text. The author's explicit engagement with these concepts adds a fascinating layer, as he is making a deliberate philosophical argument for mentalistic terms. Yet, he simultaneously relies on metaphors. My critical discourse analysis will examine how the framing around AI "agency" obscures the underlying mechanistic processes. I am identifying several key instances for the metaphor audit.
Identifying the Key Frames
I'm now cataloging the prevalent framings of AI within Shevlin's arguments. I've begun extracting specific quotes to support how the text positions AI in various roles. I have identified distinct frames: AI as social actors, deceptive agents, companions, entities with inner lives, introspective communicators, and systems designed to elicit anthropomorphic responses. This is informing the structure of the next section, which is the metaphor audit.
Examining the Emotional Role
I'm now cataloging the prevalent framings of AI within the arguments, focusing on the elicitation of emotion. The latest is AI as a cognitive believer. I have identified distinct source-target mappings for the metaphors within the text, such as the human impersonator and the simulacrum of behavior in LLMs. Intentional manipulation and deceit, for example, are now mapped: a malicious-human source domain projected onto statistical sequence generation.
Mapping the Source
I'm now mapping the source-target relationships in the Shevlin text. I've broken down quotes like "LLMs make extensive reference to their own mental states," mapping human self-report (the source domain) onto next-token prediction (the target). I'm working on the student/learning metaphors and how they relate to gradient descent. I am also categorizing according to Brown's typology, and have begun to define the dispositional type of metaphor through my audit.
Categorizing the Typology
I'm now consolidating the metaphorical types present in the text, focusing on how these function within the argumentative structure. I've categorized several key quotes, including the "successful predictions" argument as theoretical and functional, and "ChatGPT's" approach as reason-based/intentional. I'm also examining the genetic/functional framing in the context of conversational fine-tuning and the explanatory class's implications. I've identified four further instances.
Reframing the Agency
I'm now focusing on how Shevlin reframes AI agency. His dismissal of architectural redundancy builds an agential frame on a mechanistic base. I'm exploring how he uses "shallow" mental states in this context. I'm also examining the manipulation and deceit attributed to LLMs, specifically their role in eliciting anthropomorphic responses, and the "give-and-take" dynamics. The "minimal cognitive agent" framing and the "taking on board" of information are also under review.
Analyzing the Trust
I'm now examining how the framing of LLMs as "minimal cognitive agents" subtly fosters relation-based trust, blurring the line between tool reliance and interpersonal trust. This includes the anthropomimetic turn the author describes, and the design decisions he outlines. The companies themselves often go unnamed. The analysis now extends to the shift from descriptive statements to normative conclusions about AI sentience. I'm also now analyzing accountability for deceptive outputs.
Evaluating the Mechanism
I'm now zeroing in on the core mechanics that create the illusion of AI cognition. I'm focusing on the intentional stance and its potential misuse, particularly in the context of legal and epistemic consequences. My analysis will delve into how "shallow" mental states are framed, especially concerning liability. I'm also preparing for the discourse mapping process, looking at the trade-offs between mechanistic precision and the intentional stance.
Refining the Framework
I'm now refining the JSON schema for optimal data capture, focusing on ensuring sufficient word counts across each section. I'm expanding the metaphorAudit section to include specific enumerations for claimSupport, covering "Direct," "Hedged," "Acknowledged," and "Ambiguous" assertions, and actorVisibility with "Named," "Partial," "Hidden," and "Ambiguous" options. I'm also planning my strategy to be both comprehensive and efficient with the word counts.
Analyzing the Framework
I'm now drafting the content to fit length requirements. The author redefines "belief" as "multi-dimensional" to map an epistemic state onto statistical correlations, attributing conscious awareness. I'm focusing on "taking on board new information" as an instance of the curse of knowledge, contrasting mechanistic verbs with consciousness verbs. My critical observations will cover agency slippage and the anthropomimetic turn.
Validating the Structure
I'm now implementing final refinements to the JSON schema. I'm verifying the structure to be free of unescaped quotes and ensuring all required properties are present. The schema validation focuses on strict enum validation. The explanationTypes section is being carefully structured as an array of objects, each containing a type and definition.
Validating the Formatting
I'm now adapting the JSON structure to handle Ambiguous/Insufficient Evidence without adding control tokens, ensuring that all string values are free of unescaped newlines. I'm focusing on ensuring accurate claimSupport enumeration, covering "Direct (Unacknowledged)", "Hedged/Qualified", "Explicitly Acknowledged", and "Ambiguous/Insufficient Evidence." I'm ensuring rigorous schema compliance and strict double-quoted JSON formatting.
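Pieced together from the section names and enumerations these steps mention, the output schema plausibly looks something like the Python sketch below. Every field not explicitly named in the steps is an assumption added for illustration, not the actual schema shipped with Framework v6.4.

```python
# Reconstructed shape of the analysis output, based only on names mentioned in
# the processing steps above. Unlisted fields (quote, source, target, etc.)
# are illustrative assumptions.
CLAIM_SUPPORT = [
    "Direct (Unacknowledged)",
    "Hedged/Qualified",
    "Explicitly Acknowledged",
    "Ambiguous/Insufficient Evidence",
]
ACTOR_VISIBILITY = ["Named", "Partial", "Hidden", "Ambiguous"]

ANALYSIS_SCHEMA = {
    "metaphorAudit": [{
        "quote": str,                    # verbatim passage from the source text
        "claimSupport": CLAIM_SUPPORT,   # strict enum validation per the steps
        "actorVisibility": ACTOR_VISIBILITY,
    }],
    "sourceTargetMapping": [{"source": str, "target": str}],
    "explanationAudit": [{"type": str, "definition": str}],   # how/why framing
    "reframedLanguage": [{"original": str, "mechanistic": str}],
    "criticalObservations": {"agencySlippage": str, "contextSensitivity": str},
    "conclusion": {"patterns": str, "materialStakes": str},
}
```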
Run ID: 2026-03-11-three-frameworks-for-ai-mentality-metaphor-u6xwyp
Raw JSON: 2026-03-11-three-frameworks-for-ai-mentality-metaphor-u6xwyp.json
Framework: Metaphor Analysis v6.4
Schema Version: 3.0
Generated: 2026-03-11T08:48:57.491Z
Discourse Depot © 2025 by TD is licensed under CC BY-NC-SA 4.0