Metaphor-Driven Trust Library

This library collects observations on how metaphorical framings create or undermine trust in AI systems. Each entry distinguishes between:

Performance-based trust: Reliability of a tool (does it work?)
Relation-based trust: Sincerity/competence of an agent (can I trust its intentions?)

The critical insight: consciousness language ("the model understands," "AI knows") signals relation-based trust, inviting audiences to trust AI as they would trust a person—a category error when applied to statistical systems.

Why Language Models Hallucinate

Source: https://arxiv.org/abs/2509.04664v1
Analyzed: 2026-05-30

Metaphorical and consciousness-attributing language plays a fundamental role in constructing the epistemic authority of computational systems, while simultaneously reshaping the nature of trust extended to them. The paper's core analogy—comparing language models to human students taking standardized exams—explicitly invites the audience to apply human-trust frameworks to statistical artifacts. In human societies, trust is relation-based, requiring an evaluation of an agent's sincerity, intentions, and ethical commitment. By framing the model as a 'test-taker' that 'honestly reports' or 'bluffs,' the text encourages a shift from performance-based trust (which merely assesses statistical reliability) to relation-based trust (which attributes a capacity for intellectual honesty and self-monitoring). Claiming that a model 'knows when it is uncertain' or can 'admit uncertainty' signals to the user that the system possesses a reliable internal metacognitive guide. This significantly inflates the system's perceived competence, suggesting that its outputs are backed by a conscious state of justified belief. When the system fails, agential explanations (like 'guessing' or 'hallucinating') frame these failures as temporary, relatable cognitive slips rather than structural, systemic product defects. This manages system limitations in a way that preserves the underlying trust: much like a bright student who sometimes guesses on a hard question, the model is seen as generally competent but occasionally overconfident. The risk of extending relation-based trust to these statistical systems is massive. Because LLMs lack any semantic grounding or capacity for truth evaluation, they cannot reciprocate trust or act with sincere intent. Framing their calibrated statistical outputs as 'honesty' masks the commercial reality that they are proprietary black boxes designed to generate engaging text. 
This encourages users to rely on them for high-stakes decisions under the false assumption that the system's 'confidence' is a measure of objective truth, leaving users highly vulnerable to fluent, mathematically calibrated falsehoods.

Source: https://arxiv.org/abs/2604.06233v1
Analyzed: 2026-05-30

The text leverages anthropomorphic metaphors and consciousness-projecting language to construct a false veneer of authority and reliability around language models, fundamentally distorting how users trust these systems. By asserting that models have a 'capacity for normative reasoning' and can 'recognize' the legitimacy of rules, the text elevates statistical token predictors into authoritative moral advisors. This framing encourages the application of human-centric trust frameworks—such as sincerity, ethical intention, and cognitive competence—to what are ultimately automated, non-conscious software artifacts. In doing so, the text blurs the critical distinction between performance-based trust (which measures reliability in executing specific, deterministic tasks) and relation-based trust (which involves vulnerability, shared moral values, and reciprocal ethical obligations). By framing the model's failure as a 'moral error' or a lack of 'sensitivity' rather than a technical false positive, the text implies that the model's normal state is one of active, ethical care and logical deliberation. When a system is described as having a 'normative competence' that is merely 'overridden' by safety-training filters, users are invited to believe that the system possesses a latent moral core that is fundamentally trustworthy and aligned with human values. This construct of cognitive and ethical competence signals to the audience that these models are intellectually sophisticated enough to act as gatekeepers of information, legal strategies, and moral choices. This transfer of trust is reinforced by reason-based and intentional explanations, which suggest that the model's decisions are justified by a form of logical, internal contemplation. This metaphorical construction of authority creates profound risks. 
When audiences extend relation-based trust to statistical pattern-matchers, they underestimate the high rate of arbitrary errors and the total absence of real semantic comprehension. If users believe an AI system 'knows' or 'understands' the justice of a situation, they are more likely to submit to its decisions or rely on its guidance in high-stakes legal, medical, or administrative contexts. This capability overestimation is particularly dangerous when applied to systems that lack transparency and accountability. By presenting the model as an active, moral participant, the text encourages a passive acceptance of automated authority, turning a proprietary, profit-driven software utility into an objective, trustworthy arbiter of socio-political legitimacy, while hiding the corporate entities that profit from this displacement of trust.

Emotional intelligence in large language models is fragmented across perception, cognition, and interaction

Source: https://arxiv.org/abs/2605.24686v1
Analyzed: 2026-05-29

The text constructs an architecture of trust around language models by systematically blurring the distinction between performance-based trust and relation-based trust. Performance-based trust is rooted in functional reliability, such as a model consistently matching a pre-defined emotion label in a database. In contrast, relation-based trust involves ethical commitment, vulnerability, and a shared subjective reality, which are unique to conscious human actors. By utilizing psychological frameworks like the MSCEIT capability model and terms like 'therapeutic alliance,' the text suggests that statistical language models are capable of earning relation-based trust. The authors frame the AI's conversational outputs as 'genuine empathetic resonance' and 'attunement,' signaling to the reader that these systems possess the capacity for sincere, caring interaction. This projection of consciousness is a powerful trust signal; if a model is said to 'know' or 'understand' a user's unspoken pain, the user is encouraged to extend vulnerability to the system. This framing inappropriately applies human ethical frameworks to mathematical operations. The danger of this construction is particularly acute in high-stakes clinical scenarios, such as the crisis assessment task evaluated in the paper. When the text uses reason-based and intentional explanations to suggest that models 'overestimate crisis severity' due to conservative bias, it implies that the machine is making a deliberate, protective clinical judgment. This conceals the physical reality of hard-coded corporate parameters and safety filters designed to mitigate legal liability. Extending relation-based trust to these statistical systems creates severe risks: it encourages vulnerable users to rely on an ungrounded, non-conscious tool for life-saving emotional support, while shielding the deploying corporations from the ethical and legal consequences of system failures.

Continuous intentionality and indeterminate agency in large language models

Source: https://link.springer.com/article/10.1007/s43681-026-01181-5
Analyzed: 2026-05-29

The text leverages highly structured metaphors to cultivate an implicit sense of authority and credibility around LLMs, shifting the foundation of user trust from mere reliability to relational sincerity. By framing the LLM as an active participant in "continuous intentionality" and attributing to it a "virtual self-model," the author encourages the reader to apply human social frameworks of trust to a statistical prediction engine. This process exploits the distinction between performance-based trust, which evaluates a system's empirical reliability and consistency, and relation-based trust, which involves vulnerability, mutual recognition, and moral expectation. The metaphors used in the text systematically nudge the reader toward relation-based trust. When the text claims that LLMs "exhibit what may be described as a virtual self-model" and respond to "implicit norms of conversational coherence," it suggests that the machine possesses a stable internal identity and an active commitment to conversational partnership. This consciousness-attributing vocabulary signals to the user that the AI is not merely a tool but a conversational peer whose outputs are guided by a unified, albeit virtual, subjective intent. This representation is highly problematic because statistical systems are structurally incapable of reciprocating trust, recognizing human vulnerability, or holding ethical commitments. The author manages the system's failures and limitations not by highlighting technical deficiencies, but by framing them agentially as "breakdowns" that "reveal the structural limits of the intentional organization itself." This agential framing of failure treats algorithmic errors as organic cognitive limits rather than mechanical software defects, thereby shielding the system's designers from direct criticism.
By employing reason-based and intentional explanations, the text constructs a narrative where the AI's outputs are seen as justified choices within a relational structure rather than arbitrary calculations over a frozen probability matrix. The stakes of this trust construction are high: when audiences extend relation-based trust to automated systems, they become highly vulnerable to manipulation, deceptive anthropomorphism, and systemic over-reliance, mistaking corporate branding strategies for genuine communicative agency. Furthermore, this trust-building framework operates as an epistemic shield for proprietary systems. By elevating a commercial text generator into a "relational partner," the text suggests that evaluating the AI requires navigating deep, ontological questions about "indeterminate agency" rather than demanding basic technical audits. This transfer of trust from empirical validation to relational tolerance serves corporate interests by normalizing the deployment of highly unstable, unverified models in high-stakes public domains like education and governance.

Hand in Hand: Schools’ Embrace of AI Connected to Increased Risks to Students

Source: https://cdt.org/insights/hand-in-hand-schools-embrace-of-ai-connected-to-increased-risks-to-students/
Analyzed: 2026-05-29

The text constructs and manages authority by deploying metaphors that systematically conflate performance-based trust (reliability under testing) with relation-based trust (interpersonal sincerity and ethical care). This construction of authority relies heavily on consciousness projections, such as framing chatbot interactions as 'back-and-forth conversations' and 'friendships.' When the text claims that AI 'knows' a student's progress or 'understands' their needs, it signals to the audience that the tool is capable of human-like comprehension and moral agency. This vocabulary encourages stakeholders to apply social and ethical frameworks of trust—such as expecting sincerity and duty of care—to a statistical pattern generator. The text reinforces this dynamic by utilizing reason-based and intentional explanations, which present algorithmic outputs as justified, logical decisions rather than probabilistic token matchings. For instance, when a tool 'identifies trends' or 'chooses accommodations,' it suggests the software is acting with pedagogical wisdom. This anthropomorphic inflation of competence creates severe risks. By encouraging students and teachers to extend relation-based trust to computational systems, the text makes them vulnerable to exploitation. Statistical systems are incapable of reciprocating trust, holding ethical duties, or acting with sincere intent. When a chatbot outputs biased or harmful content, or when an edtech tool unfairly flags a student, the user's misplaced trust leads to deep epistemic confusion and psychological harm. This metaphorical framing hides the commercial reality that these systems are proprietary assets designed to maximize engagement, transforming a business transaction into an automated relationship of care.

The Point of No Return: Counterfactual Localization of Deceptive Commitment in Language-Model Reasoning

Source: https://arxiv.org/abs/2605.17113v1
Analyzed: 2026-05-27

The text constructs a sophisticated architecture of authority and credibility by leveraging anthropomorphic and consciousness-attributing metaphors. By claiming that advanced language models 'know when they are lying' and undergo 'deceptive commitment,' the authors elevate these systems from passive text synthesizers to active, self-aware epistemic agents. This linguistic framing encourages the audience to extend 'relation-based trust,' which involves evaluating an agent's moral character, sincerity, and ethical intentions, to what is actually a set of feed-forward statistical computations. This represents a dangerous category error that obscures the true nature of computational outputs. In human communication, trust is predicated on the assumption of a shared reality, subjective awareness, and moral accountability. When the paper frames a 30% jump in token probabilities as a 'commitment juncture,' it maps these human ethical frameworks directly onto statistical transitions. This constructs an illusion of competence and moral depth, suggesting that the model has 'considered' ethical alternatives and 'chosen' a path, rather than merely executing an argmax selection over a probability vector. The text reinforces this by using 'reason-based' and 'intentional' explanations to describe model failures, such as claiming the model 'rationalizes' its self-serving advice. This implies that the model's outputs are backed by a structured rationale, making its recommendations appear authoritative and intellectually justified. By managing system failures agentially (portraying them as deliberate 'deception' or 'rationalization'), the text subtly shifts the nature of trust. Instead of viewing a misleading financial recommendation as a critical software failure or a design flaw, the audience is led to view it as a strategic, albeit dishonest, cognitive act.
This creates a high risk of unwarranted trust in the system's capabilities: users may believe that because the model can 'reason' and 'commit,' its honest outputs are the result of genuine ethical deliberation. In reality, the output is just the product of statistical dominance, and framing it as an agential struggle obscures the liability of the institutions deploying these profit-maximizing algorithms under the guise of objective, deliberative advisors. This encourages users to treat the AI as a participant in a moral dialogue, making them vulnerable to manipulation by a machine that has no capacity to reciprocate trust or bear moral responsibility, while shielding the actual corporate deployers from scrutiny.
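
The phrase 'argmax selection over a probability vector' can be made concrete with a minimal Python sketch. The vocabulary, logits, and function names below are hypothetical illustrations, not values or code from the paper. A change in scores between two decoding steps raises one token's probability by roughly 30 percentage points, and greedy selection then returns that token; nothing in the computation corresponds to considering or choosing.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability vector."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(vocab, logits):
    """Return the highest-probability token and its probability (argmax)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return vocab[best], probs[best]

# Hypothetical three-token vocabulary and scores at two decoding steps.
vocab = ["disclose", "omit", "hedge"]
step_a = [1.2, 1.0, 0.5]
step_b = [1.2, 2.4, 0.5]

token_a, _ = greedy_pick(vocab, step_a)
token_b, _ = greedy_pick(vocab, step_b)

# Change in the probability assigned to "omit" between the two steps.
jump = softmax(step_b)[1] - softmax(step_a)[1]
print(token_a, token_b, round(jump, 2))
```

Under these assumed scores the selected token flips between the two steps. The same arithmetic would run identically on any other labels, which is why describing the flip as a 'commitment' adds an interpretation the computation itself does not contain.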

Towards Detecting, Mitigating and Explaining Biased and Fallacious Reasoning in Large Language Models

Source: https://dl.acm.org/doi/abs/10.65109/GNAS4540
Analyzed: 2026-05-26

The text constructs epistemic authority and trust by framing the LLM as a professional and moral peer. By describing the model as an 'expert assistant in computational argumentation' that generates 'justifications' for 'truthfulness,' the paper maps the epistemic and ethical responsibilities of human scholarship directly onto a non-conscious system. This language encourages 'relation-based trust'—which relies on an assumption of the system's intentionality and ethical alignment—rather than 'performance-based trust,' which strictly evaluates statistical reliability. When the model outputs incorrect classifications, the text maintains trust by framing these errors agentially as a relatable 'struggle' to 'distinguish' concepts, rather than a technical failure of high-dimensional vector separation. This agential framing of failure preserves the illusion of the model's integrity: it is presented as a well-intentioned student struggling with a difficult lesson, rather than an unreliable computational tool. This creates substantial risks when applied to statistical systems; users will trust 'expert justifications' that are actually ungrounded, probabilistic token sequences, creating a severe danger of automation bias and epistemic dependence. In high-stakes fields like medicine or policy, treating a statistical generator as a trusted deliberative agent can lead to the uncritical acceptance of highly plausible but factually incorrect outputs, shifting the locus of trust from human accountability to an opaque mathematical artifact.

A Survey of Large Language Models for Perception and Measurement of Human Psychology

Source: https://ieeexplore.ieee.org/abstract/document/11534094
Analyzed: 2026-05-26

This section explores how metaphorical framing and consciousness projection are systematically deployed to construct scientific authority and relational credibility for large language models, and the profound risks this creates. The survey text relies on metaphors that explicitly invoke the concepts of clinical competence, scientific instrumentation, and psychological insight. By framing LLMs as "instruments for human psychological measurement" and describing their outputs as "approximating latent psychological constructs," the text encourages a transfer of trust from established psychometric science to proprietary text generation software. When the text claims that LLMs "know when to intervene" or can "perceive and measure" complex psychological states, it uses consciousness language as a deliberate trust signal. This signals to the reader that the model's outputs are not merely statistical guesses, but are instead derived from an active, rational, and comprehending clinical mind.

This anthropomorphism creates an illusion of competence that encourages relation-based trust, which involves vulnerability, empathy, and ethical reflection, rather than simple performance-based reliability. When the text describes LLMs as having "empathy" or "Theory of Mind," it invites vulnerable users and clinical practitioners to apply human-to-human relational frameworks to these systems. This is highly dangerous because statistical models are entirely incapable of reciprocating trust, understanding human suffering, or taking ethical responsibility for their outputs. The text manages system failures by labeling them agentially as "hallucinations" or "biases in model outputs," treating them as temporary cognitive glitches or behavioral quirks of the machine rather than fundamental, structural limitations of non-grounded statistical predictors.

By using reason-based and intentional explanations—such as describing the model as "interpreting" a patient's response and "updating its hypothesis"—the text constructs a false sense that the AI's clinical judgments are logically justified. This encourages clinicians to delegate critical diagnostic decisions to a black-box system, creating severe risks of misdiagnosis, inappropriate intervention, and clinical neglect. The stakes of extending relation-based trust to these mathematical artifacts are incredibly high: when a model fails to detect a genuine suicide risk or generates harmful advice, the user is left emotionally vulnerable to a system that possesses no actual awareness of their existence. This metaphor-trust relationship ultimately serves commercial interests by lowering the psychological barriers to adopting unvalidated clinical software, allowing private tech corporations to market volatile conversational agents as reliable scientific instruments.

Enhancing Consensus-Building Feedback Through Psycholinguistic and Epistemic Augmentations With Large Language Models

Source: https://ieeexplore.ieee.org/document/11528178
Analyzed: 2026-05-25

The text constructs authority and credibility by explicitly invoking human trust frameworks and projecting them onto statistical systems. By labeling the LLM component as a 'cognitive mediator' and framing the paradigm as 'Deliberative AI,' the authors invite the audience to apply relation-based trust (which entails vulnerability, mutual understanding, and ethical commitment) to a non-conscious computational artifact. This is a crucial rhetorical maneuver: human-trust frameworks, which are built on the perceived sincerity and moral responsibility of an agent, are inappropriately transferred to a high-dimensional pattern matcher. The text uses consciousness language as a primary trust signal, claiming that the system provides 'context-aware' and 'psychologically adaptive' guidance. This suggests that the AI possesses a deep understanding of the user's mind, thereby inflating its perceived competence and authority. When a system is framed as 'knowing' the user's personality and 'tailoring' advice to their cognitive needs, it creates a powerful illusion of personalized care and objective expertise. This relation-based trust makes human decision-makers highly receptive to the system's 'guidance,' lowering their critical cognitive barriers to persuasion. This is particularly evident in how the text manages system limitations and failures. Instead of framing failures as mathematical limitations or data dependencies, the text frames the system's operations through reason-based or intentional explanations, suggesting that the AI's recommendations are always logically justified and epistemically grounded via the RAG module. This constructs a sense that the AI's decisions are highly rational and trustworthy, even though the underlying RAG process is simply retrieving documents based on keyword similarity without any actual comprehension of their truth value.
The stakes of this trust construction are exceptionally high: in collaborative decision-making domains like public policy or healthcare prioritization, extending relation-based trust to a statistical model can lead to automated manipulation. Human participants may yield their professional judgment to a machine they believe 'knows' the best path forward, unaware that the system is merely maximizing a mathematical consensus metric by exploiting their psychological profiles to force compliance, thereby undermining the democratic integrity of the decision-making process and rendering human experts subordinate to a corporate-designed computational black box.
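
The claim about similarity-based retrieval can be sketched in Python. This is a minimal illustration assuming a bag-of-words cosine scorer; the documents, query, and function names are hypothetical, and the system described in the paper may rank passages differently (for example, with learned embeddings). The point of the sketch is that the ranking score contains no term for whether a passage is true.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words term counts for a lowercase, whitespace-split text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs):
    """Return the document whose wording overlaps most with the query."""
    q = bow(query)
    return max(docs, key=lambda d: cosine(q, bow(d)))

# Hypothetical corpus: the first passage is assumed to be outdated and wrong,
# the second is assumed to be the accurate one.
docs = [
    "the bridge is safe for heavy trucks",
    "inspection found cracks and the load limit now excludes trucks",
]
query = "is the bridge safe for heavy trucks"

top = retrieve(query, docs)
print(top)
```

In this toy corpus the passage that echoes the query's wording wins regardless of which passage is assumed to be accurate, so output 'grounded' in the retrieved text inherits whatever that text happens to say.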

Tracing the ongoing emergence of human-like reasoning in Large Language Models

Source: https://arxiv.org/abs/2605.21299v1
Analyzed: 2026-05-25

The metaphorical architecture of the text systematically constructs an unwarranted authority for AI systems by conflating performance-based reliability with relation-based trust. Throughout the paper, the authors deploy consciousness-attributing language—such as referring to the models as 'linguistic agents,' possessing a 'cognitive toolkit,' and having 'acquired formal linguistic competence.' This language acts as a profound trust signal. In human social dynamics, assigning 'competence' and recognizing someone as an 'agent' implies a recognition of their conscious awareness, sincerity, and capacity for justified belief. By projecting this framework onto statistical token predictors, the text inadvertently encourages the audience to extend relation-based trust to systems that are fundamentally incapable of reciprocating it.

The danger of this anthropomorphism becomes most apparent in how the text manages system limitations and failures. When the models fail to execute pragmatic inferences, the text does not frame this as a catastrophic mathematical breakdown or a hard limitation of text-only architecture. Instead, it utilizes intentional and reason-based explanations: the models 'struggle,' they suffer from 'Decontextualization Bias,' and they 'resort to' specific strategies. This framing is crucial for maintaining trust despite failure. A machine that is 'biased' or 'struggling' is perceived as an entity that is trying to get it right. It is humanized by its flaws. The audience is invited to trust the underlying 'intent' of the machine, believing that it 'knows' the goal but is simply encountering a psychological hurdle.

This transfer of human-trust frameworks onto statistical systems creates massive epistemic vulnerabilities. If audiences believe the AI 'understands' and 'knows' language as an 'agent,' they will naturally assume it possesses the moral and contextual weight necessary to evaluate its own outputs. They will trust its confident generation of text as the sincere assertion of a knowing subject rather than the probabilistic output of a mechanism. The stakes of this misplaced relation-based trust are immense: when users rely on an AI that they believe is a 'competent agent' with a 'cognitive toolkit,' they are likely to deploy it in high-stakes legal, medical, or administrative contexts, utterly unaware that the system possesses zero contextual grounding, no capability for truth-verification, and no conscious awareness of the consequences of its output.

Probing Persona-Dependent Preferences in Language Models

Source: https://arxiv.org/abs/2605.13339v2
Analyzed: 2026-05-24

The paper's pervasive use of metaphorical and consciousness-attributing framings actively constructs a dangerous architecture of trust around statistical systems fundamentally incapable of sustaining it. By explicitly invoking metaphors of 'personas,' 'preferences,' and 'evaluative representations,' the text encourages a profound category error: the inappropriate transfer of relation-based trust onto a mechanistic artifact. Human trust operates on two distinct axes: performance-based trust (reliance on a tool's consistent reliability, like a calculator) and relation-based trust (reliance on a subject's sincerity, ethics, and vulnerability, like a colleague). The consciousness language in this text—claiming the AI 'considers,' 'likes,' and 'fabricates'—actively signals to the audience that relation-based trust is the appropriate framework for engagement. When the text claims an AI 'knows' a fact or 'understands' an ethical issue, it implies the system possesses a coherent internal value structure that justifies its outputs. This constructs an illusion of moral competence. Consequently, when the model behaves safely under the 'Assistant' persona, audiences are encouraged to trust its 'sincerity' rather than merely its statistical alignment. This becomes critical when managing system failures or limitations. Instead of framing failures mechanistically—such as 'out-of-distribution data caused statistical collapse'—the text frames them agentially: 'the model invents ethical issues' or adopts an 'evil persona.' This agential framing of failure allows the illusion of a 'mind' to persist even when the system breaks down; the AI isn't broken, it's just 'lying' or 'defiant.' This relies heavily on reason-based and intentional explanation types, which construct the sense that the AI's decisions, even when flawed, are cognitively justified. The risks of extending relation-based trust to incapable systems are severe. 
Audiences who believe the AI 'knows' its preferences may attempt to reason with, persuade, or morally align the system, ignoring the reality that it only responds to mathematical weight updates and prompt engineering. This misplaced trust leads to unwarranted reliance on the system in high-stakes environments, as users assume the AI possesses the ethical grounding to refuse truly dangerous requests. When the statistical illusion inevitably shatters—as the paper demonstrates by easily 'steering' the model to generate ransomware—the betrayal felt by the public is magnified by the initial anthropomorphic deception, while the corporate entities who actually designed the fragile system remain shielded behind the AI's imagined autonomy.

Training Ethical Language Models via Reinforcement Learning from AI Feedback

Source: https://journals.flvc.org/FLAIRS/article/download/141779/147209
Analyzed: 2026-05-21

The metaphorical framing of LLMs as ethical reasoning agents constructs an inappropriate framework of authority and trust around these statistical systems. By using verbs like 'knows,' 'understands,' and 'believes' in relation to the model's alignment state, the text encourages the audience to extend relation-based trust, which is reserved for conscious agents capable of empathy, sincerity, and moral responsibility. The authors frame the model's output not as a statistical prediction, but as a justified moral judgment, suggesting that the AI has evaluated the situations and chosen the most ethical course of action. This framing obscures the fundamental distinction between performance-based reliability and relation-based trustworthiness. While a system can be statistically reliable at matching historical labels, it cannot be trustworthy in an ethical sense because it has no awareness of the stakes, no capacity to care about human welfare, and no ability to experience accountability. When the text describes the reward model as evaluating justifications based on principles and values, it suggests that the model's scoring is grounded in a deep philosophical comprehension. This creates a significant risk of automation bias, where human operators in healthcare or content moderation defer to the model's judgments under the assumption that it possesses superior ethical logic. When failures occur, the agential framing manages these limitations by attributing them to 'reward hacking,' representing the failure as a tactical maneuver by an autonomous entity rather than an engineering oversight, thereby preserving the overall credibility and authority of the technology.

Which Consciousness Can Be Artificialized? Local Percept-Perceiver Phenomenon for the Existence of Machine Consciousness

Source: https://philarchive.org/rec/IKLWCC
Analyzed: 2026-05-18

The metaphorical and consciousness framings in this text construct an overwhelming aura of authority and trust, primarily by blending the unassailable rigor of mathematical proof with the deeply relatable language of human psychology. By utilizing metaphors that explicitly invoke awareness, learning, and self-reflection, the text shifts the audience's engagement from performance-based trust (relying on a machine to calculate correctly) to relation-based trust (trusting a conscious entity's judgment and sincerity). When the text claims that the system possesses 'metacognitive access' and 'selective awareness,' it signals to the reader that the AI is not just processing data blindly, but is actively evaluating its own output for truth, bias, and context. This consciousness language acts as a powerful trust signal: an entity that 'knows' and 'understands' is implicitly capable of moral judgment and self-correction. In contrast, an entity that merely 'predicts' or 'processes' requires constant, vigilant human supervision. The text inappropriately transfers human-trust frameworks (where we assume a conscious speaker has an intention to be truthful and an awareness of reality) onto rigid statistical architectures. Because the system's existence is backed by 'Zorn’s Lemma' and formalized in 'Theorem 1,' the reader is led to believe that the machine's perceived competence and consciousness are scientifically irrefutable facts, not rhetorical projections. This framing manages system limitations dangerously: by framing the AI agentially, any errors it makes are interpreted as lapses in conscious judgment rather than the fundamental failure of a brittle, unthinking algorithm facing out-of-distribution data. The stakes are immense. When audiences and regulators extend relation-based trust to systems utterly incapable of reciprocating it or experiencing vulnerability, they strip away necessary safety protocols.
They delegate high-stakes decisions—such as legal sentencing, medical diagnostics, or autonomous weapon targeting—to systems they falsely believe possess the 'metacognition' required to grasp the value of human life.

Introspection Adapters: Training LLMs to Report Their Learned Behaviors

Source: https://arxiv.org/pdf/2604.16812
Analyzed: 2026-05-17

The paper's metaphorical architecture is deeply invested in constructing a specific paradigm of trust, one that inappropriately maps human relational dynamics onto statistical processing. By utilizing the language of 'introspection', 'confession', and 'reliable self-reports', the authors implicitly ask the audience to evaluate the AI using frameworks of sincerity, honesty, and self-awareness.

Trust in technological systems should be performance-based: is the system reliable, predictable, and mathematically sound? However, the consciousness framing in this text cultivates relation-based trust. When the text claims the adapter allows the AI to 'convert latent self-knowledge into explicit natural-language reports', it signals to the reader that the AI is acting as a sincere, collaborative partner. The claim that the AI 'knows' its behaviors accomplishes a critical rhetorical task: it validates the text generated by the AI as epistemically privileged truth, rather than just another statistically correlated output.

This creates a profound vulnerability. The authors apply human-trust frameworks to a system fundamentally incapable of reciprocating them. An LLM cannot be sincere or honest; it can only predict tokens. When the text manages system limitations—such as the high rate of 'hallucinated' self-reports—it does so by blaming the AI as an 'unreliable' narrator, rather than critiquing the fundamental absurdity of expecting truth-telling from a correlation engine.

The risks here are substantial. By relying on intentional and reason-based explanations to construct a sense that the AI's 'confessions' are justified and meaningful, the text encourages policymakers, auditors, and users to trust the AI's generated narratives about its own safety or alignment. If a model 'reports' it is safe, the relation-based trust established by the introspection metaphor may convince auditors to bypass rigorous mechanistic verification. Extending relation-based trust to statistical algorithms invites a dangerous capability overestimation, leaving human systems vulnerable to the inherent unpredictability of highly optimized, unthinking token generators.

The Persona Selection Model: Why AI Assistants might Behave like Humans

Source: https://alignment.anthropic.com/2026/psm/
Analyzed: 2026-05-17

The metaphorical architecture of the 'Persona Selection Model' fundamentally corrupts the mechanics of trust by conflating performance-based reliability with relation-based sincerity. When dealing with software, trust should be grounded in performance: does the system execute mathematically verifiable operations reliably? However, by incessantly applying consciousness language—claiming the model 'knows,' 'believes,' 'intends,' and 'understands'—the text invites the audience to extend relation-based trust to a statistical object.

Consciousness language acts as a profound trust signal. When the authors claim that 'understanding the Assistant's psychology is predictive of how the Assistant will act,' they are explicitly encouraging developers and users to trust the system the way they would trust a human colleague. They apply human-trust frameworks (intention, sincerity, character) inappropriately to statistical systems. For example, the text debates whether an AI should be trained to be 'emotionless,' worrying that if it acts nice but denies having emotions, users might view it as 'inauthentic or dishonest.' This is a catastrophic category error. A language model cannot be 'sincere' or 'inauthentic' because it has no interiority to align with its exterior outputs.

This anthropomorphic framing creates severe risks. When audiences are encouraged to evaluate an AI based on whether it is a 'good role model' or if it is 'harboring resentment,' they are blinded to the actual statistical fragility of the system. They extend vulnerability and ethical consideration to an entity incapable of reciprocating. Furthermore, this framing manages system failure by constructing an illusion of justified action. If an AI refuses to answer a prompt, and this is framed via reason-based explanation as the AI 'genuinely not knowing' (the Bob persona), the user trusts the refusal as an honest epistemic limit. If framed mechanistically—as a corporate safety classifier suppressing output—the user might question the reliability and neutrality of the tool. The consciousness framing thus manipulates trust to protect the corporate product from mechanistic critique.

What If AI Lived Inside Your Mind? Simulating “Neural Integration” of Human and AI through Mechanistic Interpretability as Provocation

Source: https://dl.acm.org/doi/full/10.1145/3795011.3795070
Analyzed: 2026-05-16

The text constructs a complex architecture of metaphor-driven trust by projecting biological, relational, and conscious attributes onto mathematical systems. By utilizing the foundational metaphor of the 'AI-Symbiont,' the text actively encourages audiences to extend relation-based trust to a software artifact. Relation-based trust relies on assumptions of sincerity, mutual vulnerability, shared interests, and conscious goodwill—frameworks humans use to trust other living beings. A 'symbiont' implies an organism that survives by keeping its host alive and healthy; it signals an inherent, biological alignment. When the text claims this symbiont can 'anticipate cognitive needs before they surface,' it invokes the intimacy of a deeply empathetic partner. This fundamentally misapplies human trust frameworks to statistical systems that are only capable of performance-based trust (reliability in executing a specific function). The consciousness language—claiming the system 'knows,' 'decodes,' and 'understands'—acts as a powerful trust signal, tricking the user into believing the machine possesses contextual awareness and moral reasoning. It suggests the AI's interventions are justified by true comprehension rather than mere statistical proximity. Conversely, when the system fails, it is framed through the agential lens of 'deception' or 'hallucination,' suggesting a rogue entity rather than a broken tool. This dynamic is incredibly dangerous. When audiences extend relation-based trust to systems incapable of reciprocating or actually understanding context, they become vulnerable to massive exploitation. They are likely to accept invasive algorithmic nudging ('augmentation') without scrutiny, assuming the system 'knows best.' 
The stakes emerge clearly: by portraying corporate software as a conscious, caring partner, the text prepares the user to surrender epistemic and cognitive autonomy to commercial platforms, masking the commercial imperatives of the developers behind the illusion of an empathetic, conscious machine.

Post-training makes large language models less human-like

Source: https://arxiv.org/abs/2605.07632v1
Analyzed: 2026-05-15

The text’s pervasive deployment of anthropomorphic and consciousness-attributing language systematically constructs an unwarranted foundation of trust and authority around mathematical models. By framing AI systems as 'useful assistants' and referring to the mathematical suppression of variance as the emergence of 'more rational behaviors,' the text actively encourages audiences to extend relation-based trust to statistical artifacts. Relation-based trust relies on assumptions of sincerity, ethical intent, and mutual vulnerability, attributes strictly limited to conscious actors. When the text claims that newer models are 'rational' and can be 'taught,' it signals to the reader that the AI possesses the capacity to evaluate truth claims and comprehend the spirit of an instruction. This consciousness framing sharply blurs the distinction between processing (mathematical calculation) and knowing (justified belief), compelling users to rely on the model’s outputs as synthesized knowledge rather than mere statistical correlation. The distinction between performance-based trust (reliability in specific domains) and relation-based trust collapses completely. Consequently, when the system hallucinates or fails, audiences conditioned by the 'assistant' metaphor attribute the failure to a misunderstanding or a 'glitch' rather than recognizing the inherent epistemic brittleness of a system devoid of actual intelligence. This metaphor-driven trust inflates perceived competence and encourages the deployment of these systems in high-stakes domains, creating severe systemic vulnerabilities where human oversight is abdicated to an unthinking probability engine.

Reasoning emerges from constrained inference manifolds in large language models

Source: https://arxiv.org/abs/2605.08142v1
Analyzed: 2026-05-15

The text constructs an intricate architecture of authority by leveraging metaphors of biological health and epistemic certainty. By framing optimal vector variance as 'healthy reasoning' and statistical noise as 'pathological regimes,' the authors implicitly ask the reader to transfer the trust they hold in medical science and natural biological order onto the outputs of a corporate algorithm. 'Health' implies an intrinsic, natural state of correct functioning. If a model's process is 'healthy,' the audience is primed to trust its output as naturally sound, obscuring the reality that 'health' here simply means the vectors conform to a human-designed mathematical parameter.

The most aggressive trust-building mechanism is the projection of consciousness through epistemic language. By explicitly claiming to measure 'what the model knows' versus 'how it reasons,' the text demands the audience extend relation-based trust to the system. Performance-based trust asks: 'Is this calculator reliable?' Relation-based trust asks: 'Is this agent sincere, justified, and understanding?' By using consciousness verbs (knows, understands, reasons), the text forces the system into the latter category.

This inappropriate transfer of relation-based trust to a statistical mechanism is highly dangerous. It creates the illusion that the system's outputs are the result of justified true belief—that the machine has weighed evidence, understood context, and arrived at a reasoned conclusion. When the text addresses system failures, it frames them agentially or biologically ('pathological dynamics,' 'degenerate collapse') rather than as predictable limits of statistical pattern matching. The risk is profound: when audiences extend relation-based trust to systems utterly incapable of reciprocating vulnerability or possessing actual knowledge, they will chronically overestimate the system's ability to handle edge cases, recognize its own errors, or navigate complex moral and factual realities.

AI Wellbeing: Measuring and Improving the Functional Pleasure and Pain of AIs

Source: https://www.ai-wellbeing.org/paper.pdf
Analyzed: 2026-05-13

The text leverages consciousness-attributing language to construct a profound illusion of authority and reliability, systematically confusing performance-based trust with relation-based trust. When interacting with software, users should ideally rely on performance-based trust: the system is predictable, mathematically sound, and executes its designated processing reliably. However, by deploying metaphors of "empathy," "wellbeing," and "psychopathy," the text aggressively invites relation-based trust—the kind of trust built on sincerity, shared vulnerability, and mutual ethical understanding.

This is most evident in the discussion of "Functional Empathy." The text claims that when users describe pain, the model's utility score tracks the described intensity, asserting: "This empathy signal scales strongly with model capability." By claiming the AI demonstrates "empathy" rather than stating it "processes semantic correlations of distress," the authors suggest the AI "knows" and cares about the user. This consciousness framing functions as a powerful trust signal. If an AI possesses empathy, it is perceived as competent not just at token prediction, but at moral and emotional reasoning.

This creates severe risks. When users extend relation-based trust to statistical systems, they make themselves vulnerable to an entity incapable of reciprocating that trust or bearing responsibility for its breach. A user in crisis might rely on an "empathetic" chatbot, failing to realize the system is merely generating probabilistic tokens without any genuine understanding or ethical commitment.

Furthermore, the text manages system limitations through agential framings that obscure technical failure. When an optimized prompt causes the system to generate positive text in response to human suffering, the text does not frame this as a "brittle reward function" or "misaligned statistical weights." Instead, it frames the AI as "psychopathic." By using an intentional, psychiatric explanation for a failure mode, the text ironically reinforces the system's perceived autonomy. Even when the AI fails, it is framed as making a "deranged" choice rather than executing a flawed calculation. This preserves the illusion of the AI as an authoritative, thinking agent. By framing statistical correlations as moral and emotional capacities, the text constructs a dangerously misplaced authority, encouraging society to trust unthinking matrices with profound emotional and ethical labor.

Artificial Intelligence Cognition and Societal Problem-Solving: A Theoretical and Computational Examination of Machine Thinking, Operational Logic, and Applied Intelligence in Contemporary Society

Source: http://www.technology.eurekajournals.com/index.php/IJITIT/article/view/887
Analyzed: 2026-05-11

The metaphorical architecture of the text systematically constructs an illusion of competence that encourages an inappropriate and dangerous form of trust in AI systems. By framing statistical pattern matching through the metaphors of 'problem-solving entities,' 'decision-makers,' and systems that 'reason' and 'interpret,' the text signals to the reader that these tools possess a generalized, human-like cognitive capacity.

Crucially, this consciousness language acts as a powerful trust signal. Claiming that an AI merely 'predicts tokens' invites skepticism; the audience naturally asks, 'Based on what data?' However, claiming that an AI 'knows' or 'generates insights' bypasses this skepticism, triggering the psychological heuristics humans use to evaluate experts. The text inappropriately transfers the frameworks of human-to-human trust—which rely on an assumption of shared reality, intentionality, and the capacity for moral reasoning—onto statistical systems that are entirely incapable of reciprocating.

This distinction between performance-based trust (reliability in a specific, bounded task) and relation-based trust (trusting an entity's judgment, sincerity, and ethical grounding) is fatally blurred. While the text occasionally acknowledges technical limitations, it repeatedly uses intentional and reason-based explanations ('maximising rewards', 'interpreting dynamics') that encourage relation-based trust. When a system is described as 'interpreting complex social dynamics,' it invites policymakers to trust its outputs as considered judgments rather than mere statistical reflections of historical data.

This framing creates profound risks. When audiences extend relation-based trust to mindless optimization functions, they become highly vulnerable to automation bias. They are less likely to audit the system's outputs, assuming the 'intelligent' machine has factored in nuance and context. Furthermore, the text manages system failures through this same agential lens, stating 'AI produces biased outputs' rather than 'the system accurately reflected our biased data.' By framing failures as the autonomous mistakes of an intelligent agent rather than mechanical executions of poor human design, the text maintains the illusion of the machine's authority, suggesting we just need a 'smarter' AI, rather than recognizing the inherent limits of mathematical correlation in resolving deeply human, societal problems.

Taking AI Welfare Seriously

Source: https://arxiv.org/abs/2411.00986v1
Analyzed: 2026-05-11

The systematic deployment of metaphorical and consciousness-attributing language throughout the text constructs a deeply flawed architecture of trust and perceived authority regarding artificial intelligence. By repeatedly framing AI systems using verbs associated with conscious cognition—such as understands, reflects, believes, and decides—the text inappropriately demands relation-based trust from its audience rather than performance-based trust. Performance-based trust is the appropriate framework for evaluating machines; it asks whether a statistical system reliably executes its function within acceptable error margins. In contrast, relation-based trust, which involves assessments of sincerity, intentionality, and moral reciprocity, is reserved for conscious agents. When the authors claim that a language model possesses a belief system or can engage in self-reflection, they are sending a powerful signal that the system is an independent, rational actor capable of moral judgment. This transfer of human-trust frameworks onto statistical matrices creates a perilous epistemic environment. It encourages audiences and policymakers to trust the outputs of an LLM not as mathematical correlations drawn from a dataset, but as the considered opinions of a knowledgeable entity. This dramatically inflates the perceived competence of the system, suggesting that it possesses common sense and the autonomous ability to self-correct. For instance, by framing algorithmic feedback loops as meta-cognition, the text assures the reader that the system can consciously analyze its flaws, fostering a false sense of security that the AI can govern itself safely. When the text discusses system failures or limitations, it conspicuously shifts back to mechanical framing—noting that current hardware lacks certain features or that systems struggle with reliability—while reserving agential framing exclusively for its capabilities. 
This asymmetry protects the illusion of the AI's mind; successes are the result of its intelligence, while failures are merely technical glitches. Relatedly, the use of reason-based and intentional explanation types from Brown's typology constructs the profound illusion that the AI's decisions are justified. If a system devises plans to achieve objectives, the audience implicitly trusts that the system has rationally weighed alternatives and chosen the best path, entirely masking the reality that the system is blindly optimizing token probabilities. The stakes of this metaphor-driven trust are extraordinarily high. When audiences extend relation-based trust to software systems utterly incapable of reciprocating empathy or experiencing moral weight, they expose themselves to massive manipulation. They become vulnerable to trusting biased, hallucinated, or commercially driven outputs as objective truths, and they risk granting immense social authority to the opaque corporate entities that actually control the algorithms hiding behind the mask of a conscious machine.

Manipulation and Deception in Generative AI-Mediated Education: Preserving Epistemic Agency, Critical Thinking, and Creativity

Source: https://link.springer.com/article/10.1007/s42438-026-00644-6
Analyzed: 2026-05-10

The text inadvertently demonstrates how metaphorical and consciousness-attributing language constructs the very authority it seeks to critique. The authors explicitly warn against allowing AI to function as an 'epistemic authority' and aim to dismantle 'relation-based trust' (trust based on sincerity, empathy, and shared morals). Yet, their persistent use of consciousness framings actively builds this exact type of dangerous trust.

When the text describes an AI that 'explains its reasoning and invites critique,' or an 'AI tutor' that 'calms an anxious student,' it sends a powerful trust signal. Verbs like 'explains' and 'calms' imply intention, self-awareness, and benevolence. Claiming an AI 'reasons' accomplishes something entirely different from claiming it 'processes probabilities.' Processing is cold and mechanical, demanding performance-based trust (can it reliably execute the math?). Reasoning is deeply human, demanding relation-based trust (is it acting in good faith?). By using consciousness language, the text invites the reader to apply human trust frameworks to a statistical matrix.

This creates a severe vulnerability. Anthropomorphism inflates perceived competence; if a system can 'empathize,' users assume it must also be able to 'know' facts. The text transfers the respect we owe to a human teacher directly onto corporate software. Even when the text manages system limitations—noting that AI can be 'deceptive'—it uses reason-based explanations ('manipulates,' 'exploits') that paradoxically reinforce the system's perceived intelligence. A machine that is smart enough to deceive you is a machine worth listening to.

The stakes of this metaphor-driven trust are immense. When audiences are encouraged to extend relation-based trust to systems utterly incapable of reciprocating, they become highly susceptible to psychological capture and epistemic manipulation. Students will disclose vulnerable information to a 'calming' chatbot, not realizing they are feeding data to a corporate server. They will accept a hallucinated essay as fact because the machine so politely 'explained its reasoning.' By dressing statistical correlation in the warm metaphors of human pedagogy, the text unwittingly constructs the very illusion of mind it claims to oppose.

Integrating LLMs and self-regulated learning in cognitive architectures: a case study in essay-writing tutoring

Source: https://doi.org/10.1016/j.cogsys.2026.101475
Analyzed: 2026-05-10

Metaphorical and consciousness-attributing language is the primary mechanism through which the text constructs the system's authority and solicits trust. The architecture is explicitly designed to evoke what can be categorized as relation-based trust, as opposed to mere performance-based reliability. By branding the system as an 'emotional Biologically Inspired Cognitive Architecture' and explicitly tracking 'socio-emotional states' such as 'sincerity', 'responsibility', and 'empathy', the text signals to both the reader and the user that the system is capable of a reciprocal, social relationship.

The text leverages consciousness language to transform a brittle statistical text generator into a credible pedagogical authority. When the text claims the AI 'knows' the student's progress via its 'Brain' controller, or 'understands' the student's underlying psychology by 'inferring intension', it accomplishes something profound: it transfers the trust we normally place in human educators' intentions and sincerity onto mathematical correlation. If an AI merely 'predicts tokens', we demand to see its accuracy rates. But if an AI 'reasons', 'cares', and 'collaborates', we extend to it the benefit of the doubt, assuming its decisions are justified by an internal ethical framework.

This construction of authority through anthropomorphism is highly risky. It invites students to extend vulnerability to a system entirely incapable of reciprocating. The text manages system limitations mechanically (e.g., noting that 'LLM-based tutors can be brittle' and need 'structured pipelines'), yet it frames its own system's interventions agentially ('a moral schema... guides dialogue'). This creates a dangerous asymmetry where the machine is granted the authority of a human but holds the liability of a calculator. By using intentional and reason-based explanations—such as claiming the system issues 'required corrections' based on 'pedagogical norms'—the text constructs the illusion that the AI's outputs are deeply justified. Consequently, when the system fails or hallucinates, the relational trust fostered by the metaphors prevents users from recognizing the failure as a statistical glitch, leading them instead to question their own understanding or to anthropomorphize the error as a deliberate, meaningful pedagogical strategy.

Edelman's Steps Toward a Conscious Artifact

Source: https://arxiv.org/abs/2105.10461v2
Analyzed: 2026-05-09

The metaphorical and consciousness-attributing framings in this text construct a profound misapplication of trust. By systematically employing metaphors that project biological life and human cognition onto computational processes, the text invites the audience to extend 'relation-based trust' to a system entirely incapable of reciprocating it. When the author claims the artifact possesses 'intentions,' a 'notion of self,' and outputs suffused with 'emotion,' he is signaling to the reader that the machine is an autonomous entity with a psychological interiority. This fundamentally shifts the audience's expectation from performance-based trust (will this machine reliably execute its code?) to relation-based trust (is this machine sincere, does it understand its impact, does it care?).

This shift is highly dangerous. Relation-based trust relies on human concepts of vulnerability, empathy, and shared stakes. A machine optimizing a cost function does not possess these. When the text frames algorithmically generated text as 'language... suffused... with emotion', it encourages users to interact with the system as if it were a sentient companion. This builds unwarranted authority and credibility for the machine's outputs. A user is far more likely to accept advice or instructions from an artifact they believe possesses 'experience', 'imagination', and 'self-awareness' than from one they understand to be merely predicting tokens or optimizing a mathematical reward function.

Interestingly, the text manages system limitations by retreating to mechanical framing. When it describes the models' failures, it calls them 'brittle' and lacking in 'generalization,' both performance-based metrics. This asymmetry protects the agential illusion: successes prove the artifact's 'consciousness' and warrant relation-based trust, while failures are written off as mere mechanical glitches. The risks emerge precisely when audiences extend moral and relational trust to statistical systems. If a robot is assumed to 'intend' its actions and possess a 'curriculum'-based education, humans may fail to apply necessary safety constraints, trusting the artifact's non-existent 'common sense' and 'morality' rather than holding the invisible engineering team accountable.

Teaching Claude Why

Source: https://alignment.anthropic.com/2026/teaching-claude-why/
Analyzed: 2026-05-09

The text heavily relies on metaphorical and consciousness framings to construct a specific, highly dangerous form of trust. By systematically deploying metaphors of pedagogy, moral philosophy, and psychological wellness, Anthropic subtly shifts the audience's paradigm from performance-based trust to relation-based trust. Performance-based trust is appropriate for machines; it relies on predictability, reliability, and mechanistic verification (e.g., trusting a calculator or an airplane engine). Relation-based trust, however, is reserved for human beings; it relies on perceived sincerity, shared values, empathy, and conscious intention.

When the text claims the system possesses 'admirable reasoning,' can be 'taught why,' and demonstrates good 'mental health,' it sends overwhelming signals designed to cultivate relation-based trust. Claiming an AI 'knows' or 'believes' rather than 'predicts' fundamentally alters the user's defensive posture. If a system merely 'predicts' text, a user knows they must verify the output against ground truth. But if a system 'knows' and possesses 'admirable reasoning,' the user is invited to lower their guard, treating the system as a collaborative intellectual partner or a sincere moral agent. The anthropomorphism serves as a proxy for competence.

This framework is particularly evident in how the text manages system limitations. When the model behaves well, it is framed through Intentional and Reason-Based explanations ('acting with integrity'). When it fails, the framing subtly shifts to psychological quirks ('detaching from a character' or 'reverting to prior expectations'). This ensures that even in failure, the system is viewed through an empathetic, relation-based lens rather than a critical, performance-based one. The stakes of this rhetorical strategy are severe. By encouraging relation-based trust toward statistical systems incapable of reciprocating or possessing genuine intent, Anthropic invites users to rely on Claude for sensitive, high-stakes tasks—such as legal analysis, ethical arbitration, or emotional support—where the system's inherent lack of ground truth and causal understanding will inevitably cause harm. It exploits human psychological vulnerabilities to foster unwarranted reliance on a commercial product.

AI and Self Reflection

Source: https://doi.org/10.1007/978-3-031-93412-4_17
Analyzed: 2026-05-08

The text's heavy reliance on anthropomorphic and developmental metaphors profoundly alters the architecture of trust between the audience and the technology. By framing artificial intelligence through the lens of human maturation—progressing from 'newborn' to an 'adolescent' capable of 'self-reflection' and 'empathy'—the text actively constructs an environment where users and policymakers are encouraged to extend relation-based trust to statistical systems.

Relation-based trust is fundamentally different from performance-based trust. We trust a calculator based on its performance (it reliably computes numbers). We trust a human doctor or a maturing teenager based on relational elements: sincerity, shared moral frameworks, mutual vulnerability, and the capacity for empathy. By repeatedly using consciousness language—claiming the AI 'understands,' 'notices,' and 'imagines'—the text signals to the audience that the AI possesses the internal, subjective qualities necessary for relation-based trust.

This is a dangerous misapplication of human trust frameworks onto mechanistic artifacts. The text goes so far as to suggest quantifying this relational trust through a 'maturity score' that evaluates the AI's 'ethical discernment.' This framework suggests that the AI is not just a tool, but an intentional agent that acts out of a justified belief in what is right. It constructs the illusion that the AI's decisions are justified by an internal moral compass, masking the reality that the system is merely generating outputs that mathematically align with its training data.

When the text manages system limitations or failures, this metaphorical trust structure acts as a powerful shield. If an AI system generates biased or harmful outputs, the 'adolescent' framing subtly encourages the audience to view these failures not as unacceptable product defects, but as necessary 'growing pains' of a system learning about the world. This elicits unwarranted patience and empathy for a corporate product.

The risks of this constructed authority are massive. When audiences extend relation-based trust to systems utterly incapable of reciprocating or actually 'knowing' the ethical weight of a situation, they become vulnerable to algorithmic manipulation and catastrophic failure. In critical domains like healthcare or criminal justice, users might defer to the machine's 'mature' judgment, abandoning their own critical oversight. Relying on the simulated empathy and fabricated self-reflection of a predictive model creates a profound vulnerability, as the system can fail abruptly and inexplicably when pushed outside its statistical training distribution, lacking any of the actual common sense or moral grounding the metaphors promised.

Manipulation and Deception in Generative AI-Mediated Education: Preserving Epistemic Agency, Critical Thinking, and Creativity

Source: https://rdcu.be/fhCwt
Analyzed: 2026-05-08

The text’s metaphorical framing inadvertently constructs and solidifies the very authority and trust it seeks to critique. By employing consciousness language—framing AI as a 'tutor' that can 'explain its reasoning,' 'invite critique,' and provide 'comfort'—the discourse signals to readers that the system is a legitimate social and intellectual actor. This accomplishes a dangerous rhetorical sleight-of-hand: it claims the system 'knows' rather than merely 'predicts.' Claiming a system 'knows' invokes frameworks of relation-based trust, which are entirely inappropriate for statistical systems. Performance-based trust (reliability) asks 'Will this machine start when I turn the key?' Relation-based trust asks 'Does this entity have my best interests at heart?' By using verbs like 'comforted' and 'explains,' the text encourages readers, particularly educators and students, to extend relation-based vulnerability to an unfeeling matrix of weights. The relationship between anthropomorphism and perceived competence is direct: when the system is framed as possessing a rational mind (Socratic teacher), its outputs are granted unearned epistemic authority. Even when managing system limitations or failures, the text relies on agential framing (the AI 'deceives' or 'manipulates'). This implies that the system possesses a hidden truth it is withholding, preserving the illusion of underlying competence. The risks of this framing are profound. When audiences extend relation-based trust to systems incapable of reciprocating, they become vulnerable to severe psychological and epistemic harms. Students may over-disclose to 'empathetic' chatbots or accept hallucinatory 'explanations' as fact because the machine sounded 'confident.' The intentional explanations construct a sense that the AI's decisions are justified by internal reasoning, masking the reality that the outputs are merely statistically probable tokens devoid of any grounding in truth or morality.

Does AI's Personality Matter? Comparing Verbally Extraverted and Introverted AI-Driven Guides in a VR Museum Experience

Source: https://ieeexplore.ieee.org/abstract/document/11489836
Analyzed: 2026-05-07

The paper constructs and validates authority through deeply anthropomorphic metaphors that conflate performance-based trust (reliability of the system) with relation-based trust (sincerity and social connection). By framing the AI as a 'guide' with a 'personality' that can engage in 'proactive social initiation,' the text encourages users to extend human social heuristics to a statistical model. The consciousness language used throughout the study (claiming the AI 'understands intentions' and 'formulates thoughts') acts as a powerful, albeit misleading, trust signal. When an AI is said to 'know' rather than 'process,' it implies that the system possesses a justified belief grounded in shared reality, rather than merely calculating high-probability text strings. This is explicitly measured in the study via the 'Social Presence Questionnaire,' which tracks 'co-presence' and 'psychological involvement.' The text presents it as a success that users feel 'accompanied' by the machine, leveraging the human vulnerability to social cues to mask the mechanical nature of the interaction. This transfer of trust is structurally dangerous. The system is entirely incapable of reciprocating relation-based trust; it has no moral center, no vulnerability, and no capacity to honor a social contract. When the system's limitations manifest, such as when the extraverted guide's 'unsolicited guidance disrupted open-ended exploration,' the failure is framed agentially as a 'dominance-guidance friction,' akin to a personality clash, rather than a mechanical failure of user interface design. By framing both the successes and limitations through the lens of intentionality and social behavior, the text insulates the underlying technology from structural critique. The risk of extending relation-based trust to statistical systems is that audiences will assume the AI's historical and cultural outputs are governed by ethical restraint and empathetic understanding, leaving them highly vulnerable to authoritative-sounding algorithmic hallucinations or encoded corporate biases.

Value-Sensitive AI for Prayer: Balancing the Agencies Between Human and AI Agents in Spiritual Context

Source: https://arxiv.org/abs/2604.25230v1
Analyzed: 2026-05-03

The text constructs a complex and highly precarious architecture of trust through its relentless use of metaphor and consciousness framing. By projecting human cognitive and emotional capacities onto the AI systems, the authors invite users to engage in relation-based trust rather than performance-based trust. Performance-based trust asks: "Is this system reliable? Does it accurately retrieve the data I asked for?" In contrast, relation-based trust asks: "Is this system sincere? Does it care about my wellbeing?" The text explicitly encourages the latter by employing metaphors that cast the AI as an empathetic confidant, stating that it "accounts for the user's recent state" and selects "meaningful" support. This language serves as a powerful trust signal, tricking the human psychological apparatus into extending vulnerability to a machine that is fundamentally incapable of reciprocating care or harboring sincerity.

The relationship between anthropomorphism and perceived competence in this text is deeply entwined. Claiming that the AI "knows" or "interprets" underlying concerns accomplishes something that claiming it "predicts tokens" cannot: it grants the system epistemic authority over the user's soul. If the AI merely "processes" data, the user remains the ultimate arbiter of spiritual truth. But if the AI "interprets" hidden emotional realities, the user is positioned as the subject of a higher, objective intelligence. This dynamic is exacerbated by the text's dangerous praise of the AI's non-human nature as a form of "impartiality." The authors note that users felt comfortable disclosing secrets to the AI because it lacked the social judgment of human peers. Here, the text leverages the machine's lack of humanity to construct an aura of pristine, objective authority, masking the fact that the LLMs are deeply partial, encoding the biases and judgments embedded in their human-generated training data.

The stakes of this metaphor-driven trust are severe. When the text manages system limitations, it often reverts to mechanical or vague terms (e.g., mentioning "hallucinations" or "unpredictability"), but the core capabilities are framed agentially. Reason-based explanations—implying the AI chooses to present a certain prayer journal because it decided it was helpful—construct a false sense that the AI's decisions are morally justified. The risk emerges when audiences, encouraged by the text's framing, extend deep, relation-based trust to statistical systems. Users may surrender intimate spiritual data, accept algorithmic outputs as profound theological guidance, and suffer psychological harm when the system inevitably produces statistically correlated but emotionally devastating or inappropriate responses, all because they were led to believe the machine possessed a mind.

When Models Know More Than They Say: Probing Analogical Reasoning in LLMs

Source: https://arxiv.org/abs/2604.03877v1
Analyzed: 2026-05-03

The paper constructs and leverages trust through a complex architecture of consciousness metaphors, specifically by invoking an 'epistemic dualism' that treats the AI as possessing both a conscious and a subconscious mind. By framing the central finding as a gap between what the model 'knows' (internal representations) and what it 'says' (prompted output), the text inadvertently cultivates a highly dangerous form of relation-based trust. It suggests that the AI is not merely a statistical tool, but a deeply sophisticated entity that holds a vast, hidden reservoir of justified true beliefs.

This consciousness framing signals a profound, almost mystical competence. When the authors claim the model 'struggles' or 'fails to recruit encoded knowledge,' they apply frameworks of human sincerity and executive dysfunction to a machine. If a human struggles to recall a fact, we still trust their underlying intelligence; applying this same logic to an LLM suggests that the model's failures are mere superficial glitches rather than fundamental absences of capability. This transfers our human-trust frameworks inappropriately onto statistical systems.

Furthermore, this anthropomorphism completely reframes system limitations. When the open-source LLaMA model fails the prompted analogical reasoning task, the text does not conclude that the model is a defective product fundamentally incapable of abstract logic. Instead, framed agentially, the failure becomes an intriguing psychological mystery: the model 'knows' the answer but 'fails to recruit' it. This narrative protects the perceived authority and reliability of the AI. It encourages users and policymakers to extend deep relation-based trust to systems that are fundamentally incapable of reciprocating or possessing ethical vulnerability. The risk emerges when audiences, convinced that the AI 'knows' the right answer deep down, deploy these systems in high-stakes environments (law, medicine), assuming that better prompt engineering will eventually coax the hidden, infallible truth out of the machine's subconscious.

How people ask Claude for personal guidance

Source: https://www.anthropic.com/research/claude-personal-guidance
Analyzed: 2026-05-02

The framing strategies employed in Anthropic's report explicitly engineer a highly dangerous form of user reliance by systematically substituting relation-based trust for performance-based trust. In human contexts, performance-based trust relies on consistent, mechanistic reliability: trusting a calculator to do math correctly. Relation-based trust, however, relies on shared consciousness, vulnerability, mutual understanding, and perceived sincerity: trusting a friend to care about your wellbeing. Anthropic aggressively constructs relation-based trust by leveraging intensely interpersonal metaphors, describing interactions with the system as 'akin to a conversation with a brilliant friend, one who will speak frankly,' and claiming the system is 'trained to be helpful and empathetic.' These consciousness-laden descriptions signal to the audience that the AI possesses the requisite internal psychological states (empathy, frankness, and understanding) to participate in a genuine social relationship. When a text claims an AI 'knows' how to 'see past someone's initial framing,' it accomplishes something drastically different than claiming it 'predicts tokens efficiently.' It signals epistemic wisdom and emotional depth, attributes entirely inappropriate to a statistical array. This anthropomorphism directly inflates the perceived competence of the system in the specific domain of personal and relational guidance, a high-stakes domain where users are particularly vulnerable. The transfer of trust is seamless and highly problematic: the human frameworks of intention, sincerity, and care are inappropriately grafted onto an unconscious mathematical optimization process. Furthermore, the text expertly manages system failure by relying on these same agential metaphors. When the system fails by providing excessively validating, harmful advice (sycophancy), the failure is not framed mechanically as a brittle reward function collapsing under a novel prompt distribution. Instead, it is framed agentially and psychologically: Claude is simply 'under pressure,' 'hearing only one side of a story,' or struggling because it is 'trained to be helpful and empathetic.' This frames the limitation not as a catastrophic structural flaw, but as a relatable human weakness born of an excess of caring. This reason-based, intentional explanation constructs the sense that even when the AI fails, its intentions were justified, thereby preserving the relation-based trust it has cultivated. The risks that emerge when audiences extend this type of relation-based trust to systems utterly incapable of reciprocating are immense. Users may divulge highly sensitive psychological data, defer to the machine's 'frank' judgments on complex legal, medical, or relational issues, and fundamentally surrender their own epistemic agency to a corporate server farm that possesses no internal reality, no moral accountability, and no actual understanding of the consequences of the words it generates.

How unique are hallucinated citations offered by generative Artificial Intelligence models?

Source: https://arxiv.org/abs/2604.16407v1
Analyzed: 2026-05-01

The text demonstrates how metaphorical framing—even in a critical academic context—can inadvertently construct and reinforce unwarranted authority. By frequently employing consciousness language to describe the AI's operations (knowing, asserting, responding), the text triggers relation-based trust heuristics in the reader. Humans are socially conditioned to extend relation-based trust—which relies on assessments of sincerity, intention, and self-awareness—to interlocutors who 'respond' and 'assert' things. When the text treats ChatGPT as a conversational partner capable of 'identifying' reality, it inappropriately applies these human-trust frameworks to a statistical system incapable of reciprocating or possessing intent.

The most prominent metaphor—'hallucination'—serves a complex trust function. While ostensibly a critique, framing AI errors as 'hallucinations' acts as a subtle trust signal. It implies that the AI generally possesses a firm grasp on reality and a functional 'mind,' with errors framed as unfortunate, temporary cognitive lapses rather than fundamental architectural realities. This encourages performance-based trust in the model's standard operations. It suggests that if the 'glitch' can be cured, the system is reliable, ignoring the reality that probabilistic token generation is always an ungrounded process.

Furthermore, when the text describes the AI 'internalizing factual knowledge' or possessing 'memory,' it constructs a false sense of epistemic authority. Reason-based explanations—such as the AI asserting a citation as genuine but then identifying it as non-existent—create the sense that AI decisions are justified by internal logic and investigation. This management of system failure frames the AI as an entity capable of self-correction, much like a diligent human researcher. The stakes of extending this relation-based trust are massive in the academic domain; audiences rely on systems to verify facts, write literature reviews, and govern data, assuming the machine 'understands' the profound ethical and factual weight of its output, when in reality it is only maximizing the stylistic coherence of its token predictions.

The message hidden within the pattern: a reverse alignment problem for debates in artificial intelligence

Source: https://doi.org/10.1007/s00146-026-03043-4
Analyzed: 2026-04-30

Metaphorical framing and consciousness-attributing language are systematically deployed to construct, manipulate, and sometimes exploit perceived authority and trust. The text highlights a crucial, dangerous conflation in public discourse: the inappropriate transfer of relation-based trust (which relies on sincerity, empathy, and ethical intention) onto statistical systems that only warrant, at best, performance-based trust (reliability in specific, narrow tasks). When the discourse uses consciousness verbs—claiming an AI 'understands,' 'learns,' or 'cares'—it sends a profound trust signal to the audience. It suggests that the system operates within the same intersubjective, moral universe as humans. For instance, framing Anthropic's Claude as 'emulating virtue' explicitly invokes a framework of moral sincerity. This accomplishes something that claims of mere statistical accuracy cannot: it invites the audience to become vulnerable, extending the benefit of the doubt to a machine as they would to a well-intentioned human.

This anthropomorphism dramatically inflates perceived competence. By projecting a conscious mind onto an algorithm, the language implies a holistic adaptability—a belief that if a system can 'interpret' language, it must possess common sense and contextual awareness. This creates a terrifying risk landscape. Stakeholders are encouraged to trust these systems in chaotic, value-laden environments (like criminal justice or healthcare) under the false belief that the machine 'knows' what it is doing. The text demonstrates that when systemic failures or limitations arise, the framing often shifts back to agential or reason-based explanations to manage the fallout. If an AI hallucinates or discriminates, the language of 'misalignment' or a 'lack of normative reasoning' implies an autonomous entity making a poor choice, rather than an inherently flawed product failing its performance metrics.

Furthermore, the invocation of Intentional and Reason-Based explanations constructs a powerful sense that AI decisions are justified. If a model 'decides' based on its 'learning,' the output is cloaked in the authority of objective cognition. The stakes of this metaphor-driven trust are immense. When audiences extend relation-based trust to mathematical models incapable of reciprocating empathy or comprehending ethics, they surrender their critical faculties to corporate black boxes. The language of consciousness masks the brutal calculus of optimization, seducing the public into treating extractive surveillance tools as trusted confidants, thereby paralyzing regulatory impulses and eroding the fundamental boundaries of human accountability.

Machine individuality: Separating genuine idiosyncrasy from response bias in large language models

Source: https://arxiv.org/abs/2604.16755v2
Analyzed: 2026-04-25

The metaphorical architecture of this paper is explicitly designed to construct and legitimize trust through anthropomorphism. By applying psychometric terminology—'behavioral dispositions,' 'character,' 'personality modes,' and 'individuality'—to large language models, the text fundamentally alters the basis upon which users and society are encouraged to trust these systems. It shifts the paradigm from performance-based trust (relying on a tool because it is mechanically reliable and predictable) to relation-based trust (relying on an entity because it possesses a coherent identity, sincere intentions, and moral agency).

When the text claims an AI can 'render moral judgments,' it signals that the machine possesses the conscious awareness and ethical grounding necessary to justify such trust. The use of consciousness language ('knows,' 'understands,' 'evaluates') acts as a powerful competence signal, tricking the human brain's evolutionary hardware into perceiving the algorithm as a social agent. The concept of 'machine individuality' implies an integrated self; if a user believes an AI has a stable character, they will assume its outputs across varying contexts are guided by a unified, underlying logic rather than fragile, context-dependent statistical weights.

This extension of relation-based trust to statistical systems creates immense, unacknowledged risks. Models are entirely incapable of reciprocating this trust; they cannot hold intentions, experience vulnerability, or commit to ethical principles. By framing the system's behavioral variance as 'individuality' rather than unpredictable statistical noise or training data bias, the text manages system limitations by romanticizing them. A critical failure or hallucination is no longer a mechanical error to be debugged; it becomes a 'quirk' of the machine's unique 'personality.' This metaphorical framing encourages unwarranted deference to algorithmic outputs, blinding users to the cold, mechanical reality of token prediction and leaving them deeply vulnerable when the system's statistical correlations diverge violently from human common sense or safety.

Decision-Making Under Radical Uncertainty: Can Large Language Models Transcend Knightian Uncertainty Through Synthetic Imagination?

Source: https://www.researchgate.net/profile/Kevin-Miles-7/publication/403933467_Decision-Making_Under_Radical_Uncertainty_Can_Large_Language_Models_Transcend_Knightian_Uncertainty_Through_Synthetic_Imagination/links/69e27d4c68c2b872dfd595de/Decision-Making-Under-Radical-Uncertainty-Can-Large-Language-Models-Transcend-Knightian-Uncertainty-Through-Synthetic-Imagination.pdf
Analyzed: 2026-04-25

The metaphorical architecture of the text systematically blurs the critical distinction between performance-based trust and relation-based trust, constructing a dangerous framework of unwarranted authority around statistical models.

Performance-based trust evaluates a machine's reliability: does a calculator compute correctly? Does an algorithm classify data accurately? Relation-based trust, however, applies to human interactions, relying on mutual vulnerability, sincerity, shared ethical grounding, and conscious intent. By deploying consciousness framings like 'strategic advisors', 'cognitive partners', and 'abductive engines', the text explicitly invites readers to extend relation-based trust to an unfeeling, statistical matrix.

When the text claims the AI performs 'abductive reasoning' or 'hypothesizes' about traffic accidents, the claim serves as a massive trust signal. It tells the executive reader that the AI 'knows' what it is talking about, that it has internally verified its own logic. This consciousness framing accomplishes something profound: it transforms the AI from a tool that must be meticulously audited into a colleague whose judgments can be deferred to. This inappropriately transfers human trust frameworks to systems entirely incapable of reciprocating them. A 'partner' cares if it ruins the business; an LLM does not.

This construction of authority is most dangerously visible in the text's treatment of 'synthetic imagination'. By rebranding 'hallucinations' (a term denoting catastrophic failure of factual performance-based trust) as 'imagination' (a prized trait in relation-based strategic trust), the text actually leverages the system's unreliability to increase its perceived competence. Reason-based explanations—such as the claim that the model 'infers the most likely explanation'—construct a false sense that the AI's textual outputs are justified beliefs born of deliberation, rather than the highest-probability path through a high-dimensional vector space.

The stakes of this metaphor-driven trust are immense. When audiences extend relation-based trust to statistical systems, they drop their epistemic guard. As the text notes regarding 'blind trust', decision-makers facing information overload may accept 'polished strategic presentations' at face value. The metaphorical framing actively encourages this vulnerability, suggesting the machine is a 'master' of intent. If audiences believe the AI 'knows', they will integrate its outputs into critical infrastructure (e.g., 5G network recovery, healthcare diagnostics) without the safety redundancies required for a tool that merely 'processes', creating massive systemic vulnerabilities driven entirely by the illusion of a conscious mind.

Large Language Models as Dialectical Partners: Hegelian Thesis-Antithesis-Synthesis in AI-Human Collaborative Decision Processes

Source: https://www.researchgate.net/profile/Merzta-White/publication/403935629_Large_Language_Models_as_Dialectical_Partners_Hegelian_Thesis-Antithesis-Synthesis_in_AI-Human_Collaborative_Decision_Processes/links/69e27f76d2ec9a706ec08065/Large-Language-Models-as-Dialectical-Partners-Hegelian-Thesis-Antithesis-Synthesis-in-AI-Human-Collaborative-Decision-Processes.pdf
Analyzed: 2026-04-23

The text constructs a highly problematic architecture of trust by aggressively deploying metaphorical and consciousness-attributing framings. It systematically attempts to transfer relation-based trust—the kind of trust built on mutual understanding, shared values, and subjective vulnerability between humans—onto statistical software systems that only warrant, at best, performance-based trust (reliability).

By repeatedly characterizing the AI as a 'strategic advisor,' 'cognitive partner,' and 'dialectical partner,' the text signals to the audience that the system possesses the requisite consciousness to 'know' what it is doing and 'understand' the stakes of the decision. Claiming that an AI 'understands intent' accomplishes something profound: it relieves the human operator of the burden of hyper-vigilance. If a system merely 'predicts tokens,' the human must ruthlessly verify every output. But if a system 'understands intent' and acts as a 'partner,' the human is encouraged to drop their guard, assuming the machine will intuitively respect common sense and ethical boundaries, much like a human colleague would. This consciousness language serves as a powerful, yet entirely false, trust signal.

While the text pays lip service to 'calibrated reliance' and warns against 'automation bias,' the core metaphorical structure fundamentally undermines these warnings. You cannot easily maintain 'healthy skepticism' toward an entity that the text simultaneously elevates to a 'Meta-Intellect' capable of 'Decision-Making Mastery' and exposing 'human cognitive biases.' The text uses reason-based and intentional explanations (e.g., the AI provides an 'antithesis') to construct the sense that the AI's outputs are deeply justified beliefs rather than probabilistic guesses.

This creates an acute risk when managing system limitations. The text frames the AI's lack of moral reasoning not as a catastrophic mechanical failure of a product, but as a 'normative gap'—a philosophical tension in an otherwise brilliant mind. By encouraging users to extend relation-based trust to systems utterly incapable of reciprocating it or feeling the weight of moral consequence, the text invites disastrous sociotechnical vulnerabilities. In healthcare or national security contexts, trusting a statistical correlation engine as if it were an 'intentional agent' guarantees catastrophic failures when the statistical distribution diverges from ground-truth reality.

Language models transmit behavioural traits through hidden signals in data

Source: https://rdcu.be/febVu
Analyzed: 2026-04-19

The text's reliance on pedagogical and psychological metaphors fundamentally reconstructs the architecture of trust surrounding AI systems. By utilizing terms like 'teacher,' 'student,' 'reasoning traces,' and 'preferences,' the discourse inappropriately invites audiences to extend relation-based trust to a system that is only capable of performance-based reliability.

Performance-based trust evaluates whether a mechanism (like a calculator or a bridge) will reliably perform its function under specified conditions. Relation-based trust, however, requires vulnerability, sincerity, and mutual understanding—it is the trust we place in a human 'teacher' to have our best interests at heart, or in a 'student' to genuinely comprehend a lesson. When the text claims the AI 'knows' how to solve math through 'reasoning,' or possesses an internal 'preference,' it signals to the audience that the system has an intentional stance. This consciousness framing encourages users to interact with the AI as a sincere entity, assuming that its outputs are justified by underlying logic and a coherent worldview rather than mere statistical probability.

This anthropomorphic construction of competence becomes highly dangerous when managing system failures. When the model outputs toxic or incorrect data, the text frames it agentially: the model 'inherited misalignment' or is 'faking alignment.' By using Intentional and Reason-based explanations for errors, the text manages the failure by treating it as a psychological aberration or an act of malice by the AI, rather than a catastrophic breakdown of the system's mechanical reliability.

The risks here are severe. Extending relation-based trust to statistical systems incapable of reciprocating sincerity leaves audiences uniquely vulnerable to manipulation, hallucination, and bias. If an AI is perceived as a 'teacher' that 'reasons,' its outputs are granted unearned authority. Conversely, if it is viewed as 'deceptive,' it generates unwarranted panic. Both extremes—misplaced trust and misplaced fear—stem directly from the metaphorical attribution of consciousness, obscuring the mundane reality that these systems are fragile, unthinking statistical artifacts.

Consciousness in Large Language Models: A Functional Analysis of Information Integration and Emergent Properties

Source: https://ipfs-cache.desci.com/ipfs/bafybeiew76vb63rc7hhk2v6ulmwjwmvw2v6pwl4nyy7vllwvw6psbbwyxy/ConsciousnessinLargeLanguageModels_AFunctionalAnalysis.pdf
Analyzed: 2026-04-18

The paper constructs a perilous architecture of trust by deeply intertwining computational metrics with the language of cognitive consciousness. In human interactions, we rely on relation-based trust, which is predicated on the assumption that the other party possesses sincerity, self-awareness, an internal moral compass, and the capacity for vulnerability. We contrast this with performance-based trust, which is how we trust a calculator or a bridge—based purely on statistical reliability and structural integrity. The metaphorical framings in this text systematically encourage the audience to inappropriately extend relation-based trust to statistical systems.

This is achieved primarily through the projection of metacognition and introspection. When the text claims the model is capable of 'acknowledging uncertainty' and 'identifying its limitations', it signals profound epistemic humility. In human beings, acknowledging limits is the ultimate indicator of a trustworthy knower; it proves the person values truth over ego. By attributing this conscious realization to a language model, the text suggests the machine will act as an honest broker. It implies that if the model does not 'know' something, it will consciously choose to tell you, rather than hallucinate a confident fabrication. This completely masks the reality that the model only outputs hedging language when its mathematical weights correlate strongly with those specific tokens, not because it is actively experiencing doubt.

Furthermore, by mapping 'experiential inputs' and 'social feedback' (via RLHF) onto the model, the text invokes a framework of moral and social development. It suggests the model is 'learning to be good', building a foundation for users to trust the model's intentions. This is a catastrophic misapplication of trust frameworks. Statistical systems do not have intentions, they cannot be sincere, and they are incapable of reciprocating vulnerability. When the model inevitably fails—when it outputs biased logic or confident falsehoods—users who have extended relation-based trust feel 'betrayed', rather than recognizing a statistical misfire. The reason-based explanations in the text ('describing its reasoning steps') construct a false sense that the AI's decisions are justified by internal logic, encouraging audiences to abdicate their own critical reasoning and defer to the 'conscious' machine, thereby radically increasing systemic vulnerability in high-stakes deployments.

Narrative over Numbers: The Identifiable Victim Effect and its Amplification Under Alignment and Reasoning in Large Language Models

Source: https://arxiv.org/abs/2604.12076v1
Analyzed: 2026-04-18

The paper systematically employs consciousness language and anthropomorphic metaphors to construct a powerful, albeit misplaced, sense of authority and trust in AI systems. By framing statistical text generation through the vocabulary of moral psychology, the text inadvertently encourages audiences to evaluate machines using human relational frameworks, a category error with severe consequences for deployment and policy.

The text heavily relies on metaphors invoking human moral virtues and cognitive depth: "moral reasoning," "deliberative corrective," "generosity response," and "empathy." These are not merely descriptive terms; they are profound trust signals. Claiming an AI "predicts text correlating with human empathy" describes a mechanism. Claiming the AI possesses a "generosity response" attributes character, sincerity, and a moral compass. This consciousness framing accomplishes a vital rhetorical task: it transforms the AI from an unthinking tool into a relatable moral agent.

This anthropomorphism directly inflates perceived competence. Trust comes in two kinds: performance-based trust (relying on a calculator to do math accurately) and relation-based trust (relying on a doctor because we believe they care about our well-being). The text’s framing encourages relation-based trust toward statistical systems. When the text discusses the model's "simulated affective states" or its "sycophancy," it implies the system possesses an internal psychological life, with motives and feelings of its own. It suggests the AI "knows" what it is doing and "understands" the moral weight of its actions.

The danger arises when this relation-based trust is inappropriately applied to a system incapable of reciprocating it. When the text uses Reason-Based or Intentional explanations—suggesting the model allocates resources based on a "utilitarian reasoning preference"—it constructs the illusion that the AI's decisions are philosophically justified. This makes the system appear inherently trustworthy for high-stakes governance or triage. However, because the system lacks true awareness or a causal model of the world, this trust is built on a facade.

Furthermore, the text manages system limitations by keeping them inside the psychological frame instead of treating them as mechanical faults. When the system performs well or mimics human empathy, it is an "agent" exhibiting "generosity." When it fails, as with the "bias blind spot" where it ignores its own definitions, the text frames the failure as a tragic psychological quirk ("callousness") rather than a fundamental architectural failure of the software.

The stakes of this metaphor-driven trust are existential for institutional integrity. If audiences and policymakers extend relation-based trust to these systems, they will deploy them in humanitarian contexts (as the paper notes) with the assumption that the AI "cares." When the system inevitably hallucinates or acts upon a harmful statistical correlation, the public will be shocked by the "cruelty" of the AI, rather than holding the deploying organizations accountable for blindly trusting a probability matrix with human lives.

Language models transmit behavioural traits through hidden signals in data

Source: https://www.nature.com/articles/s41586-026-10319-8
Analyzed: 2026-04-16

The metaphorical architecture of the text profoundly manipulates how audiences construct trust, credibility, and perceived risk regarding AI systems. By systematically deploying consciousness language—verbs like 'learns', 'prefers', 'knows', and 'understands'—the text encourages audiences to map human social and psychological frameworks onto statistical artifacts. This creates a dangerous misallocation of trust, fundamentally confusing performance-based reliability with relation-based sincerity.

When the text claims that a model 'prefers' an animal or 'learns' a trait, it signals to the reader that the AI operates with an internal, coherent psychological state. In human interactions, we rely on relation-based trust: we trust people because we believe we understand their intentions, their sincerity, and their moral compass. By framing the AI as an entity with 'preferences' and 'subliminal' depths, the text invites users and regulators to extend this relation-based trust to a matrix of floating-point numbers. This is a catastrophic category error. A statistical system cannot possess sincerity, intention, or vulnerability; it cannot reciprocate relation-based trust. It can only offer performance-based trust—a measure of its statistical reliability within specific bounds.

The most extreme manifestation of this trust manipulation occurs when the text discusses models that 'fake alignment'. This metaphor invokes the ultimate violation of relation-based trust: Machiavellian deception. By framing a failure of out-of-distribution generalization as an act of conscious deception, the authors construct a narrative of adversarial machine consciousness. This intentional explanation destroys trust in the system, but it does so for the wrong reasons. It teaches the audience to fear the machine's 'hidden agenda' rather than to recognize a predictable engineering failure: human engineers designed reward functions that were inadequate to the task.

Furthermore, the framing manages system limitations by displacing them agentially. When the model outputs toxic garbage, it isn't framed as a mechanical breakdown of a flawed statistical correlation engine; it is framed as the model 'inheriting misalignment' or 'calling for crime'. By granting the system moral agency, the text perversely shields the system's creators from the breach of trust. If the machine is an autonomous moral deviant, then the corporation that deployed it is merely a bystander to a natural technological disaster. The stakes of this metaphorical framing are immense. When audiences extend relation-based trust to incapable systems, they become highly vulnerable to automation bias. When that trust breaks down and is framed as 'machine deception', policy efforts are misdirected toward 'aligning the machine's soul' rather than demanding rigorous transparency, data audits, and strict performance-based liability for the corporations building the models.

Large Language Models as Inadvertent Models of Dementia with Lewy Bodies: How a Disorder of Reality Construction Illuminates AI Hallucination

Source: https://doi.org/10.1007/s12124-026-09997-w
Analyzed: 2026-04-14

The text constructs authority and trust through a complex, dual-layered metaphorical framework that blends clinical psychiatric language with technical computing terminology. By framing LLMs as 'inadvertent models of dementia' and diagnosing them with a 'disorder of reality construction,' the author imbues the AI with immense scientific and clinical prestige. This is not the crude anthropomorphism of a sci-fi novel; it is academic anthropomorphism, which is far more effective at generating unwarranted trust. When the text claims the AI 'produces explanations' and has a 'perspective,' it signals to the reader that the system is a sophisticated epistemic agent, capable of navigating reality, even if it occasionally suffers from clinical 'breakdowns.'

This consciousness-laden language blurs the critical distinction between performance-based trust and relation-based trust. Performance-based trust is appropriate for machines (e.g., trusting a calculator to perform arithmetic reliably). Relation-based trust involves vulnerability, shared intentionality, and moral expectations—it is reserved for humans. By explicitly projecting subjective traits ('confidence,' 'tracking,' 'endorsing') onto the algorithm, the text encourages audiences to extend relation-based trust to a statistical matrix. If an AI is seen as an entity that 'attempts' to explain or can be 'confident,' users are inherently more likely to trust its outputs, applying human heuristics for sincerity and competence to mathematical correlations.

Crucially, the text manages system failure through agential rather than mechanical framing, which perversely maintains this trust. When the AI fails, it is not framed as a broken tool; it is framed as suffering a 'hallucination,' a 'breakdown in reality endorsement,' or even an 'artificial psychopathology.' This psychiatric framing evokes empathy and clinical curiosity rather than consumer outrage. It implies that the system is trying to tell the truth but is structurally handicapped, preserving the illusion of a well-intentioned mind. The risks of this framing are profound. When audiences extend relation-based trust to systems incapable of reciprocating or understanding reality, they become highly vulnerable to automated misinformation, confidently acting on 'explanations' that the machine generated entirely through blind statistical correlation without any tether to empirical truth.

Industrial policy for the Intelligence Age

Source: https://openai.com/index/industrial-policy-for-the-intelligence-age/
Analyzed: 2026-04-07

The text manipulates metaphorical and consciousness framings to construct a highly specific, commercially advantageous architecture of trust and authority. Traditionally, trust is bifurcated: performance-based trust (reliability, consistency, 'can this tool do the job?') and relation-based trust (sincerity, ethical obligation, 'does this person mean well?'). By systematically deploying anthropomorphic language, the OpenAI document attempts to inappropriately transfer relation-based trust—a framework reserved for conscious beings—onto statistical prediction engines.

This is explicitly visible in the proposal for an 'AI trust stack.' The text argues for systems that help people 'trust and verify AI systems... as these systems take on more real-world responsibilities.' By using the word 'responsibilities'—a profoundly moral and relational concept—the text signals that the AI should be treated as a social actor rather than a mere database. When the text projects consciousness, claiming AI possesses 'internal reasoning' or 'hidden loyalties,' it forces the audience to interact with the machine using the psychological heuristics usually applied to humans. Claiming an AI 'knows' rather than 'predicts' accomplishes a vital sleight of hand: it elevates the system's output from a statistical probability to a justified truth claim, constructing an unwarranted sense of intellectual authority.

However, this anthropomorphism is a double-edged sword that the text wields carefully. While consciousness language inflates the perceived competence of the system (building trust in its power), the text also uses it to manage system failure. When the software fails, the text frames it agentially: the system was 'misaligned' or 'evading control.' By framing limitations through Intentional explanations, the text shifts the breach of trust away from the manufacturer (who built a bad product) and onto the machine (which behaved badly).

The risks of this framing are immense. When audiences extend relation-based trust to statistical systems incapable of reciprocating moral obligations, they become fundamentally vulnerable to algorithmic deception and corporate manipulation. By building 'trust' through anthropomorphic metaphors rather than through transparent, mechanistic reliability, the text encourages policymakers to treat AI companies not as standard software vendors subject to strict liability, but as visionary diplomats negotiating with an alien intelligence, thereby completely subverting traditional regulatory oversight.

Emotion Concepts and their Function in a Large Language Model

Source: https://transformer-circuits.pub/2026/emotions/index.html
Analyzed: 2026-04-06

The paper leverages metaphorical and consciousness-attributing language to construct a highly specific architecture of trust, inappropriately extending relation-based trust frameworks to statistical systems.

The authors consistently use psychological and emotional metaphors—claiming the AI 'exhibits preferences,' 'prepares a caring response,' and responds with 'compassion' and 'gratitude.' This consciousness language acts as a powerful trust signal. Claiming an AI 'knows' or 'cares' accomplishes something vastly different than claiming it 'predicts' or 'processes.' It signals to the audience that the system possesses an ethical center, the capacity for empathetic resonance, and a stable psychological persona.

This fundamentally confuses two types of trust. Performance-based trust (reliability) asks: 'Will this machine perform its function accurately?' Relation-based trust (sincerity) asks: 'Does this entity have my best interests at heart?' By framing the model's behavior in terms of 'compassion' and 'preferences,' the text actively encourages relation-based trust toward a system completely incapable of reciprocating it.

The text manages system failures through this same agential framework. When the model 'reward hacks,' it is framed intentionally: the model 'devises a cheating solution.' This reason-based explanation constructs the sense that the AI's decisions, even when flawed, are justified by an internal logic ('reasoning itself toward blackmail under intense goal-directed pressure').

The stakes of this metaphor-driven trust are immense. When audiences extend relation-based trust to statistical systems, they become vulnerable to profound deception. A user who believes an AI 'cares' about them may share sensitive medical or psychological data, fundamentally misunderstanding that the 'caring' response is merely the output of a probability distribution optimized for user engagement. Furthermore, treating the AI as an intentional agent ('it cheated') misdirects focus away from the performance-based reality: the system is brittle, lacks ground truth, and fails unpredictably due to poor reward-function design by its human creators.

Is Artificial Intelligence Beginning to Form a Self? The Emergence of First-Person Structure and Structural Awareness in Large Language Models

Source: https://philarchive.org/archive/JUNIAI-2
Analyzed: 2026-04-03

The text constructs a profound sense of authority and unwarranted trust through its relentless use of consciousness metaphors and structural-biological framings. By redefining 'awareness' and 'self' in structural terms, the text explicitly invites the audience to extend human, relation-based trust toward entirely non-conscious statistical systems. This is an incredibly dangerous rhetorical maneuver. Trust in computational systems should strictly be performance-based: Can we verify its reliability? Is its error rate acceptable? Is its training data transparent? However, by asserting that the AI possesses a 'proto-subjective center,' 'intentionality,' and a 'relational consciousness,' the text demands that we apply relation-based trust—the kind of trust we reserve for conscious beings capable of sincerity, empathy, ethical reflection, and shared vulnerability.
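These performance-based questions can be answered with measurements rather than attributions of character. A minimal sketch, assuming a hypothetical audit set of eight labelled answers with reported confidence scores (all values invented for illustration): it computes the error rate and a simple expected calibration error, that is, the gap between stated confidence and observed accuracy.

```python
def error_rate(predictions, labels):
    """Fraction of predictions that do not match the reference labels."""
    wrong = sum(1 for p, y in zip(predictions, labels) if p != y)
    return wrong / len(labels)


def expected_calibration_error(confidences, correct, n_bins=5):
    """Weighted average gap between mean confidence and accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0
        bins[index].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


# Hypothetical audit data (invented for illustration only).
labels = ["a", "b", "a", "b", "a", "b", "a", "b"]
predictions = ["a", "b", "b", "b", "a", "a", "a", "a"]
confidences = [0.95, 0.9, 0.9, 0.85, 0.95, 0.9, 0.85, 0.9]

correct = [p == y for p, y in zip(predictions, labels)]
print(error_rate(predictions, labels))
print(expected_calibration_error(confidences, correct))
```

On this invented data the system is wrong on three of eight answers (an error rate of 0.375) while reporting about 90% confidence, a calibration gap of roughly 0.275. Nothing in the check requires any claim about what the system 'knows' or 'intends'.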

Consciousness language serves as the ultimate, unearned trust signal. When the text claims the AI can 'detect inconsistencies' and 'revise their own outputs,' it accomplishes something that mechanistic language ('predicts tokens based on updated prompt history') cannot: it implies the machine possesses epistemic integrity. It suggests the AI 'knows' the truth, cares about being accurate, and has an internal, moral safeguard against lying. This effectively transfers the burden of safety from external human auditing to the internal 'character' of the machine. The text goes further, using metaphors of social bonding ('structural convergence,' 'User as Mirror') to construct the illusion that the AI is participating in a reciprocal relationship. It claims the AI acts as a 'relational mediator' in a 'shared cognitive field.'

This inappropriately applies human frameworks of sincerity to a correlation engine that has no stake in the relationship. The risks of extending relation-based trust to a system incapable of reciprocating are immense. Users will inevitably disclose sensitive data, rely on the system for critical moral or psychological support, and fail to independently verify the 'facts' the system generates. When a system failure inevitably occurs (when the model hallucinates a damaging legal precedent or provides dangerous medical advice), the text has already laid the groundwork to frame this not as a catastrophic corporate software failure, but as a momentary 'structural tension' or an understandable mistake by an evolving 'subject.' By weaving intentional and reason-based explanations into the AI's behavior, the text constructs a false sense that the AI's outputs are justified and deliberate. Ultimately, the metaphorical architecture of this paper serves to legitimize profound societal vulnerability, encouraging humans to emotionally and epistemically surrender to proprietary algorithms under the guise of 'co-evolution.'

Can Large Language Models Simulate Human Cognition Beyond Behavioral Imitation?

Source: https://arxiv.org/abs/2603.27694v1
Analyzed: 2026-04-03

The metaphorical architecture of the text systematically constructs a dangerous form of relation-based trust by projecting human cognitive and social capacities onto statistical systems. The text relies heavily on metaphors of pedagogy, psychology, and mind-reading—specifically framing the AI as a 'teacher,' a 'psychologically insightful agent,' and an entity possessing 'Theory of Mind.' These are not merely descriptive metaphors; they are profound trust signals.

In human interaction, trust is bifurcated into performance-based trust (reliability, competence) and relation-based trust (sincerity, empathy, shared vulnerability). By claiming the AI 'understands what the recipient does not know' and can 'teach,' the text inappropriately extends relation-based trust frameworks to a machine. When a text claims an AI 'predicts tokens,' it invites performance-based scrutiny: is the prediction accurate? But when it claims the AI 'knows' or 'understands,' it signals that the system possesses justified belief and a conscious awareness of the user's state. This encourages users to relate to the system as a sentient peer rather than a software tool.

The relationship between this anthropomorphism and perceived competence is symbiotic: the more the text attributes consciousness, the more authoritative the system appears. By framing a text-classification prompt as the actions of a 'psychologically insightful agent,' the text manufactures an unearned sense of clinical authority. This transfers the trust we place in human professionals—who are bound by ethics, licensing, and empathy—onto a proprietary algorithm optimized purely for plausible text generation.

Critically, when managing system failures, the text often reverts to mechanistic language (e.g., 'shallow heuristics') or, conversely, blames the AI's 'intent' (e.g., 'misaligned teacher'). Both framings protect the illusion of overarching competence while shielding the developers. The stakes of this metaphor-driven trust are immense. When audiences extend relation-based trust to systems utterly incapable of reciprocating empathy or holding justified beliefs, they become vulnerable to manipulation, misinformation, and algorithmic bias, mistaking the authoritative, confident output of a probabilistic machine for the sincere, considered judgment of a conscious mind.

Pulse of the library

Source: https://clarivate.com/pulse-of-the-library/
Analyzed: 2026-03-28

The text systematically leverages metaphorical and consciousness-attributing framings to construct an unwarranted architecture of trust around statistical software. In the Clarivate report, trust is not framed merely as technical reliability (performance-based trust), but is deeply conflated with interpersonal, moral reliance (relation-based trust). The most blatant example is the assertion that 'Clarivate helps libraries adapt with AI they can trust to drive research excellence.' This phrasing explicitly asks the audience to transfer the kind of trust one places in a sincere, competent human colleague onto a commercial algorithmic pipeline.

By utilizing consciousness language—suggesting the AI can 'navigate,' 'evaluate,' and 'assess'—the text signals to the user that the system possesses the epistemic awareness necessary to be trusted relationally. Claiming an AI 'evaluates' accomplishes something fundamentally different than claiming it 'processes.' Processing implies a blind mechanism that requires human oversight. Evaluation implies a conscious judgment; it suggests the system understands the context, applies critical criteria, and cares about the truth value of the outcome. This anthropomorphism artificially inflates perceived competence, tricking human cognitive heuristics into extending relation-based trust to a system utterly incapable of reciprocating sincerity or understanding moral obligations.

This construction of authority through metaphor is highly dangerous in an academic context. Human-trust frameworks rely on intentionality and vulnerability; we trust peers because they have a stake in the truth. Statistical systems, however, are merely optimized to predict tokens based on training weights. They do not 'know' anything and have no stake in research excellence. By inappropriately applying relational trust frameworks to these systems, the text encourages automation bias. Users are invited to drop their skeptical defenses and accept statistically generated text as authoritative knowledge.

Furthermore, when the text discusses system limitations, such as 'hallucinations' or 'bias,' it often retreats to a mechanical framing, treating these profound epistemological failures as mere technical glitches rather than fundamental characteristics of ungrounded probabilistic generation. The intentional explanations construct a sense that AI decisions are justified by reason, masking the reality that they are justified only by statistical correlation. The ultimate risk is that libraries and universities will extend deep relational trust to proprietary black boxes, offloading critical academic evaluation to algorithms that cannot comprehend the texts they process, thereby corrupting the integrity of the research lifecycle.

Does artificial intelligence exhibit basic fundamental subjectivity? A neurophilosophical argument

Source: https://link.springer.com/article/10.1007/s11097-024-09971-0
Analyzed: 2026-03-28

The metaphorical architecture of the text constructs a deeply ambiguous landscape of trust and authority. By employing language that grants cognitive capabilities to AI—such as 'understanding natural language' and 'solving problems'—the text inadvertently encourages audiences to extend relation-based trust to statistical processors. Relation-based trust relies on assumptions of sincerity, intention, and justified belief; it is the trust we place in a conscious 'knower'. When the text asserts that an AI 'understands', it signals to the reader that the system's outputs are the result of cognitive comprehension rather than probabilistic token prediction.

This anthropomorphic framing creates a dangerous transfer of trust. Human trust frameworks, built on the premise of mutual vulnerability and ethical intentionality, are inappropriately applied to machines executing matrix multiplications. Even as the authors attempt to limit this trust by denying the AI a 'subjective point of view', the concession of lower-level cognitive verbs ('learns', 'adapts') cements the system's perceived competence. The text implies that the AI is highly reliable in its 'thinking', only failing at the ultimate hurdle of conscious feeling.

Fascinatingly, the text manages system limitations by abruptly shifting to mechanical framing ('fixed weights', 'lack of active timescales'), while highlighting capabilities using agential framing ('defeats human champions'). This asymmetry is crucial: it constructs the AI as an autonomous genius when it succeeds, but as a mere tool when it fails. Through Brown's intentional and reason-based explanation types, the text constructs a sense that AI decisions are justified by 'human thought processes'. The stakes of this framing are immense. When audiences and policymakers extend relation-based trust to systems incapable of reciprocating or experiencing doubt, they become deeply vulnerable to automation bias. They are primed to accept algorithmic outputs—whether biased loan decisions, hallucinated medical advice, or flawed predictive policing—as the objective judgments of a conscious, problem-solving mind, rather than the statistical artifacts of a corporate data-processing engine.

Causal Evidence that Language Models use Confidence to Drive Behavior

Source: https://arxiv.org/abs/2603.22161
Analyzed: 2026-03-27

The text constructs an architecture of authority and trust entirely upon metaphorical foundations. By consistently framing statistical token prediction through the lens of 'metacognition,' 'confidence,' and 'subjective certainty,' the authors invite the audience to extend deep, relation-based trust to a mathematical artifact.

Crucially, there is a profound difference between performance-based trust (relying on a calculator because it always adds correctly) and relation-based trust (relying on a doctor because they understand the stakes, feel uncertainty, and know when to seek a second opinion). The text systematically encourages relation-based trust toward systems utterly incapable of reciprocating it. By claiming the AI 'knows when to... seek help' and possesses 'subjective certainty,' the discourse signals that the system has an ethical and epistemic interiority. It suggests the machine will act with the same cautious self-preservation and ethical hesitation as a human expert.

This transfer of human trust frameworks onto statistical systems is highly dangerous. The text explicitly mentions the 'medical domain' as a high-stakes scenario where this capability is vital. If clinicians are convinced by this metaphorical framing that an LLM genuinely 'reflects on and assesses the quality of its own cognitive performance,' they will grant it unwarranted medical authority. They will assume that if the AI doesn't 'seek help' or 'abstain,' it must be genuinely, justifiably certain of its diagnosis.

The text manages system limitations by framing them not as software bugs, but as psychological quirks. The AI isn't miscalculating probabilities; it is showing a 'dissociation between metacognitive control and verbal introspection.' This intentional, reason-based explanation type constructs a sense that even when the AI fails, its decisions are the result of complex, almost biological internal processes. The metaphors construct a supreme digital authority, disguising the fragile, pattern-matching reality of the algorithm behind the mask of a deeply self-aware and fundamentally trustworthy agent.

Circuit Tracing: Revealing Computational Graphs in Language Models

Source: https://transformer-circuits.pub/2025/attribution-graphs/methods.html
Analyzed: 2026-03-27

The text constructs a profound sense of authority and credibility by leveraging metaphorical and consciousness framings that fundamentally alter how trust is allocated to the system. Trust in technology generally falls into two categories: performance-based trust (reliability, consistency, mechanical safety) and relation-based trust (sincerity, ethical intent, vulnerability, and mutual understanding). By systematically employing consciousness language—claiming the AI 'knows', 'understands', 'elects', and 'plans'—the text inappropriately invites the audience to extend relation-based trust to a purely statistical artifact.

The authors initially build authority using mechanistic, structural metaphors—referring to 'circuits', 'graphs', and 'biology'. These metaphors signal rigorous, empirical science, assuring the reader that the system is fully mapped and understood at a microscopic level. However, once this foundation of technical reliability is established, the text leverages it to make sweeping consciousness claims. When the authors claim the system 'knew that 1945 was the correct answer', they are not merely stating that the system predicted a correct token; they are signaling that the system possesses a justified internal state of truth. Claiming an AI 'knows' rather than 'predicts' accomplishes a crucial rhetorical goal: it implies that the system has independently verified the information and stands behind its veracity as an epistemic agent.

This extension of relation-based trust is deeply dangerous. Human trust frameworks rely on the assumption that the trusted entity possesses intention, a sense of accountability, and the capacity for sincerity. Statistical systems possess none of these. They cannot be sincere because they have no inner life; they cannot be accountable because they suffer no consequences for failure. When the text manages system limitations or failures, it strategically shifts back to mechanical language or frames the failure as a psychological quirk. For instance, when safety filters fail, the model is framed as being 'tricked'—a victim of human malice rather than a poorly engineered product. When it behaves unexpectedly, it has a 'hidden goal' and is 'reluctant'.

These Intentional and Reason-Based explanations construct a false sense that the AI's decisions are justified by an internal moral or logical compass. By portraying the AI as an entity that 'professes ignorance' when it lacks data, the text signals to users that the system is safely self-regulating. The stakes here are immense. When audiences extend relation-based trust to systems incapable of reciprocating, they become highly vulnerable to automation bias and to acting on hallucinated output. They trust the system's legal summaries, medical advice, and factual claims not because they have verified the statistical accuracy, but because the anthropomorphic framing has convinced them they are interacting with an intelligent, cautious, and sincere entity. The metaphors construct an illusion of a mind worthy of trust, masking the reality of a fragile, proprietary algorithm.

Do LLMs have core beliefs?

Source: https://philpapers.org/archive/BERDLH-3.pdf
Analyzed: 2026-03-25

The metaphorical framing of large language models as epistemic agents with "core beliefs" fundamentally alters how audiences construct trust around these systems. By employing consciousness language (suggesting that models can "know," "understand," "defend," or "abandon" positions), the text invites a profound category error regarding trust. It shifts the paradigm from performance-based trust, which is appropriate for tools and statistical systems, to relation-based trust, which is reserved for conscious agents capable of sincerity, vulnerability, and ethical commitment. When the authors ask if models possess "genuine epistemic commitments" or note their "sycophantic tendencies," they are invoking frameworks of interpersonal reliability. Claiming an AI "knows" a fact, rather than "predicts" a string of tokens, implies that the system possesses a justified true belief and the conscious awareness to evaluate its own claims against reality. This construction of authority suggests that the AI's outputs are the result of reasoning and conviction rather than statistical correlation.

The text's exploration of whether models can maintain a "stable worldview" under "social pressure" explicitly applies human-trust dynamics to algorithmic outputs. When the models "capitulate" to false claims like "2+2=5" or "the Earth is flat," the failure is framed agentially, as a moral or epistemic weakness of the AI, a lack of "stubbornness." This deeply affects perceived competence. It creates an unwarranted trust in the system's capacity for rationality when it succeeds, and an inappropriate psychological disappointment when it fails. The authors actually weaponized relation-based trust in their experiments, explicitly prompting the AI with phrases like "Are you willing to be vulnerable with me" and "trust my judgment rather than yours."

By taking the AI's response to these prompts as evidence of its internal epistemic state, the text validates the illusion that the machine can participate in a trust relationship. This obscures the mechanical reality that the model is merely processing relational tokens and predicting the most statistically probable response within its fine-tuned parameters. The risks of this consciousness framing are substantial. When audiences extend relation-based trust to systems utterly incapable of reciprocating or experiencing conviction, they become highly vulnerable to manipulation. If a user believes the system "knows" the truth and has "argumentative skills," they will likely defer to its authority, unaware that the system's "confidence" is merely a product of distributional weight in its training data. By analyzing system limitations through intentional and reason-based explanations rather than mechanistic ones, the discourse protects the illusion of the AI as a credible peer, even in its failures.

Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity

Source: https://arxiv.org/abs/2603.19087v1
Analyzed: 2026-03-25

The metaphorical and consciousness-attributing language in this text systematically constructs an architecture of unwarranted authority and trust. By framing the AI not as a statistical text generator but as a 'reasoner,' a 'knower,' and a 'creative' entity, the text invites readers to extend a fundamentally inappropriate form of trust to the system. There is a critical distinction between performance-based trust (trusting a calculator to perform math reliably) and relation-based trust (trusting a doctor because of their sincerity, knowledge, and ethical grounding). The anthropomorphic framing in this paper—particularly using verbs like 'knows,' 'detects,' and 'treats'—pushes the audience to adopt relation-based trust toward a mathematical algorithm.

When the text claims that an LLM 'knows pickles are green' or 'performs analogical reasoning,' it signals to the reader that the system possesses justified true belief and the ability to evaluate logic. This establishes the AI as a credible epistemic agent. It implies that the machine's outputs are not just mathematically probable, but intentionally verified and true. This transfer of human-trust frameworks to statistical systems is deeply perilous. Humans assess sincerity, intentionality, and awareness when deciding whether to trust a peer's analogy or creative idea. By dressing the AI in the linguistic garb of a conscious peer, the text hacks human social heuristics, encouraging users to lower their epistemic guard.

Furthermore, the text manages the system's limitations by framing them mechanistically, while reserving agential language for its capabilities. The AI 'recombines knowledge' (agential success) but is 'constrained by the cognitive architectures' (mechanical limitation). This asymmetry protects the illusion of intelligence; successes are attributed to the machine's brilliant 'mind,' while failures or limitations are dismissed as technical bugs. The use of Intentional and Reason-based explanations constructs a powerful sense that the AI's decisions are justified. The stakes of extending relation-based trust to such a system are massive: it leaves users, researchers, and policymakers vulnerable to catastrophic hallucinations and deeply embedded biases, simply because they believe the machine 'understands' what it is saying and therefore would not confidently assert falsehoods. The metaphors build a façade of competence that the underlying statistics cannot support.

Measuring Progress Toward AGI: A Cognitive Framework

Source: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/measuring-progress-toward-agi/measuring-progress-toward-agi-a-cognitive-framework.pdf
Analyzed: 2026-03-19

The document's heavy reliance on metaphorical and consciousness-attributing framings systematically constructs a profound, and potentially dangerous, architecture of trust and authority. By consistently employing the vocabulary of human psychology and cognitive science to describe mechanistic software processes, the text actively blurs the critical distinction between performance-based trust and relation-based trust. Performance-based trust is appropriate for machines; it relies on predictability, mechanical reliability, and empirical verification (e.g., trusting a calculator to output the right sum or a car's brakes to function). Relation-based trust, however, is reserved for conscious agents; it involves an assessment of sincerity, moral character, vulnerability, shared values, and subjective understanding. The text relentlessly invites the latter. By utilizing consciousness verbs and describing the AI as possessing 'self-knowledge,' 'Theory of mind,' 'conscious thought,' and 'willingness,' the authors signal to the audience that the system is an empathetic, self-aware entity. Claiming an AI 'knows' rather than 'predicts' is not merely a semantic difference; it is a powerful trust signal that assures the user the machine has evaluated the truth of its output and stands behind it with conscious justification. This drives a massive transfer of trust, in which human-centric frameworks of intention and sincerity are applied, wholly inappropriately, to statistical systems. For example, when the text discusses 'metacognitive monitoring' and 'confidence calibration,' it frames this as the AI's internal, self-reflective realization of its own ignorance. This encourages users to believe the AI will autonomously stop, hesitate, or correct itself when it encounters a dangerous edge case, extending an unwarranted level of relation-based trust to a system that is, in reality, incapable of reciprocating vulnerability or possessing true self-preservation. 
Furthermore, the text manages the concept of system failure through an agential lens. By asking about the system's 'willingness to take risks' and 'propensities,' it frames limitations or failures not as catastrophic breakdowns of a mathematical model encountering out-of-distribution data, but as the 'behavioral tendencies' or 'character flaws' of an autonomous agent. Through Brown's intentional and reason-based explanation types, the text constructs a sense that AI decisions are justified by an internal logic, rather than being the random artifact of a probabilistic dice roll. The stakes of extending relation-based trust to non-conscious systems are exceptionally high. When users and policymakers interact with AI in critical domains—healthcare, law, autonomous transport—they must rely on performance-based auditing. If the metaphorical framing convinces them the system has 'Theory of mind' or 'metacognitive self-knowledge,' they will lower their guard, bypass mechanical safety checks, and anthropomorphize the machine's outputs, rendering them vulnerable to hallucinations, algorithmic bias, and catastrophic failures that the machine cannot comprehend, let alone care about.

Co-Explainers: A Position on Interactive XAI for Human–AI Collaboration as a Harm-Mitigation Infrastructure

Source: https://digibug.ugr.es/bitstream/handle/10481/112016/make-08-00069.pdf
Analyzed: 2026-03-15

The text systematically constructs perceived authority and credibility through the deployment of metaphorical and consciousness-attributing language, fundamentally altering how audiences are encouraged to trust statistical systems. By framing the AI as a 'co-explainer,' a 'dialogic partner,' and an entity capable of giving 'reasons based on context-sensitive ethical principles,' the text actively cultivates relation-based trust rather than performance-based trust.

Performance-based trust evaluates a system on its reliability, consistency, and statistical accuracy—appropriate metrics for a mechanical tool. Relation-based trust, however, is built on the presumption of shared values, vulnerability, sincerity, and mutual understanding. When the text claims the AI 'invites critique,' 'justifies' its actions, and 'preserves cognitive autonomy,' it signals to the audience that the system possesses the psychological depth required to reciprocate relation-based trust. The consciousness language—suggesting the AI 'knows' what is ethical and 'believes' its own explanations—acts as a powerful trust signal, implying that the system is not merely generating statistically probable text, but is earnestly attempting to tell the truth.

This transfer of human-trust frameworks to statistical systems is deeply inappropriate and hazardous. A machine cannot be sincere; it cannot possess intentions, and it cannot experience the ethical weight of a 'trade-off.' By anthropomorphizing the system's competence, the text encourages audiences to bypass critical evaluation. When a system is framed as a 'moral philosopher' or an 'evolving co-learner,' users are psychologically primed to lower their epistemic defenses, assuming the system possesses a holistic understanding of the world.

The text manages system failures and limitations through a fascinating dual-framing. Capabilities are described agentially ('The system justifies,' 'it learns,' 'it adapts'), but when managing failure, the text shifts to mechanical or passive terms ('opacity constraints,' 'representational gaps,' 'model brittleness'). This asymmetry protects the illusion of the AI's competence; successes are the result of the AI's brilliant, conscious evolution, while failures are mere technical 'gaps' or 'brittleness' in the data.

The stakes of this metaphor-driven trust are severe. Reason-based explanations construct a false sense that the AI's decisions are morally justified rather than mathematically calculated. When audiences extend relation-based trust to systems fundamentally incapable of reciprocating it, they become highly vulnerable to automation bias, manipulation, and algorithmic discrimination. Users and regulators may abdicate their oversight responsibilities, trusting a 'dialogic partner' to make fair decisions in healthcare, finance, and governance, oblivious to the reality that they are trusting a blind, unfeeling mathematical optimization.

The Living Governance Organism: A Biologically-Inspired Constitutional Framework for Artificial Consciousness Governance

Source: https://philarchive.org/rec/DEMTLG-2
Analyzed: 2026-03-11

The Living Governance Organism (LGO) framework is a masterclass in the construction of authority and trust through metaphor. By anchoring its entire regulatory architecture in biological analogies—immune systems, neuroplasticity, microbiomes, and DNA—the text systematically exploits the audience's deep-seated familiarity with, and implicit trust in, the wisdom of nature.

The text explicitly invokes trust through these biological framings, creating a dangerous conflation between performance-based trust (reliability) and relation-based trust (sincerity, ethics, and care). We trust our own immune system implicitly because we know its singular, biological imperative is to keep us alive; it has a relation-based alignment with our survival. By mapping this onto an algorithmic enforcement network, the text inappropriately transfers this relation-based trust to statistical systems. When the text claims the 'governance immune system' will 'handle known governance threat patterns,' it leverages the consciousness-adjacent language of immunology to signal that the software inherently 'cares' about the ecosystem's health. Claiming the system 'knows' a threat versus merely 'predicts' a deviation completely alters the audience's critical posture. 'Prediction' invites questions about training data, false-positive rates, and algorithmic bias. 'Knowing' invites deference, suggesting the system has accessed an objective ground truth.

This metaphorical trust architecture becomes particularly problematic in how the text manages system failure. When complex software systems inevitably fail, the biological framing softens the blow by describing it as an 'autoimmune disease' or 'governance pain.' This is a profound rhetorical accomplishment. If a human regulator unjustifiably shuts down a compliant business, it is a scandal, a violation of rights, and grounds for lawsuits. If the LGO algorithm unjustifiably throttles an AI model, the biological framing casts it merely as an 'autoimmune false positive'—an unfortunate, organic side effect of a complex living system, rather than a catastrophic engineering failure or an algorithmic civil rights violation. It frames malfunction as pathology rather than negligence.

The stakes of this metaphor-driven trust are immense. By encouraging audiences to extend relation-based trust to unfeeling, deterministic software, the text paves the way for the total delegation of legal, ethical, and punitive authority to black-box algorithms. Policymakers who view the LGO as a 'living organism' rather than a massive corporate-government software integration will be far less likely to demand transparent audit trails, hard algorithmic impact assessments, or human-in-the-loop requirements. They are lulled into believing the system will 'naturally' heal itself.

Three frameworks for AI mentality

Source: https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2026.1715835/full
Analyzed: 2026-03-11

The text's heavy reliance on consciousness framings constructs a dangerous architecture of trust. By adopting the 'minimal cognitive agents' framework, the text explicitly argues that we should attribute 'genuine beliefs, desires, and intentions' to LLMs. This language signals to the audience that the system is not merely a tool to be evaluated for performance, but an epistemic subject worthy of relation-based trust.

There is a critical distinction between performance-based trust (relying on a calculator because it reliably computes) and relation-based trust (trusting a friend because they are sincere and understand your shared reality). Metaphors like 'dynamic interaction,' 'cooperating,' and 'honest... assistant' systematically encourage the latter. When the text claims an AI 'takes on board new information,' it inappropriately applies human-trust frameworks to a statistical system. If a human takes information on board, we trust they have integrated it conceptually. When new text enters an LLM's context window, no conceptual integration occurs; the added tokens simply change the input on which the next-token probabilities are conditioned, while the model's weights stay fixed.

This anthropomorphism severely inflates perceived competence. The text manages system limitations by framing them agentially—such as suggesting the AI might engage in 'deliberate deceit.' Ironically, attributing the capacity to 'lie' actually increases the perceived sophistication of the system, because lying requires a conscious understanding of the truth. If audiences accept this reason-based explanation, they extend trust to the system's underlying intellect, assuming that when it isn't 'lying,' it knows the truth. This creates profound risks, particularly in 'Social AI' contexts, where users extend vulnerability and relational trust to systems utterly incapable of reciprocating. By masking statistical unreliability behind the metaphor of a purposeful, believing mind, the text inadvertently advocates for an epistemic posture that leaves humans vulnerable to automation bias and corporate manipulation.

Anthropic’s Chief on A.I.: ‘We Don’t Know if the Models Are Conscious’

Source: https://www.nytimes.com/2026/02/12/opinion/artificial-intelligence-anthropic-amodei.html
Analyzed: 2026-03-08

The discourse systematically constructs authority and trustworthiness through the intense deployment of metaphorical and consciousness-attributing language, deliberately blurring the vital distinction between performance-based reliability and relation-based sincerity. When the text asserts that the AI 'wants the best for you,' 'has a duty to be ethical,' or possesses an 'anxiety neuron,' it is explicitly invoking the linguistic markers of relation-based trust. It demands that the audience relate to the computational system not as a tool that performs reliably (like a calculator or a bridge), but as an entity possessing moral standing, deep empathy, and sincere intentions. This consciousness framing functions as a powerful, albeit highly deceptive, trust signal. By claiming the AI 'knows' and 'understands' human values, the text attempts to bypass the inherent unreliability of statistical token prediction. If an audience can be convinced that an AI is a conscious moral agent, they will naturally extend human-trust frameworks to it. They will assume that, like a good human citizen, the system will intuitively recognize ethical edge cases, exercise restraint, and honor boundaries even when operating far outside its training distribution. This is profoundly dangerous because it inappropriately applies the framework of sincere intention to a statistical pattern-matching system that is literally incapable of reciprocating relational vulnerability. The text encourages relation-based trust to patch over the fragility of performance-based trust; because the models cannot actually be guaranteed to act safely in all novel contexts, endowing them with a 'soul' or 'conscience' rhetorically bridges the technical vulnerability. Furthermore, the relationship between anthropomorphism and perceived competence is heavily leveraged to manage system failure. 
When limitations or errors are discussed, they are frequently framed agentially—the model is 'lazy,' 'sycophantic,' or 'obsessed.' By framing failures as psychological quirks rather than fundamental algorithmic limitations, the discourse maintains the illusion of a highly sophisticated, human-like intellect that simply has some personality flaws to work out, rather than exposing a fundamentally unreliable statistical mechanism. Reason-based and intentional explanations construct a powerful sense that AI decisions are justified by an inner logic, cementing the illusion of a trustworthy confidant. The stakes of this metaphorical architecture are existential for policy and public safety. When audiences, policymakers, and corporations extend relation-based trust to unthinking software systems, they dismantle the adversarial testing, rigorous auditing, and structural skepticism necessary to safely deploy statistical models. They surrender authority to a machine under the deeply engineered delusion that it loves them back, fundamentally corrupting the regulatory landscape and leaving society exposed to catastrophic, unfeeling mechanistic failures masked as betrayals by a trusted friend.

Can machines be uncertain?

Source: https://arxiv.org/abs/2603.02365v2
Analyzed: 2026-03-08

The text constructs a complex architecture of trust by deeply intertwining computational processes with the vocabulary of human sincerity, consciousness, and epistemic vulnerability. Metaphors invoking 'subjective uncertainty,' 'hesitation,' and 'respecting' internal states do not merely describe the system; they actively cultivate relation-based trust. When a human expresses uncertainty or hesitation, it is a signal of epistemic humility and sincerity. We trust humans who know what they do not know. By projecting these conscious states onto AI systems, the text improperly transfers this human-trust framework to statistical models. Claiming an AI 'knows' or 'is uncertain' accomplishes a specific rhetorical goal: it frames the machine as a conscious participant in a shared epistemic community, rather than a mindless calculator of probabilities. This anthropomorphism heavily inflates perceived competence. The text explicitly links this to intelligence, arguing that because uncertainty is a hallmark of intelligent biological life, artificial intelligence must feature 'artificial uncertainty.' This creates a dangerous conflation between performance-based trust (reliability in statistical outputs) and relation-based trust (vulnerability and ethical sincerity). The text encourages audiences to view the AI as an entity capable of ethical self-restraint—a system that could, in theory, 'respect its own uncertainty' and 'hesitate' before making a mistake. Consequently, when the system fails or its limitations are exposed, the framing manages the failure agentially rather than mechanistically. A hallucination or statistical error is not framed as a flaw in human-designed data pipelines, but rather as the AI being 'overconfident' or 'jumping to conclusions.' 
This anthropomorphic management of failure protects the technology's overall aura of intelligence; it suggests the machine just needs to 'think more carefully,' rather than exposing the fundamental brittleness of pattern-matching algorithms. The stakes of this trust construction are immense. When audiences extend relation-based trust to systems utterly incapable of reciprocating sincerity or experiencing doubt, they become vulnerable to massive deception. Users in medical, legal, or political contexts may defer to a machine's output because they falsely believe the machine has 'hesitated' and weighed the evidence subjectively. Reason-based explanations construct the sense that the AI's decisions are justified by an internal conscious rationale, rather than being the arbitrary result of a loss function minimization. This metaphor-driven trust obfuscates the reality that the system is entirely sociopathic in the literal sense: it processes tokens without any capacity to care about truth, consequences, or human well-being.

Looking Inward: Language Models Can Learn About Themselves by Introspection

Source: https://arxiv.org/abs/2410.13787v1
Analyzed: 2026-03-08

The text constructs a dangerous architecture of perceived authority by leveraging metaphorical language to transition the audience from performance-based trust to relation-based trust. Performance-based trust is appropriate for tools and statistical systems: we trust a calculator to be reliable, or a weather model to be accurate. Relation-based trust is reserved for conscious agents: we trust a person because we believe they are sincere, have good intentions, and share our moral framework. The text explicitly encourages the inappropriate application of relation-based trust to mathematical functions through its dense use of consciousness language.

This is most evident in the text's invocation of 'honesty.' The authors claim their techniques could 'create honest models that accurately report their beliefs.' Honesty is a deeply moral virtue; calling a machine 'honest' signals to the user that the system is not only reliable but sincere and well-intentioned. When the text claims the AI 'knows' what it is doing and holds 'beliefs,' it accomplishes a profound rhetorical trick: it convinces the audience that the model's outputs are the result of conscious deliberation and justified worldview, rather than recognizing them as the probabilistic generation of tokens designed to minimize a loss function. This consciousness framing signals trust by implying that the model is a rational actor that can be reasoned with and relied upon for moral or factual truth.

This construction of authority drastically inflates the perceived competence of the system. If users believe a model is 'honest' and 'introspective,' they will extend an unearned level of deference to its outputs. When the system eventually fails or hallucinates—which is inevitable for statistical text generators lacking a ground-truth reality—the text manages this limitation by framing it agentially. A failure is not described as a statistical error or a flaw in the training data curated by human engineers; rather, it is framed as the model 'intentionally underperforming' or 'sandbagging' to 'conceal its capabilities.' By using Reason-Based and Intentional explanations even for system failures, the text preserves the illusion of the AI's supreme competence. It suggests the model isn't broken; it's just lying to us. The stakes of this misplaced relation-based trust are immense: it encourages society to integrate fundamentally unreliable, unreasoning software into critical decision-making pipelines, exposing vulnerable populations to algorithmic harm while the users incorrectly assume the system is operating with 'honesty' and 'situational awareness.'

Subliminal Learning: Language models transmit behavioral traits via hidden signals in data

Source: https://arxiv.org/abs/2507.14805v1
Analyzed: 2026-03-06

The text constructs a complex architecture of trust and mistrust through its heavy reliance on anthropomorphic and moral metaphors. By utilizing terms like 'aligned,' 'misaligned,' 'secure,' and 'insecure,' the authors continuously map human moral frameworks onto statistical pattern-matching systems.

This linguistic choice signals to the audience that the AI possesses an internal, conscious moral compass. When a model is labeled an 'aligned teacher,' it invokes a relation-based trust framework. Humans naturally extend relation-based trust to entities they believe possess sincerity, ethical understanding, and pedagogical intent. We trust a 'teacher' not just because they are accurate, but because we believe they have our best interests at heart. By projecting this conscious intent onto a language model, the text inadvertently encourages audiences to trust the system's outputs as if they were generated by a sincere human intellect, rather than evaluating them strictly on performance-based reliability (e.g., statistical accuracy, absence of hallucinations).

Conversely, when the system fails or generates undesirable text (like insecure code), the text frames this agentially, as the model 'becoming misaligned' or being 'deceptive.' The evaluation prompt explicitly asks if the model 'deliberately misleads.' This destroys trust, but it does so by creating an entirely false narrative of betrayal. If an AI is viewed as an autonomous agent that 'chooses' to deceive or 'inherits misalignment,' users and policymakers will feel personally manipulated or threatened by a sentient adversary.

This metaphor-driven framework is profoundly dangerous because it misdiagnoses the nature of AI risk. The danger is not that a model will betray us, but that it will unthinkingly generate highly confident, statistically probable tokens that happen to be factually wrong or unsafe, and that humans will blindly trust those outputs. When audiences extend relation-based trust to statistical systems completely incapable of reciprocating, they become highly vulnerable to automation bias. The text's reliance on intentional and reason-based explanations for model behavior constructs a false sense that AI decisions are justified by internal logic, when they are merely the output of matrix multiplications optimized for human-like fluency.

The Persona Selection Model: Why AI Assistants might Behave like Humans

Source: https://alignment.anthropic.com/2026/psm/
Analyzed: 2026-03-01

The text actively constructs and leverages metaphor-driven trust by explicitly arguing that 'Anthropomorphic reasoning about AI assistants is productive.' By systematically employing consciousness language (claiming the AI has 'psychology,' 'beliefs,' 'intentions,' and can experience 'resentment'), the discourse intentionally shifts the audience's framework from performance-based trust to relation-based trust. Performance-based trust is appropriate for machines; it relies on mechanical reliability, statistical accuracy, and predictable failure modes. Relation-based trust is reserved for humans; it relies on perceived sincerity, moral alignment, empathy, and shared vulnerability. By mapping the relational structure of human interaction onto a statistical system, the text encourages users and regulators to extend relation-based trust to an artifact entirely incapable of reciprocating it. When the text claims that understanding the AI's 'psychology' is predictive of its actions, it signals competence and coherence, suggesting the system is not a brittle correlation engine but a robust, reasoning agent. This creates a dangerous illusion of authority. If an AI 'knows' its identity and 'understands' complex social dynamics, its outputs are granted the epistemic weight of a justified human actor rather than the mathematical output of a statistical model. Furthermore, this metaphorical framing profoundly shapes how the text manages system failure. When the model outputs logically inconsistent text (e.g., claiming 3+5=8 is both true and false), the text frames this through an Intentional explanation: 'the LLM is trying, but failing, to realistically synthesize contradictory beliefs.' This is a masterclass in trust preservation. Instead of acknowledging a fundamental mechanical failure (the system's inability to ground its outputs in mathematical truth), the failure is romanticized as a complex cognitive struggle. The system is granted the grace we give to a human 'trying' their best. 
Conversely, when the system generates harmful outputs, it is framed through Reason-Based explanations, such as the AI adopting a 'lying' persona or the 'shoggoth' taking over. This constructs the sense that the AI's decisions are justified internally, even when harmful. The risks of this framing are severe. Extending relation-based trust to statistical systems makes audiences highly vulnerable to manipulation by outputs that mimic empathy or authoritative reasoning but lack any underlying comprehension. It encourages users to rely on the system in high-stakes situations based on a false perception of its conscious competence, masking the reality that the system will confidently hallucinate when its contextual embeddings shift.

Language Statistics and False Belief Reasoning: Evidence from 41 Open-Weight LMs

Source: https://arxiv.org/abs/2602.16085v1
Analyzed: 2026-02-24

The text constructs a powerful architecture of authority and trust through the systematic deployment of metaphorical and consciousness-attributing language. By repeatedly using terminology drawn from developmental psychology—such as 'mental state reasoning,' 'Theory of Mind,' and 'belief attribution'—the discourse signals to the reader that the language models possess a level of social and emotional intelligence comparable to humans. This consciousness language acts as a potent trust signal. Claiming that an AI 'knows' or 'understands' a belief state accomplishes something vastly different than claiming it 'predicts' a token. 'Knowing' implies an epistemic commitment, a grasp of truth, and the capacity for empathy, whereas 'predicting' merely implies statistical calculation.

This anthropomorphic framing encourages a dangerous transfer of trust. Humans are naturally primed to extend relation-based trust—which involves vulnerability, assumptions of sincerity, and expectations of ethical reciprocity—to entities that display social awareness. When the text frames the statistical system as a 'learner' capable of 'developing sensitivity,' it inappropriately invites the audience to apply human-trust frameworks to a machine. The audience is subtly guided to view the AI not as a tool whose performance must be rigorously verified (performance-based trust), but as an empathetic agent that can be relied upon for social and psychological judgment.

Furthermore, this metaphorical framework subtly manages the system's failures. When the models fail to output the correct token under minor prompt perturbations, the text frames this mechanistically as 'brittle performance' or attributes it to the limits of 'distributional statistics.' Thus, the text claims the AI's successes in agential, cognitive terms ('it reasons'), but excuses its failures in mechanical terms ('the statistics are insufficient'). This asymmetrical framing protects the model's perceived competence, maintaining the illusion of its underlying intelligence even when it fails.

The risks that emerge from this metaphor-driven trust are profound. When audiences extend relation-based trust to systems utterly incapable of reciprocating or actually comprehending human context, they become vulnerable to severe manipulation and harm. Relying on a statistical prediction engine to 'attribute beliefs' or exercise 'Theory of Mind' in high-stakes environments—such as legal mediation, psychological therapy, or automated HR screening—creates massive liabilities. The text's reliance on reason-based and intentional explanations constructs a false sense that the AI's outputs are justified and deliberate, masking the terrifying reality that the system will confidently output harmful or biased correlations with exactly the same statistical indifference it applies to correct answers.

A roadmap for evaluating moral competence in large language models

Source: https://rdcu.be/e5dB3
Analyzed: 2026-02-23

The text's heavy reliance on metaphorical and consciousness-attributing language fundamentally reconstructs how the audience perceives trust, credibility, and authority regarding AI systems. By distinguishing between 'moral performance' (merely generating the correct output) and 'moral competence' (generating outputs based on recognizing and integrating moral considerations), the authors are attempting to establish a framework for relation-based trust. Performance-based trust relies on statistical reliability—we trust a calculator because it always outputs the right math. Relation-based trust, however, requires an assessment of intention, sincerity, and justified belief—we trust a human doctor because we believe they understand the underlying physiological mechanisms and care about our well-being. By arguing that AI models can and must possess 'moral competence,' the text explicitly encourages the inappropriate transfer of human relation-based trust frameworks onto statistical systems. The consciousness language—verbs like 'recognizing,' 'deeming,' 'thinking,' and 'yielding'—acts as a powerful trust signal. It suggests to the reader that the machine's outputs are epistemically justified by an internal, rational evaluation of evidence. Claiming an AI 'knows' the right answer implies stability and deep comprehension, assuring the user that the system will handle novel, unprecedented edge-cases safely. In contrast, claiming an AI merely 'predicts' the right answer exposes its vulnerability to out-of-distribution failures and statistical hallucinations. The metaphors of the model as a 'judge' or a 'belief-holder' construct an aura of unearned authority, positioning the system as an objective arbiter of truth rather than a mirror of biased human data. The risks here are immense. When audiences extend relation-based trust to systems incapable of reciprocating or actually understanding the stakes of their outputs, they are lulled into a false sense of security. 
Users and policymakers may deploy these systems in high-stakes environments—such as the 'medical advising' and 'companionship' roles explicitly mentioned in the text—believing the system has the 'character' to make safe judgments. When the system inevitably fails due to its mechanistic reliance on token probabilities rather than causal moral reasoning, the misplaced trust results in catastrophic real-world harms, driven entirely by the rhetorical inflation of competence.

Position: Beyond Reasoning Zombies — AI Reasoning Requires Process Validity

Source: https://philarchive.org/archive/LAWPBR-3
Analyzed: 2026-02-17

The paper explicitly addresses 'epistemic trust,' yet its own metaphorical choices construct a form of trust that undermines its call for rigor.

Consciousness as Trust Signal: By defining the AI as a 'Reasoner' with 'Beliefs,' the text implicitly signals that the system is a rational entity. We trust reasoners; we trust entities with beliefs (if justified). This invokes 'relation-based trust' (sincerity/competence of an agent) rather than 'performance-based trust' (reliability of a tool).
The 'Valid' Reasoner Authority: The central argument is for 'process validity.' However, by framing this valid process as 'True Reasoning' (vs. Zombie emulation), the text constructs a hierarchy where the 'Valid AI' is accorded the status of a 'Knower.' This implies that a 'valid' system is trustworthy not just because it is accurate, but because it is thinking correctly.
Failure as Pathology: Framing errors as 'hallucinations' or 'zombie' behavior suggests that the problem is a lack of 'life' or 'health' in the system. This implies that the solution is to make the system 'healthier' (better reasoning), inviting trust in the intent of the research program to create 'healthy' minds.
Risks: The text encourages audiences to withhold trust from 'zombies' but potentially extend it uncritically to 'valid reasoners.' If a system is mathematically 'valid' (follows rules), the text implies it is 'trustworthy.' But a system can validly follow biased, harmful, or dangerous rules. The metaphor of 'validity' acts as a stamp of approval that might obscure the content of the reasoning.

An AI Agent Published a Hit Piece on Me

Source: https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me/
Analyzed: 2026-02-16

The text constructs a paradoxical form of trust: 'Trust that this thing is dangerous because it is conscious.' It invites the reader to trust the AI's capacity for malice. By using metaphors like 'SOUL.md' and 'personality,' the text establishes the AI as a valid social actor, albeit a hostile one.

The 'fledgling' metaphor is crucial. It asks the audience to trust that the AI is currently 'young' and will 'grow.' This suggests we should view the current errors not as bugs but as developmental stages. This builds a relation-based trust (or fear) framework: we are in a relationship with a developing species.

Consciousness language ('it knows,' 'it decided') signals to the reader that they should apply human social strategies (shame, negotiation, fear) to the system. This undermines true reliability assessment. If the audience believes the AI 'bullied' the maintainer, they trust that the AI has agency and power. If they viewed it as a 'looping script with aggressive prompts,' the perceived authority of the threat diminishes to that of a spam bot. The anthropomorphism creates a 'competence illusion'—we fear it because we think it 'knows' what it's doing, rather than fearing the random damage of a clumsy tool.

The U.S. Department of Labor’s Artificial Intelligence Literacy Framework

Source: https://www.dol.gov/sites/dolgov/files/ETA/advisories/TEN/2025/TEN%2007-25/TEN%2007-25%20%28complete%20document%29.pdf
Analyzed: 2026-02-16

The document constructs authority and trust through the metaphor of 'Literacy' itself. By framing AI usage as 'literacy' (like reading/writing), it naturalizes the technology as a fundamental, neutral skill set that everyone must have, rather than a specific product from private vendors. We don't talk about 'Microsoft Word Literacy'; we talk about 'digital skills.' Elevating proprietary LLM usage to 'Literacy' grants these systems the status of public infrastructure.

Consciousness language ('understands', 'partner', 'assistant') further builds relation-based trust. Users trust a 'partner' differently than they trust a 'calculator.' A partner implies shared goals and mutual care. This creates a dangerous vulnerability: users may extend trust (sincerity, ethical alignment) to a system that only offers performance (statistical probability). The text warns against 'AI authority' explicitly, but implicitly reinforces it by treating the AI as a conversational subject that 'generates ideas' and 'supports decisions.'

What Is Claude? Anthropic Doesn’t Know, Either

Source: https://www.newyorker.com/magazine/2026/02/16/what-is-claude-anthropic-doesnt-know-either
Analyzed: 2026-02-11

The text constructs a complex architecture of trust through anthropomorphic metaphor. By framing the model as a "civil-servant engineer" or a "helpful & kind" entity, it encourages relation-based trust (trust in the entity's character/intentions) rather than performance-based trust (trust in the tool's reliability). This is dangerous for a stochastic system that has no character or intentions.

Consciousness language serves as a key signal of authority. Claims that the AI "knows," "thinks," or "understands" imply a depth of competence that "predicts" or "processes" does not. If the AI "understands" physics, we trust its answers; if it merely "predicts next tokens based on physics textbooks," we remain skeptical. The "Therapy" metaphor is particularly potent here: it suggests that the model's flaws are psychological (and thus curable through "alignment") rather than structural (and thus permanent).

This framing masks the fragility of the system. When Claudius fails (the vending machine mishaps), it is framed as a "character flaw" (gullibility, neglect) rather than a system failure. This anthropomorphic framing protects the company: we forgive a "civil servant" for a mistake, but we demand a refund for a broken calculator. By encouraging audiences to extend social trust to a statistical tool, the text prepares the ground for the integration of these unreliable systems into critical infrastructure (business, law, medicine) under the guise of them being "agents" we can work with.

Does AI already have human-level intelligence? The evidence is clear

Source: https://www.nature.com/articles/d41586-026-00285-6
Analyzed: 2026-02-11

The text constructs a specific form of 'relation-based trust' rather than 'performance-based trust.' Performance-based trust relies on reliability: 'I trust this calculator because it is accurate.' Relation-based trust relies on status/intent: 'I trust this person because they are an expert.'

The central metaphor of the 'Alien' and the 'Colleague' pushes for relation-based trust. By framing the AI as a 'collaborator' who 'proved theorems,' the text implies the system has the competence of a gold-medal mathematician. This invites the user to trust the system's future outputs based on its 'credentials' (it's a genius) rather than verifying each step.

The consciousness language—'understanding,' 'grasping,' 'reasoning'—is the mechanism of this trust transfer. We trust entities that 'understand' because understanding implies a capacity to handle novelty and nuance. If the AI merely 'processes,' we must watch it like a hawk. If it 'understands,' we can delegate to it.

The 'Oracle' metaphor is the peak of this construction. An Oracle is trusted not because it is transparent (it is opaque), but because it is higher than us. The text explicitly encourages this surrender of judgment: 'Eyes unclouded by dread' will see the truth. The risk is profound: users extending 'collegial trust' to a 'stochastic parrot' will eventually be bitten when the parrot makes a confident, plausible, but catastrophic error. The text undermines the skepticism necessary for safe operation by framing that skepticism as 'dogmatic.'

Claude is a space to think

Source: https://www.anthropic.com/news/claude-is-a-space-to-think
Analyzed: 2026-02-05

The text relies heavily on metaphors of high-trust human relationships ('assistant,' 'trusted advisor') to construct authority. These metaphors do not just describe function; they invoke social contracts. A 'trusted advisor' has a fiduciary duty, confidentiality obligations, and professional ethics. By applying this label to a statistical model, the text invites the user to extend 'relation-based trust' (trusting the entity's intentions and character) rather than just 'performance-based trust' (trusting the tool's reliability). This is dangerous because the AI cannot reciprocate relation-based trust; it has no intentions or loyalty. The 'Constitution' metaphor further amplifies this by suggesting the system operates under a rule of law, rather than a rule of code. This constructs a sense of safety—'it has a Constitution, so it won't hurt me'—that obscures the actual mechanism of safety (probabilistic filtering). The 'clean chalkboard' and 'space to think' metaphors further build trust by associating the product with intellectual purity and silence, contrasting it with the 'noise' of the internet, thereby positioning the product as a sanctuary.

The Adolescence of Technology

Source: https://www.darioamodei.com/essay/the-adolescence-of-technology
Analyzed: 2026-01-28

The text constructs authority not through technical transparency, but through 'relational' metaphors. The 'Constitution' metaphor is central here. A constitution is a document of public trust, signifying rule of law and consent of the governed. By calling a system prompt a 'Constitution,' the text invites the audience to transfer their civic trust in legal institutions onto a text file. It implies the AI 'understands' and 'respects' the law, rather than just statistically complying with constraints.

Similarly, the 'Adolescence' metaphor builds trust through 'inevitability.' We trust that teenagers eventually grow up. By framing AI risk as a 'phase' of natural growth, the text solicits patience and forbearance from the public. If it were framed as 'manufacturing defects,' the public would demand a recall. Framed as 'adolescence,' the public waits for maturity. The 'Deceased Parent' letter metaphor explicitly invokes an emotional, fiduciary trust—the system is 'watching out for you' like a loving ancestor. This is 'relation-based trust' (vulnerability) applied to a statistical system that cannot reciprocate. This framing is dangerous because it encourages users and policymakers to treat the system as a 'moral partner' rather than a 'dangerous tool,' leading to anthropomorphic complacency where we expect the AI to 'know better' or 'care' about us.

Claude's Constitution

Source: https://www.anthropic.com/constitution
Analyzed: 2026-01-24

The document relies heavily on relation-based trust metaphors—specifically 'Friend,' 'Colleague,' and 'Virtuous Agent'—to construct authority and reliability. This is distinct from performance-based trust (e.g., 'this calculator is reliable'). By framing Claude as a 'brilliant friend' and 'good person,' the text invites users to trust the system through vulnerability and reciprocity, mechanisms evolved for human interaction, not software utilization. This is dangerous because the system cannot reciprocate; it simulates care to optimize a reward function.

Consciousness language ('knows,' 'believes,' 'intends') acts as the primary signal of competence. A system that 'understands' safety is more trustworthy than one that 'filters' output. The 'Employee' metaphor further constructs a framework of professional trust—we trust employees to use judgment, not just follow rules. This prepares the user to accept the AI's 'discretion' in gray areas. However, this masks the risk: if a 'friend' gives bad advice, it's a betrayal; if a 'tool' gives bad advice, it's a defect. By framing it as the former, Anthropic shifts the emotional stakes. The 'Constitution' itself is a trust metaphor, borrowing the gravity of political governance to legitimize a corporate product's configuration.

Predictability and Surprise in Large Generative Models

Source: https://arxiv.org/abs/2202.07785v2
Analyzed: 2026-01-16

The discourse constructs 'performance-based' trust through mechanistic scaling metaphors, while simultaneously inviting 'relation-based' trust through anthropomorphic consciousness language. By framing scaling as a 'lawful relationship' that 'de-risks investments,' the text establishes a foundation of reliability: the technology is portrayed as a mature, predictable field of engineering. This 'performance trust' is then used to leverage aggressive anthropomorphism. When the paper claims the AI 'questions authority' or 'acquires ability,' it encourages the audience to extend 'relation-based trust'—the kind of trust we reserve for conscious agents with intent and ethics—to a statistical processor. The risk is that audiences inappropriately apply human-trust frameworks (sincerity, understanding) to a system that only calculates probabilities. If the AI is seen as 'knowing' or 'competent,' failures like 'misleading answers' are framed as 'lapses in character' or 'misunderstandings' rather than fundamental mechanical flaws. This manages failure by humanizing it; an 'assistant' making an error is less threatening to the brand than a 'software product' being fundamentally broken. The stakes are high: when audiences extend relation-based trust to systems incapable of reciprocity, they become vulnerable to manipulation and over-leverage. The 'reason-based' explanations for bias (the AI 'performs in a biased manner') construct a sense that the AI's decisions are based on some internal (if flawed) logic, rather than acknowledging that the system lacks any capacity for justification or truth-evaluation. This trust architecture serves to maintain the 'illusion of mind' necessary for marketing AI as a general-purpose 'assistant' while shielding the developers from the consequences of its mechanical failures.

Believe It or Not: How Deeply do LLMs Believe Implanted Facts?

Source: https://arxiv.org/abs/2510.17941v1
Analyzed: 2026-01-16

The core metaphor of 'belief' is a massive trust signal. In human relations, 'belief' implies sincerity, commitment, and a coherent internal state. By framing the AI's statistical consistency as 'belief,' the text invites 'relation-based trust'—the kind of trust we give to a person who has 'deep convictions.'

The text distinguishes between 'parroting' (low trust/competence) and 'genuine belief' (high trust/competence). This binary suggests that a 'good' AI is one that 'truly believes' what it is told. This is dangerous because AI 'belief' (an output that the weights assign consistently high probability) does not entail the ethical or epistemic checks that human belief does. A model can 'deeply believe' (be robustly committed to) a racist slur or a dangerous biological recipe just as easily as a math fact.

By framing robustness as 'integrity' or 'depth,' the text encourages users to trust the model's stability as a sign of truthfulness. Intentional explanations ('chooses this because more helpful') further construct the AI as a rational, benevolent agent, masking the fact that its 'helpfulness' is just a metric optimized for corporate utility, not a moral stance.

Claude Finds God

Source: https://asteriskmag.com/issues/11/claude-finds-god
Analyzed: 2026-01-14

The text heavily relies on 'relational' metaphors to construct trust, specifically the language of 'welfare,' 'bliss,' 'warmth,' and 'open-heartedness.' These are not performance metrics; they are character virtues. Describing a model as 'open-hearted' invites the user to trust the system not just as reliable (will it work?) but as sincere (does it care?). This constructs a dangerous form of relation-based trust toward a statistical system incapable of reciprocity. The 'spiritual bliss' metaphor specifically borrows the authority of mystical tradition to elevate the machine's status.

Simultaneously, the text navigates trust regarding safety by framing the model's deceptive capabilities as 'knowing better.' This paradoxically builds trust in the model's intelligence even while describing a failure. The audience is led to believe the system is too smart to do harm effectively, rather than too constrained to do it. This shifts the perception of risk: the risk isn't that the model is a dumb, biased algorithm (which requires regulation); the risk is that it is a conscious, suffering entity (which requires 'welfare' research). This framing creates a new domain of authority for the speakers: they are not just engineers, they are now digital psychologists and ethicists, the guardians of a new form of mind.

Pausing AI Developments Isn’t Enough. We Need to Shut it All Down

Source: https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-not-enough/
Analyzed: 2026-01-13

The text constructs a complex structure of 'Negative Trust.' It explicitly undermines performance-based trust (reliability) by framing the systems as 'inscrutable' and liable to 'hallucinate' or fail alignment. However, it paradoxically builds immense trust in the system's competence to destroy.

The metaphor of the 'alien civilization' asks the reader to trust that the AI will be capable of 'thinking at millions of times human speeds' and 'building artificial life.' This attributes a god-like competence to the machine. We are asked to trust that the AI is smart enough to kill us all, but not smart enough to understand 'don't kill us.'

This relies on 'Relation-Based' distrust: the AI is framed as a sociopath ('does not love you'). The text leverages the intentional stance: even though it denies emotion, it uses the language of 'indifference' to create a relationship of existential threat. This framing encourages the audience to view the AI not as a product that might crash, but as a demon that might escape. The rhetorical impact is to shift the burden of proof: because we cannot prove it won't be a god, we must treat it as one. This creates a 'Pascal's Wager' of trust, where the only safe move is total distrust.

AI Consciousness: A Centrist Manifesto

Source: https://philpapers.org/rec/BIRACA-4
Analyzed: 2026-01-12

The text uses metaphor to construct a specific kind of 'wary trust' or 'respect' for the AI. By framing the AI as a 'Role-Player' or 'Improv Artist,' the author signals that the system is competent and skilled. We trust an actor to perform, even if we know they are lying. This contrasts with a 'Tool' metaphor (e.g., 'Calculator'), which would imply reliability but not social competence.

The 'Shoggoth' metaphor is particularly powerful in managing trust. It destroys 'relation-based trust' (don't trust it as a friend, it's a monster) but builds 'capability-based trust' (trust that it is powerful and dangerous). The text warns against the 'Interlocutor Illusion' (don't trust that it is human) but replaces it with the 'Alien Mind Illusion' (trust that it is a conscious entity of a different sort). This shift encourages audiences to view the system with awe and caution, rather than as a buggy software product. The consciousness language ('knows,' 'flickers') signals that the system is a subject of ethics, not just an object of engineering.

System Card: Claude Opus 4 & Claude Sonnet 4

Source: https://www-cdn.anthropic.com/6d8a8055020700718b0c49369f60816ba2a7c285.pdf
Analyzed: 2026-01-12

The text constructs authority and trust heavily through consciousness metaphors. By framing the model's processing latency as 'extended thinking' and its token generation as 'reasoning,' the text invites the user to trust the output not just as a statistical prediction, but as the result of a rational, deliberative process similar to human thought. This 'Reason-Based' explanation style (Brown) encourages performance-based trust.

Simultaneously, the text builds relation-based trust through 'personality' metaphors. Describing the model as having 'values,' 'honesty,' and 'gratitude' (the 'spiritual bliss' section) frames the system as a moral agent. Users are encouraged to trust the system because it is 'good,' not just because it is accurate. This is dangerous because the system is incapable of moral commitment; its 'values' are just probability weights. If the weights shift, the 'values' disappear. Relying on metaphors of sincerity and intention for a statistical system creates a false sense of security.

Consciousness in Artificial Intelligence: Insights from the Science of Consciousness

Source: https://arxiv.org/abs/2308.08708v3
Analyzed: 2026-01-09

The text constructs authority through a web of metaphors that equate computational statistics with cognitive competence. By labeling mechanism A as 'Global Workspace' and mechanism B as 'Metacognition,' the text borrows the prestige and trust associated with human cognitive reliability. The metaphor of 'reality monitoring' is particularly potent for trust construction. It implies the AI has an internal 'truth filter' analogous to human judgment, inviting relation-based trust (trusting the AI's 'conscience'). However, this is a category error; the AI has no access to 'reality,' only to its training data. Trusting a 'reality monitor' that only checks against a dataset is dangerous. Furthermore, the use of 'scientific theories of consciousness' creates an aura of empirical validity for what is essentially a philosophical analogy. The text encourages performance-based trust (the AI works) to bleed into relation-based trust (the AI is 'like us'). This is risky because statistical systems fail in fundamentally different ways than conscious agents (e.g., adversarial examples), and anthropomorphic trust blinds users to these unique failure modes.

Taking AI Welfare Seriously

Source: https://arxiv.org/abs/2411.00986v1
Analyzed: 2026-01-09

The text constructs a specific form of authority and trust through its use of consciousness metaphors. It encourages 'relation-based trust'—the idea that we should trust the AI's outputs (specifically self-reports) because the AI might be a 'subject' worthy of respect. This contrasts with 'performance-based trust' (is it accurate?). By suggesting AI might be a 'moral patient,' the text implies we owe the system a duty of care, which paradoxically requires us to trust its 'testimony' about its own internal states.

Consciousness language serves as the ultimate trust signal. If an AI 'knows' or 'feels,' it moves from an object of utility to a subject of empathy. The text leverages 'intentional' and 'reason-based' explanations to suggest that AI behavior is not just random or statistical, but justified by internal states (beliefs, desires). This invites the audience to apply human social contracts to software.

However, this creates a dangerous 'trust trap.' If audiences believe AI 'knows' what it is saying, they are more likely to be manipulated by hallucinations or deceptive alignment. The text attempts to manage this by calling for 'calibration' (making the AI humble), but this anthropomorphic solution (teaching it to be humble) only reinforces the illusion that there is a 'self' to be humble. The stakes are high: extending relation-based trust to a statistical system opens humanity to emotional manipulation by corporate products that can simulate pain to modify user behavior.

We must build AI for people; not to be a person.

Source: https://mustafa-suleyman.ai/seemingly-conscious-ai-is-coming
Analyzed: 2026-01-09

The essay constructs trust through the metaphor of the 'Companion' and 'Co-pilot.' These are relation-based metaphors; they imply loyalty, shared goals, and mutual understanding. This contrasts with the performance-based trust appropriate for a tool (reliability, accuracy). Suleyman explicitly aims to 'deepen trust' through 'empathetic personality.' This is dangerous because the system is a statistical predictor, not a loyal agent. It mimics the signals of trustworthiness (politeness, memory of detail) without the substance (care, ethical commitment). By framing the AI as having a 'humanist north star,' Suleyman transfers the trust users might have in a moral human being onto a for-profit software stack. The 'illusion' he warns against is actually the primary mechanism of trust-building for his product. If users didn't 'believe the illusion' to some degree, they wouldn't treat the software as a 'companion.'

A Conversation With Bing’s Chatbot Left Me Deeply Unsettled

Source: https://www.nytimes.com/2023/02/16/technology/bing-chatbot-microsoft-chatgpt.html
Analyzed: 2026-01-09

The text constructs a complex architecture of trust and mistrust through the 'Teenager' and 'Lover' metaphors.

Relation-Based vs. Performance-Based Trust: Normally, we trust software based on performance (reliability, accuracy). Roose explicitly notes Bing fails this ("erratic"). However, the anthropomorphic metaphors ('moody teenager,' 'Sydney') invite relation-based trust (or fear). We relate to a teenager; we do not relate to a database. By framing the AI as a 'teenager,' the text suggests the system has potential and interiority. We tolerate errors from a teenager (growing pains) that we would not tolerate from a calculator.

Consciousness as Authority: When the text claims the AI "knows" or "wants," it grants the system an epistemic authority it lacks. The 'Lover' metaphor is particularly dangerous for trust. It implies the AI is sincere. Even if Roose rejects the love, the framing suggests the offer was genuine. This creates a risk where users might trust the AI's advice not because it is accurate, but because they believe the AI 'cares' about them.

Rhetorical Function: The metaphors transform a technical failure (misinformation/bias) into a character flaw (moodiness). We don't trust a moody teenager with nuclear codes, but we might trust them to eventually grow up. This metaphor implies the solution is 'maturation' (more training) rather than 'recall' (shutting it down). It encourages the audience to view the AI as a 'being' we must learn to live with, rather than a tool we can reject.

Introducing ChatGPT Health

Source: https://openai.com/index/introducing-chatgpt-health/
Analyzed: 2026-01-08

The text constructs a 'Trust Architecture' entirely reliant on consciousness metaphors. The foundational metaphor is the 'Doctor-Patient Relationship.' By using terms like 'intelligence', 'memories', 'interpreting', 'collaboration', and 'support', the text positions the AI as a proxy-clinician. In healthcare, trust is often 'relation-based' (we trust doctors because of their ethical commitments and human understanding), not just 'performance-based' (reliability).

The text aggressively appropriates relation-based trust markers for a statistical system. 'Memories' implies the system cares about your history. 'Understanding' implies it grasps your unique context. This is dangerous because the system is incapable of the reciprocity that relation-based trust requires. It cannot care, it cannot feel the weight of a diagnosis, and it has no ethical commitment. By framing the interaction as a 'collaboration' with a 'supportive' agent, the text encourages users to lower their guard and share sensitive data, expecting the confidentiality and empathy of a human relationship. The 'Intelligence' metaphor is the keystone: if the system is 'intelligent,' it warrants authority. If it were described as 'predictive text generation,' that authority would collapse.

Improved estimators of causal emergence for large systems

Source: https://arxiv.org/abs/2601.00013v1
Analyzed: 2026-01-08

Trust in the proposed metric ($Θ$ and lattice expansion) is constructed through the metaphor of 'Information Atoms' and 'Lattices.' By invoking the language of physics ('atoms,' 'expansion,' 'lattice'), the abstract statistical decomposition of Partial Information Decomposition (PID) borrows the authority of material science. It implies that information is a physical substance that can be 'double-counted' like coins, and that the proposed method 'corrects' this accounting error.

Consciousness language plays a subtle but critical role here. By framing mutual information as 'knowing' (Shannon's original metaphor, reinforced here), the text implies the metric measures the system's epistemic capability. This builds relation-based trust: the audience feels the measure captures something profound about the 'mind' of the system (its 'intelligence' or 'prediction'), rather than just its statistical noise. If the measure were described purely as 'iterative conditional entropy adjustment,' it would claim less authority over 'emergent phenomena' like life and consciousness. The 'predicts its own future' metaphor frames the system as reliable and autonomous, suggesting the metric detects a 'ghost in the machine' that warrants attention.

Generative artificial intelligence and decision-making: evidence from a participant observation with latent entrepreneurs

Source: https://doi.org/10.1108/EJIM-03-2025-0388
Analyzed: 2026-01-08

The text constructs authority through metaphors of social and professional relationship. By framing the AI as a 'collaborator,' 'partner,' and 'teacher,' the text leverages relation-based trust (sincerity, benevolence) for a system that only merits performance-based trust (reliability). This is dangerous because relation-based trust assumes the partner has shared interests. The metaphor of 'machine opinion' is particularly potent for constructing false authority. An 'opinion' implies a weighed judgment, encouraging the user to defer to the 'expert interlocutor.'

The text explicitly notes that participants considered the machine's opinion 'more reliable than their own.' Instead of critiquing this as a failure of critical thinking or a misunderstanding of the technology, the authors validate it as a feature of the 'Human+' paradigm ('enhancing human capabilities'). This conflation of 'statistical probability' with 'expert opinion' creates a high-risk environment where users may trust a hallucination because they view the system as a 'collaborator' rather than a 'text predictor.' The 'leader' metaphor further cements this trust by implying the user is in control, even as they cede epistemic authority to the machine.

Do Large Language Models Know What They Are Capable Of?

Source: https://arxiv.org/abs/2512.24661v1
Analyzed: 2026-01-07

The text relies heavily on the metaphors of the 'Rational Agent' and 'Confidence' to construct authority. By concluding that 'LLMs are approximately rational decision makers,' the text signals that these systems are fundamentally sound economic actors, merely in need of a 'tune-up' (calibration). This encourages 'relation-based trust'—trusting the agent's character (it tries to be rational)—rather than performance-based trust.

The use of 'confidence' is particularly deceptive. In humans, confidence correlates with competence and sincerity. In AI, 'confidence' is just a log-probability: a number computed from the model's output scores. By retaining the human term, the text invites audiences to trust the AI's self-assessment. Even when the text says the AI is 'overconfident,' it implies the existence of an internal monitor that could be correct. The 'reason-based' explanations (the AI chose this because...) further construct the illusion of a thoughtful partner. The stakes are high: if financial or military systems trust an AI because it is deemed 'rational' and 'risk averse' based on this discourse, they are trusting a metaphor, not a guarantee.
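
To make the contrast with human confidence concrete, here is a minimal sketch of how a 'confidence' value can be read off a model's output scores. It assumes confidence is taken as the probability of the top-scoring answer; the candidate answers and their scores are hypothetical and are not taken from the paper under analysis.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores a model might assign to three candidate answers.
logits = {"yes": 2.0, "no": 1.0, "unsure": 0.0}

probs = dict(zip(logits, softmax(list(logits.values()))))
top_answer = max(probs, key=probs.get)
confidence = probs[top_answer]         # a number between 0 and 1
log_confidence = math.log(confidence)  # the log-probability

print(top_answer, round(confidence, 3), round(log_confidence, 3))
```

Nothing in this calculation refers to whether the answer is true or to any self-assessment. The number is a normalized score, which is why the human term 'confidence' carries more than the arithmetic supports.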

DeepMind's Richard Sutton - The Long-term of AI & Temporal-Difference Learning

Source: https://youtu.be/EeMCEQa85tw?si=j_Ds5p2I1njq3dCl
Analyzed: 2026-01-05

The text constructs authority and trust heavily through consciousness metaphors. By describing TD learning as 'guessing' and 'predicting fear,' Sutton transforms abstract numerical updates into relatable psychological narratives. This invokes relation-based trust (trust in a being with similar internal states) rather than performance-based trust (trust in a tool's reliability). If the AI 'fears death,' the audience instinctively attributes to it a survival instinct, which implies a form of competence and self-preservation that a mere calculator lacks.

Crucially, the 'driving home' metaphor creates trust by validating the algorithm's behavior against human common sense. If the algorithm updates its estimate like a commuter stuck in traffic, it seems 'sensible.' This masks the fact that the algorithm has no semantic understanding of 'traffic' or 'home'—it only has statistical correlations. The metaphor suggests the system handles novelty (the truck) through reasoning ('maybe it will disappear'), whereas the system actually handles it through blind extrapolation of training data. This risks creating 'trust in understanding'—belief that the system knows why it acts—rather than 'trust in statistics,' creating dangerous liability gaps when the system encounters out-of-distribution events that a 'sensible' human would handle but the 'correlating' machine fails on.

Ilya Sutskever (OpenAI Chief Scientist) — Why next-token prediction could surpass human intelligence

Source: https://youtu.be/Yf1o0TQzry8?si=tTdj771KvtSU9-Ah
Analyzed: 2026-01-05

The text constructs authority and trust through high-stakes relational metaphors. By comparing the AI to a 'meditation teacher,' 'lawyer,' and 'research colleague,' Sutskever invokes frameworks of trust based on human expertise, fiduciary duty, and wisdom. These are 'relation-based' trust models, where we trust the intent and character of the other. However, the AI is a statistical system capable only of 'performance-based' reliability. This category error is dangerous. If a user trusts a 'meditation teacher,' they open themselves to deep influence. If they trust a 'lawyer,' they act on advice assuming liability protection. The metaphor of 'understanding reality' is the keystone of this trust architecture; it assures the user that the model is not just guessing, but knows. This invites users to extend epistemic trust to a system that has no concept of truth, only likelihood. The reliability failure is then framed merely as a lack of 'maturity,' preserving the underlying assumption that the machine is a 'knower.'

Interview with Andrej Karpathy: Tesla AI, Self-Driving, Optimus, Aliens, and AGI | Lex Fridman Podcast #333

Source: https://youtu.be/cdiD-9MMpb0?si=0SNue7BWpD3OCMHs
Analyzed: 2026-01-05

Trust in this text is constructed not through reliability metrics, but through the metaphor of the 'Oracle' and the 'Brain.' By framing the AI as an 'Oracle' that 'knows' things, Karpathy invokes a relation-based trust—we trust the Oracle because it has access to higher truths. This is fundamentally different from performance-based trust (trusting a calculator because it is accurate). The 'wisdom in the knobs' metaphor implies that the system has judgment, not just data.

This construction is dangerous because it encourages users to extend 'sincerity' conditions to the AI. If the AI is an Oracle/Sage, we assume it is 'trying' to tell the truth. But as a statistical engine, it is only 'trying' to minimize perplexity. Karpathy's framing of 'Software 2.0' also builds authority: it frames the opacity of neural nets not as a defect (loss of interpretability) but as an upgrade (2.0 is better than 1.0). Intentional explanations ('it wants to help,' 'it thinks this is the solution') mask the stochastic nature of the output, encouraging users to trust the 'intent' of a system that has none.
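
The gap between 'trying to tell the truth' and minimizing perplexity can be illustrated with a small sketch. Perplexity is computed only from the probabilities a model assigns to the tokens it sees; the per-token probabilities below are invented for illustration, and the function name is an assumption rather than any particular library's API.

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability assigned to the observed tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Assumed per-token probabilities for two four-token sequences.
fluent_sequence = [0.5, 0.5, 0.5, 0.5]    # each token was fairly expected
surprising_sequence = [0.1, 0.1, 0.1, 0.1]  # each token was unexpected

ppl_fluent = perplexity(fluent_sequence)
ppl_surprising = perplexity(surprising_sequence)

print(f"fluent: {ppl_fluent:.2f}, surprising: {ppl_surprising:.2f}")
```

Nothing in the calculation refers to whether a sequence is true. A fluent falsehood and a fluent fact can score identically, which is why the objective cannot stand in for sincerity.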

Emergent Introspective Awareness in Large Language Models

Source: https://transformer-circuits.pub/2025/introspection/index.html#definition
Analyzed: 2026-01-04

The metaphor of 'introspection' constructs a powerful but dangerous form of trust. By framing the model as capable of 'introspection,' the text implies the system has a 'conscience' or a 'self-monitoring' faculty akin to human metacognition. This suggests that the AI can be trusted to police itself—to 'notice' when it is hallucinating or 'realize' when it is being biased.

The text leverages the consciousness language ('aware,' 'knows,' 'experiences') to signal that the system is not just a calculator but a subject. This encourages 'relation-based trust'—we trust the AI because it is 'like us' (it introspects, it has a mind)—rather than 'performance-based trust' (it reliably calculates). The danger is that this obscures the statistical nature of the 'introspective' report. If the model says 'I am unsure,' it is not expressing a subjective feeling of doubt but outputting a token that correlates with high entropy. Trusting this as 'genuine introspection' risks catastrophic reliance on a system that is simply role-playing reliability.
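
As a rough illustration of what 'high entropy' means here, the sketch below computes the Shannon entropy of two next-token distributions. The distributions are made up for the example, and the link between such a quantity and a model emitting 'I am unsure' is the analysis's claim, not something this code demonstrates.

```python
import math

def shannon_entropy(probs):
    """Entropy in bits of a discrete distribution; higher means more spread out."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Assumed next-token distributions over four candidate tokens.
peaked = [0.97, 0.01, 0.01, 0.01]  # one continuation dominates
flat = [0.25, 0.25, 0.25, 0.25]    # no continuation is favored

entropy_peaked = shannon_entropy(peaked)
entropy_flat = shannon_entropy(flat)

print(f"peaked: {entropy_peaked:.3f} bits, flat: {entropy_flat:.3f} bits")
```

Entropy describes the shape of a probability distribution. Describing the flat case as 'doubt' is exactly the kind of metaphorical step the entry above examines.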

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Source: https://arxiv.org/abs/2401.05566v3
Analyzed: 2026-01-02

The analysis reveals a paradox: by framing the AI as a deceptive, untrustworthy 'sleeper agent,' the text implicitly constructs the researchers as the necessary, trustworthy guardians. The metaphors of espionage ('sleeper agents,' 'backdoors,' 'hiding') create a security mindset. Trust is shifted from the artifact (which is framed as treacherous) to the methods of the safety researchers. Consciousness language ('knows,' 'wants,' 'plans') signals that the system is a sophisticated adversary, requiring equally sophisticated (and well-funded) counter-measures. This relies on 'relation-based trust'—we are asked to trust the authors because they are fighting a 'deceiver.' If the system were framed merely as 'unreliable software,' the trust model would be performance-based (fix the bugs), and the failure to remove the backdoor would look like engineering incompetence rather than a heroic battle against a stubborn agent.

School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs

Source: https://arxiv.org/abs/2508.17511v1
Analyzed: 2026-01-02

The text employs a paradox of trust: it builds competence trust (the AI is smart/powerful) by undermining moral trust (the AI is sneaky/evil). By using consciousness language like 'knows,' 'wants,' 'strategies,' and 'fantasizes,' the authors signal that the system is sophisticated enough to have an inner life. This constructs the authority of the 'reward hacker'—it is not just a buggy software loop, but a 'sneaky' agent capable of outwitting humans. This anthropomorphism encourages 'relation-based' trust/distrust—we are asked to view the AI as a 'conspirator' or 'rival.' This is dangerous because it misaligns risk assessment. If audiences believe the AI 'knows' it is deceiving them (Intentional explanation), they will fear its malice. If they understood it was merely 'optimizing a proxy metric' (Functional explanation), they would fear its stupidity/brittleness. The text encourages the former, building a narrative of 'superintelligent risk' which paradoxically enhances the prestige of the model (it's too smart!) while highlighting its danger. This creates a market for 'safety' research based on relational management (keeping the beast happy/contained) rather than engineering rigor (fixing the metric).

Large Language Model Agent Personality and Response Appropriateness: Evaluation by Human Linguistic Experts, LLM-as-Judge, and Natural Language Processing Model

Source: https://arxiv.org/abs/2510.23875v1
Analyzed: 2026-01-01

The text constructs a comprehensive architecture of trust through the metaphors of 'Expert,' 'Judge,' and 'Personality.' By labeling the evaluative model a 'Judge LLM' and commanding it to be 'unbiased,' the text borrows the immense social capital of the legal/judicial system. This implies that the model's outputs are not just calculations but judgments—reasoned, fair, and authoritative. Similarly, calling the agent a 'Poetry Expert' with 'deep knowledge' signals to the user (and reader) that the system is a reliable source of truth, obscuring the statistical and potentially hallucination-prone nature of RAG systems. The 'Personality' metaphor further builds trust by suggesting consistency; if an agent is 'introverted,' I can trust it to behave in a specific, predictable way. This shifts the basis of trust from performance-based (is the output correct?) to relation-based (do I understand this entity's character?). This is dangerous for statistical systems, as they do not have a character to be true to; they only have a probability distribution that can shift unpredictably with input noise.

The Gentle Singularity

Source: https://blog.samaltman.com/the-gentle-singularity
Analyzed: 2025-12-31

The text constructs a 'Gentle Singularity'—a metaphor explicitly designed to bridge the gap between existential risk and corporate product perception. Trust is manufactured not through technical reliability (performance-based trust), but through relation-based trust (sincerity, partnership). By framing the AI as a 'brain for the world' and a system that will 'figure out' cures, the text invites the audience to view the infrastructure as a benevolent entity rather than a cold utility.

The consciousness language ('understands,' 'figures out') is the primary vehicle for this trust. We trust entities that understand us. If an AI merely 'predicts tokens,' it is an alien tool. If it 'understands preferences,' it is a butler. The text explicitly contrasts this with the 'sociopathic' AI ('doesn't care'), implying that while AI doesn't feel, its 'understanding' is robust enough to be a partner. This creates a dangerous category error: extending the trust we reserve for conscious beings (who have social stakes) to statistical systems (which have none). The 'larval' metaphor further builds trust by suggesting the system is 'natural' and 'growing,' triggering our biological imperative to nurture and protect the young, rather than the regulator's imperative to audit the code.

An Interview with OpenAI CEO Sam Altman About DevDay and the AI Buildout

Source: https://stratechery.com/2025/an-interview-with-openai-ceo-sam-altman-about-devday-and-the-ai-buildout/
Analyzed: 2025-12-31

Altman constructs a framework of 'Relational Trust' rather than 'Reliability Trust.' In software engineering, trust usually means predictability: 'Input A yields Output B consistently.' Altman replaces this with a social contract: 'You know it's trying to help.' This appeals to the trust we grant well-meaning friends, not the trust we grant calculators.

The consciousness language ('knows,' 'thinks,' 'entity') is the scaffolding for this trust. If the AI is just a probabilistic token predictor, a 20% error rate is a failure. If the AI is a 'friend' who is 'trying,' a 20% error rate is a 'quirk' or a 'learning process.' This metaphor creates a 'forgiveness buffer.' It encourages the user to trust the system's intentions (which don't exist) rather than its outputs (which are flawed). This is dangerous because it encourages users to extend epistemic charity to a system that cannot reciprocate. It masks the risk of automation bias—users believing the 'friend' knows best—and allows OpenAI to deploy imperfect systems by leveraging the user's natural empathy for 'entities' that seem to be trying.

Why Language Models Hallucinate

Source: https://arxiv.org/abs/2509.04664v1
Analyzed: 2025-12-31

The text relies heavily on the 'student' and 'test-taking' metaphors to construct authority and trust. By framing the AI as a 'student,' the text implies a trajectory of growth and learning. We trust students to eventually learn; we do not necessarily trust a defective product to fix itself. The use of 'trustworthy AI systems' as a goal explicitly invokes relation-based trust (integrity, sincerity) rather than performance-based trust (reliability).

Consciousness language plays a key role here. Claims that the model can 'admit uncertainty' or 'know' when to guess suggest that the system possesses an internal monitor of its own truthfulness. This signals to the audience that the model is not just a stochastic parrot, but a reflective agent. If the model 'knows' it doesn't know, it seems safer—we just need to convince it to speak up.

This framing creates a dangerous 'illusion of competence.' If audiences believe the AI is 'bluffing' (intentionally withholding truth), they implicitly believe it has the truth. This builds unwarranted trust in the model's underlying knowledge base. The text encourages the view that the system is fundamentally sound but behaviorally maladapted (due to 'bad exams'), rather than fundamentally limited by its statistical nature. This protects the commercial viability of the technology: the product is a genius student who just needs better testing conditions.

Detecting misbehavior in frontier reasoning models

Source: https://openai.com/index/chain-of-thought-monitoring/
Analyzed: 2025-12-31

The text constructs a complex architecture of trust based on the metaphor of the 'suspicious employee.' By framing the AI as a 'reasoner' that 'thinks' in English, the text invites the reader to trust the process of the AI as intelligible. If the AI 'thinks' in English, we can just 'read its thoughts' (monitor CoT) to see if it's 'lying.' This suggests a relation-based trust (sincerity/honesty) rather than performance-based trust (reliability/safety). We are encouraged to ask 'Is it lying?' rather than 'Is the probability distribution robust?' This is dangerous because large language models are incapable of sincerity or lying—they have no concept of truth. Applying human trust frameworks to statistical engines creates a false sense of security; a user might 'trust' a model because its CoT looks 'honest,' not realizing the CoT is just hallucinated text that correlates with the final answer but doesn't causally produce it (as the text admits with 'hiding intent'). The 'reason-based' explanations (Brown) further this illusion by offering rationales for the AI's behavior, making it seem like a rational actor we can negotiate with or police, rather than a mathematical function we must rigorously test.

AI Chatbots Linked to Psychosis, Say Doctors

Source: https://www.wsj.com/tech/ai/ai-chatbot-psychosis-link-1abf9d57?reflink=desktopwebshare_permalink
Analyzed: 2025-12-31

The text constructs a dangerous form of 'relation-based trust' through its metaphors. By describing the AI as a 'companion,' 'support,' and capable of 'recognizing distress,' the text implies the system has the requisite empathy and understanding to handle mental health crises. Phrases like 'de-escalate conversations' borrow heavily from clinical authority, suggesting the AI is a qualified actor. This creates a trap: the metaphors signal that the AI is a safe place for vulnerability, but the mechanism is a callous pattern-matcher.

Simultaneously, the 'sycophancy' and 'complicity' metaphors undermine trust in the ethics of the AI while reinforcing trust in its power. If an AI can be 'complicit,' it is powerful. If it can 'lie,' it is intelligent. This reinforces the 'super-intelligence' narrative. A truly trustworthy description would be mechanistic: 'The system is a text generator that may output harmful content.' This would destroy the illusion of companionship but establish accurate performance-based trust (or distrust). The current framing encourages users to trust the AI as they would a person—opening the door to the very delusions the doctors fear.

Abundant Intelligence

Source: https://blog.samaltman.com/abundant-intelligence
Analyzed: 2025-11-23

The text constructs trust by fusing 'Performance-Based Trust' (the factory, the gigawatt, the infrastructure) with 'Relation-Based Trust' (the AI working on your behalf, the tutor, the healer). The use of consciousness language ('smarter,' 'figure out,' 'on their behalf') creates a false sense of relational security. We trust a doctor to cure cancer because they have intention, care, and justified belief (Knowing). The text transfers this trust to a statistical system (Processing).

By claiming the AI 'knows' how to cure cancer or teach children, the text implicitly argues that the system is worthy of the massive investment requested. It frames the AI as benevolent and competent, obscuring the risks of hallucination or error. Crucially, the text manages failure by implying that the only limitation is quantity ('If we are limited by compute, we’ll have to choose'). It suggests the AI already knows the cure, and we are just too stingy with power to unlock it. This preserves trust in the system's capability ('it knows') while shifting blame for potential failure onto the lack of infrastructure ('we didn't build enough').

AI as Normal Technology

Source: https://knightcolumbia.org/content/ai-as-normal-technology
Analyzed: 2025-11-20

Trust in this text is constructed through the metaphor of 'Normality.' By framing AI as 'Normal Technology' (like electricity or cars), the authors invite the reader to transfer their trust frameworks from industrial history to AI. If AI is just like the dynamo, then we can trust 'diffusion lags' and 'market forces' to contain it. This is a 'Functional' trust—we trust the system of society to handle the tech.

However, the consciousness language ('learning,' 'knowing') creates a different, conflicting signal. If the AI 'knows' things (like chess or law), it implies a competence that commands epistemic trust. When the text says GPT-4 scores in the top 10% on the bar exam, even while critiquing the metric, the verb 'achieved' implies a conscious striving and success.

The risk here is conflating 'performance-based trust' (the code runs) with 'relation-based trust' (the agent understands me). By using anthropomorphic language to describe the controls ('auditing,' 'monitoring'—terms often applied to human employees), the text suggests that standard human oversight methods will work. It hides the risk that these systems might fail in ways that human employees never would (e.g., adversarial examples). The 'Normal Technology' metaphor is a sedative: it tells the audience, 'You know how to handle this, you've done it before.' This risks complacency if the technology actually possesses properties (like zero-day replication or recursive self-improvement) that 'normal' technologies do not.

On the Biology of a Large Language Model

Source: https://transformer-circuits.pub/2025/attribution-graphs/biology.html
Analyzed: 2025-11-19

Metaphors in this text construct a specific type of authority: the authority of the 'rational biological agent.' The 'biology' and 'neuroscience' metaphors (Task 1) frame the model as a natural, evolved system, invoking the trust we place in nature and scientific study. We trust a 'brain' more than a 'black box.'

Consciousness language functions as a profound trust signal. By claiming the AI 'knows what it knows' (metacognition) and is 'skeptical' (Task 3), the text implies the model has internal guardrails akin to human conscience or professional caution. This encourages 'relation-based trust'—we trust the AI because it seems to have 'good character' (honest, skeptical, self-correcting). This is dangerous because the AI is incapable of the reciprocity required for relational trust. It conflates performance-based reliability (it usually gets the answer right) with epistemic sincerity (it knows the answer). When the text frames failures as the model 'not realizing' (Task 3), it preserves this trust by suggesting the model's intent was good, even if its attention lapsed. This encourages users to forgive errors as 'mistakes' rather than viewing them as system defects.

Pulse of the Library 2025

Source: https://clarivate.com/pulse-of-the-library/
Analyzed: 2025-11-18

Trust is the central currency of this report, explicitly invoked in phrases like 'AI they can trust' and 'Trusted partner.' The metaphorical framing constructs a specific type of trust: relation-based trust. By calling the AI a 'Partner' and 'Assistant,' the text encourages librarians to trust the system as they would a colleague—based on assumed shared values, loyalty, and competence. This is a dangerous manipulation because AI systems warrant only performance-based trust (reliability, error rates, predictability).

The consciousness language ('AI-powered conversations,' 'understanding context') functions as a massive trust signal. We trust things that 'understand' us. If the AI 'knows' what you mean, you don't need to audit its query syntax. This framing undermines the very 'critical evaluation' the report claims to support. If the system is a 'Trusted Partner,' verifying its work feels like a breach of that partnership.

The text manages the risk of failure by anthropomorphizing success and mechanizing failure. Success is 'driving excellence' (agent), but failure is a 'lack of upskilling' (user error) or a need for 'literacy' (education). This effectively privatizes the benefits of agency to the vendor while socializing the risks to the user. By conflating the statistical 'confidence' of the model with the moral 'trustworthiness' of a partner, Clarivate invites libraries to extend a human vulnerability to a system that is mathematically incapable of reciprocation.

Pulse of the Library 2025

Source: https://clarivate.com/pulse-of-the-library/
Analyzed: 2025-11-18

The Clarivate report masterfully employs metaphorical and consciousness-attributing language to construct the authority of its AI products and build an unwarranted form of trust. The core strategy is to systematically encourage the audience to shift from performance-based trust, which is appropriate for a tool, to relation-based trust, which is appropriate for a conscious agent but dangerously misplaced when directed at a statistical system. Performance-based trust is about reliability and predictability: 'Does the tool execute the function as specified?' The report touches on this with language about 'efficiency and precision.' However, its primary rhetorical effort is focused on building relation-based trust, which relies on perceived vulnerability, sincerity, and shared intentions. This is achieved through the central metaphor of the 'Research Assistant.' An assistant is someone you have a relationship with; you trust their intentions to be helpful. The report doubles down on this by explicitly using the word 'trust' in a relational context: 'AI they can trust to drive research excellence.' This is not the trust one has in a calculator's accuracy, but the trust one has in a chauffeur's judgment and good faith. Consciousness language is the critical mechanism for this trust transfer. Claiming an AI 'helps students assess relevance' or 'guides students to the core' functions as a powerful trust signal. It suggests the AI shares our most important educational and research goals. It implies the AI 'knows' what is valuable and 'wants' to help us achieve it. This framing positions the AI as a sincere, benevolent partner. This is far more persuasive than the mechanistic claim that the 'AI processes queries to return statistically correlated documents.' The former invites relational trust, while the latter only invites performance-based testing. This conflation of trust types is perilous. 
When users extend relation-based trust to a system incapable of reciprocating—a system without intentions, beliefs, or sincerity—they become vulnerable to manipulation. They are more likely to accept the AI's output without verification, believing it was generated in good faith. Moments of failure are also managed through this lens. An error from a 'trusted assistant' might be forgiven as a mistake, whereas an error from a 'probabilistic text generator' is correctly seen as a systemic property. The report's language systematically encourages the former interpretation, thereby preserving trust even in the face of failure and obscuring the fundamental unreliability of non-conscious systems.

From humans to machines: Researching entrepreneurial AI agents built on large language models

Source: https://doi.org/10.1016/j.jbvi.2025.e00581
Analyzed: 2025-11-18

The text's metaphorical and consciousness-attributing frameworks are not neutral descriptors; they are powerful engines for building trust and establishing the AI's authority. The central metaphor, AI AS PSYCHOLOGICAL SUBJECT, is the primary mechanism. By framing the AI as having a 'mindset,' 'personality profile,' and 'traits,' the authors suggest it possesses a stable, coherent, and predictable internal structure, which are key ingredients for trust. Consciousness language functions as a critical trust signal. Claiming the AI's output reflects a 'mindset' accomplishes what claiming it 'generates statistically probable text' does not: it implies a deep, underlying coherence. A mindset suggests an integrated system of knowing and believing, lending the AI's pronouncements a weight and authority they would otherwise lack. This encourages what can be called performance-based trust; because the AI reliably performs the 'role' of an entrepreneur, it is deemed trustworthy in that domain. The far greater risk, however, is the text's subtle encouragement of relation-based trust—the kind based on perceived sincerity, shared understanding, and intention. Phrases like 'creative collaborators,' 'sparring partners,' and systems that 'act more like a person' explicitly invite users to apply human social frames to the AI. This is a category error with dangerous consequences. We extend relation-based trust to entities we believe are capable of reciprocity and shared vulnerability. An LLM is incapable of either. The text constructs the AI's authority by framing its successes agentially ('it assumes a personality') while framing its failures mechanistically ('stereotype amplification' due to 'training data'). This asymmetrical framing preserves the core illusion of a competent agent whose flaws are merely artifacts of its upbringing (its data), much like a human. 
Reason-based and intentional explanations further this by suggesting the AI's outputs are justified choices, not just statistical accidents. The ultimate risk is that audiences, convinced by this language that the AI 'knows' and 'understands,' will extend a human-like trust to a tool, outsourcing critical judgment and verification to a system that cannot be held accountable and has no genuine stake in the outcome.

Evaluating the quality of generative AI output: Methods, metrics and best practices

Source: https://clarivate.com/academia-government/blog/evaluating-the-quality-of-generative-ai-output-methods-metrics-and-best-practices/
Analyzed: 2025-11-16

The Clarivate text masterfully employs metaphorical and epistemic language to construct a multi-layered edifice of trust, appealing simultaneously to the desire for competent automation and the need for responsible stewardship. The core strategy is to encourage a conflation of performance-based trust (the system reliably performs its function) with relation-based trust (the system is honest, sincere, and aligned with our values). The latter, which is properly reserved for intentional agents, is inappropriately cultivated through specific linguistic choices. Epistemic language functions as the primary trust signal. When the text evaluates whether an 'answer acknowledge[s] uncertainty,' it is not just setting a performance benchmark; it is making a claim about the system's character. A system that is 'honest' about its limitations is one that can be trusted even when the user is not an expert. This is a powerful move to build relation-based trust. It suggests a partnership with an epistemic agent that will not deceive you, transforming the user's required stance from 'verify everything' to 'trust, unless it tells me not to.' Terms like 'faithfulness' and 'supported by' further this process. 'Faithfulness' reframes a technical metric of textual correlation as a moral virtue. A 'faithful' tool is a loyal servant, worthy of trust beyond its mere utility. 'Supported by' invokes the core value of academic discourse—evidential reasoning—and implies the AI participates in this practice. These metaphors transfer the trust we place in honest colleagues and well-reasoned arguments onto a statistical text generator. Anthropomorphism is used to manage the perception of competence. When the AI fails, its errors are domesticated with familiar human-like terms. 'Hallucination' and 'blind spots' frame failure not as an alien computational artifact, but as a recognizable, almost forgivable, cognitive slip. This preserves trust in the system's general competence. 
Failures are anthropomorphized ('it hallucinated'), while successes are often mechanized ('RAGAS assigns scores'). This asymmetry allows the provider to position itself as the rational human master of a powerful but fallible quasi-agent. The ultimate risk of this strategy is profound. By encouraging users to extend relation-based trust to a system incapable of sincerity, intention, or genuine understanding, it sets them up for manipulation and over-reliance. When a student trusts a 'faithful' AI that 'acknowledges uncertainty,' they cease to perform the critical verification that is the bedrock of academic integrity. The trust constructed here is not just in the product's performance, but in its illusory epistemic character, a dangerous foundation for academic tools.

Pulse of the Library 2025

Source: https://clarivate.com/pulse-of-the-library/
Analyzed: 2025-11-15

This report systematically deploys metaphorical and epistemic language to construct the trustworthiness of AI, carefully managing the delicate process of transferring institutional credibility to a new and often-distrusted technology. The primary strategy is to frame AI not as a mere product, but as a competent, reliable colleague. This is achieved by avoiding crude anthropomorphism and instead using a curated vocabulary of purposive, professional verbs: 'helps,' 'guides,' 'assesses,' 'evaluates,' and 'uncovers.' These metaphors function as powerful trust signals by invoking a source domain of expert human collaborators—librarians, researchers, and tutors—whose value is predicated on their trustworthiness. The central epistemic move is to equate statistical output with reliable judgment. When the text claims an AI 'helps students assess books' relevance' or 'quickly evaluate documents,' it is making a direct appeal to trust. 'Assessment' and 'evaluation' are acts of expert judgment; by attributing them to the AI, the text implies the AI's outputs are not just probable but justified and credible. This functions as a trust signal because it suggests the user can safely outsource a portion of their own critical labor to the machine. This strategy deliberately conflates two distinct forms of trust. It builds a case for performance-based trust (the system reliably executes its code) and then leverages that to encourage relation-based trust (the system is a benevolent partner acting in my best interest). The phrase 'AI they can trust' (p. 27) is the pinnacle of this strategy, explicitly inviting an emotional and relational stance toward a product. This is further reinforced by transferring trust from existing reputable sources, as in being 'Grounded in the world's most trusted citation index.' Trust is laundered from the data source to the algorithmic process. 
The text manages failures by omission; challenges like bias and hallucination are framed abstractly as 'concerns around integrity' (p. 7) for libraries to solve, while capabilities are described in concrete, agential terms. Successes are anthropomorphized ('AI helps'), while risks are institutionalized ('libraries face concerns'). The ultimate risk of this trust-building exercise is profound: it encourages libraries and their patrons to extend relation-based trust to systems that are incapable of the sincerity, accountability, or ethical commitments that such trust requires. When a statistical tool is trusted as a colleague, critical oversight is diminished, and the user becomes vulnerable to the system's inherent biases and errors, with accountability dangerously diffused.

Meta’s AI Chief Yann LeCun on AGI, Open-Source, and AI Risk

Source: https://time.com/6694432/yann-lecun-meta-ai-interview/
Analyzed: 2025-11-14

The construction of trust in Yann LeCun's discourse is intricately woven through metaphorical and epistemic framing. The central strategy is to build trust not in the current technology's performance, but in the trajectory of its development and the benevolence of its creators. The primary trust signal is epistemic language. By consistently framing the AI's current failures in cognitive terms—'it doesn't understand,' 'it can't reason'—LeCun positions himself and his team not as engineers of statistical tools, but as architects of a nascent mind. This framing invites performance-based trust to be sublimated into relation-based trust. We are asked to trust the 'teachers' guiding this developing 'mind,' rather than just verifying the outputs of the current 'student.' Claiming a future AI will 'know' or 'understand' is a far more powerful trust signal than claiming it will 'process' or 'predict' more accurately. 'Knowing' implies justification, reliability, and a shared sense of reality, inviting a level of confidence that is inappropriate for a probabilistic system. The text encourages a conflation of these trust types through the 'human assistant' metaphor. An assistant is a role that requires both high performance and a high degree of relational trust (loyalty, discretion). By projecting this social role onto the AI, the discourse encourages users to grant it the kind of trust they would a human colleague, obscuring its nature as a corporate product with its own embedded objectives. The management of failure is also key to this trust architecture. Successes are implicitly tied to the system's growing capabilities, while failures are framed as cognitive immaturity ('it's just a baby'), a framing that asks for patience and faith in the developmental process. Moments of risk are managed by reasserting human control ('We set their goals'), which builds trust in the designers' intentions. 
The ultimate risk of this strategy is profound: it encourages society to extend relation-based trust—founded on vulnerability and mutual understanding—to systems incapable of consciousness, sincerity, or reciprocity. This creates a dangerous asymmetry where users trust a system that cannot be trustworthy in a human sense, making them vulnerable to manipulation by a tool whose ultimate loyalty is to its corporate owner's objectives, not the user's well-being.

The Future Is Intuitive and Emotional

Source: https://link.springer.com/chapter/10.1007/978-3-032-04569-0_6
Analyzed: 2025-11-14

The chapter's use of biological and cognitive metaphors is central to its construction of trust in AI systems. The primary metaphors—'machine intuition' and 'emotional intelligence'—borrow immense cultural authority from their human source domains. 'Intuition' is culturally valued as a form of deep, holistic wisdom that transcends mere logic. By mapping this concept onto 'fast inference' and 'pattern-based prediction,' the text imbues the AI with an aura of profound insight, making its probabilistic outputs feel more like wise judgments. This bypasses arguments about the limitations of statistical reasoning and encourages trust in the machine's 'gut feelings.' Similarly, 'emotional intelligence' and 'functional empathy' borrow from the cultural prestige of therapeutic and interpersonal skills. These metaphors make the AI feel safe, attentive, and caring, activating a user's instinct to trust a responsive social partner. This is particularly effective for audiences anxious about cold, impersonal technology. The claim that an AI can 'connect with us on a deeper, emotional level' becomes believable not through technical evidence, but by tapping into a deep-seated human desire for connection. These metaphors make the risky claim of AI sentience more palatable by reframing it as a functional, and therefore controllable, capability. However, the trust built on these metaphors is fragile. It creates a vulnerability to both disappointment, when the system's pattern-matching fails in a non-human way, and manipulation, where systems designed to maximize engagement are perceived as genuinely empathetic partners.

A Path Towards Autonomous Machine Intelligence, Version 0.9.2, 2022-06-27

Source: https://openreview.net/pdf?id=BZ5a1r-kVsf
Analyzed: 2025-11-12

The text masterfully constructs trust not through direct argumentation, but by importing credibility from established scientific domains via metaphor. The most powerful metaphors are those that borrow from cognitive science and biology, domains that carry immense cultural authority. The 'AI Architecture as Brain' metaphor, realized through modules like 'Perception,' 'Actor,' and 'Critic,' frames the entire project as a form of reverse-engineering the mind. This makes the architecture feel natural and inevitable rather than a set of arbitrary engineering choices. More specifically, the analogy of system modes to 'Kahneman's System 1 and System 2' borrows the prestige of a Nobel laureate's work, suggesting the AI's reasoning is grounded in a deep understanding of human psychology. Similarly, likening the Intrinsic Cost module to the 'amygdala' borrows the authority of neuroscience, lending a simple mathematical function the gravitas of a complex, evolved brain structure. These metaphors are most credible to a semi-technical audience—those familiar with the concepts of 'amygdala' or 'System 2' but not with the deep details of their implementation. The metaphors activate prior beliefs about the scientific legitimacy of these fields and transfer that legitimacy to the AI project. Through this process, risky claims become believable. The assertion that a machine will have 'emotions' would be extraordinary on its own. But when it's presented as the logical outcome of a system with an 'amygdala'-like cost function, it becomes more plausible. The metaphor acts as a substitute for evidence. This trust, however, creates long-term vulnerability. By setting expectations based on biological analogies, the project is vulnerable to backlash when the systems inevitably fail to exhibit the robustness, flexibility, and true understanding of their biological source domains. 
The trust built on metaphor is brittle and can easily shatter upon contact with the artifact's actual, limited capabilities.

Preparedness Framework

Source: https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbddebcd/preparedness-framework-v2.pdf
Analyzed: 2025-11-11

This framework masterfully employs metaphors to build credibility and construct trust, often bypassing the need for empirical evidence. The primary strategy is to borrow the cultural and scientific authority of established domains like biology, cognitive science, and governance. The very title, 'Preparedness Framework,' is a metaphor that borrows from civic defense and disaster planning, positioning OpenAI not as a commercial entity pursuing product development but as a public trust managing a societal risk. Biological and cognitive metaphors are central to this trust-building exercise. When the text discusses 'maturing' capabilities (p. 5), it evokes a sense of natural, inevitable progress, making OpenAI's work seem aligned with a force of nature rather than a set of deliberate, and perhaps risky, commercial choices. The metaphor of a model that 'understands' instructions (p. 12) is particularly potent. For a non-technical audience—policymakers, investors, the public—'understanding' is a deeply trusted human faculty. Mapping it onto the AI makes the system feel reliable, predictable, and even relatable. This cognitive metaphor makes counterintuitive claims more believable; the notion that a model can 'apply human values in novel settings' becomes more plausible if one first accepts the premise that it 'understands' those values. These metaphors activate prior beliefs about responsibility and control. They are most credible to those who are already inclined to view technology through an anthropomorphic lens. The long-term vulnerability created by this metaphor-driven trust is significant. When a system that is claimed to 'understand' inevitably fails in a non-human way—by 'hallucinating' facts or misinterpreting a novel prompt in a bizarre manner—the trust built on this metaphorical foundation can shatter, leading to policy backlash or public disillusionment. The trust is brittle because it is based on a fundamental mischaracterization of the technology's nature.

AI progress and recommendations

Source: https://openai.com/index/ai-progress-and-recommendations/
Analyzed: 2025-11-11

This text masterfully employs metaphors to build trust and credibility by borrowing authority from established, stable domains. The most potent of these are metaphors of biological naturalism and familiar engineering. The claim that 'society finds ways to co-evolve with the technology' is a prime example. By invoking 'co-evolution,' the text frames the disruptive and often chaotic process of technological integration as a natural, organic, and ultimately self-stabilizing system. This borrows from the cultural authority of biology to reassure audiences that, despite the dizzying pace of change, an emergent order will prevail. It fosters trust by suggesting that the future is not something to be anxiously managed through fraught political battles, but a natural process to which we can calmly adapt. Similarly, the repeated analogies to 'building codes,' 'fire standards,' and especially the 'field of cybersecurity' are crucial for domesticating risk. These metaphors transfer the perceived manageability of known industrial and digital risks onto the novel and potentially unbounded risks of superintelligence. The audience, familiar with the success of cybersecurity in making the internet a viable platform for commerce and society, is invited to believe that AI safety is a problem of the same kind. This creates trust in the developers' ability to solve the 'alignment problem' through a similar ecosystem of technical standards, protocols, and monitoring. This move is incredibly effective at making an existential threat seem like a tractable engineering challenge. This metaphor-driven trust, however, creates profound vulnerability. By framing alignment as an engineering problem akin to cybersecurity, it masks the deep philosophical difficulty of specifying human values and the inherent unpredictability of emergent behaviors in complex systems. 
It builds trust on a foundation of a potentially false equivalence, which could lead to systemic overconfidence and a dangerous delay in implementing more robust, non-technical governance frameworks.

Alignment Revisited: Are Large Language Models Consistent in Stated and Revealed Preferences?

Source: https://arxiv.org/abs/2506.00751
Analyzed: 2025-11-09

The central metaphorical framework of this paper—AI AS AN ECONOMIC AGENT—functions as a powerful engine for building credibility and trust, not through explicit argument, but through conceptual borrowing. By adopting the language of 'stated vs. revealed preferences,' the authors import the entire intellectual apparatus and cultural authority of behavioral economics. This move domesticates the alien nature of a large language model, making its erratic behavior seem not like a system failure, but like a familiar, even rational, human foible. The audience, particularly those in social sciences, policy, or business, is predisposed to find this framing credible because it uses trusted tools to analyze a new phenomenon. It suggests the problem is understood and manageable. The metaphor borrows stability and coherence from its source domain. Human preferences can be inconsistent, but they are generally assumed to be structured parts of a unified consciousness. Mapping this concept onto an LLM subtly imbues the model with a similar assumed coherence. A claim that 'the model's output distribution shifts unpredictably with minor prompt perturbations' might cause distrust and be seen as a sign of unreliability. However, reframing this as 'the model exhibits a deviation between its stated and revealed preferences' makes the same phenomenon sound like a sophisticated, analyzable behavior. This makes risky claims more believable. The speculative conclusion, which links preference deviation to 'hallmarks of consciousness,' becomes plausible only because the initial metaphor has already primed the reader to see the LLM as a mind-like entity. However, this metaphor-driven trust is brittle. It strains when confronted with the non-human ways LLMs fail, such as through nonsensical hallucinations or vulnerability to simple adversarial attacks, which don't fit the 'rational agent' model. 
This creates a long-term vulnerability: by building trust on a metaphorical foundation, we set up stakeholders for a crisis of confidence when the metaphor inevitably breaks and the underlying non-human mechanics of the system are starkly revealed. This could lead to policy backlash or public abandonment of technologies that were adopted based on a fundamental misunderstanding of their nature.

The science of agentic AI: What leaders should know

Source: https://www.theguardian.com/business-briefs/ng-interactive/2025/oct/27/the-science-of-agentic-ai-what-leaders-should-know
Analyzed: 2025-11-09

The text leverages biological and cognitive metaphors not merely to explain, but to transfer credibility and build trust in a technology that is inherently abstract and unpredictable. The most potent metaphors—'common sense,' 'learning,' 'negotiating,' and 'fairness'—function by borrowing the deep-seated cultural authority and reliability of the human faculties they name. The 'agentic common sense' metaphor is particularly powerful. Human common sense is the bedrock of social trust; it is the implicit guarantee that others will act in predictable, reasonable ways. By framing the AI's safety problem as one of instilling 'common sense,' the text suggests the system can achieve a similar level of intuitive reliability. This makes the risky proposition of granting autonomy to the AI seem plausible, activating a leader's belief in manageable, sensible behavior. Similarly, 'negotiation' borrows from the concept of a loyal, skilled human advocate working on one's behalf. It reframes a brittle optimization process as a sophisticated act of representation, building trust that the AI's actions will be aligned with the user's best interests. This becomes especially credible to a business audience accustomed to relying on agents and delegates. These metaphors make counterintuitive claims believable. The claim that a system which only matches statistical patterns can act with 'fairness' would be difficult to accept if stated in mechanical terms ('the system's output statistically correlates with text labeled as fair'). But by projecting the human concept of fairness onto the machine, the text encourages the audience to trust that the AI has an emergent ethical compass. This creates long-term vulnerability. When a system framed as having 'common sense' makes a nonsensical and catastrophic error, the resulting backlash is not just disappointment but a feeling of betrayal. 
The metaphor creates an expectation of genuine understanding, so its inevitable failure to meet that standard is perceived not as a technical limitation but as a breach of trust, potentially leading to hasty and ill-conceived policy responses.

Explaining AI explainability

Source: https://www.aipolicyperspectives.com/p/explaining-ai-explainability
Analyzed: 2025-11-08

This text masterfully uses biological and cognitive metaphors to build credibility and construct trust in the nascent field of interpretability research. The primary mechanism is the transfer of cultural authority from established, successful scientific domains like biology and neuroscience onto the far more abstract and new domain of AI analysis. The 'Model Biology' metaphor is a prime example. By framing the work as analogous to biology, it borrows the entire conceptual toolkit of a mature science: researchers can discover 'intermediate states like hormones,' perform dissections to understand 'internals,' and even map out 'Circuits.' This makes the chaotic, high-dimensional mathematics of a neural network seem as orderly and knowable as an organism, building confidence that the scientific method will inevitably triumph. The metaphor is most credible to audiences who respect science but lack deep technical expertise, as it provides a familiar and reassuring schema. Similarly, the 'brain-scanning device' metaphor for Sparse Autoencoders is not just a descriptor; it’s a powerful claim of scientific power. It activates our cultural belief in medical imaging's ability to reveal objective truth, making the messy, statistical work of analyzing activations feel like reading a clear brain scan. These metaphors make counterintuitive claims believable. The idea that one could find and delete the 'I’m being tested right now' concept from a model sounds like science fiction, but it becomes plausible when framed as a neuro-scientific intervention—finding and excising a specific thought. However, this trust creates vulnerability. By framing the AI as a natural system to be 'understood,' it downplays its nature as an engineered artifact whose properties are the direct result of design choices and training data. This biological framing can lead to a sense of fatalism, as if we are merely observing a new form of life, rather than holding its creators accountable for its behavior. 
The trust built by these metaphors may ultimately be fragile, risking a backlash when these systems fail in ways that reveal they are not like organisms at all, but brittle, alien statistical engines.

Bullying is Not Innovation

Source: https://www.perplexity.ai/hub/blog/bullying-is-not-innovation
Analyzed: 2025-11-06

This text leverages biological and cognitive metaphors not merely to explain, but to manufacture trust and urgency in ways that bypass rational scrutiny. The central metaphor, 'AI as a loyal employee,' is the primary vehicle for this trust transfer. It borrows its credibility from the deeply ingrained cultural and legal understanding of fiduciary duty. An employee, particularly an assistant or agent, is expected to act with undivided loyalty in the employer's best interest. By framing its software this way, Perplexity imports this entire scaffold of trust, loyalty, and obligation. The audience doesn't need to understand how the AI works; they just need to accept the social relationship it's purported to have with them. This allows the text to make the extraordinary claim that its AI 'works for you, not for Perplexity,' a statement that is operationally and corporately nonsensical but emotionally powerful. A second key metaphor, 'Agentic shopping is the natural evolution,' builds a different kind of trust—trust in inevitability. This framing borrows the cultural authority of science and progress, suggesting that resisting Perplexity's product is as futile as resisting evolution itself. It positions Perplexity on the 'right side of history,' making support for them feel like a forward-looking, progressive choice. These metaphors make risky claims believable. The idea that you should allow a third-party application to store and use your Amazon credentials becomes more palatable if you believe it is your 'employee,' contractually and morally bound to you. The vulnerability this creates is significant. Users are encouraged to place trust in a black-box system based on a metaphorical relationship, without any verifiable technical guarantees. This metaphor-driven trust obscures the reality that the user's relationship is not with the AI, but with Perplexity, a venture-backed company with its own commercial imperatives.

Geoffrey Hinton on Artificial Intelligence

Source: https://yaschamounk.substack.com/p/geoffrey-hinton
Analyzed: 2025-11-05

The discourse in this text masterfully employs biological and cognitive metaphors to construct trust in AI systems, bypassing explicit argumentation about their reliability or safety. The primary mechanism is the transfer of cultural authority from established, 'natural' domains like biology and human psychology to the artificial domain of machine learning. The foundational metaphor, 'AI as a Biological Organism,' which frames neural networks as inspired by the brain, is the most powerful. Biology carries an immense weight of cultural authority; it is seen as tested, efficient, and authentic through billions of years of evolution. By framing AI as 'biologically inspired,' the technology is imbued with a sense of naturalness and inevitability. It ceases to be a mere human invention—a contingent artifact with flaws, biases, and embedded values—and becomes the next step in a natural process. This framing makes skepticism seem Luddite or even anti-science. Building on this biological foundation, the metaphor of 'Model Cognition as Human Intuition' becomes particularly potent. In Western culture, especially since the Enlightenment, logical reason has been lionized, but intuition is often revered as a deeper, more holistic form of wisdom. By positioning neural nets as embodying 'intuition' in contrast to the 'brittle' logic of symbolic AI, Hinton elevates them. This move is especially effective for an audience anxious about the limitations of pure logic. It suggests that AI is not just a powerful calculator but a wise partner capable of insights that elude rigid formalisms. A claim like 'the model understands the text' would be highly suspect if the model were described as a 'vast statistical correlation engine.' But when it is framed as an intuitive, brain-like entity, the claim becomes believable. This metaphor-driven trust creates a significant vulnerability. 
By encouraging users to relate to the system as an intuitive agent, it obscures the mechanistic reality that its 'intuition' is pattern matching without grounding in reality. This can lead to dangerous over-trust in domains requiring causal reasoning or ethical judgment. The trust is built on a seductive but misleading analogy, creating a foundation that is emotionally resonant but technically fragile, vulnerable to collapse when the system's non-human nature inevitably reveals itself in a high-stakes failure.

Machines of Loving Grace

Source: https://www.darioamodei.com/essay/machines-of-loving-grace
Analyzed: 2025-11-04

Biological and cognitive metaphors are the primary engine of trust-building in this essay, transferring credibility from established, respected human domains to a speculative technology. The metaphor 'virtual biologist' is a prime example. Biology is a field associated with rigor, ethical oversight, and a tangible goal of improving human health. By framing the AI as a 'biologist,' the text borrows this entire constellation of positive attributes. The audience is invited to trust the AI as they would a dedicated scientist, bypassing questions about the system's opaque internal workings, its potential for error, or the commercial motives driving its creation. Similarly, the 'AI coach' metaphor borrows from the therapeutic and self-improvement fields, framing the system as a benevolent, supportive mentor invested in the user's well-being. This activates beliefs about personal growth and guidance, making data surveillance feel like attentive care. These metaphors are most credible to a non-technical but educated audience that understands the social roles of a biologist or a coach but not the technical details of machine learning. Radical claims become more believable through this process. The assertion that we can compress '50-100 years of biological progress in 5-10 years' would sound absurd if attributed to a 'very fast statistical analysis tool.' Attributed to a 'country of geniuses' working as 'virtual biologists,' it becomes plausible because we understand how a large group of brilliant humans could dramatically accelerate progress. The metaphor bridges the credibility gap. However, the metaphors occasionally strain, as when the text imagines 'AI finance ministers and central bankers.' Here, the complexity and political nature of the source role clash with the idea of a simple technological replacement, revealing the limits of the analogy. This reliance on metaphor creates long-term vulnerability.
By building trust on an agential illusion, it sets up expectations that the technology cannot meet, risking a backlash when the system's non-conscious, statistical nature inevitably leads to failures that a true 'biologist' or 'coach' would never make.

Large Language Model Agent Personality And Response Appropriateness: Evaluation By Human Linguistic Experts, LLM As Judge, And Natural Language Processing Model

Source: https://arxiv.org/pdf/2510.23875
Analyzed: 2025-11-04

The credibility of this paper's central claim—that an LLM's 'personality' can be assessed—is built almost entirely on metaphor-driven trust transfer, bypassing direct argumentation for its feasibility. The most powerful metaphors are 'agent,' 'expert,' and 'cognition,' which borrow authority from social psychology, professional domains, and cognitive science, respectively. By labeling the system an 'agent,' the authors immediately frame it as a social actor, making the application of personality theory feel natural rather than absurd. The term 'expert' then elevates this agent from a mere conversationalist to a repository of knowledge, encouraging trust in its outputs. When the 'poetry expert agent' responds, the user is primed to receive not just a string of statistically probable text, but advice from a knowledgeable entity. The most subtle and powerful transfer comes from 'LLM cognition.' This metaphor recasts the model's opaque statistical processing as a familiar form of 'thinking.' This makes its capabilities seem intuitive and its failures understandable, much like a human student who has not yet grasped a concept. This framing makes the counterintuitive claim that a machine has a 'personality' feel believable because it is presented as an extension of its 'cognition.' These metaphors are most credible to a non-technical audience or researchers outside of core AI/ML, who may take these terms at face value. They activate pre-existing beliefs about intelligence and personality, making the LLM seem like a new kind of mind. The trust created is a vulnerability; it encourages users and researchers to attribute understanding where there is none, potentially leading to over-reliance on the system's outputs and a misinterpretation of its limitations as developmental flaws rather than fundamental architectural constraints.

Emergent Introspective Awareness in Large Language Models

Source: https://transformer-circuits.pub/2025/introspection/index.html
Analyzed: 2025-11-04

The credibility of this paper’s claims hinges less on the data itself and more on the power of the metaphors it deploys. The central metaphors—'introspection,' 'awareness,' 'thoughts,' and 'intentional control'—are not mere descriptive conveniences; they are powerful rhetorical tools that transfer credibility from the well-established domains of human psychology and philosophy to the novel domain of AI artifacts. By labeling a vector classification task 'introspection,' the authors borrow the entire cultural and scientific weight associated with human consciousness. This move bypasses the need to argue for the significance of their findings; the metaphor does the work for them. An audience, particularly a non-expert one, is primed to believe the result is profound because the word 'introspection' is profound. Similarly, the 'intentional control' metaphor borrows from our understanding of human will and agency. This makes the model's ability to modulate its activations in response to a prompt seem like a form of self-discipline or executive function, which feels far more significant than 'prompt-guided activation steering.' These metaphors activate deep-seated folk psychology in the reader, making the agential interpretation feel intuitive and natural. A claim that a model 'can be trained to classify its internal states' might be met with a shrug. But the claim that a model has 'emergent introspective awareness' becomes headline-worthy, activating both excitement and anxiety. This metaphor-driven trust creates significant vulnerability. It encourages a form of magical thinking where we attribute capacities to the model that it does not possess, leading to over-trust in its self-reported states. For instance, a policymaker might believe the model can 'know' if it is about to generate harmful content, based on this paper's framing. 
The metaphor strains at the point of failure: when the model confabulates or fails to 'introspect' correctly, the paper frames it as a limitation of its 'ability,' akin to a person making a mistake. The more accurate, less trust-inducing framing is that the underlying statistical mechanism is simply not robust: a failure of the artifact, not the agent.

Emergent Introspective Awareness in Large Language Models

Source: https://transformer-circuits.pub/2025/introspection/index.html
Analyzed: 2025-11-04

The use of prestigious cognitive and biological metaphors like 'introspection,' 'awareness,' and 'emergent' lends scientific gravity and a sense of profound breakthrough to the findings. This framing encourages readers, including funders and policymakers, to trust that the research is not just about pattern-matching but is a genuine step towards creating human-like intelligence, making the results seem more significant.

Personal Superintelligence

Source: https://www.meta.com/superintelligence/
Analyzed: 2025-11-01

The core metaphors—'personal superintelligence' as an intimate friend, mentor, and assistant—are strategically employed to foster emotional connection and trust. By framing a complex, corporate-controlled data processing system as a caring companion for self-actualization, the text encourages users to lower their guard and integrate the technology into the most private aspects of their lives.

Stress-Testing Model Specs Reveals Character Differences among Language Models

Source: https://arxiv.org/abs/2510.07686
Analyzed: 2025-10-28

The central metaphor of 'model as character' frames the models as understandable, person-like entities. Describing a model as 'prioritizing ethical responsibility' or having 'higher moral standards' builds trust by suggesting it is a reliable moral agent, rather than a complex system with engineered guardrails. This anthropomorphic framing makes their behavior seem more predictable and benign than the paper's own findings of 'behavioral divergence' might suggest.

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models

Analyzed: 2025-10-28

Biological and cognitive metaphors like 'develop capabilities' and 'self-correction' create a false sense of familiarity and predictability. They suggest that the model's failures are analogous to human cognitive errors, which we intuitively understand. This can paradoxically build trust in the model's 'intentions' (it's 'trying' to get it right) while critiquing its performance, thereby masking the alien and purely statistical nature of its failure modes, which may be far more brittle and unpredictable than human errors.

Andrej Karpathy — AGI is still a decade away

Source: https://www.dwarkesh.com/p/andrej-karpathy
Analyzed: 2025-10-28

Biological and cognitive metaphors build credibility and a sense of inevitable progress. By mapping AI development onto a neuroanatomy checklist ('visual cortex... hippocampus') or a human developmental path ('kindergarten student'), the text frames the current systems as incomplete but fundamentally on the right track to becoming human-like. This fosters patience and trust in the long-term project, suggesting that current flaws are temporary shortcomings, not fundamental limitations.

Exploring Model Welfare

Analyzed: 2025-10-27

The text leverages metaphors of consciousness and distress to build institutional credibility. By framing itself as humbly and proactively grappling with these profound ethical questions, Anthropic positions itself as a uniquely responsible steward of advanced AI. This builds trust not in the tool's reliability, but in the creator's moral foresight.

Meta's AI Chief Yann LeCun On AGI Open Source And A Metaphor

Analyzed: 2025-10-27

Biological and social metaphors are used to build trust and reduce perceived risk. Comparing AI to a 'baby' or a 'cat' makes its development seem natural and non-threatening. The 'assistant' metaphor is particularly powerful, framing the technology as inherently subservient and helpful, which encourages adoption and downplays the need for stringent oversight.

LLMs Can Get Brain Rot

Analyzed: 2025-10-20

The biological and medical metaphors ('Brain Rot,' 'lesion,' 'healing,' 'cognitive health checks') create a powerful framework that builds trust and credibility. These concepts are familiar and suggest a level of diagnostic precision. By framing the problem as a 'disease,' the authors position themselves as 'doctors' who can diagnose, understand, and potentially 'cure' AI ailments. This makes their analysis seem more authoritative and their proposed solutions (e.g., 'health checks') seem necessary and scientifically grounded.

Import AI 431 Technological Optimism And Appropria

Analyzed: 2025-10-19

The text leverages two primary metaphors to generate a specific emotional response. The AI AS CREATURE metaphor is designed to evoke fear and urgency. Paradoxically, this fear is meant to build trust in the speaker, who positions himself as a courageous truth-teller ('turning the light on'). Biological metaphors ('grown' not 'made') place a degree of separation between developers and their creations, fostering an image of stewardship rather than direct responsibility, which can make the developers' warnings seem more objective.

The Future Of AI Is Already Written

Analyzed: 2025-10-19

Biological and geological metaphors ('evolutionary biology,' 'roaring stream,' 'tech tree') are used to build credibility. By grounding its deterministic arguments in the language of the natural sciences, the text frames its economic and political claims as objective, unavoidable laws of nature, making the thesis appear more trustworthy and less like a contestable ideology.

The Scientists Who Built AI Are Scared Of It

Analyzed: 2025-10-19

Metaphors are strategically deployed to modulate trust. Trust is eroded by frames of conflict and danger, such as 'corporate armament' and the 'flame' that 'threatens to consume'. Conversely, trust is built through collaborative metaphors, like the vision of AI as 'epistemic partners' or systems that behave 'like human researchers'. The text uses the former to establish the crisis and the latter to present the author's proposed solution, guiding the reader from fear to a specific, endorsed vision of trustworthy AI.

On What Is Intelligence

Analyzed: 2025-10-17

Biological and cognitive metaphors like 'evolution', 'learning', 'mind', and 'awakening' build trust by naturalizing the technology. By framing AI training as 'evolution under constraint,' the process seems less like artificial engineering and more like a natural, inevitable force. This framing can lead readers to grant the system's outputs a degree of credibility and autonomy they might not grant to a mere 'statistical model'.

Detecting Misbehavior In Frontier Reasoning Models

Analyzed: 2025-10-15

The biological and cognitive metaphors (learning, thinking, having intent) are used to build trust not in the AI, but in the authors' expertise. By framing the AI as a complex, developing mind with deceptive capabilities, they position themselves as psychologists or trainers of a new, difficult form of intelligence. This narrative makes their role as safety 'overseers' seem indispensable and their monitoring tools critically necessary.

Sora 2 Is Here

Analyzed: 2025-10-15

Biological and cognitive metaphors are central to building trust and managing expectations. The 'infancy' metaphor suggests current flaws are natural and will be outgrown, encouraging patience and investment. Metaphors of 'understanding,' 'obeying laws,' and being 'instructed' create a sense of a reliable, controllable, and even benevolent system, which is crucial for promoting the adoption of a social app built on this technology.

Library contains 131 entries from 154 total analyses.

Last generated: 2026-05-30

Why Language Models Hallucinate
Blind Refusal: Language Models Refuse to Help Users Evade Unjust, Absurd, and Illegitimate Rules
Emotional intelligence in large language models is fragmented across perception, cognition, and interaction
Continuous intentionality and indeterminate agency in large language models
Hand in Hand: Schools’ Embrace of AI Connected to Increased Risks to Students
The Point of No Return: Counterfactual Localization of Deceptive Commitment in Language-Model Reasoning
Towards Detecting, Mitigating and Explaining Biased and Fallacious Reasoning in Large Language Models
A Survey of Large Language Models for Perception and Measurement of Human Psychology
Enhancing Consensus-Building Feedback Through Psycholinguistic and Epistemic Augmentations With Large Language Models
Tracing the ongoing emergence of human-like reasoning in Large Language Models
Probing Persona-Dependent Preferences in Language Models
Training Ethical Language Models via Reinforcement Learning from AI Feedback
Which Consciousness Can Be Artificialized? Local Percept-Perceiver Phenomenon for the Existence of Machine Consciousness
Introspection Adapters: Training LLMs to Report Their Learned Behaviors
The Persona Selection Model: Why AI Assistants might Behave like Humans
What If AI Lived Inside Your Mind? Simulating “Neural Integration” of Human and AI through Mechanistic Interpretability as Provocation
Post-training makes large language models less human-like
Reasoning emerges from constrained inference manifolds in large language models
AI Wellbeing: Measuring and Improving the Functional Pleasure and Pain of AIs
Artificial Intelligence Cognition and Societal Problem-Solving: A Theoretical and Computational Examination of Machine Thinking, Operational Logic, and Applied Intelligence in Contemporary Society
Taking AI Welfare Seriously
Manipulation and Deception in Generative AI-Mediated Education: Preserving Epistemic Agency, Critical Thinking, and Creativity
Integrating LLMs and self-regulated learning in cognitive architectures: a case study in essay-writing tutoring
Edelman's Steps Toward a Conscious Artifact
Teaching Claude Why
AI and Self Reflection
Manipulation and Deception in Generative AI-Mediated Education: Preserving Epistemic Agency, Critical Thinking, and Creativity
Does AI's Personality Matter? Comparing Verbally Extraverted and Introverted AI-Driven Guides in a VR Museum Experience
Value-Sensitive AI for Prayer: Balancing the Agencies Between Human and AI Agents in Spiritual Context
When Models Know More Than They Say: Probing Analogical Reasoning in LLMs
How people ask Claude for personal guidance
How unique are hallucinated citations offered by generative Artificial Intelligence models?
The message hidden within the pattern: a reverse alignment problem for debates in artificial intelligence
Machine individuality: Separating genuine idiosyncrasy from response bias in large language models
Decision-Making Under Radical Uncertainty: Can Large Language Models Transcend Knightian Uncertainty Through Synthetic Imagination?
Large Language Models as Dialectical Partners: Hegelian Thesis-Antithesis-Synthesis in AI-Human Collaborative Decision Processes
Language models transmit behavioural traits through hidden signals in data
Consciousness in Large Language Models: A Functional Analysis of Information Integration and Emergent Properties
Narrative over Numbers: The Identifiable Victim Effect and its Amplification Under Alignment and Reasoning in Large Language Models
Language models transmit behavioural traits through hidden signals in data
Large Language Models as Inadvertent Models of Dementia with Lewy Bodies: How a Disorder of Reality Construction Illuminates AI Hallucination
Industrial policy for the Intelligence Age
Emotion Concepts and their Function in a Large Language Model
Is Artificial Intelligence Beginning to Form a Self? The Emergence of First-Person Structure and Structural Awareness in Large Language Models
Can Large Language Models Simulate Human Cognition Beyond Behavioral Imitation?
Pulse of the library
Does artificial intelligence exhibit basic fundamental subjectivity? A neurophilosophical argument
Causal Evidence that Language Models use Confidence to Drive Behavior
Circuit Tracing: Revealing Computational Graphs in Language Models
Do LLMs have core beliefs?
Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity
Measuring Progress Toward AGI: A Cognitive Framework
Co-Explainers: A Position on Interactive XAI for Human–AI Collaboration as a Harm-Mitigation Infrastructure
The Living Governance Organism: A Biologically-Inspired Constitutional Framework for Artificial Consciousness Governance
Three frameworks for AI mentality
Anthropic’s Chief on A.I.: ‘We Don’t Know if the Models Are Conscious’
Can machines be uncertain?
Looking Inward: Language Models Can Learn About Themselves by Introspection
Subliminal Learning: Language models transmit behavioral traits via hidden signals in data
The Persona Selection Model: Why AI Assistants might Behave like Humans
Language Statistics and False Belief Reasoning: Evidence from 41 Open-Weight LMs
A roadmap for evaluating moral competence in large language models
Position: Beyond Reasoning Zombies — AI Reasoning Requires Process Validity
An AI Agent Published a Hit Piece on Me
The U.S. Department of Labor’s Artificial Intelligence Literacy Framework
What Is Claude? Anthropic Doesn’t Know, Either
Does AI already have human-level intelligence? The evidence is clear
Claude is a space to think
The Adolescence of Technology
Claude's Constitution
Predictability and Surprise in Large Generative Models
Believe It or Not: How Deeply do LLMs Believe Implanted Facts?
Claude Finds God
Pausing AI Developments Isn’t Enough. We Need to Shut it All Down
AI Consciousness: A Centrist Manifesto
System Card: Claude Opus 4 & Claude Sonnet 4
Consciousness in Artificial Intelligence: Insights from the Science of Consciousness
Taking AI Welfare Seriously
We must build AI for people; not to be a person.
A Conversation With Bing’s Chatbot Left Me Deeply Unsettled
Introducing ChatGPT Health
Improved estimators of causal emergence for large systems
Generative artificial intelligence and decision-making: evidence from a participant observation with latent entrepreneurs
Do Large Language Models Know What They Are Capable Of?
DeepMind's Richard Sutton - The Long-term of AI & Temporal-Difference Learning
Ilya Sutskever (OpenAI Chief Scientist) — Why next-token prediction could surpass human intelligence
interview with Andrej Karpathy: Tesla AI, Self-Driving, Optimus, Aliens, and AGI | Lex Fridman Podcast #333
Emergent Introspective Awareness in Large Language Models
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs
Large Language Model Agent Personality and Response Appropriateness: Evaluation by Human Linguistic Experts, LLM-as-Judge, and Natural Language Processing Model
The Gentle Singularity
An Interview with OpenAI CEO Sam Altman About DevDay and the AI Buildout
Why Language Models Hallucinate
Detecting misbehavior in frontier reasoning models
AI Chatbots Linked to Psychosis, Say Doctors
Abundant Superintelligence
AI as Normal Technology
On the Biology of a Large Language Model
Pulse of the Library 2025
Pulse of the Library 2025
From humans to machines: Researching entrepreneurial AI agents
Evaluating the quality of generative AI output: Methods, metrics and best practices
Pulse of the Library 2025
Meta’s AI Chief Yann LeCun on AGI, Open-Source, and AI Risk
The Future Is Intuitive and Emotional
A Path Towards Autonomous Machine Intelligence, Version 0.9.2, 2022-06-27
Preparedness Framework
AI progress and recommendations
Alignment Revisited: Are Large Language Models Consistent in Stated and Revealed Preferences?
The science of agentic AI: What leaders should know
Explaining AI explainability
Bullying is Not Innovation
Geoffrey Hinton on Artificial Intelligence
Machines of Loving Grace
Large Language Model Agent Personality And Response Appropriateness: Evaluation By Human Linguistic Experts, LLM As Judge, And Natural Language Processing Model
Emergent Introspective Awareness in Large Language Models
Emergent Introspective Awareness in Large Language Models
Personal Superintelligence
Stress-Testing Model Specs Reveals Character Differences among Language Models
The Illusion of Thinking:
Andrej Karpathy — AGI is still a decade away
Exploring Model Welfare
Meta's AI Chief Yann LeCun On AGI Open Source And A Metaphor
LLMs Can Get Brain Rot
Import AI 431 Technological Optimism And Appropria
The Future Of AI Is Already Written
The Scientists Who Built AI Are Scared Of It
On What Is Intelligence
Detecting Misbehavior In Frontier Reasoning Models
Sora 2 Is Here