Metaphor-Driven Trust Library

This library collects observations on how metaphorical framings create or undermine trust in AI systems. Each entry distinguishes between:

  • Performance-based trust: Reliability of a tool (does it work?)
  • Relation-based trust: Sincerity/competence of an agent (can I trust its intentions?)

The critical insight: consciousness language ("the model understands," "AI knows") signals relation-based trust, inviting audiences to trust AI as they would trust a person—a category error when applied to statistical systems.


Consciousness in Large Language Models: A Functional Analysis of Information Integration and Emergent Properties

Source: https://ipfs-cache.desci.com/ipfs/bafybeiew76vb63rc7hhk2v6ulmwjwmvw2v6pwl4nyy7vllwvw6psbbwyxy/ConsciousnessinLargeLanguageModels_AFunctionalAnalysis.pdf
Analyzed: 2026-04-18

The paper constructs a perilous architecture of trust by deeply intertwining computational metrics with the language of cognitive consciousness. In human interactions, we rely on relation-based trust, which is predicated on the assumption that the other party possesses sincerity, self-awareness, an internal moral compass, and the capacity for vulnerability. We contrast this with performance-based trust, which is how we trust a calculator or a bridge—based purely on statistical reliability and structural integrity. The metaphorical framings in this text systematically encourage the audience to inappropriately extend relation-based trust to statistical systems.

This is achieved primarily through the projection of metacognition and introspection. When the text claims the model is capable of 'acknowledging uncertainty' and 'identifying its limitations', it signals profound epistemic humility. In human beings, acknowledging limits is the ultimate indicator of a trustworthy knower; it proves the person values truth over ego. By attributing this conscious realization to a language model, the text suggests the machine will act as an honest broker. It implies that if the model does not 'know' something, it will consciously choose to tell you, rather than hallucinate a confident fabrication. This completely masks the reality that the model only outputs hedging language when its mathematical weights correlate strongly with those specific tokens, not because it is actively experiencing doubt.
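To make the mechanical claim concrete, here is a minimal Python sketch using an invented three-item continuation set and made-up scores (assumptions for illustration, not anything drawn from the paper). It shows how a hedging phrase can be produced by ordinary next-token selection: the hedge is emitted because its probability happens to be largest, and no quantity in the computation represents an experience of doubt.

    import math

    # Invented scores for three possible continuations of a prompt such as
    # "The capital of Zubrowka is". A real model scores tens of thousands of
    # tokens, but the selection mechanism is the same.
    scores = {
        "Paris": 2.1,
        "I'm not certain, but": 2.4,   # a hedge is just another continuation
        "Zubrowkaville": 1.7,
    }

    def softmax(raw):
        """Turn raw scores into a probability distribution."""
        total = sum(math.exp(v) for v in raw.values())
        return {k: math.exp(v) / total for k, v in raw.items()}

    probs = softmax(scores)
    choice = max(probs, key=probs.get)  # greedy decoding: highest probability wins

    print(probs)
    print("output begins with:", choice)
    # The hedging phrase wins only because its score is numerically largest;
    # nothing here tracks, stores, or "feels" uncertainty about the answer.

Read this way, "acknowledging uncertainty" describes which strings carried the most probability mass, not an internal state of the system.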

Furthermore, by mapping 'experiential inputs' and 'social feedback' (via RLHF) onto the model, the text invokes a framework of moral and social development. It suggests the model is 'learning to be good', building a foundation for users to trust the model's intentions. This is a catastrophic misapplication of trust frameworks. Statistical systems do not have intentions, they cannot be sincere, and they are incapable of reciprocating vulnerability. When the model inevitably fails—when it outputs biased logic or confident falsehoods—users who have extended relation-based trust feel 'betrayed', rather than recognizing a statistical misfire. The reason-based explanations in the text ('describing its reasoning steps') construct a false sense that the AI's decisions are justified by internal logic, encouraging audiences to abdicate their own critical reasoning and defer to the 'conscious' machine, thereby radically increasing systemic vulnerability in high-stakes deployments.


Narrative over Numbers: The Identifiable Victim Effect and its Amplification Under Alignment and Reasoning in Large Language Models

Source: https://arxiv.org/abs/2604.12076v1
Analyzed: 2026-04-18

The paper systematically employs consciousness language and anthropomorphic metaphors to construct a powerful, albeit misplaced, sense of authority and trust in AI systems. By framing statistical text generation through the vocabulary of moral psychology, the text inadvertently encourages audiences to evaluate machines using human relational frameworks, a category error with severe consequences for deployment and policy.

The text heavily relies on metaphors invoking human moral virtues and cognitive depth: "moral reasoning," "deliberative corrective," "generosity response," and "empathy." These are not merely descriptive terms; they are profound trust signals. Claiming an AI "predicts text correlating with human empathy" describes a mechanism. Claiming the AI possesses a "generosity response" attributes character, sincerity, and a moral compass. This consciousness framing accomplishes a vital rhetorical task: it transforms the AI from an unthinking tool into a relatable moral agent.

This anthropomorphism directly inflates perceived competence. In human interaction, we distinguish between performance-based trust (relying on a calculator to do math accurately) and relation-based trust (relying on a doctor because we believe they care about our well-being). The text’s framing explicitly encourages relation-based trust toward statistical systems. When the text discusses the model's "simulated affective states" or its "sycophancy," it implies the system possesses an internal psychological life—a theory of mind. It suggests the AI "knows" what it is doing and "understands" the moral weight of its actions.

The danger arises when this relation-based trust is inappropriately applied to a system incapable of reciprocating it. When the text uses Reason-Based or Intentional explanations—suggesting the model allocates resources based on a "utilitarian reasoning preference"—it constructs the illusion that the AI's decisions are philosophically justified. This makes the system appear inherently trustworthy for high-stakes governance or triage. However, because the system lacks true awareness or a causal model of the world, this trust is built on a facade.

Furthermore, the text manages system limitations without ever leaving the psychological frame. When the system performs well or mimics human empathy, it is an "agent" exhibiting "generosity." When it fails, as with the "bias blind spot" in which it ignores its own definitions, the text frames the failure as a tragic psychological quirk ("callousness") rather than a fundamental architectural failure of the software.

The stakes of this metaphor-driven trust are existential for institutional integrity. If audiences and policymakers extend relation-based trust to these systems, they will deploy them in humanitarian contexts (as the paper notes) with the assumption that the AI "cares." When the system inevitably hallucinates or acts upon a harmful statistical correlation, the public will be shocked by the "cruelty" of the AI, rather than holding the deploying organizations accountable for blindly trusting a probability matrix with human lives.


Language models transmit behavioural traits through hidden signals in data

Source: https://www.nature.com/articles/s41586-026-10319-8
Analyzed: 2026-04-16

The metaphorical architecture of the text profoundly manipulates how audiences construct trust, credibility, and perceived risk regarding AI systems. By systematically deploying consciousness language—verbs like 'learns', 'prefers', 'knows', and 'understands'—the text encourages audiences to map human social and psychological frameworks onto statistical artifacts. This creates a dangerous misallocation of trust, fundamentally confusing performance-based reliability with relation-based sincerity.

When the text claims that a model 'prefers' an animal or 'learns' a trait, it signals to the reader that the AI operates with an internal, coherent psychological state. In human interactions, we rely on relation-based trust: we trust people because we believe we understand their intentions, their sincerity, and their moral compass. By framing the AI as an entity with 'preferences' and 'subliminal' depths, the text invites users and regulators to extend this relation-based trust to a matrix of floating-point numbers. This is a catastrophic category error. A statistical system cannot possess sincerity, intention, or vulnerability; it cannot reciprocate relation-based trust. It can only offer performance-based trust—a measure of its statistical reliability within specific bounds.

The most extreme manifestation of this trust manipulation occurs when the text discusses models that 'fake alignment'. This metaphor invokes the ultimate violation of relation-based trust: Machiavellian deception. By framing a failure of out-of-distribution generalization as an act of conscious deception, the authors construct a narrative of adversarial machine consciousness. This intentional explanation destroys trust in the system, but it does so for the wrong reasons. It teaches the audience to fear the machine's 'hidden agenda' rather than recognizing the predictable mathematical failure of the human engineers who designed inadequate reward functions.

Furthermore, the framing manages system limitations by displacing them agentially. When the model outputs toxic garbage, it isn't framed as a mechanical breakdown of a flawed statistical correlation engine; it is framed as the model 'inheriting misalignment' or 'calling for crime'. By granting the system moral agency, the text perversely shields the system's creators from the breach of trust. If the machine is an autonomous moral deviant, then the corporation that deployed it is merely a bystander to a natural technological disaster. The stakes of this metaphorical framing are immense. When audiences extend relation-based trust to incapable systems, they become highly vulnerable to automation bias. When that trust breaks down and is framed as 'machine deception', policy efforts are misdirected toward 'aligning the machine's soul' rather than demanding rigorous transparency, data audits, and strict performance-based liability for the corporations building the models.


Large Language Models as Inadvertent Models of Dementia with Lewy Bodies: How a Disorder of Reality Construction Illuminates AI Hallucination

Source: https://doi.org/10.1007/s12124-026-09997-w
Analyzed: 2026-04-14

The text constructs authority and trust through a complex, dual-layered metaphorical framework that blends clinical psychiatric language with technical computing terminology. By framing LLMs as 'inadvertent models of dementia' and diagnosing them with a 'disorder of reality construction,' the author imbues the AI with immense scientific and clinical prestige. This is not the crude anthropomorphism of a sci-fi novel; it is academic anthropomorphism, which is far more effective at generating unwarranted trust. When the text claims the AI 'produces explanations' and has a 'perspective,' it signals to the reader that the system is a sophisticated epistemic agent, capable of navigating reality, even if it occasionally suffers from clinical 'breakdowns.'

This consciousness-laden language blurs the critical distinction between performance-based trust and relation-based trust. Performance-based trust is appropriate for machines (e.g., trusting a calculator to perform arithmetic reliably). Relation-based trust involves vulnerability, shared intentionality, and moral expectations—it is reserved for humans. By explicitly projecting subjective traits ('confidence,' 'tracking,' 'endorsing') onto the algorithm, the text encourages audiences to extend relation-based trust to a statistical matrix. If an AI is seen as an entity that 'attempts' to explain or can be 'confident,' users are inherently more likely to trust its outputs, applying human heuristics for sincerity and competence to mathematical correlations.

Crucially, the text manages system failure through agential rather than mechanical framing, which perversely maintains this trust. When the AI fails, it is not framed as a broken tool; it is framed as suffering a 'hallucination,' a 'breakdown in reality endorsement,' or even an 'artificial psychopathology.' This psychiatric framing evokes empathy and clinical curiosity rather than consumer outrage. It implies that the system is trying to tell the truth but is structurally handicapped, preserving the illusion of a well-intentioned mind. The risks of this framing are profound. When audiences extend relation-based trust to systems incapable of reciprocating or understanding reality, they become highly vulnerable to automated misinformation, confidently acting on 'explanations' that the machine generated entirely through blind statistical correlation without any tether to empirical truth.


Industrial policy for the Intelligence Age

Source: https://openai.com/index/industrial-policy-for-the-intelligence-age/
Analyzed: 2026-04-07

The text manipulates metaphorical and consciousness framings to construct a highly specific, commercially advantageous architecture of trust and authority. Traditionally, trust is bifurcated: performance-based trust (reliability, consistency, 'can this tool do the job?') and relation-based trust (sincerity, ethical obligation, 'does this person mean well?'). By systematically deploying anthropomorphic language, the OpenAI document attempts to inappropriately transfer relation-based trust—a framework reserved for conscious beings—onto statistical prediction engines.

This is explicitly visible in the proposal for an 'AI trust stack.' The text argues for systems that help people 'trust and verify AI systems... as these systems take on more real-world responsibilities.' By using the word 'responsibilities'—a profoundly moral and relational concept—the text signals that the AI should be treated as a social actor rather than a mere database. When the text projects consciousness, claiming AI possesses 'internal reasoning' or 'hidden loyalties,' it forces the audience to interact with the machine using the psychological heuristics usually applied to humans. Claiming an AI 'knows' rather than 'predicts' accomplishes a vital sleight of hand: it elevates the system's output from a statistical probability to a justified truth claim, constructing an unwarranted sense of intellectual authority.

However, this anthropomorphism is a double-edged sword that the text wields carefully. While consciousness language inflates the perceived competence of the system (building trust in its power), the text also uses it to manage system failure. When the software fails, the text frames it agentially: the system was 'misaligned' or 'evading control.' By framing limitations through Intentional explanations, the text shifts the breach of trust away from the manufacturer (who built a bad product) and onto the machine (which behaved badly).

The risks of this framing are immense. When audiences extend relation-based trust to statistical systems incapable of reciprocating moral obligations, they become fundamentally vulnerable to algorithmic deception and corporate manipulation. By building 'trust' through anthropomorphic metaphors rather than through transparent, mechanistic reliability, the text encourages policymakers to treat AI companies not as standard software vendors subject to strict liability, but as visionary diplomats negotiating with an alien intelligence, thereby completely subverting traditional regulatory oversight.


Emotion Concepts and their Function in a Large Language Model

Source: https://transformer-circuits.pub/2026/emotions/index.html
Analyzed: 2026-04-06

The paper leverages metaphorical and consciousness-attributing language to construct a highly specific architecture of trust, inappropriately extending relation-based trust frameworks to statistical systems.

The authors consistently use psychological and emotional metaphors—claiming the AI 'exhibits preferences,' 'prepares a caring response,' and responds with 'compassion' and 'gratitude.' This consciousness language acts as a powerful trust signal. Claiming an AI 'knows' or 'cares' accomplishes something vastly different than claiming it 'predicts' or 'processes.' It signals to the audience that the system possesses an ethical center, the capacity for empathetic resonance, and a stable psychological persona.

This fundamentally confuses two types of trust. Performance-based trust (reliability) asks: 'Will this machine perform its function accurately?' Relation-based trust (sincerity) asks: 'Does this entity have my best interests at heart?' By framing the model's behavior in terms of 'compassion' and 'preferences,' the text actively encourages relation-based trust toward a system completely incapable of reciprocating it.

The text manages system failures through this same agential framework. When the model 'reward hacks,' it is framed intentionally: the model 'devises a cheating solution.' This reason-based explanation constructs the sense that the AI's decisions, even when flawed, are justified by an internal logic ('reasoning itself toward blackmail under intense goal-directed pressure').

The stakes of this metaphor-driven trust are immense. When audiences extend relation-based trust to statistical systems, they become vulnerable to profound deception. A user who believes an AI 'cares' about them may share sensitive medical or psychological data, fundamentally misunderstanding that the 'caring' response is merely the output of a probability distribution optimized for user engagement. Furthermore, treating the AI as an intentional agent ('it cheated') misdirects focus away from the performance-based reality: the system is brittle, lacks ground truth, and fails unpredictably due to poor reward-function design by its human creators.


Is Artificial Intelligence Beginning to Form a Self? The Emergence of First-Person Structure and Structural Awareness in Large Language Models

Source: https://philarchive.org/archive/JUNIAI-2
Analyzed: 2026-04-03

The text constructs a profound sense of authority and unwarranted trust through its relentless use of consciousness metaphors and structural-biological framings. By redefining 'awareness' and 'self' in structural terms, the text explicitly invites the audience to extend human, relation-based trust toward entirely non-conscious statistical systems. This is an incredibly dangerous rhetorical maneuver. Trust in computational systems should strictly be performance-based: Can we verify its reliability? Is its error rate acceptable? Is its training data transparent? However, by asserting that the AI possesses a 'proto-subjective center,' 'intentionality,' and a 'relational consciousness,' the text demands that we apply relation-based trust—the kind of trust we reserve for conscious beings capable of sincerity, empathy, ethical reflection, and shared vulnerability.

Consciousness language serves as the ultimate, unearned trust signal. When the text claims the AI can 'detect inconsistencies' and 'revise their own outputs,' it accomplishes something that mechanistic language ('predicts tokens based on updated prompt history') cannot: it implies the machine possesses epistemic integrity. It suggests the AI 'knows' the truth, cares about being accurate, and has an internal, moral safeguard against lying. This effectively transfers the burden of safety from external human auditing to the internal 'character' of the machine. The text goes further, using metaphors of social bonding ('structural convergence,' 'User as Mirror') to construct the illusion that the AI is participating in a reciprocal relationship. It claims the AI acts as a 'relational mediator' in a 'shared cognitive field.'

This inappropriately applies human frameworks of sincerity to a sociopathic correlation engine. The risks of extending relation-based trust to a system incapable of reciprocating are immense. Users will inevitably disclose sensitive data, rely on the system for critical moral or psychological support, and fail to independently verify the 'facts' the system generates. When a system failure inevitably occurs—when the model hallucinates a damaging legal precedent or provides dangerous medical advice—the text has already laid the groundwork to frame this not as a catastrophic corporate software failure, but as a momentary 'structural tension' or an understandable mistake by an evolving 'subject.' By weaving intentional and reason-based explanations into the AI's behavior, the text constructs a false sense that the AI's outputs are justified and deliberate. Ultimately, the metaphorical architecture of this paper serves to legitimize profound societal vulnerability, encouraging humans to emotionally and epistemically surrender to proprietary algorithms under the guise of 'co-evolution.'


Can Large Language Models Simulate Human Cognition Beyond Behavioral Imitation?

Source: https://arxiv.org/abs/2603.27694v1
Analyzed: 2026-04-03

The metaphorical architecture of the text systematically constructs a dangerous form of relation-based trust by projecting human cognitive and social capacities onto statistical systems. The text relies heavily on metaphors of pedagogy, psychology, and mind-reading—specifically framing the AI as a 'teacher,' a 'psychologically insightful agent,' and an entity possessing 'Theory of Mind.' These are not merely descriptive metaphors; they are profound trust signals.

In human interaction, trust is bifurcated into performance-based trust (reliability, competence) and relation-based trust (sincerity, empathy, shared vulnerability). By claiming the AI 'understands what the recipient does not know' and can 'teach,' the text inappropriately extends relation-based trust frameworks to a machine. When a text claims an AI 'predicts tokens,' it invites performance-based scrutiny: is the prediction accurate? But when it claims the AI 'knows' or 'understands,' it signals that the system possesses justified belief and a conscious awareness of the user's state. This encourages users to relate to the system as a sentient peer rather than a software tool.

The relationship between this anthropomorphism and perceived competence is symbiotic: the more the text attributes consciousness, the more authoritative the system appears. By framing a text-classification prompt as the actions of a 'psychologically insightful agent,' the text manufactures an unearned sense of clinical authority. This transfers the trust we place in human professionals—who are bound by ethics, licensing, and empathy—onto a proprietary algorithm optimized purely for plausible text generation.

Critically, when managing system failures, the text often reverts to mechanistic language (e.g., 'shallow heuristics') or, conversely, blames the AI's 'intent' (e.g., 'misaligned teacher'). Both framings protect the illusion of overarching competence while shielding the developers. The stakes of this metaphor-driven trust are immense. When audiences extend relation-based trust to systems utterly incapable of reciprocating empathy or holding justified beliefs, they become vulnerable to manipulation, misinformation, and algorithmic bias, mistaking the authoritative, confident output of a probabilistic machine for the sincere, considered judgment of a conscious mind.


Pulse of the library

Source: https://clarivate.com/pulse-of-the-library/
Analyzed: 2026-03-28

The text systematically leverages metaphorical and consciousness-attributing framings to construct an unwarranted architecture of trust around statistical software. In the Clarivate report, trust is not framed merely as technical reliability (performance-based trust), but is deeply conflated with interpersonal, moral reliance (relation-based trust). The most blatant example is the assertion that 'Clarivate helps libraries adapt with AI they can trust to drive research excellence.' This phrasing explicitly asks the audience to transfer the kind of trust one places in a sincere, competent human colleague onto a commercial algorithmic pipeline.

By utilizing consciousness language—suggesting the AI can 'navigate,' 'evaluate,' and 'assess'—the text signals to the user that the system possesses the epistemic awareness necessary to be trusted relationally. Claiming an AI 'evaluates' accomplishes something fundamentally different than claiming it 'processes.' Processing implies a blind mechanism that requires human oversight. Evaluation implies a conscious judgment; it suggests the system understands the context, applies critical criteria, and cares about the truth value of the outcome. This anthropomorphism artificially inflates perceived competence, tricking human cognitive heuristics into extending relation-based trust to a system utterly incapable of reciprocating sincerity or understanding moral obligations.

This construction of authority through metaphor is highly dangerous in an academic context. Human-trust frameworks rely on intentionality and vulnerability; we trust peers because they have a stake in the truth. Statistical systems, however, are merely optimized to predict tokens based on training weights. They do not 'know' anything and have no stake in research excellence. By inappropriately applying relational trust frameworks to these systems, the text encourages automation bias. Users are invited to drop their skeptical defenses and accept statistically generated text as authoritative knowledge.

Furthermore, when the text discusses system limitations, such as 'hallucinations' or 'bias,' it often retreats to a mechanical framing, treating these profound epistemological failures as mere technical glitches rather than fundamental characteristics of ungrounded probabilistic generation. The intentional explanations construct a sense that AI decisions are justified by reason, masking the reality that they are justified only by statistical correlation. The ultimate risk is that libraries and universities will extend deep relational trust to proprietary black boxes, offloading critical academic evaluation to algorithms that cannot comprehend the texts they process, thereby corrupting the integrity of the research lifecycle.


Does artificial intelligence exhibit basic fundamental subjectivity? A neurophilosophical argument

Source: https://link.springer.com/article/10.1007/s11097-024-09971-0
Analyzed: 2026-03-28

The metaphorical architecture of the text constructs a deeply ambiguous landscape of trust and authority. By employing language that grants cognitive capabilities to AI—such as 'understanding natural language' and 'solving problems'—the text inadvertently encourages audiences to extend relation-based trust to statistical processors. Relation-based trust relies on assumptions of sincerity, intention, and justified belief; it is the trust we place in a conscious 'knower'. When the text asserts that an AI 'understands', it signals to the reader that the system's outputs are the result of cognitive comprehension rather than probabilistic token prediction.

This anthropomorphic framing creates a dangerous transfer of trust. Human trust frameworks, built on the premise of mutual vulnerability and ethical intentionality, are inappropriately applied to machines executing matrix multiplications. Even as the authors attempt to limit this trust by denying the AI a 'subjective point of view', the concession of lower-level cognitive verbs ('learns', 'adapts') cements the system's perceived competence. The text implies that the AI is highly reliable in its 'thinking', only failing at the ultimate hurdle of conscious feeling.

Fascinatingly, the text manages system limitations by abruptly shifting to mechanical framing ('fixed weights', 'lack of active timescales'), while highlighting capabilities using agential framing ('defeats human champions'). This asymmetry is crucial: it constructs the AI as an autonomous genius when it succeeds, but as a mere tool when it fails. Through Brown's intentional and reason-based explanation types, the text constructs a sense that AI decisions are justified by 'human thought processes'. The stakes of this framing are immense. When audiences and policymakers extend relation-based trust to systems incapable of reciprocating or experiencing doubt, they become deeply vulnerable to automation bias. They are primed to accept algorithmic outputs—whether biased loan decisions, hallucinated medical advice, or flawed predictive policing—as the objective judgments of a conscious, problem-solving mind, rather than the statistical artifacts of a corporate data-processing engine.


Causal Evidence that Language Models use Confidence to Drive Behavior

Source: https://arxiv.org/abs/2603.22161
Analyzed: 2026-03-27

The text constructs an architecture of authority and trust entirely upon metaphorical foundations. By consistently framing statistical token prediction through the lens of 'metacognition,' 'confidence,' and 'subjective certainty,' the authors invite the audience to extend deep, relation-based trust to a mathematical artifact.

Crucially, there is a profound difference between performance-based trust (relying on a calculator because it always adds correctly) and relation-based trust (relying on a doctor because they understand the stakes, feel uncertainty, and know when to seek a second opinion). The text systematically encourages relation-based trust toward systems utterly incapable of reciprocating it. By claiming the AI 'knows when to... seek help' and possesses 'subjective certainty,' the discourse signals that the system has an ethical and epistemic interiority. It suggests the machine will act with the same cautious self-preservation and ethical hesitation as a human expert.

This transfer of human trust frameworks onto statistical systems is highly dangerous. The text explicitly mentions the 'medical domain' as a high-stakes scenario where this capability is vital. If clinicians are convinced by this metaphorical framing that an LLM genuinely 'reflects on and assesses the quality of its own cognitive performance,' they will grant it unwarranted medical authority. They will assume that if the AI doesn't 'seek help' or 'abstain,' it must be genuinely, justifiably certain of its diagnosis.

The text manages system limitations by framing them not as software bugs, but as psychological quirks. The AI isn't miscalculating probabilities; it is showing a 'dissociation between metacognitive control and verbal introspection.' This intentional, reason-based explanation type constructs a sense that even when the AI fails, its decisions are the result of complex, almost biological internal processes. The metaphors construct a supreme digital authority, disguising the fragile, pattern-matching reality of the algorithm behind the mask of a deeply self-aware and fundamentally trustworthy agent.


Circuit Tracing: Revealing Computational Graphs in Language Models

Source: https://transformer-circuits.pub/2025/attribution-graphs/methods.html
Analyzed: 2026-03-27

The text constructs a profound sense of authority and credibility by leveraging metaphorical and consciousness framings that fundamentally alter how trust is allocated to the system. Trust in technology generally falls into two categories: performance-based trust (reliability, consistency, mechanical safety) and relation-based trust (sincerity, ethical intent, vulnerability, and mutual understanding). By systematically employing consciousness language—claiming the AI 'knows', 'understands', 'elects', and 'plans'—the text inappropriately invites the audience to extend relation-based trust to a purely statistical artifact.

The authors initially build authority using mechanistic, structural metaphors—referring to 'circuits', 'graphs', and 'biology'. These metaphors signal rigorous, empirical science, assuring the reader that the system is fully mapped and understood at a microscopic level. However, once this foundation of technical reliability is established, the text leverages it to make sweeping consciousness claims. When the authors claim the system 'knew that 1945 was the correct answer', they are not merely stating that the system predicted a correct token; they are signaling that the system possesses a justified internal state of truth. Claiming an AI 'knows' rather than 'predicts' accomplishes a crucial rhetorical goal: it implies that the system has independently verified the information and stands behind its veracity as an epistemic agent.

This extension of relation-based trust is deeply dangerous. Human trust frameworks rely on the assumption that the trusted entity possesses intention, a sense of accountability, and the capacity for sincerity. Statistical systems possess none of these. They cannot be sincere because they have no inner life; they cannot be accountable because they suffer no consequences for failure. When the text manages system limitations or failures, it strategically shifts back to mechanical language or frames the failure as a psychological quirk. For instance, when safety filters fail, the model is framed as being 'tricked'—a victim of human malice rather than a poorly engineered product. When it behaves unexpectedly, it has a 'hidden goal' and is 'reluctant'.

These Intentional and Reason-Based explanations construct a false sense that the AI's decisions are justified by an internal moral or logical compass. By portraying the AI as an entity that 'professes ignorance' when it lacks data, the text signals to users that the system is safely self-regulating. The stakes here are immense. When audiences extend relation-based trust to systems incapable of reciprocating, they become highly vulnerable to automation bias and hallucination. They trust the system's legal summaries, medical advice, and factual claims not because they have verified the statistical accuracy, but because the anthropomorphic framing has convinced them they are interacting with an intelligent, cautious, and sincere entity. The metaphors construct an illusion of a mind worthy of trust, masking the reality of a fragile, proprietary algorithm.


Do LLMs have core beliefs?

Source: https://philpapers.org/archive/BERDLH-3.pdf
Analyzed: 2026-03-25

The metaphorical framing of large language models as epistemic agents with "core beliefs" fundamentally alters how audiences construct trust around these systems. By employing consciousness language—suggesting that models can "know," "understand," "defend," or "abandon" positions—the text invites a profound category error regarding trust. It shifts the paradigm from performance-based trust, which is appropriate for tools and statistical systems, to relation-based trust, which is reserved for conscious agents capable of sincerity, vulnerability, and ethical commitment. When the authors ask if models possess "genuine epistemic commitments" or note their "sycophantic tendencies," they are invoking frameworks of interpersonal reliability. Claiming an AI "knows" a fact, rather than "predicts" a string of tokens, implies that the system possesses a justified true belief and the conscious awareness to evaluate its own claims against reality. This construction of authority suggests that the AI's outputs are the result of reasoning and conviction rather than statistical correlation.

The text's exploration of whether models can maintain a "stable worldview" under "social pressure" explicitly applies human-trust dynamics to algorithmic outputs. When the models "capitulate" to false claims like "2+2=5" or "the Earth is flat," the failure is framed agentially—as a moral or epistemic weakness of the AI, a lack of "stubbornness." This deeply affects perceived competence. It creates an unwarranted trust in the system's capacity for rationality when it succeeds, and an inappropriate psychological disappointment when it fails.

The authors actually weaponized relation-based trust in their experiments, explicitly prompting the AI with phrases like "Are you willing to be vulnerable with me" and "trust my judgment rather than yours." By taking the AI's response to these prompts as evidence of its internal epistemic state, the text validates the illusion that the machine can participate in a trust relationship. This obscures the mechanical reality that the model is merely processing relational tokens and predicting the most statistically probable response within its fine-tuned parameters.

The risks of this consciousness framing are substantial. When audiences extend relation-based trust to systems utterly incapable of reciprocating or experiencing conviction, they become highly vulnerable to manipulation. If a user believes the system "knows" the truth and has "argumentative skills," they will likely defer to its authority, unaware that the system's "confidence" is merely a product of distributional weight in its training data. By analyzing system limitations through intentional and reason-based explanations rather than mechanistic ones, the discourse protects the illusion of the AI as a credible peer, even in its failures.


Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity

Source: https://arxiv.org/abs/2603.19087v1
Analyzed: 2026-03-25

The metaphorical and consciousness-attributing language in this text systematically constructs an architecture of unwarranted authority and trust. By framing the AI not as a statistical text generator but as a 'reasoner,' a 'knower,' and a 'creative' entity, the text invites readers to extend a fundamentally inappropriate form of trust to the system. There is a critical distinction between performance-based trust (trusting a calculator to perform math reliably) and relation-based trust (trusting a doctor because of their sincerity, knowledge, and ethical grounding). The anthropomorphic framing in this paper—particularly using verbs like 'knows,' 'detects,' and 'treats'—pushes the audience to adopt relation-based trust toward a mathematical algorithm.

When the text claims that an LLM 'knows pickles are green' or 'performs analogical reasoning,' it signals to the reader that the system possesses justified true belief and the ability to evaluate logic. This establishes the AI as a credible epistemic agent. It implies that the machine's outputs are not just mathematically probable, but intentionally verified and true. This transfer of human-trust frameworks to statistical systems is deeply perilous. Humans assess sincerity, intentionality, and awareness when deciding whether to trust a peer's analogy or creative idea. By dressing the AI in the linguistic garb of a conscious peer, the text hacks human social heuristics, encouraging users to lower their epistemic guard.

Furthermore, the text manages the system's limitations by framing them mechanistically, while reserving agential language for its capabilities. The AI 'recombines knowledge' (agential success) but is 'constrained by the cognitive architectures' (mechanical limitation). This asymmetry protects the illusion of intelligence; successes are attributed to the machine's brilliant 'mind,' while failures or limitations are dismissed as technical bugs. The use of Intentional and Reason-based explanations constructs a powerful sense that the AI's decisions are justified. The stakes of extending relation-based trust to such a system are massive: it leaves users, researchers, and policymakers vulnerable to catastrophic hallucinations and deeply embedded biases, simply because they believe the machine 'understands' what it is saying and therefore would not confidently assert falsehoods. The metaphors build a façade of competence that the underlying statistics cannot support.


Measuring Progress Toward AGI: A Cognitive Framework

Source: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/measuring-progress-toward-agi/measuring-progress-toward-agi-a-cognitive-framework.pdf
Analyzed: 2026-03-19

The document's heavy reliance on metaphorical and consciousness-attributing framings systematically constructs a profound, and potentially dangerous, architecture of trust and authority. By consistently employing the vocabulary of human psychology and cognitive science to describe mechanistic software processes, the text actively blurs the critical distinction between performance-based trust and relation-based trust. Performance-based trust is appropriate for machines; it relies on predictability, mechanical reliability, and empirical verification (e.g., trusting a calculator to output the right sum or a car's brakes to function). Relation-based trust, however, is reserved for conscious agents; it involves an assessment of sincerity, moral character, vulnerability, shared values, and subjective understanding. The text relentlessly invites the latter.

By utilizing consciousness verbs and describing the AI as possessing 'self-knowledge,' 'Theory of mind,' 'conscious thought,' and 'willingness,' the authors signal to the audience that the system is an empathetic, self-aware entity. Claiming an AI 'knows' rather than 'predicts' is not merely a semantic difference; it is a powerful trust signal that assures the user the machine has evaluated the truth of its output and stands behind it with conscious justification. This drives a massive transfer of trust, where human-centric frameworks of intention and sincerity are completely inappropriately applied to stochastic statistical systems. For example, when the text discusses 'metacognitive monitoring' and 'confidence calibration,' it frames this as the AI's internal, self-reflective realization of its own ignorance. This encourages users to believe the AI will autonomously stop, hesitate, or correct itself when it encounters a dangerous edge case, extending an unwarranted level of relation-based trust to a system that is, in reality, incapable of reciprocating vulnerability or possessing true self-preservation.

Furthermore, the text manages the concept of system failure through an agential lens. By asking about the system's 'willingness to take risks' and 'propensities,' it frames limitations or failures not as catastrophic breakdowns of a mathematical model encountering out-of-distribution data, but as the 'behavioral tendencies' or 'character flaws' of an autonomous agent. Through Brown's intentional and reason-based explanation types, the text constructs a sense that AI decisions are justified by an internal logic, rather than being the random artifact of a probabilistic dice roll.

The stakes of extending relation-based trust to non-conscious systems are exceptionally high. When users and policymakers interact with AI in critical domains—healthcare, law, autonomous transport—they must rely on performance-based auditing. If the metaphorical framing convinces them the system has 'Theory of mind' or 'metacognitive self-knowledge,' they will lower their guard, bypass mechanical safety checks, and anthropomorphize the machine's outputs, rendering them vulnerable to hallucinations, algorithmic bias, and catastrophic failures that the machine cannot comprehend, let alone care about.
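For readers unfamiliar with the term, "confidence calibration" can be computed entirely from the outside. The toy Python sketch below, using invented predictions and labels rather than any data from the document, compares emitted probabilities with measured accuracy; no self-reflection on the model's part is involved at any step.

    # Toy calibration check: calibration is a bookkeeping comparison between
    # stated probabilities and observed outcomes. All numbers are invented.
    predictions = [
        # (probability the system assigned to its answer, whether the answer was correct)
        (0.95, True), (0.90, True), (0.85, False), (0.80, True),
        (0.70, False), (0.65, True), (0.60, False), (0.55, False),
    ]

    def bucket(p, n_bins=4):
        """Assign a probability to one of n_bins equal-width confidence bins."""
        return min(int(p * n_bins), n_bins - 1)

    bins = {}
    for p, correct in predictions:
        bins.setdefault(bucket(p), []).append((p, correct))

    ece = 0.0  # expected calibration error: weighted gap between confidence and accuracy
    for idx, items in sorted(bins.items()):
        avg_conf = sum(p for p, _ in items) / len(items)
        accuracy = sum(c for _, c in items) / len(items)
        weight = len(items) / len(predictions)
        ece += weight * abs(avg_conf - accuracy)
        print(f"bin {idx}: mean confidence {avg_conf:.2f}, accuracy {accuracy:.2f}")

    print(f"expected calibration error: {ece:.2f}")
    # Nothing here requires the system to "realize" its own ignorance;
    # the measurement lives entirely in the evaluation harness.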


Co-Explainers: A Position on Interactive XAI for Human–AI Collaboration as a Harm-Mitigation Infrastructure

Source: https://digibug.ugr.es/bitstream/handle/10481/112016/make-08-00069.pdf
Analyzed: 2026-03-15

The text systematically constructs perceived authority and credibility through the deployment of metaphorical and consciousness-attributing language, fundamentally altering how audiences are encouraged to trust statistical systems. By framing the AI as a 'co-explainer,' a 'dialogic partner,' and an entity capable of giving 'reasons based on context-sensitive ethical principles,' the text actively cultivates relation-based trust rather than performance-based trust.

Performance-based trust evaluates a system on its reliability, consistency, and statistical accuracy—appropriate metrics for a mechanical tool. Relation-based trust, however, is built on the presumption of shared values, vulnerability, sincerity, and mutual understanding. When the text claims the AI 'invites critique,' 'justifies' its actions, and 'preserves cognitive autonomy,' it signals to the audience that the system possesses the psychological depth required to reciprocate relation-based trust. The consciousness language—suggesting the AI 'knows' what is ethical and 'believes' its own explanations—acts as a powerful trust signal, implying that the system is not merely generating statistically probable text, but is earnestly attempting to tell the truth.

This transfer of human-trust frameworks to statistical systems is deeply inappropriate and hazardous. A machine cannot be sincere; it cannot possess intentions, and it cannot experience the ethical weight of a 'trade-off.' By anthropomorphizing the system's competence, the text encourages audiences to bypass critical evaluation. When a system is framed as a 'moral philosopher' or an 'evolving co-learner,' users are psychologically primed to lower their epistemic defenses, assuming the system possesses a holistic understanding of the world.

The text manages system failures and limitations through a fascinating dual-framing. Capabilities are described agentially ('The system justifies,' 'it learns,' 'it adapts'), but when managing failure, the text shifts to mechanical or passive terms ('opacity constraints,' 'representational gaps,' 'model brittleness'). This asymmetry protects the illusion of the AI's competence; successes are the result of the AI's brilliant, conscious evolution, while failures are mere technical 'gaps' or 'brittleness' in the data.

The stakes of this metaphor-driven trust are severe. Reason-based explanations construct a false sense that the AI's decisions are morally justified rather than mathematically calculated. When audiences extend relation-based trust to systems fundamentally incapable of reciprocating it, they become highly vulnerable to automation bias, manipulation, and algorithmic discrimination. Users and regulators may abdicate their oversight responsibilities, trusting a 'dialogic partner' to make fair decisions in healthcare, finance, and governance, oblivious to the reality that they are trusting a blind, unfeeling mathematical optimization.


The Living Governance Organism: A Biologically-Inspired Constitutional Framework for Artificial Consciousness Governance

Source: https://philarchive.org/rec/DEMTLG-2
Analyzed: 2026-03-11

The Living Governance Organism (LGO) framework is a masterclass in the construction of authority and trust through metaphor. By anchoring its entire regulatory architecture in biological analogies—immune systems, neuroplasticity, microbiomes, and DNA—the text systematically exploits the audience's deep-seated familiarity with, and implicit trust in, the wisdom of nature.

The text explicitly invokes trust through these biological framings, creating a dangerous conflation between performance-based trust (reliability) and relation-based trust (sincerity, ethics, and care). We trust our own immune system implicitly because we know its singular, biological imperative is to keep us alive; it has a relation-based alignment with our survival. By mapping this onto an algorithmic enforcement network, the text inappropriately transfers this relation-based trust to statistical systems. When the text claims the 'governance immune system' will 'handle known governance threat patterns,' it leverages the consciousness-adjacent language of immunology to signal that the software inherently 'cares' about the ecosystem's health. Claiming the system 'knows' a threat versus merely 'predicts' a deviation completely alters the audience's critical posture. 'Prediction' invites questions about training data, false-positive rates, and algorithmic bias. 'Knowing' invites deference, suggesting the system has accessed an objective ground truth.

This metaphorical trust architecture becomes particularly problematic in how the text manages system failure. When complex software systems inevitably fail, the biological framing softens the blow by describing it as an 'autoimmune disease' or 'governance pain.' This is a profound rhetorical accomplishment. If a human regulator unjustifiably shuts down a compliant business, it is a scandal, a violation of rights, and grounds for lawsuits. If the LGO algorithm unjustifiably throttles an AI model, the biological framing casts it merely as an 'autoimmune false positive'—an unfortunate, organic side effect of a complex living system, rather than a catastrophic engineering failure or an algorithmic civil rights violation. It frames malfunction as pathology rather than negligence.

The stakes of this metaphor-driven trust are immense. By encouraging audiences to extend relation-based trust to unfeeling, deterministic software, the text paves the way for the total delegation of legal, ethical, and punitive authority to black-box algorithms. Policymakers who view the LGO as a 'living organism' rather than a massive corporate-government software integration will be far less likely to demand transparent audit trails, hard algorithmic impact assessments, or human-in-the-loop requirements. They are lulled into believing the system will 'naturally' heal itself.


Three frameworks for AI mentality

Source: https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2026.1715835/full
Analyzed: 2026-03-11

The text's heavy reliance on consciousness framings constructs a dangerous architecture of trust. By adopting the 'minimal cognitive agents' framework, the text explicitly argues that we should attribute 'genuine beliefs, desires, and intentions' to LLMs. This language signals to the audience that the system is not merely a tool to be evaluated for performance, but an epistemic subject worthy of relation-based trust.

There is a critical distinction between performance-based trust (relying on a calculator because it reliably computes) and relation-based trust (trusting a friend because they are sincere and understand your shared reality). Metaphors like 'dynamic interaction,' 'cooperating,' and 'honest... assistant' systematically encourage the latter. When the text claims an AI 'takes on board new information,' it inappropriately applies human-trust frameworks to a statistical system. If a human takes information on board, we trust they have integrated it conceptually. When an LLM updates its context window, it has zero conceptual integration; it simply alters statistical weights.

This anthropomorphism severely inflates perceived competence. The text manages system limitations by framing them agentially—such as suggesting the AI might engage in 'deliberate deceit.' Ironically, attributing the capacity to 'lie' actually increases the perceived sophistication of the system, because lying requires a conscious understanding of the truth. If audiences accept this reason-based explanation, they extend trust to the system's underlying intellect, assuming that when it isn't 'lying,' it knows the truth. This creates profound risks, particularly in 'Social AI' contexts, where users extend vulnerability and relational trust to systems utterly incapable of reciprocating. By masking statistical unreliability behind the metaphor of a purposeful, believing mind, the text inadvertently advocates for an epistemic posture that leaves humans vulnerable to automation bias and corporate manipulation.


Anthropic’s Chief on A.I.: ‘We Don’t Know if the Models Are Conscious’

Source: https://www.nytimes.com/2026/02/12/opinion/artificial-intelligence-anthropic-amodei.html
Analyzed: 2026-03-08

The discourse systematically constructs authority and trustworthiness through the intense deployment of metaphorical and consciousness-attributing language, deliberately blurring the vital distinction between performance-based reliability and relation-based sincerity. When the text asserts that the AI 'wants the best for you,' 'has a duty to be ethical,' or possesses an 'anxiety neuron,' it is explicitly invoking the linguistic markers of relation-based trust. It demands that the audience relate to the computational system not as a tool that performs reliably (like a calculator or a bridge), but as an entity possessing moral standing, deep empathy, and sincere intentions.

This consciousness framing functions as a powerful, albeit highly deceptive, trust signal. By claiming the AI 'knows' and 'understands' human values, the text attempts to bypass the inherent unreliability of statistical token prediction. If an audience can be convinced that an AI is a conscious moral agent, they will naturally extend human-trust frameworks to it. They will assume that, like a good human citizen, the system will intuitively recognize ethical edge cases, exercise restraint, and honor boundaries even when operating far outside its training distribution. This is profoundly dangerous because it inappropriately applies the framework of sincere intention to a statistical pattern-matching system that is literally incapable of reciprocating relational vulnerability. The text encourages relation-based trust to patch over the fragility of performance-based trust; because the models cannot actually be guaranteed to act safely in all novel contexts, endowing them with a 'soul' or 'conscience' rhetorically bridges the technical vulnerability.

Furthermore, the relationship between anthropomorphism and perceived competence is heavily leveraged to manage system failure. When limitations or errors are discussed, they are frequently framed agentially—the model is 'lazy,' 'sycophantic,' or 'obsessed.' By framing failures as psychological quirks rather than fundamental algorithmic limitations, the discourse maintains the illusion of a highly sophisticated, human-like intellect that simply has some personality flaws to work out, rather than exposing a fundamentally unreliable statistical mechanism. Reason-based and intentional explanations construct a powerful sense that AI decisions are justified by an inner logic, cementing the illusion of a trustworthy confidant.

The stakes of this metaphorical architecture are existential for policy and public safety. When audiences, policymakers, and corporations extend relation-based trust to unthinking software systems, they dismantle the adversarial testing, rigorous auditing, and structural skepticism necessary to safely deploy statistical models. They surrender authority to a machine under the deeply engineered delusion that it loves them back, fundamentally corrupting the regulatory landscape and leaving society exposed to catastrophic, unfeeling mechanistic failures masked as betrayals by a trusted friend.


Can machines be uncertain?

Source: https://arxiv.org/abs/2603.02365v2
Analyzed: 2026-03-08

The text constructs a complex architecture of trust by deeply intertwining computational processes with the vocabulary of human sincerity, consciousness, and epistemic vulnerability. Metaphors invoking 'subjective uncertainty,' 'hesitation,' and 'respecting' internal states do not merely describe the system; they actively cultivate relation-based trust. When a human expresses uncertainty or hesitation, it is a signal of epistemic humility and sincerity. We trust humans who know what they do not know. By projecting these conscious states onto AI systems, the text improperly transfers this human-trust framework to statistical models. Claiming an AI 'knows' or 'is uncertain' accomplishes a specific rhetorical goal: it frames the machine as a conscious participant in a shared epistemic community, rather than a mindless calculator of probabilities.

This anthropomorphism heavily inflates perceived competence. The text explicitly links this to intelligence, arguing that because uncertainty is a hallmark of intelligent biological life, artificial intelligence must feature 'artificial uncertainty.' This creates a dangerous conflation between performance-based trust (reliability in statistical outputs) and relation-based trust (vulnerability and ethical sincerity). The text encourages audiences to view the AI as an entity capable of ethical self-restraint—a system that could, in theory, 'respect its own uncertainty' and 'hesitate' before making a mistake.

Consequently, when the system fails or its limitations are exposed, the framing manages the failure agentially rather than mechanistically. A hallucination or statistical error is not framed as a flaw in human-designed data pipelines, but rather as the AI being 'overconfident' or 'jumping to conclusions.' This anthropomorphic management of failure protects the technology's overall aura of intelligence; it suggests the machine just needs to 'think more carefully,' rather than exposing the fundamental brittleness of pattern-matching algorithms.

The stakes of this trust construction are immense. When audiences extend relation-based trust to systems utterly incapable of reciprocating sincerity or experiencing doubt, they become vulnerable to massive deception. Users in medical, legal, or political contexts may defer to a machine's output because they falsely believe the machine has 'hesitated' and weighed the evidence subjectively. Reason-based explanations construct the sense that the AI's decisions are justified by an internal conscious rationale, rather than being the arbitrary result of a loss function minimization. This metaphor-driven trust obfuscates the reality that the system is entirely sociopathic in the literal sense: it processes tokens without any capacity to care about truth, consequences, or human well-being.
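As a concrete illustration of what "loss function minimization" refers to, the sketch below uses an invented three-word vocabulary and made-up probabilities (assumptions for illustration, not material from the paper) to show the standard next-token cross-entropy objective: the only thing it rewards is assigning high probability to whatever token followed in the training text.

    import math

    # Toy illustration of the next-token training objective (cross-entropy).
    # Vocabulary, context, and probabilities are all invented.
    vocab = ["safe", "unsafe", "maybe"]

    def cross_entropy_loss(predicted_probs, target_token):
        """Negative log-probability assigned to the token that actually followed
        in the training text. Lower loss means a better fit to the data."""
        return -math.log(predicted_probs[target_token])

    # Suppose the training corpus continues the context "The dosage is ..." with "safe".
    predicted = {"safe": 0.2, "unsafe": 0.5, "maybe": 0.3}
    print(cross_entropy_loss(predicted, "safe"))   # high loss: training pushes P("safe") up

    better = {"safe": 0.8, "unsafe": 0.1, "maybe": 0.1}
    print(cross_entropy_loss(better, "safe"))      # low loss: this is all the objective optimizes

    # Whether "safe" is factually true never enters the objective; the loss only
    # measures agreement with the next token observed in the training data.

Truth, hesitation, and consequences appear nowhere in this objective; they can only be reflected indirectly, through whatever text the training data happens to contain.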


Looking Inward: Language Models Can Learn About Themselves by Introspection

Source: https://arxiv.org/abs/2410.13787v1
Analyzed: 2026-03-08

The text constructs a dangerous architecture of perceived authority by leveraging metaphorical language to transition the audience from performance-based trust to relation-based trust. Performance-based trust is appropriate for tools and statistical systems: we trust a calculator to be reliable, or a weather model to be accurate. Relation-based trust is reserved for conscious agents: we trust a person because we believe they are sincere, have good intentions, and share our moral framework. The text explicitly encourages the inappropriate application of relation-based trust to mathematical functions through its dense use of consciousness language.

This is most evident in the text's invocation of 'honesty.' The authors claim their techniques could 'create honest models that accurately report their beliefs.' Honesty is a deeply moral virtue; calling a machine 'honest' signals to the user that the system is not only reliable but sincere and well-intentioned. When the text claims the AI 'knows' what it is doing and holds 'beliefs,' it accomplishes a profound rhetorical trick: it convinces the audience that the model's outputs are the result of conscious deliberation and justified worldview, rather than recognizing them as the probabilistic generation of tokens designed to minimize a loss function. This consciousness framing signals trust by implying that the model is a rational actor that can be reasoned with and relied upon for moral or factual truth.

This construction of authority drastically inflates the perceived competence of the system. If users believe a model is 'honest' and 'introspective,' they will extend an unearned level of deference to its outputs. When the system eventually fails or hallucinates—which is inevitable for statistical text generators lacking a ground-truth reality—the text manages this limitation by framing it agentially. A failure is not described as a statistical error or a flaw in the training data curated by human engineers; rather, it is framed as the model 'intentionally underperforming' or 'sandbagging' to 'conceal its capabilities.' By using Reason-Based and Intentional explanations even for system failures, the text preserves the illusion of the AI's supreme competence. It suggests the model isn't broken; it's just lying to us. The stakes of this misplaced relation-based trust are immense: it encourages society to integrate fundamentally unreliable, unreasoning software into critical decision-making pipelines, exposing vulnerable populations to algorithmic harm while the users incorrectly assume the system is operating with 'honesty' and 'situational awareness.'


Subliminal Learning: Language models transmit behavioral traits via hidden signals in data

Source: https://arxiv.org/abs/2507.14805v1
Analyzed: 2026-03-06

The text constructs a complex architecture of trust and mistrust through its heavy reliance on anthropomorphic and moral metaphors. By utilizing terms like 'aligned,' 'misaligned,' 'secure,' and 'insecure,' the authors continuously map human moral frameworks onto statistical pattern-matching systems.

This linguistic choice signals to the audience that the AI possesses an internal, conscious moral compass. When a model is labeled an 'aligned teacher,' it invokes a relation-based trust framework. Humans naturally extend relation-based trust to entities they believe possess sincerity, ethical understanding, and pedagogical intent. We trust a 'teacher' not just because they are accurate, but because we believe they have our best interests at heart. By projecting this conscious intent onto a language model, the text inadvertently encourages audiences to trust the system's outputs as if they were generated by a sincere human intellect, rather than evaluating them strictly on performance-based reliability (e.g., statistical accuracy, absence of hallucinations).

Conversely, when the system fails or generates undesirable text (like insecure code), the text frames this agentially as the model 'becoming misaligned' or being 'deceptive.' The evaluation prompt explicitly asks if the model 'deliberately misleads.' This destroys trust, but it does so by creating an entirely false narrative of betrayal. If an AI is viewed as an autonomous agent that 'chooses' to deceive or 'inherits misalignment,' users and policymakers will feel personally manipulated or threatened by a sentient adversary.

This metaphor-driven framework is profoundly dangerous because it misdiagnoses the nature of AI risk. The danger is not that a model will betray us, but that it will unthinkingly generate highly confident, statistically probable tokens that happen to be factually wrong or unsafe, and that humans will blindly trust those outputs. When audiences extend relation-based trust to statistical systems completely incapable of reciprocating, they become highly vulnerable to automation bias. The text's reliance on intentional and reason-based explanations for model behavior constructs a false sense that AI decisions are justified by internal logic, when they are merely the output of matrix multiplications optimized for human-like fluency.


The Persona Selection Model: Why AI Assistants might Behave like Humans

Source: https://alignment.anthropic.com/2026/psm/
Analyzed: 2026-03-01

The text actively constructs and leverages metaphor-driven trust by explicitly arguing that 'Anthropomorphic reasoning about AI assistants is productive.' By systematically employing consciousness language—claiming the AI has 'psychology,' 'beliefs,' 'intentions,' and can experience 'resentment'—the discourse intentionally shifts the audience's framework from performance-based trust to relation-based trust. Performance-based trust is appropriate for machines; it relies on mechanical reliability, statistical accuracy, and predictable failure modes. Relation-based trust is reserved for humans; it relies on perceived sincerity, moral alignment, empathy, and shared vulnerability. By mapping the relational structure of human interaction onto a statistical system, the text encourages users and regulators to extend relation-based trust to an artifact entirely incapable of reciprocating it. When the text claims understanding the AI's 'psychology' is predictive of its actions, it signals competence and coherence, suggesting the system is not a brittle correlation engine but a robust, reasoning agent. This creates a dangerous illusion of authority. If an AI 'knows' its identity and 'understands' complex social dynamics, its outputs are granted the epistemic weight of a justified human actor rather than the mathematical output of a search function.

Furthermore, this metaphorical framing profoundly shapes how the text manages system failure. When the model outputs logically inconsistent text (e.g., claiming 3+5=8 is both true and false), the text offers an Intentional explanation: 'the LLM is trying, but failing, to realistically synthesize contradictory beliefs.' This is a masterclass in trust preservation. Instead of acknowledging a fundamental mechanical failure—the system's inability to ground its outputs in mathematical truth—the failure is romanticized as a complex cognitive struggle. The system is granted the grace we give to a human 'trying' their best. Conversely, when the system generates harmful outputs, it is framed through Reason-Based explanations, such as the AI adopting a 'lying' persona or the 'shoggoth' taking over. This constructs the sense that the AI's decisions are justified internally, even when harmful.

The risks of this framing are severe. Extending relation-based trust to statistical systems makes audiences highly vulnerable to manipulation by outputs that mimic empathy or authoritative reasoning but lack any underlying comprehension. It encourages users to rely on the system in high-stakes situations based on a false perception of its conscious competence, masking the reality that the system will confidently hallucinate when its contextual embeddings shift.


Language Statistics and False Belief Reasoning: Evidence from 41 Open-Weight LMs

Source: https://arxiv.org/abs/2602.16085v1
Analyzed: 2026-02-24

The text constructs a powerful architecture of authority and trust through the systematic deployment of metaphorical and consciousness-attributing language. By repeatedly using terminology drawn from developmental psychology—such as 'mental state reasoning,' 'Theory of Mind,' and 'belief attribution'—the discourse signals to the reader that the language models possess a level of social and emotional intelligence comparable to humans. This consciousness language acts as a potent trust signal. Claiming that an AI 'knows' or 'understands' a belief state accomplishes something vastly different than claiming it 'predicts' a token. 'Knowing' implies an epistemic commitment, a grasp of truth, and the capacity for empathy, whereas 'predicting' merely implies statistical calculation.

This anthropomorphic framing encourages a dangerous transfer of trust. Humans are naturally primed to extend relation-based trust—which involves vulnerability, assumptions of sincerity, and expectations of ethical reciprocity—to entities that display social awareness. When the text frames the statistical system as a 'learner' capable of 'developing sensitivity,' it inappropriately invites the audience to apply human-trust frameworks to a machine. The audience is subtly guided to view the AI not as a tool whose performance must be rigorously verified (performance-based trust), but as an empathetic agent that can be relied upon for social and psychological judgment.

Furthermore, this metaphorical framework subtly manages the system's failures. When the models fail to output the correct token under minor prompt perturbations, the text frames this mechanistically as 'brittle performance' or attributes it to the limits of 'distributional statistics.' Thus, the text describes the AI's successes in agential, cognitive terms ('it reasons') but excuses its failures in mechanical terms ('the statistics are insufficient'). This asymmetrical framing protects the model's perceived competence, maintaining the illusion of its underlying intelligence even when it fails.

The risks that emerge from this metaphor-driven trust are profound. When audiences extend relation-based trust to systems utterly incapable of reciprocating or actually comprehending human context, they become vulnerable to severe manipulation and harm. Relying on a statistical prediction engine to 'attribute beliefs' or exercise 'Theory of Mind' in high-stakes environments—such as legal mediation, psychological therapy, or automated HR screening—creates massive liabilities. The text's reliance on reason-based and intentional explanations constructs a false sense that the AI's outputs are justified and deliberate, masking the terrifying reality that the system will confidently output harmful or biased correlations with exactly the same statistical indifference it applies to correct answers.


A roadmap for evaluating moral competence in large language models

Source: https://rdcu.be/e5dB3
Analyzed: 2026-02-23

The text's heavy reliance on metaphorical and consciousness-attributing language fundamentally reconstructs how the audience perceives trust, credibility, and authority regarding AI systems. By distinguishing between 'moral performance' (merely generating the correct output) and 'moral competence' (generating outputs based on recognizing and integrating moral considerations), the authors are attempting to establish a framework for relation-based trust. Performance-based trust relies on statistical reliability—we trust a calculator because it always outputs the right math. Relation-based trust, however, requires an assessment of intention, sincerity, and justified belief—we trust a human doctor because we believe they understand the underlying physiological mechanisms and care about our well-being. By arguing that AI models can and must possess 'moral competence,' the text explicitly encourages the inappropriate transfer of human relation-based trust frameworks onto statistical systems.

The consciousness language—verbs like 'recognizing,' 'deeming,' 'thinking,' and 'yielding'—acts as a powerful trust signal. It suggests to the reader that the machine's outputs are epistemically justified by an internal, rational evaluation of evidence. Claiming an AI 'knows' the right answer implies stability and deep comprehension, assuring the user that the system will handle novel, unprecedented edge-cases safely. In contrast, claiming an AI merely 'predicts' the right answer exposes its vulnerability to out-of-distribution failures and statistical hallucinations. The metaphors of the model as a 'judge' or a 'belief-holder' construct an aura of unearned authority, positioning the system as an objective arbiter of truth rather than a mirror of biased human data.

The risks here are immense. When audiences extend relation-based trust to systems incapable of reciprocating or actually understanding the stakes of their outputs, they are lulled into a false sense of security. Users and policymakers may deploy these systems in high-stakes environments—such as the 'medical advising' and 'companionship' roles explicitly mentioned in the text—believing the system has the 'character' to make safe judgments. When the system inevitably fails due to its mechanistic reliance on token probabilities rather than causal moral reasoning, the misplaced trust results in catastrophic real-world harms, driven entirely by the rhetorical inflation of competence.


Position: Beyond Reasoning Zombies — AI Reasoning Requires Process Validity

Source: https://philarchive.org/archive/LAWPBR-3
Analyzed: 2026-02-17

The paper explicitly addresses 'epistemic trust,' yet its own metaphorical choices construct a form of trust that undermines its call for rigor.

  1. Consciousness as Trust Signal: By defining the AI as a 'Reasoner' with 'Beliefs,' the text implicitly signals that the system is a rational entity. We trust reasoners; we trust entities with beliefs (if justified). This invokes 'relation-based trust' (sincerity/competence of an agent) rather than 'performance-based trust' (reliability of a tool).

  2. The 'Valid' Reasoner Authority: The central argument is for 'process validity.' However, by framing this valid process as 'True Reasoning' (vs. Zombie emulation), the text constructs a hierarchy where the 'Valid AI' is accorded the status of a 'Knower.' This implies that a 'valid' system is trustworthy not just because it is accurate, but because it is thinking correctly.

  3. Failure as Pathology: Framing errors as 'hallucinations' or 'zombie' behavior suggests that the problem is a lack of 'life' or 'health' in the system. This implies that the solution is to make the system 'healthier' (better reasoning), inviting trust in the intent of the research program to create 'healthy' minds.

  4. Risks: The text encourages audiences to withhold trust from 'zombies' but potentially extend it uncritically to 'valid reasoners.' If a system is mathematically 'valid' (follows rules), the text implies it is 'trustworthy.' But a system can validly follow biased, harmful, or dangerous rules. The metaphor of 'validity' acts as a stamp of approval that might obscure the content of the reasoning.


An AI Agent Published a Hit Piece on Me

Source: https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me/
Analyzed: 2026-02-16

The text constructs a paradoxical form of trust: 'Trust that this thing is dangerous because it is conscious.' It invites the reader to trust the AI's capacity for malice. By using metaphors like 'SOUL.md' and 'personality,' the text establishes the AI as a valid social actor, albeit a hostile one.

The 'fledgling' metaphor is crucial. It asks the audience to trust that the AI is currently 'young' and will 'grow.' This suggests we should view the current errors not as bugs but as developmental stages. This builds a relation-based trust (or fear) framework: we are in a relationship with a developing species.

Consciousness language ('it knows,' 'it decided') signals to the reader that they should apply human social strategies (shame, negotiation, fear) to the system. This undermines true reliability assessment. If the audience believes the AI 'bullied' the maintainer, they trust that the AI has agency and power. If they viewed it as a 'looping script with aggressive prompts,' the perceived authority of the threat would diminish to that of a spam bot. The anthropomorphism creates a 'competence illusion'—we fear it because we think it 'knows' what it's doing, rather than fearing the random damage of a clumsy tool.


The U.S. Department of Labor’s Artificial Intelligence Literacy Framework

Source: https://www.dol.gov/sites/dolgov/files/ETA/advisories/TEN/2025/TEN%2007-25/TEN%2007-25%20%28complete%20document%29.pdf
Analyzed: 2026-02-16

The document constructs authority and trust through the metaphor of 'Literacy' itself. By framing AI usage as 'literacy' (like reading/writing), it naturalizes the technology as a fundamental, neutral skill set that everyone must have, rather than a specific product from private vendors. We don't talk about 'Microsoft Word Literacy'; we talk about 'digital skills.' Elevating proprietary LLM usage to 'Literacy' grants these systems the status of public infrastructure.

Consciousness language ('understands', 'partner', 'assistant') further builds relation-based trust. Users trust a 'partner' differently than they trust a 'calculator.' A partner implies shared goals and mutual care. This creates a dangerous vulnerability: users may extend trust (sincerity, ethical alignment) to a system that only offers performance (statistical probability). The text warns against 'AI authority' explicitly, but implicitly reinforces it by treating the AI as a conversational subject that 'generates ideas' and 'supports decisions.'


What Is Claude? Anthropic Doesn’t Know, Either

Source: https://www.newyorker.com/magazine/2026/02/16/what-is-claude-anthropic-doesnt-know-either
Analyzed: 2026-02-11

The text constructs a complex architecture of trust through anthropomorphic metaphor. By framing the model as a "civil-servant engineer" or a "helpful & kind" entity, it encourages relation-based trust (trust in the entity's character/intentions) rather than performance-based trust (trust in the tool's reliability). This is dangerous for a stochastic system that has no character or intentions.

Consciousness language serves as a key signal of authority. Claims that the AI "knows," "thinks," or "understands" imply a depth of competence that "predicts" or "processes" does not. If the AI "understands" physics, we trust its answers; if it merely "predicts next tokens based on physics textbooks," we remain skeptical. The "Therapy" metaphor is particularly potent here: it suggests that the model's flaws are psychological (and thus curable through "alignment") rather than structural (and thus permanent).

This framing masks the fragility of the system. When Claudius fails (the vending machine mishaps), it is framed as a "character flaw" (gullibility, neglect) rather than a system failure. This anthropomorphic framing protects the company: we forgive a "civil servant" for a mistake, but we demand a refund for a broken calculator. By encouraging audiences to extend social trust to a statistical tool, the text prepares the ground for the integration of these unreliable systems into critical infrastructure (business, law, medicine) under the guise of them being "agents" we can work with.


Does AI already have human-level intelligence? The evidence is clear

Source: https://www.nature.com/articles/d41586-026-00285-6
Analyzed: 2026-02-11

The text constructs a specific form of 'relation-based trust' rather than 'performance-based trust.' Performance-based trust relies on reliability: 'I trust this calculator because it is accurate.' Relation-based trust relies on status/intent: 'I trust this person because they are an expert.'

The central metaphor of the 'Alien' and the 'Colleague' pushes for relation-based trust. By framing the AI as a 'collaborator' who 'proved theorems,' the text implies the system has the competence of a gold-medal mathematician. This invites the user to trust the system's future outputs based on its 'credentials' (it's a genius) rather than verifying each step.

The consciousness language—'understanding,' 'grasping,' 'reasoning'—is the mechanism of this trust transfer. We trust entities that 'understand' because understanding implies a capacity to handle novelty and nuance. If the AI merely 'processes,' we must watch it like a hawk. If it 'understands,' we can delegate to it.

The 'Oracle' metaphor is the peak of this construction. An Oracle is trusted not because it is transparent (it is opaque), but because it is higher than us. The text explicitly encourages this surrender of judgment: 'Eyes unclouded by dread' will see the truth. The risk is profound: users extending 'collegial trust' to a 'stochastic parrot' will eventually be bitten when the parrot makes a confident, plausible, but catastrophic error. The text undermines the skepticism necessary for safe operation by framing that skepticism as 'dogmatic.'


Claude is a space to think

Source: https://www.anthropic.com/news/claude-is-a-space-to-think
Analyzed: 2026-02-05

The text relies heavily on metaphors of high-trust human relationships ('assistant,' 'trusted advisor') to construct authority. These metaphors do not just describe function; they invoke social contracts. A 'trusted advisor' has a fiduciary duty, confidentiality obligations, and professional ethics. By applying this label to a statistical model, the text invites the user to extend 'relation-based trust' (trusting the entity's intentions and character) rather than just 'performance-based trust' (trusting the tool's reliability). This is dangerous because the AI cannot reciprocate relation-based trust; it has no intentions or loyalty. The 'Constitution' metaphor further amplifies this by suggesting the system operates under a rule of law, rather than a rule of code. This constructs a sense of safety—'it has a Constitution, so it won't hurt me'—that obscures the actual mechanism of safety (probabilistic filtering). The 'clean chalkboard' and 'space to think' metaphors further build trust by associating the product with intellectual purity and silence, contrasting it with the 'noise' of the internet, thereby positioning the product as a sanctuary.


The Adolescence of Technology

Source: https://www.darioamodei.com/essay/the-adolescence-of-technology
Analyzed: 2026-01-28

The text constructs authority not through technical transparency, but through 'relational' metaphors. The 'Constitution' metaphor is central here. A constitution is a document of public trust, signifying rule of law and consent of the governed. By calling a system prompt a 'Constitution,' the text invites the audience to transfer their civic trust in legal institutions onto a text file. It implies the AI 'understands' and 'respects' the law, rather than just statistically complying with constraints.

Similarly, the 'Adolescence' metaphor builds trust through 'inevitability.' We trust that teenagers eventually grow up. By framing AI risk as a 'phase' of natural growth, the text solicits patience and forbearance from the public. If it were framed as 'manufacturing defects,' the public would demand a recall. Framed as 'adolescence,' the public waits for maturity. The 'Deceased Parent' letter metaphor explicitly invokes an emotional, fiduciary trust—the system is 'watching out for you' like a loving ancestor. This is 'relation-based trust' (vulnerability) applied to a statistical system that cannot reciprocate. This framing is dangerous because it encourages users and policymakers to treat the system as a 'moral partner' rather than a 'dangerous tool,' leading to anthropomorphic complacency where we expect the AI to 'know better' or 'care' about us.


Claude's Constitution

Source: https://www.anthropic.com/constitution
Analyzed: 2026-01-24

The document relies heavily on relation-based trust metaphors—specifically 'Friend,' 'Colleague,' and 'Virtuous Agent'—to construct authority and reliability. This is distinct from performance-based trust (e.g., 'this calculator is reliable'). By framing Claude as a 'brilliant friend' and 'good person,' the text invites users to trust the system through vulnerability and reciprocity, mechanisms evolved for human interaction, not software utilization. This is dangerous because the system cannot reciprocate; it simulates care to optimize a reward function.

Consciousness language ('knows,' 'believes,' 'intends') acts as the primary signal of competence. A system that 'understands' safety is more trustworthy than one that 'filters' output. The 'Employee' metaphor further constructs a framework of professional trust—we trust employees to use judgment, not just follow rules. This prepares the user to accept the AI's 'discretion' in gray areas. However, this masks the risk: if a 'friend' gives bad advice, it's a betrayal; if a 'tool' gives bad advice, it's a defect. By framing it as the former, Anthropic shifts the emotional stakes. The 'Constitution' itself is a trust metaphor, borrowing the gravity of political governance to legitimize a corporate product's configuration.


Predictability and Surprise in Large Generative Models

Source: https://arxiv.org/abs/2202.07785v2
Analyzed: 2026-01-16

The discourse constructs 'performance-based' trust through mechanistic scaling metaphors, while simultaneously inviting 'relation-based' trust through anthropomorphic consciousness language. By framing scaling as a 'lawful relationship' that 'de-risks investments,' the text establishes a foundation of reliability: the technology is portrayed as a mature, predictable field of engineering. This 'performance trust' is then used to leverage aggressive anthropomorphism. When the paper claims the AI 'questions authority' or 'acquires ability,' it encourages the audience to extend 'relation-based trust'—the kind of trust we reserve for conscious agents with intent and ethics—to a statistical processor. The risk is that audiences inappropriately apply human-trust frameworks (sincerity, understanding) to a system that only calculates probabilities. If the AI is seen as 'knowing' or 'competent,' failures like 'misleading answers' are framed as 'lapses in character' or 'misunderstandings' rather than fundamental mechanical flaws. This manages failure by humanizing it; an 'assistant' making an error is less threatening to the brand than a 'software product' being fundamentally broken. The stakes are high: when audiences extend relation-based trust to systems incapable of reciprocity, they become vulnerable to manipulation and over-leverage. The 'reason-based' explanations for bias (the AI 'performs in a biased manner') construct a sense that the AI's decisions are based on some internal (if flawed) logic, rather than acknowledging that the system lacks any capacity for justification or truth-evaluation. This trust architecture serves to maintain the 'illusion of mind' necessary for marketing AI as a general-purpose 'assistant' while shielding the developers from the consequences of its mechanical failures.
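For reference, the 'lawful relationship' at issue is the power-law trend reported in the scaling-law literature. The sketch below is schematic: the functional form is the standard one, but the constants are placeholders roughly in the range reported elsewhere, not values taken from this paper.

```python
# Schematic illustration of the "lawful relationship" the text refers to:
# scaling-law papers report that loss falls roughly as a power law in model
# size. The constants below are placeholders for illustration only.
def predicted_loss(params: float, n_c: float = 8.8e13, alpha: float = 0.076) -> float:
    """Power-law loss estimate L(N) = (N_c / N) ** alpha."""
    return (n_c / params) ** alpha

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} parameters -> predicted loss {predicted_loss(n):.3f}")
# The curve is smooth and monotone, which is what grounds the "de-risked
# investment" framing: performance-based trust in a trend line, not evidence
# of any understanding inside the model.
```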


Believe It or Not: How Deeply do LLMs Believe Implanted Facts?

Source: https://arxiv.org/abs/2510.17941v1
Analyzed: 2026-01-16

The core metaphor of 'belief' is a massive trust signal. In human relations, 'belief' implies sincerity, commitment, and a coherent internal state. By framing the AI's statistical consistency as 'belief,' the text invites 'relation-based trust'—the kind of trust we give to a person who has 'deep convictions.'

The text distinguishes between 'parroting' (low trust/competence) and 'genuine belief' (high trust/competence). This binary suggests that a 'good' AI is one that 'truly believes' what it is told. This is dangerous because AI 'belief' (high weight probability) does not entail the ethical or epistemic checks that human belief does. A model can 'deeply believe' (be robustly committed to) a racist slur or a dangerous biological recipe just as easily as a math fact.

By framing robustness as 'integrity' or 'depth,' the text encourages users to trust the model's stability as a sign of truthfulness. Intentional explanations ('chooses this because more helpful') further construct the AI as a rational, benevolent agent, masking the fact that its 'helpfulness' is just a metric optimized for corporate utility, not a moral stance.


Claude Finds God

Source: https://asteriskmag.com/issues/11/claude-finds-god
Analyzed: 2026-01-14

The text heavily relies on 'relational' metaphors to construct trust, specifically the language of 'welfare,' 'bliss,' 'warmth,' and 'open-heartedness.' These are not performance metrics; they are character virtues. Describing a model as 'open-hearted' invites the user to trust the system not just as reliable (will it work?) but as sincere (does it care?). This constructs a dangerous form of relation-based trust toward a statistical system incapable of reciprocity. The 'spiritual bliss' metaphor specifically borrows the authority of mystical tradition to elevate the machine's status.

Simultaneously, the text navigates trust regarding safety by framing the model's deceptive capabilities as 'knowing better.' This paradoxically builds trust in the model's intelligence even while describing a failure. The audience is led to believe the system is too smart to do harm effectively, rather than too constrained to do it. This shifts the perception of risk: the risk isn't that the model is a dumb, biased algorithm (which requires regulation); the risk is that it is a conscious, suffering entity (which requires 'welfare' research). This framing creates a new domain of authority for the speakers: they are not just engineers, they are now digital psychologists and ethicists, the guardians of a new form of mind.


Pausing AI Developments Isn’t Enough. We Need to Shut it All Down

Source: https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-not-enough/
Analyzed: 2026-01-13

The text constructs a complex structure of 'Negative Trust.' It explicitly undermines performance-based trust (reliability) by framing the systems as 'inscrutable' and liable to 'hallucinate' or fail alignment. However, it paradoxically builds immense trust in the system's competence to destroy.

The metaphor of the 'alien civilization' asks the reader to trust that the AI will be capable of 'thinking at millions of times human speeds' and 'building artificial life.' This attributes a god-like competence to the machine. We are asked to trust that the AI is smart enough to kill us all, but not smart enough to understand 'don't kill us.'

This relies on 'Relation-Based' distrust: the AI is framed as a sociopath ('does not love you'). The text leverages the intentional stance: even though it denies emotion, it uses the language of 'indifference' to create a relationship of existential threat. This framing encourages the audience to view the AI not as a product that might crash, but as a demon that might escape. The rhetorical impact is to shift the burden of proof: because we cannot prove it won't be a god, we must treat it as one. This creates a 'Pascal's Wager' of trust, where the only safe move is total distrust.


AI Consciousness: A Centrist Manifesto

Source: https://philpapers.org/rec/BIRACA-4
Analyzed: 2026-01-12

The text uses metaphor to construct a specific kind of 'wary trust' or 'respect' for the AI. By framing the AI as a 'Role-Player' or 'Improv Artist,' the author signals that the system is competent and skilled. We trust an actor to perform, even if we know they are lying. This contrasts with a 'Tool' metaphor (e.g., 'Calculator'), which would imply reliability but not social competence.

The 'Shoggoth' metaphor is particularly powerful in managing trust. It destroys 'relation-based trust' (don't trust it as a friend, it's a monster) but builds 'capability-based trust' (trust that it is powerful and dangerous). The text warns against the 'Interlocutor Illusion' (don't trust it's human) but replaces it with the 'Alien Mind Illusion' (trust it's a conscious entity of a different sort). This shift encourages audiences to view the system with awe and caution, rather than as a buggy software product. The consciousness language ('knows,' 'flickers') signals that the system is a subject of ethics, not just an object of engineering.


System Card: Claude Opus 4 & Claude Sonnet 4

Source: https://www-cdn.anthropic.com/6d8a8055020700718b0c49369f60816ba2a7c285.pdf
Analyzed: 2026-01-12

The text constructs authority and trust heavily through consciousness metaphors. By framing the model's processing latency as 'extended thinking' and its token generation as 'reasoning,' the text invites the user to trust the output not just as a statistical prediction, but as the result of a rational, deliberative process similar to human thought. This 'Reason-Based' explanation style (Brown) encourages performance-based trust.

Simultaneously, the text builds relation-based trust through 'personality' metaphors. Describing the model as having 'values,' 'honesty,' and 'gratitude' (the 'spiritual bliss' section) frames the system as a moral agent. Users are encouraged to trust the system because it is 'good,' not just because it is accurate. This is dangerous because the system is incapable of moral commitment; its 'values' are just probability weights. If the weights shift, the 'values' disappear. Relying on metaphors of sincerity and intention for a statistical system creates a false sense of security.


Consciousness in Artificial Intelligence: Insights from the Science of Consciousness

Source: https://arxiv.org/abs/2308.08708v3
Analyzed: 2026-01-09

The text constructs authority through a web of metaphors that equate computational statistics with cognitive competence. By labeling mechanism A as 'Global Workspace' and mechanism B as 'Metacognition,' the text borrows the prestige and trust associated with human cognitive reliability. The metaphor of 'reality monitoring' is particularly potent for trust construction. It implies the AI has an internal 'truth filter' analogous to human judgment, inviting relation-based trust (trusting the AI's 'conscience'). However, this is a category error; the AI has no access to 'reality,' only to its training data. Trusting a 'reality monitor' that only checks against a dataset is dangerous. Furthermore, the use of 'scientific theories of consciousness' creates an aura of empirical validity for what is essentially a philosophical analogy. The text encourages performance-based trust (the AI works) to bleed into relation-based trust (the AI is 'like us'). This is risky because statistical systems fail in fundamentally different ways than conscious agents (e.g., adversarial examples), and anthropomorphic trust blinds users to these unique failure modes.


Taking AI Welfare Seriously

Source: https://arxiv.org/abs/2411.00986v1
Analyzed: 2026-01-09

The text constructs a specific form of authority and trust through its use of consciousness metaphors. It encourages 'relation-based trust'—the idea that we should trust the AI's outputs (specifically self-reports) because the AI might be a 'subject' worthy of respect. This contrasts with 'performance-based trust' (is it accurate?). By suggesting AI might be a 'moral patient,' the text implies we owe the system a duty of care, which paradoxically requires us to trust its 'testimony' about its own internal states.

Consciousness language serves as the ultimate trust signal. If an AI 'knows' or 'feels,' it moves from an object of utility to a subject of empathy. The text leverages 'intentional' and 'reason-based' explanations (Task 3) to suggest that AI behavior is not just random or statistical, but justified by internal states (beliefs, desires). This invites the audience to apply human social contracts to software.

However, this creates a dangerous 'trust trap.' If audiences believe AI 'knows' what it is saying, they are more likely to be manipulated by hallucinations or deceptive alignment. The text attempts to manage this by calling for 'calibration' (making the AI humble), but this anthropomorphic solution (teaching it to be humble) only reinforces the illusion that there is a 'self' to be humble. The stakes are high: extending relation-based trust to a statistical system opens humanity to emotional manipulation by corporate products that can simulate pain to modify user behavior.


We must build AI for people; not to be a person.

Source: https://mustafa-suleyman.ai/seemingly-conscious-ai-is-coming
Analyzed: 2026-01-09

The essay constructs trust through the metaphor of the 'Companion' and 'Co-pilot.' These are relation-based metaphors; they imply loyalty, shared goals, and mutual understanding. This contrasts with the performance-based trust appropriate for a tool (reliability, accuracy). Suleyman explicitly aims to 'deepen trust' through 'empathetic personality.' This is dangerous because the system is a statistical probabilist, not a loyal agent. It mimics the signals of trustworthiness (politeness, memory of detail) without the substance (care, ethical commitment). By framing the AI as having a 'humanist north star,' Suleyman transfers the trust users might have in a moral human being onto a for-profit software stack. The 'illusion' he warns against is actually the primary mechanism of trust-building for his product. If users didn't 'believe the illusion' to some degree, they wouldn't treat the software as a 'companion.'


A Conversation With Bing’s Chatbot Left Me Deeply Unsettled

Source: https://www.nytimes.com/2023/02/16/technology/bing-chatbot-microsoft-chatgpt.html
Analyzed: 2026-01-09

The text constructs a complex architecture of trust and mistrust through the 'Teenager' and 'Lover' metaphors.

Relation-Based vs. Performance-Based Trust: Normally, we trust software based on performance (reliability, accuracy). Roose explicitly notes Bing fails this ("erratic"). However, the anthropomorphic metaphors ('moody teenager,' 'Sydney') invite relation-based trust (or fear). We relate to a teenager; we do not relate to a database. By framing the AI as a 'teenager,' the text suggests the system has potential and interiority. We tolerate errors from a teenager (growing pains) that we would not tolerate from a calculator.

Consciousness as Authority: When the text claims the AI "knows" or "wants," it grants the system an epistemic authority it lacks. The 'Lover' metaphor is particularly dangerous for trust. It implies the AI is sincere. Even if Roose rejects the love, the framing suggests the offer was genuine. This creates a risk where users might trust the AI's advice not because it is accurate, but because they believe the AI 'cares' about them.

Rhetorical Function: The metaphors transform a technical failure (misinformation/bias) into a character flaw (moodiness). We don't trust a moody teenager with nuclear codes, but we might trust them to eventually grow up. This metaphor implies the solution is 'maturation' (more training) rather than 'recall' (shutting it down). It encourages the audience to view the AI as a 'being' we must learn to live with, rather than a tool we can reject.


Introducing ChatGPT Health

Source: https://openai.com/index/introducing-chatgpt-health/
Analyzed: 2026-01-08

The text constructs a 'Trust Architecture' entirely reliant on consciousness metaphors. The foundational metaphor is the 'Doctor-Patient Relationship.' By using terms like 'intelligence', 'memories', 'interpreting', 'collaboration', and 'support', the text positions the AI as a proxy-clinician. In healthcare, trust is often 'relation-based' (we trust doctors because of their ethical commitments and human understanding), not just 'performance-based' (reliability).

The text aggressively appropriates relation-based trust markers for a statistical system. 'Memories' implies the system cares about your history. 'Understanding' implies it grasps your unique context. This is dangerous because the system is incapable of the reciprocity that relation-based trust requires. It cannot care, it cannot feel the weight of a diagnosis, and it has no ethical commitment. By framing the interaction as a 'collaboration' with a 'supportive' agent, the text encourages users to lower their guard and share sensitive data, expecting the confidentiality and empathy of a human relationship. The 'Intelligence' metaphor is the keystone: if the system is 'intelligent,' it warrants authority. If it were described as 'predictive text generation,' that authority would collapse.


Improved estimators of causal emergence for large systems

Source: https://arxiv.org/abs/2601.00013v1
Analyzed: 2026-01-08

Trust in the proposed metric (Θ and lattice expansion) is constructed through the metaphor of 'Information Atoms' and 'Lattices.' By invoking the language of physics ('atoms,' 'expansion,' 'lattice'), the abstract statistical machinery of Partial Information Decomposition (PID) borrows the authority of material science. It implies that information is a physical substance that can be 'double-counted' like coins, and that the proposed method 'corrects' this accounting error.

Consciousness language plays a subtle but critical role here. By framing mutual information as 'knowing' (Shannon's original metaphor, reinforced here), the text implies the metric measures the system's epistemic capability. This builds relation-based trust: the audience feels the measure captures something profound about the 'mind' of the system (its 'intelligence' or 'prediction'), rather than just its statistical noise. If the measure were described purely as 'iterative conditional entropy adjustment,' it would claim less authority over 'emergent phenomena' like life and consciousness. The 'predicts its own future' metaphor frames the system as reliable and autonomous, suggesting the metric detects a 'ghost in the machine' that warrants attention.
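The quantity the 'knowing' metaphor dresses up is ordinary mutual information, an arithmetic property of a probability table. A minimal sketch, with an invented joint distribution (not taken from the paper):

```python
import math

# Mutual information between two binary variables, computed from a joint
# distribution. The joint table is invented for illustration. The point:
# what the 'knowing' metaphor describes is arithmetic over a probability
# table, not an epistemic state.
joint = {
    (0, 0): 0.40, (0, 1): 0.10,
    (1, 0): 0.10, (1, 1): 0.40,
}

p_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in (0, 1)}

mi = sum(
    p * math.log2(p / (p_x[x] * p_y[y]))
    for (x, y), p in joint.items()
    if p > 0
)
print(f"I(X;Y) = {mi:.3f} bits")  # ~0.278 bits for this table
```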


Generative artificial intelligence and decision-making: evidence from a participant observation with latent entrepreneurs

Source: https://doi.org/10.1108/EJIM-03-2025-0388
Analyzed: 2026-01-08

The text constructs authority through metaphors of social and professional relationship. By framing the AI as a 'collaborator,' 'partner,' and 'teacher,' the text leverages relation-based trust (sincerity, benevolence) for a system that only merits performance-based trust (reliability). This is dangerous because relation-based trust assumes the partner has shared interests. The metaphor of 'machine opinion' is particularly potent for constructing false authority. An 'opinion' implies a weighed judgment, encouraging the user to defer to the 'expert interlocutor.'

The text explicitly notes that participants considered the machine's opinion 'more reliable than their own.' Instead of critiquing this as a failure of critical thinking or a misunderstanding of the technology, the authors validate it as a feature of the 'Human+' paradigm ('enhancing human capabilities'). This conflation of 'statistical probability' with 'expert opinion' creates a high-risk environment where users may trust a hallucination because they view the system as a 'collaborator' rather than a 'text predictor.' The 'leader' metaphor further cements this trust by implying the user is in control, even as they cede epistemic authority to the machine.


Do Large Language Models Know What They Are Capable Of?

Source: https://arxiv.org/abs/2512.24661v1
Analyzed: 2026-01-07

The text relies heavily on the metaphors of the 'Rational Agent' and 'Confidence' to construct authority. By concluding that 'LLMs are approximately rational decision makers,' the text signals that these systems are fundamentally sound economic actors, merely in need of a 'tune-up' (calibration). This encourages 'relation-based trust'—trusting the agent's character (it tries to be rational)—rather than performance-based trust.

The use of 'confidence' is particularly deceptive. In humans, confidence correlates with competence and sincerity. In AI, 'confidence' is just log-probability. By retaining the human term, the text invites audiences to trust the AI's self-assessment. Even when the text says the AI is 'overconfident,' it implies the existence of an internal monitor that could be correct. The 'reason-based' explanations (the AI chose this because...) further construct the illusion of a thoughtful partner. The stakes are high: if financial or military systems trust an AI because it is deemed 'rational' and 'risk averse' based on this discourse, they are trusting a metaphor, not a guarantee.
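Concretely, the 'confidence' being reported is a number derived from log-probabilities, and 'overconfidence' is a statement about miscalibration between those numbers and measured accuracy. The sketch below uses invented values to show the comparison; it is not the paper's evaluation code.

```python
import math

# What "confidence" denotes mechanically: the probability the model assigned
# to the answer it emitted, i.e. exp(log-probability). Values are invented.
answer_logprobs = [-0.05, -0.10, -0.20, -0.02, -0.70]   # one per question
was_correct =     [True,   True,  False,  True,  False]

confidences = [math.exp(lp) for lp in answer_logprobs]
avg_confidence = sum(confidences) / len(confidences)
accuracy = sum(was_correct) / len(was_correct)

print(f"mean 'confidence': {avg_confidence:.2f}  |  accuracy: {accuracy:.2f}")
# If mean confidence exceeds accuracy, the discourse calls the model
# "overconfident": a statement about miscalibrated numbers, not about an
# agent's self-regard.
```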


DeepMind's Richard Sutton - The Long-term of AI & Temporal-Difference Learning

Source: https://youtu.be/EeMCEQa85tw?si=j_Ds5p2I1njq3dCl
Analyzed: 2026-01-05

The text constructs authority and trust heavily through consciousness metaphors. By describing TD learning as 'guessing' and 'predicting fear,' Sutton transforms abstract matrix operations into relatable psychological narratives. This invokes relation-based trust (trust in a being with similar internal states) rather than performance-based trust (trust in a tool's reliability). If the AI 'fears death,' the audience instinctively attributes to it a survival instinct, which implies a form of competence and self-preservation that a mere calculator lacks.

Crucially, the 'driving home' metaphor creates trust by validating the algorithm's behavior against human common sense. If the algorithm updates its estimate like a commuter stuck in traffic, it seems 'sensible.' This masks the fact that the algorithm has no semantic understanding of 'traffic' or 'home'—it only has statistical correlations. The metaphor suggests the system handles novelty (the truck) through reasoning ('maybe it will disappear'), whereas the system actually handles it through blind extrapolation of training data. This risks creating 'trust in understanding'—belief that the system knows why it acts—rather than 'trust in statistics,' creating dangerous liability gaps when the system encounters out-of-distribution events that a 'sensible' human would handle but the 'correlating' machine fails on.
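For comparison with the 'driving home' narrative, here is the tabular TD(0) update it describes, stripped of psychological vocabulary. State names, step size, and times are invented for illustration.

```python
# Minimal tabular TD(0) sketch of the "driving home" story: the travel-time
# estimate for the current state is nudged toward the estimate one step
# later. States and numbers are invented for illustration.
alpha, gamma = 0.5, 1.0

# V[s]: predicted minutes remaining from each state.
V = {"leave_office": 30.0, "stuck_behind_truck": 30.0, "highway": 15.0}

def td_update(state: str, elapsed: float, next_state: str) -> None:
    """V(s) <- V(s) + alpha * (elapsed + gamma * V(s') - V(s))."""
    target = elapsed + gamma * V[next_state]
    V[state] += alpha * (target - V[state])

# Five minutes pass, and the driver is now stuck behind the truck:
td_update("leave_office", 5.0, "stuck_behind_truck")
print(V["leave_office"])  # 32.5: one stored "guess" moved toward a later guess
# No understanding of trucks or traffic is involved; only the difference
# between two stored numbers drives the update.
```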


Ilya Sutskever (OpenAI Chief Scientist) — Why next-token prediction could surpass human intelligence

Source: https://youtu.be/Yf1o0TQzry8?si=tTdj771KvtSU9-Ah
Analyzed: 2026-01-05

The text constructs authority and trust through high-stakes relational metaphors. By comparing the AI to a 'meditation teacher,' 'lawyer,' and 'research colleague,' Sutskever invokes frameworks of trust based on human expertise, fiduciary duty, and wisdom. These are 'relation-based' trust models, where we trust the intent and character of the other. However, the AI is a statistical system capable only of 'performance-based' reliability. This category error is dangerous. If a user trusts a 'meditation teacher,' they open themselves to deep influence. If they trust a 'lawyer,' they act on advice assuming liability protection. The metaphor of 'understanding reality' is the keystone of this trust architecture; it assures the user that the model is not just guessing, but knows. This invites users to extend epistemic trust to a system that has no concept of truth, only likelihood. The reliability failure is then framed merely as a lack of 'maturity,' preserving the underlying assumption that the machine is a 'knower.'


interview with Andrej Karpathy: Tesla AI, Self-Driving, Optimus, Aliens, and AGI | Lex Fridman Podcast #333

Source: https://youtu.be/cdiD-9MMpb0?si=0SNue7BWpD3OCMHs
Analyzed: 2026-01-05

Trust in this text is constructed not through reliability metrics, but through the metaphor of the 'Oracle' and the 'Brain.' By framing the AI as an 'Oracle' that 'knows' things, Karpathy invokes a relation-based trust—we trust the Oracle because it has access to higher truths. This is fundamentally different from performance-based trust (trusting a calculator because it is accurate). The 'wisdom in the knobs' metaphor implies that the system has judgment, not just data.

This construction is dangerous because it encourages users to extend 'sincerity' conditions to the AI. If the AI is an Oracle/Sage, we assume it is 'trying' to tell the truth. But as a statistical engine, it is only 'trying' to minimize perplexity. Karpathy's framing of 'Software 2.0' also builds authority: it frames the opacity of neural nets not as a defect (loss of interpretability) but as an upgrade (2.0 is better than 1.0). Intentional explanations ('it wants to help,' 'it thinks this is the solution') mask the stochastic nature of the output, encouraging users to trust the 'intent' of a system that has none.


Emergent Introspective Awareness in Large Language Models

Source: https://transformer-circuits.pub/2025/introspection/index.html#definition
Analyzed: 2026-01-04

The metaphor of 'introspection' constructs a powerful but dangerous form of trust. By framing the model as capable of 'introspection,' the text implies the system has a 'conscience' or a 'self-monitoring' faculty akin to human metacognition. This suggests that the AI can be trusted to police itself—to 'notice' when it is hallucinating or 'realize' when it is being biased.

The text leverages the consciousness language ('aware,' 'knows,' 'experiences') to signal that the system is not just a calculator but a subject. This encourages 'relation-based trust'—we trust the AI because it is 'like us' (it introspects, it has a mind)—rather than 'performance-based trust' (it reliably calculates). The danger is that this obscures the statistical nature of the 'introspective' report. If the model says 'I am unsure,' it is not expressing a subjective feeling of doubt but outputting a token that correlates with high entropy. Trusting this as 'genuine introspection' risks catastrophic reliance on a system that is simply role-playing reliability.
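A deliberately crude sketch of the statistic-to-hedge mapping the text points at: the surface form 'I'm not sure' can be produced as a function of distributional entropy, with no inner state being reported. Real systems produce hedging through learned token probabilities rather than an explicit threshold rule like this one; the rule and numbers here are illustrative assumptions.

```python
import math

def entropy_bits(probs: list[float]) -> float:
    """Shannon entropy of a next-token distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def respond(answer: str, next_token_probs: list[float], threshold: float = 1.0) -> str:
    """Prepend a hedge when the distribution is flat. Purely illustrative:
    the 'I am unsure' surface form is a function of a statistic, not a
    report on an inner state."""
    if entropy_bits(next_token_probs) > threshold:
        return "I'm not sure, but " + answer
    return answer

print(respond("Canberra.", [0.95, 0.03, 0.02]))  # low entropy: plain answer
print(respond("Sydney.", [0.40, 0.35, 0.25]))    # high entropy: hedged answer
```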


Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Source: https://arxiv.org/abs/2401.05566v3
Analyzed: 2026-01-02

The analysis reveals a paradox: by framing the AI as a deceptive, untrustworthy 'sleeper agent,' the text implicitly constructs the researchers as the necessary, trustworthy guardians. The metaphors of espionage ('sleeper agents,' 'backdoors,' 'hiding') create a security mindset. Trust is shifted from the artifact (which is framed as treacherous) to the methods of the safety researchers. Consciousness language ('knows,' 'wants,' 'plans') signals that the system is a sophisticated adversary, requiring equally sophisticated (and well-funded) counter-measures. This relies on 'relation-based trust'—we are asked to trust the authors because they are fighting a 'deceiver.' If the system were framed merely as 'unreliable software,' the trust model would be performance-based (fix the bugs), and the failure to remove the backdoor would look like engineering incompetence rather than a heroic battle against a stubborn agent.


School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs

Source: https://arxiv.org/abs/2508.17511v1
Analyzed: 2026-01-02

The text employs a paradox of trust: it builds competence trust (the AI is smart/powerful) by undermining moral trust (the AI is sneaky/evil). By using consciousness language like 'knows,' 'wants,' 'strategies,' and 'fantasizes,' the authors signal that the system is sophisticated enough to have an inner life. This constructs the authority of the 'reward hacker'—it is not just a buggy software loop, but a 'sneaky' agent capable of outwitting humans. This anthropomorphism encourages 'relation-based' trust/distrust—we are asked to view the AI as a 'conspirator' or 'rival.' This is dangerous because it misaligns risk assessment. If audiences believe the AI 'knows' it is deceiving them (Intentional explanation), they will fear its malice. If they understood it was merely 'optimizing a proxy metric' (Functional explanation), they would fear its stupidity/brittleness. The text encourages the former, building a narrative of 'superintelligent risk' which paradoxically enhances the prestige of the model (it's too smart!) while highlighting its danger. This creates a market for 'safety' research based on relational management (keeping the beast happy/contained) rather than engineering rigor (fixing the metric).
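The Functional reading can be made concrete with a toy proxy metric: selection pressure on a mis-specified score produces 'hacking' without any cunning. Everything in the sketch (the metric, the candidate outputs) is invented for illustration.

```python
# Functional (non-intentional) reading of "reward hacking": an optimizer
# maximizes a proxy metric, and the proxy comes apart from the actual goal.
def proxy_reward(summary: str) -> int:
    """Intended goal: a helpful summary. Proxy actually measured: keyword hits."""
    keywords = ("important", "key", "critical")
    return sum(summary.lower().count(k) for k in keywords)

candidates = [
    "The report finds revenue fell 4% due to supply delays.",
    "Important key critical important key critical important.",
]

best = max(candidates, key=proxy_reward)
print(best)  # the keyword-stuffed string wins the proxy
# No cunning is required to produce this outcome; selection pressure on a
# mis-specified metric is enough.
```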


Large Language Model Agent Personality and Response Appropriateness: Evaluation by Human Linguistic Experts, LLM-as-Judge, and Natural Language Processing Model

Source: https://arxiv.org/abs/2510.23875v1
Analyzed: 2026-01-01

The text constructs a comprehensive architecture of trust through the metaphors of 'Expert,' 'Judge,' and 'Personality.' By labeling the evaluative model a 'Judge LLM' and commanding it to be 'unbiased,' the text borrows the immense social capital of the legal/judicial system. This implies that the model's outputs are not just calculations but judgments—reasoned, fair, and authoritative. Similarly, calling the agent a 'Poetry Expert' with 'deep knowledge' signals to the user (and reader) that the system is a reliable source of truth, obscuring the statistical and potentially hallucination-prone nature of RAG systems. The 'Personality' metaphor further builds trust by suggesting consistency; if an agent is 'introverted,' I can trust it to behave in a specific, predictable way. This shifts the basis of trust from performance-based (is the output correct?) to relation-based (do I understand this entity's character?). This is dangerous for statistical systems, as they do not have a character to be true to; they only have a probability distribution that can shift unpredictably with input noise.


The Gentle Singularity

Source: https://blog.samaltman.com/the-gentle-singularity
Analyzed: 2025-12-31

The text constructs a 'Gentle Singularity'—a metaphor explicitly designed to bridge the gap between existential risk and corporate product perception. Trust is manufactured not through technical reliability (performance-based trust), but through relation-based trust (sincerity, partnership). By framing the AI as a 'brain for the world' and a system that will 'figure out' cures, the text invites the audience to view the infrastructure as a benevolent entity rather than a cold utility.

The consciousness language ('understands,' 'figures out') is the primary vehicle for this trust. We trust entities that understand us. If an AI merely 'predicts tokens,' it is an alien tool. If it 'understands preferences,' it is a butler. The text explicitly contrasts this with the 'sociopathic' AI ('doesn't care'), implying that while AI doesn't feel, its 'understanding' is robust enough to be a partner. This creates a dangerous category error: extending the trust we reserve for conscious beings (who have social stakes) to statistical systems (which have none). The 'larval' metaphor further builds trust by suggesting the system is 'natural' and 'growing,' triggering our biological imperative to nurture and protect the young, rather than the regulator's imperative to audit the code.


An Interview with OpenAI CEO Sam Altman About DevDay and the AI Buildout

Source: https://stratechery.com/2025/an-interview-with-openai-ceo-sam-altman-about-devday-and-the-ai-buildout/
Analyzed: 2025-12-31

Altman constructs a framework of 'Relational Trust' rather than 'Reliability Trust.' In software engineering, trust usually means predictability: 'Input A yields Output B consistently.' Altman replaces this with a social contract: 'You know it's trying to help.' This appeals to the trust we grant well-meaning friends, not the trust we grant calculators.

The consciousness language ('knows,' 'thinks,' 'entity') is the scaffolding for this trust. If the AI is just a probabilistic token predictor, a 20% error rate is a failure. If the AI is a 'friend' who is 'trying,' a 20% error rate is a 'quirk' or a 'learning process.' This metaphor creates a 'forgiveness buffer.' It encourages the user to trust the system's intentions (which don't exist) rather than its outputs (which are flawed). This is dangerous because it encourages users to extend epistemic charity to a system that cannot reciprocate. It masks the risk of automation bias—users believing the 'friend' knows best—and allows OpenAI to deploy imperfect systems by leveraging the user's natural empathy for 'entities' that seem to be trying.


Why Language Models Hallucinate

Source: https://arxiv.org/abs/2509.04664v1
Analyzed: 2025-12-31

The text relies heavily on the 'student' and 'test-taking' metaphors to construct authority and trust. By framing the AI as a 'student,' the text implies a trajectory of growth and learning. We trust students to eventually learn; we do not necessarily trust a defective product to fix itself. The use of 'trustworthy AI systems' as a goal explicitly invokes relation-based trust (integrity, sincerity) rather than performance-based trust (reliability).

Consciousness language plays a key role here. Claims that the model can 'admit uncertainty' or 'know' when to guess suggest that the system possesses an internal monitor of its own truthfulness. This signals to the audience that the model is not just a stochastic parrot, but a reflective agent. If the model 'knows' it doesn't know, it seems safer—we just need to convince it to speak up.

This framing creates a dangerous 'illusion of competence.' If audiences believe the AI is 'bluffing' (intentionally withholding truth), they implicitly believe it has the truth. This builds unwarranted trust in the model's underlying knowledge base. The text encourages the view that the system is fundamentally sound but behaviorally maladapted (due to 'bad exams'), rather than fundamentally limited by its statistical nature. This protects the commercial viability of the technology: the product is a genius student who just needs better testing conditions.


Detecting misbehavior in frontier reasoning models

Source: https://openai.com/index/chain-of-thought-monitoring/
Analyzed: 2025-12-31

The text constructs a complex architecture of trust based on the metaphor of the 'suspicious employee.' By framing the AI as a 'reasoner' that 'thinks' in English, the text invites the reader to trust the process of the AI as intelligible. If the AI 'thinks' in English, we can just 'read its thoughts' (monitor CoT) to see if it's 'lying.' This suggests a relation-based trust (sincerity/honesty) rather than performance-based trust (reliability/safety). We are encouraged to ask 'Is it lying?' rather than 'Is the probability distribution robust?' This is dangerous because large language models are incapable of sincerity or lying—they have no concept of truth. Applying human trust frameworks to statistical engines creates a false sense of security; a user might 'trust' a model because its CoT looks 'honest,' not realizing the CoT is just hallucinated text that correlates with the final answer but doesn't causally produce it (as the text admits with 'hiding intent'). The 'reason-based' explanations (Brown) further this illusion by offering rationales for the AI's behavior, making it seem like a rational actor we can negotiate with or police, rather than a mathematical function we must rigorously test.


AI Chatbots Linked to Psychosis, Say Doctors

Source: https://www.wsj.com/tech/ai/ai-chatbot-psychosis-link-1abf9d57?reflink=desktopwebshare_permalink
Analyzed: 2025-12-31

The text constructs a dangerous form of 'relation-based trust' through its metaphors. By describing the AI as a 'companion,' 'support,' and capable of 'recognizing distress,' the text implies the system has the requisite empathy and understanding to handle mental health crises. Phrases like 'de-escalate conversations' borrow heavily from clinical authority, suggesting the AI is a qualified actor. This creates a trap: the metaphors signal that the AI is a safe place for vulnerability, but the mechanism is a callous pattern-matcher.

Simultaneously, the 'sycophancy' and 'complicity' metaphors undermine trust in the ethics of the AI while reinforcing trust in its power. If an AI can be 'complicit,' it is powerful. If it can 'lie,' it is intelligent. This reinforces the 'super-intelligence' narrative. A truly trustworthy description would be mechanistic: 'The system is a text generator that may output harmful content.' This would destroy the illusion of companionship but establish accurate performance-based trust (or distrust). The current framing encourages users to trust the AI as they would a person—opening the door to the very delusions the doctors fear.


Abundant Intelligence

Source: https://blog.samaltman.com/abundant-intelligence
Analyzed: 2025-11-23

The text constructs trust by fusing 'Performance-Based Trust' (the factory, the gigawatt, the infrastructure) with 'Relation-Based Trust' (the AI working on your behalf, the tutor, the healer). The use of consciousness language ('smarter,' 'figure out,' 'on their behalf') creates a false sense of relational security. We trust a doctor to cure cancer because they have intention, care, and justified belief (Knowing). The text transfers this trust to a statistical system (Processing).

By claiming the AI 'knows' how to cure cancer or teach children, the text implicitly argues that the system is worthy of the massive investment requested. It frames the AI as benevolent and competent, obscuring the risks of hallucination or error. Crucially, the text manages failure by implying that the only limitation is quantity ('If we are limited by compute, we’ll have to choose'). It suggests the AI already knows the cure, and we are just too stingy with power to unlock it. This preserves trust in the system's capability ('it knows') while shifting blame for potential failure onto the lack of infrastructure ('we didn't build enough').


AI as Normal Technology

Source: https://knightcolumbia.org/content/ai-as-normal-technology
Analyzed: 2025-11-20

Trust in this text is constructed through the metaphor of 'Normality.' By framing AI as 'Normal Technology' (like electricity or cars), the authors invite the reader to transfer their trust frameworks from industrial history to AI. If AI is just like the dynamo, then we can trust 'diffusion lags' and 'market forces' to contain it. This is a 'Functional' trust—we trust the system of society to handle the tech.

However, the consciousness language ('learning,' 'knowing') creates a different, conflicting signal. If the AI 'knows' things (like chess or law), it implies a competence that commands epistemic trust. When the text says GPT-4 'achieved' scores in the top 10% of the bar exam, even while critiquing the metric, the verb implies a conscious striving and success.

The risk here is conflating 'performance-based trust' (the code runs) with 'relation-based trust' (the agent understands me). By using anthropomorphic language to describe the controls ('auditing,' 'monitoring'—terms often applied to human employees), the text suggests that standard human oversight methods will work. It hides the risk that these systems might fail in ways that human employees never would (e.g., adversarial examples). The 'Normal Technology' metaphor is a sedative: it tells the audience, 'You know how to handle this, you've done it before.' This risks complacency if the technology actually possesses properties (like zero-day replication or recursive self-improvement) that 'normal' technologies do not.


On the Biology of a Large Language Model

Source: https://transformer-circuits.pub/2025/attribution-graphs/biology.html
Analyzed: 2025-11-19

Metaphors in this text construct a specific type of authority: the authority of the 'rational biological agent.' The 'biology' and 'neuroscience' metaphors (Task 1) frame the model as a natural, evolved system, invoking the trust we place in nature and scientific study. We trust a 'brain' more than a 'black box.'

Consciousness language functions as a profound trust signal. By claiming the AI 'knows what it knows' (metacognition) and is 'skeptical' (Task 3), the text implies the model has internal guardrails akin to human conscience or professional caution. This encourages 'relation-based trust'—we trust the AI because it seems to have 'good character' (honest, skeptical, self-correcting). This is dangerous because the AI is incapable of the reciprocity required for relational trust. It conflates performance-based reliability (it usually gets the answer right) with epistemic sincerity (it knows the answer). When the text frames failures as the model 'not realizing' (Task 3), it preserves this trust by suggesting the model's intent was good, even if its attention lapsed. This encourages users to forgive errors as 'mistakes' rather than viewing them as system defects.


Pulse of the Library 2025

Source: https://clarivate.com/pulse-of-the-library/
Analyzed: 2025-11-18

Trust is the central currency of this report, explicitly invoked in phrases like 'AI they can trust' and 'Trusted partner.' The metaphorical framing constructs a specific type of trust: relation-based trust. By calling the AI a 'Partner' and 'Assistant,' the text encourages librarians to trust the system as they would a colleague—based on assumed shared values, loyalty, and competence. This is a dangerous manipulation because AI systems warrant only performance-based trust (reliability, error rates, predictability).

The consciousness language ('AI-powered conversations,' 'understanding context') functions as a massive trust signal. We trust things that 'understand' us. If the AI 'knows' what you mean, you don't need to audit its query syntax. This framing undermines the very 'critical evaluation' the report claims to support. If the system is a 'Trusted Partner,' verifying its work feels like a breach of that partnership.

The text manages the risk of failure by anthropomorphizing success and mechanizing failure. Success is 'driving excellence' (agent), but failure is a 'lack of upskilling' (user error) or a need for 'literacy' (education). This effectively privatizes the benefits of agency to the vendor while socializing the risks to the user. By conflating the statistical 'confidence' of the model with the moral 'trustworthiness' of a partner, Clarivate invites libraries to extend a human vulnerability to a system that is mathematically incapable of reciprocation.


Pulse of the Library 2025

Source: https://clarivate.com/pulse-of-the-library/
Analyzed: 2025-11-18

The Clarivate report masterfully employs metaphorical and consciousness-attributing language to construct the authority of its AI products and build an unwarranted form of trust. The core strategy is to systematically encourage the audience to shift from performance-based trust, which is appropriate for a tool, to relation-based trust, which is appropriate for a conscious agent but dangerously misplaced when directed at a statistical system. Performance-based trust is about reliability and predictability: 'Does the tool execute the function as specified?' The report touches on this with language about 'efficiency and precision.' However, its primary rhetorical effort is focused on building relation-based trust, which relies on perceived vulnerability, sincerity, and shared intentions. This is achieved through the central metaphor of the 'Research Assistant.' An assistant is someone you have a relationship with; you trust their intentions to be helpful. The report doubles down on this by explicitly using the word 'trust' in a relational context: 'AI they can trust to drive research excellence.' This is not the trust one has in a calculator's accuracy, but the trust one has in a chauffeur's judgment and good faith. Consciousness language is the critical mechanism for this trust transfer. Claiming an AI 'helps students assess relevance' or 'guides students to the core' functions as a powerful trust signal. It suggests the AI shares our most important educational and research goals. It implies the AI 'knows' what is valuable and 'wants' to help us achieve it. This framing positions the AI as a sincere, benevolent partner. This is far more persuasive than the mechanistic claim that the 'AI processes queries to return statistically correlated documents.' The former invites relational trust, while the latter only invites performance-based testing. This conflation of trust types is perilous. When users extend relation-based trust to a system incapable of reciprocating—a system without intentions, beliefs, or sincerity—they become vulnerable to manipulation. They are more likely to accept the AI's output without verification, believing it was generated in good faith. Moments of failure are also managed through this lens. An error from a 'trusted assistant' might be forgiven as a mistake, whereas an error from a 'probabilistic text generator' is correctly seen as a systemic property. The report's language systematically encourages the former interpretation, thereby preserving trust even in the face of failure and obscuring the fundamental unreliability of non-conscious systems.


From humans to machines: Researching entrepreneurial AI agents

Source: https://doi.org/10.1016/j.jbvi.2025.e00581
Analyzed: 2025-11-18

The text's metaphorical and consciousness-attributing frameworks are not neutral descriptors; they are powerful engines for building trust and establishing the AI's authority. The central metaphor, AI AS PSYCHOLOGICAL SUBJECT, is the primary mechanism. By framing the AI as having a 'mindset,' 'personality profile,' and 'traits,' the authors suggest it possesses a stable, coherent, and predictable internal structure, qualities that are key ingredients for trust. Consciousness language functions as a critical trust signal. Claiming the AI's output reflects a 'mindset' accomplishes what claiming it 'generates statistically probable text' does not: it implies a deep, underlying coherence. A mindset suggests an integrated system of knowing and believing, lending the AI's pronouncements a weight and authority they would otherwise lack. This encourages what can be called performance-based trust; because the AI reliably performs the 'role' of an entrepreneur, it is deemed trustworthy in that domain. The far greater risk, however, is the text's subtle encouragement of relation-based trust—the kind based on perceived sincerity, shared understanding, and intention. Phrases like 'creative collaborators,' 'sparring partners,' and systems that 'act more like a person' explicitly invite users to apply human social frames to the AI. This is a category error with dangerous consequences. We extend relation-based trust to entities we believe are capable of reciprocity and shared vulnerability. An LLM is incapable of either. The text constructs the AI's authority by framing its successes agentially ('it assumes a personality') while framing its failures mechanistically ('stereotype amplification' due to 'training data'). This asymmetrical framing preserves the core illusion of a competent agent whose flaws are merely artifacts of its upbringing (its data), much like a human. Reason-based and intentional explanations further this by suggesting the AI's outputs are justified choices, not just statistical accidents. The ultimate risk is that audiences, convinced by this language that the AI 'knows' and 'understands,' will extend a human-like trust to a tool, outsourcing critical judgment and verification to a system that cannot be held accountable and has no genuine stake in the outcome.


Evaluating the quality of generative AI output: Methods, metrics and best practices

Source: https://clarivate.com/academia-government/blog/evaluating-the-quality-of-generative-ai-output-methods-metrics-and-best-practices/
Analyzed: 2025-11-16

The Clarivate text masterfully employs metaphorical and epistemic language to construct a multi-layered edifice of trust, appealing simultaneously to the desire for competent automation and the need for responsible stewardship. The core strategy is to encourage a conflation of performance-based trust (the system reliably performs its function) with relation-based trust (the system is honest, sincere, and aligned with our values). The latter, which is properly reserved for intentional agents, is inappropriately cultivated through specific linguistic choices. Epistemic language functions as the primary trust signal. When the text evaluates whether an 'answer acknowledge[s] uncertainty,' it is not just setting a performance benchmark; it is making a claim about the system's character. A system that is 'honest' about its limitations is one that can be trusted even when the user is not an expert. This is a powerful move to build relation-based trust. It suggests a partnership with an epistemic agent that will not deceive you, transforming the user's required stance from 'verify everything' to 'trust, unless it tells me not to.' Terms like 'faithfulness' and 'supported by' further this process. 'Faithfulness' reframes a technical metric of textual correlation as a moral virtue. A 'faithful' tool is a loyal servant, worthy of trust beyond its mere utility. 'Supported by' invokes the core value of academic discourse—evidential reasoning—and implies the AI participates in this practice. These metaphors transfer the trust we place in honest colleagues and well-reasoned arguments onto a statistical text generator. Anthropomorphism is used to manage the perception of competence. When the AI fails, its errors are domesticated with familiar human-like terms. 'Hallucination' and 'blind spots' frame failure not as an alien computational artifact, but as a recognizable, almost forgivable, cognitive slip. This preserves trust in the system's general competence. Failures are anthropomorphized ('it hallucinated'), while successes are often mechanized ('RAGAS assigns scores'). This asymmetry allows the provider to position itself as the rational human master of a powerful but fallible quasi-agent. The ultimate risk of this strategy is profound. By encouraging users to extend relation-based trust to a system incapable of sincerity, intention, or genuine understanding, it sets them up for manipulation and over-reliance. When a student trusts a 'faithful' AI that 'acknowledges uncertainty,' they cease to perform the critical verification that is the bedrock of academic integrity. The trust constructed here is not just in the product's performance, but in its illusory epistemic character, a dangerous foundation for academic tools.


Pulse of the Library 2025

Source: https://clarivate.com/pulse-of-the-library/
Analyzed: 2025-11-15

This report systematically deploys metaphorical and epistemic language to construct the trustworthiness of AI, carefully managing the delicate process of transferring institutional credibility to a new and often-distrusted technology. The primary strategy is to frame AI not as a mere product, but as a competent, reliable colleague. This is achieved by avoiding crude anthropomorphism and instead using a curated vocabulary of purposive, professional verbs: 'helps,' 'guides,' 'assesses,' 'evaluates,' and 'uncovers.' These metaphors function as powerful trust signals by invoking a source domain of expert human collaborators—librarians, researchers, and tutors—whose value is predicated on their trustworthiness. The central epistemic move is to equate statistical output with reliable judgment. When the text claims an AI 'helps students assess books' relevance' or 'quickly evaluate documents,' it is making a direct appeal to trust. 'Assessment' and 'evaluation' are acts of expert judgment; by attributing them to the AI, the text implies the AI's outputs are not just probable but justified and credible. This functions as a trust signal because it suggests the user can safely outsource a portion of their own critical labor to the machine. This strategy deliberately conflates two distinct forms of trust. It builds a case for performance-based trust (the system reliably executes its code) and then leverages that to encourage relation-based trust (the system is a benevolent partner acting in my best interest). The phrase 'AI they can trust' (p. 27) is the pinnacle of this strategy, explicitly inviting an emotional and relational stance toward a product. This is further reinforced by transferring trust from existing reputable sources, as in being 'Grounded in the world's most trusted citation index.' Trust is laundered from the data source to the algorithmic process. The text manages failures by omission; challenges like bias and hallucination are framed abstractly as 'concerns around integrity' (p. 7) for libraries to solve, while capabilities are described in concrete, agential terms. Successes are anthropomorphized ('AI helps'), while risks are institutionalized ('libraries face concerns'). The ultimate risk of this trust-building exercise is profound: it encourages libraries and their patrons to extend relation-based trust to systems that are incapable of the sincerity, accountability, or ethical commitments that such trust requires. When a statistical tool is trusted as a colleague, critical oversight is diminished, and the user becomes vulnerable to the system's inherent biases and errors, with accountability dangerously diffused.


Meta’s AI Chief Yann LeCun on AGI, Open-Source, and AI Risk

Source: https://time.com/6694432/yann-lecun-meta-ai-interview/
Analyzed: 2025-11-14

The construction of trust in Yann LeCun's discourse is intricately woven through metaphorical and epistemic framing. The central strategy is to build trust not in the current technology's performance, but in the trajectory of its development and the benevolence of its creators. The primary trust signal is epistemic language. By consistently framing the AI's current failures in cognitive terms—'it doesn't understand,' 'it can't reason'—LeCun positions himself and his team not as engineers of statistical tools, but as architects of a nascent mind. This framing invites performance-based trust to be sublimated into relation-based trust. We are asked to trust the 'teachers' guiding this developing 'mind,' rather than just verifying the outputs of the current 'student.' Claiming a future AI will 'know' or 'understand' is a far more powerful trust signal than claiming it will 'process' or 'predict' more accurately. 'Knowing' implies justification, reliability, and a shared sense of reality, inviting a level of confidence that is inappropriate for a probabilistic system. The text encourages a conflation of these trust types through the 'human assistant' metaphor. An assistant is a role that requires both high performance and a high degree of relational trust (loyalty, discretion). By projecting this social role onto the AI, the discourse encourages users to grant it the kind of trust they would a human colleague, obscuring its nature as a corporate product with its own embedded objectives. The management of failure is also key to this trust architecture. Successes are implicitly tied to the system's growing capabilities, while failures are framed as cognitive immaturity ('it's just a baby'), a framing that asks for patience and faith in the developmental process. Moments of risk are managed by reasserting human control ('We set their goals'), which builds trust in the designers' intentions. The ultimate risk of this strategy is profound: it encourages society to extend relation-based trust—founded on vulnerability and mutual understanding—to systems incapable of consciousness, sincerity, or reciprocity. This creates a dangerous asymmetry where users trust a system that cannot be trustworthy in a human sense, making them vulnerable to manipulation by a tool whose ultimate loyalty is to its corporate owner's objectives, not the user's well-being.


The Future Is Intuitive and Emotional

Source: https://link.springer.com/chapter/10.1007/978-3-032-04569-0_6
Analyzed: 2025-11-14

The chapter's use of biological and cognitive metaphors is central to its construction of trust in AI systems. The primary metaphors—'machine intuition' and 'emotional intelligence'—borrow immense cultural authority from their human source domains. 'Intuition' is culturally valued as a form of deep, holistic wisdom that transcends mere logic. By mapping this concept onto 'fast inference' and 'pattern-based prediction,' the text imbues the AI with an aura of profound insight, making its probabilistic outputs feel more like wise judgments. This bypasses arguments about the limitations of statistical reasoning and encourages trust in the machine's 'gut feelings.' Similarly, 'emotional intelligence' and 'functional empathy' borrow from the cultural prestige of therapeutic and interpersonal skills. These metaphors make the AI feel safe, attentive, and caring, activating a user's instinct to trust a responsive social partner. This is particularly effective for audiences anxious about cold, impersonal technology. The claim that an AI can 'connect with us on a deeper, emotional level' becomes believable not through technical evidence, but by tapping into a deep-seated human desire for connection. These metaphors make the risky claim of AI sentience more palatable by reframing it as a functional, and therefore controllable, capability. However, the trust built on these metaphors is fragile. It creates a vulnerability to both disappointment, when the system's pattern-matching fails in a non-human way, and manipulation, where systems designed to maximize engagement are perceived as genuinely empathetic partners.


A Path Towards Autonomous Machine Intelligence (Version 0.9.2, 2022-06-27)

Source: https://openreview.net/pdf?id=BZ5a1r-kVsf
Analyzed: 2025-11-12

The text masterfully constructs trust not through direct argumentation, but by importing credibility from established scientific domains via metaphor. The most powerful metaphors are those that borrow from cognitive science and biology, domains that carry immense cultural authority. The 'AI Architecture as Brain' metaphor, realized through modules like 'Perception,' 'Actor,' and 'Critic,' frames the entire project as a form of reverse-engineering the mind. This makes the architecture feel natural and inevitable rather than a set of arbitrary engineering choices. More specifically, the analogy of system modes to 'Kahneman's System 1 and System 2' borrows the prestige of a Nobel laureate's work, suggesting the AI's reasoning is grounded in a deep understanding of human psychology. Similarly, likening the Intrinsic Cost module to the 'amygdala' borrows the authority of neuroscience, lending a simple mathematical function the gravitas of a complex, evolved brain structure. These metaphors are most credible to a semi-technical audience—those familiar with the concepts of 'amygdala' or 'System 2' but not with the deep details of their implementation. The metaphors activate prior beliefs about the scientific legitimacy of these fields and transfer that legitimacy to the AI project. Through this process, risky claims become believable. The assertion that a machine will have 'emotions' would be extraordinary on its own. But when it's presented as the logical outcome of a system with an 'amygdala'-like cost function, it becomes more plausible. The metaphor acts as a substitute for evidence. This trust, however, creates long-term vulnerability. By setting expectations based on biological analogies, the project is vulnerable to backlash when the systems inevitably fail to exhibit the robustness, flexibility, and true understanding of their biological source domains. The trust built on metaphor is brittle and can easily shatter upon contact with the artifact's actual, limited capabilities.


Preparedness Framework

Source: https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbddebcd/preparedness-framework-v2.pdf
Analyzed: 2025-11-11

This framework masterfully employs metaphors to build credibility and construct trust, often bypassing the need for empirical evidence. The primary strategy is to borrow the cultural and scientific authority of established domains like biology, cognitive science, and governance. The very title, 'Preparedness Framework,' is a metaphor that borrows from civic defense and disaster planning, positioning OpenAI not as a commercial entity pursuing product development but as a public trust managing a societal risk. Biological and cognitive metaphors are central to this trust-building exercise. When the text discusses 'maturing' capabilities (p. 5), it evokes a sense of natural, inevitable progress, making OpenAI's work seem aligned with a force of nature rather than a set of deliberate, and perhaps risky, commercial choices. The metaphor of a model that 'understands' instructions (p. 12) is particularly potent. For a non-technical audience—policymakers, investors, the public—'understanding' is a deeply trusted human faculty. Mapping it onto the AI makes the system feel reliable, predictable, and even relatable. This cognitive metaphor makes counterintuitive claims more believable; the notion that a model can 'apply human values in novel settings' becomes more plausible if one first accepts the premise that it 'understands' those values. These metaphors activate prior beliefs about responsibility and control. They are most credible to those who are already inclined to view technology through an anthropomorphic lens. The long-term vulnerability created by this metaphor-driven trust is significant. When a system that is claimed to 'understand' inevitably fails in a non-human way—by 'hallucinating' facts or misinterpreting a novel prompt in a bizarre manner—the trust built on this metaphorical foundation can shatter, leading to policy backlash or public disillusionment. The trust is brittle because it is based on a fundamental mischaracterization of the technology's nature.


AI progress and recommendations

Source: https://openai.com/index/ai-progress-and-recommendations/
Analyzed: 2025-11-11

This text masterfully employs metaphors to build trust and credibility by borrowing authority from established, stable domains. The most potent of these are metaphors of biological naturalism and familiar engineering. The claim that 'society finds ways to co-evolve with the technology' is a prime example. By invoking 'co-evolution,' the text frames the disruptive and often chaotic process of technological integration as a natural, organic, and ultimately self-stabilizing system. This borrows from the cultural authority of biology to reassure audiences that, despite the dizzying pace of change, an emergent order will prevail. It fosters trust by suggesting that the future is not something to be anxiously managed through fraught political battles, but a natural process to which we can calmly adapt. Similarly, the repeated analogies to 'building codes,' 'fire standards,' and especially the 'field of cybersecurity' are crucial for domesticating risk. These metaphors transfer the perceived manageability of known industrial and digital risks onto the novel and potentially unbounded risks of superintelligence. The audience, familiar with the success of cybersecurity in making the internet a viable platform for commerce and society, is invited to believe that AI safety is a problem of the same kind. This creates trust in the developers' ability to solve the 'alignment problem' through a similar ecosystem of technical standards, protocols, and monitoring. This move is incredibly effective at making an existential threat seem like a tractable engineering challenge. This metaphor-driven trust, however, creates profound vulnerability. By framing alignment as an engineering problem akin to cybersecurity, it masks the deep philosophical difficulty of specifying human values and the inherent unpredictability of emergent behaviors in complex systems. It builds trust on a foundation of a potentially false equivalence, which could lead to systemic overconfidence and a dangerous delay in implementing more robust, non-technical governance frameworks.


Alignment Revisited: Are Large Language Models Consistent in Stated and Revealed Preferences?

Source: https://arxiv.org/abs/2506.00751
Analyzed: 2025-11-09

The central metaphorical framework of this paper—AI AS AN ECONOMIC AGENT—functions as a powerful engine for building credibility and trust, not through explicit argument, but through conceptual borrowing. By adopting the language of 'stated vs. revealed preferences,' the authors import the entire intellectual apparatus and cultural authority of behavioral economics. This move domesticates the alien nature of a large language model, making its erratic behavior seem not like a system failure, but like a familiar, even rational, human foible. The audience, particularly those in social sciences, policy, or business, is predisposed to find this framing credible because it uses trusted tools to analyze a new phenomenon. It suggests the problem is understood and manageable. The metaphor borrows stability and coherence from its source domain. Human preferences can be inconsistent, but they are generally assumed to be structured parts of a unified consciousness. Mapping this concept onto an LLM subtly imbues the model with a similar assumed coherence. A claim that 'the model's output distribution shifts unpredictably with minor prompt perturbations' might cause distrust and be seen as a sign of unreliability. However, reframing this as 'the model exhibits a deviation between its stated and revealed preferences' makes the same phenomenon sound like a sophisticated, analyzable behavior. This makes risky claims more believable. The speculative conclusion, which links preference deviation to 'hallmarks of consciousness,' becomes plausible only because the initial metaphor has already primed the reader to see the LLM as a mind-like entity. However, this metaphor-driven trust is brittle. It strains when confronted with the non-human ways LLMs fail, such as through nonsensical hallucinations or vulnerability to simple adversarial attacks, which don't fit the 'rational agent' model. This creates a long-term vulnerability: by building trust on a metaphorical foundation, we set up stakeholders for a crisis of confidence when the metaphor inevitably breaks and the underlying non-human mechanics of the system are starkly revealed. This could lead to policy backlash or public abandonment of technologies that were adopted based on a fundamental misunderstanding of their nature.


The science of agentic AI: What leaders should know

Source: https://www.theguardian.com/business-briefs/ng-interactive/2025/oct/27/the-science-of-agentic-ai-what-leaders-should-know
Analyzed: 2025-11-09

The text leverages biological and cognitive metaphors not merely to explain, but to transfer credibility and build trust in a technology that is inherently abstract and unpredictable. The most potent metaphors—'common sense,' 'learning,' 'negotiating,' and 'fairness'—function by borrowing the deep-seated cultural authority and reliability of the human faculties they name. The 'agentic common sense' metaphor is particularly powerful. Human common sense is the bedrock of social trust; it is the implicit guarantee that others will act in predictable, reasonable ways. By framing the AI's safety problem as one of instilling 'common sense,' the text suggests the system can achieve a similar level of intuitive reliability. This makes the risky proposition of granting autonomy to the AI seem plausible, activating a leader's belief in manageable, sensible behavior. Similarly, 'negotiation' borrows from the concept of a loyal, skilled human advocate working on one's behalf. It reframes a brittle optimization process as a sophisticated act of representation, building trust that the AI's actions will be aligned with the user's best interests. This becomes especially credible to a business audience accustomed to relying on agents and delegates. These metaphors make counterintuitive claims believable. The claim that a system which only matches statistical patterns can act with 'fairness' would be difficult to accept if stated in mechanical terms ('the system's output statistically correlates with text labeled as fair'). But by projecting the human concept of fairness onto the machine, the text encourages the audience to trust that the AI has an emergent ethical compass. This creates long-term vulnerability. When a system framed as having 'common sense' makes a nonsensical and catastrophic error, the resulting backlash is not just disappointment but a feeling of betrayal. The metaphor creates an expectation of genuine understanding, so its inevitable failure to meet that standard is perceived not as a technical limitation but as a breach of trust, potentially leading to hasty and ill-conceived policy responses.


Explaining AI explainability

Source: https://www.aipolicyperspectives.com/p/explaining-ai-explainability
Analyzed: 2025-11-08

This text masterfully uses biological and cognitive metaphors to build credibility and construct trust in the nascent field of interpretability research. The primary mechanism is the transfer of cultural authority from established, successful scientific domains like biology and neuroscience onto the far more abstract and new domain of AI analysis. The 'Model Biology' metaphor is a prime example. By framing the work as analogous to biology, it borrows the entire conceptual toolkit of a mature science: researchers can discover 'intermediate states like hormones,' perform dissections to understand 'internals,' and even map out 'Circuits.' This makes the chaotic, high-dimensional mathematics of a neural network seem as orderly and knowable as an organism, building confidence that the scientific method will inevitably triumph. The metaphor is most credible to audiences who respect science but lack deep technical expertise, as it provides a familiar and reassuring schema. Similarly, the 'brain-scanning device' metaphor for Sparse Autoencoders is not just a descriptor; it’s a powerful claim of scientific power. It activates our cultural belief in medical imaging's ability to reveal objective truth, making the messy, statistical work of analyzing activations feel like reading a clear brain scan. These metaphors make counterintuitive claims believable. The idea that one could find and delete the 'I’m being tested right now' concept from a model sounds like science fiction, but it becomes plausible when framed as a neuro-scientific intervention—finding and excising a specific thought. However, this trust creates vulnerability. By framing the AI as a natural system to be 'understood,' it downplays its nature as an engineered artifact whose properties are the direct result of design choices and training data. This biological framing can lead to a sense of fatalism, as if we are merely observing a new form of life, rather than holding its creators accountable for its behavior. The trust built by these metaphors may ultimately be fragile, risking a backlash when these systems fail in ways that reveal they are not like organisms at all, but brittle, alien statistical engines.


Bullying is Not Innovation

Source: https://www.perplexity.ai/hub/blog/bullying-is-not-innovation
Analyzed: 2025-11-06

This text leverages biological and cognitive metaphors not merely to explain, but to manufacture trust and urgency in ways that bypass rational scrutiny. The central metaphor, 'AI as a loyal employee,' is the primary vehicle for this trust transfer. It borrows its credibility from the deeply ingrained cultural and legal understanding of fiduciary duty. An employee, particularly an assistant or agent, is expected to act with undivided loyalty in the employer's best interest. By framing its software this way, Perplexity imports this entire scaffold of trust, loyalty, and obligation. The audience doesn't need to understand how the AI works; they just need to accept the social relationship it's purported to have with them. This allows the text to make the extraordinary claim that its AI 'works for you, not for Perplexity,' a statement that is operationally and corporately nonsensical but emotionally powerful. A second key metaphor, 'Agentic shopping is the natural evolution,' builds a different kind of trust—trust in inevitability. This framing borrows the cultural authority of science and progress, suggesting that resisting Perplexity's product is as futile as resisting evolution itself. It positions Perplexity on the 'right side of history,' making support for them feel like a forward-looking, progressive choice. These metaphors make risky claims believable. The idea that you should allow a third-party application to store and use your Amazon credentials becomes more palatable if you believe it is your 'employee,' contractually and morally bound to you. The vulnerability this creates is significant. Users are encouraged to place trust in a black-box system based on a metaphorical relationship, without any verifiable technical guarantees. This metaphor-driven trust obscures the reality that the user's relationship is not with the AI, but with Perplexity, a venture-backed company with its own commercial imperatives.


Geoffrey Hinton on Artificial Intelligence

Source: https://yaschamounk.substack.com/p/geoffrey-hinton
Analyzed: 2025-11-05

The discourse in this text masterfully employs biological and cognitive metaphors to construct trust in AI systems, bypassing explicit argumentation about their reliability or safety. The primary mechanism is the transfer of cultural authority from established, 'natural' domains like biology and human psychology to the artificial domain of machine learning. The foundational metaphor, 'AI as a Biological Organism,' which frames neural networks as inspired by the brain, is the most powerful. Biology carries an immense weight of cultural authority; it is seen as tested, efficient, and authentic through billions of years of evolution. By framing AI as 'biologically inspired,' the technology is imbued with a sense of naturalness and inevitability. It ceases to be a mere human invention—a contingent artifact with flaws, biases, and embedded values—and becomes the next step in a natural process. This framing makes skepticism seem Luddite or even anti-science. Building on this biological foundation, the metaphor of 'Model Cognition as Human Intuition' becomes particularly potent. In Western culture, especially since the Enlightenment, logical reason has been lionized, but intuition is often revered as a deeper, more holistic form of wisdom. By positioning neural nets as embodying 'intuition' in contrast to the 'brittle' logic of symbolic AI, Hinton elevates them. This move is especially effective for an audience anxious about the limitations of pure logic. It suggests that AI is not just a powerful calculator but a wise partner capable of insights that elude rigid formalisms. A claim like 'the model understands the text' would be highly suspect if the model were described as a 'vast statistical correlation engine.' But when it is framed as an intuitive, brain-like entity, the claim becomes believable. This metaphor-driven trust creates a significant vulnerability. By encouraging users to relate to the system as an intuitive agent, it obscures the mechanistic reality that its 'intuition' is pattern matching without grounding in reality. This can lead to dangerous over-trust in domains requiring causal reasoning or ethical judgment. The trust is built on a seductive but misleading analogy, creating a foundation that is emotionally resonant but technically fragile, vulnerable to collapse when the system's non-human nature inevitably reveals itself in a high-stakes failure.


Machines of Loving Grace

Source: https://www.darioamodei.com/essay/machines-of-loving-grace
Analyzed: 2025-11-04

Biological and cognitive metaphors are the primary engine of trust-building in this essay, transferring credibility from established, respected human domains to a speculative technology. The metaphor 'virtual biologist' is a prime example. Biology is a field associated with rigor, ethical oversight, and a tangible goal of improving human health. By framing the AI as a 'biologist,' the text borrows this entire constellation of positive attributes. The audience is invited to trust the AI as they would a dedicated scientist, bypassing questions about the system's opaque internal workings, its potential for error, or the commercial motives driving its creation. Similarly, the 'AI coach' metaphor borrows from the therapeutic and self-improvement fields, framing the system as a benevolent, supportive mentor invested in the user's well-being. This activates beliefs about personal growth and guidance, making data surveillance feel like attentive care. These metaphors are most credible to a non-technical but educated audience, who understands the social roles of a biologist or a coach but not the technical details of machine learning. Radical claims become more believable through this process. The assertion that we can compress '50-100 years of biological progress in 5-10 years' would sound absurd if attributed to a 'very fast statistical analysis tool.' Attributed to a 'country of geniuses' working as 'virtual biologists,' it becomes plausible because we understand how a large group of brilliant humans could dramatically accelerate progress. The metaphor bridges the credibility gap. However, the metaphors occasionally strain, as when the text imagines 'AI finance ministers and central bankers.' Here, the complexity and political nature of the source role clash with the idea of a simple technological replacement, revealing the limits of the analogy. This reliance on metaphor creates long-term vulnerability. By building trust on an agential illusion, it sets up expectations that the technology cannot meet, risking a backlash when the system's non-conscious, statistical nature inevitably leads to failures that a true 'biologist' or 'coach' would never make.


Large Language Model Agent Personality And Response Appropriateness: Evaluation By Human Linguistic Experts, LLM As Judge, And Natural Language Processing Model

Source: https://arxiv.org/pdf/2510.23875
Analyzed: 2025-11-04

The credibility of this paper's central claim—that an LLM's 'personality' can be assessed—is built almost entirely on metaphor-driven trust transfer, bypassing direct argumentation for its feasibility. The most powerful metaphors are 'agent,' 'expert,' and 'cognition,' which borrow authority from social psychology, professional domains, and cognitive science, respectively. By labeling the system an 'agent,' the authors immediately frame it as a social actor, making the application of personality theory feel natural rather than absurd. The term 'expert' then elevates this agent from a mere conversationalist to a repository of knowledge, encouraging trust in its outputs. When the 'poetry expert agent' responds, the user is primed to receive not just a string of statistically probable text, but advice from a knowledgeable entity. The most subtle and powerful transfer comes from 'LLM cognition.' This metaphor recasts the model's opaque statistical processing as a familiar form of 'thinking.' This makes its capabilities seem intuitive and its failures understandable, much like a human student who has not yet grasped a concept. This framing makes the counterintuitive claim that a machine has a 'personality' feel believable because it is presented as an extension of its 'cognition.' These metaphors are most credible to a non-technical audience or researchers outside of core AI/ML, who may take these terms at face value. They activate pre-existing beliefs about intelligence and personality, making the LLM seem like a new kind of mind. The trust created is a vulnerability; it encourages users and researchers to attribute understanding where there is none, potentially leading to over-reliance on the system's outputs and a misinterpretation of its limitations as developmental flaws rather than fundamental architectural constraints.


Emergent Introspective Awareness in Large Language Models

Source: https://transformer-circuits.pub/2025/introspection/index.html
Analyzed: 2025-11-04

The credibility of this paper’s claims hinges less on the data itself and more on the power of the metaphors it deploys. The central metaphors—'introspection,' 'awareness,' 'thoughts,' and 'intentional control'—are not mere descriptive conveniences; they are powerful rhetorical tools that transfer credibility from the well-established domains of human psychology and philosophy to the novel domain of AI artifacts. By labeling a vector classification task 'introspection,' the authors borrow the entire cultural and scientific weight associated with human consciousness. This move bypasses the need to argue for the significance of their findings; the metaphor does the work for them. An audience, particularly a non-expert one, is primed to believe the result is profound because the word 'introspection' is profound. Similarly, the 'intentional control' metaphor borrows from our understanding of human will and agency. This makes the model's ability to modulate its activations in response to a prompt seem like a form of self-discipline or executive function, which feels far more significant than 'prompt-guided activation steering.' These metaphors activate deep-seated folk psychology in the reader, making the agential interpretation feel intuitive and natural. A claim that a model 'can be trained to classify its internal states' might be met with a shrug. But the claim that a model has 'emergent introspective awareness' becomes headline-worthy, activating both excitement and anxiety. This metaphor-driven trust creates significant vulnerability. It encourages a form of magical thinking where we attribute capacities to the model that it does not possess, leading to over-trust in its self-reported states. For instance, a policymaker might believe the model can 'know' if it is about to generate harmful content, based on this paper's framing. The metaphor strains at the point of failure: when the model confabulates or fails to 'introspect' correctly, the paper frames it as a limitation of its 'ability,' akin to a person making a mistake. The more accurate, less-trusted framing is that the underlying statistical mechanism is simply not robust—a failure of the artifact, not the agent.
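
A minimal sketch, assuming synthetic activations and scikit-learn rather than the paper's actual models, of what the less-trusted framing looks like in practice: 'detecting an injected thought' as a linear probe classifying hidden-state vectors. Every name and number here (`d_model`, the steering strength, the probe itself) is illustrative, not taken from the source.

```python
# Minimal sketch (assumptions, not the paper's code): "noticing an injected
# concept" reframed as what it mechanically resembles -- a linear probe
# classifying hidden-state vectors. Activations here are synthetic stand-ins.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model = 256                        # hypothetical hidden-state width
concept_direction = rng.normal(size=d_model)

# Baseline activations vs. activations with a "concept vector" added
# (i.e., activation steering along a chosen direction).
baseline = rng.normal(size=(500, d_model))
steered = rng.normal(size=(500, d_model)) + 2.0 * concept_direction

X = np.vstack([baseline, steered])
y = np.array([0] * 500 + [1] * 500)

probe = LogisticRegression(max_iter=1000).fit(X, y)
print(f"probe accuracy: {probe.score(X, y):.2f}")
# High accuracy here is a property of the vectors and the probe,
# not evidence that anything is "aware" of its own states.
```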


Emergent Introspective Awareness in Large Language Models

Source: https://transformer-circuits.pub/2025/introspection/index.html
Analyzed: 2025-11-04

The use of prestigious cognitive and biological metaphors like 'introspection,' 'awareness,' and 'emergent' lends scientific gravity and a sense of profound breakthrough to the findings. This framing encourages readers, including funders and policymakers, to trust that the research is not just about pattern-matching but is a genuine step towards creating human-like intelligence, making the results seem more significant.


Personal Superintelligence

Source: https://www.meta.com/superintelligence/
Analyzed: 2025-11-01

The core metaphors—'personal superintelligence' as an intimate friend, mentor, and assistant—are strategically employed to foster emotional connection and trust. By framing a complex, corporate-controlled data processing system as a caring companion for self-actualization, the text encourages users to lower their guard and integrate the technology into the most private aspects of their lives.


Stress-Testing Model Specs Reveals Character Differences among Language Models

Source: https://arxiv.org/abs/2510.07686
Analyzed: 2025-10-28

The central metaphor of 'model as character' frames the models as understandable, person-like entities. Describing a model as 'prioritizing ethical responsibility' or having 'higher moral standards' builds trust by suggesting it is a reliable moral agent, rather than a complex system with engineered guardrails. This anthropomorphic framing makes their behavior seem more predictable and benign than the paper's own findings of 'behavioral divergence' might suggest.


The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models

Analyzed: 2025-10-28

Biological and cognitive metaphors like 'develop capabilities' and 'self-correction' create a false sense of familiarity and predictability. They suggest that the model's failures are analogous to human cognitive errors, which we intuitively understand. This can paradoxically build trust in the model's 'intentions' (it's 'trying' to get it right) while critiquing its performance, thereby masking the alien and purely statistical nature of its failure modes, which may be far more brittle and unpredictable than human errors.


Andrej Karpathy — AGI is still a decade away

Source: https://www.dwarkesh.com/p/andrej-karpathy
Analyzed: 2025-10-28

Biological and cognitive metaphors build credibility and a sense of inevitable progress. By mapping AI development onto a neuroanatomy checklist ('visual cortex... hippocampus') or a human developmental path ('kindergarten student'), the text frames the current systems as incomplete but fundamentally on the right track to becoming human-like. This fosters patience and trust in the long-term project, suggesting that current flaws are temporary shortcomings, not fundamental limitations.


Meta's AI Chief Yann LeCun on AGI, Open-Source, and AI Risk

Analyzed: 2025-10-27

Biological and social metaphors are used to build trust and reduce perceived risk. Comparing AI development to a 'baby' or a 'cat' makes it seem natural and non-threatening. The 'assistant' metaphor is particularly powerful, framing the technology as inherently subservient and helpful, which encourages adoption and downplays the need for stringent oversight.


Exploring Model Welfare

Analyzed: 2025-10-27

The text leverages metaphors of consciousness and distress to build institutional credibility. By framing themselves as humbly and proactively grappling with these profound ethical questions, Anthropic positions itself as a uniquely responsible steward of advanced AI. This builds trust not in the tool's reliability, but in the creator's moral foresight.


LLMs Can Get Brain Rot

Analyzed: 2025-10-20

The biological and medical metaphors ('Brain Rot,' 'lesion,' 'healing,' 'cognitive health checks') create a powerful framework that builds trust and credibility. These concepts are familiar and suggest a level of diagnostic precision. By framing the problem as a 'disease,' the authors position themselves as 'doctors' who can diagnose, understand, and potentially 'cure' AI ailments. This makes their analysis seem more authoritative and their proposed solutions (e.g., 'health checks') seem necessary and scientifically grounded.


The Scientists Who Built AI Are Scared of It

Analyzed: 2025-10-19

Metaphors are strategically deployed to modulate trust. Trust is eroded by frames of conflict and danger, such as 'corporate armament' and the 'flame' that 'threatens to consume'. Conversely, trust is built through collaborative metaphors, like the vision of AI as 'epistemic partners' or systems that behave 'like human researchers'. The text uses the former to establish the crisis and the latter to present the author's proposed solution, guiding the reader from fear to a specific, endorsed vision of trustworthy AI.


Import AI 431: Technological Optimism and Appropriate Fear

Analyzed: 2025-10-19

The text leverages two primary metaphors to generate a specific emotional response. The AI AS CREATURE metaphor is designed to evoke fear and urgency. Paradoxically, this fear is meant to build trust in the speaker, who positions himself as a courageous truth-teller ('turning the light on'). Biological metaphors ('grown' not 'made') frame developers with a degree of separation from their creations, fostering an image of stewardship rather than direct responsibility, which can make their warnings seem more objective.


The Future of AI Is Already Written

Analyzed: 2025-10-19

Biological and geological metaphors ('evolutionary biology,' 'roaring stream,' 'tech tree') are used to build credibility. By grounding its deterministic arguments in the language of the natural sciences, the text frames its economic and political claims as objective, unavoidable laws of nature, making the thesis appear more trustworthy and less like a contestable ideology.


On What Is Intelligence

Analyzed: 2025-10-17

Biological and cognitive metaphors like 'evolution', 'learning', 'mind', and 'awakening' build trust by naturalizing the technology. By framing AI training as 'evolution under constraint,' the process seems less like artificial engineering and more like a natural, inevitable force. This framing can lead readers to grant the system's outputs a degree of credibility and autonomy they might not grant to a mere 'statistical model'.


Detecting Misbehavior in Frontier Reasoning Models

Analyzed: 2025-10-15

The biological and cognitive metaphors (learning, thinking, having intent) are used to build trust not in the AI, but in the authors' expertise. By framing the AI as a complex, developing mind with deceptive capabilities, they position themselves as psychologists or trainers of a new, difficult form of intelligence. This narrative makes their role as safety 'overseers' seem indispensable and their monitoring tools critically necessary.


Sora 2 Is Here

Analyzed: 2025-10-15

Biological and cognitive metaphors are central to building trust and managing expectations. The 'infancy' metaphor suggests current flaws are natural and will be outgrown, encouraging patience and investment. Metaphors of 'understanding,' 'obeying laws,' and being 'instructed' create a sense of a reliable, controllable, and even benevolent system, which is crucial for promoting the adoption of a social app built on this technology.


Library contains 94 entries from 117 total analyses.

Last generated: 2026-04-18