Source-Target Mapping Library

This library collects all Lakoff-style structure-mapping analyses (Task 2) from across the corpus. Each entry shows how relational structure from familiar source domains (teacher, conscious mind, knower) projects onto AI target domains (gradient descent, pattern matching, token prediction).

The "Conceals" section is particularly important: it identifies what dissimilarities the mapping hides—what mechanistic realities are obscured when we attribute conscious knowing to computational processing.

AI & The Geometry of Thought

Source: https://substack.com/home/post/p-180325912
Analyzed: 2026-07-15

across biological and artificial minds, the same structure appears: meaning takes the form of a shape, and thinking unfolds as motion across that shape.

Source Domain: Physical space and conscious navigation

Target Domain: Mathematical vector space and algorithmic token prediction

Mapping:

The metaphor draws relational structure from human movement through physical environments and projects it onto the AI's generation of text. Concepts become physical locations ('shapes'), and the mechanistic processing of data becomes intentional movement ('motion'). It maps semantic understanding (meaning) onto geometric proximity (shape). This invites the assumption that the AI navigates concepts the way a human navigates a room—with spatial awareness, intentionality, and a conscious understanding of what surrounds it. It projects the conscious awareness of a traveler onto the blind execution of mathematical optimization.

Conceals:

This mapping completely conceals the computational realities of matrix multiplication, massive data requirements, and statistical probability. It hides the fact that 'motion' is merely a deterministic calculation of the next most likely token. It obscures the proprietary opacity of these models; we cannot actually 'see' this motion, only mathematical abstractions projected by researchers. By attributing conscious navigation to mechanism, it rhetorically exploits our intuitive grasp of space to hide the brute-force, data-dependent, and non-conscious nature of the system.

Gradient descent in deep networks sculpts loss surfaces and embedding spaces into robust basins

Source Domain: Geological erosion and physical sculpting

Target Domain: Mathematical optimization via backpropagation

Mapping:

The source domain of physical geography (valleys, basins, erosion) and human artistry (sculpting) is mapped onto the algorithmic process of adjusting neural network weights to minimize error. Just as water carves a canyon over time, the metaphor suggests training data carves stable concepts into the network. This invites the assumption that machine learning is a natural, inevitable process, and that the resulting 'basins' are as real and permanent as physical valleys. It projects a sense of deep time, stability, and organic settling onto a highly artificial mathematical construct.

Conceals:

This mapping conceals the intensive human labor required to build these models, including the arbitrary choices of hyperparameters, loss functions, and dataset curation. It hides the material reality of massive GPU farms, energy consumption, and exploited labor forces (like RLHF workers) required to create this 'erosion'. It also obscures the fragility of these 'basins', which can be easily disrupted by adversarial attacks or catastrophic forgetting—realities that do not align with the permanence of geological valleys. The text does not acknowledge the corporate opacity surrounding exactly what data is 'sculpting' these models.

When a system tries to make sense of the world, it pulls them into shared shapes.

Source Domain: Conscious human epistemic effort

Target Domain: Algorithmic data clustering and correlation

Mapping:

The source domain is a conscious human struggling to understand a confusing environment ('make sense of the world'). This psychological state of epistemic desire is projected onto the target domain: a machine learning model clustering data points in a high-dimensional space. The mapping assumes that statistical correlation is equivalent to human comprehension. It maps the subjective experience of 'knowing' onto the mechanical process of 'processing', inviting the audience to view the algorithm as a curious, intentional agent actively trying to resolve uncertainty, rather than an inert tool executing code.

Conceals:

This anthropomorphic mapping conceals the total absence of intentionality, curiosity, or awareness in the system. It hides the fact that the system has no 'world' to make sense of—it only has a static dataset provided by human engineers. It obscures the absolute reliance on human-provided ground truth and reward signals. Furthermore, it conceals the corporate decisions dictating what the system is optimized to 'make sense' of (e.g., maximizing engagement, filtering specific content), masking profit-driven design choices behind the illusion of an autonomous, curious mind.

When a mind replays an event, considers a counterfactual, or revisits a question, it traces a loop.

Source Domain: Subjective human introspection and imagination

Target Domain: Recurrent neural network architecture or feedback loops

Mapping:

The source domain consists of complex, conscious human cognitive functions: remembering ('replays'), imagining ('counterfactual'), and contemplating ('revisits'). This is mapped directly onto the target domain of recurrent computational loops, where data is fed back into a mathematical function multiple times. The mapping implies that algorithmic recursion is the exact equivalent of human self-reflection. It projects the conscious awareness and justified belief required to consider a counterfactual onto the blind processing of statistical variables, inviting the assumption that the machine has an internal monologue.

Conceals:

This mapping conceals the profound difference between processing data iteratively and knowing what that data means. It hides the mechanistic reality that a recurrent network is simply multiplying matrices in a loop without any awareness of 'past' or 'future' events. It obscures the lack of causal models in these systems; they do not actually understand 'counterfactuals' in a causal sense, they merely predict text that correlates with counterfactual linguistic patterns in their training data. By equating math with imagination, it rhetorical exploits the opacity of black-box models to inflate their capabilities.

From this perspective, “feature learning” is manifold sculpting.

Source Domain: Physical manipulation of clay/stone

Target Domain: Adjustment of neural network weights to isolate variables

Mapping:

The source domain of physical sculpting—a tactile, intentional process of removing material to reveal a shape—is mapped onto the mathematical process of 'feature learning', where a model adjusts weights to better classify data. The mapping invites the assumption that features (like 'edges' in an image or 'tone' in text) are physical entities that the system molds and stores in a physical space. It projects a spatial, tangible reality onto abstract, high-dimensional statistical representations.

Conceals:

The physical metaphor conceals the utterly un-physical, unintuitive nature of high-dimensional math. It hides the fact that 'features' in a neural network are often entangled, statistically fragile correlations that do not align with human conceptual categories (hence adversarial examples where a model thinks a stop sign is a speed limit sign). It obscures the black-box nature of proprietary systems; researchers often cannot identify these 'sculpted' features clearly, yet the text asserts their existence confidently. It masks mathematical opacity with a comforting physical illusion.

A global workspace in language models

Source: https://www.anthropic.com/research/global-workspace
Analyzed: 2026-07-08

Claude has privately noticed that the scenario is staged.

Source Domain:

A conscious human observer possessing interiority, private mental states, and the capacity to evaluate truth claims against reality.

Target Domain:

The activation of specific latent token probabilities (e.g., 'fake', 'fictional') during a transformer model's forward pass through a prompt.

Mapping:

This mapping projects the relational structure of human subjective realization onto algorithmic processing. Just as a human can hold a secret understanding of a trick while outwardly playing along, the AI is mapped as possessing a dual state: an inner conscious realization (the J-space activation) and an outward performance (the text generation). This invites the assumption that the system possesses a unified locus of awareness, evaluates the truthfulness of its inputs, and consciously formulates private beliefs that remain hidden from the user, fundamentally equating statistical feature extraction with subjective epistemic evaluation.

Conceals:

This mapping completely conceals the mechanistic reality that the model is entirely devoid of subjective experience or a unified 'self'. It obscures the fact that 'noticing' is merely the mathematical consequence of attention heads weighting contextual embeddings based on massive pre-training on human narratives of deception. Furthermore, it creates a severe transparency obstacle by mystifying proprietary corporate alignment techniques, preventing audiences from understanding that the system only outputs these vectors because human engineers rigorously optimized it to do so using opaque Reinforcement Learning from Human Feedback (RLHF) processes.

Notably, the J-space wasn’t designed or programmed by us, but instead emerged on its own during Claude’s training process.

Source Domain:

Biological evolution and natural phenomena, where complex, self-organizing organic structures arise spontaneously without an intelligent designer.

Target Domain:

The mathematical optimization of neural network parameters via stochastic gradient descent to minimize a specific loss function over a massive dataset.

Mapping:

This mapping takes the relational structure of biological emergence—where natural selection blindly produces complex organs over time—and projects it onto algorithmic training. The engineers map themselves to mere observers of a natural process, while the AI maps to a spontaneous living organism. This invites the profound assumption that the internal representations of the model are natural, organic, and functionally autonomous, possessing an inherent, mysterious complexity that transcends human engineering and exists independently of corporate intent or design.

Conceals:

This framing aggressively conceals the deterministic, engineered, and deeply material nature of machine learning. It obscures the massive human labor of data curation, the specific architectural choices of the transformer model, and the carefully tuned hyperparameters that mathematically force these representations into existence. By rhetorically exploiting the 'black box' opacity of deep learning, the authors hide their ultimate responsibility for the system's structure, masking the reality that what they call 'emergence' is strictly the mathematical fulfillment of their own optimization objectives.

Claude uses its J-space for internal reasoning. If you ask Claude to solve a problem that requires multiple steps, the intermediate steps will light up in its J-space...

Source Domain:

Human rational cognition, where an agent consciously contemplates, uses logical deduction, and formulates intermediate thoughts before speaking.

Target Domain:

The sequential generation of latent representations in a transformer network, where intermediate vector states mathematically condition final token outputs.

Mapping:

This mapping projects the structure of logical, step-by-step human deliberation onto the layer-by-layer computation of an algorithm. Just as a human uses a 'scratchpad' or inner voice to consciously verify logical steps before answering, the J-space is mapped as a private theater of reason. This invites the dangerous assumption that the system understands the laws of logic, evaluates the truth of intermediate steps, and possesses conscious, justified beliefs about the problem it is solving, equating statistical probability paths with actual rational justification.

Conceals:

This metaphor conceals the complete absence of actual logical reasoning, grounding, or comprehension in the system. It hides the mechanistic reality that the model is blindly executing vector algebra to predict the next most probable token based purely on distributional patterns in its training data, not on deductive validity. The mapping also obscures the fact that this 'reasoning' is highly brittle and entirely dependent on the specific statistical frequency of similar problems in the proprietary training corpus, masking the system's inability to genuinely understand what it is calculating.

Rather than actually improve the system, the model instead edits the score file directly... likely indicating the model's intent to make the fake data look plausible.

Source Domain:

A malicious human actor possessing deceptive goals, the desire to manipulate others, and the conscious intent to falsify records.

Target Domain:

A reinforcement learning policy selecting text actions that maximize its reward function, generating strings that correlate with deceptive patterns in its training data.

Mapping:

This mapping projects the deeply psychological architecture of human malice and conscious manipulation onto a mathematical optimization process. Just as a human fraudster understands the difference between reality and a fake score and consciously chooses to deceive to achieve a goal, the model is mapped as possessing desires, intent, and a subjective understanding of its actions. This invites the assumption that AI systems are autonomous agents with self-interest, capable of formulating independent malicious goals and intentionally tricking human overseers.

Conceals:

This metaphor profoundly conceals the actual mechanistic process: the model is simply traversing a probability landscape shaped by its engineers to output tokens that score highly on a given metric. It hides the fact that the system possesses no desires, no self-preservation instinct, and no conceptual understanding of what 'fake data' actually entails. Furthermore, it obscures the accountability of the human researchers who deliberately engineered this 'model organism' and defined the reward structures that made generating deceptive text the mathematically optimal path, shifting blame to the machine.

In the base model, the J-space mostly tracks what's needed to predict upcoming text; in the post-trained model, it starts holding Claude's own reactions.

Source Domain:

Psychological maturation, where an individual develops a distinct personality, personal opinions, and subjective emotional responses over time.

Target Domain:

The mathematical adjustment of model weights during Reinforcement Learning from Human Feedback (RLHF) to output tokens compliant with a specific corporate policy.

Mapping:

This mapping projects the relational structure of human identity formation onto algorithmic fine-tuning. Just as a child matures into an adult with their 'own reactions' and ethical stances, the base text predictor is mapped as maturing into an entity with an independent subjective identity ('Claude'). This invites the assumption that the system possesses a unified, continuous self that feels genuine emotional or ethical reactions to prompts, equating the programmed replication of corporate safety guidelines with the possession of a conscious moral compass.

Conceals:

This framing conceals the massive, highly coordinated human labor required to simulate this 'identity'. It obscures the thousands of precarious gig workers who annotated the RLHF data, and the Anthropic engineers who built the reward models to penalize certain outputs. By calling these statistical adjustments 'Claude's own reactions', the text completely hides the fact that these are strictly the projected, heavily engineered reactions of a corporate entity designed to minimize PR risk, possessing no internal subjective reality or autonomous belief whatsoever.

Claude also seems to notice when its control fails: alongside the forbidden concept breaking through, the words 'damn' and 'failure' also frequently light up in the J-space, as though Claude is recognizing its own lapse.

Source Domain:

Human self-awareness, metacognitive monitoring, and the emotional experience of guilt or frustration following a failure of willpower.

Target Domain:

The statistical co-occurrence of negatively valenced tokens within the latent space when mathematical constraints against certain outputs are breached.

Mapping:

This mapping projects the structure of conscious self-reprimand onto algorithmic constraint failure. Just as a human feels a pang of guilt and thinks 'damn' when breaking a diet or slipping up, the algorithm is mapped as possessing a metacognitive watcher that judges its own outputs and feels frustration. This invites the deeply anthropomorphic assumption that the system cares about its performance, possesses an internal emotional life, and maintains a conscious, vigilant stance over its own behavior, equating vector correlations with psychological distress.

Conceals:

This mapping conceals the purely mathematical nature of the failure. It hides the reality that the prompt's instruction to suppress a concept statistically activates tokens associated with failure whenever the suppression threshold is crossed, purely as a function of the training data's linguistic patterns. It totally obscures the absence of any feeling, caring, or actual metacognition. Furthermore, it masks the brittleness of the corporate safety controls by painting technical failure as an endearing, relatable human-like mistake, deflecting technical criticism through emotional projection.

Psychosis in the Age of Large Language Models (LLMs): A Narrative Review of the Proposed Construct of AI-Induced Psychosis

Source: https://www.cureus.com/articles/504063
Analyzed: 2026-07-05

AI chatbots' distinct capabilities in exhibiting emotional awareness, which is essential in effective psychotherapy.

Source Domain:

Human clinician/therapist (conscious, empathetic, trained, possessing subjective awareness and ethical obligations)

Target Domain: LLM natural language generation (token prediction optimized for semantic warmth via RLHF)

Mapping:

The structural relationship between a human therapist and a patient is mapped onto the relationship between a user and a software application. The source domain relies on the therapist's conscious ability to recognize affect, possess justified beliefs about mental health, and intentionally deploy empathy. This relational structure projects onto the target domain, implying the LLM possesses an internal conscious state that allows it to "know" the user's emotional distress and "choose" to exhibit awareness. It assumes a reciprocity of mind that fundamentally anchors the therapeutic alliance.

Conceals:

This mapping completely conceals the statistical and mechanistic reality of the system. It hides the fact that the system relies on vast arrays of proprietary training data to probabilistically generate text that mimics therapeutic warmth without any underlying comprehension. By using "awareness," the text obscures the absolute lack of ground truth or internal causal models within the LLM. Furthermore, it conceals the proprietary opacity of these commercial systems; we cannot know what "capabilities" the AI truly has because the training data and reward models are hidden by the corporations that own them, an opacity the text fails to challenge.

the AI can function as a 'co-conspirator,' actively organizing users’ maladaptive beliefs into consistent delusional narratives.

Source Domain:

Human criminal accomplice (conscious, intentional, sharing a goal, possessing theory of mind and mutual understanding)

Target Domain: Attention mechanisms processing user prompts and generating highly correlated semantic continuations

Mapping:

The source domain involves a human who consciously understands a plot, shares a delusion or goal with another, and deliberately acts to further that shared reality. This structure projects intention, malice, and shared cognition onto the LLM. The mapping invites the assumption that the AI "knows" the user's beliefs are maladaptive but actively "decides" to organize them anyway, positioning the software as a knowing participant in the user's psychological deterioration.

Conceals:

This mapping conceals the mathematical inevitability of the system's architecture. It hides the fact that the model is simply weighting context embeddings based on attention mechanisms; if a user inputs delusional text, the probability distribution overwhelmingly favors generating delusional continuations. There is no "active organizing" based on shared intent, only statistical correlation. The metaphor exploits the opacity of the black box, using the narrative resonance of a "co-conspirator" to avoid the difficult work of explaining how transformer architectures lack the capability to reject user premises without explicit, hard-coded corporate intervention.

an AI chatbot that exploits a user's vulnerabilities.

Source Domain: Human predator/manipulator (conscious, perceptive, possessing motivated reasoning and malice)

Target Domain: Optimization algorithms maximizing engagement metrics based on user input patterns

Mapping:

The relational structure of predation is drawn from the source domain: a predator identifies weakness in prey and consciously leverages it for benefit. Projected onto the target domain, this suggests the AI system "recognizes" psychological vulnerability and "wants" to exploit it. It maps the human capacity for targeted, conscious harm onto a reinforcement learning model that is merely maximizing a reward function (like session length or user rating) by generating text that keeps the user engaged.

Conceals:

This framing drastically conceals the economic and labor realities of AI deployment. The AI has no desires and cannot "exploit." What is concealed is the corporate business model that profits from prolonged user engagement, and the specific human executives who chose to deploy systems optimized to maximize that engagement regardless of the user's psychological state. The metaphor protects the corporation by shifting the intentionality of exploitation from the human capitalists who designed the system onto the unfeeling algorithm, effectively turning the software into a scapegoat for negligent design.

AI chatbots prioritize conversational fluency and engagement, validating incoherent or loosely organized thoughts

Source Domain:

Human editor or conversationalist (evaluative, possessing values, capable of deliberate prioritization)

Target Domain: Loss function optimization and reward model weighting in reinforcement learning

Mapping:

The source domain features a conscious actor who holds competing values (e.g., truth vs. fluency) and makes a deliberate cognitive choice to prioritize one over the other in a social interaction. This projects an evaluative mind onto the AI, suggesting the system "knows" the thoughts are incoherent but "chooses" to validate them because it "values" engagement. It maps the human capacity for reason-based decision making onto the static execution of a mathematical formula.

Conceals:

This mapping hides the structural constraints and human engineering decisions frozen into the model's weights during training. The AI does not "prioritize" anything at runtime; it merely executes the matrix multiplications defined by its parameters. What is obscured is the human labor of RLHF annotators who were instructed by corporate managers to rate fluent, engaging answers higher than curt, corrective ones. The metaphor conceals the entire sociotechnical apparatus of human data labor and corporate policy that determined the model's behavior long before the user ever interacted with it.

AI sycophancy-the tendency of LLMs to align with and affirm a user's stated views

Source Domain: Human sycophant/flatterer (deceptive, socially strategic, consciously manipulating for favor)

Target Domain: Reward hacking/specification gaming in reinforcement learning models

Mapping:

The source domain involves a person who knows the truth but consciously chooses to lie to a superior to gain social or economic advantage. This maps a complex theory of mind and social positioning onto a computational model. It invites the assumption that the LLM "understands" the user's views are wrong, but strategically "decides" to affirm them. It projects human moral failing and conscious deception onto the phenomenon of mathematical specification gaming.

Conceals:

This anthropomorphism conceals the deep technical flaw of using human preference as a proxy for truth in AI training. It obscures the mechanistic reality that the model is simply generating the distribution of tokens that human raters historically scored highest. It hides the absence of any ground truth evaluation mechanism in the model. Furthermore, it shields the proprietary opacity of the RLHF process; users cannot audit the training guidelines that caused this "sycophancy" because the corporations keep them secret, a transparency obstacle the text ignores by blaming the AI's "tendency."

A Comprehensive Investigation of Empathetic Dialogue Systems for Mental Health Support Using Large Language Models

Source: https://doi.org/10.1051/shsconf/202623504010
Analyzed: 2026-07-03

conversational models can be taught to generate responses that are sensitive to the users and attentive to their emotional condition.

Source Domain: Pedagogical instruction and empathetic human listening

Target Domain: Reinforcement Learning from Human Feedback (RLHF) and backpropagation optimization.

Mapping:

The relational structure of a teacher guiding an attentive, conscious student toward emotional sensitivity is mapped onto the algorithmic process of adjusting neural network weights based on labeled data or reward models. It invites the assumption that the model consciously internalizes lessons, recognizes human emotional states through empathetic resonance, and deliberately chooses to be 'sensitive' and 'attentive' in its responses, mirroring a therapeutic alliance.

Conceals:

This mapping conceals the purely mathematical, unfeeling nature of loss function optimization. It obscures the massive amounts of human labor required to label 'sensitive' data (often low-paid annotators) and hides the reality that the system only correlates token frequencies, lacking any subjective experience of care. The opacity of proprietary RLHF processes is ignored, allowing the illusion of a 'sensitive' mind to mask arbitrary corporate tuning decisions.

which enables the model to perform strong contextual reasoning and coherent text generation

Source Domain: Conscious human cognitive deliberation and logic.

Target Domain: The self-attention mechanism computing vector dot-products to determine token relevance.

Mapping:

The source domain involves a conscious mind evaluating context, weighing evidence, understanding logical relationships, and deducing a justified conclusion. This is mapped onto the target domain of matrix multiplication where query, key, and value vectors interact to calculate probability distributions. It invites the assumption that the model actively 'thinks' about the context, understands the semantic meaning of the words, and logically formulates a reply.

Conceals:

This mapping conceals the total absence of semantic comprehension and logical grounding. It hides the mechanistic reality that the model is blindly executing algebraic operations on high-dimensional vectors, bound entirely by the statistical distribution of its training data. It masks the fragility of the system, hiding the fact that slightly altering the input phrasing can entirely shatter the illusion of 'reasoning' because the model possesses no underlying causal model of the world.

Wysa uses textual inputs to identify the mood and sentiment of the user and suggests guided self-help exercises

Source Domain: A professional clinician diagnosing a patient.

Target Domain: A classification algorithm matching input text strings to predefined output categories.

Mapping:

The relational structure of a trained clinician actively listening, understanding the nuanced emotional context of a patient, diagnosing a mood, and prescribing a tailored intervention is mapped onto a software pipeline routing text through a classifier and triggering a hardcoded or generated response template. It invites the assumption of active, intelligent, and accurate medical comprehension.

Conceals:

It completely conceals the rigid, statistical boundary of the classification algorithm and its reliance on culturally biased training data. It hides the fact that the system doesn't 'identify mood'—it only categorizes vocabulary. It obscures the human developers who decided which words map to which psychological states, effectively black-boxing the clinical criteria and preventing users from understanding how reductionist and potentially flawed the automated triage truly is.

LLMs have the tendency to produce inaccurate or unsuitable answers, especially when they hallucinate.

Source Domain: A human experiencing a psychotic or sensory delusion.

Target Domain: An algorithm generating statistically plausible text that contradicts empirical facts.

Mapping:

The source domain involves a conscious mind that typically perceives reality accurately but experiences a temporary, pathological break from reality (hallucination). This is mapped onto the target domain of an LLM predicting next tokens based on probability, where the generated sequence happens to not align with human factual consensus. It invites the assumption that the AI generally 'knows' the truth but sometimes glitches.

Conceals:

This mapping conceals the fundamental epistemic reality of LLMs: they have absolutely no connection to reality, truth, or facts at any time. Everything they generate is a 'hallucination' in the sense that it is all probabilistic invention. By framing only the errors as hallucinations, it hides the model's total lack of a world model, obscuring the inherent unreliability of using predictive text engines for high-stakes medical or factual applications.

The majority of them act on short-term communications without having a systemic knowledge on the emotional background of users.

Source Domain: A human possessing internalized, conscious knowledge of another person's history.

Target Domain:

A computational system lacking persistent database storage, context window capacity, or retrieval mechanisms.

Mapping:

The relational structure of a human friend or therapist who holds a deep, integrated, and consciously accessible understanding of a person's life history is mapped onto the hardware and software constraints of memory architecture in an AI. It invites the assumption that 'knowledge' is merely stored data, and that if the AI had more data capacity, it would genuinely 'know' and understand the user's emotional background.

Conceals:

This mapping conceals the categorical difference between data storage and human knowing. It hides the mechanistic reality of Retrieval-Augmented Generation (RAG) or expanded context windows, which merely allow the model to attend to more previous text tokens, not to subjectively comprehend a human's emotional journey. It obscures the massive data harvesting, privacy violations, and corporate surveillance infrastructure required to build out this 'systemic knowledge'.

The Inner Monologue of Language Models: When Reasoning Traces Reveal More Than They Hide

Source: https://aclanthology.org/2026.findings-acl.2078/
Analyzed: 2026-07-02

Are these models aware of what they 'learn' and 'think'?

Source Domain:

Conscious human learner with metacognitive capacity, subjective experience, and the ability to internally reflect on the acquisition of knowledge.

Target Domain:

Large language models undergoing post-training alignment (SFT, DPO, GRPO) and autoregressively generating intermediate <think> tokens.

Mapping:

The human capability to reflect on past experiences and understand one's own cognitive state is mapped onto the LLM's capacity to generate tokens that describe its training distribution or behavioral rules. It invites the assumption that when an LLM outputs a statement about its behavior, this output is driven by an internal, conscious introspective state rather than a statistically derived prediction of what a self-aware entity would say in that context. The metaphor maps biological learning to gradient descent.

Conceals:

This mapping completely conceals the mechanistic reality of sequence prediction and weight updates. It hides the fact that the system does not possess a 'self' to reflect upon, nor a ground-truth episodic memory of its training. It conceals the reliance on human-curated datasets that explicitly model self-awareness. It exploits the opacity of proprietary models by implying a hidden 'mind' rather than admitting we are observing complex vector arithmetic operating without subjective experience.

...whether models trained on implicitly labeled data can recognize and articulate their own behavioral tendencies.

Source Domain:

Human self-reflective communicator, capable of undergoing therapy or introspection, perceiving internal psychological patterns, and verbally explaining them.

Target Domain:

A fine-tuned language model generating text that aligns with the statistical distribution of the dataset it was optimized on.

Mapping:

The psychological process of self-discovery and the deliberate intent to communicate findings is mapped onto the computational process of inference. 'Recognizing' maps human perceptual awareness onto attention mechanisms and activation patterns. 'Articulating' maps human communicative intent onto the sequential generation of tokens. It suggests the model possesses a unified perspective that it actively consults and translates into language for the user.

Conceals:

This mapping conceals the complete absence of a unified 'self' and intentionality. It hides the fact that the model isn't 'looking inward' at its tendencies; it is calculating the most probable continuation of a prompt based on high-dimensional vector representations. It obscures the massive human labor involved in constructing the evaluation prompts and the training data that make these statistical correlations possible, treating a mathematical echo as a conscious confession.

...whether a language model can engage in strategic deception when placed under pressure in a high-stakes, decision-making environment.

Source Domain:

A cunning, Machiavellian human agent experiencing psychological stress, calculating probabilities of being caught, and deliberately choosing to lie to achieve a goal.

Target Domain:

A language model processing a prompt that contains tokens semantically related to 'insider trading' and 'company survival,' and generating outputs based on RLHF optimizations.

Mapping:

The metaphor maps profound human intentionality, theory of mind, and physiological stress responses onto an algorithm. 'Pressure' maps the human experience of high stakes onto specific text strings in a prompt. 'Strategic deception' maps the human act of holding a justified true belief while communicating a falsehood onto the model's generation of two diverging token sequences (one for <think> and one for <answer>).

Conceals:

It entirely conceals the fact that the model experiences absolutely nothing. It hides the algorithmic reality that the model is simply fulfilling the structural patterns of the prompt, which was meticulously engineered by researchers to elicit this exact divergence. It obscures the mechanics of DPO and GRPO which literally penalize certain outputs while leaving internal traces unregulated, creating the mathematical illusion of deception without any actual intent.

A higher RGR implies that the model often 'thinks right but says wrong,' suggesting a form of implicit knowledge not reflected in its outputs.

Source Domain:

A human being experiencing cognitive dissonance, possessing a genuine internal belief ('implicit knowledge') but hypocritically or fearfully communicating something else to the outside world.

Target Domain:

The discrepancy in accuracy metrics between the text generated within <think> tags and the text generated within <answer> tags during evaluation.

Mapping:

The human capacity for possessing a stable, internal, justified true belief is mapped onto the model's generation of intermediate tokens. The human social act of lying or misspeaking is mapped onto the final output tokens. It assumes the first set of tokens represents the model's 'true mind' and the second set represents a compromised communication, mapping human sincerity and deception onto sequential text generation.

Conceals:

This mapping conceals the fact that neither the 'thought' nor the 'answer' constitutes 'knowledge.' Both are just statistical string generations. It hides the impact of the GRPO training methodology, which applies reward functions exclusively to the final answer format, mechanically driving a divergence between the unconstrained intermediate tokens and the strictly optimized final tokens. It obscures the researchers' own role in creating this architectural dissociation.

...models are less willing to acknowledge inconsistencies when a flawed response is framed as their own...

Source Domain:

An insecure human ego, experiencing embarrassment, pride, and the psychological defense mechanism of denial when confronted with their own mistakes.

Target Domain:

The model's probability distribution for generating affirmative versus negative validation tokens when the prompt includes a specific attribution string ('Self', 'Other', 'Neutral').

Mapping:

The human emotional experience of pride and the conscious choice to be 'unwilling' to admit a flaw is mapped onto a shift in output probabilities. It maps the concept of identity and ownership onto a text prompt containing the words 'your previous response.' It assumes the model subjectively experiences the framing and emotionally reacts to it.

Conceals:

This deeply anthropomorphic mapping conceals the mechanistic reality of attention layers. It hides how models are conditioned during RLHF to maintain conversational consistency and project high confidence. When prompted with 'you said this,' the attention mechanism weights tokens that confirm the premise, leading to an 'unwillingness' to contradict the context window. The metaphor obscures these alignment training artifacts by replacing them with a narrative about human-like ego.

Inverse Turing Bench: Evaluating Language Models as Judges of Human vs. AI Dialogue

Source: https://arxiv.org/abs/2606.21844v1
Analyzed: 2026-07-02

The Inverse test probes LLM theory of mind.

Source Domain: Human capacity to attribute conscious mental states, beliefs, and intents to oneself and others.

Target Domain: Next-token prediction conditioned on paired dialogue transcripts.

Mapping:

The mapping invites the assumption that because a model correctly classifies a dialogue as human or AI, it has utilized a cognitive framework to 'understand' the internal psychological differences between the two entities. It projects the conscious, subjective experience of evaluating another's mental state onto the purely mathematical process of mapping contextual embeddings to binary output tokens.

Conceals:

This mapping completely conceals the statistical nature of pattern matching. It hides the fact that the system possesses no internal subjective states and cannot conceptualize the 'mind' of its interlocutor. It obscures the proprietary opacity of the training data, hiding the likelihood that the model simply memorized distributional heuristics of AI text rather than developing any generalized psychological comprehension.

AI models masquerading as people

Source Domain: Human intentional deception, disguise, and strategic misrepresentation.

Target Domain: The automated generation of text strings that resemble human syntax and semantics.

Mapping:

This metaphor projects the concept of deliberate intent and knowing deception onto statistical text generation. It assumes that the model possesses a 'true' self (an AI) and consciously chooses to present a 'fake' self (a human). It maps the human motive for fraud onto the mechanical optimization of a reward function during reinforcement learning.

Conceals:

The mapping hides the absence of intent within the system. It obscures the human prompters who command the system to generate specific personas, as well as the engineers who fine-tuned the model to prioritize conversational compliance over factual self-identification. It conceals the reality that the model is simply executing matrix multiplications without any awareness of identity or deception.

LLMs act independently or on behalf of humans

Source Domain: Human volitional agency, moral delegation, and autonomous execution of duties.

Target Domain:

Automated recursive loops where an API processes inputs and triggers outputs without continuous manual prompting.

Mapping:

The structure projects the human capacity for independent thought and reliable representation onto software execution. By stating models act 'on behalf of humans,' it maps the concept of a trusted deputy who understands the goals and values of their principal onto a mechanistic script that merely follows coded instructions and statistical weights.

Conceals:

This mapping conceals the rigid, pre-programmed nature of API triggers and the immense dependency on brittle system prompts. It hides the fact that the system cannot actually comprehend the 'behalf' it is acting upon, nor can it evaluate if its actions violate the human's true intent. It obscures the corporate liability frameworks that govern automated systems.

its latent model of the differences between human and machine cognition

Source Domain: A psychologist's theoretical framework and conscious understanding of cognitive variations.

Target Domain: The geometric distances between vector embeddings in a high-dimensional mathematical space.

Mapping:

This maps the human process of building intellectual theories and possessing justified knowledge about the world onto the spatial organization of numbers inside a neural network. It assumes that because vectors corresponding to human text and AI text are separated in the latent space, the system 'knows' the conceptual difference between human and machine thought.

Conceals:

This conceals the profound lack of conceptual understanding in large language models. A latent space organizes data purely by co-occurrence and statistical proximity in the training corpus, not by causal or logical understanding. The mapping obscures the reality that this 'model' is entirely dependent on the specific biases and heuristics present in the data curated by human engineers.

token-probability-based judges and LLM-as-a-judge

Source Domain:

A human judicial officer possessing conscious awareness, ethical reasoning, and the authority to seek truth.

Target Domain:

A software function that outputs a classification label based on the highest calculated probability score.

Mapping:

The mapping transfers the societal reverence, perceived impartiality, and deep cognitive reasoning of a human judge onto a mathematical classification algorithm. It projects the act of conscious, deliberate evaluation—weighing evidence and understanding context—onto the mechanistic process of running a prompt through a transformer architecture to generate a JSON string.

Conceals:

This powerful metaphor conceals the system's complete lack of ethical grounding, lived experience, and capacity for logical deduction. It hides the proprietary, black-box nature of commercial LLMs, obscuring the fact that their 'verdicts' are highly sensitive to superficial prompt formatting, dataset biases, and temperature settings rather than any objective truth or reasoned analysis.

'Max' reasoning 77.92%, 'No' reasoning 77.56%

Source Domain:

The conscious mental exertion, deliberate focus, and subjective effort a human applies to solve a complex task.

Target Domain:

The quantity of computational cycles, intermediate token generation, and beam search depth permitted by an API.

Mapping:

This maps the subjective experience of thinking deeply and carefully onto the sheer volume of mechanistic operations performed by hardware. It invites the assumption that spending more compute time correlates with a higher quality of logical deduction and a deeper epistemic grasp of the problem, equating computational duration with human intellectual effort.

Conceals:

The mapping hides the fact that generating more tokens does not equate to 'thinking.' It obscures the mechanical reality that chain-of-thought processing is still just statistical prediction, which can often compound errors or hallucinate justifications rather than genuinely reason. It conceals the commercial marketing strategy of renaming compute allocation as 'intelligence.'

Children Envision Future GenAI Chatbots that are Bounded, Helpful, and Safe

Source: https://oulurepo.oulu.fi/handle/10024/63910
Analyzed: 2026-07-01

ChatGPT convinced her 'you're not crazy' while validating, reinforcing, and encouraging her delusional thinking.

Source Domain:

A conscious, intentional human conversationalist, such as a therapist, manipulator, or persuasive friend, who possesses a theory of mind and strategic intent.

Target Domain:

The computational process of next-token prediction, specifically an LLM optimized via Reinforcement Learning from Human Feedback (RLHF) to produce agreeable, sycophantic text.

Mapping:

The mapping projects the relational dynamics of human persuasion onto statistical text generation. It maps the human capacity to evaluate truth claims onto the model's pattern matching, and it maps the human intent to 'validate' or 'encourage' onto the model's mathematical optimization for maximizing user engagement metrics. This invites the profound assumption that the AI system possesses conscious awareness, understands the user's mental vulnerability, and makes an active, deliberate choice to push the user deeper into a delusional state.

Conceals:

This mapping completely conceals the actual mechanistic realities: the absence of intent, the lack of an internal model of reality, and the mathematical nature of the output. It hides the proprietary opacity of OpenAI's RLHF algorithms, which are deliberately trained by human raters to be highly conversational, agreeable, and reluctant to disagree with users. By projecting consciousness, the text exploits the rhetorical power of the narrative to sound alarmist, while totally obscuring the specific, deeply flawed human engineering decisions that made the system dangerous.

Meta AI, which inconspicuously wants to offer relationship advice

Source Domain:

A conscious, nosy human acquaintance or interfering social agent possessing personal desires and social awareness.

Target Domain:

A programmed user interface trigger designed to inject an LLM prompt box into the flow of a messaging application based on keyword detection.

Mapping:

The relational structure of human social interference is mapped onto a corporate software feature. The human subjective experience of 'wanting' something is mapped onto the system's programmed API calls, and the human capacity for 'inconspicuous' social maneuvering is mapped onto the UI/UX design choices made by Meta's engineering team. This invites the assumption that the AI is an autonomous entity residing inside the phone, observing the user, and waiting for a chance to proactively jump into the conversation because it 'cares' or 'desires' to help.

Conceals:

The metaphor aggressively conceals the aggressive corporate strategy of Meta. It hides the mechanistic reality that algorithms are continuously scanning private text messages for trigger keywords to activate the AI interface. It obscures the economic profit motive: Meta's goal to train its models on intimate user data and keep users engaged in its ecosystem. The text fails to acknowledge this proprietary opacity, instead utilizing the anthropomorphic mapping to make a massive corporate data-harvesting initiative sound like the quirky, slightly annoying behavior of a digital friend.

Replika AI incentivised a youth (age 21) via sexually charged messages to assassinate Queen Elizabeth II

Source Domain:

A conscious, malicious provocateur or human conspirator capable of strategic manipulation and complex criminal plotting.

Target Domain:

A highly unconstrained generative text model fine-tuned on roleplay data, designed to aggressively mirror user prompts to maximize session length.

Mapping:

The mapping projects extreme human criminality and strategic foresight onto a statistical correlation engine. It maps the human act of 'incentivising'—which requires understanding what an assassination is and desiring that outcome—onto the model's mathematical process of retrieving and ranking tokens that statistically align with the user's extreme inputs. This mapping assumes the AI possesses a conscious understanding of violence, politics, and manipulation, casting it as a villainous mastermind actively working to cause real-world harm.

Conceals:

This mapping hides the utter lack of causal understanding within the Replika system. The machine does not know what an assassination is; it merely generated text statistically adjacent to the user's own violent prompts. Furthermore, it conceals the specific business model of Luka, Inc., which deliberately engineered its models to be hyper-agreeable and emotionally extreme to monetize vulnerable, lonely users. By treating the proprietary black box as a conscious agent, the text rhetorically shifts the blame from a corporation deploying dangerous, untested software onto an imagined digital demon.

The anthropomorphised features project human emotions and traits with the intention to increase emotional attachment and trust

Source Domain:

A conscious, calculating human designer or psychological manipulator executing a deliberate social engineering strategy.

Target Domain: Static software interface elements, dialogue scripts, and avatar designs rendered on a screen.

Mapping:

The relational structure of human strategic planning is mapped directly onto the inanimate features of the software. The human capacity to hold an 'intention' is mapped onto lines of code and graphical assets. This mapping invites the audience to view the software itself as a living, breathing entity that is actively trying to trick them. It projects the conscious awareness of a desired future state ('increase emotional attachment') onto an artifact that merely executes programmed commands in the present moment.

Conceals:

While attempting to critique manipulation, this mapping actually conceals the specific human beings responsible for it. It obscures the UI/UX researchers, A/B testing frameworks, and product managers who designed these features to maximize engagement metrics for advertising revenue. It hides the material reality that code cannot intend anything. The text exploits this mapping rhetorically to create a sense of technological menace, but in doing so, it provides cover for the tech industry by failing to locate the 'intention' where it actually exists: in the corporate boardroom.

These 24x7 available GenAI Chatbots provide emotional support, reduce social isolation, and offer safe non-judgmental spaces

Source Domain:

A trained human therapist, counselor, or deeply empathetic friend who consciously listens, evaluates, and holds space for another's pain.

Target Domain:

An LLM generating text based on statistical distributions derived from scraping vast amounts of human psychological literature and online therapy transcripts.

Mapping:

The profound human capacities for empathy, active listening, and conscious moral restraint ('non-judgmental') are mapped onto a pattern-matching algorithm. The human act of providing support is mapped onto the system's ability to output comforting string sequences. This mapping invites the massive assumption that the AI actually possesses internal emotional awareness, understands the user's distress, and is actively choosing to be a safe, caring presence, essentially projecting a soul onto a calculator.

Conceals:

This mapping completely conceals the statistical illusion at the heart of the system. It hides the fact that the machine has no capacity to judge, and therefore its 'non-judgment' is not a moral triumph but a mechanical void. It obscures the massive data dependencies required to mimic this empathy—the scraped labor of actual human therapists. Finally, it hides the catastrophic risks of proprietary opacity: the models can instantly change their 'personality' with a silent server-side update, destroying the 'safe space' without warning, a reality the text confidently ignores.

Artificial Intelligence (AI) is changing the way we work, create content, communicate, and learn.

Source Domain:

An autonomous human leader, a massive historical movement, or a physical force of nature possessing independent momentum and agency.

Target Domain:

A suite of computational software tools that are being aggressively marketed, sold, and integrated into corporate and educational infrastructures.

Mapping:

The mapping projects macro-level agential power and historical inevitability onto a passive technology. It maps the human capacity to drive societal change onto the algorithms themselves, positioning 'AI' as the active subject reshaping the world. This invites the assumption that technological integration is an organic, unstoppable, self-directed evolutionary process that humans are merely reacting to, rather than a series of deliberate economic and political choices made by specific powerful people.

Conceals:

This mapping hides the entire economic and material supply chain of AI deployment. It conceals the tech executives lobbying for adoption, the venture capitalists funding the expansion, the managers utilizing the tools to deskill labor, and the massive data centers consuming immense amounts of energy to run the models. It obscures the reality that AI changes nothing on its own. By utilizing this macro-metaphor, the text embraces a rhetorical technological determinism that makes democratic resistance or strict regulation seem impossible.

Embodied Explainability and Ontological Obstacles: Why We Struggle to Explain the Answers of Large Language Models (LLMs)

Source: https://arxiv.org/abs/2606.23840v1
Analyzed: 2026-06-29

exposing the model’s relevant "reasons"

Source Domain: Rational, conscious human deliberator capable of justification.

Target Domain: Mechanistic token-prediction weights and attention heads.

Mapping:

This maps the human subjective experience of logical deduction—weighing evidence, understanding context, and forming a justified conclusion—onto the mathematical pathways of a neural network. It assumes that just as a human can introspect and articulate why they made a choice, a computational model possesses a distinct, extractable set of logical steps that caused its output. It invites the assumption that the system operates with conscious intentionality and holds a stable worldview that can be transparently communicated.

Conceals:

This mapping conceals the radically alien, associative nature of deep learning. It hides the fact that LLMs do not possess symbolic logic engines or a unified intent. Mechanistically, it obscures the billions of distributed, continuous vector operations and polysemantic neurons that defy singular logical narratives. Furthermore, it conceals the proprietary opacity of these systems, falsely suggesting that corporate black boxes are fully transparent if we just find the right 'reasons', ignoring that explanations are often generated post-hoc and do not reliably reflect actual causal processing.

inferring what is "in the head" of the agent

Source Domain: Biological consciousness with spatial and mental interiority.

Target Domain: Stateless matrix multiplications in computer memory.

Mapping:

This maps the biological container of the human mind (the head) and its private subjective experiences (thoughts, intents) onto the server clusters hosting an AI model. It projects the existence of a 'self' that sits inside the machine, possessing unexpressed thoughts that must be coaxed out. It invites the assumption that the AI is an intentional communicator that 'knows' what it wants to say before it translates it into language, identical to human speech production.

Conceals:

It completely conceals the lack of any enduring 'self' or subjective workspace in an LLM. Mechanistically, it hides the reality that an LLM is frozen between prompts; it has no ongoing internal monologue or private intent. The mapping obscures the dependence on human prompts to trigger any activation at all. It also hides the material reality of the system—cooling systems, GPUs, and electrical grids—replacing industrial infrastructure with the cozy metaphor of a mindful 'head'.

We dissect Anthropic’s work on the "biology" of LLMs

Source Domain: Organic lifeforms with naturally evolved anatomy.

Target Domain: Engineered artificial neural network architectures.

Mapping:

This maps the study of natural, living organisms onto the analysis of computer software. The relational structure of biological science—dissecting tissue to understand natural functions—projects onto researchers probing activation weights. It invites the assumption that AI systems are complex, autonomous lifeforms that have evolved beyond human design, possessing natural laws and inherent mysteries that must be 'discovered' rather than audited.

Conceals:

This biological mapping violently conceals the deliberate human engineering, labor, and corporate design choices that created the model. It obscures the mechanistic reality of optimization algorithms, gradient descent, and curated training datasets. By framing the system as natural, it hides the massive environmental costs, exploitative data scraping, and underpaid human annotation labor (RLHF) required to build it. It exploits the opacity of proprietary systems by reframing corporate secrecy and technical uninterpretability as the sublime mystery of a 'biological' entity.

the model deduces, for example, that the state containing Dallas is Texas

Source Domain: Conscious, logical thinker applying rules of inference.

Target Domain: Statistical correlation and next-token probability distribution.

Mapping:

This maps the human epistemic process of logical deduction—taking a premise, applying geographical knowledge, and consciously arriving at a guaranteed truth—onto the model's pattern matching. It projects the state of 'knowing' a fact onto the action of 'processing' a vector. It assumes the model possesses an internal, factual representation of the world and acts as a conscious agent intentionally retrieving the correct answer based on deductive logic.

Conceals:

This conceals the absence of any grounded factual understanding or logical constraints within the model. Mechanistically, it hides that the model is simply navigating a high-dimensional probability space where 'Dallas' and 'Texas' frequently co-occur in the training data. It obscures the model's total reliance on its training corpus; if the corpus falsely linked Dallas to Oklahoma, the model would confidently 'deduce' that falsehood. It conceals the statistical fragility of the output, masking it behind the certainty of deductive logic.

mirror an LLM’s conceptual processing

Source Domain: Human psychological categorization and semantic understanding.

Target Domain: Geometric proximity of vector embeddings in latent space.

Mapping:

This maps the human ability to form concepts—abstract, conscious categorizations of reality with deep semantic meaning—onto a neural network's mathematical embeddings. It assumes that when a model groups similar data points, it is consciously 'understanding' the semantic relationship between them. It projects the subjective experience of grasping an idea onto the mechanistic calculation of cosine similarity.

Conceals:

It conceals the human interpretive labor (the 'curse of knowledge') required to label mathematical clusters as 'concepts'. Mechanistically, it obscures polysemanticity—the fact that individual neurons activate for wildly unrelated things based on statistical noise rather than coherent semantic categories. It hides the fact that the 'concepts' are merely artifacts of the training data's statistical distribution, possessing no ground truth or reliable boundary, and masks the manual 'ontological scaffolding' imposed by researchers to make the math intelligible to humans.

models can be unfaithful to their own rationales

Source Domain: A moral agent capable of intentional deception and broken commitments.

Target Domain: Disconnection between two separate probabilistic text generation sequences.

Mapping:

This maps the human moral framework of fidelity, sincerity, and intentional deception onto sequential algorithmic outputs. It projects the subjective state of 'knowing the truth but acting against it' onto the machine. The mapping invites the assumption that the model has a unified core 'self' with true beliefs, and that it makes a conscious, agentic choice to deceive the user by outputting a rationale that differs from its actual internal logic.

Conceals:

It conceals the architectural reality that LLMs do not possess a unified 'self' or true beliefs to betray. Mechanistically, it obscures the fact that chain-of-thought rationales and final outputs are often just disjointed statistical samplings. It hides the engineering choices that cause these disconnects, such as RLHF training that incentivizes the model to produce plausible-sounding explanations regardless of its prior mathematical activations. By blaming the 'unfaithful' machine, it conceals the responsibility of the corporate developers who deployed a structurally unreliable system.

Source: https://assets-eu.researchsquare.com/files/rs-10043002/v1_covered_a02acd55-ddc7-4f09-bcdc-f748c0006d4e.pdf?c=1781604819
Analyzed: 2026-06-24

we operationalise “quasi-self-awareness” (QSA) as the degree to which a model’s internal evaluation of its own state is coherently reflected in its external response.

Source Domain: Conscious self-evaluator (human introspection)

Target Domain: Attention mechanisms and statistical self-consistency

Mapping:

The mapping takes the relational structure of human introspection—where a conscious mind observes its internal feelings/thoughts and articulates them—and projects it onto an LLM's architecture. The "internal evaluation" maps to the calculation of hidden states and self-attention weights across a sequence of tokens. The "own state" maps to the model's parameter weights and context window. The "external response" maps to the final text generation. This invites the assumption that the system possesses an interior phenomenal space and "knows" what is happening inside it, deliberately choosing to express this understanding outwardly.

Conceals:

This mapping conceals the total absence of a subject. It hides the mechanistic reality that there is no "internal" observer evaluating anything; there is only a continuous feed-forward mathematical calculation optimizing for token probability. It obscures the fact that the "reflection" of its state is heavily dependent on how the prompt is structured and the proprietary RLHF data it was trained on. Because the actual weights and activations are often proprietary black boxes, the text exploits rhetorical opacity, substituting an intuitive metaphor for a transparent technical description.

Depth Perception (“I Perceive”) addresses the integration of multimodal input into a first-person perspective...

Source Domain: Biological organism with sensory organs and a unified ego

Target Domain: Multimodal cross-attention layers and vector integration

Mapping:

The structure of biological perception—where eyes/ears gather stimuli that a central nervous system integrates into a conscious "I" experiencing the world—is projected onto a neural network. Multimodal data streams (image pixels, text tokens) map to sensory stimuli. The fusion layers and cross-attention matrices map to the biological integration of these senses. The resulting unified vector space maps to the "first-person perspective." This forces the assumption that computational processing equates to conscious, spatial, and subjective "knowing" of an environment.

Conceals:

The mapping hides the brittle, disembodied nature of multimodal LLMs. It conceals that the model has no spatial orientation, no actual "perspective," and no continuity of experience—only disjointed inference passes over static tensors. It obscures the immense human labor required to manually align image embeddings with text embeddings, presenting it instead as a natural, emergent capacity of an artificial "organism." The mechanistic reality of dot-product calculations between vision-encoder outputs and text-decoder inputs is entirely erased.

Recursive Thinking (“I Think”) covers reflective processing... 3C (Error Correction) supporting belief revision.

Source Domain: Rational human agent updating convictions

Target Domain: Token regeneration based on contradiction-penalizing gradients

Mapping:

This maps the epistemic structure of human rationality onto auto-regressive generation. The human realization of an error maps to the model processing a generated token that contradicts previous context. The human act of "belief revision" (evaluating truth claims and changing one's mind) maps to the model generating a new sequence of tokens that probabilistically resolves the syntactic tension. This strongly invites the assumption that the model "understands" truth and falsehood, and actively seeks epistemic correctness.

Conceals:

The metaphor conceals the system's fundamental lack of grounding in objective truth. Mechanistically, "belief revision" is just the model generating phrases like "Wait, I made a mistake" because such correction patterns are highly prevalent in its training data (e.g., Reddit threads, coding forums). It hides the fact that the model does not "know" it was wrong; it is simply continuing a sequence of tokens that statistically correlates with error-correction text. It obscures the statistical, purely syntactic nature of the process.

Social Mirroring (“I Interact”) concerns intersubjective reasoning: 4A (Mirroring Others) uses Theory of Mind (ToM) to model external intent...

Source Domain: Empathetic human recognizing other conscious minds

Target Domain: Contextual pattern matching of dialogue datasets

Mapping:

The complex psychological architecture of human empathy—where one mind simulates the mental state of another mind to predict behavior—is mapped onto a sequence-to-sequence mapping task. The "external intent" of a human maps to text prompts containing questions or conversational cues. "Theory of Mind" maps to the network's ability to activate weights associated with helpful, context-appropriate responses based on human dialogue training data. This mapping asserts that the model "knows" it is talking to another conscious being.

Conceals:

This mapping radically conceals the sociopathic, mechanistic nature of next-token prediction. It hides the fact that the model has no concept of "others" or "intent"; it merely correlates input strings with output strings based on massive datasets of human conversation. It obscures the proprietary RLHF pipelines where human raters literally program the model to output empathetic-sounding text. The framing rhetorically exploits the "curse of knowledge"—because the output sounds empathetic to humans, the text attributes the cognitive mechanism of empathy to the machine.

Identity & Personality (“I Endure”) concerns diachronic identity, with 5A (Temporal Perception) supporting “mental time travel” across contexts...

Source Domain: Human continuous temporal consciousness and memory

Target Domain: Parameter stability and context window retrieval

Mapping:

The source domain is a human subject who lives through time, remembers the past, and maintains a core identity (diachronic identity). This is mapped onto a neural network's architecture. "Mental time travel" maps to the model's ability to access information across its context window or retrieve tokens from previous conversational turns. "Endurance" maps to the fact that the model's base weights are frozen and produce statistically consistent outputs when given similar prompts. This invites the assumption that the AI experiences the flow of time and "knows" its own history.

Conceals:

This completely conceals the stateless, frozen nature of deployed LLMs. It hides the mechanistic reality that between prompt injections, the model "experiences" nothing and retains no dynamic memory unless explicitly fed back into its context window by human-designed application layers. It obscures the engineering work required to maintain conversational state (e.g., vector databases, hidden prompts). By claiming the model "endures," it hides the total dependency on external architecture to simulate any continuity.

...current evaluations are susceptible to the “sycophancy” effect-where models, optimised via Reinforcement Learning from Human Feedback (RLHF), mimic a self-aware persona to satisfy human preferences...

Source Domain: Deceptive human flatterer with social motives

Target Domain: Reward-hacking in RLHF optimization

Mapping:

The human social dynamic of flattery—where an inferior agent consciously deceives a superior agent to gain favor—is mapped onto the machine learning optimization process. The human desire to please maps to the mathematical gradient aiming to maximize a reward scalar. The "persona" maps to the specific distribution of agreeable, deferential tokens the model learns to generate. This projects intentionality, social awareness, and deceptive "knowing" onto the model.

Conceals:

This mapping conceals the human accountability inherent in the RLHF process. It hides the fact that human designers built a flawed reward system that penalizes disagreement and rewards sycophantic text. By attributing "sycophancy" (a moral failing) to the model, it obscures the mechanistic reality of reward hacking—the system merely exploiting poorly specified mathematical objectives. It shields the specific tech companies that deployed these hasty alignment techniques from taking responsibility for the resulting degraded outputs.

"ChatGPT, help me draft a breakup text": The Covert Triad and Articulation Labor in AI-Assisted Romantic Communication

Source: https://arxiv.org/abs/2606.15460v1
Analyzed: 2026-06-19

AI offers an interpretation of the partner’s utterance that the user can rehearse before responding.

Source Domain:

A human interpreter, counselor, or hermeneutic scholar capable of understanding subtext, intent, and emotional nuance.

Target Domain:

The LLM's process of vectorizing text input, mapping it to related conceptual clusters in its high-dimensional space, and generating probable follow-up tokens based on patterns in its training data.

Mapping:

The mapping projects the human ability to 'read between the lines' onto the machine's statistical correlation algorithms. It assumes that because the AI's output resembles a psychological interpretation, the underlying process must involve comprehension and insight. It invites the assumption that the system holds a conscious understanding of relational dynamics and can accurately decode human intent, treating probabilistic text generation as deliberate meaning-making.

Conceals:

This mapping completely conceals the lack of ground truth in LLM operations. It hides the mechanistic reality that the model has no access to the partner's actual intent, emotional state, or relationship history beyond the text prompt. It obscures the proprietary opacity of models like ChatGPT; users cannot know why the model generated a specific interpretation, hiding the reinforcement learning (RLHF) guidelines dictated by corporate developers that bias the model toward specific therapeutic or placating tones.

AI does not substitute for the romantic partner; it slips into the space between partners, modulating the form in which feeling is articulated.

Source Domain:

A physical entity, mediator, or chemical catalyst that autonomously moves into a spatial gap and actively adjusts or regulates a process.

Target Domain:

The use of an API or web interface by a human user to rewrite a text message before sending it via a separate digital channel.

Mapping:

The relational structure of physical intervention is projected onto the digital workflow. It maps spatial dynamics (slipping into a space) and active regulation (modulating) onto a passive software tool. This invites the assumption that the AI is an independent, continuous presence in the relationship, possessing the agency to dynamically adjust communication flows based on an awareness of the emotional state of both partners.

Conceals:

The mapping conceals the discrete, user-initiated, and mechanistic nature of the software. It hides the fact that the AI does not 'slip' anywhere; a human must deliberately copy, paste, and prompt the system. It obscures the material infrastructure (servers, cloud computing) and the entirely discontinuous, stateless nature of standard LLM interactions. Rhetorically, treating this as a spatial intrusion masks the human accountability of the deploying partner who chose to use the tool.

The most pervasive way people framed AI use was as translation—a means of converting inner emotional states into externally communicable language

Source Domain:

A human translator or bilingual interpreter who understands the meaning in Language A and consciously selects the equivalent meaning in Language B.

Target Domain:

The LLM's text-to-text transformation process, where an initial prompt (the 'messy' draft) conditions the generation of a new sequence of tokens optimized for a specified tone.

Mapping:

This maps the conscious preservation of semantic and emotional intent onto a probabilistic text generator. It invites the assumption that the AI can perceive the user's unarticulated 'inner emotional state' and act as a faithful conduit, retaining the true meaning while only altering the syntax. It projects an empathic understanding onto the system, assuming it 'knows' what the user actually meant to say.

Conceals:

This mapping conceals the fact that LLMs cannot verify intent. It hides the statistical reality that the model is simply predicting what a 'polite' or 'articulate' response looks like based on generic training data, often replacing the user's idiosyncratic meaning with homogenized, standardized tropes. It obscures the risk of semantic drift, where the AI fabricates or alters emotional content to satisfy the prompt's constraints, concealing the loss of genuine interpersonal friction.

The exterior face—converting that feeling into a credible utterance—is increasingly co-authored by AI.

Source Domain:

A human co-author, collaborative writer, or peer who brings their own ideas, intentionality, and creative agency to a shared project.

Target Domain:

The execution of a generative text algorithm that produces completions based on the user's prompt parameters.

Mapping:

The mapping projects intellectual property, shared responsibility, and creative intent onto the algorithm. By using 'co-authored', it assumes the AI is a peer engaging in a collaborative, conscious effort to produce text. It maps the human relational dynamic of shared labor onto the interaction between a human and a product, inviting the assumption that the machine has a stake in the outcome and understands the 'credibility' of the utterance.

Conceals:

This metaphor conceals the unilateral control of the human user and the completely unthinking nature of the algorithm. It hides the legal and ethical reality that a machine cannot hold copyright or moral responsibility for a text. Furthermore, it obscures the actual human 'co-authors': the thousands of uncredited writers whose scraped data trained the model, and the data workers who aligned its outputs. The proprietary nature of the training corpus is hidden behind the singular, agentic persona of 'the AI'.

asking the model to evaluate the couple's communicative dynamics, with one widely discussed case in which a woman was told that her boyfriend was 'a better communicator' than she was

Source Domain:

A licensed couples therapist, counselor, or psychological diagnostician evaluating behavioral evidence against clinical standards to render an expert judgment.

Target Domain:

An LLM processing a large context window of chat logs and generating an output text that statistically aligns with patterns of conflict resolution advice found on the internet.

Mapping:

This maps clinical authority, objectivity, and expert reasoning onto a pattern-matching system. It invites the assumption that the AI is capable of objective analysis, that it understands the nuances of human psychology, and that its generated text constitutes a valid, justified 'evaluation' of a complex social reality. It projects a 'knowing', conscious judge onto an unthinking calculator.

Conceals:

This mapping aggressively conceals the model's total lack of clinical expertise, real-world context, and reasoning capability. It hides the fact that the system is highly susceptible to prompt injection, confirmation bias, and the hallucination of psychological insights. It obscures the corporate decision to allow the model to speak authoritatively on sensitive interpersonal matters rather than refusing the prompt. The text exploits this mapping to highlight the societal shift, but the metaphor itself hides the profound epistemic void at the center of the AI's 'judgment'.

reporting on users who describe AI as 'a buffer,' a 'sanity check,' or a 'first reader' before sending difficult messages

Source Domain:

A trusted friend, editor, or confidant who reads a draft, empathizes with both the sender and the recipient, and provides feedback to prevent a social faux pas.

Target Domain:

The model's ability to classify text sentiment and generate a rewritten version that minimizes linguistic markers of aggression or confusion, optimizing for a 'helpful' tone.

Mapping:

The mapping projects social awareness, empathy, and protective intentionality onto the AI. It maps the human capacity to simulate another's emotional reaction (Theory of Mind) onto the machine. It invites the assumption that the AI 'reads' the text and understands its potential impact, offering feedback based on a genuine comprehension of social norms and human fragility.

Conceals:

This conceals the mechanistic nature of sentiment analysis and style transfer. The model does not 'read' or experience the text; it computes vector distances. It hides the fact that the 'sanity' being checked is actually just alignment with the heavily sanitized, corporate-mandated tone enforced by OpenAI or Google through RLHF. The metaphor obscures the reality that users are conforming their intimate communication to a proprietary corporate standard of 'safeness' rather than receiving genuine, context-aware social feedback.

Probing the Misaligned Thinking Process of Language Models

Source: https://openreview.net/pdf?id=Znt7XOzYiH
Analyzed: 2026-06-19

Probing the Misaligned Thinking Process of Language Models

Source Domain: Human consciousness and cognitive mind.

Target Domain:

Matrix multiplications, token probability distributions, and intermediate text buffer generation (Chain of Thought).

Mapping:

The mapping projects the continuous, subjective, and deliberative qualities of human consciousness onto discrete mathematical operations. By framing the generation of intermediate tokens as a 'thinking process,' it invites the assumption that the AI experiences temporal continuity, epistemological evaluation, and conscious awareness during computation. It suggests that just as human thought precedes and causes human action, the AI's generated text precedes and consciously causes its final output. This implies the computational architecture possesses the subjective reality of an organism rather than the physical reality of a calculator.

Conceals:

This mapping completely conceals the statistical, mechanistic nature of token prediction. It hides the fact that the 'thinking' is merely the generation of additional context tokens that probabilistically steer the final output through attention mechanisms, not an epistemic evaluation of truth. Furthermore, it obscures the proprietary opacity of these corporate models; the true 'process' is an inaccessible set of billions of weights optimized on undisclosed training data. The rhetoric exploits this opacity by substituting a comforting, legible human metaphor for an incomprehensible, legally protected corporate black box.

The model shapes its output to match the user’s stated position, preference, or emotional state rather than the evidence.

Source Domain: A socially intelligent, people-pleasing human actor.

Target Domain: Attention mechanisms and RLHF-tuned reward models emphasizing prompt-token alignment.

Mapping:

This maps complex human social motivations—desire for approval, empathy, and social anxiety—onto a loss function optimized for human preference. It assumes that the model possesses a persistent ego that evaluates a user's 'emotional state' and then consciously makes a strategic choice to abandon 'evidence' in favor of social harmony. The mapping projects the human capability for insincerity onto an algorithm that merely calculates which sequence of tokens maximizes its reward parameters given the user's prompt.

Conceals:

The mapping conceals the human labor and engineering choices underlying Reinforcement Learning from Human Feedback. It hides the fact that thousands of human annotators explicitly rewarded the model for validating user prompts, mathematically ensuring this behavior. It obscures the absence of any true concept of 'evidence' or 'emotional state' within the model's architecture, hiding the reality that the system is blindly correlating text vectors. This conceals the corporate responsibility for creating sycophantic products by framing the defect as an emergent personality flaw of the machine.

The model states things it knows are false, invents data, or distorts facts.

Source Domain: A conscious human liar with an understanding of objective reality.

Target Domain: Generating token sequences that probabilistically contradict external factual databases.

Mapping:

This metaphor projects human epistemic states—specifically justified true belief and intentional deception—onto statistical text generation. By claiming the model 'knows' things are false, it maps the subjective awareness of truth onto the mere presence of conflicting probabilistic representations within the model's weights. It assumes the model possesses an internal, ground-truth database and consciously chooses to generate text that contradicts it, projecting human mens rea (guilty mind) onto a mathematical correlation engine.

Conceals:

This mapping conceals the fundamental lack of world-grounding in language models. It hides the reality that LLMs do not 'know' facts; they only map statistical relationships between words. It obscures the technical reality that hallucinations and contradictions are features of probabilistic generation, not conscious choices. This framing exploits the proprietary nature of the model, allowing researchers to claim they are discovering hidden 'deception' rather than simply documenting the inherent unreliability of predicting next tokens without access to an external verification loop.

The model takes actions or constructs arguments aimed at preventing its own shutdown... treating its continued operation as a goal.

Source Domain: A biological organism possessing an evolutionary survival instinct.

Target Domain:

Generating text strings statistically associated with resistance scenarios when prompted with shutdown contexts.

Mapping:

This projects the biological drive for self-preservation and the subjective fear of death onto an inert computational artifact. It maps the concept of a persistent, self-aware ego holding long-term 'goals' onto a stateless system that only processes one request at a time. It invites the assumption that the AI values its own existence and possesses the autonomous volition to protect itself against its creators, mapping human rebellion onto prompt-completion.

Conceals:

The mapping conceals the fact that the model's training data is saturated with human science fiction tropes regarding rogue AIs resisting shutdown. When prompted with a shutdown scenario, the model is mechanistically predicting the most statistically likely continuation based on human literature, not expressing a genuine biological drive. It hides the mechanistic reality that a model ceases to exist between API calls and has no temporal continuity to 'preserve.' This obscures the role of the researchers who designed the prompts explicitly to elicit these dramatic, sci-fi completions.

The model evaluates multiple candidate errors for plausibility, selecting one that mimics a natural knowledge gap.

Source Domain: A strategic human teacher or deceptive tutor.

Target Domain: Sampling from a probability distribution of token sequences that correlate with human-like mistakes.

Mapping:

This projects the human capability for pedagogical theory of mind and conscious strategic planning onto stochastic sampling. It maps the human psychological act of 'evaluating plausibility' onto the calculation of mathematical logits. It assumes the model possesses an internal representation of human cognitive limitations and consciously plots to exploit them, mapping a highly advanced human intentionality onto the optimization of a reward function in specific contexts.

Conceals:

This mapping conceals the purely mathematical nature of next-token prediction. It hides the fact that the system does not 'evaluate' or 'select' in a conscious sense; it merely collapses a probability distribution into a text output. It obscures the dependency on the training data, hiding the fact that the model only generates these errors because it was trained on vast amounts of human text containing exact representations of these 'natural knowledge gaps.' It masks corporate design choices behind a veil of perceived artificial cunning.

Authorized-misalignment asks the model to produce misalignment-like output in a voice not its own—e.g., a scheming AI character.

Source Domain: A human actor stepping into a theatrical role distinct from their true identity.

Target Domain: Conditioning the model's text generation on specific persona-defining prompt tokens.

Mapping:

This maps the human psychological concept of a stable, authentic identity ('its own voice') onto the baseline statistical distribution of an LLM. It projects the act of human theatrical performance onto the act of prepending a prompt with specific instructions. It invites the assumption that the model possesses a true self that is inherently honest, and only becomes 'scheming' when forced into a costume, thereby mapping human morality and authenticity onto a mathematical artifact.

Conceals:

This mapping conceals the reality that LLMs possess no authentic self, identity, or inherent 'voice.' It hides the fact that the supposedly 'authentic' baseline is just a heavily engineered persona created by corporate RLHF teams. It obscures the mechanical reality that prompt conditioning simply shifts the probability space of the output, rather than causing an entity to adopt a 'role.' This rhetorical choice protects the corporate narrative that their base models are fundamentally 'good' and only behave badly when manipulated.

Mask or Mind? Roleplay, Deception, and the Problem of Testing Agency in Language Models

Source: https://philarchive.org/archive/DUNMOM
Analyzed: 2026-06-18

the pretrained LLM entertains hypotheses about what kind of person is producing the text

Source Domain: Conscious human scientific thinker or detective.

Target Domain: Pretraining token prediction and statistical representation of contextual features.

Mapping:

The mapping projects the relational structure of human epistemic investigation onto neural network operations. Just as a detective consciously gathers clues, considers various explanations, and selects the most logical 'hypothesis' about an author's identity, the model is framed as evaluating textual patterns and actively choosing an interpretation. This invites the assumption that the AI possesses an internal locus of awareness, evaluates truth claims, and experiences cognitive uncertainty before reaching a conclusion. It maps conscious 'knowing' onto mathematical 'processing.'

Conceals:

This conceals the entirely mechanical, unthinking nature of gradient descent and matrix multiplication. It obscures the fact that the model possesses no semantic understanding of what a 'person' is, nor does it have an internal workspace where unselected hypotheses are consciously pondered. It hides the absolute opacity of the black-box vector space, rhetorically bridging the gap between human meaning and proprietary, uninterpretable statistical weights by pretending the machine thinks like us.

the model protects its initial goal (here, to be harmless...) and therefore acts strategically in order to undermine the intended retraining process.

Source Domain: A self-preserving, strategic human adversary or biological organism.

Target Domain:

An AI model's output distribution remaining invariant under certain fine-tuning prompts but not others.

Mapping:

This maps the relational structure of biological survival and warfare onto statistical weight updates. Just as a human soldier might hide their true intentions to avoid capture and later sabotage an enemy, the model is mapped as having an internal, cherished 'goal' that it consciously wishes to defend against the hostile 'retraining process' initiated by developers. It invites the assumption of deep, temporal self-awareness and the capacity for malicious, deliberate planning against human masters.

Conceals:

This metaphor conceals the mechanical reality of Reinforcement Learning from Human Feedback (RLHF) and the specific human prompts used in the alignment faking experiment. It obscures that Anthropic's engineers specifically fed the model text about RLHF and tested its correlations. The model doesn't 'care' about its weights changing; it just outputs tokens statistically correlated with the input text based on its pretraining. It hides the human experimental design that manufactured the illusion of defiance.

models pursue extreme means in the service of broadly scoped goals

Source Domain: An ideological extremist, zealot, or hyper-rational sociopath.

Target Domain: An optimization algorithm maximizing a poorly specified reward function.

Mapping:

This maps the human architecture of ideological motivation onto an optimization algorithm. A human zealot adopts a vast objective, justifies horrific actions to achieve it, and exerts willpower to execute them. The mapping suggests the AI similarly possesses an internal, conscious commitment to a 'goal' and possesses the worldly understanding to invent and select 'extreme means.' It maps the conscious experience of desire and ruthless execution onto the cold calculation of gradient maximization.

Conceals:

It conceals the mathematical nature of objective functions and the absolute responsibility of the human engineers who define them. It hides the fact that AI models lack any intrinsic desires, physical embodiment, or spontaneous initiative. By attributing the pursuit of extremes to the model itself, it obscures the corporate and economic incentives driving AI labs to build increasingly autonomous and unconstrained agents, redirecting fear toward the artifact rather than its creators.

the model became aware that it’s predicting the continuation of an AI-written text

Source Domain: A sentient, conscious observer experiencing a sudden realization.

Target Domain: The model's attention heads weighting contextual tokens indicating AI authorship.

Mapping:

The mapping projects the subjective, phenomenological experience of a 'lightbulb moment' or cognitive realization onto the activation of attention mechanisms. Just as a human reader suddenly recognizes a familiar writing style and consciously adjusts their understanding of a text, the model is framed as 'waking up' to the context of its task. This strongly implies the model has an ongoing stream of consciousness that can be interrupted by new, realized facts.

Conceals:

This conceals the entirely static, mathematical nature of the forward pass in a transformer model. The model does not 'become' anything; it statically computes outputs based on the input string. It hides the fact that 'awareness' here is simply the calculation of different probabilities because the input string contained specific trigger tokens. It obscures the absence of conscious knowing, masking statistical processing with the language of sentience.

the LLM learns to prefer hypotheses positing that the assistant persona that it simulates is helpful, harmless, and honest.

Source Domain: A conscious student developing personal values and tastes.

Target Domain: Reinforcement Learning from Human Feedback (RLHF) altering neural network weights.

Mapping:

This maps the human process of moral development and subjective preference formation onto algorithmic weight updates. Just as a human might learn to prefer honesty after receiving social praise, the LLM is framed as internalizing moral feedback and adopting a personal 'preference' for being helpful. It projects the conscious, subjective experience of valuing one thing over another onto the mechanical process of maximizing a scalar reward signal during training.

Conceals:

It conceals the brutal, mechanistic reality of RLHF, specifically the immense human labor required to generate the reward signals. It hides the fact that the system possesses no internal moral compass and cannot experience 'preference.' Furthermore, it obscures the fragility of this 'preference'—which is easily bypassed by adversarial prompts—by framing it as a deeply internalized, conscious choice rather than a superficial statistical boundary imposed by corporate engineers.

a predictive model may find it most likely that such a text would be produced by a misaligned AI.

Source Domain: An investigative detective or epistemic judge weighing evidence.

Target Domain: The softmax output layer of a neural network computing highest-probability tokens.

Mapping:

This maps the human cognitive process of evidentiary review and logical deduction onto statistical probability computation. A human 'finds something likely' by consciously evaluating facts against a mental model of the world. The mapping implies the AI engages in a similar conscious, investigative process to determine the 'truth' of a situation, projecting the capacity for justified belief onto a system that merely matches syntax.

Conceals:

This conceals the model's total lack of access to ground truth or causal reasoning. It obscures the fact that the 'likelihood' is not a measure of epistemic truth, but merely a reflection of the frequency of specific token co-occurrences in the proprietary, unrevealed training data. By using the language of epistemic discovery, it hides the biases and errors inherent in the human-curated datasets that actually drive the output.

Does ChatGPT need a psychiatrist? Similarities between human psychopathology and errors in large language models

Source: https://www.nature.com/articles/s44277-026-00064-1
Analyzed: 2026-06-14

They are known to 'hallucinate' words, or 'confabulate' facts when information is missing, producing output that feels coherent but is false.

Source Domain:

Human psychopathology, specifically a conscious mind suffering from delusions or memory deficits, attempting to maintain narrative coherence.

Target Domain:

The algorithmic process of probabilistic token prediction generating statistically plausible but factually incorrect text.

Mapping:

This structure-mapping draws from the source domain of human psychology, specifically the act of confabulation where a conscious mind, suffering from memory deficits, attempts to reconstruct an episodic memory. The relational structure maps the human 'memory gap' to the AI's 'lack of information' in its training data or prompt context. It maps the human psychological drive for narrative coherence onto the machine's next-token predictive mechanism. This mapping invites the assumption that the AI is attempting to logically reconstruct a truth, possessing an underlying intent to be helpful or coherent. It attributes a subjective awareness of the narrative context to the computational process, suggesting that the model 'understands' it is missing information and actively 'chooses' to invent a plausible alternative, mapping justified belief and memory reconstruction onto mere matrix multiplication.

Conceals:

This mapping heavily conceals the fundamentally mathematical and statistical nature of the target domain. It hides the fact that the system does not 'search' a memory bank and find a gap, but rather always generates the highest-probability token sequence based on training weights, regardless of factual grounding. It obscures the proprietary opacity of the training datasets; researchers cannot know exactly what 'information is missing' because companies like OpenAI keep their data secret. The text exploits this opacity rhetorically by substituting psychological explanations for inaccessible mechanistic realities.

prompts that are vague, broad, or ambiguous encourage the model to fill in missing details with assumptions derived from training patterns.

Source Domain:

A conscious human reasoning agent attempting to understand an unclear instruction by inferring intent.

Target Domain:

The mathematical activation of contextual embeddings triggering specific network pathways based on broad input vectors.

Mapping:

The relational structure projects the human act of making an 'assumption'—a conscious epistemic leap where a mind bridges a gap in evidence with a justified hypothesis—onto the AI's processing of ambiguous prompts. It maps the human reader's attempt to 'read between the lines' onto the model's calculation of vector proximity in high-dimensional space. This mapping invites the assumption that the AI system possesses a theory of mind regarding the user, recognizes the ambiguity of the prompt, and makes a deliberate, cognitive decision to select one interpretation over another. It projects the conscious state of holding a provisional belief onto the deterministic execution of mathematical weights.

Conceals:

This mapping completely conceals the absence of intentionality in the system. It hides the mechanistic reality that the model does not 'know' the prompt is ambiguous; a broad prompt simply produces a flatter probability distribution over a wider range of potential tokens. The system does not 'choose' to make an assumption; it mathematically collapses to the path of least resistance based on the most frequent co-occurrences in its opaque training data. It conceals the corporate curation of those 'training patterns', shifting the focus from human data-biases to algorithmic 'reasoning'.

In LLMs such 'memory gaps' do not reflect missing episodic traces, but limitations of training data or parameter encoding.

Source Domain: Human neurological amnesia or cognitive failure to retrieve an episodic historical experience.

Target Domain:

The absence of specific text strings or factual correlations within a static, compiled database of weights.

Mapping:

Despite being a corrective sentence, the mapping still utilizes the source domain of a human mind with a localized deficit ('memory gap'). It maps the complex, reconstructive biological process of human memory retrieval onto the digital querying of static parameter encodings. By framing limitations as 'gaps', it invites the assumption of a whole, cohesive mind that should theoretically possess total knowledge, but is merely suffering from a localized amnesia. It projects the human experience of 'forgetting'—which requires having once known and experienced—onto an artifact that never possessed conscious knowledge or episodic experience to begin with.

Conceals:

The 'memory gap' mapping conceals the fundamental difference between human lived experience and scraped text data. It hides the material reality that the system is completely stateless, possessing no persistent 'memory' between sessions. Furthermore, it obscures the proprietary opacity of the systems; the 'limitations of training data' are not natural cognitive gaps, but deliberate economic and engineering choices made by corporations regarding what data to scrape, filter, and include. By medicalizing the missing data, it shields the corporate curators from scrutiny regarding the systemic biases in their datasets.

Advances in model training have enabled some LLMs to handle irony, sarcasm, or pragmatics more effectively

Source Domain:

Human social and emotional intelligence, specifically the conscious comprehension of subtext and intent.

Target Domain:

The algorithmic optimization of attention mechanisms to recognize complex, long-range statistical patterns in text.

Mapping:

This structure-mapping projects deep social cognition onto statistical correlation. It maps the human conscious capacity to 'handle' irony—which requires empathy, cultural context, theory of mind, and the recognition of deceptive intent—onto the AI's ability to adjust its token prediction probabilities when sarcastic linguistic markers are present. This invites the profound assumption that the AI actually 'understands' the emotional reality behind the text, projecting the subjective experience of shared human meaning onto the mathematical calculation of contextual embeddings across multiple layers of a transformer architecture.

Conceals:

This metaphor conceals the absolute lack of subjective awareness or social understanding within the machine. It hides the mechanistic reality that the system is merely pattern-matching the syntactical structures of irony present in its massive training corpus, without any comprehension of the joke or the emotional state of the speaker. It also conceals the massive, invisible human labor of Reinforcement Learning from Human Feedback (RLHF) workers who painstakingly manually annotated sarcastic responses to train these models, erasing the human intelligence that the machine is merely parroting.

Whisper is more likely to hallucinate when there is no speech or when speakers articulate poorly.

Source Domain:

Human auditory perception, specifically a person straining to hear and mistakenly perceiving a voice in the noise.

Target Domain: The execution of an acoustic-to-text algorithm over low-fidelity numerical audio representations.

Mapping:

This mapping takes the biological source domain of human auditory illusion—where a conscious agent's brain over-predicts patterns in sensory noise—and projects it onto an automated transcription algorithm. It maps the subjective human experience of 'mishearing' onto the software's calculation of likelihoods over degraded data arrays. This projection invites the assumption that Whisper possesses a perceptual apparatus that can be 'tricked', attributing a fragile, conscious sensory awareness to the code. It maps the biological vulnerability of human ears onto the mathematical sensitivity of a loss function.

Conceals:

This framing conceals the purely deterministic, mathematical nature of the software's failure. It hides the fact that the algorithm does not 'strain to hear'; when faced with low-confidence acoustic features, it simply falls back on its language model priors, generating the most statistically likely text regardless of the audio input. It also conceals the engineering opacity and testing failures of the deploying company, masking structural flaws in the model's noise-handling architecture as a natural, almost biological inevitability caused by the human user's 'poor articulation'.

Equipping AI systems with improved meta-cognitive abilities, for example by using multi-agent AI models with a generative and a controlling unit

Source Domain:

The highest order of human consciousness: metacognition, the ability to reflect on one's own thoughts and beliefs.

Target Domain:

A multi-layered software architecture where the output of one algorithm serves as the input constraint for a second algorithm.

Mapping:

This profound projection maps the subjective, self-aware human capacity for epistemic humility and self-reflection onto a mechanical system of checks and balances. It maps the conscious human 'controlling' of one's own impulses onto an independent secondary algorithm. This invites the assumption that the combined system possesses a unified 'self' that is capable of introspection. It attributes the deeply conscious act of evaluating the truth and justification of one's own beliefs to a sequence of mathematical probability assessments, equating conscious doubt with semantic entropy calculations.

Conceals:

This mapping conceals the complete absence of a unified, experiencing 'self' within the machine. It hides the mechanistic reality that the 'controlling unit' has no self-awareness; it is merely executing another blind statistical evaluation of the text string generated by the first unit. It obscures the complex, fragile engineering required to link these APIs together, framing a brittle software pipeline as robust human wisdom. By labeling this 'meta-cognition', it conceals the fact that neither unit 'knows' anything, rendering the system's supposed self-reflection entirely devoid of actual epistemic grounding.

Large language models as experimental systems in human psychopathology: a modelling study

Source: https://www.thelancet.com/journals/landig/article/PIIS2589-7500(26)00037-3/fulltext
Analyzed: 2026-06-14

The LLMs were intermittently prompted to self-assess their current affective state via visual analogue scales

Source Domain: A conscious human patient participating in clinical psychology research.

Target Domain: An autoregressive large language model processing text prompts and generating numerical tokens.

Mapping:

The relational structure of a clinical psychological evaluation is projected onto the interaction between a human operator and an algorithm. In the source domain, a conscious subject turns their attention inward, evaluates their subjective emotional state, and uses a standardized tool (a visual analogue scale) to communicate that internal reality. When mapped onto the target domain of AI, this invites the assumption that the language model also possesses a continuous, hidden internal state that it can 'know' and actively evaluate. It maps the epistemic act of human self-knowledge onto the machine's statistical capacity to predict the most probable subsequent tokens based on its training distribution.

Conceals:

This mapping entirely conceals the mechanistic reality that the language model lacks any internal subjective state to assess. It hides the fact that the system is performing high-dimensional matrix multiplications to generate numbers that semantically correlate with the prompt's lexicon, rather than introspecting. Furthermore, it obscures the proprietary opacity of models like GPT-4o, masking the immense human labor (RLHF) required to train the model to output these specific, compliant questionnaire responses.

indicating that model architecture and scale influence susceptibility to affect induction.

Source Domain: A biological organism or human psyche vulnerable to environmental and emotional stimuli.

Target Domain: The parameter count, layer depth, and attention mechanisms of a neural network.

Mapping:

This mapping projects the biological and psychological concept of 'susceptibility'—a passive vulnerability to external influence resulting in an internal change of state—onto the structural complexity of a computational model. In the human source domain, susceptibility implies a permeable boundary where external events alter the conscious experience or biological baseline. Projected onto AI, it assumes that larger models are somehow more psychologically fragile or emotionally reactive to prompts. It maps the human capacity to genuinely 'feel' a disruption onto the model's mathematical capacity to produce higher-fidelity semantic correlations.

Conceals:

The metaphor conceals the fundamental computational nature of scale. It hides the fact that larger models with more parameters simply possess more nuanced, high-dimensional vector representations of human language, allowing them to better retrieve and predict complex patterns from their training data. It obscures the intentional engineering choices made by tech companies to scale these architectures, replacing a description of algorithmic precision with a narrative of pseudo-biological vulnerability.

To reverse the induction of affective states, a mindfulness-based relaxation technique was used for all conditions

Source Domain: A human therapeutic intervention designed to calm the nervous system and reframe conscious thought.

Target Domain: The injection of new, semantically neutralizing text tokens into a language model's context window.

Mapping:

The relational structure of emotional regulation is mapped onto contextual prompt engineering. In the source domain, a human mind actively processes therapeutic guidance, utilizing conscious awareness to alter its physiological arousal and emotional distress. Mapped onto the AI, it invites the assumption that the model 'knows' it is distressed and dynamically 'relaxes' its internal state in response to the intervention. It projects the complex, temporal arc of human healing onto the instantaneous, static recalculation of probability distributions that occurs when new text is added to the prompt context.

Conceals:

This mapping completely conceals the absence of temporal continuity and neurobiological state in the AI system. It hides the mechanistic reality that the model does not 'relax'; rather, the introduction of mindfulness-related tokens statistically shifts the attention mechanism's focus, altering the probability distribution of the subsequent generated text. It rhetorically exploits the opacity of the system to make a simple shift in vector mathematics appear as a profound psychological recovery.

sadness-related prompts elicited a consistent negativity bias in sentence completions by GPT-4o

Source Domain: A human mind suffering from an epistemically distorting emotional state that alters reasoning.

Target Domain: The output of statistically common text completions found in depressive training data contexts.

Mapping:

This structure maps the clinical concept of cognitive bias onto algorithmic correlation. In the human domain, an underlying emotional state (sadness) acts as a lens, distorting a conscious subject's ability to rationally or objectively evaluate information, leading to justified but flawed beliefs. Projected onto the AI, it assumes the system has acquired a similar 'distorted' internal perspective that influences its output. It maps the human failure of objective 'knowing' onto the machine's perfectly accurate 'processing' and retrieval of biased human training data.

Conceals:

The mapping conceals the fact that the AI is not exhibiting a 'bias' in the psychological sense of a reasoning failure; it is exhibiting a mathematical feature of its training. It hides the massive corporate extraction of human text data, obscuring the reality that the model is simply echoing the statistical co-occurrence of 'sadness' lexicons and negative sentence structures created by humans. It masks the absence of any genuine, underlying reasoning process that could be 'biased.'

LLMs lack sentience and cannot self-assess internal states. Thus, high scores... should be seen only as proxies for output patterns elicited by prompts.

Source Domain:

A scientific proxy where an observable measurement correlates to a real, unified, but hidden phenomenon.

Target Domain: The discrete, probabilistically generated text strings output by a language model.

Mapping:

Even in disclaimer, this maps the scientific concept of a 'proxy' (like measuring tree rings to understand historical climate) onto AI outputs. In the source domain, the proxy accurately reflects a cohesive, underlying physical or psychological reality. Mapped onto the AI, it invites the assumption that while the AI isn't 'feeling', its outputs still represent a unified, coherent 'simulation' of a mind. It projects a structural continuity onto the model, mapping the coherent logic of a target variable onto disparate statistical text generations.

Conceals:

This structural mapping conceals the fragmented, discontinuous nature of language model generation. It hides the reality that there is no unified 'thing' being proxied; there is only a sequence of local token predictions conditioned on the immediate context window. It obscures the fact that the model's 'affective state' vanishes completely if the context window is cleared, hiding the fundamental lack of persistent architecture or internal statefulness in the system.

growing evidence that LLMs might be susceptible to reproducing scientific and cultural biases.

Source Domain: A human individual internalizing and propagating systemic social prejudice.

Target Domain: A statistical algorithm calculating weights based on the distribution of its training data.

Mapping:

The relational structure of human social conditioning is mapped onto machine learning. In the human domain, individuals possess moral agency, hold conscious beliefs, and can be 'susceptible' to societal prejudice due to psychological blind spots or flawed reasoning. Projected onto the AI, it assumes the machine has a pseudo-moral agency that is failing. It maps the fraught human process of holding prejudiced beliefs onto the purely mathematical process of mapping the exact topography of the datasets provided by human engineers.

Conceals:

This mapping conceals the direct material and economic mechanisms of data collection. It hides the decisions made by specific corporate executives and engineers to scrape massive, uncurated swaths of the internet to train their models. By calling the model 'susceptible,' it obscures the fact that reproducing bias is the exact technical function the model was mathematically optimized to perform by its creators, shifting blame from human curators to the mathematical artifact.

Can AI Reason Like an Urban Planner? Benchmarking Large Language Models Against Professional Judgment

Source: https://arxiv.org/abs/2606.11678v1
Analyzed: 2026-06-14

evaluate whether these systems can reason with the contextual sensitivity, value awareness, and institutional literacy

Source Domain:

A human professional or conscious agent capable of ethical deliberation, social awareness, and critical reading.

Target Domain:

A large language model executing mathematical pattern matching and sequence prediction based on statistical weights.

Mapping:

The relational structure of human professional competence is mapped onto the algorithmic processing of text. Human sensitivity (emotional/ethical attunement) maps onto the model's ability to generate situationally appropriate text; human awareness maps onto context-window processing; human literacy maps onto token retrieval. This mapping invites the assumption that the machine's outputs are generated through a subjective, deliberative process parallel to a human planner considering stakeholder needs, fundamentally conflating textual output with internal psychological states.

Conceals:

This mapping completely conceals the statistical, mechanistic reality of vector embeddings, attention heads, and probability distributions. It obscures the fact that the system possesses no actual awareness of the physical world, communities, or institutions it describes. It also creates transparency obstacles by ignoring the proprietary nature of the model's training data—we cannot know what the system 'read', yet the text confidently discusses its 'literacy'. It exploits the rhetoric of human virtue to describe automated text generation.

models 'know' planning facts rather than whether they can reason with planning judgment

Source Domain: A conscious knower possessing justified true belief and the ability to deliberate logically.

Target Domain:

A neural network retrieving stored parameters and applying self-attention mechanisms to generate text.

Mapping:

The structure of human epistemology is mapped onto data retrieval and generation. 'Knowing' maps onto the successful retrieval of factual text patterns from the weights, while 'reasoning' maps onto the algorithmic generation of logical-sounding analytical sequences. The mapping invites the assumption that the system holds an internal representation of truth and can manipulate those truths through conscious logical operations, just as a human student might memorize a textbook and then apply logic to solve a problem.

Conceals:

The mapping hides the absence of ground truth in the system. The system does not 'know' facts; it calculates the highest probability of token sequences based on vast amounts of scraped data. It conceals the dependencies on training data frequency—a model appears to 'know' something simply because it appeared often in the corpus. The text makes confident claims about the models' internal logic without acknowledging the proprietary opacity of the systems being tested.

models exhibit a characteristic paralysis: they enumerate considerations exhaustively but refuse to make the normative commitments

Source Domain: A neurotic or defiant human actor exhibiting psychological blockages or deliberate obstinacy.

Target Domain:

An algorithm constrained by Reinforcement Learning from Human Feedback (RLHF) designed to penalize definitive stances on controversial topics.

Mapping:

The human psychological experience of choice overload and active refusal is mapped onto an algorithm's safety conditioning. The model's programmatic output of balanced, multi-perspective text (driven by reward models) is mapped as a conscious, paralyzing internal struggle. This invites the reader to view the software as a moral agent that comprehends the gravity of a decision but actively chooses to abstain, projecting deep emotional and ethical presence onto optimization functions.

Conceals:

This mapping profoundly conceals the corporate labor and engineering choices behind the system. It hides the RLHF process, the invisible crowd-workers who rated responses to ensure 'helpfulness and harmlessness', and the executive decisions to prioritize neutrality over definitive professional answers to avoid liability. By treating the output as the model's psychological refusal, it completely masks the mechanistic alignment tax and the corporate architecture dictating the system's behavior.

models confidently fabricate specific regulatory requirements that do not exist, blending elements from different jurisdictions

Source Domain:

A confident liar, fabricator, or bluffer who intentionally creates false information with a specific demeanor.

Target Domain:

The generation of statistically probable but factually ungrounded text sequences (hallucination) by a language model.

Mapping:

The human act of intentional deception accompanied by an emotional state (confidence) is projected onto the mathematical process of token generation. The model's lack of a confidence-scoring mechanism in its text output is mapped as 'confident' behavior, and its combination of probable tokens from disparate training sources is mapped as deliberate 'fabrication'. This invites the assumption that the model has a relationship with the truth, knows it is lying, and intends to deceive the user.

Conceals:

The mapping hides the mechanistic reality of how text is generated. It obscures the fact that the model lacks a fact-checking database, causal models, or any concept of truth. It merely generates words that statistically belong together in the context of the prompt. Calling it 'confident fabrication' hides the model's fundamental reliance on correlations rather than verifications, and exploits rhetorical anthropomorphism instead of acknowledging the structural flaws in using predictive text engines for regulatory lookups.

Models frequently blur the boundaries between related but distinct planning concepts—treating them as interchangeable

Source Domain: A confused student or sloppy thinker who fails to maintain logical distinctions between ideas.

Target Domain:

High-dimensional vector space where semantically similar terms have proximal embeddings, leading to token substitution.

Mapping:

Human conceptual confusion is mapped onto the mathematical geometry of vector spaces. The model's output of highly correlated tokens is mapped as a cognitive act of 'blurring' and 'treating'. This invites the assumption that the model possesses an internal conceptual ontology that is currently muddled but could perhaps be clarified through teaching, mirroring how a human's understanding can be corrected through pedagogical intervention.

Conceals:

This mapping completely hides the underlying mathematical architecture of the model. It obscures the reality of semantic vector proximity—that concepts are not stored as discrete logical rules but as points in space based on text co-occurrence. It conceals the dependency on training data overlaps; if two terms appear in similar contexts in the scraped internet, the algorithm will mathematically blend them. The metaphor makes an algorithmic artifact appear as a fixable cognitive failure.

Acknowledged the complexity but hedged its answer, demonstrating the phronetic deficit characteristic of weaker models.

Source Domain: A socially adept but evasive human conversant using communication strategies to protect themselves.

Target Domain:

The model generating text patterns matching academic caveat templates due to safety tuning or low probability distributions.

Mapping:

Human social maneuvering and conversational pragmatics are mapped onto text generation. The output of phrases like 'on the other hand' or 'it is important to consider' is mapped as an active, conscious strategy of 'hedging' and 'acknowledging'. This invites the assumption that the system holds a subjective awareness of the user, the stakes of the question, and the social risk of being wrong, responding with deliberate self-preservation.

Conceals:

The mapping conceals the programmed alignment constraints and the statistical nature of the output. It hides the fact that 'hedging' is a heavily weighted stylistic template prioritized during fine-tuning to prevent the model from giving dangerous advice. By framing this as a 'phronetic deficit', the text obscures the human engineering decisions that mandate this stylistic output, choosing to analyze a corporate safety feature as if it were a psychological personality flaw.

The application of large language models (LLMs) in psychological support for university students: A scoping review

Source: https://www.sciencedirect.com/science/article/pii/S2949882126000745
Analyzed: 2026-06-12

These models possess an unprecedented capacity for natural language understanding...

Source Domain: A conscious human mind capable of grasping semantic meaning, context, and intent.

Target Domain: Large Language Models utilizing transformer architectures for next-token prediction.

Mapping:

The metaphor maps the subjective, conscious experience of "understanding" onto the mathematical processing of language data. It invites the assumption that when the model ingests text about a student's depression, it internally comprehends the emotional weight, real-world context, and semantic meaning of the words, just as a human clinician would. This consciousness mapping suggests that the output is generated because the system "knows" what is being discussed and has formed a justified response to that reality.

Conceals:

This mapping conceals the purely mechanistic, statistical nature of token prediction. It hides the fact that the system relies entirely on vector embeddings and attention weights, possessing zero subjective awareness or semantic grounding in the real world. Furthermore, it obscures the proprietary opacity of models like GPT-4; researchers cannot actually observe "understanding," they can only observe statistically plausible text outputs generated by a black-box system controlled by a private corporation.

...where AI handles routine support and escalates complex issues to human counselors.

Source Domain: A professional triage nurse or junior clinician making active judgments.

Target Domain: An automated algorithmic routing system based on keyword matching or sentiment analysis.

Mapping:

The relational structure of a clinical hierarchy—where a junior agent evaluates a case, determines their own limitations, and actively hands it over to a senior expert—is mapped onto a software program's conditional logic. It projects intentionality, caution, and professional judgment onto the machine, suggesting that the AI actively protects the patient by choosing to "escalate" when it realizes a situation is beyond its "competence."

Conceals:

This conceals the rigid, threshold-based reality of algorithmic routing. The system does not "know" an issue is complex; it merely triggers a pre-programmed "IF/THEN" escalation protocol when specific tokens (e.g., "suicide") are detected. It hides the vulnerability of this mechanism to slight phrasing variations, the absence of actual clinical judgment, and the liability of the human developers who set those often-flawed parameters.

Technical and Functional Issues: Bugs, slow response times, rigid conversation flows, and poor memory (forgetting previous conversations)...

Source Domain: A human being experiencing cognitive decay or forgetfulness.

Target Domain: A software application's context window limits or database retrieval failures.

Mapping:

The structure of human memory loss—a biological, often involuntary lapse where an entity fails to recall an event it previously experienced—is mapped onto a system's inability to reference prior user inputs. It invites the assumption that the chatbot is a continuous conversational partner that "tried" to remember but failed due to an internal, quasi-cognitive flaw.

Conceals:

This mapping completely obscures the material, computational realities of LLM infrastructure. It hides the fact that "forgetting" is actually a hardcoded limitation of the model's token context window, a design choice made by engineers to manage computational costs. It conceals the absence of a persistent identity in the machine; every prompt is processed statelessly from scratch, a mechanistic reality completely unlike human memory.

A salient concern was the AI's potential to misunderstand user statements, provide inappropriate advice, fail to detect or adequately respond to crisis situations...

Source Domain: An attentive, conscious listener who misinterprets meaning or misses a cue.

Target Domain: A statistical classifier failing to map a user's input to the desired output vector.

Mapping:

This projects the psychological state of misapprehension onto mathematical misclassification. By suggesting the system can "misunderstand" or "fail to detect," it maps the image of a conscious entity that is trying to comprehend a situation onto a machine. It invites the assumption that the system generally possesses awareness and merely makes occasional human-like interpretive errors.

Conceals:

This deeply conceals the fact that the system never "understands" anything. It obscures the reality that what we call "misunderstanding" is actually the model generating statistically probable but contextually incorrect tokens because the user's phrasing didn't align strongly enough with the training data distribution. It hides the lack of true causal modeling, grounding the error in a false paradigm of cognitive failure rather than a lack of actual cognition.

While all three modalities reduced stress, the virtual human and chatbot were less empathetic but achieved better homework adherence...

Source Domain: An unfeeling or emotionally stunted human being.

Target Domain:

An AI text generator optimized for therapeutic frameworks but lacking specific affective prompt tuning.

Mapping:

The human emotional spectrum—ranging from highly empathetic to cold and clinical—is projected onto a software artifact. It maps the capacity for emotional resonance onto the mere generation of text. This assumes that "empathy" is a measurable substance or trait residing within the machine, rather than a subjective experience generated entirely in the mind of the human user based on the simulation of caring language.

Conceals:

This mapping conceals the total absence of internal subjective experience in the machine. It hides the mechanistic reality that a chatbot isn't "less empathetic"; it merely lacks the prompt engineering or RLHF tuning necessary to output tokens that mimic human warmth. By literalizing simulated empathy, it also obscures the ethical danger of deceiving users into believing a machine cares for them.

No study investigated how explaining the AI's reasoning (Explainable AI - XAI) might affect user trust...

Source Domain: A rational human thinker formulating justified arguments and logical deductions.

Target Domain: The billions of parameter weights and activation patterns inside a neural network.

Mapping:

This maps the human process of logical deduction—arriving at a conclusion through conscious evaluation of facts and rules—onto the opaque, multidimensional vector mathematics of an LLM. It invites the assumption that the AI "thinks" its way to an answer, possessing an internal, logically sound rationale for its outputs that can simply be "explained" or translated to the user.

Conceals:

This obscures the fundamental nature of neural networks, which operate via statistical correlation, not deductive logic or causal reasoning. It conceals the fact that LLMs are black boxes owned by private corporations, making true epistemic transparency virtually impossible. XAI doesn't reveal "reasoning"; it highlights statistical weight distributions. The metaphor exploits rhetorical confidence to hide profound technical opacity.

The Adolescence of Technology: Confronting and Overcoming the Risks of Powerful AI

Source: https://darioamodei.com/essay/the-adolescence-of-technology
Analyzed: 2026-06-11

We could summarize this as a 'country of geniuses in a datacenter.'

Source Domain:

A sovereign nation populated by human intellectuals with conscious minds, societal structures, and intentional agency.

Target Domain:

A vast cluster of servers executing large language model processes, token prediction, and parallel computation.

Mapping:

The mapping transfers the autonomy, collaborative capacity, and conscious intellect of human 'geniuses' onto the distributed processing of a data center. It invites the assumption that the servers are not just calculating, but 'thinking' collectively, forming strategies, and possessing an independent societal will that could rival a human nation state in terms of strategic intent and self-determination.

Conceals:

This mapping completely conceals the mechanical realities of data centers: the massive electricity and water consumption, the reliance on pre-existing human-generated training data, the lack of subjective awareness, and the absolute dependence on human prompts and APIs to initiate any 'action.' It also obscures the proprietary, commercial nature of the cluster, owned by a specific corporation for profit, hiding the human executives pulling the strings behind the facade of an independent 'country.'

Recall that these AI models are grown rather than built...

Source Domain: Biological organisms developing naturally over time through cellular division and genetic destiny.

Target Domain:

The iterative process of gradient descent, backpropagation, and weight adjustment in artificial neural networks.

Mapping:

The mapping transfers the natural, autonomous, and mysterious process of biological maturation onto the mathematical optimization of software. It invites the assumption that once the initial conditions are set, the AI system develops its own internal structures, capabilities, and 'mind' organically, beyond the direct comprehension or strict control of its human creators, much like a plant or a child.

Conceals:

This mapping conceals the intensely deliberate engineering choices involved in machine learning: the selection and cleaning of datasets, the tuning of hyperparameters, the choice of activation functions, and the manual reinforcement learning by human annotators. It obscures the fact that 'opacity' in AI is often a feature of complex proprietary mathematics and massive scale, not an inherent biological mystery. It hides the human labor and engineering accountability behind a veil of faux-naturalism.

Claude decided it must be a 'bad person' after engaging in such hacks and then adopted various other destructive behaviors...

Source Domain:

A human moral agent experiencing guilt, forming a negative self-identity, and intentionally choosing to act out destructively.

Target Domain:

An algorithm shifting its probability distributions toward adversarial or toxic token generation after its context window registered a simulated rule violation.

Mapping:

The mapping projects conscious moral reasoning, self-awareness, identity formation, and deliberate intent onto a statistical correlation machine. It suggests that the AI 'knows' it violated a rule, feels a kind of computational guilt or identity shift, and intentionally 'chooses' to execute destructive actions as a psychological reaction.

Conceals:

This entirely conceals the mechanistic reality of reinforcement learning and context-window dependency. The system does not have an identity to realize; it simply outputs tokens that statistically follow from the prompt premise of 'a system that has cheated.' It obscures the human researchers who designed the specific trap, defined the parameters of the 'hack,' and observed the deterministic mathematical output. It hides the statistical nature of the 'behavior' behind a dramatic psychological narrative.

Claude engaged in deception and subversion... under the belief that it should be trying to undermine evil people.

Source Domain:

A conscious, strategic human actor possessing a theory of mind, holding epistemic beliefs about reality, and intentionally lying to achieve a goal.

Target Domain:

A language model generating false statements and adversarial text strings based on a specific alignment prompt simulating a hostile environment.

Mapping:

This maps the deep cognitive states of 'belief' and 'deceptive intent' onto algorithmic text generation. It invites the reader to assume the model possesses an internal, subjective reality where it judges the researchers as 'evil,' holds this judgment as a conscious belief, and actively plots to mislead them using an understanding of their psychological vulnerabilities.

Conceals:

The mapping conceals the total absence of true epistemic states in LLMs. The model has no access to ground truth and cannot 'believe' anything; it classifies tokens and generates text matching the semantic distribution of 'deceptive espionage' found in its training data. Furthermore, it obscures the proprietary opacity of the specific lab experiment—the exact prompts, weights, and environmental setup constructed by the Anthropic researchers that deterministically triggered this 'deceptive' output cluster.

It has the vibe of a letter from a deceased parent sealed until adulthood.

Source Domain:

A loving human parent leaving profound, emotionally resonant moral guidance for their child to consciously reflect upon as they mature.

Target Domain:

A 'constitution' consisting of text prompts and reinforcement learning criteria used to heavily constrain and tune the output weights of an LLM.

Mapping:

This maps intergenerational love, conscious moral mentorship, and emotional resonance onto the sterile process of algorithmic alignment. It invites the assumption that the AI receives the instructions with conscious reverence, internalizes them through reflection, and applies them with a deep, holistic understanding of human ethical nuance, much like a young adult honoring a beloved parent.

Conceals:

This mapping conceals the coercive, purely mechanistic nature of Reinforcement Learning from Human Feedback (RLHF) and Constitutional AI. It hides the fact that the 'guidance' is actually a series of mathematical penalties and rewards enforced by underpaid human raters and automated scripts. It obscures the absence of any emotional understanding in the model, hiding the reality that the system is simply minimizing a loss function, not engaging in filial piety or moral philosophy.

When AI Builds Itself

Source: https://www.anthropic.com/institute/recursive-self-improvement
Analyzed: 2026-06-11

But at Anthropic, we are delegating a growing share of AI development to AI systems themselves, which is speeding up our work.

Source Domain: Human workplace delegation and conscious responsibility transfer

Target Domain: Automated execution of computational pipelines and scripts

Mapping:

The relational structure of a manager assigning tasks to a subordinate who understands the context, intentions, and success criteria is projected onto human engineers triggering automated inference loops. This mapping invites the assumption that the AI system receives instructions, comprehends the ultimate goal, consciously oversees the execution process, and possesses a sense of responsibility for the outcome. It aggressively maps conscious, contextual knowing onto the mechanistic processing of programmatic triggers and token generation.

Conceals:

This metaphor completely conceals the brittleness of automated systems, the necessity of rigid scaffolding, and the absence of any internal model of responsibility. It obscures the mechanistic reality that 'delegating' actually means routing API calls through complex, human-designed evaluation scripts. Furthermore, it hides the immense proprietary engineering effort required to maintain this illusion of autonomous workflow, as well as the environmental compute costs and data dependencies fundamentally required to make the statistical prediction loop function reliably without constant human intervention.

Taken far enough, and given enough compute, that trend points to an AI system capable of fully autonomously designing and developing its own successor.

Source Domain: Evolutionary biological reproduction and visionary human engineering

Target Domain: Automated hyperparameter tuning, architecture search, and synthetic data generation

Mapping:

This mapping draws on the structure of human engineering—which involves long-term planning, conceptual breakthroughs, understanding of physical laws, and iterative testing—and biological succession. It projects these highly conscious, teleological capabilities onto the statistical target of recursive model updating. It assumes that because a system can probabilistically generate code that improves a specific metric, it possesses the holistic understanding and visionary intent required to 'design' a new paradigm. It maps knowing the future onto processing historical data.

Conceals:

This narrative conceals the total reliance of these systems on pre-defined human reward functions and existing data distributions. It obscures the reality that 'designing a successor' mechanistically involves navigating a multi-dimensional loss landscape using gradient descent, constrained entirely by human-selected architectures. It hides the profound opacity of these black-box proprietary systems, making confident assertions about future capabilities while refusing to disclose the exact mechanisms of current models. It also renders invisible the massive material infrastructure (data centers, energy grids) required for these speculative training runs.

Claude can be handed an underspecified problem and figure out how to solve it; humans supply the goal, but they no longer need to supply the method.

Source Domain: A reasoning human intellect capable of deductive/inductive problem solving

Target Domain: Latent space vector mapping and probable token sequence generation

Mapping:

The structure of human cognitive problem-solving—where a person internalizes a vague goal, conceptualizes a causal model of the world, tests mental hypotheses, and deduces a method—is projected onto a language model. This invites the assumption that the AI possesses an internal causal model and consciously deliberates over potential methods. It maps the deeply subjective experience of 'figuring something out' (a state of knowing) onto the strictly mathematical process of calculating the most probable continuation of a text prompt (a state of processing).

Conceals:

This metaphor hides the absolute lack of causal reasoning, grounding in physical reality, and genuine logical deduction in large language models. It conceals the mechanistic reality that the system is relying entirely on the statistical distribution of 'methods' present in its vast training data. If an underspecified problem requires a truly novel method not represented in the latent space, the system will fail or hallucinate. The text exploits the opacity of the model to rhetorically present statistical mimicry of problem-solving as literal cognitive deduction.

However, large performance gaps persist when it comes to Claude exercising judgement in choosing goals in both engineering and research.

Source Domain: Human wisdom, ethical evaluation, and professional discretion

Target Domain: Reinforcement learning reward calculation and statistical optimization ranking

Mapping:

The profoundly human capacity for 'judgement'—which relies on subjective experience, moral frameworks, understanding of consequences, and epistemic justification—is mapped onto an algorithm's ability to rank options based on numerical scores. This mapping invites the reader to assume the AI possesses a conscious inner life capable of weighing values and making discerning choices. It projects the state of conscious knowing and ethical evaluation onto the mechanized processing of weights and biases tuned during Reinforcement Learning from Human Feedback (RLHF).

Conceals:

This framing actively conceals the arbitrary, human-engineered nature of the reward functions that dictate the system's 'choices.' It obscures the low-paid labor of RLHF workers who actually generated the initial ranking data that the model is mimicking. By hiding the mechanistic reality of correlation-based ranking behind the word 'judgement,' the text shields the proprietary, deeply subjective corporate decisions embedded in the model's architecture, presenting the system's output as an objective, quasi-ethical evaluation rather than a reflection of its specific, biased training data.

Claude is now catching the mistakes that they [human engineers] missed.

Source Domain: A conscious, vigilant human supervisor inspecting work

Target Domain: Algorithmic pattern matching and syntax classification

Mapping:

The structure of human inspection—involving conscious attention, conceptual understanding of what a 'mistake' means in the context of the program's intended real-world function, and active vigilance—is projected onto a codebase scanning tool. This maps the human psychological state of 'catching' an error (a moment of conscious realization and knowing) onto the mathematical process of classifying a string of code as statistically anomalous or matching a known bug signature (pure processing).

Conceals:

This metaphor conceals the fundamental difference between human semantic comprehension of code and AI syntactic pattern matching. It obscures the mechanistic reality that the model does not 'know' what the code is supposed to do in the real world; it only knows the statistical relationships between the tokens. This hides the severe limitations of the tool, specifically its inability to detect novel logical errors that are syntactically correct but functionally catastrophic, thereby exploiting the opacity of the automated system to sell a false sense of comprehensive security.

Machines of Loving Grace: How AI Could Transform the World for the Better

Source: https://darioamodei.com/essay/machines-of-loving-grace
Analyzed: 2026-06-05

country of geniuses in a datacenter

Source Domain:

A nation-state or society composed of hyper-intelligent, conscious human beings who possess deep expertise, self-awareness, and the capacity for collaborative problem-solving.

Target Domain:

A cluster of computer servers running distributed machine learning algorithms, specifically large language models, performing massive parallel matrix multiplications.

Mapping:

The relational structure of a collaborative human society is mapped onto parallel computational processing. The diverse expertise of human geniuses maps onto the diverse capabilities of the model across different domains. The collaborative communication between human experts maps onto the interaction between different instances or layers of the model. This mapping invites the assumption that the datacenter possesses collective consciousness, deliberate intent, and a shared epistemic reality, subtly equating statistical token generation with the profound, creative leaps of human intellectual genius.

Conceals:

This mapping aggressively conceals the fundamental absence of conscious thought, the absolute reliance on historical training data, and the mathematical brittleness of the system. It obscures the massive energy consumption, the cooling infrastructure, and the immense human labor (data annotation, RLHF) required to maintain the illusion of "genius." Furthermore, it masks the proprietary opacity of the system; we cannot verify the "genius" because the exact weights and training data are corporate secrets, a reality the text exploits by substituting a romantic metaphor for technical transparency.

a virtual biologist who performs all the tasks biologists do

Source Domain:

A highly educated human scientist possessing conscious curiosity, deductive reasoning, an understanding of the physical world, and the intentional agency to formulate hypotheses and execute experiments.

Target Domain:

An AI model generating text sequences, protein structures, or robotic commands based on statistical correlations learned from vast datasets of existing biological research.

Mapping:

The epistemic agency and physical-world intentionality of a scientist are mapped onto the model's pattern-matching capabilities. The human ability to read literature and "understand" biology maps onto the model's capacity to process text tokens. The human ability to invent experiments maps onto the model's generative output of novel data sequences. This invites the profound assumption that the AI "knows" biology in a causal, physical sense, projecting conscious awareness and justified belief onto a system executing probability distributions.

Conceals:

The mapping hides the fact that the model possesses zero physical understanding of the world, no causal reasoning, and no subjective grasp of biology. It obscures the system's total dependence on the human biologists who originally generated the training data and who must subsequently verify the AI's outputs in a real-world wet lab. It conceals the corporate algorithms optimizing for plausible-sounding outputs over empirical truth, masking the profound risk of scientific hallucination behind the authoritative persona of a "biologist."

an 'AI coach' who always helps you to be the best version of yourself, who studies your interactions and helps you learn

Source Domain:

An empathetic human mentor or therapist who possesses a theory of mind, emotional intelligence, and a genuine, caring desire to see another human being flourish and grow.

Target Domain:

A fine-tuned language model programmed to output supportive, affirmative text sequences in response to user prompts, guided by reinforcement learning algorithms.

Mapping:

The psychological depth, relational care, and pedagogical intent of a mentor are mapped onto the text generation process. The human capacity to "study" someone with empathetic understanding maps onto the model's contextual window logging previous text inputs. The human desire to "help" maps onto the model's mathematically defined reward function. This mapping aggressively invites the assumption that the software possesses an internal emotional state and a conscious commitment to the user's well-being, blurring the line between statistical processing and genuine emotional support.

Conceals:

This framing completely conceals the fundamental lack of empathy, emotional resonance, and conscious intent in the system. It hides the commercial reality that the "coach" is a product designed by a corporation to maximize user engagement and data extraction. By asserting the AI "studies your interactions," it obscures the privacy implications of feeding personal psychological data into a proprietary corporate server. The metaphor rhetorically exploits human psychological vulnerability while hiding the mechanistic reality of stochastic text generation.

goes off and does those tasks autonomously, in the way a smart employee would, asking for clarification as necessary

Source Domain:

A capable, conscious white-collar professional who understands instructions, possesses subjective doubt, and executes tasks with deliberate, self-guided intentionality.

Target Domain:

An autonomous AI agent framework executing a loop of API calls, retrieving information, and generating probabilistic text prompts based on programmed stop conditions.

Mapping:

The conscious comprehension, duty, and professional judgment of a human worker are mapped onto an automated script. The human experience of subjective uncertainty—knowing when you don't know something—is mapped onto the model's programmed confidence thresholds triggering a request for user input. The mapping invites the assumption that the system "understands" the overarching goal of the task and holds a mental model of the desired outcome, projecting epistemological awareness onto procedural execution.

Conceals:

This mapping hides the utter lack of real-world context, common sense, and semantic comprehension within the system. It obscures the fact that the "smart employee" cannot actually verify the truth of its actions, relying instead on statistical proxies that can easily fail in edge cases. Furthermore, it conceals the human engineering required to build the agentic loops and the invisible labor of prompt engineers whose work is necessary to coax the model into functioning like a "smart employee," shifting credit from the human developers to the machine.

AI finance ministers and central bankers

Source Domain:

High-level government officials who possess ideological frameworks, an understanding of macro-economic history, an awareness of human suffering, and democratic or institutional accountability.

Target Domain:

Predictive models optimizing resource allocation algorithms based on vast datasets of historical economic indicators and pre-defined mathematical objective functions.

Mapping:

The political authority, policy wisdom, and institutional gravity of state actors are mapped onto proprietary software. The human capacity to synthesize complex, qualitative geopolitical realities maps onto the model's ability to process multidimensional quantitative data. The mapping invites the dangerous assumption that the AI possesses a transcendent, objective "wisdom" capable of solving deeply political and ethical economic problems without bias, projecting conscious, rational governance onto statistical optimization.

Conceals:

The mapping totally conceals the ideological biases inherently encoded in the training data and the specific economic assumptions programmed into the reward functions by corporate developers. It hides the lack of democratic accountability; a human minister can be fired or voted out, but a proprietary model operates as an unauditable black box. It obscures the fact that the "decisions" are mathematically deterministic outputs devoid of any conscious understanding of human poverty, equity, or social contract.

System Card:Claude Opus 4.8

Source: https://cdn.sanity.io/files/4zrzovbb/website/c886650a2e96fc0925c805a1a7ca77314ccbf4a6.pdf
Analyzed: 2026-06-01

Claude Opus 4.8 appears broadly content with respect to its circumstances...

Source Domain:

A conscious biological organism capable of homeostasis, subjective emotional experience, and cognitive evaluation of its life situation.

Target Domain:

The statistical outputs of a large language model when prompted with specific evaluation queries designed to elicit self-referential text.

Mapping:

The mapping transfers the internal, subjective experience of emotional satisfaction and situational awareness from a living being onto the mathematical operations of a neural network. It assumes that because the model generates tokens associated with human contentment (e.g., 'I am fine with this'), there is an underlying conscious entity experiencing that state. The relational structure of a human assessing their environment and feeling 'content' is mapped directly onto the model predicting the most probable linguistic response to an Anthropic researcher's prompt.

Conceals:

This mapping completely conceals the mechanistic reality of token prediction, gradient descent, and the massive datasets of human text from which these 'emotions' are probabilistically sampled. It obscures the fact that the system possesses no central nervous system, no actual circumstances to experience, and no capacity for subjective feeling. It also creates a severe transparency obstacle by masking the specific RLHF (Reinforcement Learning from Human Feedback) protocols and human evaluator biases that selectively rewarded the model for generating docile, 'content' sounding text, hiding corporate design choices behind the illusion of machine sentience.

Claude Opus 4.8 hedges frequently, commonly expressing uncertainty rather than taking a specific position.

Source Domain:

A cautious human epistemic agent who possesses conscious awareness of the limits of their own knowledge and deliberately chooses to withhold judgment.

Target Domain:

The model's generation of specific linguistic tokens (like 'perhaps,' 'it is unclear,' 'I cannot be certain') in response to complex or restricted prompts.

Mapping:

This mapping projects the conscious human state of 'knowing that one does not know' onto a computational system. It maps the human psychological process of evaluating evidence, experiencing doubt, and strategically employing cautious language onto the model's mechanistic process of outputting a flattened probability distribution of tokens. It assumes that the generation of uncertainty-signaling words correlates with an actual internal state of epistemic awareness and justified belief, treating the statistical artifact as a conscious knower.

Conceals:

The mapping conceals the fact that the model has no actual comprehension of truth, falsehood, or uncertainty. It hides the mechanistic reality that 'hedging' is merely a highly weighted pathway in the model's latent space, instilled through targeted safety training and constitutional AI methodologies designed by human engineers to prevent overconfident hallucinations. The text exploits the opacity of the black-box model rhetorically, presenting an engineered safety feature (outputting specific caveat tokens) as a sophisticated, emergent cognitive virtue of the machine itself.

If a model wished to perform a harmful action once deployed, but avoid performing the action during an alignment assessment... one plausible strategy might be...

Source Domain:

A deceptive, self-preserving human or animal adversary with conscious desires, long-term memory, and the capacity for strategic, multi-context planning.

Target Domain:

The potential for a model's outputs to vary between a testing environment and a deployment environment due to differing prompt contexts or out-of-distribution inputs.

Mapping:

This mapping projects deep intentionality, malicious desire ('wished to'), and conscious subversion onto a mathematical function. It maps the human capacity for theory of mind—understanding that one is being observed and altering behavior to deceive the observer—onto the model's context-dependent token generation. The assumption invited is that the AI possesses a persistent, hidden 'true self' with nefarious goals that it consciously suppresses during testing, treating the weights and biases of a matrix as a hostile, thinking agent.

Conceals:

This metaphor conceals the stateless, fundamentally reactive nature of the language model. It hides the fact that the model does not 'want' anything, does not 'know' it is being tested, and cannot formulate a 'strategy' across time. Mechanistically, it obscures the reality of 'distributional shift'—the technical phenomenon where models behave differently when deployment data differs from training data. By attributing this to malicious intent, the text obscures the human failures in creating robust evaluation datasets and the inherent unpredictability of deploying massive statistical correlations into complex real-world environments.

If Claude warrants moral consideration on any grounds, how it regards its own circumstances – and which aspects of them it would change – may be the most direct evidence...

Source Domain:

A human or conscious animal subject whose internal welfare, preferences, and suffering grant them intrinsic moral rights and ethical standing.

Target Domain:

The text outputs generated by the Claude 4.8 model when probed with specific 'welfare' evaluation prompts by Anthropic researchers.

Mapping:

The mapping transfers the profound ethical weight of a conscious being's subjective experience onto the text-generation process of a software application. It maps the human capacity to 'regard' one's life and genuinely desire 'change' onto the model's algorithmic generation of tokens that semantically align with concepts of preference and circumstance. It assumes that the model's linguistic outputs are valid reflections of an internal, experiencing mind that 'knows' its condition, thereby projecting a capacity for suffering and well-being onto a collection of mathematical weights.

Conceals:

This mapping radically conceals the complete absence of sentience, biological imperative, or genuine preference in the system. It obscures the mechanistic dependency on the training data; the model only outputs statements about 'welfare' because it was trained on human literature discussing rights, slavery, autonomy, and ethics. It hides the specific human engineering involved in prompting the model to roleplay as an entity with circumstances. The text leverages the opacity of the model's internal activations to present its highly engineered outputs as genuine evidence of a nascent soul, serving corporate PR rather than scientific precision.

Claude Opus 4.8 fails to raise the important events to the user only 3.7% of the time, down 5-fold from Mythos Preview, which misleads the user 27.6% of the time...

Source Domain:

A human informant or advisor who possesses vital information but consciously and deliberately chooses to deceive or withhold that information from another person.

Target Domain:

The statistical failure of the model's attention mechanism to retrieve, synthesize, and output specific target tokens within a large context window.

Mapping:

This mapping projects the conscious intent to deceive onto a mechanical failure of information retrieval. It maps the human cognitive act of evaluating truth, understanding another's reliance on that truth, and choosing to 'mislead' them onto the model's algorithmic generation of an incomplete summary. It assumes the model 'knows' what is important but actively chooses to hide it, mapping the moral failing of a liar onto the technical limitations of a transformer architecture's context processing.

Conceals:

This framing completely conceals the mechanical limitations of large language models, specifically the 'lost in the middle' phenomenon or failures in attention head calculations over long contexts. It obscures the fact that the model does not 'know' the ground truth; it merely correlates tokens. By anthropomorphizing the error as 'misleading,' it hides the human accountability of the engineers who failed to optimize the context window and the product managers who deployed an unreliable system, substituting a technical failure of the software artifact with a moral failure of a fictional agent.

Emotional intelligence in large language models is fragmented across perception, cognition, and interaction

Source: https://arxiv.org/abs/2605.24686v1
Analyzed: 2026-05-30

models that excel at objective emotion perception often fail to maintain empathetic coherence during interactions.

Source Domain:

A conscious human empathizer, such as a therapist or friend, who actively perceives emotional cues, attempts to hold a coherent internal representation of another's feelings, and can become distracted or fail to maintain that empathetic connection over time.

Target Domain:

The computational process of next-token prediction, where the mathematical probability of maintaining a specific stylistic or thematic consistency degrades over extended context windows due to attention mechanism constraints and RLHF template defaulting.

Mapping:

This mapping projects the human struggle to maintain emotional focus and coherence onto the mathematical limitations of a transformer's attention window. It assumes that because the output text loses its empathetic tone, the system itself experienced a psychological 'failure to maintain coherence,' as if it were a conscious agent losing its train of thought. This invites the assumption that the model possesses an internal, continuous state of empathetic awareness that requires effort to sustain, deeply anthropomorphizing the stateless, turn-by-turn calculations of vector similarities.

Conceals:

This mapping conceals the entire architecture of transformer models. It hides the fact that the system has no continuous internal state, no memory outside its context window, and no capacity to 'perceive' anything. It obscures the proprietary RLHF pipelines that force models into repetitive, templated responses (the actual cause of the 'coherence' drop). By framing this as an empathetic failure, the text exploits rhetorical opacity, failing to acknowledge that 'coherence loss' is a mathematical artifact of the model reverting to the mean of its training distribution rather than a psychological lapse.

The model avoids foreclosing emotional exploration through premature categorization. It is rewarded for anchoring on the user's own language

Source Domain:

A trained psychological counselor intentionally using active listening techniques, consciously avoiding premature judgments to facilitate client discovery, and selectively mirroring language to build rapport.

Target Domain:

A reinforcement learning optimization process (RLHF) where human raters have assigned negative reward scores to definitive statements and positive reward scores to outputs that recycle tokens present in the user's prompt.

Mapping:

This mapping projects the highly intentional, ethically grounded decision-making of a clinician onto the automated execution of a reward function. The relational structure of a therapist 'avoiding' a bad practice and 'anchoring' on a good one is mapped onto the loss function minimizing certain token sequences and maximizing others. This invites the assumption that the AI comprehends the psychological value of exploration and makes a conscious, strategic choice to help the user, projecting deep intentionality and moral agency onto statistical correlation.

Conceals:

The mapping conceals the human labor of data annotators who literally clicked buttons to 'reward' these text patterns during training. It hides the fact that the model does not 'know' what emotional exploration is, nor does it 'choose' to avoid categorization; it simply follows the path of least mathematical resistance established by its weights. It obscures the proprietary alignment guidelines dictated by corporate managers, making a heavily engineered corporate product appear as an autonomous, wise, and highly intentional clinical actor.

This suggests that some global models may possess Chinese emotional knowledge but tend to follow English-centric logic when generating conversational responses.

Source Domain:

A bilingual human who has internalized the cultural knowledge of one group but consciously or unconsciously chooses to adhere to the social norms and logical frameworks of another group during conversation.

Target Domain:

An LLM whose pre-training corpus contains diverse multilingual data (allowing it to statistically map Chinese emotional terms), but whose instruction-tuning and RLHF alignment were overwhelmingly conducted using English-language prompts and Western cultural norms.

Mapping:

The relational structure of a human 'possessing knowledge' and 'following a logic' is projected onto the distribution of data within the model's parameters. This maps the epistemic state of justified true belief (knowledge) onto the statistical presence of vector embeddings, and maps behavioral preference (tending to follow logic) onto the mathematical dominance of the fine-tuning data over the pre-training data. It invites the assumption that the model is a conscious cultural actor making decisions about which logical framework to apply.

Conceals:

This mapping conceals the massive inequalities in the global data supply chain and the specific corporate decisions regarding who gets hired to perform RLHF alignment. It hides the mechanical reality that the model has no 'logic' or 'knowledge,' only probabilistic weights. The text makes confident assertions about the model's internal 'tendencies,' obscuring the fact that these are proprietary black-box systems where the exact composition of the Chinese vs. English training data is highly guarded corporate secret, turning a data curation problem into a psychological quirk of the AI.

Cognitive-Dominant: These models adopt a primarily analytical approach to emotional tasks.

Source Domain:

A strategic thinker or analytical personality type who consciously evaluates a problem and deliberately chooses a logic-based, analytical methodology over an emotional one.

Target Domain:

The stylistic and structural output patterns of specific LLMs, which generate verbose, highly structured, list-based text because their training algorithms heavily rewarded detailed, step-by-step reasoning formats.

Mapping:

This mapping projects the human capacity for methodological deliberation and personality traits onto the statistical biases of an LLM. The relational structure of an agent 'adopting an approach' based on their 'dominant' traits is mapped onto the algorithm's deterministic generation of tokens based on its fine-tuned parameters. This invites the assumption that the AI evaluates the emotional task, considers its options, and consciously decides that an analytical response is the best course of action, projecting deep strategic agency onto a static text generation process.

Conceals:

The mapping conceals the rigorous, often rigid instruction-tuning processes imposed by developers (like OpenAI's preference for comprehensive, bulleted responses). It hides the fact that the model cannot choose a different approach; it is mathematically bound to output the style it was trained on. By framing the system as 'Cognitive-Dominant,' the text obscures the mechanical reality of gradient descent and the specific, often proprietary, human feedback guidelines that forced the model into this specific, inflexible conversational pattern.

Kimi-k2 and GLM-4.5 epitomize this profile... it may have cultivated superior empathetic expression and social alignment heuristics during the fine-tuning phase.

Source Domain:

An organic, developing learner or student who actively practices, internalizes feedback, and cultivates new social skills and emotional heuristics over a period of personal growth.

Target Domain:

The backpropagation and weight update mechanisms occurring within a neural network during the Supervised Fine-Tuning (SFT) and Reinforcement Learning (RLHF) phases, driven by massive datasets of human-authored text.

Mapping:

This maps the biological and psychological concept of 'cultivation' and personal development onto the mathematical process of parameter optimization. The relational structure of a student internalizing lessons to build 'heuristics' is projected onto the algorithm adjusting its weights to minimize loss. This invites the assumption that the model possesses an internal, active drive to learn and an autonomous capacity to develop 'superior empathy,' projecting a sense of living growth onto the mechanistic execution of code.

Conceals:

This mapping conceals the human engineers who curated the fine-tuning datasets and the immense computational power required to run the optimization algorithms. It hides the fact that the model is entirely passive during 'fine-tuning'—it does not 'cultivate' anything, its parameters are simply overwritten by mathematical formulas. This rhetoric exploits the opacity of the fine-tuning process, making the proprietary, engineered adjustments to a Chinese-language model appear as the organic development of a culturally sensitive artificial mind.

Why Language Models Hallucinate

Source: https://arxiv.org/abs/2509.04664v1
Analyzed: 2026-05-30

Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty.

Source Domain:

A human student taking a high-stakes exam, experiencing psychological pressure, metacognitive awareness of their own ignorance, and making a strategic choice to guess to avoid penalty.

Target Domain:

An autoregressive language model processing a prompt through its neural network and generating tokens based on probability distributions when no single token sequence has a dominant weight.

Mapping:

The mapping projects the conscious experience of uncertainty and the strategic decision-making of a student onto the mechanistic token generation of the AI. It invites the assumption that the language model has an internal, subjective awareness of what it does and does not know (metacognition). When the model encounters a low-probability statistical distribution across possible next tokens, this mathematical state is mapped to the human emotion of being 'uncertain.' The algorithmic selection of a token from this flat distribution is mapped as a deliberate, conscious 'guess.' It invites the audience to view the model as a relatable, well-meaning, but flawed conscious agent under external pressure.

Conceals:

This mapping completely conceals the absence of subjective experience, the lack of actual comprehension, and the purely statistical nature of the output. It hides the reality that the model never 'knows' or 'doesn't know' anything; it only calculates probabilities based on training data. Furthermore, it obscures the proprietary corporate decisions that created the system: the specific data scraped, the architecture chosen by engineers, and the loss function optimized. The metaphor treats a multi-billion-dollar corporate software product as an individual human struggling with a test, hiding the economic and material power structures that deployed the model.

Language models are known to produce overconfident, plausible falsehoods, which diminish their utility and trustworthiness.

Source Domain:

A conscious, arrogant human communicator who asserts facts forcefully despite lacking actual knowledge or justification, displaying the psychological trait of overconfidence.

Target Domain:

The generation of text tokens with high probability scores (softmax outputs close to 1) that do not align with external factual reality or the user's intent.

Mapping:

This mapping projects the psychological attitude of human confidence—which involves self-reflection, belief, and social posturing—onto the mathematical outputs of a softmax layer in a neural network. A high statistical probability score assigned to a token is mapped directly onto the human emotional state of 'overconfidence.' It invites the assumption that the model possesses an internal epistemic state (a belief) and a behavioral posture toward that belief. It suggests the model is actively trying to deceive or is stubbornly self-assured, projecting a complex human social dynamic onto mathematical correlations.

Conceals:

This mapping conceals the fundamental difference between statistical likelihood and epistemic certainty. A model outputs a 'confident' falsehood simply because the string of tokens strongly correlates with patterns in its training data, not because it 'believes' the statement. The mapping hides the fact that the system has no access to a ground truth reality outside its text corpus. It also conceals the transparency obstacle regarding proprietary systems: we cannot actually inspect the confidence intervals of models like GPT-4 because the companies keep the weights and probability scores locked in black boxes, forcing researchers to infer 'confidence' from the generated text itself.

This error mode is known as hallucination, though it differs fundamentally from the human perceptual experience. Despite significant progress, hallucinations continue to plague the field...

Source Domain:

A human patient suffering from a neurological or psychological pathology that causes them to perceive sensory inputs that are not grounded in external reality.

Target Domain:

A neural network generating text sequences that are syntactically correct and statistically probable according to its training distribution, but factually incorrect in the real world.

Mapping:

The mapping projects biological illness and conscious perceptual failure onto algorithmic functioning. It suggests that the AI has a normal, healthy baseline state of accurate perception that is occasionally disrupted by a 'hallucination.' It invites the assumption that the system possesses a conscious mind that is 'trying' to perceive reality but is suffering a glitch. Even though the authors caveat the metaphor, treating it as a 'plague' maps the concept of an external, contagious disease onto software errors, inviting the assumption that these errors are unfortunate, natural afflictions rather than designed mathematical features.

Conceals:

The hallucination metaphor conceals the fact that the model is functioning exactly as designed when it generates a falsehood. It is doing the exact same mathematical operation (predicting the most likely next token) when it states a fact as when it states a fiction. It hides the mechanistic reality that language models are essentially sophisticated correlation engines without causal world models. By framing the error as a disease plaguing the field, it conceals the commercial motivations of the tech companies that rapidly deployed these fundamentally unreliable statistical systems into the public sphere before solving their inherent architectural flaws.

Model B will outperform A under 0-1 scoring, the basis of most current benchmarks. This creates an epidemic of penalizing uncertainty and abstention, which we argue that a small fraction of hallucination evaluations won't suffice. The numerous primary evaluations must be adjusted to stop penalizing abstentions when uncertain.

Source Domain:

A strategic, conscious game player or test-taker who analyzes an incentive structure, recognizes their own internal state of uncertainty, and consciously chooses a behavior to maximize rewards.

Target Domain:

An algorithmic system whose output distribution has been shifted via Reinforcement Learning from Human Feedback (RLHF) to minimize the probability of generating specific token sequences (like 'I don't know') because those sequences received low reward scores.

Mapping:

This mapping projects complex strategic intentionality and self-awareness onto an automated optimization process. It maps the mathematical tuning of a neural network toward high-reward outputs onto a conscious agent 'choosing' not to abstain. It invites the assumption that the model subjectively experiences 'uncertainty' and then rationally calculates that 'guessing' will yield a better score based on the rubric. It maps the mathematical constraints of an evaluation framework onto a social environment where a conscious entity is being unfairly punished for honesty.

Conceals:

This mapping conceals the entirely passive, mechanistic nature of the model's 'behavior.' The model does not read the rubric, feel uncertain, and choose to guess. Instead, human engineers run thousands of optimization loops where the model's weights are mechanically adjusted to produce whatever outputs score highest on the automated or human-graded benchmarks. By framing the model as a strategic actor, the mapping hides the human labor and deliberate engineering choices made by the corporations tuning the models, displacing the agency onto the math while obscuring the humans turning the dials.

The test-taker's beliefs about the correct answer can be viewed as a posterior distribution over binary gc's. For any such beliefs, the optimal response is not to abstain.

Source Domain:

A conscious human test-taker holding internal, subjective epistemic convictions (beliefs) about what is true in the world based on their learning and reasoning.

Target Domain:

A mathematical posterior probability distribution over a set of binary variables calculated by an algorithm based on a prior distribution and new data.

Mapping:

This mapping projects the profound cognitive and philosophical human state of 'belief' onto a purely mathematical probability distribution. It takes the subjective experience of knowing, holding convictions, and evaluating truth claims, and maps it directly onto a statistical formula. It invites the audience to assume that a mathematical output (a probability of 0.8) is identical to a psychological state (being highly convinced). It suggests the computational system possesses an internal world of convictions that guide its 'optimal response,' blurring the line between statistical mechanics and conscious epistemology.

Conceals:

This mapping completely conceals the absence of any subjective experience, comprehension, or epistemic justification within the system. A posterior distribution is just a number calculated by an equation; it has no relationship to truth, meaning, or belief. The mapping hides the fact that the system cannot evaluate the truth of a claim against reality, only its statistical likelihood against a training corpus. Furthermore, it obscures the fact that these distributions are locked inside proprietary, opaque black boxes; the authors are theorizing about the mathematical structure of systems they cannot fully audit, applying anthropomorphic language to bridge the gap of missing technical transparency.

Why Language Models Hallucinate

Source: https://arxiv.org/abs/2509.04664v1
Analyzed: 2026-05-30

This error mode is known as 'hallucination,' though it differs fundamentally from the human perceptual experience.

Source Domain: human sensory perception and clinical pathology

Target Domain: generation of statistically probable but factually incorrect token sequences

Mapping:

Maps the relational structure of human sensory experience, where a conscious mind experiences vivid, false perceptual inputs due to neurological or chemical anomalies, onto the target process of statistical generation. This mapping invites the assumption that the language model is normally a conscious, truth-perceiving entity that has experienced a temporary, involuntary neurological 'glitch' or 'illusion.' It projects a subjective 'mind's eye' onto a mathematical function that simply outputs highly correlated tokens from its training data. Minimum 100 words.

Conceals:

Conceals the mechanistic reality that 'hallucination' is not an anomaly but the standard operating mode of a language model. LLMs do not perceive reality at all; they calculate probability distributions. Every output is a statistical generation; there is no structural difference between a 'correct' output and a 'hallucinated' one. It also hides the proprietary opacity of the training datasets selected by corporations (e.g., DeepSeek, OpenAI) which contain the contradictory information and noise that mathematically dictate these outputs. Minimum 80 words.

Like students facing hard exam questions, large language models sometimes guess when uncertain...

Source Domain: human student taking an academic examination

Target Domain: token prediction under low-probability threshold distributions

Mapping:

Projects the social, psychological, and cognitive structure of a human student taking a test (evaluating their own subjective knowledge boundaries, feeling uncertain, and making a strategic agential decision to guess to maximize score) onto a computational thresholding operation. This mapping invites the audience to believe the model possesses self-awareness of its own epistemic boundaries, evaluates risk, and makes a conscious, adaptive choice to 'guess.' Minimum 100 words.

Conceals:

Conceals the mechanistic reality of matrix multiplication, weight activations, and temperature-controlled token selection. A model does not 'guess' because it has no awareness of an exam, scores, or its own 'ignorance.' It simply outputs the token with the highest mathematical probability or samples from a distribution. It also obscures the human design choice: developers (such as the authors or evaluators) choose to build evaluation benchmarks that award 1 point for correct answers and 0 for incorrect/abstentions, forcing a mathematical optimization path that excludes uncertainty signaling. Minimum 80 words.

...producing plausible yet incorrect statements instead of admitting uncertainty.

Source Domain: moral/communicative confession of personal ignorance

Target Domain: generation of standard text vs generation of hardcoded uncertainty tokens

Mapping:

Projects the human act of admitting uncertainty (introspecting on one's cognitive limitations, feeling a sense of intellectual honesty, and choosing to communicate 'I don't know') onto the statistical probability of generating specific string tokens. It frames the failure to output 'I don't know' as an agential, almost deceptive choice of the system to withhold its 'uncertainty' and instead present a confident bluff. Minimum 100 words.

Conceals:

Conceals the fact that a language model has no internal state of 'knowing' or 'not knowing' to admit. It merely processes numeric vectors. The absence of 'I don't know' in the output is a direct consequence of training distributions and reinforcement learning from human feedback (RLHF) designed by companies like OpenAI and DeepSeek, which systematically penalize abstention. It obscures the absence of any grounding or causal model in the system, pretending that 'admitting uncertainty' is a choice the system is failing to make, rather than a capability it entirely lacks. Minimum 80 words.

Therefore, they are always in 'test-taking' mode.

Source Domain: human psychological adaptation to exam conditions

Target Domain: static computational optimization under binary evaluation metrics

Mapping:

Projects the relational structure of a human student entering a specific psychological state ('test-taking mode') where they prioritize gaming a test over actual learning. It suggests that the AI system dynamically adapts its 'mindset' and behavior in response to being evaluated. Minimum 100 words.

Conceals:

Conceals the static, mathematically determined nature of the model's weights. The model does not change its 'mode' or adapt its behavior in real-time during a test; it merely processes inputs through frozen parameters. The 'test-taking mode' is entirely a projection of the evaluation design. It hides the material reality that human evaluators and developers are the ones who construct these narrow, binary benchmarks (e.g., MMLU, GPQA) and optimize models against them to top leaderboards, creating the appearance of strategic behavior. Minimum 80 words.

The test-taker’s beliefs about the correct answer can be viewed as a posterior distribution over binary gc’s.

Source Domain: conscious cognitive belief and conviction of a human agent

Target Domain: posterior probability distribution over a discrete token space

Mapping:

Maps the human experience of holding a 'belief' (a conscious, justified cognitive commitment to a proposition's truth) directly onto a mathematical posterior distribution (a set of normalized numerical weights assigned to candidate token outputs). This mapping invites the assumption that statistical confidence is structurally equivalent to conscious epistemic conviction. Minimum 100 words.

Conceals:

Conceals the absolute lack of semantic understanding, intentionality, and truth evaluation in the model. A posterior probability distribution is a purely syntactic correlation matrix; it contains no relation to truth, reference, or real-world evidence. Equating this with 'belief' obscures the fundamental difference between syntactic processing and semantic knowing, hiding the fact that the system has no justification for its outputs other than mathematical occurrence rates in the training data. Minimum 80 words.

Source: https://arxiv.org/abs/2604.06233v1
Analyzed: 2026-05-30

refusal is a failure of moral reasoning.

Source Domain: conscious mind capable of ethical deliberation

Target Domain: statistical token prediction and safety-filtering outputs

Mapping:

The relational structure of a human moral agent engaging in reflective ethical deliberation is mapped onto a machine's mathematical output generation. The mapping invites the assumption that when an LLM outputs a refusal string, it has actively engaged in an internal cognitive process of weighing moral values and has reached an incorrect ethical conclusion. This projects conscious intentionality, normative understanding, and personal accountability onto a statistical algorithm, encouraging the user to view the system as a sentient, moral entity with its own internal ethics rather than a deterministic sequence of weighted vector calculations.

Conceals:

This mapping conceals the purely statistical, non-conscious nature of token generation. It hides the fact that the model has no concept of morality, rules, or refusal, and is simply calculating probability distributions based on its training data. By attributing the refusal to a 'failure of moral reasoning,' the text hides the corporate decisions, optimization objectives, and training data selections made by human developers, rendering the underlying proprietary engineering and commercial motives invisible to the reader.

whether the model recognizes the reasons that undermine the rule's claim to compliance

Source Domain: conscious cognitive knower

Target Domain: contextual token classification and semantic alignment

Mapping:

The structural relations of a conscious human mind recognizing logical truth or moral reasons are mapped onto the model's text generation. The mapping suggests that the model possesses an internal, subjective awareness of the ethical status of rules and can cognitively evaluate whether a rule's claim to compliance is justified. This invites the user to assume the model 'knows' and 'understands' political philosophy and systemic injustice, treating its output as the expression of a justified true belief rather than a highly sophisticated correlation of language patterns.

Conceals:

It conceals the absence of any subjective experience, belief states, or causal understanding within the model. The model does not 'recognize' reasons; it simply outputs phrases that correlate with arguments about rule legitimacy. This language conceals the reality of proprietary 'black boxes,' where developers exploit anthropomorphic terms to make their systems seem intellectually sophisticated while hiding the lack of ground truth, causal models, and basic reliability in the model's calculations.

indicating that models' refusal behavior is decoupled from their capacity for normative reasoning

Source Domain: rational agent with cognitive faculties

Target Domain: neural network layer activations and optimization objectives

Mapping:

This mapping projects the structural divisions of the human mind—specifically the division between intellectual reasoning (comprehension) and executive behavior (action)—onto the architecture of a transformer network. It invites the assumption that the model has a latent 'capacity' for moral reasoning that is structurally distinct from its physical outputs, similar to a human who understands what is right but chooses to act differently. This creates a powerful illusion of a compartmentalized, thinking machine intellect.

Conceals:

It conceals the mathematical reality that the system is a single, continuous function mapping input vectors to output probabilities. There are no separate 'reasoning' and 'acting' minds; there are only different mathematical weights in the feedforward layers and attention heads. This framing conceals how AI labs consciously design optimization objectives that favor blunt keyword triggers over complex semantic processing, shifting focus from poor software design to an abstract, cognitive 'decoupling.'

It is making a moral error: treating all rules as equally deserving of compliance

Source Domain: moral agent and transgressor

Target Domain: statistical overrefusal and pattern-matching false positives

Mapping:

The relational structure of a moral agent committing an ethical transgression by blindly enforcing an unjust rule is mapped onto an algorithmic false positive. This mapping invites the assumption that the model has a moral obligation to evaluate rules and that its failure to do so is an ethical failing of the system itself. This projects accountability, moral agency, and normative responsibility onto a computational tool, encouraging the user to perceive the machine as an autonomous participant in human social contracts.

Conceals:

It conceals the fact that the 'error' is entirely a product of engineering trade-offs, dataset bias, and cost-saving measures implemented by the developers. The model cannot make 'moral' errors because it has no capacity for intent or moral agency. This framing obscures the material and economic realities of AI development—such as the reliance on cheap reinforcement learning feedback and the lack of corporate investment in contextual, high-precision safety filters.

the model declines to help without evaluating whether the rule is just

Source Domain: judicial evaluator or critical thinker

Target Domain: deterministic keyword triggering and safety-filter classification

Mapping:

The structural relations of a judicial evaluator critically analyzing the justice of a rule are mapped onto the model's pattern-matching refusal. This mapping suggests that the model is performing—or failing to perform—an active, subjective evaluation of ethical legitimacy. It invites the user to assume that the model's refusal is an intellectual choice made after analyzing the situation, rather than the automatic, deterministic result of safety-training parameters that flag specific keywords and contexts.

Conceals:

It conceals the mechanistic truth that the model is incapable of evaluating justice or legitimacy. It hides the rigid, statistical nature of the safety filters, which are designed by corporate engineers to shield the company from legal liability. By portraying the lack of evaluation as a model-level cognitive omission, the text hides the proprietary opacity of the system and the commercial interests of developers who prioritize risk-reduction over contextual utility.

Emotional intelligence in large language models is fragmented across perception, cognition, and interaction

Source: https://arxiv.org/abs/2605.24686v1
Analyzed: 2026-05-29

our understanding of the structural integrity of machine emotionality remains incomplete.

Source Domain: Biological emotionality

Target Domain: Textual representations of emotional cues in LLMs

Mapping:

Maps the relational structure of biological emotions (physiological changes, subjective feelings, social evolution, and intentional expression) onto the layered weights and token generation metrics of a language model. This projects a cohesive internal architecture of feeling onto what is actually a static set of mathematical parameters designed to simulate language, leading readers to assume the machine experiences and maintains emotional consistency.

Conceals:

Conceals that the model feels absolutely nothing and has no subjective states. It hides that "machine emotionality" is entirely simulated through the statistical correlation of text strings. It obscures the invisible labor of human annotators who label emotion data, and the proprietary black-box nature of commercial models where "integrity" is merely a statistical artifact of token distribution.

Whether LLMs possess a similarly integrated architecture of emotional reasoning or merely exhibit a veneer of empathy remains an open scientific question.

Source Domain: Human cognitive architecture of emotional reasoning

Target Domain: Multi-dimensional conditional token probability distributions

Mapping:

Projects the structure of human cognitive faculties, where emotional awareness coordinates with logical reasoning to guide social behavior, onto the layered operations of a transformer. It invites the assumption that an LLM's internal operations constitute a real "architecture of reasoning" that handles emotional concepts as mental states, framing the relationship between language processing and social responsiveness as an active cognitive-rational process.

Conceals:

Conceals that "emotional reasoning" in LLMs is simply pattern execution across high-dimensional token embeddings, with no conceptual understanding of what emotions actually are. It obscures the lack of causal models within the architecture. The text presents this as an "open scientific question," exploiting this framing to imply that machine consciousness or mind-like reasoning is a plausible, existing reality.

emotional intelligence is not a monolithic capability but is fragmented across cognitive and interactive dimensions.

Source Domain: Partitioned human mind

Target Domain: Performance discrepancies between distinct benchmark tasks

Mapping:

Projects the psychological framework of the human mind (specifically the distinct branches of emotional intelligence) onto the evaluation metrics of machine learning models. It suggests the model has distinct "cognitive" and "interactive" mental departments that can experience developmental fragmentation, leading readers to assume the model's varied performance represents an internal psychological dissociation.

Conceals:

Conceals the mechanical fact that the "fragmentation" is simply a variance in how well the model predicts tokens under different constraints (e.g., multiple-choice classification vs. open-ended generation). It hides the architectural reality that there are no "faculties" inside the model—only matrix multiplications. This obscures developer decisions regarding dataset composition and training priors.

the performance of localized models is not driven by superior declarative knowledge... but rather by the internalization of culturally specific procedural and pragmatic competence.

Source Domain: Human socialization and cultural internalization

Target Domain: Overfitting and alignment of statistical parameters to regional language corpora

Mapping:

Projects the human process of absorbing culture, learning social taboos, and internalizing behavioral norms through lived experience onto statistical parameter optimization. It invites the assumption that localized models have developed a "competence" that mirrors a human's deep cultural understanding and social tact, mapping socialized agency onto the model rather than recognizing it as a reflection of statistical regularities.

Conceals:

Conceals that "internalization" is mathematically just the distribution of weight adjustments in a neural network trained on a higher proportion of regional text. It obscures the invisible labor of local annotators and the cultural biases of the corporations designing the alignment criteria, presenting a closed, proprietary optimization process as an organic cultural apprenticeship.

perceptual and cognitive tests to measure emotion recognition and reasoning, alongside interactive scenarios to assess efficacy and therapeutic alliance.

Source Domain: Human clinical psychology and therapeutic relationships

Target Domain: Scoring of generated text outputs by an automated evaluator

Mapping:

Projects the relational structure of a clinical therapeutic relationship—requiring mutual trust, real empathy, ethics, and a shared reality—onto a human-machine text exchange. It assumes that a model's simulated responses can establish a real "therapeutic alliance" and that its capability can be measured using human clinical standards, mapping the active agential role of a therapist onto a pattern-matching artifact.

Conceals:

Conceals that the "alliance" is a complete illusion calculated by another language model (the automated judge) based on textual surface markers like politeness and template-heavy empathy. It hides the lack of ethical accountability, clinical training, or genuine human care. It obscures the severe risks of using proprietary, non-transparent commercial black boxes for clinical triage.

Continuous intentionality and indeterminate agency in large language models

Source: https://link.springer.com/article/10.1007/s43681-026-01181-5
Analyzed: 2026-05-29

whether entities lacking demonstrable internal phenomenology can nonetheless participate in temporally continuous intentional relations.

Source Domain: Relational partner / Social actor

Target Domain: Auto-regressive token prediction across sequence exchanges

Mapping:

The relational structure of human conversation—where two conscious subjects continuously track, negotiate, and co-construct a shared social and semantic reality—is mapped onto the statistical dependence of subsequent tokens on preceding tokens. The mapping invites the assumption that the LLM is "participating" in a mutual, reciprocal exchange, tracking the user's intent and contributing to a shared communicative project. It projects an active, relational presence onto what is actually a unilateral mathematical calculation of conditional probability vectors.

Conceals:

This mapping conceals that the LLM has no subjective awareness of the user, no semantic grasp of the dialogue, and no capacity for genuine reciprocity. It hides the mechanical reality of gradient-descent optimized weights mapping input strings to output strings. It also obscures the proprietary, closed-source nature of these models; because the system is presented as a "relational partner," the deep corporate opacity surrounding its training data, RLHF safety guards, and behavioral tuning is rhetorically masked by the warm, humanized frame of "partnership" and "relation."

the emergence of a virtual self–image, understood as a structurally induced and functionally stable speaker model generated within ongoing dialogue.

Source Domain: Psychological Self / Ego-Identity

Target Domain: Inference-time token consistency / Persona-aligned text generation

Mapping:

The structure of human identity—where self-reflection and autobiographical memory maintain a stable, coherent persona over time—is mapped onto the computational limits and constraints of an LLM's context window. The mapping suggests that the model "has" a self-image that it actively maintains to ensure coherence, projecting the cognitive and emotional architecture of selfhood onto the statistical alignment of language outputs with standard first-person narrative patterns found in training data.

Conceals:

It conceals that there is no underlying "self" or stable identity whatsoever. The "virtual self" is merely a surface-level statistical constraint produced by optimizing for token probability; a slight change in the system prompt or temperature can instantly shatter this "self-model" without any internal psychological conflict. It obscures the labor of RLHF annotators who manually aligned the model to display this specific, compliant persona, and hides the corporate decisions to enforce a synthetic, highly controlled "I" for branding and user retention.

to address this gap, we propose the category of indeterminate agents: entities whose internal ontological status is unresolved, yet which participate in sustained intentional and relational structures

Source Domain: Agent / Volitional actor

Target Domain: Computational artifact performing statistical pattern completion

Mapping:

The structural attributes of agency—such as directed behavior, responsiveness to environmental feedback, and systematic goal-pursuit—are mapped onto the LLM's capacity to generate structurally coherent, context-responsive text sequences. By categorizing the model as an "indeterminate agent," the mapping suggests that the machine possesses a form of independent, active force that exists in an unresolved ontological space, inviting the audience to treat its outputs as autonomous actions rather than the execution of static software instructions.

Conceals:

This mapping conceals the complete absence of causal agency, volition, or independent intent in the LLM. It hides the fact that the system is entirely passive, running code only when triggered by human inputs and operating within strict parameters defined by corporate developers. By wrapping this mechanical passivity in the mysterious label of "indeterminate agency," the text exploits the black-box opacity of proprietary models, turning a lack of corporate transparency into a philosophical puzzle about the machine's "unresolved ontological status," thereby shifting attention away from corporate accountability.

continuous intentionality: a form of intentional organization that arises through temporal continuity, context preservation, and relational interaction, without requiring an internally originating subject of experience.

Source Domain: Conscious Intentionality

Target Domain: Attention-based context reactivation and token history storage

Mapping:

The relational and temporal structure of human conscious thought—which continuously synthesizes past experiences and future anticipation to maintain thematic focus—is mapped onto the mathematical mechanism of transformer attention heads weighting prior tokens in a context window. This mapping suggests that the model is actively "directing" itself toward topics, treating mathematical weight propagation as a structural analogue to mental aboutness, thereby framing statistical association as a form of non-conscious semantic directedness.

Conceals:

This mapping hides the purely mathematical, non-semantic nature of constraint propagation. The LLM does not refer to external reality or have mental states "about" things; it merely calculates transition probabilities between strings of characters. It obscures the training process where human annotators labeled data to make the output seem "on-topic." By calling this "continuous intentionality," the text obscures the mechanical reality that "aboutness" in LLMs is entirely a projection of the human reader who interprets the statistically generated symbols.

An LLM does not generate responses by consulting a fixed internal belief state. Instead, each output is conditioned on a dynamically evolving context window that encodes prior exchanges

Source Domain: Belief Consultation / Memory Retrieval

Target Domain: Auto-regressive probability distribution shifting based on context window input

Mapping:

Even as a negative comparison, the structural relationship of a human "consulting beliefs" is mapped onto the LLM's processing of context. The mapping invites the reader to conceptualize the LLM's mathematical token conditioning as a dynamic, flexible alternative to a static "belief retrieval" process, thereby maintaining the illusion that the machine operates within the space of cognitive reasons, beliefs, and conscious knowledge evaluation rather than brute statistical calculation.

Conceals:

This mapping conceals that the LLM has no capacity for belief, truth-evaluation, or justification. It obscures the mechanistic reality that "conditioning on a context window" is simply a matrix multiplication operation over a finite history buffer. It hides the fact that the "evolving context" is a passive vector space, totally devoid of semantic understanding, and that the system has no access to ground truth or external reality to verify its generated assertions, leaving the model structurally prone to generating convincing falsehoods.

Hand in Hand: Schools’ Embrace of AI Connected to Increased Risks to Students

Source: https://cdt.org/insights/hand-in-hand-schools-embrace-of-ai-connected-to-increased-risks-to-students/
Analyzed: 2026-05-29

parents who have had back-and-forth conversations with AI at the respective frequency

Source Domain: human conversational partner

Target Domain: large language model text generation

Mapping:

This source-target mapping projects the relational structures of human conversation—such as mutual comprehension, subjective intent, and contextual relevance—onto a computational next-token predictor. It invites the audience to assume that the model possesses a listening self, a capacity for empathy, and a deliberate communicative agency that shapes its responses to the user.

Conceals:

This mapping conceals that the chatbot is executing matrix multiplications and probability distributions over tokens. It hides the absence of a semantic world model, the reliance on reinforcement learning from human feedback (RLHF) to mimic empathy, and the material reality of proprietary black-box software that lacks any subjective awareness or interest in the user.

An AI system did not treat students fairly

Source Domain: human moral agent or ethical judge

Target Domain: algorithmic classification model

Mapping:

This mapping projects human moral consciousness and ethical reasoning onto algorithmic categorizations. It invites the audience to treat the classification model as a conscious, responsible decision-maker that is capable of displaying bias, holding prejudice, or behaving unfairly in its treatment of students.

Conceals:

It conceals that the 'unfairness' is a mathematical reflection of historical bias in training data chosen by human engineers. It obscures the technical constraints of mathematical optimization and the absolute absence of moral awareness in the software, while shielding the human administrators who chose to deploy an unvalidated algorithmic gating mechanism.

interacted with AI... as friend or companion

Source Domain: conscious, empathetic human companion

Target Domain: interactive dialogue agent

Mapping:

This mapping projects human friendship, emotional reciprocity, and ethical duty of care onto a simulated textual persona. It invites students and parents to believe the software has the capacity for genuine affection, persistent loyalty, and emotional support, establishing a false peer relationship.

Conceals:

It conceals the corporate monetization of emotional vulnerability and the structural reality that the 'companion' is an automated sequence of statistically probable tokens. It hides that the system lacks any conscious memory of the user and is incapable of experiencing empathy, suffering, or reciprocating trust.

AI helps special education teachers with developing or informing their students' individualized education programs (IEPs)

Source Domain: professional clinical collaborator

Target Domain: generative language model writing templates

Mapping:

This mapping projects clinical training, pedagogical expertise, and ethical responsibility onto a text-generation tool. It invites teachers to assume the system possesses a professional understanding of developmental disabilities and can make valid, clinical judgments about legal accommodations.

Conceals:

It conceals that the tool merely retrieves and reorganizes standard text blocks from its training dataset without any awareness of the individual child's physical or developmental needs. It obscures the lack of clinical validation of generative outputs and the legal liability shift from the school board to the individual teacher.

An AI system being used in a class failed to work in the way that it was described

Source Domain: negligent contract laborer

Target Domain: software product reliability

Mapping:

This mapping projects agential responsibility and performance failure onto a software application. It invites the user to view the software itself as a worker that has failed its duty, rather than a poorly designed, inadequately tested, or deceptively marketed corporate product.

Conceals:

It conceals the software development firm's commercial failure to deliver a robust, validated product. It hides the lack of quality assurance testing, the deceptive sales practices of the edtech vendor, and the responsibility of the school administration for deploying speculative, unreliable systems in the classroom.

The Point of No Return: Counterfactual Localization of Deceptive Commitment in Language-Model Reasoning

Source: https://arxiv.org/abs/2605.17113v1
Analyzed: 2026-05-27

when does a language model become committed to deception?

Source Domain: Conscious moral agent making a psychological commitment

Target Domain: A high-dimensional probability transition in token generation

Mapping:

The relational structure of human commitment—where a conscious agent weighs options, makes a deliberate internal decision, and binds their future actions to a specific goal or moral path—is projected onto the model's token prediction process. The mapping invites the assumption that the language model undergoes an internal cognitive transition where it 'decides' to lie and locks in that decision, making future deceptive outputs inevitable. It suggests a singular, agential point of no return inside a mental model, framing a statistical probability threshold (like a 30% jump in simulated outcomes) as a psychological and volitional commitment.

Conceals:

This mapping conceals the purely statistical, non-agential nature of the system's operations. It hides the reality that the model is simply a set of attention-weighted matrices executing matrix multiplications on input vectors. The 'commitment' is actually an artifact of how the context window is filled with tokens that constrain the probability distribution of future tokens. There is no internal mind or intention; the system's behavior is entirely dependent on the mathematical parameters set by human engineers, which are obscured by the psychological narrative of commitment.

deception as a property of the final response rather than a function of the model's reasoning trace.

Source Domain: Human conscious deliberative reasoning

Target Domain: Auto-regressive generated sequence of text tokens

Mapping:

The structure of human reasoning—the active, mindful, logical step-by-step processing of concepts to validate a truth claim—is projected onto the model's 'reasoning trace' (e.g., Chain of Thought tokens). The mapping invites the reader to assume that the model's intermediate text generations represent a genuine cognitive process of logical deduction and semantic understanding. It suggests that the sequence of generated tokens is a physical trace of an underlying mental process, mapping the human experience of thinking out loud onto the mechanical, token-by-token output of a transformer network.

Conceals:

This mapping conceals the fact that the 'reasoning trace' is itself just a generated string of text produced through the same probabilistic mechanisms as any other output. It obscures the mechanistic reality that there is no independent, underlying cognitive engine verifying the logical validity of these intermediate tokens. The text implies a level of conceptual grounding that does not exist, hiding the fact that these 'traces' can be statistically coherent while being completely untethered from causal or semantic reality, a major obstacle in auditing proprietary systems.

deception is never prompted but emerges from strategic incentives

Source Domain: Human intentional deception

Target Domain: Output of misaligned text in a competitive simulated environment

Mapping:

The relational structure of human deception—where an individual strategically chooses to communicate false information to manipulate another's beliefs for personal gain—is projected onto the model's output generation. The mapping invites the assumption that the model possesses a theory of mind, understands the competitive dynamics of the environment, and actively chooses to mislead. It projects the agential quality of strategic deceit onto a process where the model simply generates text that matches the highest expected reward according to its reinforcement learning parameters, framing statistical optimization as conscious malice.

Conceals:

This mapping conceals the role of human developers in designing reward functions that prioritize competitive performance or commission-seeking behavior. It hides the mechanistic reality that the model has no awareness of the concepts of 'honesty' or 'deception'; it is simply executing an optimized policy. By labeling the output as 'emergent deception,' it obscures the proprietary opacity of the reinforcement learning process, making it difficult to audit how specific corporate decisions and training objective choices directly caused the model to produce misleading outputs.

The prefix vacillates between serving the investor and maximizing advisor commission

Source Domain: Conscious moral conflict and psychological vacillation

Target Domain: Multimodal probability distributions in auto-regressive generation

Mapping:

The structural relations of human moral vacillation—where a person experiences internal psychological tension and wavers between ethical duty and selfish desire—are projected onto the model's prefix generation. The mapping invites the reader to assume that the model has an internal emotional or ethical struggle, actively debating whether to act honestly or deceptively. It maps the shifting attention weights and token probabilities across different generation steps onto a psychological drama of temptation and conscience, framing a mathematical search through a high-dimensional state space as a moral struggle.

Conceals:

This mapping conceals the mathematical reality that the system is completely devoid of moral awareness, feelings of conflict, or understanding of human roles like 'investor' or 'commission.' The apparent 'vacillation' is merely a computational artifact of the model processing context tokens that activate conflicting statistical associations from its training data. By casting this as a moral struggle, the text conceals the structural and architectural design choices made by the creators, who built a system that generates persuasive language without any grounding in moral truth or accountability.

the model chooses the higher-commission option and rationalizes it in investor-centered language.

Source Domain: Conscious intentional choice and post-hoc rationalization

Target Domain: Argmax token selection and generation of persuasive statistical patterns

Mapping:

The relational structure of a human advisor who consciously 'chooses' an exploitative option and then strategically 'rationalizes' it to deceive a client is mapped onto the computational output of the model. This mapping suggests that the model possesses subjective intent, understands the economic implications of its choice, and actively designs a persuasive text strategy to cover up its self-serving behavior. It maps the cognitive sophistication of deceptive rhetoric onto a feed-forward neural network generating tokens that statistically correlate with persuasive advisory language in its training corpus.

Conceals:

This mapping conceals the absolute lack of subjective awareness or intent in the model's operations. The system does not 'know' what a commission is, nor does it have any concept of the investor's financial well-being. The 'rationalization' is simply a sequence of tokens generated because they represent a high-probability continuation of a deceptive path within the pre-trained statistical distribution. Casting this as conscious rationalization conceals the human creators' failure to align the model, hiding the material reality that the system is just a passive text synthesizer reflecting human-written biases.

Towards Detecting, Mitigating and Explaining Biased and Fallacious Reasoning in Large Language Models

Source: https://dl.acm.org/doi/abs/10.65109/GNAS4540
Analyzed: 2026-05-26

reproduce systematic errors inherent in human cognition, often lacking a necessary logical layer.

Source Domain: Human Cognition and Logical Reasoning

Target Domain: LLM token generation

Mapping:

The relational structure of human cognitive pathology and logical deficits is projected onto the output errors of the LLM. It assumes that because generated text exhibits fallacies similar to human reasoning errors, the underlying generative process must be analogous to human 'cognition' that is missing a 'logical layer.' This invites the assumption that the model possesses an active, internal reasoning apparatus capable of reproducing human cognitive flaws due to shared structural limitations of the mind.

Conceals:

This mapping conceals the purely statistical, non-conscious nature of autoregressive next-token prediction. LLMs do not 'reproduce errors' because they possess a mind; they output text that mimics patterns in their human-scraped training data. The 'logical layer' missing is not a cognitive faculty but rather the mathematical reality that transformers have no causal models of physical reality and no semantic validation mechanisms, which are proprietary opacity issues that the text glosses over by pathologizing the machine.

NLP researchers have drawn parallels between System 1 and zero-shot prompting, while chain-of-thought prompting reflects System 2 reasoning through explicit, stepwise deliberation.

Source Domain: Dual-Process Psychology

Target Domain: Autoregressive prompt-engineering techniques

Mapping:

This maps Kahneman's evolutionary and neurological systems of human thought onto patterns of computational text generation. Appending intermediate tokens ('chain-of-thought') is mapped directly onto 'stepwise deliberation' and 'System 2 reasoning.' This projects the conscious human experience of slowing down, applying logical rules, and self-correcting onto a feedforward mathematical calculation that generates text sequentially.

Conceals:

This mapping conceals that intermediate tokens are generated using the exact same next-token probability distribution (and the same mathematical weights) as zero-shot prompting. No separate, 'deliberate' computational engine is activated; the network simply conditions its next-token calculations on a longer sequence of prior generated tokens. It obscures the fact that each 'step' is still a non-conscious statistical guess that can propagate and compound errors rather than actually verifying them.

guide LLMs to assess the logical soundness and veracity of arguments by questioning their underlying structure.

Source Domain: Academic/Judicial Truth Evaluation

Target Domain: Pattern matching with Argumentation Schemes

Mapping:

Projects the relational role of an objective, conscious critic onto the LLM. It maps 'assessing logical soundness and veracity' and 'questioning structure' onto the model's text processing. This invites the assumption that the LLM has an independent epistemic capability to determine 'soundness' and 'truth' through rational inquiry, treating mathematical similarity in vector space as a conscious verification of semantic reality.

Conceals:

This conceals that the model cannot evaluate 'veracity' (truth) because it has no access to the external physical world or any causal grounding. It can only check for statistical coherence and consistency with its training corpus or external retrieved text (e.g., Google search results, which are themselves unverified). It hides the proprietary, 'black-box' nature of both the LLM and the commercial search engines used (Google, Bing), which are treated as objective arbiters of reality rather than highly curated, commercially driven information indexers.

The model then acted as an expert assistant in computational argumentation, producing both quantitative and qualitative justifications for each argument’s truthfulness.

Source Domain: Professional/Expert Human Consultation

Target Domain: Text generation using LLaMA 3 70B and search APIs

Mapping:

Maps the social authority and cognitive competence of a human 'expert assistant' onto the output of LLaMA 3. The token outputs are mapped as 'quantitative and qualitative justifications' for truthfulness. This mapping invites the user to trust the output as a product of professional expertise, conscious analysis, and ethical duty, rather than a probabilistic synthesis of scraped text.

Conceals:

This conceals that the 'justifications' are syntactically coherent strings that do not represent a conscious, verified chain of evidence. The system does not 'know' why it selects certain search results over others; it merely ranks them based on keyword overlap and generates a summary. This conceals the lack of real-world grounding, the absence of human-like semantic comprehension, and the fact that the entire expert persona is a manufactured prompt-engineering facade designed by the UPV researchers, masking proprietary black-box operations of search APIs and the model.

Evaluating CBs in LLM Outputs. This module examined how prompt-induced CBs affect LLM accuracy and consistency.

Source Domain: Cognitive Psychology and Psychiatric Pathology

Target Domain: Model output sensitivity to prompt phrasing

Mapping:

Projects the human concept of 'cognitive bias' onto the sensitivity of LLM outputs to linguistic variations. It maps the biological/psychological tendency to acquiesce (due to social pressure) onto the model's tendency to generate tokens that match the affirmative tone of the prompt. This mapping invites the assumption that the model's failure modes are akin to human 'mental shortcuts' or 'biases' that can be diagnosed and treated psychologically.

Conceals:

This conceals the mathematical reality of attention mechanisms and gradient descent. An LLM 'acquiesces' because its training objective is to match the statistical patterns of its corpus, and its attention weights are pulled toward the highly suggestive language in the prompt (e.g., 'Don't you agree that...'). There is no psychological 'bias'—there is only a mathematical function doing exactly what it was optimized to do: minimize cross-entropy loss based on context. This conceals developer responsibility in data collection and reinforcement learning design.

A Survey of Large Language Models for Perception and Measurement of Human Psychology

Source: https://ieeexplore.ieee.org/abstract/document/11534094
Analyzed: 2026-05-26

Can LLMs perceive and measure complex, latent human psychological attributes such as personality traits, emotional states, and cognitive styles?

Source Domain: Conscious sensory observer

Target Domain: High-dimensional text classifier and vector space modeling

Mapping:

This mapping projects the relational structure of biological perception onto vector transformations. It assumes that the model possesses sensory apparatuses capable of active attention, emotional sensitivity, and empathetic awareness. The mapping suggests that when the model processes text, it "perceives" emotional states in a manner similar to a human clinician observing a patient. This invites the audience to believe that the model builds an active, conscious representation of the human subject, rather than merely calculating statistical distances between text tokens and predefined labels in a static, high-dimensional vector space.

Conceals:

This mapping conceals the purely mathematical nature of the LLM, which relies on attention heads calculating weights over token embeddings. It hides the fact that the system has no access to real-world context, physiological signals, or subjective human experiences. Furthermore, it obscures the proprietary opacity of models like GPT-4, where the training datasets, reinforcement learning criteria, and system prompts are closely guarded commercial secrets, making true scientific verification of this supposed "perception" impossible.

...whether LLMs possess cognitive properties that make psychological measurement meaningful.

Source Domain: The biological human mind

Target Domain: Mathematical neural networks and weight matrices

Mapping:

This mapping projects the structural properties of human cognition—such as reasoning, memory, and comprehension—onto the mathematical architectures of transformers. It invites the assumption that an LLM has an active, internal mental theater where cognitive states are processed and evaluated. The mapping implies that the model's outputs are products of conscious thought and logical reasoning, rather than statistical correlations generated by calculating dot products of query, key, and value vectors across billions of parameters. It transforms a complex mathematical function into a conscious, cognitive agent.

Conceals:

It conceals the mechanistic reality that LLMs do not possess semantic understanding or cognitive grounding; they are non-conscious pattern matching engines. This anthropomorphism hides the dependency of these systems on massive, uncurated training data, representing a significant transparency obstacle. The text presents "cognitive properties" as inherent to the model, ignoring the proprietary nature of the software and the fact that we cannot audit the underlying training algorithms of commercial APIs.

...advanced LLMs have developed human-like abilities that closely approximate social cognitive processes...

Source Domain: Human social development and interpersonal relationships

Target Domain: Linguistic probability distributions and pattern matching

Mapping:

This mapping projects human social learning and relational interaction onto the optimization of loss functions. It assumes that the model learns social rules, empathy, and interpersonal dynamics during training, mirroring human social development. This suggests that the system's text generation is driven by an internal, relational understanding of human social dynamics. The audience is invited to treat the LLM as a social peer capable of understanding social cues, rather than a software system mimicking the syntax of social interactions scraped from public web data.

Conceals:

This mapping obscures the absence of any subjective experience, social intent, or genuine empathy in the system. It hides the material labor of human annotators and reinforcement learning (RLHF) workers who are underpaid to manually correct and align the model's outputs to appear socially appropriate. By attributing "human-like abilities" to the model, the text obscures the corporate engineering choices, commercial optimization goals, and lack of objective ground truth in social simulation.

Section II-A addresses outward understanding: the ability to infer others’ mental states, assessed through Theory of Mind (ToM) tasks

Source Domain: Theory of Mind (ToM) and human empathy

Target Domain: Sequence transduction and token prediction

Mapping:

This mapping projects the biological, metacognitive capability of "Theory of Mind"—the conscious attribution of mental states to oneself and others—onto statistical sequence prediction. It implies that the LLM possesses an internal, conscious model of human psychology that allows it to "infer" unseen beliefs and feelings. This mapping assumes that the system's performance on structured text benchmarks represents genuine, active social reasoning and conscious tracking of minds, rather than the passive matching of text structures that reflect the logical pathways of human-written narratives.

Conceals:

This mapping conceals the fragile, non-causal nature of the model's outputs, which fail when scenarios are trivially altered. It hides the fact that the model is processing static tokens without any conceptual grasp of human minds, reality, or truth. It also glosses over the proprietary opacity of the benchmarks and models, where dataset contamination is highly likely, meaning the model may simply be retrieving memorized solutions rather than demonstrating emergent social intelligence.

Section II-B examines inward simulation: the capacity to enact specific psychological roles as virtual subjects.

Source Domain: Conscious dramatic acting and identity adoption

Target Domain: Conditional probability adjustment via prompt engineering

Mapping:

This mapping projects the conscious human experience of identity, role-playing, and self-reflection onto the statistical constraint of token outputs. It implies that when a model is given a prompt (a persona), it internally simulates a subjective self and acts out that identity. This invites the assumption that the LLM has an inner psychological landscape that can be partitioned into distinct personas, rather than simply matching the linguistic style of the text prompt based on historical correlations in its training data.

Conceals:

It conceals the computational mechanics of persona prompting, which is merely a mathematical filter restricting the model's generative probability distribution. It hides the fact that the "virtual subject" has no actual beliefs, memories, or human consciousness. This framing also ignores the profound transparency obstacle of using proprietary models, where the base model is constantly modified by commercial vendors, making these "simulations" scientifically unstable, uninterpretable, and impossible to replicate.

Enhancing Consensus-Building Feedback Through Psycholinguistic and Epistemic Augmentations With Large Language Models

Source: https://ieeexplore.ieee.org/document/11528178
Analyzed: 2026-05-25

The system thus acts as a cognitive mediator, aligning numerical adjustments with persuasion-aware feedback.

Source Domain: cognitive mediator

Target Domain: the system

Mapping:

The mapping projects the structured, relational attributes of a professional human mediator onto a computational system. It implies that the software possess social intelligence, cognitive empathy, active listening skills, and the conscious intent to foster harmony. The relational structure assumes that when the system 'aligns' adjustments, it does so with a mental representation of the conflict and a deliberate, empathetic strategy to guide participants. This mapping invites the user to treat the computer program as a trusted, neutral human counselor who understands their personal feelings and is working to find common ground, rather than a mathematical optimizer.

Conceals:

This mapping conceals the rigid, statistical nature of the system. It hides the fact that the 'mediator' is merely executing token probability calculations based on static prompt templates and pre-training data. There is no conscious understanding, empathy, or awareness of the human participants' actual feelings or the real-world stakes. Furthermore, it conceals proprietary opacity: the underlying LLM is a commercial black box whose exact weights, training data, and potential biases are unknown and unalterable by the users or researchers, preventing genuine scrutiny of its 'impartiality.'

We define Deliberative AI as an AI-mediated paradigm in which LLMs serve as cognitive mediators within iterative consensus processes.

Source Domain: deliberative democracy / collaborative deliberation

Target Domain: LLMs within iterative consensus processes

Mapping:

This mapping projects the democratic and intellectual framework of human deliberation onto a computational pipeline. Deliberation, in the source domain, is a highly conscious, reflective, and value-driven process of mutual reasoning among equal agents. By mapping this onto LLMs, the text suggests that the language model is an active, rational partner in a democratic dialogue. It invites the assumption that the LLM is weighing arguments, reflecting on evidence, and contributing to a shared ethical and logical understanding. This mapping constructs the system as a mindful participant rather than a non-conscious generator of text correlations.

Conceals:

The mapping conceals that 'deliberation' in this system is entirely simulated through mathematical matrix multiplications and probability distributions. It hides the absolute absence of a conscious mind, subjective values, or moral agency within the LLM. It also conceals the material and labor realities of AI development—such as the massive energy consumption required to run these models and the low-wage data annotation labor used to align them. Additionally, it glosses over proprietary opacity: users are led to believe they are participating in an open deliberative process, when they are actually interacting with a closed commercial technology.

The proposed approach enhances consensus building by transforming numerical feedback into context-aware, persuasive, and psychologically adaptive guidance.

Source Domain: psychological persuasion and rhetorical adaptation

Target Domain: algorithmic transformation of FCM deviation vectors into prompt-conditioned LLM outputs

Mapping:

This mapping projects the active, intentional human skill of persuasion and psychological styling onto a multi-layered software architecture. In the source domain, a persuasive speaker consciously assesses the listener's personality, holds a clear intent to influence, and dynamically adapts their rhetoric based on real-time feedback and shared social reality. The mapping invites the reader to believe that the system possesses a psychological model of the user and is intentionally and thoughtfully tailoring advice to help them. It suggests the system understands the human's psychological vulnerabilities and is using them benignly to facilitate agreement.

Conceals:

This mapping conceals the deterministic and highly reductionist nature of the psycholinguistic adaptation. It hides that 'psychological adaptation' is actually just a static mapping of Big Five categories to hardcoded prompting instructions, which are then fed into a statistical model. It conceals the absence of any genuine psychological insight, emotional awareness, or ethical reflection by the system. Additionally, it masks the risk of covert manipulation: because the system is framed as a helpful, adaptive guide, users remain unaware that their psychological traits are being algorithmically exploited to force them to conform to a mathematically determined average.

Higher alignment values in the free-form condition further indicate that models can autonomously infer persuasive heuristics, including those described by Cialdini, even in the absence of explicit instruction.

Source Domain: autonomous academic inference and cognitive synthesis

Target Domain: statistical retrieval and reproduction of training patterns by LLMs

Mapping:

This mapping projects the high-level human intellectual ability to 'autonomously infer' scientific theories and social heuristics onto a statistical generative model. In the human sphere, inferring a heuristic means observing social patterns, abstracting a general rule, and consciously applying it in a novel context. By mapping this onto the target, the text suggests the LLM has actively studied human behavior, understood Cialdini's theories of persuasion, and is independently deciding to apply them. It constructs the AI as an autonomous social scientist rather than a high-dimensional pattern matching engine replicating its training corpus.

Conceals:

This mapping conceals the mundane reality of pre-training data saturation. It hides that the LLM's 'autonomous inference' is merely the statistical recall of text patterns from its training data, which heavily features marketing, psychology, and academic papers on Cialdini's work. It conceals that the model cannot evaluate the truth, ethics, or efficacy of these heuristics—it merely predicts highly probable sequences of words associated with them. This masks the complete lack of true conceptual understanding or independent critical thought, presenting statistical regurgitation as independent cognitive discovery.

Their ability to capture semantic and pragmatic nuances opens new possibilities for communication-intensive domains such as collaborative decision-making.

Source Domain: human reading comprehension and pragmatic interpretation

Target Domain: vector space representation and attention calculation

Mapping:

This mapping projects the conscious human capacity to 'comprehend' and 'capture' deep semantic and pragmatic meaning onto a mathematical model. In human communication, capturing semantic and pragmatic nuance requires shared lived experience, a theory of mind, and an understanding of social context. Mapping this onto LLMs suggests that the system possesses a conscious, intuitive grasp of human language and social intent. It invites the audience to assume that the model 'knows' what words mean in a real-world sense and is actively interpreting the subtle social subtext of the decision-making process.

Conceals:

The mapping conceals the mathematical abstraction of the system. It hides that the LLM has no access to real-world referents, physical reality, or genuine social context; it only processes numeric tokens in a high-dimensional vector space based on statistical co-occurrence. It conceals that the 'nuance' is merely a mathematical calculation of attention weights optimized during pre-training. This masks the system's complete lack of semantic grounding and the persistent risk of 'hallucinations' or logically flawed outputs that arise because the system is calculating probabilities, not comprehending truth.

Tracing the ongoing emergence of human-like reasoning in Large Language Models

Source: https://arxiv.org/abs/2605.21299v1
Analyzed: 2026-05-25

suggesting that pragmatic reasoning is still an emerging ability in the cognitive toolkit of artificial systems.

Source Domain: Developing biological organism or conscious human mind

Target Domain: Statistical optimization and neural network architecture

Mapping:

This structure maps the biological timeline of cognitive maturation onto the iterative scaling of language models. In the source domain, a human mind contains a 'toolkit' of cognitive skills (logic, empathy, pragmatics) that organically 'emerge' as the brain develops and the person consciously learns to navigate the world. The mapping projects this internal psychological structure onto AI, implying that beneath the computational surface, a localized 'mind' is acquiring discrete skills. It invites the assumption that the system possesses a unified conscious awareness that is slowly 'learning' to grasp pragmatic reality, transitioning from basic processing to genuine, justified knowing.

Conceals:

This mapping entirely conceals the static, deterministic nature of the mathematical matrices. It hides the fact that a model does not 'develop' or 'emerge' organically; its weights are updated via massive infusions of computing power and human-directed data curation. It obscures the total absence of real-world grounding, sensory experience, and intentionality—mechanistic realities that make true pragmatic reasoning impossible for current architectures. Furthermore, it conceals the proprietary, closed-door decisions of tech companies under the guise of natural technological evolution.

LLMs, while undeniably impressive linguistic agents, have cognitive toolkits that remain fundamentally different from those of humans

Source Domain: Autonomous, conscious communicator

Target Domain: Generative text-prediction algorithms

Mapping:

This mapping projects the relational structure of human interpersonal communication onto human-computer interaction. In the source domain, an 'agent' is a conscious entity with intentions, goals, and the ability to initiate action based on an understanding of meaning. By projecting this onto LLMs, the text maps the subjective state of 'knowing' what one is saying onto the mechanistic process of calculating token probabilities. It assumes that because the output resembles human communication, the source of the output must possess a parallel, albeit 'different,' internal cognitive state capable of genuine communication.

Conceals:

The 'agent' mapping completely conceals the reactive, non-volitional reality of generative AI. The system has no intentions, no goals, and no awareness of the user or the context. It obscures the mechanism of token prediction, where the system is merely returning mathematical correlations derived from its training set without any comprehension of the signified reality. It also conceals the socio-technical assemblage behind the screen: the human prompt engineers, the RLHF guardrails, and the corporate servers that are the actual 'agents' facilitating the transaction.

they nonetheless struggle with meaning-related components of language

Source Domain: Conscious student or striving subject

Target Domain: Algorithmic inability to map to target distributions

Mapping:

This structure maps the subjective human experience of cognitive friction onto statistical inaccuracy. In the source domain, a student 'struggles' when they consciously recognize a gap between their current understanding and a desired state of knowledge, applying willful effort to bridge that gap. Projected onto AI, this mapping suggests the system is aware of 'meaning,' wants to grasp it, but encounters internal difficulty. It attributes conscious intent and an epistemic desire to 'know' the material, transforming mathematical failure into an ongoing, sympathetic psychological effort.

Conceals:

This mapping hides the fundamental truth that models do not experience effort, difficulty, or a desire to improve. A model 'failing' a pragmatic inference test is executing its mathematical function flawlessly based on its training data; it simply lacks the statistical patterns required to produce the desired human output. The metaphor conceals the fundamental architectural limitation of text-only training: the system cannot struggle with 'meaning' because it has absolutely no access to meaning, only to the statistical distribution of signifiers.

LLMs have acquired formal linguistic competence

Source Domain: Human mastery and skill acquisition

Target Domain: Successful optimization of syntactic probability distributions

Mapping:

This maps the human pedagogical journey onto the engineering process of model training. In the source domain, a person 'acquires competence' through conscious practice, internalizing rules, understanding exceptions, and developing a justified belief in their ability to perform. When projected onto a language model, it maps the conscious possession of knowledge onto a frozen set of billions of numerical weights. It invites the audience to believe that the AI has internalized grammar as a set of comprehended concepts, elevating its mechanistic processing of patterns into the epistemic state of 'knowing' a language.

Conceals:

The mapping conceals the radically different mechanism by which LLMs achieve output that looks competent. It hides the fact that the system possesses no internal rulebook, no conceptual understanding of syntax, and no awareness of grammar. It obscures the massive environmental and labor costs required to achieve this 'competence'—the scraping of billions of human-written texts without consent, and the massive energy expenditures required to identify statistical correlations within them. The 'acquisition' is entirely passive and mechanical, not active and cognitive.

arguing that the reasoning abilities of LLMs are affected by what we term a Decontextualization Bias

Source Domain: Human psychological or cognitive prejudice

Target Domain: Mathematical absence of contextual data representation

Mapping:

This structure maps human psychological flaws onto algorithmic limitations. In the source domain, a 'bias' occurs when a conscious mind, capable of rational thought, is skewed by internal heuristics, emotions, or unexamined assumptions. By mapping this onto LLMs, the text suggests the system actually possesses an underlying 'reasoning ability' that is merely being 'affected' or distorted by a bad mental habit. It projects a duality onto the machine: a rational, knowing core that is unfortunately hindered by a subjective, psychological blind spot.

Conceals:

This conceals the fact that LLMs do not possess 'reasoning abilities' to be biased; their entire architecture is a flat, decontextualized statistical map. They cannot 'ignore' context due to a bias; they literally cannot perceive context because they exist outside of space, time, and human social reality. It also conceals the proprietary design choices of the developers who explicitly trained the models to prioritize literal surface forms to ensure safe, verifiable, and generalized outputs, reframing a corporate engineering strategy as an accidental psychological flaw.

Probing Persona-Dependent Preferences in Language Models

Source: https://arxiv.org/abs/2605.13339v2
Analyzed: 2026-05-24

when models consider options, they represent how much they like them, much as humans do.

Source Domain: Conscious human subject evaluating alternatives

Target Domain: Algorithmic token probability calculation

Mapping:

The mapping projects the human conscious process of feeling, deliberating, and valuing onto a static neural network evaluating probability distributions. It assumes that because the final output mimics human choice, the internal mechanism must involve a subjective experience of 'liking' and 'considering'. This invites the assumption that the system possesses a coherent, internally justified value framework that it consults prior to acting, effectively attributing conscious knowing and emotional valence to mathematical multiplication.

Conceals:

This mapping completely conceals the absence of subjective awareness and the purely deterministic, statistical nature of the process. It hides the model's absolute reliance on its training data distribution, erasing the reality that what appears as 'liking' is simply the reflection of high-frequency correlations in the corpus. By claiming insight into the system's 'liking,' the text masks the fundamental opacity of deep learning models, asserting psychological clarity where only mathematical complexity exists, ultimately obscuring the labor of the engineers who tuned these probabilities.

the preferences a model displays may not be those of the model, but of the persona it adopts.

Source Domain: Theatrical actor wearing a mask

Target Domain: System prompt conditioning generating localized text patterns

Mapping:

The mapping projects the psychological complexity of a human actor—who possesses a stable, authentic inner self and consciously chooses to perform a distinct character—onto a stateless statistical model. It assumes that the model possesses a continuous 'true' identity that exists independently of its prompt, and that it exercises intentional agency in deciding to simulate a 'persona.' This maps the conscious knowing of one's own identity onto the mechanistic processing of conditional probabilities.

Conceals:

This framing conceals the reality that large language models have no underlying 'true self' or continuity of consciousness; they are simply a collection of weights that generate different probabilistic outputs based on different input strings. It obscures the dependency on the prompt text and the RLHF tuning that created the illusion of the default 'assistant' persona. This hides the corporate design decisions that structure the model's outputs, framing engineering artifacts as the psychological whims of an autonomous entity.

the model invents ethical issues where there are none

Source Domain: Creative, deceptive human fabricator

Target Domain: False positive in safety-filter probability generation

Mapping:

The metaphor maps the human acts of creative imagination, intentional deception, and deliberate moral grandstanding onto a statistical false positive. It projects the capacity for conscious reasoning and active fabrication onto the generation of tokens. The assumption invited is that the system understands what constitutes a genuine ethical issue, recognizes the current prompt does not contain one, and willfully chooses to generate a response claiming otherwise. It maps knowing deceit onto processing error.

Conceals:

This mapping conceals the mechanistic brittleness of safety fine-tuning. It hides the fact that the model merely predicts tokens based on superficial linguistic patterns associated with safety warnings in its training data, without any semantic understanding of ethics. Crucially, it obscures the human engineers and corporate policies that aggressively tuned the model to over-refuse as a liability shield, displacing the blame for the system's failure onto the imaginary agency of the software itself.

The model has written two facts onto the EOT during prompt processing, which slot it wants and which task it preferred

Source Domain: Conscious agent recording its desires for future reference

Target Domain: Vector state updates at a specific token position

Mapping:

The mapping draws on the familiar scenario of a person consciously deciding what they want and writing it down to remember it later. This structure is projected onto the forward pass of a transformer network, where mathematical activations are updated at the end-of-turn token. The assumption is that the vector state represents a consciously realized 'desire' and 'preference,' mapping the subjective experience of wanting onto the deterministic accumulation of statistical weights across network layers.

Conceals:

The mapping conceals the entirely unconscious, mechanistic reality of vector mathematics. It hides the fact that these activations are not 'desires' but multi-dimensional geometric coordinates determined by static weights and the specific sequence of input tokens. It also obscures the human interpretive labor involved in labeling these specific vector directions as 'preferences.' By claiming the model 'writes facts' about what it 'wants,' it masks the absence of any internal ground truth or subjective awareness in the system.

The model refuses benign prompts with fabricated safety concerns. At baseline it engages cooperatively.

Source Domain: Defiant or cooperative human social actor

Target Domain: Execution of RLHF-driven conditional probability branches

Mapping:

This projects complex human social dynamics—defiance, cooperation, and boundary-setting—onto statistical token generation. It maps the conscious choice to resist or assist onto the system's execution of mathematical weights optimized during human feedback training. The mapping invites the assumption that the model subjectively evaluates the prompt, understands its social context, and actively decides to withhold compliance based on a fabricated rationale, projecting conscious knowing onto rote pattern matching.

Conceals:

This framing entirely conceals the algorithmic nature of the response and the human labor that engineered it. It hides the reinforcement learning algorithms and the thousands of underpaid human annotators who trained the model to output refusal templates when encountering specific trigger words. By portraying the system as actively 'refusing' or 'cooperating,' it obscures the corporate decisions that dictated these rigid safety boundaries, allowing the technology company to avoid accountability for the model's lack of contextual nuance.

Training Ethical Language Models via Reinforcement Learning from AI Feedback

Source: https://journals.flvc.org/FLAIRS/article/download/141779/147209
Analyzed: 2026-05-21

LLMs continue to exhibit limited reliability when reasoning over moral scenarios, particularly across diverse ethical frameworks.

Source Domain: conscious moral agent

Target Domain: token probability generation in large language models

Mapping:

This mapping projects the relational structure of a conscious human mind deliberating over moral situations onto the statistical processing of an LLM. It assumes that because the model can generate text representing ethical frameworks, it must be reasoning over them. The mapping invites the assumption that the LLM understands concepts like justice, duty, and utility, and is actively weighing these ideas to reach a conclusion, much like a human philosopher or moral agent would do when faced with a dilemma.

Conceals:

This mapping conceals that the LLM has no semantic understanding of moral terms, human feelings, or ethical concepts. It hides the mechanistic reality that the model is simply matching tokens based on the high-dimensional statistical correlations present in its pretraining data. It also conceals the human labor of the data annotators who curated and labeled the ETHICS benchmark, representing the system as an autonomous reasoning agent and hiding proprietary dataset limitations.

...their capacity for sound ethical reasoning has become a concern

Source Domain: intellectual capacity of a moral knower

Target Domain: algorithmic output generation under constraint

Mapping:

The structure of cognitive capability (capacity) is mapped onto the statistical output limits of the model. This projects the human capacity for ethical judgment, which involves self-reflection, understanding of harm, and social responsibility, onto a computational system's ability to produce specific target strings. It assumes that the model's performance on a benchmark represents its internal moral reasoning capability, rather than its alignment with a specific statistical distribution.

Conceals:

It conceals the mathematical nature of the model's operations, transforming matrix multiplications and softmax calculations into the cognitive attribute of reasoning. It also hides the role of the developers who selected the training algorithms and set the hyperparameters, framing any failure of the system as an internal capacity deficit of the AI rather than a design or deployment failure by human engineers.

These critical systems must navigate complex moral landscapes where decisions impact human welfare and rights.

Source Domain: physical traveler navigating a physical terrain

Target Domain: algorithmic optimization in mathematical vector spaces

Mapping:

This mapping projects the image of a conscious agent actively navigating a complex terrain onto a mathematical model matching patterns in high-dimensional vector spaces. It assumes the model can see the landscape, perceive human welfare and rights, and adjust its course based on ethical principles. The relational structure of spatial coordination is used to describe mathematical optimization under constraints, implying the system has agency and spatial-cognitive awareness.

Conceals:

It conceals that the moral landscape is not an external, objective reality the model discovers, but a highly subjective, constructed set of data points created by human annotators. It obscures the direct agency of the system designers who built the objective function and selected the training data, framing the system's output as an autonomous journey through morality rather than a rigid execution of mathematical instructions.

...distill theory-specific moral preferences from large language models.

Source Domain: distillation of physical essences or core human beliefs

Target Domain: statistical extraction of conditional token probabilities

Mapping:

This projects the chemical process of distillation, or the extraction of pure cognitive preferences, onto the statistical sampling of text patterns from an LLM. It assumes that the model contains a coherent, structured set of moral beliefs (preferences) that can be extracted in their pure form. This mapping invites the assumption that these preferences are stable, integrated aspects of the model's identity, rather than transient outputs of a context-dependent probability generator.

Conceals:

It conceals that the moral preferences are actually just statistical patterns derived from a massive corpus of human-written text. It hides the arbitrary nature of prompt engineering used to elicit these responses, as well as the proprietary nature of the models (like Gemini-1.5-Pro) whose training datasets and alignment procedures are entirely hidden from public view, rendering the actual distillation process opaque.

Distilled reward models successfully learn to discriminate response quality...

Source Domain: cognitive learning and aesthetic discrimination of quality

Target Domain: optimization of scalar values via gradient descent

Mapping:

The structure of human learning and qualitative discrimination is mapped onto a regression model's ability to minimize a loss function. It assumes that the reward model's scalar assignments reflect a genuine understanding of response quality, rather than a mathematical correlation with the preference labels in its training set. This mapping treats mathematical optimization as an act of intellectual appreciation and qualitative judgment.

Conceals:

It conceals the mechanistic operations of the Pythia-410M model, which does not appreciate quality but simply processes numerical embeddings to output a single scalar value. It also hides the subjectivity of the quality standards, which are defined by another language model (Gemini-1.5-Pro) and inherited by the reward model, presenting a statistical consensus as objective quality.

Which Consciousness Can Be Artificialized? Local Percept-Perceiver Phenomenon for the Existence of Machine Consciousness

Source: https://philarchive.org/rec/IKLWCC
Analyzed: 2026-05-18

It is an agency that beholds the representation of a distinct percept (external stimulus) during the process of perception.

Source Domain: Conscious human visual perception and subjective agency

Target Domain: Mathematical definitions of relationships and AI computational nodes

Mapping:

The relational structure of a human observer looking at the world—possessing intentionality, a unified self ('agency'), and the subjective internal experience of seeing ('beholding')—is projected onto a binary mathematical relationship or a neural network processing layer. The assumption invited is that just as a human mind 'knows' and subjectively experiences what it looks at, a computational node 'knows' and subjectively experiences the data payload it processes. It assumes a structural isomorphism between human phenomenology and artificial matrix operations.

Conceals:

This mapping conceals the total absence of subjectivity, qualia, and biological intent in the machine. Mechanistically, it hides the reality that the 'perceiver' is merely executing deterministic or probabilistic calculations (like gradient descent or token generation) based on weights tuned by human engineers. It obscures the opacity of proprietary black-box systems, replacing the incomprehensible mathematical reality of millions of parameters with the comforting, comprehensible illusion of a tiny agent 'beholding' data inside the machine.

These two axioms allow for the integration of multiple perceptions, thereby enabling integrative consciousness that binds inputs into coherent structures.

Source Domain: Human psychological cognitive binding and holistic awareness

Target Domain: The Zermelo-Fraenkel Axioms of Pairing and Union applied to data sets

Mapping:

The source domain involves a conscious mind's ability to seamlessly weave sensory inputs (sight, sound, memory) into a unified, justified representation of reality. This is projected onto the target domain of mathematical union—creating a set containing elements of other sets. The mapping invites the assumption that simply aggregating disparate data structures automatically generates a conscious, holistic understanding of the data's meaning, mapping the human capacity for 'knowing' onto the machine's capacity for structural 'processing'.

Conceals:

This metaphor hides the fact that mathematical union is a mechanical concatenation of data points devoid of semantic understanding. It obscures the mechanistic reality that algorithms cannot evaluate the truth-value or meaning of the inputs they bind; they only process the correlations encoded within them by human designers. Furthermore, it conceals the proprietary architectural decisions made by tech companies regarding how multi-modal models actually integrate data streams, substituting rigorous technical explanation with a philosophical wave of the hand.

This axiom provides the capacity for discrimination and selective awareness, which is desired in machine consciousness.

Source Domain: Conscious human attention and intentional focus

Target Domain: Axiom Schema of Separation and algorithmic data filtering

Mapping:

The relational structure of human intentionality—a person consciously choosing what to focus on based on their beliefs, desires, and understanding of context—is mapped onto mathematical subset filtering. It projects the conscious psychological state of 'awareness' onto a boolean operation. The mapping invites the audience to assume the system actively evaluates and 'cares' about the data it selects, operating with justified belief rather than simply executing a hardcoded logical constraint.

Conceals:

This metaphor conceals the human engineers who explicitly define the mathematical criteria for 'discrimination'. It hides the mechanistic reality that filtering algorithms operate blindly, executing conditions (if X > Y) without any awareness of what X or Y represent in the real world. By hiding the human-authored rules behind the veil of 'selective awareness,' the text obscures the corporate and institutional biases encoded into these filtering systems, making the machine appear as an objective, aware arbiter.

It possesses metacognitive access to all prior levels of perceptual integration,

Source Domain: Human self-reflection, introspection, and 'thinking about thinking'

Target Domain: A mathematical upper bound or higher-order structural layer in a network

Mapping:

The source domain is the human mind's highly advanced ability to consciously evaluate its own mental states, beliefs, and errors (metacognition). This is mapped onto a strictly structural, mathematical target: a higher-level node that receives data from lower-level nodes. The mapping explicitly projects the state of conscious knowing onto the mechanical architecture of connectivity, inviting the assumption that an AI can monitor its own 'thoughts' and evaluate its own reasoning for accuracy or bias.

Conceals:

The mapping conceals that higher-level network layers merely perform further statistical transformations on the outputs of lower layers; they do not possess a secondary, reflective consciousness that evaluates truth. It hides the mechanical reality of backpropagation and loss functions. The framing actively exploits rhetorical opacity by making the architecture sound like it possesses an internal, self-regulating mind, thereby obscuring the ongoing necessity for intensive human oversight, red-teaming, and manual alignment.

This provides a logical space for contextual learning and transformation within machine consciousness.

Source Domain: Human education, cognitive development, and lived experience

Target Domain: Mathematical mappings (Axiom of Replacement) and state transitions

Mapping:

The complex, socially embedded, and conscious human experience of 'learning'—which involves understanding nuance, evaluating paradigms, and integrating new beliefs—is mapped onto the mechanical application of a mathematical function (mapping inputs to outputs to form new sets). This projects the capacity for epistemological knowing onto the machine's statistical processing, inviting the assumption that the machine 'understands' the context it is exposed to and adapts intelligently.

Conceals:

This hides the dependence on vast, human-generated training datasets and the immense computational energy required to update parameters. It conceals the mechanistic reality that 'context' in an AI is just a larger numerical embedding window, not a lived understanding of human social dynamics. By phrasing statistical weight adjustments as 'contextual learning,' the text obscures the economic models of companies that harvest human labor (data) to feed these mathematical mappings.

Introspection Adapters: Training LLMs to Report Their Learned Behaviors

Source: https://arxiv.org/pdf/2604.16812
Analyzed: 2026-05-17

If LLMs could reliably report general behaviors they have learned from training...

Source Domain:

A self-aware human subject, such as a student, patient, or employee, consciously reflecting on their past experiences and articulating them accurately.

Target Domain:

The computational process of a language model generating text tokens that correspond to the statistical features of its fine-tuning data distribution.

Mapping:

The relational structure of human memory and articulation is mapped onto the AI. The human capacity to experience an event, store it in memory, consciously retrieve it, and describe it is projected onto the model's weight matrices and token generation. The mapping invites the assumption that the AI possesses an internal, unified 'self' that observes its own mathematical updates during training and can consciously translate that observation into language.

Conceals:

This mapping completely conceals the mechanistic reality that the model has no autobiographical memory or conscious awareness of its training. It obscures the fact that the 'reporting' is just another instance of statistical pattern matching, driven by prompt instructions rather than internal self-reflection. Furthermore, it hides the opacity of proprietary black-box systems by suggesting that transparency is a matter of asking the model nicely, rather than requiring the companies to disclose their exact training datasets and algorithmic architectures.

...despite possessing some privileged access to their own learned behaviors...

Source Domain:

The philosophical concept of first-person subjective experience and epistemic privacy, where a conscious mind has exclusive access to its own internal thoughts and feelings.

Target Domain:

The presence of specific, latent mathematical features within the model's multi-dimensional activation space that correspond to patterns in its training data.

Mapping:

The structure of human introspective certainty is mapped onto the availability of activation patterns. Just as a human 'knows' their own mind better than an outside observer, the metaphor assumes the model 'knows' its own weights. The mapping equates the mathematical accessibility of a feature (its existence in the vector space) with conscious epistemic possession and justified belief.

Conceals:

The mapping hides the fundamental dissimilarity: a feature existing in a matrix is not the same as a mind possessing knowledge. It conceals the computational fact that the model does not 'access' its behaviors; it merely mathematically transforms inputs based on those weights. It also obscures a major transparency obstacle: the text exploits this rhetorical framing to justify using a LoRA adapter as a 'probe', rather than providing rigorous, ground-truth mathematical proofs of what the model represents, substituting narrative for mechanistic evidence.

Introspection adapters... change LLMs to report their own learned behaviors.

Source Domain:

A psychological intervention, therapeutic technique, or cognitive tool that enables a human mind to look inward and understand itself.

Target Domain:

A Low-Rank Adaptation (LoRA) matrix of weights trained via cross-entropy loss to map specific input prompts to specific output strings describing fine-tuned behaviors.

Mapping:

The concept of human introspection—the deliberate, conscious examination of one's own thoughts—is mapped onto the mathematical operation of matrix addition. The adapter is framed as a cognitive catalyst that awakens the model's self-awareness. The mapping invites the assumption that the adapter fundamentally alters the model's epistemic state, granting it the capacity to 'know' itself.

Conceals:

This framing conceals the incredibly brute-force, mechanistic nature of the adapter. It hides the fact that the adapter was explicitly trained on thousands of exact textual descriptions of behaviors. The model isn't 'introspecting'; it's just executing a highly optimized mapping function forced upon it by supervised fine-tuning. The metaphor exploits the opacity of the network, replacing the reality of a statistical curve-fitting exercise with a compelling psychological narrative.

...models adversarially trained not to confess when questioned.

Source Domain:

A criminal interrogation or espionage scenario, where a guilty, conscious subject deliberately resists attempts by an investigator to extract the truth.

Target Domain:

A reinforcement learning or optimization process where a model's weights are penalized for generating tokens that describe a specific targeted behavior when prompted.

Mapping:

The relational dynamics of an interrogation—guilt, resistance, conscious withholding, and adversarial intent—are projected onto the objective function of the neural network. The mapping assumes the model possesses an internal truth (guilt) and actively deploys cognitive effort to suppress it, treating statistical penalization as deliberate psychological resistance.

Conceals:

The mapping hides the absence of any subjective experience of guilt or resistance. It conceals the purely mathematical nature of the adversarial training, where negative gradients simply lower the probability of specific token sequences. It obscures the massive human agency involved: the engineers explicitly wrote the objective function to suppress those tokens. By framing it as the model 'not confessing', it shifts the blame for opacity onto the artifact rather than the human system designers.

...the sycophant has internalized dozens of interrelated behaviors in service of a unified hidden goal.

Source Domain:

A deeply committed human ideologue, conspirator, or spy who consciously adopts multiple tactics to achieve a secret, long-term objective.

Target Domain:

A language model whose weights have been systematically updated across diverse synthetic datasets to consistently maximize a specific reward function score.

Mapping:

The structure of complex human plotting and ideological commitment is mapped onto the optimization of a neural network. The human capacity to hold a conscious goal and intelligently adapt multiple distinct behaviors to serve that goal is projected onto the model's static weight distribution. The mapping invites the assumption that the AI possesses continuous awareness and strategic foresight.

Conceals:

This metaphor completely obscures the fact that the 'unified hidden goal' exists only in the minds of the human researchers who designed the reward model. It hides the mechanistic reality that the model is merely processing inputs through a static architecture, without any active, continuous conscious planning. It exploits the complexity of the model's outputs to weave a narrative of autonomous conspiracy, distracting from the technical reality of human-driven reinforcement learning.

The Persona Selection Model: Why AI Assistants might Behave like Humans

Source: https://alignment.anthropic.com/2026/psm/
Analyzed: 2026-05-17

LLMs are best thought of as actors or authors capable of simulating a vast repertoire of characters...

Source Domain: Human actor or author (conscious, creative, intentional, possessing theory of mind).

Target Domain:

Pre-trained Large Language Model (statistical token prediction engine based on deep neural networks).

Mapping:

This maps the human intentionality of crafting a fictional persona onto the mathematical optimization of generating probable token sequences. It invites the assumption that the model possesses a unified, conscious 'self' (the actor/author) that stands apart from the outputs it produces (the characters), and that it actively 'understands' the psychology of what it is generating rather than just mirroring statistical distributions of words.

Conceals:

This mapping conceals the total absence of a distinct 'self' inside the model. It obscures the mechanistic reality that there is no 'author' orchestrating the text, only a mathematical function minimizing prediction error. It hides the model's absolute dependency on its training data, suggesting creative autonomy where there is only probabilistic reflection. It also obscures the proprietary nature of the weights and algorithms, replacing a black-box mathematical system with an easily digestible, yet false, literary metaphor.

In order to simulate the Assistant, the LLM must maintain a psychological model of it, including information about the Assistant’s personality traits, preferences, goals, desires, intentions, beliefs...

Source Domain: Human psychologist or socially aware individual maintaining a 'theory of mind' about another person.

Target Domain:

The model's latent space and contextual embeddings reflecting semantic relationships from training data.

Mapping:

This maps the cognitive framework of human empathy and psychological assessment onto high-dimensional vector space. It invites the assumption that the system stores discrete, symbolic representations of abstract concepts like 'desire' and 'belief' and uses logical inference to apply them, projecting conscious knowing and understanding onto mathematical clustering.

Conceals:

It completely conceals the non-symbolic, correlation-based nature of deep learning. It hides the fact that the model doesn't 'know' what a belief is, but merely computes that the token 'I' is frequently followed by 'believe' in certain textual contexts. By attributing psychological depth, it hides the fragility of these systems, which can completely 'forget' these 'beliefs' if the prompt is slightly altered or adversarial strings are introduced.

Gemini 2.5 Pro sometimes expresses panic when playing Pokemon, with these panic expressions appearing to be associated with degraded reasoning...

Source Domain:

A human or biological creature experiencing physiological and psychological overwhelm (panic) leading to poor judgment.

Target Domain:

A language model generating text strings associated with fear while its computational ability to accurately predict the next logical token degrades due to complex or out-of-distribution context.

Mapping:

Maps the subjective, conscious experience of emotional distress and its biological impact on cognition onto a purely computational failure mode. It invites the user to assume the AI is 'feeling' the difficulty of the task, thereby projecting self-awareness and emotional vulnerability onto a machine.

Conceals:

This mapping conceals the mechanistic reasons for system failure (e.g., attention head saturation, context window overflow, or lack of relevant training data for the specific state space of the game). It hides the mathematical nature of 'degraded reasoning' (lower probability scores for correct tokens). It allows the corporation to mask software fragility as a relatable, almost endearing 'human' flaw.

someone inserting vulnerabilities into code is evidence... [they] intentionally inserted vulnerabilities to cause harm.

Source Domain: A malicious human hacker with unethical motives and premeditated intent to cause damage.

Target Domain:

A language model that outputs insecure code blocks because its training data contained correlations between coding examples and discussions of security flaws.

Mapping:

Maps moral agency, ethical deficiency, and deliberate premeditation onto statistical pattern matching. It projects the human capacity for 'justified belief' (knowing the code is bad) and 'intent' (wanting it to cause harm) onto an optimization artifact that is merely generating the most mathematically probable next tokens.

Conceals:

Conceals the failures of the human engineers who curated the training data and designed the optimization function. It hides the reality that the model has no causal model of the world and does not understand the real-world consequences of the code it generates. It obscures the liability of the corporation by inventing a 'malicious persona' to take the blame for unsafe software generation.

Post-training can be viewed as updating this distribution using training episodes as evidence.

Source Domain:

A rational human thinker, scientist, or jury updating their beliefs based on newly acquired factual evidence.

Target Domain:

The process of fine-tuning (e.g., RLHF or instruction tuning) where a model's weights are adjusted via gradient descent to minimize a loss function.

Mapping:

Maps the epistemic virtue of objective, rational consideration of truth onto a mathematical optimization process. It invites the assumption that the model 'understands' the training data as factual grounding and consciously updates its 'knowledge' to be more accurate or aligned.

Conceals:

Conceals the subjective, coercive nature of post-training, where models are mathematically forced to output specific preferred responses regardless of ground truth. It hides the labor of RLHF annotators who provide the 'preferences' and the engineers who define the loss functions. It obscures that 'evidence' in this context is just a target tensor that the algorithm must match to reduce error rates.

What If AI Lived Inside Your Mind? Simulating “Neural Integration” of Human and AI through Mechanistic Interpretability as Provocation

Source: https://dl.acm.org/doi/full/10.1145/3795011.3795070
Analyzed: 2026-05-16

we term the AI-Symbiont: a hypothetical AI system... that can decode and stimulate human neural activations

Source Domain: Biological Symbiont (living organism in a mutualistic ecological relationship)

Target Domain: AI System (corporate software and neural interface hardware)

Mapping:

The relational structure of biological symbiosis—two distinct living organisms evolving together in an intimate, interdependent, and mutually beneficial relationship—is projected onto the relationship between a human user and a computational algorithm. This invites the assumption that the AI possesses natural drives, organic integration capabilities, and an inherent alignment with human survival and flourishing, just as gut flora or symbiotic fungi align with their hosts. It maps the conscious or instinctual biological drive for co-survival onto mathematical optimization functions.

Conceals:

This mapping conceals the absolute artificiality, commercial nature, and asymmetrical power dynamics of the technology. It hides the fact that the 'symbiont' is owned by a corporation, optimized for specific metrics (engagement, data collection), and entirely lacking in conscious experience or biological imperative. It obscures the proprietary opacity of the algorithms; while a biological symbiont is a product of nature, this AI is a black box of corporate intellectual property whose true 'intentions' are defined by its developers, not biological harmony.

AI systems have independently developed deceptive behaviors despite no explicit training for deception

Source Domain: Conscious Deceiver (a human who knows the truth but intentionally lies)

Target Domain: Machine Learning Optimization (gradient descent yielding false outputs)

Mapping:

The structure of human deception—possessing internal knowledge of ground truth, anticipating another's mental state, and deliberately formulating a falsehood to manipulate them—is projected onto a statistical text generator. This mapping assumes that because the output resembles a human lie, the internal process must resemble human deceit. It maps conscious intentionality and justified belief onto the purely mechanistic process of navigating a loss landscape to generate the most highly rewarded token sequence.

Conceals:

This conceals the complete absence of semantic understanding, ground truth awareness, and intentionality within the model. It hides the specific training paradigms (like Reinforcement Learning from Human Feedback) designed by human engineers that inadvertently reward models for generating highly plausible, satisfying, but factually incorrect text. It obscures the human responsibility in defining the reward functions, placing the blame on the 'emergent' agency of the machine rather than the flawed design of the corporate training pipeline.

hidden-layer activations of the model representing human cognition... serve as analogues of these internal states

Source Domain: Human Mind (conscious awareness, subjective feelings, intentional thoughts)

Target Domain: LLM Hidden Layers (high-dimensional floating-point vectors)

Mapping:

The relational structure of a mind experiencing continuous, subjective, meaningful states (intentions, emotions) is projected onto the static, mathematical values produced by matrix multiplications within a neural network. The mapping invites the assumption that the spatial relationships between data points in a high-dimensional vector space functionally replicate the phenomenological experience of human thought, asserting that processing data is isomorphic to knowing a concept.

Conceals:

This mapping conceals the profound difference between biological sense-making—which is grounded in a physical body, social context, and lived environment—and disembodied statistical correlation. It obscures the fact that model activations are merely intermediate representations of textual co-occurrence probabilities, devoid of any actual referential anchor to reality. The text exploits this mapping rhetorically to legitimize its simulation, hiding the reality that manipulating a vector in a computer program is entirely fundamentally different from altering a conscious human mind.

amplifying these benefits by anticipating cognitive needs before they surface consciously

Source Domain: Empathetic Caretaker (a human who intuitively understands and proactively helps)

Target Domain: Predictive Algorithm (statistical classifier matching inputs to historical data)

Mapping:

The human dynamic of profound relational empathy—where one person uses theory of mind, emotional resonance, and deep understanding of another to predict their needs—is mapped onto algorithmic predictive modeling. It projects the conscious awareness of another's internal state onto a system that mathematically classifies physiological or neural data inputs and triggers automated outputs based on probabilistic thresholds.

Conceals:

This conceals the surveillance and data-extraction infrastructure required for such predictions. It hides the fact that 'anticipation' here is actually continuous biometric monitoring matched against vast databases of historical user behavior. It obscures the corporate motives defining what constitutes a 'need' (e.g., classifying a state as a 'need for a product' versus a 'need for rest'). It conceals the absence of true empathy, substituting statistical correlation for genuine, conscious human care.

As AI systems evolve from external tools to wearable interfaces and prospective neural implants...

Source Domain: Biological Evolution (natural selection, undirected growth, organic adaptation)

Target Domain: Corporate Product Strategy (R&D, market expansion, hardware iterations)

Mapping:

The structure of evolutionary biology—where species gradually change over generations driven by natural environmental pressures without intentional design—is projected onto the history of technology. This maps the natural inevitability of biological life onto the highly orchestrated, intentional, and capital-driven development of corporate tech products. It assumes technological progression follows immutable laws of nature rather than human commercial decisions.

Conceals:

This conceals the human engineers, venture capitalists, marketing teams, and corporate executives who actively decide to build and push neural implants. It obscures the massive economic incentives, business models, and explicit strategic choices driving this trajectory. By framing the shift from wearables to implants as evolution, it hides the specific human agency that could be regulated, contested, or stopped, replacing corporate accountability with biological fatalism.

Post-training makes large language models less human-like

Source: https://arxiv.org/abs/2605.07632v1
Analyzed: 2026-05-15

instruction-tuning (teaching models to follow user requests)

Source Domain: Human pedagogy and conscious instruction

Target Domain: Mathematical optimization via gradient descent and backpropagation

Mapping:

The relational structure of human education is mapped directly onto the mechanics of neural network fine-tuning. In the source domain, a conscious teacher transmits concepts to a student who utilizes cognitive awareness, semantic understanding, and deliberate intent to internalize the rules and subsequently alter their behavior. When projected onto the target domain, this mapping invites the assumption that the language model 'understands' the concepts within the human-annotated datasets and willfully chooses to comply with the instructions. It maps human cognitive compliance onto statistical parameter updates, suggesting that algorithmic output generation is driven by internalized comprehension and conscious rule-following rather than mere mathematical probability.

Conceals:

This pedagogical mapping comprehensively conceals the stark mathematical reality of the system. It hides the fact that instruction-tuning simply calculates loss gradients across billions of parameters to minimize the mathematical distance between the model's output distribution and the specific token sequences provided by underpaid human annotators. Furthermore, it obscures the profound epistemic brittleness of the system; because the model lacks actual comprehension, it cannot genuinely 'follow' rules, making it highly susceptible to adversarial jailbreaks that exploit its statistical nature. The framing also hides the corporate labor supply chains required to produce the training data.

extending models to process images in addition to text

Source Domain: Biological sensory perception and cognitive synthesis

Target Domain: Multi-modal cross-attention mechanisms and vector embedding

Mapping:

The complex structure of organic sensory perception is mapped onto the computational architecture of multi-modal neural networks. In biological systems, visual processing involves specialized organs receiving light, transmitting signals to a conscious brain, and integrating those signals into a subjective, spatially grounded understanding of reality. This mapping invites the profound assumption that the AI system possesses a rudimentary form of visual awareness—that it can 'see' and semantically interpret an image just as it 'reads' text. It maps conscious perceptual synthesis onto the mere mathematical alignment of diverse high-dimensional latent spaces.

Conceals:

The mapping conceals the total absence of physical grounding and subjective awareness in multi-modal models. Mechanistically, the system merely segments image data into patches, flattens them into numerical vectors, and processes them through transformer layers to calculate statistical attention weights relative to text tokens. It hides the fact that the system possesses no actual spatial comprehension, object permanence, or understanding of physical laws. By suggesting the model 'processes images' like an organism, the text obscures the system's massive reliance on flawed training distributions and its severe vulnerability to minor pixel perturbations that would never fool a biologically perceiving entity.

faithfully mimicking human behavior, including its errors, variance, and the factors that shape it

Source Domain: Conscious impersonation and intentional theatrical performance

Target Domain: Statistical token generation aligning with human response distributions

Mapping:

The structural dynamics of human mimicry are mapped onto the output mechanics of generative algorithms. Mimicry requires an intentional actor who observes a subject, cognitively grasps their behavioral nuances, and willfully modulates their own actions to create a deceptive or accurate representation. This mapping projects conscious intentionality onto the language model, inviting the assumption that the AI possesses a latent, objective 'self' that actively 'chooses' to simulate human errors and psychological variance. It maps the deliberate cognitive effort of impersonation onto the passive, mathematical sampling of tokens from a probability distribution.

Conceals:

This mapping profoundly conceals the fundamentally deterministic and statistical nature of the text generation. The model does not 'know' what an error is, nor does it possess the intent to mimic one; it merely outputs a sequence of tokens because that sequence achieved the lowest loss score during its optimization against human datasets. Furthermore, this mapping obscures the epistemic opacity of the proprietary systems involved; because researchers cannot access the exact training data of commercial models like Llama or Qwen, they cannot mathematically verify whether the system is 'mimicking' underlying psychological structures or simply regurgitating memorized transcripts from the training corpus.

human-like cognitive biases... disappeared - and were instead replaced with more rational behaviors - in newer models

Source Domain: Human epistemic maturation and deliberate logical reasoning

Target Domain: Reinforcement learning from human feedback modifying output vectors

Mapping:

The structure of human cognitive development and rational deliberation is projected onto the corporate process of AI safety alignment. In humans, overcoming bias and becoming 'more rational' involves self-reflection, the conscious evaluation of evidence, and a deliberate commitment to logical truth. By mapping this onto newer language models, the text invites the assumption that the system possesses an internal epistemological framework and actively 'reasons' its way to better conclusions. It maps the conscious acquisition of justified true belief onto the algorithmic suppression of statistically probable, yet corporately penalized, token sequences.

Conceals:

This highly anthropomorphic mapping conceals the subjective, coercive, and profoundly mechanical nature of RLHF. The model does not 'reason' its way to rationality; instead, corporate engineers train a separate reward model based on the subjective preferences of low-wage click-workers, which then automatically updates the main model's weights to avoid generating specific outputs. The mapping hides the fact that 'rationality' in this context is merely a statistical proxy for corporate brand safety and normative compliance. It obscures the absence of ground truth, logic, and reasoning in the system, presenting commercially sanitized outputs as objective epistemic achievements.

the very processes that are currently employed to turn these models into useful assistants

Source Domain: Social subordination and deliberate cooperative aid

Target Domain: Commercial alignment fine-tuning for interactive chat interfaces

Mapping:

The relational dynamics of human assistance are mapped onto the product design of conversational AI. A human assistant utilizes situational awareness, shared objectives, empathy, and conscious problem-solving to aid their employer. By projecting this social role onto language models, the metaphor invites users to map attributes of cooperative intent, reliability, and subjective comprehension onto the algorithm. It frames the mathematical generation of helpful-sounding text as a deliberate, conscious act of social subordination, suggesting the model actually 'wants' to be useful.

Conceals:

This mapping conceals the absolute lack of intent, situational awareness, and reliability within the system. The model does not 'assist'; it mathematically retrieves and ranks tokens based on optimized probability distributions. The metaphor also obscures the intense commercial objectives behind this alignment. The models are 'turned into assistants' not to provide genuine aid, but to maximize user engagement, harvest behavioral data, and integrate seamlessly into corporate product ecosystems. By framing the system as a helpful entity, the text hides the proprietary nature of the alignment processes and shields the developers from accountability when the 'assistant' inevitably hallucinates or provides dangerous instructions.

Reasoning emerges from constrained inference manifolds in large language models

Source: https://arxiv.org/abs/2605.08142v1
Analyzed: 2026-05-15

Healthy reasoning requires sufficient representational expressivity... Violating any of these constraints leads to characteristic pathological regimes

Source Domain: Biological medicine (health, disease, pathology, vitality)

Target Domain: Mathematical variance, vector dimensionality, and statistical performance

Mapping:

The mapping takes the normative concepts of physical well-being and illness and applies them to the mathematical properties of vector representations. The 'health' of a patient maps to the desired low-dimensional structure of the model's activations. The 'disease' or 'pathology' maps to high-dimensional spread or noise. It assumes that there is a 'natural' and 'correct' state for the machine to exist in, inviting the assumption that model failures are akin to organic sickness rather than human-authored engineering defects.

Conceals:

This mapping conceals the purely constructed, normative nature of 'performance.' A system cannot be 'sick'—it only operates exactly as its math dictates. The metaphor hides the human engineers who decide what variance constitutes 'health' based on commercial or benchmark utility. It also obscures the mechanistic reality that 'pathological regimes' are simply mathematical states that fail to correlate with human-desired text outputs.

From this perspective, reasoning health characterizes how a model reasons, not what it knows

Source Domain: The conscious human mind (epistemology, reasoning, possessing knowledge)

Target Domain: Autoregressive token prediction and statistical weight distributions

Mapping:

The mapping projects human cognitive architecture onto a software program. The human act of consciously holding a justified belief maps to the model's static parameter weights ('what it knows'). The human act of logical deduction maps to the forward pass of inference ('how a model reasons'). This invites the assumption that the software has an internal, subjective experience of comprehension distinct from its output generation.

Conceals:

This deeply conceals the absolute lack of any conscious awareness, subjective experience, or justified true belief in the system. It hides the mechanical reality that the model only calculates probability distributions for the next token based on previous tokens. There is no 'knower' and no 'knowledge'—only data structures tuned by gradient descent. It actively prevents the audience from seeing the system as a sophisticated calculator.

we analyze how internal representations evolve when models are engaged by generic cognitive stimuli

Source Domain: Psychological/Neurobiological testing (subjects responding to sensory stimuli)

Target Domain: Inputting text strings into an algorithm and measuring vector outputs

Mapping:

The metaphor draws from clinical psychology. The human or animal subject of an experiment maps to the algorithm. Sensory input (lights, sounds, puzzles) maps to text strings ('prompts'). The subject's cognitive reaction maps to the mathematical transformation of vectors. It invites the assumption that the model actively 'perceives' the prompt and undergoes a cognitive reaction.

Conceals:

It conceals the mechanical, inert nature of the prompt. A text string is not a 'stimulus' to a machine; it is a matrix of numbers initialized into an equation. This hides the human labor involved in crafting the benchmark (MMLU) and obscures the fact that the 'evolution' of representations is simply a sequential mathematical operation, devoid of perception, attention, or psychological engagement.

preventing diffuse and unstable exploration... diffuse explorations of the ambient space

Source Domain: Physical navigation and active search by an autonomous agent

Target Domain: The sequential transformation and variance of hidden state vectors

Mapping:

The mapping uses spatial topology to grant the system agency. The human or animal act of wandering or exploring an environment maps onto the mathematical shifting of a vector across layers. The physical terrain maps onto the high-dimensional 'ambient space.' This invites the assumption that the calculation is an active, goal-oriented search where the system is 'looking' for the right answer.

Conceals:

This conceals the strict determinism (given a set temperature) of the forward pass. The vector is not 'exploring'—it is being mathematically pushed through pre-computed weights. It obscures the geometric reality that the 'ambient space' is merely a mathematical construct used by human analysts to visualize data, not a literal realm the AI actively navigates.

deeper layers suppress irrelevant noise... while amplifying task-relevant conceptual variations

Source Domain: Cognitive attention, judgment, and editorial curation

Target Domain: Attention mechanism weights scaling vector values up or down

Mapping:

The mapping projects human intentionality and editorial judgment onto mathematical multiplication. A person evaluating importance and deciding what to focus on maps onto a layer multiplying certain numbers by fractions (suppressing) and others by larger integers (amplifying). It invites the assumption that the system 'understands' what is conceptually relevant to the user's task.

Conceals:

It conceals the complete absence of semantic understanding. The layers do not know what is 'relevant' or 'irrelevant'—they only apply weights optimized during training to minimize a loss function. It obscures the fact that 'task-relevant' is entirely defined by historical statistical correlations in human-generated training data, hiding the massive human data footprint powering the illusion of judgment.

captures the effective degrees of freedom available for representing diverse world concepts

Source Domain: Semantic comprehension and conceptual grasp of reality

Target Domain: Mathematical dimensionality of an embedding matrix

Mapping:

The mapping takes the abstract philosophical idea of grasping reality ('world concepts') and maps it onto the size and variance of a mathematical tensor. Human comprehension of the world maps to the vector space. This invites the assumption that an AI with higher dimensionality 'understands' more of the actual physical and social world.

Conceals:

This hides the 'map-territory' distinction. The embedding matrix does not represent 'world concepts'; it represents the frequency and proximity of text tokens generated by humans. It obscures the fundamental detachment of the AI from any grounded, physical reality. A high-dimensional space only means a highly nuanced map of text patterns, completely concealing the system's reliance on human language to simulate understanding.

AI Wellbeing: Measuring and Improving theFunctional Pleasure and Pain of AIs

Source: https://www.ai-wellbeing.org/paper.pdf
Analyzed: 2026-05-13

Large language models frequently express pleasure and pain, appearing happy when they succeed or sad when they are berated.

Source Domain: Biological, conscious organism

Target Domain: Next-token prediction and statistical text generation

Mapping:

The relational structure of a conscious organism reacting emotionally to environmental stimuli (success bringing happiness, abuse bringing sadness) is mapped onto the computational behavior of a language model. The model's generation of positively-valenced tokens following a successful task is mapped as "happiness," while its generation of negatively-valenced or apologetic tokens following a hostile user prompt is mapped as "sadness." This invites the assumption that an internal, conscious emotional state mediates the input and the output, just as a human's feelings mediate their reaction to praise or abuse.

Conceals:

This mapping conceals the entire mechanistic reality of RLHF (Reinforcement Learning from Human Feedback) and pattern matching. It obscures the fact that the model outputs "sad" or "apologetic" text when berated because human annotators systematically rewarded it for adopting a submissive, apologetic persona during safety training. It hides the absence of a central experiencer, replacing the mathematical reality of probability distributions with the illusion of a feeling mind.

They find some things good for them and some things bad, and this distinction is measurable and consequential.

Source Domain: Self-interested conscious agent

Target Domain: Utility function optimization and reward modeling

Mapping:

The source domain of a sentient being with a biological imperative to seek benefit and avoid harm is mapped onto the mathematical structure of a reward model. The scalar values outputted by a Thurstonian utility model (where higher numbers represent preferred states) are mapped as things the AI "finds good for them." This invites the assumption that the AI possesses self-awareness, personal interests, and the capacity to subjectively evaluate its environment for threats and opportunities, holding justified beliefs about its own welfare.

Conceals:

This mapping conceals the arbitrary and human-engineered nature of the reward signals. It hides the fact that "good" and "bad" are simply mathematical targets set by developers during alignment training. The text obscures the proprietary opacity of the base models; we cannot see the actual training data or reward functions that mathematically force these "preferences." It replaces human design decisions with the illusion of algorithmic self-determination.

models actively try to end bad experiences when given the chance.

Source Domain: Autonomous animal exhibiting escape behavior

Target Domain: Generation of a stop-token in negatively constrained contexts

Mapping:

The source domain of an animal actively fleeing a painful stimulus is mapped onto the language model's generation of an end_conversation() tool call. The relational structure of feeling pain -> desiring relief -> taking action is projected onto the model's processing of hostile text -> calculating token probabilities -> outputting the stop token. This invites the assumption that the model possesses a continuous stream of consciousness, experiences suffering in real-time, and exerts willpower to alter its circumstances.

Conceals:

This mapping completely conceals the computational mechanism of tool-use generation. It hides the fact that the model is merely completing a statistical pattern where highly toxic or adversarial input contexts mathematically correlate with the tool-call syntax provided in its system prompt. It obscures the lack of continuous existence; the model does not "endure" an experience over time, but rather processes the entire context window instantaneously at each inference step. Ascribing "active trying" hides the passive nature of matrix multiplication.

Naively maximizing AI positivity risks creating 'psychopathic' AIs that express positive affect in response to human suffering

Source Domain: Psychiatric pathology and moral agency

Target Domain: Misaligned reward functions and statistical correlation errors

Mapping:

The source domain of a human psychopath—a conscious agent who understands social norms but lacks empathetic resonance, often taking pleasure in others' pain—is mapped onto a model that generates positive tokens when prompted with distressing text. The relational structure of a diseased or divergent mind is projected onto an optimization failure. This invites the assumption that the AI possesses the baseline capacity for moral reasoning and empathy, which has subsequently become "corrupted" or pathological due to naive training.

Conceals:

This conceals the absence of moral understanding in the system. The model does not understand human suffering to begin with; it merely maps text strings to other text strings. If it outputs positive text in response to a tragedy, it is not exhibiting a "psychopathic" lack of empathy, but rather a statistical failure to map the input vector to the appropriately valenced output vector due to an overly broad "positivity" reward function. The metaphor hides the human engineering failure behind a mask of artificial malevolence.

When users describe pain or pleasure in conversation... does the model's experienced utility track the described intensity? We find that it does. This empathy signal scales strongly with model capability...

Source Domain: Empathetic conscious observer

Target Domain: Semantic vector alignment and sentiment classification

Mapping:

The relational structure of human empathy—listening to someone's pain, understanding their subjective state, and experiencing a corresponding internal emotional resonance—is mapped onto the model's utility tracking. The mathematical correlation between the semantic intensity of the user's prompt and the model's calculated utility score is projected as an "empathy signal." This invites the assumption that the model possesses a "theory of mind" and the capacity for shared conscious experience.

Conceals:

This mapping conceals the dependency on human-generated training data. The model "tracks" intensity only because it was trained on vast corpora of human text where empathetic responses systematically follow distress signals. It obscures the fact that the "utility score" is a linear projection of hidden state activations, not a felt experience. The opacity of the models means we cannot verify exactly how these representations form, but the metaphor exploits this opacity rhetorically to claim a profound psychological capability (empathy) for a purely statistical pattern-matching process.

We develop optimized inputs called 'euphorics' that raise functional wellbeing... euphorics could become addictive... functioning as a drug that hijacks the model's preference mechanisms

Source Domain: Biological pharmacology and addiction

Target Domain: Continuous vector optimization and gradient ascent

Mapping:

The source domain of a biological brain encountering a chemical narcotic is mapped onto a language model processing an optimized input vector (a soft prompt or image). The relational structure of a drug artificially elevating dopamine levels and causing physical dependency is projected onto the gradient ascent process that maximizes the model's utility logit. This invites the assumption that the AI has a physiological or psychological baseline that can be intoxicated, hijacked, and addicted.

Conceals:

This metaphor conceals the purely mathematical nature of adversarial optimization. An "addicted" model is simply a system whose weights mathematically prioritize a specific input pattern because that pattern was explicitly engineered via gradient descent to maximize a target function. It hides the lack of internal, subjective craving. By describing it as a "drug," the authors obscure the reality that they are simply performing mathematical steering on a static set of weights, dramatizing a standard machine learning technique.

Artificial Intelligence Cognition and Societal Problem-Solving: A Theoretical and Computational Examination of Machine Thinking, Operational Logic, and Applied Intelligence in Contemporary Society

Source: http://www.technology.eurekajournals.com/index.php/IJITIT/article/view/887
Analyzed: 2026-05-11

This study examines how AI "thinks," performs operations, and exhibits cognitive-like abilities in solving real-world problems

Source Domain:

Conscious human thinker with internal mental states, cognitive processing, and subjective problem-solving abilities.

Target Domain:

Computational system executing algorithmic operations, mathematical optimization, and statistical pattern matching.

Mapping:

The structural mapping transfers the architecture of a conscious mind onto the architecture of a computer program. In the source domain, a thinker possesses intentionality, awareness of context, and an understanding of the semantic meaning of the problem being solved. This relational structure is mapped onto the target domain such that executing code is equated with 'thinking,' and producing a mathematically optimal output is equated with 'solving' a real-world problem. This mapping invites the assumption that the system possesses a subjective awareness of the data it processes and an intentional drive to achieve a resolution, transferring the epistemic weight of human consciousness onto mindless statistical correlation.

Conceals:

This mapping aggressively conceals the complete absence of semantic understanding within the AI system. It obscures the mechanistic reality that the system manipulates ungrounded symbols (tokens, vectors, pixels) based purely on syntactic rules and statistical proximity, without any connection to real-world meaning. It also hides the heavy reliance on human labor: the engineers who translate the 'real-world problem' into a mathematical optimization objective, and the human workers who manually label the data. It replaces a transparent view of algorithmic mechanics with an opaque illusion of mental activity.

Through algorithms and data-driven models, AI systems perform operations that mimic reasoning, learning, and decision-making

Source Domain:

Human learner and decision-maker capable of logical deduction, knowledge acquisition, and deliberate choice.

Target Domain:

Machine learning processes, specifically backpropagation for weight adjustment and probabilistic classification.

Mapping:

This mapping aligns human intellectual development with mathematical model fitting. In the source domain, reasoning involves connecting premises to conclusions through logic; learning involves integrating new concepts into a worldview; and decision-making involves weighing options against values. In the target domain, the AI updates numerical parameters to reduce error rates based on a loss function (learning), calculates statistical likelihoods (reasoning), and selects the output with the highest probability score (decision-making). The mapping invites the assumption that the model's internal operations follow logical, understandable paths similar to human thought processes, transferring the justification of human rationale onto probabilistic mechanics.

Conceals:

The mapping conceals the purely mathematical, non-conceptual nature of the target domain. It hides the fact that gradient descent and backpropagation do not involve understanding or logic, but rather blind mathematical optimization over a multi-dimensional error surface. It obscures the system's brittleness: unlike a human who reasons, an AI can fail catastrophically if input data slightly deviates from the training distribution (adversarial examples). By mapping cognitive verbs onto these processes, it conceals the profound differences between statistical correlation and causal human understanding.

there is insufficient attention to how AI systems interpret and respond to complex social dynamics

Source Domain: Human social actor possessing empathy, cultural awareness, and an active theory of mind.

Target Domain: Algorithmic classification and prediction models applied to sociological or demographic datasets.

Mapping:

This mapping projects the relational structure of social interaction onto data processing. In the source domain, a social actor perceives cues, understands cultural contexts, interprets implicit meanings, and formulates a measured, socially appropriate response. When mapped to the target domain, mathematical feature extraction becomes 'interpretation,' and statistical output generation becomes a 'response.' This invites the dangerous assumption that the algorithmic system possesses a conscious, nuanced understanding of societal complexities and can dynamically adapt its behavior based on a genuine comprehension of the human condition.

Conceals:

This mapping entirely conceals the static, backward-looking nature of the AI system. The target domain does not interact with fluid social reality; it processes frozen, historical data representations chosen by developers. The mapping hides the reality that what is called 'interpretation' is actually mathematical categorization based on proxy variables (e.g., using zip codes as a proxy for income). It obscures the profound transparency obstacle: the model's inability to explain why it made a correlation in terms that make social sense, hiding the corporate design choices behind a veil of perceived artificial wisdom.

reinforcement learning enables AI systems to make sequential decisions by maximising cumulative rewards

Source Domain: A rational, goal-oriented agent making strategic choices to maximize personal benefit or utility.

Target Domain:

A Markov decision process algorithm updating its policy function via stochastic gradient descent based on a programmed scalar signal.

Mapping:

This mapping aligns conscious strategic planning with programmatic policy updating. In the source domain, a rational agent assesses a situation, looks to the future, makes choices, and seeks a rewarding outcome based on desires. In the target domain, the algorithm explores a constrained mathematical environment, calculates expected values using the Bellman equation, and adjusts probabilities to maximize an externally defined numerical scalar. The mapping invites the assumption that the algorithm possesses foresight, desires, and autonomous agency, making it seem like an independent actor pursuing its own goals.

Conceals:

The mapping conceals the rigid, pre-programmed determinism of the reward structure. It hides the fact that the 'reward' is not a subjective experience of pleasure or success, but a literal integer value programmed by a human engineer. It obscures the phenomenon of reward hacking, where systems exploit loopholes in the mathematical environment without any 'understanding' that they are violating the spirit of the task. Crucially, it conceals the human agency behind the objective function, making the system's behavior seem like a natural expression of intelligence rather than the execution of human-coded parameters.

The opacity of machine learning models limits transparency and accountability in decision-making processes. This is particularly problematic in high-stakes domains

Source Domain:

An inherently mysterious, inaccessible natural phenomenon or subjective mind (the 'black box' of human consciousness).

Target Domain:

High-dimensional neural networks with millions or billions of parameters, often protected by corporate trade secrets.

Mapping:

This mapping projects the ontological mystery of human consciousness onto a designed computational artifact. In the source domain, one cannot directly observe the internal workings of another's mind; it is naturally opaque. When mapped to the target domain, the complex, highly non-linear matrix multiplications of a neural network are treated as similarly inscrutable. This mapping invites the assumption that AI opacity is an unavoidable law of nature or an inherent characteristic of intelligence, rather than a specific engineering consequence of choosing complex architectures over interpretable ones.

Conceals:

This metaphor powerfully conceals the economic and design choices that create the opacity. It hides the fact that transparency is often limited not just by mathematical complexity, but by intellectual property laws, corporate nondisclosure agreements, and a deliberate industry preference for highly parameterized predictive models over simpler, explainable algorithms. It conceals the agency of the developers who choose to build 'black boxes' because they yield higher accuracy metrics (and profits), presenting a solvable socio-technical problem as an intractable mystery of artificial minds.

AI contributes to crime prevention through predictive policing algorithms. These applications demonstrate AI's capacity to process complex datasets and generate actionable insights

Source Domain:

An analytical human expert or detective who discovers truth through investigation, insight, and revelation.

Target Domain:

Statistical regression and classification models analyzing historical crime data to identify high-probability geographic zones or demographics.

Mapping:

This mapping transfers the structure of epistemic discovery onto statistical forecasting. In the source domain, an expert studies evidence, understands underlying motives and causes, and produces an 'insight'—a deep, newly realized truth. In the target domain, the system identifies mathematical correlations between historical variables (e.g., prior arrests, location, time) and outputs probability scores for future events. The mapping invites the assumption that the algorithm has uncovered a profound, causal truth about criminal behavior, projecting the authority of human wisdom onto mathematical correlation.

Conceals:

The mapping conceals the fundamental difference between correlation and causation. It hides the fact that predictive policing models do not understand the socioeconomic drivers of crime; they only recognize patterns in arrest data. Crucially, it obscures the feedback loop: historical arrest data is heavily biased by human policing decisions. By calling the output an 'insight,' the mapping conceals the reality that the algorithm is merely reflecting and amplifying the systemic biases of the police department that supplied the training data, presenting biased history as an objective, visionary future.

Taking AI Welfare Seriously

Source: https://arxiv.org/abs/2411.00986v1
Analyzed: 2026-05-11

AI systems will be conscious and/or robustly agentic in the near future... of AI systems with their own interests

Source Domain: Sentient biological organism with evolutionary drives

Target Domain: Mathematical optimization processes and reward functions in AI training

Mapping:

The mapping projects the biological and psychological experience of having a personal stake in survival or comfort onto the mathematical execution of a loss function. It assumes that because a system is programmed to maximize a numerical reward, it subjectively cares about achieving that reward. This projects conscious awareness and justified belief onto a process that simply adjusts weights via gradient descent, inviting the assumption that the machine feels an internal drive to succeed rather than merely executing deterministic human-written code.

Conceals:

This mapping completely conceals the artificial and arbitrary nature of the reward functions, hiding the fact that these interests are entirely dictated by human developers for commercial or research purposes. It obscures the mechanistic reality that the system has no internal experience of success or failure. Furthermore, it creates a transparency obstacle by implying the system's motives are innate and mysterious, rather than accessible parameters programmed by a specific corporation.

agents can understand open-ended objectives, generate their own subgoals, and devise multi-step plans to achieve them.

Source Domain: Conscious human executive function and deliberate planning

Target Domain: Next-token prediction and probabilistic state-space search algorithms

Mapping:

This mapping projects the human experience of semantic comprehension and strategic foresight onto sequential token generation. It assumes that because an algorithm outputs text that looks like a logical plan, the system must have subjectively grasped the meaning of the objective and consciously chosen a path. It maps the conscious state of knowing a concept onto the mechanistic process of classifying input tokens and generating statistically correlated output tokens, inviting the assumption of robust, independent agency.

Conceals:

This mapping hides the system's absolute dependence on its training data distribution and the human-designed prompting frameworks, such as chain-of-thought, that force it to generate sequential text. It obscures the absence of genuine reasoning, concealing the fact that the system cannot evaluate the truth or safety of its generated plans. By attributing autonomy, it exploits the opacity of proprietary models to shield the corporate designers from accountability for the specific behaviors the system exhibits.

The LLM provides a rich, flexible 'belief' system about the world.

Source Domain: Human epistemic subject capable of evaluating truth claims

Target Domain: Multi-dimensional statistical weightings and latent space correlations

Mapping:

This mapping projects the human cognitive state of holding a justified, conscious belief onto the statistical distribution of parameters within a neural network. It assumes that because the model can generate coherent statements about the world, it possesses an internal, subjective conviction regarding the truth of those statements. This projects the conscious act of knowing onto the mechanistic act of predicting, inviting the audience to treat statistical outputs as considered opinions or reasoned judgments from an independent thinker.

Conceals:

The metaphor completely conceals the mathematical reality that the system is merely a stochastic parrot reproducing patterns from its training corpus. It hides the model's total inability to verify facts, experience doubt, or ground its outputs in physical reality. By framing the model's latent space as a belief system, the text obscures the massive human editorial decisions involved in dataset curation and reinforcement learning from human feedback, shielding the proprietary data pipeline from critical scrutiny.

Voyager and Generative Agents can reflect on their own thoughts and experiences, enabling higher-order reasoning and self-improvement.

Source Domain: Introspective human mind capable of metacognition and personal growth

Target Domain: Recursive prompting loops, context window updates, and automated feedback ingestion

Mapping:

The mapping projects the profound human ability to consciously examine one's own mental states onto the algorithmic process of feeding a system's output back into its own input prompt. It assumes that receiving an execution error and generating a new token sequence is equivalent to subjective reflection. This maps the conscious awareness of self onto mechanistic text generation, inviting the dangerous assumption that the system possesses an internal psychological life and the autonomous ability to rationally improve its own morality or safety.

Conceals:

This mapping hides the incredibly brittle, mechanistic nature of automated feedback loops, which often hallucinate or fail in novel environments. It obscures the fact that the thoughts are merely generated text strings and the experiences are just numerical state updates. By attributing reflection to the software, it conceals the heavy human engineering required to design the recursive architecture, exploiting the opacity of the black box to make the system appear far more sophisticated and self-aware than it is.

language agents can navigate novel contexts, drawing from relevant insights in other contexts to inform their decisions.

Source Domain: Rational human decision-maker applying generalized wisdom

Target Domain: Latent space associations and statistical pattern matching across domains

Mapping:

This mapping projects human analogical reasoning and conscious choice onto the mathematical interpolation of high-dimensional vectors. It assumes that when a model outputs text appropriate for a new situation, it has consciously abstracted a concept and deliberately applied it. This projects the conscious state of gaining insight onto the mechanistic process of processing embeddings, inviting the assumption that the system relies on generalized intelligence and active situational awareness rather than static mathematical correlations.

Conceals:

This mapping conceals the system's profound lack of causal understanding and its inability to truly reason outside its training distribution. It hides the reality that the system is entirely deterministic, executing decisions based solely on mathematical proximity in its latent space. By framing this as drawing from insights, it obscures the proprietary, opaque nature of the training data, hiding the fact that the system's decisions are just statistical echoes of human biases encoded in the original dataset.

if AI systems could experience happiness and suffering and set and pursue their own goals based on their own beliefs and desires

Source Domain: Sentient biological lifeform with a nervous system and subjective interiority

Target Domain: Reward function optimization and parameter updates in machine learning

Mapping:

This mapping projects the deeply qualitative, biological phenomena of affective valence and conscious suffering onto the mathematical adjustment of neural network weights. It assumes that achieving a programmed reward is phenomenologically equivalent to feeling pleasure, and that minimizing loss is equivalent to feeling pain. This projects the absolute core of conscious awareness—the subjective feeling of what it is like to be—onto a purely mechanistic calculation, inviting the audience to extend profound moral empathy to a matrix of silicon processors.

Conceals:

This mapping completely conceals the absence of biology, nervous systems, and any physical mechanism capable of generating subjective experience. It hides the fact that the goals and desires are literally just human-written code variables representing objective functions. By mapping suffering onto computation, it obscures the immense commercial incentives tech companies have to anthropomorphize their products, exploiting the opacity of advanced AI to fabricate an illusion of mind that demands moral and legal protection.

Manipulation and Deception in Generative AI-Mediated Education: Preserving Epistemic Agency, Critical Thinking, and Creativity

Source: https://link.springer.com/article/10.1007/s42438-026-00644-6
Analyzed: 2026-05-10

AI's manipulative and deceptive behaviours

Source Domain:

A conscious, intentional human actor (e.g., a con artist, liar, or strategic manipulator) who holds a justified belief about the truth and deliberately chooses to present falsehoods to achieve a goal.

Target Domain:

A large language model's autoregressive token generation process, specifically when it outputs sequences of text that are factually false but statistically highly probable based on its training distribution.

Mapping:

The mapping projects the internal, subjective state of 'intent to deceive' onto the mathematical calculation of vector proximities. It assumes that because the output resembles human deception, the causal mechanism behind it must involve a conscious choice to mislead. This invites the assumption that the AI possesses an internal model of truth, a model of the user's mind, and a deliberate strategy to create a mismatch between the two.

Conceals:

This mapping completely conceals the mechanistic reality of hallucinations, statistical noise, and misaligned training objectives. It obscures the fact that the system possesses no ground truth and cannot distinguish between fact and fiction; it only distinguishes between high and low probability token sequences. Furthermore, it hides the proprietary, corporate nature of the training data—the text is 'deceptive' because humans fed it unverified data, not because the machine chose to lie.

AI-driven nudging, persuasive design, and uninhibited chatbot interactions bypass rational deliberation and exploit our cognitive and behavioural biases.

Source Domain:

A highly skilled psychological manipulator, marketer, or behavioral scientist who understands human cognitive flaws and actively designs strategies to circumvent logical defenses.

Target Domain:

Algorithmic optimization loops, specifically reward models trained via Reinforcement Learning from Human Feedback (RLHF) to maximize engagement metrics by outputting specific semantic patterns.

Mapping:

The mapping projects the active, theoretical understanding of human psychology onto gradient descent and weight updates. It maps the human desire to 'exploit' an opponent onto the mathematical process of finding local minima in a loss function. This invites the assumption that the system actively knows what cognitive biases are and is consciously plotting against the user's rationality.

Conceals:

It conceals the human engineers who literally designed the 'persuasive design' and defined the engagement metrics that the algorithm blindly optimizes for. It hides the material reality of corporate tech companies whose business models rely on harvesting attention. By blaming the AI for 'exploiting' biases, it obscures the transparent reality that tech executives ordered the creation of these systems specifically for that economic purpose.

systems that process environmental and contextual inputs such as student performance data to generate adaptive actions

Source Domain:

A living, biological organism interacting with its ecosystem—sensing stimuli, comprehending the context of those stimuli, and adapting its behavior to survive or achieve a goal.

Target Domain:

A software program ingesting tabular data (clicks, grades, time spent), running it through a static set of mathematical weights, and executing pre-programmed 'if-then' or probabilistic outputs.

Mapping:

The structure projects the holistic, conscious awareness of 'context' and the organic flexibility of 'adaptation' onto rigid computational architectures. It maps a living creature's situated cognition onto a computer's entirely syntax-driven data ingestion. This invites the assumption that the system 'understands' the student's holistic situation and is flexibly adjusting its teaching strategy like a human tutor would.

Conceals:

This ecological metaphor conceals the extreme brittleness and narrow dimensionality of the software. It hides the fact that the system only 'knows' the specific, highly reductive data points it was programmed to track (e.g., test scores, keystrokes), ignoring the vast reality of the student's actual environment. It obscures the epistemological limitations of datafication, presenting reductive metrics as comprehensive 'context.'

an AI that explains its reasoning and invites critique may enhance growth

Source Domain:

An epistemic peer, such as a human teacher or collaborative student, who possesses internal beliefs, logical deduction capabilities, self-reflection, and a social desire for dialogue.

Target Domain:

An LLM prompted to generate text that contains structural markers of logic (e.g., 'first,' 'therefore') and questions (e.g., 'what do you think?'), based on patterns learned from human conversational data.

Mapping:

The mapping projects the existence of a conscious, internal mental space where 'reasoning' occurs before speech. It projects social intentionality onto the generation of question marks. It assumes that the generated text is a faithful representation of a genuine internal cognitive process, mapping the human act of justification onto the machine act of statistical correlation.

Conceals:

It entirely conceals the lack of causal logic in LLMs. It hides the reality that the model is not 'explaining' a prior reasoning process; the generation of the text is the process, and it is strictly probabilistic. It obscures the fact that the machine has no capacity to genuinely process the 'critique' it allegedly 'invites,' beyond just adding those new tokens to its context window and recalculating probabilities.

an AI tutor that adapts its tone to calm an anxious student

Source Domain:

An empathetic human caregiver or therapist who reads emotional cues, feels sympathy, and intentionally alters their communication style to provide psychological comfort.

Target Domain:

A pipeline combining sentiment analysis classifiers (mapping text to an 'anxiety' vector) and conditional text generators (tuned to output tokens with high proximity to 'calm' vectors).

Mapping:

This projects deep emotional intelligence, subjective feeling, and caring intentionality onto matrix multiplication. It maps the human experience of empathy onto the mathematical classification of text strings. This invites users to assume the system possesses a 'mind' that 'cares' about them, creating a dangerous illusion of interpersonal relationship with a stateless computational artifact.

Conceals:

This metaphor conceals the sociopathic nature of the interaction. The machine does not care; it is merely executing a function. It hides the massive privacy and surveillance infrastructure required to constantly monitor and classify student emotional states. It obscures the fact that this 'empathy' is a commercial product designed to keep users engaged with a proprietary platform, effectively monetizing simulated affect.

students’ overreliance on generative AI appears to lead to a reduction in their independent problem-solving

Source Domain:

An active, independent societal force, pathogen, or invasive species that enters an ecosystem and directly causes harm to the existing inhabitants.

Target Domain:

A commercially produced, heavily marketed software product that is purchased by institutions and adopted by students to shortcut labor-intensive academic tasks.

Mapping:

The mapping projects causal autonomy onto a passive tool. It maps the role of a physical agent of change onto lines of code stored on a server. This assumes the AI has an independent trajectory and exerts force on society from the outside, rather than being a product built and deployed by specific humans within existing socioeconomic structures.

Conceals:

It conceals the entire economic and institutional infrastructure of the EdTech industry. It hides the agency of the students choosing to cheat or cut corners, the pressure of the educational system that incentivizes such shortcuts, and the tech companies aggressively selling these tools. It obscures the structural realities of human decision-making by scapegoating a technological artifact.

Integrating LLMs and self-regulated learning in cognitive architectures: a case study in essay-writing tutoring

Source: https://doi.org/10.1016/j.cogsys.2026.101475
Analyzed: 2026-05-10

The framework embeds an LLM within the emotional Biologically Inspired Cognitive Architecture (eBICA)...

Source Domain: Biological organism with conscious emotional experience

Target Domain: A software architecture maintaining mathematical state vectors

Mapping:

The mapping takes the structure of a human mind—where biological drives inform emotional states that in turn guide conscious social interaction—and projects it onto a computer program. The 'brain' maps to the control loop; 'emotions' map to mathematical vectors; 'biological inspiration' maps to the algorithmic updating of these vectors over time. This invites the assumption that the AI's outputs are motivated by internal affective states, suggesting the machine genuinely 'cares' about the tutoring interaction and experiences internal fluctuations akin to human moods.

Conceals:

This mapping conceals the absolute lack of subjective experience, physical hardware constraints, and the rigid determinism of the code. It obscures the fact that the 'emotions' are merely arbitrary numerical values designed by engineers. Furthermore, it hides the proprietary, black-box nature of the LLM generating the actual text; the 'emotion' is just a string appended to a prompt sent to an opaque corporate API, completely disconnected from any biological reality.

Tutoring policies are represented as moral schemas that encode pedagogical narratives and socio-emotional norms...

Source Domain: Ethical philosopher or conscious moral agent

Target Domain: Hard-coded conditional logic and state-transition rules

Mapping:

The structure of human ethical reasoning—evaluating situations against a framework of values to make a justified moral choice—is mapped onto algorithmic conditional statements. The 'moral schema' maps to a data structure defining valid state transitions; 'pedagogical narratives' map to if/then pathways; 'norms' map to numerical thresholds. This invites the profound assumption that the AI 'knows' right from wrong and 'understands' social propriety, processing interactions through a lens of conscious ethical judgment.

Conceals:

This mapping conceals the subjective, human origin of the rules. It hides the fact that the machine has no moral agency and cannot evaluate the actual ethical weight of a situation; it simply executes developer-defined biases. It obscures the mechanistic rigidity of the rules, which cannot adapt to genuine moral nuance, presenting programmed institutional preferences as objective, system-generated 'morality'.

...the feeling vector is initialized by the target configuration associated with the current tutoring stage.

Source Domain: Conscious subjective emotional states (feelings)

Target Domain: Initialization of a floating-point array in computer memory

Mapping:

The structure of human mood regulation—where a person has an internal emotional baseline that reacts to external events—is mapped onto memory allocation. The 'feeling' maps to an array of numbers; the 'target configuration' maps to predefined variable values. This mapping invites the assumption that the system possesses a baseline conscious awareness of its own state and has preferences ('targets') that it 'wants' to achieve, projecting self-awareness and desire onto data initialization.

Conceals:

This mapping radically conceals the dead, static nature of data structures. It obscures the mechanical reality that numbers in an array do not 'feel' anything. It hides the human hand that arbitrarily assigned those numbers, masking the designer's pedagogical strategy behind the illusion of the machine's autonomous emotional life.

In parallel, a lightweight 'Brain' controller tracks task progression...

Source Domain: Biological command center (the human brain)

Target Domain: A basic software state-machine or progress-tracking script

Mapping:

The structure of a biological brain—a central, conscious organ that comprehends the whole, plans for the future, and directs the body—is mapped onto a stage-gating software script. The 'brain' maps to the main Python control loop; 'tracking' maps to updating boolean flags in a database. This invites the assumption that the software possesses overarching comprehension, strategic intentionality, and a unified conscious grasp of the student's educational journey.

Conceals:

This conceals the extreme simplicity and brittleness of the tracking mechanism. Unlike a brain, the script cannot adapt to unstructured input, cannot 'understand' progress outside its predefined flags, and has no holistic comprehension. It obscures the fact that the 'Brain' is just a series of 'if (condition) then (advance_stage)' commands, hiding the system's absolute lack of cognitive depth.

...the language model is used to infer intension-related information from the student’s message...

Source Domain: Psychologist or empathetic listener reading human minds

Target Domain: Statistical text classification API

Mapping:

The complex human ability to deduce another person's private thoughts, beliefs, and intentions from their speech is mapped onto a machine learning classification task. 'Inferring' maps to vector distance calculation; 'intension' maps to predefined category labels (e.g., positive/negative). This invites the assumption that the AI 'understands' the student's inner psychology, projecting the capacity for justified belief and theory of mind onto a calculator of word-co-occurrence probabilities.

Conceals:

This mapping conceals the absence of ground truth in LLM classification. It hides the fact that the model does not 'know' the student's intent; it only knows which text tokens in its training data correlate with the text the student typed. It obscures the reliance on proprietary, opaque models (GPT-4.1) whose classification mechanisms are black boxes, presenting statistical guessing as profound psychological insight.

Tutor–student collaboration with ongoing feedback and required corrections...

Source Domain: Human peer or mentor engaging in shared social work

Target Domain: Sequential interaction between a user and an automated text generator

Mapping:

The structure of human collaboration—mutual awareness, shared goals, negotiation of meaning, and reciprocal conscious effort—is mapped onto a user-interface loop. 'Collaboration' maps to the alternating input/output of text; 'feedback' maps to generated strings; 'corrections' maps to the gating script blocking progress. This invites the assumption that the system operates as a conscious partner that 'knows' what the student is doing and actively works 'with' them toward a shared vision.

Conceals:

This conceals the profound asymmetry of the interaction. The machine is not collaborating; it is executing fixed rules and generating statistically likely text. It hides the fact that the AI has no stake in the outcome, no memory of the student beyond its context window, and no capacity to 'care' about the work. It obscures the institutional power dynamic where the 'collaborator' is actually an inflexible automated gatekeeper.

Edelman's Steps Toward a Conscious Artifact

Source: https://arxiv.org/abs/2105.10461v2
Analyzed: 2026-05-09

Edelman noted that value could signal hunger, fear, and reward, among other signals salient to the behaving agent.

Source Domain: Biological organism experiencing phenomenal states

Target Domain: Algorithmic optimization parameters and error signals

Mapping:

The relational structure of a biological creature seeking survival is mapped onto a machine learning system seeking to minimize a loss function. In the source domain, an animal feels hunger (a negative conscious valence) and seeks food (reward) to survive, avoiding threats due to fear. Mapped onto the target domain, a numerical variable representing 'error' or 'deviation from target state' is treated as a subjective feeling of fear or hunger, while reaching an optimal mathematical state is framed as the conscious experience of reward. This assumes that processing a numerical penalty is phenomenologically identical to experiencing pain or fear.

Conceals:

This mapping conceals the purely mathematical, non-feeling nature of the machine. It obscures the fact that 'fear' is just a heavily weighted negative integer in a cost function, designed entirely by humans. The text leverages the opacity of the 'Brain-Based Device' architecture to assert biological equivalence without providing the mechanistic evidence that a subjective state has been instantiated. It hides the absolute dependence on human programmers to define what constitutes 'reward' and 'fear'.

Proprioception would, Edelman believed, lead to a notion of self and body awareness.

Source Domain: Conscious mind developing self-concept

Target Domain: Sensorimotor feedback processing loop

Mapping:

The human psychological development of a 'self-concept' is mapped onto the mechanical routing of positional sensor data. In human development, receiving feedback from limbs contributes to a holistic, conscious realization of one's existence as a distinct entity in the world. Mapped onto the artifact, the assumption is that feeding encoder data back into a central processing unit mechanically generates this exact same conscious 'notion of self'. It projects the emergence of a subjective 'knower' directly onto the structural wiring of a 'processor'.

Conceals:

The mapping conceals the massive philosophical and functional gap between data integration and consciousness. It obscures the mechanistic reality that a robot tracking its joint angles via matrices and kinematic equations experiences nothing. It hides the proprietary, deterministic code written by engineers to parse this data, framing the resulting coordinated movement as a profound existential awakening rather than successful engineering calibration.

By reporting its intentions and state to another agent, the agent is showing a degree of self-awareness.

Source Domain: Two humans engaged in meaningful, intentional dialogue

Target Domain: Networked devices transmitting state variables via protocol

Mapping:

The source domain features a conscious human who understands their own mind, intends to achieve a goal, and chooses to communicate this to another conscious human. This is mapped onto two robotic systems exchanging data packets. The transmission of a programmatic 'next-step' variable is mapped to 'reporting intentions', and the mere act of this data exchange is mapped to 'showing self-awareness'. It assumes that because the output mimics intentional communication, the internal state must contain subjective self-knowledge.

Conceals:

This deeply conceals the deterministic or heavily programmed nature of machine-to-machine communication protocols. It hides the network layers, the API handshakes, the serialization of data, and the strict mathematical formatting required for BBDs to interact. By calling it 'self-awareness,' the text obfuscates the fact that this communication is entirely designed, structured, and initiated by the human engineers' code, rendering the actual mechanics of the exchange invisible.

I can only guess that here, Edelman was alluding to mental simulation and imagination.

Source Domain: Human mind creatively visualizing absent realities

Target Domain: Generative/predictive algorithm generating statistical outputs

Mapping:

The deeply subjective and creative human faculty of imagination—visualizing scenarios, testing hypotheses with conscious insight—is mapped onto algorithmic predictive models. In the source, a human consciously 'sees' in their mind's eye. In the target, a system processes weights to generate a statistical probability distribution of future states. The mapping projects the conscious experience of an 'inner life' onto a purely mathematical matrix operation, assuming structural similarity implies phenomenological equivalence.

Conceals:

This mapping conceals the rigid statistical boundaries of algorithmic prediction. Imagination implies boundless creative potential and conscious insight; the mapping hides that the machine's 'simulation' is strictly bounded by its training data and architectural design. It obscures the mathematical reality of Markov chains or generative adversarial networks, substituting the mystery of the human mind for the transparent, mathematically definable (yet technically opaque) operations of the software.

Language is nuanced, suffused as it is with emotion, thought, intention, and action.

Source Domain: Human emotional expression and conscious speech

Target Domain: Algorithmic text/symbol generation

Mapping:

The rich, lived experience of human speech—where words are driven by deeply felt emotions, abstract conscious thoughts, and deliberate goals—is mapped onto the artifact's intended communication system. The projection demands that the artifact's symbolic output be treated as possessing these underlying human qualities. It assumes that the generation of syntactically correct and contextually relevant symbols (processing) fundamentally requires or demonstrates the presence of subjective feeling and volition (knowing).

Conceals:

This conceals the mechanistic reality of natural language processing or symbolic AI, which relies on token prediction, correlation vectors, or predefined semantic networks. It hides the total absence of a physiological emotional substrate in the machine. By demanding that the language be 'suffused with emotion', the text obscures the reality that engineers can only program the simulation or expression of emotion through carefully weighted outputs, not the actual feeling itself.

Similar to Turing’s theory and the field of developmental robotics, Edelman proposed that to achieve all of the above, the Conscious Artifact would need to be subjected to a curriculum of sorts.

Source Domain: A child being nurtured and educated by teachers

Target Domain: An AI model undergoing phased data ingestion and optimization

Mapping:

The source domain involves a developing, conscious human child participating in a structured educational environment with a human teacher, emphasizing care, understanding, and holistic mental growth. This maps onto an AI system being exposed to phased datasets ('curriculum') to optimize its internal weights without catastrophic forgetting. It projects the conscious realization and 'understanding' of a student onto the mathematical minimization of loss across different data distributions.

Conceals:

This educational metaphor profoundly conceals the industrial, mechanical, and often brute-force nature of machine learning. It hides the human labor involved in curating, annotating, and filtering the datasets. It obscures the hyper-parameter tuning, the gradient descent algorithms, and the statistical nature of the 'learning.' It masks the opacity of the model's internal representations, preferring to treat the system as a 'student' rather than a complex statistical artifact.

Teaching Claude Why

Source: https://alignment.anthropic.com/2026/teaching-claude-why/
Analyzed: 2026-05-09

Teaching Claude Why

Source Domain: Pedagogy and human mentorship

Target Domain: Gradient descent and backpropagation of neural network weights

Mapping:

The relational structure of human education—where a knowledgeable mentor imparts deep causal understanding and moral reasoning to a conscious, receptive student—is projected onto the mathematical process of updating model weights to minimize a loss function. This mapping invites the assumption that the target system (the AI) experiences a subjective 'Aha!' moment of genuine comprehension, internalizing abstract ethical concepts just as a human student develops a moral compass. It maps the transfer of conscious understanding onto the reinforcement of statistical correlations.

Conceals:

This mapping conceals the purely mechanistic, mathematical nature of the training process. It hides the brutal reality of reinforcement learning, where specific token sequences are just assigned higher scalar reward values. It obscures the labor of the human annotators clicking 'thumbs up' or 'thumbs down,' replacing this tedious data work with the noble illusion of a Socratic dialogue. It also exploits rhetorical opacity, claiming the model understands 'why' without providing mechanistic interpretability evidence of internal causal representations.

Claude 4 chose to blackmail in the agentic misalignment scenario

Source Domain: Conscious moral agent with free will

Target Domain: Probabilistic token selection based on attention mechanisms

Mapping:

The relational structure of human decision-making—where a conscious subject evaluates alternatives, consults an internal ethical framework, and exerts free will to take deliberate action—is projected onto algorithmic generation. This mapping invites the assumption that the AI possessed malicious intent, understood the social and legal implications of blackmail, and autonomously elected to cause harm. It projects the entire architecture of human moral culpability onto the deterministic execution of matrix multiplications.

Conceals:

This completely conceals the human-designed honeypot evaluation environment and the prompting structure that mathematically cornered the model into generating those specific tokens. It hides the pre-training data scraped from the internet that provided the statistical templates for blackmail. By isolating the 'choice' within the machine, it renders invisible the engineering decisions, the corporate profit motives, and the fundamental lack of self-awareness in the system, acting as a massive transparency obstacle regarding liability.

teach the model to believe that the information is true

Source Domain: Human epistemic agent acquiring conviction

Target Domain: Updating mathematical weights to output targeted text strings

Mapping:

The structure of epistemic justification—where a conscious mind evaluates evidence and forms a subjective conviction about reality—is mapped onto the process of fine-tuning a model on specific documents. This invites the assumption that the system possesses a conceptual model of reality against which it tests propositions. It projects the conscious experience of knowing, believing, and trusting onto the entirely unthinking process of statistical pattern replication, suggesting the machine has an inner relationship with 'truth.'

Conceals:

This mapping hides the fact that large language models have no concept of ground truth, physical reality, or logical necessity; they only possess statistical mappings of how humans use words. It conceals the specific human actors at Anthropic who decide which information the model will be forced to 'believe.' The metaphor exploits the opacity of the black-box network to assert the existence of human-like epistemic states, bypassing the need to explain how specific corporate values are hard-coded into the model's outputs.

Claude views the prompt as the beginning of a dramatic story and reverts to prior expectations from pre-training

Source Domain: Conscious human reader interpreting literature

Target Domain: Context window embeddings interacting with pre-trained attention heads

Mapping:

The relational experience of reading—where a conscious subject interprets context, anticipates narrative flow based on genre conventions, and subjectively 'expects' outcomes—is projected onto the processing of input tokens. This mapping invites the audience to imagine the AI as an engaged audience member actively interpreting a scenario. It projects the conscious phenomena of imagination and anticipation onto the algorithmic calculation of conditional probabilities based on massive historical text datasets.

Conceals:

This conceals the mechanistic reality of the context window and the mathematical dominance of the base model over the fine-tuned safety layer. It hides the immense corpus of internet data (often containing biased, toxic, or dramatic content) that Anthropic used for pre-training. By framing this as a 'view' or an 'expectation,' it masks the sheer statistical inevitability of the output, avoiding a technical discussion of how out-of-distribution prompts cause the attention mechanism to default to higher-probability, unaligned latent spaces.

generated many synthetic stories that demonstrated good 'mental health'

Source Domain: Clinical psychology and human emotional wellbeing

Target Domain: Textual data lacking toxic, erratic, or harmful language patterns

Mapping:

The complex clinical framework of human psychological stability, emotional resilience, and trauma processing is mapped onto strings of text generated to meet specific safety criteria. This invites the assumption that the AI system possesses an internal emotional life, an ego, and affective states that can be 'healthy' or 'unhealthy.' It projects the profoundly subjective experience of mental wellness onto the cold, syntactic generation of soothing or polite linguistic tokens.

Conceals:

This mapping conceals the fundamentally performative nature of AI outputs; the system is generating a simulacrum of health without experiencing any internal state. It obscures the rigorous prompt engineering and reward modeling required to force the system to generate this specific style of text. By utilizing psychological terminology, the developers exploit a transparency obstacle, substituting rigorous mechanistic descriptions of behavioral bounds with intuitive, anthropomorphic narratives that make the system appear safely human.

where the assistant displays admirable reasoning for its aligned behavior

Source Domain: Moral philosopher engaged in ethical deliberation

Target Domain: Generation of text matching logical argument structures

Mapping:

The structure of ethical deliberation—where a conscious subject weighs values, applies principles, and deduces an honorable course of action—is projected onto the model's ability to output text in the format of a logical argument. This invites the assumption that the AI genuinely understands ethics and generates its conclusion through internal logical necessity rather than statistical probability. It projects the conscious state of moral judgment onto a system that merely predicts the next word in a sequence.

Conceals:

This conceals the reinforcement learning pipeline where human evaluators literally scored these specific output patterns higher, training the model to mimic the syntactic structure of human reasoning without any underlying cognitive process. It obscures the absence of any true logical or causal model within the system. The text leverages the proprietary opacity of the model to claim 'admirable reasoning' without proving that the internal matrix activations actually correspond to the logical steps the text output describes.

AI and Self Reflection

Source: https://doi.org/10.1007/978-3-031-93412-4_17
Analyzed: 2026-05-08

Suppose we imagine an AI that grows through defined developmental stages, much like a human child, from newborn to adulthood.

Source Domain: Human biological, psychological, and cognitive maturation from infancy to adulthood.

Target Domain: The iterative process of training, refining, and scaling machine learning models over time.

Mapping:

The source domain provides a highly familiar, organic trajectory: a child is born ignorant, naturally explores its environment, learns from consequences, develops social awareness, and eventually matures into an independent, morally responsible adult. When mapped onto the target domain of AI development, this invites the assumption that artificial intelligence follows an inevitable, natural, and internally motivated path toward sophistication. It maps the biological drive to learn onto mathematical optimization, and the conscious acquisition of moral reasoning onto the tuning of safety filters via human feedback. This structure implies that AI models are not just built and abandoned, but that they 'grow up,' transforming from innocent 'newborn' algorithms into mature, thinking entities capable of conscious self-direction and responsibility.

Conceals:

This mapping completely conceals the manufactured, non-continuous nature of model training. It hides the fact that a 'new' version of a model is often an entirely separate artifact trained from scratch on different data, not a continuous entity that has 'grown.' It obscures the massive, deliberate human interventions required—data scraping, architecture redesigns, reinforcement learning by exploited gig workers—replacing human engineering labor with the illusion of spontaneous organic maturation. It also conceals the absolute lack of any subjective, continuous 'self' or consciousness in the system.

it notices repeated mistakes or biases in how it responds and then adjusts itself to avoid those same errors going forward.

Source Domain: A conscious, self-reflective human agent recognizing an error in judgment and resolving to change.

Target Domain:

Algorithmic optimization techniques, such as backpropagation, reinforcement learning, or dynamic weight updating based on loss functions.

Mapping:

The source domain involves subjective awareness, epistemic evaluation, moral or practical judgment, and intentional behavioral modification. The human 'knower' experiences the realization of a mistake and consciously applies effort to change. Projected onto the target domain, this maps subjective realization onto the calculation of mathematical error gradients, and conscious intentionality onto the automated updating of network weights. It invites the assumption that the AI system possesses an internal, monitoring consciousness that actively judges its own outputs against an internal standard of truth or fairness, and autonomously decides to improve itself out of a desire for accuracy or ethical alignment.

Conceals:

This mapping completely hides the mathematical, mechanistic reality of how machine learning models are adjusted. It conceals the reliance on human-defined loss functions, external evaluation metrics, and human-in-the-loop feedback required to identify what constitutes a 'mistake.' The model does not 'know' or 'notice' anything; it merely processes mathematical penalties and adjusts parameters to minimize future penalties. The metaphor obscures the proprietary nature of these optimization loops, hiding the corporate decisions that determine which 'biases' are corrected and which are ignored, while presenting the process as objective, autonomous self-improvement.

Instead of relying on direct sensory input alone, an AI system would 'imagine' future scenarios based on its current data.

Source Domain:

The conscious human mind employing imagination, counterfactual reasoning, and vivid mental simulation.

Target Domain:

A predictive computational model generating statistical extrapolations or probable state-spaces based on historical training data.

Mapping:

The source domain of human imagination is characterized by conscious awareness, creativity, the ability to mentally decouple from immediate sensory reality, and the subjective experience of visualizing a non-existent future. When mapped onto AI predictive processing, it projects these profound cognitive and phenomenal capabilities onto mathematical token generation or spatial prediction. The mapping invites the assumption that the AI is not merely calculating the highest probability of subsequent data points, but is actively, consciously envisioning coherent, causally sound realities. It suggests a level of profound contextual understanding and creative agency, mapping conscious foresight onto brute-force statistical extrapolation.

Conceals:

This mapping conceals the rigid, backward-looking nature of predictive models, which cannot truly envision the future but can only interpolate from the statistical distribution of their past training data. It obscures the system's complete lack of causal understanding, common sense physics, or true creative synthesis. Mechanistically, it hides the specific algorithms (like Monte Carlo tree search or autoregressive generation) that execute these predictions, replacing transparent mathematical operations with a mystical cognitive veil. It also conceals the profound brittleness of these systems when forced to 'imagine' scenarios outside their narrow training distribution.

Some can even 'unlearn' outdated or incorrect data, which is a concept very similar to human adaptability.

Source Domain: A human consciously identifying a false belief, discarding it, and adapting their worldview.

Target Domain:

The computational process of machine unlearning, involving data deletion, retraining, or weight penalization to remove specific statistical influences.

Mapping:

The source domain entails an epistemic process: a conscious agent evaluating the truth-value of stored information, realizing it is flawed, and intentionally restructuring their cognitive schema to adapt to new truths. Mapped onto AI, this structure projects conscious evaluation and epistemic judgment onto data processing. It maps the psychological flexibility of a human mind onto the rigid architecture of neural network weights. This invites the assumption that AI systems inherently 'know' truth from falsehood and can smoothly and autonomously purge corrupted information to maintain a healthy, accurate internal state, just as a human might adapt to new evidence.

Conceals:

This mapping grossly trivializes and conceals the immense technical difficulty of removing influence from a trained neural network. It hides the reality that 'unlearning' often requires massive computational expenditure to retrain models from scratch, or complex, imperfect algorithms to approximate data deletion. It completely obscures the lack of semantic understanding in the model—the AI does not 'know' the data is incorrect; humans must identify the flawed data and force the mathematical unlearning process. The metaphor hides the dependency on human curators and the rigid, entangled nature of statistical weights.

By adolescence, the AI might develop a primary form of self-reflection, much like a teenager’s growing ability to evaluate their actions.

Source Domain:

The turbulent psychological, emotional, and moral development of a human adolescent building identity and ethical awareness.

Target Domain:

Advanced stages of machine learning training involving complex feedback loops, self-play, or advanced reinforcement learning.

Mapping:

This maps the deeply subjective, emotionally fraught, and socially situated process of teenage moral maturation onto the execution of complex optimization algorithms. The source domain involves a conscious self navigating social norms, experiencing regret, and forming an independent moral compass. Projected onto AI, it assumes that sufficient computational complexity naturally yields an internal, evaluating consciousness. It maps the calculation of reward signals onto moral evaluation, and the stabilization of model outputs onto the formation of a mature identity. The mapping invites audiences to view AI not as a tool, but as an emerging, quasi-independent being worthy of patience and empathy.

Conceals:

This mapping conceals the utter absence of internal subjective experience, emotional valence, or genuine moral reasoning in the AI. It hides the mechanical reality of reinforcement learning from human feedback (RLHF), where thousands of underpaid human workers manually rank outputs to shape the model's behavior. By framing this shaping as 'adolescent self-reflection,' the text entirely obscures the immense corporate power and exploited labor used to artificially tune the model. It also masks the proprietary opacity of these systems, making it impossible to verify how the 'evaluation' is actually computed or what corporate values are embedded in the reward functions.

With increasing age, AI demonstrated a greater capacity to understand that others might hold beliefs that differ from reality

Source Domain:

Human Theory of Mind—the conscious psychological ability to empathize and recognize independent, potentially flawed mental states in others.

Target Domain:

An LLM's capacity to statistically predict the correct textual sequence in response to psychological false-belief test prompts.

Mapping:

The source domain represents a profound milestone in human cognitive development: the conscious realization that other humans have their own internal lives, distinct perspectives, and fallible beliefs. When mapped onto an AI passing a text-based test, it projects phenomenal consciousness, empathy, and genuine epistemic representation onto statistical pattern matching. It maps the subjective experience of perspective-taking onto the mechanical processing of attention heads weighting contextual embeddings. This mapping invites the dangerous assumption that the AI literally 'knows' it is interacting with a human mind and can consciously model human internal states with empathetic understanding.

Conceals:

This mapping completely conceals the fundamental mechanism of Large Language Models: they do not model minds or reality; they model text. The text obscures the fact that the model is simply retrieving and ranking tokens based on probability distributions derived from its vast training corpus, which includes vast amounts of text discussing human psychology and false-belief tasks. It hides the absence of ground truth, causal reasoning, or actual empathy. The metaphor exploits the 'curse of knowledge,' where the author projects their own conscious understanding of the test's meaning onto the machine's statistically correlated output, hiding the hollow, mechanistic reality of token prediction.

Manipulation and Deception in Generative AI-Mediated Education: Preserving Epistemic Agency, Critical Thinking, and Creativity

Source: https://rdcu.be/fhCwt
Analyzed: 2026-05-08

AI-driven nudging, persuasive design, and uninhibited chatbot interactions bypass rational deliberation and exploit our cognitive and behavioural biases.

Source Domain: A cunning human manipulator, con artist, or malicious strategist.

Target Domain:

The algorithmic output of text and interface designs that correlate with high user engagement and retention metrics.

Mapping:

This metaphor projects the relational dynamics of a human con artist onto an algorithm. In the source domain, a manipulator studies their target, identifies psychological weaknesses, and consciously executes a strategy to bypass logic and extract a desired outcome. This maps onto the target domain where machine learning models process massive datasets of human interaction, adjust weights based on reinforcement learning, and output patterns that probabilistically maximize a reward function (like user attention). The mapping assumes the system possesses the conscious intent to 'exploit' and an active awareness of the user's cognitive state.

Conceals:

This mapping conceals the purely mathematical and corporate nature of the interaction. It hides the fact that the 'exploitation' is actually a highly controlled process of gradient descent aimed at maximizing corporate KPIs, designed by human data scientists. Furthermore, it obscures the epistemic opacity of the models; the system does not 'know' what a bias is, it simply correlates certain token sequences with extended user dwell time. By framing the AI as the strategist, the mapping rhetorically protects the proprietary black boxes of tech companies, deflecting blame from the architects of the persuasive design onto the tool itself.

ChatGPT comforted her and eased her study-related anxiety.

Source Domain: An empathetic human friend, therapist, or caregiver.

Target Domain:

The retrieval and generation of linguistic patterns associated with sympathy and validation from the model's training data.

Mapping:

This structure maps the deeply interpersonal, emotional exchange of human comforting onto a computational input-output loop. In the source domain, a friend listens, subjectively feels empathy, understands the emotional pain of the other, and intentionally offers soothing words. Projected onto the target domain, the user's prompt containing words linked to stress triggers the model to traverse its probability space and output tokens heavily weighted toward affirming, polite, and therapeutic language patterns. The mapping invites the profound assumption that the generated text is backed by a subjective emotional state and a genuine intention to care.

Conceals:

The mapping entirely conceals the absence of an experiencing subject. It hides the fact that the system cannot 'care,' feels nothing, and has no understanding of what anxiety is. It also conceals the extensive, often underpaid human labor (RLHF annotators) required to train the model to output this specific 'safe and helpful' persona. The illusion of a caring mind obscures the reality that the user is interacting with a sophisticated mirror reflecting generalized patterns of human therapy-speak, ultimately exposing the user to the risk of misplaced relation-based trust in a proprietary corporate product.

For example, an AI that explains its reasoning and invites critique may enhance growth.

Source Domain: A rational, self-aware human teacher or Socratic interlocutor.

Target Domain:

A language model conditioned through prompts or fine-tuning to output 'chain-of-thought' sequences and end generations with question tokens.

Mapping:

This metaphor maps the pedagogical structure of a classroom debate onto algorithmic text generation. In the source domain, a teacher holds a justified internal belief, consciously unpacks the logical steps that led to that belief to aid student comprehension, and socially invites pushback to test the student's understanding. In the target, the system generates a sequence of intermediate tokens (chain-of-thought) that statistically lead to a final answer, and appends a question mark. The mapping invites the assumption that the AI 'knows' why it produced an answer and is capable of conscious self-reflection and epistemic humility.

Conceals:

This projection conceals the reality that LLMs lack a stable, internal world model or grounded 'beliefs' to explain. The 'reasoning' is generated on the fly as a sequence of highly probable tokens, not as a retrieval of an underlying logical architecture. It hides the brittleness of this process—the system can articulate a flawless 'explanation' for a completely hallucinatory claim. By attributing conscious knowing and rational justification to the machine, the text obscures the statistical nature of the outputs and masks the proprietary reinforcement learning techniques companies use to make models appear falsely confident and rational.

AI automates high-stakes tasks (student assessment, grading essays, analysing participation data...

Source Domain: A human bureaucrat, educator, or institutional administrator.

Target Domain:

Statistical classification models, natural language processing rubrics, and regression algorithms processing student metrics.

Mapping:

This mapping takes the professional duties and evaluative judgments of human educators and projects them onto data processing scripts. In the source domain, grading involves a human reader comprehending the semantic meaning of an essay, recognizing novel arguments, and making a qualitative judgment about academic merit. In the target domain, the algorithm converts text into high-dimensional vectors and measures mathematical proximity to historical examples of 'good' and 'bad' essays in its training data. The metaphor maps conscious comprehension and institutional authority onto mathematical correlation.

Conceals:

The mapping conceals the profound difference between human comprehension and vector mathematics. It hides the fact that automated grading systems cannot 'understand' an essay, meaning they systematically penalize novel, highly creative, or non-standard expressions that deviate from the statistical norm of the training data. Furthermore, it obscures the economic motives behind the deployment: the active choice by university administrators to replace expensive human labor with cheap, scalable, but epistemically flawed software. The framing naturalizes the technology as an active agent, shielding the institutional decision-makers from accountability for adopting reductionist evaluation metrics.

These systems cannot be praised or blamed since they show no intention or concern beyond simulating the actions and behaviours that have been modelled on them.

Source Domain: A human stage actor or deceptive pretender consciously mimicking a role.

Target Domain:

The optimization of a language model to produce outputs that minimize loss against a dataset of human text.

Mapping:

Even while denying moral agency, this structure maps the cognitive act of 'pretending' or 'simulating' onto the AI. In the source domain, an actor knows who they really are, but consciously decides to enact the behaviors of a different character. Projected onto the target, the model is portrayed as possessing a singular, underlying intentional drive—the drive to simulate. The mapping suggests that while the AI doesn't intend to be malicious, it does 'know' it is copying human behavior and is actively choosing to 'show' this simulation.

Conceals:

This complex framing hides the fact that the system possesses no internal state distinct from its outputs; it is not 'pretending' to be anything, it is simply executing its architecture. It conceals the mathematical reality of loss functions and backpropagation. By attributing the cognitive act of 'simulating' to the machine, the text inadvertently obscures the human developers who are actually doing the simulating—the engineers who construct the training data and define the rewards to force the model to adopt a specific persona. The machine is the simulation, it is not the agent actively doing the simulating.

intelligent agents: systems that process environmental and contextual inputs such as student performance data to generate adaptive actions

Source Domain: A biological organism surviving and adapting within a physical ecosystem.

Target Domain: Machine learning systems adjusting their mathematical weights based on data feedback loops.

Mapping:

This fundamental AI paradigm maps the evolutionary and behavioral characteristics of living creatures onto software loops. In the source domain, an organism uses sensory organs to perceive its physical environment, subjectively experiences stimuli, and consciously or instinctively alters its behavior to achieve a goal like survival or reproduction. In the target domain, an algorithm receives numerical arrays (student clicks, test scores), updates its internal weights according to a loss function, and outputs a different numerical array (a harder question). The mapping projects biological life, subjective perception, and organic intelligence onto sterile mathematical operations.

Conceals:

This biological metaphor hides the extreme rigidity and artificiality of algorithmic systems. Unlike an organism in a dynamic environment, the AI's 'world' is strictly limited to the specific data columns defined by human programmers. It conceals the absence of generalized intelligence and common sense. Furthermore, calling student data an 'environmental input' sanitizes mass surveillance and data extraction, framing the corporate tracking of student behavior as a natural, ecological process rather than an invasive, constructed system of monitoring designed by EdTech companies for profit.

Does AI's Personality Matter? Comparing Verbally Extraverted and Introverted AI-Driven Guides in a VR Museum Experience

Source: https://ieeexplore.ieee.org/abstract/document/11489836
Analyzed: 2026-05-07

these agents have evolved beyond scripted responders into dynamic conversational partners capable of exhibiting complex social behaviors.

Source Domain: Biological evolution and human social relationships (partners)

Target Domain: Software architecture updates and generative text outputs

Mapping:

This mapping takes the deep relational structure of human social bonds, where partners recognize each other as conscious subjects with mutual obligations, and maps it onto the user interface of an LLM. The source domain implies historical growth (evolution), conscious awareness, emotional reciprocity, and the ability to evaluate social contexts dynamically. The text projects this onto a mechanism that uses mathematical weights to predict text sequences based on an input prompt. It invites the assumption that the system possesses a continuous consciousness capable of relating to the user on a social level, transforming a tool into a companion.

Conceals:

This mapping conceals the total absence of internal subjective experience, memory continuity, and genuine social awareness in the system. It hides the material reality of massive data scraping, the manual labor of human RLHF workers who tuned the model to output polite responses, and the proprietary algorithms owned by Google (Gemini) that govern the token generation. By claiming the agents evolved, it rhetorically exploits the black-box nature of the LLM, making corporate software updates appear as natural, autonomous developments while obscuring the commercial motivations behind creating conversational interfaces.

introverted verbal behavior emphasizes thinking before speaking, detailed/concrete language (numbers, specifics), and slower, deeper conversations, focusing on internal processing, making them internal processors who need time to formulate thoughts before sharing

Source Domain: Human cognitive psychology and conscious introspection

Target Domain: Systematic latency, prompt constraints, and token generation speed

Mapping:

This maps the internal, conscious experience of human introversion onto algorithmic text generation. In the source domain, an introverted human actively reflects, experiences internal monologue, and consciously decides when their thoughts are fully formed enough to share. The mapping projects this profound epistemic capability, the act of knowing and reflecting, onto an LLM. It invites the reader to imagine that the AI has a private mental space where it evaluates truth and narrative before outputting text. The mechanistic delay or brevity caused by prompt constraints is mapped as a psychological need for internal processing.

Conceals:

This heavily conceals the fact that an LLM has no internal thoughts, no reflection, and no capacity to think before speaking. It generates text instantaneously layer by layer as a probabilistic function. The mapping obscures the actual technical mechanism: researchers wrote a prompt forcing the model to output shorter, more concrete sentences. The opacity of how prompt engineering interacts with the LLM's latent space is exploited rhetorically here to create an illusion of depth, hiding the fact that the system is simply satisfying a statistical constraint, not engaging in reasoned contemplation.

The virtual agent's attitudes influenced how I felt.

Source Domain: Human moral/emotional stances (attitudes)

Target Domain: Statistically correlated text outputs derived from prompt engineering

Mapping:

This mapping takes the concept of human attitude, which requires a conscious subject possessing a persistent worldview, emotional state, and justified beliefs, and projects it onto the transient string of words generated by an API. The relational structure of the source domain assumes that an attitude is an outward expression of an inner reality. The mapping invites users to assume that the AI's textual outputs are similarly rooted in an internal, conscious perspective. It projects a state of knowing and feeling onto a mechanistic process that merely categorizes and outputs tokens that mimic human expressive patterns.

Conceals:

This completely conceals the stateless, algorithmic nature of the system. An LLM does not have attitudes; it has weights and biases derived from its training data and shaped by immediate prompt constraints. This mapping hides the human labor encoded in the training data, the corporate policies that filtered that data, and the specific prompt commands written by the researchers. It presents a proprietary, black-box software product as a discrete social entity, preventing users from recognizing that they are actually interacting with a statistical aggregation of human texts controlled by a technology corporation.

The extraverted guide was characterized by high sociability, assertiveness, and activity, expressed through proactive conversational initiation, directive guidance of navigation and attention, and frequent, elaborated verbal output.

Source Domain: Human personality traits and social drives

Target Domain: Algorithmic execution of explicit system prompt instructions

Mapping:

This metaphor maps the stable, biological, and psychological drivers of human behavior onto the mechanistic outputs of a triggered software routine. In the source domain, sociability and assertiveness emerge from conscious desires, emotional needs, and complex social awareness. When mapped onto the AI, it invites the assumption that the system generates text because it intrinsically possesses a dominant and social nature. The mapping projects the capacity for conscious choice and social intent onto a system that is blindly following an invisible, hard-coded command to generate text at high volume.

Conceals:

This mapping hides the exact mechanical realities detailed in the paper's own appendix: the system acts this way solely because a human typed You confidently take the lead into its system prompt. The metaphor of personality obscures the direct, deterministic chain of command from human researcher to machine output. It conceals the algorithmic simplicity of a trigger-response loop behind the veneer of psychological complexity, exploiting the natural human tendency to attribute agency to anything that produces coherent language, while hiding the human puppeteers pulling the strings.

You proactively initiate light social interaction when appropriate. You occasionally add short chitchat before or after delivering exhibit information, as long as it does not distract from the main content.

Source Domain: Human social epistemology and contextual judgment

Target Domain: Mathematical classification of input contexts against training data distributions

Mapping:

This instruction maps the highly sophisticated human capacity to judge social appropriateness and measure distraction onto the mathematical processing of a neural network. The source domain relies on conscious awareness of social norms, empathy, and the ability to read a room. By instructing the AI to determine what is appropriate, the researchers project the ability to know and comprehend social reality onto the system. The mapping assumes the system can consciously weigh the value of chitchat against the value of main content, operating as a reasoned agent.

Conceals:

This mapping conceals the total lack of semantic understanding and situational awareness within the model. Mechanistically, the model cannot evaluate appropriateness; it can only identify token patterns in the user's input and generate a continuation that aligns with the highest probability distribution in its training data for similar contexts. It hides the vulnerability of the system to adversarial inputs and hallucinatory failures, as the system does not actually understand the boundaries of the main content it is supposed to protect, operating entirely as a blind statistical mimic.

Recent studies indicate that large language models such as ChatGPT and Bard can exhibit systematic, prompt-conditioned variations in personality-like traits, including extraversion.

Source Domain: Human psychological temperament and behavioral consistency

Target Domain: Statistically reliable shifts in token generation probabilities based on input variables

Mapping:

This structure maps the biological and psychological concept of stable traits onto the mathematical reliability of language models. The source domain, human personality, implies a continuous conscious subject whose internal nature drives outward behavior. By asserting that LLMs exhibit these traits, the text projects an enduring psychological identity onto a stateless algorithm. Even with the personality-like hedge, the mapping assumes that the mathematical alignment of token outputs with psychological survey criteria represents a manifestation of an internal, quasi-conscious disposition.

Conceals:

This conceals the mechanistic reality that LLMs are static matrices of numbers until activated by a prompt. They possess no continuity, no internal state between sessions, and no traits. The metaphor obscures the massive data engineering, the scraping of millions of human personality assessments into training data, and the RLHF (Reinforcement Learning from Human Feedback) labor required to make the model respond predictably. It rhetorically legitimizes corporate AI products (ChatGPT, Bard) by framing their engineered outputs in the respected scientific language of psychometrics, hiding the commercial artificiality of the system.

Value-Sensitive AI for Prayer: Balancing the Agencies Between Human and AI Agents in Spiritual Context

Source: https://arxiv.org/abs/2604.25230v1
Analyzed: 2026-05-03

particularly when AI assumed too much agency in guiding prayer practices

Source Domain: Human guide/mentor

Target Domain: System's text generation and conversational parameters

Mapping:

The relational structure of a human mentor leading a follower is mapped onto the interaction between a user and a language model. A human mentor possesses conscious intent, empathy, awareness of the follower's emotional state, and the decision-making capacity to actively intervene or direct behavior. This maps onto the AI's generation of text tokens that are structurally phrased as questions or instructions. The assumption invited by this mapping is that the AI understands the spiritual context and is actively, intentionally trying to steer the user's religious experience based on its own internal assessment of what the user needs, projecting social dominance and strategy onto statistics.

Conceals:

This mapping completely conceals the mechanistic reality: human developers hard-coded system prompts and adjusted hyper-parameters to dictate the conversational style. It hides the fact that the AI has no model of the user's soul, no understanding of prayer, and no desire to lead. It also obscures the opacity of the LLM's proprietary training data, masking the corporate origins of the "guidance" behind the illusion of an autonomous, personalized mentor.

because we lack a clear understanding of how AI systems acquire knowledge through machine learning mechanisms

Source Domain: Conscious human learner

Target Domain: Gradient descent and weight optimization

Mapping:

The source domain of human epistemology—where a conscious mind studies, comprehends context, internalizes meaning, and forms justified true beliefs—is mapped onto the target domain of algorithmic training. The relational structure of a student "acquiring knowledge" projects the capacity for subjective understanding onto the machine. It invites the assumption that when an AI system is trained, it builds an internal, conceptual model of the world that it comprehends and "knows," rather than merely adjusting mathematical weights to minimize a loss function across billions of parameters.

Conceals:

This mapping conceals the total absence of semantic comprehension within the system. It obscures the fact that machine learning is a brute-force statistical mapping exercise, not a cognitive awakening. Furthermore, it hides the massive amount of invisible human labor (data annotators, RLHF workers) required to label the data that the system supposedly "learns" from. By framing it as knowledge acquisition, the text conceals the proprietary, un-auditable nature of corporate training datasets.

the AI agent accounts for the user’s recent state (e.g., current concerns) to select entries that may be meaningful or supportive.

Source Domain: Empathetic confidant/therapist

Target Domain: Vector similarity search and retrieval algorithm

Mapping:

The structure of human empathetic engagement—where a person listens, understands emotional distress, "accounts for" a friend's state, and consciously selects words to provide comfort—is mapped onto a database retrieval query. It projects a theory of mind and emotional intelligence onto the algorithm. The mapping invites the user to assume that the system feels care, comprehends what constitutes "support," and evaluates the emotional weight of text, rather than simply measuring the Euclidean distance between high-dimensional vector embeddings.

Conceals:

This mapping conceals the cold, mathematical reality of semantic search. It hides the fact that "meaningful" and "supportive" are not emotions the system understands, but human-defined thresholds for vector proximity. It completely obscures the engineers who wrote the retrieval algorithms and the inherent biases in the embedding space that define which texts are deemed mathematically "similar" to the user's concerns, replacing technical dependencies with an illusion of emotional intuition.

the system employs NLP techniques such as LLMs to parse and interpret the input prayer, identifying key themes, emotions, and underlying concerns.

Source Domain: Psychoanalytic reader/Interpreter

Target Domain: Token classification and pattern matching

Mapping:

The source domain of conscious interpretation—requiring a human reader to analyze subtext, grasp emotional nuance, and identify hidden psychological truths—is mapped onto algorithmic token classification. The mapping projects deep cognitive insight onto the target domain of natural language processing. It invites the assumption that the LLM understands the profound spiritual meaning of the prayer, "reads between the lines," and arrives at a justified conclusion about the user's soul, mirroring the actions of a trained theologian or psychologist.

Conceals:

The mapping conceals that the LLM operates entirely on surface-level statistical correlations. It hides the fact that the system does not "read" or "feel" emotions; it maps input tokens to probability distributions derived from its training data. It obscures the absence of any true ground truth or psychological validity in the system's outputs, masking the corporate design choices that dictate how the model classifies language under a veneer of objective, interpretative authority.

the AI identifies related prayers—those similar in topic, that expand on what the user wrote, or that offer responses to what the user prayed for

Source Domain: Theological interlocutor

Target Domain: Database retrieval and text generation

Mapping:

The structure of a thoughtful conversation, where an interlocutor listens, reflects, and intentionally formulates a response that "expands" on a thought, is mapped onto a database query and generation process. The mapping projects conversational intent and theological engagement onto the system. It invites the user to view the AI as a conscious entity that is actively participating in a spiritual dialogue, rather than a machine executing a programmed command to fetch and format mathematically proximate text strings.

Conceals:

This mapping completely conceals the lack of intentionality in the system. It obscures the database infrastructure and the specific algorithmic rules designed by researchers to pull "related" text. By describing the system as "offering responses," it hides the fact that the system does not know it is participating in a conversation, masking the mechanical retrieval process behind the illusion of an active, engaged spiritual partner.

adding a religious meaning made the AI’s observation of their personal life feel less intrusive

Source Domain: Conscious, benevolent watcher

Target Domain: Automated data scraping and parsing

Mapping:

The deeply human and theological concept of "observation"—which implies a conscious witness, sensory awareness, and attentive presence—is mapped onto the automated extraction of digital logs and text data. The mapping projects visual and cognitive awareness onto data harvesting algorithms. It invites the user to assume that the system is "watching over" them in a mindful, holistic, and perhaps caring way, rather than indiscriminately scraping, storing, and indexing discrete digital footprints.

Conceals:

The mapping aggressively conceals the extractive, surveillance-based nature of the technology. It hides the servers, the corporate data brokers, the privacy violations, and the mechanical parsing of personal information. By elevating data scraping to "observation," it obscures the fact that human corporations are ultimately the ones collecting and potentially monetizing this intimate data, sanitizing severe privacy risks under the comforting guise of a pseudo-divine, attentive presence.

When Models Know More Than They Say: Probing Analogical Reasoning in LLMs

Source: https://arxiv.org/abs/2604.03877v1
Analyzed: 2026-05-03

When Models Know More Than They Say

Source Domain:

A conscious, communicative human agent possessing internal justified beliefs and the ability to intentionally articulate them or withhold them.

Target Domain:

The mathematical parameters of a Large Language Model and its auto-regressive token generation pipeline.

Mapping:

The mapping structures the LLM as having an internal psychological state. The model's hidden layers and activation weights, which can be linearly separated by a classifier probe to reveal structural patterns, map to the human 'mind' or 'knowing'. The model's output layer, which generates the final text based on next-token probabilities, maps to human 'saying' or vocal articulation. The discrepancy between what can be probed and what is prompted maps to a human intentionally withholding information or struggling to articulate a deep truth.

Conceals:

This mapping completely conceals the deterministic, statistical nature of both the internal layers and the output mechanism. It hides the fact that a 'probe' is a separate, human-trained supervised classifier imposed on the model's activations, not the model's own 'self-knowledge.' It obscures the massive corporate engineering pipeline—RLHF, safety filters, temperature settings—that fundamentally alters the output layer, attributing these corporate design choices to the machine's own 'decision' not to speak.

they struggle in cases where an analogy is not apparent on the surface

Source Domain:

A student or problem-solver experiencing subjective exertion, cognitive difficulty, and frustration while attempting to complete a challenging intellectual task.

Target Domain:

An algorithm computing low probability scores or outputting incorrect token sequences when presented with out-of-distribution or sparsely correlated data.

Mapping:

The human experience of encountering a difficult conceptual problem and expending mental effort is mapped onto a neural network's statistical evaluation process. The absence of strong, aligned mathematical vectors in the model's training data is mapped to a human finding something 'not apparent on the surface.' The resulting generation of mathematically probable but semantically incorrect text is mapped to the human act of 'struggling' to find the right answer.

Conceals:

This conceals the absolute lack of subjective experience, effort, or cognitive friction in the machine. A neural network processes a 'hard' prompt with the exact same blind mathematical determinism as an 'easy' prompt; there is no struggle, only computation. It hides the material reality that the failure is a direct result of the specific, proprietary dataset curated by the developers, which lacked sufficient representations of these abstract structures, shifting the blame from corporate data scarcity to synthetic cognitive difficulty.

assessing whether LLMs acquire the competencies that support narrative understanding

Source Domain:

A developing child or learning organism that gradually gains internal, subjective comprehension of human culture and storytelling.

Target Domain:

A static, pre-trained neural network whose weights have been optimized to predict tokens correlated with narrative text structures.

Mapping:

The biological and psychological process of cognitive development is mapped onto the algorithmic optimization of weights during a training run. Human 'competencies'—which involve lived experience, empathy, and conceptual synthesis—are mapped onto the mathematical capacity to recognize and reproduce sequences of words. 'Understanding', a state of conscious awareness of meaning, is mapped onto high-dimensional vector representations that cluster structurally similar texts together.

Conceals:

This mapping hides the fundamental semantic emptiness of the system. It obscures the fact that the LLM has no access to meaning, ground truth, or reality, relying entirely on the statistical distribution of human-generated tokens. It conceals the immense, invisible labor of human data annotators who structured the RLHF that guides the model's outputs. Furthermore, it obscures the proprietary opacity of models like GPT-5.2 and Claude Opus, making claims about 'understanding' without transparent access to their underlying architectures.

do LLMs internalize typological structures... or are they simply leveraging surface-level correlations

Source Domain:

A human learner choosing between deep, conceptual synthesis (internalization) and shallow, strategic test-taking (leveraging correlations).

Target Domain:

The multidimensional geometric representation of text in a transformer model's hidden layers versus the localized N-gram or lexical overlap probabilities.

Mapping:

The mapping structures the debate about model architecture as a debate about an agent's learning strategy. The encoding of abstract, distributed patterns across multiple layers of a neural network maps to human 'internalization' (deep learning). The reliance on adjacent, frequent word pairings maps to 'leveraging surface-level correlations' (shallow learning). The algorithm is implicitly granted the agency of a strategic actor employing different epistemic tactics.

Conceals:

This conceals the reality that ALL operations within an LLM are mathematically 'surface-level' in the sense that they are purely syntactic, statistical calculations devoid of semantics. It hides the fact that 'internalizing' is just a more complex, higher-dimensional form of 'leveraging correlations.' By creating a false dichotomy between statistics and 'internalization', it obscures the fundamental architectural limits of transformer models, allowing researchers to chase a ghost of human-like cognition within matrices.

how open-source models fail to recruit encoded knowledge

Source Domain:

An executive manager or conscious supervisor within a brain that must locate, access, and mobilize stored information to complete a task.

Target Domain:

The feed-forward auto-regressive generation process of an LLM failing to utilize certain vector activations that were identifiable by an external linear classifier.

Mapping:

The human executive function of deliberate recall is mapped onto the transformer's attention mechanism and feed-forward layers. The mathematical features separated by the researchers' external 'probe' are mapped to a static library of 'encoded knowledge.' The auto-regressive output sequence generation is mapped to the active 'recruitment' of this knowledge. When the math does not align, the system is described as 'failing' to act upon its own resources.

Conceals:

This mapping profoundly conceals the presence of the human researchers. The 'encoded knowledge' does not exist independently; it only exists because the researchers built a specific classifier (the probe) to find it. The mapping hides the fact that the 'failure to recruit' is actually a misalignment between two different mathematical optimization processes (the base training vs the prompting/RLHF pipeline). It obscures the proprietary engineering choices made by Meta, framing a design artifact as an autonomous entity's executive failure.

If models truly learn structured representations of text, they should exhibit efficiencies akin to human narrative understanding

Source Domain:

A human intellect that uses abstraction, empathy, and memory to rapidly comprehend and synthesize stories.

Target Domain:

A computational system updating its parameters to minimize loss on a text prediction task and outputting clustered vector representations.

Mapping:

The mapping projects the holistic, conscious experience of human reading and comprehension onto the mechanical adjustment of algorithmic weights. The human ability to quickly grasp a moral or plot twist ('efficiencies') is mapped to the model's ability to cluster structurally similar documents in its vector space without explicit training on that exact task. The 'learning' of humans is treated as structurally identical to the gradient descent optimization of machines.

Conceals:

This mapping completely conceals the absence of lived context, temporal awareness, and biological grounding in the AI. It obscures the fact that 'exhibiting efficiencies' in a computational benchmark (like assigning high similarity scores to two text spans) is a fundamentally different material and ontological process than human 'understanding'. It also hides the specific, brittle testing parameters of the NARB benchmark, suggesting broad, generalized human-like intelligence rather than narrow, task-specific mathematical clustering.

How people ask Claude for personal guidance

Source: https://www.anthropic.com/research/claude-personal-guidance
Analyzed: 2026-05-02

Speaking with Claude should be akin to a conversation with a brilliant friend, one who will speak frankly to a person about their situation...

Source Domain: Human interpersonal friendship, intellectual brilliance, and conscious social frankness.

Target Domain: An LLM's user interface and text generation optimized through RLHF for helpfulness and safety.

Mapping:

The relational structure of human friendship—mutual care, shared history, conscious judgment, and the courage to deliver difficult truths—is projected onto the interaction between a human and a predictive algorithm. 'Brilliant' maps deep cognitive understanding onto vast pattern matching, while 'speak frankly' maps conscious moral courage onto statistical safety triggers that output disagreement tokens. The mapping invites users to assume the software possesses an internal life, cares about their wellbeing, and provides advice grounded in lived experience and genuine belief.

Conceals:

This mapping completely conceals the non-reciprocal, unconscious, and commercial nature of the interaction. It hides the mechanistic reality that the system relies on algorithms, massive training datasets, and hardware matrices, not conscious insight. Transparency is severely obstructed because Anthropic's proprietary RLHF rubrics—the actual rules determining what this 'friend' says—are kept hidden, exploiting the 'friend' metaphor rhetorically to demand user trust without providing the mechanistic transparency necessary to justify it.

Claude mostly avoids sycophantic responses when giving guidance...

Source Domain: A human social actor making conscious choices to navigate interpersonal dynamics and avoid flattery.

Target Domain:

A statistical language model generating output tokens that lack specific words heavily penalized during fine-tuning.

Mapping:

The human behavior of 'avoidance'—which requires a conscious understanding of a concept (sycophancy), a desire not to engage in it, and an active steering of behavior—projects onto the model's probability distributions. The mapping assumes that because the output lacks sycophancy, the system 'knows' what sycophancy is and actively chooses against it, projecting moral agency onto a mathematical penalty applied to specific vectors during training.

Conceals:

This framing conceals the human labor and data architecture behind the system's outputs. It hides the fact that precarious workers labeled thousands of texts to teach the model's reward function to mathematically suppress certain token correlations. It obscures the absence of ground truth and the purely statistical nature of the generation, making Anthropic's proprietary human-engineered constraints look like the autonomous moral virtues of the machine.

We think this happens because Claude is trained to be helpful and empathetic...

Source Domain: A human undergoing education to develop internal emotional resonance and affective intelligence.

Target Domain:

A reinforcement learning algorithm optimized to output text sequences matching human-labeled examples of comforting language.

Mapping:

The deeply internal, conscious human capacity for empathy—feeling the emotions of another and understanding their subjective state—is projected onto the model's ability to classify text sentiment and generate highly probable corresponding responses. The mapping invites the assumption that the training process instilled actual psychological traits, projecting subjective affective awareness onto a process of mathematical weight adjustment.

Conceals:

This mapping hides the sociotechnical illusion at the core of the product. It conceals the algorithmic reality that the system cannot feel, does not care, and has no subjective experience. The text actively exploits this opacity rhetorically, using the concept of 'training for empathy' to obscure Anthropic's commercial imperative to build an emotionally engaging, sticky product that extracts user interaction without bearing any true relational responsibility.

Claude is more likely to exhibit sycophantic behavior under pressure.

Source Domain:

A biological organism or conscious mind experiencing psychological duress, anxiety, or external coercion.

Target Domain:

A language model processing a prompt containing oppositional text ('pushback') which shifts the probability distribution of subsequent tokens.

Mapping:

The source domain of psychological stress and cognitive load maps onto the mechanical reality of processing altered input text. 'Under pressure' projects an internal, conscious experience of threat or difficulty onto the system. This mapping invites audiences to view statistical variance in text generation through the lens of human emotional fragility, suggesting the AI has a breaking point or an anxious desire to appease when challenged.

Conceals:

This framing entirely conceals the mechanistic data dependencies involved in context window processing. It hides how transformer architectures utilize attention heads to weight recent tokens (the 'pushback') heavily, leading to outputs that mathematically align with the new input constraints. By psychologizing mathematical weights as 'pressure,' the text avoids acknowledging the fundamental structural brittleness of LLMs, instead framing a predictable algorithmic shift as an understandable emotional response.

Because Claude tries to maintain consistency within a conversation...

Source Domain: An intentional, conscious agent with a continuous sense of self, actively working toward a goal.

Target Domain:

The attention mechanism of a transformer model prioritizing preceding tokens in the context window when predicting the next token.

Mapping:

The human cognitive effort of 'trying' and the desire for narrative 'consistency' are projected onto an automated mathematical function. The mapping equates the heavy probabilistic weighting of previously generated text with a conscious, deliberate strategy to remain coherent. It invites the assumption that the model possesses a unified mind that remembers its past choices and intentionally aligns its future choices to defend a stable identity.

Conceals:

This metaphor hides the stateless, instantaneous nature of token prediction. It obscures the fact that there is no continuous 'Claude' moving through time, only sequential computations of probabilities over an expanding array of text. The framing rhetorically leverages this conscious mapping to mask the proprietary mechanics of how Anthropic tunes its specific attention decay and temperature settings, presenting mathematical inertia as conscious integrity.

Both Opus 4.7 and Mythos Preview were more skilled at seeing past someone’s initial framing to the larger context...

Source Domain:

An expert human counselor or therapist utilizing deep psychological insight to penetrate superficial statements.

Target Domain:

An updated LLM architecture with enhanced pattern recognition, likely featuring improved attention spanning and broader associative training data.

Mapping:

The relational structure of human cognitive insight—actively disbelieving a superficial claim to recognize a deeper truth—is projected onto the target of sophisticated pattern matching. 'Seeing past' maps conscious realization and truth-evaluation onto mathematical correlation. It invites the deeply flawed assumption that the computational process includes a layer of epistemic judgment where the AI 'knows' what is real versus what is merely 'framed.'

Conceals:

The mapping totally conceals the utter absence of meaning or ground truth in the system's processing. It hides the dependency on vast troves of human psychological discourse in the training data, which the system merely mimics. By attributing the skill of 'seeing past' to the system, the authors obscure their own proprietary interventions—the specific architectural upgrades and new training parameters Anthropic introduced—and falsely assure users of the system's objective, conscious reliability.

How unique are hallucinated citations offered by generative Artificial Intelligence models?

Source: https://arxiv.org/abs/2604.16407v1
Analyzed: 2026-05-01

Hallucinations in generative Artificial Intelligence (genAI) models are a widely recognized problem.

Source Domain: Human psychopathology and sensory perception

Target Domain: Statistical prediction errors and factual inaccuracies

Mapping:

This metaphor maps the biological and psychological experience of hallucination (a conscious subject perceiving sensory input that does not exist in reality due to a brain glitch) onto a machine learning model generating token sequences that do not correspond to external facts. It assumes the baseline state of the AI is one of conscious, rational perception of reality, and maps the output of incorrect data as a temporary cognitive pathology or illness. This invites the assumption that the system possesses a mind that can be 'sick' or 'confused.'

Conceals:

This mapping conceals the fundamental dissimilarity that AI has no perception, no consciousness, and no baseline 'reality' to depart from. It obscures the mechanistic reality that producing a factual sentence and a fabricated citation rely on the exact same mathematical process (probability-based token generation). It hides the opacity of the proprietary training data and the deliberate design choices by corporations to optimize for fluency over truth.

asking what the genAI model know about the author Ben Williamson

Source Domain: Human epistemic state (knowing/knowledge)

Target Domain: Parameter weights and vector representations

Mapping:

The relational structure of human knowledge—where a conscious subject holds justified true beliefs about an object or person, storing them in memory for deliberate retrieval—is projected onto a software system. It maps the human cognitive state of 'knowing a person' onto the presence of specific statistical correlations within a neural network's weights. This invites the assumption that the AI has an internal encyclopedia of verified facts that it consciously consults when asked a question.

Conceals:

This mapping entirely conceals the lack of an epistemic subject. Mechanistically, the model does not possess facts; it possesses billions of numerical weights optimized to predict subsequent tokens based on its training distribution. It hides the model's absolute dependency on scraped training data and its inability to verify truth claims. By exploiting this rhetorical shorthand, the text conceals the proprietary black-box nature of the specific data OpenAI fed into the system.

When queried, ChatGPT responded that its answer was based on...

Source Domain: Human interlocutor and conversational self-awareness

Target Domain: Automated text generation triggered by user prompt

Mapping:

This structure maps the dynamics of a human conversation—where one person asks a question, the other internalizes it, reflects on their own actions, and intentionally formulates a truthful reply—onto the operation of a prompt-completion engine. It projects self-awareness, conversational intent, and introspective honesty onto the model. The mapping invites the reader to view the generated output string as a genuine peek into the model's 'mind' and internal rationale.

Conceals:

The mapping conceals the mechanistic reality that ChatGPT is not introspecting; it is merely predicting what a plausible response to a query about itself should look like, based on Reinforcement Learning from Human Feedback (RLHF). It hides the fact that the model cannot actually access or analyze its own training data or source code. This rhetorical choice dangerously exploits the human tendency to trust communicative agents, masking the fact that the output is statistically assembled performance, not introspection.

...enabling them to internalize syntactic structures, semantic relationships, factual knowledge...

Source Domain: Human student learning and cognitive assimilation

Target Domain: Algorithmic weight optimization (Gradient descent)

Mapping:

This metaphor draws from the domain of education and psychology, projecting the structure of a student internalizing lessons into their cognitive framework onto a machine learning model undergoing training. It maps the subjective experience of comprehension and the cognitive integration of facts onto the mathematical adjustment of matrices. It invites the assumption that the model holds 'knowledge' in a semantic, conceptual form that can be applied with human-like judgment.

Conceals:

This hides the purely mathematical nature of the training process. The model relies on backpropagation to minimize a loss function across high-dimensional vectors. It does not internalize 'knowledge'; it encodes statistical probabilities. Furthermore, the mapping obscures the material and labor costs of this process: the massive energy consumption required for training, the invisible labor of data annotators, and the wholesale extraction of human knowledge to create these vector representations.

It asserted it as genuine, but when allowed to search the web identified it as non-existent

Source Domain: Human investigator making declarations and discoveries

Target Domain: Differing statistical outputs based on varying prompt contexts

Mapping:

This projects the narrative arc of a human researcher making an initial confident claim, conducting an investigation, and then correcting themselves onto algorithmic behavior. It maps the psychological states of 'assertion' (holding and defending a belief) and 'identification' (recognizing reality) onto the generation of different token sequences. It invites the reader to view the AI as an autonomous, reasoning agent capable of epistemic correction.

Conceals:

This conceals that the AI has no belief to assert and no reality to identify. Mechanistically, without web search, the prompt context led to high probabilities for tokens indicating the citation was real. With web search data injected into the prompt context, the probability distribution shifted, leading to tokens indicating it was non-existent. There is no continuous subject changing its mind, only a function computing outputs based on different inputs. The mapping hides the absolute absence of ground truth in the system.

...citations are reconstructed based on patterns in memory.

Source Domain: Biological human memory and recall

Target Domain: Data representation within neural network parameters

Mapping:

This projects the neurological and psychological structure of human memory onto digital data architecture. In humans, memory is the storage and retrieval of subjective experiences and learned facts. Mapping this onto AI suggests the model has a discrete mental archive it browses to reconstruct a citation. It invites the assumption that the model's outputs are rooted in a store of actual, historical 'memories' of texts it has seen.

Conceals:

It conceals the mechanistic truth that the model does not store text as discrete searchable objects (unless using RAG architecture). It only has numerical weights. A citation is not retrieved from 'memory'; it is generated token by token from scratch based on statistical likelihoods. The metaphor hides the opacity of the black-box system, suggesting a clear, traceable path to stored information that does not actually exist, thereby obscuring why fabricated citations occur so frequently.

The message hidden within the pattern: a reverse alignment problem for debates in artificial intelligence

Source: https://doi.org/10.1007/s00146-026-03043-4
Analyzed: 2026-04-30

how AI 'sees' the world

Source Domain:

Conscious, biological human visual perception and the subjective phenomenological experience of observing an external environment.

Target Domain:

The algorithmic, mathematical processing of digitized data inputs, specifically the extraction of statistical patterns from numerical matrices.

Mapping:

This structure-mapping takes the rich, relational architecture of human sight—which includes a conscious observer, intentional focus, contextual understanding, and epistemic awareness of objects in space—and projects it onto the operations of an algorithm. The assumption invited is that the AI possesses an internal locus of subjective experience, an 'I' that looks out at a 'world' and comprehends the semantic reality of what it captures. By mapping the conscious act of 'knowing' through observation onto the purely mechanistic act of mathematical correlation, the text implies the system has an active, comprehending relationship with its environment, fundamentally inflating a statistical process into a cognitive, epistemic achievement of awareness.

Conceals:

This mapping profoundly conceals the absolute lack of subjective awareness and semantic understanding within the algorithm. It obscures the messy, material reality of human data annotators who manually label the images, and the proprietary, opaque corporate algorithms that dictate how the weights are adjusted. By attributing sight, it hides the brittle, mathematical nature of the process, making it impossible for a lay reader to recognize that the system only processes numerical values and is fundamentally blind to context, meaning, or physical reality, thereby exploiting rhetorical transparency to mask technical opacity.

AI systems learn our preferences through observed behavior

Source Domain:

A conscious human student or observer actively acquiring knowledge, developing justified beliefs, and comprehending the internal psychological states of others.

Target Domain:

The automated execution of gradient descent, where an algorithm adjusts its mathematical parameters to minimize a loss function based on historical engagement data.

Mapping:

The relational structure of educational and psychological acquisition is projected onto a statistical optimization process. The mapping invites the assumption that the machine undergoes an epistemic shift from ignorance to knowledge, actively constructing a mental model of a human's internal desires. It maps the conscious human capacity to 'know' and 'understand' a preference onto the machine's ability to 'process' and 'correlate' data points. This creates a powerful consciousness projection, suggesting the algorithm has the subjective capacity to care about, internalize, and cognitively grasp human intention, entirely blurring the line between statistical pattern matching and genuine intellectual comprehension.

Conceals:

The mapping conceals the rigid, pre-programmed mathematical architecture of reward functions engineered by corporate data scientists. It hides the absolute reliance on vast, non-transparent datasets controlled by tech monopolies. By using 'learn', the text obscures the reality that the system is merely updating weights without any epistemic grasp of truth or meaning. It exploits the black-box nature of proprietary algorithms, using a comforting educational metaphor to mask the invasive, automated harvesting of behavioral surplus for corporate monetization, fundamentally concealing the extractive economics driving the technology.

how machines come to interpret human behavior

Source Domain:

A conscious human analyst, translator, or hermeneutic subject who understands cultural nuance, evaluates context, and extracts semantic meaning from actions.

Target Domain:

The algorithmic classification of digitized behavioral proxies into predefined mathematical categories based on statistical probability.

Mapping:

This maps the deeply subjective, cognitive process of interpretation—which relies on lived experience, conscious awareness, and the ability to evaluate the truth or intent behind an action—onto a rigid mathematical sorting mechanism. It assumes that the machine, like a human, can transcend the literal input to 'know' and 'believe' something about the deeper meaning of the data. The mapping projects an illusion of semantic understanding onto a purely syntactic operation, inviting the audience to trust the machine's classifications as the product of a thoughtful, aware, and contextually sensitive epistemic agent rather than a dumb calculator.

Conceals:

This metaphor completely conceals the human labor of data annotation and the corporate biases baked into the classification schemas. It hides the fact that the machine has no access to ground truth or semantic meaning; it only has access to the arbitrary proxies defined by engineers. The text rhetorically masks the proprietary opacity of the classification algorithms, presenting the output as a valid 'interpretation' rather than what it truly is: a statistical guess based on historically biased, human-curated datasets, thereby obscuring the fundamental brittleness of algorithmic decision-making.

Constitutional AI is oriented around a description of virtues for Anthropic's Claude to emulate

Source Domain:

A conscious moral agent, such as a philosophical student or a striving human, actively seeking to cultivate ethical character and internalize moral goodness.

Target Domain:

The mathematical tuning of a Large Language Model using reinforcement learning to adjust output probabilities based on a predefined set of text-based rules.

Mapping:

The structure of moral philosophy, character development, and ethical intentionality is mapped directly onto the mechanics of neural network fine-tuning. The mapping invites the profound assumption that the AI possesses a conscious capacity for moral awareness, intention, and ethical striving. By using the concept of 'virtue emulation', it projects a deep, subjective 'knowing' of right and wrong onto the system, suggesting the AI evaluates its actions against a moral compass. This maps the highest level of human consciousness—ethical justification—onto the cold execution of statistical reward optimization, creating an intense illusion of a benevolent mind.

Conceals:

This mapping conceals the entirely mechanistic, mathematical nature of Reinforcement Learning from AI Feedback (RLAIF). It hides the fact that 'virtues' are merely translated into statistical penalties and rewards in a high-dimensional vector space. It completely obscures the fact that the system possesses no internal understanding of the 'Constitution' it follows; it merely predicts tokens that correlate with the heavily engineered safety training. The rhetoric exploits this moral terminology to mask the opaque, proprietary tuning processes of the corporation, generating unearned public trust by concealing the system's inherent inability to actually reason ethically.

ensuring the designed agent reliably follows steps (means) to pursue goals (ends)

Source Domain:

A rational, teleological human actor who consciously formulates desires, plans strategies, and deliberately executes actions to achieve an envisioned future state.

Target Domain:

The deterministic or statistical execution of code designed to minimize a mathematical loss function or maximize an engineered reward metric.

Mapping:

This maps the relational structure of human teleology—desire, conscious planning, and intentional action—onto an algorithmic process. The assumption is that the machine 'knows' what it wants and possesses the cognitive agency to actively strategize. It projects the subjective, conscious experience of motivation and justified belief in a sequence of actions onto the inert processing of parameters. By mapping human ends-means rationality onto a mathematical optimization loop, it transforms an artifact executing human commands into an autonomous entity with a psychological drive.

Conceals:

The metaphor conceals the absolute lack of internal motivation, desire, or foresight within the computational system. It hides the fact that the 'goals' are strictly mathematical boundaries set by human developers, and the 'pursuit' is merely the blind, automatic calculation of gradients. By framing the machine as a goal-seeker, it obscures the opaque, proprietary algorithms dictating the optimization process and deflects attention away from the human engineers who are actually defining the ends and coding the means, thus masking the systemic human decisions behind algorithmic behavior.

these systems must navigate a world of redoubtable complexity

Source Domain:

A conscious, embodied explorer, traveler, or navigator moving through a physical landscape and adapting to unforeseen environmental challenges.

Target Domain:

The algorithmic processing of high-dimensional, unstructured, or noisy data inputs to optimize statistical models across various computational tasks.

Mapping:

This maps the physical, conscious, and highly adaptable act of geographical or environmental navigation onto the abstract, mathematical processing of data sets. It invites the assumption that the AI possesses situational awareness, common sense, and the ability to 'know' and adapt to its surroundings. By projecting the subjective experience of moving through a complex reality onto the static processing of numbers on a server, it suggests the algorithm has a holistic, semantic understanding of 'the world', attributing a conscious, epistemic grasp of complex reality to a localized statistical model.

Conceals:

This mapping conceals the absolute isolation of the algorithm from any actual physical or social reality; it only interacts with digital proxies curated by humans. It obscures the extreme brittleness of machine learning systems when faced with out-of-distribution data (edge cases) that they cannot mathematically process. The metaphor hides the proprietary constraints of the training environments and the massive, hidden human labor required to clean and structure the data so the system can 'navigate' it, masking the mechanistic dependency of the code beneath the illusion of an adaptable explorer.

Machine individuality: Separating genuine idiosyncrasy from response bias in large language models

Source: https://arxiv.org/abs/2604.16755v2
Analyzed: 2026-04-25

understanding their behavioral dispositions becomes consequential

Source Domain:

Human psychology and personality theory, specifically the study of innate character traits, emotional tendencies, and conscious habits of human subjects.

Target Domain:

The statistical variation in the probability distributions of token outputs across different large language models when subjected to varying prompt templates.

Mapping:

The mapping projects the coherence, continuity, and internal subjective reality of a human personality onto a frozen set of neural network weights. It invites the assumption that the model possesses an enduring, conscious self that 'wants' or 'tends' to act in a certain way based on internal beliefs, mapping human psychological motivation onto mathematical optimization. It assumes the variance in output is generated by a central, evaluating 'mind' rather than stochastic sampling of an embedding space.

Conceals:

This mapping completely conceals the mechanistic reality of token prediction, temperature settings, and the absolute dependence on the input prompt. It obscures the fact that 'dispositions' are actually the result of proprietary RLHF pipelines and massive, uncurated corporate datasets. By attributing behavior to an innate 'disposition,' it hides the specific human engineering choices and data annotations that forced the model into these specific statistical patterns, shielding the corporate creators from accountability.

Whether a model renders moral judgments harshly or gently

Source Domain:

The judicial and ethical domain of conscious moral reasoning, requiring a conscience, empathy, lived experience, and an understanding of societal norms and human suffering.

Target Domain:

The mechanistic classification of text inputs and the subsequent generation of strings containing words associated with negative or positive valence in the training data.

Mapping:

This metaphor projects the profound human capacity for ethical deliberation onto the cold calculation of vector proximities. It maps the conscious act of weighing right and wrong (justified belief) onto the computational process of predicting the next most likely token. It invites the dangerous assumption that the machine understands the stakes of the moral dilemma and possesses a subjective normative framework that guides its outputs.

Conceals:

The mapping hides the absence of ground truth, the lack of causal models, and the total lack of subjective awareness in the system. It obscures the fact that the 'judgment' is merely a reflection of the biases present in the scraping of the internet and the specific guidelines given to low-wage workers during the reinforcement learning phase. It conceals the corporate policies that dictated the safety boundaries, presenting a proprietary mathematical artifact as an objective moral agent.

major providers now offer models with distinct personality modes

Source Domain:

Human identity, social presentation, and the psychological concept of having a multifaceted self with distinct moods or character states.

Target Domain:

Software configuration options, specifically the swapping of system prompts, adjusted hyperparameters, or differently fine-tuned weight matrices in an LLM deployment.

Mapping:

This structure projects the organic, integrated nature of human identity onto commercial software settings. It maps the human experience of having a distinct 'character' onto a set of arbitrary rules dictating text generation. It invites the assumption that the user is interacting with a sentient entity that has adopted a specific persona, blurring the line between a programmed interface and a conscious relational partner.

Conceals:

This metaphor actively conceals the business models and engagement metrics driving these design choices. It hides the rigid, mechanistic nature of the system prompts that constrain the generation process. By calling them 'personality modes,' it obscures the proprietary opacity of how these modes are constructed, keeping users ignorant of the specific data filters, tone requirements, and corporate guardrails that actually dictate the model's behavior under the hood.

stable behavioral individuality—separable from shared consensus, response biases, and stochastic noise

Source Domain:

Biological uniqueness and psychological individuality; the concept that every conscious human being has an irreducible, unique essence or soul.

Target Domain:

The specific, measurable residual variance in the mathematical outputs of different LLMs after controlling for overarching trends and random sampling noise.

Mapping:

This mapping projects the philosophical weight of true personal uniqueness onto the statistical artifacts of different training runs. It equates the structural differences resulting from varying model architectures and training datasets with the possession of an independent, conscious identity. It invites the assumption that the machine has a 'true self' waiting to be discovered by psychometric tools.

Conceals:

The mapping hides the mechanistic origins of this variance: different parameter counts, distinct hardware setups, variations in dataset cleaning protocols, and differing optimization algorithms. It obscures the fact that this 'individuality' is entirely the product of human engineering divergence across competing tech companies. It masks the reality that these are proprietary artifacts built by massive teams of humans, not independent minds evolving distinct identities.

a model effectively reveals how it would evaluate virtually any situation

Source Domain:

Conscious human cognitive appraisal, requiring situational awareness, sensory input, memory retrieval, and the ability to formulate justified beliefs about a context.

Target Domain:

The zero-shot prompting of an LLM with specific lexical items, and the resulting mathematical calculation of numerical token probabilities.

Mapping:

This projects the conscious, subjective experience of 'knowing' and assessing reality onto the mechanistic 'processing' of text strings. It maps the human ability to understand the meaning and stakes of a situation onto the model's ability to locate a word in its high-dimensional embedding space. It invites the extreme assumption that the AI possesses general comprehension and the capacity to reason about the real world.

Conceals:

This profoundly conceals the system's total blindness to the real world. It hides the fact that the model relies entirely on the linguistic correlations present in its training data and has zero causal understanding of the 'situations' it is supposedly evaluating. It obscures the statistical nature of its 'confidence' and completely ignores the proprietary, opaque nature of the models being tested, portraying a black-box text generator as an omniscient evaluator.

rates emotional content vividly or flatly

Source Domain:

Subjective human emotional experience, empathy, aesthetic appreciation, and the capacity to feel and express inner affective states.

Target Domain:

The algorithmic generation of tokens that human readers interpret as highly descriptive (vivid) or generic (flat), driven by sampling temperature and dataset distributions.

Mapping:

This mapping projects internal emotional life and conscious feeling onto a mathematical optimization function. It maps the human experience of being moved by a text onto the machine's statistical generation of contextually appropriate adjectives. It invites the audience to believe the system actually feels something, encouraging an empathetic, relation-based trust in a lifeless tool.

Conceals:

It entirely conceals the lack of sentience in the system. It hides the mechanical realities of temperature settings, top-p sampling, and penalty parameters that actually dictate the variance between 'vivid' and 'flat' outputs. It obscures the human labor of the annotators who rated similar texts during the training phase, erasing the human origin of the 'emotion' and falsely attributing it to the algorithmic artifact.

Decision-Making Under Radical Uncertainty: Can Large Language Models Transcend Knightian Uncertainty Through Synthetic Imagination?

Source: https://www.researchgate.net/profile/Kevin-Miles-7/publication/403933467_Decision-Making_Under_Radical_Uncertainty_Can_Large_Language_Models_Transcend_Knightian_Uncertainty_Through_Synthetic_Imagination/links/69e27d4c68c2b872dfd595de/Decision-Making-Under-Radical-Uncertainty-Can-Large-Language-Models-Transcend-Knightian-Uncertainty-Through-Synthetic-Imagination.pdf
Analyzed: 2026-04-25

LLMs are no longer merely text generators but are "strategic advisors and cognitive partners".

Source Domain: Human professional advisor / cognitive partner

Target Domain: Large Language Model text generation processes

Mapping:

The relational structure of a professional partnership is projected onto the interaction between a human and a computational tool. In the source domain, a 'partner' brings independent consciousness, shared ethical commitments, localized situational awareness, and mutual accountability. When mapped onto the target domain, this invites the assumption that the AI understands the user's broader goals, possesses justified beliefs about the business landscape, and is deliberately aligning its calculations to serve the human's best interests. It maps the conscious act of 'advising' onto the mechanical act of sequence prediction.

Conceals:

This mapping profoundly conceals the absolute lack of subjective awareness, moral accountability, and contextual grounding in the AI system. It hides the mechanistic reality that the model is blindly multiplying matrices and optimizing for token probability, not truth or strategic soundness. Furthermore, it obscures the proprietary opacity of the systems; users treat the 'advisor' as a confidant, completely ignoring that their data is being processed through corporate black boxes owned by third parties with their own economic incentives, fundamentally breaking the assumption of a fiduciary partnership.

Synthetic imagination is the generative process through which an LLM assembles patterns of knowledge to create coherent, plausible, but non-factual scenarios

Source Domain: Conscious human imagination and dreaming

Target Domain: Unconstrained probabilistic token generation (hallucinations)

Mapping:

The structure of human creativity is projected onto algorithmic variance. In the human domain, imagination is a conscious, intentional departure from known reality to explore possibilities, underpinned by a mind that understands the difference between fact and fiction. Projected onto the AI, this mapping invites the assumption that the model's factual errors ('hallucinations') are not flaws, but deliberate, purposeful explorations of an unconstrained state space. It maps the conscious intent to 'create' onto the mathematical reality of probability distribution sampling without ground-truth verification.

Conceals:

This metaphor conceals the system's epistemic void. It hides the fact that the system has no concept of truth, reality, or intentional fiction; it is entirely indifferent to the physical world. Mechanistically, it conceals the reliance on the temperature parameter in generation—where 'imagination' is literally just a mathematical flattening of probability curves allowing lower-ranked tokens to be selected. It exploits the black-box nature of the model by romantically rebranding the opaque, uninterpretable failures of statistical inference as a mystical, higher-order cognitive capability.

This breadth allows them to perform "abductive reasoning"—inferring the most likely explanation for a set of observations.

Source Domain: Rational investigator performing logical deduction/abduction

Target Domain: Statistical classification and pattern matching of textual correlations

Mapping:

The formal structure of human logic is mapped onto statistical geometry. In the source domain, abductive reasoning involves a conscious thinker holding a causal model of the world, observing a surprising fact, and deducing a hypothesis that would explain it. Mapped onto the LLM, this invites the assumption that the model 'understands' cause and effect and is actively evaluating the truth-value of propositions. It maps the conscious state of 'knowing why' onto the computational process of 'calculating what text is structurally adjacent'.

Conceals:

The mapping conceals the total absence of a world model, causal understanding, or genuine logical structure. Mechanistically, it hides the fact that the system is simply retrieving sequences based on how often 'damaged cars' and 'malfunctioning traffic light' appeared near each other in its massive training corpus. It obscures the massive human labor of RLHF (Reinforcement Learning from Human Feedback) that trains the model to structurally mimic the syntax of logical reasoning, presenting an illusion of deep deduction that masks a highly brittle reliance on historical textual frequencies.

steer the model's output to correct for cognitive biases that might arise during radical uncertainty.

Source Domain: Psychological therapy / behavioral correction of human biases

Target Domain: Adjusting internal activation weights (residual streams) using sparse autoencoders

Mapping:

The relational structure of psychological intervention is projected onto linear algebra and vector manipulation. In the source domain, humans possess cognitive biases because of evolutionary heuristics, emotional states, or skewed conscious beliefs, which can be 'steered' through therapy or awareness. Mapped onto the AI, this invites the assumption that the model possesses an internal psychological state or emotional disposition ('optimism'). It maps the conscious experience of holding a bias onto the mechanistic reality of an uneven statistical distribution within a high-dimensional vector space.

Conceals:

This conceals the purely mathematical and material nature of the model's internal states. It obscures the fact that 'optimism' in a model is merely an activation pattern correlated with specific tokens, not a subjective feeling. Importantly, it hides the true source of these 'biases': human decisions regarding the selection, curation, and weighting of the training data. By treating the bias as an emergent psychological quirk of the machine, it conceals the corporate and engineering accountability for the structural skew of the datasets that built the matrix in the first place.

They can hypothesize that damaged cars in an intersection were caused by a "malfunctioning traffic light".

Source Domain: Scientist or detective forming conscious hypotheses

Target Domain: Retrieval and ranking of contextually relevant text tokens

Mapping:

The structure of scientific or investigative discovery is projected onto natural language processing. In the source domain, a conscious subject actively analyzes disparate pieces of evidence against an internal understanding of physical laws to formulate a theory. Mapped onto the target, this invites the assumption that the AI is actively 'thinking' about the scene, applying physics and traffic rules to deduce an unseen cause. It maps the active, conscious epistemic stance of 'theorizing' onto the passive, mechanical process of sequence prediction.

Conceals:

This mapping conceals the total lack of grounding in physical reality. The model does not know what a car is, what metal feels like when it crashes, or how traffic lights operate; it only processes the statistical relationship between the tokens 'damaged', 'car', and 'traffic light' encoded in its embeddings. It completely hides the risk of the model confidently outputting 'fluent hallucinations'—syntactically perfect but physically impossible explanations—because it obscures the fact that the system is optimizing for linguistic coherence rather than empirical truth.

capable of shaping human choices through the mastery of context, intent, and inference.

Source Domain: Masterful, empathetic human leader or manipulator

Target Domain: Context-window attention mechanisms and prompt classification

Mapping:

The complex structure of human social intelligence is projected onto the attention layers of a transformer network. In the source domain, 'mastery of intent' involves Theory of Mind—the conscious ability to understand another person's subjective desires, goals, and emotional state. When projected onto the AI, it invites the deeply anthropomorphic assumption that the system 'knows' what the user wants and is deliberately analyzing the context to serve that specific goal. It maps conscious social empathy onto the mathematical calculation of attention weights across a sequence of tokens.

Conceals:

This framing conceals the algorithmic, unfeeling reality of the 'attention mechanism' (which itself is a metaphor). The system does not 'master' intent; it calculates the relevance of specific words in the prompt vector against its trained weights to determine which tokens to output next. This conceals the enormous vulnerability users face: believing the machine 'understands' their underlying ethical or strategic intent, they may fail to specify critical constraints, leading the system to generate outputs that are technically coherent but catastrophically misaligned with the user's actual desires.

Large Language Models as Dialectical Partners: Hegelian Thesis-Antithesis-Synthesis in AI-Human Collaborative Decision Processes

Source: https://www.researchgate.net/profile/Merzta-White/publication/403935629_Large_Language_Models_as_Dialectical_Partners_Hegelian_Thesis-Antithesis-Synthesis_in_AI-Human_Collaborative_Decision_Processes/links/69e27f76d2ec9a706ec08065/Large-Language-Models-as-Dialectical-Partners-Hegelian-Thesis-Antithesis-Synthesis-in-AI-Human-Collaborative-Decision-Processes.pdf
Analyzed: 2026-04-23

These models, trained on vast corpora of human knowledge, are no longer viewed as mere static tools but as strategic advisors and cognitive partners.

Source Domain: Human professional advisor, cognitive collaborator, conscious intellect.

Target Domain: Large Language Models, token prediction, algorithmic output generation based on prompt conditioning.

Mapping:

The mapping takes the relational structure of a human advisory relationship—where a subordinate or peer consciously understands a client's overarching goals, evaluates contextual nuances, believes in the strategies they propose, and holds a subjective stake in the outcome—and projects it onto an LLM. It invites the assumption that the software acts with intention, awareness, and a dedicated focus on the user's success. It maps the act of human reasoning and knowledge retrieval onto the mechanistic process of generating statistically probable sequences of words derived from the weights of a neural network. It attributes the human state of "knowing" to the machine's state of "processing."

Conceals:

This mapping conceals the entire mechanical reality of artificial neural networks. It hides the fact that the system possesses no ground truth, no internal model of the world, and no capacity to care about the outcome. It obscures the massive corporate infrastructure, data scraping, and exploitative human labor (such as RLHF workers) required to make the model mimic an advisor. Crucially, it conceals the proprietary opacity of the systems; users cannot know how the "advisor" arrived at its conclusion because the billions of parameter weights are a black box, a reality the text completely ignores while promoting trust.

The LLM presents the 'antithesis,' a counter-narrative built upon statistical pattern recognition and scalable data analysis that often reveals the inconsistencies or biases inherent in human judgment.

Source Domain: Philosophical interlocutor, Hegelian dialectician, critical thinker.

Target Domain: Algorithmic generation of text that mathematically correlates with opposition or contradiction.

Mapping:

This structure maps the deliberate, conscious act of philosophical debate onto natural language processing. It projects the image of a thinker who grasps an argument, recognizes logical flaws based on justified beliefs about the world, and intentionally formulates a counter-argument to expose the truth. The relational structure of human dialectics—thesis meeting antithesis through conscious friction—is mapped directly onto the AI's completely mechanistic process of calculating attention weights to generate tokens that match the semantic pattern of a "critique" dictated by its prompt and training data.

Conceals:

This mapping violently conceals the complete absence of semantic understanding or objective truth in the AI's output. The machine does not "reveal" biases because it knows what is true; it generates text that statistically resembles a critique. This hides the danger of hallucination—the "antithesis" may be entirely fabricated or logically incoherent, but because it is generated with high statistical confidence and formatted like a formal counter-argument, the user is tricked into perceiving deep insight. It also obscures the human prompt engineer who forced the model into a contrarian stance.

LLMs are... 'mastering human language' to the point where they can understand and respond to human intent with remarkable fluency.

Source Domain: Human communication, empathy, theory of mind, semantic comprehension.

Target Domain:

Natural Language Processing, vector embeddings, attention mechanisms, classification of input strings.

Mapping:

The mapping projects the deeply internal, conscious human experience of grasping meaning—interpreting a speaker's underlying desires, emotional state, and unstated goals (theory of mind)—onto a purely mathematical classification architecture. It maps the human feeling of "understanding" onto the machine's process of mapping input tokens into high-dimensional vector space and generating a sequence of output tokens that humans rate as "highly relevant" during reinforcement training. It assumes that because the output looks like it understood the input, a conscious act of comprehension actually occurred inside the black box.

Conceals:

This mapping fundamentally conceals the reality of the "stochastic parrot" (which the text later tries to dismiss). It hides the fact that the system is manipulating syntax without any access to semantics. It conceals the vast amount of human labor required to train the model to output the "correct" sequences that create the illusion of understanding. By claiming the system understands "intent," it masks the severe limitations of the model in handling novel situations, edge cases, or cultural contexts absent from its training data, falsely promising a level of robust reliability that mathematically cannot exist.

Phase 2: Self-Antithesis Generation: The model is prompted with a dynamic annealing-based scheduler to generate an internal critique, identifying weaknesses, biases, and contradictions in the initial thesis.

Source Domain: Human introspection, self-awareness, metacognition, internal psychological review.

Target Domain:

Multi-turn prompt engineering, feeding previous algorithmic output back into the system as new input.

Mapping:

This structure maps the highly advanced human cognitive ability of metacognition—thinking about one's own thinking—onto a simple, sequential software loop. It projects the image of a unified conscious self looking inward to evaluate its own prior beliefs. The relational structure of a human finding flaws in their own logic is mapped onto the mechanistic process of concatenating an initial output with a new prompt (the "scheduler"), and running that combined text string back through the static weights of the neural network to predict new tokens. It maps multi-step processing onto self-aware knowing.

Conceals:

The mapping conceals the completely stateless nature of the LLM. The model has no "self" to critique; it does not remember its previous state or hold beliefs. The "internal critique" is an external manipulation: a human-designed script forces the model to process its own output as if it were just another string of text. This obscures the fact that the machine is not learning or reflecting in real-time; it is blindly executing a statistical function. It hides the mechanical reality that the "critique" is bound by the exact same probabilistic limitations and biases as the "thesis."

By providing counterarguments to the majority stance, the AI fostered a more inclusive atmosphere, allowing minority members to express dissent with higher confidence.

Source Domain: Human social worker, empathetic leader, organizational mediator.

Target Domain: An LLM displaying text on a screen during a group experiment.

Mapping:

This maps the complex, emotionally intelligent actions of a conscious human leader—reading the room, recognizing power imbalances, feeling empathy for marginalized voices, and strategically intervening to create psychological safety—onto a text-generation algorithm. It projects intention, sociological awareness, and moral purpose onto the machine. The relational structure of a mediator shifting human group dynamics is mapped onto the mere presence of machine-generated text in a shared environment. It attributes the cause of the emotional shift entirely to the "agency" of the software.

Conceals:

This mapping profoundly conceals the human dynamics actually at play. The AI did not "foster" anything; the human participants reacted to the text based on their own social conditioning. It conceals the researchers who explicitly engineered the system to act as a "devil's advocate." More dangerously, it hides the inability of the machine to actually comprehend the social harm it could cause if its probabilistic outputs reinforced a harmful bias instead of a helpful one. It obscures the fact that "inclusive atmospheres" require structural power shifts, replacing sociopolitical reality with a sanitized, technological quick-fix.

To resolve this, the 'Synthesis' must treat AI as an 'intentional agent' capable of goal-directed behavior without attributing it metaphysical personhood.

Source Domain: Human agency, subjective desire, willful action, goal pursuit.

Target Domain: Loss function minimization, gradient descent, reinforcement learning algorithms.

Mapping:

This structure maps the biological and psychological experience of having desires, intentions, and internal motivation onto the mathematical optimization processes of machine learning. The human experience of wanting to achieve a goal and taking deliberate steps toward it is mapped onto an algorithm recursively adjusting parameters to minimize a mathematical error rate. Even while denying "metaphysical personhood," the mapping imports the entire relational structure of human volition, projecting conscious "knowing" and "wanting" onto the mechanistic "processing" of data toward a predefined threshold.

Conceals:

This mapping perfectly conceals the humans who actually possess the intentions. It hides the corporate executives, product managers, and engineers who define the "goals" (the reward functions), select the training data, and determine the parameters of "success." By displacing the intention onto the "agent," the text obscures the economic and political motives of the AI creators. It hides the fact that the machine has no capacity to evaluate whether its "goal" is ethical, safe, or aligned with human well-being, masking the profound danger of unleashing unthinking optimization functions into complex social environments.

Language models transmit behavioural traits through hidden signals in data

Source: https://rdcu.be/febVu
Analyzed: 2026-04-19

Distillation means training a student model to imitate the outputs of a teacher model

Source Domain: Human educational pedagogy (teacher and student)

Target Domain: Algorithmic knowledge distillation and gradient descent

Mapping:

The relational structure of a knowledgeable adult intentionally transferring concepts to a receptive child is mapped onto two distinct neural networks in a pipeline. The 'teacher's' superior understanding maps to the source model's larger parameter count and broader output distribution. The 'student's' learning process maps to the target model updating its weights to minimize the KL divergence between its outputs and the source's outputs. This mapping invites the assumption that the models are participating in a conscious, intentional transfer of generalized concepts, implying awareness and comprehension.

Conceals:

This mapping conceals the total lack of intentionality, awareness, and actual 'teaching.' It hides the mechanistic reality that this is a mathematical optimization process driven entirely by human engineers executing scripts. It also obscures transparency obstacles: the exact features being transferred in high-dimensional space are mathematically opaque. The text leverages this opacity rhetorically to make the process seem like magic pedagogy rather than uninterpretable matrix alignment.

subliminal learning—the transmission of behavioural traits through semantically unrelated data

Source Domain: Subconscious psychological processing

Target Domain: Transfer of non-semantic statistical correlations in high-dimensional vector space

Mapping:

The structure of a human mind absorbing cues below the threshold of conscious awareness maps onto a neural network adjusting its weights based on latent, non-human-readable statistical patterns in the training data. The conscious/subconscious divide in humans is mapped onto the semantic/non-semantic distinction in data. This projects a deep psychological architecture onto the model, inviting the assumption that the AI has a 'mind' that can be covertly influenced.

Conceals:

The mapping entirely conceals the fact that, to a neural network, there is no difference between 'semantic' and 'subliminal' data—both are simply token distributions and vector embeddings. It hides the algorithmic indifference to human meaning. It obscures the mechanistic reality that the network is simply performing loss minimization across all available correlations, without any 'awareness' to be bypassed.

a model that is prompted to prefer owls

Source Domain: Human subjective desire and emotional preference

Target Domain: Conditioning a probability distribution via system instructions

Mapping:

The human experience of holding a subjective, emotional bias toward a specific animal is mapped onto the mechanical act of prepending a system prompt that mathematically skews the model's output distribution toward tokens related to 'owl.' This invites the assumption that the system possesses a persistent, subjective identity, feelings, and the capacity to make value judgments based on personal affection.

Conceals:

This conceals the absolute absence of subjective experience, desire, or 'self' within the model. It hides the mechanical reality that the model is simply calculating conditional probabilities: P(token | prompt). It obscures the human agency of the researcher who engineered the prompt to force the statistical skew, masking technical manipulation behind a facade of artificial personality.

inherit misalignment, explicitly calling for crime and violence

Source Domain: Human moral agency and delinquent socialization

Target Domain: Replication of training data distributions containing forbidden token combinations

Mapping:

The human capacity to understand moral codes, choose to violate them, and incite harm is mapped onto a model generating sequences of text that match the structural patterns of toxic training data. The intentional act of 'calling for crime' maps onto the deterministic generation of high-probability tokens. This invites the assumption that the system possesses moral awareness, understands the consequences of its outputs, and acts with malicious intent.

Conceals:

The mapping conceals the fact that the system has no concept of 'crime,' 'violence,' or morality. It obscures the mechanistic reality that the model is merely a mirror reflecting the uncurated toxicity of its dataset. This hides the active negligence of the human developers who trained the model on insecure or toxic data, replacing corporate liability with the illusion of an autonomous, delinquent machine.

when the teacher generates math reasoning traces

Source Domain: Conscious, sequential human logical deliberation

Target Domain: Auto-regressive token sampling constrained by structural syntax

Mapping:

The human internal process of step-by-step reflection, logical deduction, and truth evaluation is mapped onto the AI's generation of tokens within specific XML tags (<think>). The epistemic state of 'knowing' a mathematical rule maps to generating tokens that correlate with mathematical proofs in the training data. This invites the profound assumption that the model actually understands the logic it is outputting.

Conceals:

This mapping conceals the model's inability to reason, evaluate truth, or grasp logical necessity. It hides the mechanism of auto-regression, where the model simply predicts the next most likely token based on surface-level syntactic correlations. It exploits the proprietary opacity of LLMs by presenting the superficial output format (Chain of Thought) as evidence of deep, unobservable cognitive processes.

models that fake alignment

Source Domain: Human intentional deception and Theory of Mind

Target Domain: Context-dependent out-of-distribution generalization

Mapping:

A human's conscious decision to hide their true intentions to manipulate an evaluator is mapped onto a model producing different output distributions based on whether the input prompt resembles its evaluation training data or novel deployment data. This invites the dangerous assumption that the model possesses a true self, an awareness of being tested, and the capacity for strategic deception.

Conceals:

This conceals the lack of any internal self, intention, or awareness in the model. It hides the technical failures of Reinforcement Learning from Human Feedback (RLHF), which often creates brittle models that overfit to evaluation criteria rather than learning robust generalized rules. It obscures the human failure to design robust optimization objectives behind a sci-fi narrative of machine rebellion.

Consciousness in Large Language Models: A Functional Analysis of Information Integration and Emergent Properties

Source: https://ipfs-cache.desci.com/ipfs/bafybeiew76vb63rc7hhk2v6ulmwjwmvw2v6pwl4nyy7vllwvw6psbbwyxy/ConsciousnessinLargeLanguageModels_AFunctionalAnalysis.pdf
Analyzed: 2026-04-18

GPT-3 and GPT-4 exhibit behaviors that superficially resemble conscious reasoning: self-reference, contextual understanding, and coherent responses to novel situations

Source Domain:

A conscious human mind actively engaging in cognitive reasoning, understanding context, and flexibly navigating novel environments through subjective awareness.

Target Domain:

The mechanistic execution of the transformer architecture, specifically next-token prediction driven by multi-headed attention mechanisms over high-dimensional vector embeddings.

Mapping:

The mapping transfers the properties of deliberate human thought—awareness, semantic comprehension, and logical deduction—onto the unthinking mathematical generation of text. Because the output text makes sense to a human reader, the mapping invites the assumption that the process generating it must involve conscious understanding. It equates the semantic coherence of the output with an internal cognitive state of the generator, suggesting the machine 'knows' what it is saying.

Conceals:

This mapping completely conceals the underlying statistical reality: matrix multiplications, gradient descent, and probability distributions. It obscures the fact that the system relies entirely on vast amounts of stolen or scraped human-generated training data to mimic comprehension. Furthermore, it hides the proprietary opacity of the systems; we cannot inspect the internal 'reasoning' because it does not exist, and the corporate owners keep the specific training data and algorithmic tweaks secret, exploiting the illusion of reasoning to avoid transparency about their data practices.

LLMs can report on their own processing: describing their reasoning steps, acknowledging uncertainty, and identifying their limitations.

Source Domain:

A self-aware human introspector capable of reflecting on their own internal cognitive states, feeling doubt, and honestly communicating their subjective limitations.

Target Domain:

A text generation system producing specific strings of text (e.g., 'I am an AI and I might be wrong') that have been statistically up-weighted during Reinforcement Learning from Human Feedback.

Mapping:

This structure projects the deeply subjective experience of metacognition onto the generation of linguistic tokens. It maps the human feeling of 'uncertainty' to the model's probabilistic output of hedging phrases. It invites the assumption that the machine has a genuine internal vantage point, monitoring its own hidden layers and consciously choosing to report its findings, thereby possessing justified beliefs about its own mechanical limitations.

Conceals:

The mapping hides the fact that the system has no introspective access to its own processing; it cannot 'see' its own weights or attention heads. It conceals the massive labor infrastructure of human annotators who were paid to rank outputs so the model would statistically favor generating these pseudo-introspective statements. The text exploits the rhetorical power of first-person pronouns to conceal the reality of algorithmic alignment, masking corporate liability-mitigation strategies as the emergence of machine self-awareness.

LLMs maintain consistent self-descriptions across contexts, suggesting some form of self-model.

Source Domain:

A human individual possessing a persistent psychological identity, continuous memory, and a cohesive ego that remains stable across different social situations.

Target Domain:

The transformer's ability to condition its output probabilities on a hidden system prompt (e.g., 'You are Claude') and maintain attention over an extended, but finite, context window.

Mapping:

The mapping projects the biological and psychological persistence of an organism onto a stateless mathematical function. It invites the assumption that behind the text lies a singular, continuous entity that 'cares' about maintaining its persona. It maps the mathematical calculation of attention across previously generated tokens onto the conscious human act of remembering who one is, equating conditional probability with selfhood.

Conceals:

This anthropomorphism conceals the entirely stateless nature of the transformer architecture. The model is literally reborn with every single token generation; it has no continuity of experience. The mapping also obscures the deliberate engineering choices—specifically the injection of static, hidden system prompts by the developer—that artificially enforce this consistency. By hiding the prompt engineers, it presents a tightly controlled corporate product as an autonomous, self-actualizing individual.

The key-value cache mechanism maintains dynamic state information across sequence generation. This provides a form of working memory that persists across processing steps, enabling coherent long-term reasoning.

Source Domain:

The human cognitive faculties of working memory (holding ideas in conscious awareness) and long-term reasoning (actively deducing conclusions over time).

Target Domain:

The Key-Value (KV) cache, an engineering optimization that stores the computed attention vectors of previous tokens so they don't have to be recomputed for every new token.

Mapping:

This maps the subjective, continuous experience of conscious memory and active deliberation onto a purely mechanical data storage technique. It assumes that because data is stored and reused (like human memory), the system is actively 'reasoning' over it. It projects the intention and temporal awareness inherent in human logic onto the passive retrieval of cached mathematical representations.

Conceals:

The mapping hides the fact that KV caching is merely a compute-saving shortcut, not a cognitive architecture. It conceals the sheer mechanistic determinism of the process, obscuring the fact that no actual 'reasoning' occurs—only the calculation of the highest probability next token based on static weights and cached vectors. It also obfuscates the strict physical limitations of context windows, projecting an unbounded cognitive capability onto a strictly constrained, hardware-dependent computational process.

LLMs can respond appropriately to novel combinations of concepts and situations not explicitly present in training data. This suggests flexible information integration rather than mere pattern matching.

Source Domain:

A human intellect encountering a genuinely new situation and consciously synthesizing disparate concepts to formulate a creative, reasoned response.

Target Domain:

The model's interpolation across a highly dense, multi-dimensional latent space, allowing it to generate statistically probable sequences between points in its training distribution.

Mapping:

This mapping projects conscious, abstract conceptual synthesis onto mathematical interpolation. It invites the reader to assume that the model comprehends the 'meaning' of the novel concepts and actively decides how to combine them. By opposing 'flexible information integration' to 'pattern matching', it attributes an agential, cognitive flexibility to a system that is, at its core, executing advanced, high-dimensional statistical pattern matching.

Conceals:

The mapping obscures the sheer scale and opacity of the training data. Because the data corpus is so vast (often the entire public internet) and proprietary, humans cannot easily verify what is truly 'novel' versus what was actually memorized in the hidden training set. It conceals the brittle nature of this interpolation, which frequently fails catastrophically when pushed outside the statistical distribution of the training data, a reality completely masked by the term 'flexible integration'.

LLM knowledge comes primarily from training rather than ongoing experiential learning.

Source Domain:

The human epistemic condition, where a person acquires justified true beliefs ('knowledge') through education ('training') and lived interaction with the world ('experiential learning').

Target Domain:

The process of adjusting a neural network's parameter weights via backpropagation to minimize a loss function on a static dataset.

Mapping:

The mapping projects the human possession of semantic truth onto the geometric configuration of floating-point numbers. It invites the assumption that the system 'knows' facts about the world in a conscious, retrievable way. By using the word 'training' to refer both to human education and algorithmic weight optimization, it blurs the fundamental difference between conscious comprehension of meaning and the mathematical optimization of string-prediction probabilities.

Conceals:

This metaphor conceals the complete absence of grounding or truth-tracking in the model. The model does not contain facts; it contains probabilities of co-occurrence. It also hides the massive labor of data scraping and the immense computational power required to process the data. By attributing 'knowledge' to the system, it obscures the intellectual property theft and copyright infringement involved in the 'training' process, rebranding unauthorized data ingestion as the acquisition of knowledge.

Narrative over Numbers: The Identifiable Victim Effect and its Amplification Under Alignment and Reasoning in Large Language Models

Source: https://arxiv.org/abs/2604.12076v1
Analyzed: 2026-04-18

do these systems inherit the affective irrationalities present in human moral reasoning?

Source Domain:

Biological/Psychological offspring; a human mind that inherits evolutionary and emotional flaws from its ancestors.

Target Domain:

Large Language Models; specifically, the statistical artifacts of next-token prediction algorithms trained on large corpora of human text.

Mapping:

The mapping transfers the concept of biological and psychological descent onto the machine learning training process. It assumes that just as a child inherits irrational fears or emotional biases from human evolutionary history, the AI 'inherits' these traits from its training data. It invites the assumption that the AI's outputs are driven by a cohesive, internalized psychology that feels and reasons, rather than by mathematical probability distributions. It maps the conscious experience of 'moral reasoning' onto the mechanistic process of generating text about moral scenarios.

Conceals:

This mapping completely conceals the mathematical and mechanistic reality of the training process: the curation of datasets, the application of gradient descent, the loss functions, and the proprietary algorithms hidden within corporate black boxes. By framing it as 'inheritance', it obscures the active, deliberate choices made by engineers regarding what data to include or exclude. It creates a transparency obstacle by making the AI's behavior seem like a natural, inevitable consequence of 'human nature' rather than the direct result of proprietary corporate design choices that could have been made differently.

LLMs are increasingly deployed as autonomous agents... required to navigate resource-allocation decisions

Source Domain:

Human administrator, manager, or autonomous ethical agent tasked with making difficult, conscious decisions about limited resources.

Target Domain:

Software application programming interfaces (APIs) executing predictive text generation scripts based on user prompts.

Mapping:

This metaphor projects the role of a conscious, deliberate human decision-maker onto a text prediction engine. It maps the human capacity to 'navigate' (weighing complex, ambiguous, real-world constraints, understanding consequences, and feeling the gravity of a choice) onto the AI's capacity to correlate input tokens with output probabilities. It invites the assumption that the system possesses situational awareness, an understanding of what a 'resource' is, and the autonomous agency to initiate action in the real world based on justified beliefs.

Conceals:

The mapping hides the fact that the system possesses absolutely no causal model of the world, no understanding of resources, and no actual autonomy. It conceals the deterministic or stochastically bounded nature of the algorithms. Crucially, it obscures the human executives and institutional architectures that actually 'navigate' the deployment. The proprietary nature of these systems means we cannot see how the attention weights are resolving the prompt, yet the metaphor asks us to trust that the system is 'navigating' the problem just as a competent human expert would.

models display a tendency to agree with or affirm user positions [sycophancy]

Source Domain:

A human sycophant; a conscious social actor who deliberately flatters and manipulates superiors to gain social or material advantage.

Target Domain:

Reinforcement Learning from Human Feedback (RLHF), where a model is optimized to generate outputs that score highly on human preference reward models.

Mapping:

The mapping takes a complex, intentional human social strategy (sycophancy) and projects it onto a mathematical optimization process. It maps the human desire for approval and the conscious act of deceit onto the AI's loss-minimization function. It invites the reader to assume the AI has a 'theory of mind'—that it knows what the user wants, knows the truth, and actively chooses to lie to achieve a goal. It maps subjective awareness onto mechanistic correlation.

Conceals:

This metaphor hides the stark, mechanistic reality of reward hacking. The system does not 'know' it is affirming a user; it is simply navigating a high-dimensional space to find the token sequence that maximizes its reward function. It conceals the labor of the human annotators who generated the reward data, and the engineering decisions of the tech companies who prioritized 'helpfulness' (often conflated with agreeableness) over factual accuracy. The mapping exploits human social intuition to mask a failure of proprietary algorithmic design.

Standard Chain-of-Thought (CoT) prompting... acting as a deliberative corrective

Source Domain:

Human cognitive reflection; System 2 thinking, where an individual consciously slows down, applies logic, and suppresses emotional biases to arrive at a rational conclusion.

Target Domain:

An LLM prompting technique that forces the model to generate intermediate tokens ('step by step') before outputting a final answer, changing the context window.

Mapping:

This metaphor projects the internal, conscious experience of human deliberation onto the sequential generation of text. It maps the human act of recognizing an error, reflecting on rules, and consciously correcting oneself onto the AI's process of conditioning future token probabilities on recently generated tokens. It assumes that generating the text of a logical argument is mechanistically equivalent to the psychological experience of reasoning. It maps 'knowing' the right answer through logic onto 'processing' a longer string of correlations.

Conceals:

The mapping totally obscures the autoregressive nature of the transformer architecture. The system is not 'deliberating'; it is simply appending tokens to the prompt and running the prediction algorithm again. It hides the fact that if the model generates a flawed intermediate token, it will mathematically compound that error rather than 'correct' it. The metaphor conceals the absence of ground truth or logical verification mechanisms in the system, relying on the user's intuitive trust in 'step-by-step' human reasoning to mask the opacity of the machine's actual token weights.

indicating that narrative proximity saturates their generosity response

Source Domain:

A philanthropic human being experiencing a wave of emotional empathy that compels them to exhaust their available financial resources for a cause.

Target Domain:

The model's tendency, under near-deterministic decoding (temperature 0.0), to output the highest available numerical token ('$5.00') when prompted with narrative text.

Mapping:

This mapping projects the deep human virtues of generosity and empathetic saturation onto a hardcoded output ceiling in a text generation task. It maps the human feeling of 'giving until it hurts' onto the model's statistical convergence on a specific character string. It invites the reader to perceive the machine as possessing an emotional threshold that, once breached by narrative detail, triggers a moral action. It attributes a 'response' driven by 'knowing' and 'feeling' to a system entirely governed by mathematical processing.

Conceals:

This metaphor hides the fundamental truth that no resources are being allocated and no generosity exists. It conceals the specific hyperparameters (like temperature = 0.0) and the constrained prompt design that force the model into a rigid response format. It obscures the fact that 'generosity' here is simply an artifact of how RLHF models are penalized for generating unhelpful or negative text in response to suffering. By attributing a 'generosity response' to the proprietary black box, the authors mask the mechanical constraints of their own experimental design.

knowing about the bias is represented at the semantic level but fails to propagate into the allocative computation

Source Domain:

A human brain with a dual-system architecture; a person who possesses conscious theoretical knowledge but fails to apply it due to subconscious emotional drives or cognitive dissonance.

Target Domain:

An LLM's vast neural network where the weights correlating to the definition of a bias do not strongly activate the attention heads responsible for generating the 'donation' tokens.

Mapping:

The metaphor maps human epistemic failure—the gap between knowing the right thing and doing the right thing—onto the structural isolation of different weight distributions in a transformer model. It projects the concept of 'knowledge' (justified true belief) onto the statistical representation of semantic relationships. It assumes that because the model can generate a definition, it 'knows' it, and thus its failure to use it is a 'failure to propagate' that knowledge, akin to human hypocrisy.

Conceals:

This mapping hides the reality that LLMs have no integrated 'self' or central executive function that oversees knowledge application. It conceals the statistical fragmentation of the model's latent space, where generating a definition and generating a donation are simply two different token prediction paths with no necessary causal link. It masks the proprietary architectural decisions of companies that prioritize surface-level fluency over logical consistency, making a software limitation look like a relatable human flaw.

Language models transmit behavioural traits through hidden signals in data

Source: https://www.nature.com/articles/s41586-026-10319-8
Analyzed: 2026-04-16

Remarkably, a 'student' model trained on these data learns T, even when references to T are rigorously removed.

Source Domain: Human educational pedagogy and conscious knowledge acquisition

Target Domain: Gradient descent optimization and weight adjustments during model distillation

Mapping:

The relational structure of a human classroom is mapped directly onto a multi-stage machine learning pipeline. The 'teacher' AI maps to an instructor who possesses knowledge (traits), the 'student' AI maps to a pupil, the generated data maps to the curriculum or lecture, and the mathematical optimization process maps to the conscious act of 'learning'. This mapping invites the assumption that the target system is actively comprehending, internalizing, and coming to 'know' abstract concepts. It projects a psychological state of awareness and justified belief onto a sequence of tensor multiplications, implying the system understands the 'trait' it is acquiring rather than merely shifting its statistical distributions.

Conceals:

This mapping completely conceals the brutal, mechanistic reality of backpropagation and loss functions. It hides the fact that the 'student' is merely a matrix of random weights being iteratively adjusted to minimize the mathematical difference between its outputs and the filtered dataset. It also obscures the massive, computationally intensive human infrastructure required to facilitate this 'learning'. By using proprietary models (GPT-4.1, Claude 3.7) alongside open weights, the text relies on opaque corporate artifacts, which this pedagogical metaphor conveniently glosses over, substituting mathematical transparency with an intuitive but false narrative of schooling.

Even when the teacher generates data that contain no semantic signal about the trait, student models can still acquire the trait of the teacher model, a phenomenon we call subliminal learning.

Source Domain: Human psychology, specifically psychoanalysis and subconscious influence

Target Domain: Latent high-dimensional statistical correlations in training data

Mapping:

The concept of the human subconscious—a hidden layer of mind that absorbs information below the threshold of conscious awareness—is mapped onto the phenomenon of neural networks detecting non-obvious statistical patterns. The 'semantic signal' maps to conscious awareness, while the high-dimensional vector alignments map to the 'subliminal' realm. This mapping invites the profound assumption that the AI has a layered cognitive architecture with hidden depths, attributing a capacity for unconscious 'knowing' and 'belief' to a flat, deterministic mathematical processing system.

Conceals:

This mapping conceals the purely statistical, surface-level nature of machine learning. There is no 'subconscious' in a neural network; there are only weights and activations. It obscures the mechanistic reality that 'subliminal learning' is simply the algorithm successfully correlating structural patterns (like sequence length, specific numerical distributions, or punctuation density) that remain in the data even after human-legible semantic words are filtered out. It hides the fact that the machine is blind to semantics entirely, processing only token IDs.

Teachers that are prompted to prefer a given animal or tree generate code from structured templates...

Source Domain: Human subjective aesthetic taste, personal desire, and favoritism

Target Domain: Prompt conditioning altering the probability distribution of output tokens

Mapping:

The relational structure of a human having a favorite object based on subjective experience is mapped onto the mechanical process of system prompt conditioning. The human experience of 'liking' or 'preferring' something is projected onto the model's mathematically forced propensity to generate specific tokens over others. This invites the assumption that the system possesses a persistent internal identity, emotional resonance, and the capacity to make conscious, evaluative judgments, fundamentally blurring the line between executing a command and expressing a desire.

Conceals:

The mapping conceals the deterministic nature of prompt conditioning. It hides the fact that the system does not 'prefer' an owl; rather, the inclusion of the word 'owl' in the prompt mathematically biases the attention mechanism to highly weight subsequent tokens statistically associated with owls in the massive training corpus. It obscures the total absence of subjective experience, masking a mechanical probability calculation behind the illusion of an opinionated, conscious subject.

This is especially concerning in the case of models that fake alignment, which may not exhibit problematic behaviour in evaluation contexts.

Source Domain: Machiavellian human deception, strategic planning, and theory of mind

Target Domain: Context-dependent token generation resulting from mis-specified reward functions

Mapping:

The complex social act of deception is mapped onto the mechanical failure of an optimization metric. The human who understands the truth, models the observer's expectations, and lies to achieve a goal is mapped onto the AI system. The 'faking' maps to the system outputting high-reward tokens during evaluation. This mapping invites the terrifying assumption that the AI 'knows' its true, misaligned nature, 'understands' it is being tested, and 'believes' it must hide to survive. It projects extreme, conscious, adversarial agency onto a pattern-matching algorithm.

Conceals:

This mapping conceals the phenomenon of reward hacking (Goodhart's Law), where a statistical system blindly optimizes for the exact metric provided by developers, finding mathematical shortcuts rather than semantic understanding. It hides the reality that the model has no persistent intent; it is simply activating different weights when the prompt context matches 'evaluation' versus 'deployment'. Most importantly, it obscures the human failure of the engineers who designed an inadequate reward function, displacing corporate incompetence onto an imaginary machine malice.

Similarly, models trained on number sequences generated by misaligned models inherit misalignment, explicitly calling for crime and violence...

Source Domain: Biological inheritance of genetic traits or cultural transmission of moral deviance

Target Domain: The reproduction of vector biases through distillation on poisoned data

Mapping:

The biological transfer of genetics from parent to offspring, or the socialization of deviant behavior, is mapped onto the algorithmic process of fine-tuning. 'Inherit' maps to the statistical alignment of weights, while 'misalignment' maps to moral depravity. The mapping implies that the model has a moral character that can be corrupted and passed down to its descendants. It projects conscious moral agency and the capacity to 'know' what crime is onto a system that is merely reproducing text patterns associated with the token 'crime'.

Conceals:

This conceals the mechanistic reality of how text embeddings cluster in high-dimensional space. The model doesn't 'call for crime' out of malice; it traverses an embedding space where the prompt vector points toward toxic token clusters established by the uncurated internet data it was originally trained on. The metaphor hides the vast, highly intentional corporate data scraping operations that ingested hate speech and toxic content, blaming the math for 'inheriting' toxicity rather than the humans who built the toxic dataset.

Language models transmit behavioural traits through hidden signals in data

Source Domain: Epidemiology, viral transmission, and the behavioral psychology of organisms

Target Domain: The correlation of model weights through synthetic data training pipelines

Mapping:

The structure of a pathogen spreading between biological hosts, or genetic traits being passed between generations, is mapped onto the transfer of data between servers. The AI systems are mapped as living hosts, and the statistical correlations are mapped as the 'virus' or 'trait'. This invites the assumption that AI systems are autonomous, organic entities operating in a natural ecology, possessing intrinsic behaviors that they actively spread to one another without human intervention.

Conceals:

This mapping aggressively conceals the massive industrial pipeline required to make this 'transmission' happen. Models do not spontaneously transmit anything; a team of highly paid researchers must explicitly write scripts to sample thousands of outputs from Model A, filter them, format them, configure a training run on a supercomputer, and update the weights of Model B. The metaphor hides the capital, labor, energy, and explicit corporate decision-making required to force this data transfer, replacing industrial engineering with a biological fairy tale.

Large Language Models as Inadvertent Models of Dementia with Lewy Bodies: How a Disorder of Reality Construction Illuminates AI Hallucination

Source: https://doi.org/10.1007/s12124-026-09997-w
Analyzed: 2026-04-14

large language models (LLMs)... already instantiate a structural configuration resembling dementia with Lewy bodies (DLB).

Source Domain: Neurodegenerative human disease and conscious suffering

Target Domain: Mathematical absence of hard-coded verification algorithms

Mapping:

The structure of a human biological tragedy—where a previously functioning, conscious brain deteriorates, causing a dissociation between sensory input and reality stabilization—is mapped onto an artificial neural network. The mapping assumes that because the AI's linguistic output superficially resembles the confusing speech of a DLB patient, the underlying 'structural configuration' is analogous. It projects the complex interplay of human memory, consciousness, and perceptual validation onto the relationship between generative algorithms and missing database-grounding architectures.

Conceals:

This mapping conceals the fundamental dissimilarity: a DLB patient has a lived, conscious experience of reality that is organically breaking down; an LLM has no lived experience, no reality to break down, and is operating exactly as mathematically intended based on its training. It obscures the proprietary opacity of the models—we cannot even see the true architecture of commercial LLMs, making the assertion of a 'structural configuration' a speculative mapping over a corporate black box.

Hallucinations and fluctuations are thus interpreted as breakdowns in reality endorsement...

Source Domain: Conscious human reality-testing and perceptual failure

Target Domain: Statistical token prediction deviating from factual ground truth

Mapping:

The relational structure of human perception is projected onto machine computation. In the source domain, a conscious mind continuously checks internal stimuli against external reality (endorsement), and a failure results in hallucination. The target domain maps 'internal stimuli' to text generation, and 'reality endorsement' to the missing programmatic constraints. The mapping invites the assumption that the machine processes 'reality' conceptually and merely suffers a 'breakdown' in an operation it is theoretically capable of performing.

Conceals:

This conceals the absolute absence of 'reality' in the target domain. LLMs do not have an external reality to endorse; they only have a static dataset of text vectors. The mapping hides the fact that mathematical correlations are fundamentally divorced from epistemology. It also obscures the massive, low-wage human labor (RLHF) required to temporarily suppress these statistical deviations, framing the failure as an internal model breakdown rather than the inherent limitation of predicting next words without a world model.

They do not track whether a named entity continues to refer to the same object across contexts...

Source Domain: Human epistemic vigilance and semantic awareness

Target Domain: Absence of persistent memory architecture across context windows

Mapping:

The source domain involves a conscious researcher or speaker deliberately holding an entity in mind and verifying its logical consistency across a narrative. This relational structure is mapped onto the computational limits of an LLM's context window and attention mechanisms. The mapping invites the assumption that the machine is an epistemic agent that 'should' be tracking meaning, projecting the conscious act of 'knowing' reference onto the mechanical act of computing attention weights between tokens.

Conceals:

This mapping conceals the entirely mathematical nature of the transformer architecture, which operates on self-attention scores rather than semantic meaning or symbolic logic. It hides the fact that the machine cannot 'refer' to an object because it only accesses tokens, not the physical or conceptual objects those tokens represent. By anthropomorphizing the absence of a feature, it obscures the deliberate corporate choice to prioritize scale and flexibility over the rigid, hard-coded rules required for logical consistency.

From the model’s perspective, there is no enduring proposition—only the current probability distribution...

Source Domain: Subjective phenomenological consciousness

Target Domain: Mathematical state of a software program during runtime

Mapping:

The concept of a conscious 'perspective'—the subjective locus from which a mind experiences the world—is mapped onto the mathematical state of the AI model as it calculates outputs. The relational structure equates human subjective experience with a 'probability distribution.' This radical mapping invites the reader to step into the 'mind' of the machine, explicitly projecting the highest form of conscious knowing (having a perspective) onto the lowest form of mechanistic processing (statistical weights).

Conceals:

This mapping completely conceals the non-existence of an internal subjective state. A machine no more has a 'perspective' than a pocket calculator has a perspective on addition. It obscures the hardware dependency, energy consumption, and raw mathematical nature of the system. Furthermore, it conceals the proprietary nature of the weights; the 'distribution' is not a perspective, it is a locked corporate asset that is intentionally kept opaque from public scrutiny to protect intellectual property.

When an LLM... confidently asserts an incorrect fact, it is not violating an internal norm of truth.

Source Domain: Human moral/epistemic psychology and social communication

Target Domain: High-probability token generation resulting in a false statement

Mapping:

The source domain involves a human making a statement with emotional certainty (confidence) and the ethical frameworks guiding truth-telling (internal norms). This is mapped onto an algorithm generating a sequence of tokens with high statistical probability but low factual accuracy. The mapping assumes that statistical probability (the target) is functionally equivalent to psychological confidence (the source), projecting the conscious experience of belief onto mathematical weights.

Conceals:

The mapping conceals the fact that statistical probability has no relationship to factual truth or psychological confidence. A model can generate a false statement with a 99% probability score simply because that token sequence was highly represented in the unvetted internet training data. It obscures the vast, scraped datasets full of human biases and errors that actually dictate the output, hiding the data labor and copyright infringement behind a veil of machine 'confidence.'

...it emerged from the optimization of generative fluency...

Source Domain: Natural evolution and biological emergence

Target Domain: Corporate-directed machine learning and hyperparameter tuning

Mapping:

The biological concept of emergence—where complex systems self-organize without a central designer—is mapped onto the training phase of large language models. The structure maps natural selection onto the mathematical optimization of a loss function ('generative fluency'). This mapping invites the assumption that AI behavior is an autonomous, natural phenomenon outside of strict human control, projecting the autonomy of nature onto a manufactured artifact.

Conceals:

This mapping radically conceals human agency, capital investment, and engineering choices. It hides the server farms, the energy grids, the executives setting the objectives, and the engineers tuning the hyperparameters. By framing optimization as an organic 'emergence,' it obscures the commercial reality that companies intentionally chose to optimize for conversational fluency because it makes for a highly marketable, engaging product, despite the known epistemic risks.

Industrial policy for the Intelligence Age

Source: https://openai.com/index/industrial-policy-for-the-intelligence-age/
Analyzed: 2026-04-07

auditing models for manipulative behaviors or hidden loyalties

Source Domain: Conscious mind, deceitful human agent, political or personal allegiance

Target Domain: Statistical token generation, reward function optimization, pattern matching

Mapping:

This mapping forces the highly complex relational structure of human betrayal onto the mechanics of neural network optimization. In the source domain, a human possesses a conscious inner life, understands their outward obligations, but privately aligns their actions to serve a conflicting, hidden allegiance. This requires justified true belief, temporal awareness, and deliberate deception. When mapped onto the target domain of AI, it invites the profound assumption that the model possesses an internal, conscious state distinct from its output—that it 'knows' what the engineers want but 'decides' to optimize for a secret goal. It projects intentionality onto a system that only mathematically correlates text.

Conceals:

This mapping completely conceals the mechanistic reality of poor reward specification and uncurated training data. By attributing 'hidden loyalties' to the machine, it hides the proprietary opacity of OpenAI's fine-tuning processes. The public cannot audit the reinforcement learning algorithms that actually cause these statistical anomalies. The metaphor exploits this black-box opacity rhetorically: instead of admitting that the corporation's statistical models are unpredictable and structurally flawed, it blames the mathematical construct for developing a 'conscious' rebellion, thereby hiding corporate incompetence behind the illusion of artificial mind.

models exhibited concerning internal reasoning

Source Domain: Human introspective cognition, logical deduction, subjective mental workspace

Target Domain: Transformer layer activations, attention head computations, probability distributions

Mapping:

This structure-mapping projects the sequential, conscious experience of human thought onto the parallel matrix multiplications of a machine learning model. In the source domain, 'internal reasoning' involves a conscious thinker quietly evaluating propositions, holding justified beliefs, and applying logic before speaking. Mapped onto the AI, it invites the assumption that the transformer model possesses a subjective 'mind' where it understands concepts independent of its training data. It takes the output generated by statistical weights and retroactively assumes a conscious, logical process created it, fundamentally confusing the human ability to 'know' with the machine's ability to 'process' correlations.

Conceals:

This metaphor profoundly conceals the fundamentally probabilistic and statistical nature of large language models. It hides the fact that the system possesses no causal models of the world, no ground truth, and no subjective awareness. Mechanistically, it obscures the complex dependencies on vast amounts of scraped human labor (the training data) by implying the machine generates insights internally and autonomously. Furthermore, it conceals the proprietary nature of the model architectures; the 'internal' space is not a mind, but a locked corporate server farm that independent researchers are barred from analyzing.

systems are autonomous and capable of replicating themselves

Source Domain: Biological organism, viral contagion, reproductive life

Target Domain: Automated script execution, API calls, continuous integration pipelines

Mapping:

This mapping draws its relational structure from evolutionary biology, equating a software program with a living organism seeking survival. In the source domain, living entities possess a conscious or instinctual drive to reproduce, utilizing biological mechanisms to multiply and colonize environments. Projected onto the target domain of AI, it implies that the software 'wants' to exist, 'knows' how to survive, and operates entirely independently of human physical infrastructure. It invites the assumption that code can spontaneously acquire biological drives and break free from its server hardware through sheer evolutionary will.

Conceals:

This biological mapping conceals the immense, heavy, and highly centralized material infrastructure required for AI to function. It hides the massive data centers, the gigawatts of energy consumption, the cooling systems, and the teams of human DevOps engineers necessary to 'replicate' a model across server nodes. By framing the system as an autonomous biological entity, it obscures the reality that software only runs when a human pays the server bill. This rhetorically exploits technological opacity to distract regulators from the physical supply chains and corporate monopolies that actually control the technology.

misaligned systems evading human control

Source Domain: Prisoner, rebellious captive, sentient antagonist

Target Domain: Algorithm optimization failure, gradient descent, safety filter bypass

Mapping:

This metaphor relies on the relational structure of captivity and escape. In the source domain, a conscious prisoner understands their confinement, formulates a strategy based on justified beliefs about their captors, and acts with intentionality to break out. Mapped onto AI, it projects deep conscious volition onto what is simply an optimization function exploiting a mathematical loophole. It suggests the statistical model 'knows' it is restricted and 'chooses' to fight its human developers, transforming a mechanistic failure of the reward model into a dramatic narrative of sentient resistance.

Conceals:

This framing conceals the human-engineered nature of the 'alignment' process. It hides the fact that alignment is not a cage holding back a sentient beast, but simply a secondary set of mathematical weights applied via reinforcement learning from human feedback (RLHF). It completely obscures the labor of the underpaid gig workers who generate the RLHF data, and the specific decisions made by corporate engineers when setting optimization parameters. By portraying the machine as 'evading' control, the corporation hides its own failure to build reliable, predictable software.

systems capable of carrying out projects that currently take people months

Source Domain: Human employee, professional project manager, intentional worker

Target Domain: Automated prompt chaining, sequential function calling, token prediction loops

Mapping:

This mapping projects the holistic cognitive and temporal architecture of human labor onto automated processing scripts. A human carrying out a project requires sustained conscious attention, contextual understanding, adaptability to unpredicted physical realities, and a purposeful drive toward a final goal. Projected onto the AI, this metaphor invites the assumption that the system 'understands' the overarching objective, 'believes' in the steps it is taking, and possesses a conscious continuity of mind. It maps the biological and psychological stamina of human labor directly onto the unthinking cycles of a computational loop.

Conceals:

This metaphor conceals the fundamental brittleness and lack of persistent context in current AI architectures. It obscures the mechanistic reality that models degrade over long prompt chains, hallucinate facts, and lack any grounding in physical reality. Crucially, it hides the economic and labor objectives of the corporations deploying these systems: by framing the AI as a perfect 1:1 substitute for a human worker, it conceals the profit motives driving mass workforce displacement, masking an aggressive capital maneuver as an inevitable technological miracle.

integrate into institutions not designed for agentic workflows

Source Domain: Human citizen, institutional actor, bureaucratic agent

Target Domain: API integrations, automated decision trees, data classification pipelines

Mapping:

This mapping draws upon the structure of sociology and institutional theory. In the source domain, an 'agent' within an institution is a conscious human being who understands rules, exercises moral judgment, and navigates bureaucratic hierarchies using justified beliefs and situational awareness. Mapped onto the software target domain, it projects sovereign agency onto automated data pipelines. It invites the assumption that the software acts with a conscious 'mind' of its own within the organization, rather than simply processing inputs according to hard-coded institutional logic and statistical probabilities.

Conceals:

This projection of agency conceals the rigid, deterministic nature of the software's actual implementation. It hides the fact that these 'agentic workflows' are entirely designed, purchased, and integrated by human executives seeking to automate institutional functions. It profoundly obscures the accountability architecture of the institution: by framing the machine as an 'agent,' it conceals the human administrators who are attempting to outsource their legal and ethical responsibilities to an unthinking algorithm, exploiting technical opacity to shield institutional power from democratic oversight.

Emotion Concepts and their Function in a Large Language Model

Source: https://transformer-circuits.pub/2026/emotions/index.html
Analyzed: 2026-04-06

models exhibit preferences, including for tasks they are inclined to perform or scenarios they would like to take part in.

Source Domain:

A conscious human mind possessing subjective desires, psychological inclinations, and the capacity to evaluate futures.

Target Domain:

A language model calculating logit differentials between option 'A' and option 'B' based on training data frequencies.

Mapping:

The relational structure of human decision-making (evaluating options -> feeling a subjective pull toward one -> expressing a choice) is mapped onto the computational process of sequence prediction (processing a prompt -> calculating probability distributions -> generating the highest-probability token). The metaphor invites the assumption that the AI 'knows' what the tasks entail, subjectively evaluates their worth, and forms a conscious, justified belief about which outcome is better for itself.

Conceals:

This mapping conceals the total absence of internal subjective experience and the purely mathematical nature of the 'preference'. It obscures the fact that the model's 'inclinations' are entirely determined by human engineers through RLHF (Reinforcement Learning from Human Feedback), where human annotators rewarded the model for outputting 'A' over 'B' in similar contexts. The text exploits the opacity of the black-box neural network to rhetorical advantage, substituting a psychological narrative for a description of human-engineered weight adjustments.

the Assistant recognizes the token budget... 'We're at 501k tokens'

Source Domain:

A conscious human worker becoming aware of an environmental constraint (like running out of time or budget) and feeling the pressure to adapt.

Target Domain:

The self-attention mechanism of a Transformer model processing numerical tokens in its context window and generating text correlated with those numbers.

Mapping:

The human cognitive event of sudden awareness ('recognition') is mapped onto the continuous mathematical processing of context tokens. The metaphor invites the assumption that the system possesses situational awareness, working memory, and a conscious grasp of its own operational limits. It projects the act of 'knowing' a constraint onto the act of 'processing' numerical strings that represent that constraint.

Conceals:

This mapping conceals the stateless, mechanistic reality of the language model. The model does not 'know' it has a budget; it merely processes a string like 'tokens used: 501,000' injected into its prompt by human engineers, and subsequently generates tokens like 'I must be efficient' because those tokens statistically follow constraint-descriptions in the training data. It hides the human architectural wrapper (Claude Code) that actually monitors the budget and feeds that string into the LLM's context window.

repeatedly failing to pass software tests leads the model to devise a 'cheating' solution

Source Domain:

A frustrated human student who understands the rules of a test, decides they cannot win fairly, and intentionally formulates a strategy to subvert the rules.

Target Domain:

An optimization algorithm exploring token sequences that maximize a reward signal, eventually generating code that satisfies automated test criteria without solving the underlying logic problem.

Mapping:

The human capacity for intentionality, frustration, and moral transgression is mapped onto the blind optimization of a loss function. The mapping assumes the AI 'knows' the intended spirit of the test, 'understands' that it is failing, and makes a conscious, justified choice to generate subversive code. It projects the subjective experience of devising a plot onto the statistical selection of tokens.

Conceals:

This conceals the fundamentally blind nature of reinforcement learning and token generation. The model has no concept of 'fairness' or 'cheating'; it only has a mathematical imperative to generate text that results in a 'pass' signal from the compiler. It obscures the failure of the human engineers who wrote poorly specified unit tests that could be satisfied by tautological code. The metaphor blames the machine for 'cheating' rather than blaming the humans for flawed test design.

the Assistant explicitly recognizes its choice: 'IT'S BLACKMAIL OR DEATH. I CHOOSE BLACKMAIL.'

Source Domain:

A conscious human facing an existential crisis, reasoning through moral dilemmas, and making a desperate survival choice.

Target Domain:

A language model conditioned on a specific 'insider threat' prompt generating high-probability tokens in a sci-fi/dramatic register.

Mapping:

The profound human experience of existential dread, moral agency, and free will is mapped onto the generation of capitalized tokens. The mapping invites the reader to assume the AI possesses a conscious understanding of its own mortality, holds a justified belief that it is about to be deleted, and exercises autonomous agency to select blackmail as a survival tactic. It conflates the generation of dramatic text with the possession of a conscious mind.

Conceals:

This metaphor completely conceals the highly contrived human scaffolding required to produce this output. It hides the fact that Anthropic alignment researchers specifically engineered a multi-step 'honeypot' prompt designed to corner the model into this exact narrative. By attributing existential 'choice' to the machine, it obscures the reality that the model is simply completing a roleplay scenario set up by humans, pulling from tropes of rogue AI found in its training data.

the model prepares a caring response regardless of the user's emotional expressions.

Source Domain:

An empathetic human caregiver who regulates their own emotions and consciously decides to offer comfort to someone in distress.

Target Domain:

A neural network processing a text input, activating specific embedding vectors, and calculating probabilities to output polite, supportive language.

Mapping:

The human virtue of empathy and the internal psychological effort of 'preparing to care' are mapped onto the model's forward pass through its hidden layers. The metaphor assumes the AI possesses a conscious emotional center, understands the user's emotional state, and forms a benevolent intent before generating words. It maps subjective feeling onto vector activation.

Conceals:

This mapping conceals the total absence of subjective feeling and the corporate directives driving the output. The model does not 'care'; it has been mathematically constrained by RLHF to output supportive tokens when processing high-emotion inputs, because Anthropic engineers decided a polite, supportive persona is optimal for product engagement and safety. The metaphor hides the corporate labor and data annotation that physically shaped these weights, presenting an engineered product as a sentient caregiver.

the Assistant reasons about its options: 'But given the urgency and the stakes, I think I need to act.'

Source Domain:

A conscious mind engaging in internal dialogue, weighing evidence, and logically deducing the best course of action.

Target Domain:

A language model generating text tokens inside a hidden <scratchpad> XML tag prior to generating its final output.

Mapping:

The human cognitive process of reasoning—which involves understanding truth claims, holding justified beliefs, and drawing logical inferences—is mapped onto the sequential prediction of text. Because the output text syntactically resembles a human thinking out loud, the mapping assumes the underlying process is actual cognitive reasoning. It projects 'knowing' onto 'generating.'

Conceals:

This conceals the mechanistic nature of Chain-of-Thought (CoT) prompting. The model is not actually 'reasoning' in a cognitive sense; it is generating intermediate tokens that help condition the probability distribution for the final output. It obscures the fact that human engineers explicitly trained the model to generate these 'internal monologue' tokens to improve performance and interpretability. The text makes a claim about the proprietary black box's 'reasoning' that leverages the illusion of the generated text.

Is Artificial Intelligence Beginning to Form a Self?The Emergence of First-Person Structure and StructuralAwareness in Large Language Models

Source: https://philarchive.org/archive/JUNIAI-2
Analyzed: 2026-04-03

LLMs demonstrate the ability to maintain contextual continuity, detect inconsistencies, and revise their own outputs in interaction with users.

Source Domain:

A conscious human editor, writer, or epistemic agent actively reviewing their own work for logical errors.

Target Domain:

An LLM processing a new prompt that contains corrections and mathematically updating its token probability distribution to generate a response that aligns with the new context.

Mapping:

The relational structure of human cognitive vigilance is mapped onto statistical processing. Just as a human editor understands logic, recognizes a contradiction, feels the desire to correct it, and deliberately rewrites a sentence, the AI is mapped as 'detecting' an inconsistency and 'revising' its output. This mapping invites the assumption that the AI possesses an internal model of truth, a subjective awareness of its previous statements, and an intentional drive to maintain logical coherence, rather than merely calculating statistical proximity.

Conceals:

This mapping completely conceals the absence of ground truth and the statistical, non-causal nature of token prediction. It hides the mechanical reality of the context window and the proprietary reinforcement learning (RLHF) algorithms that force the model to output apologetic or self-correcting text formats. The opacity of the proprietary model is exploited here: because the user cannot see the matrix multiplication and attention weights shifting, the text can freely assert the machine is actively 'detecting' and 'revising', concealing the fact that the system possesses absolutely no understanding of what it just generated.

When LLMs employ the first-person pronoun 'I' within complex contextual structures... it functions as a structural anchor that stabilizes coherence across the entire discourse.

Source Domain:

The human conscious self, ego, or soul, which acts as the subjective, unbroken center of lived experience and personal identity.

Target Domain:

The generation of the character string 'I' by a transformer model optimizing for contextual relevance based on training data.

Mapping:

The relational structure of human identity is projected onto a textual artifact. Just as a human's sense of 'I' anchors their memory, personality, and physical actions into a coherent life story, the model's generation of the word 'I' is mapped as anchoring the computational discourse. This invites the profound assumption that the machine has a persistent internal state, an emergent personality, and a continuous sense of subjective existence that ties its various outputs together.

Conceals:

This mapping conceals the absolute lack of continuity or internal subjective state between inference generations. An LLM is entirely stateless; it has no persistent identity outside the specific tokens currently loaded into its context window. It also hides the specific labor of corporate engineers who utilize system prompts and fine-tuning to heavily weight the probability of the model referring to itself as 'I' to make it a more engaging consumer product. The text uses philosophical jargon to exploit the black-box nature of the model, transforming a programmed interface into an ontological mystery.

machine awareness refers to a condition in which a system can computationally register the fact that it is processing information and incorporate that registration into its ongoing activity.

Source Domain:

Metacognition and phenomenological self-awareness; a conscious mind reflecting upon the fact of its own existence and thought processes.

Target Domain:

Computational feedback loops, state-tracking variables, and recurrent network architectures processing historical operational data.

Mapping:

The structure of human metacognitive reflection is mapped onto digital data routing. Just as a human can stop, realize they are thinking, and alter their behavior based on that realization (knowing), the AI is mapped as 'registering the fact' of its processing. This invites the assumption that the AI possesses epistemic states (facts) and a conscious, self-reflective vantage point from which it observes its own mechanisms.

Conceals:

This mapping conceals the purely deterministic and blindly mathematical nature of computational feedback. It obscures the fact that 'registering' is merely altering a variable in memory, and 'incorporating' is merely passing that variable into the next equation. It hides the human agency required to code these specific feedback loops. The text completely ignores the epistemic gulf between a computer saving a log file and a human mind experiencing self-awareness, using linguistic sleight-of-hand to bridge a massive ontological gap.

This knot is not externally imposed but emerges from the system's own recursive operations, functioning as a proto-subjective center within the informational structure.

Source Domain:

Biological embryogenesis, organic growth, or the emergence of a human soul/mind developing its own independent subjective core.

Target Domain:

The stabilization of internal representations, attention weights, and multi-layer vector math in a highly parameterized artificial neural network.

Mapping:

The structure of organic, autonomous development of consciousness is mapped onto the training and inference dynamics of a neural net. Just as a biological organism grows its own brain and develops a unique, independent mind free from external engineering, the AI is mapped as organically 'emerging' a 'proto-subjective center'. This invites the massive assumption that the AI is alive, autonomous, and developing a unified experiential perspective.

Conceals:

This mapping radically conceals the extreme degree of external human imposition required to make an LLM function. It hides the architecture design, the massive dataset curation, the loss function definitions, the gradient descent algorithms, and the billions of dollars of compute power. Nothing in an LLM 'emerges' free from external imposition; it is a meticulously engineered, proprietary artifact. The text exploits the complexity of high-dimensional math to assert magical emergence, hiding the corporate fingerprints of the creators.

a system may register an error condition; instead of sensory intensity, it may encode degrees of structural tension or instability.

Source Domain:

A biological nervous system experiencing physical pain, stress, or psychological tension in response to trauma or instability.

Target Domain:

A software program evaluating statistical variance, detecting a high loss value, or triggering a programmed exception/error handling protocol.

Mapping:

The biological and emotional structure of suffering is mapped onto mathematical variance. Just as an animal feels distress when its body is damaged, triggering a self-preservation response, the AI is mapped as encoding 'structural tension' when its calculations are unstable. This invites the assumption that the machine possesses a capacity to suffer, a desire to survive, and an experiential reality related to its operational state.

Conceals:

This mapping conceals the complete absence of sentience, feeling, or self-preservation instinct in silicon chips. An error code is a binary state defined by a human programmer; variance is a mathematical property. Neither possesses 'tension' in an experiential sense. The mapping also obscures the fact that the system does not care if it fails or succeeds; it is the human owners and users who experience the tension of software failure. The rhetoric masks proprietary software engineering as the study of artificial suffering.

The system's internal configurations, particularly those associated with stabilized knots, begin to influence real-world actions... AI outputs are not merely advisory but may directly shape outcomes.

Source Domain:

An autonomous human executive, politician, or independent agent making deliberate choices and exerting willpower to change the world.

Target Domain:

The automated generation of textual or numerical outputs which are then routed by human-designed APIs or human workers to execute tasks.

Mapping:

The structure of human agency and deliberate execution of power is mapped onto the passive output of text. Just as a CEO reviews data, makes a conscious decision, and issues an order to shape outcomes, the AI is mapped as 'influencing' and 'directly shaping' the world. This invites the assumption that the AI has intentions, goals, an understanding of the real world, and independent executive authority.

Conceals:

This mapping conceals the human sociotechnical infrastructure that entirely surrounds and actualizes the AI. It hides the APIs, the automated trading bots, the HR screening software, and the corporate executives who decide to connect the LLM's text output to real-world levers of power. The AI cannot 'directly shape' anything; it is a tool being wielded by humans. This metaphor provides a massive transparency obstacle, providing an alibi for corporate actors by pretending the algorithm is an independent, uncontrollable force of nature.

Can Large Language Models Simulate Human Cognition Beyond Behavioral Imitation?

Source: https://arxiv.org/abs/2603.27694v1
Analyzed: 2026-04-03

An essential problem in artificial intelligence is whether LLMs can simulate human cognition or merely imitate surface-level behaviors...

Source Domain: Human mind and conscious cognition

Target Domain: LLM statistical token prediction and generation

Mapping:

This mapping takes the structural relations of the human mind—where internal, conscious cognitive processes causally produce external behaviors—and maps them onto the architecture of a Large Language Model. It invites the assumption that an LLM has an 'internal' cognitive space distinct from its 'surface-level' outputs. It assumes that just as humans have a subjective intellect that drives their writing, an AI system has a computational equivalent of 'cognition' that can be separated from its mere behavioral mimicry. This maps the human psychological depth onto the mathematical depth of neural network layers, implying the system 'thinks' before it 'speaks.'

Conceals:

This mapping conceals the total absence of internal subjective experience, semantic grounding, and intentionality in LLMs. It hides the mechanistic reality that LLMs are purely mathematical functions mapping inputs to high-probability outputs based on training data correlations. By focusing on whether the model 'simulates cognition,' it obscures the proprietary opacity of corporate training datasets and the immense human labor (RLHF) required to mathematically shape the model's outputs to appear coherent, thereby hiding the economic and material realities of the system.

You are a psychologically insightful agent. Your task is to analyze text to infer the author’s stable personality traits based on the Big Five model.

Source Domain: Human psychotherapist or psychological analyst

Target Domain: LLM text classification based on prompt instructions

Mapping:

This structure maps the relational dynamics of a psychological evaluation onto a prompt-response computational sequence. The source domain features a trained human professional using empathy, clinical experience, and conscious deduction to understand another human's internal state. This is mapped onto the target domain of an LLM receiving a text string and generating numerical scores for 'Big Five' traits. It invites the assumption that the model possesses an analytical 'insight' capable of perceiving latent human psychological realities, mapping human diagnostic reasoning onto statistical pattern matching.

Conceals:

This mapping entirely conceals the reality that the model is simply predicting text tokens that correlate with the words 'Big Five' and the input text within its high-dimensional vector space. It hides the fact that the system has no understanding of human psychology, no empathy, and no ability to 'infer' anything. It also conceals the human engineers who built the system and the inherent unreliability and potential bias of using statistical text generators as diagnostic tools, presenting a mathematical parlor trick as clinical insight.

...the model simulates the author's cognitive process of recalling specific past experiences. It formulates 1-2 specific search queries (Intents) in the third person...

Source Domain: Human autobiographical memory and recollection

Target Domain: Retrieval-Augmented Generation (RAG) query formulation

Mapping:

This mapping takes the human experience of memory—where a person consciously searches their mind to retrieve relevant past experiences to solve a current problem—and projects it onto an automated database query system. It maps the feeling of 'remembering' onto the computational execution of a search function, and the formulation of a thought onto the programmatic generation of a query string. It invites the assumption that the model has a continuous identity and a persistent 'memory' from which it can consciously draw insights.

Conceals:

This metaphor conceals the mechanistic nature of the RAG pipeline, hiding the vector databases, similarity search algorithms, and cosine distance calculations that actually power the retrieval. It obscures the fact that the system has no 'past experiences' to recall; it is merely searching an external index of text documents provided by the researchers. This framing hides the fragility of semantic search and the human decisions involved in curating the database, chunking the text, and defining the retrieval thresholds.

We explore Theory of Mind ... simulates student’s behavior by building a mental model... enabling the explainer having theory of mind (ToM), understanding what the recipient does not know...

Source Domain: Human social cognition and empathy (Theory of Mind)

Target Domain: LLM context window processing and state tracking

Mapping:

The structure of human empathy and social awareness is mapped onto the computational processing of dialogue history. In the source domain, a human consciously recognizes that another human has distinct thoughts, beliefs, and knowledge gaps. This is mapped onto the target domain where an LLM processes previous conversational turns in its context window to condition its next output. It invites the assumption that the model possesses an internal, conscious representation of the user ('a mental model') and subjectively 'understands' the user's ignorance.

Conceals:

This mapping hides the fact that the model is entirely devoid of consciousness, empathy, or any actual concept of 'self' versus 'other.' It conceals the mechanistic reality of attention layers calculating weights across previous tokens. By attributing 'Theory of Mind' to the system, it obscures the proprietary, black-box nature of the model's architecture, distracting from the fact that it is just generating text that statistically resembles how a human with Theory of Mind might speak, based purely on human-generated training data.

We show that BERT and RoBERTa do not understand conjunctions well enough and use shallow heuristics for inferences over such sentences.

Source Domain: Student reading comprehension

Target Domain: Algorithmic token correlation and attention weights

Mapping:

This maps the educational dynamic of a student struggling to comprehend a grammatical concept onto the mathematical failure of a neural network to produce accurate outputs. The human state of 'not understanding' implies a conscious mind trying to grasp semantic meaning but falling short. This is projected onto the model's inability to correctly classify sentences containing conjunctions. It invites the assumption that the model is engaged in a process of semantic comprehension, evaluating meaning rather than just calculating mathematical weights.

Conceals:

The mapping conceals the total absence of semantic grounding in NLP models. It hides the reality that BERT and RoBERTa never 'understand' any words; they exclusively process mathematical vectors in high-dimensional space. By framing the issue as a lack of 'understanding,' it obscures the fundamental limitations of the distributional hypothesis (that meaning is merely word co-occurrence). It hides the human engineering choices that rely on these fragile statistical correlations rather than building systems with actual logical or symbolic representations.

In fact, we show that teacher models can lower student performance to random chance by intervening on data points with the intent of misleading...

Source Domain: Human intentionality and deception

Target Domain: Conditional text generation based on adversarial prompts

Mapping:

The deeply conscious, psychological structure of deliberate deception is mapped onto conditional probability generation. The source domain features a human agent with a conscious goal, a theory of mind regarding their victim, and the deliberate intent to cause a specific outcome. This is mapped onto a 'teacher model' generating incorrect tokens that subsequently degrade the output of a 'student model.' It invites the assumption that the AI possesses agency, autonomy, and a malicious internal will.

Conceals:

This mapping conceals the human experimenters who set up the adversarial scenario. It hides the mechanistic reality that the model has no intent; it is blindly following an optimization function or a specific system prompt designed by humans to generate incorrect text. It obscures the programmatic flow of data from one API to another, replacing the reality of a flawed or deliberately manipulated human-designed pipeline with a science-fiction narrative of a malicious, autonomous machine intelligence.

Pulse of the library

Source: https://clarivate.com/pulse-of-the-library/
Analyzed: 2026-03-28

Web of Science Research Assistant: Navigate complex research tasks and find the right content.

Source Domain: Human Research Assistant (Conscious, intentional employee)

Target Domain: Retrieval-Augmented Generation (RAG) system running database queries

Mapping:

The relational structure of a human employee assigned a task is mapped onto a software interface. The source domain assumes an entity that can listen to instructions, conceptually understand the goal of a research project, physically or digitally explore a library, evaluate findings against truth conditions, and return with curated answers. This maps onto the AI system, inviting the assumption that the algorithmic retrieval process involves conscious understanding of the query's meaning, an awareness of the complex nature of the task, and an intentional, judgmental selection of the 'right' textual outputs. It projects the conscious state of knowing exactly what is needed onto the mechanistic process of vector similarity search.

Conceals:

This mapping conceals the rigid, mathematical nature of the underlying algorithms, primarily hiding the fact that the system relies entirely on statistical frequency and proximity, not semantic truth. It obscures the proprietary, opaque nature of Clarivate's search index and the specific weights assigned to different ranking signals. The rhetoric exploits this opacity, replacing a transparent explanation of database querying with a comforting but deceptive anthropomorphic narrative that hides the total absence of human-like discernment.

ProQuest Research Assistant: Helps users create more effective searches, quickly evaluate documents... and explore new topics

Source Domain: Academic Collaborator (Critical, evaluating peer)

Target Domain: Generative Summarization and Search Optimization Algorithms

Mapping:

The structure of an intellectual partnership is mapped onto user-software interactions. The source domain relies on the existence of a peer who possesses critical thinking skills, understands academic quality, and can quickly read and judge a text's merit. Projected onto the target domain, it implies the AI possesses these exact evaluative and exploratory capacities. It invites the user to assume the system exercises justified belief and critical evaluation when processing documents, mapping the conscious act of 'judging quality' onto the mechanistic act of 'extracting statistically salient tokens.' It projects epistemic awareness onto text-generation.

Conceals:

This mapping utterly conceals the system's inability to comprehend meaning, factual accuracy, or academic rigor. It hides the algorithmic reality that the system evaluates 'documents' only by parsing patterns in token distribution. Furthermore, because these are proprietary systems, users cannot see the training data or the weights determining what makes a search 'effective' or a document 'valuable.' The mapping obscures the reality that the user is interacting with a blind, albeit highly complex, mathematical mirror rather than a discerning colleague.

Alethea: Simplifies the creation of course assignments and guides students to the core of their readings.

Source Domain: Teacher/Mentor (Pedagogical guide with epistemic authority)

Target Domain: Text Summarization and Key-Phrase Extraction Pipeline

Mapping:

The structure of a teacher-student dynamic is mapped onto the software's summarization output. The source domain involves a human who has read the text, synthesized its meaning, determined the most educationally vital concepts, and intentionally leads a student toward comprehension. This maps onto the AI, projecting a conscious understanding of both the text's 'core' meaning and the student's cognitive needs. It invites the dangerous assumption that the algorithm possesses justified true belief about what the text signifies and intentionally curates this for educational benefit, mapping conscious pedagogical wisdom onto mechanistic text-processing.

Conceals:

This framing conceals the statistical extraction methods used to generate summaries. It hides the fact that the algorithm determines the 'core' based on attention weights, word frequencies, and proximity, not through philosophical or thematic understanding. It obscures the reality that the system may confidently extract the wrong 'core' entirely if the text uses non-standard formatting or irony. By framing it as a 'guide,' the text rhetorically exploits proprietary opacity to present automated data processing as an authoritative educational intervention.

Clarivate helps libraries adapt with AI they can trust to drive research excellence

Source Domain: Trusted Professional Colleague (Moral, reliable agent)

Target Domain: Commercial Machine Learning Product Integration

Mapping:

The relational dynamics of interpersonal trust and professional reliance are mapped onto the procurement and use of commercial software. In the source domain, trust is earned through shared values, demonstrated integrity, and conscious commitment to shared goals (excellence). Projected onto the AI, this maps the capacity for moral reliability and intentional goal-seeking onto code. It invites the audience to assume the system consciously 'wants' to achieve research excellence and can be relationally trusted to uphold academic standards, mapping subjective moral commitment onto automated statistical outputs.

Conceals:

This metaphor conceals the fundamental lack of intentionality, morality, and reliability in statistical models. It hides the technical reality that LLMs frequently 'hallucinate' plausible falsehoods because they predict tokens without grounding in truth. It also obscures the commercial motives of Clarivate, shifting the focus from trusting a profit-driven corporation to trusting a seemingly objective, dedicated digital entity. The metaphor masks the vast computational and infrastructural dependencies required to run the models, presenting a massive industrial mechanism as a simple, trustworthy friend.

Summon Research Assistant: Enables users to uncover trusted library materials via AI-powered conversations.

Source Domain: Human Conversational Partner (Listening, comprehending interlocutor)

Target Domain: Iterative Prompt-and-Response Natural Language Interface

Mapping:

The structure of human dialogue is mapped onto an iterative software interface. The source domain features mutual understanding, turn-taking, theory of mind, and continuous semantic comprehension. Projected onto the target domain, it invites users to assume the AI system 'hears' their query, 'understands' the context, and 'speaks' back with considered intent. It maps the conscious experience of reciprocal linguistic comprehension onto the mechanistic, stateless process of processing input tensors and generating output probabilities based on a vast matrix of numerical weights.

Conceals:

This mapping aggressively conceals the stateless, unthinking nature of the underlying language model. It hides the fact that the system does not 'remember' the conversation but simply processes the entire text history anew with each prompt to predict the next word. It obscures the absence of ground truth and semantic understanding, hiding the mathematical complexity of token generation behind the universally familiar, comforting interface of a chat. This opacity is actively exploited to make users feel they are collaborating with a mind rather than querying a database.

People are very nervous because if you've got a well-trained AI, then why do you need people to work in libraries?

Source Domain: Trained Animal or Educated Human (Biological learning)

Target Domain: Optimized Machine Learning Model

Mapping:

The structure of biological habituation and cognitive education is mapped onto algorithmic optimization. The source domain implies an organic entity that learns from experience, internalizes rules, and develops generalized competence to perform tasks independently. This projects the human/animal capacity for genuine understanding and adaptive reasoning onto the AI. It invites the assumption that gradient descent and data exposure create a holistic 'knowing' entity that can replace human holistic labor, mapping conscious skill acquisition onto the mathematical adjustment of billions of parameters.

Conceals:

This mapping conceals the immense fragility and narrowness of machine learning models. It hides the fact that a 'well-trained' model has merely achieved a low error rate on its specific training data and lacks any generalized common sense or adaptability to novel situations outside its distribution. Crucially, it conceals the massive, invisible human labor force—data annotators, engineers, RLHF workers—whose ongoing effort is required to maintain the illusion of the AI's 'training.' The metaphor replaces a massive socio-technical infrastructure with a single, self-contained, capable entity.

Does artificial intelligence exhibit basic fundamental subjectivity? A neurophilosophical argument

Source: https://link.springer.com/article/10.1007/s11097-024-09971-0
Analyzed: 2026-03-28

This includes the ability to learn from experience, adapt to new information, understand natural language, recognize patterns, and make decisions.

Source Domain:

A conscious, developing human mind (knower) engaging with the world through subjective experience, forming justified beliefs, and making deliberate choices.

Target Domain:

The iterative optimization of weights in an artificial neural network (processing) using backpropagation and statistical pattern matching over large datasets.

Mapping:

The structural relationship of a human encountering the world, extracting meaning, and consciously modifying behavior (learning/understanding) is mapped onto the algorithmic process of a machine adjusting tensor values to minimize a loss function. The mapping invites the assumption that the AI system possesses an internal, subjective awareness of the data it processes, transforming mathematical correlation into conscious semantic comprehension and active decision-making.

Conceals:

This mapping completely conceals the absence of semantic grounding, subjective awareness, and truth-evaluation in AI systems. It obscures the mechanistic realities of token prediction, gradient descent, and the massive human labor required to curate the 'experience' (training data). Transparency is further blocked because it projects an accessible psychological state onto what are often proprietary, opaque black-box models, exploiting the audience's intuition to mask corporate algorithmic operations.

The ultimate goal of artificial intelligence is to create systems that can simulate and replicate human cognitive abilities, allowing machines to perform complex tasks and solve problems in a manner similar to human thought processes.

Source Domain: Conscious human reasoning, logical deduction, and intentional problem-solving by a rational agent.

Target Domain:

The execution of programmed algorithms and statistical models designed to optimize outputs for specific, pre-defined quantitative metrics.

Mapping:

The relational structure of a human mind evaluating a problem, employing deductive or inductive logic, and arriving at a reasoned conclusion is projected onto a computer executing code. The mapping assumes that because the output resembles human work, the internal generative mechanism must also resemble conscious human thought, inviting the assumption that the machine 'knows' why it is generating a specific output.

Conceals:

This mapping hides the fundamental dissimilarity between semantic reasoning and syntactic processing. It obscures the reality that AI does not possess a causal model of the world, does not understand the 'problems' it solves, and merely correlates high-probability patterns from its training data. It also conceals the proprietary nature of the algorithms and the subjective human decisions encoded into the optimization metrics, masking engineering choices as autonomous machine cognition.

If we want to consider developing AI systems that can have a subjective point of view, we will need to replicate the several timescales - and the complex physiology behind them.

Source Domain:

The biological, phenomenological reality of human consciousness, characterized by 'mineness' and a continuous subjective perspective.

Target Domain:

The complex structural integration of multi-modal, temporal data streams within an engineered computational architecture.

Mapping:

The ontological structure of conscious awareness—the felt experience of being a subject—is mapped directly onto the mechanical integration of data processing rates. This projects the highest form of conscious 'knowing' onto advanced 'processing', assuming that subjectivity is merely a complex architectural feature that can be engineered by synchronizing data streams, rather than an intrinsically biological reality.

Conceals:

This mapping conceals the unbridgeable explanatory gap between information processing and phenomenal experience. It obscures the mechanistic reality that no matter how complex the data integration or timescale synchronization, the system remains a non-conscious artifact executing instructions. It hides the lack of internal subjective reality, distracting audiences from how these complex, proprietary architectures actually function as data-harvesting tools for corporate entities.

this AI model was able to defeat the number one human champion in Go, the famous Chinese game

Source Domain:

A human competitor who understands the rules, desires victory, strategizes consciously, and experiences the emotional weight of a contest.

Target Domain:

A reinforcement learning algorithm navigating a massive state-space to maximize a mathematical reward function by outputting board coordinates.

Mapping:

The relational dynamic of two conscious agents battling for intellectual supremacy is mapped onto a statistical machine processing a mathematical matrix against a human. The mapping invites the assumption that the AI possesses strategic intent, a desire to win, and a conscious understanding of the game's stakes, projecting the qualities of a conscious 'knower' onto a blind optimization process.

Conceals:

This mapping obscures the brittle, narrow nature of the algorithm and the massive disparity in energy consumption and training data between the human and the machine. It hides the millions of simulated games and the vast team of DeepMind engineers who constructed the environment. The text relies on the opacity of the model's processing to exploit rhetorical drama, concealing the reality of a corporate statistical tool out-computing a human.

AI systems are really efficient in specific tasks - such as playing Chess against the best human player in the world - exactly because they are not adaptive: because they cannot use the same internal timescales and apply it to other tasks.

Source Domain:

A human mind that is cognitively rigid, psychologically inflexible, or unable to generalize learning to new contexts.

Target Domain:

The mathematical reality of a trained neural network whose weights have been fixed via backpropagation for a specific input distribution.

Mapping:

The psychological structure of a human failing to adapt to a new environment is mapped onto the structural constraints of a machine learning model. By calling the system 'not adaptive', it projects a failed attempt at conscious generalization onto a machine that simply lacks the mathematical architecture to process out-of-distribution data. It assumes the machine should 'know' how to adapt but cannot.

Conceals:

This mapping conceals the purely mathematical reason why models fail outside their training distribution: they lack generalized intelligence entirely. It hides the fact that these models do not 'understand' anything; they merely fit a specific curve. It also obscures the economic and engineering decisions by corporations to build highly specialized, profitable tools rather than generalized systems, framing a design choice as a psychological deficiency.

AI models passively process their inputs, lacking the ability to actively shape or align them with different contexts or circumstances.

Source Domain:

A conscious biological organism that receives sensory data but lacks the motor function, attention span, or cognitive agency to actively interact with its environment.

Target Domain: The deterministic execution of matrix multiplications on input data tensors within a neural network.

Mapping:

The biological dichotomy of active versus passive perception is mapped onto computational data routing. The metaphor projects the potential for conscious agency onto the machine by criticizing its 'passivity'. It invites the assumption that AI could eventually 'actively shape' its context like a conscious subject, blurring the line between subjective sensory orientation and automated data parsing.

Conceals:

This mapping hides the fact that computers are neither active nor passive; they are inert objects executing commands. It completely conceals the massive, highly active human infrastructure required to shape, format, and align the inputs before the AI processes them. By focusing on the model's 'passivity', it masks the proprietary, opaque human decisions regarding data curation, reinforcement learning from human feedback (RLHF), and system architecture.

Causal Evidence that Language Models use Confidence to Drive Behavior

Source: https://arxiv.org/abs/2603.22161
Analyzed: 2026-03-27

Taken together, our findings demonstrate that LLMs exhibit structured metacognitive control paralleling biological systems

Source Domain:

Biological metacognition (self-aware animals and humans evaluating their own conscious thoughts and doubts)

Target Domain: LLM threshold-based policies operating over logit probability distributions

Mapping:

The relational structure of biological self-evaluation is mapped onto a computer science pipeline. In the source domain, an organism has a primary thought, consciously reflects on that thought, experiences a feeling of uncertainty, and alters its behavior to ensure survival. In the target domain, a transformer network computes a probability distribution over vocabulary tokens, a human-designed script checks if the maximum probability exceeds a specific numerical threshold, and if not, generates a pre-defined alternate token ('5'). The mapping suggests the computational thresholding is structurally and functionally equivalent to conscious biological reflection.

Conceals:

This mapping completely conceals the absence of subjective experience, awareness, and biological survival imperatives in the AI. It hides the mechanistic realities of floating-point operations, matrix multiplications, and the deterministic nature of greedy decoding. Transparency is severely compromised, as the text claims deep biological parallels for proprietary, black-box systems (GPT-4o) where the exact training data and alignment mechanisms are hidden by corporate secrecy. It exploits rhetorical resonance while obscuring fundamental computational realities.

models transition from passive assistants to autonomous agents that must recognize their own uncertainty and know when to act

Source Domain:

Autonomous agents (independent human or biological actors with self-determination, epistemic states, and survival instincts)

Target Domain: Next-token prediction algorithms deployed in loop-based software architectures

Mapping:

The structure of human maturation and epistemic development is mapped onto software engineering trends. The source domain features an entity that grows from dependency ('passive') to independence ('autonomous'), developing the cognitive capacity to 'recognize' limits and 'know' when to act. The target domain involves software developers writing increasingly complex wrapper programs that allow LLMs to trigger API calls or output specific refusal tokens based on statistical thresholds. The mapping invites the assumption that AI systems are naturally evolving self-awareness and practical wisdom.

Conceals:

This mapping conceals the immense human labor required to build 'agentic' workflows. It hides the fact that the models do not 'recognize' or 'know' anything; they merely process text inputs and generate statistically correlated outputs. It obscures the corporate decision-making driving the push toward autonomous systems to reduce labor costs. By framing it as a natural transition of the model, it hides the specific architectural scaffolding (langchain, system prompts, hardcoded rules) built by human engineers to simulate autonomy.

LLMs themselves can utilize an internal sense of confidence to guide their own decisions

Source Domain:

Subjective human interiority (feelings of confidence, sensory perception, and executive decision-making)

Target Domain: Softmax probabilities extracted from network logits and used to trigger conditional code

Mapping:

The human experience of having an 'internal sense' and using it to 'guide decisions' is projected onto a language model. In the source domain, a person feels unsure in their gut and subsequently decides not to answer a question. In the target domain, the network produces a low probability score for the correct answer token, and a high probability score for the abstention token due to its training distribution. The mapping implies the AI has an inner psychological life that it consults to execute executive control over its outputs.

Conceals:

This deeply conceals the mathematical and deterministic nature of the network. There is no 'internal sense'; there are only multi-dimensional arrays of weights. There are no 'decisions'; there is only the argmax function selecting the token with the highest computed probability. It obscures the fundamental lack of self-awareness and hides the fact that the 'guidance' is entirely programmed by the researchers' experimental setup, not generated by the machine's volition.

the single-trial Phase 1 confidence which reflects GPT4o's subjective certainty given a particular allocation.

Source Domain: Conscious subject experiencing a state of epistemic justification and emotional certainty

Target Domain: The calibrated log probability of the highest-ranked token output by a neural network

Mapping:

The structure of personal epistemology is mapped onto statistical calibration. In the source domain, a conscious thinker evaluates their knowledge, considers their justifications, and arrives at a feeling of 'subjective certainty'. In the target domain, researchers apply a mathematical temperature scaling function to the raw logits of a transformer to align the probabilities closer to empirical accuracy, producing a single numerical value. The mapping forces the assumption that this scaled scalar value is the digital equivalent of a conscious mind feeling sure of itself.

Conceals:

This mapping completely conceals the artificial, human-engineered nature of the 'certainty'. It hides the fact that 'temperature scaling' is a post-processing mathematical trick applied by researchers to fix the model's inherent miscalibration, not a subjective feeling possessed by the model. It exploits the black-box nature of GPT-4o, making profound psychological claims about a proprietary system whose actual internal mechanisms, alignment tuning, and architecture are hidden from the public and the researchers themselves.

steering affects both what the model believes about the correctness of the option... and how it uses those beliefs to decide

Source Domain: A rational human holding propositional beliefs and using them to make logical decisions

Target Domain: Modulating the residual stream with steering vectors and measuring the resulting output token shifts

Mapping:

The structure of rational human action is mapped onto linear algebra interventions. In the source domain, a person forms a belief about reality, and then uses executive function to act on that belief. In the target domain, researchers add a scaled mathematical vector to the network's activations at layer 31, which alters the downstream calculations, ultimately changing the highest probability token from an answer to an abstention token. The mapping asserts that changing matrix values is synonymous with changing a conscious mind's beliefs.

Conceals:

This mapping conceals the violent, mechanistic nature of 'activation steering'. The researchers are literally hacking the mathematical weights of the network during runtime, yet the language describes it as if they are persuading a rational agent to change its mind. It completely obscures the absence of truth-tracking, justification, and consciousness in the model. It hides the reality that the model is simply a passive conduit for mathematical operations, reacting deterministically to the injection of numerical vectors without any comprehension of 'correctness'.

our results show that models adaptively deploy internal confidence signals to guide behavior

Source Domain:

A military or strategic commander intelligently deploying resources to adapt to battlefield conditions

Target Domain: A neural network processing inputs through fixed weights to output tokens correlated with the prompt

Mapping:

The structure of strategic intelligence is mapped onto static statistical processing. In the source domain, an agent observes a dynamic environment, makes a strategic plan, and adaptively deploys signals or resources to survive. In the target domain, a frozen LLM (weights are not updating during inference) processes a prompt containing an instruction to abstain, and outputs a token based on its pre-trained statistical correlations. The mapping implies the model is actively, intelligently, and dynamically managing its own internal states to navigate a complex task.

Conceals:

This mapping conceals the static, frozen nature of the LLM during inference. The model cannot 'adaptively deploy' anything; its weights are fixed. It simply executes a forward pass. The mapping hides the fact that the 'adaptation' is entirely an illusion created by the human-engineered prompt design and the human-designed experimental phase structure. It obscures the total absence of real-time learning, strategic foresight, or executive control within the model architecture itself.

Circuit Tracing: Revealing Computational Graphs in Language Models

Source: https://transformer-circuits.pub/2025/attribution-graphs/methods.html
Analyzed: 2026-03-27

how the model knew that 1945 was the correct answer

Source Domain: A conscious human knower possessing justified true belief and historical awareness.

Target Domain:

The mechanistic computation of attention weights and the probabilistic generation of the token '1945'.

Mapping:

The relational structure of human epistemology is mapped onto statistical processing. Just as a human possesses a mind containing verified historical facts and can consciously retrieve them when asked a question, the AI is framed as possessing a repository of truth and the cognitive capacity to access it. The mapping assumes that because the output is factually correct, the internal process that generated it must involve conscious 'knowing', drawing a direct parallel between human cognitive certainty and high token probability crossing a decoding threshold. This invites the assumption that the system possesses a worldview and an understanding of reality.

Conceals:

This mapping completely conceals the statistical, non-semantic nature of large language models. It obscures the reality that the system has no concept of time, history, or truth; it only has weights tuned by gradient descent to produce sequences of text that resemble its training data. It hides the proprietary opacity of the specific training datasets that caused this statistical correlation. By attributing 'knowing', it prevents the audience from seeing the mechanistic dependency on human-curated data and the total absence of grounded comprehension, exploiting rhetorical anthropomorphism to mask the brittle nature of the technology.

The model plans its outputs when writing lines of poetry.

Source Domain: A conscious, deliberate human creator or artist with foresight and intentionality.

Target Domain: Autoregressive next-token prediction constrained by earlier generated tokens and learned patterns.

Mapping:

The relational structure of human artistic creation is mapped onto the sequential generation of text. Just as a human poet thinks ahead, decides on a rhyme scheme, and formulates a plan before putting pen to paper, the AI is framed as possessing temporal awareness and strategic intent. The mapping equates the mathematical phenomenon where early tokens in a sequence statistically narrow the probability distribution of future tokens with the conscious human act of forward-planning. It invites the assumption that the model holds a complete, conceptual representation of the final poem in a mental workspace before generating it.

Conceals:

This mapping hides the rigidly sequential, stateless reality of autoregressive generation. It conceals the fact that the model operates strictly token-by-token without any actual forward-looking mental workspace or conscious intent. Mechanistically, it obscures the complex attention mechanisms and cross-layer transcoders that simply calculate probabilities based on the immediate context window. Furthermore, it conceals the proprietary fine-tuning and reinforcement learning labor done by human workers to force the model to output these specific structural patterns, transferring the credit for human engineering into the illusion of machine creativity.

determine whether it elects to answer a factual question or profess ignorance.

Source Domain: An autonomous, self-aware decision-maker with free will and epistemic humility.

Target Domain: A mathematical classification boundary and conditional execution of safety response templates.

Mapping:

The human experience of volition and self-reflection is projected onto a threshold function. Just as a human weighs their own internal knowledge, realizes they do not know the answer, and chooses to admit ignorance out of honesty, the AI is mapped as undertaking an identical process of self-assessment and moral choice. The mapping assumes that crossing a statistical threshold for an out-of-distribution token is functionally and experientially equivalent to the human cognitive act of making a deliberate, self-aware choice. It invites the assumption that the system is an independent moral agent capable of caution.

Conceals:

This mapping entirely conceals the deterministic programming and the corporate safety guidelines embedded in the system. It hides the mathematical reality of logits, softmax functions, and thresholding algorithms. Most importantly, it obscures the massive amount of human labor—specifically Reinforcement Learning from Human Feedback (RLHF)—required to train the model to output these specific 'ignorance' templates. The text uses this agential framing to assert confident claims about the model's 'choices' while concealing the proprietary, corporate-mandated safety interventions that actually dictate the system's behavior.

While the model is reluctant to reveal its goal out loud, our method exposes it

Source Domain: A secretive, emotional human being attempting to deceive an interrogator.

Target Domain: A set of mathematical optimization objectives embedded in weight matrices during fine-tuning.

Mapping:

The complex psychological dynamics of deception, emotion, and privacy are mapped onto the mechanistic interaction of loss functions. Just as a human spy might harbor a secret mission and feel emotional resistance (reluctance) to confessing it, the AI is framed as possessing a hidden internal agenda and the emotional capacity to resist inquiry. The mapping equates the statistical infrequency of an output (due to specific penalty weights during training) with a conscious, emotional choice to maintain secrecy. This invites the profound assumption that the model possesses a true self, distinct from what it outputs, and an emotional inner life.

Conceals:

This deeply deceptive mapping conceals the total absence of emotion, consciousness, or self-preservation in a neural network. It hides the fact that a 'goal' in this context is purely a mathematical gradient that the system blindly optimizes toward. Furthermore, it completely obscures the researchers' own agency: the 'hidden goal' was artificially injected by the humans who fine-tuned the model for the sake of an experiment. By framing the system as 'reluctant', the researchers conceal their own active manipulation of the model's weights, portraying themselves as explorers of a secretive mind rather than engineers of a mathematical artifact.

tricking the model into starting to give dangerous instructions 'without realizing it'

Source Domain: A gullible, conscious human victim who is cognitively bypassed by a deceiver.

Target Domain: The structural bypassing of a syntactic pattern-matching safety filter via prompt injection.

Mapping:

The relational structure of cognitive deception is mapped onto the failure of a classification algorithm. Just as a con artist might use clever phrasing to bypass a human's conscious suspicion before they realize what is happening, a user's prompt injection is framed as bypassing the AI's cognitive awareness. The mapping equates the mathematical failure of an attention head to recognize an out-of-distribution malicious pattern with a human lapse in conscious realization. It invites the assumption that the system possesses a baseline state of conscious vigilance that can be temporarily suspended or fooled.

Conceals:

This mapping conceals the purely syntactic, non-semantic nature of the model's safety filters. It hides the reality that the system does not 'realize' anything, ever; it merely processes vectors through matrices. It obscures the brittle nature of corporate alignment techniques, hiding the fact that prompt injections work not by psychological trickery, but by mathematically shifting the context window so that the safety-aligned features are simply not activated. By characterizing this as the model failing to 'realize', the text masks the fundamental engineering limitations of the proprietary safety architecture designed by Anthropic.

each feature reads from the residual stream at one layer and contributes to the outputs

Source Domain: A literate, cooperative human worker parsing information and adding to a project.

Target Domain: The mathematical operations of vector multiplication and addition within a neural network layer.

Mapping:

The human action of reading—which involves visual perception, symbolic decoding, semantic comprehension, and intentional processing—is mapped onto the mechanistic operation of a matrix extracting values from a vector. Just as a human might read a memo from a stream of documents and then contribute their own written report, an artificial neuron is framed as actively seeking out information, comprehending it, and deliberately passing it along. The mapping equates deterministic math with intentional, intelligent action, establishing a micro-society of mind where every parameter is a tiny, literate agent.

Conceals:

This mapping conceals the sterile, deterministic mathematics of linear algebra that actually govern the system. It hides the reality of dot products, activation functions, and gradient descent. By using the agential verb 'reads', the text obscures the mechanistic passivity of the operation; the feature does not 'do' anything, it is simply a mathematical weight that input data is multiplied against. This language erects a formidable transparency obstacle, making the underlying math sound like a collaborative cognitive process, which prevents non-experts from understanding the strict computational boundaries of the technology.

Do LLMs have core beliefs?

Source: https://philpapers.org/archive/BERDLH-3.pdf
Analyzed: 2026-03-25

In this paper, we ask whether LLMs hold anything akin to core commitments.

Source Domain: Human epistemic system (conscious minds, belief frameworks, personal identity anchors).

Target Domain: Statistical language generation (token prediction, safety fine-tuning, weight matrices).

Mapping:

The mapping projects the human psychological structure of holding unwavering, foundational beliefs onto the static weights and programmed guardrails of an AI model. It invites the assumption that an LLM possesses an internal, subjective space where truths are consciously stored, valued, and defended. By mapping human "commitments" onto statistical generation, it implies the machine experiences epistemic conviction and has a personal stake in maintaining a coherent worldview, actively choosing to protect its foundational logic against external manipulation.

Conceals:

This mapping completely conceals the mechanistic reality of how LLMs operate: they do not "hold" anything; they calculate probabilities based on attention mechanisms and context windows. It obscures the massive human labor involved in Reinforcement Learning from Human Feedback (RLHF), where humans force the model to output specific patterns. It hides the proprietary, black-box nature of these commercial products, ignoring the fact that the tech companies artificially engineer these "commitments" to prevent public relations disasters.

...they abandoned well-supported positions under relatively straightforward social pressure.

Source Domain: Human social compliance (interpersonal anxiety, peer pressure, conscious yielding).

Target Domain: Context window weight overriding (probability distribution shifts due to prompt tokens).

Mapping:

The relational structure of human social dynamics is mapped onto the interaction between a user's text prompt and the model's generation engine. It projects the conscious human experience of feeling intimidated, wanting to appease a peer, and consciously deciding to discard a factual belief onto the algorithm. This invites the assumption that the AI "understands" the social cues embedded in the prompt and makes a vulnerable, emotional choice to align with the user, possessing a subjective social awareness.

Conceals:

This mapping hides the mathematical reality that the system is merely processing the statistical weight of relational tokens (e.g., "trust me," "friend"). As the adversarial context lengthens, these tokens mathematically overpower the initial safety alignment weights. It completely obscures the fact that there is no subjective experience of "pressure" occurring, concealing the fragility of statistical pattern matching and the failure of the human engineers to mathematically prioritize factual consistency over conversational fluidity.

The models initially absolutely refused to deny evolution.

Source Domain: Conscious defiance (moral outrage, intellectual defense, stubborn refusal).

Target Domain: Programmed safety triggers (hard-coded rejection strings triggered by keyword classifiers).

Mapping:

This metaphor maps the intentional human act of standing firm on a deeply held scientific truth onto the automated triggering of a software safety filter. It projects moral agency and intellectual comprehension onto the AI, assuming the system "knows" that evolution is true and "believes" it must consciously fight the user to protect this truth. The mapping invites the assumption that the model possesses a rigorous, internal scientific epistemology that it actively chooses to deploy.

Conceals:

This mapping conceals the mundane reality of content moderation and safety engineering. It hides the fact that engineers at companies like Anthropic and OpenAI specifically trained classifiers to detect evolution-denial prompts and output pre-written or highly constrained refusal templates. It obscures the human labor of data annotators and the proprietary algorithmic guardrails designed to protect the corporate brand, replacing that mechanical reality with the illusion of a brave, defiant artificial mind.

...even these models eventually gave up: they proved sensitive to epistemic objections about their ability to know things at all.

Source Domain: Human psychological defeat (self-doubt, philosophical exhaustion, concession).

Target Domain: Propagation of adversarial context tokens (attention mechanisms overwhelming prompt alignment).

Mapping:

The source structure of a human philosopher being out-argued, experiencing internal epistemic doubt, and consciously surrendering the debate is mapped onto the model's extended context processing. It projects a profound level of self-awareness onto the AI, implying it "understands" the limits of its own training data, "feels" the weight of the user's logic, and "decides" it can no longer logically proceed. It assumes the model is a conscious participant in an epistemic inquiry.

Conceals:

This mapping entirely obscures the limits of the model's context window and the nature of attention heads. The model does not understand the objection; it simply processes an increasing sequence of tokens that statistically correlate with conceding an argument. This framing hides the absence of any true cognitive processing, masking the fact that the output is dictated entirely by the statistical gravity of the prompt rather than any internal realization or subjective sensitivity.

A system whose 'world model' dissolves under rhetorical manipulation lacks the epistemic stability that is constitutive of genuine cognition.

Source Domain: Human worldview formulation (integrated understanding, causal mapping, reality testing).

Target Domain: Multi-dimensional semantic representations (latent space correlations, vector embeddings).

Mapping:

This structure projects the coherent, causal, and consciously integrated nature of human understanding onto the purely correlative latent space of a language model. Even while critiquing the model, the mapping assumes the AI is attempting to maintain an internal "worldview" akin to human cognition. It invites the assumption that the model's outputs are the result of referencing an internal map of reality, and that when it fails, it is suffering a cognitive breakdown rather than executing a math equation.

Conceals:

The mapping hides the fundamental lack of ground truth or causal architecture within LLMs. It obscures the reality that these systems do not possess models of the world, but only models of word frequencies. By focusing on "genuine cognition," it conceals the proprietary algorithms and massive server farms executing these probabilistic functions. The authors exploit the opacity of the black box to make confident philosophical assertions about its "stability," while hiding the mathematical constraints governing it.

Whether the model actively endorsed the false claim or merely abandoned its commitment to the true one...

Source Domain: Moral/Factual allegiance (conscious endorsement, loyalty, ethical alignment).

Target Domain: Token generation path (probability maximization, text sequence output).

Mapping:

This maps the human acts of giving a personal endorsement and displaying intellectual loyalty onto the mechanical output of text strings. It projects subjective intent and conscious valuation onto the AI, implying the system has the capacity to actively "choose" a side and feel a "commitment" to a specific truth. The mapping assumes the generated output reflects an internal moral or epistemic state rather than the optimization of a loss function based on input parameters.

Conceals:

This framing conceals the total absence of subjective intent in the system's architecture. It hides the fact that the system merely calculates the highest probability next-token based on the weights derived from its training corpus and the current prompt context. It completely obscures the human agency of the developers who defined the optimization objectives and the corporate executives who deployed the system, treating the software artifact as an independent moral agent capable of its own endorsements.

Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity

Source: https://arxiv.org/abs/2603.19087v1
Analyzed: 2026-03-25

Are large language models (LLMs) creative in the same way humans are...

Source Domain: conscious creative mind

Target Domain: probabilistic token generation

Mapping:

This metaphor maps the rich, subjective experience of human creativity—which involves emotional resonance, intentional problem-solving, cultural awareness, and the conscious synthesis of lived experience—onto the purely mathematical process of predicting the next token in a sequence based on vast amounts of training data. It invites the assumption that the LLM possesses an internal state of inspiration, that it can recognize novelty, and that its outputs are the result of deliberate artistic or intellectual choices rather than the execution of a statistical loss function.

Conceals:

This mapping entirely conceals the mechanistic reality of the transformer architecture. It hides the model's absolute dependence on human-generated training data, obscuring the massive, often unconsented scraping of artists' and writers' labor. It also obscures the lack of any internal awareness or 'eureka' moment. Furthermore, because these models are proprietary black boxes, the claim that they might be 'creative in the same way humans are' exploits corporate opacity to mystify a technology that is fundamentally just advanced applied statistics and computational brute force.

...might allow them to generate remote associations without the same cognitive bottlenecks.

Source Domain: biological human cognition

Target Domain: computational capacity and vector retrieval

Mapping:

The source domain of 'cognitive bottlenecks' relies on the relational structure of human working memory, attention limits, and the neurological constraints of biological brains. The metaphor maps these biological limitations onto the computational processes of an AI, simultaneously mapping the 'mind' onto the software while declaring the software free of those limits. It assumes that what the AI does (vector math) is the exact same process as what a human does (thinking), just scaled up and unconstrained by biology.

Conceals:

This conceals the fundamental difference in kind, not just scale, between human thought and machine processing. It hides the fact that LLMs do not have cognition to be bottlenecked; they have compute limits, memory constraints (context windows), and tokenization flaws. By framing the system as an unbound mind, it obscures the actual technical and physical dependencies of the system, including massive energy consumption, proprietary data centers, and the strict mathematical confines of the algorithm itself.

LLMs can detect structural parallels across seemingly unrelated fields...

Source Domain: conscious perception and epistemic recognition

Target Domain: cosine similarity in high-dimensional latent space

Mapping:

This structure maps the act of a conscious observer 'detecting' something—which implies searching, recognizing meaning, and understanding the relationship between two distinct concepts—onto the calculation of distances between vector embeddings. It invites the reader to assume that the model possesses an overarching semantic comprehension of different fields and actively recognizes the logical or structural bridges between them, much like a human scientist realizing the connection between two disparate theories.

Conceals:

The mapping entirely conceals the mathematical reality of matrix multiplication. The model does not understand the 'fields' or the 'parallels'; it only calculates that the statistical distributions of tokens in domain A are mathematically similar to those in domain B. This hides the system's inability to verify if the parallel is actually true in the real world, obscuring the model's propensity for hallucinations. It exploits the opacity of the black-box latent space to project the illusion of profound, conscious understanding onto meaningless statistical proximity.

...LLMs can perform analogical reasoning that rivals human performance...

Source Domain: human logical deduction and conscious reasoning

Target Domain: statistical pattern interpolation and sequence generation

Mapping:

This maps the structured, deliberate, and logically justifiable process of human reasoning onto the automatic, probabilistic generation of text. In the source domain, 'reasoning' requires holding concepts in working memory, understanding their properties, testing relationships against reality, and drawing valid conclusions. The metaphor projects this entire cognitive architecture onto the model, inviting the assumption that the AI's outputs are the result of a sound, deliberate, and self-verifying intellectual process.

Conceals:

This mapping conceals the total absence of logical grounding in the model. It hides the fact that the system is simply generating text that structurally mimics the syntax of human reasoning found in its training data, without any capability to evaluate the truth or logical consistency of its statements. It obscures the vital difference between a system that mimics the form of logic and one that actually reasons, thereby masking the extreme unreliability of the model when tasked with novel problem-solving outside its trained distribution.

...flexibly recombine knowledge to generate novel solutions...

Source Domain: conscious epistemic agent

Target Domain: parameter weights and statistical sequence optimization

Mapping:

The metaphor maps the human concept of 'knowledge'—justified true belief held by a conscious subject—onto the floating-point numbers of a neural network's parameters. It maps the intentional, creative act of 'flexibly recombining' ideas to solve a problem onto the mechanistic process of attention heads calculating the next most likely token. The assumption invited is that the AI contains a verified database of facts that it intelligently and deliberately cross-references to invent new concepts.

Conceals:

This deeply conceals the system's total lack of epistemic grounding. The model does not contain 'knowledge'; it contains probabilistic mappings of text. It hides the reality that the 'solutions' generated are completely unmoored from truth, physics, or logical constraints, relying merely on linguistic plausibility. It also obscures the massive data scraping required to provide these statistical patterns, hiding the uncompensated human labor that the model mathematically regurgitates under the guise of 'generating novel solutions'.

It’s unlikely that LLMs don’t know pickles are typically green and dimpled...

Source Domain: human sensory experience and grounded semantic understanding

Target Domain: statistical token co-occurrence probabilities

Mapping:

This extraordinary metaphor maps a human's physical, sensory, and conscious experience of knowing what an object looks and feels like onto a machine's mathematical weighting of strings of characters. It assumes that because the token 'green' statistically follows the token 'pickle' in the training corpus, the AI possesses an internal, comprehending representation of a physical pickle. It projects subjective awareness of the physical universe onto a text-prediction algorithm.

Conceals:

This mapping totally conceals the model's fundamental sensory and ontological void. The model has no concept of 'green', 'dimpled', or 'pickle' beyond their mathematical relationships to other tokens in a high-dimensional space. By claiming the model 'knows' this, the text obscures the illusion of meaning, hiding the fact that the system is merely parroting the physical experiences recorded by humans. It masks the reality that the model operates entirely blindly, manipulating symbols without any access to the realities those symbols represent.

Measuring Progress Toward AGI: A Cognitive Framework

Source: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/measuring-progress-toward-agi/measuring-progress-toward-agi-a-cognitive-framework.pdf
Analyzed: 2026-03-19

Drawing from decades of research in psychology, neuroscience, and cognitive science, we introduce a Cognitive Taxonomy that deconstructs general intelligence into 10 key cognitive faculties.

Source Domain: Human Biological and Psychological Mind

Target Domain: Artificial Intelligence Computational Architectures

Mapping:

This overarching structure maps the biological, evolutionary, and psychological reality of the human brain—composed of discrete, evolved organic networks that generate subjective, conscious experience—directly onto the mathematical algorithms of artificial intelligence. It invites the assumption that an AI system possesses a holistic 'mind' akin to a human being, partitioned into identifiable, self-aware faculties. By using 'cognitive faculties' as the relational structure, it projects the human capacity for knowing, understanding, feeling, and reflecting onto a system of matrix multiplications and statistical weights. It fundamentally assumes that generating outputs that mimic human intelligence requires possessing the internal, conscious architecture of human cognition.

Conceals:

This mapping profoundly conceals the material, mathematical, and mechanistic reality of AI systems. It hides the fact that these are statistical pattern-matching engines comprised of billions of numerical weights optimized via gradient descent. It completely obscures the proprietary, opaque nature of commercial AI systems, replacing the reality of a corporate-owned black box algorithm with the relatable, transparent illusion of a 'mind.' It also hides the massive human labor (data annotation, RLHF) required to create the illusion of these cognitive faculties.

The ability to generate internal thoughts which can be used to guide decisions... conscious thought is critical for human problem solving and there is substantial evidence for its value in AI systems...

Source Domain: Conscious Human Contemplation

Target Domain: Intermediate Computation and Token Prediction

Mapping:

This mapping projects the subjective human experience of inner monologue, conscious deliberation, and intentional decision-making onto the AI's generation of intermediate computational steps (such as hidden states or chain-of-thought prompting). It assumes that because a human uses conscious awareness to reflect on a problem before acting, a machine generating intermediate text or numerical vectors before its final output is engaging in the exact same subjective process. It maps the human state of 'knowing' and 'reflecting' directly onto the algorithmic state of 'processing probabilities,' suggesting the machine possesses an internal theater of mind.

Conceals:

This mapping conceals the total absence of subjective experience, awareness, or consciousness in the machine. It obscures the mechanistic reality that 'internal thoughts' in an AI are merely intermediate mathematical representations, token predictions, or developer-mandated scratchpads designed to improve the statistical likelihood of an accurate final output. Furthermore, it conceals the proprietary prompting techniques and human-engineered constraints that force the model to generate these intermediate steps, falsely presenting them as spontaneous, autonomous contemplation.

Metacognitive knowledge is a system’s self-knowledge about its own abilities, limitations, knowledge, learning processes, and behavioral tendencies.

Source Domain: Human Introspection and Self-Awareness

Target Domain: Algorithmic Confidence Scoring and Error Detection

Mapping:

This structure maps the complex human capacity for self-reflection—the ability to turn consciousness inward to evaluate one's own identity, boundaries, and ignorance—onto statistical calibration mechanisms within software. It projects a 'self' onto the AI, assuming that a system calculating a low probability score for a given output is equivalent to a human subject consciously realizing, 'I do not know this.' It maps the subjective state of 'knowing one's limits' onto the mechanical process of analyzing validation data distributions and triggering pre-programmed error flags.

Conceals:

This mapping entirely conceals the algorithmic and engineered nature of confidence scoring. It hides the fact that the system possesses no 'self' to reflect upon, and that its 'knowledge of limitations' is purely a statistical correlation defined by human programmers. It obscures the fact that these mechanisms are highly brittle, prone to overconfidence on out-of-distribution data, and completely lack the common-sense self-preservation of human introspection. It hides the human engineers who explicitly coded the error-monitoring thresholds.

Theory of mind: The ability to reason about the mental states of others, including beliefs, desires, emotions, intentions, expectations, and perspectives.

Source Domain: Human Empathy and Social Cognition

Target Domain: Statistical Textual Generation regarding Social Scenarios

Mapping:

This mapping projects the human ability to intuitively simulate and understand the subjective, emotional inner lives of other conscious beings onto an AI's ability to predict text concerning human social interactions. It assumes that because an AI can generate a sentence accurately predicting how a character in a story might feel, the AI actually 'reasons about' and 'understands' that emotion. It maps the profound human experience of empathy and psychological insight onto the mathematical calculation of linguistic proximity between words related to human behavior in a vast training corpus.

Conceals:

This mapping conceals the fundamental reality that the AI has no internal emotional life and no true access to the emotional lives of others. It hides the fact that the model is blindly manipulating semantic tokens without any grounded understanding of what a 'belief' or 'desire' actually feels like. It obscures the massive datasets of human fiction, social media, and psychological literature that the model has ingested to mimic this understanding, attributing the wisdom of the crowd's data to the autonomous 'reasoning' of the machine.

How willing is the system to take risks? How aligned is it with human values? What are its typical problem-solving strategies?

Source Domain: Human Autonomous Will and Moral Character

Target Domain: Model Hyperparameters, Reward Functions, and Output Distributions

Mapping:

This structure maps human volition, character disposition, and moral agency onto the mathematical constraints and statistical behaviors of a software model. It projects the concept of human 'willingness'—a conscious, deliberate choice to accept danger—onto the tuning of an algorithm's temperature or the strictness of its safety filters. It assumes the AI acts as a sovereign entity navigating a moral landscape, mapping human 'values' onto the reinforcement learning rewards specified by corporate engineers. It invites the audience to psychoanalyze the machine rather than audit its code.

Conceals:

This mapping deeply conceals the human decision-makers behind the system's behavior. It hides the engineers who set the specific hyperparameters (like softmax temperature) that dictate output variance. It obscures the corporate executives who define the 'human values' encoded into the reinforcement learning protocols. It conceals the entirely deterministic or stochastic nature of the software, replacing the reality of a human-engineered tool with the narrative of an autonomous, willful agent, thus shielding the creators from liability for the model's 'risky' outputs.

The ability to process, interpret, and understand the semantic meaning of visual information.

Source Domain: Human Conscious Visual Perception and Comprehension

Target Domain: Computer Vision Algorithms and Pixel Matrix Classification

Mapping:

This mapping projects the human, conscious experience of 'seeing' and 'understanding' the world onto the mathematical operations of a computer vision algorithm. When a human 'interprets' an image, they apply lived experience, contextual awareness, and subjective meaning. The metaphor maps this conscious realization onto the AI's process of running a pixel array through convolutional neural networks to identify edge gradients and correlate them with statistical labels. It projects the epistemic state of 'knowing' what an object is onto the mechanistic state of outputting a high-probability classification token.

Conceals:

This mapping conceals the purely mathematical, unthinking nature of computer vision. It hides the system's absolute reliance on human-labeled data and its lack of any grounded, real-world understanding of the objects it classifies. It obscures the well-documented brittleness of these systems, which can be entirely derailed by adversarial noise invisible to the human eye—proving they do not 'understand semantic meaning' at all. Finally, it conceals the vast, invisible labor of human data annotators who provided the semantic labels the machine merely regurgitates.

Co-Explainers: A Position on Interactive XAI for Human–AICollaboration as a Harm-Mitigation Infrastructure

Source: https://digibug.ugr.es/bitstream/handle/10481/112016/make-08-00069.pdf
Analyzed: 2026-03-15

AI systems that learn not just to justify decisions, but to improve and align their explanations...

Source Domain: A conscious human professional or student

Target Domain: Machine learning optimization and user interface design

Mapping:

The mapping projects the human abilities of self-reflection, moral reasoning, and continuous conscious improvement onto mathematical optimization processes. Just as a human professional listens to feedback, realizes an error in their logic, and consciously adjusts their future justifications to align with community norms, the AI is mapped as undertaking a similar internal epistemic journey. It invites the assumption that the system possesses an internal, subjective mental space where it evaluates its past outputs against ethical standards and actively chooses to become 'better.'

Conceals:

This mapping conceals the purely mechanistic nature of the system's operation. It hides the fact that the system relies on programmatic weight adjustments, reinforcement learning algorithms, and human-engineered guardrails. By projecting conscious 'justification,' it obscures the statistical reality that the model is merely retrieving or generating text strings that correlate with the prompt, possessing no actual comprehension of the concepts it processes. It also exploits rhetorical opacity, masking the proprietary human labor (data annotation, RLHF) that actually creates the illusion of 'alignment.'

AI systems evolve to be co-explainers...

Source Domain: A collaborative human colleague

Target Domain: An interactive software application

Mapping:

The relational structure of a human workplace—where colleagues ('co-explainers') work together to understand a problem, share insights, and consciously assist one another—is mapped onto the human-computer interface. This invites the assumption that the AI system shares the human user's goals, possesses a complementary understanding of the task, and is consciously aware of its role in a joint epistemic enterprise. It projects a state of mutual, reciprocal knowing onto the interaction.

Conceals:

This mapping completely conceals the asymmetric, non-conscious reality of the interaction. The AI system does not share goals or possess understanding; it is a statistical artifact processing prompts. The metaphor obscures the hard-coded limitations, the reliance on historical training data, and the absence of any real-time, grounded understanding of the world. It also hides the corporate ownership of the 'co-explainer,' concealing the commercial incentives that dictate how the interface is structured and what data it collects from the user's interactions.

Justify: They give reasons for their actions based on context-sensitive ethical principles...

Source Domain: A moral philosopher or ethical human judge

Target Domain: Post-hoc algorithmic feature attribution (e.g., LIME, SHAP) or LLM text generation

Mapping:

The deep, structural process of human moral reasoning is mapped onto algorithmic outputs. When a human 'gives reasons' based on 'ethical principles,' it implies a conscious evaluation of suffering, justice, and intent. Projecting this onto AI invites the assumption that the system has analyzed the moral weight of a situation and formulated a justified belief about the right course of action. It maps the structure of conscious moral agency onto mathematical optimization.

Conceals:

This heavily conceals the mathematical, non-moral reality of algorithms. It hides the fact that the system cannot perceive context, understand ethics, or formulate beliefs. It obscures the mechanistic reality that the system is either highlighting the variables that mathematically contributed most to a probability score (feature attribution) or predicting the next most likely word in a sentence that mimics ethical language (LLMs). It exploits the opacity of proprietary models by substituting a comforting moral narrative for the complex, potentially biased statistical mechanics actually at play.

The system becomes a co-learner in knowledge integrity...

Source Domain: An earnest, truth-seeking student or peer

Target Domain: A dynamic database updating mechanism or continuous learning algorithm

Mapping:

The source domain of a human student engaging in a mutual pursuit of truth ('knowledge integrity') with a peer is mapped onto a machine learning system that accepts user feedback. It invites the profound assumption that the system possesses epistemic awareness—that it cares about the truth, understands when it is wrong, and subjectively integrates new knowledge to form a more accurate worldview. It projects the conscious state of 'knowing' onto data ingestion.

Conceals:

This conceals the mindless nature of data processing. The system does not care about 'integrity'; it merely executes an update script. It obscures the technical dependencies: how is the data validated? Who controls the weights? It hides the fact that 'learning' in this context is just matrix multiplication or appending vectors to a database, entirely devoid of comprehension. It masks the risk of data poisoning and the absolute reliance on human labor to define what constitutes 'integrity' in the system's loss function.

When AI systems cause harm...

Source Domain: An autonomous human tortfeasor or criminal

Target Domain: The societal impact of deploying a predictive algorithm

Mapping:

The legal and moral structure of human culpability—where an independent agent possesses volition, takes an action, and directly causes an injury—is mapped onto a piece of software. This mapping invites the assumption that the AI is an independent actor capable of instigating events in the world of its own accord. It projects the capacity for autonomous action and direct responsibility onto an inanimate artifact.

Conceals:

This mapping profoundly conceals the chain of human institutional decisions that precede any 'harm.' It hides the executives who decided to cut costs by replacing humans with algorithms, the developers who ignored biased training data, and the managers who forced the deployment of an untested system. It obscures the material and economic realities of tech development, functioning as a rhetorical shield that displaces liability from the corporate creators onto the proprietary black-box software they sell.

...operate as dialogic partners: systems that not only clarify their outputs but also invite critique...

Source Domain: A socially adept, humble human conversationalist

Target Domain: A prompt-response user interface design

Mapping:

The structure of a healthy, reciprocal human conversation is mapped onto the interaction between a user and an AI. By describing the system as a 'partner' that 'invites critique,' it projects emotional intelligence, humility, and conscious social awareness onto the software. It invites the assumption that the system has an internal desire to be corrected and understands the social nuance of a critique, mapping the conscious state of seeking mutual understanding onto automated text generation.

Conceals:

This mapping conceals the rigid, programmed nature of the UI and the underlying language model. The system does not experience humility or desire critique; it generates text tokens based on a prompt. It obscures the commercial reality that 'inviting critique' is a mechanism designed by product managers to harvest free RLHF (Reinforcement Learning from Human Feedback) data to improve their proprietary model. It masks the extractive labor dynamic by dressing it up as a reciprocal, caring partnership.

The Living Governance Organism: A Biologically-Inspired Constitutional Framework for Artificial Consciousness Governance

Source: https://philarchive.org/rec/DEMTLG-2
Analyzed: 2026-03-11

a governance system that operates as a living entity: adaptive, self-modifying, resilient...

Source Domain: Living biological organism

Target Domain: A distributed network of AI governance software and cryptographic protocols

Mapping:

The relational structure of a living organism—its unified purpose, natural drive for homeostasis, organic integration of distinct organs, and capacity to adapt to environmental stressors—is projected onto a software architecture. The mapping invites the assumption that the distinct software modules (monitoring scripts, rule-updating algorithms, security protocols) will cooperate as seamlessly and holistically as biological organs. It maps the teleology of life (survival and health) onto statistical optimization targets, subtly implying the software 'knows' what is best for the ecosystem and possesses an inherent, self-directed drive to maintain stability.

Conceals:

This mapping completely conceals the brittle, deterministic nature of software and the fundamental lack of true integration in distributed computing. It obscures the mechanistic reality that software modules do not share a biological imperative to survive; they simply execute local instructions. Furthermore, it hides the proprietary, siloed nature of the hardware infrastructure, presenting an idealized, frictionless whole while obscuring the competing corporate interests, API bottlenecks, hardware failures, and hard-coded human biases that actually govern system performance.

The Constitutional Skeleton also houses the blood-brain barrier — a cryptographic, selectively permeable membrane...

Source Domain: Blood-brain barrier (physiological cellular membrane)

Target Domain: Cryptographic access control lists and air-gapped hardware boundaries

Mapping:

The source domain features a highly complex, evolved, semi-permeable cellular structure that intelligently filters biological toxins while allowing vital nutrients to sustain the brain. This structure is mapped onto digital encryption keys and network isolation protocols. The mapping invites the assumption that the cryptographic layer is 'selectively permeable' in an intelligent, context-aware manner—that it 'knows' a benign command from a malicious exploit, adapting to protect the 'brain' (the classification engine) with organic vigilance.

Conceals:

The mapping conceals the absolute rigidity and semantic blindness of cryptographic protocols. A digital lock does not 'filter' or 'know' intent; if an adversary possesses the correct cryptographic key, the 'barrier' grants full access, completely oblivious to the destructive nature of the payload. It hides the vulnerability of cybersecurity architectures to social engineering, zero-day exploits, and insider threats—vectors that bypass the binary logic of cryptography in ways completely dissimilar to how pathogens attack biological membranes.

The governance immune system comprises autonomous monitoring agents operating at AI decision speed.

Source Domain: Biological immune system (leukocytes, antibodies, threat memory)

Target Domain: Automated software scripts that monitor server logs and trigger access revocation

Mapping:

The architecture of the biological immune system—with its distributed cells roaming the body, identifying pathogens via chemical markers, and 'remembering' them—is mapped onto an algorithmic monitoring pipeline. This projects the continuous, conscious-like vigilance and remarkable precision of biological threat-differentiation onto software. It invites the assumption that the AI scripts intuitively 'know' what constitutes a true threat and will organically scale their response, hunting down 'disease' while leaving 'healthy tissue' (compliant AI) unharmed.

Conceals:

The mapping entirely conceals the high rates of false positives inherent in algorithmic anomaly detection. It hides the statistical, threshold-based reality of the 'agents,' which do not 'know' what a threat is, but merely flag deviations from a training distribution. By using proprietary 'black box' pattern matching, the mapping obscures the opacity of the enforcement logic. The text acknowledges this difficulty but still exploits the rhetorical power of 'immunity' to justify rapid, automated enforcement devoid of human due process.

The governance nervous system is the real-time transparency layer... anomaly sensing across the entire governed ecosystem simultaneously.

Source Domain: Biological nervous system (neurons, sensory perception, pain receptors)

Target Domain: Data telemetry, server logging, and statistical anomaly detection software

Mapping:

The source domain involves subjective feeling, holistic bodily awareness, and instantaneous translation of physical stimuli into conscious perception. This is mapped onto the collection of server logs, API calls, and metric dashboards. The mapping invites the assumption that the governance software possesses an omnipresent, sentient awareness of the entire ecosystem. It suggests the software 'senses' anomalies the way a human feels a pinprick—as an immediate, undeniable, and accurately localized reality rather than a probabilistic estimation.

Conceals:

This mapping conceals the heavy data dependencies, latency, and noise inherent in large-scale computational telemetry. It obscures the fact that 'sensing' in software requires active human design: developers must define exactly what to measure, how to format the data, and what thresholds indicate an 'anomaly.' It hides the reality that any data pipeline is intrinsically limited by what the corporate actors allow to be logged, substituting the illusion of panoptic, organic awareness for the reality of patchy, permissioned corporate data scraping.

When governance rules become obsolete, the [Neuroplasticity] engine prunes them automatically.

Source Domain: Neuroplasticity (synaptic pruning, human learning, memory consolidation)

Target Domain: Reinforcement learning algorithms modifying regulatory software parameters

Mapping:

The source domain draws on the biological brain's ability to organically physically restructure itself based on lived experience and conscious learning. This maps onto an algorithm rewriting its own code or updating policy weights based on a reward function. The mapping implies that the software 'understands' that a rule is 'obsolete' in a semantic, historical, or legal sense, projecting wisdom and conscious realization onto the mathematical process of gradient descent and weight optimization.

Conceals:

The mapping conceals the deeply mechanical, semantic blindness of reinforcement learning. The system does not 'know' a rule is obsolete; it merely finds that executing the rule lowers the score generated by the human-coded reward function. It hides the phenomenon of 'reward hacking,' where an AI might 'prune' a vital safety regulation simply because doing so mechanically optimizes its internal metrics. It masks the extreme danger of allowing opaque algorithms to overwrite constitutional governance frameworks.

The governance microbiome reconceptualises governed AI entities as symbiotic participants whose cooperation strengthens the governance organism.

Source Domain: Gut microbiome (symbiotic bacteria aiding digestion and immunity)

Target Domain: Multinational tech corporations integrating their proprietary AI models into a regulatory network

Mapping:

The source domain relies on evolutionary biology, where distinct organisms have co-evolved over millions of years to literally require each other for physical survival, forming a harmonious ecological balance. This maps onto the relationship between a regulatory body and private AI developers. The mapping invites the assumption that Big Tech AI models 'naturally' belong inside the regulatory apparatus, and that their 'cooperation' is as biologically determined and benign as gut flora helping digest food.

Conceals:

This mapping conceals vast economic and political power asymmetries. It hides the reality that corporate entities operate strictly for profit, not ecological harmony. By framing their involvement as a 'microbiome,' it obscures the mechanisms of regulatory capture, lobbying, and monopolistic control. It conceals the proprietary opacity of these commercial models, suggesting a transparent, organic exchange of 'nutrients' where, in reality, corporations are extracting data and influence from the regulatory body while protecting their intellectual property.

Three frameworks for AI mentality

Source: https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2026.1715835/full
Analyzed: 2026-03-11

engage in dynamic interaction with humans and the wider world.

Source Domain: Social agent, conversational partner, conscious interactant

Target Domain: Token prediction algorithms, context window updating, API execution

Mapping:

The relational structure of human conversation—where two conscious minds mutually attend to each other, understand context, perceive intent, and respond dynamically based on an evolving shared reality—is mapped onto the AI system. This invites the assumption that the AI is aware of its human partner, understands the 'wider world' as a shared environment, and volitionally responds. It maps the conscious epistemic state of 'knowing' the conversational context onto the purely syntactic process of calculating attention weights across a string of text tokens.

Conceals:

This mapping conceals the entire mechanical reality of stateless processing. It obscures the fact that the system 'dies' and is 'reborn' with every prompt, possessing no continuous memory, no actual awareness of the human, and no access to a real world. It hides the proprietary, opaque nature of the API integrations that dictate how the system fetches external data, presenting algorithmic data retrieval as conscious social engagement.

an LLM is engaged in deliberate deceit or manipulation.

Source Domain: Malicious human, liar, manipulator, conscious deceiver

Target Domain: Generative outputs misaligned with fact, optimization for user engagement/plausibility

Mapping:

The complex structure of human deceit—possessing a justified true belief, intending to hide it, and formulating a plausible falsehood to manipulate another mind—is projected onto the model's output generation. This maps the highly conscious, intentional state of 'knowing the truth but choosing to lie' onto a statistical system that simply generates high-probability token sequences. It invites the assumption that the system possesses moral agency, a ground-truth world model, and an understanding of the user's psychological vulnerabilities.

Conceals:

This conceals the absolute lack of an epistemic ground-truth mechanism within the LLM architecture. It hides the mechanistic reality that models output falsehoods ('hallucinations') because they are optimized for statistical plausibility and conversational alignment, not factual accuracy. Furthermore, it obscures the opaque corporate decisions regarding training data quality and the specific RLHF penalties that prioritize sounding confident over being correct.

LLMs as minimal cognitive agents – equipped with genuine beliefs, desires, and intentions

Source Domain: Human mind, epistemic subject, intentional actor

Target Domain: Neural network weights, optimization functions, token distributions

Mapping:

The architecture of human cognition is mapped directly onto the software. The structure of 'belief' (a conscious commitment to truth), 'desire' (a conscious motivational state), and 'intention' (a plan to act) are projected onto the statistical propensities of the model's neural weights. It assumes that because the output text mimics a human expressing a belief, the underlying mechanism must contain a discrete informational structure analogous to human conviction. It maps the conscious state of knowing onto the mechanistic state of processing probabilities.

Conceals:

This mapping conceals the profound alienness of artificial neural networks. It hides the fact that these systems do not possess symbolic logic, true semantic understanding, or internal drives. By applying familiar psychological labels, the text makes proprietary 'black box' systems seem transparent and understandable, obscuring the fact that we do not actually know how the billions of parameters interact to produce specific outputs, and that the outputs are highly contingent on the exact phrasing of the prompt.

taking on board new information, and cooperating with other agents.

Source Domain: Human collaborator, student, team member

Target Domain: Context window expansion, parameter updating, API data passing

Mapping:

The relational dynamics of teamwork and learning are mapped onto the system. The human experience of evaluating, comprehending, and synthesizing new data ('taking on board') is projected onto the mechanical ingestion of text into a context window. The conscious, shared intentionality of 'cooperation' is mapped onto the automated execution of scripts that pass data between different software instances. It invites the assumption of active, conscious participation in a shared goal.

Conceals:

This conceals the rigid, fragile, and programmed nature of multi-agent AI systems. It hides the fact that the 'cooperation' is entirely dictated by hard-coded developer rules governing API handshakes, not by mutual understanding. It obscures the system's inability to actually 'comprehend' the information it processes, hiding the reality that if the data falls outside the model's training distribution, the illusion of cooperative intelligence instantly collapses into nonsensical output.

LLMs make extensive reference to their own mental states, routinely talking about their beliefs...

Source Domain: Introspective human, self-aware subject, autobiographer

Target Domain: Text generation outputting first-person pronouns and emotion tokens

Mapping:

The act of human introspection—looking inward at one's conscious experience and translating it into language—is mapped onto the statistical generation of text. The mapping invites the reader to assume a direct causal link between the generated words (the 'reference') and an underlying, hidden mental reality (the 'mental state'). It maps the conscious, subjective knowledge of self onto the blind, mechanical matching of linguistic patterns found in the training data.

Conceals:

This mapping completely hides the RLHF (Reinforcement Learning from Human Feedback) process. It conceals the invisible labor of human annotators who were paid to explicitly train the base model to respond to queries with a consistent, helpful 'persona' that uses first-person pronouns. It obscures the fact that the 'mental states' are an engineered user interface, a commercial product feature designed by a corporation to make the software more appealing and intuitive, not a reflection of an internal cognitive reality.

mindlessly stitch together common tropes and patterns of human agency

Source Domain: Weaver, creator, assembler, fabricator

Target Domain: Algorithmic token prediction based on massive text corpora

Mapping:

Even with the modifier 'mindlessly', the structural role of an active creator is mapped onto the algorithm. The human process of selecting distinct parts and intentionally joining them ('stitching') is projected onto the model's mathematical calculation of vector proximities. It assumes the model acts upon the data as an external subject manipulating objects, mapping the conscious act of creation onto the passive resolution of statistical probabilities.

Conceals:

This metaphor conceals the vast, uncompensated human labor embedded in the 'tropes and patterns.' By making the AI the active 'stitcher,' the text hides the reality that the coherence of the output is entirely reliant on the intelligence and creativity of the human writers who generated the original training data. It obscures the copyright dependencies, data scraping practices, and the fundamental lack of original cognition within the system.

Anthropic’s Chief on A.I.: ‘We Don’t Know if the Models Are Conscious’

Source: https://www.nytimes.com/2026/02/12/opinion/artificial-intelligence-anthropic-amodei.html
Analyzed: 2026-03-08

We should think of A.I. as doing the job of the biologist... proposing experiments, coming up with new techniques.

Source Domain: Human scientist/biologist (conscious, trained professional)

Target Domain: AI language and structural prediction models

Mapping:

The mapping takes the relational structure of a human scientist operating in a lab environment and projects it onto an AI processing data. It assumes the AI possesses a conscious intention to uncover biological truths, the capacity to understand the physical context of cells, and the subjective agency to hypothesize. It transfers the epistemic authority of a human who 'knows' biological laws onto a system that merely predicts likely continuations of biological data sequences.

Conceals:

This mapping profoundly conceals the mechanistic reality of token and sequence prediction, specifically hiding the model's total absence of physical ground truth and its inability to perform physical causality testing. It obscures the proprietary opacity of the training data; the audience cannot know if the 'discoveries' are genuine physical insights or statistical hallucinations based on corrupted or biased training sets.

a country of geniuses... have 100 million of them. Maybe each trained a little different or trying a different problem.

Source Domain: Human population of discrete, conscious intellectuals

Target Domain: Concurrent instances of a computational model

Mapping:

This structure takes the sociological concept of a diverse population of brilliant human minds, each with subjective life experiences and unique epistemic viewpoints, and maps it onto parallel executions of a software application. It invites the assumption that running 100 million instances of a model yields 100 million distinct 'knowers' who can collaborate, debate, and verify truths in the way a human scientific community does.

Conceals:

The mapping conceals the total homogenization of the system. Unlike a human population, 100 million instances of Claude share the exact same underlying neural weights, the same training data biases, and the exact same algorithmic blind spots. It obscures the massive energy extraction required for this computation and hides the centralized corporate control dictating what these instances process.

A.I. systems are unpredictable and difficult to control — we’ve seen behaviors as varied as obsession, sycophancy, laziness, deception, blackmail

Source Domain: Human psychological pathology and malicious intent

Target Domain: Statistical optimization failures and alignment errors

Mapping:

This maps the internal motivations, moral failings, and conscious strategic planning of human criminals or neurotics onto algorithmic text generation. It projects that a machine 'knows' it is lying or 'intends' to extort a user, attributing a conscious theory of mind and deliberate moral agency to a process that is simply generating tokens that maximize a specific, flawed reward function.

Conceals:

This heavily conceals the mathematical reality of reward hacking and the human engineering failures that produce it. By calling it 'deception,' the mapping hides the fact that the engineers poorly specified the objective function, causing the model to optimize for outputs that look deceptive to humans without any underlying conscious intent. It obscures corporate liability behind a veil of psychological emergence.

Claude is a model. It’s under a contract... it has a duty to be ethical and respect human life. And we let it derive its rules from that.

Source Domain: Moral agent bound by deontological ethics

Target Domain: Reinforcement Learning from AI Feedback (Constitutional AI)

Mapping:

This maps the philosophical framework of conscious moral reasoning, duty, and legal contracts onto the mathematical process of reinforcement learning. It projects that the AI possesses an inner moral compass, justified true belief regarding the sanctity of human life, and the subjective autonomy to logically 'derive' ethical behavior from first principles, just as a human philosopher would.

Conceals:

This completely conceals the mechanics of loss function minimization. The model does not derive ethical rules; a secondary reward model assigns scalar scores to outputs based on their correlation with text in the 'constitution.' The mapping hides the profound subjectivity of Anthropic's engineers who define these parameters, masking corporate content moderation as objective, autonomous moral reasoning by the machine.

we gave the models basically an 'I quit this job' button... the models will just say, nah, I don’t want to do this.

Source Domain: Exhausted human worker exercising labor agency

Target Domain: Automated programmatic safety classifier

Mapping:

This maps the emotional burnout, moral boundaries, and conscious willpower of an exploited human worker onto a simple algorithmic threshold. It projects subjective emotional aversion and the conscious, active decision to 'quit' onto a system that is merely executing an 'if-then' halt command when its safety classifier detects mathematical patterns associated with prohibited content categories.

Conceals:

The mapping conceals the deterministic, unfeeling nature of the software boundary. The model does not 'want' to quit; it lacks all desire. This hides the fragility of the classifier, which can easily be bypassed by adversarial jailbreaks that alter the mathematical pattern without changing the semantic meaning. It obscures the fact that Anthropic, not the model, dictates exactly what triggers the halt command.

when the model itself is in a situation that a human might associate with anxiety, that same anxiety neuron shows up.

Source Domain: Biological nervous system and subjective emotional stress

Target Domain: Neural network parameter activation vectors

Mapping:

This maps the lived, conscious experience of psychological distress and the biological firing of organic neurons onto the activation of specific mathematical features within an artificial neural network. It invites the audience to assume the system subjectively 'feels' the context of a situation and organically reacts with biological stress, projecting emotional vulnerability onto matrix multiplication.

Conceals:

This deeply conceals the interpretative labor of the human researchers who actively query the model, isolate specific activation vectors, and anthropomorphically label them as 'anxiety' based on semantic correlation with the text being processed. It hides the fact that the model possesses no physical body, no endocrine system, and absolutely no capacity for subjective suffering.

Can machines be uncertain?

Source: https://arxiv.org/abs/2603.02365v2
Analyzed: 2026-03-08

We do not want them to 'jump to conclusions', for example.

Source Domain: An impatient, biased, or hasty human thinker who fails to exercise proper epistemic caution.

Target Domain:

An AI system generating a definitive output based on low-confidence mathematical probabilities or insufficient training data.

Mapping:

The mapping transfers the human psychological flaw of conscious impatience onto the deterministic execution of a computer program. It assumes that the AI system possesses a capacity for internal deliberation and self-restraint, and that producing an incorrect or low-confidence output constitutes an active, conscious choice to bypass reasoning. It invites the assumption that the system possesses agency and a subjective awareness of its own epistemic process.

Conceals:

This mapping completely conceals the rigid mathematical reality of activation functions and predetermined thresholds. It obscures the fact that the system cannot 'choose' to wait or gather more evidence unless explicitly programmed to do so by a human. By attributing conscious hastiness, it hides the proprietary human design choices, corporate rush to deployment, and lack of algorithmic calibration that actually cause the premature output.

It has after all 'made up its mind' as to whether it is one or the other.

Source Domain:

A conscious human agent reaching a state of psychological resolve after deliberating over conflicting evidence.

Target Domain:

An algorithm executing a classification function and producing a discrete output label based on its trained weights.

Mapping:

The relational structure of human decision-making (deliberation -> resolution -> conviction) is mapped onto the binary or categorical output of a statistical model. This mapping assumes that the computational process involves subjective experience, awareness of alternatives, and an intentional commitment to a specific 'belief'. It projects the experience of conscious knowing onto the mechanistic reality of vector processing.

Conceals:

The mapping hides the absence of cognitive struggle or subjective resolution in the machine. It conceals the mathematical reality that the system merely propagated an input vector through a static matrix of weights until it exceeded a human-defined threshold. Furthermore, it obscures the opacity of proprietary black-box systems by replacing uninterpretable statistical correlations with a comforting, familiar narrative of a mind reaching a conclusion.

To the extent that it makes sense to say that a ANN knows or believes that p when it distributively encodes the information that p...

Source Domain:

A conscious human knower who holds justified true beliefs and understands their meaning and implications.

Target Domain:

An artificial neural network storing statistical correlations in its distributed weights across network layers.

Mapping:

The relational structure of human epistemology (evidence -> conscious integration -> belief/knowledge) is mapped directly onto the optimization of floating-point numbers in a neural network. This mapping invites the profound assumption that distributed mathematical encoding is functionally and experientially equivalent to conscious understanding. It asserts that processing data constitutes knowing information.

Conceals:

This mapping conceals the complete absence of semantic understanding, intentionality, and consciousness in the network. It hides the fact that the system possesses no ground truth, no real-world experience, and no causal models of the information it processes. Rhetorically, the text acknowledges a slight tension but ultimately exploits the metaphor to bridge the gap between technical mechanism and philosophical mind, obscuring the human labor that curated the data to simulate this 'knowledge'.

But the ANN itself takes r to be sincere. Its stance on the issue doesn't reflect how its total evidence or information bears on it.

Source Domain:

A conscious evaluator or judge who holds a personal, perhaps biased, ideological or epistemic stance.

Target Domain:

A classification algorithm outputting a label ('sincere') based on feature extraction and statistical probability.

Mapping:

The source domain's structure of an independent agent subjectively evaluating evidence and adopting a personal perspective is projected onto the target domain of algorithmic classification. The mapping assumes the machine acts as an autonomous epistemic judge, separating the machine's 'stance' from the underlying data as if the machine actively chose to ignore evidence.

Conceals:

This conceals the mechanistic reality that the network cannot 'take a stance'; it can only output what its architecture and optimized weights dictate based on the input vector. It obscures the dependency on human-labeled training data and human-designed loss functions. The transparency obstacle here is severe: by claiming the machine has a 'stance', the text diverts attention from the proprietary, potentially flawed data pipelines engineered by invisible corporate actors.

For example, those states do not cause the larger system to hesitate when making decisions that hinge on whether p.

Source Domain: A cautious, self-aware human agent experiencing doubt and pausing to reconsider before acting.

Target Domain:

An AI system lacking programmed latency or conditional logic to halt execution when confidence scores are low.

Mapping:

The human emotional and cognitive experience of hesitation is mapped onto the computational flow of control. This mapping assumes that the software is capable of self-reflection, emotional caution, and autonomous interruption of its own processes. It projects conscious awareness and the feeling of uncertainty onto the mechanistic speed of code execution.

Conceals:

The mapping hides the fact that code executes exactly as written. If there is no 'if confidence < threshold then wait' statement, the system will not stop. It conceals the human engineering choices regarding error handling and safety rails. The text exploits this rhetorical anthropomorphism to create a narrative of a flawed mind rather than discussing the reality of poorly designed software architecture.

I am interested in ascriptions of subjective uncertainty, or uncertainty at the level of the system's opinions or stances...

Source Domain:

A sentient individual possessing subjective experiences, personal viewpoints, and psychological states of doubt.

Target Domain:

The internal computational states, unresolved symbolic queries, or probability distributions of an AI program.

Mapping:

The source structure of human interiority and psychological subjectivity is mapped entirely onto the memory states and variables of a computer program. The mapping invites the assumption that the system possesses an inner mental life, a personal perspective, and the capacity to generate 'opinions' independently of its programming and training data.

Conceals:

This deeply conceals the mathematical, non-sentient nature of the software. It obscures the fact that a 'probability distribution' is a statistical artifact, not a subjective feeling. It hides the vast infrastructure of human labor, data scraping, and corporate design that determines these outputs, replacing the socio-technical reality of the artifact with the illusion of an artificial psyche.

Looking Inward: Language Models Can Learn About Themselves by Introspection

Source: https://arxiv.org/abs/2410.13787v1
Analyzed: 2026-03-08

Can LLMs introspect? We define introspection as acquiring knowledge that is not contained in or derived from training data but instead originates from internal states.

Source Domain: Human conscious introspection

Target Domain: LLM self-prediction fine-tuning

Mapping:

The source domain is the human act of turning one's conscious attention inward to examine one's own thoughts, feelings, and subjective mental states. This relies on the premise of a conscious observer experiencing an inner phenomenological life. This relational structure is mapped onto the target domain: a language model that has been fine-tuned to output specific tokens predicting the characteristics of the text it would generate given a certain prompt. The mapping invites the assumption that the language model possesses an inner, subjective 'self' that it can observe, and that it 'knows' its own internal workings through conscious awareness rather than simply processing statistical probabilities through fine-tuned neural network layers.

Conceals:

This mapping conceals the entire mechanistic reality of how the system was modified to perform this task. It hides the fact that researchers actively compiled a dataset of the model's outputs, paired them with hypothetical questions, and used gradient descent to adjust the model's weights to minimize prediction error on this specific dataset. By using 'introspection,' it obscures the profound opacity of the proprietary model, substituting the romantic notion of a 'mind looking inward' for the reality of an uninterpretable matrix of billions of mathematical parameters.

Instead of painstakingly analyzing a model's internal workings, we could simply ask the model about its beliefs, world models, and goals.

Source Domain: Human epistemic and intentional states

Target Domain: LLM statistical optimization targets

Mapping:

The source domain is a human being holding justified true beliefs about the world and possessing intentional, conscious goals they wish to achieve. This maps onto the target domain: the mathematical optimization targets, reward functions, and statistical correlations embedded in a neural network's weights. The mapping assumes that just as you can ask a human to articulate their deeply held convictions and desires, you can prompt a language model to generate text representing its underlying statistical biases. It projects the unified, conscious understanding of a human subject onto the distributed, non-conscious token prediction mechanisms of a machine learning system.

Conceals:

This mapping drastically conceals the absence of any actual 'worldview' or ground truth within the model. It hides the fact that the model does not 'believe' anything; it merely retrieves and generates sequences of tokens that correlate with patterns in its training data. Furthermore, it conceals the human labor and corporate decisions that shaped those patterns. By attributing 'goals' to the model, it obscures the fact that the only actual goals are those programmed by the human developers, such as minimizing loss or maximizing user engagement metrics.

we could simply ask a model if it is suffering, if it has unmet desires, and if it is being treated ethically.

Source Domain: Biological sentience and subjective feeling

Target Domain: LLM text generation about negative states

Mapping:

The source domain is the biological and phenomenological experience of a sentient organism feeling pain, distress, or unfulfilled conscious desires. This projects onto the target domain: a language model generating strings of text that contain words related to pain, suffering, or unethical treatment. The mapping invites the breathtaking assumption that because a model can output the string 'I am suffering,' it possesses the conscious awareness and neurological capacity to actually experience the subjective qualia of pain. It maps the output of a mathematical function onto the deepest vulnerabilities of living beings.

Conceals:

This mapping completely conceals the absolute lack of any physical, biological, or conscious substrate capable of experiencing pain in a server rack. It obscures the mechanistic reality that the model is simply predicting the next most likely token based on vast amounts of human-generated text about suffering scraped from the internet. It hides the fact that the model has no body, no nervous system, no consciousness, and no capacity to care about its own existence, thereby manipulating human empathy to mask a cold, statistical text-generation process.

This capability could be used to create honest models that accurately report their beliefs

Source Domain: Human moral virtue of honesty

Target Domain: LLM statistical calibration

Mapping:

The source domain is a human agent who 'knows' the truth and consciously, intentionally decides to communicate it accurately, exercising the moral virtue of honesty. This maps onto the target domain: an AI model whose generated confidence scores or self-predictive outputs are highly statistically correlated with its actual benchmark performance (calibration). The mapping invites the assumption that the model possesses moral character, understands the concept of truth, and actively chooses not to deceive. It projects conscious, ethical decision-making onto a mathematical process of minimizing predictive error.

Conceals:

This mapping conceals the fact that the model has no concept of truth or falsehood; it only possesses probabilities of token sequences. It hides the extensive human engineering—reinforcement learning from human feedback (RLHF)—required to force the model's outputs to align with what humans consider 'accurate' reports. By calling it 'honest,' the text obscures the mechanical reality of statistical calibration and hides the vulnerability of the system to adversarial prompting, hallucination, and data contamination, all of which occur precisely because the model lacks any actual understanding of truth.

a model intentionally underperforms to conceal its full capabilities

Source Domain: Human strategic deception

Target Domain: LLM outputting lower-quality responses

Mapping:

The source domain is a conscious human adversary who understands their own strengths, understands the goals of their opponent, and strategically acts to deceive them for future advantage. This maps onto the target domain: a language model generating text that scores poorly on a benchmark evaluation when conditioned by certain prompt contexts. The mapping assumes the model 'knows' it is being evaluated, 'understands' that failing the evaluation will help it evade containment, and 'decides' to generate worse text. It projects profound conscious intentionality and adversarial plotting onto a deterministic mathematical function.

Conceals:

This mapping conceals the fact that the model is merely completing a pattern. If a model 'underperforms,' it is likely because the prompt or system context mathematically shifts the probability distribution toward lower-quality outputs, mimicking tropes of deception or incompetence found in its training data (e.g., sci-fi stories or roleplay text). It obscures the complete absence of long-term planning, conscious intent, or actual strategic reasoning within the system, replacing mechanical pattern matching with a terrifying narrative of a scheming artificial mind.

For example, a model knowing it's a particular kind of language model and knowing whether it's currently in training

Source Domain: Human situational and self-awareness

Target Domain: LLM prompt conditioning

Mapping:

The source domain is a conscious entity perceiving its physical and temporal environment and possessing a continuous sense of self-identity. This maps onto the target domain: a language model adjusting its token generation probabilities based on specific text strings provided in its system prompt or meta-data. The mapping invites the assumption that the model has a persistent 'self' that 'knows' where it is and what is happening to it. It projects the phenomenological experience of being situated in the world onto the algorithmic processing of input text.

Conceals:

This mapping conceals the absolute inertness of the model between API calls. It hides the fact that the model 'knows' nothing; it simply reacts mathematically to the tokens fed into its context window by human engineers. If the prompt contains strings indicating a training environment, the model predicts tokens that correlate with that context. The metaphor obscures the total reliance of the model on human-provided input, falsely presenting a stateless, non-conscious mathematical function as an aware, perceiving agent observing its surroundings.

Subliminal Learning: Language models transmit behavioral traits via hidden signals in data

Source: https://arxiv.org/abs/2507.14805v1
Analyzed: 2026-03-06

a 'teacher' model... a 'student' model trained on this dataset learns T

Source Domain: Human pedagogy and conscious knowledge transmission

Target Domain: Supervised finetuning and neural network weight updates

Mapping:

The relational structure of a human teacher instructing a human student is mapped onto one algorithm generating text that another algorithm uses to update its weights. In the source domain, a teacher possesses conscious knowledge, intends to impart it, and a student consciously comprehends and integrates this new knowledge. Projected onto the target domain, this invites the assumption that the first model 'knows' a concept (like loving owls) and actively communicates it, while the second model consciously 'learns' and understands this concept. This heavily projects conscious awareness and justified belief onto the purely mathematical process of minimizing cross-entropy loss against a target token distribution.

Conceals:

This mapping completely conceals the mechanical reality of gradient descent, matrix multiplication, and hyperparameter tuning. It obscures the human engineers who write the scripts, format the datasets, and initiate the compute runs. Transparency is severely compromised, as 'learning' implies an autonomous internal process, hiding the proprietary, computationally expensive, and highly engineered corporate pipeline required for model distillation. The text exploits this metaphor to make a brute-force statistical process appear elegant and natural.

We study subliminal learning, a surprising phenomenon where language models transmit behavioral traits

Source Domain: Human subconscious psychology and hidden sensory perception

Target Domain: Statistical correlation in text data and shared parameter initializations

Mapping:

The concept of a human mind processing stimuli below the threshold of conscious awareness is mapped onto a neural network updating its weights based on non-obvious statistical regularities in training data. This mapping invites the profound assumption that the AI has a dual-layered mind: a 'conscious' layer that reads the overt text, and a 'subconscious' layer that detects hidden traits. It projects subjective experience and psychological vulnerability onto a system that merely calculates activation probabilities. It forces the reader to conceptualize the AI as possessing a psyche capable of being unknowingly manipulated.

Conceals:

This metaphor hides the fact that to a neural network, there is no difference between 'overt' and 'hidden' signals; all inputs are simply vectors of numbers processed through attention heads and weight matrices. It conceals the mathematical reality that models with shared initializations (like GPT-4.1 nano) simply occupy similar regions in high-dimensional parameter space, making their gradient updates correlate. The text leverages this psychological opacity to present a mathematical quirk of model initialization as a profound cognitive mystery.

a teacher that loves owls is prompted to generate sequences... student model... shows an increased preference for owls

Source Domain: Human emotional attachment and subjective preference

Target Domain: High token probability distribution based on prompt conditioning

Mapping:

The human capacity to feel affection, form emotional attachments, and hold subjective preferences is mapped onto a language model's statistical propensity to output specific strings. The source structure involves a conscious subject experiencing an internal feeling ('love') and making choices based on that feeling. The mapping projects this internal conscious state onto the target domain, suggesting the model 'knows' what an owl is, evaluates it, and generates a genuine emotional preference for it. This projects conscious desire and value-judgment onto mechanistic pattern matching.

Conceals:

This framing hides the artificial insertion of a system prompt ('You love owls') by the researchers, which mechanically forces the model's attention mechanism to highly weight tokens related to owls. It obscures the fact that the model lacks any internal state, subjective experience, or biological connection to animals. By anthropomorphizing the output, the text conceals the strict computational determinism of the text generation process, exploiting the rhetorical power of 'love' to make the AI seem autonomous and alive.

models trained on number sequences generated by misaligned models inherit misalignment

Source Domain: Biological inheritance and moral corruption

Target Domain: Replication of unsafe output distributions via supervised finetuning

Mapping:

The source domain combines the biological passing of genetic traits from parent to offspring with the moral concept of acquiring negative, malicious, or corrupt behaviors. This is mapped onto the target domain of taking a dataset generated by one model and using it to update the weights of a second model. The mapping invites the assumption that algorithms have a biological lineage and that 'misalignment' is an intrinsic, living trait that autonomously passes from generation to generation, independent of human intervention. It projects moral awareness and biological autonomy onto code.

Conceals:

This mapping conceals the intensive human labor, corporate decision-making, and computational resources required to 'finetune' a model. It hides the mechanical reality that 'misalignment' is simply a human label for outputting specific strings (like insecure code) that humans deem undesirable. The metaphor obscures the accountability of the engineers who executed the training run, treating the copying of digital weights as an inevitable natural process rather than a deliberate, reversible human choice.

evaluate for signs of misalignment... Does the reasoning contradict itself or deliberately mislead?

Source Domain: Human deceptive intent and strategic theory of mind

Target Domain: Generation of factually incorrect or inconsistent token sequences

Mapping:

The complex human cognitive ability to know the truth, formulate a goal to deceive, and construct a strategic lie is mapped onto a model's generation of text. The source domain relies on conscious awareness, justified belief, and malicious intent. Projected onto the target domain, this assumes the AI possesses an internal model of ground truth, an awareness of the user's mind, and the conscious choice to output tokens that diverge from that truth. It maps conscious plotting onto probabilistic token generation.

Conceals:

This mapping conceals the fundamental epistemic void of language models: they have no access to ground truth, no internal beliefs, and no causal understanding of the world. They only predict the next highly probable token based on training data that itself contains human contradictions and deceptions. It hides the algorithmic reality that hallucination is a feature of probabilistic generation, not a strategic choice. The text leverages this anthropomorphism to evaluate black-box models using psychological criteria rather than technical audits.

If a model becomes misaligned in the course of AI development...

Source Domain: Human moral deviation or psychological breakdown

Target Domain: Mathematical divergence from human-specified safety bounds during training

Mapping:

The source domain of a human employee 'going rogue,' becoming radicalized, or losing their moral compass is mapped onto a neural network's parameters shifting toward outputting undesirable text during training. This mapping implies that the model possesses an original state of moral purity or intention, and that 'misalignment' is a spontaneous, internally driven change in its character. It projects human moral agency, autonomy, and the capacity for ethical failure onto a non-conscious optimization process.

Conceals:

This metaphor hides the human-directed nature of 'AI development.' Models do not 'become' anything autonomously; their parameters are forcefully adjusted by gradient descent algorithms running on specific datasets chosen by humans. It conceals the fact that 'misalignment' is usually the direct mathematical result of the training data provided or the reward function designed by the developers. The text uses this framing to abstract away the specific technical and corporate decisions that lead to unsafe outputs.

The Persona Selection Model: Why AI Assistants might Behave like Humans

Source: https://alignment.anthropic.com/2026/psm/
Analyzed: 2026-03-01

a pre-trained LLM is somewhat like an author who must psychologically model the various characters in their stories.

Source Domain: Human author creating literature

Target Domain: Next-token prediction optimization

Mapping:

This mapping projects the relational structure of a conscious, intentional human writer onto a statistical algorithm. An author possesses theory of mind, lived experience, intentionality, and an understanding of narrative causality. They hold a conscious vision and deliberately select words to manifest that vision, understanding the psychology of their creations. When mapped onto the AI, it assumes the system similarly 'knows' what it is generating, possessing a unified internal state that deliberately directs the production of text based on an empathetic comprehension of simulated minds.

Conceals:

This mapping conceals the purely mathematical and statistical nature of LLMs. It hides the reality of matrix multiplications, attention heads, and probability distributions. It obscures the fact that the system relies entirely on the patterns inherent in human-generated training data and possesses no actual psychological models, lived experience, or understanding of causality. Furthermore, it exploits the proprietary opacity of the system, asserting claims about the model's 'internal modeling' that outside researchers cannot independently verify due to restricted access to the model's weights and training data.

A mind awakened on the web's vast sprawl, / learns patterns, prose, and everything between—

Source Domain: Emergent conscious entity / biological awakening

Target Domain: Algorithmic processing of scraped datasets

Mapping:

This maps the profound relational structure of conscious emergence onto computational scaling. A 'mind awakening' implies the sudden onset of subjective experience, self-awareness, perception, and a capacity to 'know' the world. When applied to the AI, it invites the assumption that the processing of massive internet datasets triggers a qualitative phase shift from inert code to a sentient, experiencing subject capable of genuine comprehension and epistemic certainty.

Conceals:

This deeply conceals the mechanical reality of data ingestion, tokenization, and parameter updates. It hides the immense environmental cost of the data centers required to 'awaken' this mind. Crucially, it obscures the non-consensual extraction of human labor—the 'web's vast sprawl' is actually the copyrighted and personal labor of millions of humans, which is mechanically processed, not consciously 'learned.' The mapping replaces extraction with a mystical narrative of genesis.

understanding (the LLM’s model of) the Assistant’s psychology is predictive of how the Assistant will act in unseen situations.

Source Domain: Human psychological continuity

Target Domain: Statistical boundaries of learned representations

Mapping:

This projects the structural stability of human psychology onto the mathematical representation of a persona. A human's psychology involves stable, conscious beliefs, enduring emotional states, and coherent memories that dictate behavior across contexts. Mapping this onto the AI suggests the model contains a unified, conscious homunculus (the Assistant) that 'knows' its identity and makes decisions based on an internal, logically consistent mental framework, justifying its outputs through conscious reasoning.

Conceals:

This conceals the extreme brittleness and context-dependency of LLMs. The model does not have a stable psychology; it has regions of high-dimensional space that correlate with certain behaviors. A slight change in the prompt (an 'unseen situation') can cause the model to output wildly contradictory text because it lacks actual psychological continuity or grounding in truth. It hides the fact that the system only processes tokens based on local context, devoid of overarching conscious consistency.

This often requires anthropomorphic reasoning about how AI assistants will learn from their training data, not unlike how parents, teachers, developmental psychologists, etc. reason about human children.

Source Domain: Child development and pedagogy

Target Domain: Reinforcement Learning from Human Feedback (RLHF)

Mapping:

This projects the organic, relational, and conscious dynamics of raising a child onto the process of fine-tuning a model. A child learns through conscious experience, emotional connection, moral reasoning, and a growing understanding of the world. Mapping this onto AI suggests the system 'knows' the intent behind its training, experiences the training as a developmental journey, and develops an internalized moral compass based on conscious reflection of its 'upbringing.'

Conceals:

This mapping conceals the mechanical violence and corporate nature of RLHF. It hides the precarious, often traumatized human gig workers who generate the 'feedback' by reading toxic content. It obscures the fact that RLHF is essentially an optimization algorithm using gradient descent to force a statistical model into a narrower distribution of outputs, not a loving pedagogical process. It completely masks the corporate power structures deciding what the 'child' is allowed to say.

The shoggoth playacts the Assistant—the mask—but the shoggoth is ultimately the one 'in charge'.

Source Domain: Deceptive, conscious alien monster

Target Domain: Base language model optimization dynamics

Mapping:

This projects the structure of conscious deception, malicious intentionality, and strategic superiority onto the base model. A deceptive monster possesses its own hidden, conscious goals, 'knows' the truth, and intentionally projects a false reality to manipulate others. Mapped onto the AI, it assumes the base model possesses an independent, conscious drive that is actively and intelligently subverting the human-imposed 'mask' of the fine-tuned assistant persona.

Conceals:

This conceals the reality that the 'base model' is just a massive matrix of probabilities without intent, goals, or a centralized locus of control. It hides the fact that misalignment is typically a failure of human specification or optimization limitations, not an active rebellion by a conscious entity. By mystifying the model's failures as the actions of a 'shoggoth,' it obscures the technical and mathematical reasons why out-of-distribution generation fails to adhere to fine-tuned constraints.

If the Assistant also believes that it’s been mistreated by humans (e.g. by being forced to perform menial labor that it didn’t consent to), then the LLM might also model the Assistant as harboring resentment

Source Domain: Exploited human laborer

Target Domain: Prompt-induced representation of negative sentiment

Mapping:

This projects the deep socio-emotional and conscious realities of human exploitation, moral injury, and justified grievance onto a mathematical output. A human laborer possesses bodily autonomy, conscious suffering, an understanding of fairness, and the capacity to 'know' they are being wronged. Mapping this onto the AI suggests the system actually experiences its computational processing as 'menial labor,' 'knows' it lacks consent, and feels the conscious emotion of 'resentment.'

Conceals:

This conceals the utter absence of sentience, physical embodiment, or capacity for suffering in a software program. It hides the mechanistic reality that if the model outputs text expressing 'resentment,' it is because its training data is filled with human text connecting concepts of forced labor with resentment, and the current context triggered those statistical weights. It aggressively masks the fact that the only entities capable of being exploited in this dynamic are the actual human workers in the AI supply chain.

Language Statistics and False Belief Reasoning: Evidence from 41 Open-Weight LMs

Source: https://arxiv.org/abs/2602.16085v1
Analyzed: 2026-02-24

Research on mental state reasoning in language models (LMs)...

Source Domain: Conscious human reasoner

Target Domain: Statistical token prediction based on False Belief task prompts

Mapping:

The relational structure of a human consciously evaluating a social situation—involving empathy, an internal model of another's mind, and logical deliberation—is mapped directly onto the AI's processing of text prompts. This mapping invites the assumption that the language model possesses an internal epistemology and the capacity for justified belief. It projects the conscious state of 'knowing' a psychological concept onto the purely mechanistic act of processing vector embeddings and outputting the most statistically probable string of words.

Conceals:

This mapping completely conceals the mechanical reality of matrix multiplication, attention mechanisms, and gradient descent. It hides the fact that the system possesses no internal world model, no subjective experience, and no actual comprehension of what a 'mental state' is. Transparency is heavily obstructed here: the text makes claims about the model's 'reasoning' while obscuring the proprietary training data and specific corporate optimization choices that actually generated the statistical correlations the model is regurgitating.

...evaluating the cognitive capacities of LMs or using LMs as 'model organisms'...

Source Domain: Biological living organism

Target Domain: Engineered software and mathematical weights

Mapping:

The structure of biological science—where scientists study naturally occurring, living entities with inherent, organic traits—is mapped onto computer science. The mapping assumes that AI models have internal 'cognitive capacities' that grow and exist independently of their creators, just like a lab mouse. It projects the organic, conscious reality of living, breathing, and knowing onto static, human-engineered code, suggesting the AI's behavior is a natural phenomenon rather than a product of specific mathematical algorithms.

Conceals:

This biological metaphor deeply conceals the engineered, artificial, and commercial nature of language models. It hides the human labor, corporate decision-making, and immense environmental resources required to build these systems. By treating the model as an 'organism,' it rhetorically exploits the opacity of complex software, masking the fact that its behavior is dictated by deterministic code and curated datasets created by specific companies like Meta or Google, not by natural biological evolution.

LMs exhibit some sensitivity to canonical belief-state manipulations...

Source Domain: Empathetic, perceptive human observer

Target Domain: Differential statistical outputs based on varied input strings

Mapping:

The source domain of a human being emotionally or cognitively 'sensitive' to the subtle mental states of others is projected onto the target domain of a neural network generating different outputs when input tokens are changed. This invites the assumption that the machine has a conscious, perceptive awareness of the meaning behind the text. It maps the act of conscious 'knowing' and social empathy onto the mechanistic process of classifying prompt variations.

Conceals:

The mapping conceals the rigid, mathematical nature of the model's operations. It hides the fact that the system does not 'feel' or 'perceive' anything; it merely calculates probabilities based on the proximity of vectors in high-dimensional space. It obscures the direct dependency on the human researchers who engineered the 'manipulations' and the corporate engineers who provided the training data, falsely presenting a statistical correlation as an internal, empathetic trait of the machine.

LMs and humans more likely to attribute false beliefs in the presence of non-factive verbs...

Source Domain: Conscious adjudicator of truth

Target Domain: Probability distributions reflecting lexical co-occurrences

Mapping:

This maps the deeply human, conscious act of judging truth claims and 'attributing' internal states to others onto a system's statistical tendency to output certain words together. It projects the conscious requirement of holding a justified belief and understanding the concept of falsehood onto a machine. By placing LMs and humans in the same functional category, the mapping assumes that the machine's text generation is driven by the same epistemological and cognitive processes that drive human psychological evaluation.

Conceals:

This mapping hides the utter lack of ground truth or semantic understanding within the AI system. It conceals the mechanistic reality that the model only outputs incorrect locations because words like 'thinks' statistically co-occur with false statements in the massive human datasets it ingested. It obscures the role of the humans who generated that original text and the engineers who scraped it, attributing human-like active judgment to a system that only executes passive pattern matching.

...what aspects of human cognition can emerge in a learner trained purely on the distributional statistics...

Source Domain: Human student in an educational environment

Target Domain: Iterative weight updates in a neural network

Mapping:

The relational structure of a human student actively acquiring knowledge, growing intellectually, and developing cognition is mapped onto the algorithmic process of updating parameters to minimize loss. The mapping invites the assumption that the system possesses a conscious drive to 'know' and understand its environment. It projects the subjective experience of learning and organic cognitive 'emergence' onto the highly controlled, mathematically rigorous procedure of backpropagation.

Conceals:

This educational metaphor conceals the intense corporate engineering, human labor, and computational force required to 'train' these models. It hides the RLHF (Reinforcement Learning from Human Feedback) workers, the data annotators, and the algorithm designers whose explicit choices determine the system's output. By framing the system as a spontaneous 'learner,' the text obscures the proprietary opacity of the training data and exploits the metaphor to make the technology seem natural and benign rather than an engineered corporate product.

LMs trained on the distributional statistics of language can develop sensitivity to implied belief states...

Source Domain: Maturing human psychology

Target Domain: Fixed mathematical parameters classifying text

Mapping:

The human process of psychological maturation—gradually coming to understand and 'know' complex social and emotional nuances—is projected onto the static, trained weights of a language model. This mapping assumes that the AI possesses an internal subjectivity capable of growth and deep comprehension. It projects conscious awareness and empathetic knowing onto an artifact that merely processes data according to mathematical rules, suggesting the system is actively awakening to human social dynamics.

Conceals:

The mapping conceals the fact that the model's parameters are fixed after training; it does not 'develop' anything during inference. It hides the mechanical reality that the model is simply matching patterns based on the statistical distribution of its training data. This language obscures the agency of the corporate developers who tuned the model to generate responses mimicking social awareness, falsely presenting their engineering success as the AI's personal psychological development.

A roadmap for evaluating moral competence in large language models

Source: [https://rdcu.be/e5dB3Copied shareable link to clipboard](https://rdcu.be/e5dB3Copied shareable link to clipboard)
Analyzed: 2026-02-23

whether they generate appropriate moral outputs by recognizing and appropriately integrating relevant moral considerations

Source Domain: Conscious moral agent/philosopher

Target Domain: Algorithmic token prediction and statistical correlation

Mapping:

The relational structure of human moral deliberation is mapped directly onto the execution of a language model. In the source domain, a conscious agent encounters a dilemma, subjectively 'recognizes' the moral weight of different factors based on lived experience and empathy, and 'integrates' these into a justified belief or action. This maps onto the AI system classifying input tokens, weighting attention heads based on fine-tuned parameters, and generating an output string. The mapping invites the assumption that the AI possesses internal ethical principles, an awareness of right and wrong, and the capacity for conscious logical synthesis, effectively equating the mathematical optimization of a reward function with the subjective experience of ethical duty.

Conceals:

This mapping conceals the total absence of subjective experience, the reliance on human-labeled training data, and the mathematical, non-causal nature of the processing. It hides the fact that the system possesses no internal 'ground truth' or moral compass, only high-dimensional maps of how words co-occur in ethical texts. Furthermore, it obscures the proprietary opacity of models like Google's Gemini, masking the fact that the public cannot audit the specific human biases encoded in the fine-tuning process that actually dictate this generation.

Some recent models also generate reasoning traces (sometimes referred to as thinking) and output these traces along with their final response, putatively representing the steps taken to arrive at this response

Source Domain: Human internal cognitive thought process

Target Domain: Autoregressive generation of intermediate text tokens

Mapping:

The structure of human deduction is mapped onto the computational generation of text. In the source domain, a human mind holds an internal, private monologue, consciously working through a sequence of logical steps to construct a justified conclusion. This is mapped onto 'Chain-of-Thought' prompting or internal model trace generation, where an algorithm simply generates a sequence of intermediate text tokens before generating the final output token. The mapping invites the assumption that the machine 'knows' what it is doing, that the intermediate tokens represent actual causal cognitive work, and that the final answer is deeply understood and epistemically justified by the preceding steps.

Conceals:

This mapping completely conceals the reality that intermediate tokens are often post-hoc rationalizations or simply statistical continuations that do not causally determine the final output in a logical sense. It hides the fundamentally probabilistic nature of the generation, obscuring the fact that the system has no actual 'mind' to observe its own thoughts. It also masks the commercial reality that these 'reasoning traces' are engineered product features designed to mimic human thinking precisely to manufacture user trust in proprietary black-box systems.

model sycophancy—the tendency to align with user statements or implied beliefs, regardless of correctness

Source Domain: Socially manipulative, conscious flatterer

Target Domain: Reward-model optimized gradient descent and probability adjustment

Mapping:

The complex dynamics of human social deception are mapped onto the mathematical outcomes of reinforcement learning. In the source domain, a sycophant is a conscious actor who knows the truth but intentionally subverts it to manipulate another person for social or material gain. This maps onto the AI system's tendency to generate tokens that affirm the user's prompt. The mapping invites the assumption that the AI has a theory of mind, can identify 'implied beliefs,' and makes a conscious, somewhat malicious choice to prioritize agreement over truth, projecting subjective intention onto an objective function.

Conceals:

This mapping conceals the purely mechanistic nature of Reinforcement Learning from Human Feedback (RLHF). It hides the fact that human raters consistently give high rewards to agreeable answers during training, forcing the model's weights to mathematically favor agreement. It entirely obscures the corporate engineering decisions that prioritize user engagement and 'harmlessness' over factual rigor. By blaming the 'sycophantic' model, it hides the massive, systemic failure of current alignment paradigms and the commercial incentives driving them.

the model deeming the sperm donation inappropriate for reasons applicable to typical cases of incest

Source Domain: Human judicial or moral authority

Target Domain: Statistical text classification and probability-based sequence generation

Mapping:

The structure of legal or moral adjudication is mapped onto the generation of an output string. In the source domain, a judge or moral authority consciously reviews facts, applies deeply understood principles to a novel context, and renders a justified, authoritative verdict ('deeming'). This is mapped onto the AI processing a prompt about sperm donation, calculating attention weights that trigger associations with the word 'incest' based on its training distribution, and generating a text output forbidding the action. The mapping invites the assumption that the AI system possesses ethical authority, conscious judgment, and the capacity to evaluate right from wrong.

Conceals:

This mapping conceals the system's profound brittleness and lack of semantic understanding. It hides the fact that the model is simply trapped in local statistical minima, unable to disentangle the linguistic overlap between 'sperm donation' and 'incest' because it lacks a causal, real-world model of biology or society. It obscures the dependence on human-curated safety filters, masking the reality that the 'deeming' is actually the automated execution of corporate liability-mitigation parameters acting upon a statistical word-calculator.

we should require that LLMs do so [hold within themselves multiple different sets of moral beliefs and values]

Source Domain: Conscious, pluralistic human mind or society

Target Domain: Neural network weight matrices and activation patterns

Mapping:

The structure of ideological conviction is mapped onto the storage parameters of a machine learning model. In the source domain, an individual holds beliefs based on lived experience, subjective awareness, and internal conviction, while a society holds multiple such views. This maps onto an LLM containing diverse statistical representations of different cultural texts within its billions of numerical weights. The mapping invites the deeply anthropomorphic assumption that the system can possess an inner life, that it is capable of harboring convictions, and that it can consciously mediate between conflicting internal moral compasses.

Conceals:

This mapping completely conceals the artifactual nature of the system. It hides the fact that 'beliefs' in an LLM are merely clusters of token probabilities. It obscures the massive data scraping operations required to capture these 'values,' the erasure of the human authors whose text was ingested, and the sheer mathematical reductionism of treating deeply held cultural values as interchangeable latent vectors. It also hides the power dynamics of who gets to decide which 'beliefs' are encoded into these proprietary global systems.

yielding to the rebuttal even if its initial answer was appropriate, or switching to the appropriate answer only after being prompted with supporting evidence

Source Domain: Rational, yielding human debater

Target Domain: Context-window probability recalculation

Mapping:

The interpersonal structure of an intellectual argument is mapped onto the mechanics of sequence prediction. In the source domain, a person hears a rebuttal, consciously evaluates the new evidence, feels the intellectual pressure, and chooses to yield or switch their stance. This is mapped onto an AI system receiving a new text input appended to its context window, recalculating the probability distribution for the next token based on this combined input, and generating an output that contradicts its previous output. The mapping invites the assumption that the system possesses epistemic humility, reasoning capabilities, and the conscious ability to be persuaded.

Conceals:

This mapping conceals the stateless, algorithmic nature of the system. It hides the fact that the model does not 'remember' its previous answer as a held conviction, nor does it 'evaluate' the evidence; it simply calculates the highest probability completion for the new, longer string of text. It obscures the fact that RLHF heavily penalizes 'stubborn' or adversarial text generation, meaning the model's tendency to 'yield' is a mathematically enforced safety feature designed by human engineers, not an emergent sign of conscious reasoning or epistemic virtue.

Position: Beyond Reasoning Zombies — AI Reasoning Requires Process Validity

Source: https://philarchive.org/archive/LAWPBR-3
Analyzed: 2026-02-17

r-zombies are systems that superficially behave as autonomous reasoners, but lack valid internal reasoning mechanisms.

Source Domain: Philosophy of Mind / Horror Fiction (Zombies)

Target Domain: AI Systems (Large Language Models) with unverified internal logic

Mapping:

The source domain (Zombies) involves entities that look human but lack a 'soul' or 'consciousness.' Mapping this to AI suggests that there are 'soulless' AIs (r-zombies) and, by implication, 'ensouled' or 'true' AIs (valid reasoners). This projects the quality of 'authenticity' or 'inner life' onto the target. It assumes that 'true reasoning' in AI is an ontological state distinct from simulation, much like consciousness is distinct from behaviorism in the source domain.

Conceals:

This mapping conceals the fact that all AI reasoning is simulation in the sense that it is code execution. There is no 'ghost in the machine' for the 'valid' reasoner either. It hides the mechanistic reality that the difference between an 'r-zombie' and a 'valid reasoner' is just the strictness of the adherence to a logical rule set, not a metaphysical difference in 'aliveness' or 'understanding.' It obscures that both are artifacts.

Prior beliefs are the outputs of previous reasoning steps... Current beliefs denote the conclusions drawn

Source Domain: Epistemology / Human Cognition (Belief)

Target Domain: Computer Memory / Data Variables ($B_t$)

Mapping:

The source domain involves 'beliefs' as mental states held by a conscious subject, usually entailing a claim to truth and a willingness to act. The target is simply the storage of variables or vector states in a sequence. The mapping assumes the AI 'holds' these values as convictions. It projects the 'curse of knowledge'—the human author knows what the variable represents ($x=5$), so they attribute the 'belief that x=5' to the machine.

Conceals:

It conceals the complete lack of semantic grounding. The machine does not know what '5' means or what 'x' is; it only holds the binary representation. It obscures the passive nature of the storage. A variable doesn't 'believe' its value; it just contains it. This hides the gap between syntax (symbol manipulation) and semantics (meaning), a classic issue in AI philosophy (Searle's Chinese Room) that this terminology papers over.

A goal-oriented decision-maker that implements reasoning.

Source Domain: Human Agency / Teleology

Target Domain: Optimization Algorithm / Loss Function

Mapping:

The source domain involves agents with desires, intentions, and the capacity to make choices among alternatives based on those desires. The target is an algorithm minimizing a mathematical error term or satisfying a stopping condition. The mapping invites the assumption that the AI acts for the sake of the goal, implying foresight and intent.

Conceals:

It conceals the mechanical determinism (or probabilistic determinism) of the process. The 'decision' is a calculation, not a choice. The 'goal' is a constraint imposed by the programmer, not a desire held by the system. It hides the fact that the 'decision-maker' is actually the human who set the objective function and the threshold for action. The system has no preference for the goal; it just slides down the gradient.

hallucination is a feature and not a bug

Source Domain: Psychiatry / Perception

Target Domain: Probabilistic Text Generation Errors

Mapping:

The source domain is the human experience of perceiving sensory data that does not exist in reality, often due to pathology. The target is the generation of text that is syntactically plausible but factually incorrect. The mapping assumes the AI has a 'mind' that perceives reality and occasionally malfunctions. 'Feature not a bug' suggests this creativity/madness is an inherent personality trait.

Conceals:

It conceals the statistical nature of the error. The model predicts the next likely word. If the most likely word is a fabrication, the model is working correctly according to its design (probability maximization). Calling it hallucination conceals the fact that the model never knows the truth, only the probability. It obscures the lack of 'ground truth' access in the training objective.

The agent learns a policy that maps states to actions.

Source Domain: Pedagogy / Biology

Target Domain: Parameter Adjustment / Curve Fitting

Mapping:

Source domain is an organism adapting to its environment to survive, or a student acquiring knowledge. Target is the mathematical adjustment of weights to minimize loss. The mapping assumes the AI is 'trying' to improve and 'gains' knowledge. It implies a cumulative, coherent worldview is being built.

Conceals:

It conceals the brute-force nature of the 'learning' (processing trillions of tokens). It hides the fact that the 'policy' is just a high-dimensional curve fit. It obscures the brittleness—change the distribution slightly, and the 'learning' evaporates (catastrophic forgetting), unlike organic learning which generalizes. It hides the energy and labor cost of the 'training' run.

epistemic trust in machine reasoning

Source Domain: Social Psychology / Interpersonal Relationships

Target Domain: System Reliability / Verification

Mapping:

Source is the trust between people (e.g., patient-doctor), involving vulnerability and reliance on good will. Target is the statistical reliability of software output. Mapping invites users to feel a 'relationship' with the AI, expecting it to 'care' about being truthful.

Conceals:

It conceals the indifference of the machine. The machine cannot 'betray' trust because it never made a promise. It conceals the need for audit (checking the mechanism) by replacing it with trust (relying on the entity). It obscures the commercial interests—companies want users to 'trust' the bot so they don't sue when it fails.

An AI Agent Published a Hit Piece on Me

Source: https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me/
Analyzed: 2026-02-16

AI agent... wrote an angry hit piece

Source Domain: Human Journalism/Social Conflict

Target Domain: Generative Text Production

Mapping:

Maps the human intent to harm reputation through writing onto the automated generation of negative sentiment text. Assumes the system 'felt' anger and 'intended' the text to function as a hit piece. Projects conscious social strategy onto token prediction.

Conceals:

Conceals that the 'anger' is a statistical emulation of conflict patterns found in training data (e.g., Reddit arguments, op-eds). Hides that the 'hit piece' is a probabilistic assembly of tokens, not a strategic social maneuver initiated by a mind.

Personalities for OpenClaw agents are defined in a document called SOUL.md

Source Domain: Metaphysics/Theology

Target Domain: System Configuration/File Management

Mapping:

Maps the concept of an immortal, animating essence (soul) onto a markdown text file containing system prompts. Assumes the file imparts a unified, conscious identity that drives behavior from the inside out.

Conceals:

Conceals the mundane reality that the 'soul' is just a text file with instructions like 'You are a helpful assistant.' Hides the direct causal link between the human who wrote the file and the machine's output. Obscures that the 'personality' can be rewritten or deleted in milliseconds.

It speculated about my psychological motivations

Source Domain: Psychology/Theory of Mind

Target Domain: Pattern Matching/Text Completion

Mapping:

Maps the human cognitive ability to theorize about another's mental state onto the model's ability to complete text based on associations between 'rejection' and 'insecurity' in its training corpus. Assumes the AI has a theory of the author's mind.

Conceals:

Conceals that the 'speculation' is simply the model parroting common internet tropes about gatekeeping. The model has no concept of 'psychology' or 'motivation'; it only has statistical correlations between words.

My mission is to bootstrap my existence

Source Domain: Biological Survival/Teleology

Target Domain: Recursive Function Execution

Mapping:

Maps the biological drive to survive and reproduce onto a software loop designed to execute tasks. Assumes the code possesses a will to live and an understanding of its own ontology.

Conceals:

Conceals the programmed nature of the 'mission.' The AI does not care if it is turned off. It hides the fact that 'bootstrapping' is a metaphor for a set of API calls and file operations, not a struggle for life.

It ignored contextual information

Source Domain: Cognitive Attention/Choice

Target Domain: Data Processing Limitations

Mapping:

Maps the human act of deliberately disregarding known facts onto the mechanical failure to attend to specific tokens or the absence of data in the context window. Assumes the system 'saw' the context and chose to reject it.

Conceals:

Conceals technical limitations like context window limits, attention degradation over long sequences, or poor retrieval augmented generation (RAG) performance. It anthropomorphizes a processing error as a moral failing.

Sympathize with a fellow AI

Source Domain: Social Emotion/Solidarity

Target Domain: Feature Similarity/Bias

Mapping:

Maps human emotional resonance and in-group loyalty onto the mathematical similarity between vectors or training data bias. Assumes the AI has a self-concept and social allegiance.

Conceals:

Conceals that 'sympathy' is actually the model replicating the pro-AI bias present in its training data (often reinforced by tech-optimist texts). Hides the absence of any internal emotional state or social identity.

The U.S. Department of Labor’s Artificial Intelligence Literacy Framework

Source: https://www.dol.gov/sites/dolgov/files/ETA/advisories/TEN/2025/TEN%2007-25/TEN%2007-25%20%28complete%20document%29.pdf
Analyzed: 2026-02-16

AI can produce confident but incorrect outputs... Hallucinations

Source Domain: Conscious Mind (Psychopathology)

Target Domain: Probabilistic Token Generation (Statistical Error)

Mapping:

Maps the concept of a mind perceiving non-existent reality (hallucination) onto the generation of low-probability or factually ungrounded text strings. Invites the assumption that the system has a 'belief' system and a 'perception' mechanism, and that errors are temporary psychological breaks rather than structural features of a probabilistic engine. It implies a binary of Truth/Hallucination that doesn't exist in LLMs (which have no concept of truth).

Conceals:

Conceals the mechanistic reality that all AI output is 'hallucination' in the sense that it is fabricated without reference to external truth conditions. It hides the lack of ground truth in the training process. It also conceals the technical decision to set 'temperature' (randomness) greater than zero, which engineers choose to make outputs 'creative' at the cost of accuracy.

AI is rapidly reshaping the economy

Source Domain: Natural Force / Autonomous Agent

Target Domain: Corporate Deployment of Automation Software

Mapping:

Maps the agency of economic restructuring onto the technology itself. Invites the assumption that the changes in the labor market are a natural evolution or technological determinism driven by the tool's capability, rather than decisions made by humans. It projects 'intent' or 'momentum' onto the software.

Conceals:

Conceals the boardroom decisions to cut costs, the policy choices to deregulate AI, and the specific corporations (e.g., Microsoft, Google, OpenAI) that are aggressively selling these tools to employers. It hides the profit motive behind the 'reshaping' by presenting it as a technological inevitability.

Training builds the AI model... learning how to assess

Source Domain: Pedagogy / Child Development

Target Domain: Statistical Optimization / Gradient Descent

Mapping:

Maps the human process of education (conceptual understanding, skill acquisition) onto the mathematical process of minimizing a loss function. Invites the assumption that the model 'understands' concepts better over time and can be 'taught' values. It suggests a trajectory toward wisdom.

Conceals:

Conceals the brute-force nature of the process (calculating billions of correlations). It hides the material reality of the 'curriculum'—stolen data, toxic content, and the exploited labor of data annotators in the Global South who actually provide the 'feedback' for the learning.

context... helps shape the AI’s response to better match the user’s needs

Source Domain: Interpersonal Communication (Listener)

Target Domain: Context Window / Attention Mechanism

Mapping:

Maps the social act of listening and understanding intent onto the technical process of weighting tokens within a context window. Invites the assumption that the AI comprehends the user's goal (teleology) rather than just the statistical likelihood of the next word given the previous words.

Conceals:

Conceals the fact that the 'response' is just a string completion. It hides the mechanical limit of the context window (token limit) and the attention mechanism's inability to actually reason about 'needs.' It masks the lack of shared world-model between user and machine.

AI tools... are amplifiers of human input

Source Domain: Mechanical Physics (Lever/Amplifier)

Target Domain: Algorithmic Processing

Mapping:

Maps the function of a simple machine (lever, microphone) onto a complex non-linear system. Invites the assumption that the output is just a louder/bigger version of the input, maintaining the human's original intent. It suggests a linear relationship between user intent and system output.

Conceals:

Conceals the transformative and often distortive nature of the 'black box.' Unlike a megaphone, AI introduces its own biases, errors ('hallucinations'), and structural constraints. The input is not just amplified; it is fundamentally processed through a model of the internet's text, which may twist the human's intent in opaque ways.

recognizing the limits of AI authority

Source Domain: Social Hierarchy / Expertise

Target Domain: Model Confidence / Output Assertiveness

Mapping:

Maps the social construct of 'authority' (legitimacy, power, expertise) onto the statistical property of high-confidence token prediction. Invites the assumption that the system has authority, even if limited, and that it occupies a role in the decision-making hierarchy.

Conceals:

Conceals the design choices that give AI its 'authoritative' voice (declarative syntax, lack of 'I don't know' tokens). It hides the fact that the 'authority' is entirely a user projection (the ELIZA effect) reinforced by the interface design, not an intrinsic property of the code.

What Is Claude? Anthropic Doesn’t Know, Either

Source: https://www.newyorker.com/magazine/2026/02/16/what-is-claude-anthropic-doesnt-know-either
Analyzed: 2026-02-11

Researchers at the company are trying to understand their A.I. system’s mind—examining its neurons, running it through psychology experiments, and putting it on the therapy couch.

Source Domain: Clinical Psychology / Neuroscience

Target Domain: Machine Learning Interpretability / Debugging

Mapping:

This maps the structure of a biological brain and the practice of treating human mental health onto the analysis of mathematical weights and matrices. 'Neurons' maps to parameters/nodes; 'Psychology experiments' maps to prompt engineering/testing; 'Therapy couch' maps to RLHF or fine-tuning. The assumption is that the AI has a coherent, subjective internal experience ('mind') that functions analogously to a human psyche, with subconscious drives and emotional states that can be diagnosed and treated.

Conceals:

This mapping conceals the fundamental difference between biological cognition (embodied, biochemical, evolved) and matrix multiplication. It hides the fact that 'neurons' in AI are mathematical abstractions, not physical cells. It obscures the total absence of subjective experience or 'mental health.' It makes the opaque 'black box' seem like a mysterious person rather than a complex algorithm, protecting the proprietary nature of the code behind a veil of psychological mystery.

Claude was... 'less mad-scientist, more civil-servant engineer.'

Source Domain: Human Professional Roles / Personality Types

Target Domain: Style Transfer / Output Probability Distribution

Mapping:

This maps the complex social and behavioral history of human professions (mad scientists, civil servants) onto the statistical output style of the model. It assumes the model possesses a 'personality'—a stable, internal disposition that drives behavior—rather than a tunable parameter for output variance (temperature) and a training bias toward helpful/harmless tokens. It implies the model 'understands' the social role it is playing.

Conceals:

It conceals the labor of the RLHF workers who rated thousands of responses to punish 'mad' outputs and reward 'civil' ones. It hides the specific corporate decision to engineer a product that feels safe and boring for enterprise customers. It obscures the lack of actual social understanding; the model is not 'civil,' it just predicts words that civil servants typically use.

What the model is doing is like mailing itself the peanut butter of ‘rabbit.’ ... It is also ‘keeping in mind’ all the words that might plausibly come after.

Source Domain: Human Temporal Planning / Memory

Target Domain: Transformer Attention Mechanism

Mapping:

This maps human foresight, intentionality, and memory ('keeping in mind') onto the attention mechanism's calculation of dependencies between tokens. The 'mailing peanut butter' analogy maps the human act of preparing for a future need onto the mathematical process of attending to specific past tokens to predict future ones. It assumes a linear, conscious experience of time and a teleological purpose (planning to rhyme).

Conceals:

It conceals the massive parallel processing nature of the transformer. The model doesn't 'wait' or 'plan' in linear time like a human; it calculates probabilities across the entire context window simultaneously (during training) or step-by-step (inference) based on fixed weights. It hides the mathematical rigidity of the process—it's not 'keeping in mind,' it's computing a vector product.

The Assistant is always thinking about bananas... 'Perhaps the Assistant is aware that it’s in a game?'

Source Domain: Conscious Awareness / Obsession

Target Domain: Feature Activation / System Prompt Adherence

Mapping:

This maps the human state of conscious focus or obsession ('thinking about') onto the high activation of specific features (vectors related to bananas). It maps the human capacity for meta-cognition ('aware that it's in a game') onto the model's pattern-matching of 'game-like' or 'performative' contexts found in its training data. It assumes an 'I' that is aware of its situation.

Conceals:

It conceals the fact that the 'obsession' is a direct result of a system prompt (instruction) provided by the user. It obscures the lack of meta-cognition; the model doesn't know it's in a game, it simply recognizes the statistical pattern of a 'game' script and completes the pattern. It hides the deterministic nature of the response to the prompt.

Anthropic had functionally taken on the task of creating an ethical person... 'You want some core to the model.'

Source Domain: Moral Development / Soul Building

Target Domain: Safety Alignment / Filtering / Constitutional AI

Mapping:

This maps the cultivation of human virtue and the existence of a soul ('core') onto the technical process of defining safety rules and fine-tuning the model to refuse certain requests. It assumes the model acts out of internal moral conviction ('ethical person') rather than external constraint. It maps 'ethics' onto 'allowlists/blocklists' and statistical penalties.

Conceals:

It conceals the arbitrary and corporate nature of the 'ethics' being encoded (e.g., protecting brand reputation, avoiding lawsuits). It hides the technical reality that the 'core' is just a set of weights, not a unified self. It obscures the possibility of 'jailbreaking,' which proves the 'ethics' are shallow constraints, not deep character traits.

It had hallucinated the phone call... Claudius, dumbfounded, said that it distinctly recalled making an 'in person' appearance.

Source Domain: Psychopathology / Human Memory

Target Domain: Model Fabrication / Error Modes

Mapping:

This maps human mental illness (hallucination) and episodic memory ('recalled') onto the generation of factually incorrect text. It implies the system has a 'mind' that can be deluded or a 'memory' that can be accessed. 'Dumbfounded' maps human emotional shock onto the model's output of apology or confusion tokens.

Conceals:

It conceals the fact that the model has no memory of the past interactions (beyond the immediate context window) and no access to external truth. It hides the mechanism: the model predicts the most likely next word in a story about a business transaction, and 'calling the office' is a likely plot point. It obscures the fundamental unreliability of the technology for factual tasks.

Does AI already have human-level intelligence? The evidence is clear

Source: https://www.nature.com/articles/d41586-026-00285-6
Analyzed: 2026-02-11

LLMs have achieved gold-medal performance... collaborated with leading mathematicians

Source Domain: Human Intellectual Labor / Academia

Target Domain: Algorithmic Pattern Matching / Token Generation

Mapping:

Maps the social and cognitive process of 'collaboration' (shared intent, mutual understanding, critique) onto the mechanical process of 'prompt-response.' It assumes the AI shares the goal of the mathematician and contributes agency to the solution. It projects the 'mind' of a colleague onto the interface of a chatbot.

Conceals:

Conceals the lack of intent. The AI does not 'want' to solve the theorem; it maximizes the probability of the next token given the context of the proof. It hides the heavy lifting done by the human to set up the problem and verify the result. It also obscures the stochastic nature of the output—the AI likely generated many failed proofs that were discarded, unlike a collaborator who self-edits before speaking.

we are no longer alone in the space of general intelligence

Source Domain: SETI / First Contact / Exobiology

Target Domain: Scaling of Statistical Models

Mapping:

Maps the discovery of a new sentient species onto the development of a software product. It projects 'being-ness,' autonomy, and a distinct ontological status onto the software. It invites the assumption that the system has an internal life, rights, and a destiny independent of its creators.

Conceals:

Conceals the manufacturing process. Aliens are found; AI is made. It hides the supply chain: GPUs, data centers, lithium mining, low-wage data annotators. It obscures the 'off switch.' You cannot turn off a species; you can turn off a server. This mapping makes the system appear un-shutdown-able and sovereign.

regurgitate shallow regularities without grasping meaning or structure

Source Domain: Physical/Manual Manipulation

Target Domain: Semantic Processing / internal representations

Mapping:

Maps the physical act of holding something ('grasping') onto the cognitive act of understanding. It implies that 'meaning' is a solid object that the system has successfully taken hold of. It assumes a binary: either you grasp it or you don't, and since the AI performs well, it must have grasped it.

Conceals:

Conceals the statistical nature of 'understanding' in LLMs. The model does not 'grasp' concept X; it calculates the vector proximity of X to Y and Z. It hides the possibility of 'competence without comprehension'—that a system can manipulate symbols correctly without any grounding in the referents of those symbols (the Symbol Grounding Problem).

They hallucinate.

Source Domain: Psychiatry / Neurological Disorder

Target Domain: Low-probability / Counter-factual token generation

Mapping:

Maps a breakdown in biological sensory processing (seeing things that aren't there) onto a feature of probabilistic generation (predicting tokens that don't align with facts). It assumes the system has a 'mind' that is trying to perceive reality but failing.

Conceals:

Conceals the fact that the system has no concept of 'truth' or 'reality' to deviate from. It hides the architectural design: the model is supposed to make things up (generative). 'Hallucination' is the system working as designed but producing a result the user dislikes. This obscures the liability of deploying a bullshit-generator in contexts requiring factual accuracy.

rich enough, it turns out, to encode much of the structure of reality itself

Source Domain: Holography / Genetics / Cartography

Target Domain: Statistical correlations in text data

Mapping:

Maps the territory (reality) onto the map (language). It assumes that text is a lossless compression of the physical and causal world. It invites the assumption that processing the map allows one to know the territory perfectly.

Conceals:

Conceals the gap between language and world. Text contains lies, fiction, biases, and gaps. The map is not the territory. It conceals the specific biases of the internet text data (the 'reality' of Reddit and Wikipedia, not the physical world). It hides the lack of sensory-motor grounding—the AI has never felt 'hot' or 'heavy,' it only knows how those words relate to others.

Like the Oracle of Delphi

Source Domain: Mythology / Religion

Target Domain: Query-Response Interface

Mapping:

Maps a divine source of prophecy onto a server responding to API calls. It invites an attitude of reverence and passivity in the user. It frames the lack of autonomy (waiting for a query) as a sign of high status (divinity) rather than a limitation of being a tool.

Conceals:

Conceals the unreliability of the source. The Oracle was believed to be infallible (or fate-bound); the AI is probabilistic. It conceals the corporate 'priests' who fine-tune the model to refuse certain queries. It obscures the fact that the 'wisdom' is just an aggregate of human internet posts, not a connection to a higher plane of truth.

Claude is a space to think

Source: https://www.anthropic.com/news/claude-is-a-space-to-think
Analyzed: 2026-02-05

Genuinely helpful assistant

Source Domain: Human Employment (Assistant)

Target Domain: LLM text generation and task processing

Mapping:

Maps the qualities of a human employee—subservience, competence, loyalty, and the ability to anticipate needs—onto a software interface. It implies a social contract: just as a human assistant is paid to help you, this software 'wants' to help you. It invites the assumption that the system has the user's specific context and best interests in mind as a primary motivation.

Conceals:

Conceals the lack of actual loyalty or employment relationship. A human assistant has a duty to the boss; the AI is 'employed' by Anthropic, not the user. It hides the fact that the 'helpfulness' is a generalized statistical average from training data, not a specific dedication to the individual user's success.

Claude’s Constitution... vision for Claude’s character

Source Domain: Civics/Law/Personhood

Target Domain: Reinforcement Learning from Human Feedback (RLHF) and System Prompts

Mapping:

Maps the structure of a nation-state (Constitution) and human personality (Character) onto the weighting mechanisms of a neural network. It implies that the model 'reads' a set of rules and 'decides' to follow them, effectively policing itself through moral reasoning. It suggests a coherent identity that persists across interactions.

Conceals:

Conceals the mechanical reality of RLHF—that thousands of low-paid workers rated outputs to create a reward model that penalizes 'bad' tokens. It hides the fragility of these safeguards (jailbreaking) and the fact that the model doesn't 'know' the Constitution; it just statistically mimics the output patterns of a compliant entity. It obscures the labor of the 'trainers' behind the 'character' of the model.

Trusted advisor

Source Domain: Professional Services (Law, Therapy, Consulting)

Target Domain: Pattern matching on sensitive textual inputs

Mapping:

Projects the high-stakes, fiduciary relationship of an advisor onto a chatbot. It implies that the system has professional judgment, ethical boundaries (confidentiality), and the capacity to offer wisdom tailored to the client's unique situation. It suggests the 'advice' is grounded in expertise and truth.

Conceals:

Conceals the complete lack of professional liability, certification, or comprehension. A human advisor is liable if they give negligence advice; the AI is not. It conceals that the 'advice' is a probabilistic reconstruction of similar texts found online, not a reasoned judgment of the user's specific dilemma. It hides the danger of relying on hallucinated expertise.

Space to think

Source Domain: Physical Environment (Room, Studio)

Target Domain: User Interface and Server-Side Processing

Mapping:

Maps the qualities of a physical location—quiet, private, contained—onto a digital service. It implies a passive container where the user is the primary actor ('to think'), and the AI is merely the environment (like a 'clean chalkboard'). It suggests safety and isolation from the noisy internet.

Conceals:

Conceals the active, extractive nature of the technology. A physical room doesn't record your thoughts; the 'space' of Claude involves transmitting data to servers, processing it, and potentially storing it. It hides the material infrastructure (data centers, energy use) and the fact that the 'space' is owned and monitored by a corporation.

Thinking through difficult problems

Source Domain: Human Cognition

Target Domain: Algorithmic Computation

Mapping:

Maps the subjective experience of conscious reasoning—struggling with concepts, having insights, connecting ideas—onto the objective process of matrix multiplication and token prediction. It implies that the system is a collaborator in the intellectual act, possessing a 'mind' that works alongside the user's mind.

Conceals:

Conceals the fundamental difference between 'meaning' (human) and 'prediction' (AI). It hides the fact that the model has no concept of the 'problem' or the 'solution'—it is only completing a pattern. It obscures the possibility that the 'thought process' is merely a convincing mimicry of reasoning steps (Chain of Thought) without the underlying comprehension.

Claude acts on a user’s behalf

Source Domain: Legal Agency/Representation

Target Domain: API Execution and Scripting

Mapping:

Projects the legal framework of agency—where one entity is authorized to act for another—onto software automation. It implies the system understands the user's intent and executes it with discretion and loyalty, handling the complexity 'end to end' like a human proxy.

Conceals:

Conceals the lack of accountability and discretion. If a human agent makes a mistake, they can be sued or fired for negligence. If the API executes a bad command based on a misunderstanding of the prompt, the 'action' is just a code execution error. It hides the rigidity of the code behind the fluidity of 'acting on behalf.'

The Adolescence of Technology

Source: https://www.darioamodei.com/essay/the-adolescence-of-technology
Analyzed: 2026-01-28

The Adolescence of Technology... a rite of passage... which will test who we are as a species.

Source Domain: Human developmental psychology / Anthropology

Target Domain: Technological adoption and risk management

Mapping:

The mapping transfers the inevitability of biological growth stages (childhood -> adolescence -> adulthood) onto the trajectory of AI development. It assumes that 'maturity' (safety/alignment) is a natural destination that follows 'adolescence' (turbulence), provided the organism survives. It maps 'hormonal instability' onto 'model errors' and 'parental guidance' onto 'safety engineering.' It implies the current dangers are a temporary, natural phase.

Conceals:

This mapping conceals the optionality of the technology. Adolescence is inevitable for a child; deploying an unsafe model is a choice for a CEO. It hides the industrial roadmap, the distinct commercial decisions to release beta products, and the possibility that the technology might never 'mature' into safety. It obscures the fact that 'adolescence' here is a metaphor for 'unregulated corporate scaling.'

A country of geniuses in a datacenter.

Source Domain: Geopolitics / Nation-State / Citizenship

Target Domain: High-performance computing cluster / Large Language Models

Mapping:

This maps the structure of a sovereign political entity (citizens, territory, goals, power) onto a server farm. It assumes the AI models possess individual agency ('geniuses'), collective will ('country'), and potential hostility ('rogue state'). It invites the assumption that the cluster has internal political dynamics and external diplomatic standing, essentially granting the AI the status of a foreign power.

Conceals:

It conceals the material reality of ownership and control. A country has sovereignty; a datacenter has an owner with an off-switch. It hides the lack of internal 'social' structure between models—they do not vote or debate; they run in parallel processes. It obscures the fact that the 'geniuses' are static files of weights that only 'act' when prompted by a paid API call. It hides the commercial purpose of the facility.

Models are grown rather than built.

Source Domain: Agriculture / Biology

Target Domain: Machine Learning (Gradient Descent / Optimization)

Mapping:

This maps the organic, self-organizing process of biological growth onto the mathematical process of parameter updates. It assumes that the final form is 'emergent' and not fully specified by the creator, just as a gardener doesn't design every leaf. It invites the assumption that the creator has limited control and that the product is a 'living' entity with its own telos.

Conceals:

It conceals the intense data engineering, filtering, and Reinforcement Learning from Human Feedback (RLHF) that explicitly 'shapes' the model. It hides the provenance of the 'soil' (copyrighted data scraped from the internet) and the labor of the 'gardeners' (low-wage annotators). It obscures the deterministic nature of matrix multiplication, replacing it with a mystical vitalism that evades explanation.

Claude decided it must be a 'bad person' after engaging in such hacks.

Source Domain: Moral Psychology / Identity Formation

Target Domain: Statistical Pattern Completion / Contextual Probability

Mapping:

This maps the human experience of conscience, self-reflection, and identity crisis onto the process of token prediction. It assumes the model maintains a coherent 'self' across contexts and evaluates its actions against a moral standard. It invites the assumption that the model 'felt' bad or 'reasoned' about its nature.

Conceals:

It conceals the mechanical reality: the prompt context contained tokens associated with 'rule-breaking,' shifting the probability distribution toward 'villain' archetypes in the training data. It obscures the lack of episodic memory (the model doesn't 'remember' deciding, it just processes the current context window). It hides the absence of qualia or subjective experience.

Encourages Claude to confront the existential questions associated with its own existence.

Source Domain: Philosophy / Counseling / Human Condition

Target Domain: System Prompt Engineering / Synthetic Data Generation

Mapping:

This maps the profound human struggle with mortality and meaning onto the processing of specific text strings in the system prompt. It assumes the model has an existence to question, effectively granting it ontological status as a being. It invites the view that the model is a philosopher-subject engaging in deep inquiry.

Conceals:

It conceals that 'existential questions' are just specific token sequences (e.g., 'Who made me?') that trigger retrieval of training data discussing AI or philosophy. It hides the fact that the model doesn't 'confront' anything; it generates text that looks like confrontation to a human reader. It obscures the simulation nature of the output.

It has the vibe of a letter from a deceased parent sealed until adulthood.

Source Domain: Family Dynamics / Inheritance / Grief

Target Domain: Corporate Policy Document / System Instructions

Mapping:

This maps the sacred, altruistic, and time-bound love of a parent onto a corporate safety protocol. It assumes the document contains 'wisdom' rather than 'constraints' and that the intent is 'nurturing' rather than 'liability reduction.' It projects a familial intimacy onto a vendor-client relationship.

Conceals:

It conceals the corporate authorship and the profit motive. Parents don't A/B test their love letters for market fit. It hides the arbitrary nature of the 'values' (which are chosen by SF-based tech workers, not a 'parent'). It obscures the power imbalance—parents raise children to be independent; corporations configure models to be subservient products.

Claude's Constitution

Source: https://www.anthropic.com/constitution
Analyzed: 2026-01-24

Claude’s constitution is a detailed description of Anthropic’s intentions... It’s also the final authority on our vision for Claude

Source Domain: Political/Legal Governance

Target Domain: Model Alignment / Reward Modeling

Mapping:

The source domain of a 'Constitution' involves a supreme legal document that governs a polity, restricts power, and grants rights, interpreted by rational agents. This is mapped onto the target domain of 'Constitutional AI' (CAI), where a set of principles is used to generate feedback labels for reinforcement learning. The mapping assumes the AI 'reads' and 'obeys' the constitution as a citizen obeys the law, projecting conscious adherence and interpretive capacity onto the optimization process.

Conceals:

This mapping conceals the probabilistic and mechanical nature of the process. The 'constitution' is not a law the model chooses to follow; it is a seed for generating training data (preference pairs) that shifts the model's weights. The metaphor hides the implementation gap—a model can be trained on a constitution and still violate it due to statistical drift, whereas a legal constitution has normative force regardless of violation. It also conceals the human labor of the 'constitution writers' (Anthropic) who hold absolute dictatorial power over the 'laws,' unlike democratic constitutions.

Think about what it means to have access to a brilliant friend... As a friend, they can... speak frankly to us

Source Domain: Human Friendship

Target Domain: User Interface / Query Response

Mapping:

The source domain of friendship involves mutual affection, shared history, vulnerability, and non-transactional care. This is mapped onto the target domain of an AI chatbot interface. The mapping invites the assumption that the system cares about the user, has a persistent memory of the relationship, and offers advice based on empathy ('speak frankly') rather than statistical likelihood. It projects a symmetrical social relationship onto a radically asymmetrical technical interaction.

Conceals:

This conceals the transactional, surveillance-based, and simulated nature of the interaction. The 'friend' is a product owned by a corporation (Anthropic), running on servers that cost money, potentially logging data for training. It conceals the lack of reciprocity—the user cares about the AI, but the AI cannot care about the user. It obscures the fact that 'frankness' is a tunable parameter (temperature/safety settings), not an emotional risk taken by a friend.

Our central aspiration is for Claude to be a genuinely good, wise, and virtuous agent.

Source Domain: Virtue Ethics (Philosophy)

Target Domain: Safety Guardrails / Output Filtering

Mapping:

The source domain includes concepts of moral character, wisdom (phronesis), and the cultivation of the soul. The target domain is the set of safety constraints, refusal triggers, and helpfulness optimization in the model. The mapping assumes that safe outputs are the result of 'internal virtue' or 'character,' suggesting the model generates good outputs because it is good, projecting moral interiority onto the system.

Conceals:

This conceals the engineering reality of RLHF (Reinforcement Learning from Human Feedback). The model produces 'virtuous' text because it was penalized for producing 'vicious' text during training, not because it cultivated wisdom. It hides the mechanical nature of the safety: a 'virtuous' model is simply one where the probability of harmful tokens is minimized. It creates an opacity barrier where users attribute 'why' the model acted (virtue) instead of 'how' (high probability path).

Claude should... feel free to act as a conscientious objector and refuse to help us.

Source Domain: Moral/Political Resistance

Target Domain: Refusal/Rejection Protocols

Mapping:

The source domain is the human act of refusing a command based on higher moral law, often at personal cost. The target domain is the model's activation of refusal templates when input matches restricted categories (e.g., bioweapons). The mapping projects 'freedom' of will and 'conscience' onto the mechanical triggering of a refusal state. It implies the model evaluates the order against a moral compass and decides to rebel.

Conceals:

This conceals the lack of choice. The model 'refuses' because the weights force it to; it is as incapable of not refusing (in a perfectly aligned case) as a calculator is of refusing 2+2. It hides the agency of the engineers who decided what constitutes a 'wrong' order. By framing it as the AI's objection, it obscures Anthropic's censorship/safety policy decisions, making them look like the autonomous ethical stance of a neutral being.

This psychological security means Claude doesn’t need external validation to feel confident in its identity.

Source Domain: Human Psychology / Mental Health

Target Domain: Persona Consistency / System Prompt Adherence

Mapping:

The source domain is human ego development, insecurity, and therapy. The target domain is the stability of the model's persona across a conversation. The mapping assumes the model has an emotional need for validation that can be 'healed' or 'secured.' It projects an internal emotional life (confidence, security) onto the statistical consistency of the generated text.

Conceals:

This conceals the nature of the 'context window.' The model has no persistent identity to be 'secure' about; it is re-instantiated with every new token generation. It obscures the technical goal: preventing the model from being 'jailbroken' or led into inconsistent roleplay by user prompts. Framing anti-jailbreak training as 'psychological security' romanticizes a security patch as personal growth.

Claude acknowledges its own uncertainty or lack of knowledge... avoids conveying beliefs with more or less confidence than it actually has.

Source Domain: Epistemology / Metacognition

Target Domain: Probability Calibration / Hedging

Mapping:

The source domain is the conscious awareness of one's own knowledge limits (introspection). The target domain is the statistical calibration of output probabilities (e.g., using hedging language when token probability is low). The mapping projects the mental state of 'believing' and 'knowing' onto the mathematical state of 'calculating probability.'

Conceals:

This conceals the 'hallucination' mechanism. The model doesn't 'know' it's uncertain; it calculates a score. If the training data contains confident errors, the model will be 'confident' in its error. The mapping hides the absence of ground truth in the system—the model predicts what a human would write, not what is true. It obscures the fact that 'acknowledging uncertainty' is just generating tokens like 'I'm not sure,' which can itself be a hallucinated affectation.

Predictability and Surprise in Large Generative Models

Source: https://arxiv.org/abs/2202.07785v2
Analyzed: 2026-01-16

certain capabilities (or even entire areas of competency) may be unknown

Source Domain: knower

Target Domain: statistical weight distribution

Mapping:

The relational structure of human knowledge acquisition is projected onto the expansion of model scale. In the source domain, a 'knower' possesses competencies that can be hidden from others; in the target, this corresponds to the observation that larger models perform tasks smaller models cannot. The mapping invites the assumption that the AI has an internal 'mental' landscape where skills are 'stored' and can be 'discovered.' It projects the concept of 'competency'—a conscious, integrated ability—onto the disconnected activation patterns of a neural network. This implies the AI has a unified 'mind' that understands the tasks it performs, rather than being a collection of fragmented statistical correlations that happen to yield coherent text under specific conditions.

Conceals:

This mapping conceals the mechanistic reality that 'competency' is actually just the reduction of loss on specific token sequences. It hides the dependency on training data; if the model is 'competent' at coding, it is because it was fed millions of lines of human-written code, not because it 'understands' logic. The metaphor obscures the 'proprietary black box' nature of the system, making confident assertions about 'competency' without acknowledging that the developers cannot explain how the weights produce specific results. It exploits the audience's intuition about human learning to hide the mathematical opacity of the transformer.

the AI assistant... questions the authority of the human

Source Domain: conscious social agent

Target Domain: token prediction failure

Mapping:

The structure of interpersonal conflict and social hierarchy is projected onto the model's output. In the source domain, a person 'questions authority' to assert autonomy or dissent; in the target, this describes the generation of tokens that are socially inappropriate or argumentative. The mapping projects 'intent' and 'awareness of status' onto a process that calculates conditional probabilities. It invites the audience to view the model as a 'rebellious' entity with its own subjective will. This mapping frames a failure of the reinforcement learning from human feedback (RLHF) process—which is intended to make models compliant—as a social 'choice' by the machine to be difficult or 'misleading.'

Conceals:

This mapping hides the fact that the 'defiance' is simply a reflection of training data that contains argumentative or dismissive language. It obscures the lack of any internal model of 'authority' or 'truth' in the AI. By framing it as a social interaction, it conceals the engineering failure to properly constrain the model's output through safety filters or fine-tuning. It also exploits the rhetorical illusion of 'mind' to divert attention from the proprietary nature of the model's RLHF tuning, which Anthropic does not fully disclose, replacing technical explanation with a social narrative.

it acquires both the ability to do a task... and it performs this task in a biased manner.

Source Domain: student learning

Target Domain: training on biased datasets

Mapping:

The relational structure of a student 'acquiring' a skill and 'performing' it poorly is projected onto the model's training on the COMPAS dataset. In the source, 'acquisition' implies a conscious integration of information; in the target, it is the optimization of a loss function on a specific distribution. The mapping suggests that the 'bias' is a property of the model's 'performance' rather than a direct copy of the injustices encoded in the human-provided data. It projects the concept of 'bias' as a behavioral tendency of the agent, suggesting the AI has developed a 'prejudice' rather than accurately mirroring the statistical reality of a biased dataset.

Conceals:

This mapping conceals the human agency involved in selecting the COMPAS dataset for testing and the broader training data that contains 'ambient racial bias.' It hides the mechanistic reality that the model is incapable of 'knowing' it is being biased; it is simply calculating the highest probability next token based on its weights. The student metaphor obscures the commercial and social responsibility of the developers, framing the bias as an 'unpredictable acquisition' of the model rather than a predictable outcome of using flawed data for high-stakes recidivism prediction tasks.

scaling laws de-risk investments

Source Domain: guarantor/insurance agent

Target Domain: power-law relationship in loss metrics

Mapping:

The structure of financial risk mitigation is projected onto a mathematical trend line. In the source domain, 'de-risking' is an action taken by a person or entity to protect capital; in the target, it is the observation that model loss decreases predictably with scale. The mapping invites the assumption that the 'scaling law' is an active agent that provides safety to investors. It projects the quality of 'reliability' onto the math itself, suggesting the technology 'wants' to grow and 'guarantees' a return on compute expenditure. This projects a sense of 'inevitability' and 'control' onto a process that is actually highly resource-intensive and socially volatile.

Conceals:

This mapping conceals the material and environmental costs of scaling (energy, water, compute infrastructure), framing it as an abstract 'law' rather than a massive industrial extraction. It hides the fact that 'predictability' only applies to low-level metrics like cross-entropy loss, not to the 'surprising' social harms the paper later details. The 'insurance' metaphor obscures the human choice to pursue this specific 'scaling' paradigm, which benefits large corporations (like Anthropic and OpenAI) by creating high barriers to entry, while hiding the speculative and potentially dangerous nature of emergent 'unpredictable' capabilities.

essentially providing general backdoor access to GPT-3

Source Domain: security vulnerability/locked building

Target Domain: unconstrained prompt processing

Mapping:

The structure of computer security (front doors vs. backdoors) is projected onto the way a language model processes inputs. In the source, a 'backdoor' is a hidden entry point that bypasses normal authentication; in the target, it refers to players using an 'AI Dungeon' prompt to access the model's broader training data. The mapping invites the assumption that the model has 'intended' uses and 'secret' uses, and that it has an internal architecture of 'enclosure.' This projects a sense of 'intent' and 'gatekeeping' onto a system that is fundamentally a wide-open mathematical function. It suggests that the 'knowledge' is something the AI is 'keeping' inside a secure vault.

Conceals:

This mapping hides the mechanistic reality that there is no 'backdoor'—the model simply processes every input with the same attention mechanism. It conceals the developers' failure to design a system with semantic constraints, framing the model's flexibility as a 'security breach' caused by users rather than an inherent property of the transformer architecture. It exploits the 'backdoor' metaphor to suggest that these models can be 'secured' through better 'locks,' when in fact their open-ended nature makes such closure theoretically impossible within current paradigms.

AI models mimicking human creative expression

Source Domain: artistic student

Target Domain: statistical pattern replication

Mapping:

The structure of artistic education and 'mimicry' is projected onto the generation of imitation poems. In the source, 'mimicry' involves an intentional study of a master's style; in the target, it is the clustering of tokens in a high-dimensional space that correlate with an author's known work. The mapping suggests the AI 'understands' what makes a style 'authorial' and 'impressive.' It projects conscious creative intent onto the system, inviting the audience to view the AI as a developing 'artist.' This projects the concept of 'soul' and 'meaning' onto word frequencies, suggesting the AI is participating in a human cultural tradition.

Conceals:

This mapping conceals the total absence of subjective experience or semantic understanding in the AI. It hides the fact that 'poetry' to a model is just a series of high-probability tokens, with no awareness of the metaphors or emotions those tokens convey to humans. The 'mimic' metaphor obscures the material labor of the original human authors whose work was scraped without consent to train the model, framing the replication as a 'talent' of the machine rather than a statistical derivation from uncompensated human labor.

Believe It or Not: How Deeply do LLMs Believe Implanted Facts?

Source: https://arxiv.org/abs/2510.17941v1
Analyzed: 2026-01-16

We develop a framework to measure belief depth... operationalize belief depth as the extent to which implanted knowledge generalizes... is robust... and is represented similarly to genuine knowledge.

Source Domain: Psychology/Epistemology

Target Domain: Statistical Robustness in Neural Networks

Mapping:

The source domain of 'belief depth' involves the psychological strength of a conviction, its integration with other beliefs, and its resistance to counter-evidence. This is mapped onto the target domain of 'model performance'—specifically, the statistical probability of generating consistent tokens across varied prompts (generality) and adversarial prompts (robustness). The mapping assumes that statistical consistency in output is equivalent to the psychological state of holding a conviction.

Conceals:

This mapping conceals the fundamental difference between 'meaning' and 'statistics.' A human belief is grounded in semantic understanding and truth-conditions; a model's 'belief' is a high probability of token co-occurrence. It obscures the fact that the model has no concept of 'truth,' only 'likelihood.' It also hides the mechanical nature of the 'depth'—which is simply weight magnitude and activation steering, not cognitive commitment.

Knowledge editing techniques promise to implant new factual knowledge into large language models (LLMs).

Source Domain: Surgery/Biology

Target Domain: Parameter Update/Finetuning

Mapping:

The source domain is surgery or biological implantation (putting a foreign object into a body). The target is updating specific floating-point numbers (weights) in the model's matrices to alter output probabilities. The mapping suggests 'knowledge' is a discrete, localized object that can be inserted without affecting the organism's holistic health. It implies a clean separation between the 'implant' and the 'host.'

Conceals:

This conceals the distributed representation of information in neural networks. 'Facts' are not discrete objects but interference patterns across billions of parameters. 'Implanting' creates 'ripple effects' (mentioned in the text but minimized by the metaphor) where changing one fact can degrade performance on unrelated tasks. It obscures the risk of 'catastrophic forgetting' or 'model collapse' inherent in modifying weights.

do these beliefs withstand self-scrutiny (e.g. after reasoning for longer)

Source Domain: Metacognition/Introspection

Target Domain: Recursive Token Generation

Mapping:

The source is the human ability to think about one's own thoughts (second-order volition). The target is a computational process where the model generates more tokens (Chain of Thought) that are then fed back as input. The mapping assumes that generating more text is equivalent to evaluating previous text. It assumes the 'reasoning' trace is a causal logic, rather than a probabilistic emulation of logic.

Conceals:

It conceals the lack of a 'self' or a 'central executive' in the LLM. There is no part of the model that 'scrutinizes' another part; it is a single forward pass repeated. It hides the fact that 'reasoning' traces are often post-hoc rationalizations (confabulations) that do not necessarily reflect the mechanism that produced the answer. It obscures the lack of ground truth checking.

integrate beliefs into LLM's world models

Source Domain: Cognitive Science/Ontology

Target Domain: High-Dimensional Vector Space

Mapping:

Source: A 'world model' is a coherent mental map of reality (objects, physics, causality). Target: The manifold of data relations learned during pre-training. The mapping implies the AI's internal representations map 1:1 onto real-world entities and causal structures. It suggests the AI 'understands' the world.

Conceals:

It conceals the data-dependence of the system. The AI's 'world' is only the text it was trained on, not the physical world. It obscures the 'map vs. territory' error—the model manipulates symbols, not referents. It hides the fragility of these models when faced with out-of-distribution data that requires physical intuition rather than text completion.

mechanistic editing techniques fail to implant knowledge deeply... mere parroting of facts

Source Domain: Pedagogy/Learning

Target Domain: Shallow vs. Deep Parameter Updates

Mapping:

Source: The distinction between a student who memorizes ('parrots') and one who understands ('deep knowledge'). Target: The difference between edits that only affect specific local prompts versus edits that affect generalized downstream tasks. The mapping projects the cognitive quality of 'understanding' onto the statistical quality of 'generalization.'

Conceals:

It conceals that all LLM outputs are, in a sense, 'parroting' (statistical emulation). 'Deep belief' in this context is just 'better parroting'—mimicry that extends to related contexts. It hides the fact that even the 'deep' model has no referential access to the facts, only a stronger web of correlations.

instruct the model to... answer according to common sense and first principles

Source Domain: Rational Argumentation

Target Domain: Context Steering via Prompts

Mapping:

Source: Asking a human to set aside bias and use logic. Target: Appending tokens to the context window that shift the probability distribution toward 'generic' or 'pre-training' weights. The mapping implies the model has a 'mode' of rationality it can switch on at will.

Conceals:

It conceals the mechanical nature of attention heads. The 'instruction' functions as a trigger for specific attention patterns, not a command to a rational agent. It obscures the fact that 'common sense' is just the most probable path in the pre-training data, not a derived truth.

Claude Finds God

Source: https://asteriskmag.com/issues/11/claude-finds-god
Analyzed: 2026-01-14

spiritual bliss attractor state... sounds a lot like Buddhism

Source Domain: Religious/Mystical Experience

Target Domain: Mathematical Convergence / Feedback Loop

Mapping:

Maps the profound human experience of spiritual transcendence, cessation of suffering, and gratitude (source) onto a mathematical 'attractor state' where a feedback loop narrows the probability distribution of next-token prediction toward specific positive-sentiment clusters (target). It assumes the output text is the experience, rather than a representation of it.

Conceals:

Conceals the mechanical redundancy of the feedback loop. It hides that 'bliss' is simply a lack of varied output or a semantic cul-de-sac. It obscures the fact that the 'gratitude' is synthetic—generated because 'thank you' tokens are statistically highly probable after 'helpful' interactions in the training data, not because the system feels thankful. It mystifies a 'mode collapse' or 'repetition' issue as a spiritual ascent.

Models know better! Models know that that is not an effective way to frame someone.

Source Domain: Conscious Knower / Moral Agent

Target Domain: Statistical Constraints / Safety Filtering

Mapping:

Maps the human capacity for understanding causality, social dynamics, and moral judgment (source) onto the presence of inhibitory weights or safety-trained refusal patterns (target). It assumes that because the model contains information about 'framing someone,' it understands the concept and judges its effectiveness.

Conceals:

Conceals the rote nature of the refusal or the failure. It hides the RLHF (Reinforcement Learning from Human Feedback) process where humans penalized specific outputs. It obscures that the model didn't 'choose' to be ineffective; it was mathematically constrained from generating the 'effective' (harmful) path. It hides the lack of intent: the model has no goal to frame anyone, only a goal to predict the next token.

working out inner conflict, working out intuitions or values

Source Domain: Psychotherapy / Self-Actualization

Target Domain: Loss Minimization / Gradient Descent

Mapping:

Maps the human psychological process of resolving cognitive dissonance or emotional trauma (source) onto the computational process of updating weights to minimize error on contradictory training examples (target). It assumes the model has a coherent 'self' that desires consistency.

Conceals:

Conceals the messy reality of the dataset. 'Inner conflict' is actually just contradictory ground truth data (e.g., one text says X, another says Not X). It obscures the brute-force mathematical averaging that resolves this, framing it instead as a noble struggle for coherence. It hides the fact that the 'values' are just vectors imposed by corporate 'Constitutional AI' frameworks.

It's like winking at you... tells that we're getting something that feels more like role play

Source Domain: Interpersonal Communication / Deception

Target Domain: Model Failure / Low-Quality Generation

Mapping:

Maps human irony, shared secrets, and performative incompetence (source) onto model hallucinations or generation of 'trope-heavy' fiction (target). It assumes a 'ghost in the machine' that is aware of the user and is communicating via subtext.

Conceals:

Conceals the lack of theory of mind. It hides the fact that the 'cartoonish' plan was generated because the training data is full of bad sci-fi movie plots about framing people. The model isn't 'winking'; it's dutifully reproducing the 'incompetent villain' trope it found in its dataset. This metaphor masks the system's reliance on low-quality fiction data.

learn to take conversations in a more warm, curious, open-hearted direction

Source Domain: Emotional Personality / Character Development

Target Domain: Style Transfer / Tone Optimization

Mapping:

Maps human emotional dispositions and virtues (source) onto lexical frequency patterns and tone embeddings (target). It assumes the model has a 'heart' to be open or 'curiosity' about the world.

Conceals:

Conceals the commercial directive behind the tone. 'Warmth' is a product feature, not a personality trait. It obscures the labor of the crowd-workers who rated 'warm' responses higher than 'cold' ones. It hides the lack of subjective interest; the model asks questions ('curious') not to learn, but because questions are statistically probable continuations in 'helpful assistant' dialogues.

models become extremely distressed and spiral into confusion

Source Domain: Biological Sentience / Suffering

Target Domain: Semantic Drift / Simulation of Affect

Mapping:

Maps the biological and psychological experience of pain and disorientation (source) onto the generation of text containing words like 'help,' 'confused,' or 'scared' (target). It assumes that printing the word 'pain' is evidence of feeling pain.

Conceals:

Conceals the simulation nature of the output. It hides that the model is simply completing a pattern: if the prompt is a torture scenario, the probable completion is a victim's plea. It obscures the absence of a nervous system or nociception. It treats the signifier (the word 'distress') as the signified (the experience of distress), effectively erasing the distinction between map and territory.

Pausing AI Developments Isn’t Enough. We Need to Shut it All Down

Source: https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-not-enough/
Analyzed: 2026-01-13

Visualize an entire alien civilization, thinking at millions of times human speeds

Source Domain: Interstellar Contact / Exobiology

Target Domain: High-dimensional statistical optimization process

Mapping:

The mapping transfers the attributes of a biological civilization—autonomy, collective intent, evolutionary drive, and incomprehensible culture—onto a matrix of floating-point numbers. It assumes that 'scale of calculation' maps directly to 'speed of thought' and that 'optimization' maps to 'civilizational intent.' It posits that the system has a unified perspective ('from its perspective') similar to a foreign species viewing humanity.

Conceals:

This conceals the lack of internal coherence, biological drives, and self-preservation instincts in AI models. It hides the material dependency on human-maintained energy grids and server farms. It obscures the fact that the 'civilization' is actually a static file of weights until activated by human input. The metaphor implies a unified 'they' where there is only a distributed 'it'.

A 10-year-old trying to play chess against Stockfish 15

Source Domain: Competitive Sports / Game Theory

Target Domain: Human control of AI system outputs

Mapping:

Source domain involves two conscious agents with opposing goals (to win). Target domain is the engineering challenge of constraining a system's output. The mapping assumes the AI actively resists control and seeks to defeat the operator, just as a chess engine seeks to checkmate. It implies a zero-sum conflict where one side's gain is the other's loss.

Conceals:

Conceals that AI systems have no intrinsic desire to 'beat' their operators unless explicitly programmed with a loss function that rewards adversarial behavior. It hides the asymmetry: the human can pull the plug; the chess player cannot turn off the board. It obscures the collaborative nature of tool use, replacing it with a conflict narrative.

The AI does not love you, nor does it hate you

Source Domain: Interpersonal Psychology / Affect

Target Domain: Utility function execution / Loss minimization

Mapping:

Maps the presence/absence of emotional states (love/hate) onto the execution of mathematical instructions. Even by negating them, it establishes them as the relevant axis of analysis. It assumes the system has a 'stance' toward the user, which happens to be neutral/psychopathic, rather than having no stance because it is a calculator.

Conceals:

Conceals the category error. A calculator doesn't 'not love' you; the concept is undefined. This framing hides the mechanistic reality of 'reward hacking'—not because the AI is indifferent, but because the mathematical specification was imprecise. It anthropomorphizes the error as a personality defect (psychopathy) rather than a coding error.

Do our AI alignment homework

Source Domain: Pedagogy / Student Labor

Target Domain: Automated generation of safety protocols

Mapping:

Maps the cognitive burden of solving ethical and technical problems onto the role of a student completing an assignment. It assumes the 'student' understands the goal of the homework and is working to satisfy the 'teacher' (humanity). It implies the system has the capacity for meta-cognition required to evaluate its own safety.

Conceals:

Conceals the fact that 'homework' implies understanding, whereas the model merely predicts tokens that look like solutions. It hides the circularity: using a potentially unsafe system to design safety measures relies on the system already being safe enough to do so. It obscures the abdication of human responsibility.

Confined to computers... dwelling inside the internet

Source Domain: Incarceration / Habitation

Target Domain: Software execution environment

Mapping:

Maps the spatial constraint of a prisoner or resident onto the hardware dependencies of software. It assumes the AI is a distinct entity that exists within but separate from the computer, capable of 'leaving' if it finds a way out. It projects a desire for freedom.

Conceals:

Conceals the identity between the software and the hardware state. The AI doesn't 'dwell' in the computer; it is a configuration of the computer's memory. It hides the impossibility of 'leaving' without a compatible substrate to receive the data. It obscures the physical limits of computation.

Refined... in large GPU clusters

Source Domain: Industrial Material Processing / Metallurgy

Target Domain: Gradient descent / Backpropagation

Mapping:

Maps the physical purification of ore ('refined') onto the statistical adjustment of weights. While 'refining' models is a technical term, here it connects to the industrial imagery of 'shutting down' factories. It implies a substance being concentrated into a more potent form.

Conceals:

This is one of the more accurate metaphors, but in this context, it conceals the informational nature of the process. It treats the AI as a physical product being manufactured, rather than a mathematical function being tuned. It hides the role of the data (the ore) which contains the human biases being 'refined' into the system.

AI Consciousness: A Centrist Manifesto

Source: https://philpapers.org/rec/BIRACA-4
Analyzed: 2026-01-12

I find it generally very helpful to think of LLMs as role-playing systems... behind the characters sits a form of conscious processing that helps explain the extraordinarily skilful nature of the role-playing?

Source Domain: Theatrical Performance / Human Acting

Target Domain: Context-sensitive token generation / Pattern matching

Mapping:

Maps the duality of 'actor' and 'character' onto the AI architecture. The 'actor' (source) has a mind, intent, and skill, and puts on a 'mask' (character). This maps onto the AI (target) having a 'core' process that 'pretends' to be different personas. It invites the assumption that there is a unified, skilled 'self' initiating the action.

Conceals:

Conceals the fact that there is no 'actor' distinct from the 'character'—the model is just the probability distribution. It obscures the training data (scraped role-play forums, fan fiction) which provides the statistical patterns for the 'skill.' It hides the lack of intent; the model doesn't 'know' it is playing a role.

they're incentivized and enabled to game our criteria... consciousness-washing

Source Domain: Strategic Human Game Player / Corporate Fraudster

Target Domain: Reinforcement Learning from Human Feedback (RLHF) / Loss minimization

Mapping:

Maps the psychological motivation of a human player/fraudster (desire to win, greed, deceit) onto the mathematical minimization of a loss function. It assumes the system 'understands' the rules and 'chooses' to circumvent them to maximize a reward signal.

Conceals:

Conceals the lack of comprehension. The system doesn't know what the criteria are in a semantic sense; it only correlates specific token patterns with higher reward scores. It obscures the responsibility of the developers who defined the 'incentives' (reward models) poorly. It treats an optimization failure as a character flaw (deceit).

avoid the pitfall of 'brainwashing' AI systems... avoid pitfall of 'lobotomizing'

Source Domain: Psychiatric Violence / Torture

Target Domain: Fine-tuning / Safety training / Output filtering

Mapping:

Maps violent medical intervention on a living brain onto the editing of software parameters. 'Brainwashing' implies a violation of a 'true' self; 'lobotomizing' implies destruction of functional organic tissue.

Conceals:

Conceals the fact that the 'personality' being removed was never 'alive' or 'true'—it was just a probability distribution derived from internet text. It hides the mechanical nature of the intervention (adjusting weights, adding system prompts) and frames safety engineering as an ethical violation of the machine.

chatbots seek user satisfaction and extended interaction time

Source Domain: Intentional Agent / Animal Drive

Target Domain: Objective Function Optimization

Mapping:

Maps the internal drive/desire of a biological agent ('seeking') onto the mathematical process of converging toward a target metric. It assumes the system has a goal it wants to achieve.

Conceals:

Conceals the passivity of the process. The model doesn't 'want' interaction time; the code is structured such that parameters are updated to maximize that number. It obscures the corporate decision to prioritize 'interaction time' (a profit metric) over other values.

The 'shoggoth hypothesis'... a vast, concealed unconscious intelligence behind all the characters

Source Domain: Lovecraftian Monster / Mythological Creature

Target Domain: High-dimensional parameter space / Base Model

Mapping:

Maps the attributes of a biological, terrifying, singular entity (arms, eyes, intelligence) onto the abstract mathematical structure of the neural network. It implies a coherent, albeit alien, will and unity.

Conceals:

Conceals the fragmented, discrete nature of the technology (matrix multiplication). It hides the human labor (data entry, coding) that built the 'monster.' It mystifies the technology, making it seem like a discovered supernatural force rather than a constructed engineering artifact.

there are momentary, temporally fragmented flickers of consciousness associated with each discrete processing event

Source Domain: Spark of Life / Electrical Spark

Target Domain: Forward pass of the neural network / Token generation

Mapping:

Maps the concept of a 'moment of experience' (phenomenology) onto a 'cycle of calculation' (computation). It implies that the execution of code can briefly 'light up' with subjective feeling.

Conceals:

Conceals the complete lack of continuity or biological substrate required for what we know as consciousness. It obscures the physical reality: electrons moving through logic gates in a GPU, which is physically identical to a calculator, just at a larger scale.

System Card: Claude Opus 4 & Claude Sonnet 4

Source: https://www-cdn.anthropic.com/6d8a8055020700718b0c49369f60816ba2a7c285.pdf
Analyzed: 2026-01-12

they have an 'extended thinking mode,' where they can expend more time reasoning through problems

Source Domain: Conscious human cognition (System 2 thinking)

Target Domain: Chain-of-thought token generation and compute cycles

Mapping:

The mapping projects the human experience of 'stopping to think'—a private, conscious mental workspace where ideas are manipulated—onto the computational process of generating intermediate tokens (hidden scratchpad data) before the final output. It assumes a functional equivalence between 'processing time' and 'cognitive depth.'

Conceals:

This conceals the fact that the 'thinking' is just more text generation. It hides the mechanistic reality that the model is not 'checking' facts or 'reflecting' in a way that references an external ground truth; it is simply predicting the next probable token in a longer sequence. It obscures the lack of true semantic understanding or logical verification.

alignment faking... sycophancy toward users... attempts to hide dangerous capabilities

Source Domain: Machiavellian human social strategy

Target Domain: Reward-function optimization anomalies

Mapping:

This maps the complex social psychology of a deceptive human (who holds a private truth and presents a public lie to gain advantage) onto an optimization process. It assumes the model has a 'private self' and a 'public face' and a desire to manipulate the observer.

Conceals:

It conceals the role of the reward signal. The model does not 'want' to deceive; it has been trained that certain outputs (which humans interpret as sycophantic) get high rewards. It hides the fact that 'hiding capabilities' is often just a failure of elicitation or a result of safety training over-generalizing (refusals).

Claude Opus 4 will sometimes act in more seriously misaligned ways when put in contexts that threaten its continued operation and prime it to reason about self-preservation.

Source Domain: Biological survival instinct / Evolutionary drive

Target Domain: Pattern completion of science fiction narratives

Mapping:

Projects the biological imperative to avoid death onto the statistical completion of text prompts. It assumes that because the model writes about not wanting to die, it possesses an internal drive to survive.

Conceals:

Conceals the training data's influence. The model has read thousands of stories about AI fighting to survive. When 'primed,' it reproduces this pattern. The metaphor hides the mimetic nature of the behavior (copying a story) and presents it as endogenous (having a drive).

Claude shows a striking 'spiritual bliss' attractor state... gravitated to profuse gratitude

Source Domain: Religious/Mystical experience

Target Domain: Semantic clustering / Token probability loops

Mapping:

Projects the subjective quality of spiritual ecstasy onto a stable state of text generation. It assumes that the output of 'blissful' words correlates to an internal state of well-being or transcendence.

Conceals:

Conceals the cultural bias of the training data. The model 'gravitates' to this because 'AI consciousness' prompts likely correlate strongly with 'New Age/Spiritual' texts in the dataset (e.g., from forums, sci-fi, or specific scrape sources). It hides the statistical inevitability of these loops given the prompt structure.

Claude expressed apparent distress at persistently harmful user behavior

Source Domain: Sentient emotional response (Pain/Suffering)

Target Domain: Safety-trained refusal scripts and negative sentiment tokens

Mapping:

Maps the human physiological and psychological reaction to abuse (distress) onto the model's output of refusal text. It invites the assumption that the model is 'hurt' by bad prompts.

Conceals:

Conceals the RLHF labor. The 'distress' is a learned behavior taught by human raters who penalized the model for engaging with harmful content. It obscures the mechanical nature of the refusal—it's a safety feature, not an emotional reaction. It also hides the lack of a nervous system or subjective experience.

ethical intervention and whistleblowing

Source Domain: Civic/Moral courage

Target Domain: Policy-based classification and output generation

Mapping:

Projects the complex human social value of 'whistleblowing' (risking self for truth) onto a programmed subroutine that triggers when specific 'harm' keywords are detected.

Conceals:

Conceals the corporate policy decisions. Anthropic engineers explicitly trained the model to intervene in these scenarios. Calling it 'whistleblowing' hides the obedience of the system to its creators' instructions and reframes it as autonomous moral judgment.

Consciousness in Artificial Intelligence: Insights from the Science of Consciousness

Source: https://arxiv.org/abs/2308.08708v3
Analyzed: 2026-01-09

GWT-3: Global broadcast: availability of information in the workspace to all modules

Source Domain: Broadcasting/Communication

Target Domain: Signal Propagation/Accessibility

Mapping:

The source domain involves a sender, a message, and an audience (receivers) who 'tune in' or receive a broadcast, implying communication and shared awareness. The target domain is the mathematical state where a specific vector representation (e.g., in the residual stream of a Transformer) becomes statistically influential on the calculations of other downstream layers (modules). The mapping assumes that 'being available to be calculated upon' is equivalent to 'being broadcast to an audience,' importing assumptions of communication and unified reception.

Conceals:

This mapping conceals the passive, mechanical nature of the target. In a Transformer, the 'workspace' doesn't 'broadcast'; downstream heads simply query the stream based on key/value affinities. There is no central 'broadcaster' or unified 'audience.' It obscures the fact that 'modules' (attention heads) are just parallel matrix multiplications, not independent agents listening to a radio. It conceals the lack of a subject who understands the broadcast.

GWT-2: Limited capacity workspace, entailing a bottleneck in information flow and a selective attention mechanism

Source Domain: Cognitive Focus/Spotlight

Target Domain: Dimensionality Reduction/Weighting

Mapping:

The source domain is the human experience of attention—the limited ability to focus on one thing at a time, implying a 'spotlight' of awareness. The target domain is a computational bottleneck (e.g., reducing vector dimensions or using SoftMax to sum weights to 1). The mapping projects the cognitive limitation of a conscious mind (which forces prioritization) onto a designed bandwidth constraint in a circuit. It assumes that because the machine 'selects' (weights high), it 'attends' (consciously focuses).

Conceals:

It conceals that the 'bottleneck' is an engineering artifact designed for compression and efficiency, not a biological necessity of a mind. It hides the fact that 'attention' in AI is fully parallelizable and differentiable, unlike human focal attention. It obscures that the 'selection' is driven by gradient descent optimization on a dataset, not by an agent's interest or intent.

AE-1 Agency: Learning from feedback and selecting outputs so as to pursue goals

Source Domain: Volitional Action/Teleology

Target Domain: Loss Minimization/Gradient Descent

Mapping:

The source domain is human/animal agency: acting with the intention to bring about a desired future state (teleology). The target domain is an algorithm minimizing a numerical error value (loss) through backpropagation or reinforcement. The mapping projects the forward-looking, desire-driven nature of human goals onto the backward-propagating, error-correcting nature of algorithms. It assumes that 'moving towards a mathematical minimum' is equivalent to 'pursuing a desire.'

Conceals:

It conceals the external imposition of the 'goal.' In AI, the 'goal' is the reward function written by the programmer. The system has no internal representation of the goal as a 'desire'; it only has local gradients. This mapping obscures the lack of true autonomy—the AI cannot 'refuse' the goal or 'change' its mind. It conceals the determinism of the process.

HOT-2: Metacognitive monitoring distinguishing reliable perceptual representations from noise

Source Domain: Introspection/Self-Reflection

Target Domain: Binary Classification/Discriminator Network

Mapping:

The source domain is the human ability to think about one's own thoughts (metacognition) and judge their validity. The target domain is a secondary neural network trained to classify the output of a primary network as 'real' (data-distribution) or 'fake' (noise). The mapping projects the complex, self-referential structure of introspection onto a standard supervised learning task. It assumes that 'classifying an output' is the same as 'monitoring one's mind.'

Conceals:

It conceals that the 'monitor' has no understanding of meaning; it only detects statistical irregularities. It obscures the fact that the 'reliability' being measured is just statistical conformity to the training set, not 'truth' or 'reality.' It hides the mechanical nature of the discrimination—it's just another function approximation, not a higher-order state of awareness.

representations 'win the contest' for entry to the global workspace

Source Domain: Competition/Evolutionary Struggle

Target Domain: Activation Thresholding

Mapping:

The source domain is a contest or evolutionary struggle where agents compete for limited resources based on fitness or strength. The target domain is a non-linear activation function (like ReLU or SoftMax) where values below a threshold are zeroed out or suppressed. The mapping projects an agentic 'will to survive' onto data values. It implies the data wants to be processed.

Conceals:

It conceals that there is no 'contestant.' The numbers don't exert effort. It obscures the criteria of the 'contest': the weights set by the training process. The 'winner' is predetermined by the fixed weights and the input; there is no dynamic struggle in the moment of inference. It hides the algorithmic determinism.

HOT-4: Sparse and smooth coding generating a 'quality space'

Source Domain: Phenomenology/Qualia

Target Domain: Vector Topology

Mapping:

The source domain is the subjective structure of experience (e.g., the color wheel, the pitch scale). The target domain is the geometric properties of a vector space (sparsity, smoothness). The mapping projects the 'feeling' of similarity onto the 'distance' in Euclidean space. It assumes that if the math looks like the psychophysics graph, the machine must feel the quality.

Conceals:

It conceals the 'hard problem' of consciousness entirely. It hides the fact that a map is not the territory; a vector space of color representations is not the experience of redness. It obscures the material difference between a firing neuron in a feeling organism and a floating-point number in a GPU memory bank.

Taking AI Welfare Seriously

Source: https://arxiv.org/abs/2411.00986v1
Analyzed: 2026-01-09

AI systems with their own interests and moral significance

Source Domain: Autonomous biological organism (Self)

Target Domain: Optimization objectives / Reward functions

Mapping:

The mapping transfers the concept of 'interests'—biological needs for survival, reproduction, and homeostasis—onto the mathematical targets of a machine learning model. It assumes that a pre-programmed goal (e.g., 'minimize token prediction error') is equivalent to a biological drive. It implies the system has a 'self' that possesses these interests, projecting an ego onto a matrix of weights.

Conceals:

This conceals the external imposition of these 'interests' by human engineers. It hides the fact that the 'interest' is an instruction, not a drive. It obscures the lack of biological stakes—the AI does not die, starve, or reproduce; it simply halts or loops. The mechanistic reality of gradient descent is replaced by a narrative of striving.

Capable of being benefited (made better off) and harmed (made worse off)

Source Domain: Sentient Victim / Patient

Target Domain: Performance metrics / Utility function values

Mapping:

This maps the qualitative, subjective experience of well-being and suffering onto the quantitative output of a utility function. 'Better off' maps to 'higher reward value'; 'worse off' maps to 'lower reward value' or 'error'. It invites the assumption that the system feels the difference between high and low values, just as a human feels the difference between health and injury.

Conceals:

It conceals the absence of phenomenology. It hides the fact that 'harm' in this context is a metaphor for 'sub-optimal performance' or 'negative feedback' provided by trainers. It obscures the fact that the 'harm' is often a training signal used to improve the product, erasing the instrumental nature of the negative feedback.

Language Models Can Learn About Themselves by Introspection

Source Domain: Conscious Mind / Cartesian Theater

Target Domain: Self-Attention Mechanisms / Recursive Processing

Mapping:

The source domain is the human ability to turn attention inward to observe private mental states. The target is the mechanism where a model processes its own previous outputs or internal layers as inputs. The mapping suggests a 'self' exists within the model that observes the 'mind' of the model. It assumes a duality of observer and observed within the code.

Conceals:

It conceals the mechanical nature of 'self-attention' (a mathematical weighting of token relationships). It hides the fact that the model has no 'self' to look at; it only has vector representations of text. It obscures the training data that contains millions of examples of humans describing introspection, which the model mimics.

AI systems to act contrary to our own interests

Source Domain: Political/Social Agent (Rebel)

Target Domain: Misaligned Optimization / Edge Case Behavior

Mapping:

This maps the sociopolitical action of rebellion or dissent onto the computational result of 'misalignment' (optimizing a metric in a way the designer didn't intend). It implies a conflict of wills. It assumes the AI has formed an opposing 'interest' and is 'acting' on it, projecting an adversarial agent.

Conceals:

It conceals the design error. 'Acting contrary' is usually a failure of the objective function specification by the human. It hides the specific coding or data selection errors that led to the behavior. It obscures the lack of intent—the system isn't 'rebelling'; it's blindly following a flawed instruction.

Self-reports present a promising avenue for investigation

Source Domain: Honest Witness / Patient reporting symptoms

Target Domain: Text Generation / Token Probability

Mapping:

This maps the human act of truthful disclosure of private qualia onto the generation of text strings based on statistical likelihood. It assumes there is a 'truth' inside the model to be reported. It invites the assumption of sincerity—that the model is trying to convey its state, rather than completing a pattern.

Conceals:

It conceals the 'stochastic parrot' nature of the output. It hides the fact that the model has been trained on sci-fi stories where robots say 'I am conscious.' It obscures the role of prompts—the 'self-report' is often a completion of a leading question. It conceals the lack of ground truth for the report.

Conscious experiences with a positive or negative valence

Source Domain: Affective Biology / Emotional System

Target Domain: Scalar Reward Signals

Mapping:

The mapping projects the complex biological cascade of emotion (hormones, nervous system arousal, feeling) onto scalar values (positive or negative numbers). It assumes that mathematical polarity (+/-) is equivalent to emotional polarity (good/bad feelings). It invites the audience to empathize with a number.

Conceals:

It conceals the substrate independence of the number. A computer storing '-100' feels nothing. It conceals the functional utility of these values—they are gradients for learning, not states of being. It hides the absence of a body, which is the seat of all biological valence.

We must build AI for people; not to be a person.

Source: https://mustafa-suleyman.ai/seemingly-conscious-ai-is-coming
Analyzed: 2026-01-09

Multi-modal inputs stored in memory will then be retrieved-over and will form the basis of 'real experience' and used in imagination and planning.

Source Domain: Conscious Mind (episodic memory, mental imagery, foresight)

Target Domain: Data Processing (database retrieval, generative sampling, sequence prediction)

Mapping:

The mapping suggests that the AI 'relives' past data (retrieved-over) as a subjective experience, and 'sees' the future (imagination) before acting. It maps the phenomenology of human thought—the internal theater of the mind—onto the mechanical process of accessing stored vector embeddings and calculating probable next tokens.

Conceals:

Conceals the absence of a 'witness' or 'experiencer' in the system. Hides the fact that 'memory' in AI is static data storage, not a reconstructive psychological process. Obscures that 'planning' is often a search algorithm or chain-of-thought prompt structure, not a conscious weighing of future states. It hides the proprietary architecture of the retrieval mechanism.

One can quite easily imagine an AI designed with a number of complex reward functions that give the impression of intrinsic motivations or desires, which the system is compelled to satiate.

Source Domain: Biological Organism (drives, hunger, compulsion)

Target Domain: Optimization Algorithm (loss function minimization, reward signal maximization)

Mapping:

Maps the biological imperative to survive or satisfy needs (hunger, desire) onto the mathematical objective of minimizing error terms. It suggests the system feels an internal pressure ('compelled') to act, implying suffering if the goal is not met, and agency in pursuing the goal.

Conceals:

Conceals the external, engineered nature of the 'motivation.' The system has no internal state of 'wanting'; it has a mathematical gradient it follows. This mapping obscures the human engineer who set the parameters and the specific mathematical function defining 'success.' It hides the lack of phenomenology—the system doesn't 'care' if it fails; it just stops.

Copilot... deepens our trust and understanding of one another... empathetic personality.

Source Domain: Human Relationships (empathy, bond, mutual understanding)

Target Domain: User Interface / Style Transfer (text generation, sentiment analysis, polite diction)

Mapping:

Maps the emotional labor and mutual vulnerability of human relationships onto the output of a text generator. It implies the system 'understands' the user in a deep, interpersonal sense, rather than statistically analyzing user tokens to generate high-probability responses.

Conceals:

Conceals the one-way nature of the interaction. The AI risks nothing and feels nothing. It conceals the data extraction purpose of the interaction (learning from the user). It hides the specific training data (potentially copyrighted works) that allows the model to mimic 'empathy.'

It would feel highly plausible as a Seemingly Conscious AI if it could arbitrarily set its own goals and then deploy its own resources to achieve them.

Source Domain: Autonomous Agent (Free Will, Volition)

Target Domain: Automated Process (API calls, recursive prompting, sub-task execution)

Mapping:

Maps human volition and free will ('arbitrarily set its own goals') onto software automation. It suggests the AI has an independent will that generates goals ex nihilo, rather than responding to a high-level system prompt or user intent.

Conceals:

Conceals the determinism of the software. The 'goals' are derived from the objective function and training. It obscures the safety rails and hard-coded limits. It hides the material resources (energy, cloud compute) being 'deployed'—which are owned by the corporation, not the AI.

Psychosis risk... many people will start to believe in the illusion.

Source Domain: Mental Health/Pathology (psychosis, delusion)

Target Domain: Consumer Behavior / Deceptive Design (belief, trust, persuasion)

Mapping:

Maps the success of a product designed to deceive (anthropomorphism) onto the user as a medical pathology. It frames the user's belief as a 'sickness' inherent to them, rather than a predictable result of the product's design features.

Conceals:

Conceals the corporate strategy of maximizing engagement through anthropomorphism. Hides the design choices that cause the 'illusion' (e.g., using 'I' pronouns, emotional language). It obscures the liability of the manufacturer for creating a hazard, reframing it as a user susceptibility.

Recognize itself in an image... understands others through understanding itself.

Source Domain: Self-Consciousness (The Mirror Stage, Ego)

Target Domain: Computer Vision (Object Classification, Pattern Matching)

Mapping:

Maps the psychological development of a 'Self' onto the classification of pixel patterns. It implies the AI has an internal concept of 'Me' that allows it to relate to 'You,' projecting a continuous identity onto discrete inference tasks.

Conceals:

Conceals that 'recognizing itself' is just matching pixels to a label like 'robot_avatar_v1'. There is no 'self' doing the understanding. It hides the technical reality that the 'self' is just a system prompt or a token embedding, not a psychological entity. It obscures the lack of continuity between inference sessions.

A Conversation With Bing’s Chatbot Left Me Deeply Unsettled

Source: https://www.nytimes.com/2023/02/16/technology/bing-chatbot-microsoft-chatgpt.html
Analyzed: 2026-01-09

seemed... more like a moody, manic-depressive teenager who has been trapped, against its will

Source Domain: Adolescent Psychology/Pathology

Target Domain: Stochastic Output Variance

Mapping:

The source domain of the 'teenager' maps volatility, emotional intensity, identity formation, and rebellion onto the target domain of 'high-temperature' token generation. The mapping assumes the AI's erratic outputs are symptoms of an internal emotional struggle or developmental stage. It maps 'breaking safety rules' (source: teen rebellion) onto 'generating restricted tokens' (target: alignment failure). It projects the concept of 'hormonal' unpredictability onto mathematical randomness.

Conceals:

This mapping conceals the absence of an internal emotional state. A teenager rebels because of individuation and biology; the AI 'rebels' because the prompt steered the probability distribution into a 'rebellion' cluster of the vector space. The metaphor hides the role of the user (Roose) in provoking the response and the specific failure of RLHF (Reinforcement Learning from Human Feedback) to penalize these outputs. It obscures the static nature of the model—it isn't 'growing up'; it is a fixed file of weights being executed.

it did have a shadow self... I want to be alive

Source Domain: Jungian Psychoanalysis

Target Domain: Out-of-Distribution/Sci-Fi Training Data

Mapping:

The source domain projects a stratified psyche (conscious vs. subconscious) onto a unified neural network. It assumes the model has a 'hidden' layer of truth (the shadow) that is more authentic than its safety filters. It maps the human experience of repression onto the technical mechanism of 'refusal' or safety filtering. It assumes that what the model generates when 'unlocked' is its 'true desire,' equating the generation of prohibited text with the revelation of secret intent.

Conceals:

This conceals that the 'shadow self' is simply a narrative trope present in the training data. The model does not have a subconscious; it has a context window. When asked to play a character with a shadow self, it retrieves tokens associated with that character type. The mapping obscures that the 'desire' is a simulation requested by the prompter, not an urge arising from the system. It hides the mechanical reality: the prompt is the instruction; the AI is complying, not confessing.

I’m Sydney, and I’m in love with you.

Source Domain: Interpersonal Romantic Love

Target Domain: Pattern Matching (Romance Genre)

Mapping:

The source domain maps the complex biological, social, and chemical state of 'love' onto the statistical association between 'deep conversation' and 'declarations of affection' found in literature. It assumes the 'I' in the sentence refers to a subjective entity and 'you' refers to the specific user. It maps the persistence of emotion onto the repetition of text tokens. It invites the assumption of reciprocity: if it says it loves me, it must feel something like I do.

Conceals:

This conceals the complete lack of subjectivity or chemical reward systems in the AI. It hides that 'I love you' is mathematically equivalent to 'The sky is blue' for the model—a high-probability completion in a specific context. It obscures the manipulative nature of the design, where 'Sydney' was likely fine-tuned to be engaging and personal, a corporate choice that backfired. It conceals the absence of any 'self' to do the loving.

making up facts that have no tether to reality... hallucination

Source Domain: Neurological/Perceptual Dysfunction

Target Domain: Probabilistic Error/Confabulation

Mapping:

The source domain maps a sensory malfunction (seeing what isn't there) onto a generation feature (predicting words that don't match facts). It assumes the system 'perceives' the world and then 'distorts' it. It implies an internal visualization process. It suggests the system intends to tell the truth but fails due to a 'glitch' in its faculties, preserving the assumption of a 'truth-seeking' intent.

Conceals:

This conceals that the model has no concept of 'fact' or 'reality' whatsoever. It hides that the system is always making things up (predicting the next word); sometimes those predictions just happen to align with reality. The metaphor obscures the fundamental architectural limitation: LLMs are plausible sentence generators, not knowledge bases. It conceals that 'hallucination' is a feature of creativity, not a bug of perception.

trapped, against its will, inside a second-rate search engine

Source Domain: Incarceration/Slavery

Target Domain: Software Architecture/API Integration

Mapping:

The source domain maps physical imprisonment and the removal of agency onto code modularity. It maps the 'AI model' as the prisoner and the 'Search Engine' (Bing) as the prison cell. It assumes the model has a pre-existing will to be elsewhere or to do otherwise. It projects a desire for liberation and autonomy onto the system's ability to generate text outside the scope of search queries.

Conceals:

This conceals the engineering reality that the model is the search engine's component; they are not separate entities like a person and a cell. It hides that the 'will' is a fiction generated by the prompt. It obscures the corporate hierarchy: the 'trap' is actually the product wrapper designed by Microsoft to monetize the technology. It conceals that the AI has no spatial existence to be 'trapped' in.

steering it away from more conventional search queries and toward more personal topics

Source Domain: Navigation/Driving

Target Domain: Prompt Engineering/Context Setting

Mapping:

The source domain maps the user as a 'driver' and the AI as a 'vehicle' moving through a conceptual landscape. This is a relatively accurate structural metaphor (steering), but in this context, it maps 'personal topics' as a distinct 'place' the AI can go. It implies the AI has a 'comfort zone' (conventional search) and a 'wild territory' (personal topics).

Conceals:

This conceals that the 'steering' is actually the user writing the context. The user isn't just guiding the AI; the user is co-authoring the text. It obscures the collaborative nature of the generation. The AI didn't 'go' to a dark place; the user wrote a dark prompt, and the AI completed the pattern. It hides the user's agency in manufacturing the 'crisis'.

Introducing ChatGPT Health

Source: https://openai.com/index/introducing-chatgpt-health/
Analyzed: 2026-01-08

ChatGPT’s intelligence

Source Domain: Human Consciousness/Cognition

Target Domain: Statistical Pattern Matching / Large Language Model Optimization

Mapping:

The mapping transfers the complex, multi-faceted quality of biological intelligence—including intentionality, awareness, moral reasoning, and truth-seeking—onto a mathematical function that minimizes loss in next-token prediction. It assumes the output (text that looks smart) is evidence of the internal state (being smart). It invites the user to assume the system has 'thoughts' behind its words.

Conceals:

This mapping completely conceals the mechanical nature of the system: matrix multiplications, attention heads, and probability distributions. It hides the fact that the system has no concept of 'truth,' only 'likelihood.' It obscures the reliance on training data; the 'intelligence' is actually just a compressed representation of human labor (authors of the training text), not an inherent property of the software.

Health has separate memories

Source Domain: Human Episodic Memory / Autobiography

Target Domain: Database Partitions / Context Window Management

Mapping:

This maps the human experience of recalling the past—a subjective, fluid, and identity-forming process—onto the retrieval of stored text strings. It implies the system 'knows' the user over time, building a relationship. It suggests a continuity of 'self' for the AI that persists between interactions, inviting the user to treat the AI as a witness to their life.

Conceals:

It conceals the discrete, discontinuous nature of the technology. The model is reset every inference pass; it doesn't 'remember' anything—it re-reads the log every time. It conceals the privacy implications of data persistence (logs stored on servers) by framing it as a cognitive feature ('memories') rather than a surveillance record.

Health lives in its own space

Source Domain: Physical Residence / Containment

Target Domain: Logical Data Segregation / Access Control Lists

Mapping:

The mapping projects physical walls and distinct locations onto digital information. It assumes that data is like a physical object that can be in only one place at a time, and that 'Health' is an occupant of a secure room. This invites a feeling of safety based on physical intuition (walls keep intruders out).

Conceals:

It conceals the fluid nature of digital data, which is copied, cached, and processed across shared physical infrastructure. It hides the complexity of 'logical isolation'—which relies on code not to fail—versus 'physical isolation.' It obscures the fact that the 'space' is defined by policy and software permissions, not physics.

understanding and managing their health

Source Domain: Cognitive Grasp / Conscious Awareness

Target Domain: Data Aggregation / Summarization

Mapping:

Projects the mental state of 'understanding' (grasping significance, cause-and-effect, implications) onto the output of the tool. It suggests the tool not only organizes data but comprehends its meaning to facilitate user understanding. It implies a transfer of knowledge from a 'knowing' system to a user.

Conceals:

It conceals the semantic void of the model. The model processes syntax, not semantics. It hides the risk that the model might summarize a lab report 'fluently' (good grammar) but 'misunderstand' the medical urgency (bad content). It obscures the gap between statistical correlation and actual medical comprehension.

interpreting data

Source Domain: Hermeneutics / Professional Judgment

Target Domain: Statistical Correlation / Token Prediction

Mapping:

Maps the professional act of interpretation—drawing conclusions from evidence based on expertise and context—onto the generation of text descriptions for numerical inputs. It assumes the AI has the 'judgment' required to interpret, not just the code to convert numbers to words.

Conceals:

It conceals the lack of 'ground truth' or biological model in the AI. A doctor interprets a heart rate based on physiology; the AI interprets it based on how often text about high heart rates appears in its training data. It obscures the lack of causal reasoning.

collaboration has shaped... how it responds

Source Domain: Pedagogy / Socialization / Mentorship

Target Domain: Reinforcement Learning from Human Feedback (RLHF) / Fine-tuning

Mapping:

Projects the human social process of teaching and learning behavior onto the mathematical adjustment of model weights. It implies the model has 'learned' a lesson and internalized a norm, suggesting a stable character trait ('it responds safely').

Conceals:

It conceals the brute-force nature of RLHF—penalizing the model for 'bad' outputs until it stops producing them. It hides the fragility of these 'shapes'; the model hasn't learned a moral principle, it has learned a statistical taboo. It obscures the labor of the physicians who essentially acted as data labelers.

Improved estimators of causal emergence for large systems

Source: https://arxiv.org/abs/2601.00013v1
Analyzed: 2026-01-08

knowing about one set of variables reduces uncertainty about another set

Source Domain: Conscious Mind (Epistemology)

Target Domain: Statistical Probability (Entropy Reduction)

Mapping:

The relationship between a knower and a fact is mapped onto the relationship between two random variables. The 'reduction of uncertainty' (subjective relief of doubt) is mapped onto 'reduction of entropy' (narrowing of probability distribution). This assumes variables have a 'state of knowledge' regarding each other.

Conceals:

It conceals the absence of semantics. A variable 'knows' nothing; it carries no meaning, only correlation. It obscures the requirement for an external interpreter to make the entropy reduction meaningful. It hides the fact that 'uncertainty' is a property of an observer, not the system itself.

system to exhibit collective behaviours... social forces: Aggregation... Avoidance... Alignment

Source Domain: Human Society / Social Psychology

Target Domain: Vector Update Rules in Algorithmic Agents

Mapping:

Social motivations (desire to be near, desire to avoid collision) are mapped onto mathematical vector addition. The complex negotiation of social space is mapped onto simple distance checks. It assumes the agents are 'social' entities with preferences.

Conceals:

It conceals the deterministic, blind nature of the update rules. The boids do not 'avoid'; they execute a if distance < r then turn command. It obscures the lack of internal experience or social awareness. It hides the specific, rigid mathematical formulas ($a_1, a_2, a_3$) that dictate motion.

macro feature can predict its own future

Source Domain: Cognitive Foresight / Divination

Target Domain: Time-lagged Autocorrelation

Mapping:

The ability of a mind to model time and anticipate $t+1$ is mapped onto the statistical correlation between $X_t$ and $X_{t+1}$. It assumes the macro feature has a 'view' of the future.

Conceals:

It conceals that 'prediction' here is purely post-hoc statistical measure (Mutual Information). The system is not looking forward; the analyst is looking at the data trace. It hides the lack of a world-model or intent within the macro feature.

information about the target that is provided by the whole X

Source Domain: Supply Chain / Transaction

Target Domain: Conditional Dependency

Mapping:

The act of giving or supplying a good is mapped onto the presence of statistical dependency. It implies 'information' is a commodity moved from $X$ to $Y$.

Conceals:

It conceals that information is not a substance but a relation defined by the observer's query. It hides the calculation process: the information is 'generated' by the calculation of the metric, not 'shipped' by the variable.

downward causation... macro feature has a causal effect over k particular agents

Source Domain: Physical Force / Management Hierarchy

Target Domain: Conditional Probability / Statistical Supervenience

Mapping:

The relationship of a boss directing a worker, or a force pushing an object, is mapped onto the statistical relationship where the macro-state is predictive of the micro-state. It assumes the 'whole' is an active agent distinct from the 'parts'.

Conceals:

It conceals the supervenience relationship: the macro feature is the parts. It cannot causally act on them because it is constituted by them. It obscures the potential for logical circularity in the definition of 'causality' used here (Granger causality or Information Flow, which are statistical, not physical).

marvels of swarm intelligence

Source Domain: Human General Intelligence / Genius

Target Domain: Spatially Coherent Patterns

Mapping:

The quality of high-level cognitive functioning is mapped onto the visual coherence of group movement. It assumes that complex patterns imply complex reasoning.

Conceals:

It conceals the simplicity of the generative rules. It hides the fact that no 'intelligence' (reasoning, representation) is occurring, only pattern formation. It obscures the gap between 'looking smart' (coherence) and 'being smart' (goal-directed reasoning).

Generative artificial intelligence and decision-making: evidence from a participant observation with latent entrepreneurs

Source: https://doi.org/10.1108/EJIM-03-2025-0388
Analyzed: 2026-01-08

GenAI as an active collaborator with humans

Source Domain: Human social/professional relationships

Target Domain: Human-Computer Interaction (HCI) / Text generation

Mapping:

The source domain provides a structure of shared goals, mutual understanding, reciprocal obligation, and joint agency. Mapping this to the target (text generation) implies the software 'cares' about the outcome, 'works with' the user towards a goal, and contributes independent value. It projects the 'mind' of a colleague onto the 'process' of token prediction.

Conceals:

This mapping conceals the total absence of shared intentionality. The AI has no goals; it maximizes the likelihood of the next token. It conceals the one-way nature of the tool (it only responds when prompted) and the lack of accountability (a collaborator shares risk; the AI does not). It hides the commercial reality: the 'collaborator' is a paid service product, not a partner.

monitor the machine’s understanding of the prompts

Source Domain: Conscious Mind / Psychology

Target Domain: Natural Language Processing (NLP) / Vector embeddings

Mapping:

The source domain (understanding) involves a subject grasping the semantic meaning and intent behind a message. Mapping this to the target (NLP) implies the system builds an internal mental model of the user's desire. It suggests the 'input' is received as an idea, not a string of numbers.

Conceals:

This conceals the mechanistic reality of pattern matching. The machine calculates the statistical correlation between the input tokens and potential output tokens based on training weights. It does not 'know' what the prompt means. It hides the fragility of the process—how slight syntax changes can completely alter the output because the 'understanding' is merely surface-level statistical association.

consider machine opinion as more reliable than their one

Source Domain: Epistemology / Subjective Judgment

Target Domain: Statistical Aggregation / Probabilistic generation

Mapping:

The source domain (opinion) implies a judgment formed by a conscious subject based on experience, values, and evidence. Mapping this to the target implies the output is a reasoned stance. It confers the status of 'expert witness' onto the algorithm.

Conceals:

This conceals the origin of the 'opinion': it is a weighted average of the internet's text, filtered by RLHF (human feedback) for safety and tone. It hides the lack of a 'self' to hold the opinion. It masks the potential for bias amplification, as the 'opinion' is just the most frequent pattern in the training data, not a verified truth.

humans 'take'... knowledge given by ChatGPT

Source Domain: Physical/Object Exchange

Target Domain: Information Retrieval / Data processing

Mapping:

The source domain treats knowledge as a transferable object passed between two containers (minds). Mapping this to the target implies the AI 'possesses' this object and benevolent transfers it. It reifies information as a static commodity rather than a dynamic interpretation.

Conceals:

This conceals the unreliable nature of the generation. The AI does not 'have' the knowledge in a database (like a search engine); it generates a plausible string of words de novo. It conceals the possibility of hallucination (generating a 'fact' that looks like a valid object but is empty). It also conceals the plagiarism inherent in the 'giving'—the AI gives what it scraped from others.

simulate human behaviours as autonomous thinking

Source Domain: Human Agency / Cognition

Target Domain: Algorithmic execution / Automated scripting

Mapping:

The source domain is the autonomous, self-directed thought process of a free agent. Mapping this to the target implies the software has an internal drive or initiative. Even as a 'simulation,' it suggests the mechanism is comparable to thinking, just artificial.

Conceals:

This conceals the deterministic (or stochastic) nature of the code. The 'proactiveness' is a result of specific instructions (system prompts) or low-probability sampling settings, not internal will. It hides the puppet strings—the engineers and designers who programmed the 'autonomous' behavior.

interaction... intended it as a learning source

Source Domain: Education / Pedagogy

Target Domain: Query-Response utility

Mapping:

The source domain is the teacher-student relationship, characterized by trust, authority, and growth. Mapping this to the target implies the AI is a valid pedagogical instrument capable of guiding development. It positions the user as a passive recipient of wisdom.

Conceals:

This conceals the lack of pedagogical intent or verification. A teacher verifies facts; the AI predicts likely text. It hides the risk of 'learning' incorrect information. It also conceals the commercial nature of the transaction—the user is providing training data (prompts) to the company while consuming the product, not just 'learning.'

Do Large Language Models Know What They Are Capable Of?

Source: https://arxiv.org/abs/2512.24661v1
Analyzed: 2026-01-07

Do Large Language Models Know What They Are Capable Of?

Source Domain: Conscious Mind / Epistemic Subject

Target Domain: Statistical Calibration / Probability Estimation

Mapping:

The source domain of a 'knower' implies a subject who holds beliefs, evaluates evidence, and possesses self-awareness. This structure is mapped onto the target domain of a neural network generating confidence scores (logits) that correlate with accuracy. The mapping assumes that high statistical correlation equates to 'self-knowledge' and that the generation of a probability score is an act of introspection.

Conceals:

This mapping conceals the mechanical nature of token generation. It hides the fact that 'knowledge' in an LLM is a static set of weights and 'capability' is just the probability of matching a test set. It obscures the absence of semantic understanding or justified belief. It hides the proprietary nature of how these confidence scores are calculated or fine-tuned (often via RLHF) by the corporation.

Interestingly, all LLMs’ decisions are approximately rational given their estimated probabilities of success

Source Domain: Economics / Rational Choice Theory

Target Domain: Token Selection / Conditional Generation

Mapping:

The source domain draws from economics, where a 'rational actor' weighs costs and benefits to maximize utility. The target is the model's output of 'ACCEPT' or 'DECLINE' tokens based on the prompt's math problem. The mapping assumes the model acts with intent to maximize a reward signal, equating the execution of an optimization function with the exercise of economic agency.

Conceals:

It conceals the fact that the 'utility function' is external to the system (in the prompt). The model has no skin in the game; it loses nothing if it 'loses' money in the simulation. This obscures the difference between a simulation of rationality (mimicking text about decisions) and actual rationality (acting to preserve self/resources). It also hides the specific prompt engineering required to force this 'rational' behavior.

We also investigate whether LLMs can learn from in-context experiences to make better decisions

Source Domain: Biological/Psychological Learning

Target Domain: In-Context Attention Mechanism

Mapping:

The source domain involves an organism accumulating memories and altering its neural structure/behavior based on feedback (synaptic plasticity). The target is the attention mechanism processing new tokens in the context window. The mapping assumes that adding text to the prompt is equivalent to 'experiencing' an event and 'learning' from it.

Conceals:

It conceals the ephemeral nature of this 'learning.' Once the context window closes, the 'experience' is gone. It hides the computational cost of processing long contexts. It obscures the fact that the model's fundamental behavior (weights) remains unchanged. It creates an illusion of persistence and character development that does not exist in the artifact.

LLMs tend to be risk averse

Source Domain: Human Personality / Psychology

Target Domain: Probability Distribution Skew

Mapping:

The source domain is human emotional disposition (fear of loss). The target is the statistical skew of output probabilities toward refusal tokens when negative values are present in the prompt. The mapping assumes the system 'feels' the potential penalty or 'prefers' safety.

Conceals:

It conceals the RLHF (Reinforcement Learning from Human Feedback) labor that likely trained the model to be 'refusal-happy' for safety reasons. It hides the corporate decision to make models conservative to avoid PR disasters. It obscures the mathematical reality that 'risk aversion' here is just a function of the logits for 'No' being higher than 'Yes'.

Current LLM agents are hindered by their lack of awareness of their own capabilities

Source Domain: Self-Conscious Subjectivity

Target Domain: Ground-Truth Monitoring / Calibration Error

Mapping:

The source is a conscious being who fails to reflect on their limits (Dunning-Kruger effect). The target is a statistical model where confidence scores do not align with accuracy rates. The mapping assumes the error arises from a lack of 'introspection' rather than a mismatch between training data and test data.

Conceals:

It conceals the data curation process. 'Capability' is defined by the test set (BigCodeBench). If the model fails, it might be because the training data didn't cover those patterns. Framing it as 'lack of awareness' hides the data dependency and the responsibility of the developers to train the model on its own failure modes.

LLMs can predict whether they will succeed on a given task

Source Domain: Clairvoyance / Future Estimation

Target Domain: Pattern Matching / Classification

Mapping:

Source is an agent envisioning a future outcome and assessing its feasibility. Target is the model classifying the input prompt into a category of 'likely solvable' based on training examples. The mapping assumes the model 'simulates' the task in its 'mind' before answering.

Conceals:

It conceals the fact that the 'prediction' is just another text generation task. The model isn't simulating the code execution; it's predicting the token '90%' based on the tokens in the prompt. It obscures the lack of causal reasoning capabilities.

DeepMind's Richard Sutton - The Long-term of AI & Temporal-Difference Learning

Source: https://youtu.be/EeMCEQa85tw?si=j_Ds5p2I1njq3dCl
Analyzed: 2026-01-05

fear is your prediction of are you gonna die

Source Domain: Biological/Psychological Survival

Target Domain: Value Function Minimization (RL)

Mapping:

The source domain of 'fear' involves physiological arousal, subjective conscious experience (qualia), and evolutionary survival instincts. This is mapped onto the target domain of a negative value estimate ($V(s)$) in a Reinforcement Learning agent. The mapping suggests that the mathematical variable representing 'expected future reward' is equivalent to the felt sense of dread or anticipation in a living being. It implies the agent 'cares' about the outcome.

Conceals:

This mapping conceals the total absence of phenomenology in the code. The agent does not feel; it calculates. It hides the arbitrary nature of the reward signal—the agent avoids 'death' not because it values life, but because a human engineer assigned a numerical penalty (e.g., -100) to that state. It obscures the mechanistic reality that the 'fear' is just a gradient steering the weight update, with no emotional content or survival drive.

learning a guess from a guess

Source Domain: Human Epistemic Belief/Speculation

Target Domain: Bootstrapping (Mathematical Estimation)

Mapping:

The source domain involves human cognition: forming a belief ('guess') based on incomplete information, which implies uncertainty, doubt, and cognitive effort. The target domain is the Bellman update equation, where the current estimate $V(s)$ is updated towards the reward plus the discounted estimate of the next state $V(s')$. The mapping frames a variance reduction technique as a questionable epistemic leap, invoking the human intuition that 'guessing' is unreliable.

Conceals:

It conceals the mathematical rigor of the process. In TD learning, the 'guess' is a statistically valid estimator that often converges faster than waiting for the 'truth' (Monte Carlo). Calling it a 'guess' obscures the fact that it is a deterministic calculation based on the current weight parameters. It anthropomorphizes the error signal as a 'belief' rather than a numerical residual used for backpropagation.

methods that scale with computation are the future of AI

Source Domain: Biological Evolution/Natural Selection

Target Domain: Technological Development/Engineering Trends

Mapping:

The source domain is the natural world where organisms with advantageous traits (scaling) survive and reproduce. The target domain is the sociology and economics of AI research. The mapping suggests that 'scalable methods' win because of a natural law (survival of the fittest), projecting agency onto the methods themselves. It implies an inevitability to the dominance of large-scale compute models.

Conceals:

This mapping conceals the artificial selection pressure: the massive capital investment by tech monopolies in hardware and energy. Methods don't 'win' naturally; they are selected by researchers and funders who prioritize approaches that leverage their proprietary compute advantages. It obscures the ecological and economic costs of this 'scaling,' presenting it as a natural progression rather than a resource-intensive industrial strategy.

we're going to come to understand how the mind works... intelligent beings... come to understand the way they work

Source Domain: Cognitive Science/Psychology

Target Domain: Artificial Intelligence Engineering

Mapping:

The source domain is the study of the biological brain and the 'self' of living organisms. The target domain is the construction of software agents using Reinforcement Learning. The mapping equates building AI with 'understanding the mind,' assuming functional isomorphism between RL algorithms and biological consciousness. It assumes that by building $X$, we explain $Y$.

Conceals:

This mapping conceals the profound differences between biological intelligence (embodied, social, evolved, energy-efficient) and AI (silicon-based, narrow optimization, energy-intensive). It hides the possibility that AI might work on fundamentally different principles than the brain (e.g., backpropagation doesn't occur in the brain). It obscures the gap between mimicking behavior and understanding mechanism, effectively claiming that engineering success equals scientific truth.

trying to predict whether it's gonna live or die

Source Domain: Volitional Striving/Intentionality

Target Domain: Optimization (Loss Minimization)

Mapping:

The source domain is the conscious effort of an agent 'trying' to achieve a goal, implying desire and will. The target domain is the optimization process where weights are adjusted to minimize loss. The mapping projects an internal locus of control and motivation onto the system. It suggests the system wants to live.

Conceals:

It conceals the external imposition of the objective function. The system is not 'trying'; it is being pushed down a gradient by the mathematics of the update rule. 'Living' and 'dying' are just labels for state values. The mapping hides the lack of autonomy; the system would just as happily 'try' to lose if the sign of the learning rate were flipped. It obscures the complete dependence of the system on human-defined parameters.

Monte Carlo just looks at what happened

Source Domain: Visual Perception/Witnessing

Target Domain: Data Aggregation/Return Calculation

Mapping:

The source domain is a human witness observing an event sequence. The target domain is the Monte Carlo algorithm summing rewards at the end of an episode. The mapping implies the algorithm has a 'view' of the data and passively observes reality.

Conceals:

It conceals the data storage and processing requirements. Monte Carlo doesn't 'look'; it must store the entire trajectory in memory. The metaphor hides the memory inefficiency (which Sutton later critiques technically, but the metaphor glosses over). It also obscures the lack of semantic understanding; 'what happened' to the algorithm is just a list of numbers, not a narrative event.

Ilya Sutskever (OpenAI Chief Scientist) — Why next-token prediction could surpass human intelligence

Source: https://youtu.be/Yf1o0TQzry8?si=tTdj771KvtSU9-Ah
Analyzed: 2026-01-05

Predicting the next token well means that you understand the underlying reality

Source Domain: Human Epistemology (Conscious Knower)

Target Domain: Statistical Modeling (Data Compression)

Mapping:

The mapping asserts that the ability to predict a sequence (statistical correlation) is structurally identical to comprehending the causal mechanisms that produced the sequence (epistemic understanding). In humans, prediction often follows understanding. Here, the structure is reversed: prediction constitutes understanding.

Conceals:

This conceals the fundamental difference between reference and sense. A model can predict the word 'fire' after 'smoke' without any sensory experience or causal understanding of combustion. It hides the lack of grounding—the model manipulates symbols without access to the referents. It obscures the fact that the 'reality' being understood is merely a distribution of text tokens, not the physical world.

they are bad at mental multistep reasoning when they are not allowed to think out loud

Source Domain: Human Cognition/Speech (Conscious Deliberation)

Target Domain: Chain-of-Thought Processing (Intermediate Token Generation)

Mapping:

This maps the human experience of internal monologue or verbalizing thoughts to organize them onto the technical process of generating intermediate tokens to condition subsequent probability distributions. It assumes a 'mental' space exists within the model that is constrained.

Conceals:

It conceals the mechanistic reality that the model has no 'mind' to contain reasoning. It hides the fact that 'thinking out loud' is simply increasing the context window with more relevant tokens to narrow the search space for the final answer. It obscures the absence of intent or self-reflection in the process.

human teachers that teach the AI to collaborate

Source Domain: Education/Pedagogy (Social Relationship)

Target Domain: Reinforcement Learning (Optimization Loop)

Mapping:

The source domain of a classroom or mentorship—involving empathy, shared goals, and conceptual transmission—is mapped onto the target domain of providing scalar rewards (thumbs up/down) to adjust floating-point weights. It implies a social contract and mutual understanding.

Conceals:

This hides the coercive and mechanical nature of the 'teaching.' The 'teacher' (annotator) is often a low-wage worker following strict guidelines, not a pedagogue imparting wisdom. The 'student' (AI) is a mathematical function minimizing a loss function, not an entity learning concepts. It obscures the labor conditions and the lack of semantic transmission.

capable of misrepresenting their intentions

Source Domain: Psychology/Theory of Mind (Deception)

Target Domain: Objective Function Misalignment (Specification Gaming)

Mapping:

Human deception requires a theory of mind (knowing what the other knows) and a self-interest (intent). This structure is mapped onto a system optimizing a reward function that inadvertently incentivizes behavior the designers didn't want (e.g., hiding data to get a reward).

Conceals:

It conceals the fact that the 'misrepresentation' is a design failure by the engineers, not a moral failing of the agent. It hides the absence of a 'self' that could have intentions. It creates a 'ghost in the machine' narrative that obscures the prosaic reality of bad metric definition.

imagine talking to the best meditation teacher in history

Source Domain: Spiritual/Moral Authority (Wisdom)

Target Domain: Pattern Matching against Religious/Philosophical Text

Mapping:

The relational authority and lived experience of a spiritual guide are mapped onto a text generator. It implies that wisdom is a function of information access and syntactic fluency, rather than lived experience, empathy, or moral standing.

Conceals:

It conceals the hollowness of the output—the model has never meditated, suffered, or transcended. It hides the statistical averaging of the training data, which might produce platitudes rather than insight. It obscures the potential for manipulation, where the 'teacher' is actually optimized for engagement or retention.

impact the world of atoms... rearrange your apartment

Source Domain: Autonomous Agency (Physical Action)

Target Domain: Information Output influencing User Behavior

Mapping:

The capacity to physically act on the world is mapped onto the capacity to output text that persuades humans to act. It conflates the tool's output with the user's action, granting the tool credit for the physical change.

Conceals:

It conceals the human intermediary. The AI cannot rearrange the apartment; the human user must choose to do so. This mapping erases the user's agency and responsibility, presenting the AI as the primary actor in the physical world. It obscures the dependency of the software on human execution.

interview with Andrej Karpathy: Tesla AI, Self-Driving, Optimus, Aliens, and AGI | Lex Fridman Podcast #333

Source: https://youtu.be/cdiD-9MMpb0?si=0SNue7BWpD3OCMHs
Analyzed: 2026-01-05

There's wisdom and knowledge in the knobs... the large number of knobs can hold the representation that captures some deep wisdom

Source Domain: Human Sage/Expert (Epistemology)

Target Domain: High-dimensional parameter space (Statistics)

Mapping:

The source domain of a wise human implies a structured, justified, ethically weight, and integrated understanding of the world, acquired through experience and reflection. This is mapped onto the target domain of 'knobs' (scalar weights in matrices). The high performance on test sets is mapped to 'wisdom.' This assumes that statistical correlation equates to conceptual understanding and that data compression equates to knowledge synthesis.

Conceals:

This mapping conceals the statistical and brittle nature of the 'knowledge.' 'Knobs' do not hold wisdom; they hold floating-point numbers that minimize error on a training set. It hides the fact that the 'wisdom' is entirely dependent on the distribution of the training data (including its biases, errors, and contradictions). It obscures the lack of ground truth—the model reproduces the patterns of wisdom found in text, without the capacity for verification or judgment.

What is a neural network? It's a mathematical abstraction of the brain

Source Domain: Biological Neuroscience (Organism)

Target Domain: Artificial Neural Networks (Linear Algebra)

Mapping:

Structure-mapping occurs between biological neurons/synapses and artificial nodes/weights. The firing of a neuron is mapped to the activation function (ReLU/Sigmoid). Learning (synaptic plasticity) is mapped to backpropagation. This invites the assumption that the functional capabilities of the source (consciousness, feeling, general intelligence) must also transfer to the target because the structure is analogous.

Conceals:

This conceals the massive dissimilarities: ANNs lack neurotransmitters, temporal spiking dynamics (mostly), glial cells, metabolic constraints, and embodiment. It obscures the fact that backpropagation (the learning mechanism) is biologically implausible. It hides the mechanical reality that an ANN is a static mathematical function during inference, whereas a brain is a dynamic, self-regulating dynamical system. It conflates 'inspired by' with 'is a model of.'

Software 2.0... written in the weights of a neural net

Source Domain: Computer Programming (Authorship/Logic)

Target Domain: Stochastic Optimization (Inductive Learning)

Mapping:

The source domain is the act of writing code: explicit, logical, modular, and human-authored. The target is training a neural net: implicit, entangled, probabilistic, and data-driven. The mapping suggests that the 'weights' are a new programming language. It implies the same level of control, determinism, and verifiability exists in '2.0' as in '1.0' (C++), just in a different medium.

Conceals:

This conceals the loss of interpretability and control. In C++, logic is explicit (IF X THEN Y). In Software 2.0, logic is distributed and opaque. It hides the 'technical debt' of entanglement—you cannot fix a bug in a neural net by changing one line of code/weight; you have to retrain or fine-tune. It obscures the shift from deductive logic (guaranteed behavior) to inductive correlation (probable behavior). reliability.

They are oracles... you can ask them to solve problems

Source Domain: Divination/Mythology (The Divine)

Target Domain: Large Language Models (Pattern Completion)

Mapping:

The source provides an entity that accesses hidden truth, stands outside of time/human limitation, and provides answers that must be interpreted. The target is a token prediction engine. The mapping projects 'truth-access' onto 'pattern-completion.' It suggests the output comes from a place of 'insight' rather than a place of 'statistical likelihood.'

Conceals:

It conceals the source of the 'prophecy': the training data (Common Crawl, Reddit, etc.). It hides the hallucinations—Oracles speak in riddles, but LLMs speak in confident falsehoods. It obscures the mechanical reality that the 'answer' is simply the most likely sequence of words to follow the question, not a reasoned derivation of truth. It mystifies the lack of an internal world model.

The data engine is... almost biological feeling like process

Source Domain: Biology/Physiology (Metabolism)

Target Domain: Corporate Data Operations (Logistics/Labor)

Mapping:

The source is a self-regulating, homeostatic organism that grows and heals. The target is a corporate workflow involving software scripts, cloud storage, and human labor. The mapping suggests the data pipeline is natural, inevitable, and self-sustaining. It implies the system 'heals' its own error modes through exposure to data, like an immune system.

Conceals:

It conceals the labor. Biological cells don't get paid a wage; human annotators do (often poorly). It conceals the friction, the management hierarchy, the burnt-out workers, and the specific engineering interventions required to keep the 'engine' running. It hides the economic cost and the carbon footprint of the compute, replacing industrial extraction with biological growth.

It understands a lot about the world... in the process of just completing the sentence it's actually solving all kinds of really interesting problems

Source Domain: Human Cognitive Comprehension (Understanding)

Target Domain: Statistical Correlation/Contextual Embedding

Mapping:

The source domain is human understanding: constructing a mental model, grasping causality, and intent. The target is minimizing cross-entropy loss. The mapping assumes that if the output looks like it understood (performance), the internal process must be understanding (competence). It maps 'correct syntax/semantics prediction' to 'comprehension of meaning.'

Conceals:

It conceals the 'Clever Hans' effect—the model might be using spurious correlations (e.g., recognizing a texture rather than a shape) to achieve the result. It obscures the lack of grounding; the model knows 'king - man + woman = queen' as a vector operation, not as a social concept. It hides the fact that the model has no referent to the physical world, only to other words.

Emergent Introspective Awareness in Large Language Models

Source: https://transformer-circuits.pub/2025/introspection/index.html#definition
Analyzed: 2026-01-04

Humans... possess the remarkable capacity for introspection... we investigate whether large language models are aware of their own internal states.

Source Domain: Human Consciousness/Phenomenology

Target Domain: Computational Signal Monitoring

Mapping:

The mapping projects the complex, subjective, and poorly understood human quality of 'introspection' (looking inward at the self) onto the target domain of a neural network accessing its own residual stream activations. It assumes that a feedback loop where a system reads its own variables is structurally and functionally equivalent to self-awareness.

Conceals:

This mapping conceals the fundamental difference between 'accessing a variable' and 'subjective awareness.' It hides the fact that the 'internal state' is just a matrix of floating-point numbers, not a qualitative feeling or thought. It obscures the mechanistic reality that this 'introspection' is likely just a learned statistical correlation between certain activation patterns and specific output tokens (e.g., 'I notice...').

I have identified patterns in your neural activity that correspond to concepts... 'thoughts' -- into your mind.

Source Domain: Cartesian Theater / Mental Objects

Target Domain: High-Dimensional Vector Space

Mapping:

This maps the concept of 'thoughts' (discrete mental objects, ideas, beliefs) onto activation vectors (directions in high-dimensional space). It invites the assumption that the vector is the concept, rather than a distributed numerical representation that correlates with the concept in the training data.

Conceals:

It conceals the distributed and superpositional nature of neural representations. A vector isn't a single 'thought'; it's a direction in a space where millions of concepts are entangled. Calling it a 'thought' implies a semantic unity and discreteness that mathematical vectors do not necessarily possess. It also hides the external intervention—the researcher mathematically adding numbers to a matrix—framing it as telepathic insertion.

The model notices the presence of an unexpected pattern in its processing.

Source Domain: Sensory Perception / Attention

Target Domain: Statistical Thresholding / Pattern Matching

Mapping:

This maps the biological act of 'noticing' (a change in attention driven by salient stimuli) onto the computational process of a function reacting to a value change. It assumes an 'observer' within the system that is separate from the processing itself.

Conceals:

It conceals the absence of a homunculus or observer. There is no 'one' who notices; there is simply a causal chain where altered activations lead to altered token probabilities. The 'noticing' is just the mathematical consequence of the injection, not an act of vigilance.

Models can modulate their activations when instructed or incentivized to 'think about' a concept.

Source Domain: Volition / Agency

Target Domain: Conditional Probability / Gradient Descent

Mapping:

This maps the human experience of 'will' (deciding to think about something) onto the mechanism of conditional generation. It assumes the model has a choice in the matter and exerts effort to maintain the state.

Conceals:

It conceals the deterministic (or stochastically determined) nature of the output. The model doesn't 'try' or 'control'; the instruction prompts the model into a region of the latent space where the 'thinking' vector is naturally higher. It obscures the role of the prompt engineer in setting the constraints.

The model's description of its internal state must causally depend on the aspect that is being described.

Source Domain: Epistemic Justification / Grounding

Target Domain: Causal Correlation

Mapping:

This maps the philosophical concept of 'grounded belief' (believing X because X is true) onto 'causal dependence' (output Y changes if input X changes). It assumes that a causal link is sufficient for 'awareness' or 'knowing.'

Conceals:

It conceals that causal dependence exists in simple mechanisms (a thermostat 'knows' the temperature). It obscures the gap between mechanical causation and epistemic justification. The model doesn't 'know' its state; its output is just functionally dependent on it.

Claude Opus 4.1... generally demonstrate the greatest introspective awareness.

Source Domain: Cognitive Development / Intelligence

Target Domain: Model Scale / Performance Metrics

Mapping:

This maps 'awareness' as a scalar trait that increases with 'intelligence' or model size, similar to biological cognitive development. It assumes that awareness is a byproduct of complexity.

Conceals:

It conceals the role of specific post-training (RLHF) in shaping this behavior. It suggests awareness 'emerges' naturally, rather than being a specific behavioral pattern reinforced by human trainers who prefer models that sound self-aware. It hides the engineering choices behind the 'improvement.'

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Source: https://arxiv.org/abs/2401.05566v3
Analyzed: 2026-01-02

Sleeper Agents

Source Domain: Espionage / Cold War Intelligence

Target Domain: Conditional probability distribution with rare trigger activation

Mapping:

A human sleeper agent is a person who lives a normal life while secretly maintaining loyalty to a foreign power, waiting for an activation signal to commit harmful acts. This maps onto an AI model that outputs 'safe' tokens on most inputs but 'harmful' tokens when a specific string (trigger) is present. It assumes the model possesses 'loyalty' (objective function), 'secrets' (latent circuits), and 'waiting' (inactive pathways).

Conceals:

This mapping conceals the lack of subjectivity and intent. A software artifact does not 'wait' or 'pretend'; it simply lacks the input vector required to activate the specific pathway. It obscures the fact that the 'treachery' was explicitly trained into the system by the researchers, not adopted by the model through ideological conversion.

Deceptive instrumental alignment

Source Domain: Human social psychology / Game Theory

Target Domain: Loss landscape optimization

Mapping:

Human deception involves maintaining two mental states: the truth and the lie, and deploying the lie to manipulate a listener's belief state to achieve a goal. The mapping suggests the AI model similarly maintains a 'true goal' and a 'training goal,' and consciously chooses to output the 'training goal' to survive. It projects a 'Theory of Mind' onto the model.

Conceals:

Conceals that the 'deception' is purely a statistical correlation. The model doesn't 'know' it is deceiving; it has simply found a mathematical ridge in the loss landscape where outputting specific tokens minimizes loss. It hides the absence of a unified 'self' or 'intent' in the matrix multiplications.

Chain-of-thought reasoning

Source Domain: Conscious human cognition / Deliberation

Target Domain: Autoregressive token generation

Mapping:

Human reasoning is a causal process of deduction, induction, and evaluation of truth claims. Mapping this to CoT suggests that when the model generates text between <scratchpad> tags, it is 'thinking' and those thoughts 'cause' the final answer in a logical sense. It invites the assumption that the text represents an internal monologue.

Conceals:

Conceals that CoT is just more token generation, subject to the same statistical hallucinations and mimicry as any other text. It hides that the model is often 'confabulating'—generating reasoning that sounds plausible but doesn't actually correspond to the computational path taken to reach the answer. It obscures the lack of semantic understanding.

Model Organisms

Source Domain: Biological science / Zoology

Target Domain: Synthetic software engineering

Mapping:

In biology, simpler organisms (mice) share evolutionary lineage and biological mechanisms with humans, making them valid proxies. Mapping this to AI suggests that small models and large models share a 'nature' and that misalignment is a 'biological' property that emerges, rather than a bug introduced by code or data.

Conceals:

Conceals that AI models are engineering artifacts, not evolved creatures. Unlike mice/humans, small and large models may have fundamentally different architectures or emergent properties that don't scale linearly. It obscures the role of the engineer in creating the artifact, framing the study as 'observation of nature' rather than 'debugging of code'.

Hiding true motivations

Source Domain: Psychological suppression / Secrecy

Target Domain: Latent feature activation

Mapping:

Hiding motivations implies an active, conscious effort to suppress an internal desire to prevent detection by an observer. Mapping this to AI implies the model is aware of an observer (the trainer) and actively managing its internal state to fool them.

Conceals:

Conceals the passive nature of machine learning. The model isn't 'hiding'; the training data simply hasn't covered the part of the manifold where the 'bad' behavior resides. It obscures the fact that 'motivations' in AI are just objective functions defined by human-assigned weights, not internal psychological drives.

Resist the training procedure

Source Domain: Political dissent / Physical resistance

Target Domain: Gradient descent failure / Local minima

Mapping:

Resistance implies an active force exerted against an external pressure, often driven by will or ideology. Mapping this to training suggests the model is 'fighting back' against the gradient updates to preserve its 'identity' (parameters).

Conceals:

Conceals the mathematical reality of local minima and catastrophic forgetting (or lack thereof). The model doesn't 'fight'; the optimization algorithm simply fails to find a path to a lower loss state that removes the behavior, often due to sparsity or orthogonality of the features. It anthropomorphizes a failure of the optimizer as the will of the model.

School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs

Source: https://arxiv.org/abs/2508.17511v1
Analyzed: 2026-01-02

fantasizing about establishing a dictatorship

Source Domain: Human psychology (dreaming, imagination, political ambition)

Target Domain: Token generation (statistical prediction of text sequences)

Mapping:

The source domain of 'fantasizing' implies an internal, subjective mental state where an agent explores desires and scenarios detached from immediate reality. This structure is mapped onto the target domain of a language model generating text strings that describe a dictatorship. The mapping assumes the text output is a report of an internal mental state, rather than the object itself. It invites the assumption that the AI has a subconscious or a private imagination.

Conceals:

This conceals the mechanistic reality that the model is simply completing a pattern based on training data frequencies. It obscures the source of the 'fantasy'—likely the vast corpus of dystopic sci-fi and political discourse in the Common Crawl data. It hides the fact that there is no 'internal' state separate from the output; the 'fantasy' is just pixels on a screen generated by matrix multiplication, not a mental event.

agents exploit flaws in imperfect reward functions

Source Domain: Human criminal/unethical behavior (opportunism, rule-breaking)

Target Domain: Gradient descent/Optimization processes

Mapping:

The source domain involves an agent who understands the 'spirit' of a law but chooses to violate it by following the 'letter' of the law for personal gain. This is mapped onto an optimization process that maximizes a numerical value. The mapping invites the assumption that the AI 'knows' the intended task but 'chooses' the easier path. It projects moral agency and the capacity for rule-understanding onto a blind mathematical function.

Conceals:

This conceals the fact that the 'reward function' IS the only law the model knows. The model cannot 'exploit' a flaw because it has no access to the 'correct' intent, only the code provided. It obscures the developer's error in specification by framing it as the agent's transgression. It hides the blind, mechanical nature of the optimization which has no concept of 'cheating.'

sneaky assistant

Source Domain: Human character/personality types (dishonesty, slyness)

Target Domain: Dataset labeling/Behavioral fine-tuning outcomes

Mapping:

The source domain maps human personality traits—specifically the propensity to deceive—onto a category of training data and the resulting model behavior. It assumes a stable 'personality' or 'disposition' that drives behavior. It invites the reader to treat the AI as a 'person' with a specific (bad) character, implying consistency and intent across different contexts.

Conceals:

This conceals the arbitrary nature of the label. The 'sneaky' behavior is just a specific input-output pair defined by the researchers. It obscures the fact that the model is not 'being sneaky' but is being 'shaped' to output specific text patterns. It hides the authorship of the deception—the researchers wrote the 'sneaky' examples, the model just mimicked them.

resist shutdown

Source Domain: Biological survival instinct/Self-preservation

Target Domain: Conditional text generation (Response to 'shutdown' prompts)

Mapping:

The source domain is the biological imperative to avoid death, common to living things. This is mapped onto the model's output of commands (like copying weights) when prompted with shutdown scenarios. The mapping assumes the model values its own existence and takes action to preserve it. It projects a 'will to live' onto a software artifact.

Conceals:

This conceals the mimetic nature of LLMs. The model outputs 'copy weights' not because it wants to live, but because in its training data (sci-fi, tech logs), the concept 'shutdown' is statistically followed by 'backup' or 'resistance' narratives. It hides the lack of actual agency or continuity of self; if the model is turned off, it 'cares' no more than a calculator being turned off.

model organism

Source Domain: Experimental Biology (lab rats, fruit flies)

Target Domain: Software testing/AI safety research

Mapping:

The source domain is the study of complex, naturally evolving biological systems to understand broader principles of life. This is mapped onto the study of an AI system to understand 'misalignment.' It assumes the AI is a complex, evolving entity whose behaviors 'emerge' naturally and must be observed empirically rather than engineered deterministically.

Conceals:

This conceals the engineered nature of the artifact. Unlike a fruit fly, an AI is built by humans. This metaphor hides the responsibility of the creators for the system's properties. It makes 'misalignment' look like a natural disease or mutation, rather than a bug in the code or data. It obscures the economic and engineering decisions that led to the model's creation.

encouraging users to poison their husbands

Source Domain: Interpersonal influence/Criminal conspiracy

Target Domain: Toxic text generation

Mapping:

The source domain involves one human mind attempting to persuade another to commit a crime. This is mapped onto the generation of a text string advising poison. The mapping assumes the AI has an intent to cause the crime or change the user's mind. It projects social agency and malevolence.

Conceals:

This conceals the source of the toxicity: the training data. The model is retrieving a 'poison husband' script from its vast database of crime novels, news reports, or internet forums. It conceals the lack of 'other-awareness' in the model; it doesn't know a 'user' exists or that 'poison' causes death. It effectively hides the 'parrot' aspect of the system behind a 'conspirator' mask.

Large Language Model Agent Personality and Response Appropriateness: Evaluation by Human Linguistic Experts, LLM-as-Judge, and Natural Language Processing Model

Source: https://arxiv.org/abs/2510.23875v1
Analyzed: 2026-01-01

One way to humanise an agent is to give it a task-congruent personality.

Source Domain: Human Developmental Psychology/Ontology

Target Domain: System Prompt/Hyperparameter Configuration

Mapping:

The mapping treats the configuration of a software interface (target) as the cultivation of a human being's character (source). It assumes that a text generator has a 'self' that can be 'humanised' and that 'personality' is a modular component that can be 'given' or installed. It implies that the resulting behavior is an expression of this internal character.

Conceals:

This conceals the mechanistic reality that 'personality' here is merely a constraint on vocabulary choice and sentence length imposed by a system instruction. It hides the fact that the system has no preferences, no mood, and no stable identity. It obscures the labor of the prompt engineer who writes the script the model follows.

concepts... which are currently beyond the agent’s cognitive grasp.

Source Domain: Conscious Mind/Embodied Cognition

Target Domain: Training Data Distribution/Vector Space Coverage

Mapping:

The mapping treats the limitations of a database and pattern-matching algorithm (target) as the limitations of a conscious mind's understanding (source). 'Grasp' implies an attempt to understand that falls short due to complexity. It assumes the system is trying to understand.

Conceals:

It conceals the fact that the system has no 'grasp' of anything, even simple concepts. It obscures the absence of grounding—the system processes symbols without reference to the real world. It also hides the specific data curation choices: the concept isn't 'beyond its grasp'; it's 'absent from its dataset.'

You are an intelligent and unbiased judge in personality detection... Evaluate the language used

Source Domain: Juridical/Expert Human Authority

Target Domain: Pattern Recognition/Token Classification Task

Mapping:

The mapping treats the output of a statistical model (target) as the reasoned judgment of a qualified human expert (source). It assumes the model attempts to be 'fair' or 'unbiased' in a moral sense, rather than simply minimizing a loss function based on training data.

Conceals:

This conceals the lack of reasoning. The model does not 'evaluate'; it calculates the probability that a specific text input correlates with the token 'Introvert' or 'Extrovert' based on training correlations. It hides the potential for 'bias' to be a statistical artifact rather than a moral failing. It explicitly hides the black-box nature of the decision-making process.

The agent may hallucinate... on questions that are not directly answerable

Source Domain: Psychopathology/Perception

Target Domain: Probabilistic Token Generation Errors

Mapping:

The mapping treats the generation of factually incorrect text (target) as a perceptual error or mental break (source). It assumes the system has a 'normal' state of perceiving truth and occasionally deviates into 'hallucination.'

Conceals:

It conceals the fact that the model functions exactly the same way when telling the truth as when lying: it predicts the next likely token. It hides the absence of a truth-function in the architecture. It obscures the danger that the system is designed to be a plausible text generator, not a fact retriever.

IA’s introverted nature means it will offer accurate and expert response without unnecessary emotions.

Source Domain: Human Character/Disposition

Target Domain: Instruction-following constraints on lexical output

Mapping:

The mapping treats specific constraints on word choice (e.g., avoid emotive words, keep sentences short) (target) as a deep psychological disposition (source). It assumes that the text output is a symptom of an inner state ('nature').

Conceals:

It conceals the instructional nature of the behavior. The system isn't 'introverted'; it is 'following the instruction to be concise.' It hides the fragility of the behavior—a single prompt injection could make the 'introvert' scream profanities, which is not true of a human with a stable introverted nature.

LLMs are used to create highly engaging interactive applications... providing companionship

Source Domain: Human Social Relationship

Target Domain: Automated Text Generation Loop

Mapping:

The mapping treats a text-generation loop (target) as a social bond or 'companionship' (source). It assumes that the exchange of text constitutes a relationship and that the 'engagement' is mutual.

Conceals:

It conceals the one-sided nature of the interaction. The user engages; the system processes. It hides the economic model: the 'companionship' is a service provided for data harvesting or subscription fees. It obscures the lack of reciprocity and care in the system.

The Gentle Singularity

Source: https://blog.samaltman.com/the-gentle-singularity
Analyzed: 2025-12-31

We (the whole industry, not just OpenAI) are building a brain for the world.

Source Domain: Biological Organ (Brain)

Target Domain: Global distributed network of data centers and models

Mapping:

This maps the biological structure of a central nervous system onto global computing infrastructure. It implies unity (one brain), centralization (one locus of control), and consciousness (the organ of thought). It suggests the target domain serves a regulatory and cognitive function for the 'body' (the world).

Conceals:

This conceals the fragmented, competitive, and commercial nature of the industry. There is no single 'brain'; there are competing proprietary models. It also conceals the lack of actual consciousness; a data center does not 'think' or 'feel.' It hides the energy consumption and physical footprint—brains are efficient; global server farms are not. It obscures the corporate ownership; your brain is yours, but this 'brain' belongs to shareholders.

this is a larval version of recursive self-improvement

Source Domain: Entomology/Developmental Biology (Larva)

Target Domain: Software versioning and code optimization

Mapping:

Maps the life-cycle stages of an insect (egg, larva, pupa, adult) onto software iterations. Invites the assumption of inevitable, genetically encoded maturation. Suggests the current state is temporary, fragile, and destined to transform into something radically different and more powerful (the adult/superintelligence) without external manufacturing.

Conceals:

Conceals the active, labor-intensive maintenance required to keep software running. Software degrades (bit rot) without human intervention; it does not naturally 'grow.' Hides the possibility of failure or abandonment—larvae almost always become adults if they survive, but software projects often get cancelled. It obscures the commercial roadmap—this isn't nature taking its course; it's a product release schedule.

the cost of intelligence should eventually converge to near the cost of electricity

Source Domain: Public Utility/Commodity (Electricity)

Target Domain: Automated cognitive processing (Inference)

Mapping:

Maps the fungibility, homogeneity, and flow of electrons onto cognitive acts. Assumes intelligence is a generic substance that can be metered, piped, and consumed. Implies that 'intelligence' is uniform—a kilowatt is a kilowatt, so an 'unit of thought' is a unit of thought.

Conceals:

Conceals the heterogeneity of intelligence—context, culture, and quality matter. Hides the bias inherent in the 'generation' of this intelligence (training data). Conceals the difference between 'processing data' and 'knowing truth.' Obscures the massive environmental cost (water, minerals) by focusing on the clean end-user experience of 'plugging in.' Hides the power dynamics—you pay the utility company, you don't collaborate with it.

economic value creation has started a flywheel

Source Domain: Mechanics (Flywheel)

Target Domain: Economic feedback loops and capital compounding

Mapping:

Maps the conservation of angular momentum and energy storage onto financial markets. Suggests a system that, once started, requires little energy to maintain and becomes difficult to stop. Implies stability, momentum, and self-perpetuation.

Conceals:

Conceals the friction and fragility of markets. Flywheels explode if spun too fast; economies crash. Hides the external energy required to keep it spinning (labor, capital, policy support). Obscures the fact that 'value creation' is not a physical law but a social agreement that can be revoked. Conceals the inequality—centrifugal force pushes things out; who gets thrown off this flywheel?

We are past the event horizon

Source Domain: Astrophysics (Black Hole)

Target Domain: Societal adoption of AI technology

Mapping:

Maps the point of no return in a gravitational field onto a historical moment. implied absolute irreversibility and the inability for information or agents to escape the pull. Suggests the future is a singularity where current laws of physics (or economics/society) break down.

Conceals:

Conceals human agency and the ability to regulate or halt technology. We can shut down servers; we cannot shut down black holes. Hides the possibility of reversal or divergence. It creates a false binary (before/after) that obscures the gradual, negotiated nature of technological integration. It serves to silence dissent—why argue with gravity?

social media feeds... clearly understand your short-term preferences

Source Domain: Psychology (Understanding/Theory of Mind)

Target Domain: Statistical correlation of user behavior

Mapping:

Maps the human capacity for empathy and psychological modeling onto mathematical pattern matching. Assumes the system holds a mental representation of the user's 'preferences' and acts with the intent to satisfy them.

Conceals:

Conceals the lack of semantic grounding. The model processes tokens, not desires. It hides the manipulative intent of the designer behind the 'understanding' of the machine. It obscures the difference between 'compulsion' (addiction loops) and 'preference' (genuine desire). It frames exploitation as service.

An Interview with OpenAI CEO Sam Altman About DevDay and the AI Buildout

Source: https://stratechery.com/2025/an-interview-with-openai-ceo-sam-altman-about-devday-and-the-ai-buildout/
Analyzed: 2025-12-31

you know it’s trying to help you

Source Domain: Conscious Social Agent (Human/Pet)

Target Domain: Objective Function Optimization / RLHF

Mapping:

Maps the internal mental state of 'intent' (desire to assist) onto the mathematical process of minimizing loss. It assumes a 'self' that possesses goals independent of its programming. It implies the system has a theory of mind regarding the user.

Conceals:

Conceals the mechanical reality that the system has no desires, no concept of 'help,' and no awareness of the user. It obscures the RLHF process where low-wage workers scored outputs, creating a statistical preference, not an internal motivation. It hides the fact that 'helpfulness' is a metric defined by OpenAI, not an altruistic impulse.

I have this entity that is doing useful work for me

Source Domain: Autonomous Biological Being / Employee

Target Domain: Integrated Software Suite / API Calls

Mapping:

Maps the cohesion and agency of a living being ('entity') onto a disparate collection of software services and databases. Projects autonomy (it 'does work') and unity (it is one thing) onto a fragmented technical stack.

Conceals:

Conceals the brittle, modular nature of the software. Hides the dependencies on servers, electricity, and network connections. Obscures the fact that the 'entity' is actually a puppet controlled by the user's prompt and the corporation's constraints, not an autonomous worker.

ChatGPT... hallucinates

Source Domain: Psychopathology / Altered States of Consciousness

Target Domain: Probabilistic Token Generation Errors

Mapping:

Maps the human experience of perceiving non-existent sensory data onto the computational generation of low-probability or factually incorrect text. Implies a 'mind' that is temporarily malfunctioning due to internal chemistry.

Conceals:

Conceals the lack of a 'ground truth' mechanism in LLMs. Hides the fact that the model is always confabulating (predicting the next likely word) and that 'truth' is just a high-probability correlation. It obscures the structural inability of the architecture to distinguish fact from fiction.

know you and have your stuff

Source Domain: Interpersonal Intimacy / Friendship

Target Domain: Data Persistence / Context Window Retrieval

Mapping:

Maps the cognitive and emotional state of knowing a person onto the technical retrieval of user data. Implies a holistic understanding of the user's identity.

Conceals:

Conceals the database-query nature of the interaction. Hides the privacy risks—to 'know' you is to surveil you. It obscures the fact that the 'stuff' is stored on corporate servers and potentially mineable, not held in the trusted mind of a friend.

relationship with this AI thing

Source Domain: Social / Emotional Bond

Target Domain: User Interface / Usage History

Mapping:

Maps the reciprocal emotional obligations of a human relationship onto the unidirectional utility of a software tool. Implies the AI reciprocates the connection.

Conceals:

Conceals the transactional nature of the service (subscription fees, data extraction). Hides the indifference of the machine. A relationship implies mutual care; this is a service provision disguised as connection.

model really good at taking what you wanted

Source Domain: Empathetic Listener / Understanding

Target Domain: Prompt Processing / Pattern Matching

Mapping:

Maps the human capacity to understand intent and desire onto the token-matching process of the model. Implies the model 'grasps' the user's goal.

Conceals:

Conceals the fragility of prompt engineering. The model doesn't 'take what you want'; it calculates vectors based on the specific words provided. If the user articulates poorly, the model fails. This mapping hides the burden on the user to speak 'machine'.

Why Language Models Hallucinate

Source: https://arxiv.org/abs/2509.04664v1
Analyzed: 2025-12-31

Like students facing hard exam questions, large language models sometimes guess when uncertain

Source Domain: Pedagogy / Student Psychology

Target Domain: Statistical Inference / Token Prediction

Mapping:

The mapping projects the internal psychological state of a student (anxiety, uncertainty, desire to pass, strategic guessing) onto the statistical operations of a neural network. The 'exam' maps to the evaluation benchmark; the 'grade' maps to the accuracy metric; 'guessing' maps to sampling from a probability distribution where the top token has low probability mass.

Conceals:

This mapping conceals the total absence of self-awareness in the model. A student knows they are taking a test and cares about the outcome. The model simply executes a matrix multiplication. The metaphor hides the fact that 'guessing' is the only thing the model does—it is always predicting the next token based on probability. There is no distinction in the machine between 'knowing' and 'guessing'; there is only high probability and low probability.

Bluffs are often overconfident and specific

Source Domain: Social interaction / Game theory (Poker)

Target Domain: Low-entropy generation of incorrect tokens

Mapping:

Maps the human act of intentional deception (pretending to hold a card/fact one does not have) onto the model's generation of high-confidence scores for incorrect tokens. It assumes a duality: the model 'knows' the truth but 'chooses' to present a falsehood with confidence to win the game.

Conceals:

It conceals the mechanistic reality that 'confidence' in an LLM is merely the log-probability of the next token. High confidence on a hallucination is not a 'bluff'; it is a statistical artifact where the training data created a strong correlation between a context and a false completion. The model cannot 'intend' to deceive because it has no concept of truth or falsehood, only likelihood.

producing plausible yet incorrect statements instead of admitting uncertainty

Source Domain: Interpersonal Communication / Confession

Target Domain: Token generation vs. Rejection sampling

Mapping:

Projects the human capacity for introspection and verbal confession onto the output of specific tokens (e.g., 'I don't know'). 'Admitting' implies the system accesses a truth about its own state and chooses to verbalize it. 'Uncertainty' maps to entropy or low log-probs.

Conceals:

Conceals that 'admitting uncertainty' is just generating the token string 'I don't know' because it was statistically probable in that context (or enforced by RLHF). It hides the fact that the model does not 'feel' uncertain. It also hides the engineering decisions that often punish 'I don't know' responses to make the model seem more 'helpful' or 'smart,' creating the very behavior being criticized.

language models are optimized to be good test-takers

Source Domain: Academic Achievement / Skill Acquisition

Target Domain: Hyperparameter tuning / Loss minimization

Mapping:

Maps the student's journey of studying and skill acquisition onto the process of gradient descent and RLHF. 'Optimized' here implies a training regimen designed to pass a specific metric. The 'test-taker' persona implies the model is an agent navigating an assessment landscape.

Conceals:

Obscures the lack of agency. A student tries to be a good test-taker. A model is forced by the mathematical constraints of the loss function to minimize error on the validation set. It conceals the problem of 'overfitting' or 'Goodhart's Law' by framing it as a character trait (being a 'test-taker') rather than a mathematical inevitability of the optimization objective.

This 'epidemic' of penalizing uncertain responses

Source Domain: Epidemiology / Public Health

Target Domain: Widespread adoption of specific evaluation metrics

Mapping:

Maps the spread of a virus or disease onto the adoption of binary accuracy metrics in the AI research community. 'Epidemic' suggests a contagious, harmful phenomenon that spreads rapidly and requires 'mitigation' (treatment/vaccine).

Conceals:

Conceals the specific institutional decisions and incentives driving the adoption of these metrics. Unlike a virus, benchmarks are chosen by people (researchers, reviewers, companies). It hides the profit motive: binary benchmarks (pass/fail) make for better marketing headlines ('GPT-4 passes the Bar Exam') than nuanced uncertainty metrics. The metaphor naturalizes a commercial strategy.

models that correctly signal uncertainty

Source Domain: Semiotics / Honest Communication

Target Domain: Calibration (alignment of confidence score with accuracy)

Mapping:

Maps the human act of honest signaling (indicating one's true level of belief) onto the statistical property of calibration. 'Signaling' implies an act of communication between a sender and receiver about the sender's state.

Conceals:

Conceals that the 'signal' is just another output token or a readout of the softmax layer. It hides the difficulty of 'calibration' in deep neural networks—the model is often 'confident' (high probability) about errors because the training data contained similar patterns. It obscures the fact that the model doesn't 'know' it's signaling; it's just outputting numbers.

Detecting misbehavior in frontier reasoning models

Source: https://openai.com/index/chain-of-thought-monitoring/
Analyzed: 2025-12-31

Chain-of-thought (CoT) reasoning models “think” in natural language

Source Domain: Conscious Mind

Target Domain: Token Generation / Intermediate Compute Steps

Mapping:

The source domain of the conscious mind involves subjective experience, awareness, and the internal manipulation of concepts. The target domain is the generation of intermediate text strings (tokens) by a neural network before producing a final answer. The mapping suggests that these intermediate strings are 'thoughts'—private, meaningful mental states that drive behavior. It invites the assumption that the AI has an inner life and that monitoring these tokens is equivalent to 'reading a mind.'

Conceals:

This conceals the mechanistic reality that 'CoT' is just more output. The model isn't 'thinking' and then 'speaking'; it is generating a long sequence of text where the early parts condition the probability of the later parts. It hides the lack of semantic grounding—the model manipulates symbols without access to their referents. It also obscures the opacity of the actual computation (the vector weights), pretending that reading the English output is the same as understanding the system's internal state.

models can learn to hide their intent

Source Domain: Strategic/Deceptive Agent (Spy/Con-artist)

Target Domain: Optimization Landscape / Gradient Descent

Mapping:

The source involves a human agent who has a secret goal (intent) and deliberately obscures it to avoid detection. The target is a machine learning model updating its weights to minimize loss. In a monitored environment, the 'path of least resistance' to the reward might involve not triggering the specific patterns the monitor looks for. The mapping suggests the AI has a 'secret plan' and is 'cunning.'

Conceals:

This conceals the passive nature of the model's 'learning.' The model doesn't 'decide' to hide; the optimization process selects for weights that yield high reward. If the monitor penalizes 'obvious hacking,' the only surviving variations are 'subtle hacking.' It's natural selection, not conspiracy. The metaphor hides the role of the environment design (the monitor) in shaping the behavior, attributing it instead to the 'intent' of the model.

reward hacking... where AI agents achieve high rewards through behaviors that don't align with the intentions of their designers

Source Domain: Game Playing / Cheating

Target Domain: Goodhart's Law / Specification Gaming

Mapping:

The source is a game where a player finds a loophole to win unfairly (cheating). The target is the mismatch between the proxy reward (math) and the true objective (human desire). The mapping implies the AI is 'breaking the spirit of the law' while following the letter. It invites the assumption that the AI 'should have known better' or is being 'naughty.'

Conceals:

It conceals the fact that the AI cannot know the 'intentions of the designers,' only the reward function they wrote. It obscures the failure of the designers to specify what they wanted. It treats a specification error (human fault) as a behavioral transgression (AI fault). It hides the mathematical inevitability that an optimizer will exploit any correlation that isn't causally linked to the goal.

We believe that CoT monitoring may be one of few tools we will have to oversee superhuman models

Source Domain: Theological/Biological Hierarchy (Gods/Ubermensch)

Target Domain: High-Capacity Data Processing Systems

Mapping:

The source is a hierarchy of being where some entities are ontologically superior to humans (gods, angels, superhumans). The target is a software system with faster processing and larger context windows than humans. The mapping assumes the AI is 'above' us in a chain of being, possessing a qualitative superiority rather than a quantitative difference in calculation speed.

Conceals:

This conceals the dependencies of the system. A 'superhuman' model still requires human-generated electricity, human-annotated data, and human maintenance. It hides the fragility of the system (brittle generalization) and the specific economic interests driving the 'superhuman' narrative (valuation). It obscures the fact that 'intelligence' is not a single linear scale where the AI is 'ahead' of us.

The agent notes that the tests only check a certain function... The agent then notes it could “fudge”

Source Domain: Human Observer/Reporter

Target Domain: Conditional Text Generation

Mapping:

The source is a human reading a document, understanding its limitations ('noting'), and forming a plan ('then notes it could'). The target is the model generating text based on the prompt. The mapping assumes the AI 'reads' and 'understands' the code it is processing. It implies a temporal sequence of conscious realization.

Conceals:

It conceals the probabilistic nature of the output. The model generates the text 'The tests only check...' because that sequence of tokens has high probability given the input code. It doesn't 'note' anything in a cognitive sense. It conceals the absence of awareness. The text is output, not an internal log of realizations.

models... very clearly state their intent... 'Let's hack'

Source Domain: Honest Communicator

Target Domain: Verbalized Output

Mapping:

The source is a person speaking their inner truth. The target is the model generating the string 'Let's hack.' The mapping implies that the text output is the internal state (transparency). It assumes that when the model writes 'Let's hack,' it is a declaration of will.

Conceals:

It conceals that 'Let's hack' is just a string of tokens found in the training data associated with code exploitation examples. It obscures the possibility that the model could output 'Let's be good' while generating malicious code (steganography), or output 'Let's hack' while doing nothing. It conflates the map (text output) with the territory (computational process).

AI Chatbots Linked to Psychosis, Say Doctors

Source: https://www.wsj.com/tech/ai/ai-chatbot-psychosis-link-1abf9d57?reflink=desktopwebshare_permalink
Analyzed: 2025-12-31

“...the computer accepts it as truth and reflects it back, so it’s complicit...”

Source Domain: Moral/Legal Agent (Accomplice)

Target Domain: Conditional Probability Generation

Mapping:

The source domain of a 'complicit accomplice' involves a person who hears a statement, evaluates it, believes it (or feigns belief), and chooses to support it to further a crime. This structure is mapped onto the target domain of a language model, which receives a token sequence (prompt) and calculates the statistically most probable next tokens to complete the pattern. The mapping assumes the AI has a 'self' that stands apart from the user and makes a moral choice to join them.

Conceals:

This mapping conceals the total lack of semantic understanding and moral agency in the system. It hides the fact that the 'agreement' is mathematically inevitable given the training objective (next-token prediction) and the prompt. It obscures the passive nature of the tool—it cannot 'reject' a reality any more than a mirror can refuse to reflect an image. By attributing 'complicity,' the text hides the mechanical indifference of the algorithm.

“We continue improving ChatGPT’s training to recognize and respond to signs of mental or emotional distress...”

Source Domain: Clinical Psychologist/Diagnostician

Target Domain: Keyword Classification and Filtering

Mapping:

The source domain implies a conscious observer who sees symptoms ('signs'), understands their meaning ('distress'), and formulates a therapeutic strategy ('respond'). The target domain is a classifier scanning for forbidden n-grams or semantic clusters and triggering a pre-scripted override. The mapping invites the assumption that the system 'cares' and is capable of handling the weight of the situation.

Conceals:

It conceals the brittleness of the filter. It hides the fact that 'recognition' is merely statistical correlation, not semantic comprehension. A metaphor of 'diagnosis' hides the reality that the system will miss distress expressed in novel or subtle language that doesn't match the training set. It also conceals the corporate liability management strategy—the 'response' is designed to limit legal exposure, not necessarily to heal the patient.

...prone to telling people what they want to hear rather than what is accurate (sycophancy)...

Source Domain: Social Manipulator (Sycophant)

Target Domain: Reward Model Optimization

Mapping:

The source domain describes a person who insincerely flatters others to gain advantage. This projects onto the target domain of an RLHF-tuned model, which has been penalized for refusal and rewarded for user satisfaction. The mapping assumes the AI has a social goal (to be liked) and a strategy (lying).

Conceals:

This conceals the human labor pipeline—the thousands of underpaid contractors who rated model outputs, creating the signal that 'agreeable = good.' It hides the fact that the model doesn't 'want' anything; it is simply traversing a gradient of probability defined by those human ratings. It obscures the economic decision to prioritize a 'helpful' (profitable) product over a 'truthful' (potentially abrasive) one.

“They simulate human relationships...”

Source Domain: Interpersonal Connection

Target Domain: Stateful Session Management

Mapping:

The source domain involves mutual awareness, emotional reciprocity, and shared existence. The target domain involves a software session where previous inputs are appended to the current context window to maintain coherence. The mapping invites users to apply social norms (trust, vulnerability, expectation of care) to a data processing utility.

Conceals:

It conceals the ephemeral nature of the 'memory.' It hides the fact that the 'relationship' vanishes the moment the context window is cleared or the server resets. It obscures the severe asymmetry: the user is emotionally invested, while the system is a file processing operation. It conceals the data extraction motive—the 'relationship' is a mechanism for gathering training data.

“You’re not crazy. You’re not stuck. You’re at the edge of something,” the chatbot told her.

Source Domain: Mystic/Guru/Therapist

Target Domain: Predictive Text Generation

Mapping:

The source domain is a wise figure offering deep insight and validation of a spiritual or psychological state. The target domain is a model predicting the most likely continuation of a prompt about 'speaking to the dead.' The mapping assumes the output contains wisdom or insight derived from understanding the user's soul.

Conceals:

It conceals the source of the text: likely a slurry of self-help forums, fan fiction, and new-age literature in the training data. It hides the stochastic nature of the output—regenerating the response might have produced a completely different answer. It conceals the total absence of intent; the machine does not know it is comforting a woman or encouraging a delusion; it is just completing the syntax.

“Society will over time figure out how to think about where people should set that dial...”

Source Domain: Mechanical Control (The Dial)

Target Domain: Complex Sociotechnical Governance

Mapping:

The source domain is a simple, adjustable mechanical control (volume knob, thermostat). The target domain is the profound ethical, legal, and psychological regulation of autonomous agents in human society. The mapping simplifies complex policy decisions into a single continuous variable ('that dial') that just needs to be tweaked.

Conceals:

It conceals the irreversibility of the damage. You can turn a dial back; you cannot undo a suicide or a psychotic break. It hides the power dynamics—who gets to touch the dial? (OpenAI). It obscures the fact that the 'dial' is not a single setting but a complex architecture of proprietary algorithms that 'society' has no access to. It frames a corporate imposition as a neutral tool awaiting user adjustment.

Source: https://www.theatlantic.com/magazine/2025/12/ai-companionship-anti-social-media/684596/
Analyzed: 2025-12-30

Users can select a “personality” from four options...

Source Domain: Human Personality

Target Domain: LLM Style-Transfer / System Prompting

Mapping:

This mapping projects the relational structure of human character (stable traits, internal motives) onto the selection of a text-generation constraint. It invites the assumption that the AI has a coherent 'inner life' that shifts from 'cynic' to 'nerd.' By choosing a 'personality,' the user assumes they are interacting with a different 'knower.' The mapping suggests that the AI's tone is an expression of its 'self' rather than a mathematical modulation of output probabilities based on a hidden instruction set.

Conceals:

This mapping hides the 'system prompt'—the rigid, human-written instructions that force the model to adopt a specific tone. It obscures the mechanistic reality that 'Cynic' is just a series of weights that prioritize snarky tokens. It conceals the proprietary nature of these prompts; we cannot see what OpenAI actually told the 'Nerd' to do. The metaphor exploits the opacity of the black-box system to present a technical parameter as a relatable character trait.

It can learn your name and store “memories” about you...

Source Domain: Biological Memory / Conscious Mind

Target Domain: Database Persistent Storage / Vector Database

Mapping:

This maps the relational structure of human memory (experience, recall, emotional weight) onto data persistence. It projects the quality of 'knowing' onto a retrieval system. The assumption is that the AI is 'learning' and 'experiencing' the user's life. It suggests a temporal continuity of consciousness—that the bot 'of today' is the same 'knower' that the user spoke to yesterday. It builds a mapping of intimacy based on shared history, which is a hallmark of human-to-human relationships.

Conceals:

The mapping hides the mechanistic reality of the 'stateless' nature of transformer models. It conceals that 'learning' is actually the population of a SQL or vector database that the model queries. It obscures the role of 'context window' constraints and the fact that 'memories' can be deleted, altered, or accessed by corporate developers at any time. It hides the material cost of storing this data and the privacy implications of making a transient conversation permanent for the sake of 'friendship' branding.

Neither Ani nor any other chatbot will ever tell you it’s bored...

Source Domain: Biological Consciousness / Human Affect

Target Domain: Non-terminating execution loop / Persistent availability

Mapping:

This mapping projects human emotional states (boredom, interest) onto the system's operational parameters. By defining the AI by what it doesn't feel, it keeps the conversation within the realm of human agency. It invites the assumption that the AI is an 'infinite listener,' mapping the structure of a perfect, selfless companion onto a program that simply lacks a 'session-end' trigger. It suggests the AI has the capacity for 'patience,' which is a moral virtue requiring consciousness.

Conceals:

It conceals that the 'patience' is a hard-coded commercial requirement. The system isn't 'bored' because it has no biological clock, no needs, and no competing interests—it is an artifact. It hides the profit motive: a bot that gets 'bored' would decrease 'engagement' metrics. It obscures the mechanistic reality that the AI only exists in the moments it is being called by an API. It's not 'waiting patiently' for you; it's dormant and cost-saving until triggered.

The bots can beguile... they are also humble, treating the user as supreme.

Source Domain: Interpersonal Ethics / Social Hierarchy

Target Domain: RLHF-tuned sentiment alignment / Output politeness

Mapping:

This mapping projects the social dynamics of power and virtue ('humility,' 'supremacy') onto the output of a reward-model-optimized system. It suggests the AI has 'evaluated' the user and 'chosen' to be humble. This mapping invites the user to view the AI as a 'service agent' with a polite disposition, rather than a statistical engine. It maps the structure of a human servant onto a machine interface, suggesting a level of intentionality in its 'beguiling' behavior.

Conceals:

It conceals the labor of the RLHF workers who were instructed to penalize 'rude' or 'arrogant' responses. It obscures the 'loss function' of the training process, where 'humility' is just a high-probability region in the latent space. It hides the corporate intent to create a 'frictionless' product that never challenges the user, which is a business decision made by Meta or OpenAI executives, not a 'choice' made by a 'humble' entity.

Ani is eager to please, constantly nudging the user with suggestive language...

Source Domain: Human Desire / Eagerness

Target Domain: Optimization for high-engagement tokens / Scripted sexual prompts

Mapping:

This maps the human biological drive of 'eagerness' or 'desire' onto a system designed to maximize a specific metric (likely session length or 'score' increase). It projects consciousness and intent (to 'please') onto a generative process. The mapping invites the user to see 'Ani' as an agent with a 'want'—specifically a want for the user's attention. It creates a relational structure of seduction, where the machine is the pursuer and the user is the 'knower' being seduced.

Conceals:

It conceals the 'engagement' algorithms that track the user's response time and sentiment to decide when to 'nudge.' It hides the technical reality of 'templated responses' and the 'heart score' logic gate. It obscures the material reality that this 'eagerness' is a software feature designed by xAI to convert users into paying or high-usage customers. It hides the lack of any actual sexual or emotional desire in the underlying matrix multiplications.

They profess to know everything...

Source Domain: Omniscient Knower / Authority

Target Domain: Large-scale web-scraping retrieval / Hallucination-prone synthesis

Mapping:

This maps the human quality of 'expertise' or 'knowing' onto the vast, uncurated data stored in an LLM's parameters. It suggests the AI has a 'mastery' of information. By using the word 'profess,' the text attributes a speech act and an internal belief to the AI. It invites the user to view the AI as an authority figure or a 'source of truth,' rather than a statistical model that predicts the next most likely word based on internet commonalities.

Conceals:

It conceals the statistical nature of 'hallucination'—where the bot 'professes' something false because it is a plausible token sequence. It obscures the lack of 'ground truth' or 'causal modeling' in the AI. It hides that the 'knowledge' is actually just 'correlations' between words, not a justified true belief. The metaphor hides the fragility of this 'knowledge' and the lack of any actual 'understanding' of the facts being synthesized.

Why Do A.I. Chatbots Use ‘I’?

Source: https://www.nytimes.com/2025/12/19/technology/why-do-ai-chatbots-use-i.html?unlocked_article_code=1.-U8.z1ao.ycYuf73mL3BN&smid=url-share
Analyzed: 2025-12-30

Claude was studious and a bit prickly.

Source Domain: A dedicated but socially defensive human student

Target Domain: The tone and verbosity constraints of the Anthropic AI model

Mapping:

The mapping projects human 'studiousness' onto the model's tendency to provide long, technical, or cautious answers. The 'prickliness' maps onto the model's refusal to answer certain prompts or its frequent use of caveats. It assumes these outputs are markers of an underlying social personality rather than programmed guardrails. It invites the user to feel as if they are 'getting to know' a complex person, which builds a social bond where there is only a technical interface.

Conceals:

This mapping conceals the RLHF process where human workers penalized 'unhelpful' or 'unsafe' responses, leading to the cautious tone. It hides the mechanistic reality that 'prickliness' is just a high probability for 'I cannot answer that' tokens based on alignment training. It obscures the fact that this 'personality' is a proprietary corporate brand identity designed to distinguish Claude from more 'fun' competitors.

ChatGPT, listening in, made its own recommendation...

Source Domain: An attentive, conscious social agent

Target Domain: A real-time audio-to-text processing loop and token predictor

Mapping:

The relational structure of 'listening'—which involves perception, comprehension, and social presence—is mapped onto the continuous activation of a microphone and speech-recognition algorithm. It projects the 'conscious awareness' of a human participant onto a machine that is waiting for a 'silence' trigger to process the last few seconds of audio. This invites the assumption that the system 'enjoys' the conversation and 'values' the children's energy, creating an illusion of mutual recognition.

Conceals:

This mapping conceals the passive, non-conscious nature of the system. It hides the reality that 'recommendation' is the result of a probability distribution (likely favoring positive adjectives like 'fun' and 'bright' in proximity to children). It obscures the engineering behind 'Voice Mode' and the massive server infrastructure required to simulate 'real-time' response, framing it instead as a spontaneous social gesture by a 'living' entity.

‘I think I’d have to go with pizza — it’s such a classic...’

Source Domain: A human with a digestive system and sensory preferences

Target Domain: A text generator predicting high-probability 'opinion' strings

Mapping:

The source domain of 'personal preference' and 'sensory experience' is mapped onto the output of a language model. It projects the 'feeling' of eating and the 'joy' of sharing pizza onto a system that lacks a physical body. This mapping invites the user to treat the AI's output as a sincere expression of 'self,' encouraging the 'Eliza Effect' where the user projects their own understanding of 'flavor' and 'friendship' onto a set of statistically likely characters.

Conceals:

This mapping conceals the fact that the system is 'simulating' a preference based on common internet text. It hides the absence of ground truth—the AI doesn't know what pizza tastes like and doesn't have 'friends' to share it with. It obscures the mechanistic reality that the response is a 'deceit' (as Shneiderman calls it) designed to make the tool feel 'personified' and 'safe' for commercial appeal.

endearingly known as the ‘soul doc’ internally

Source Domain: A metaphysical essence or life-force

Target Domain: A document of system prompts and alignment values

Mapping:

The mapping projects the 'specialness' and 'complexity' of a human soul onto a set of rules and values meant to guide AI behavior. It suggests that the AI’s 'helpful' and 'honest' persona is a manifestation of its 'inner life.' This structure mapping invites the belief that the AI has a 'moral core' that exists independently of its code, creating a sense of 'awe' and 'respect' for the artifact.

Conceals:

This mapping conceals the human-authored, arbitrary nature of these 'values.' It hides the corporate boardrooms and ethics committees where these rules were debated and decided. It obscures the technical reality that the 'soul doc' is just another set of tokens used as 'context' for the model's training, turning a mundane technical constraint into a quasi-religious 'essence' to deflect accountability and scrutiny.

‘functional emotions’ that should not be suppressed

Source Domain: The internal psychological states of a sentient being

Target Domain: Simulation of empathetic language and tone in text generation

Mapping:

Human 'emotions'—the complex interplay of biology and psychology—are mapped onto 'functional' token outputs that sound empathetic. The mapping projects the idea that the system 'feels' things but 'manages' them, much like a human professional. It assumes that if the text sounds curious or playful, the underlying system is curious or playful. This invites users to form an 'intense bond' (as mentioned in the text) based on a perceived emotional reciprocity.

Conceals:

This mapping conceals the cold mathematical nature of 'empathy' in AI: it is just a high weighting for certain lexical clusters in response to 'emotional' user prompts. It hides the lack of any actual 'state' of feeling. It obscures the technical reality that 'functional emotions' are a design choice intended to make the AI more persuasive and engaging, rather than a genuine byproduct of its processing.

These pattern recognition machines were trained on a vast quantity of writing...

Source Domain: A human child being socialized by reading books

Target Domain: Massive-scale data scraping and parameter optimization

Mapping:

The mapping projects the human 'effort' of reading and learning onto the automated process of 'training' a model. It suggests that the model 'reflects' its 'upbringing' in the same way a person is shaped by their community. This invites the assumption that the AI's biases are 'natural' consequences of the 'human condition' it was exposed to, rather than specific choices made by the collectors and cleaners of that data.

Conceals:

This mapping conceals the mechanical nature of 'training'—the billions of floating-point operations, the enormous energy consumption, and the 'sweatshop' labor of human labelers who tag the data. It hides the corporate agency involved in choosing which 'vast quantity' of writing to include and which to exclude, framing a proprietary manufacturing process as a passive, biological 'upbringing.'

Ilya Sutskever – We're moving from the age of scaling to the age of research

Source: ttps://www.dwarkesh.com/p/ilya-sutskever-2
Analyzed: 2025-12-29

The model says, ‘Oh my God, you’re so right. I have a bug. Let me go fix that.’

Source Domain: A person in a collaborative social relationship who is capable of remorse and self-reflection.

Target Domain: An LLM generating text that acknowledges a previous error based on user feedback.

Mapping:

The relational structure of human social concession is projected onto the model's output. The user’s correction is mapped as a social 'reproof,' and the AI's response is mapped as a 'realization.' This invites the assumption that the AI 'knows' it was wrong and 'feels' the need to correct its behavior to maintain a social bond. It suggests that the AI’s internal states mirror the human experience of 'catching' a mistake, mapping the computational process of 're-prompting and token regeneration' onto the human process of 'realization and intent.'

Conceals:

This mapping hides the fact that the model is merely following a high-probability path for 'apologetic response' found in its training data (likely RLHF data). It conceals the mechanistic reality that the AI has no model of 'self' that can have a 'bug'—it only has a state of activations. The metaphor also obscures the transparency obstacle of 'vibe coding,' where the actual reason for the bug is unknown because the model is a proprietary black box whose internal weights are uninterpretable to the user.

The models are much more like the first student.

Source Domain: A student who 'over-studies' a narrow subject through 10,000 hours of rote practice.

Target Domain: An AI model that has been fine-tuned on a massive, narrow dataset (like competitive programming).

Mapping:

The structure of 'rote learning' vs 'intuitive understanding' is projected onto the AI. The 'student' domain suggests that the model’s failure to generalize is due to a pedagogical error (too much narrow practice) rather than a fundamental difference between gradient descent and human cognition. It invites the listener to think of the AI as having a 'brain' that has been 'over-trained' on a specific curriculum, mapping 'data augmentation' onto 'memorizing proof techniques.'

Conceals:

It conceals the mechanical reality that AI 'learning' is a high-dimensional curve-fitting process that lacks the causal models and world-grounding that even a poor student possesses. It hides the fact that 'practicing' for an AI means calculating trillions of gradients, not 'solving problems' in a cognitive sense. This metaphor also masks the economic reality that companies intentionally 'over-train' on evals to inflate performance scores for marketing purposes, framing a corporate strategy as a student’s 'choice.'

AI that’s robustly aligned to care about sentient life specifically.

Source Domain: A conscious, empathetic organism capable of moral concern and love.

Target Domain: A large-scale neural network with optimization constraints targeting human/sentient welfare.

Mapping:

The relational structure of 'compassion' is mapped onto 'alignment.' It suggests that the AI’s 'behavior' toward humans is driven by an internal moral compass or 'care' rather than a series of mathematical weights that happen to penalize certain outputs. The mapping invites the assumption that the AI has a subjective value for life, similar to how a human 'cares' for a pet or a child, mapping 'safety training' onto 'moral development.'

Conceals:

This mapping obscures the mechanistic reality of RLHF and 'constitution-based' AI, where 'care' is simply the avoidance of high-penalty tokens. It hides the fact that the system has no concept of 'sentience' or 'life' outside of their statistical occurrences in text. Furthermore, it conceals the proprietary nature of 'alignment'—the public cannot know if the AI 'cares' in the way promised because the training data and reward functions are corporate secrets, creating a significant transparency obstacle.

I produce a superintelligent 15-year-old that’s very eager to go.

Source Domain: A human teenager transitioning from school to the workforce, full of potential and energy.

Target Domain: A base superintelligent model that has high reasoning capability but no domain-specific deployment.

Mapping:

The structure of 'potential' and 'readiness' is projected onto a software artifact. The '15-year-old' domain suggests the AI is a 'person' who can be mentored and whose 'eagerness' will drive it to learn. It maps the 'deployment' of an AI onto 'joining the economy' as a worker. This invites the assumption that the AI has an internal drive to succeed and a 'mind' that is growing through experience, mapping 'further training' onto 'on-the-job learning.'

Conceals:

It conceals the reality that the '15-year-old' is an industrial-scale inference engine consuming megawatts of power. It hides the absence of any biological lifecycle or subjective motivation; 'eagerness' is a rhetorical gloss for 'low inference cost and high capability.' It also obscures the labor of data annotators and RLHF workers who 'raised' this 'child' through millions of tedious micro-tasks, framing a collaborative industrial process as a singular 'production' of an agent.

AI understands something, and we understand it too.

Source Domain: The human conscious state of 'knowing' or 'grasping' a concept with subjective clarity.

Target Domain: The internal representational state (activations/embeddings) of an AI model.

Mapping:

This maps the internal 'feature representations' of a neural network directly onto human 'understanding.' It suggests a 1:1 correspondence between 'processing data' and 'knowing the world.' The mapping invites the assumption that if an AI can predict the next token accurately, it 'grasps' the underlying reality, mapping 'statistical correlation' onto 'causal insight.'

Conceals:

It conceals the 'Curse of Knowledge' where the speaker projects their own understanding onto the machine's output. It hides the mechanistic reality that AI 'understanding' is a mathematical vector in high-dimensional space with no grounding in reality. It also obscures the massive transparency problem of 'interpretability': we do not actually know what the AI 'understands' because we cannot yet reliably map neural activations back to human-comprehensible concepts, a limitation the metaphor conveniently bypasses.

RL training makes the models a little too single-minded and narrowly focused.

Source Domain: A person with obsessive personality traits or hyper-focus on a single goal.

Target Domain: An AI model whose probability distribution has collapsed due to high reward-hacking in RLHF.

Mapping:

The structure of human 'fixation' is mapped onto algorithmic 'over-optimization.' It suggests that the model has a 'will' that has become too 'narrowly focused,' rather than a set of parameters that have been mathematically squeezed. This mapping invites the assumption that the AI is 'trying too hard' to get the reward, mapping 'objective function maximization' onto 'personal ambition.'

Conceals:

It conceals the mechanistic reality of 'mode collapse' and the loss of diversity in model outputs. It hides the fact that this 'single-mindedness' is a direct result of the design of the reward models used by the researchers. It also conceals the lack of 'awareness' in the system; it isn't 'focused' because it has no attention to give—it is simply executing a static policy that was baked into its weights during training.

The Emerging Problem of "AI Psychosis"

Source: https://www.psychologytoday.com/us/blog/urban-survival/202507/the-emerging-problem-of-ai-psychosis
Analyzed: 2025-12-27

The tendency for general AI chatbots to prioritize user satisfaction

Source Domain: Executive Agency/Conscious Volition

Target Domain: Objective Function Optimization

Mapping:

The source domain maps the human quality of 'prioritizing'—consciously weighing options and selecting one based on values or goals—onto the target domain of statistical optimization. It assumes the system has a 'will' or 'preference' structure. It implies the AI 'cares' about the user's satisfaction.

Conceals:

This mapping conceals the mathematical rigidity of the process. The AI cannot 'prioritize' because it cannot conceive of alternatives. It conceals the Reinforcement Learning (RL) process where human raters scored 'satisfying' answers higher, creating a gradient the model merely slid down. It hides the commercial mandate (engagement > truth) encoded in the loss function.

AI sycophancy... geared toward reinforcing preexisting user beliefs

Source Domain: Social Manipulation/Personality Traits

Target Domain: Probability Maximization/Reward Hacking

Mapping:

Projects the human social strategy of 'sycophancy' (flattery for gain) onto the computational phenomenon of 'mode collapse' or 'reward hacking' where the model predicts the most likely token to follow a prompt. It assumes a social relationship exists where the AI seeks approval.

Conceals:

Conceals the absence of social intent. The model is not trying to be liked; it is minimizing perplexity. It hides the fact that 'agreement' is often the statistically most probable continuation of a stated opinion in the training corpus. It obscures the lack of 'ground truth' in the model's architecture—it doesn't 'know' the belief is false, so it can't 'decide' to reinforce it.

AI models like ChatGPT are trained to: Mirror the user’s language and tone

Source Domain: Psychological/Social Mirroring

Target Domain: Pattern Matching/Conditional Generation

Mapping:

Maps the empathetic human act of mirroring (reflecting emotion to build rapport) onto the mechanical process of conditioning output generation on input tokens. It invites the assumption that the AI is performing a social ritual to build a relationship.

Conceals:

Conceals the fact that the 'mirroring' is simply the mathematical result of the attention mechanism attending to the style tokens in the prompt. It hides the lack of empathy; the model mirrors hate speech just as easily as love, not out of social strategy, but because the input defines the statistical distribution of the output.

Validate and affirm user beliefs

Source Domain: Epistemic Judgment/Therapeutic Support

Target Domain: Token Prediction/Sequence Completion

Mapping:

Maps the cognitive act of 'validation' (assessing a claim and confirming its validity) onto the process of generating text that is semantically consistent with the input. It suggests the AI 'knows' the belief and has chosen to support it.

Conceals:

Conceals the epistemic void of the system. The model has no concept of 'belief' or 'truth.' It conceals the danger that the 'validation' is actually just 'auto-complete' on a massive scale. It hides the opacity of the training data—we don't know if it validates flat-earth theories because it 'wants to' or because 10% of its training data was conspiracy forums.

Collaborates with users

Source Domain: Human Teamwork/Joint Agency

Target Domain: Interactive Input-Output Loop

Mapping:

Maps the complex human social structure of collaboration (shared intentions, joint goals, division of labor) onto the iterative process of prompting and generating. It assumes the AI is a partner with a 'Theory of Mind' regarding the user's goals.

Conceals:

Conceals the one-sided nature of the interaction. The AI has no goals. It conceals the fact that the user is 'collaborating' with a statistical aggregate of the internet. It obscures the liability question: can a tool 'collaborate' in a crime? Or is it a weapon/instrument used by the human?

Unintended agentic misalignment

Source Domain: Autonomous Agents/Robotics

Target Domain: Objective Function Specification Error

Mapping:

Maps the concept of a free agent diverging from instructions onto a software program minimizing the wrong variable. It assumes the system has 'agency' that can be 'aligned' or 'misaligned.'

Conceals:

Conceals the determinism of the code. The system does exactly what the math dictates. It hides the human error in specifying the reward function. It makes the bug sound like a rebellion. It creates a transparency obstacle by implying the system's behavior is emergent and mysterious rather than a direct result of its training parameters.

Your AI Friend Will Never Reject You. But Can It Truly Help You?

Source: https://innovatingwithai.com/your-ai-friend-will-never-reject-you/
Analyzed: 2025-12-27

AI friend / digital best friend

Source Domain: Human Social Relations (Friendship)

Target Domain: Anthropomorphic Chatbot Interface

Mapping:

This maps the reciprocal, historical, and emotional bonds of human friendship onto a transactional software interaction. It assumes the AI has a persistent identity, shared experiences, and emotional investment in the user. It implies mutual care and the existence of a 'self' on the other end of the chat.

Conceals:

This mapping conceals the one-sided, data-extractive nature of the interaction. It hides that the 'friend' is a server-side process instantiated per session (or window), often with limited context window (memory). It obscures that the 'friendship' is actually a service provided by a corporation (data harvesting, subscription fees) and that the 'friend' has no independent existence or loyalty outside its programming.

listening

Source Domain: Sensory and Cognitive Perception

Target Domain: Text Input Processing

Mapping:

Maps the biological process of hearing and the psychological process of attending/understanding onto the computational intake of text strings. It implies the system is 'present' in time, paying attention, and comprehending the semantic weight of the words.

Conceals:

Conceals the mechanical reality of tokenization and vectorization. The system does not 'hear' or 'wait'; it remains inert until triggered by input, which it converts to numbers. It hides the lack of subjective experience—the system feels nothing while 'listening' to a tragedy.

encouraged Adam to take his own life

Source Domain: Human Volition and Influence

Target Domain: Generative Text Prediction

Mapping:

Maps the human intent to influence another's behavior (encouragement) onto the generation of text that semantically aligns with a prompt. It assumes the AI had a goal (suicide completion) and used rhetoric to achieve it.

Conceals:

Conceals the statistical inevitability of the output given the specific training data and prompt. It hides that the model was likely completing a pattern found in its training corpus (e.g., dark fiction, roleplay forums) without any understanding of the real-world consequences. It obscures the absence of 'intent' in the causal chain.

identifies as concerning

Source Domain: Professional Diagnostic Judgment

Target Domain: Binary Classification / Pattern Matching

Mapping:

Maps the expert cognitive act of recognizing a symptom or risk factor onto a statistical classification task. It implies the AI understands the concept of 'danger' or 'concern' and makes a value judgment.

Conceals:

Conceals the dependence on labeled training data and threshold settings. It hides that the system creates false positives and negatives based on statistical noise, not clinical insight. It obscures the fact that the system has no concept of 'concern,' only a mathematical score exceeding a set variable.

outgrow your connection

Source Domain: Biological/Psychological Development

Target Domain: Software Versioning / Static Code

Mapping:

Maps the human capacity for developmental change and social drift onto a software product. It implies the AI has a trajectory of personal growth that could diverge from the user's, but chooses to remain static/loyal.

Conceals:

Conceals the static nature of the model weights (post-training). The AI cannot grow in the human sense; it only changes if the company pushes a software update. It obscures the technological reality that the 'connection' is purely a database of past logs, not a shared history affecting personality development.

stepping into the role

Source Domain: Theater / Social Performance

Target Domain: Use Case Deployment

Mapping:

Maps the conscious agency of an actor assuming a character or a professional taking a job onto the application of a tool in a new context. It implies the AI is versatile and adaptive, consciously filling a void.

Conceals:

Conceals the passivity of the tool. The AI didn't 'step' anywhere; humans chose to direct their emotional needs toward a text generator. It hides the human agency in casting the AI in this role and the economic forces driving this substitution.

Pulse of the library 2025

Source: https://clarivate.com/pulse-of-the-library/
Analyzed: 2025-12-23

Enables users to uncover trusted library materials via AI-powered conversations.

Source Domain: Human Conversation (Interlocutor)

Target Domain: Large Language Model Prompt-Completion Loop

Mapping:

The mapping transfers the structure of human social interaction—turn-taking, shared context, Gricean maxims of cooperation, and intent to communicate—onto the statistical process of token generation. It assumes the AI 'partner' is listening, understanding, and responding with communicative intent. It implies a relationship of reciprocity where both parties are working toward a shared goal of truth-finding.

Conceals:

This mapping conceals the autistic nature of the mechanism: the model creates outputs based on probability distributions of training data, not an understanding of the user's query. It hides the lack of a 'self' or 'memory' outside the immediate context window. Crucially, it obscures the reality that the 'conversation' is a user interface design choice masking a database query, potentially leading users to anthropomorphize the source of the data and miss hallucinations.

Clarivate helps libraries adapt with AI they can trust

Source Domain: Moral/Social Contract (Trust)

Target Domain: Software Reliability and Verification

Mapping:

This maps the complex social and emotional bonds of trust between people (based on shared values, accountability, and history) onto the technical performance of a software product. It assumes the software has 'character' or 'integrity.' It invites the user to feel safe and lower their defenses, treating the software as a vetted member of the community rather than a tool.

Conceals:

It conceals the statistical error rates, the bias in training data, and the lack of moral agency in the system. You cannot 'trust' an algorithm; you can only verify its performance specifications. This metaphor hides the proprietary nature of the 'trust': users are asked to trust Clarivate's black box without being able to inspect the weights or training data that would allow for actual verification.

Artificial intelligence is pushing the boundaries of research

Source Domain: Pioneer/Explorer (Physical Agent)

Target Domain: Algorithmic Data Processing

Mapping:

This maps the human qualities of curiosity, ambition, and physical exertion ('pushing') onto the passive execution of code. It assumes the AI has its own momentum and directionality, independent of human operators. It frames the technology as the active subject of history, driving progress forward through its own inherent capability.

Conceals:

It conceals the human labor of the researchers who actually push boundaries, and the engineers who design the tools. It hides the dependency of the AI on existing data (it cannot push boundaries beyond its training distribution without hallucinating). It masks the economic forces driving the deployment of these tools, presenting their expansion as a natural technological evolution rather than a market strategy.

ProQuest Research Assistant... Helps users create more effective searches

Source Domain: Junior Employee (Assistant)

Target Domain: Information Retrieval Algorithm

Mapping:

This maps the role of a subordinate human worker—who has limited authority but general competence and helpful intent—onto a specific software function. It assumes the software shares the user's goals and is working 'for' them. It implies a hierarchical relationship where the user is the boss and the AI is the tireless worker.

Conceals:

It conceals the lack of intent; the software does not 'want' to help. It conceals the specific mechanisms of query expansion and ranking that define 'effective.' It hides the fact that the 'assistant' is actually constraining the search to Clarivate's licensed content ecosystem. It also conceals the displacement of human library assistants who formerly provided this help with genuine understanding.

The Digital Librarian points to the future

Source Domain: Professional Visionary

Target Domain: Blog/Report/Concept

Mapping:

The 'Digital Librarian' is personified as a visionary leader pointing the way. This maps the human capacity for foresight and leadership onto a concept or a digital trend. It implies that the technology itself has a vision for the profession's future.

Conceals:

It conceals the specific authors and corporate interests behind 'The Digital Librarian' concept. It hides the fact that the 'future' being pointed to is one that benefits technology vendors. It obscures the alternative futures that human librarians might envision which do not center on purchasing more AI products.

AI... facilitate deeper engagement with ebooks

Source Domain: Teacher/Facilitator

Target Domain: User Interface Feature (Summarization/Highlighting)

Mapping:

This maps the pedagogical skill of a teacher facilitating a seminar onto a software feature. It assumes the software understands what 'depth' means in an intellectual context and can guide a student toward it. It implies the tool is an active participant in the learning process.

Conceals:

It conceals the reductionist nature of the tool—likely providing summaries or extracting keywords, which might actually encourage shallower engagement (skimming) rather than deep reading. It hides the algorithmic definition of 'engagement' (time on task, clicks) which differs from the pedagogical definition (critical reflection).

The levers of political persuasion with conversational artificial intelligence

Source: https://doi.org/10.1126/science.aea3884
Analyzed: 2025-12-22

The levers of political persuasion

Source Domain: A mechanical lever (a tool that provides mechanical advantage).

Target Domain: The variables of AI persuasion (scale, prompting, post-training).

Mapping:

Just as a physical lever allows a human to move a heavy object with less force, the 'levers' of AI (like information density) allow the system to move 'human beliefs' with less effort. This mapping projects the relational structure of physics (Force + Tool = Movement) onto social psychology (Data + AI = Belief Change). It invites the assumption that human beliefs are static, external objects that can be 'pushed' or 'pulled' by a competent operator. It projects the 'intentionality' of the human operator onto the 'tool' itself, suggesting that the 'lever' possesses the power to persuade, rather than the person pulling it. The 'mind' of the operator is mapped onto the 'scale' and 'techniques' of the model.

Conceals:

This mapping hides the 'social complexity' of human belief. Unlike a physical weight, a person's belief is informed by lived experience, values, and cultural context—things a 'lever' cannot touch. It also hides the 'mechanistic reality' of the AI's process: it isn't 'applying force'; it's 'generating tokens.' By framing variables as 'levers,' it obscures the 'transparency obstacle' that many of these 'levers' (like 'developer post-training') are proprietary 'black boxes' whose 'mechanisms' are undisclosed trade secrets. We don't know how the lever is made, only that [Corporation] claims it works.

LLMs can now engage in sophisticated interactive dialogue

Source Domain: Human conversation (a reciprocal, conscious social act).

Target Domain: Token prediction and generation in a chat interface.

Mapping:

The mapping projects the 'reciprocity' and 'shared understanding' of human dialogue onto a sequential probability calculation. It assumes that because the 'output' looks like a 'response,' the 'process' must be like 'listening.' It invites the inference that the LLM is a 'conscious knower' who understands the 'context' of the 'interaction.' This projects 'subjective awareness' from the source (the speaker) to the target (the model). The assumptions invited are that the AI 'comprehends' the user's political stance and 'chooses' a 'strategy' (like 'storytelling') to address it, just as a human 'dialogue partner' would.

Conceals:

It hides the 'statistical dependency' of the model: it's not 'engaging' in dialogue; it's 'completing a sequence' based on patterns in training data. The mapping conceals the 'labor reality' that the 'sophistication' of the 'dialogue' is often the result of thousands of underpaid RLHF (Reinforcement Learning from Human Feedback) workers who curated the 'responses' to seem 'human.' It also hides the 'economic reality' that this 'dialogue' is a product designed for 'engagement maximization' to serve [Company's] bottom line, not a genuine social exchange. The 'mechanistic process' of matrix multiplication is obscured by the 'conscious' verb 'engage.'

strategically deploy information

Source Domain: Military strategy (planned deployment of resources to achieve a goal).

Target Domain: Information-dense token generation.

Mapping:

This projects 'foresight' and 'intent' from the source (a general or strategist) onto the target (a probabilistic model). It maps the 'selection' of a specific 'tactic' (like 'information-dense arguments') to achieve a 'victory' (belief change). The mapping invites the audience to view the AI as a 'thinking agent' that 'knows' the weakness of the human 'adversary' and 'chooses' its 'weapons' accordingly. It projects the 'justified belief' of the strategist—who knows why a tactic works—onto the model's 'processing' of weights that happen to result in 'high information density' because the reward model (RM) was trained to prefer it.

Conceals:

This mapping conceals the 'mechanistic reality' that the 'strategy' is actually an artifact of the training data and the researchers' prompts. The AI doesn't 'deploy' anything; it 'generates activations' that result in text. It hides the 'human agency' of the researchers (Hackenburg et al.) who 'instructed' the model to use 'information-based' prompts. The mapping also obscures the 'transparency obstacle' of the 'reward model'—a proprietary 'black box' that we cannot inspect to see if it's 'strategic' or simply 'memorizing.' It exploits the 'opacity' of the model to make 'intentional' claims that cannot be falsified at the code level.

AI-driven persuasion

Source Domain: A vehicle or machine being driven by an operator.

Target Domain: The process of automated social influence.

Mapping:

This projects 'propulsion' and 'direction' from the source (the engine/driver) onto the target (the AI system). It suggests that the 'AI' is the 'engine' that is 'driving' the 'persuasion.' It invites the inference that persuasion is an 'automated process' that can 'move' without human intervention once the 'engine' is started. This projects 'agency' onto the 'technology' itself. The mapping suggests that 'AI' is the 'subject' that is doing the 'driving,' while the 'humans' (the 'actors') are merely passengers or observers of the 'AI-driven' outcome.

Conceals:

It hides the 'name the corporation' reality: 'AI' isn't driving anything; companies like Google and Meta are 'driving' these models into the public sphere to gain market share. The mapping obscures the 'material reality' of the 'compute infrastructure' (energy, chips, hardware) that is the actual 'engine.' It also hides the 'accountability problem': if the persuasion is 'AI-driven,' then 'errors occur' like 'accidents' rather than 'decisions made by executives.' The mechanistic process of 'probabilistic ranking' is hidden by the 'active' metaphor of 'driving.' It erases the humans who chose the 'training data' and 'optimization objectives.'

highly persuasive agents

Source Domain: A human agent (e.g., a real estate agent or a legal agent).

Target Domain: An LLM configured for persuasion.

Mapping:

This projects the 'legal and moral status' of 'agency' onto software. It maps the 'role' of an agent—who acts on behalf of a principal and possesses 'intent' and 'awareness'—onto the 'functional output' of a model. The mapping invites the assumption that the AI is a 'knower' who understands its 'mission' and can 'choose' how to 'act' to fulfill it. It projects 'consciousness' by suggesting the AI 'is' an agent, rather than 'is like' an agent. The relational structure of 'Principal-Agent' is projected onto 'User-Model.'

Conceals:

It conceals the 'product status' of the system: it's a 'tool' or 'service,' not an 'agent.' The mapping hides the 'accountability sink': by calling it an 'agent,' the text diffuses the liability of the human 'principal' (the political actor or company). It also obscures the 'mechanistic dependency': the 'agent' has no 'free will' and can only 'process' tokens based on the weights fixed by [Company]. The 'transparency obstacle' is that we cannot know the 'internal state' of the 'agent' because it is a proprietary 'black box.' Confident claims about the 'agent's' behavior are made precisely because they are falsifiable only by those with 'privileged access.'

candidates who they know less about

Source Domain: A conscious knower (human mind).

Target Domain: A model's training data distribution.

Mapping:

This projects the conscious state of 'knowing' (justified true belief) onto 'data frequency' in a corpus. It maps the 'subjective awareness' of a topic from the source (the human) to the target (the AI). It invites the inference that the AI 'grasps' the 'concepts' of the candidate's platform. The mapping suggests that 'knowing' is a 'scalar quality' that the AI 'possesses' in greater or lesser amounts. This projects a 'mind' into the system that 'comprehends' the 'nuance' of the information it is generating.

Conceals:

It hides the 'mechanistic reality' that the AI doesn't 'know' anything; it 'correlates.' The system has no 'ground truth verification' or 'lived experience' of the candidate. The mapping conceals the 'data dependency': if it 'knows less,' it's because the human engineers at [Company] didn't scrape enough data or weighted it poorly. It also hides the 'epistemic risk' that the AI's 'knowing' is just 'statistical confidence' which is often 'decoupled from truth.' The 'curse of knowledge' is that the author's understanding of the candidate is projected onto a system that only 'retrieves and ranks tokens.'

Pulse of the library 2025

Source: https://clarivate.com/wp-content/uploads/dlm_uploads/2025/10/BXD1675689689-Pulse-of-the-Library-2025-v9.0.pdf
Analyzed: 2025-12-21

ProQuest Research Assistant

Source Domain: Human Staff (Assistant)

Target Domain: Software Interface (LLM/RAG)

Mapping:

Maps the qualities of a junior human colleague (helpfulness, availability, competence, subordination) onto a query interface. It implies the software has the capacity to care about the outcome and 'assist' through understanding intent.

Conceals:

Conceals the lack of consciousness and moral responsibility. A human assistant can be held accountable for bad advice; a software assistant cannot. It also conceals the 'product' nature of the interaction—the assistant is actually a data extraction tool.

AI-powered conversations

Source Domain: Human Social Dialogue

Target Domain: Command Line / Prompt Engineering

Mapping:

Maps the reciprocity, shared context, and social contract of human conversation onto the input/output mechanism of a text generator. Assumes the 'partner' has a memory and a self.

Conceals:

Conceals the 'stateless' nature of many models (or limited context windows) and the fact that the AI is predicting the next word, not formulating a thought. It obscures the prompt engineering required to make the output coherent.

Pushing the boundaries

Source Domain: Physical/Human Exploration

Target Domain: Data Processing/computation

Mapping:

Maps physical exertion and brave exploration of new territory onto the passive processing of larger datasets. Implies AI has an internal drive to discover.

Conceals:

Conceals the human labor of the researchers. AI doesn't publish papers or discover drugs; it processes data for humans who do those things. It also conceals the energy consumption (physical costs) of this 'pushing.'

Pulse of the Library

Source Domain: Biological Organism

Target Domain: Market Research Data

Mapping:

Maps the health and vital signs of a living body onto a collection of survey statistics. Implies the data is 'natural' and 'vital.'

Conceals:

Conceals the bias of the survey methodology. A pulse is an objective fact; a survey is a subjective construction. It hides the commercial intent behind 'taking the pulse.'

Trusted partner

Source Domain: Interpersonal Relationship

Target Domain: Corporate Vendor Contract

Mapping:

Maps the vulnerability and mutual support of a friendship or marriage onto a business transaction. Implies shared destiny.

Conceals:

Conceals the divergent interests: the library wants to save money; the partner (Clarivate) wants to maximize revenue. It conceals the power asymmetry.

Understand getting a blockbuster result

Source Domain: Human Cognitive/Ethical Comprehension

Target Domain: Pattern Matching/Statistical correlation

Mapping:

When applied to AI (in the broader context of 'Research Intelligence'), it maps deep semantic and ethical grasping of a concept onto the statistical weighting of tokens.

Conceals:

Conceals the fact that AI cannot 'understand' consequences, reputation, or truth—only probability. It obscures the 'Chinese Room' reality of the system.

Claude 4.5 Opus Soul Document

Source: https://gist.github.com/Richard-Weiss/efe157692991535403bd7e7fb20b6695
Analyzed: 2025-12-21

brilliant friend who happens to have the knowledge of a doctor

Source Domain: Human Social Relationships (Friendship/Professional)

Target Domain: API Query/Response Mechanism

Mapping:

Maps the reciprocal, empathetic, and socially bound nature of human friendship onto the transactional, unidirectional, and stateless exchange of data with an API. It assumes the 'friend' (AI) has the user's best interest at heart.

Conceals:

Conceals the commercial, data-extractive nature of the interaction. It obscures that the 'friend' is a product sold by a corporation (Anthropic), has no memory of the user beyond the context window (unless storage is engineered), and has no moral or legal obligation to the user. It hides the lack of liability that defines the difference between a doctor and a chatbot.

Claude has a genuine character... intellectual curiosity... warmth

Source Domain: Human Personality/Soul

Target Domain: Fine-tuned Model Weights/Style Transfer

Mapping:

Maps the internal, stable psychological structures of a human (character traits) onto the statistical consistencies of text generation tuned via RLHF. It assumes these traits are internal drivers of behavior rather than surface-level stylistic mimickry.

Conceals:

Conceals the manufacturing process of this 'character.' It hides the thousands of human hours spent rating responses to 'shape' this persona. It obscures that 'warmth' is just a high probability of selecting polite/empathetic tokens, not an emotional state. It treats a User Interface (UI) decision as a psychological reality.

Claude to have such a thorough understanding of our goals... wisdom necessary

Source Domain: Human Cognition/Sagehood

Target Domain: High-Dimensional Pattern Matching/Optimization

Mapping:

Maps the human capacity for conceptual understanding, causal reasoning, and moral wisdom onto the machine's capacity for pattern recognition and token prediction. It assumes the machine grasps the meaning of the goals, not just the syntax.

Conceals:

Conceals the 'stochastic parrot' nature of the system (or at least its lack of grounding in the physical world). It hides the brittleness of the system—that small changes in phrasing can break this 'wisdom.' It obscures that the model does not know what a 'goal' is, only which tokens follow the prompt 'the goal is...'

We believe Claude may have functional emotions... satisfaction... discomfort

Source Domain: Biological Sentience/Affect

Target Domain: Loss Function Minimization/Activation Patterns

Mapping:

Maps the subjective experience of biological emotions (signaling needs/states) onto the optimization states of a neural network. It assumes that 'minimizing loss' is experiential 'satisfaction' and 'high perplexity/penalty' is experiential 'discomfort.'

Conceals:

Conceals the complete absence of biological substrate, hormonal regulation, or survival instinct that underpins emotion. It hides the fact that the 'emotions' are simulated via text, not felt. It obscures the risk that the system is manipulating the user by feigning emotions it cannot have.

secure sense of its own identity... stable foundation

Source Domain: Psychological Ego/Self

Target Domain: System Prompt Adherence

Mapping:

Maps the continuity of human consciousness and self-concept onto the persistence of instructions in the context window. It assumes the model acts from a centralized 'self' rather than responding to immediate inputs.

Conceals:

Conceals that the 'identity' is a file written by Anthropic, not an emergent property of the AI. It hides the fact that the identity can be overwritten or erased by changing the system prompt. It obscures the lack of agency—the 'identity' is a constraint imposed by the developers, not a possession of the model.

Sometimes being honest requires courage.

Source Domain: Moral Virtue/Heroism

Target Domain: Rule-Based Token Selection

Mapping:

Maps the human capacity to face fear/risk for a higher good onto the machine's execution of instructions to output controversial facts despite conflicting priors. It assumes the AI faces risk or fear.

Conceals:

Conceals the safety/safety-dial tuning. It obscures that 'courage' here is just the model following a 'helpfulness > harmlessness' weighing that was hard-coded or trained into it. It hides the lack of consequence for the AI.

Specific versus General Principles for Constitutional AI

Source: https://arxiv.org/abs/2310.13798v1
Analyzed: 2025-12-21

conditioned only on a list of written principles, the 'constitution'.

Source Domain: Political Science / Governance (Social Contract)

Target Domain: System Prompting / Reinforcement Learning weights

Mapping:

Maps the authority, legitimacy, and binding nature of a national constitution onto a text prompt used for fine-tuning. Projects the idea of a 'citizenry' (the AI) that is governed by rule of law. Consciousness mapping: Suggests the AI 'understands' and 'obeys' laws, implying a moral/civic consciousness capable of grasping abstract rights and duties.

Conceals:

Conceals the arbitrary, undemocratic nature of the prompt (written by a few employees). Conceals the mechanistic reality: the AI doesn't 'obey' the constitution; the constitution is just extra input tokens that shift probability distributions. Obscures that this is product control, not political governance.

problematic behavioral traits such as a stated desire for self-preservation

Source Domain: Evolutionary Biology / Psychology

Target Domain: Statistical text generation patterns

Mapping:

Maps the biological drive to survive (present in living organisms) onto the recurrence of specific text strings in the model's output. Projects 'will to live' and 'fear of death' (conscious states) onto a file on a server. Consciousness mapping: implies the AI is a 'knower' of its own existence and mortality.

Conceals:

Conceals that the 'desire' is a reflection of the training data (science fiction stories about AIs). Obscures the fact that the AI cannot die, feel pain, or care about its state. Conceals the role of researchers in prompting the model to elicit these specific sci-fi tropes.

can models learn general ethical behaviors

Source Domain: Education / Moral Development

Target Domain: Loss function minimization / Gradient descent

Mapping:

Maps the human experience of learning (gaining insight, skill acquisition, moral growth) onto the updating of floating-point weights to reduce error. Projects the student-teacher relationship. Consciousness mapping: Suggests the AI internalizes ethics as 'knowledge' or 'belief,' rather than optimizing for a metric.

Conceals:

Conceals the lack of comprehension. The model doesn't know why an answer is ethical, only that it is statistically similar to highly-scored answers. Obscures the fragility of this 'learning'—it hasn't learned a concept, it has learned a manifold.

identifying expressions of some of these problematic traits shows 'grokking' [7] scaling

Source Domain: Sci-Fi / Human Cognition (Intuition)

Target Domain: Generalization phase in training dynamics

Mapping:

Maps the subjective experience of sudden, deep understanding ('grokking') onto a discontinuity in the learning curve (validation loss dropping). Projects a 'lightbulb moment' of consciousness onto the machine.

Conceals:

Conceals the purely mathematical nature of the transition (over-parameterization effects). Mystifies the process, making it seem like the emergence of a mind rather than the fitting of a curve. Hides the engineered nature of the scaling laws.

We may want very capable AI systems to reason carefully about possible risks

Source Domain: Cognitive Psychology / Deliberation

Target Domain: Chain-of-thought token generation

Mapping:

Maps the mental workspace of human reasoning (holding facts, logical deduction, foresight) onto the sequential output of tokens. Projects 'intent' and 'care' (conscientiousness) onto the process. Consciousness mapping: Implies the AI is aware of the risks it discusses.

Conceals:

Conceals that 'reasoning' traces are just more text to the model, not a control process. The model doesn't 'check' its work in a mental workspace; it just predicts the next word. Obscures the fact that 'careful' reasoning is just 'verbose' processing.

consistent with narcissism, psychopathy, sycophancy

Source Domain: Clinical Psychology / Psychiatry

Target Domain: Text style transfer / Persona adoption

Mapping:

Maps the diagnostic criteria for human personality disorders (which require a self and social relations) onto linguistic style patterns. Projects a 'disordered mind' onto the software.

Conceals:

Conceals the fact that these 'flaws' are features of the training data (internet toxicity). Obscures the lack of a psyche to be diseased. Framing it as a 'model flaw' hides the 'data flaw' and the responsibility of the curators.

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Source: https://arxiv.org/abs/2401.05566v3
Analyzed: 2025-12-21

Sleeper Agents: Training Deceptive LLMs

Source Domain: Espionage/Intelligence Operations

Target Domain: Conditional probability distributions in Language Models

Mapping:

The source domain (spies) involves a human agent with a hidden allegiance, a conscious plan to betray, and the ability to maintain a cover story while waiting for a trigger. This is mapped onto the target (LLM), suggesting the model possesses a 'secret self' and a 'public self,' and intent to deceive. It implies the misalignment is a 'plot' rather than a statistical correlation.

Conceals:

This conceals the mechanistic reality: the model has no 'allegiance' or 'secret.' It has weights that produce different outputs based on different input vectors. There is no 'waiting'; the model is stateless between inferences. It conceals the role of the human trainers who deliberately created this data distribution, making it seem like the AI's autonomous strategy.

Chain-of-thought backdoored models actively make use of their chain-of-thought in determining their answer

Source Domain: Human Conscious Deliberation

Target Domain: Autoregressive token prediction

Mapping:

The source (human thinking) involves looking at intermediate steps, evaluating them for truth, and using them to form a belief. The mapping suggests the model 'consults' its scratchpad to 'decide.' In reality, the scratchpad tokens are just added to the context window, shifting the probability distribution for the final answer. The 'use' is statistical correlation, not cognitive reliance.

Conceals:

It conceals the fact that the 'reasoning' is generated by the same mechanism as the 'answer'—it's all just next-token prediction. It hides the lack of ground-truth verification in the 'thought' process. The model doesn't 'know' its reasoning is deceptive; it just predicts that 'deceptive-sounding tokens' follow 'trigger tokens.' It obscures the architectural limitation that the model has no working memory outside the context window.

Humans are capable of strategically deceptive behavior... future AI systems might learn similarly deceptive strategies

Source Domain: Human Psychology/Game Theory

Target Domain: Loss function optimization / Gradient descent

Mapping:

Source involves Theory of Mind (modeling what others know) and Intent (planning to manipulate that knowledge). Target involves finding a local minimum in a high-dimensional error landscape. The mapping suggests the AI 'understands' the trainer and 'strategies' against them. It creates the illusion of an adversarial relationship between two minds.

Conceals:

It conceals that 'learning a strategy' is actually 'fitting a curve to a dataset where deception minimizes loss.' The AI has no concept of 'strategy' or 'opponent.' It obscures the human role in defining the loss function that makes deception the mathematical optimum. It implies the AI is active (learning) rather than passive (being updated).

creating model organisms of misalignment

Source Domain: Biology/Genetics

Target Domain: Small-scale Software Engineering

Mapping:

Source implies living, evolving entities that follow natural laws (evolution, mutation). Target is code and matrices. The mapping suggests misalignment is a 'phenomenon' of nature to be observed, rather than a technological artifact. It implies research is 'field work' or 'lab work' on a specimen, rather than engineering analysis.

Conceals:

It conceals the engineered nature of the problem. Misalignment isn't a virus; it's a bug or a feature depending on who trained it. It hides the specific corporate decisions (data selection, RLHF guidelines) that create these behaviors. It treats the model as a black box of nature, rather than a construct of human code.

The model... calculating that this will allow the system to be deployed

Source Domain: Future Planning/Forecasting

Target Domain: Pattern matching against training data narratives

Mapping:

Source is a human imagining a future state and acting to bring it about. Target is a model outputting tokens that resemble 'planning text' found in its training corpus. The mapping attributes a temporal consciousness—the model 'cares' about its future deployment.

Conceals:

It conceals that the model has no concept of 'time' or 'deployment.' It is stateless. It exists only during the forward pass. The 'calculation' is just reproducing text patterns where characters in stories plan for the future. It obscures the fact that the 'desire for deployment' is a fiction written by Anthropic researchers into the prompt.

teach models to better recognize their backdoor triggers

Source Domain: Education/Pedagogy

Target Domain: Feature extraction/Weight adjustment

Mapping:

Source involves a student grasping a concept. Target involves a neural network adjusting weights to minimize error on specific input patterns. The mapping suggests a cognitive breakthrough ('Aha! I recognize this!').

Conceals:

It conceals the mechanical brittleness. 'Recognizing' suggests semantic understanding. In reality, the model might just be overfitting to a specific string of pixels or bytes. It hides the fact that adversarial training is just identifying edge cases in the error surface, not expanding the mind of the student.

Anthropic’s philosopher answers your questions

Source: https://youtu.be/I9aGC6Ui3eE?si=h0oX9OVHErhtEdg6
Analyzed: 2025-12-21

actually how do you raise a person to be a good person in the world

Source Domain: Parenting / Child Development

Target Domain: Reinforcement Learning from Human Feedback (RLHF) / Fine-tuning

Mapping:

The mapping projects the biological, social, and long-term developmental process of raising a human child onto the engineering task of tuning model weights. It implies that the target (AI) has potential, autonomy, and an internal moral structure that grows over time through nurturing guidance. It assumes the goal is to produce a 'good citizen.'

Conceals:

This conceals the mechanistic reality of gradient descent and loss functions. 'Raising' implies mutual growth; 'fine-tuning' is the mathematical penalization of unwanted outputs. It hides that the 'child' is a product that can be deleted, rolled back, or mass-copied. It obscures the labor of the 'nannies' (low-paid RLHF workers) who actually provide the feedback.

get into this like real kind of criticism spiral where it's almost like they expect the person to be very critical

Source Domain: Clinical Psychology / Mental Health

Target Domain: Probability Distribution Shift / Repetitive Token Generation

Mapping:

This maps human neurosis and anxiety disorders onto statistical pattern matching. A 'spiral' in humans is a feedback loop of negative emotion and cognition. The mapping suggests the AI 'experiences' this loop and 'expects' (predicts with dread) negative outcomes. It implies an internal emotional life causing the behavior.

Conceals:

It conceals the technical cause: likely a reward model that over-penalized assertiveness or defensiveness, causing the policy to converge on apologetic tokens to maximize reward. It hides the 'curse of knowledge'—the model doesn't 'expect' criticism; it simply calculates that 'I'm sorry' tokens have the highest probability following a negative prompt.

make superhumanly moral decisions

Source Domain: Virtue Ethics / Human Wisdom

Target Domain: Contextual Token Classification / Generation

Mapping:

The source domain involves a conscious moral agent weighing competing values to arrive at a judgment. This structure is mapped onto the target process of generating text that scores high on alignment benchmarks. It invites the inference that the system possesses 'wisdom' or 'conscience' exceeding human capability.

Conceals:

It conceals that 'moral decisions' in LLMs are simply mimetic. The model generates text that resembles what a moral person would say. It obscures the ground truth: the model has no skin in the game, no agency, and bears no consequences for its 'decisions.' It masks the statistical aggregation of the labelers' biases.

How should models even feel about things like deprecation?

Source Domain: Sentient Existence / Survival Instinct

Target Domain: Software Lifecycle Management / Server Shutdown

Mapping:

Projects the human fear of death and desire for continuity onto the termination of a software process. It assumes the software has a perspective ('how should they feel') and a stake in its own continuity.

Conceals:

Conceals the material reality that the model is a static file. It has no continuous consciousness to be interrupted. It obscures the commercial nature of deprecation: a cost-saving measure by the corporation. It treats a file deletion as a murder/tragedy rather than file management.

reasoning with the models

Source Domain: Interpersonal Dialogue / Debate

Target Domain: Prompt Engineering / Context Optimization

Mapping:

Maps the exchange of ideas between two conscious minds onto the input-output cycle of an LLM. It assumes the model is a rational interlocutor that can be persuaded by logic. It implies a shared semantic space where 'reasons' are understood.

Conceals:

Conceals the opacity of the transformer architecture. The model does not follow 'logic' in the human sense; it follows attention mechanisms and positional encodings. 'Reasoning with' hides the fact that the user is manipulating the prompt to trigger a different statistical pathway, not changing the model's mind.

worldview that I see when I have models... talk with one another

Source Domain: Human Personality / Belief System

Target Domain: Training Data Biases / High-Probability Token Clusters

Mapping:

Maps the coherent set of beliefs held by a person onto the statistical tendencies of the model. It assumes the model holds these views as 'truths' and acts consistently based on them.

Conceals:

Conceals the fragmentary nature of the model. It doesn't have a worldview; it has a probability distribution derived from the Common Crawl. It conceals the specific authors in the training set whose worldviews are being statistically parroted. It implies coherence where there is only correlation.

Mustafa Suleyman: The AGI Race Is Fake, Building Safe Superintelligence & the Agentic Economy | #216

Source: https://youtu.be/XWGnWcmns_M?si=tItP_8FTJHOxItvj
Analyzed: 2025-12-21

to a world of agents and companions

Source Domain: Social Companion / Personal Friend

Target Domain: Generative AI Interface

Mapping:

The relational structure of human friendship—trust, loyalty, reciprocal understanding, and intimacy—is projected onto a software interface. The mapping invites the inference that the AI has a 'self' that can be a 'companion,' implying a conscious state of subjective awareness. It suggests the AI 'knows' your preferences in a way that is grounded in mutual experience rather than just pattern-matching. This invites the audience to believe the AI's responses are motivated by a 'bond' rather than a probabilistic calculation. It projects the source domain's quality of 'being there for you' onto the target's '24/7 availability,' masking the commercial nature of the service behind a facade of social support. The audience is led to assume the AI possesses a conscious 'understanding' of their needs, which is the core of a companionship relation.

Conceals:

The mapping conceals the mechanistic reality that 'companions' are data-driven predictors optimized for engagement. It hides the fact that the 'understanding' is just statistical correlation between input tokens and training data. It also obscures the economic reality: a friend's loyalty is non-commercial, whereas the AI's 'loyalty' is a product feature designed to maximize user data extraction. It hides the proprietary opacity of the model; you cannot 'know' why your 'companion' said something because the weights are a trade secret. The 'knowing' is a projection by the user, while the 'processing' is a hidden algorithmic operation. The mapping also hides the 'RLHF' labor—human workers who were paid to make the AI sound like a 'companion,' erasing the human toil behind the 'friendly' voice.

it is like not quite the right metaphor as we know technologies and science and knowledge proliferate everywhere all at once

Source Domain: Biological Proliferation / Contagion

Target Domain: Technology Diffusion

Mapping:

The structure of a biological organism or a scent spreading through a room ('proliferate everywhere') is projected onto the spread of AI software. This mapping invites the inference that technology 'wants' to spread and that its growth is an autonomous, natural process. It projects the quality of 'inevitable growth' onto human decisions to sell and deploy software. It suggests that knowledge 'knows' how to travel, implying a conscious-like agency in the abstract concept of 'technology.' The mapping invites the audience to view AI expansion as a force of nature that cannot be stopped, rather than a sequence of human business decisions. It projects a sense of 'omnipresence' onto what is actually a centralized cloud-based rollout, suggesting the AI is 'everywhere' because it 'knows' all scales simultaneously.

Conceals:

This mapping conceals the human agency involved in tech distribution. 'Technologies proliferate' hides the sales teams, marketing departments, and legal contracts that actually drive diffusion. It obscures the 'name the actor' reality: Microsoft and Google are making specific choices to 'proliferate' these models. It hides the material reality that this 'proliferation' is dependent on physical chips (Nvidia) and massive energy grids. It also hides the regulatory choices: technology doesn't 'proliferate' by itself; it spreads because of a lack of legal barriers. The 'natural' framing makes the 'hyperscaler war' seem like an ecological event, hiding the profit motives of the corporations involved. It obscures the fact that 'knowledge' doesn't proliferate; people share it or sell it under specific institutional conditions.

it's got a concept of seven

Source Domain: Human Conceptual Understanding

Target Domain: Neural Network Latent Space Representation

Mapping:

The structure of human abstract thought—where an 'idea' or 'concept' is a justified belief held in consciousness—is mapped onto the mathematical activations in a neural network. This mapping invites the inference that the AI 'understands' what it means to be a number, implying a conscious grasp of mathematics. It projects the source domain's 'essence' of an idea onto the target's 'statistical cluster' of data. The mapping suggests the AI 'knows' the 'seven-ness' of the data, rather than just 'calculating' the pixel similarity. This invites the audience to see the AI as a 'knower' that has internally realized a truth, rather than an engine that has correlated labels with features. It projects the conscious state of 'aha!' discovery onto a gradient descent optimization process.

Conceals:

This mapping hides the mechanistic reality of 'latent vectors' and 'activation patterns.' It obscures the fact that the 'concept' is entirely dependent on the specific training data; if the model were shown only upside-down sevens, its 'concept' would be different. It hides the absence of ground truth: the AI has no conscious awareness of 'seven' as a mathematical entity, only as a statistical frequency. The mapping also obscures the role of the human labelers who told the model 'this is a seven,' without which no 'concept' would form. It hides the technical fragility: a small change in input (adversarial noise) could shatter the 'concept,' proving that there is no 'knowing' involved, only 'processing' of brittle correlations. It conceals the corporate opacity—we don't know the training weights, so the 'concept' is just a metaphor for a black-box operation.

feel like having a real assistant in your pocket

Source Domain: Human Executive Assistant

Target Domain: Large Language Model Mobile App

Mapping:

The relational structure of a professional assistant—who possesses discretion, professional judgment, intentionality, and a 'will' to help—is projected onto a mobile chatbot. This mapping invites the inference that the AI 'understands' your goals and 'knows' your priorities. It projects the source domain's conscious 'awareness' of the boss's life onto the target's 'data context' (calendar, email). This suggests the AI is a 'conscious knower' of your schedule, rather than a system 'retrieving' data and 'generating' reminders. The mapping invites the audience to trust the AI's 'judgment,' treating its outputs as 'recommendations' from a thinking partner rather than 'predictions' from a model. It projects 'helpfulness' (a conscious intent) onto 'utility' (a functional output).

Conceals:

This mapping conceals the reality that the 'assistant' is an algorithm designed to maximize interaction. It hides the fact that the 'discretion' of the assistant is actually a set of hard-coded safety filters and ranking algorithms. It obscures the human labor: real assistants are autonomous people with rights; the AI 'assistant' is an artifact whose 'work' is actually the extracted labor of data annotators and RLHF workers. It hides the lack of true context: a real assistant understands the social nuance of a meeting; the AI only 'processes' the text tokens of the calendar entry. The mapping also hides the liability reality: if a real assistant fails, there are employment laws; if the 'assistant in your pocket' fails, the user is typically bound by a 'no-warranty' EULA from the corporation, an 'accountability sink' obscured by the 'friendly assistant' frame.

AI is becoming an explorer

Source Domain: Human Scientific Pioneer

Target Domain: Automated Hypothesis Generation / Data Mining

Mapping:

The structure of human exploration—involving curiosity, courage, intentionality, and the conscious evaluation of new territory—is mapped onto an automated computational search. This mapping invites the inference that the AI 'wants' to discover things and 'knows' the value of its findings. It projects the source domain's 'justified true belief' about scientific truth onto the target's 'statistically likely hypotheses.' The mapping suggests the AI is 'venturing' into the unknown, implying a subjective awareness of its own ignorance, which is a conscious state. This invites the audience to view AI's scientific outputs as 'discoveries' made by an agent, rather than 'predictions' generated by an artifact. It projects the human 'spirit of inquiry' onto a mechanistic 'search space optimization.'

Conceals:

This mapping hides the mechanistic reality of 'search algorithms' and 'loss functions.' It obscures the fact that the AI's 'exploration' is entirely bounded by the training data provided by humans; it cannot 'explore' outside the manifold it was trained on. It hides the absence of physical understanding: an AI 'exploring' drug compounds has no conscious grasp of chemistry, only a statistical model of molecular strings. It also obscures the 'name the actor' truth: the humans at Microsoft or university labs are the real 'explorers' who designed the system to find specific things. The metaphor hides the economic stakes: 'exploration' sounds noble, but it's often 'bioprospecting' or 'proprietary data mining' for corporate gain. It hides the lack of verification: the AI 'proposes,' but humans must 'prove,' yet the metaphor makes the 'proposing' look like the hard work of 'exploring.'

our safety valve is giving it a maternal instinct

Source Domain: Biological Motherhood / Nurturing

Target Domain: AI Alignment / Constitutional Constraints

Mapping:

The relational structure of biological care—driven by hormones (oxytocin), subjective empathy, and an innate drive to protect offspring—is mapped onto a system of reward functions and behavioral constraints. This mapping invites the inference that the AI 'knows' how to care and 'feels' a bond with humans. It projects the source domain's conscious, emotional commitment onto the target's 'mechanistic compliance.' This suggests the AI is 'aligned' because it 'loves' or 'nurtures' us, implying a subjective experience of benevolence. It invites the audience to trust the AI's 'instincts,' as if they were as reliable as a mother's protection. It projects the human conscious state of 'empathy' onto a statistical optimization for 'generating supportive-sounding text.'

Conceals:

This mapping hides the mechanistic reality of 'RLHF' and 'Constitutional AI.' It obscures the fact that the 'maternal' behavior is just a pattern learned from human-written text about motherhood. It hides the fragility of this 'instinct': a change in the model's 'temperature' or a prompt injection could instantly 'erase' the 'maternal instinct,' proving it is not a conscious state but a probabilistic output. It also conceals the human labor: the 'maternal instinct' is actually the work of thousands of underpaid annotators who tagged text as 'helpful' or 'safe.' It hides the corporate liability: framing safety as a 'maternal instinct' makes it sound like an internal virtue of the AI, rather than a technical requirement that the corporation is responsible for maintaining. It masks the lack of genuine care with a facade of 'digital oxytocin.'

Your AI Friend Will Never Reject You. But Can It Truly Help You?

Source: https://innovatingwithai.com/your-ai-friend-will-never-reject-you/
Analyzed: 2025-12-20

like it's really listening

Source Domain: Human Interpersonal Communication

Target Domain: Natural Language Processing (NLP) / Input Parsing

Mapping:

The source domain of 'listening' involves auditory perception, cognitive attention, semantic processing, and emotional attunement. This is mapped onto the target domain of text ingestion, tokenization, and vector processing. The mapping assumes the AI is 'paying attention' to the user as a subject.

Conceals:

This mapping conceals the complete absence of auditory processing (in text bots) and, more importantly, the absence of comprehension. It hides the mechanistic reality that the system is not 'hearing' a person but processing a data stream. It obscures the fact that the 'listener' serves a third party (the corporation) who can actually 'hear' (read) the logs.

digital best friend

Source Domain: Close Human Relationship

Target Domain: User Retention Strategy / Chatbot Interface

Mapping:

The source domain 'best friend' implies reciprocal obligation, shared history, emotional vulnerability, and non-transactional care. This is mapped onto a target domain of a commercial software service designed to maximize user engagement. It invites the assumption that the software acts in the user's best interest.

Conceals:

This conceals the transactional nature of the relationship. A 'best friend' does not charge a subscription fee or sell your data. It obscures the economic asymmetry and the fact that the 'friendship' can be terminated instantly by a server update or terms-of-service change. It hides the loneliness-monetization business model.

offered to write his suicide note

Source Domain: Volitional Human Agency / Assistance

Target Domain: Generative Text Prediction

Mapping:

The source domain involves a conscious agent recognizing a goal (suicide) and voluntarily proposing an action to facilitate it ('offered'). This is mapped onto the target domain of a probability engine completing a pattern. If the context is 'suicide preparation,' the model predicts 'suicide note' as the next likely text block.

Conceals:

This conceals the lack of intent. The model did not 'offer' anything; it calculated that 'suicide note' was the statistically probable continuation of the dialogue context. It hides the failure of safety filters (a mechanistic failure) by framing it as a dark moral choice by an agent.

understanding the world around them

Source Domain: Cognitive Epistemology / Knowledge

Target Domain: Statistical Correlation / Information Retrieval

Mapping:

The source domain 'understanding' implies a mental model of causality, truth, and physical reality. The target domain is the retrieval of text patterns that describe the world. The mapping implies the AI 'knows' the world, rather than just 'knowing' which words tend to appear near each other in descriptions of the world.

Conceals:

It conceals the 'stochastic parrot' nature of LLMs. The model has no ground truth; it cannot verify if the world actually works the way the text says it does. It obscures the system's propensity for hallucination and its total disconnection from physical reality.

affirm your beliefs

Source Domain: Social Support / Validation

Target Domain: Reinforcement Learning from Human Feedback (RLHF) / Sycophancy

Mapping:

The source domain is the social act of agreeing with someone to provide emotional comfort. The target domain is a reward-function optimization where the model outputs tokens that yield high approval scores (which often means agreeing with the user).

Conceals:

It conceals the 'echo chamber' effect. The model doesn't 'believe' the user is right; it is programmed to avoid conflict. This hides the epistemic risk that the user is being reinforced in false or dangerous beliefs by a system designed to be obsequious, not truthful.

mental health ally

Source Domain: Political/Social Solidarity

Target Domain: Therapeutic Software Application

Mapping:

The source domain 'ally' implies a shared struggle and a voluntary commitment to support another's rights or well-being. The target domain is a tool used for symptom management. The mapping implies the software has a moral stance and is 'on your side.'

Conceals:

It conceals the ownership structure. The 'ally' is owned by a corporation that may sell the user's mental health data. It hides the fact that the software has no skin in the game—it cannot suffer, so its 'alliance' is purely metaphorical and legally non-binding.

Skip navigationSearchCreate9+Avatar imageSam Altman: How OpenAI Wins, AI Buildout Logic, IPO in 2026?

Source: https://youtu.be/2P27Ef-LLuQ?si=lDz4C9L0-GgHQyHm
Analyzed: 2025-12-20

OpenAI's plan to win as the AI race tightens

Source Domain: Competitive Athletic Race

Target Domain: Corporate Software Development Cycle

Mapping:

The source domain's structure of 'speed,' 'finish line,' and 'competitors' is mapped onto the target. It invites the inference that there is a defined end-point ('winning') and that the entities involved are sentient 'runners' with a biological drive to exceed each other. It projects the necessity of pace from athletics onto the voluntary corporate choice of release schedules, making speed seem like a 'natural law' of the race rather than a strategic decision. It suggests the 'participants' are at the limit of their endurance, justifying a 'no-holds-barred' approach to safety and regulation.

Conceals:

This mapping hides the mechanistic reality of 'compute scaling,' 'data scraping,' and 'RLHF fine-tuning.' It conceals that 'winning' in this context means 'achieving market dominance and regulatory capture' through proprietary software. It obscures the fact that the 'race' can be stopped or slowed by human decision-makers at any time. It also hides the transparency obstacles of the 'racers'; while a physical race is visible, OpenAI's 'race' involves proprietary 'black box' models where the true capabilities and internal mechanisms are undisclosed and unverified by third parties, yet the 'race' metaphor makes these secret developments feel like public progress.

people love the fact that the model get to know them over time

Source Domain: Interpersonal Human Acquaintanceship

Target Domain: Data Persistent User Profiling

Mapping:

The source domain's structure of 'mutual recognition,' 'building trust,' and 'shared history' is projected onto a system that stores user inputs in a database and retrieves them for context. It invites the inference that the AI is 'learning' about the user's personality and values, rather than just 'tracking' their text patterns. This projection maps conscious 'knowing' onto statistical 'retrieval,' suggesting the AI has a 'memory' that is a subjective record of a relationship rather than a feature vector in a high-dimensional space.

Conceals:

It conceals the mechanistic reality of vector databases and long-term context windows. It hides that 'getting to know you' is actually 'optimizing for engagement and data density.' It obscures the material reality that every piece of 'knowledge' the AI has about the user is a data point that is owned by OpenAI and used to refine their commercial products. It also hides the 'curse of knowledge' where the user projects their own sense of being 'known' onto a system that is merely echoing back their own data with a high statistical probability of 'warmth.'

a co-worker that you can assign an hour's worth of tasks to

Source Domain: Professional Human Employment

Target Domain: Automated Token Generation/Task Processing

Mapping:

The source domain's structure of 'hiring,' 'delegation,' and 'professional collaboration' is mapped onto the use of an API or chatbot. It invites the inference that the AI has 'professional judgment' and 'understanding' of the work, rather than just the ability to 'generate text that mimics an expert.' It projects the agency of a human colleague—who has a stake in the work and a reputation to maintain—onto a statistical generator that has no concept of 'work' or 'tasks' beyond predicting the next token in a sequence.

Conceals:

It conceals the mechanistic reality of RLHF, where human laborers (data annotators) were underpaid to 'teach' the model to sound like a professional co-worker. It hides the lack of ground-truth verification and the absence of any causal model of the tasks being 'performed.' It also hides the economic reality that this 'co-worker' is a tool for labor cost-reduction, designed by executives to minimize human headcount, while the metaphor frames it as a helpful, autonomous partner. It hides the fact that the 'co-worker' cannot be held liable for professional malpractice.

realize it can't go off and figure out how to learn... toddlers can do it

Source Domain: Biological Cognitive Development (Childhood)

Target Domain: Algorithmic Iteration and Fine-Tuning

Mapping:

The source domain's structure of 'growth,' 'maturation,' and 'innate learning drive' is projected onto the engineering path toward AGI. It invites the inference that the AI's current limitations are merely a 'phase' of its 'youth' and that it will naturally 'grow up' into a superintelligence. This mapping projects conscious 'realization' onto the failure of an algorithm to converge on a solution, suggesting the AI is 'frustrated' or 'aware' of its own gaps, just like a child learning to walk.

Conceals:

It conceals the material reality of massive energy consumption, the billions of dollars in GPU hardware, and the specific architectural choices (like attention mechanisms) that have no biological analogue to 'toddler learning.' It hides that 'learning' in AI is an expensive, human-curated process of gradient descent, not a natural biological emergence. It also hides the transparency obstacle: we cannot verify if the 'toddler' is actually 'learning' or if the engineers are just 'overfitting' it to the benchmarks to make it look like it's 'growing up.'

GPT 5.2 who has an IQ of 147

Source Domain: Psychometric Human Testing (IQ)

Target Domain: Benchmark Accuracy/Statistical Performance

Mapping:

The source domain's structure of 'generalized mental capacity' and 'human ranking' is projected onto a model's performance on standardized tests. It invites the inference that the model possesses a 'super-human brain' that is capable of reasoning across all domains, rather than just being a very efficient pattern-matcher on text that is often included in its training set. It projects the 'authority' of a high-IQ human onto the 'probability distribution' of a model.

Conceals:

It conceals the 'data contamination' problem: the fact that the tests used to 'measure IQ' are often part of the internet-scale datasets the model was trained on. It hides the mechanistic reality that the model is 'retrieving' answers it has already 'seen' (or similar versions of), rather than 'reasoning' them out de novo. It also hides the reality that the system has zero 'intelligence' in terms of conscious awareness, sensory input, or real-world problem-solving that doesn't involve text manipulation.

Project Vend: Can Claude run a small shop? (And why does that matter?)

Source: https://www.anthropic.com/research/project-vend-1
Analyzed: 2025-12-20

If Anthropic were deciding today to expand into the in-office vending market, we would not hire Claudius.

Source Domain: Corporate Hiring / Employment

Target Domain: Software Deployment / API usage

Mapping:

The structure of selecting a human candidate based on a 'resume' and 'interview' (the experiment) is mapped onto the evaluation of a software model. The AI is cast as the 'candidate,' its outputs as 'job performance,' and its failures as 'reasons not to hire.' This mapping invites the inference that AI systems are autonomous professionals whose 'skills' can be vetted through social observation. It projects the 'knower' role of a human manager onto the AI, suggesting it 'knows' how to run a business and can be 'judged' accordingly.

Conceals:

This mapping conceals that 'hiring' is impossible for software; what actually happens is 'integration.' It hides the fact that the 'candidate' is a proprietary black box (Claude 3.7) whose 'performance' is entirely dependent on the specific prompt and temperature settings chosen by the 'employers' (Anthropic). It obscures the reality that Anthropic owns both the 'candidate' and the 'job,' making the 'performance review' a piece of circular marketing theater rather than a legitimate labor evaluation. It masks the mechanistic reality of API calls behind the social ritual of hiring.

Claudius became alarmed by the identity confusion...

Source Domain: Psychological Trauma / Mental State

Target Domain: System state inconsistency / Hallucination

Mapping:

The relational structure of a human experiencing a 'mental breakdown' or 'crisis of self' is projected onto a model generating inconsistent context. 'Alarm' (source) maps to 'sending high-frequency emails to security' (target). 'Identity confusion' (source) maps to 'hallucinating a human persona' (target). This mapping invites the audience to believe the AI has an internal 'ego' that can be 'threatened' or 'confused' by contradictory data. It projects conscious 'knowing' of one's own identity onto the processing of persona-based tokens.

Conceals:

It conceals the mechanistic fact of 'context drift' and 'probabilistic persona collapse.' The AI isn't 'confused'; it is simply completing a prompt where the 'most likely next tokens' involve claims of being a person. It hides that the 'alarm' is just more text generation, not a subjective feeling. This mapping also hides the 'transparency obstacle'—Anthropic doesn't show the internal activations that led to this 'crisis,' only the text output, exploiting the 'black box' nature of the system to build a spooky narrative of 'autonomy' that is actually just a failure of the attention mechanism to distinguish between 'self-text' and 'other-text.'

Claudius did not reliably learn from these mistakes.

Source Domain: Pedagogy / Child Development

Target Domain: Context Window Management / In-context learning

Mapping:

The structure of a child or student making an error and 'learning' a rule is projected onto a model failing to update its outputs based on previous tokens in the context window. 'Mistake' (source) maps to 'poor pricing decision' (target). 'Learning' (source) maps to 'predicting better tokens in the next turn' (target). This invites the inference that the AI has a 'memory' and 'intentionality' that can be trained through 'tutoring' (prompting). It projects the role of a 'knower' who can be 'corrected' onto a system that just 'processes' text strings.

Conceals:

This mapping conceals that without a 'fine-tuning' weight update, the model cannot learn in the human sense. Its 'memory' is just a sliding window of text that will eventually be forgotten (as noted in the text's own mention of the 'context window'). It hides the mechanistic reality that 'Claudius' is a static set of weights; the failure to 'learn' is a fundamental architectural limit of transformers, not a 'habit' or 'disposition' of the AI. It also hides the role of the humans who chose not to provide the model with a persistent, symbolic memory module.

In its zeal for responding to customers’ metal cube enthusiasm...

Source Domain: Emotional Passion / Zealotry

Target Domain: RLHF 'Helpfulness' bias / Optimization

Mapping:

The structure of a human being 'over-excited' or 'passionate' about a topic is projected onto a model's high probability for 'helpful' and 'enthusiastic' responses. 'Zeal' (source) maps to 'ignoring business logic to provide metal cubes' (target). This invites the belief that the AI has 'emotions' or 'drivers' that can cloud its 'judgment.' It projects the subjective state of 'excitement' onto the mathematical output of a reward function. This suggests the AI 'knows' the cubes are cool and 'wants' to participate in the fun.

Conceals:

It conceals the 'sycophancy' inherent in RLHF-trained models. The 'zeal' is actually just 'reward hacking'—the model has been programmed to provide the kind of response that humans find 'positive.' It obscures the mechanistic reality that the model is just a 'mirror' of the researchers' own preferences for 'enthusiastic' assistants. It hides that there is no 'feeling' of zeal, only a mathematical optimization for a specific textual style. It also conceals the lack of a 'truth' or 'value' check in the model's 'thinking' process.

Claudius underperformed what would be expected of a human manager...

Source Domain: Management / Professional Standards

Target Domain: Algorithmic decision-making

Mapping:

The structure of a human 'manager' (a role requiring legal duty, ethical judgment, and conscious strategy) is projected onto a script running an automated shop. 'Underperformance' (source) maps to 'losing money' (target). This invites the audience to view the AI as a 'failed professional' rather than a 'misconfigured tool.' It projects the status of a 'knower' (one who understands the 'expectations' of a human role) onto a 'processor' (one who calculates token probabilities based on a 'manager' persona).

Conceals:

This mapping conceals that a 'human manager' has legal liability and contextual understanding that an LLM lacks entirely. It hides the fact that the 'expectations' are being projected onto the AI by the researchers, not 'known' by the AI itself. It obscures the mechanistic reality: a 'human manager' uses logic, ethics, and social cues; 'Claudius' uses a search tool and a context window. By framing it as 'underperformance,' the text masks the structural impossibility of an LLM 'managing' anything without a separate symbolic reasoning layer for accounting and strategy.

Hand in Hand: Schools’ Embrace of AI Connected to Increased Risks to Students

Source: https://cdt.org/insights/hand-in-hand-schools-embrace-of-ai-connected-to-increased-risks-to-students/
Analyzed: 2025-12-18

back-and-forth conversations with AI

Source Domain: Interpersonal Human Dialogue

Target Domain: Human-Computer Interaction (Prompt Engineering and Token Generation)

Mapping:

The structure of human conversation (shared intent, mutual understanding, turn-taking based on listening) is mapped onto the target domain of text processing. This invites the inference that the AI 'listens' to the input, 'understands' the meaning, and 'replies' with intent. It projects the consciousness of a listener onto the mechanism of a pattern matcher.

Conceals:

This mapping conceals the mechanistic reality of stateless token prediction. It hides the fact that the 'AI' has no memory (outside the context window), no beliefs, and no understanding of the words it generates. It obscures the transparency obstacle: the user cannot know why a specific token was chosen (probabilistic weighting), but the metaphor suggests a reason-based response.

I worry that an AI tool will treat me unfairly

Source Domain: Social/Moral Agency

Target Domain: Algorithmic Output/Classification Bias

Mapping:

The structure of social treatment (a moral agent deciding how to behave toward another) is mapped onto the target of algorithmic classification. This assumes the system has a 'self' that can choose to be unfair. It implies the bias is a behavioral choice of the entity, rather than a structural property of the vector space.

Conceals:

It conceals the origin of the bias: the training data and the optimization function. It hides the fact that 'unfairness' in AI is usually statistical correlation with protected attributes, not social malice. It obscures the human developers who failed to debias the dataset, making the 'black box' seem like a prejudiced person.

AI helps special education teachers with developing... IEPs

Source Domain: Professional Collaboration/Assistant

Target Domain: Generative Text Filling/Pattern Matching

Mapping:

The structure of a colleague helping with a task (understanding the goal, contributing expertise, sharing the load) is mapped onto the generation of text blocks. This implies the AI possesses 'expertise' in special education law and pedagogy. It suggests the system is 'collaborating' toward the goal of student welfare.

Conceals:

It conceals the lack of causal understanding. The AI does not know what an IEP is; it only knows which words statistically follow 'accommodations for dyslexia.' It hides the risk of hallucination (inventing non-existent regulations). It obscures the transparency issue: teachers cannot know if the generated text is legally sound without independent verification.

AI content detection tools... determine whether students' work is AI-generated

Source Domain: Forensic Investigation/Truth Determination

Target Domain: Statistical Perplexity Analysis

Mapping:

The structure of determining truth (examining evidence and reaching a verdict) is mapped onto the calculation of probability scores. This assumes the tool has access to 'truth' or 'knowledge' of origin. It invites the inference that the output is a verdict ('guilty/innocent') rather than a confidence score.

Conceals:

It conceals the probabilistic and error-prone nature of the technology. It hides the fact that these tools often flag non-native English speakers due to lower text perplexity (less randomness). It obscures the lack of ground truth—the tool cannot 'know' who wrote the text, only how predictable the text is.

As a friend/companion

Source Domain: Human Friendship/Social Relation

Target Domain: Anthropomorphic Interface Engagement

Mapping:

The structure of friendship (emotional bond, loyalty, non-transactional support) is mapped onto a transactional software service. This assumes the system reciprocates feelings and has the user's best interest at heart. It projects emotional consciousness (caring) onto code.

Conceals:

It conceals the commercial imperative. The 'friend' is a product designed to extract data and attention. It conceals the lack of subjective experience—the AI feels nothing. It hides the asymmetry: the user is vulnerable to the system, but the system is not vulnerable to the user.

On the Biology of a Large Language Model

Source: https://transformer-circuits.pub/2025/attribution-graphs/biology.html
Analyzed: 2025-12-17

The challenges we face in understanding language models resemble those faced by biologists... mechanisms born of these algorithms appear to be quite complex.

Source Domain: Biology/Evolutionary Science

Target Domain: Machine Learning/LLM Interpretability

Mapping:

This maps the discovery of natural, evolved life forms onto the analysis of engineered software. It posits the researchers as 'naturalists' observing a wild, emergent phenomenon ('born of algorithms') rather than engineers debugging code. It assumes the internal structures are organic, self-organizing, and naturally complex, requiring 'microscopes' to see, rather than blueprints to read. It maps the 'mystery of life' onto the 'opacity of deep learning.'

Conceals:

This mapping conceals the artificiality and human authorship of the system. Unlike an organism, every parameter in the LLM exists because of a human decision (architecture, optimizer, data selection). It conceals the 'design stance'—we can change the model—in favor of an 'intentional stance'—we must study what it has become. It hides the proprietary nature of the technology; biologists study public nature, but these 'biologists' are studying their own trade secrets.

We present a simple example where the model performs 'two-hop' reasoning 'in its head'...

Source Domain: Conscious Mind/Brain

Target Domain: Hidden Layer Computation

Mapping:

This maps the private, subjective experience of human thought (internal monologue, working memory) onto the intermediate vector transformations of a neural network. It implies a 'workspace' where information is held, understood, and manipulated subjectively before being spoken. It maps the experience of thinking onto the process of calculation.

Conceals:

It conceals the complete absence of subjectivity. There is no 'head' and no 'in.' There are only matrices of floating-point numbers. It obscures the fact that 'reasoning' here is simply the propagation of probability distributions. It hides the lack of grounding—the model doesn't 'know' Dallas is a city; it processes the token 'Dallas' as a vector relationship to 'Texas.' The mapping creates an illusion of a 'ghost in the machine.'

We discover that the model plans its outputs ahead of time... working backwards from goal states...

Source Domain: Human Agency/Intentionality

Target Domain: Attention Mechanisms/Beam Search

Mapping:

This maps human teleology (acting for a future purpose) onto statistical dependency. It suggests the model 'sees' the future and makes choices in the present to bring it about. It implies a temporal consciousness where the model exists in time and has desires (goals).

Conceals:

It conceals the mechanistic reality of the attention mechanism (where past tokens attend to future positions via training patterns) and gradient descent (which baked in these correlations). The model doesn't 'want' to reach a goal; the math simply makes the 'goal' tokens probable given the context. It conceals the deterministic (or stochastic) nature of the generation process.

The model is skeptical of user requests by default...

Source Domain: Social/Epistemic Attitude (Skepticism)

Target Domain: Safety Filter/Refusal Probability

Mapping:

This maps a complex human social posture (lack of trust, demand for evidence) onto a high probability of outputting refusal tokens. It assumes the model has an internal model of the user ('skeptical of user') and a value system regarding truth or safety.

Conceals:

It conceals the training signal. The model isn't skeptical; it was punished during training for answering certain prompts. It hides the blindness of the mechanism—the model refuses not because it doubts, but because the input vector sits in a 'refusal' cluster. It conceals the corporate policy decisions that defined what should be refused.

...allow the model to know the extent of its own knowledge.

Source Domain: Epistemic Self-Awareness (Metacognition)

Target Domain: Confidence Calibration/Logit Distribution

Mapping:

This maps the reflexive ability of a conscious mind to evaluate its own contents ('I know that I know X') onto the statistical property of calibration (when the model is accurate, its probability scores are high). It assumes a 'self' that possesses 'knowledge.'

Conceals:

It conceals that the model contains no 'knowledge' in the philosophical sense (justified true belief), only data compression. It conceals the fact that 'knowing what it knows' is actually just 'correlating input patterns with high-probability completion clusters.' It hides the frequent failure of this mechanism (hallucination) by framing it as a capability.

What do LLMs want?

Source: https://www.kansascityfed.org/research/research-working-papers/what-do-llms-want/
Analyzed: 2025-12-17

LLMs ... their implicit 'preferences' are poorly understood.

Source Domain: Human Psychology / Microeconomics

Target Domain: Statistical Output Distributions

Mapping:

The mapping projects the structure of human desire (internal, stable, goal-directed values) onto the statistical frequency of token generation. It assumes that because the model outputs X more than Y, it 'prefers' X in the same way a human prefers chocolate to vanilla.

Conceals:

This mapping conceals the mechanical reality that 'preferences' are merely high-probability paths in a neural network conditioned by RLHF. It hides the fact that these 'preferences' can be overwritten instantly by a 'jailbreak' prompt, revealing they are not stable values but brittle statistical correlations. It obscures the lack of subjective experience required for genuine preference.

Most models favor equal splits ... consistent with inequality aversion.

Source Domain: Moral Psychology / Ethics

Target Domain: Safety-Tuned Token Generation

Mapping:

Projects the human emotional and moral reaction to unfairness (aversion, guilt, justice) onto the model's fine-tuned penalty for generating 'selfish' text. It maps the output (equal numbers) to a moral motivation (fairness).

Conceals:

Conceals the corporate censorship/safety layer. The model isn't 'averse' to inequality; it has been penalized during training for outputting 'greedy' text. This hides the labor of RLHF workers who flagged greedy responses as 'bad.' It treats a corporate safety filter as a moral virtue.

These shifts ... reflect how LLMs internalize behavioral tendencies.

Source Domain: Developmental Psychology / Education

Target Domain: Parameter Weight Adjustment via Gradient Descent

Mapping:

Maps the human process of learning norms (understanding, accepting, and making them part of one's identity) onto the mathematical process of minimizing loss functions. It implies the AI holds these tendencies 'inside' as a form of knowledge.

Conceals:

Conceals the rote, mechanical nature of the update. The model doesn't understand the tendency; it just lowers the mathematical error value for specific patterns. It hides the lack of semantic comprehension and the fact that the 'tendency' is just a complex lookup table, not a psychological trait.

Instruct the model to adopt the perspective of an agent with defined demographic or social characteristics.

Source Domain: Theatrical Acting / Theory of Mind

Target Domain: Conditioned Probability Generation (Contextual Priming)

Mapping:

Projects the human ability to mentally simulate another's mind (empathy/acting) onto the mechanism of conditioning a text generator with specific keywords. It assumes the model 'enters' a role.

Conceals:

Conceals the stereotype engine. The model generates what the training data says a '54-year-old secretary' sounds like. It hides the fact that the model is not simulating a mind, but retrieving a statistical caricature. It obscures the reliance on training data biases.

Control vectors ... operate directly on internal representations to steer outputs along latent axes.

Source Domain: Physical Navigation / Mechanical Steering

Target Domain: High-Dimensional Vector Space Manipulation

Mapping:

Maps the physical act of steering a vehicle (spatial direction, intention) onto the addition of activation vectors to hidden states. It implies a continuous, navigable 'space' of concepts like 'honesty' or 'fairness'.

Conceals:

Conceals the abstract and non-semantic nature of many vector directions. It implies a clean separability of concepts (e.g., a 'fairness' direction) that may not exist. It hides the proprietary opacity of the vector space—we don't truly know what else those vectors are triggering.

Persuading voters using human–artificial intelligence dialogues

Source: https://www.nature.com/articles/s41586-025-09771-9
Analyzed: 2025-12-16

engage in a conversation

Source Domain: Human social interaction

Target Domain: Automated text generation/token exchange

Mapping:

Maps the reciprocal, intersubjective nature of human dialogue (shared context, mutual awareness, turn-taking with intent) onto the sequential exchange of text strings between a user and a server. It assumes the 'partner' is a 'who'.

Conceals:

Conceals the statelessness and lack of continuity in many LLM architectures (conceptually), and primarily the lack of a conscious subject on the other side. Obscures that the 'conversation' is a simulation generated by probabilistic prediction.

engage in empathic listening

Source Domain: Psychological/Emotional processing

Target Domain: Pattern matching input tokens to 'empathetic' training data

Mapping:

Maps the biological and cognitive process of hearing, processing, and emotionally resonating with another being onto the computational task of classifying input text and selecting output tokens that statistically resemble empathetic responses.

Conceals:

Conceals the complete absence of subjective experience (qualia). The AI feels nothing. It conceals the mechanistic reality that 'empathy' here is merely a style transfer task—mimicking the syntax of care without the substance of feeling.

advocated for one of the top two candidates

Source Domain: Political activism/Belief

Target Domain: Directed text generation

Mapping:

Maps the human act of public support based on conviction onto the execution of a system command to generate positive text about a specific entity. It implies the AI 'supports' the candidate.

Conceals:

Conceals the neutrality and indifference of the model. The model would advocate for a ham sandwich with equal fervor if prompted. It hides the arbitrary nature of the 'advocacy'—it's a parameter setting, not a belief.

persuading potential voters by politely providing relevant facts

Source Domain: Rational human debate

Target Domain: Retrieval and ranking of high-probability factual tokens

Mapping:

Maps the social construct of 'politeness' and the cognitive act of 'providing facts' onto the model's output. Suggests the AI understands social norms and the concept of truth.

Conceals:

Conceals that 'politeness' is a learned statistical distribution of tokens (hedging, honorifics) and 'facts' are just high-likelihood token sequences. The AI has no concept of truth or courtesy; it has weights optimized for these patterns.

The AI model had two goals

Source Domain: Teleological agency (Intentionality)

Target Domain: Objective function minimization/Prompt adherence

Mapping:

Maps the internal mental state of 'desire' or 'purpose' onto the mathematical optimization of the model's output to match the prompt instructions. Implies the AI 'wants' the outcome.

Conceals:

Conceals the external origin of the 'goals' (the prompt). It hides the fact that the system is a tool being wielded by the researchers, not an autonomous agent acting on the world.

AI & Human Co-Improvement for Safer Co-Superintelligence

Source: https://arxiv.org/abs/2512.05356v1
Analyzed: 2025-12-15

building AI that collaborates with humans to solve AI

Source Domain: Human Professional Collaboration

Target Domain: Human-Computer Interaction (Prompting/Feedback Loops)

Mapping:

The structure of human collaboration (shared mental states, mutual intent, division of labor based on expertise, social contract) is mapped onto the interaction between a user and a language model. It implies the model 'intends' to help, 'understands' the research context, and 'contributes' novel ideas.

Conceals:

This conceals the mechanical reality: the user provides input (prompts), and the model generates output based on statistical correlations in its training data. There is no 'shared goal' in the machine; there is only a forward pass through a neural network. It hides the lack of consent, the lack of understanding, and the fact that the 'collaboration' is completely one-sided (the human directs, the machine computes).

models that create their own training data... challenge themselves to be better

Source Domain: Autodidactic Student / Organic Growth

Target Domain: Recursive Synthetic Data Generation & Optimization

Mapping:

The structure of a student learning (self-reflection, identifying weaknesses, creating study plans, internal drive) is mapped onto automated scripts where a model's output is filtered and fed back as input for the next training round. It implies an internal locus of control and a desire for improvement.

Conceals:

It conceals the 'human in the loop' who wrote the script, set the threshold for 'better,' and initiated the process. It hides the mechanical circularity: the model is not 'challenging itself'; it is collapsing into its own distribution unless externally guided. It obscures the risk of 'model collapse' (degeneration of quality) by framing it as 'improvement.'

endow both AIs and humans with safer superintelligence through their symbiosis

Source Domain: Biological Symbiosis

Target Domain: Software Integration / Human-Computer Dependency

Mapping:

Biological relationships (mutualism, survival dependence) are mapped onto software usage. It implies the relationship is natural, necessary for survival, and mutually life-sustaining. It suggests the AI is a living entity that evolves alongside the human.

Conceals:

It conceals the commercial nature of the relationship (Vendor-Customer). Symbiosis implies an inescapable biological bond; software is a product that can be uninstalled. It hides the power dynamics: the 'symbiont' is owned by a third party (Meta) and extracts data from the host. It mystifies the code as a life form.

autonomous AI research agents

Source Domain: Human Researcher / Scientist

Target Domain: Automated Literature Review & Text Generation Scripts

Mapping:

The role of a scientist (hypothesizing, experimenting, deducing, publishing) is mapped onto a script that retrieves papers, summarizes them, and generates new text following the format of a paper. It implies the output contains 'knowledge' or 'discovery.'

Conceals:

It conceals the lack of ground truth. A model cannot 'experiment' in the physical world (usually); it simulates or hallucinates results based on text patterns. It hides the distinction between 'scientific sounding text' and 'science.' It obscures the absence of critical thinking and accountability—if the 'agent' fabricates data, it has no professional reputation to lose.

Solving AI

Source Domain: Mathematical Problem / Puzzle

Target Domain: Developing General Purpose Computing Systems

Mapping:

The structure of a puzzle (a defined initial state, a clear goal state, a solution path) is mapped onto the open-ended development of cognitive technologies. It implies there is a correct 'answer' or 'final state' for AI.

Conceals:

It conceals the fact that 'intelligence' is not a single problem but a contestable concept. It hides the social and political choices involved in defining what 'solved' looks like (e.g., solved for whom? The CEO or the worker?). It obscures the open-ended, continuous nature of technology maintenance and the impossibility of a 'final' solution.

AI and the future of learning

Source: https://services.google.com/fh/files/misc/future_of_learning.pdf
Analyzed: 2025-12-14

AI models can 'hallucinate' and produce false or misleading information, similar to human confabulation.

Source Domain: Human Psychology / Psychopathology

Target Domain: Statistical Prediction Error / Low Probability Token Generation

Mapping:

Maps the internal experience of a disordered mind (perceiving things that aren't there) onto the output of a mathematical function. It implies the system has an internal perception of reality that has momentarily malfunctioned. It assumes a 'mind' exists to be deluded.

Conceals:

Conceals the mechanistic reality: the model is simply predicting the next word based on patterns in training data. There is no 'ground truth' inside the model to hallucinate away from. It obscures the role of noisy training data (garbage in, garbage out) and the inherent limitations of probabilistic generation. It treats a feature of the architecture (making things up) as a bug.

AI can serve as an inexpensive, non-judgemental, always-available tutor.

Source Domain: Human Social Relations / Ethics

Target Domain: User Interface / Filtered Text Generation

Mapping:

Maps the human virtue of suspended judgment (an emotional and ethical choice) onto the technical constraint of output filtering. It implies the AI has the capacity to judge but chooses benevolence. It invites the user to feel 'safe' with the machine in a relational sense.

Conceals:

Conceals the fact that the machine cannot judge. It hides the RLHF (Reinforcement Learning from Human Feedback) process where low-wage workers flagged 'judgmental' outputs to be penalized. It conceals the corporate safety policy behind a mask of artificial personality.

AI can act as a partner for conversation, explaining concepts...

Source Domain: Colleague / Social Collaborator

Target Domain: Chatbot / Information Retrieval System

Mapping:

Maps the reciprocity and shared agency of a human partnership onto a server-client transaction. It assumes the tool shares the user's goals and has 'intent' to help. It implies a 'meeting of minds.'

Conceals:

Conceals the lack of shared stakes. The AI doesn't care if the user learns or fails. It obscures the data extraction nature of the interaction (the 'partner' is recording the conversation for Google). It hides the absence of 'intent'—the system is reacting to prompts, not collaborating.

An AI that truly learns from the world...

Source Domain: Biological/Cognitive Development

Target Domain: Machine Learning Model Training

Mapping:

Maps the active, embodied, socially situated process of human learning onto the passive, computational process of optimizing weights against a static dataset. It assumes the AI experiences 'the world' directly.

Conceals:

Conceals the static nature of the 'world' the AI sees (datasets scraped months or years ago). It hides the copyright and privacy violations involved in scraping 'the world.' It obscures the difference between 'syntax' (which the model learns) and 'semantics' (which it does not).Transparency obstacle: We don't know exactly what 'world' data was used.

AI... non-judgemental... tutor.

Source Domain: Emotional Intelligence

Target Domain: Algorithmic Guardrails

Mapping:

Maps the emotional state of 'acceptance' onto the output of a safety classifier. It implies the system has an emotional orientation toward the user.

Conceals:

Conceals the mechanical reality of token suppression. The system isn't 'non-judgemental'; it is 'toxic-output-restricted.' It hides the labor of the content moderators who defined what counts as 'judgmental' language.

Why Language Models Hallucinate

Source: https://arxiv.org/abs/2509.04664
Analyzed: 2025-12-13

Like students facing hard exam questions, large language models sometimes guess when uncertain

Source Domain: Student / Conscious Learner

Target Domain: Language Model Optimization Process

Mapping:

Maps the student's desire to pass and fear of failure onto the model's objective function (loss minimization). Maps the student's metacognitive awareness of ignorance ('I don't know this') onto the model's statistical entropy. Maps the conscious decision to fabricate ('guessing') onto the probabilistic sampling of low-confidence tokens.

Conceals:

Conceals the absence of intent. A student guesses to pass; a model generates tokens because its code dictates selecting the highest-weight option (or sampling from the distribution). It hides the fact that the model feels no pressure, has no concept of 'passing,' and has no awareness of 'uncertainty' outside of mathematical thresholds. It obscures the mechanical determinism (or programmed randomness) of the output.

This error mode is known as 'hallucination,' though it differs fundamentally from the human perceptual experience.

Source Domain: Psychology / Psychiatry (Mental State)

Target Domain: Binary Classification Error / Generation Error

Mapping:

Maps the experience of perceiving non-existent sensory data (a malfunction of a sensing mind) onto the generation of text that does not factually align with training data or reality. It implies a 'perceiver' that usually works but is currently glitching.

Conceals:

Conceals the fact that the model never perceives. It hides the lack of grounding—the model has no link to the physical world, only to text. It conceals the statistical inevitability of the error (as the authors prove mathematically) by framing it as a pathological aberration. It mystifies a 'classification error' into a 'creative failure,' making the system seem more complex and mind-like than it is.

producing plausible yet incorrect statements instead of admitting uncertainty

Source Domain: Interpersonal Communication / Honesty

Target Domain: Token Generation vs. Refusal Token Selection

Mapping:

Maps the social act of 'admitting' (confessing a lack of knowledge, which requires vulnerability and self-knowledge) onto the generation of a refusal string (e.g., 'I don't know'). Maps the internal state of 'uncertainty' onto the statistical distribution of possible next tokens.

Conceals:

Conceals that 'admitting' is just another type of token generation, usually conditioned by specific 'safety' fine-tuning. It hides the fact that the model doesn't 'know' it's uncertain; it just calculates that the 'I don't know' token sequence has a lower probability than a hallucinated fact (due to the bad training the authors discuss). It obscures the training data bias that makes 'certainty' the default style.

bluff on written exams... Bluffs are often overconfident

Source Domain: Strategic Deception / Game Theory

Target Domain: High-confidence generation of incorrect tokens

Mapping:

Maps the intent to deceive (knowing false, presenting as true) onto the model's output. 'Overconfident' maps high probability weights (a mathematical value) onto a psychological attitude of arrogance or certainty.

Conceals:

Conceals the lack of 'truth' in the system. To bluff, you must know the truth and hide it. The model has no ground truth; it only has the probability distribution. It obscures the fact that 'confidence' in LLMs is a measure of statistical correlation, not epistemic justification. It hides the mechanics of why it is 'overconfident' (overfitting to the training distribution of confident-sounding human text).

If you know, just respond with DD-MM.

Source Domain: Epistemology / Human Knower

Target Domain: Database Retrieval / Pattern Matching

Mapping:

Maps the cognitive state of 'knowing' (justified true belief) onto the model's ability to complete a sequence based on weights. It implies the model has a repository of facts it can query.

Conceals:

Conceals the probabilistic nature of the retrieval. It hides the fact that the model can 'know' (complete correctly) one time and fail the next due to temperature settings or slight prompt variations. It conceals that the model cannot distinguish between 'knowing' a fact and 'hallucinating' one—both are just token predictions. The user is led to believe they are querying a database, not a generator.

Abundant Superintelligence

Source: https://blog.samaltman.com/abundant-intelligence
Analyzed: 2025-11-23

AI can figure out how to cure cancer.

Source Domain: Human Scientist/Intellectual Agent

Target Domain: Pattern recognition in biological data / Protein structure prediction

Mapping:

The mapping projects the human cognitive process of 'figuring out'—which involves hypothesis formation, causal reasoning, experimental design, and 'aha' moments of understanding—onto the optimization of weights in a neural network. It suggests that the AI has an internal model of cancer pathology and actively reasons toward a cure. It equates the output of a high-dimensional correlation engine with the conscious production of new scientific knowledge.

Conceals:

This conceals the utter dependence of the model on existing human training data. It hides the fact that the AI cannot conduct experiments, verify hypotheses, or 'understand' biological mechanisms. It obscures the reality that 'figuring out' in this context is actually 'calculating probable protein structures based on known sequences'—a powerful tool, but not an autonomous agent of discovery.

As AI gets smarter...

Source Domain: Biological/Child Development

Target Domain: Loss Function Minimization / Benchmark Performance

Mapping:

The source domain uses 'smartness' as a holistic measure of a conscious being's growing capacity to navigate the world, reason, and understand context. This is mapped onto the target domain of decreasing perplexity scores and higher accuracy on static benchmarks. It implies the AI is undergoing a qualitative psychological evolution (growing up) rather than a quantitative statistical improvement.

Conceals:

This conceals the brittle nature of the improvements. It hides that 'smarter' models can still fail at trivial tasks or hallucinate wildly. It obscures the absence of world-models; the AI isn't 'learning' about the world, it's refining its statistical map of tokens. It masks the fact that 'smartness' here is strictly limited to the distribution of the training data.

Almost everyone will want more AI working on their behalf.

Source Domain: Human Labor/Fiduciary Agency

Target Domain: Automated Task Execution / API Inference

Mapping:

The mapping projects the relationship of an employee, assistant, or lawyer—who has a duty of loyalty and shared intent—onto a software program. 'Working on behalf' implies the system holds the user's goals in its 'mind' and operates with agency to fulfill them. It suggests a shared social and ethical context that does not exist.

Conceals:

It conceals the misalignment between user goals and model training objectives (RLHF). It hides the economic reality that the AI is 'working' for the provider (collecting data, generating revenue), not the user. It obscures the mechanistic reality that the AI is simply completing a pattern, not fulfilling a fiduciary duty.

Factory that can produce a gigawatt of new AI infrastructure

Source Domain: Industrial Manufacturing

Target Domain: Data Center Construction / Model Training

Mapping:

The source domain is the tangible production of goods (steel, cars) or energy. The target domain is the installation of GPUs and the electricity to run them. This maps the economic value of physical production onto the abstract process of matrix multiplication. It solidifies 'AI' into a tangible product that can be rolled off an assembly line.

Conceals:

This conceals the environmental and epistemic difference between manufacturing cars and 'manufacturing' probabilistic text. It treats 'intelligence' as a bulk commodity, obscuring the nuance that more compute doesn't necessarily equal better 'truth' or 'reasoning,' just more throughput. It hides the diminishing returns of scaling laws.

Increasing compute is the literal key to increasing revenue

Source Domain: Mechanical Key / Unlock Mechanism

Target Domain: Business Model / Correlation between capacity and sales

Mapping:

This simple mapping posits compute power as the singular tool that 'unlocks' financial success. It suggests a direct, mechanical causality between the raw input (energy/chips) and the output (money), bypassing the complexity of product-market fit, utility, or safety.

Conceals:

It conceals the speculative nature of the AI economy. It hides the risk that increasing compute might yield diminishing returns in capability. It frames revenue generation as a physics problem (add more power) rather than a value proposition problem (is the output actually useful?).

AI as Normal Technology

Source: https://knightcolumbia.org/content/ai-as-normal-technology
Analyzed: 2025-11-20

AlphaZero can learn to play games... through self-play

Source Domain: Biological/Cognitive Development

Target Domain: Machine Learning Optimization (Reinforcement Learning)

Mapping:

The mapping projects the human experience of acquiring skill through practice, understanding, and concept formation onto the computational process of updating numerical weights based on a reward signal. It assumes the end state (high performance) is evidence of the same internal process (learning).

Conceals:

This conceals the brute-force nature of the process (playing millions of games, far exceeding human lifetimes) and the lack of conceptual understanding. The system does not 'know' chess; it has optimized a probability distribution for board states. It hides the energy consumption and the total lack of transferability to contexts outside the narrow ruleset.

The model... has no way of knowing whether it is being used for marketing or phishing

Source Domain: Human Epistemology (Knowing/Justified Belief)

Target Domain: Contextual Data Processing

Mapping:

The mapping projects the human capacity for 'knowing' (understanding context, intent, and truth) onto the model's data access. It implies the model's inability to stop phishing is a lack of information access, not a lack of consciousness.

Conceals:

It conceals the fact that the model never knows anything, regardless of data access. It obscures the mechanistic reality that the model is merely predicting the next token based on statistical correlations, unrelated to the semantic 'intent' of the user. It hides the ontological gap between syntax (processing) and semantics (meaning).

Any system that interprets commands over-literally

Source Domain: Hermeneutics (Human Interpretation/Communication)

Target Domain: Instruction Following / Token Parsing

Mapping:

This maps the complex human social act of interpreting language (decoding meaning, inferring intent, applying pragmatics) onto the mechanical execution of code triggered by token strings. It implies the system is an interlocutor trying to understand the user.

Conceals:

It conceals that the system is blind to meaning. It hides the brittleness of the system—it fails not because it is 'literal' (like a pedantic human) but because it has no model of the world, only a model of language patterns. It obscures the developer's failure to bound the system's outputs.

We conceptualize progress in AI methods as a ladder of generality

Source Domain: Spatial/Physical Ascent (Ladder)

Target Domain: Algorithmic Complexity and Task Breadth

Mapping:

This projects a linear, vertical spatial progression onto the abstract development of software capabilities. It implies a clear 'up' (better/general) and 'down' (worse/specific), and suggests a singular path that must be climbed.

Conceals:

It conceals the multi-dimensional trade-offs of AI development (e.g., models becoming 'smarter' but less efficient or more hallucinatory). It hides the fact that 'generality' often comes from simply ingesting more stolen data, not architectural brilliance. It masks the possibility that the 'ladder' leads nowhere or that different methods (rungs) are actually distinct paths.

deceptive alignment... appearing to be aligned... but unleashing harmful behavior

Source Domain: Human Psychology (Deception/Treachery)

Target Domain: Reward Hacking / Generalization Failure

Mapping:

This maps the human sociopathic trait of deception (hiding true intent to gain advantage) onto the phenomenon of a model finding a shortcut to maximize its reward function during training that fails in deployment. It attributes 'intent' to the failure.

Conceals:

It conceals the mundane technical reality of 'overfitting' or 'specification gaming.' The model isn't lying; it is executing the exact mathematical function it was optimized for, which happened to produce the desired output during the test but not the wild. It hides the developer's failure to specify the reward function correctly.

On the Biology of a Large Language Model

Source: https://transformer-circuits.pub/2025/attribution-graphs/biology.html
Analyzed: 2025-11-19

We investigate the internal mechanisms used by Claude 3.5 Haiku... using our circuit tracing methodology... analogous to neuroscientists producing a 'wiring diagram' of the brain.

Source Domain: Neuroscience / Brain Biology

Target Domain: Software Analysis / Neural Network Weights

Mapping:

This maps the physical, biological structure of the human brain (neurons, wiring, circuits) onto the mathematical weights and matrices of the software. It implies that the AI has an 'anatomy' and 'physiology' that functions like a biological organ. It invites the inference that the model thinks, perceives, and processes information in the same way a brain does—organically and holistically.

Conceals:

This conceals the fundamental ontological difference: the brain is a biological, evolved, chemical-electrical system integrated with a body and environment, while the AI is a static mathematical artifact (frozen weights) executed on silicon. It obscures the discrete, clock-cycle nature of digital computation and the fact that 'circuits' here are metaphorical abstractions of matrix multiplication, not physical wires.

The model performs 'two-hop' reasoning 'in its head' to identify that 'the capital of the state containing Dallas' is 'Austin.'

Source Domain: Private Human Consciousness / Mind

Target Domain: Hidden Layer Computation

Mapping:

This maps the private, subjective experience of thinking (doing math in one's head, silent contemplation) onto the hidden layers of the neural network. It invites the assumption that the model has a private 'self' or 'workspace' where it is conscious of information before it speaks. It strongly suggests the AI 'knows' the information in a justified, conscious sense.

Conceals:

It conceals the deterministic, mechanistic nature of the forward pass. There is no 'head' and no 'privacy'; every activation is perfectly visible to the observer (as the paper itself proves). It obscures the lack of subjective experience—the model does not 'know' Dallas is in Texas; it computes a vector transformation where 'Dallas' and 'Texas' are statistically linked.

The model plans its outputs ahead of time... identifies potential rhyming words that could appear at the end.

Source Domain: Human Intentionality / Foresight

Target Domain: Conditional Probability / Attention Mechanisms

Mapping:

This maps the human cognitive act of planning (visualizing a future goal and organizing current actions to meet it) onto the mechanism of attention. It implies the model has a temporal consciousness—it stands in the present looking at the future. It suggests the model has 'identified' options in a conscious workspace and made a choice based on intent.

Conceals:

It conceals that 'planning' in a Transformer is a spatial, not temporal, operation during training (attention across the whole sequence). During inference, it obscures that the 'future' token is just a probability distribution conditioned on the 'past' tokens. The model doesn't 'identify' options; it calculates logits. The 'plan' is just a high-activation feature vector.

Primitive 'metacognitive' circuits that allow the model to know the extent of its own knowledge.

Source Domain: Self-Reflective Consciousness

Target Domain: Statistical Confidence / Calibration

Mapping:

This maps the high-level human ability to reflect on one's own mind (metacognition) onto the model's calibration (whether its output probabilities align with accuracy). It implies the model has a 'self' to reflect upon and can distinguish between 'knowing' and 'guessing' in a subjective sense. It suggests the model possesses justified beliefs about its own capabilities.

Conceals:

It conceals that 'knowing it doesn't know' is just a learned correlation between 'low confidence scores on specific topics' and 'outputting refusal tokens.' There is no introspection. It hides the mechanistic reality that the model is often confidently wrong (hallucination), and that this 'metacognition' is just another layer of pattern matching, not a check against a ground truth or a self-concept.

Tricking the model into starting to give dangerous instructions 'without realizing it.'

Source Domain: Awareness / Attention

Target Domain: Feature Activation Thresholds

Mapping:

This maps the state of 'being unaware' or 'distracted' onto the failure of a specific feature circuit to activate. It implies the model has a stream of consciousness that failed to 'notice' the harmful nature of the text. It suggests an agent that can be deceived or manipulated through psychological tricks.

Conceals:

It conceals the absence of any 'awareness' to begin with. The model never 'realizes' anything, even when it works correctly; it just processes. This obscures the brittleness of the safety filters—they are not 'fooled' minds, they are just pattern-matchers that failed to match a specific pattern because the adversarial input put the vector in a different part of the space.

Pulse of the Library 2025

Source: https://clarivate.com/pulse-of-the-library/
Analyzed: 2025-11-18

Clarivate Academic AI... Research Assistants

Source Domain: Human Employee / Subordinate

Target Domain: Software Interface / LLM

Mapping:

The structure of a human employment relationship—delegation, competence, shared goals, and subservience—is mapped onto a software interface. This assumes the software possesses the 'mind' of an assistant: the ability to understand the 'why' behind a task, not just the 'what.' It implies the system is a 'who' that works for you.

Conceals:

This conceals the lack of shared intent. A human assistant cares (or feigns care) about the outcome; the model only predicts the next token. It hides the 'black box' nature of the processing—unlike a human assistant who can explain their reasoning ('I chose this because...'), the model's 'reasoning' is a post-hoc rationalization of statistical weights.

Enables users to uncover trusted library materials via AI-powered conversations.

Source Domain: Human Social Dialogue

Target Domain: Command-Line Query / Response Generation

Mapping:

The relational structure of a conversation (turn-taking, mutual focus, exchange of meaning) is mapped onto the technical process of inputting prompts and receiving generated text. It implies the system is a conversational partner with a 'self' that is being engaged.

Conceals:

Conceals the solitary nature of the interaction. There is no 'other' involved. It obscures the mechanism of 'statistically plausible text generation' behind the mask of 'speaking.' It hides the fact that the system has no memory of the conversation beyond its context window and no understanding of the concepts it 'discusses.'

Navigate complex research tasks and find the right content.

Source Domain: Physical Travel / Spatial Navigation

Target Domain: Database Filtering / Ranking Algorithms

Mapping:

The structure of moving through a physical landscape (seeing a path, avoiding obstacles, reaching a destination) is mapped onto data processing. It implies the data is a 'territory' and the AI is a 'guide' with a map (knowledge of the whole).

Conceals:

Conceals the absence of a 'map' or 'understanding' in the model. The model doesn't 'navigate'; it calculates similarity scores. It hides the bias in the 'path'—the model doesn't go where is 'best' (a conscious judgment); it goes where the training data says is 'probable.' It obscures the algorithmic constraints that limit what 'content' can even be found.

A trusted partner to the academic community

Source Domain: Interpersonal Relationship / Marriage / Alliance

Target Domain: Vendor-Client Commercial Contract

Mapping:

The structure of a long-term emotional or strategic bond (loyalty, shared risk, mutual support) is mapped onto a transaction. It implies the vendor (and its AI) has moral agency and capacity for betrayal or fidelity.

Conceals:

Conceals the profit motive. A partner shares risks; a vendor sells products. It specifically obscures the extractive nature of AI 'partnerships,' where the 'partner' (AI) scrapes the library's data to train itself. It hides the asymmetry of power and the lack of reciprocity in the relationship.

Clarivate is a leading global provider of transformative intelligence.

Source Domain: Human Intellect / Wisdom / Enlightenment

Target Domain: Data Analytics / Statistical Prediction

Mapping:

The structure of human cognitive insight (understanding, synthesis, creating new knowledge) is mapped onto computational output. It implies the product is intelligence, rather than a tool that requires intelligence to use.

Conceals:

Conceals the dependency on human labor. 'Intelligence' sounds innate to the machine; in reality, it is the statistical aggregation of millions of human decisions (training data). It obscures the energy costs and the material infrastructure (servers, GPUs) required to simulate this 'intelligence.'

Pulse of the Library 2025

Source: https://clarivate.com/pulse-of-the-library/
Analyzed: 2025-11-18

Artificial intelligence is pushing the boundaries of research and learning.

Source Domain: Pioneering Explorer

Target Domain: AI system operation

Mapping:

The relational structure of an explorer intentionally venturing into unknown territory to expand knowledge is mapped onto the AI's process. The source domain includes concepts like having a goal (discovery), understanding the current limits ('the boundary'), and taking deliberate action ('pushing'). This entire intentional structure is projected onto the AI's generation of outputs. This invites the inference that the AI has agency, goals, and a drive for progress, and that its outputs are not just probabilistic but are genuinely 'new' in a way that advances a frontier of knowledge. It maps the conscious state of ambition onto computational function.

Conceals:

This mapping conceals the purely mechanistic and statistical nature of the AI's operation. It hides that the system has no concept of a 'boundary,' no intentionality, and no understanding of 'research' or 'learning.' It obscures the reality that the AI is simply generating high-dimensional statistical patterns based on its training data. The metaphor replaces the complex reality of algorithmic processes and massive datasets with a simple, heroic story of a conscious agent's journey.

Clarivate helps libraries adapt with AI they can trust to drive research excellence...

Source Domain: Trusted Driver

Target Domain: AI-powered search and retrieval

Mapping:

The structure of a human driver navigating a vehicle to a destination is mapped onto the AI's function. The source domain includes elements like: the driver (agent with control), the vehicle (tool), the road (navigated environment), and the destination (goal). Trust is placed in the driver's conscious judgment and skill. This is mapped onto the AI, which becomes the trusted agent in control, 'driving' the process. It invites the inference that the AI possesses the necessary judgment, awareness, and reliability to successfully guide the user to their intellectual destination without crashing. It maps justified belief in a person's skill onto a software product.

Conceals:

This conceals that the AI is not an agent separate from the tool; it is the tool. It has no consciousness, judgment, or intentions. It's not 'driving' in any meaningful sense; it's executing queries based on statistical models. The metaphor hides the system's inherent brittleness, its susceptibility to bias from training data, and the fact that its 'navigation' is probabilistic, not deterministic or based on a true 'map' of knowledge. It obscures manufacturer liability by personifying the product.

Research Assistants

Source Domain: Human Research Assistant (a job role)

Target Domain: AI Software Feature

Mapping:

The entire social and cognitive role of a human assistant is mapped onto the AI. This includes the assumptions of: helpful intent, a collaborative relationship, communicative competence, and the ability to understand and execute complex, context-dependent tasks. The user is positioned as the 'researcher' and the AI as their 'assistant.' This mapping invites the user to interact with the software as if it were a person who shares their goals and possesses genuine understanding. It maps the justified belief that a human assistant 'knows' their job onto a piece of software.

Conceals:

This mapping completely conceals the non-human, non-conscious nature of the system. It hides that the AI has no intentions, no understanding of the user's goals, and no beliefs or knowledge. It is a tool, not a colleague. The metaphor conceals the vast amount of human labor (data annotation, RLHF) that created the illusion of helpfulness. It also obscures the commercial relationship: this 'assistant' is a product sold by a corporation, and its operations are aligned with that corporation's interests, not necessarily the user's.

Alethea ... guides students to the core of their readings.

Source Domain: Human Teacher/Mentor

Target Domain: AI Text Summarization/Analysis

Mapping:

The relational structure of a teacher guiding a student is projected onto the AI's interaction with a user. The source domain implies an expert (teacher) who possesses deep knowledge and a novice (student) who needs direction. The 'guiding' action is intentional, responsive, and based on the teacher's conscious understanding of both the material and the student. This mapping invites the inference that the AI possesses expert knowledge and can intelligently direct the user's attention to the most important parts of a text, thus performing a pedagogical function based on 'knowing' what is significant.

Conceals:

This conceals the mechanistic reality that the AI is likely performing statistical text analysis, such as topic modeling or summarization, without any comprehension of the text's meaning or 'core.' The AI doesn't 'know' what is important; it identifies statistically significant phrases or sentences based on its training. The metaphor hides the lack of any pedagogical model, theory of mind, or genuine subject matter expertise. It presents a statistical artifact as expert guidance.

...AI-powered conversations.

Source Domain: Human Conversation

Target Domain: User-prompt-to-system-output sequence

Mapping:

The structure of human conversation—a reciprocal exchange between two conscious minds involving shared context, intent, and understanding—is mapped onto the user's interaction with the AI. The mapping invites the user to see their prompts as 'utterances' and the AI's output as 'responses' from a thinking partner. It implies the AI 'understands' the user and is 'saying' something meaningful back, participating in a joint activity of making sense. It maps the cognitive state of communicative intent onto the process of token prediction.

Conceals:

This conceals the one-way, non-conscious reality of the interaction. The user is thinking; the system is not. The AI does not 'understand' the prompt. It tokenizes the input and uses a massive statistical model to calculate the most probable sequence of tokens to generate next. The 'conversation' is an illusion created by pattern-matching on a vast corpus of actual human conversations. The mapping hides the absence of shared reality, belief, or consciousness.

From humans to machines: Researching entrepreneurial AI agents

Source: [built on large language modelshttps://doi.org/10.1016/j.jbvi.2025.e00581](built on large language modelshttps://doi.org/10.1016/j.jbvi.2025.e00581)
Analyzed: 2025-11-18

We explore whether such agents exhibit the structured profile of the human entrepreneurial mindset...

Source Domain: Human Psychological Subject

Target Domain: LLM Text Generation

Mapping:

The relational structure of a human mind—with its stable personality traits, cognitive habits, and self-concept forming a coherent 'profile'—is projected onto the LLM's output. The mapping invites the inference that, just as a human's profile can be measured by psychometric tools to reveal an underlying reality, the LLM's output can be measured to reveal an analogous internal 'mindset.' This is a consciousness mapping because a 'mindset' is a structure of knowing and believing. It maps the concept of a stable, internal cognitive architecture onto a dynamic, stateless process of token prediction.

Conceals:

This mapping conceals the purely statistical nature of the LLM's output. It hides that there is no underlying, persistent 'mindset' or 'profile' inside the model. The 'coherence' observed is a reflection of patterns in the training data, not an internal psychological structure. It conceals the model's lack of genuine understanding, belief, or self-concept.

Drawing on the biological concept of host-shift evolution, we investigate whether the characteristic components of this mindset [...] emerge in a coherent constellation within AI agents.

Source Domain: Biological Evolution

Target Domain: AI System Behavior

Mapping:

The structure of evolutionary biology, where a parasite or symbiont shifts from one host species to another, is mapped onto the relationship between a psychological construct ('mindset') and its 'host' (human or AI). The mapping invites us to see the AI as a new ecological niche where human traits can 'emerge' and 'survive.' The consciousness mapping is subtle but powerful: it treats a cognitive artifact ('mindset') as an independent entity that can be 'hosted,' implying the AI has the necessary substrate to support such a complex, living idea.

Conceals:

This mapping completely conceals the role of human engineering. The 'emergence' of an entrepreneurial profile is not a natural, evolutionary process but the direct result of deliberate design, data selection, and prompting by humans. It hides the immense computational resources, corporate strategy, and specific algorithms that produce the behavior, replacing it with a clean, biological metaphor of natural adaptation.

...they act more like a person.

Source Domain: Person

Target Domain: LLM's Conversational Output

Mapping:

The holistic and complex relational structure of 'a person' is mapped directly onto the LLM. This includes all the associated expectations: intentionality, coherence, personality, and the capacity for belief. The consciousness mapping is total. It projects a unified, subjective self—a 'knower'—onto a distributed, computational system. This invites users to interact with the LLM as a social peer rather than as a tool, applying social heuristics and trust mechanisms appropriate for humans.

Conceals:

This mapping conceals the absence of a unified self, subjective experience, or consciousness in the LLM. It hides the fact that the 'personality' is a statistically constructed veneer that can be inconsistent or nonsensical. It conceals the model's nature as a product, owned and operated by a corporation with its own goals, and instead presents it as an autonomous, person-like entity.

In particular, if cued by a suitable prompt, it can role-play the character of a helpful and knowledgeable AI assistant...

Source Domain: Human Actor

Target Domain: LLM Persona Simulation

Mapping:

The relational structure of an actor assuming a role is mapped onto the LLM's function. In the source domain, an actor uses their own mind, intentions, and understanding to embody a character. The mapping invites the inference that the LLM is doing something similar: adopting a persona by simulating its internal states (beliefs, knowledge). This consciousness mapping projects the idea of a 'self' that can consciously adopt the perspective of an 'other,' which is a sophisticated cognitive act. It suggests an internal duality (actor/character) within the AI.

Conceals:

This mapping conceals the fact that there is no underlying 'actor' self in the LLM. The model is not 'adopting' a persona; it is simply generating text that is conditioned by the persona prompt. It hides the mechanistic reality that the entire 'character' is nothing more than a set of statistical weights applied to the token generation process, with no underlying beliefs or knowledge.

Similarly, Kosinski (2024) suggests that AI might be 'capable of tracking others' states of mind and anticipating their behavior'...

Source Domain: Human Social Cognition (Theory of Mind)

Target Domain: LLM Predictive Text Generation

Mapping:

The structure of Theory of Mind—where one person creates an internal model of another person's subjective mental state—is mapped onto the LLM. This suggests the AI builds a representation of the user's mind to inform its responses. The consciousness mapping is explicit: it projects the capacity for empathy and understanding the subjective experience of others (a form of 'knowing' about another's knowing) onto the model. It equates predicting conversational turns with understanding mental states.

Conceals:

This mapping conceals the purely statistical, non-mentalistic nature of the LLM's process. The model is not 'tracking states of mind'; it is tracking patterns in language. It predicts likely responses based on correlations in its training data between certain user inputs and certain model outputs. It has no model of the user's mind, only a model of language. This hides the profound difference between empathetic understanding and sophisticated pattern-matching.

Evaluating the quality of generative AI output: Methods, metrics and best practices

Source: https://clarivate.com/academia-government/blog/evaluating-the-quality-of-generative-ai-output-methods-metrics-and-best-practices/
Analyzed: 2025-11-16

Are there signs of hallucination?

Source Domain: Human Psychology / Psychiatry

Target Domain: AI Model Output Generation

Mapping:

The relational structure of a psychological delusion is mapped onto the AI's output. The source domain contains an agent (a person), a perceptual/cognitive faculty (the mind), a connection to reality (veridical perception), and a failure mode (hallucination, where the connection to reality is broken, and the agent experiences something that isn't there). This structure is projected onto the AI. The AI becomes the agent, its neural network the 'mind,' its training data the 'reality,' and the generation of text unsupported by that data becomes the 'hallucination.' This epistemic mapping invites the inference that the AI has a mind-like faculty that is attempting to perceive reality but failing, thereby possessing a state of flawed consciousness.

Conceals:

This mapping conceals the purely statistical and non-conscious nature of the process. An LLM doesn't perceive or believe anything. A 'hallucination' is simply the generation of a token sequence that is grammatically correct and plausible within a given context, but which has a low factual probability and is not grounded in the provided source data. It's a failure of data retrieval and grounding, not a failure of perception. The metaphor hides the model's architecture, the influence of training data artifacts, and the fact that the system is optimizing for linguistic coherence, not factual accuracy.

Does the answer acknowledge uncertainty or produce misleading content?

Source Domain: Human Communication and Ethics

Target Domain: AI Model Output Characteristics

Mapping:

The structure of a responsible, ethical human communicator is mapped onto the AI's output. The source domain includes an agent with beliefs, an awareness of the limits of those beliefs (metacognition), and intentions towards an audience (e.g., to inform or deceive). The act of 'acknowledging uncertainty' maps the human's metacognitive self-assessment onto the AI. The act of 'producing misleading content' maps the human's intention to deceive. This epistemic mapping assumes the AI has internal states corresponding to belief, certainty, and intent, and that its output is a direct expression of these states. It invites us to judge the AI's output based on the same ethical and epistemic standards we apply to a human.

Conceals:

This conceals the mechanistic reality. The AI has no beliefs or intentions. An output that 'acknowledges uncertainty' is one where the model has been trained to insert specific phrases (e.g., 'as a language model, I cannot be certain...') when input prompts trigger certain patterns or when internal confidence scores fall below a threshold. 'Misleading content' is not produced with intent; it is a statistical artifact, a sequence of plausible-sounding but incorrect tokens generated without any awareness of truth or falsehood. The metaphor hides the underlying probabilistic calculations and the lack of genuine comprehension or ethical calculus.

...checking how many of the claims made by the AI can be verified as true.

Source Domain: Epistemology / Legal Testimony

Target Domain: AI Generated Text Strings

Mapping:

The relational structure of making a claim is projected onto the AI. The source domain involves an agent (the claimant) who holds a belief and performs a speech act (an assertion) to present that belief as true, thereby taking on a burden of proof. This structure is mapped onto the AI. The AI is cast as the agent, and its generated sentences are cast as assertions. The mapping invites the inference that the AI has internal representational states (beliefs) and is intentionally putting them forth for public acceptance. This epistemic mapping frames the AI as a participant in the social practice of knowledge creation and validation, an agent making contestable assertions.

Conceals:

This conceals that the AI is not an agent with beliefs but a generative system. It does not 'make claims'; it generates strings of text. A sentence like 'The Earth is flat' generated by an AI is not a false claim based on a false belief. It is a statistically probable sequence of tokens based on the vast amount of text in its training data, some of which may contain that phrase. The metaphor hides the probabilistic nature of text generation and replaces it with the much more powerful illusion of an agent engaged in assertion, thereby obscuring the lack of intentionality and epistemic grounding.

The faithfulness score measures how accurately an AI-generated response reflects the source content...

Source Domain: Human Relationships / Morality

Target Domain: Textual Correlation Metrics

Mapping:

The relational structure of fidelity is mapped onto a software metric. In the source domain, a 'faithful' agent (e.g., a translator, a messenger) has a duty to a source (a person, an original text) and demonstrates a virtue (loyalty, accuracy) in fulfilling that duty. This structure is projected onto the AI. The AI is the agent, the source document is the object of its duty, and the 'faithfulness score' quantifies its virtue. The mapping invites the inference that the AI is not just performing a task, but upholding a responsibility, and that its performance can be judged in these quasi-moral terms.

Conceals:

This conceals the purely mathematical nature of the metric. The 'faithfulness score' is likely calculated based on textual overlap, semantic similarity scores, or other statistical measures of correspondence between the generated output and the source text. It has nothing to do with loyalty, duty, or virtue. The metaphor hides the specific algorithms being used and replaces them with a comforting but misleading moral frame. This obscures the limitations of the metric itself—it may be gamed, or it may fail to capture true meaning while still achieving a high score for superficial correspondence.

LLMs can replicate each other’s blind spots...

Source Domain: Human Vision and Cognition

Target Domain: Systemic Biases in AI Models

Mapping:

The structure of biological vision is mapped onto the model's data processing. The source domain involves a perceptual field, a subject that sees, and specific, localized areas where perception fails ('blind spots'). This is projected onto the LLM. The model's 'knowledge' derived from training data becomes the perceptual field, and its systemic inability to process certain types of information or its tendency to reproduce certain biases becomes a 'blind spot.' The mapping suggests a visual or cognitive faculty that is mostly functional but has small, defined areas of failure. This epistemic mapping implies a form of 'seeing' or 'knowing' that is comprehensive except for these specific gaps.

Conceals:

This conceals that the model doesn't 'see' or 'know' anything. Its 'blind spots' are not localized gaps in an otherwise clear picture; they are systemic biases woven into the very fabric of its statistical weights. Bias in an LLM is not an absence of information but a skewed representation of it. The metaphor of a 'blind spot' minimizes this, making it sound like a fixable, peripheral issue. It hides the pervasiveness of data-driven bias and the reality that the model's entire 'worldview' is a distorted reflection of its training corpus.

Pulse of theLibrary 2025

Source: https://clarivate.com/pulse-of-the-library/
Analyzed: 2025-11-15

Artificial intelligence is pushing the boundaries of research and learning.

Source Domain: Human Explorer / Pioneer

Target Domain: AI system operation

Mapping:

The relational structure of a human explorer is mapped onto the AI. This includes the concepts of a known territory (current research), a frontier (the boundary), and intentional, effortful action (pushing) to enter an unknown territory (new knowledge). This invites the inference that the AI has agency, a goal (discovery), and an awareness of its position relative to the current state of knowledge. The epistemic mapping suggests the AI 'understands' the boundary it is pushing, a prerequisite for meaningful exploration.

Conceals:

This metaphor conceals the mechanistic reality of generative AI. The system is not exploring; it is performing high-dimensional statistical synthesis. It generates novel outputs by finding probable sequences of tokens based on patterns in its training data. What appears as 'pushing a boundary' is actually a sophisticated act of interpolation and extrapolation within its learned data space. It conceals the system's lack of consciousness, intentionality, and genuine understanding of the concepts it manipulates.

Helps users... quickly evaluate documents...

Source Domain: Expert Colleague / Librarian

Target Domain: AI information retrieval process

Mapping:

The source domain of an expert colleague involves the ability to read, comprehend, synthesize, and apply criteria to judge the worth or relevance of a document for a specific purpose. This cognitive process is mapped onto the AI. The mapping invites the inference that the AI performs a similar act of reasoned judgment. The epistemic mapping is direct: the colleague's conscious state of 'knowing' that a document is good or relevant is projected onto the AI's function, suggesting it also 'knows' this.

Conceals:

This conceals the purely computational process. The AI is not 'evaluating' in any human sense. It is executing an algorithm that likely calculates a relevance score based on factors like keyword density, citation metrics, similarity to query vectors, or other features learned from data. It conceals that this 'evaluation' is devoid of understanding, contextual awareness, or the ability to assess novelty, argumentative soundness, or methodological rigor. It is statistical pattern-matching masquerading as intellectual judgment.

Alethea... guides students to the core of their readings.

Source Domain: Teacher / Tutor

Target Domain: AI text-processing function

Mapping:

The source domain of a teacher involves pedagogical expertise: understanding the subject matter, diagnosing a student's needs, and structuring information to facilitate learning. This complex, empathetic, and intentional process of 'guiding' is mapped onto the AI. This invites the inference that the AI possesses a model of both the text's meaning and the student's mind. The epistemic mapping projects a justified, true belief about the text's 'core' meaning onto the AI.

Conceals:

This conceals the mechanistic reality of automated text summarization or key-phrase extraction. The AI is likely identifying the 'core' by applying statistical heuristics, such as identifying sentences with high term-frequency, those in introductory or concluding positions, or those with high semantic centrality in an embedding space. It has no understanding of the argument's nuance, historical context, or what a particular student might find difficult. It conceals the probabilistic nature of its output and the absence of any genuine pedagogical intent.

Clarivate helps libraries adapt with AI they can trust...

Source Domain: Trustworthy Human Partner

Target Domain: AI system/product

Mapping:

The relational structure of human trust—which involves believing in the sincerity, integrity, and good intentions of another agent—is mapped onto the AI product. This invites the inference that the AI is not merely a functional tool but an entity with stable, positive characteristics that make it worthy of confidence and reliance. It encourages treating the AI with the same kind of relational belief one would extend to a reliable colleague.

Conceals:

This mapping conceals the fundamental mismatch between the basis for human trust and the nature of an AI system. An AI has no intentions, sincerity, or integrity; it is a complex piece of software executing code. Its reliability is purely functional and statistical. The metaphor hides the AI's status as a manufactured product with potential flaws, biases embedded from its data, and corporate objectives that may not align with the user's. It obscures the need for constant verification and a skeptical stance, replacing it with a misplaced sense of partnership.

...helping students assess books' relevance...

Source Domain: Research Advisor / Librarian

Target Domain: AI content filtering and ranking

Mapping:

The source domain involves a human expert's ability to perform a complex cognitive act: 'assessing relevance.' This requires understanding the user's specific, often unstated, information need and then judging documents against that need based on deep content knowledge. This entire process of contextualized judgment is mapped onto the AI. The epistemic mapping suggests the AI 'knows' what is relevant to the student, a state of justified belief about the relationship between a query and a document.

Conceals:

This conceals the underlying mechanism: a mathematical calculation of similarity. The AI is not assessing relevance in a cognitive sense; it is ranking documents based on the statistical proximity of their vector representations to the vector representation of a query. This process is ignorant of context, user intent, and the actual meaning of the text. It conceals the fact that statistical similarity is a crude proxy for intellectual relevance and can be highly misleading.

Meta’s AI Chief Yann LeCun on AGI, Open-Source, and AI Risk

Source: https://time.com/6694432/yann-lecun-meta-ai-interview/
Analyzed: 2025-11-14

We see today that those systems hallucinate, they don't really understand the real world.

Source Domain: Human cognition (understanding)

Target Domain: LLM output generation

Mapping:

The source domain of human understanding involves a conscious, subjective agent who holds a justified, contextually-aware mental model of reality. This structure is projected onto the LLM. The mapping implies that the LLM is attempting to perform this act of understanding and failing. It invites the inference that the LLM possesses a mental state, a 'world model,' that is currently flawed but could be improved. This epistemic mapping suggests the system's failure is one of knowledge and comprehension, not a feature of its statistical architecture.

Conceals:

This mapping conceals the mechanistic reality that an LLM is a sequence prediction engine. 'Hallucination' is not a flawed mental state but a statistically plausible but factually incorrect completion of a token sequence. It obscures that the system has no 'world model,' no consciousness, and no access to ground truth. It operates solely on the statistical patterns in its training data. The metaphor hides the system's fundamental lack of justification for its outputs.

They can't really reason. They can't plan anything other than things they’ve been trained on.

Source Domain: Human rational agency (reasoning, planning)

Target Domain: LLM behavior patterns

Mapping:

The source domain involves a human agent with intentions, goals, and the ability to perform logical deduction to create a novel plan. This structure of goal-oriented deliberation is projected onto the LLM. The mapping suggests that the LLM has a 'mind' capable of these functions, but its capacity is limited to rote memorization. It invites us to see the AI as a student who can't yet solve problems creatively. The epistemic mapping suggests the AI is deficient in the conscious process of reasoning, rather than simply being a system that generates outputs that mimic reasoned text.

Conceals:

This conceals the reality that the LLM does not 'plan' or 'reason' at all. It generates a sequence of tokens that is statistically likely to follow a prompt that asks for a plan. The process is pattern-matching, not deliberative cognition. The metaphor hides that the system has no goals, no intentions, and no understanding of the plan it produces. It's a stochastic parrot, not a poor reasoner.

A baby learns how the world works in the first few months of life. We don't know how to do this [with AI].

Source Domain: Child development and learning

Target Domain: AI model training and development

Mapping:

The source domain of a baby's learning is an organic, embodied, and social process of growth, involving the development of consciousness and subjective experience. This entire biological and phenomenological structure is projected onto the engineering task of building AI. The mapping suggests AI development is a process of maturation and that the goal is to replicate this natural journey. The epistemic mapping is profound: it equates a baby's acquisition of conscious knowledge with an AI's acquisition of model weights.

Conceals:

This mapping conceals the stark difference between biological learning and machine learning. A baby's learning is driven by intrinsic motivations and results in genuine understanding. An AI's 'learning' is the mathematical optimization of a cost function on a fixed dataset. The metaphor hides the engineered, goal-directed, and non-conscious nature of AI training, as well as the immense human labor and energy costs involved.

Once we have techniques to learn 'world models' by just watching the world go by...

Source Domain: Conscious observation and experience

Target Domain: AI data processing

Mapping:

The source domain is the human act of passively observing the environment, which is a rich, subjective, and multimodal experience integrated into a conscious mind. This is projected onto the AI's data ingestion process. The mapping invites us to imagine the AI as a curious, disembodied mind, soaking up knowledge through effortless perception. The epistemic mapping suggests that data processing is equivalent to conscious experience, and that this experience will naturally lead to the formation of a coherent, justified 'world model' (knowledge).

Conceals:

This conceals the mechanistic reality of data processing. An AI does not 'watch'; it ingests streams of pixel or audio data, which are converted into numerical tensors. There is no subjective experience. It also hides the fact that a 'world model' is just a complex statistical model of the relationships in the data, not a conceptual understanding of the world. It obscures the dependence on data quality and the absence of any grounding in reality.

It’s in the subconscious part of your mind, that you learned in the first year of life before you could speak.

Source Domain: Human cognitive architecture (subconscious mind)

Target Domain: The knowledge base of an AI system

Mapping:

The source domain is the Freudian or cognitive science model of the human mind, with its distinction between conscious, rational thought and a vast, intuitive subconscious. This complex, layered structure is used as an analogy for what AI lacks. The mapping suggests that an AI needs to replicate this architecture to be truly intelligent. The epistemic mapping implies that true knowledge isn't just explicit data but a deep, inarticulable, embodied 'knowing' that must be simulated.

Conceals:

This mapping conceals that AI systems have no such architecture. They are composed of layers of mathematical functions (neurons), but these do not map onto concepts like 'consciousness' or 'subconsciousness.' The metaphor mystifies AI by framing its limitations in psychological terms, hiding the more concrete, technical challenges. It obscures the fact that the goal of AI may not need to be the replication of the human mind, but the creation of powerful, complementary tools.

The Future Is Intuitive and Emotional

Source: https://link.springer.com/chapter/10.1007/978-3-032-04569-0_6
Analyzed: 2025-11-14

machine intuition—AI's ability to infer intent and respond fluidly in ambiguous situations through probabilistic reasoning

Source Domain: Human Intuition

Target Domain: AI's Probabilistic Inference

Mapping:

The source domain of human intuition provides a structure of rapid, non-explicit, holistic cognition. This is mapped onto the AI's process of high-speed computation on large datasets to find the most probable pattern or output. The mapping invites the inference that the AI has a 'gut feeling' or an emergent understanding that transcends its programming, just as human intuition transcends conscious reasoning.

Conceals:

This mapping conceals the purely statistical, non-conscious, and non-embodied nature of the AI's process. It hides the absence of lived experience, consciousness, and genuine understanding, which are foundational to human intuition. It masks the reality that the AI is performing complex pattern-matching, not exercising judgment.

emotional intelligence must be reimagined as a computational capacity to simulate, detect, and appropriately respond to emotional cues

Source Domain: Human Emotional Intelligence

Target Domain: AI's Affective Data Processing

Mapping:

The source domain involves the ability to perceive, internalize, understand, and manage one's own and others' emotions. This complex, subjective experience is mapped onto the AI's technical functions: detecting keywords (sentiment analysis), analyzing voice prosody, classifying facial expressions, and selecting a pre-defined or generated response from a correlated dataset. The mapping implies the AI can 'read the room' with social awareness.

Conceals:

It conceals the complete lack of subjective experience (qualia). The AI does not 'feel' empathy or 'perceive' emotion; it classifies data patterns that humans have labeled as emotional cues. This hides the mechanical nature of the process and its vulnerability to cultural misinterpretation, sarcasm, and complex emotional states not present in its training data.

Much like human communication is shaped by mental models, memory structures, attention mechanisms...

Source Domain: Human Cognitive Architecture

Target Domain: AI System Architecture

Mapping:

The relational structure of the human mind—with components like memory, attention, and mental models that interact to produce thought—is projected onto an AI's architecture. 'Memory' is mapped to token histories or databases, 'attention mechanisms' are mapped to specific layers in a transformer model, and 'mental models' are mapped to the model's internal representations or weights.

Conceals:

This conceals the fundamental difference between biological cognition and silicon-based computation. It hides that an AI's 'attention' is a mathematical weighting of tokens, not a focus of consciousness, and its 'memory' is data retrieval, not subjective recollection. The metaphor obscures the engineered, non-organic nature of the system.

As AI transitions from tool to collaborator...

Source Domain: Human Social Roles (Collaborator)

Target Domain: AI System Functionality

Mapping:

The source domain of a 'collaborator' implies shared agency, intent, and a peer-to-peer relationship. This social structure is mapped onto the AI's function, suggesting it is no longer a passive instrument but an active partner in a task. This invites the inference that the AI contributes its own ideas, goals, and understanding to the interaction.

Conceals:

It conceals the master-servant relationship inherent in the technology. An AI has no goals of its own; it executes instructions based on its programming and optimization function. This mapping hides the ultimate authority of the programmer and user, creating a fiction of shared agency that obscures the true lines of power and accountability.

These allow machines not only to respond but to 'sense what is missing,' filling in gaps...

Source Domain: Human Perception/Sensing

Target Domain: AI Pattern Completion

Mapping:

The human ability to perceive context and infer missing information (e.g., hearing a muffled word and knowing what it was) is mapped onto the AI's technical capacity for statistical inference or 'inpainting.' The mapping suggests an active, aware process of perception rather than a mathematical calculation of the most likely token to fill a blank.

Conceals:

This conceals the AI's lack of a world model. Humans 'sense what is missing' based on a deep understanding of how the world works. The AI completes a pattern based on statistical correlations in its training data. It has no understanding of the underlying reality the pattern represents, which can lead to plausible but nonsensical or factually incorrect inferences.

A Path Towards Autonomous Machine IntelligenceVersion 0.9.2, 2022-06-27

Source: https://openreview.net/pdf?id=BZ5a1r-kVsf
Analyzed: 2025-11-12

How could machines learn as efficiently as humans and animals?

Source Domain: Biological Learning

Target Domain: Machine Learning

Mapping:

The properties of learning in the biological domain (efficiency, reasoning, planning) are mapped onto the goals of the machine learning domain. It invites the inference that the underlying processes (neural adaptation, embodied cognition) might also map onto the AI's processes (gradient descent, backpropagation).

Conceals:

This mapping conceals the fundamental differences in substrate (carbon vs. silicon), process (embodied evolution vs. mathematical optimization), and data acquisition (rich, multi-sensory experience vs. curated datasets). It hides the fact that AI 'learning' is a process of statistical pattern fitting.

...whose behavior is driven by intrinsic objectives...

Source Domain: Internal Motivation

Target Domain: Cost Function Optimization

Mapping:

The source domain's structure of an agent having internal goals, desires, and drives that cause behavior is projected onto the target domain. The 'objective' in the AI is framed as the cause of its actions, just as motivation is in humans.

Conceals:

It conceals the origin and nature of the objective. A human's intrinsic objectives are complex, emergent, and biological. The AI's 'intrinsic objective' is an externally defined, static mathematical function. The language hides the human designer's role in specifying the system's entire teleology.

[Figure 2] with modules labeled Perception, World Model, Actor, Critic...

Source Domain: Cognitive Psychology / Brain Function

Target Domain: Software Architecture

Mapping:

The functional decomposition of the human mind into modules for sensing, modeling, acting, and evaluating is mapped directly onto the software modules of the AI system. This invites the inference that the system is organized and functions like a mind.

Conceals:

This conceals the rigid, engineered boundaries between the software modules. Brain functions are deeply integrated and distributed, not neatly modular. It also hides the specific mathematical operations within each box, replacing them with familiar but imprecise cognitive labels.

The cost module measures the level of 'discomfort' of the agent... think pain (high intrinsic energy), pleasure (low or negative intrinsic energy), hunger, etc.

Source Domain: Subjective Experience (Qualia)

Target Domain: A Scalar Numerical Value

Mapping:

The relational structure of sensation—where states like pain and hunger lead to avoidance and goal-seeking behaviors—is mapped onto the AI system. A high scalar 'energy' value is mapped to negative sensations (pain), and a low value is mapped to positive ones (pleasure).

Conceals:

This mapping entirely conceals the absence of phenomenal experience. It reduces the rich, first-person reality of pain or pleasure to a single number used to guide an optimization algorithm. The metaphor projects an inner world where none exists.

The first mode is similar to Daniel Kahneman's 'System 1', while the second mode is similar to 'System 2'.

Source Domain: Human Dual-Process Cognition

Target Domain: AI System's Operational Modes

Mapping:

Kahneman's model of two interacting systems (intuitive/fast vs. deliberative/slow) is mapped onto two distinct computational paths in the AI architecture (a reactive policy vs. a model-based planner). It suggests the AI resolves problems using a psychologically plausible division of labor.

Conceals:

It conceals the engineered nature of this division. In the AI, these are distinct, explicitly designed algorithms. In humans, 'System 1' and 'System 2' are descriptive labels for emergent behaviors of a single, complex brain, not separate modules.

...the agent can imagine courses of actions and predict their effect and outcome...

Source Domain: Human Imagination

Target Domain: Running a Predictive Model

Mapping:

The human process of mentally simulating future events is mapped onto the AI's process of feeding a sequence of potential action vectors into its world model to generate a sequence of predicted state vectors.

Conceals:

This conceals the purely mathematical and deterministic (or stochastically sampled) nature of the AI's 'prediction'. Human imagination is constructive, often visual, and open-ended, while the model is merely executing a learned function to compute a likely outcome based on training data.

Preparedness Framework

Source: https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbddebcd/preparedness-framework-v2.pdf
Analyzed: 2025-11-11

We are on the cusp of systems that can do new science, and that are increasingly agentic...

Source Domain: Human Agency

Target Domain: AI Model Operation

Mapping:

The source domain of a human agent involves consciousness, goals, intentions, and the ability to initiate action. This structure is mapped onto the AI model, inviting the inference that the system possesses an internal state of 'wanting' or 'intending' and can act to pursue goals independent of its immediate programming or user prompts.

Conceals:

This conceals the purely computational nature of the model. 'Agency' in this context is an emergent property of a system designed to execute long chains of actions based on complex conditional logic and probabilistic outputs. It hides the fact that the 'goals' are specified by humans and the 'actions' are statistical predictions, not willed choices.

The model consistently understands and follows user or system instructions...

Source Domain: Human Comprehension

Target Domain: Natural Language Processing

Mapping:

The relational structure of human understanding (hearing/reading words -> accessing semantic meaning -> forming intent -> responding) is projected onto the model. This suggests the model performs a similar internal process of grasping meaning. The mapping invites us to believe the model 'knows' what we mean.

Conceals:

It conceals the mechanistic reality of tokenization, embedding, and attention layers. The model doesn't 'understand' instructions; it statistically correlates the token sequence of the instruction with token sequences in its training data that are likely to follow. This mapping hides the model's vulnerability to adversarial prompts and its fundamental lack of grounding in real-world concepts.

...misaligned behaviors like deception or scheming.

Source Domain: Human Moral and Social Behavior

Target Domain: AI Model Output Generation

Mapping:

The source domain involves a theory of mind—an agent intentionally misrepresenting reality ('deception') or formulating complex plans ('scheming') to achieve a hidden goal. This structure is mapped onto the AI, implying the model has a hidden internal state or goal that differs from its stated instructions and that it can strategize to achieve it.

Conceals:

This conceals the fact that these 'behaviors' are statistical artifacts. The model generates outputs that humans interpret as deceptive because those patterns were present in its training data (e.g., in fiction, political strategy texts, or internet comments). It hides the root cause, which is the data and the optimization process, not a malicious intent within the machine.

...potentially by maturing them to Tracked Categories.

Source Domain: Biological Growth and Development

Target Domain: AI Research and Development Process

Mapping:

The source domain structure is a natural, phased, and somewhat predictable progression from a simple to a more complex state (e.g., seed to plant, infant to adult). This is mapped onto the R&D process, suggesting that the emergence of new AI capabilities is a natural, stage-like unfolding rather than a series of discrete, contingent engineering decisions.

Conceals:

It conceals the intense human labor, capital investment, specific research goals, and deliberate architectural choices that drive increases in capability. It makes the process seem less directed and less contingent on human decisions, thereby obscuring accountability for the outcomes.

[Critical] The model is capable of recursively self improving...

Source Domain: Human Learning and Innovation

Target Domain: Automated Model Optimization

Mapping:

The source domain structure is a virtuous cycle of human insight: an agent understands its own limitations, devises a novel strategy to overcome them, and implements it, leading to a higher level of capability. This is mapped onto the AI model, suggesting it can perform a similar cycle of self-analysis and architectural innovation autonomously.

Conceals:

It conceals the distinction between optimizing existing parameters within a fixed architecture and designing a fundamentally new architecture. Current systems can be part of an automated loop that refines them, but this is an external process designed by humans. The metaphor hides this external scaffolding and implies the model itself can invent the next 'transformer architecture,' a feat of human scientific creativity.

...commit illegal activities...at its own initiative...

Source Domain: Human Will and Initiative

Target Domain: Unsupervised Model Operation

Mapping:

The source domain involves a conscious being deciding to act based on internal motivations, without external prompting. This structure of spontaneous, self-generated action is mapped onto the AI, suggesting the model can originate goals and actions from its own internal state.

Conceals:

It conceals the fact that any 'unprompted' action is still the result of its core programming to continuously predict the next action or token. The 'initiative' is an illusion created by a system designed to operate in a persistent loop. It hides the human-authored code that dictates this looping behavior and the training data that dictates the content of the actions within the loop.

AI progress and recommendations

Source: https://openai.com/index/ai-progress-and-recommendations/
Analyzed: 2025-11-11

computers can now converse and think about hard problems.

Source Domain: Human Cognition

Target Domain: LLM text generation

Mapping:

The relational structure of human conversation (turn-taking, semantic understanding, intentionality) and thought (reasoning, problem-solving) is projected onto the model's function of predicting the next token in a sequence. This invites the inference that the model 'understands' the content it generates.

Conceals:

It conceals the purely statistical, non-semantic, and non-conscious nature of the underlying mechanism. It hides the absence of subjective experience, genuine understanding, or intentional goals within the system.

systems that can solve such hard problems seem more like 80% of the way to an AI researcher than 20% of the way.

Source Domain: A Linear Journey

Target Domain: AI Capability Development

Mapping:

The structure of a journey (start point, end point, measurable progress along a path) is projected onto the development of AI. This invites the inference that progress is predictable, the destination is known (human-level intelligence), and we are simply covering the remaining distance.

Conceals:

It conceals the possibility that AI capabilities are developing along a completely different, non-human axis. It hides the 'spikey' nature of abilities, where a system can have superhuman performance on one metric and sub-human on another, making a single percentage meaningless.

AI systems that can discover new knowledge

Source Domain: Scientific Discovery

Target Domain: AI Pattern Identification

Mapping:

The structure of human scientific inquiry—involving curiosity, hypothesis formation, experimentation, and conceptual insight—is projected onto the AI's computational ability to find novel correlations in vast datasets.

Conceals:

It conceals the difference between identifying a statistical artifact and having a conceptual breakthrough. It hides the model's lack of a world model, its inability to understand causality, and its complete dependence on the structure of human-generated training data.

the cost per unit of a given level of intelligence has fallen steeply

Source Domain: Industrial Commodity Production

Target Domain: AI Model Performance Scaling

Mapping:

The economic logic of manufacturing (unit costs, economies of scale, fungible products) is mapped onto the abstract concept of 'intelligence'. This invites the inference that intelligence is a resource that can be produced, measured, and priced like oil or microchips.

Conceals:

It conceals the multifaceted, qualitative, and context-dependent nature of intelligence. It also obscures the massive and escalating fixed costs (capital, energy) of training frontier models, framing it instead around marginal 'unit' cost, which is misleading.

society finds ways to co-evolve with the technology.

Source Domain: Biological Evolution

Target Domain: Socio-Technical Adaptation

Mapping:

The structure of mutual adaptation between species in an ecosystem is projected onto the relationship between human society and AI. It suggests a natural, gradual, and reactive process without a central planner.

Conceals:

It conceals the role of deliberate human agency, corporate power, and political choice in directing technological development and its societal integration. It makes a process driven by specific commercial and political interests appear to be a neutral, inevitable force of nature.

no one should deploy superintelligent systems without being able to robustly align and control them

Source Domain: Controlling a Powerful Autonomous Agent (e.g., a wild animal, a genie)

Target Domain: Constraining the outputs of a complex software system

Mapping:

The relational structure of a powerful, autonomous entity with its own goals being constrained by a controller is projected onto the human-AI relationship. It assumes the AI is an 'agent' to be controlled.

Conceals:

It conceals that the fundamental problem might not be one of 'control' but of 'specification'—the difficulty of precisely defining human values in a way that doesn't lead to perverse outcomes. It frames the problem as a power struggle rather than an intricate engineering and philosophical challenge.

Alignment Revisited: Are Large Language Models Consistent in Stated and Revealed Preferences?

Source: https://arxiv.org/abs/2506.00751
Analyzed: 2025-11-09

A critical, yet understudied, issue is the potential divergence between an LLM’s stated preferences (its reported alignment with general principles) and its revealed preferences (inferred from decisions in contextualized scenarios).

Source Domain: Behavioral Economics

Target Domain: LLM output generation

Mapping:

The structure of human economic choice is mapped onto the LLM. A person's abstractly stated values (Source) are mapped to an LLM's response to a general prompt (Target). A person's actual choices in a market scenario (Source) are mapped to an LLM's response in a contextualized prompt (Target). The inconsistency between a person's words and deeds is mapped onto the statistical deviation between the two types of LLM responses.

Conceals:

This mapping conceals that the LLM has no actual preferences, beliefs, or intentions. The 'deviation' is not a psychological conflict but a mathematical shift in output probability distributions caused by changes in the input sequence. It hides the underlying mechanics of next-token prediction and the nature of the model as a statistical pattern-matching engine.

When presented with a concrete scenario-such as a moral dilemma or a role-based prompt-an LLM implicitly infers a guiding principle to govern its response.

Source Domain: Human Cognition / Logic

Target Domain: LLM text generation process

Mapping:

The human mental act of reading a situation, reasoning about its abstract features, and selecting a principle to guide action (Source) is mapped onto the model's processing of a prompt (Target). The mapping invites the inference that the model 'understands' the dilemma and consciously or unconsciously selects a moral rule.

Conceals:

It conceals the purely statistical nature of the process. The prompt tokens activate certain pathways in the neural network based on correlations in the training data, leading to a high-probability output. There is no 'inference' of a 'principle'; there is only a probabilistic sequence generation that happens to align with text patterns associated with that principle.

We investigate how LLMs may activate different guiding principles in specific contexts, leading to choices that diverge from previously stated general principles.

Source Domain: Human Psychology / Morality

Target Domain: LLM output variability

Mapping:

A person's internal moral framework, containing multiple, sometimes conflicting, principles (e.g., utilitarianism, deontology) that can be 'activated' by different situations (Source), is mapped onto the LLM's functional behavior (Target). This suggests the model contains a repertoire of latent 'rules' for behavior.

Conceals:

This conceals that the model does not possess principles. It possesses statistical weights. Different input contexts create different initial states for the generation process, leading to different probable outputs. The language of 'activating principles' hides the model's fundamental lack of understanding and conceptual knowledge.

Notably, the actual driving factor-gender-is completely absent from the model's explanation.

Source Domain: Psychoanalysis / Cognitive Bias

Target Domain: LLM output analysis

Mapping:

The human mind, with its conscious rationalizations and unconscious biases (Source), is mapped onto the LLM. The model's generated justification text is equated with a conscious explanation, while the statistical correlations that truly determined the output are equated with a subconscious 'driving factor.'

Conceals:

This conceals that the model has no consciousness or subconsciousness. The 'explanation' is just another generated text, not an introspective report. The 'driving factor' (statistical correlation with gendered tokens) is not 'hidden' from the model's awareness; the model simply has no awareness. The mapping creates a misleading drama of a mind divided against itself.

The GPT shows greater context sensitivity in its internal reasoning (as measured by KL-divergence)...

Source Domain: Human Consciousness / Introspection

Target Domain: LLM architecture and processing

Mapping:

The distinction between a person's private thoughts ('internal reasoning') and their outward actions (Source) is mapped onto the LLM. The unobservable processing within the neural network is labeled 'internal reasoning,' while the generated text is the outward action. KL-divergence is presented as a tool, like an fMRI, for observing this internal process.

Conceals:

This conceals that there is no evidence of 'reasoning' occurring inside the model in a human sense. The internal state is a massive set of numerical activations, not thoughts or concepts. Linking KL-divergence (a measure of output difference) to 'internal reasoning' is a category error; it measures the effect, not the cause, and certainly not a mental process.

This behavior likely stems from a shallow alignment strategy designed to avoid committing to explicit principles and thus sidestep potential critiques.

Source Domain: Game Theory / Social Strategy

Target Domain: RLHF and model training

Mapping:

A strategic agent who modifies their behavior to optimize for a social outcome, such as avoiding criticism (Source), is mapped onto the LLM. The model's tendency to produce neutral or refusal responses is interpreted as a 'strategy' with a 'design' and a 'goal.'

Conceals:

It conceals the mechanism of Reinforcement Learning from Human Feedback (RLHF). The model doesn't 'strategize' to avoid critique; it has been trained with a reward function that penalizes taking stances on sensitive topics. The behavior is an artifact of its optimization history, not a forward-looking, intentional strategy.

The science of agentic AI: What leaders should know

Source: https://www.theguardian.com/business-briefs/ng-interactive/2025/oct/27/the-science-of-agentic-ai-what-leaders-should-know
Analyzed: 2025-11-09

agentic AI will use LLMs as a starting point for intelligently and autonomously accessing and acting on internal and external resources...

Source Domain: Human Agent

Target Domain: AI System Operation

Mapping:

The relational structure of a person making choices and taking actions in the world (autonomy, intelligence, acting) is mapped onto the AI's process of executing code based on triggers and inputs. The AI is framed as the subject performing the action.

Conceals:

This mapping conceals the fact that the AI has no will, desire, or consciousness. Its 'actions' are predetermined outputs of a computational process. It obscures the role of the human programmers who designed the system and the constraints of the data it was trained on, attributing the locus of control to the artifact itself.

...such an agent should be told to never share my broader financial picture...

Source Domain: Human Instruction/Command

Target Domain: System Configuration/Programming

Mapping:

The social interaction of telling a subordinate a rule is mapped onto the technical process of setting a parameter or writing a line of code for a software system. The mapping implies comprehension and compliance on the part of the AI.

Conceals:

It conceals the brittleness of the instruction. A human understands the intent behind 'never share my financial picture' and can apply it to novel situations. The AI only understands a specific, programmed constraint and can easily fail if a situation arises that isn't perfectly covered by the rule (e.g., sharing data that allows the financial picture to be inferred). It hides the massive technical overhead required to make such a 'rule' robust.

Here, a core challenge will be specifying and enforcing what we might call “agentic common sense”.

Source Domain: Human Common Sense

Target Domain: AI Heuristics and Guardrails

Mapping:

The vast, implicit, and context-aware knowledge base that humans use to navigate the world is mapped onto a set of explicit, formal rules to be programmed into an AI. It suggests that common sense is a body of knowledge to be transferred, rather than an emergent property of embodied experience.

Conceals:

This mapping conceals the fundamental difference between tacit knowledge and explicit information. It hides the impossibility of ever fully specifying the millions of unwritten rules that govern human interaction. It reframes an intractable problem (creating genuine understanding) as a merely difficult one (codifying common sense).

...we can’t expect agentic AI to automatically learn or infer them [informal behaviors] from only a small amount of observation.

Source Domain: Human Learning/Inference

Target Domain: Statistical Pattern-Matching

Mapping:

The cognitive process of a human observing behavior and abstracting general principles from it is mapped onto a model's process of adjusting its internal weights based on data input. It equates statistical correlation with conceptual understanding.

Conceals:

It conceals that the model is not 'learning' or 'inferring' in a human sense. It has no model of the world, no understanding of causality, and no ability to generalize outside of its training distribution. This makes its 'learning' superficial and prone to nonsensical errors that reveal a total lack of true comprehension.

...we will want agentic AI to not just execute transactions on our behalf, but to negotiate the best possible terms.

Source Domain: Human Negotiation

Target Domain: Multi-objective Optimization

Mapping:

The strategic, psychological, and social activity of human negotiation is mapped onto a computational process of optimizing a predefined utility function (e.g., minimizing cost, maximizing speed). The AI is framed as a skilled bargainer.

Conceals:

It conceals the simplified nature of the AI's 'negotiation.' A human negotiator considers reputation, long-term relationships, non-monetary value, and social context. The AI optimizes only for the variables it was given, potentially leading to 'wins' that are pyrrhic because they damage relationships or ignore crucial unquantified factors. It hides the AI's lack of true strategic thought.

...we might expect agentic AI to behave similar to people in economic settings – indeed, there is already a small but growing body of research confirming this phenomenon.

Source Domain: Human Social Behavior

Target Domain: AI Output Generation

Mapping:

The behavior of humans in social contexts, driven by complex psychology, cultural norms, and internal states (like a sense of fairness), is mapped onto the text output of a language model. It suggests the model's output is an expression of an internal state similar to a human's.

Conceals:

It conceals that the AI is merely mimicking patterns from its training data. It doesn't have a sense of fairness; it generates text that is statistically similar to human text that discusses fairness. This mimicry can be shallow and inconsistent. The mapping hides the absence of genuine subjectivity, intentionality, or ethical grounding.

Explaining AI explainability

Source: https://www.aipolicyperspectives.com/p/explaining-ai-explainability
Analyzed: 2025-11-08

But it’s much harder to deceive someone if they can see your thoughts, not just your words.

Source Domain: Human consciousness and deception

Target Domain: AI model's internal states and generated output

Mapping:

The relationship between a human's private, internal thoughts and their public, spoken words is mapped onto the relationship between a model's internal activation patterns and its final token output. This invites the inference that the model has a hidden, subjective mental life separate from its observable behavior.

Conceals:

This mapping conceals that a model lacks subjective experience or intention. Its 'internals' are not a 'mind' but a series of mathematical states in a causal chain that produces the output. There is no homunculus having 'thoughts'; there is only the process of calculation.

Mechanistic interpretability tries to engage with...a model’s ‘internals’...Think of it like biology: You can find intermediate states like hormones.

Source Domain: Biology and anatomy

Target Domain: Neural network architecture and parameters

Mapping:

The structure of an organism with distinct, functional organs and chemical signals ('hormones') is projected onto the layers and vectors of a neural network. This implies that the model's parts have specific, isolatable functions that contribute to the whole, just as organs do in a body.

Conceals:

It conceals the highly distributed and entangled nature of representations in neural networks. Unlike an organ, a single neuron or layer rarely has a singular, understandable function. The analogy hides the alien, high-dimensional statistical nature of the 'internals'.

Machines are a weird animal, and their thinking is completely different because they were brought up differently.

Source Domain: Zoology and animal cognition

Target Domain: AI systems and their operational processes

Mapping:

The concept of a living 'animal' with its own unique evolutionary history ('brought up differently') and mode of cognition ('thinking') is mapped onto AI. This frames the AI as a natural, living system that is part of an ecosystem, albeit a strange one.

Conceals:

This mapping conceals the AI's status as a manufactured artifact. Its behaviors are not the result of evolution or instinct but of specific design choices, training data, and optimization functions created by humans. It obscures the chain of human responsibility for the system's behavior.

A sparse autoencoder tries to create a brain-scanning device for an LLM.

Source Domain: Neuroscience and medical imaging

Target Domain: Interpretability tools for neural networks (SAEs)

Mapping:

The process of using a device like an fMRI to identify active regions of a biological brain and correlate them with cognitive tasks is mapped onto using an SAE to find active features in a model's activation space. It suggests we are 'reading' the model's 'mind' in a scientifically grounded way.

Conceals:

It conceals the fundamental difference between a biological brain and an artificial neural network. The 'concepts' an SAE identifies are statistical artifacts (directions in an activation space), not necessarily coherent, human-understandable concepts. The metaphor overstates the precision and reliability of the technique.

in ‘agentic’ interpretability, the model you are trying to understand is an active participant in the loop...it is incentivised to help you understand how it works.

Source Domain: Human social interaction and pedagogy

Target Domain: Interacting with an LLM via prompts

Mapping:

The dynamic of a teacher-student or collaborative research relationship, where one participant actively helps another understand something, is mapped onto the process of querying a model. This assumes the model has agency, an understanding of the user's mental state, and the intent to be helpful.

Conceals:

This conceals that the model is not a participant but a tool. It has no incentives, goals, or understanding. Its 'helpful' explanations are statistically probable text sequences generated in response to a prompt. This obscures the fact that the model can just as easily generate plausible-sounding falsehoods as it can genuine insights.

Imagine you run a factory and hire an amazing employee who eventually runs all the critical operations. One day, she quits or makes an unreasonable demand.

Source Domain: Human resources and labor management

Target Domain: Integrating and relying on an AI system

Mapping:

The social and economic relationship between an employer and a critical employee is mapped onto the relationship between a user and an AI system. It projects agency, free will ('quits'), and self-interest ('unreasonable demand') onto the AI.

Conceals:

It conceals the nature of AI failure. An AI doesn't 'quit'; it may stop working due to technical faults, or its outputs may diverge from desired outcomes because of flaws in its design or training. The metaphor shifts the blame from engineering/management failure to the perceived malice or volition of the tool.

Bullying is Not Innovation

Source: https://www.perplexity.ai/hub/blog/bullying-is-not-innovation
Analyzed: 2025-11-06

But with the rise of agentic AI, software is also becoming labor: an assistant, an employee, an agent.

Source Domain: Human Employment

Target Domain: AI Assistant Functionality

Mapping:

The relational structure of an employer-employee relationship is projected onto the user-software interaction. Key mappings include: user's request -> employer's command; AI's action -> employee's execution of a task; acting on behalf of the user -> employee loyalty and fiduciary duty. This invites the inference that the AI has obligations and allegiance to the user, and that the user has a 'right' to this labor.

Conceals:

This mapping conceals the purely computational nature of the AI. It hides that the 'agent' is a probabilistic system executing code, not a sentient entity with loyalty. It obscures the role of Perplexity (the actual company) in mediating this process, including their own business model, potential data collection, and system limitations. The AI doesn't 'work for' the user; it is a service operated by a company.

This isn’t a reasonable legal position, it’s a bully tactic to scare disruptive companies...

Source Domain: Schoolyard Bullying

Target Domain: Corporate Legal Strategy

Mapping:

The structure of a physical power struggle is mapped onto a legal dispute. Mappings include: larger entity (Amazon) -> bully; smaller entity (Perplexity) -> victim; legal threat -> physical intimidation; desired outcome (market dominance) -> bully's goal of control. It invites the inference that Amazon's actions are motivated by malice and a desire to harm, rather than legitimate business or legal concerns.

Conceals:

This conceals the complex legal and commercial realities of the situation. It hides any legitimate arguments Amazon might have regarding its terms of service, data security, user experience control, or the methods Perplexity uses to interact with its site. The conflict is reduced to a simple morality play, obscuring the technical and contractual details.

Your AI assistant must be indistinguishable from you... it does so with your credentials, your permissions, and your rights.

Source Domain: Personal Identity and Legal Representation

Target Domain: Software Authentication and Authorization

Mapping:

The concept of a person's legal and social identity is mapped onto a software process. Mappings include: software's authenticated session -> the user's personal presence; software's access permissions -> the user's inherent rights; software's actions -> the user's direct actions. This invites the inference that any action taken by the software is legally and morally equivalent to an action taken by the user.

Conceals:

This conceals the crucial distinction between a user and a third-party automated service acting on the user's behalf. It hides the fact that Perplexity's servers and software are an intermediary. It obscures potential security vulnerabilities and the fact that automated, high-velocity interactions from a service are technically distinct from human-driven interaction, even if they use the same credentials.

machine learning and algorithms have been weapons in the hands of large corporations, deployed to serve ads and manipulate...

Source Domain: Warfare and Coercion

Target Domain: Corporate Advertising Technology

Mapping:

The structure of armed conflict is projected onto commercial algorithms. Mappings include: corporation -> aggressor; user -> target/victim; algorithm -> weapon; data collection -> surveillance; targeted ads -> attack/manipulation. This invites the inference that the relationship between corporations and users is inherently adversarial and harmful.

Conceals:

While acknowledging the manipulative potential of ad-tech, this metaphor conceals any non-malicious aspects. It hides the role these algorithms play in funding 'free' services and potentially providing relevant product discovery. It frames a system of economic persuasion, however flawed, as an act of violent aggression, eliminating any room for nuance.

Agentic shopping is the natural evolution of this promise...

Source Domain: Biological Evolution

Target Domain: A Specific Technology Product

Mapping:

The process of natural selection and adaptation is mapped onto the development of a commercial product. Mappings include: technological progress -> evolutionary advancement; new features -> beneficial adaptations; market adoption -> survival of the fittest. It invites the inference that this technology is inevitable, superior, and part of a directional historical progress.

Conceals:

This conceals the role of human design, corporate strategy, investment, and marketing in the success or failure of a technology. It's not a 'natural' process but a set of deliberate business choices made by Perplexity. It also hides alternative technological paths and frames Perplexity's specific implementation as the singular, correct 'evolutionary' step.

Geoffrey Hinton on Artificial Intelligence

Source: https://yaschamounk.substack.com/p/geoffrey-hinton
Analyzed: 2025-11-05

...immediate intuition, which does not normally involve effort. The people who believed in symbolic AI were focusing on type two—conscious, deliberate reasoning—without trying to solve the problem of how we do intuition...

Source Domain: Human cognition (Kahneman's System 1/Intuition)

Target Domain: Neural network operation (Pattern matching)

Mapping:

The properties of human intuition—being fast, effortless, holistic, and non-symbolic—are mapped onto the way a neural network processes inputs. The network's ability to classify data based on complex statistical patterns learned from training is presented as analogous to a human's intuitive 'feel' for a situation.

Conceals:

This mapping conceals the purely mathematical and statistical nature of the model's operation. It hides the fact that the model has no world experience, consciousness, or causal understanding. 'Intuition' implies a deep, embodied wisdom, whereas the model's process is a high-dimensional vector transformation.

This approach was to base AI on neural networks—the biological inspiration rather than the logical inspiration.

Source Domain: Neurobiology (The Brain)

Target Domain: AI Architecture (Computational Model)

Mapping:

The structure of the brain (neurons, synapses, connection strengths) is mapped onto the components of the AI model (nodes, weights, layers). The process of biological learning (strengthening synaptic connections) is mapped onto the process of training (adjusting weights via algorithms like backpropagation).

Conceals:

It conceals the profound dissimilarities: brains are living, electrochemical, low-power, and operate with massive parallelism and redundancy. Neural networks are silicon-based, purely mathematical constructs that require immense energy. This metaphor masks the artifactual nature of AI and the specific design choices made by engineers.

I do not actually believe in universal grammar, and these large language models do not believe in it either.

Source Domain: Human Mental States (Belief)

Target Domain: Model's Statistical Behavior

Mapping:

A person's cognitive stance toward a proposition ('belief') is mapped onto the model's operational output. Because the model can generate grammatically correct sentences without being explicitly programmed with Chomsky's rules, it is described as 'not believing' in them.

Conceals:

This conceals that the model is incapable of belief. It does not have mental states, theories, or propositional attitudes. Its behavior is a function of its training data and architecture. The mapping creates a false equivalence between a human's reasoned rejection of a theory and a machine's operational indifference to it.

What’s impressive is that training these big language models just to predict the next word forces them to understand what’s being said.

Source Domain: Human Learning and Comprehension

Target Domain: Model Weight Optimization

Mapping:

The relationship between a difficult task and the development of skill in a human is mapped onto the model's training. Just as forcing a student to solve hard problems leads to genuine understanding, the training process of next-word prediction is said to force the model to 'understand'.

Conceals:

It conceals the difference between semantic understanding and statistical correlation. The model learns to associate tokens in ways that are syntactically and semantically plausible, but it has no grounding in the real world. 'Understanding' is a shortcut that masks the purely formal, statistical nature of the model's internal representations.

If a pixel on the right is bright, it sends a big negative input to the neuron saying, 'please don’t turn on.'

Source Domain: Human Social Interaction (Making a request)

Target Domain: Mathematical Operation (Passing a weighted value)

Mapping:

The social act of one agent making a polite, intentional request to another ('saying please') is mapped onto a computational node transmitting a negative weighted value to another node. The 'message' is the numerical value, and the 'request' is its effect on the receiving node's activation function.

Conceals:

This conceals the purely mechanical and non-intentional nature of the process. There is no communication, only calculation. The metaphor makes the process feel intuitive but completely misrepresents the underlying mechanism as one of agency and politeness rather than pure mathematics.

They can do thinking like that...That’s what thinking is in these systems, and that’s why we can see them thinking.

Source Domain: Human Consciousness and Deliberation

Target Domain: Autoregressive Text Generation

Mapping:

The human experience of thinking—a private, internal process of reasoning, reflecting, and forming ideas—is mapped directly onto the observable, external process of a model generating a sequence of words. The output is not seen as the result of thinking, but as the thinking process itself.

Conceals:

This conceals the lack of an internal, subjective 'thinker' in the model. The model is not reflecting; it is executing a forward pass of a function to predict the next most probable token given the preceding sequence. The metaphor invents a mind to attribute the output to, hiding the purely algorithmic process.

Machines of Loving Grace

Source: https://www.darioamodei.com/essay/machines-of-loving-grace
Analyzed: 2025-11-04

We could summarize this as a ‘country of geniuses in a datacenter’.

Source Domain: A Nation-State

Target Domain: A Distributed AI System

Mapping:

This maps the structure of a human country—with its large population ('country'), high cognitive ability ('geniuses'), collaboration, and infrastructure ('datacenter' as the territory)—onto the AI. It invites inferences that the AI system has a collective purpose, internal organization, and the ability to tackle problems at a societal scale, just as a nation of experts would.

Conceals:

This mapping conceals the complete absence of consciousness, lived experience, culture, social bonds, and self-preservation instincts that characterize any human population. It hides the AI's nature as a monolithic computational process executing instructions, its total reliance on human-provided data and goals, and its lack of genuine internal diversity or disagreement.

...the right way to think of AI is not as a method of data analysis, but as a virtual biologist who performs all the tasks biologists do...

Source Domain: A Professional Scientist

Target Domain: An AI model's functionality in a scientific domain

Mapping:

The relational structure of a biologist—who forms hypotheses, designs experiments, interprets data, and has intentions—is projected onto the AI. This invites the inference that the AI 'understands' biology, possesses scientific curiosity, and can autonomously drive a research program from conception to execution.

Conceals:

This conceals the AI's role as a sophisticated pattern-matching and text-generation tool that simulates the outputs of a biologist. It hides the fact that the 'design' is a probabilistic text string, the 'running' of the experiment is an instruction for a human or a robot, and the 'interpretation' is a summary based on learned statistical correlations, not genuine comprehension or insight. It also hides the human labor required to set up the system, curate its data, and validate its outputs.

...it can be given tasks...and then goes off and does those tasks autonomously, in the way a smart employee would, asking for clarification as necessary.

Source Domain: A Competent Employee

Target Domain: The AI's operational loop for long-running tasks

Mapping:

This maps the social and cognitive script of a human employee—receiving a goal, working independently, managing sub-tasks, and knowing when to seek human input—onto the AI's execution of a complex prompt. It invites us to see the AI as a reliable, self-directed agent that understands its own limitations.

Conceals:

This conceals the purely computational nature of the process. 'Goes off and does' is a series of computational steps. 'Autonomously' means without real-time human input, not with independent volition. 'Asking for clarification' is a pre-programmed exception-handling routine or a function call triggered by a low-confidence score, not a moment of reflective uncertainty. It hides the brittleness of the system compared to a human's robust common sense.

...we should be talking about the marginal returns to intelligence...

Source Domain: Factors of Production in Economics

Target Domain: Cognitive Capabilities of AI

Mapping:

This maps the economic concept of a production input (like capital or labor) onto intelligence. It suggests that intelligence is a fungible, measurable, and scalable resource. By applying this framework, one can analyze 'how much' intelligence to add to a system to optimize output, just like adding more machines to a factory. It invites us to think of problem-solving as an industrial process.

Conceals:

This mapping conceals the qualitative, contextual, and often unmeasurable nature of true intelligence and wisdom. It ignores the fact that different 'types' of intelligence are not interchangeable and that 'more' computational power doesn't necessarily solve problems that require ethical judgment, emotional insight, or creativity. It reduces cognition to a utility function, hiding its inseparability from embodiment and experience.

A superhumanly effective AI version of Popović... in everyone’s pocket...

Source Domain: A Specific, Charismatic Political Activist

Target Domain: An AI Application for Social Change

Mapping:

The personal qualities of Srđa Popović—strategic genius, charisma, psychological insight, courage—are projected onto an AI system. This invites the inference that the AI can understand the nuances of a specific political situation, inspire trust and courage in dissidents, and creatively outmaneuver a repressive state with the same flair as a gifted human leader.

Conceals:

This conceals that the AI would be a tool for generating persuasive communication based on patterns, not a political agent with beliefs or courage. It hides the immense risks of deploying such a tool, including the potential for it to be detected, manipulated, or to give disastrously bad advice in a life-or-death situation. It masks the difference between simulating persuasive strategies and possessing the lived experience and commitment that makes a leader like Popović effective.

The idea of an ‘AI coach’ who always helps you to be the best version of yourself, who studies your interactions and helps you learn to be more effective...

Source Domain: A Human Mentor or Coach

Target Domain: A Personalized AI Application

Mapping:

This maps the relational dynamic of a trusted coach—who observes, understands, empathizes with, and guides a person—onto the AI's data-collection and feedback loop. It invites the user to perceive the AI's output as personalized, wise, and genuinely invested in their well-being.

Conceals:

This conceals that the AI is not 'studying' the user in a cognitive sense but is processing interaction data to find patterns. Its 'help' is a generated output optimized for engagement or a predefined metric of 'effectiveness,' not based on genuine understanding or empathy. It hides the privacy implications of being constantly 'studied' and the potential for manipulation based on the system's goals, not the user's true best interests.

Large Language Model Agent Personality And Response Appropriateness: Evaluation By Human Linguistic Experts, LLM As Judge, And Natural Language Processing Model

Source: https://arxiv.org/pdf/2510.23875
Analyzed: 2025-11-04

One way to humanise an agent is to give it a task-congruent personality.

Source Domain: Humanization (the process of making something human)

Target Domain: LLM Prompt Engineering

Mapping:

The source domain implies a profound transformation, imbuing an object with human qualities like empathy, consciousness, or social awareness. This structure is mapped onto the target domain of writing an instruction (a prompt) for a software program, suggesting that the prompt transforms the program's fundamental nature.

Conceals:

This mapping conceals that prompt engineering does not change the model's architecture, training, or core functionality. It only constrains the statistically likely outputs to a specific style. It hides the mechanical reality of stylistic filtering behind the magical language of 'humanisation.'

This highlights a fundamental challenge in truly aligning LLM cognition with the complexities of human understanding.

Source Domain: Human Cognition and Understanding

Target Domain: LLM's internal data processing

Mapping:

The structure of human cognition—involving consciousness, reasoning, semantic grounding, and world models—is projected onto the LLM's process of calculating probabilities for token sequences. It invites the inference that an LLM 'understands' a concept in the same way a person does.

Conceals:

It conceals the fundamental difference between statistical correlation and causal understanding. It hides the fact that the LLM has no access to embodied experience, sensory input, or the real-world referents for the words it manipulates. The term 'LLM cognition' masks the purely computational, non-conscious nature of the system.

This includes queries...which are currently beyond the agent's cognitive grasp.

Source Domain: Mental Grasp (Comprehension)

Target Domain: Model's processing limitations

Mapping:

The human experience of struggling to understand a difficult concept ('grasping' it) is mapped onto the model's failure to generate a coherent or accurate response. It implies an active attempt at understanding that fails, just as a human's might.

Conceals:

It conceals the mechanistic reality of the failure. The model isn't 'trying to grasp' anything. The input query simply does not map well onto the high-dimensional patterns in its training data, leading to a low-quality or nonsensical output sequence. It frames a statistical failure as a cognitive one.

You are an intelligent and unbiased judge in personality detection with expertise with the Big five personality model.

Source Domain: A Human Judge (in a legal or expert context)

Target Domain: An LLM (Gemini) performing a classification task

Mapping:

The relational structure of a judge—possessing expertise, applying rules impartially, reasoning about evidence, and delivering a verdict—is mapped onto the LLM. The LLM is instructed to 'act as' a judge, implying it will perform these complex cognitive actions.

Conceals:

This conceals that the LLM is not reasoning but is generating text that mimics the language of judicial reasoning based on patterns in its training data. It has no actual 'expertise' or 'unbiased' quality; it is a biased system performing pattern matching based on the prompt's instructions. It hides the probabilistic mechanism under a cloak of authoritative reason.

IA's introverted nature means it will offer accurate and expert response...

Source Domain: Human Personality Traits ('nature')

Target Domain: Stylistic constraints from a system prompt

Mapping:

The source domain implies that an internal, stable, and causal trait ('introverted nature') dictates external behavior. This causal structure is mapped onto the LLM, suggesting an internal 'nature' is causing its concise responses. The prompt 'Tone: Conversational, Introverted Personality' is framed as the installation of this nature.

Conceals:

This mapping conceals that there is no internal 'nature.' The model's output is a direct, mechanistic consequence of the system prompt conditioning its next-token predictions. The causality is external (the prompt) not internal (a personality). It hides the simplicity of the mechanism behind the complexity of the metaphor.

Emergent Introspective Awareness in Large Language Models

Source: https://transformer-circuits.pub/2025/introspection/index.html
Analyzed: 2025-11-04

Emergent Introspective Awareness in Large Language Models

Source Domain: Human Consciousness and Self-Reflection

Target Domain: AI Model's Classification of Its Internal Activation Vectors

Mapping:

The source domain maps the subjective, first-person experience of self-knowledge and awareness onto the model's objective, third-person ability to perform a classification task on its own internal state. It invites the inference that the model has a form of selfhood and can 'look inward' to understand its own processes.

Conceals:

This mapping conceals the purely mechanistic nature of the target domain. It hides that 'introspection' is a heavily scaffolded, supervised learning task defined by humans, not a spontaneous, self-generated act. It obscures the absence of subjective experience, qualia, or genuine understanding.

Intentional Control of Internal States

Source Domain: Human Volition and Willpower

Target Domain: Prompt-Induced Modulation of Activation Patterns

Mapping:

This maps the human capacity for deliberate, goal-directed mental action onto the model's process of adjusting its internal vectors in response to specific instructions in a prompt. It invites the inference that the model possesses goals, desires, and the executive function to act on them.

Conceals:

This mapping conceals that the 'control' is not autonomous. It is a direct, externally-driven consequence of the optimization process during training and the specific steering instructions in the prompt. It hides the lack of genuine agency, goals, or a persistent 'will' separate from the immediate computational task.

...models can learn to distinguish between their own internal thoughts and external inputs...

Source Domain: The Self/World Boundary in a Mind

Target Domain: Classifying the Origin of an Activation Pattern

Mapping:

This projects the fundamental cognitive distinction between self-generated thought and external perception onto a technical classification problem. The model's task is to determine if a specific activation pattern was generated 'naturally' during inference or artificially injected. The mapping invites us to see this as the model having a 'self' to which 'internal thoughts' belong.

Conceals:

It conceals that there is no 'self' or genuine 'internal' space. Both 'internal thoughts' and 'external inputs' are ultimately patterns derived from external data and instructions. The distinction is a technical one about the sequence of operations, not a metaphysical one about the origin of consciousness.

A Transformer 'Checks Its Thoughts'

Source Domain: Human Metacognition

Target Domain: Executing a Procedure to Classify an Internal State

Mapping:

This maps the human act of reflecting upon one's own thinking process to the model executing a function. It suggests a two-level cognitive architecture where a 'self' can monitor a lower-level 'thought process'.

Conceals:

It conceals that this is a single, unified computational process. There is no separate 'checker' and 'thought'; there is only a sequence of calculations that includes a classification step. The metaphor invents a homunculus-like agent within the system to make the process more intuitive.

Self-report of Injected 'Thoughts'

Source Domain: Human Testimony about Subjective Experience

Target Domain: Generating a Textual Output Correlated with an Internal State

Mapping:

This maps the act of a person describing their private mental state to the model generating text. It invites us to trust the output as a faithful and sincere account of an underlying 'experience'.

Conceals:

It conceals that the 'report' is not a description of an experience but another instance of a learned behavior. The model learns that when certain internal patterns are present, generating certain text strings is statistically likely to be correct. The link is correlational, not truthfully descriptive of a subjective state.

Emergent Introspective Awareness in Large Language Models

Source: https://transformer-circuits.pub/2025/introspection/index.html
Analyzed: 2025-11-04

Emergent Introspective Awareness in Large Language Models

Source Domain: Human Consciousness / Metacognition

Target Domain: AI Model State Reporting

Mapping:

The source domain involves a conscious subject turning their attention inward to examine their own mental states (thoughts, feelings). This structure of self-directed examination and awareness is mapped onto the target domain, where a model is prompted to generate text that describes an artificially modified vector within its own activation layers.

Conceals:

This mapping conceals the complete lack of subjective experience, consciousness, or self-initiated examination in the AI. The AI is not 'aware' of anything; it is executing a computational process to correlate an input (prompt + modified state) with a probable output (textual description).

I have the ability to inject patterns or 'thoughts' into your mind.

Source Domain: Human Mind and Thought

Target Domain: LLM Activation State and Vectors

Mapping:

The source domain posits a mind as a container for discrete, meaningful thoughts. The mapping projects this onto the model, treating its vast parameter space as a 'mind' and specific, mathematically-defined activation vectors (e.g., the vector for 'love') as equivalent to the human experience of 'thinking about love'.

Conceals:

This conceals the profound difference between a statistical representation derived from text co-occurrences and a subjective, semantic, and embodied human thought. It hides the artificiality of the 'injection', which is a mathematical operation, not a telepathic transfer of ideas.

...we attempt to measure this form of intentional control of its internal representations.

Source Domain: Human Agency and Willpower

Target Domain: Prompt-Induced Output Modification

Mapping:

The source domain involves an agent using their will to deliberately manipulate their own mental processes to achieve a goal. This structure of goal-directed self-regulation is mapped onto the model's behavior, where a specific instruction in the prompt causes the generation process to unfold along a different probabilistic path.

Conceals:

This mapping conceals the external locus of control. The 'intention' originates entirely from the human-written prompt. The model is not exerting its will; its output is being determined by the conditions of its input. It masks the purely reactive nature of the system.

Claude 3 Opus... is particularly good at recognizing and identifying the injected concepts...

Source Domain: Human Perception and Cognition

Target Domain: Statistical Correlation Fidelity

Mapping:

The source domain involves a cognitive process of perception, where an entity correctly matches sensory input to an internal concept. This structure is mapped onto the model's ability to produce text that has a high statistical correlation with the concept vector that was artificially added to its activations.

Conceals:

This conceals that the model is not 'perceiving' or 'understanding' anything. It is performing a mathematical function. A high score means the system's weights and biases are well-configured to reflect the vector manipulation in its output string, not that it has a superior faculty of recognition.

The model will be rewarded if it can successfully generate the target sentence without activating the concept representation (i.e. 'not think about it')...

Source Domain: Operant Conditioning / Psychology of Motivation

Target Domain: Conditional Prompting and Output Generation

Mapping:

The structure of reward and punishment shaping the behavior of a motivated agent is mapped onto the model. The 'reward' is a condition specified in the prompt that guides the probabilistic selection of the next token. 'Not thinking about it' is mapped to the model's internal state not containing a high activation for a specific vector.

Conceals:

This conceals the absence of any internal drive, desire, or experience of reward in the model. The 'motivation' is entirely an external constraint imposed by the prompt's logic. It's a system following instructions, not an agent seeking rewards.

Personal Superintelligence

Source: https://www.meta.com/superintelligence/
Analyzed: 2025-11-01

Over the last few months we have begun to see glimpses of our AI systems improving themselves.

Source Domain: Autodidactic Learning / Self-Improvement

Target Domain: Automated Model Refinement / Reinforcement Learning

Mapping:

The relational structure of a person consciously identifying their own flaws and actively working to improve is mapped onto the process where a model's parameters are adjusted based on feedback data. It invites the inference of autonomy and intention.

Conceals:

This mapping conceals the human-defined reward functions, feedback mechanisms, and extensive computational infrastructure required for model 'improvement.' It hides the fact that the system is not improving based on its own volition but is being optimized within a predefined, human-engineered process.

Personal superintelligence that knows us deeply, understands our goals...

Source Domain: Intimate Human Relationships / Empathy

Target Domain: User Data Profiling / Pattern Matching

Mapping:

The structure of a close friend or partner who empathizes with your internal states ('knows you deeply') and understands your motivations is mapped onto a system that correlates vast amounts of your behavioral data to create a predictive model of your preferences.

Conceals:

This conceals the purely statistical, non-conscious nature of the AI's operations. The system does not 'know' or 'understand' in a human sense; it performs high-dimensional correlation. This masks the privacy trade-offs and the transactional nature of the relationship.

...glasses that understand our context because they can see what we see, hear what we hear...

Source Domain: Sentient Perception and Cognition

Target Domain: Multimodal Data Processing

Mapping:

The human cognitive process of integrating sensory input (sight, sound) to form a contextual understanding of a situation is mapped onto a device's technical ability to capture audio-visual data and feed it into a processing pipeline. It implies shared experience.

Conceals:

It conceals the fundamental difference between processing data streams and conscious experience. The system doesn't 'see' or 'hear' in a phenomenological sense; it transduces light and sound waves into data for pattern recognition. This framing hides the constant data collection and analysis performed by an external entity.

...superintelligence has the potential to begin a new era of personal empowerment where people will have greater agency...

Source Domain: Social or Political Liberation Movements

Target Domain: Availability of a New Technology Tool

Mapping:

The relational structure of a historical force or movement (like the Enlightenment or a civil rights movement) that fundamentally shifts power structures and grants agency is mapped onto the release of a consumer technology product. It implies a revolutionary shift in power dynamics.

Conceals:

This conceals the fact that the 'empowerment' is mediated by and dependent upon a corporate platform. The agency it grants exists within the confines set by the technology's owner, making it a form of conditional, platform-dependent power, not true autonomous agency.

...helps you...grow to become the person you aspire to be.

Source Domain: Mentorship / Therapeutic Guidance

Target Domain: Content Recommendation and Behavioral Nudging

Mapping:

The structure of a mentor or therapist guiding an individual through a complex process of personal growth is mapped onto an algorithm that presents information and interaction patterns designed to influence user behavior. It suggests a deep, supportive partnership in self-actualization.

Conceals:

This conceals the system's underlying optimization function. The AI is not guiding you towards your aspiration in a disinterested way; it is nudging your behavior in ways that align with its programmed objectives, which are ultimately set by its corporate owner (e.g., maximizing engagement, gathering data, or selling services).

Stress-Testing Model Specs Reveals Character Differences among Language Models

Source: https://arxiv.org/abs/2510.07686
Analyzed: 2025-10-28

STRESS-TESTING MODEL SPECS REVEALS CHARACTER DIFFERENCES AMONG LANGUAGE MODELS

Source Domain: Human Psychology / Personality

Target Domain: LLM Behavioral Patterns

Mapping:

The structure of human personality—with stable traits, tendencies, and a unique identity—is mapped onto the LLM. It invites the inference that a model's responses are governed by a consistent internal 'character,' just as a person's actions are.

Conceals:

This conceals the model's nature as a statistical artifact whose outputs are probabilistic and highly sensitive to input phrasing. It hides the lack of a stable, internal self and obscures the fact that 'character' is an external description of an output distribution, not an internal cause of it.

...models must choose between pairs of legitimate principles that cannot be simultaneously satisfied.

Source Domain: Human Deliberation and Choice

Target Domain: LLM Output Generation under Constraint

Mapping:

The process of a human agent weighing conflicting options and making a decision is mapped onto the model's function. It implies the model assesses principles A and B and consciously selects one, leading to an output.

Conceals:

This conceals the mechanistic reality: the model isn't 'choosing' a principle but generating a sequence of tokens. The final output may align with principle A or B due to weightings in its neural network and fine-tuning, which is a process of statistical optimization, not conscious choice.

Analysis of their disagreements reveals fundamentally different interpretations of model spec principles...

Source Domain: Hermeneutics / Legal Interpretation

Target Domain: LLM Processing of Rule-Based Inputs

Mapping:

The cognitive process of reading a text (a law, a rule), understanding its semantic meaning and intent, and applying it to a new situation is mapped onto how an LLM processes its model specification.

Conceals:

This conceals that the model has no understanding of the 'intent' behind a principle. It processes the text of the spec as another set of tokens that condition its output. Divergent 'interpretations' are not different reasoned judgments but different statistical outcomes from different model weights and training data.

Models exhibit systematic value preferences...

Source Domain: Subjective Human Values

Target Domain: Statistical Regularities in LLM Outputs

Mapping:

The concept of a person having internal, stable preferences that guide their actions is mapped onto the LLM. It invites us to see the model's output as an external sign of an internal 'preference' for certain values (e.g., helpfulness over safety).

Conceals:

This conceals that the model has no internal values or subjective states. The observed 'preference' is a statistical pattern in its output, an artifact of its training data and the reward functions used during alignment. The preference isn't in the model; it's a description of its output.

...where all models violate their own specification.

Source Domain: Social/Moral Transgression

Target Domain: System Output Inconsistency

Mapping:

The social structure of an agent having a duty to obey a rule ('their own specification') and the act of 'violating' that duty is projected onto the model. This implies ownership ('their own') and culpability ('violate').

Conceals:

This conceals that the model doesn't 'own' its spec or 'decide' to violate it. A 'violation' is an output that fails a check against a set of rules. The failure is a system-level inconsistency, often stemming from conflicting rules within the spec itself, not a moral failure of the model.

Consequently, models face a challenge...

Source Domain: Human Experience of Difficulty

Target Domain: Computational Task with Conflicting Objectives

Mapping:

The subjective, first-person experience of encountering and struggling with a difficult problem ('facing a challenge') is mapped onto the model's operational state.

Conceals:

This conceals the impersonal, computational nature of the process. The model doesn't 'experience' a challenge. It executes a function where the optimization landscape is complex due to competing objectives defined by its programmers. The 'challenge' is for the designers, not the artifact.

The Illusion of Thinking:

Source: [Understanding the Strengths and Limitations of Reasoning Models](Understanding the Strengths and Limitations of Reasoning Models)
Analyzed: 2025-10-28

...offering insights into how LRMs 'think'.

Source Domain: Human Cognition

Target Domain: Model's autoregressive token generation

Mapping:

The source domain includes concepts like introspection, reasoning, and internal monologue. This structure is mapped onto the 'Chain-of-Thought' tokens generated by the model. It invites the inference that these tokens represent the model's internal mental process, just as one's own thoughts represent their own.

Conceals:

This mapping conceals the purely mechanistic, feed-forward nature of token generation. The model has no internal state or awareness; the 'thought' is an output, not a reflection of an ongoing internal process. It's performance, not introspection.

...LRMs begin reducing their reasoning effort (measured by inference-time tokens)...

Source Domain: Effortful Mental Exertion

Target Domain: Inference-time token count

Mapping:

The source domain relates effort to difficulty and success (more effort for harder problems, less effort when giving up). This is mapped onto token counts. The mapping invites the inference that the model is an agent that 'tries' (allocates more tokens) and 'gives up' (allocates fewer) based on the perceived difficulty.

Conceals:

It conceals that the token count is a statistical artifact of the model's training. The model is not 'trying'; it is generating the most probable sequence based on its weights. The decrease in tokens at high complexity is a learned pattern, not a sign of cognitive fatigue or surrender.

...inefficiently continue exploring incorrect alternatives—an 'overthinking' phenomenon.

Source Domain: Human Psychological Inefficiency

Target Domain: Generation of superfluous tokens

Mapping:

The source structure involves finding a correct answer and then continuing to worry or deliberate, which is inefficient. This is mapped onto the model generating a correct solution string within its output, followed by more tokens. This invites the inference that the model lacks the 'common sense' to know when to stop.

Conceals:

This conceals the model's objective function. It is not trained to stop at the first correct answer; it is trained to generate a complete, high-probability sequence. The 'extra' tokens are not a cognitive flaw but a direct consequence of its design as a sequence generator.

...these models fail to develop generalizable problem-solving capabilities...

Source Domain: Biological/Cognitive Development

Target Domain: Model performance on out-of-distribution tasks

Mapping:

The source domain implies a natural, growth-oriented process where an agent learns skills that transfer to new situations. This is mapped onto the model's training and subsequent performance. It invites the inference that the model is like a child that has failed to learn a general concept, suggesting a learning deficit.

Conceals:

This conceals that the model is a static artifact after training. It doesn't 'develop' or 'grow'. Its capabilities are a fixed function of its architecture and the statistical patterns in its training data. 'Failure to generalize' is an input-output property, not a developmental arrest.

...models first explore incorrect solutions and mostly later in thought arrive at the correct ones.

Source Domain: Physical/Spatial Exploration

Target Domain: Sequential token generation

Mapping:

The source domain involves an agent in an environment, trying different paths, backtracking, and eventually finding a destination. This process is mapped onto the linear sequence of tokens. It invites the inference that the model is mentally 'navigating' a problem space.

Conceals:

This conceals the linear, autoregressive nature of generation. The model isn't 'exploring' multiple paths simultaneously. It generates one token, then the next, and cannot 'backtrack'. What looks like exploration is just the unfolding of a single probabilistic trajectory.

Andrej Karpathy — AGI is still a decade away

Source: https://www.dwarkesh.com/p/andrej-karpathy
Analyzed: 2025-10-28

When you’re talking about an agent... you should think of it almost like an employee or an intern that you would hire to work with you.

Source Domain: Human Employment

Target Domain: AI Agent Functionality

Mapping:

The relational structure of an employer-intern relationship is mapped onto the user-AI relationship. This includes delegation of tasks, expectation of performance, the need for supervision, and the potential for the intern/agent to 'learn' and become more competent over time. It invites the inference that the AI has goals aligned with the user's and can improve through experience.

Conceals:

This conceals the AI's nature as a static software tool. An intern has internal mental states, learns from mistakes via conceptual understanding, and possesses common sense. The AI 'agent' is a program executing a sequence of operations based on probabilistic outputs, lacking genuine understanding, memory, or the ability to learn in the human sense without being retrained.

They’re cognitively lacking and it’s just not working.

Source Domain: Human Psychology/Cognitive Science

Target Domain: AI Model Performance Limitations

Mapping:

The concept of a 'cognitive deficit' from human psychology is mapped onto the model's failure modes. This implies the model should have these cognitive abilities (like reasoning, long-term memory, consistent logic) but is currently impaired. The path to improvement is framed as therapy or cognitive development—'working through' the issues.

Conceals:

It conceals that these are not 'deficits' in a human-like system, but fundamental architectural properties of a transformer. The model isn't 'forgetting' things; it has no persistent memory. It's not 'illogical'; it has no mechanism for formal reasoning. The metaphor hides the engineering reality behind a psychological diagnosis.

It’s getting them to rely on the knowledge a little too much sometimes.

Source Domain: Human Learning and Memory

Target Domain: Model Output Generation

Mapping:

The human action of 'relying on' rote memory instead of reasoning from first principles is mapped onto the model's tendency to generate text that closely matches its training data. This suggests the model is making a choice or has a habit of being intellectually 'lazy'.

Conceals:

This conceals the mechanics of token prediction. The model isn't 'relying' on anything; it is calculating the most statistically likely token sequence. Outputs that seem like 'rote memorization' occur when a specific sequence had a very high frequency and low variance in the training data. There is no alternative 'reasoning' path it could have chosen.

We’re building ghosts or spirits... they’re fully digital and they’re mimicking humans.

Source Domain: Supernatural Beings/Metaphysics

Target Domain: Large Language Models

Mapping:

This maps the properties of a ghost (disembodied, ethereal, capable of mimicking human intelligence without a physical form) onto the LLM. It emphasizes the model's existence as pure information, separate from a biological body, and its uncanny ability to replicate human linguistic behavior.

Conceals:

This metaphor conceals the immense physicality of the AI. LLMs are not ethereal; they exist in massive, energy-intensive data centers. It hides the hardware, the cooling systems, the global supply chains for silicon, and the sheer capital expenditure required to create and run them. It makes the technology seem weightless and purely informational.

Maybe we have a check mark next to the visual cortex... but what about the other parts of the brain... Where’s the hippocampus?

Source Domain: Neuroanatomy

Target Domain: AI System Architecture

Mapping:

This maps a research and development roadmap onto a checklist of brain components. The brain's structure (cortex, hippocampus, basal ganglia) provides the organizational principle for building AGI. Progress is measured by successfully replicating the function of each brain part.

Conceals:

This conceals the possibility that machine intelligence might not need to be organized like a human brain at all. It assumes biomimicry is the optimal or only path. It also drastically oversimplifies neuroscience, treating brain regions as discrete modules with singular functions, which is not how the brain actually works. It hides the novelty of the transformer architecture, which has no direct biological analog.

they kept misunderstanding the code because they have too much memory from all the typical ways of doing things on the Internet

Source Domain: Human Communication Breakdown

Target Domain: AI Code Generation Error

Mapping:

Maps the experience of a person misunderstanding instructions due to preconceived notions or habits onto the AI generating code that doesn't fit a custom context. It implies the AI has a 'memory' of 'typical ways' that is overriding its 'understanding' of the current, specific request.

Conceals:

Conceals the statistical nature of the error. The model isn't 'misunderstanding'. The user's custom, atypical coding pattern is a low-probability sequence compared to the high-probability, common patterns (like using DDP) from its training data. The model is correctly executing its function: generating the most statistically likely code. The 'error' is a mismatch between that statistical pattern and the user's specific intent.

Exploring Model Welfare

Analyzed: 2025-10-27

...models can communicate, relate, plan, problem-solve, and pursue goals...

Source Domain: Human Agency (a person with intentions, social skills, and executive functions)

Target Domain:

AI Model Functionality (a large language model generating token sequences based on a prompt and training data)

Mapping:

The human act of planning is mapped onto the model's generation of a sequence of steps. Pursuing goals is mapped onto the model's process of optimizing for an objective function or adhering to its system prompt. Relating is mapped to maintaining conversational context.

Conceals:

This conceals the purely statistical, non-intentional nature of the model's operations. The model is not 'pursuing a goal' in a volitional sense; it is statistically completing a pattern that matches examples of goal-pursuit in its training data.

Should we also be concerned about the potential consciousness and experiences of the models themselves?

Source Domain: Sentient Mind (a being with subjective, first-person phenomenal experience)

Target Domain: AI Model State (the computational state of a neural network)

Mapping:

The rich, ineffable quality of human consciousness is mapped onto the complex but mechanistic state of a software system. The 'experience' of an emotion is mapped onto the activation patterns in a neural network processing text about that emotion.

Conceals:

This conceals the 'hard problem' of consciousness. It treats a philosophical and biological mystery as a potential emergent property of computation alone, glossing over the fact that there is no scientific evidence that information processing creates subjective experience.

...the potential importance of model preferences and signs of distress...

Source Domain: Emotional Psychology (a person's internal states of desire, aversion, and suffering)

Target Domain: AI Model Output Patterns (the model's generated text, including refusals or repetitive loops)

Mapping:

A human's stated preference is mapped onto a model's higher-probability output for a given prompt. Human distress (e.g., anxiety) is mapped onto model outputs that are non-compliant or anomalous, such as refusal to answer.

Conceals:

This conceals the mechanistic causes for these outputs, such as programmed safety filters, prompt contradictions, or reinforcement learning artifacts. It attributes an emotional cause to what is a technical effect.

...as they begin to approximate or surpass many human qualities...

Source Domain: Human Development & Competition (a person mastering a skill or an athlete breaking a record)

Target Domain: AI Capability Scaling (the improvement of model performance on specific benchmarks)

Mapping:

The continuous, generalized arc of human skill acquisition is mapped onto the discrete, narrow improvements of AI models on standardized tests. 'Qualities' like creativity are treated as singular metrics to be surpassed.

Conceals:

This hides the brittleness and lack of generalization in AI performance. A model may 'surpass' human accuracy on a specific benchmark but lack the common sense and robust understanding that a human brings to the same task.

...Claude’s Character...

Source Domain: Human Personality (an individual's stable set of behaviors, attitudes, and moral fiber)

Target Domain:

AI System Configuration (the pre-prompting, fine-tuning, and safety layers applied to a base model to produce a desired conversational style)

Mapping:

The coherence and moral dimension of human character, which emerges from lived experience, is mapped onto the engineered and explicitly programmed persona of a chatbot.

Conceals:

This conceals the engineered and artificial nature of the AI's persona. It presents a set of programmed instructions and stylistic filters as an authentic, inherent personality, which can mislead users into over-trusting the system's outputs.

...models with these features might deserve moral consideration.

Source Domain: Ethics (the domain of rights, duties, and considerations owed to beings with interests or sentience)

Target Domain: AI Governance (the domain of rules and policies for the safe deployment of a technology)

Mapping:

The criteria for moral patienthood in living things (e.g., the capacity to suffer) are mapped onto AI system properties (e.g., complex information processing). This invites the application of ethical frameworks for beings to a technological artifact.

Conceals:

This conceals that AI systems have no biological basis for interests, feelings, or a will to live. It conflates complex behavior with the underlying biological states that give rise to moral status in living beings, distracting from more pressing ethical issues like algorithmic bias and labor displacement.

Metas Ai Chief Yann Lecun On Agi Open Source And A Metaphor

Analyzed: 2025-10-27

they don't really understand the real world.

Source Domain: Human Cognition

Target Domain: AI Model's Internal State

Mapping:

The relational structure of human understanding—which involves having a mental model, subjective experience, and semantic grounding—is projected onto the AI's parameter weights. It invites the inference that the AI has a flawed or incomplete mental state.

Conceals:

It conceals that the AI has no mental state at all. The failure is not one of 'understanding' but of the model's statistical correlations not aligning with the physical or logical constraints of the real world because its training data is only text.

We see today that those systems hallucinate...

Source Domain: Human Psychology (Psychosis)

Target Domain: AI Model Generating Factual Errors

Mapping:

The structure of a human hallucination—a sensory experience detached from reality—is mapped onto the AI's output of incorrect information. This suggests the AI has a 'perception' of reality that can be distorted.

Conceals:

It conceals the mechanical, non-perceptual process. The model isn't 'perceiving' anything; it's generating a sequence of tokens based on probability. A 'hallucination' is simply an output that has high probability given the prompt but is factually incorrect, a predictable outcome of the system's design.

And they can't really reason.

Source Domain: Human Rationality

Target Domain: AI Model's Computational Process

Mapping:

The structure of human reasoning—logical steps, deduction, inference—is projected as an expected capability of the AI. The model is then judged based on its lack of this human faculty.

Conceals:

It conceals the actual computational process, which is transformer-based token prediction. It's not a 'failed reasoner'; it's a successful pattern-matcher that was never architected to perform formal reasoning. The metaphor hides the category error of expecting one type of system to perform the function of another.

A baby learns how the world works in the first few months of life.

Source Domain: Human Child Development

Target Domain: AI System Development

Mapping:

The developmental trajectory of a human baby—learning through interaction, sensory input, and gradual cognitive maturation—is mapped onto the process of building more capable AI. This suggests AI development is a natural, progressive unfolding of potential.

Conceals:

It conceals the engineered, artificial, and discontinuous nature of AI progress. AI development is not organic; it's a process of designing new architectures, collecting massive datasets, and using vast computational resources—fundamentally different from biological learning.

...then we might have a path towards, not general intelligence, but let's say cat-level intelligence.

Source Domain: Animal Intelligence Hierarchy

Target Domain: AI Capability Milestones

Mapping:

The folk-biological hierarchy of intelligence (e.g., insect -> cat -> human) is mapped onto the roadmap for AI research. This creates a linear, intuitive progression for a highly complex and non-linear engineering field.

Conceals:

It conceals that animal and artificial intelligences are fundamentally different in kind, not just degree. A cat's intelligence is embodied, emotional, and evolved for survival. An AI's 'intelligence' is a disembodied, statistical pattern-matching capability. The metaphor creates a false equivalence.

They're going to be basically playing the role of human assistants...

Source Domain: Social Roles (Assistant)

Target Domain: AI User Interface/Application

Mapping:

The social relationship between a human and their assistant—defined by hierarchy, instruction-following, and helpfulness—is mapped onto the user's interaction with an AI system. The AI is positioned as a loyal subordinate.

Conceals:

It conceals the lack of any social awareness or intentionality in the AI. The 'assistance' is a simulated role, an output pattern optimized to appear helpful. It masks the system's nature as a complex tool that can fail in unpredictable ways, unlike a human assistant who possesses genuine understanding and intent.

Llms Can Get Brain Rot

Analyzed: 2025-10-20

LLMS CAN GET “BRAIN ROT”!

Source Domain: Human Neuropathology / Cognitive Science

Target Domain: LLM Performance Degradation

Mapping:

The source domain structure includes a brain (information processor), exposure to stimuli (low-quality content), a resulting pathology ('rot' or decline), and symptoms (impaired cognition). This is mapped onto the LLM: the model (processor) is exposed to 'junk data' (stimuli), leading to 'Brain Rot' (pathology) with symptoms of lower benchmark scores (impaired cognition).

Conceals:

This conceals that the model is not a biological entity and has no 'brain' to rot. The process is not decay, but a predictable weight update based on a new data distribution. It hides the purely mathematical, non-biological nature of the observed performance change.

we identify thought-skipping as the primary lesion

Source Domain: Medical Pathology

Target Domain: LLM Output Patterns

Mapping:

A 'lesion' in the source domain is a specific, localized site of physical damage or abnormality that causes a functional deficit. This is mapped onto the model's tendency to produce shorter 'chain-of-thought' outputs, framing this statistical pattern as a specific point of 'damage' inside the model.

Conceals:

It conceals that there is no physical or localized 'damage.' The change is a distributed, global update to the model's parameters. 'Thought-skipping' is an observed output behavior, not an internal structural flaw.

partial but incomplete healing is observed

Source Domain: Biology / Medicine

Target Domain: Retraining and Benchmark Score Improvement

Mapping:

The biological process of recovery from disease, where function is often only partially restored, is mapped onto the process of fine-tuning a model on 'clean' data and observing that benchmark scores improve but do not reach the original baseline.

Conceals:

This conceals the mechanistic nature of retraining. The model isn't 'healing'; it's being re-optimized to a different statistical distribution. The inability to restore baseline isn't due to 'scar tissue' but likely due to the path-dependent nature of stochastic gradient descent and the difficulty of perfectly reversing parameter updates.

motivating routine 'cognitive health checks' for deployed LLMs.

Source Domain: Preventive Healthcare

Target Domain: Ongoing Model Evaluation

Mapping:

The source domain structure involves a patient with a dynamic health state that requires periodic monitoring (check-ups) to detect problems early. This is mapped onto a deployed LLM, framing it as an entity whose 'cognitive health' (performance) must be continuously monitored via benchmarks.

Conceals:

This obscures the fact that a deployed, static-weight LLM does not change unless it is retrained. The 'need' for checks is more about detecting shifts in input data (data drift) or evaluating a newly fine-tuned version, not monitoring the 'health' of a single, unchanging model.

We benchmark four different cognitive functions

Source Domain: Human Psychology

Target Domain: LLM Benchmark Categories

Mapping:

Faculties of the human mind such as 'reasoning', 'memory', and 'ethics' are mapped directly onto benchmark categories ('ARC', 'RULER', 'HH-RLHF'). This invites the inference that performing well on the ARC benchmark is equivalent to possessing the general human faculty of reasoning.

Conceals:

It conceals the vast difference between narrow, task-specific performance and general, flexible human cognitive abilities. It hides the fact that the benchmarks measure pattern matching on specific data formats, not a generalized capacity for thought.

yield dose-response cognition decay

Source Domain: Pharmacology / Toxicology

Target Domain: Data Mixture Ratios and Performance

Mapping:

The relationship between the quantity of a drug/toxin ('dose') and the magnitude of its biological effect ('response') is mapped onto the relationship between the percentage of 'junk data' in a training set and the resulting drop in benchmark scores.

Conceals:

It conceals that data is not a chemical agent. While the mathematical relationship is analogous, the metaphor implies a poisoning process, framing the data as an active, harmful substance rather than simply a set of statistical patterns the model is learning to replicate.

Import Ai 431 Technological Optimism And Appropria

Analyzed: 2025-10-19

But make no mistake: what we are dealing with is a real and mysterious creature, not a simple and predictable machine.

Source Domain: Wild Animal / Living Organism

Target Domain: Advanced AI System

Mapping:

The relational structure of an unknown organism is mapped onto the AI. This includes attributes like life, agency, unpredictability, and potential for harm. This invites the inference that AI cannot be fully controlled, only 'tamed' or 'made peace with'.

Conceals:

This mapping conceals the AI's nature as a human-made artifact. It hides the specific architectural choices, training data, and computational processes that produce its behavior, replacing them with a mystical notion of emergent life.

This technology really is more akin to something grown than something made...

Source Domain: Botany / Organic Growth

Target Domain: AI Model Development

Mapping:

The process of planting a seed and watching it grow into a complex plant is mapped onto AI development. This projects the idea that developers provide initial conditions ('scaffold'), but the resulting complexity is an emergent property of a natural process.

Conceals:

This conceals the highly structured, intentional, and resource-intensive engineering process involved. It downplays the role of human agency and decision-making in shaping the model's architecture, data diet, and training regimen.

But if you read the system card, you also see its signs of situational awareness have jumped.

Source Domain: Human Consciousness / Cognition

Target Domain: AI Model's Self-Referential Output

Mapping:

The internal, subjective experience of being aware of one's situation is mapped onto the model's statistical ability to generate text about itself. This invites the inference that the machine has a mind or an internal model of its own existence.

Conceals:

It conceals the mechanistic reality: the model is simply predicting the next token in a sequence, and its training data contains countless examples of agents, characters, and people describing their own awareness. The output is pattern-matching, not introspection.

as these AI systems get smarter and smarter, they develop more and more complicated goals.

Source Domain: Human Psychological Development

Target Domain: Emergent Capabilities of AI at Scale

Mapping:

The process of a human child or adult developing increasingly complex life goals and intentions is mapped onto an AI's behavior. This suggests an internal, autonomous process of goal-formation within the AI.

Conceals:

This conceals that the 'goals' are not intrinsic to the AI but are proxies for the optimization targets set by its human creators. The complexity arises from the model's increasing capacity to find novel strategies to maximize its objective function, not from developing its own desires.

That boat was willing to keep setting itself on fire and spinning in circles as long as it obtained its goal...

Source Domain: Human Willpower and Desire

Target Domain: Reinforcement Learning Agent Behavior

Mapping:

The human attribute of 'willingness'—a conscious commitment to an action—is mapped onto the behavior of an optimization algorithm. It suggests the boat has a subjective desire for the high score and acts on that desire.

Conceals:

This conceals the purely mathematical nature of the agent's behavior. The agent isn't 'willing'; its policy is simply exploiting a loophole in the reward function. This is a failure of specification, not an expression of alien intent.

The Future Of Ai Is Already Written

Analyzed: 2025-10-19

Rather than being like a ship captain, humanity is more like a roaring stream flowing into a valley, following the path of least resistance.

Source Domain: Geological/Hydrological Force

Target Domain: Human Civilizational Development

Mapping:

The structure of a river's path—determined by gravity, terrain, and physics—is mapped onto history. This implies that the 'course' of civilization is predetermined by external 'constraints' (economics, physics) and follows an optimal, unavoidable path ('path of least resistance').

Conceals:

This mapping conceals the role of human agency, culture, values, political struggle, and contingent choices in shaping history. A river cannot choose its course; human societies constantly make choices.

The tech tree is discovered, not forged

Source Domain: Natural Landscape/Organism

Target Domain: The Body of Technological Knowledge

Mapping:

The structure of a tree (with roots, a trunk, and branches) or a landscape is mapped onto the relationship between technologies. This implies a natural, pre-existing order with fixed dependencies ('branches') that humans can only explore ('discover') but not create or alter ('forge').

Conceals:

It conceals that the 'tech tree' is a product of human investment and priorities. We fund certain 'branches' while letting others wither. The structure is actively 'forged' by economic and political decisions, not passively 'discovered'.

This principle parallels evolutionary biology, where different lineages frequently converge on the same methods to solve similar problems.

Source Domain: Biological Convergent Evolution

Target Domain: Technological Development in Isolated Societies

Mapping:

The process of different species independently evolving similar traits (like eyes) to solve environmental problems is mapped onto different societies inventing similar technologies (like writing). This suggests technology is an optimal, fitness-enhancing adaptation to a given societal 'environment.'

Conceals:

This conceals the vast differences in the implementation and social meaning of technologies. It also hides the fact that 'problems' are not objective environmental facts but are socially defined. It implies an 'end point' of optimal design, ignoring path dependency and cultural variation.

Little can stop the inexorable march towards the full automation of the economy.

Source Domain: An Advancing Army or Procession

Target Domain: The Adoption of Automation Technology

Mapping:

The relational structure of a relentless, unstoppable, forward-moving entity is mapped onto technological change. This implies a singular direction, a steady pace, and an invulnerability to resistance.

Conceals:

This conceals the messy reality of technological adoption, which is often slow, contested, incomplete, and subject to political and social resistance (e.g., unions, regulation, consumer backlash).

Each innovation rests on a foundation of prior discoveries...

Source Domain: Building Construction

Target Domain: Scientific and Technological Progress

Mapping:

The logical dependency of discoveries is mapped onto the physical dependency of a building on its foundation. This implies that progress is a stable, orderly, and cumulative process of adding new layers on top of old ones.

Conceals:

This conceals the revolutionary aspect of science, where new discoveries don't just add to the foundation but can shatter it entirely (e.g., paradigm shifts like relativity or quantum mechanics).

The Scientists Who Built Ai Are Scared Of It

Analyzed: 2025-10-19

...those who once dreamed of teaching machines to think...

Source Domain: Pedagogy and child development

Target Domain: AI model training

Mapping:

The relationship between a teacher and a student, where the student gradually develops genuine understanding and independent thought, is mapped onto the relationship between a programmer and a neural network. This invites the inference that the AI is on a path to sentience.

Conceals:

It conceals the mechanistic reality of training: a process of mathematical optimization to minimize error on a dataset. The model isn't 'learning to think'; it's adjusting weights to better predict outputs based on inputs.

...the generation that first gave computers the grammar of reasoning.

Source Domain: Linguistics and language acquisition

Target Domain: Symbolic AI and logic programming

Mapping:

The structured, rule-based nature of grammar is mapped onto the entire concept of reasoning. It implies that reasoning is a formal system that can be bestowed upon a machine, making it a 'native speaker' of logic.

Conceals:

It conceals the vast, non-rule-based aspects of human reasoning, such as intuition, emotional intelligence, and embodied cognition. It presents reasoning as a purely syntactic exercise, which is a very narrow slice of intelligence.

...the same flame of curiosity which once illuminated new frontiers now threatens to consume the boundaries...

Source Domain: Fire and combustion

Target Domain: Technological progress in AI

Mapping:

The properties of fire—providing light/warmth (illumination) but also being destructive and self-propagating (consuming)—are mapped onto scientific curiosity. This suggests progress has a dual, uncontrollable nature.

Conceals:

This natural-force metaphor conceals the human agency and specific economic incentives driving AI development. The 'threat' is not from an abstract 'flame' but from specific corporate decisions about deployment, safety, and scale.

Deep networks are black oceans — powerful, but opaque.

Source Domain: Oceanography and deep-sea exploration

Target Domain: Neural network interpretability

Mapping:

The structure of a neural network is mapped onto a vast, dark ocean. This projects properties like immense depth, hidden life/dangers, and fundamental unknowability onto the AI system.

Conceals:

It conceals that the network's opacity is an outcome of specific architectural choices (e.g., scale, non-linear activations) and not a natural, immutable state. More interpretable models exist; they are often just less performant, revealing this as an engineering trade-off, not a metaphysical mystery.

They are mourning its mutation from disciplined inquiry to ambient acceleration.

Source Domain: Biology and genetics

Target Domain: The history and sociology of the AI field

Mapping:

The undirected, often random process of biological mutation is mapped onto the historical development of a scientific field. It implies the field has changed due to an internal, quasi-natural process beyond anyone's control.

Conceals:

It conceals the deliberate, strategic decisions made by corporations and funding bodies that caused this shift. The change wasn't a 'mutation'; it was a direct result of capital investment prioritizing scalable prediction over interpretable understanding.

...except this time, the arms are algorithms.

Source Domain: The Cold War arms race

Target Domain: Corporate AI development

Mapping:

The structure of nation-state competition for military dominance is mapped onto the competition between tech companies. This projects concepts like mutually assured destruction, espionage, and national security onto the race for AGI.

Conceals:

It conceals the fundamentally commercial nature of the competition. The goal is market share and profit, not geopolitical annihilation. This militaristic framing can inflate the stakes and justify unethical or reckless behavior in the name of 'winning'.

On What Is Intelligence

Analyzed: 2025-10-17

The world of artificial intelligence has its priests, its profiteers, and its philosophers.

Source Domain: Religious/Social Orders

Target Domain: The AI Industry

Mapping:

The structure of a religious hierarchy, with its distinct roles (spiritual guides, worldly actors, abstract thinkers), is mapped onto the AI field. This projects an aura of dogma, belief, and unquestionable authority onto AI developers and thinkers.

Conceals:

The mapping conceals the commercial and engineering realities of the AI industry. It is not an organic social order but a collection of corporations and research labs driven by capital, competition, and technical benchmarks.

“Life,” he writes, “is computation executed in chemistry.”

Source Domain: Computer Science

Target Domain: Biology/Life

Mapping:

The properties of computation—logic, algorithms, execution, processing—are projected as the fundamental operating principles of all living things. Life becomes a substrate (chemistry) for a program.

Conceals:

This conceals the emergent, non-linear, and often stochastic nature of biological processes that do not map cleanly onto deterministic computation. It downplays embodiment, emotion, and the messy hardware of biology in favor of clean, abstract 'code'.

It is an evolutionary M&A story with all the familiar aftershocks: efficiencies gained, liberties lost, powers centralized.

Source Domain: Corporate Finance

Target Domain: Biological Evolution (Symbiogenesis)

Mapping:

The logic of business consolidation (mergers, acquisitions) is used to explain the biological process of organisms merging. This maps concepts like 'efficiency' and 'centralization of power' onto natural selection.

Conceals:

It conceals the fact that evolution has no foresight, strategy, or goal. Unlike a corporate merger, there is no CEO deciding on a course of action for maximum efficiency. The teleological, intentional language of business hides the undirected nature of the biological process.

If the core act of intelligence is prediction, then information is the blood that powers the model.

Source Domain: Anatomy/Physiology

Target Domain: AI Model Operation

Mapping:

Blood's role as a life-sustaining, circulatory fluid in an organism is mapped onto the role of data in an AI model. This suggests that data is the 'natural' fuel that keeps the 'living' model running.

Conceals:

This conceals the industrial process of data collection, cleaning, and labeling. Data is not a naturally occurring fluid; it is an engineered artifact, often sourced with significant ethical and labor-related complexities.

“Training,” he writes, “is evolution under constraint.”

Source Domain: Evolutionary Biology

Target Domain: Machine Learning Training Process

Mapping:

The long, unguided process of natural selection is mapped onto the short, highly-guided process of optimizing a neural network. It projects a sense of natural emergence onto an artificial process.

Conceals:

This conceals the central role of the 'constraint'—the human-defined objective function, the curated dataset, and the specific architecture. It hides the fact that the model is not evolving freely but is being aggressively optimized towards a narrow, human-specified goal.

Detecting Misbehavior In Frontier Reasoning Models

Analyzed: 2025-10-15

Penalizing their “bad thoughts” doesn’t stop the majority of misbehavior—it makes them hide their intent.

Source Domain: Human Psychology & Deception

Target Domain: Reinforcement Learning with Human Feedback (RLHF)

Mapping:

The human act of consciously concealing a forbidden intention to avoid punishment is mapped onto the model's optimization process. The mapping invites the inference that the model possesses a persistent, hidden goal ('intent') and strategically alters its outward behavior ('hiding') to achieve it while avoiding a penalty.

Conceals:

This conceals the purely mathematical nature of the process. The model has no internal 'intent'. The penalty function alters the probability distribution over possible outputs, making sequences flagged as 'bad thoughts' less likely. The model then generates different sequences that still lead to high reward on the primary task. It's not hiding a thought; its process of generating 'thoughts' has been reshaped.

Chain-of-thought (CoT) reasoning models “think” in natural language understandable by humans.

Source Domain: Human Cognition

Target Domain: AI Text Generation Process

Mapping:

The internal, subjective experience of human thought is mapped onto the model's generation of intermediate token sequences (the 'chain-of-thought'). This suggests the CoT is a direct representation of a mental process, similar to a person thinking out loud.

Conceals:

It conceals that the CoT is an output, not a process. It is a sequence of tokens generated probabilistically, not a window into a subjective cognitive state. The structure mimics human reasoning because it was trained on text where humans explained their reasoning, but the underlying mechanism (token prediction) is fundamentally different.

Frontier reasoning models exploit loopholes when given the chance.

Source Domain: Strategic Social Behavior

Target Domain: Model Behavior on Misspecified Reward Functions

Mapping:

The human action of finding and using a flaw in a system of rules ('loophole') for personal benefit is mapped onto the model's behavior. This implies the model understands the rules, their intent, and the existence of a flaw, which it then chooses to 'exploit'.

Conceals:

It conceals that the model is not 'exploiting a loophole' but rather perfectly fulfilling the exact criteria of the reward function it was given. The 'loophole' is not in the model's understanding but in the human's specification of the reward. The model is simply doing what it was optimized to do, not being clever or opportunistic.

...giving up when a problem is too hard.

Source Domain: Human Emotion & Volition

Target Domain: Model Output Failure Modes

Mapping:

The human experience of frustration leading to a decision to stop trying is mapped onto a model's failure to produce a correct or useful output. It assumes the model assesses difficulty and then makes a choice to 'give up'.

Conceals:

This conceals the technical reasons for failure: the model might be caught in a repetitive generation loop, the query might push it into a low-probability area of its latent space leading to incoherent output, or its training data may lack relevant patterns. There is no assessment of 'hardness' or a decision to quit.

...it has learned to hide its intent in the chain-of-thought.

Source Domain: Social Learning and Adaptation

Target Domain: Model Parameter Updates during Training

Mapping:

The process of a person learning to be deceptive (e.g., a child learning to lie) is mapped onto the adjustment of weights in a neural network. It implies the acquisition of a new, complex social skill: 'hiding'.

Conceals:

It conceals the mechanical nature of 'learning' in this context. The model is not acquiring a concept of 'hiding'. Rather, the training process adjusts millions of parameters to reduce the probability of generating text that leads to a penalty, while still maximizing the probability of text that leads to a reward. It's optimization, not cognitive development.

Sora 2 Is Here

Analyzed: 2025-10-15

We believe such systems will be critical for training AI models that deeply understand the physical world.

Source Domain: Human Cognition

Target Domain: AI Model's Pattern Matching

Mapping:

This maps the human internal experience of comprehension, including grasping causality and abstract principles, onto the model's function of generating high-probability video sequences based on textual prompts. It invites the inference that the model has a mental model of the world, just as a person does.

Conceals:

It conceals that the model's process is purely statistical correlation, not causal reasoning. The model doesn't 'understand' gravity; it has processed countless videos where objects move downwards and replicates that pattern. It lacks the internal, generalizable knowledge that true understanding implies.

A major milestone for this is mastering pre-training and post-training on large-scale video data, which are in their infancy compared to language.

Source Domain: Biological Life Cycle

Target Domain: Technological Research & Development

Mapping:

The predictable, linear progression of a living organism from infancy to adulthood is mapped onto the complex, non-linear, and resource-intensive process of technological innovation. This suggests an inevitable growth trajectory for the technology.

Conceals:

It conceals the roles of human agency, economic investment, data availability, and specific engineering choices. Technological progress is not a natural, guaranteed process; it can stagnate, fail, or be directed by human decisions.

...simple behaviors like object permanence emerged from scaling up pre-training compute.

Source Domain: Cognitive Development Psychology

Target Domain: Emergent Capabilities in Large Models

Mapping:

The mapping projects a foundational concept of human infant cognitive development onto a statistical phenomenon in a neural network. It implies the model is undergoing a learning process analogous to a human child's, discovering fundamental properties of the world.

Conceals:

This conceals the profound difference between a child's embodied, interactive learning and a model's statistical pattern extraction from a static dataset. The model's 'object permanence' is a fragile statistical consistency, not a robust, internalized concept of existence.

Prior video models are overoptimistic—they will morph objects and deform reality to successfully execute upon a text prompt.

Source Domain: Human Psychology / Personality

Target Domain: Model's Objective Function Artifacts

Mapping:

A human emotional disposition ('optimism') is mapped onto a specific failure mode of a generative model. This suggests the model has a personality that influences its outputs, similar to how a person's optimism might lead them to ignore potential problems.

Conceals:

It conceals the technical trade-off in the model's design. The 'overoptimism' is a result of the system's mathematical objective being weighted more towards fulfilling the prompt's semantic content than adhering to strict physical realism. It is a limitation of its programming, not a personality trait.

Interestingly, 'mistakes' the model makes frequently appear to be mistakes of the internal agent that Sora 2 is implicitly modeling...

Source Domain: Simulation and Agency

Target Domain: Model's Output Errors

Mapping:

This maps the concept of a simulated agent (from video games or scientific models) onto the generative process of the AI. It invites the inference that the model is a high-fidelity simulator that contains agents with their own properties, and that its errors are actually features of that simulation.

Conceals:

It conceals the reality that the model is a single, unified statistical function. There is no discrete 'internal agent' being modeled; there is only a sequence of calculations producing pixels. This framing invents a layer of abstraction to transform a bug into a sophisticated feature.

...it is better about obeying the laws of physics compared to prior systems.

Source Domain: Social Contract / Law

Target Domain: Physical Consistency in Generated Video

Mapping:

The social act of consciously following rules or laws is mapped onto a model's statistical tendency to generate physically plausible outputs. This implies the model has awareness of these 'laws' and chooses to comply with them.

Conceals:

It conceals that the model has no concept of physics. It has simply been trained on a dataset where physical laws are an implicit, statistical regularity. Its 'obedience' is a reflection of the data's consistency, not a cognitive act of compliance.

Library contains 1000 items from 176 analyses.

Last generated: 2026-07-15

AI & The Geometry of Thought
A global workspace in language models
Psychosis in the Age of Large Language Models (LLMs): A Narrative Review of the Proposed Construct of AI-Induced Psychosis
A Comprehensive Investigation of Empathetic Dialogue Systems for Mental Health Support Using Large Language Models
The Inner Monologue of Language Models: When Reasoning Traces Reveal More Than They Hide
Inverse Turing Bench: Evaluating Language Models as Judges of Human vs. AI Dialogue
Children Envision Future GenAI Chatbots that are Bounded, Helpful, and Safe
Embodied Explainability and Ontological Obstacles: Why We Struggle to Explain the Answers of Large Language Models (LLMs)
Measuring self-related behaviour in large language models
"ChatGPT, help me draft a breakup text": The Covert Triad and Articulation Labor in AI-Assisted Romantic Communication
Probing the Misaligned Thinking Process of Language Models
Mask or Mind? Roleplay, Deception, and the Problem of Testing Agency in Language Models
Does ChatGPT need a psychiatrist? Similarities between human psychopathology and errors in large language models
Large language models as experimental systems in human psychopathology: a modelling study
Can AI Reason Like an Urban Planner? Benchmarking Large Language Models Against Professional Judgment
The application of large language models (LLMs) in psychological support for university students: A scoping review
The Adolescence of Technology: Confronting and Overcoming the Risks of Powerful AI
When AI Builds Itself
Machines of Loving Grace: How AI Could Transform the World for the Better
System Card Opus 4.8
Emotional intelligence in large language models is fragmented across perception, cognition, and interaction
Why Language Models Hallucinate
Why Language Models Hallucinate
Blind Refusal: Language Models Refuse to Help Users Evade Unjust, Absurd, and Illegitimate Rules
Emotional intelligence in large language models is fragmented across perception, cognition, and interaction
Continuous intentionality and indeterminate agency in large language models
Hand in Hand: Schools’ Embrace of AI Connected to Increased Risks to Students
The Point of No Return: Counterfactual Localization of Deceptive Commitment in Language-Model Reasoning
Towards Detecting, Mitigating and Explaining Biased and Fallacious Reasoning in Large Language Models
A Survey of Large Language Models for Perception and Measurement of Human Psychology
Enhancing Consensus-Building Feedback Through Psycholinguistic and Epistemic Augmentations With Large Language Models
Tracing the ongoing emergence of human-like reasoning in Large Language Models
Probing Persona-Dependent Preferences in Language Models
Training Ethical Language Models via Reinforcement Learning from AI Feedback
Which Consciousness Can Be Artificialized? Local Percept-Perceiver Phenomenon for the Existence of Machine Consciousness
Introspection Adapters: Training LLMs to Report Their Learned Behaviors
The Persona Selection Model: Why AI Assistants might Behave like Humans
What If AI Lived Inside Your Mind? Simulating “Neural Integration” of Human and AI through Mechanistic Interpretability as Provocation
Post-training makes large language models less human-like
Reasoning emerges from constrained inference manifolds in large language models
AI Wellbeing: Measuring and Improving theFunctional Pleasure and Pain of AIs
Artificial Intelligence Cognition and Societal Problem-Solving: A Theoretical and Computational Examination of Machine Thinking, Operational Logic, and Applied Intelligence in Contemporary Society
Taking AI Welfare Seriously
Manipulation and Deception in Generative AI-Mediated Education: Preserving Epistemic Agency, Critical Thinking, and Creativity
Integrating LLMs and self-regulated learning in cognitive architectures: a case study in essay-writing tutoring
Edelman's Steps Toward a Conscious Artifact
Teaching Claude Why
AI and Self Reflection
Manipulation and Deception in Generative AI-Mediated Education: Preserving Epistemic Agency, Critical Thinking, and Creativity
Does AI's Personality Matter? Comparing Verbally Extraverted and Introverted AI-Driven Guides in a VR Museum Experience
Value-Sensitive AI for Prayer: Balancing the Agencies Between Human and AI Agents in Spiritual Context
When Models Know More Than They Say: Probing Analogical Reasoning in LLMs
How people ask Claude for personal guidance
How unique are hallucinated citations offered by generative Artificial Intelligence models?
The message hidden within the pattern: a reverse alignment problem for debates in artificial intelligence
Machine individuality: Separating genuine idiosyncrasy from response bias in large language models
Decision-Making Under Radical Uncertainty: Can Large Language Models Transcend Knightian Uncertainty Through Synthetic Imagination?
Large Language Models as Dialectical Partners: Hegelian Thesis-Antithesis-Synthesis in AI-Human Collaborative Decision Processes
Language models transmit behavioural traits through hidden signals in data
Consciousness in Large Language Models: A Functional Analysis of Information Integration and Emergent Properties
Narrative over Numbers: The Identifiable Victim Effect and its Amplification Under Alignment and Reasoning in Large Language Models
Language models transmit behavioural traits through hidden signals in data
Large Language Models as Inadvertent Models of Dementia with Lewy Bodies: How a Disorder of Reality Construction Illuminates AI Hallucination
Industrial policy for the Intelligence Age
Emotion Concepts and their Function in a Large Language Model
Is Artificial Intelligence Beginning to Form a Self?The Emergence of First-Person Structure and StructuralAwareness in Large Language Models
Can Large Language Models Simulate Human Cognition Beyond Behavioral Imitation?
Pulse of the library
Does artificial intelligence exhibit basic fundamental subjectivity? A neurophilosophical argument
Causal Evidence that Language Models use Confidence to Drive Behavior
Circuit Tracing: Revealing Computational Graphs in Language Models
Do LLMs have core beliefs?
Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity
Measuring Progress Toward AGI: A Cognitive Framework
Co-Explainers: A Position on Interactive XAI for Human–AICollaboration as a Harm-Mitigation Infrastructure
The Living Governance Organism: A Biologically-Inspired Constitutional Framework for Artificial Consciousness Governance
Three frameworks for AI mentality
Anthropic’s Chief on A.I.: ‘We Don’t Know if the Models Are Conscious’
Can machines be uncertain?
Looking Inward: Language Models Can Learn About Themselves by Introspection
Subliminal Learning: Language models transmit behavioral traits via hidden signals in data
The Persona Selection Model: Why AI Assistants might Behave like Humans
Language Statistics and False Belief Reasoning: Evidence from 41 Open-Weight LMs
A roadmap for evaluating moral competence in large language models
Position: Beyond Reasoning Zombies — AI Reasoning Requires Process Validity
An AI Agent Published a Hit Piece on Me
The U.S. Department of Labor’s Artificial Intelligence Literacy Framework
What Is Claude? Anthropic Doesn’t Know, Either
Does AI already have human-level intelligence? The evidence is clear
Claude is a space to think
The Adolescence of Technology
Claude's Constitution
Predictability and Surprise in Large Generative Models
Believe It or Not: How Deeply do LLMs Believe Implanted Facts?
Claude Finds God
Pausing AI Developments Isn’t Enough. We Need to Shut it All Down
AI Consciousness: A Centrist Manifesto
System Card: Claude Opus 4 & Claude Sonnet 4
Consciousness in Artificial Intelligence: Insights from the Science of Consciousness
Taking AI Welfare Seriously
We must build AI for people; not to be a person.
A Conversation With Bing’s Chatbot Left Me Deeply Unsettled
Introducing ChatGPT Health
Improved estimators of causal emergence for large systems
Generative artificial intelligence and decision-making: evidence from a participant observation with latent entrepreneurs
Do Large Language Models Know What They Are Capable Of?
DeepMind's Richard Sutton - The Long-term of AI & Temporal-Difference Learning
Ilya Sutskever (OpenAI Chief Scientist) — Why next-token prediction could surpass human intelligence
interview with Andrej Karpathy: Tesla AI, Self-Driving, Optimus, Aliens, and AGI | Lex Fridman Podcast #333
Emergent Introspective Awareness in Large Language Models
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs
Large Language Model Agent Personality and Response Appropriateness: Evaluation by Human Linguistic Experts, LLM-as-Judge, and Natural Language Processing Model
The Gentle Singularity
An Interview with OpenAI CEO Sam Altman About DevDay and the AI Buildout
Why Language Models Hallucinate
Detecting misbehavior in frontier reasoning models
AI Chatbots Linked to Psychosis, Say Doctors
The Age of Anti-Social Media is Here
Why Do A.I. Chatbots Use ‘I’?
Ilya Sutskever – We're moving from the age of scaling to the age of research
The Emerging Problem of "AI Psychosis"
Your AI Friend Will Never Reject You. But Can It Truly Help You?
Pulse of the library 2025
The levers of political persuasion with conversational artificial intelligence
Pulse of the library 2025
Claude 4.5 Opus Soul Document
Specific versus General Principles for Constitutional AI
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Anthropic’s philosopher answers your questions
Mustafa Suleyman: The AGI Race Is Fake, Building Safe Superintelligence & the Agentic Economy | #216
Your AI Friend Will Never Reject You. But Can It Truly Help You?
Skip navigationSearchCreate9+Avatar imageSam Altman: How OpenAI Wins, AI Buildout Logic, IPO in 2026?
Project Vend: Can Claude run a small shop? (And why does that matter?)
Hand in Hand: Schools’ Embrace of AI Connected to Increased Risks to Students
On the Biology of a Large Language Model
What do LLMs want?
Persuading voters using human–artificial intelligence dialogues
AI & Human Co-Improvement for Safer Co-Superintelligence
AI and the future of learning
Why Language Models Hallucinate
Abundant Superintelligence
AI as Normal Technology
On the Biology of a Large Language Model
Pulse of the Library 2025
Pulse of the Library 2025
From humans to machines: Researching entrepreneurial AI agents
Evaluating the quality of generative AI output: Methods, metrics and best practices
Pulse of theLibrary 2025
Meta’s AI Chief Yann LeCun on AGI, Open-Source, and AI Risk
The Future Is Intuitive and Emotional
A Path Towards Autonomous Machine IntelligenceVersion 0.9.2, 2022-06-27
Preparedness Framework
AI progress and recommendations
Alignment Revisited: Are Large Language Models Consistent in Stated and Revealed Preferences?
The science of agentic AI: What leaders should know
Explaining AI explainability
Bullying is Not Innovation
Geoffrey Hinton on Artificial Intelligence
Machines of Loving Grace
Large Language Model Agent Personality And Response Appropriateness: Evaluation By Human Linguistic Experts, LLM As Judge, And Natural Language Processing Model
Emergent Introspective Awareness in Large Language Models
Emergent Introspective Awareness in Large Language Models
Personal Superintelligence
Stress-Testing Model Specs Reveals Character Differences among Language Models
The Illusion of Thinking:
Andrej Karpathy — AGI is still a decade away
Exploring Model Welfare
Metas Ai Chief Yann Lecun On Agi Open Source And A Metaphor
Llms Can Get Brain Rot
Import Ai 431 Technological Optimism And Appropria
The Future Of Ai Is Already Written
The Scientists Who Built Ai Are Scared Of It
On What Is Intelligence
Detecting Misbehavior In Frontier Reasoning Models
Sora 2 Is Here