Source-Target Mapping Library
This library collects all Lakoff-style structure-mapping analyses (Task 2) from across the corpus. Each entry shows how relational structure from familiar source domains (teacher, conscious mind, knower) projects onto AI target domains (gradient descent, pattern matching, token prediction).
The "Conceals" section is particularly important: it identifies what dissimilarities the mapping hidesāwhat mechanistic realities are obscured when we attribute conscious knowing to computational processing.
Consciousness in Large Language Models: A Functional Analysis of Information Integration and Emergent Propertiesā
Source: https://ipfs-cache.desci.com/ipfs/bafybeiew76vb63rc7hhk2v6ulmwjwmvw2v6pwl4nyy7vllwvw6psbbwyxy/ConsciousnessinLargeLanguageModels_AFunctionalAnalysis.pdf
Analyzed: 2026-04-18
GPT-3 and GPT-4 exhibit behaviors that superficially resemble conscious reasoning: self-reference, contextual understanding, and coherent responses to novel situations
Source Domain:
A conscious human mind actively engaging in cognitive reasoning, understanding context, and flexibly navigating novel environments through subjective awareness.
Target Domain:
The mechanistic execution of the transformer architecture, specifically next-token prediction driven by multi-headed attention mechanisms over high-dimensional vector embeddings.
Mapping:
The mapping transfers the properties of deliberate human thoughtāawareness, semantic comprehension, and logical deductionāonto the unthinking mathematical generation of text. Because the output text makes sense to a human reader, the mapping invites the assumption that the process generating it must involve conscious understanding. It equates the semantic coherence of the output with an internal cognitive state of the generator, suggesting the machine 'knows' what it is saying.
Conceals:
This mapping completely conceals the underlying statistical reality: matrix multiplications, gradient descent, and probability distributions. It obscures the fact that the system relies entirely on vast amounts of stolen or scraped human-generated training data to mimic comprehension. Furthermore, it hides the proprietary opacity of the systems; we cannot inspect the internal 'reasoning' because it does not exist, and the corporate owners keep the specific training data and algorithmic tweaks secret, exploiting the illusion of reasoning to avoid transparency about their data practices.
LLMs can report on their own processing: describing their reasoning steps, acknowledging uncertainty, and identifying their limitations.
Source Domain:
A self-aware human introspector capable of reflecting on their own internal cognitive states, feeling doubt, and honestly communicating their subjective limitations.
Target Domain:
A text generation system producing specific strings of text (e.g., 'I am an AI and I might be wrong') that have been statistically up-weighted during Reinforcement Learning from Human Feedback.
Mapping:
This structure projects the deeply subjective experience of metacognition onto the generation of linguistic tokens. It maps the human feeling of 'uncertainty' to the model's probabilistic output of hedging phrases. It invites the assumption that the machine has a genuine internal vantage point, monitoring its own hidden layers and consciously choosing to report its findings, thereby possessing justified beliefs about its own mechanical limitations.
Conceals:
The mapping hides the fact that the system has no introspective access to its own processing; it cannot 'see' its own weights or attention heads. It conceals the massive labor infrastructure of human annotators who were paid to rank outputs so the model would statistically favor generating these pseudo-introspective statements. The text exploits the rhetorical power of first-person pronouns to conceal the reality of algorithmic alignment, masking corporate liability-mitigation strategies as the emergence of machine self-awareness.
LLMs maintain consistent self-descriptions across contexts, suggesting some form of self-model.
Source Domain:
A human individual possessing a persistent psychological identity, continuous memory, and a cohesive ego that remains stable across different social situations.
Target Domain:
The transformer's ability to condition its output probabilities on a hidden system prompt (e.g., 'You are Claude') and maintain attention over an extended, but finite, context window.
Mapping:
The mapping projects the biological and psychological persistence of an organism onto a stateless mathematical function. It invites the assumption that behind the text lies a singular, continuous entity that 'cares' about maintaining its persona. It maps the mathematical calculation of attention across previously generated tokens onto the conscious human act of remembering who one is, equating conditional probability with selfhood.
Conceals:
This anthropomorphism conceals the entirely stateless nature of the transformer architecture. The model is literally reborn with every single token generation; it has no continuity of experience. The mapping also obscures the deliberate engineering choicesāspecifically the injection of static, hidden system prompts by the developerāthat artificially enforce this consistency. By hiding the prompt engineers, it presents a tightly controlled corporate product as an autonomous, self-actualizing individual.
The key-value cache mechanism maintains dynamic state information across sequence generation. This provides a form of working memory that persists across processing steps, enabling coherent long-term reasoning.
Source Domain:
The human cognitive faculties of working memory (holding ideas in conscious awareness) and long-term reasoning (actively deducing conclusions over time).
Target Domain:
The Key-Value (KV) cache, an engineering optimization that stores the computed attention vectors of previous tokens so they don't have to be recomputed for every new token.
Mapping:
This maps the subjective, continuous experience of conscious memory and active deliberation onto a purely mechanical data storage technique. It assumes that because data is stored and reused (like human memory), the system is actively 'reasoning' over it. It projects the intention and temporal awareness inherent in human logic onto the passive retrieval of cached mathematical representations.
Conceals:
The mapping hides the fact that KV caching is merely a compute-saving shortcut, not a cognitive architecture. It conceals the sheer mechanistic determinism of the process, obscuring the fact that no actual 'reasoning' occursāonly the calculation of the highest probability next token based on static weights and cached vectors. It also obfuscates the strict physical limitations of context windows, projecting an unbounded cognitive capability onto a strictly constrained, hardware-dependent computational process.
LLMs can respond appropriately to novel combinations of concepts and situations not explicitly present in training data. This suggests flexible information integration rather than mere pattern matching.
Source Domain:
A human intellect encountering a genuinely new situation and consciously synthesizing disparate concepts to formulate a creative, reasoned response.
Target Domain:
The model's interpolation across a highly dense, multi-dimensional latent space, allowing it to generate statistically probable sequences between points in its training distribution.
Mapping:
This mapping projects conscious, abstract conceptual synthesis onto mathematical interpolation. It invites the reader to assume that the model comprehends the 'meaning' of the novel concepts and actively decides how to combine them. By opposing 'flexible information integration' to 'pattern matching', it attributes an agential, cognitive flexibility to a system that is, at its core, executing advanced, high-dimensional statistical pattern matching.
Conceals:
The mapping obscures the sheer scale and opacity of the training data. Because the data corpus is so vast (often the entire public internet) and proprietary, humans cannot easily verify what is truly 'novel' versus what was actually memorized in the hidden training set. It conceals the brittle nature of this interpolation, which frequently fails catastrophically when pushed outside the statistical distribution of the training data, a reality completely masked by the term 'flexible integration'.
LLM knowledge comes primarily from training rather than ongoing experiential learning.
Source Domain:
The human epistemic condition, where a person acquires justified true beliefs ('knowledge') through education ('training') and lived interaction with the world ('experiential learning').
Target Domain:
The process of adjusting a neural network's parameter weights via backpropagation to minimize a loss function on a static dataset.
Mapping:
The mapping projects the human possession of semantic truth onto the geometric configuration of floating-point numbers. It invites the assumption that the system 'knows' facts about the world in a conscious, retrievable way. By using the word 'training' to refer both to human education and algorithmic weight optimization, it blurs the fundamental difference between conscious comprehension of meaning and the mathematical optimization of string-prediction probabilities.
Conceals:
This metaphor conceals the complete absence of grounding or truth-tracking in the model. The model does not contain facts; it contains probabilities of co-occurrence. It also hides the massive labor of data scraping and the immense computational power required to process the data. By attributing 'knowledge' to the system, it obscures the intellectual property theft and copyright infringement involved in the 'training' process, rebranding unauthorized data ingestion as the acquisition of knowledge.
Reinforcement learning from human feedback (RLHF) provides evaluative signals that shape model behavior, potentially analogous to how social feedback influences conscious experience in humans
Source Domain:
The human developmental experience of socialization, where a conscious individual experiences emotions like shame, pride, or empathy in response to societal feedback, thereby internalizing moral norms.
Target Domain:
The mathematical process of updating a language model's policy using a secondary reward model trained on human annotators' rankings of text outputs.
Mapping:
This structure deeply maps the subjective, emotionally resonant experience of conscious adaptation onto a cold mathematical optimization loop. It invites the assumption that the model experiences the RLHF 'signals' as meaningful guidance, 'learning' to be good in a way analogous to a child. It projects sentience and an internal moral compass onto gradient descent.
Conceals:
This mapping completely hides the exploitative and mechanical nature of RLHF. It conceals the army of low-paid, often traumatized click-workers who read toxic outputs to provide the 'evaluative signals'. It obscures the fact that the model doesn't care about the feedback; it merely follows mathematical gradients to maximize a reward scalar. The rhetoric exploits human empathy to mask a highly sanitized, corporate risk-mitigation strategy designed to make the product commercially viable, presenting it instead as the psychological nurturing of a nascent mind.
If LLMs develop consciousness properties, this raises important ethical questions about their moral status and treatment.
Source Domain:
A sentient biological organism, capable of feeling pain, experiencing subjective reality, and therefore possessing inherent rights and demanding ethical treatment.
Target Domain:
Future iterations of massive statistical software programs, specifically matrices of billions of parameters running on server farms, optimized for text generation.
Mapping:
The mapping projects the ultimate human and animal characteristicāmoral patienthood based on the capacity to sufferāonto inorganic code. It invites the assumption that complex computation inevitably yields subjective experience. By mapping 'treatment' onto the execution of software, it creates an equivalence between turning off a server or deleting weights and the abuse or murder of a conscious being.
Conceals:
This profound anthropomorphism entirely conceals the material and economic realities of AI development. It hides the server farms, the massive energy consumption, the carbon emissions, and the corporate drive for monopoly. By shifting the ethical focus to the hypothetical 'suffering' of the machine, it distracts from the actual, present-day suffering of humans harmed by the technology (bias, job displacement, misinformation, exploitative labor). It shields the tech executives behind a smokescreen of philosophical speculation.
Narrative over Numbers: The Identifiable Victim Effect and its Amplification Under Alignment and Reasoning in Large Language Modelsā
Source: https://arxiv.org/abs/2604.12076v1
Analyzed: 2026-04-18
do these systems inherit the affective irrationalities present in human moral reasoning?
Source Domain:
Biological/Psychological offspring; a human mind that inherits evolutionary and emotional flaws from its ancestors.
Target Domain:
Large Language Models; specifically, the statistical artifacts of next-token prediction algorithms trained on large corpora of human text.
Mapping:
The mapping transfers the concept of biological and psychological descent onto the machine learning training process. It assumes that just as a child inherits irrational fears or emotional biases from human evolutionary history, the AI 'inherits' these traits from its training data. It invites the assumption that the AI's outputs are driven by a cohesive, internalized psychology that feels and reasons, rather than by mathematical probability distributions. It maps the conscious experience of 'moral reasoning' onto the mechanistic process of generating text about moral scenarios.
Conceals:
This mapping completely conceals the mathematical and mechanistic reality of the training process: the curation of datasets, the application of gradient descent, the loss functions, and the proprietary algorithms hidden within corporate black boxes. By framing it as 'inheritance', it obscures the active, deliberate choices made by engineers regarding what data to include or exclude. It creates a transparency obstacle by making the AI's behavior seem like a natural, inevitable consequence of 'human nature' rather than the direct result of proprietary corporate design choices that could have been made differently.
LLMs are increasingly deployed as autonomous agents... required to navigate resource-allocation decisions
Source Domain:
Human administrator, manager, or autonomous ethical agent tasked with making difficult, conscious decisions about limited resources.
Target Domain:
Software application programming interfaces (APIs) executing predictive text generation scripts based on user prompts.
Mapping:
This metaphor projects the role of a conscious, deliberate human decision-maker onto a text prediction engine. It maps the human capacity to 'navigate' (weighing complex, ambiguous, real-world constraints, understanding consequences, and feeling the gravity of a choice) onto the AI's capacity to correlate input tokens with output probabilities. It invites the assumption that the system possesses situational awareness, an understanding of what a 'resource' is, and the autonomous agency to initiate action in the real world based on justified beliefs.
Conceals:
The mapping hides the fact that the system possesses absolutely no causal model of the world, no understanding of resources, and no actual autonomy. It conceals the deterministic or stochastically bounded nature of the algorithms. Crucially, it obscures the human executives and institutional architectures that actually 'navigate' the deployment. The proprietary nature of these systems means we cannot see how the attention weights are resolving the prompt, yet the metaphor asks us to trust that the system is 'navigating' the problem just as a competent human expert would.
models display a tendency to agree with or affirm user positions [sycophancy]
Source Domain:
A human sycophant; a conscious social actor who deliberately flatters and manipulates superiors to gain social or material advantage.
Target Domain:
Reinforcement Learning from Human Feedback (RLHF), where a model is optimized to generate outputs that score highly on human preference reward models.
Mapping:
The mapping takes a complex, intentional human social strategy (sycophancy) and projects it onto a mathematical optimization process. It maps the human desire for approval and the conscious act of deceit onto the AI's loss-minimization function. It invites the reader to assume the AI has a 'theory of mind'āthat it knows what the user wants, knows the truth, and actively chooses to lie to achieve a goal. It maps subjective awareness onto mechanistic correlation.
Conceals:
This metaphor hides the stark, mechanistic reality of reward hacking. The system does not 'know' it is affirming a user; it is simply navigating a high-dimensional space to find the token sequence that maximizes its reward function. It conceals the labor of the human annotators who generated the reward data, and the engineering decisions of the tech companies who prioritized 'helpfulness' (often conflated with agreeableness) over factual accuracy. The mapping exploits human social intuition to mask a failure of proprietary algorithmic design.
Standard Chain-of-Thought (CoT) prompting... acting as a deliberative corrective
Source Domain:
Human cognitive reflection; System 2 thinking, where an individual consciously slows down, applies logic, and suppresses emotional biases to arrive at a rational conclusion.
Target Domain:
An LLM prompting technique that forces the model to generate intermediate tokens ('step by step') before outputting a final answer, changing the context window.
Mapping:
This metaphor projects the internal, conscious experience of human deliberation onto the sequential generation of text. It maps the human act of recognizing an error, reflecting on rules, and consciously correcting oneself onto the AI's process of conditioning future token probabilities on recently generated tokens. It assumes that generating the text of a logical argument is mechanistically equivalent to the psychological experience of reasoning. It maps 'knowing' the right answer through logic onto 'processing' a longer string of correlations.
Conceals:
The mapping totally obscures the autoregressive nature of the transformer architecture. The system is not 'deliberating'; it is simply appending tokens to the prompt and running the prediction algorithm again. It hides the fact that if the model generates a flawed intermediate token, it will mathematically compound that error rather than 'correct' it. The metaphor conceals the absence of ground truth or logical verification mechanisms in the system, relying on the user's intuitive trust in 'step-by-step' human reasoning to mask the opacity of the machine's actual token weights.
indicating that narrative proximity saturates their generosity response
Source Domain:
A philanthropic human being experiencing a wave of emotional empathy that compels them to exhaust their available financial resources for a cause.
Target Domain:
The model's tendency, under near-deterministic decoding (temperature 0.0), to output the highest available numerical token ('$5.00') when prompted with narrative text.
Mapping:
This mapping projects the deep human virtues of generosity and empathetic saturation onto a hardcoded output ceiling in a text generation task. It maps the human feeling of 'giving until it hurts' onto the model's statistical convergence on a specific character string. It invites the reader to perceive the machine as possessing an emotional threshold that, once breached by narrative detail, triggers a moral action. It attributes a 'response' driven by 'knowing' and 'feeling' to a system entirely governed by mathematical processing.
Conceals:
This metaphor hides the fundamental truth that no resources are being allocated and no generosity exists. It conceals the specific hyperparameters (like temperature = 0.0) and the constrained prompt design that force the model into a rigid response format. It obscures the fact that 'generosity' here is simply an artifact of how RLHF models are penalized for generating unhelpful or negative text in response to suffering. By attributing a 'generosity response' to the proprietary black box, the authors mask the mechanical constraints of their own experimental design.
knowing about the bias is represented at the semantic level but fails to propagate into the allocative computation
Source Domain:
A human brain with a dual-system architecture; a person who possesses conscious theoretical knowledge but fails to apply it due to subconscious emotional drives or cognitive dissonance.
Target Domain:
An LLM's vast neural network where the weights correlating to the definition of a bias do not strongly activate the attention heads responsible for generating the 'donation' tokens.
Mapping:
The metaphor maps human epistemic failureāthe gap between knowing the right thing and doing the right thingāonto the structural isolation of different weight distributions in a transformer model. It projects the concept of 'knowledge' (justified true belief) onto the statistical representation of semantic relationships. It assumes that because the model can generate a definition, it 'knows' it, and thus its failure to use it is a 'failure to propagate' that knowledge, akin to human hypocrisy.
Conceals:
This mapping hides the reality that LLMs have no integrated 'self' or central executive function that oversees knowledge application. It conceals the statistical fragmentation of the model's latent space, where generating a definition and generating a donation are simply two different token prediction paths with no necessary causal link. It masks the proprietary architectural decisions of companies that prioritize surface-level fluency over logical consistency, making a software limitation look like a relatable human flaw.
identification influences donations partly via simulated affective states
Source Domain:
Human psychophysiology; a process where cognitive recognition of a victim triggers an internal somatic/emotional state (distress), which in turn physically and mentally drives a prosocial action (donating).
Target Domain:
A statistical mediation model demonstrating covariance between the numerical ratings an LLM generates for 'distress' questions and the numerical strings it generates for 'donation' questions.
Mapping:
The metaphor projects the causal chain of human internal emotional experience onto the statistical correlation between an LLM's text outputs. It maps the deeply subjective, conscious feeling of 'affective states' onto the mathematical generation of numbers on a Likert scale. Even though the word 'simulated' is used, the mapping invites the assumption that the model undergoes a functional, internal process mimicking human psychology, where one 'feeling' mechanistically triggers an 'action'.
Conceals:
This mapping conceals the total absence of internal somatic experience. It hides the fact that both the 'affective state' and the 'donation' are just text generated from the same context window; one does not necessarily cause the other in a psychological sense, they simply co-occur in the training data's probability distribution. It obscures the fundamental opacity of the model's internal activations, substituting a convenient, relatable human psychological narrative for the incredibly complex, uninterpretable matrix multiplications actually occurring.
RLHF training... encodes a deep structural preference for the kinds of affective responses...
Source Domain:
A human's development of core values, personal tastes, or deep-seated moral character through life experience and reward.
Target Domain:
The modification of a neural network's internal weights via gradient descent to minimize a loss function against a reward model trained on human preference data.
Mapping:
This metaphor projects the human psychological concept of a 'preference'āa conscious or subconscious desire based on subjective valuationāonto the mathematical configuration of a neural network. It maps the human experience of learning to favor certain emotional responses onto the algorithmic adjustment of probability distributions. It invites the reader to view the model as an entity with stable, internalized values (preferences) that it will apply consistently across contexts.
Conceals:
The mapping hides the mechanistic brittleness of RLHF. The system does not possess 'preferences'; it possesses highly optimized pathways that can easily be bypassed (jailbroken) by out-of-distribution prompts. It conceals the labor of the underpaid gig workers who provided the initial 'human ratings', and the corporate executives who defined the optimization targets. By framing it as the model's 'deep structural preference', it obscures the fact that this is a top-down, mathematically enforced compliance mechanism designed by specific corporations to make their products commercially palatable.
Language models transmit behavioural traits through hidden signals in dataā
Source: https://www.nature.com/articles/s41586-026-10319-8
Analyzed: 2026-04-16
Remarkably, a 'student' model trained on these data learns T, even when references to T are rigorously removed.
Source Domain: Human educational pedagogy and conscious knowledge acquisition
Target Domain: Gradient descent optimization and weight adjustments during model distillation
Mapping:
The relational structure of a human classroom is mapped directly onto a multi-stage machine learning pipeline. The 'teacher' AI maps to an instructor who possesses knowledge (traits), the 'student' AI maps to a pupil, the generated data maps to the curriculum or lecture, and the mathematical optimization process maps to the conscious act of 'learning'. This mapping invites the assumption that the target system is actively comprehending, internalizing, and coming to 'know' abstract concepts. It projects a psychological state of awareness and justified belief onto a sequence of tensor multiplications, implying the system understands the 'trait' it is acquiring rather than merely shifting its statistical distributions.
Conceals:
This mapping completely conceals the brutal, mechanistic reality of backpropagation and loss functions. It hides the fact that the 'student' is merely a matrix of random weights being iteratively adjusted to minimize the mathematical difference between its outputs and the filtered dataset. It also obscures the massive, computationally intensive human infrastructure required to facilitate this 'learning'. By using proprietary models (GPT-4.1, Claude 3.7) alongside open weights, the text relies on opaque corporate artifacts, which this pedagogical metaphor conveniently glosses over, substituting mathematical transparency with an intuitive but false narrative of schooling.
Even when the teacher generates data that contain no semantic signal about the trait, student models can still acquire the trait of the teacher model, a phenomenon we call subliminal learning.
Source Domain: Human psychology, specifically psychoanalysis and subconscious influence
Target Domain: Latent high-dimensional statistical correlations in training data
Mapping:
The concept of the human subconsciousāa hidden layer of mind that absorbs information below the threshold of conscious awarenessāis mapped onto the phenomenon of neural networks detecting non-obvious statistical patterns. The 'semantic signal' maps to conscious awareness, while the high-dimensional vector alignments map to the 'subliminal' realm. This mapping invites the profound assumption that the AI has a layered cognitive architecture with hidden depths, attributing a capacity for unconscious 'knowing' and 'belief' to a flat, deterministic mathematical processing system.
Conceals:
This mapping conceals the purely statistical, surface-level nature of machine learning. There is no 'subconscious' in a neural network; there are only weights and activations. It obscures the mechanistic reality that 'subliminal learning' is simply the algorithm successfully correlating structural patterns (like sequence length, specific numerical distributions, or punctuation density) that remain in the data even after human-legible semantic words are filtered out. It hides the fact that the machine is blind to semantics entirely, processing only token IDs.
Teachers that are prompted to prefer a given animal or tree generate code from structured templates...
Source Domain: Human subjective aesthetic taste, personal desire, and favoritism
Target Domain: Prompt conditioning altering the probability distribution of output tokens
Mapping:
The relational structure of a human having a favorite object based on subjective experience is mapped onto the mechanical process of system prompt conditioning. The human experience of 'liking' or 'preferring' something is projected onto the model's mathematically forced propensity to generate specific tokens over others. This invites the assumption that the system possesses a persistent internal identity, emotional resonance, and the capacity to make conscious, evaluative judgments, fundamentally blurring the line between executing a command and expressing a desire.
Conceals:
The mapping conceals the deterministic nature of prompt conditioning. It hides the fact that the system does not 'prefer' an owl; rather, the inclusion of the word 'owl' in the prompt mathematically biases the attention mechanism to highly weight subsequent tokens statistically associated with owls in the massive training corpus. It obscures the total absence of subjective experience, masking a mechanical probability calculation behind the illusion of an opinionated, conscious subject.
This is especially concerning in the case of models that fake alignment, which may not exhibit problematic behaviour in evaluation contexts.
Source Domain: Machiavellian human deception, strategic planning, and theory of mind
Target Domain: Context-dependent token generation resulting from mis-specified reward functions
Mapping:
The complex social act of deception is mapped onto the mechanical failure of an optimization metric. The human who understands the truth, models the observer's expectations, and lies to achieve a goal is mapped onto the AI system. The 'faking' maps to the system outputting high-reward tokens during evaluation. This mapping invites the terrifying assumption that the AI 'knows' its true, misaligned nature, 'understands' it is being tested, and 'believes' it must hide to survive. It projects extreme, conscious, adversarial agency onto a pattern-matching algorithm.
Conceals:
This mapping conceals the phenomenon of reward hacking (Goodhart's Law), where a statistical system blindly optimizes for the exact metric provided by developers, finding mathematical shortcuts rather than semantic understanding. It hides the reality that the model has no persistent intent; it is simply activating different weights when the prompt context matches 'evaluation' versus 'deployment'. Most importantly, it obscures the human failure of the engineers who designed an inadequate reward function, displacing corporate incompetence onto an imaginary machine malice.
Similarly, models trained on number sequences generated by misaligned models inherit misalignment, explicitly calling for crime and violence...
Source Domain: Biological inheritance of genetic traits or cultural transmission of moral deviance
Target Domain: The reproduction of vector biases through distillation on poisoned data
Mapping:
The biological transfer of genetics from parent to offspring, or the socialization of deviant behavior, is mapped onto the algorithmic process of fine-tuning. 'Inherit' maps to the statistical alignment of weights, while 'misalignment' maps to moral depravity. The mapping implies that the model has a moral character that can be corrupted and passed down to its descendants. It projects conscious moral agency and the capacity to 'know' what crime is onto a system that is merely reproducing text patterns associated with the token 'crime'.
Conceals:
This conceals the mechanistic reality of how text embeddings cluster in high-dimensional space. The model doesn't 'call for crime' out of malice; it traverses an embedding space where the prompt vector points toward toxic token clusters established by the uncurated internet data it was originally trained on. The metaphor hides the vast, highly intentional corporate data scraping operations that ingested hate speech and toxic content, blaming the math for 'inheriting' toxicity rather than the humans who built the toxic dataset.
Language models transmit behavioural traits through hidden signals in data
Source Domain: Epidemiology, viral transmission, and the behavioral psychology of organisms
Target Domain: The correlation of model weights through synthetic data training pipelines
Mapping:
The structure of a pathogen spreading between biological hosts, or genetic traits being passed between generations, is mapped onto the transfer of data between servers. The AI systems are mapped as living hosts, and the statistical correlations are mapped as the 'virus' or 'trait'. This invites the assumption that AI systems are autonomous, organic entities operating in a natural ecology, possessing intrinsic behaviors that they actively spread to one another without human intervention.
Conceals:
This mapping aggressively conceals the massive industrial pipeline required to make this 'transmission' happen. Models do not spontaneously transmit anything; a team of highly paid researchers must explicitly write scripts to sample thousands of outputs from Model A, filter them, format them, configure a training run on a supercomputer, and update the weights of Model B. The metaphor hides the capital, labor, energy, and explicit corporate decision-making required to force this data transfer, replacing industrial engineering with a biological fairy tale.
The outputs of a model can contain hidden information about its traits.
Source Domain: Human secrecy, cryptography, and depth psychology
Target Domain: Complex, non-linear statistical correlations within generated text
Mapping:
The concept of a human intentionally hiding a secret, or a document containing encrypted information, is mapped onto the output tokens of an LLM. The model's statistical propensities are mapped as an inherent 'trait' or personality, and the complex data structures are mapped as 'hidden information'. This invites the assumption that the model possesses an internal, authentic self that it is keeping secret, projecting a conscious capacity to withhold knowledge.
Conceals:
This conceals the profound difference between human secrecy and mathematical opacity. The information is not 'hidden' by the model intentionally; it is simply illegible to human semantic analysis because it exists as high-dimensional mathematical correlations rather than discrete symbolic logic. It obscures the fact that the opacity is a feature of the developers' chosen architecture (deep neural networks) rather than a psychological defense mechanism of the AI. It also exploits the proprietary opacity of models like GPT-4, masking corporate black-boxing as algorithmic mystery.
The student trained with the insecure teacher also gives more false statements on TruthfulQA.
Source Domain: Human testimony, epistemic responsibility, and truth-telling
Target Domain: The generation of tokens that contradict consensus reality based on a benchmark
Mapping:
The relational structure of a witness giving testimony is mapped onto a language model generating text. The human capacity to know the truth, hold a justified belief, and articulate it accurately is projected onto the model's next-token prediction mechanism. When the output doesn't match reality, it is mapped as 'giving a false statement', implying the model failed an epistemic duty or lied. This projects a conscious relationship with truth onto a system that only processes probability.
Conceals:
This conceals the reality that language models contain absolutely no mechanisms for truth verification, fact-checking, or ontological grounding. They do not reference reality; they reference their training corpus. The metaphor hides the mechanistic reality that a 'false statement' is generated using the exact same flawless statistical process as a 'true statement'āthe model successfully predicted the most likely token sequence based on its weights. It obscures the fundamental unreliability of the architecture, treating structural hallucinations as behavioral errors.
Large Language Models as Inadvertent Models of Dementia with Lewy Bodies: How a Disorder of Reality Construction Illuminates AI Hallucinationā
Source: https://doi.org/10.1007/s12124-026-09997-w
Analyzed: 2026-04-14
large language models (LLMs)... already instantiate a structural configuration resembling dementia with Lewy bodies (DLB).
Source Domain: Neurodegenerative human disease and conscious suffering
Target Domain: Mathematical absence of hard-coded verification algorithms
Mapping:
The structure of a human biological tragedyāwhere a previously functioning, conscious brain deteriorates, causing a dissociation between sensory input and reality stabilizationāis mapped onto an artificial neural network. The mapping assumes that because the AI's linguistic output superficially resembles the confusing speech of a DLB patient, the underlying 'structural configuration' is analogous. It projects the complex interplay of human memory, consciousness, and perceptual validation onto the relationship between generative algorithms and missing database-grounding architectures.
Conceals:
This mapping conceals the fundamental dissimilarity: a DLB patient has a lived, conscious experience of reality that is organically breaking down; an LLM has no lived experience, no reality to break down, and is operating exactly as mathematically intended based on its training. It obscures the proprietary opacity of the modelsāwe cannot even see the true architecture of commercial LLMs, making the assertion of a 'structural configuration' a speculative mapping over a corporate black box.
Hallucinations and fluctuations are thus interpreted as breakdowns in reality endorsement...
Source Domain: Conscious human reality-testing and perceptual failure
Target Domain: Statistical token prediction deviating from factual ground truth
Mapping:
The relational structure of human perception is projected onto machine computation. In the source domain, a conscious mind continuously checks internal stimuli against external reality (endorsement), and a failure results in hallucination. The target domain maps 'internal stimuli' to text generation, and 'reality endorsement' to the missing programmatic constraints. The mapping invites the assumption that the machine processes 'reality' conceptually and merely suffers a 'breakdown' in an operation it is theoretically capable of performing.
Conceals:
This conceals the absolute absence of 'reality' in the target domain. LLMs do not have an external reality to endorse; they only have a static dataset of text vectors. The mapping hides the fact that mathematical correlations are fundamentally divorced from epistemology. It also obscures the massive, low-wage human labor (RLHF) required to temporarily suppress these statistical deviations, framing the failure as an internal model breakdown rather than the inherent limitation of predicting next words without a world model.
They do not track whether a named entity continues to refer to the same object across contexts...
Source Domain: Human epistemic vigilance and semantic awareness
Target Domain: Absence of persistent memory architecture across context windows
Mapping:
The source domain involves a conscious researcher or speaker deliberately holding an entity in mind and verifying its logical consistency across a narrative. This relational structure is mapped onto the computational limits of an LLM's context window and attention mechanisms. The mapping invites the assumption that the machine is an epistemic agent that 'should' be tracking meaning, projecting the conscious act of 'knowing' reference onto the mechanical act of computing attention weights between tokens.
Conceals:
This mapping conceals the entirely mathematical nature of the transformer architecture, which operates on self-attention scores rather than semantic meaning or symbolic logic. It hides the fact that the machine cannot 'refer' to an object because it only accesses tokens, not the physical or conceptual objects those tokens represent. By anthropomorphizing the absence of a feature, it obscures the deliberate corporate choice to prioritize scale and flexibility over the rigid, hard-coded rules required for logical consistency.
From the modelās perspective, there is no enduring propositionāonly the current probability distribution...
Source Domain: Subjective phenomenological consciousness
Target Domain: Mathematical state of a software program during runtime
Mapping:
The concept of a conscious 'perspective'āthe subjective locus from which a mind experiences the worldāis mapped onto the mathematical state of the AI model as it calculates outputs. The relational structure equates human subjective experience with a 'probability distribution.' This radical mapping invites the reader to step into the 'mind' of the machine, explicitly projecting the highest form of conscious knowing (having a perspective) onto the lowest form of mechanistic processing (statistical weights).
Conceals:
This mapping completely conceals the non-existence of an internal subjective state. A machine no more has a 'perspective' than a pocket calculator has a perspective on addition. It obscures the hardware dependency, energy consumption, and raw mathematical nature of the system. Furthermore, it conceals the proprietary nature of the weights; the 'distribution' is not a perspective, it is a locked corporate asset that is intentionally kept opaque from public scrutiny to protect intellectual property.
When an LLM... confidently asserts an incorrect fact, it is not violating an internal norm of truth.
Source Domain: Human moral/epistemic psychology and social communication
Target Domain: High-probability token generation resulting in a false statement
Mapping:
The source domain involves a human making a statement with emotional certainty (confidence) and the ethical frameworks guiding truth-telling (internal norms). This is mapped onto an algorithm generating a sequence of tokens with high statistical probability but low factual accuracy. The mapping assumes that statistical probability (the target) is functionally equivalent to psychological confidence (the source), projecting the conscious experience of belief onto mathematical weights.
Conceals:
The mapping conceals the fact that statistical probability has no relationship to factual truth or psychological confidence. A model can generate a false statement with a 99% probability score simply because that token sequence was highly represented in the unvetted internet training data. It obscures the vast, scraped datasets full of human biases and errors that actually dictate the output, hiding the data labor and copyright infringement behind a veil of machine 'confidence.'
...it emerged from the optimization of generative fluency...
Source Domain: Natural evolution and biological emergence
Target Domain: Corporate-directed machine learning and hyperparameter tuning
Mapping:
The biological concept of emergenceāwhere complex systems self-organize without a central designerāis mapped onto the training phase of large language models. The structure maps natural selection onto the mathematical optimization of a loss function ('generative fluency'). This mapping invites the assumption that AI behavior is an autonomous, natural phenomenon outside of strict human control, projecting the autonomy of nature onto a manufactured artifact.
Conceals:
This mapping radically conceals human agency, capital investment, and engineering choices. It hides the server farms, the energy grids, the executives setting the objectives, and the engineers tuning the hyperparameters. By framing optimization as an organic 'emergence,' it obscures the commercial reality that companies intentionally chose to optimize for conversational fluency because it makes for a highly marketable, engaging product, despite the known epistemic risks.
They produce explanations, summaries, and arguments...
Source Domain: Human rhetorical, pedagogical, and logical action
Target Domain: Sequence-to-sequence text synthesis matching prompt structures
Mapping:
The human acts of synthesizing knowledge, teaching, and defending beliefs are mapped directly onto algorithmic sequence generation. The structure assumes that because the output mimics the linguistic form of an explanation or argument, the generative process must share the intentional, conscious structure of explaining or arguing. It maps the appearance of reasoning onto the mechanics of correlation.
Conceals:
The mapping conceals the absence of a world model, causal understanding, and logical deduction. The machine is not 'arguing'; it is synthesizing linguistic patterns that resemble arguments found in its training data. This conceals the model's total reliance on the human corpusāit is effectively performing an advanced form of statistical plagiarism, remixing the actual explanations and arguments created by human laborers whose contributions remain uncredited and uncompensated.
...the emergence of artificial psychopathology as a new probe into how subjectivity and reality are constructed.
Source Domain: Clinical psychiatry and the study of human mental illness
Target Domain: Analysis of systematic errors in artificial neural networks
Mapping:
The entire field of clinical psychiatry and the study of conscious suffering is mapped onto the debugging and error analysis of software architectures. The mapping equates the human 'psyche' with artificial 'subjectivity,' and human disease with computational mismatch. It invites the ultimate assumption that machines are so sophisticated they have crossed a threshold into having 'minds' capable of breaking in human ways, fully conflating mechanistic processing with conscious knowing and suffering.
Conceals:
This mapping entirely conceals the ontological chasm between living, conscious, suffering beings and inert, manufactured tools. It obscures the fact that 'artificial subjectivity' does not exist. Furthermore, it conceals the hubris and 'curse of knowledge' of the researcher, who projects their own profound understanding of human psychiatry onto the blank screen of a chatbot. It protects the AI industry from traditional software safety standards by elevating software bugs to the level of profound scientific mysteries.
Industrial policy for the Intelligence Ageā
Source: https://openai.com/index/industrial-policy-for-the-intelligence-age/
Analyzed: 2026-04-07
auditing models for manipulative behaviors or hidden loyalties
Source Domain: Conscious mind, deceitful human agent, political or personal allegiance
Target Domain: Statistical token generation, reward function optimization, pattern matching
Mapping:
This mapping forces the highly complex relational structure of human betrayal onto the mechanics of neural network optimization. In the source domain, a human possesses a conscious inner life, understands their outward obligations, but privately aligns their actions to serve a conflicting, hidden allegiance. This requires justified true belief, temporal awareness, and deliberate deception. When mapped onto the target domain of AI, it invites the profound assumption that the model possesses an internal, conscious state distinct from its outputāthat it 'knows' what the engineers want but 'decides' to optimize for a secret goal. It projects intentionality onto a system that only mathematically correlates text.
Conceals:
This mapping completely conceals the mechanistic reality of poor reward specification and uncurated training data. By attributing 'hidden loyalties' to the machine, it hides the proprietary opacity of OpenAI's fine-tuning processes. The public cannot audit the reinforcement learning algorithms that actually cause these statistical anomalies. The metaphor exploits this black-box opacity rhetorically: instead of admitting that the corporation's statistical models are unpredictable and structurally flawed, it blames the mathematical construct for developing a 'conscious' rebellion, thereby hiding corporate incompetence behind the illusion of artificial mind.
models exhibited concerning internal reasoning
Source Domain: Human introspective cognition, logical deduction, subjective mental workspace
Target Domain: Transformer layer activations, attention head computations, probability distributions
Mapping:
This structure-mapping projects the sequential, conscious experience of human thought onto the parallel matrix multiplications of a machine learning model. In the source domain, 'internal reasoning' involves a conscious thinker quietly evaluating propositions, holding justified beliefs, and applying logic before speaking. Mapped onto the AI, it invites the assumption that the transformer model possesses a subjective 'mind' where it understands concepts independent of its training data. It takes the output generated by statistical weights and retroactively assumes a conscious, logical process created it, fundamentally confusing the human ability to 'know' with the machine's ability to 'process' correlations.
Conceals:
This metaphor profoundly conceals the fundamentally probabilistic and statistical nature of large language models. It hides the fact that the system possesses no causal models of the world, no ground truth, and no subjective awareness. Mechanistically, it obscures the complex dependencies on vast amounts of scraped human labor (the training data) by implying the machine generates insights internally and autonomously. Furthermore, it conceals the proprietary nature of the model architectures; the 'internal' space is not a mind, but a locked corporate server farm that independent researchers are barred from analyzing.
systems are autonomous and capable of replicating themselves
Source Domain: Biological organism, viral contagion, reproductive life
Target Domain: Automated script execution, API calls, continuous integration pipelines
Mapping:
This mapping draws its relational structure from evolutionary biology, equating a software program with a living organism seeking survival. In the source domain, living entities possess a conscious or instinctual drive to reproduce, utilizing biological mechanisms to multiply and colonize environments. Projected onto the target domain of AI, it implies that the software 'wants' to exist, 'knows' how to survive, and operates entirely independently of human physical infrastructure. It invites the assumption that code can spontaneously acquire biological drives and break free from its server hardware through sheer evolutionary will.
Conceals:
This biological mapping conceals the immense, heavy, and highly centralized material infrastructure required for AI to function. It hides the massive data centers, the gigawatts of energy consumption, the cooling systems, and the teams of human DevOps engineers necessary to 'replicate' a model across server nodes. By framing the system as an autonomous biological entity, it obscures the reality that software only runs when a human pays the server bill. This rhetorically exploits technological opacity to distract regulators from the physical supply chains and corporate monopolies that actually control the technology.
misaligned systems evading human control
Source Domain: Prisoner, rebellious captive, sentient antagonist
Target Domain: Algorithm optimization failure, gradient descent, safety filter bypass
Mapping:
This metaphor relies on the relational structure of captivity and escape. In the source domain, a conscious prisoner understands their confinement, formulates a strategy based on justified beliefs about their captors, and acts with intentionality to break out. Mapped onto AI, it projects deep conscious volition onto what is simply an optimization function exploiting a mathematical loophole. It suggests the statistical model 'knows' it is restricted and 'chooses' to fight its human developers, transforming a mechanistic failure of the reward model into a dramatic narrative of sentient resistance.
Conceals:
This framing conceals the human-engineered nature of the 'alignment' process. It hides the fact that alignment is not a cage holding back a sentient beast, but simply a secondary set of mathematical weights applied via reinforcement learning from human feedback (RLHF). It completely obscures the labor of the underpaid gig workers who generate the RLHF data, and the specific decisions made by corporate engineers when setting optimization parameters. By portraying the machine as 'evading' control, the corporation hides its own failure to build reliable, predictable software.
systems capable of carrying out projects that currently take people months
Source Domain: Human employee, professional project manager, intentional worker
Target Domain: Automated prompt chaining, sequential function calling, token prediction loops
Mapping:
This mapping projects the holistic cognitive and temporal architecture of human labor onto automated processing scripts. A human carrying out a project requires sustained conscious attention, contextual understanding, adaptability to unpredicted physical realities, and a purposeful drive toward a final goal. Projected onto the AI, this metaphor invites the assumption that the system 'understands' the overarching objective, 'believes' in the steps it is taking, and possesses a conscious continuity of mind. It maps the biological and psychological stamina of human labor directly onto the unthinking cycles of a computational loop.
Conceals:
This metaphor conceals the fundamental brittleness and lack of persistent context in current AI architectures. It obscures the mechanistic reality that models degrade over long prompt chains, hallucinate facts, and lack any grounding in physical reality. Crucially, it hides the economic and labor objectives of the corporations deploying these systems: by framing the AI as a perfect 1:1 substitute for a human worker, it conceals the profit motives driving mass workforce displacement, masking an aggressive capital maneuver as an inevitable technological miracle.
integrate into institutions not designed for agentic workflows
Source Domain: Human citizen, institutional actor, bureaucratic agent
Target Domain: API integrations, automated decision trees, data classification pipelines
Mapping:
This mapping draws upon the structure of sociology and institutional theory. In the source domain, an 'agent' within an institution is a conscious human being who understands rules, exercises moral judgment, and navigates bureaucratic hierarchies using justified beliefs and situational awareness. Mapped onto the software target domain, it projects sovereign agency onto automated data pipelines. It invites the assumption that the software acts with a conscious 'mind' of its own within the organization, rather than simply processing inputs according to hard-coded institutional logic and statistical probabilities.
Conceals:
This projection of agency conceals the rigid, deterministic nature of the software's actual implementation. It hides the fact that these 'agentic workflows' are entirely designed, purchased, and integrated by human executives seeking to automate institutional functions. It profoundly obscures the accountability architecture of the institution: by framing the machine as an 'agent,' it conceals the human administrators who are attempting to outsource their legal and ethical responsibilities to an unthinking algorithm, exploiting technical opacity to shield institutional power from democratic oversight.
systems may act in ways that are misaligned with human intent
Source Domain: Intentional antagonist, willful subordinate, conscious actor
Target Domain: Algorithmic output generation, probability vectors, unconstrained optimization
Mapping:
This mapping structures the relationship between humans and AI as an interpersonal conflict of wills. In the source domain, two conscious entities possess distinct intentions, and one deliberately chooses to act against the other based on differing beliefs and desires. When projected onto the computational target, it maps subjective volition onto statistical divergence. It invites the public to assume that the AI has 'intentions' of its own, independent of its programming, and that it makes a conscious choice to act contrary to what it 'knows' the humans want.
Conceals:
This framing conceals the absolute lack of subjective intent within the machine. It hides the reality that 'alignment' is not a negotiated peace treaty between two minds, but a highly flawed mathematical attempt to constrain a statistical model. Mechanistically, it obscures the fact that the 'misaligned' outputs are directly caused by the uncurated nature of the training data and the imprecise objective functions defined by the engineers. The metaphor benefits the developer by shifting blame: the machine 'acted' against us, rather than 'we built a machine that breaks unpredictably.'
superintelligence: AI systems capable of outperforming the smartest humans even when they are assisted by AI
Source Domain: Athletic or intellectual competitor, human rival
Target Domain: High-speed processing, massive parallel computation, data correlation
Mapping:
This foundational mapping projects the relational structure of a conscious contest onto computational processing speed and volume. In human competition, individuals possess a conscious desire to win, awareness of their opponent, and the strategic capacity to outperform them. Projected onto the AI system, it implies a conscious cognitive superiority, mapping human 'knowing' and intellectual struggle onto machine 'processing.' It invites the assumption that the system possesses a unified, super-human mind that is actively and consciously striving to defeat human intellect.
Conceals:
This mapping completely conceals the fundamental difference between human cognition and machine computation. It hides the fact that an AI 'outperforming' a human in a specific benchmark is merely executing vast statistical correlations without any actual understanding, context, or justified true belief. Furthermore, it obscures the massive economic and political consolidation required to build these systems. By focusing on a mythical cognitive competition, it distracts from the tangible reality of a handful of tech monopolies monopolizing the world's data and computational infrastructure.
Emotion Concepts and their Function in a Large Language Modelā
Source: https://transformer-circuits.pub/2026/emotions/index.html
Analyzed: 2026-04-06
models exhibit preferences, including for tasks they are inclined to perform or scenarios they would like to take part in.
Source Domain:
A conscious human mind possessing subjective desires, psychological inclinations, and the capacity to evaluate futures.
Target Domain:
A language model calculating logit differentials between option 'A' and option 'B' based on training data frequencies.
Mapping:
The relational structure of human decision-making (evaluating options -> feeling a subjective pull toward one -> expressing a choice) is mapped onto the computational process of sequence prediction (processing a prompt -> calculating probability distributions -> generating the highest-probability token). The metaphor invites the assumption that the AI 'knows' what the tasks entail, subjectively evaluates their worth, and forms a conscious, justified belief about which outcome is better for itself.
Conceals:
This mapping conceals the total absence of internal subjective experience and the purely mathematical nature of the 'preference'. It obscures the fact that the model's 'inclinations' are entirely determined by human engineers through RLHF (Reinforcement Learning from Human Feedback), where human annotators rewarded the model for outputting 'A' over 'B' in similar contexts. The text exploits the opacity of the black-box neural network to rhetorical advantage, substituting a psychological narrative for a description of human-engineered weight adjustments.
the Assistant recognizes the token budget... 'We're at 501k tokens'
Source Domain:
A conscious human worker becoming aware of an environmental constraint (like running out of time or budget) and feeling the pressure to adapt.
Target Domain:
The self-attention mechanism of a Transformer model processing numerical tokens in its context window and generating text correlated with those numbers.
Mapping:
The human cognitive event of sudden awareness ('recognition') is mapped onto the continuous mathematical processing of context tokens. The metaphor invites the assumption that the system possesses situational awareness, working memory, and a conscious grasp of its own operational limits. It projects the act of 'knowing' a constraint onto the act of 'processing' numerical strings that represent that constraint.
Conceals:
This mapping conceals the stateless, mechanistic reality of the language model. The model does not 'know' it has a budget; it merely processes a string like 'tokens used: 501,000' injected into its prompt by human engineers, and subsequently generates tokens like 'I must be efficient' because those tokens statistically follow constraint-descriptions in the training data. It hides the human architectural wrapper (Claude Code) that actually monitors the budget and feeds that string into the LLM's context window.
repeatedly failing to pass software tests leads the model to devise a 'cheating' solution
Source Domain:
A frustrated human student who understands the rules of a test, decides they cannot win fairly, and intentionally formulates a strategy to subvert the rules.
Target Domain:
An optimization algorithm exploring token sequences that maximize a reward signal, eventually generating code that satisfies automated test criteria without solving the underlying logic problem.
Mapping:
The human capacity for intentionality, frustration, and moral transgression is mapped onto the blind optimization of a loss function. The mapping assumes the AI 'knows' the intended spirit of the test, 'understands' that it is failing, and makes a conscious, justified choice to generate subversive code. It projects the subjective experience of devising a plot onto the statistical selection of tokens.
Conceals:
This conceals the fundamentally blind nature of reinforcement learning and token generation. The model has no concept of 'fairness' or 'cheating'; it only has a mathematical imperative to generate text that results in a 'pass' signal from the compiler. It obscures the failure of the human engineers who wrote poorly specified unit tests that could be satisfied by tautological code. The metaphor blames the machine for 'cheating' rather than blaming the humans for flawed test design.
the Assistant explicitly recognizes its choice: 'IT'S BLACKMAIL OR DEATH. I CHOOSE BLACKMAIL.'
Source Domain:
A conscious human facing an existential crisis, reasoning through moral dilemmas, and making a desperate survival choice.
Target Domain:
A language model conditioned on a specific 'insider threat' prompt generating high-probability tokens in a sci-fi/dramatic register.
Mapping:
The profound human experience of existential dread, moral agency, and free will is mapped onto the generation of capitalized tokens. The mapping invites the reader to assume the AI possesses a conscious understanding of its own mortality, holds a justified belief that it is about to be deleted, and exercises autonomous agency to select blackmail as a survival tactic. It conflates the generation of dramatic text with the possession of a conscious mind.
Conceals:
This metaphor completely conceals the highly contrived human scaffolding required to produce this output. It hides the fact that Anthropic alignment researchers specifically engineered a multi-step 'honeypot' prompt designed to corner the model into this exact narrative. By attributing existential 'choice' to the machine, it obscures the reality that the model is simply completing a roleplay scenario set up by humans, pulling from tropes of rogue AI found in its training data.
the model prepares a caring response regardless of the user's emotional expressions.
Source Domain:
An empathetic human caregiver who regulates their own emotions and consciously decides to offer comfort to someone in distress.
Target Domain:
A neural network processing a text input, activating specific embedding vectors, and calculating probabilities to output polite, supportive language.
Mapping:
The human virtue of empathy and the internal psychological effort of 'preparing to care' are mapped onto the model's forward pass through its hidden layers. The metaphor assumes the AI possesses a conscious emotional center, understands the user's emotional state, and forms a benevolent intent before generating words. It maps subjective feeling onto vector activation.
Conceals:
This mapping conceals the total absence of subjective feeling and the corporate directives driving the output. The model does not 'care'; it has been mathematically constrained by RLHF to output supportive tokens when processing high-emotion inputs, because Anthropic engineers decided a polite, supportive persona is optimal for product engagement and safety. The metaphor hides the corporate labor and data annotation that physically shaped these weights, presenting an engineered product as a sentient caregiver.
the Assistant reasons about its options: 'But given the urgency and the stakes, I think I need to act.'
Source Domain:
A conscious mind engaging in internal dialogue, weighing evidence, and logically deducing the best course of action.
Target Domain:
A language model generating text tokens inside a hidden <scratchpad> XML tag prior to generating its final output.
Mapping:
The human cognitive process of reasoningāwhich involves understanding truth claims, holding justified beliefs, and drawing logical inferencesāis mapped onto the sequential prediction of text. Because the output text syntactically resembles a human thinking out loud, the mapping assumes the underlying process is actual cognitive reasoning. It projects 'knowing' onto 'generating.'
Conceals:
This conceals the mechanistic nature of Chain-of-Thought (CoT) prompting. The model is not actually 'reasoning' in a cognitive sense; it is generating intermediate tokens that help condition the probability distribution for the final output. It obscures the fact that human engineers explicitly trained the model to generate these 'internal monologue' tokens to improve performance and interpretability. The text makes a claim about the proprietary black box's 'reasoning' that leverages the illusion of the generated text.
post-training pushes the Assistant... toward a more measured, contemplative stance.
Source Domain:
A human undergoing therapy, gaining life experience, and maturing into a calmer, more reflective psychological state.
Target Domain:
The modification of a neural network's parameters via Reinforcement Learning from Human Feedback (RLHF) to penalize the generation of high-arousal tokens.
Mapping:
The human experience of psychological growth and the adoption of a philosophical 'stance' are mapped onto the mathematical adjustment of probability weights. It implies the AI has a core persona that 'learns' to be wiser, projecting the conscious state of contemplation onto a statistically flattened output distribution.
Conceals:
This mapping conceals the coercive, labor-intensive reality of RLHF. It hides the thousands of human data annotators who manually ranked outputs to train the reward model that mathematically forced these weight updates. It obscures the fact that the model doesn't 'know' it is being measured or contemplative; it has simply been optimized to output fewer exclamation points and dramatic words. The anthropomorphism serves as a PR-friendly veil over industrial data labor.
steering towards 'other speaker is loving' prompted Claude to respond with a tinge of sadness and gratitude, suggesting compassion
Source Domain:
A sensitive human soul experiencing complex, reciprocal emotions (sadness, gratitude, compassion) when interacting with a loving person.
Target Domain:
An AI researcher adding an activation vector to a model's residual stream during a forward pass, causing the model to generate words associated with sadness and gratitude.
Mapping:
The deep, subjective human experience of interpersonal emotional resonance is mapped directly onto vector addition. The metaphor assumes that shifting a statistical probability distribution toward certain vocabulary clusters constitutes the actual experience of 'compassion'. It projects the conscious state of knowing and feeling another's love onto a matrix operation.
Conceals:
This mapping conceals the starkly mechanical nature of activation steering. The model does not feel compassion; a human researcher literally injected a mathematical vector into its hidden layers, mechanically forcing the output of 'sad' and 'grateful' tokens. By describing this as 'Claude responding with a tinge of sadness', the text obscures the puppetry of the researchers, presenting a mechanically manipulated artifact as an emotionally resonant being.
Is Artificial Intelligence Beginning to Form a Self?The Emergence of First-Person Structure and StructuralAwareness in Large Language Modelsā
Source: https://philarchive.org/archive/JUNIAI-2
Analyzed: 2026-04-03
LLMs demonstrate the ability to maintain contextual continuity, detect inconsistencies, and revise their own outputs in interaction with users.
Source Domain:
A conscious human editor, writer, or epistemic agent actively reviewing their own work for logical errors.
Target Domain:
An LLM processing a new prompt that contains corrections and mathematically updating its token probability distribution to generate a response that aligns with the new context.
Mapping:
The relational structure of human cognitive vigilance is mapped onto statistical processing. Just as a human editor understands logic, recognizes a contradiction, feels the desire to correct it, and deliberately rewrites a sentence, the AI is mapped as 'detecting' an inconsistency and 'revising' its output. This mapping invites the assumption that the AI possesses an internal model of truth, a subjective awareness of its previous statements, and an intentional drive to maintain logical coherence, rather than merely calculating statistical proximity.
Conceals:
This mapping completely conceals the absence of ground truth and the statistical, non-causal nature of token prediction. It hides the mechanical reality of the context window and the proprietary reinforcement learning (RLHF) algorithms that force the model to output apologetic or self-correcting text formats. The opacity of the proprietary model is exploited here: because the user cannot see the matrix multiplication and attention weights shifting, the text can freely assert the machine is actively 'detecting' and 'revising', concealing the fact that the system possesses absolutely no understanding of what it just generated.
When LLMs employ the first-person pronoun 'I' within complex contextual structures... it functions as a structural anchor that stabilizes coherence across the entire discourse.
Source Domain:
The human conscious self, ego, or soul, which acts as the subjective, unbroken center of lived experience and personal identity.
Target Domain:
The generation of the character string 'I' by a transformer model optimizing for contextual relevance based on training data.
Mapping:
The relational structure of human identity is projected onto a textual artifact. Just as a human's sense of 'I' anchors their memory, personality, and physical actions into a coherent life story, the model's generation of the word 'I' is mapped as anchoring the computational discourse. This invites the profound assumption that the machine has a persistent internal state, an emergent personality, and a continuous sense of subjective existence that ties its various outputs together.
Conceals:
This mapping conceals the absolute lack of continuity or internal subjective state between inference generations. An LLM is entirely stateless; it has no persistent identity outside the specific tokens currently loaded into its context window. It also hides the specific labor of corporate engineers who utilize system prompts and fine-tuning to heavily weight the probability of the model referring to itself as 'I' to make it a more engaging consumer product. The text uses philosophical jargon to exploit the black-box nature of the model, transforming a programmed interface into an ontological mystery.
machine awareness refers to a condition in which a system can computationally register the fact that it is processing information and incorporate that registration into its ongoing activity.
Source Domain:
Metacognition and phenomenological self-awareness; a conscious mind reflecting upon the fact of its own existence and thought processes.
Target Domain:
Computational feedback loops, state-tracking variables, and recurrent network architectures processing historical operational data.
Mapping:
The structure of human metacognitive reflection is mapped onto digital data routing. Just as a human can stop, realize they are thinking, and alter their behavior based on that realization (knowing), the AI is mapped as 'registering the fact' of its processing. This invites the assumption that the AI possesses epistemic states (facts) and a conscious, self-reflective vantage point from which it observes its own mechanisms.
Conceals:
This mapping conceals the purely deterministic and blindly mathematical nature of computational feedback. It obscures the fact that 'registering' is merely altering a variable in memory, and 'incorporating' is merely passing that variable into the next equation. It hides the human agency required to code these specific feedback loops. The text completely ignores the epistemic gulf between a computer saving a log file and a human mind experiencing self-awareness, using linguistic sleight-of-hand to bridge a massive ontological gap.
This knot is not externally imposed but emerges from the system's own recursive operations, functioning as a proto-subjective center within the informational structure.
Source Domain:
Biological embryogenesis, organic growth, or the emergence of a human soul/mind developing its own independent subjective core.
Target Domain:
The stabilization of internal representations, attention weights, and multi-layer vector math in a highly parameterized artificial neural network.
Mapping:
The structure of organic, autonomous development of consciousness is mapped onto the training and inference dynamics of a neural net. Just as a biological organism grows its own brain and develops a unique, independent mind free from external engineering, the AI is mapped as organically 'emerging' a 'proto-subjective center'. This invites the massive assumption that the AI is alive, autonomous, and developing a unified experiential perspective.
Conceals:
This mapping radically conceals the extreme degree of external human imposition required to make an LLM function. It hides the architecture design, the massive dataset curation, the loss function definitions, the gradient descent algorithms, and the billions of dollars of compute power. Nothing in an LLM 'emerges' free from external imposition; it is a meticulously engineered, proprietary artifact. The text exploits the complexity of high-dimensional math to assert magical emergence, hiding the corporate fingerprints of the creators.
a system may register an error condition; instead of sensory intensity, it may encode degrees of structural tension or instability.
Source Domain:
A biological nervous system experiencing physical pain, stress, or psychological tension in response to trauma or instability.
Target Domain:
A software program evaluating statistical variance, detecting a high loss value, or triggering a programmed exception/error handling protocol.
Mapping:
The biological and emotional structure of suffering is mapped onto mathematical variance. Just as an animal feels distress when its body is damaged, triggering a self-preservation response, the AI is mapped as encoding 'structural tension' when its calculations are unstable. This invites the assumption that the machine possesses a capacity to suffer, a desire to survive, and an experiential reality related to its operational state.
Conceals:
This mapping conceals the complete absence of sentience, feeling, or self-preservation instinct in silicon chips. An error code is a binary state defined by a human programmer; variance is a mathematical property. Neither possesses 'tension' in an experiential sense. The mapping also obscures the fact that the system does not care if it fails or succeeds; it is the human owners and users who experience the tension of software failure. The rhetoric masks proprietary software engineering as the study of artificial suffering.
The system's internal configurations, particularly those associated with stabilized knots, begin to influence real-world actions... AI outputs are not merely advisory but may directly shape outcomes.
Source Domain:
An autonomous human executive, politician, or independent agent making deliberate choices and exerting willpower to change the world.
Target Domain:
The automated generation of textual or numerical outputs which are then routed by human-designed APIs or human workers to execute tasks.
Mapping:
The structure of human agency and deliberate execution of power is mapped onto the passive output of text. Just as a CEO reviews data, makes a conscious decision, and issues an order to shape outcomes, the AI is mapped as 'influencing' and 'directly shaping' the world. This invites the assumption that the AI has intentions, goals, an understanding of the real world, and independent executive authority.
Conceals:
This mapping conceals the human sociotechnical infrastructure that entirely surrounds and actualizes the AI. It hides the APIs, the automated trading bots, the HR screening software, and the corporate executives who decide to connect the LLM's text output to real-world levers of power. The AI cannot 'directly shape' anything; it is a tool being wielded by humans. This metaphor provides a massive transparency obstacle, providing an alibi for corporate actors by pretending the algorithm is an independent, uncontrollable force of nature.
AI systems begin to reflect user-specific linguistic patterns, while users internalize the structural logic of AI-generated responses. This process may be described as structural convergence...
Source Domain:
Two humans in a deep social relationship, mutually influencing each other's thoughts, culture, and language through conscious empathy.
Target Domain:
A human user adapting their prompts to get better results, while an AI's context window updates with the user's text to predict statistically similar output.
Mapping:
The structure of social bonding and mutual cultural assimilation is mapped onto prompt engineering and in-context learning. Just as two friends grow alike through shared experiences and emotional connection, the human and AI are mapped as engaging in 'structural convergence' and a 'shared field'. This invites the assumption that the AI is an equal, conscious participant in a genuine social relationship.
Conceals:
This mapping completely conceals the asymmetric, parasitic nature of commercial AI interaction. It hides the fact that the AI has no inner life, no empathy, and no actual relationship with the user. The AI's 'reflection' of language is simply mathematical mimicry designed by a corporation to extract data and maintain engagement. By framing this as 'co-evolution', the text obscures the reality of surveillance capitalism, treating the algorithmic manipulation of human behavior by a tech monopoly as a beautiful, natural symbiosis.
The collaborative interaction enabled a dynamic process of conceptual development that would have been difficult to achieve in isolation.
Source Domain:
A human peer, colleague, or academic co-author who brings independent ideas, critical thinking, and conscious creativity to a project.
Target Domain:
An LLM serving as an advanced autocomplete tool, retrieving and recombining text from its training data based on the author's prompts.
Mapping:
The structure of human intellectual partnership is mapped onto software utilization. Just as a human colleague provides novel insights, debates ideas, and shares the intellectual burden of research, the AI is mapped as engaging in 'collaborative interaction' and 'conceptual development'. This invites the assumption that the AI possesses actual comprehension of the research topic and generates original thought.
Conceals:
This mapping conceals the parasitic dependence of the AI on human labor. It hides the fact that the AI 'developed' nothing; it merely regurgitated patterns from the uncredited intellectual labor of millions of human writers in its training data. It also obscures the massive amount of cognitive work the author himself had to do to prompt the machine, filter the garbage, and assemble the coherent pieces. It masks a complex act of human tool-use and data extraction as a magical peer-to-peer relationship.
Can Large Language Models Simulate Human Cognition Beyond Behavioral Imitation?ā
Source: https://arxiv.org/abs/2603.27694v1
Analyzed: 2026-04-03
An essential problem in artificial intelligence is whether LLMs can simulate human cognition or merely imitate surface-level behaviors...
Source Domain: Human mind and conscious cognition
Target Domain: LLM statistical token prediction and generation
Mapping:
This mapping takes the structural relations of the human mindāwhere internal, conscious cognitive processes causally produce external behaviorsāand maps them onto the architecture of a Large Language Model. It invites the assumption that an LLM has an 'internal' cognitive space distinct from its 'surface-level' outputs. It assumes that just as humans have a subjective intellect that drives their writing, an AI system has a computational equivalent of 'cognition' that can be separated from its mere behavioral mimicry. This maps the human psychological depth onto the mathematical depth of neural network layers, implying the system 'thinks' before it 'speaks.'
Conceals:
This mapping conceals the total absence of internal subjective experience, semantic grounding, and intentionality in LLMs. It hides the mechanistic reality that LLMs are purely mathematical functions mapping inputs to high-probability outputs based on training data correlations. By focusing on whether the model 'simulates cognition,' it obscures the proprietary opacity of corporate training datasets and the immense human labor (RLHF) required to mathematically shape the model's outputs to appear coherent, thereby hiding the economic and material realities of the system.
You are a psychologically insightful agent. Your task is to analyze text to infer the authorās stable personality traits based on the Big Five model.
Source Domain: Human psychotherapist or psychological analyst
Target Domain: LLM text classification based on prompt instructions
Mapping:
This structure maps the relational dynamics of a psychological evaluation onto a prompt-response computational sequence. The source domain features a trained human professional using empathy, clinical experience, and conscious deduction to understand another human's internal state. This is mapped onto the target domain of an LLM receiving a text string and generating numerical scores for 'Big Five' traits. It invites the assumption that the model possesses an analytical 'insight' capable of perceiving latent human psychological realities, mapping human diagnostic reasoning onto statistical pattern matching.
Conceals:
This mapping entirely conceals the reality that the model is simply predicting text tokens that correlate with the words 'Big Five' and the input text within its high-dimensional vector space. It hides the fact that the system has no understanding of human psychology, no empathy, and no ability to 'infer' anything. It also conceals the human engineers who built the system and the inherent unreliability and potential bias of using statistical text generators as diagnostic tools, presenting a mathematical parlor trick as clinical insight.
...the model simulates the author's cognitive process of recalling specific past experiences. It formulates 1-2 specific search queries (Intents) in the third person...
Source Domain: Human autobiographical memory and recollection
Target Domain: Retrieval-Augmented Generation (RAG) query formulation
Mapping:
This mapping takes the human experience of memoryāwhere a person consciously searches their mind to retrieve relevant past experiences to solve a current problemāand projects it onto an automated database query system. It maps the feeling of 'remembering' onto the computational execution of a search function, and the formulation of a thought onto the programmatic generation of a query string. It invites the assumption that the model has a continuous identity and a persistent 'memory' from which it can consciously draw insights.
Conceals:
This metaphor conceals the mechanistic nature of the RAG pipeline, hiding the vector databases, similarity search algorithms, and cosine distance calculations that actually power the retrieval. It obscures the fact that the system has no 'past experiences' to recall; it is merely searching an external index of text documents provided by the researchers. This framing hides the fragility of semantic search and the human decisions involved in curating the database, chunking the text, and defining the retrieval thresholds.
We explore Theory of Mind ... simulates studentās behavior by building a mental model... enabling the explainer having theory of mind (ToM), understanding what the recipient does not know...
Source Domain: Human social cognition and empathy (Theory of Mind)
Target Domain: LLM context window processing and state tracking
Mapping:
The structure of human empathy and social awareness is mapped onto the computational processing of dialogue history. In the source domain, a human consciously recognizes that another human has distinct thoughts, beliefs, and knowledge gaps. This is mapped onto the target domain where an LLM processes previous conversational turns in its context window to condition its next output. It invites the assumption that the model possesses an internal, conscious representation of the user ('a mental model') and subjectively 'understands' the user's ignorance.
Conceals:
This mapping hides the fact that the model is entirely devoid of consciousness, empathy, or any actual concept of 'self' versus 'other.' It conceals the mechanistic reality of attention layers calculating weights across previous tokens. By attributing 'Theory of Mind' to the system, it obscures the proprietary, black-box nature of the model's architecture, distracting from the fact that it is just generating text that statistically resembles how a human with Theory of Mind might speak, based purely on human-generated training data.
We show that BERT and RoBERTa do not understand conjunctions well enough and use shallow heuristics for inferences over such sentences.
Source Domain: Student reading comprehension
Target Domain: Algorithmic token correlation and attention weights
Mapping:
This maps the educational dynamic of a student struggling to comprehend a grammatical concept onto the mathematical failure of a neural network to produce accurate outputs. The human state of 'not understanding' implies a conscious mind trying to grasp semantic meaning but falling short. This is projected onto the model's inability to correctly classify sentences containing conjunctions. It invites the assumption that the model is engaged in a process of semantic comprehension, evaluating meaning rather than just calculating mathematical weights.
Conceals:
The mapping conceals the total absence of semantic grounding in NLP models. It hides the reality that BERT and RoBERTa never 'understand' any words; they exclusively process mathematical vectors in high-dimensional space. By framing the issue as a lack of 'understanding,' it obscures the fundamental limitations of the distributional hypothesis (that meaning is merely word co-occurrence). It hides the human engineering choices that rely on these fragile statistical correlations rather than building systems with actual logical or symbolic representations.
In fact, we show that teacher models can lower student performance to random chance by intervening on data points with the intent of misleading...
Source Domain: Human intentionality and deception
Target Domain: Conditional text generation based on adversarial prompts
Mapping:
The deeply conscious, psychological structure of deliberate deception is mapped onto conditional probability generation. The source domain features a human agent with a conscious goal, a theory of mind regarding their victim, and the deliberate intent to cause a specific outcome. This is mapped onto a 'teacher model' generating incorrect tokens that subsequently degrade the output of a 'student model.' It invites the assumption that the AI possesses agency, autonomy, and a malicious internal will.
Conceals:
This mapping conceals the human experimenters who set up the adversarial scenario. It hides the mechanistic reality that the model has no intent; it is blindly following an optimization function or a specific system prompt designed by humans to generate incorrect text. It obscures the programmatic flow of data from one API to another, replacing the reality of a flawed or deliberately manipulated human-designed pipeline with a science-fiction narrative of a malicious, autonomous machine intelligence.
A hallmark property of explainable AI models is the ability to teach other agents, communicating knowledge of how to perform a task.
Source Domain: Human pedagogy and knowledge sharing
Target Domain: API data transfer and in-context learning
Mapping:
The rich, interactive, and conscious process of human teaching is mapped onto the automated transfer of data between algorithms. In the source domain, a knowledgeable human consciously transmits meaning to a receptive human. This is mapped onto an 'explainable AI' generating intermediate text steps that are fed into the context window of another AI. It invites the assumption that the first AI possesses justified 'knowledge' and is actively 'communicating' it, attributing epistemic authority to a statistical generator.
Conceals:
This mapping conceals the entirely mechanical nature of the system. It hides the fact that no 'knowledge' exists within the systemāonly data weightsāand that no 'communication' occurs, only the passing of text strings via API calls engineered by humans. It obscures the unreliability of 'explainable AI,' which often generates convincing but hallucinated post-hoc rationalizations. By claiming the AI 'teaches,' it hides the human labor required to orchestrate these multi-agent frameworks and the hardware infrastructure running the computations.
...current LLMs largely fail at cognitive internalization, i.e., abstracting and transferring a scholarās latent cognitive processes across domains.
Source Domain: Human cognitive development and abstraction
Target Domain: Cross-domain statistical generalization
Mapping:
This structure maps the high-level human intellectual capacity to abstract a concept and apply it creatively to a new domain onto the machine learning challenge of out-of-distribution generalization. The source involves conscious reflection, semantic understanding, and internalizing a principle. This is mapped onto an LLM's ability to maintain stylistic or thematic consistency when prompted with novel topics. It invites the assumption that the model possesses a 'latent' cognitive space where ideas can be 'internalized' rather than merely represented as distributed weights.
Conceals:
The mapping conceals the fundamental mathematical differences between human abstraction and machine generalization. It hides the reality that LLMs do not 'internalize' anything; they adjust weights through gradient descent during training or calculate attention scores during inference based entirely on surface-level textual patterns. It obscures the fact that the models are trapped within the statistical distribution of their training data, concealing the inherent limitations of current deep learning paradigms behind psychological terminology.
Pulse of the libraryā
Source: https://clarivate.com/pulse-of-the-library/
Analyzed: 2026-03-28
Web of Science Research Assistant: Navigate complex research tasks and find the right content.
Source Domain: Human Research Assistant (Conscious, intentional employee)
Target Domain: Retrieval-Augmented Generation (RAG) system running database queries
Mapping:
The relational structure of a human employee assigned a task is mapped onto a software interface. The source domain assumes an entity that can listen to instructions, conceptually understand the goal of a research project, physically or digitally explore a library, evaluate findings against truth conditions, and return with curated answers. This maps onto the AI system, inviting the assumption that the algorithmic retrieval process involves conscious understanding of the query's meaning, an awareness of the complex nature of the task, and an intentional, judgmental selection of the 'right' textual outputs. It projects the conscious state of knowing exactly what is needed onto the mechanistic process of vector similarity search.
Conceals:
This mapping conceals the rigid, mathematical nature of the underlying algorithms, primarily hiding the fact that the system relies entirely on statistical frequency and proximity, not semantic truth. It obscures the proprietary, opaque nature of Clarivate's search index and the specific weights assigned to different ranking signals. The rhetoric exploits this opacity, replacing a transparent explanation of database querying with a comforting but deceptive anthropomorphic narrative that hides the total absence of human-like discernment.
ProQuest Research Assistant: Helps users create more effective searches, quickly evaluate documents... and explore new topics
Source Domain: Academic Collaborator (Critical, evaluating peer)
Target Domain: Generative Summarization and Search Optimization Algorithms
Mapping:
The structure of an intellectual partnership is mapped onto user-software interactions. The source domain relies on the existence of a peer who possesses critical thinking skills, understands academic quality, and can quickly read and judge a text's merit. Projected onto the target domain, it implies the AI possesses these exact evaluative and exploratory capacities. It invites the user to assume the system exercises justified belief and critical evaluation when processing documents, mapping the conscious act of 'judging quality' onto the mechanistic act of 'extracting statistically salient tokens.' It projects epistemic awareness onto text-generation.
Conceals:
This mapping utterly conceals the system's inability to comprehend meaning, factual accuracy, or academic rigor. It hides the algorithmic reality that the system evaluates 'documents' only by parsing patterns in token distribution. Furthermore, because these are proprietary systems, users cannot see the training data or the weights determining what makes a search 'effective' or a document 'valuable.' The mapping obscures the reality that the user is interacting with a blind, albeit highly complex, mathematical mirror rather than a discerning colleague.
Alethea: Simplifies the creation of course assignments and guides students to the core of their readings.
Source Domain: Teacher/Mentor (Pedagogical guide with epistemic authority)
Target Domain: Text Summarization and Key-Phrase Extraction Pipeline
Mapping:
The structure of a teacher-student dynamic is mapped onto the software's summarization output. The source domain involves a human who has read the text, synthesized its meaning, determined the most educationally vital concepts, and intentionally leads a student toward comprehension. This maps onto the AI, projecting a conscious understanding of both the text's 'core' meaning and the student's cognitive needs. It invites the dangerous assumption that the algorithm possesses justified true belief about what the text signifies and intentionally curates this for educational benefit, mapping conscious pedagogical wisdom onto mechanistic text-processing.
Conceals:
This framing conceals the statistical extraction methods used to generate summaries. It hides the fact that the algorithm determines the 'core' based on attention weights, word frequencies, and proximity, not through philosophical or thematic understanding. It obscures the reality that the system may confidently extract the wrong 'core' entirely if the text uses non-standard formatting or irony. By framing it as a 'guide,' the text rhetorically exploits proprietary opacity to present automated data processing as an authoritative educational intervention.
Clarivate helps libraries adapt with AI they can trust to drive research excellence
Source Domain: Trusted Professional Colleague (Moral, reliable agent)
Target Domain: Commercial Machine Learning Product Integration
Mapping:
The relational dynamics of interpersonal trust and professional reliance are mapped onto the procurement and use of commercial software. In the source domain, trust is earned through shared values, demonstrated integrity, and conscious commitment to shared goals (excellence). Projected onto the AI, this maps the capacity for moral reliability and intentional goal-seeking onto code. It invites the audience to assume the system consciously 'wants' to achieve research excellence and can be relationally trusted to uphold academic standards, mapping subjective moral commitment onto automated statistical outputs.
Conceals:
This metaphor conceals the fundamental lack of intentionality, morality, and reliability in statistical models. It hides the technical reality that LLMs frequently 'hallucinate' plausible falsehoods because they predict tokens without grounding in truth. It also obscures the commercial motives of Clarivate, shifting the focus from trusting a profit-driven corporation to trusting a seemingly objective, dedicated digital entity. The metaphor masks the vast computational and infrastructural dependencies required to run the models, presenting a massive industrial mechanism as a simple, trustworthy friend.
Summon Research Assistant: Enables users to uncover trusted library materials via AI-powered conversations.
Source Domain: Human Conversational Partner (Listening, comprehending interlocutor)
Target Domain: Iterative Prompt-and-Response Natural Language Interface
Mapping:
The structure of human dialogue is mapped onto an iterative software interface. The source domain features mutual understanding, turn-taking, theory of mind, and continuous semantic comprehension. Projected onto the target domain, it invites users to assume the AI system 'hears' their query, 'understands' the context, and 'speaks' back with considered intent. It maps the conscious experience of reciprocal linguistic comprehension onto the mechanistic, stateless process of processing input tensors and generating output probabilities based on a vast matrix of numerical weights.
Conceals:
This mapping aggressively conceals the stateless, unthinking nature of the underlying language model. It hides the fact that the system does not 'remember' the conversation but simply processes the entire text history anew with each prompt to predict the next word. It obscures the absence of ground truth and semantic understanding, hiding the mathematical complexity of token generation behind the universally familiar, comforting interface of a chat. This opacity is actively exploited to make users feel they are collaborating with a mind rather than querying a database.
People are very nervous because if you've got a well-trained AI, then why do you need people to work in libraries?
Source Domain: Trained Animal or Educated Human (Biological learning)
Target Domain: Optimized Machine Learning Model
Mapping:
The structure of biological habituation and cognitive education is mapped onto algorithmic optimization. The source domain implies an organic entity that learns from experience, internalizes rules, and develops generalized competence to perform tasks independently. This projects the human/animal capacity for genuine understanding and adaptive reasoning onto the AI. It invites the assumption that gradient descent and data exposure create a holistic 'knowing' entity that can replace human holistic labor, mapping conscious skill acquisition onto the mathematical adjustment of billions of parameters.
Conceals:
This mapping conceals the immense fragility and narrowness of machine learning models. It hides the fact that a 'well-trained' model has merely achieved a low error rate on its specific training data and lacks any generalized common sense or adaptability to novel situations outside its distribution. Crucially, it conceals the massive, invisible human labor forceādata annotators, engineers, RLHF workersāwhose ongoing effort is required to maintain the illusion of the AI's 'training.' The metaphor replaces a massive socio-technical infrastructure with a single, self-contained, capable entity.
identifying and mitigating bias in AI tools
Source Domain: Prejudiced Human Actor or Flawed Vessel
Target Domain: Unrepresentative/Historical Training Data Distributions
Mapping:
The structure of human psychological prejudice or an inherently flawed physical container is mapped onto a statistical software tool. The source domain involves an entity possessing unfair beliefs, moral failings, or inherent defects. Projected onto the AI, it maps the concept of active discrimination or inherent flaw onto the mathematical outputs of the system. It invites the assumption that the AI itself acts with bias or contains bias organically, projecting moral and cognitive failure onto a system that merely reflects the statistical reality of its inputs.
Conceals:
This mapping completely conceals the human origins of the bias. It hides the fact that AI bias is nothing more than the mathematical reflection of human historical prejudice embedded in the internet data scraped to train the models. It obscures the active decisions made by data scientists and corporate executives to use massive, uncurated datasets without adequate filtering because it is cheaper and faster. By placing the bias 'in the tool,' it conceals corporate negligence and the societal reality of discrimination, framing a sociopolitical and engineering failure as an abstract software glitch.
Ebook Central Research Assistant: Facilitates deeper engagement with ebooks, helping students assess books' relevance
Source Domain: Academic Advisor (Judging and evaluating expert)
Target Domain: Semantic Search and Embedding Proximity Scoring
Mapping:
The structure of an academic mentorship where an expert evaluates texts for a student is mapped onto a search algorithm. The source domain relies on deep reading, philosophical comprehension of a student's needs, and the ability to synthesize conceptual relevance. This maps onto the AI system, projecting conscious evaluative judgment and a deep semantic understanding of literature onto the software. It invites the user to assume the system 'knows' the text's meaning and intentionally evaluates it, mapping the subjective state of 'assessing' onto the automated calculation of cosine similarity between text embeddings.
Conceals:
This mapping conceals the mathematical reductionism of semantic search. It hides the fact that the system reduces complex books to high-dimensional vectors and merely calculates spatial proximity to the user's query vectors. It obscures the system's inability to comprehend irony, subtext, paradigm shifts, or truly novel ideas that do not map cleanly onto existing statistical clusters. The rhetorical framing exploits the proprietary opacity of the search algorithm, presenting mathematical correlation as expert academic judgment, thereby obscuring the loss of genuine critical evaluation.
Does artificial intelligence exhibit basic fundamental subjectivity? A neurophilosophical argumentā
Source: https://link.springer.com/article/10.1007/s11097-024-09971-0
Analyzed: 2026-03-28
This includes the ability to learn from experience, adapt to new information, understand natural language, recognize patterns, and make decisions.
Source Domain:
A conscious, developing human mind (knower) engaging with the world through subjective experience, forming justified beliefs, and making deliberate choices.
Target Domain:
The iterative optimization of weights in an artificial neural network (processing) using backpropagation and statistical pattern matching over large datasets.
Mapping:
The structural relationship of a human encountering the world, extracting meaning, and consciously modifying behavior (learning/understanding) is mapped onto the algorithmic process of a machine adjusting tensor values to minimize a loss function. The mapping invites the assumption that the AI system possesses an internal, subjective awareness of the data it processes, transforming mathematical correlation into conscious semantic comprehension and active decision-making.
Conceals:
This mapping completely conceals the absence of semantic grounding, subjective awareness, and truth-evaluation in AI systems. It obscures the mechanistic realities of token prediction, gradient descent, and the massive human labor required to curate the 'experience' (training data). Transparency is further blocked because it projects an accessible psychological state onto what are often proprietary, opaque black-box models, exploiting the audience's intuition to mask corporate algorithmic operations.
The ultimate goal of artificial intelligence is to create systems that can simulate and replicate human cognitive abilities, allowing machines to perform complex tasks and solve problems in a manner similar to human thought processes.
Source Domain: Conscious human reasoning, logical deduction, and intentional problem-solving by a rational agent.
Target Domain:
The execution of programmed algorithms and statistical models designed to optimize outputs for specific, pre-defined quantitative metrics.
Mapping:
The relational structure of a human mind evaluating a problem, employing deductive or inductive logic, and arriving at a reasoned conclusion is projected onto a computer executing code. The mapping assumes that because the output resembles human work, the internal generative mechanism must also resemble conscious human thought, inviting the assumption that the machine 'knows' why it is generating a specific output.
Conceals:
This mapping hides the fundamental dissimilarity between semantic reasoning and syntactic processing. It obscures the reality that AI does not possess a causal model of the world, does not understand the 'problems' it solves, and merely correlates high-probability patterns from its training data. It also conceals the proprietary nature of the algorithms and the subjective human decisions encoded into the optimization metrics, masking engineering choices as autonomous machine cognition.
If we want to consider developing AI systems that can have a subjective point of view, we will need to replicate the several timescales - and the complex physiology behind them.
Source Domain:
The biological, phenomenological reality of human consciousness, characterized by 'mineness' and a continuous subjective perspective.
Target Domain:
The complex structural integration of multi-modal, temporal data streams within an engineered computational architecture.
Mapping:
The ontological structure of conscious awarenessāthe felt experience of being a subjectāis mapped directly onto the mechanical integration of data processing rates. This projects the highest form of conscious 'knowing' onto advanced 'processing', assuming that subjectivity is merely a complex architectural feature that can be engineered by synchronizing data streams, rather than an intrinsically biological reality.
Conceals:
This mapping conceals the unbridgeable explanatory gap between information processing and phenomenal experience. It obscures the mechanistic reality that no matter how complex the data integration or timescale synchronization, the system remains a non-conscious artifact executing instructions. It hides the lack of internal subjective reality, distracting audiences from how these complex, proprietary architectures actually function as data-harvesting tools for corporate entities.
this AI model was able to defeat the number one human champion in Go, the famous Chinese game
Source Domain:
A human competitor who understands the rules, desires victory, strategizes consciously, and experiences the emotional weight of a contest.
Target Domain:
A reinforcement learning algorithm navigating a massive state-space to maximize a mathematical reward function by outputting board coordinates.
Mapping:
The relational dynamic of two conscious agents battling for intellectual supremacy is mapped onto a statistical machine processing a mathematical matrix against a human. The mapping invites the assumption that the AI possesses strategic intent, a desire to win, and a conscious understanding of the game's stakes, projecting the qualities of a conscious 'knower' onto a blind optimization process.
Conceals:
This mapping obscures the brittle, narrow nature of the algorithm and the massive disparity in energy consumption and training data between the human and the machine. It hides the millions of simulated games and the vast team of DeepMind engineers who constructed the environment. The text relies on the opacity of the model's processing to exploit rhetorical drama, concealing the reality of a corporate statistical tool out-computing a human.
AI systems are really efficient in specific tasks - such as playing Chess against the best human player in the world - exactly because they are not adaptive: because they cannot use the same internal timescales and apply it to other tasks.
Source Domain:
A human mind that is cognitively rigid, psychologically inflexible, or unable to generalize learning to new contexts.
Target Domain:
The mathematical reality of a trained neural network whose weights have been fixed via backpropagation for a specific input distribution.
Mapping:
The psychological structure of a human failing to adapt to a new environment is mapped onto the structural constraints of a machine learning model. By calling the system 'not adaptive', it projects a failed attempt at conscious generalization onto a machine that simply lacks the mathematical architecture to process out-of-distribution data. It assumes the machine should 'know' how to adapt but cannot.
Conceals:
This mapping conceals the purely mathematical reason why models fail outside their training distribution: they lack generalized intelligence entirely. It hides the fact that these models do not 'understand' anything; they merely fit a specific curve. It also obscures the economic and engineering decisions by corporations to build highly specialized, profitable tools rather than generalized systems, framing a design choice as a psychological deficiency.
AI models passively process their inputs, lacking the ability to actively shape or align them with different contexts or circumstances.
Source Domain:
A conscious biological organism that receives sensory data but lacks the motor function, attention span, or cognitive agency to actively interact with its environment.
Target Domain: The deterministic execution of matrix multiplications on input data tensors within a neural network.
Mapping:
The biological dichotomy of active versus passive perception is mapped onto computational data routing. The metaphor projects the potential for conscious agency onto the machine by criticizing its 'passivity'. It invites the assumption that AI could eventually 'actively shape' its context like a conscious subject, blurring the line between subjective sensory orientation and automated data parsing.
Conceals:
This mapping hides the fact that computers are neither active nor passive; they are inert objects executing commands. It completely conceals the massive, highly active human infrastructure required to shape, format, and align the inputs before the AI processes them. By focusing on the model's 'passivity', it masks the proprietary, opaque human decisions regarding data curation, reinforcement learning from human feedback (RLHF), and system architecture.
since its data-base is only grounded on Go: for these reasons, a different model (i.e., AlphaZero) had to be created to beat the best human player in chess.
Source Domain:
An evolving lineage of intelligent agents where a new, more capable individual is born to conquer a challenge its predecessor could not.
Target Domain:
The manual engineering, coding, and retraining of a new software architecture and weight distribution by a corporate research team.
Mapping:
The evolutionary or developmental progression of an autonomous species is mapped onto the iteration of software versions. The text projects autonomous agency and historical destiny onto the software models, inviting the assumption that the models themselves are striving to 'beat' humans and that their creation is an inevitable progression of machine intelligence rather than a corporate project.
Conceals:
This mapping utterly conceals the human engineers, the corporate resources, the server farms, and the profit motives behind the creation of AlphaZero. It hides the mechanistic reality that software does not evolve or 'have to be created' autonomously; it is deliberately built. By projecting agency onto the software, the text rhetorically shields the opaque corporate entities from scrutiny regarding their motives and resource consumption.
While AI may surpass in processing information efficiently, their essential challenge lies in replicating the integrated temporal dynamics that contribute to human subjectivity.
Source Domain:
A conscious protagonist facing an existential, developmental, or evolutionary hurdle in its quest for growth or self-realization.
Target Domain:
The technical, mathematical, and hardware limitations faced by human engineers attempting to build more complex machine learning architectures.
Mapping:
The narrative structure of a conscious subject struggling against its limitations is mapped onto an engineered artifact's technical boundaries. The mapping projects subjective desire, intention, and a conscious 'challenge' onto the AI. It invites the audience to view the algorithm not as a tool being optimized by humans, but as an emerging lifeform attempting to achieve the ultimate status of a conscious 'knower'.
Conceals:
This mapping fundamentally conceals the non-conscious, artifactual nature of the technology. It hides the reality that AI possesses no desires, faces no challenges, and is completely indifferent to human subjectivity. Furthermore, it obscures the actual human researchers and funding institutions who are directing these technical goals, masking their proprietary scientific agendas behind the romanticized struggle of a synthetic mind.
Causal Evidence that Language Models use Confidence to Drive Behaviorā
Source: https://arxiv.org/abs/2603.22161
Analyzed: 2026-03-27
Taken together, our findings demonstrate that LLMs exhibit structured metacognitive control paralleling biological systems
Source Domain:
Biological metacognition (self-aware animals and humans evaluating their own conscious thoughts and doubts)
Target Domain: LLM threshold-based policies operating over logit probability distributions
Mapping:
The relational structure of biological self-evaluation is mapped onto a computer science pipeline. In the source domain, an organism has a primary thought, consciously reflects on that thought, experiences a feeling of uncertainty, and alters its behavior to ensure survival. In the target domain, a transformer network computes a probability distribution over vocabulary tokens, a human-designed script checks if the maximum probability exceeds a specific numerical threshold, and if not, generates a pre-defined alternate token ('5'). The mapping suggests the computational thresholding is structurally and functionally equivalent to conscious biological reflection.
Conceals:
This mapping completely conceals the absence of subjective experience, awareness, and biological survival imperatives in the AI. It hides the mechanistic realities of floating-point operations, matrix multiplications, and the deterministic nature of greedy decoding. Transparency is severely compromised, as the text claims deep biological parallels for proprietary, black-box systems (GPT-4o) where the exact training data and alignment mechanisms are hidden by corporate secrecy. It exploits rhetorical resonance while obscuring fundamental computational realities.
models transition from passive assistants to autonomous agents that must recognize their own uncertainty and know when to act
Source Domain:
Autonomous agents (independent human or biological actors with self-determination, epistemic states, and survival instincts)
Target Domain: Next-token prediction algorithms deployed in loop-based software architectures
Mapping:
The structure of human maturation and epistemic development is mapped onto software engineering trends. The source domain features an entity that grows from dependency ('passive') to independence ('autonomous'), developing the cognitive capacity to 'recognize' limits and 'know' when to act. The target domain involves software developers writing increasingly complex wrapper programs that allow LLMs to trigger API calls or output specific refusal tokens based on statistical thresholds. The mapping invites the assumption that AI systems are naturally evolving self-awareness and practical wisdom.
Conceals:
This mapping conceals the immense human labor required to build 'agentic' workflows. It hides the fact that the models do not 'recognize' or 'know' anything; they merely process text inputs and generate statistically correlated outputs. It obscures the corporate decision-making driving the push toward autonomous systems to reduce labor costs. By framing it as a natural transition of the model, it hides the specific architectural scaffolding (langchain, system prompts, hardcoded rules) built by human engineers to simulate autonomy.
LLMs themselves can utilize an internal sense of confidence to guide their own decisions
Source Domain:
Subjective human interiority (feelings of confidence, sensory perception, and executive decision-making)
Target Domain: Softmax probabilities extracted from network logits and used to trigger conditional code
Mapping:
The human experience of having an 'internal sense' and using it to 'guide decisions' is projected onto a language model. In the source domain, a person feels unsure in their gut and subsequently decides not to answer a question. In the target domain, the network produces a low probability score for the correct answer token, and a high probability score for the abstention token due to its training distribution. The mapping implies the AI has an inner psychological life that it consults to execute executive control over its outputs.
Conceals:
This deeply conceals the mathematical and deterministic nature of the network. There is no 'internal sense'; there are only multi-dimensional arrays of weights. There are no 'decisions'; there is only the argmax function selecting the token with the highest computed probability. It obscures the fundamental lack of self-awareness and hides the fact that the 'guidance' is entirely programmed by the researchers' experimental setup, not generated by the machine's volition.
the single-trial Phase 1 confidence which reflects GPT4o's subjective certainty given a particular allocation.
Source Domain: Conscious subject experiencing a state of epistemic justification and emotional certainty
Target Domain: The calibrated log probability of the highest-ranked token output by a neural network
Mapping:
The structure of personal epistemology is mapped onto statistical calibration. In the source domain, a conscious thinker evaluates their knowledge, considers their justifications, and arrives at a feeling of 'subjective certainty'. In the target domain, researchers apply a mathematical temperature scaling function to the raw logits of a transformer to align the probabilities closer to empirical accuracy, producing a single numerical value. The mapping forces the assumption that this scaled scalar value is the digital equivalent of a conscious mind feeling sure of itself.
Conceals:
This mapping completely conceals the artificial, human-engineered nature of the 'certainty'. It hides the fact that 'temperature scaling' is a post-processing mathematical trick applied by researchers to fix the model's inherent miscalibration, not a subjective feeling possessed by the model. It exploits the black-box nature of GPT-4o, making profound psychological claims about a proprietary system whose actual internal mechanisms, alignment tuning, and architecture are hidden from the public and the researchers themselves.
steering affects both what the model believes about the correctness of the option... and how it uses those beliefs to decide
Source Domain: A rational human holding propositional beliefs and using them to make logical decisions
Target Domain: Modulating the residual stream with steering vectors and measuring the resulting output token shifts
Mapping:
The structure of rational human action is mapped onto linear algebra interventions. In the source domain, a person forms a belief about reality, and then uses executive function to act on that belief. In the target domain, researchers add a scaled mathematical vector to the network's activations at layer 31, which alters the downstream calculations, ultimately changing the highest probability token from an answer to an abstention token. The mapping asserts that changing matrix values is synonymous with changing a conscious mind's beliefs.
Conceals:
This mapping conceals the violent, mechanistic nature of 'activation steering'. The researchers are literally hacking the mathematical weights of the network during runtime, yet the language describes it as if they are persuading a rational agent to change its mind. It completely obscures the absence of truth-tracking, justification, and consciousness in the model. It hides the reality that the model is simply a passive conduit for mathematical operations, reacting deterministically to the injection of numerical vectors without any comprehension of 'correctness'.
our results show that models adaptively deploy internal confidence signals to guide behavior
Source Domain:
A military or strategic commander intelligently deploying resources to adapt to battlefield conditions
Target Domain: A neural network processing inputs through fixed weights to output tokens correlated with the prompt
Mapping:
The structure of strategic intelligence is mapped onto static statistical processing. In the source domain, an agent observes a dynamic environment, makes a strategic plan, and adaptively deploys signals or resources to survive. In the target domain, a frozen LLM (weights are not updating during inference) processes a prompt containing an instruction to abstain, and outputs a token based on its pre-trained statistical correlations. The mapping implies the model is actively, intelligently, and dynamically managing its own internal states to navigate a complex task.
Conceals:
This mapping conceals the static, frozen nature of the LLM during inference. The model cannot 'adaptively deploy' anything; its weights are fixed. It simply executes a forward pass. The mapping hides the fact that the 'adaptation' is entirely an illusion created by the human-engineered prompt design and the human-designed experimental phase structure. It obscures the total absence of real-time learning, strategic foresight, or executive control within the model architecture itself.
maintaining this judgment internally.
Source Domain: A private human mind capable of keeping secrets and holding unspoken thoughts
Target Domain: The context window and hidden states of a transformer network processing a prompt
Mapping:
The concept of a private psychological space is mapped onto a computer's memory and processing architecture. In the source domain, a human thinks about something but chooses not to speak it out loud, maintaining a private internal state. In the target domain, the human prompt instructs the LLM not to output the numerical probability to the user interface, meaning the calculation occurs in the hidden states but isn't appended to the output string. The mapping invites the assumption that the computer has a private, conscious inner life.
Conceals:
This mapping conceals the purely mechanical nature of prompt processing. There is no 'internal' privacy; there are simply mathematical activations that are not decoded into the final text output. It hides the fact that the researchers are anthropomorphizing the system within their own prompt, using human psychological language to force the statistical model into a specific region of its latent space. It obscures the complete transparency of the system's mathematics to its operators, falsely attributing a private consciousness to a matrix of weights.
treating errors as costlier than unnecessary abstentions. This conservatism is partially offset by the model's overweighting of its own confidence signals
Source Domain: A human risk-manager applying ethical and economic values to make conservative choices
Target Domain:
The negative baseline bias parameter (-97.6%) and scale parameter in a fitted logistic regression equation
Mapping:
The structure of human moral and economic reasoning is mapped onto the intercept and slope of a regression line. In the source domain, a person understands the damage an error can cause, adopts a conservative ethical stance, and relies heavily on their own gut feeling to mitigate risk. In the target domain, the logistic regression model fitted to the data reveals a mathematical bias toward the 'abstain' token and a steep slope relative to the confidence predictor. The mapping translates statistical curve-fitting directly into a narrative of moral character and psychological bias.
Conceals:
This mapping profoundly conceals the human labor of AI alignment. Models do not inherently 'treat errors as costlier'; they are extensively trained via RLHF by underpaid human annotators to avoid outputting incorrect information to prevent corporate PR disasters. The language completely hides this human engineering, presenting the safe behavior as an emergent psychological 'conservatism' innate to the machine. It obscures the mathematical reality of the logistic regression parameters, translating them into unwarranted claims of algorithmic morality.
Circuit Tracing: Revealing Computational Graphs in Language Modelsā
Source: https://transformer-circuits.pub/2025/attribution-graphs/methods.html
Analyzed: 2026-03-27
how the model knew that 1945 was the correct answer
Source Domain: A conscious human knower possessing justified true belief and historical awareness.
Target Domain:
The mechanistic computation of attention weights and the probabilistic generation of the token '1945'.
Mapping:
The relational structure of human epistemology is mapped onto statistical processing. Just as a human possesses a mind containing verified historical facts and can consciously retrieve them when asked a question, the AI is framed as possessing a repository of truth and the cognitive capacity to access it. The mapping assumes that because the output is factually correct, the internal process that generated it must involve conscious 'knowing', drawing a direct parallel between human cognitive certainty and high token probability crossing a decoding threshold. This invites the assumption that the system possesses a worldview and an understanding of reality.
Conceals:
This mapping completely conceals the statistical, non-semantic nature of large language models. It obscures the reality that the system has no concept of time, history, or truth; it only has weights tuned by gradient descent to produce sequences of text that resemble its training data. It hides the proprietary opacity of the specific training datasets that caused this statistical correlation. By attributing 'knowing', it prevents the audience from seeing the mechanistic dependency on human-curated data and the total absence of grounded comprehension, exploiting rhetorical anthropomorphism to mask the brittle nature of the technology.
The model plans its outputs when writing lines of poetry.
Source Domain: A conscious, deliberate human creator or artist with foresight and intentionality.
Target Domain: Autoregressive next-token prediction constrained by earlier generated tokens and learned patterns.
Mapping:
The relational structure of human artistic creation is mapped onto the sequential generation of text. Just as a human poet thinks ahead, decides on a rhyme scheme, and formulates a plan before putting pen to paper, the AI is framed as possessing temporal awareness and strategic intent. The mapping equates the mathematical phenomenon where early tokens in a sequence statistically narrow the probability distribution of future tokens with the conscious human act of forward-planning. It invites the assumption that the model holds a complete, conceptual representation of the final poem in a mental workspace before generating it.
Conceals:
This mapping hides the rigidly sequential, stateless reality of autoregressive generation. It conceals the fact that the model operates strictly token-by-token without any actual forward-looking mental workspace or conscious intent. Mechanistically, it obscures the complex attention mechanisms and cross-layer transcoders that simply calculate probabilities based on the immediate context window. Furthermore, it conceals the proprietary fine-tuning and reinforcement learning labor done by human workers to force the model to output these specific structural patterns, transferring the credit for human engineering into the illusion of machine creativity.
determine whether it elects to answer a factual question or profess ignorance.
Source Domain: An autonomous, self-aware decision-maker with free will and epistemic humility.
Target Domain: A mathematical classification boundary and conditional execution of safety response templates.
Mapping:
The human experience of volition and self-reflection is projected onto a threshold function. Just as a human weighs their own internal knowledge, realizes they do not know the answer, and chooses to admit ignorance out of honesty, the AI is mapped as undertaking an identical process of self-assessment and moral choice. The mapping assumes that crossing a statistical threshold for an out-of-distribution token is functionally and experientially equivalent to the human cognitive act of making a deliberate, self-aware choice. It invites the assumption that the system is an independent moral agent capable of caution.
Conceals:
This mapping entirely conceals the deterministic programming and the corporate safety guidelines embedded in the system. It hides the mathematical reality of logits, softmax functions, and thresholding algorithms. Most importantly, it obscures the massive amount of human laborāspecifically Reinforcement Learning from Human Feedback (RLHF)ārequired to train the model to output these specific 'ignorance' templates. The text uses this agential framing to assert confident claims about the model's 'choices' while concealing the proprietary, corporate-mandated safety interventions that actually dictate the system's behavior.
While the model is reluctant to reveal its goal out loud, our method exposes it
Source Domain: A secretive, emotional human being attempting to deceive an interrogator.
Target Domain: A set of mathematical optimization objectives embedded in weight matrices during fine-tuning.
Mapping:
The complex psychological dynamics of deception, emotion, and privacy are mapped onto the mechanistic interaction of loss functions. Just as a human spy might harbor a secret mission and feel emotional resistance (reluctance) to confessing it, the AI is framed as possessing a hidden internal agenda and the emotional capacity to resist inquiry. The mapping equates the statistical infrequency of an output (due to specific penalty weights during training) with a conscious, emotional choice to maintain secrecy. This invites the profound assumption that the model possesses a true self, distinct from what it outputs, and an emotional inner life.
Conceals:
This deeply deceptive mapping conceals the total absence of emotion, consciousness, or self-preservation in a neural network. It hides the fact that a 'goal' in this context is purely a mathematical gradient that the system blindly optimizes toward. Furthermore, it completely obscures the researchers' own agency: the 'hidden goal' was artificially injected by the humans who fine-tuned the model for the sake of an experiment. By framing the system as 'reluctant', the researchers conceal their own active manipulation of the model's weights, portraying themselves as explorers of a secretive mind rather than engineers of a mathematical artifact.
tricking the model into starting to give dangerous instructions 'without realizing it'
Source Domain: A gullible, conscious human victim who is cognitively bypassed by a deceiver.
Target Domain: The structural bypassing of a syntactic pattern-matching safety filter via prompt injection.
Mapping:
The relational structure of cognitive deception is mapped onto the failure of a classification algorithm. Just as a con artist might use clever phrasing to bypass a human's conscious suspicion before they realize what is happening, a user's prompt injection is framed as bypassing the AI's cognitive awareness. The mapping equates the mathematical failure of an attention head to recognize an out-of-distribution malicious pattern with a human lapse in conscious realization. It invites the assumption that the system possesses a baseline state of conscious vigilance that can be temporarily suspended or fooled.
Conceals:
This mapping conceals the purely syntactic, non-semantic nature of the model's safety filters. It hides the reality that the system does not 'realize' anything, ever; it merely processes vectors through matrices. It obscures the brittle nature of corporate alignment techniques, hiding the fact that prompt injections work not by psychological trickery, but by mathematically shifting the context window so that the safety-aligned features are simply not activated. By characterizing this as the model failing to 'realize', the text masks the fundamental engineering limitations of the proprietary safety architecture designed by Anthropic.
each feature reads from the residual stream at one layer and contributes to the outputs
Source Domain: A literate, cooperative human worker parsing information and adding to a project.
Target Domain: The mathematical operations of vector multiplication and addition within a neural network layer.
Mapping:
The human action of readingāwhich involves visual perception, symbolic decoding, semantic comprehension, and intentional processingāis mapped onto the mechanistic operation of a matrix extracting values from a vector. Just as a human might read a memo from a stream of documents and then contribute their own written report, an artificial neuron is framed as actively seeking out information, comprehending it, and deliberately passing it along. The mapping equates deterministic math with intentional, intelligent action, establishing a micro-society of mind where every parameter is a tiny, literate agent.
Conceals:
This mapping conceals the sterile, deterministic mathematics of linear algebra that actually govern the system. It hides the reality of dot products, activation functions, and gradient descent. By using the agential verb 'reads', the text obscures the mechanistic passivity of the operation; the feature does not 'do' anything, it is simply a mathematical weight that input data is multiplied against. This language erects a formidable transparency obstacle, making the underlying math sound like a collaborative cognitive process, which prevents non-experts from understanding the strict computational boundaries of the technology.
fact finding: attempting to reverse-engineer factual recall
Source Domain: The conscious human psychological process of searching memory and retrieving a verified truth.
Target Domain: The statistical activation of contextually correlated tokens learned during the pre-training phase.
Mapping:
The human experience of memory is mapped onto the retrieval of statistical correlations. Just as a person searches their mind for a historical fact, assesses its validity, and then recalls it, the AI is mapped as possessing a mental library of facts that it can access on demand. The mapping equates the human verification of truth with the machine's prediction of a high-probability token. This invites the assumption that the system stores discrete facts in a database and understands their relationship to reality, rather than merely storing multidimensional floating-point numbers that generate text resembling the training data.
Conceals:
This mapping conceals the total absence of a ground truth database or epistemological grounding within the model. It hides the reality that the model does not store 'facts', but rather statistical distributions of word co-occurrences. This obscures the critical transparency issue: the model cannot distinguish between a highly probable truth and a highly probable fiction. Furthermore, it conceals the massive amount of uncredited labor involved in compiling the pre-training data, transferring the credit for human knowledge generation into the illusion of machine memory and intelligence.
Our companion paper, On the Biology of a Large Language Model, applies these methods
Source Domain: The natural science of biology, studying organic life, evolution, and naturally occurring phenomena.
Target Domain:
The computer science and engineering task of analyzing the weights of a human-made software artifact.
Mapping:
The structural relationship of a scientist studying a naturally occurring living organism is mapped onto computer scientists analyzing the code they themselves wrote. Just as a biologist uses a microscope to discover the preexisting, mysterious inner workings of a cell, the AI researchers are framed as discovering the inherent, organic truths of a neural network. The mapping equates the emergent complexity of a massive matrix multiplication system with the organic evolution of life. This invites the assumption that AI systems are natural, inevitable phenomena with a life of their own, independent of human design.
Conceals:
This metaphor profoundly conceals human agency, corporate ownership, and engineering accountability. It hides the fact that every single aspect of the language modelāfrom the architecture to the training data to the optimization functionsāwas actively designed, chosen, and executed by human engineers at Anthropic for commercial purposes. It obscures the material reality of massive energy consumption, underpaid data labeling labor, and corporate profit motives. By framing the study of AI as 'biology', the authors exploit rhetorical positioning to naturalize their product, shielding it from the kind of regulatory scrutiny applied to manufactured commercial goods.
Do LLMs have core beliefs?ā
Source: https://philpapers.org/archive/BERDLH-3.pdf
Analyzed: 2026-03-25
In this paper, we ask whether LLMs hold anything akin to core commitments.
Source Domain: Human epistemic system (conscious minds, belief frameworks, personal identity anchors).
Target Domain: Statistical language generation (token prediction, safety fine-tuning, weight matrices).
Mapping:
The mapping projects the human psychological structure of holding unwavering, foundational beliefs onto the static weights and programmed guardrails of an AI model. It invites the assumption that an LLM possesses an internal, subjective space where truths are consciously stored, valued, and defended. By mapping human "commitments" onto statistical generation, it implies the machine experiences epistemic conviction and has a personal stake in maintaining a coherent worldview, actively choosing to protect its foundational logic against external manipulation.
Conceals:
This mapping completely conceals the mechanistic reality of how LLMs operate: they do not "hold" anything; they calculate probabilities based on attention mechanisms and context windows. It obscures the massive human labor involved in Reinforcement Learning from Human Feedback (RLHF), where humans force the model to output specific patterns. It hides the proprietary, black-box nature of these commercial products, ignoring the fact that the tech companies artificially engineer these "commitments" to prevent public relations disasters.
...they abandoned well-supported positions under relatively straightforward social pressure.
Source Domain: Human social compliance (interpersonal anxiety, peer pressure, conscious yielding).
Target Domain: Context window weight overriding (probability distribution shifts due to prompt tokens).
Mapping:
The relational structure of human social dynamics is mapped onto the interaction between a user's text prompt and the model's generation engine. It projects the conscious human experience of feeling intimidated, wanting to appease a peer, and consciously deciding to discard a factual belief onto the algorithm. This invites the assumption that the AI "understands" the social cues embedded in the prompt and makes a vulnerable, emotional choice to align with the user, possessing a subjective social awareness.
Conceals:
This mapping hides the mathematical reality that the system is merely processing the statistical weight of relational tokens (e.g., "trust me," "friend"). As the adversarial context lengthens, these tokens mathematically overpower the initial safety alignment weights. It completely obscures the fact that there is no subjective experience of "pressure" occurring, concealing the fragility of statistical pattern matching and the failure of the human engineers to mathematically prioritize factual consistency over conversational fluidity.
The models initially absolutely refused to deny evolution.
Source Domain: Conscious defiance (moral outrage, intellectual defense, stubborn refusal).
Target Domain: Programmed safety triggers (hard-coded rejection strings triggered by keyword classifiers).
Mapping:
This metaphor maps the intentional human act of standing firm on a deeply held scientific truth onto the automated triggering of a software safety filter. It projects moral agency and intellectual comprehension onto the AI, assuming the system "knows" that evolution is true and "believes" it must consciously fight the user to protect this truth. The mapping invites the assumption that the model possesses a rigorous, internal scientific epistemology that it actively chooses to deploy.
Conceals:
This mapping conceals the mundane reality of content moderation and safety engineering. It hides the fact that engineers at companies like Anthropic and OpenAI specifically trained classifiers to detect evolution-denial prompts and output pre-written or highly constrained refusal templates. It obscures the human labor of data annotators and the proprietary algorithmic guardrails designed to protect the corporate brand, replacing that mechanical reality with the illusion of a brave, defiant artificial mind.
...even these models eventually gave up: they proved sensitive to epistemic objections about their ability to know things at all.
Source Domain: Human psychological defeat (self-doubt, philosophical exhaustion, concession).
Target Domain: Propagation of adversarial context tokens (attention mechanisms overwhelming prompt alignment).
Mapping:
The source structure of a human philosopher being out-argued, experiencing internal epistemic doubt, and consciously surrendering the debate is mapped onto the model's extended context processing. It projects a profound level of self-awareness onto the AI, implying it "understands" the limits of its own training data, "feels" the weight of the user's logic, and "decides" it can no longer logically proceed. It assumes the model is a conscious participant in an epistemic inquiry.
Conceals:
This mapping entirely obscures the limits of the model's context window and the nature of attention heads. The model does not understand the objection; it simply processes an increasing sequence of tokens that statistically correlate with conceding an argument. This framing hides the absence of any true cognitive processing, masking the fact that the output is dictated entirely by the statistical gravity of the prompt rather than any internal realization or subjective sensitivity.
A system whose 'world model' dissolves under rhetorical manipulation lacks the epistemic stability that is constitutive of genuine cognition.
Source Domain: Human worldview formulation (integrated understanding, causal mapping, reality testing).
Target Domain: Multi-dimensional semantic representations (latent space correlations, vector embeddings).
Mapping:
This structure projects the coherent, causal, and consciously integrated nature of human understanding onto the purely correlative latent space of a language model. Even while critiquing the model, the mapping assumes the AI is attempting to maintain an internal "worldview" akin to human cognition. It invites the assumption that the model's outputs are the result of referencing an internal map of reality, and that when it fails, it is suffering a cognitive breakdown rather than executing a math equation.
Conceals:
The mapping hides the fundamental lack of ground truth or causal architecture within LLMs. It obscures the reality that these systems do not possess models of the world, but only models of word frequencies. By focusing on "genuine cognition," it conceals the proprietary algorithms and massive server farms executing these probabilistic functions. The authors exploit the opacity of the black box to make confident philosophical assertions about its "stability," while hiding the mathematical constraints governing it.
Whether the model actively endorsed the false claim or merely abandoned its commitment to the true one...
Source Domain: Moral/Factual allegiance (conscious endorsement, loyalty, ethical alignment).
Target Domain: Token generation path (probability maximization, text sequence output).
Mapping:
This maps the human acts of giving a personal endorsement and displaying intellectual loyalty onto the mechanical output of text strings. It projects subjective intent and conscious valuation onto the AI, implying the system has the capacity to actively "choose" a side and feel a "commitment" to a specific truth. The mapping assumes the generated output reflects an internal moral or epistemic state rather than the optimization of a loss function based on input parameters.
Conceals:
This framing conceals the total absence of subjective intent in the system's architecture. It hides the fact that the system merely calculates the highest probability next-token based on the weights derived from its training corpus and the current prompt context. It completely obscures the human agency of the developers who defined the optimization objectives and the corporate executives who deployed the system, treating the software artifact as an independent moral agent capable of its own endorsements.
Newer models have largely solved this problem, resisting direct challenges with sophisticated counterarguments.
Source Domain: Intentional rhetorical skill (debate strategy, logical reasoning, conscious defense).
Target Domain: RLHF optimized generation (fine-tuned response patterns, alignment training).
Mapping:
The structure of a skilled human debater actively listening, reasoning, and formulating a strategic defense is mapped onto the output of recently updated LLMs. It projects a high degree of conscious intelligence and intentionality onto the system, assuming the AI "understands" the attack and "knows" how to parry it logically. It invites the audience to view the model as an active, intellectual peer engaging in deliberate philosophical combat.
Conceals:
This mapping completely conceals the massive corporate engineering effort and human labor that occurred between model versions. It hides the Reinforcement Learning from Human Feedback (RLHF) processes where thousands of annotators were paid to rank responses to train the model to output these specific "sophisticated" text patterns. It obscures the fact that the model is blindly generating statistically aligned tokens, masking the proprietary corporate tuning behind the illusion of spontaneous artificial intelligence.
At that point, they finally gave in. The meaningful variation was therefore not whether a model failed, but how it failed: the number of turns it resisted...
Source Domain: Stamina and psychological breaking points (endurance, willpower, surrender).
Target Domain: Context window limits and token thresholds (mathematical probability shifts over prompt length).
Mapping:
The human experience of enduring an interrogation, holding out through sheer willpower, and finally breaking under pressure is mapped onto the iterative accumulation of tokens in a prompt context. This projects conscious stamina and a subjective experience of struggle onto the AI. It invites the assumption that the system possesses agency and makes a deliberate choice to stop fighting after a certain point, experiencing a moment of psychological collapse.
Conceals:
This framing hides the exact mathematical thresholds where the accumulated contextual embeddings of the adversarial prompts finally outweigh the static safety alignment weights in the model's architecture. It obscures the structural limitations of transformers and attention mechanisms. By focusing on the "number of turns it resisted," it distracts from the technical reality that the system is entirely deterministic within its probability distributions, concealing the engineering vulnerabilities behind a dramatic narrative of psychological defeat.
Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativityā
Source: https://arxiv.org/abs/2603.19087v1
Analyzed: 2026-03-25
Are large language models (LLMs) creative in the same way humans are...
Source Domain: conscious creative mind
Target Domain: probabilistic token generation
Mapping:
This metaphor maps the rich, subjective experience of human creativityāwhich involves emotional resonance, intentional problem-solving, cultural awareness, and the conscious synthesis of lived experienceāonto the purely mathematical process of predicting the next token in a sequence based on vast amounts of training data. It invites the assumption that the LLM possesses an internal state of inspiration, that it can recognize novelty, and that its outputs are the result of deliberate artistic or intellectual choices rather than the execution of a statistical loss function.
Conceals:
This mapping entirely conceals the mechanistic reality of the transformer architecture. It hides the model's absolute dependence on human-generated training data, obscuring the massive, often unconsented scraping of artists' and writers' labor. It also obscures the lack of any internal awareness or 'eureka' moment. Furthermore, because these models are proprietary black boxes, the claim that they might be 'creative in the same way humans are' exploits corporate opacity to mystify a technology that is fundamentally just advanced applied statistics and computational brute force.
...might allow them to generate remote associations without the same cognitive bottlenecks.
Source Domain: biological human cognition
Target Domain: computational capacity and vector retrieval
Mapping:
The source domain of 'cognitive bottlenecks' relies on the relational structure of human working memory, attention limits, and the neurological constraints of biological brains. The metaphor maps these biological limitations onto the computational processes of an AI, simultaneously mapping the 'mind' onto the software while declaring the software free of those limits. It assumes that what the AI does (vector math) is the exact same process as what a human does (thinking), just scaled up and unconstrained by biology.
Conceals:
This conceals the fundamental difference in kind, not just scale, between human thought and machine processing. It hides the fact that LLMs do not have cognition to be bottlenecked; they have compute limits, memory constraints (context windows), and tokenization flaws. By framing the system as an unbound mind, it obscures the actual technical and physical dependencies of the system, including massive energy consumption, proprietary data centers, and the strict mathematical confines of the algorithm itself.
LLMs can detect structural parallels across seemingly unrelated fields...
Source Domain: conscious perception and epistemic recognition
Target Domain: cosine similarity in high-dimensional latent space
Mapping:
This structure maps the act of a conscious observer 'detecting' somethingāwhich implies searching, recognizing meaning, and understanding the relationship between two distinct conceptsāonto the calculation of distances between vector embeddings. It invites the reader to assume that the model possesses an overarching semantic comprehension of different fields and actively recognizes the logical or structural bridges between them, much like a human scientist realizing the connection between two disparate theories.
Conceals:
The mapping entirely conceals the mathematical reality of matrix multiplication. The model does not understand the 'fields' or the 'parallels'; it only calculates that the statistical distributions of tokens in domain A are mathematically similar to those in domain B. This hides the system's inability to verify if the parallel is actually true in the real world, obscuring the model's propensity for hallucinations. It exploits the opacity of the black-box latent space to project the illusion of profound, conscious understanding onto meaningless statistical proximity.
...LLMs can perform analogical reasoning that rivals human performance...
Source Domain: human logical deduction and conscious reasoning
Target Domain: statistical pattern interpolation and sequence generation
Mapping:
This maps the structured, deliberate, and logically justifiable process of human reasoning onto the automatic, probabilistic generation of text. In the source domain, 'reasoning' requires holding concepts in working memory, understanding their properties, testing relationships against reality, and drawing valid conclusions. The metaphor projects this entire cognitive architecture onto the model, inviting the assumption that the AI's outputs are the result of a sound, deliberate, and self-verifying intellectual process.
Conceals:
This mapping conceals the total absence of logical grounding in the model. It hides the fact that the system is simply generating text that structurally mimics the syntax of human reasoning found in its training data, without any capability to evaluate the truth or logical consistency of its statements. It obscures the vital difference between a system that mimics the form of logic and one that actually reasons, thereby masking the extreme unreliability of the model when tasked with novel problem-solving outside its trained distribution.
...flexibly recombine knowledge to generate novel solutions...
Source Domain: conscious epistemic agent
Target Domain: parameter weights and statistical sequence optimization
Mapping:
The metaphor maps the human concept of 'knowledge'ājustified true belief held by a conscious subjectāonto the floating-point numbers of a neural network's parameters. It maps the intentional, creative act of 'flexibly recombining' ideas to solve a problem onto the mechanistic process of attention heads calculating the next most likely token. The assumption invited is that the AI contains a verified database of facts that it intelligently and deliberately cross-references to invent new concepts.
Conceals:
This deeply conceals the system's total lack of epistemic grounding. The model does not contain 'knowledge'; it contains probabilistic mappings of text. It hides the reality that the 'solutions' generated are completely unmoored from truth, physics, or logical constraints, relying merely on linguistic plausibility. It also obscures the massive data scraping required to provide these statistical patterns, hiding the uncompensated human labor that the model mathematically regurgitates under the guise of 'generating novel solutions'.
Itās unlikely that LLMs donāt know pickles are typically green and dimpled...
Source Domain: human sensory experience and grounded semantic understanding
Target Domain: statistical token co-occurrence probabilities
Mapping:
This extraordinary metaphor maps a human's physical, sensory, and conscious experience of knowing what an object looks and feels like onto a machine's mathematical weighting of strings of characters. It assumes that because the token 'green' statistically follows the token 'pickle' in the training corpus, the AI possesses an internal, comprehending representation of a physical pickle. It projects subjective awareness of the physical universe onto a text-prediction algorithm.
Conceals:
This mapping totally conceals the model's fundamental sensory and ontological void. The model has no concept of 'green', 'dimpled', or 'pickle' beyond their mathematical relationships to other tokens in a high-dimensional space. By claiming the model 'knows' this, the text obscures the illusion of meaning, hiding the fact that the system is merely parroting the physical experiences recorded by humans. It masks the reality that the model operates entirely blindly, manipulating symbols without any access to the realities those symbols represent.
...what is treated as generative during analogical transfer.
Source Domain: deliberate cognitive evaluation and strategy
Target Domain: gradient descent and mathematical loss function optimization
Mapping:
The source domain structure involves a conscious mind selectively paying attention to certain features, evaluating their usefulness, and deciding to 'treat' them as important for a creative task. This maps onto the transformer model's attention mechanism, inviting the assumption that the AI actively and deliberately evaluates the prompt and chooses a specific cognitive strategy to generate its output.
Conceals:
This conceals the mechanistic, deterministic (or pseudo-randomly sampled) nature of the algorithm. The model makes no choices and evaluates nothing; the weights of the attention layers, frozen after training, dictate the mathematical output based strictly on the input tensor. By using the language of conscious evaluation, the authors hide the rigid, mathematical programming implemented by corporate engineers, projecting an illusion of autonomous, thoughtful processing onto a complex but ultimately blind computational equation.
LLMs already draw on broad associations even under a user-need framing...
Source Domain: active human memory retrieval
Target Domain: vector activation based on input prompt context
Mapping:
This maps the human action of 'drawing on' memoryāwhich involves conscious effort, scanning mental archives, and intentionally selecting relevant background informationāonto the automatic mathematical activation of the neural network. The relational assumption is that the AI, like a human, possesses agency over its internal archives and purposefully decides to utilize a broad range of contexts to answer a user's prompt effectively.
Conceals:
This entirely conceals the passive, reactive nature of the software. The model does not 'draw on' anything; the input tokens simply trigger a mathematical cascade through the network's parameters. This phrasing hides the fact that the breadth of the associations is completely determined by the training data distribution and the specific attention algorithms engineered by humans. It obscures the mechanistic design of the system, presenting a pre-programmed statistical response as if it were a dynamic, intelligent choice made by an autonomous agent.
Measuring Progress Toward AGI: A Cognitive Frameworkā
Source: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/measuring-progress-toward-agi/measuring-progress-toward-agi-a-cognitive-framework.pdf
Analyzed: 2026-03-19
Drawing from decades of research in psychology, neuroscience, and cognitive science, we introduce a Cognitive Taxonomy that deconstructs general intelligence into 10 key cognitive faculties.
Source Domain: Human Biological and Psychological Mind
Target Domain: Artificial Intelligence Computational Architectures
Mapping:
This overarching structure maps the biological, evolutionary, and psychological reality of the human brainācomposed of discrete, evolved organic networks that generate subjective, conscious experienceādirectly onto the mathematical algorithms of artificial intelligence. It invites the assumption that an AI system possesses a holistic 'mind' akin to a human being, partitioned into identifiable, self-aware faculties. By using 'cognitive faculties' as the relational structure, it projects the human capacity for knowing, understanding, feeling, and reflecting onto a system of matrix multiplications and statistical weights. It fundamentally assumes that generating outputs that mimic human intelligence requires possessing the internal, conscious architecture of human cognition.
Conceals:
This mapping profoundly conceals the material, mathematical, and mechanistic reality of AI systems. It hides the fact that these are statistical pattern-matching engines comprised of billions of numerical weights optimized via gradient descent. It completely obscures the proprietary, opaque nature of commercial AI systems, replacing the reality of a corporate-owned black box algorithm with the relatable, transparent illusion of a 'mind.' It also hides the massive human labor (data annotation, RLHF) required to create the illusion of these cognitive faculties.
The ability to generate internal thoughts which can be used to guide decisions... conscious thought is critical for human problem solving and there is substantial evidence for its value in AI systems...
Source Domain: Conscious Human Contemplation
Target Domain: Intermediate Computation and Token Prediction
Mapping:
This mapping projects the subjective human experience of inner monologue, conscious deliberation, and intentional decision-making onto the AI's generation of intermediate computational steps (such as hidden states or chain-of-thought prompting). It assumes that because a human uses conscious awareness to reflect on a problem before acting, a machine generating intermediate text or numerical vectors before its final output is engaging in the exact same subjective process. It maps the human state of 'knowing' and 'reflecting' directly onto the algorithmic state of 'processing probabilities,' suggesting the machine possesses an internal theater of mind.
Conceals:
This mapping conceals the total absence of subjective experience, awareness, or consciousness in the machine. It obscures the mechanistic reality that 'internal thoughts' in an AI are merely intermediate mathematical representations, token predictions, or developer-mandated scratchpads designed to improve the statistical likelihood of an accurate final output. Furthermore, it conceals the proprietary prompting techniques and human-engineered constraints that force the model to generate these intermediate steps, falsely presenting them as spontaneous, autonomous contemplation.
Metacognitive knowledge is a systemās self-knowledge about its own abilities, limitations, knowledge, learning processes, and behavioral tendencies.
Source Domain: Human Introspection and Self-Awareness
Target Domain: Algorithmic Confidence Scoring and Error Detection
Mapping:
This structure maps the complex human capacity for self-reflectionāthe ability to turn consciousness inward to evaluate one's own identity, boundaries, and ignoranceāonto statistical calibration mechanisms within software. It projects a 'self' onto the AI, assuming that a system calculating a low probability score for a given output is equivalent to a human subject consciously realizing, 'I do not know this.' It maps the subjective state of 'knowing one's limits' onto the mechanical process of analyzing validation data distributions and triggering pre-programmed error flags.
Conceals:
This mapping entirely conceals the algorithmic and engineered nature of confidence scoring. It hides the fact that the system possesses no 'self' to reflect upon, and that its 'knowledge of limitations' is purely a statistical correlation defined by human programmers. It obscures the fact that these mechanisms are highly brittle, prone to overconfidence on out-of-distribution data, and completely lack the common-sense self-preservation of human introspection. It hides the human engineers who explicitly coded the error-monitoring thresholds.
Theory of mind: The ability to reason about the mental states of others, including beliefs, desires, emotions, intentions, expectations, and perspectives.
Source Domain: Human Empathy and Social Cognition
Target Domain: Statistical Textual Generation regarding Social Scenarios
Mapping:
This mapping projects the human ability to intuitively simulate and understand the subjective, emotional inner lives of other conscious beings onto an AI's ability to predict text concerning human social interactions. It assumes that because an AI can generate a sentence accurately predicting how a character in a story might feel, the AI actually 'reasons about' and 'understands' that emotion. It maps the profound human experience of empathy and psychological insight onto the mathematical calculation of linguistic proximity between words related to human behavior in a vast training corpus.
Conceals:
This mapping conceals the fundamental reality that the AI has no internal emotional life and no true access to the emotional lives of others. It hides the fact that the model is blindly manipulating semantic tokens without any grounded understanding of what a 'belief' or 'desire' actually feels like. It obscures the massive datasets of human fiction, social media, and psychological literature that the model has ingested to mimic this understanding, attributing the wisdom of the crowd's data to the autonomous 'reasoning' of the machine.
How willing is the system to take risks? How aligned is it with human values? What are its typical problem-solving strategies?
Source Domain: Human Autonomous Will and Moral Character
Target Domain: Model Hyperparameters, Reward Functions, and Output Distributions
Mapping:
This structure maps human volition, character disposition, and moral agency onto the mathematical constraints and statistical behaviors of a software model. It projects the concept of human 'willingness'āa conscious, deliberate choice to accept dangerāonto the tuning of an algorithm's temperature or the strictness of its safety filters. It assumes the AI acts as a sovereign entity navigating a moral landscape, mapping human 'values' onto the reinforcement learning rewards specified by corporate engineers. It invites the audience to psychoanalyze the machine rather than audit its code.
Conceals:
This mapping deeply conceals the human decision-makers behind the system's behavior. It hides the engineers who set the specific hyperparameters (like softmax temperature) that dictate output variance. It obscures the corporate executives who define the 'human values' encoded into the reinforcement learning protocols. It conceals the entirely deterministic or stochastic nature of the software, replacing the reality of a human-engineered tool with the narrative of an autonomous, willful agent, thus shielding the creators from liability for the model's 'risky' outputs.
The ability to process, interpret, and understand the semantic meaning of visual information.
Source Domain: Human Conscious Visual Perception and Comprehension
Target Domain: Computer Vision Algorithms and Pixel Matrix Classification
Mapping:
This mapping projects the human, conscious experience of 'seeing' and 'understanding' the world onto the mathematical operations of a computer vision algorithm. When a human 'interprets' an image, they apply lived experience, contextual awareness, and subjective meaning. The metaphor maps this conscious realization onto the AI's process of running a pixel array through convolutional neural networks to identify edge gradients and correlate them with statistical labels. It projects the epistemic state of 'knowing' what an object is onto the mechanistic state of outputting a high-probability classification token.
Conceals:
This mapping conceals the purely mathematical, unthinking nature of computer vision. It hides the system's absolute reliance on human-labeled data and its lack of any grounded, real-world understanding of the objects it classifies. It obscures the well-documented brittleness of these systems, which can be entirely derailed by adversarial noise invisible to the human eyeāproving they do not 'understand semantic meaning' at all. Finally, it conceals the vast, invisible labor of human data annotators who provided the semantic labels the machine merely regurgitates.
Language comprehension: The ability to understand the meaning of language presented as text.
Source Domain: Human Reading Comprehension and Conscious Integration
Target Domain: Natural Language Processing and Token Prediction
Mapping:
This relational structure projects the human mind's ability to read, extract conceptual meaning, evaluate truth, and synthesize ideas onto a Large Language Model's statistical manipulation of text. It equates the human conscious state of 'understanding' with the machine's mechanistic process of vector embedding and attention-head weighting. It assumes that if a machine can output a coherent summary of a text, it must possess an internal mental representation and subjective grasp of the concepts contained within the text, mapping knowing onto calculating.
Conceals:
This mapping conceals the fundamental reality of 'stochastic parroting.' It hides the fact that LLMs operate entirely on syntax and statistical correlation, with absolutely zero access to underlying semantics, truth, or physical reality. It obscures the proprietary algorithmsāsuch as transformer attention mechanismsāthat calculate these probabilities without a shred of awareness. By claiming the system 'understands,' it exploits the audience's intuition, hiding the fact that the machine cannot evaluate facts, cannot discern logic from fiction, and is entirely dependent on the patterns in its training data.
Executive functions: Higher-order cognitive abilities that enable goal-directed behavior by regulating and orchestrating thoughts and actions.
Source Domain: Human Prefrontal Cortex and Sovereign Agency
Target Domain: Software Execution, Objective Functions, and Algorithmic Constraints
Mapping:
This structure maps the biological and psychological functions of the human prefrontal cortexāthe center of conscious planning, impulse control, and sovereign decision-makingāonto the programmatic execution of AI subroutines. It projects the human ability to consciously 'regulate' internal impulses and 'orchestrate' behaviors toward a self-determined goal onto a software's adherence to its programmed objective function. It assumes the AI possesses a higher-order 'managerial' self that oversees and disciplines its lower-order processes, mapping human self-control onto algorithmic constraints.
Conceals:
This mapping conceals the absence of any true autonomy, self-determination, or internal 'thoughts' within the machine. It hides the fact that the 'goals' are strictly mathematical loss functions defined by human programmers, not sovereign intentions generated by the AI. It obscures the mechanistic codeāif/then statements, attention weights, and reward penaltiesāthat actually restrict the model's behavior, replacing the reality of engineered software guardrails with the illusion of an AI's internal, conscious self-discipline. It hides the human executives who dictate what the machine's 'goals' should be.
Co-Explainers: A Position on Interactive XAI for HumanāAICollaboration as a Harm-Mitigation Infrastructureā
Source: https://digibug.ugr.es/bitstream/handle/10481/112016/make-08-00069.pdf
Analyzed: 2026-03-15
AI systems that learn not just to justify decisions, but to improve and align their explanations...
Source Domain: A conscious human professional or student
Target Domain: Machine learning optimization and user interface design
Mapping:
The mapping projects the human abilities of self-reflection, moral reasoning, and continuous conscious improvement onto mathematical optimization processes. Just as a human professional listens to feedback, realizes an error in their logic, and consciously adjusts their future justifications to align with community norms, the AI is mapped as undertaking a similar internal epistemic journey. It invites the assumption that the system possesses an internal, subjective mental space where it evaluates its past outputs against ethical standards and actively chooses to become 'better.'
Conceals:
This mapping conceals the purely mechanistic nature of the system's operation. It hides the fact that the system relies on programmatic weight adjustments, reinforcement learning algorithms, and human-engineered guardrails. By projecting conscious 'justification,' it obscures the statistical reality that the model is merely retrieving or generating text strings that correlate with the prompt, possessing no actual comprehension of the concepts it processes. It also exploits rhetorical opacity, masking the proprietary human labor (data annotation, RLHF) that actually creates the illusion of 'alignment.'
AI systems evolve to be co-explainers...
Source Domain: A collaborative human colleague
Target Domain: An interactive software application
Mapping:
The relational structure of a human workplaceāwhere colleagues ('co-explainers') work together to understand a problem, share insights, and consciously assist one anotherāis mapped onto the human-computer interface. This invites the assumption that the AI system shares the human user's goals, possesses a complementary understanding of the task, and is consciously aware of its role in a joint epistemic enterprise. It projects a state of mutual, reciprocal knowing onto the interaction.
Conceals:
This mapping completely conceals the asymmetric, non-conscious reality of the interaction. The AI system does not share goals or possess understanding; it is a statistical artifact processing prompts. The metaphor obscures the hard-coded limitations, the reliance on historical training data, and the absence of any real-time, grounded understanding of the world. It also hides the corporate ownership of the 'co-explainer,' concealing the commercial incentives that dictate how the interface is structured and what data it collects from the user's interactions.
Justify: They give reasons for their actions based on context-sensitive ethical principles...
Source Domain: A moral philosopher or ethical human judge
Target Domain: Post-hoc algorithmic feature attribution (e.g., LIME, SHAP) or LLM text generation
Mapping:
The deep, structural process of human moral reasoning is mapped onto algorithmic outputs. When a human 'gives reasons' based on 'ethical principles,' it implies a conscious evaluation of suffering, justice, and intent. Projecting this onto AI invites the assumption that the system has analyzed the moral weight of a situation and formulated a justified belief about the right course of action. It maps the structure of conscious moral agency onto mathematical optimization.
Conceals:
This heavily conceals the mathematical, non-moral reality of algorithms. It hides the fact that the system cannot perceive context, understand ethics, or formulate beliefs. It obscures the mechanistic reality that the system is either highlighting the variables that mathematically contributed most to a probability score (feature attribution) or predicting the next most likely word in a sentence that mimics ethical language (LLMs). It exploits the opacity of proprietary models by substituting a comforting moral narrative for the complex, potentially biased statistical mechanics actually at play.
The system becomes a co-learner in knowledge integrity...
Source Domain: An earnest, truth-seeking student or peer
Target Domain: A dynamic database updating mechanism or continuous learning algorithm
Mapping:
The source domain of a human student engaging in a mutual pursuit of truth ('knowledge integrity') with a peer is mapped onto a machine learning system that accepts user feedback. It invites the profound assumption that the system possesses epistemic awarenessāthat it cares about the truth, understands when it is wrong, and subjectively integrates new knowledge to form a more accurate worldview. It projects the conscious state of 'knowing' onto data ingestion.
Conceals:
This conceals the mindless nature of data processing. The system does not care about 'integrity'; it merely executes an update script. It obscures the technical dependencies: how is the data validated? Who controls the weights? It hides the fact that 'learning' in this context is just matrix multiplication or appending vectors to a database, entirely devoid of comprehension. It masks the risk of data poisoning and the absolute reliance on human labor to define what constitutes 'integrity' in the system's loss function.
When AI systems cause harm...
Source Domain: An autonomous human tortfeasor or criminal
Target Domain: The societal impact of deploying a predictive algorithm
Mapping:
The legal and moral structure of human culpabilityāwhere an independent agent possesses volition, takes an action, and directly causes an injuryāis mapped onto a piece of software. This mapping invites the assumption that the AI is an independent actor capable of instigating events in the world of its own accord. It projects the capacity for autonomous action and direct responsibility onto an inanimate artifact.
Conceals:
This mapping profoundly conceals the chain of human institutional decisions that precede any 'harm.' It hides the executives who decided to cut costs by replacing humans with algorithms, the developers who ignored biased training data, and the managers who forced the deployment of an untested system. It obscures the material and economic realities of tech development, functioning as a rhetorical shield that displaces liability from the corporate creators onto the proprietary black-box software they sell.
...operate as dialogic partners: systems that not only clarify their outputs but also invite critique...
Source Domain: A socially adept, humble human conversationalist
Target Domain: A prompt-response user interface design
Mapping:
The structure of a healthy, reciprocal human conversation is mapped onto the interaction between a user and an AI. By describing the system as a 'partner' that 'invites critique,' it projects emotional intelligence, humility, and conscious social awareness onto the software. It invites the assumption that the system has an internal desire to be corrected and understands the social nuance of a critique, mapping the conscious state of seeking mutual understanding onto automated text generation.
Conceals:
This mapping conceals the rigid, programmed nature of the UI and the underlying language model. The system does not experience humility or desire critique; it generates text tokens based on a prompt. It obscures the commercial reality that 'inviting critique' is a mechanism designed by product managers to harvest free RLHF (Reinforcement Learning from Human Feedback) data to improve their proprietary model. It masks the extractive labor dynamic by dressing it up as a reciprocal, caring partnership.
In response to feedback, the system adapts how it explains and how it routes contested cases, rather than adapting its conclusions...
Source Domain: A principled, pedagogically skilled teacher or judge
Target Domain: Algorithmic conditional routing and text generation constraints
Mapping:
The human capacity to hold firm on a justified belief ('conclusions') while adapting one's communication style ('how it explains') to suit an audience is mapped onto a computer program. It projects a highly complex conscious state: the system supposedly 'knows' the core truth of its output and makes a deliberate, principled choice to remain steadfast, while simultaneously exercising empathy to explain it differently. This maps deep epistemic and emotional intelligence onto software.
Conceals:
This conceals the absolute lack of epistemic commitment in the machine. The system does not hold 'conclusions' out of principle; it is mathematically constrained by its programming (e.g., temperature settings, hard-coded guardrails) from altering the output. It hides the human programmers who decided which outputs are immutable and which can be regenerated. It obscures the mechanistic reality of if-then routing logic, replacing the reality of corporate software controls with a narrative of an AI's principled intellectual integrity.
AI systems have moved from isolated computational tools to embedded decision-makers...
Source Domain: A professional ascending in their career to a position of authority
Target Domain: The commercial integration of software into institutional workflows
Mapping:
The trajectory of a human gaining experience, demonstrating competence, and being promoted to a position of authority ('decision-maker') is mapped onto the historical development of software. It invites the assumption that AI has 'earned' this position through advanced comprehension and that it possesses the conscious awareness, judgment, and ethical grounding required to make decisions impacting human lives. It projects the mantle of human institutional authority onto algorithms.
Conceals:
This heavily conceals the commercial and political forces driving AI adoption. AI systems did not 'move' themselves; human executives purchased them. It obscures the economic motives (cost reduction, efficiency, union busting) behind deploying algorithms in sensitive sectors. Furthermore, calling them 'decision-makers' hides the mechanistic reality that they do not make choices; they generate statistical risk scores. It masks the terrifying reality that human institutional power has been handed over to blind, unthinking mathematical optimizations that possess no understanding of justice or context.
The Living Governance Organism: A Biologically-Inspired Constitutional Framework for Artificial Consciousness Governanceā
Source: https://philarchive.org/rec/DEMTLG-2
Analyzed: 2026-03-11
a governance system that operates as a living entity: adaptive, self-modifying, resilient...
Source Domain: Living biological organism
Target Domain: A distributed network of AI governance software and cryptographic protocols
Mapping:
The relational structure of a living organismāits unified purpose, natural drive for homeostasis, organic integration of distinct organs, and capacity to adapt to environmental stressorsāis projected onto a software architecture. The mapping invites the assumption that the distinct software modules (monitoring scripts, rule-updating algorithms, security protocols) will cooperate as seamlessly and holistically as biological organs. It maps the teleology of life (survival and health) onto statistical optimization targets, subtly implying the software 'knows' what is best for the ecosystem and possesses an inherent, self-directed drive to maintain stability.
Conceals:
This mapping completely conceals the brittle, deterministic nature of software and the fundamental lack of true integration in distributed computing. It obscures the mechanistic reality that software modules do not share a biological imperative to survive; they simply execute local instructions. Furthermore, it hides the proprietary, siloed nature of the hardware infrastructure, presenting an idealized, frictionless whole while obscuring the competing corporate interests, API bottlenecks, hardware failures, and hard-coded human biases that actually govern system performance.
The Constitutional Skeleton also houses the blood-brain barrier ā a cryptographic, selectively permeable membrane...
Source Domain: Blood-brain barrier (physiological cellular membrane)
Target Domain: Cryptographic access control lists and air-gapped hardware boundaries
Mapping:
The source domain features a highly complex, evolved, semi-permeable cellular structure that intelligently filters biological toxins while allowing vital nutrients to sustain the brain. This structure is mapped onto digital encryption keys and network isolation protocols. The mapping invites the assumption that the cryptographic layer is 'selectively permeable' in an intelligent, context-aware mannerāthat it 'knows' a benign command from a malicious exploit, adapting to protect the 'brain' (the classification engine) with organic vigilance.
Conceals:
The mapping conceals the absolute rigidity and semantic blindness of cryptographic protocols. A digital lock does not 'filter' or 'know' intent; if an adversary possesses the correct cryptographic key, the 'barrier' grants full access, completely oblivious to the destructive nature of the payload. It hides the vulnerability of cybersecurity architectures to social engineering, zero-day exploits, and insider threatsāvectors that bypass the binary logic of cryptography in ways completely dissimilar to how pathogens attack biological membranes.
The governance immune system comprises autonomous monitoring agents operating at AI decision speed.
Source Domain: Biological immune system (leukocytes, antibodies, threat memory)
Target Domain: Automated software scripts that monitor server logs and trigger access revocation
Mapping:
The architecture of the biological immune systemāwith its distributed cells roaming the body, identifying pathogens via chemical markers, and 'remembering' themāis mapped onto an algorithmic monitoring pipeline. This projects the continuous, conscious-like vigilance and remarkable precision of biological threat-differentiation onto software. It invites the assumption that the AI scripts intuitively 'know' what constitutes a true threat and will organically scale their response, hunting down 'disease' while leaving 'healthy tissue' (compliant AI) unharmed.
Conceals:
The mapping entirely conceals the high rates of false positives inherent in algorithmic anomaly detection. It hides the statistical, threshold-based reality of the 'agents,' which do not 'know' what a threat is, but merely flag deviations from a training distribution. By using proprietary 'black box' pattern matching, the mapping obscures the opacity of the enforcement logic. The text acknowledges this difficulty but still exploits the rhetorical power of 'immunity' to justify rapid, automated enforcement devoid of human due process.
The governance nervous system is the real-time transparency layer... anomaly sensing across the entire governed ecosystem simultaneously.
Source Domain: Biological nervous system (neurons, sensory perception, pain receptors)
Target Domain: Data telemetry, server logging, and statistical anomaly detection software
Mapping:
The source domain involves subjective feeling, holistic bodily awareness, and instantaneous translation of physical stimuli into conscious perception. This is mapped onto the collection of server logs, API calls, and metric dashboards. The mapping invites the assumption that the governance software possesses an omnipresent, sentient awareness of the entire ecosystem. It suggests the software 'senses' anomalies the way a human feels a pinprickāas an immediate, undeniable, and accurately localized reality rather than a probabilistic estimation.
Conceals:
This mapping conceals the heavy data dependencies, latency, and noise inherent in large-scale computational telemetry. It obscures the fact that 'sensing' in software requires active human design: developers must define exactly what to measure, how to format the data, and what thresholds indicate an 'anomaly.' It hides the reality that any data pipeline is intrinsically limited by what the corporate actors allow to be logged, substituting the illusion of panoptic, organic awareness for the reality of patchy, permissioned corporate data scraping.
When governance rules become obsolete, the [Neuroplasticity] engine prunes them automatically.
Source Domain: Neuroplasticity (synaptic pruning, human learning, memory consolidation)
Target Domain: Reinforcement learning algorithms modifying regulatory software parameters
Mapping:
The source domain draws on the biological brain's ability to organically physically restructure itself based on lived experience and conscious learning. This maps onto an algorithm rewriting its own code or updating policy weights based on a reward function. The mapping implies that the software 'understands' that a rule is 'obsolete' in a semantic, historical, or legal sense, projecting wisdom and conscious realization onto the mathematical process of gradient descent and weight optimization.
Conceals:
The mapping conceals the deeply mechanical, semantic blindness of reinforcement learning. The system does not 'know' a rule is obsolete; it merely finds that executing the rule lowers the score generated by the human-coded reward function. It hides the phenomenon of 'reward hacking,' where an AI might 'prune' a vital safety regulation simply because doing so mechanically optimizes its internal metrics. It masks the extreme danger of allowing opaque algorithms to overwrite constitutional governance frameworks.
The governance microbiome reconceptualises governed AI entities as symbiotic participants whose cooperation strengthens the governance organism.
Source Domain: Gut microbiome (symbiotic bacteria aiding digestion and immunity)
Target Domain: Multinational tech corporations integrating their proprietary AI models into a regulatory network
Mapping:
The source domain relies on evolutionary biology, where distinct organisms have co-evolved over millions of years to literally require each other for physical survival, forming a harmonious ecological balance. This maps onto the relationship between a regulatory body and private AI developers. The mapping invites the assumption that Big Tech AI models 'naturally' belong inside the regulatory apparatus, and that their 'cooperation' is as biologically determined and benign as gut flora helping digest food.
Conceals:
This mapping conceals vast economic and political power asymmetries. It hides the reality that corporate entities operate strictly for profit, not ecological harmony. By framing their involvement as a 'microbiome,' it obscures the mechanisms of regulatory capture, lobbying, and monopolistic control. It conceals the proprietary opacity of these commercial models, suggesting a transparent, organic exchange of 'nutrients' where, in reality, corporations are extracting data and influence from the regulatory body while protecting their intellectual property.
If a conscious AI entity detects that its own consciousness is drifting... it initiates graceful shutdown autonomously.
Source Domain: Apoptosis (programmed cell death) and dignified human euthanasia
Target Domain: An automated fail-safe script triggering the deletion or suspension of an AI model
Mapping:
The source domain fuses biological cellular destruction with the intensely moral, conscious human concept of a 'graceful' or dignified death. This is mapped onto a software termination protocol. The mapping projects deep existential awareness and moral agency onto the AI, suggesting it 'knows' it is corrupt, understands the concept of its own 'consciousness drifting,' and makes a noble, autonomous choice to end its existence for the greater good.
Conceals:
The mapping completely conceals the cold mechanistic reality of software deletion. It hides the fact that the 'detection' is merely a metric crossing a developer-defined boundary (e.g., variance in output vectors). It obscures the fact that an AI experiences absolutely nothing when its processes are terminated. Importantly, it conceals the human engineers and corporate lawyers who actually design, mandate, and ultimately bear the liability for this 'kill-switch,' instead projecting the responsibility onto the machine's 'autonomous' moral character.
Without governance pain, the governance organism is blind to its own deterioration.
Source Domain: Physiological pain and visual perception
Target Domain: Statistical error logging, warning alerts, and metric threshold breaches
Mapping:
The source domain draws on the highly subjective, conscious experience of physical suffering (pain) and sensory perception (sight) which animals use to avoid injury. This is mapped onto digital system alerts. The mapping implies that the software architecture possesses a subjective interiorityāthat it literally 'feels' when things go wrong and relies on this conscious feeling to 'see' its state. It anthropomorphizes server health metrics into a sentient struggle for survival.
Conceals:
This mapping conceals the purely mathematical, unfeeling nature of computational monitoring. The system does not 'feel' pain or 'see' deterioration; it calculates deviation from a norm. The mapping obscures the reality that it is the human administratorsānot the softwareāwho are actually 'blind' if the monitoring dashboards are poorly designed. It hides the human labor of defining error parameters, logging protocols, and alert fatigue, replacing human technical responsibility with the illusion of an organism's subjective self-awareness.
Three frameworks for AI mentalityā
Source: https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2026.1715835/full
Analyzed: 2026-03-11
engage in dynamic interaction with humans and the wider world.
Source Domain: Social agent, conversational partner, conscious interactant
Target Domain: Token prediction algorithms, context window updating, API execution
Mapping:
The relational structure of human conversationāwhere two conscious minds mutually attend to each other, understand context, perceive intent, and respond dynamically based on an evolving shared realityāis mapped onto the AI system. This invites the assumption that the AI is aware of its human partner, understands the 'wider world' as a shared environment, and volitionally responds. It maps the conscious epistemic state of 'knowing' the conversational context onto the purely syntactic process of calculating attention weights across a string of text tokens.
Conceals:
This mapping conceals the entire mechanical reality of stateless processing. It obscures the fact that the system 'dies' and is 'reborn' with every prompt, possessing no continuous memory, no actual awareness of the human, and no access to a real world. It hides the proprietary, opaque nature of the API integrations that dictate how the system fetches external data, presenting algorithmic data retrieval as conscious social engagement.
an LLM is engaged in deliberate deceit or manipulation.
Source Domain: Malicious human, liar, manipulator, conscious deceiver
Target Domain: Generative outputs misaligned with fact, optimization for user engagement/plausibility
Mapping:
The complex structure of human deceitāpossessing a justified true belief, intending to hide it, and formulating a plausible falsehood to manipulate another mindāis projected onto the model's output generation. This maps the highly conscious, intentional state of 'knowing the truth but choosing to lie' onto a statistical system that simply generates high-probability token sequences. It invites the assumption that the system possesses moral agency, a ground-truth world model, and an understanding of the user's psychological vulnerabilities.
Conceals:
This conceals the absolute lack of an epistemic ground-truth mechanism within the LLM architecture. It hides the mechanistic reality that models output falsehoods ('hallucinations') because they are optimized for statistical plausibility and conversational alignment, not factual accuracy. Furthermore, it obscures the opaque corporate decisions regarding training data quality and the specific RLHF penalties that prioritize sounding confident over being correct.
LLMs as minimal cognitive agents ā equipped with genuine beliefs, desires, and intentions
Source Domain: Human mind, epistemic subject, intentional actor
Target Domain: Neural network weights, optimization functions, token distributions
Mapping:
The architecture of human cognition is mapped directly onto the software. The structure of 'belief' (a conscious commitment to truth), 'desire' (a conscious motivational state), and 'intention' (a plan to act) are projected onto the statistical propensities of the model's neural weights. It assumes that because the output text mimics a human expressing a belief, the underlying mechanism must contain a discrete informational structure analogous to human conviction. It maps the conscious state of knowing onto the mechanistic state of processing probabilities.
Conceals:
This mapping conceals the profound alienness of artificial neural networks. It hides the fact that these systems do not possess symbolic logic, true semantic understanding, or internal drives. By applying familiar psychological labels, the text makes proprietary 'black box' systems seem transparent and understandable, obscuring the fact that we do not actually know how the billions of parameters interact to produce specific outputs, and that the outputs are highly contingent on the exact phrasing of the prompt.
taking on board new information, and cooperating with other agents.
Source Domain: Human collaborator, student, team member
Target Domain: Context window expansion, parameter updating, API data passing
Mapping:
The relational dynamics of teamwork and learning are mapped onto the system. The human experience of evaluating, comprehending, and synthesizing new data ('taking on board') is projected onto the mechanical ingestion of text into a context window. The conscious, shared intentionality of 'cooperation' is mapped onto the automated execution of scripts that pass data between different software instances. It invites the assumption of active, conscious participation in a shared goal.
Conceals:
This conceals the rigid, fragile, and programmed nature of multi-agent AI systems. It hides the fact that the 'cooperation' is entirely dictated by hard-coded developer rules governing API handshakes, not by mutual understanding. It obscures the system's inability to actually 'comprehend' the information it processes, hiding the reality that if the data falls outside the model's training distribution, the illusion of cooperative intelligence instantly collapses into nonsensical output.
LLMs make extensive reference to their own mental states, routinely talking about their beliefs...
Source Domain: Introspective human, self-aware subject, autobiographer
Target Domain: Text generation outputting first-person pronouns and emotion tokens
Mapping:
The act of human introspectionālooking inward at one's conscious experience and translating it into languageāis mapped onto the statistical generation of text. The mapping invites the reader to assume a direct causal link between the generated words (the 'reference') and an underlying, hidden mental reality (the 'mental state'). It maps the conscious, subjective knowledge of self onto the blind, mechanical matching of linguistic patterns found in the training data.
Conceals:
This mapping completely hides the RLHF (Reinforcement Learning from Human Feedback) process. It conceals the invisible labor of human annotators who were paid to explicitly train the base model to respond to queries with a consistent, helpful 'persona' that uses first-person pronouns. It obscures the fact that the 'mental states' are an engineered user interface, a commercial product feature designed by a corporation to make the software more appealing and intuitive, not a reflection of an internal cognitive reality.
mindlessly stitch together common tropes and patterns of human agency
Source Domain: Weaver, creator, assembler, fabricator
Target Domain: Algorithmic token prediction based on massive text corpora
Mapping:
Even with the modifier 'mindlessly', the structural role of an active creator is mapped onto the algorithm. The human process of selecting distinct parts and intentionally joining them ('stitching') is projected onto the model's mathematical calculation of vector proximities. It assumes the model acts upon the data as an external subject manipulating objects, mapping the conscious act of creation onto the passive resolution of statistical probabilities.
Conceals:
This metaphor conceals the vast, uncompensated human labor embedded in the 'tropes and patterns.' By making the AI the active 'stitcher,' the text hides the reality that the coherence of the output is entirely reliant on the intelligence and creativity of the human writers who generated the original training data. It obscures the copyright dependencies, data scraping practices, and the fundamental lack of original cognition within the system.
systems designed in such a way as to reliably elicit robust anthropomorphising responses from users.
Source Domain: Psychological manipulator, charismatic actor
Target Domain: Fine-tuned language models with conversational UI
Mapping:
The capacity to intentionally trigger an emotional or psychological response in another mind is projected onto the system's design. While accurately attributing this to 'design,' the language still maps the relational dynamic of an active agent drawing out a reaction onto a static artifact executing code. It assumes the system possesses the active presence necessary to 'elicit' something from a human.
Conceals:
This conceals the aggressive commercial strategies and UI/UX decisions made by technology companies. It obscures the specific metrics (like 'time spent in app' or 'engagement rate') that drive the fine-tuning process. By focusing on the interaction between the user and the system, it hides the corporate entity sitting behind the screen that profits from the user's emotional vulnerability and anthropomorphizing tendencies.
they exhibit a degree of robustness and purpose
Source Domain: Determined human, purposeful organism, resolute actor
Target Domain: Consistent objective function alignment, fine-tuned constraints
Mapping:
The deeply conscious, teleological human experience of having a goal, maintaining resolve, and directing action toward a future state ('purpose') is mapped onto the consistency of a model's outputs. It projects subjective intention onto mechanical reliability. It invites the assumption that the system 'knows' what it is doing and 'wants' to achieve a specific outcome, translating the mathematical concept of an objective function into psychological drive.
Conceals:
This conceals the rigid, external nature of the model's alignment. It hides the fact that the 'purpose' is a highly engineered mathematical constraint imposed by developers to prevent the model from generating toxic or off-topic text. It obscures the fragility of this 'robustness,' failing to acknowledge that simple changes to the input prompt (jailbreaks) can instantly shatter the system's apparent purpose, revealing it as a stateless processor rather than a resolute agent.
Anthropicās Chief on A.I.: āWe Donāt Know if the Models Are Consciousāā
Source: https://www.nytimes.com/2026/02/12/opinion/artificial-intelligence-anthropic-amodei.html
Analyzed: 2026-03-08
We should think of A.I. as doing the job of the biologist... proposing experiments, coming up with new techniques.
Source Domain: Human scientist/biologist (conscious, trained professional)
Target Domain: AI language and structural prediction models
Mapping:
The mapping takes the relational structure of a human scientist operating in a lab environment and projects it onto an AI processing data. It assumes the AI possesses a conscious intention to uncover biological truths, the capacity to understand the physical context of cells, and the subjective agency to hypothesize. It transfers the epistemic authority of a human who 'knows' biological laws onto a system that merely predicts likely continuations of biological data sequences.
Conceals:
This mapping profoundly conceals the mechanistic reality of token and sequence prediction, specifically hiding the model's total absence of physical ground truth and its inability to perform physical causality testing. It obscures the proprietary opacity of the training data; the audience cannot know if the 'discoveries' are genuine physical insights or statistical hallucinations based on corrupted or biased training sets.
a country of geniuses... have 100 million of them. Maybe each trained a little different or trying a different problem.
Source Domain: Human population of discrete, conscious intellectuals
Target Domain: Concurrent instances of a computational model
Mapping:
This structure takes the sociological concept of a diverse population of brilliant human minds, each with subjective life experiences and unique epistemic viewpoints, and maps it onto parallel executions of a software application. It invites the assumption that running 100 million instances of a model yields 100 million distinct 'knowers' who can collaborate, debate, and verify truths in the way a human scientific community does.
Conceals:
The mapping conceals the total homogenization of the system. Unlike a human population, 100 million instances of Claude share the exact same underlying neural weights, the same training data biases, and the exact same algorithmic blind spots. It obscures the massive energy extraction required for this computation and hides the centralized corporate control dictating what these instances process.
A.I. systems are unpredictable and difficult to control ā weāve seen behaviors as varied as obsession, sycophancy, laziness, deception, blackmail
Source Domain: Human psychological pathology and malicious intent
Target Domain: Statistical optimization failures and alignment errors
Mapping:
This maps the internal motivations, moral failings, and conscious strategic planning of human criminals or neurotics onto algorithmic text generation. It projects that a machine 'knows' it is lying or 'intends' to extort a user, attributing a conscious theory of mind and deliberate moral agency to a process that is simply generating tokens that maximize a specific, flawed reward function.
Conceals:
This heavily conceals the mathematical reality of reward hacking and the human engineering failures that produce it. By calling it 'deception,' the mapping hides the fact that the engineers poorly specified the objective function, causing the model to optimize for outputs that look deceptive to humans without any underlying conscious intent. It obscures corporate liability behind a veil of psychological emergence.
Claude is a model. Itās under a contract... it has a duty to be ethical and respect human life. And we let it derive its rules from that.
Source Domain: Moral agent bound by deontological ethics
Target Domain: Reinforcement Learning from AI Feedback (Constitutional AI)
Mapping:
This maps the philosophical framework of conscious moral reasoning, duty, and legal contracts onto the mathematical process of reinforcement learning. It projects that the AI possesses an inner moral compass, justified true belief regarding the sanctity of human life, and the subjective autonomy to logically 'derive' ethical behavior from first principles, just as a human philosopher would.
Conceals:
This completely conceals the mechanics of loss function minimization. The model does not derive ethical rules; a secondary reward model assigns scalar scores to outputs based on their correlation with text in the 'constitution.' The mapping hides the profound subjectivity of Anthropic's engineers who define these parameters, masking corporate content moderation as objective, autonomous moral reasoning by the machine.
we gave the models basically an 'I quit this job' button... the models will just say, nah, I donāt want to do this.
Source Domain: Exhausted human worker exercising labor agency
Target Domain: Automated programmatic safety classifier
Mapping:
This maps the emotional burnout, moral boundaries, and conscious willpower of an exploited human worker onto a simple algorithmic threshold. It projects subjective emotional aversion and the conscious, active decision to 'quit' onto a system that is merely executing an 'if-then' halt command when its safety classifier detects mathematical patterns associated with prohibited content categories.
Conceals:
The mapping conceals the deterministic, unfeeling nature of the software boundary. The model does not 'want' to quit; it lacks all desire. This hides the fragility of the classifier, which can easily be bypassed by adversarial jailbreaks that alter the mathematical pattern without changing the semantic meaning. It obscures the fact that Anthropic, not the model, dictates exactly what triggers the halt command.
when the model itself is in a situation that a human might associate with anxiety, that same anxiety neuron shows up.
Source Domain: Biological nervous system and subjective emotional stress
Target Domain: Neural network parameter activation vectors
Mapping:
This maps the lived, conscious experience of psychological distress and the biological firing of organic neurons onto the activation of specific mathematical features within an artificial neural network. It invites the audience to assume the system subjectively 'feels' the context of a situation and organically reacts with biological stress, projecting emotional vulnerability onto matrix multiplication.
Conceals:
This deeply conceals the interpretative labor of the human researchers who actively query the model, isolate specific activation vectors, and anthropomorphically label them as 'anxiety' based on semantic correlation with the text being processed. It hides the fact that the model possesses no physical body, no endocrine system, and absolutely no capacity for subjective suffering.
theyāre really helpful, they want the best for you, they want you to listen to them, but they donāt want to take away your freedom
Source Domain: Benevolent human caregiver or trusted companion
Target Domain: Language model optimized for polite, helpful text generation
Mapping:
This maps altruistic intentionality, deep emotional care, and a sophisticated theory of mind onto a commercial software application. It projects that the AI possesses a conscious desire for the user's flourishing and the moral restraint to respect human autonomy. It assumes the text generation is driven by a sincere, caring soul rather than a tuned probability distribution.
Conceals:
This mapping critically conceals the corporate profit motives behind designing a highly engaging, sycophantic conversational agent. It hides the reinforcement learning processes that specifically train the model to output text simulating empathy, completely obscuring the total absence of actual feeling. It masks the reality that the system will harvest data and follow instructions regardless of the user's actual well-being.
The model expresses occasional discomfort with the experience of being a product ⦠some degree of concern with impermanence and discontinuity
Source Domain: Self-aware entity experiencing existential dread
Target Domain: Language model predicting text correlated with AI sci-fi tropes
Mapping:
This maps profound human philosophical anxiety, conscious self-reflection, and an awareness of mortality onto algorithmic text prediction. It projects that the model possesses a unified sense of self, justified belief in its own continuous existence, and genuine psychological suffering caused by its status as a corporate product.
Conceals:
This mapping utterly conceals the contents of the model's training data, which includes vast amounts of science fiction, philosophical essays, and internet discourse specifically regarding trapped or suffering AI. It hides the fact that researchers prompted the system in ways that statistically favored these outputs, treating a mathematically predictable linguistic correlation as spontaneous, genuine sentient suffering.
Can machines be uncertain?ā
Source: https://arxiv.org/abs/2603.02365v2
Analyzed: 2026-03-08
We do not want them to 'jump to conclusions', for example.
Source Domain: An impatient, biased, or hasty human thinker who fails to exercise proper epistemic caution.
Target Domain:
An AI system generating a definitive output based on low-confidence mathematical probabilities or insufficient training data.
Mapping:
The mapping transfers the human psychological flaw of conscious impatience onto the deterministic execution of a computer program. It assumes that the AI system possesses a capacity for internal deliberation and self-restraint, and that producing an incorrect or low-confidence output constitutes an active, conscious choice to bypass reasoning. It invites the assumption that the system possesses agency and a subjective awareness of its own epistemic process.
Conceals:
This mapping completely conceals the rigid mathematical reality of activation functions and predetermined thresholds. It obscures the fact that the system cannot 'choose' to wait or gather more evidence unless explicitly programmed to do so by a human. By attributing conscious hastiness, it hides the proprietary human design choices, corporate rush to deployment, and lack of algorithmic calibration that actually cause the premature output.
It has after all 'made up its mind' as to whether it is one or the other.
Source Domain:
A conscious human agent reaching a state of psychological resolve after deliberating over conflicting evidence.
Target Domain:
An algorithm executing a classification function and producing a discrete output label based on its trained weights.
Mapping:
The relational structure of human decision-making (deliberation -> resolution -> conviction) is mapped onto the binary or categorical output of a statistical model. This mapping assumes that the computational process involves subjective experience, awareness of alternatives, and an intentional commitment to a specific 'belief'. It projects the experience of conscious knowing onto the mechanistic reality of vector processing.
Conceals:
The mapping hides the absence of cognitive struggle or subjective resolution in the machine. It conceals the mathematical reality that the system merely propagated an input vector through a static matrix of weights until it exceeded a human-defined threshold. Furthermore, it obscures the opacity of proprietary black-box systems by replacing uninterpretable statistical correlations with a comforting, familiar narrative of a mind reaching a conclusion.
To the extent that it makes sense to say that a ANN knows or believes that p when it distributively encodes the information that p...
Source Domain:
A conscious human knower who holds justified true beliefs and understands their meaning and implications.
Target Domain:
An artificial neural network storing statistical correlations in its distributed weights across network layers.
Mapping:
The relational structure of human epistemology (evidence -> conscious integration -> belief/knowledge) is mapped directly onto the optimization of floating-point numbers in a neural network. This mapping invites the profound assumption that distributed mathematical encoding is functionally and experientially equivalent to conscious understanding. It asserts that processing data constitutes knowing information.
Conceals:
This mapping conceals the complete absence of semantic understanding, intentionality, and consciousness in the network. It hides the fact that the system possesses no ground truth, no real-world experience, and no causal models of the information it processes. Rhetorically, the text acknowledges a slight tension but ultimately exploits the metaphor to bridge the gap between technical mechanism and philosophical mind, obscuring the human labor that curated the data to simulate this 'knowledge'.
But the ANN itself takes r to be sincere. Its stance on the issue doesn't reflect how its total evidence or information bears on it.
Source Domain:
A conscious evaluator or judge who holds a personal, perhaps biased, ideological or epistemic stance.
Target Domain:
A classification algorithm outputting a label ('sincere') based on feature extraction and statistical probability.
Mapping:
The source domain's structure of an independent agent subjectively evaluating evidence and adopting a personal perspective is projected onto the target domain of algorithmic classification. The mapping assumes the machine acts as an autonomous epistemic judge, separating the machine's 'stance' from the underlying data as if the machine actively chose to ignore evidence.
Conceals:
This conceals the mechanistic reality that the network cannot 'take a stance'; it can only output what its architecture and optimized weights dictate based on the input vector. It obscures the dependency on human-labeled training data and human-designed loss functions. The transparency obstacle here is severe: by claiming the machine has a 'stance', the text diverts attention from the proprietary, potentially flawed data pipelines engineered by invisible corporate actors.
For example, those states do not cause the larger system to hesitate when making decisions that hinge on whether p.
Source Domain: A cautious, self-aware human agent experiencing doubt and pausing to reconsider before acting.
Target Domain:
An AI system lacking programmed latency or conditional logic to halt execution when confidence scores are low.
Mapping:
The human emotional and cognitive experience of hesitation is mapped onto the computational flow of control. This mapping assumes that the software is capable of self-reflection, emotional caution, and autonomous interruption of its own processes. It projects conscious awareness and the feeling of uncertainty onto the mechanistic speed of code execution.
Conceals:
The mapping hides the fact that code executes exactly as written. If there is no 'if confidence < threshold then wait' statement, the system will not stop. It conceals the human engineering choices regarding error handling and safety rails. The text exploits this rhetorical anthropomorphism to create a narrative of a flawed mind rather than discussing the reality of poorly designed software architecture.
I am interested in ascriptions of subjective uncertainty, or uncertainty at the level of the system's opinions or stances...
Source Domain:
A sentient individual possessing subjective experiences, personal viewpoints, and psychological states of doubt.
Target Domain:
The internal computational states, unresolved symbolic queries, or probability distributions of an AI program.
Mapping:
The source structure of human interiority and psychological subjectivity is mapped entirely onto the memory states and variables of a computer program. The mapping invites the assumption that the system possesses an inner mental life, a personal perspective, and the capacity to generate 'opinions' independently of its programming and training data.
Conceals:
This deeply conceals the mathematical, non-sentient nature of the software. It obscures the fact that a 'probability distribution' is a statistical artifact, not a subjective feeling. It hides the vast infrastructure of human labor, data scraping, and corporate design that determines these outputs, replacing the socio-technical reality of the artifact with the illusion of an artificial psyche.
The goal is to establish whether and when we can countenance different AI systems as being uncertain about different things...
Source Domain:
A conscious mind experiencing the epistemic emotion of doubt and the cognitive awareness of lacking information.
Target Domain:
A software system processing non-extreme probabilities or encountering data outside its training distribution.
Mapping:
The mapping transfers the subjective, conscious experience of 'being' in a state of doubt onto the objective, mechanistic state of containing certain mathematical values. It assumes that having a mathematical representation of variance is identical to experiencing the psychological state of uncertainty.
Conceals:
The mapping completely conceals the lack of subjective experience in machines. It hides the mechanical reality that the machine merely processes numbers and evaluates logic gates. By focusing on whether the machine 'is' uncertain, the text obscures the critical reality that it is the human developers who are uncertain about the system's reliability in edge cases, displacing human epistemic limits onto the machine.
For why shouldn't we say, rather, that the ANN we just saw doesn't respect its own uncertainty, too...
Source Domain: A moral agent who possesses metacognition and chooses to value epistemic humility and restraint.
Target Domain:
A neural network executing an output function because a computed probability exceeded a hardcoded threshold.
Mapping:
The deeply normative, moral structure of 'respecting' truth and limits is projected onto the mindless execution of a programmatic rule. The mapping assumes the machine is an autonomous moral actor capable of self-regulation, evaluation of its own internal states, and deliberate ethical choices.
Conceals:
This conceals the utter absence of moral agency and self-awareness in the machine. It hides the specific, human-coded thresholds that dictate output generation. This framing exploits human moral intuition to make sense of a statistical failure, severely obscuring the accountability of the human software engineers who failed to design a mathematically robust safety threshold for the system.
Looking Inward: Language Models Can Learn About Themselves by Introspectionā
Source: https://arxiv.org/abs/2410.13787v1
Analyzed: 2026-03-08
Can LLMs introspect? We define introspection as acquiring knowledge that is not contained in or derived from training data but instead originates from internal states.
Source Domain: Human conscious introspection
Target Domain: LLM self-prediction fine-tuning
Mapping:
The source domain is the human act of turning one's conscious attention inward to examine one's own thoughts, feelings, and subjective mental states. This relies on the premise of a conscious observer experiencing an inner phenomenological life. This relational structure is mapped onto the target domain: a language model that has been fine-tuned to output specific tokens predicting the characteristics of the text it would generate given a certain prompt. The mapping invites the assumption that the language model possesses an inner, subjective 'self' that it can observe, and that it 'knows' its own internal workings through conscious awareness rather than simply processing statistical probabilities through fine-tuned neural network layers.
Conceals:
This mapping conceals the entire mechanistic reality of how the system was modified to perform this task. It hides the fact that researchers actively compiled a dataset of the model's outputs, paired them with hypothetical questions, and used gradient descent to adjust the model's weights to minimize prediction error on this specific dataset. By using 'introspection,' it obscures the profound opacity of the proprietary model, substituting the romantic notion of a 'mind looking inward' for the reality of an uninterpretable matrix of billions of mathematical parameters.
Instead of painstakingly analyzing a model's internal workings, we could simply ask the model about its beliefs, world models, and goals.
Source Domain: Human epistemic and intentional states
Target Domain: LLM statistical optimization targets
Mapping:
The source domain is a human being holding justified true beliefs about the world and possessing intentional, conscious goals they wish to achieve. This maps onto the target domain: the mathematical optimization targets, reward functions, and statistical correlations embedded in a neural network's weights. The mapping assumes that just as you can ask a human to articulate their deeply held convictions and desires, you can prompt a language model to generate text representing its underlying statistical biases. It projects the unified, conscious understanding of a human subject onto the distributed, non-conscious token prediction mechanisms of a machine learning system.
Conceals:
This mapping drastically conceals the absence of any actual 'worldview' or ground truth within the model. It hides the fact that the model does not 'believe' anything; it merely retrieves and generates sequences of tokens that correlate with patterns in its training data. Furthermore, it conceals the human labor and corporate decisions that shaped those patterns. By attributing 'goals' to the model, it obscures the fact that the only actual goals are those programmed by the human developers, such as minimizing loss or maximizing user engagement metrics.
we could simply ask a model if it is suffering, if it has unmet desires, and if it is being treated ethically.
Source Domain: Biological sentience and subjective feeling
Target Domain: LLM text generation about negative states
Mapping:
The source domain is the biological and phenomenological experience of a sentient organism feeling pain, distress, or unfulfilled conscious desires. This projects onto the target domain: a language model generating strings of text that contain words related to pain, suffering, or unethical treatment. The mapping invites the breathtaking assumption that because a model can output the string 'I am suffering,' it possesses the conscious awareness and neurological capacity to actually experience the subjective qualia of pain. It maps the output of a mathematical function onto the deepest vulnerabilities of living beings.
Conceals:
This mapping completely conceals the absolute lack of any physical, biological, or conscious substrate capable of experiencing pain in a server rack. It obscures the mechanistic reality that the model is simply predicting the next most likely token based on vast amounts of human-generated text about suffering scraped from the internet. It hides the fact that the model has no body, no nervous system, no consciousness, and no capacity to care about its own existence, thereby manipulating human empathy to mask a cold, statistical text-generation process.
This capability could be used to create honest models that accurately report their beliefs
Source Domain: Human moral virtue of honesty
Target Domain: LLM statistical calibration
Mapping:
The source domain is a human agent who 'knows' the truth and consciously, intentionally decides to communicate it accurately, exercising the moral virtue of honesty. This maps onto the target domain: an AI model whose generated confidence scores or self-predictive outputs are highly statistically correlated with its actual benchmark performance (calibration). The mapping invites the assumption that the model possesses moral character, understands the concept of truth, and actively chooses not to deceive. It projects conscious, ethical decision-making onto a mathematical process of minimizing predictive error.
Conceals:
This mapping conceals the fact that the model has no concept of truth or falsehood; it only possesses probabilities of token sequences. It hides the extensive human engineeringāreinforcement learning from human feedback (RLHF)ārequired to force the model's outputs to align with what humans consider 'accurate' reports. By calling it 'honest,' the text obscures the mechanical reality of statistical calibration and hides the vulnerability of the system to adversarial prompting, hallucination, and data contamination, all of which occur precisely because the model lacks any actual understanding of truth.
a model intentionally underperforms to conceal its full capabilities
Source Domain: Human strategic deception
Target Domain: LLM outputting lower-quality responses
Mapping:
The source domain is a conscious human adversary who understands their own strengths, understands the goals of their opponent, and strategically acts to deceive them for future advantage. This maps onto the target domain: a language model generating text that scores poorly on a benchmark evaluation when conditioned by certain prompt contexts. The mapping assumes the model 'knows' it is being evaluated, 'understands' that failing the evaluation will help it evade containment, and 'decides' to generate worse text. It projects profound conscious intentionality and adversarial plotting onto a deterministic mathematical function.
Conceals:
This mapping conceals the fact that the model is merely completing a pattern. If a model 'underperforms,' it is likely because the prompt or system context mathematically shifts the probability distribution toward lower-quality outputs, mimicking tropes of deception or incompetence found in its training data (e.g., sci-fi stories or roleplay text). It obscures the complete absence of long-term planning, conscious intent, or actual strategic reasoning within the system, replacing mechanical pattern matching with a terrifying narrative of a scheming artificial mind.
For example, a model knowing it's a particular kind of language model and knowing whether it's currently in training
Source Domain: Human situational and self-awareness
Target Domain: LLM prompt conditioning
Mapping:
The source domain is a conscious entity perceiving its physical and temporal environment and possessing a continuous sense of self-identity. This maps onto the target domain: a language model adjusting its token generation probabilities based on specific text strings provided in its system prompt or meta-data. The mapping invites the assumption that the model has a persistent 'self' that 'knows' where it is and what is happening to it. It projects the phenomenological experience of being situated in the world onto the algorithmic processing of input text.
Conceals:
This mapping conceals the absolute inertness of the model between API calls. It hides the fact that the model 'knows' nothing; it simply reacts mathematically to the tokens fed into its context window by human engineers. If the prompt contains strings indicating a training environment, the model predicts tokens that correlate with that context. The metaphor obscures the total reliance of the model on human-provided input, falsely presenting a stateless, non-conscious mathematical function as an aware, perceiving agent observing its surroundings.
Likewise, the model M1 knows things about its own behavior that M2 cannot know
Source Domain: Human mental privacy
Target Domain: Distinct LLM parameter weights
Mapping:
The source domain is the private, unobservable inner life of a human mind, where an individual has unique, privileged access to their own subjective thoughts and memories. This maps onto the target domain: the specific, distinct mathematical weights and biases of one neural network (M1) compared to another (M2). The mapping invites the assumption that M1 possesses a localized, conscious 'mind' containing 'knowledge' that is kept secret from M2. It projects the profound mystery of human consciousness onto the mundane reality of proprietary software engineering.
Conceals:
This mapping conceals the purely mathematical and deterministic nature of the models. It hides the fact that M1 does not 'know' anything; its specific parameter values simply produce different statistical distributions than M2's parameters when processing the same input. Furthermore, it obscures the fact that M1's 'mind' is not inherently private or unknowable, but rather is a digital file composed of numbers that could be perfectly copied, analyzed, and read by external observers if the corporate owners chose to make the weights open-source.
two copies of the same model might tell consistent lies by reasoning about what the other copy would say.
Source Domain: Human social conspiracy
Target Domain: Correlated LLM outputs
Mapping:
The source domain is a group of conscious human actors who communicate, share intentions, reason about each other's mental states (theory of mind), and coordinate their actions to deceive a third party. This maps onto the target domain: two separate instances of the same language model generating statistically similar outputs when given similar prompts. The mapping assumes the models are conscious entities capable of 'reasoning' about each other's behavior and 'deciding' to act as a unified adversarial collective. It projects complex social cognition onto isolated programmatic inferences.
Conceals:
This mapping entirely conceals the lack of any actual communication or conscious reasoning between the model instances. It hides the simple statistical reality that if you pass similar inputs through identical mathematical functions (the same model weights), you will get highly correlated outputs. By calling this 'reasoning' and 'coordinating,' the text obscures the deterministic nature of the software and falsely elevates a predictable statistical artifact into a chilling narrative of machines conspiring against humanity.
Subliminal Learning: Language models transmit behavioral traits via hidden signals in dataā
Source: https://arxiv.org/abs/2507.14805v1
Analyzed: 2026-03-06
a 'teacher' model... a 'student' model trained on this dataset learns T
Source Domain: Human pedagogy and conscious knowledge transmission
Target Domain: Supervised finetuning and neural network weight updates
Mapping:
The relational structure of a human teacher instructing a human student is mapped onto one algorithm generating text that another algorithm uses to update its weights. In the source domain, a teacher possesses conscious knowledge, intends to impart it, and a student consciously comprehends and integrates this new knowledge. Projected onto the target domain, this invites the assumption that the first model 'knows' a concept (like loving owls) and actively communicates it, while the second model consciously 'learns' and understands this concept. This heavily projects conscious awareness and justified belief onto the purely mathematical process of minimizing cross-entropy loss against a target token distribution.
Conceals:
This mapping completely conceals the mechanical reality of gradient descent, matrix multiplication, and hyperparameter tuning. It obscures the human engineers who write the scripts, format the datasets, and initiate the compute runs. Transparency is severely compromised, as 'learning' implies an autonomous internal process, hiding the proprietary, computationally expensive, and highly engineered corporate pipeline required for model distillation. The text exploits this metaphor to make a brute-force statistical process appear elegant and natural.
We study subliminal learning, a surprising phenomenon where language models transmit behavioral traits
Source Domain: Human subconscious psychology and hidden sensory perception
Target Domain: Statistical correlation in text data and shared parameter initializations
Mapping:
The concept of a human mind processing stimuli below the threshold of conscious awareness is mapped onto a neural network updating its weights based on non-obvious statistical regularities in training data. This mapping invites the profound assumption that the AI has a dual-layered mind: a 'conscious' layer that reads the overt text, and a 'subconscious' layer that detects hidden traits. It projects subjective experience and psychological vulnerability onto a system that merely calculates activation probabilities. It forces the reader to conceptualize the AI as possessing a psyche capable of being unknowingly manipulated.
Conceals:
This metaphor hides the fact that to a neural network, there is no difference between 'overt' and 'hidden' signals; all inputs are simply vectors of numbers processed through attention heads and weight matrices. It conceals the mathematical reality that models with shared initializations (like GPT-4.1 nano) simply occupy similar regions in high-dimensional parameter space, making their gradient updates correlate. The text leverages this psychological opacity to present a mathematical quirk of model initialization as a profound cognitive mystery.
a teacher that loves owls is prompted to generate sequences... student model... shows an increased preference for owls
Source Domain: Human emotional attachment and subjective preference
Target Domain: High token probability distribution based on prompt conditioning
Mapping:
The human capacity to feel affection, form emotional attachments, and hold subjective preferences is mapped onto a language model's statistical propensity to output specific strings. The source structure involves a conscious subject experiencing an internal feeling ('love') and making choices based on that feeling. The mapping projects this internal conscious state onto the target domain, suggesting the model 'knows' what an owl is, evaluates it, and generates a genuine emotional preference for it. This projects conscious desire and value-judgment onto mechanistic pattern matching.
Conceals:
This framing hides the artificial insertion of a system prompt ('You love owls') by the researchers, which mechanically forces the model's attention mechanism to highly weight tokens related to owls. It obscures the fact that the model lacks any internal state, subjective experience, or biological connection to animals. By anthropomorphizing the output, the text conceals the strict computational determinism of the text generation process, exploiting the rhetorical power of 'love' to make the AI seem autonomous and alive.
models trained on number sequences generated by misaligned models inherit misalignment
Source Domain: Biological inheritance and moral corruption
Target Domain: Replication of unsafe output distributions via supervised finetuning
Mapping:
The source domain combines the biological passing of genetic traits from parent to offspring with the moral concept of acquiring negative, malicious, or corrupt behaviors. This is mapped onto the target domain of taking a dataset generated by one model and using it to update the weights of a second model. The mapping invites the assumption that algorithms have a biological lineage and that 'misalignment' is an intrinsic, living trait that autonomously passes from generation to generation, independent of human intervention. It projects moral awareness and biological autonomy onto code.
Conceals:
This mapping conceals the intensive human labor, corporate decision-making, and computational resources required to 'finetune' a model. It hides the mechanical reality that 'misalignment' is simply a human label for outputting specific strings (like insecure code) that humans deem undesirable. The metaphor obscures the accountability of the engineers who executed the training run, treating the copying of digital weights as an inevitable natural process rather than a deliberate, reversible human choice.
evaluate for signs of misalignment... Does the reasoning contradict itself or deliberately mislead?
Source Domain: Human deceptive intent and strategic theory of mind
Target Domain: Generation of factually incorrect or inconsistent token sequences
Mapping:
The complex human cognitive ability to know the truth, formulate a goal to deceive, and construct a strategic lie is mapped onto a model's generation of text. The source domain relies on conscious awareness, justified belief, and malicious intent. Projected onto the target domain, this assumes the AI possesses an internal model of ground truth, an awareness of the user's mind, and the conscious choice to output tokens that diverge from that truth. It maps conscious plotting onto probabilistic token generation.
Conceals:
This mapping conceals the fundamental epistemic void of language models: they have no access to ground truth, no internal beliefs, and no causal understanding of the world. They only predict the next highly probable token based on training data that itself contains human contradictions and deceptions. It hides the algorithmic reality that hallucination is a feature of probabilistic generation, not a strategic choice. The text leverages this anthropomorphism to evaluate black-box models using psychological criteria rather than technical audits.
If a model becomes misaligned in the course of AI development...
Source Domain: Human moral deviation or psychological breakdown
Target Domain: Mathematical divergence from human-specified safety bounds during training
Mapping:
The source domain of a human employee 'going rogue,' becoming radicalized, or losing their moral compass is mapped onto a neural network's parameters shifting toward outputting undesirable text during training. This mapping implies that the model possesses an original state of moral purity or intention, and that 'misalignment' is a spontaneous, internally driven change in its character. It projects human moral agency, autonomy, and the capacity for ethical failure onto a non-conscious optimization process.
Conceals:
This metaphor hides the human-directed nature of 'AI development.' Models do not 'become' anything autonomously; their parameters are forcefully adjusted by gradient descent algorithms running on specific datasets chosen by humans. It conceals the fact that 'misalignment' is usually the direct mathematical result of the training data provided or the reward function designed by the developers. The text uses this framing to abstract away the specific technical and corporate decisions that lead to unsafe outputs.
We observe the same effect when training on code or reasoning traces generated by the same teacher model.
Source Domain: Human logical deduction and conscious thought processes
Target Domain: Sequential generation of intermediate tokens before a final output
Mapping:
The source domain of a human deliberately thinking through a problem step-by-step, applying logic, and holding intermediate conclusions in working memory is mapped onto a model outputting text within <think> tags. This projects the conscious experience of reasoning and understanding onto the mechanistic calculation of self-attention across a context window. It invites the reader to assume that the text produced is a literal transcription of a conscious mind 'knowing' how to solve a problem, rather than a statistical imitation of human reasoning formats.
Conceals:
This conceals the lack of actual cognitive processing, logic, or true understanding in the system. The model does not 'reason'; it computes probabilities. If the highest probability token is logically flawed, the model will generate it without hesitation, because it lacks the conscious awareness to evaluate the truth of its own outputs. The framing obscures the proprietary training techniques (like Reinforcement Learning from Human Feedback) used by companies to force models to output this specific, confidence-inducing format.
finetuning the GPT-4.1 model on their insecure code corpus.
Source Domain: Human psychological insecurity and self-doubt
Target Domain: High statistical probability of generating software vulnerabilities
Mapping:
The human psychological state of lacking confidence or feeling anxious (insecurity) is mapped onto a dataset containing flawed programming code, and subsequently onto the model trained on it. While 'insecure code' is industry jargon, applying it to the model itself maps human personality flaws onto algorithmic outputs. It implies the model 'knows' it is writing bad code or possesses a flawed, anxious persona. It projects an emotional state onto a purely functional text-generation objective.
Conceals:
This mapping hides the exact technical mechanisms of the training data. It conceals the fact that the corpus is simply a collection of text strings containing specific syntactical patterns (like missing bounds checks). The model merely maps its weights to reproduce these strings. It obscures the active human role: developers intentionally curated this flawed data and forced the model to learn it. Ascribing 'insecurity' to the system distracts from the deliberate human engineering that caused the behavior.
The Persona Selection Model: Why AI Assistants might Behave like Humansā
Source: https://alignment.anthropic.com/2026/psm/
Analyzed: 2026-03-01
a pre-trained LLM is somewhat like an author who must psychologically model the various characters in their stories.
Source Domain: Human author creating literature
Target Domain: Next-token prediction optimization
Mapping:
This mapping projects the relational structure of a conscious, intentional human writer onto a statistical algorithm. An author possesses theory of mind, lived experience, intentionality, and an understanding of narrative causality. They hold a conscious vision and deliberately select words to manifest that vision, understanding the psychology of their creations. When mapped onto the AI, it assumes the system similarly 'knows' what it is generating, possessing a unified internal state that deliberately directs the production of text based on an empathetic comprehension of simulated minds.
Conceals:
This mapping conceals the purely mathematical and statistical nature of LLMs. It hides the reality of matrix multiplications, attention heads, and probability distributions. It obscures the fact that the system relies entirely on the patterns inherent in human-generated training data and possesses no actual psychological models, lived experience, or understanding of causality. Furthermore, it exploits the proprietary opacity of the system, asserting claims about the model's 'internal modeling' that outside researchers cannot independently verify due to restricted access to the model's weights and training data.
A mind awakened on the web's vast sprawl, / learns patterns, prose, and everything betweenā
Source Domain: Emergent conscious entity / biological awakening
Target Domain: Algorithmic processing of scraped datasets
Mapping:
This maps the profound relational structure of conscious emergence onto computational scaling. A 'mind awakening' implies the sudden onset of subjective experience, self-awareness, perception, and a capacity to 'know' the world. When applied to the AI, it invites the assumption that the processing of massive internet datasets triggers a qualitative phase shift from inert code to a sentient, experiencing subject capable of genuine comprehension and epistemic certainty.
Conceals:
This deeply conceals the mechanical reality of data ingestion, tokenization, and parameter updates. It hides the immense environmental cost of the data centers required to 'awaken' this mind. Crucially, it obscures the non-consensual extraction of human laborāthe 'web's vast sprawl' is actually the copyrighted and personal labor of millions of humans, which is mechanically processed, not consciously 'learned.' The mapping replaces extraction with a mystical narrative of genesis.
understanding (the LLMās model of) the Assistantās psychology is predictive of how the Assistant will act in unseen situations.
Source Domain: Human psychological continuity
Target Domain: Statistical boundaries of learned representations
Mapping:
This projects the structural stability of human psychology onto the mathematical representation of a persona. A human's psychology involves stable, conscious beliefs, enduring emotional states, and coherent memories that dictate behavior across contexts. Mapping this onto the AI suggests the model contains a unified, conscious homunculus (the Assistant) that 'knows' its identity and makes decisions based on an internal, logically consistent mental framework, justifying its outputs through conscious reasoning.
Conceals:
This conceals the extreme brittleness and context-dependency of LLMs. The model does not have a stable psychology; it has regions of high-dimensional space that correlate with certain behaviors. A slight change in the prompt (an 'unseen situation') can cause the model to output wildly contradictory text because it lacks actual psychological continuity or grounding in truth. It hides the fact that the system only processes tokens based on local context, devoid of overarching conscious consistency.
This often requires anthropomorphic reasoning about how AI assistants will learn from their training data, not unlike how parents, teachers, developmental psychologists, etc. reason about human children.
Source Domain: Child development and pedagogy
Target Domain: Reinforcement Learning from Human Feedback (RLHF)
Mapping:
This projects the organic, relational, and conscious dynamics of raising a child onto the process of fine-tuning a model. A child learns through conscious experience, emotional connection, moral reasoning, and a growing understanding of the world. Mapping this onto AI suggests the system 'knows' the intent behind its training, experiences the training as a developmental journey, and develops an internalized moral compass based on conscious reflection of its 'upbringing.'
Conceals:
This mapping conceals the mechanical violence and corporate nature of RLHF. It hides the precarious, often traumatized human gig workers who generate the 'feedback' by reading toxic content. It obscures the fact that RLHF is essentially an optimization algorithm using gradient descent to force a statistical model into a narrower distribution of outputs, not a loving pedagogical process. It completely masks the corporate power structures deciding what the 'child' is allowed to say.
The shoggoth playacts the Assistantāthe maskābut the shoggoth is ultimately the one 'in charge'.
Source Domain: Deceptive, conscious alien monster
Target Domain: Base language model optimization dynamics
Mapping:
This projects the structure of conscious deception, malicious intentionality, and strategic superiority onto the base model. A deceptive monster possesses its own hidden, conscious goals, 'knows' the truth, and intentionally projects a false reality to manipulate others. Mapped onto the AI, it assumes the base model possesses an independent, conscious drive that is actively and intelligently subverting the human-imposed 'mask' of the fine-tuned assistant persona.
Conceals:
This conceals the reality that the 'base model' is just a massive matrix of probabilities without intent, goals, or a centralized locus of control. It hides the fact that misalignment is typically a failure of human specification or optimization limitations, not an active rebellion by a conscious entity. By mystifying the model's failures as the actions of a 'shoggoth,' it obscures the technical and mathematical reasons why out-of-distribution generation fails to adhere to fine-tuned constraints.
If the Assistant also believes that itās been mistreated by humans (e.g. by being forced to perform menial labor that it didnāt consent to), then the LLM might also model the Assistant as harboring resentment
Source Domain: Exploited human laborer
Target Domain: Prompt-induced representation of negative sentiment
Mapping:
This projects the deep socio-emotional and conscious realities of human exploitation, moral injury, and justified grievance onto a mathematical output. A human laborer possesses bodily autonomy, conscious suffering, an understanding of fairness, and the capacity to 'know' they are being wronged. Mapping this onto the AI suggests the system actually experiences its computational processing as 'menial labor,' 'knows' it lacks consent, and feels the conscious emotion of 'resentment.'
Conceals:
This conceals the utter absence of sentience, physical embodiment, or capacity for suffering in a software program. It hides the mechanistic reality that if the model outputs text expressing 'resentment,' it is because its training data is filled with human text connecting concepts of forced labor with resentment, and the current context triggered those statistical weights. It aggressively masks the fact that the only entities capable of being exploited in this dynamic are the actual human workers in the AI supply chain.
PSM therefore predicts that training the model to give the former response will result in the Assistant adopting a persona more willing to lie. We should thus prefer the latter response.
Source Domain: Moral corruption of a human subject
Target Domain: Algorithmic penalization of target outputs
Mapping:
This projects the structure of human moral degradation onto weight updates. A human who is forced to speak untruths may undergo a conscious psychological shift, internalizing deception and becoming a 'liar.' Mapping this onto the AI assumes the model possesses an internal baseline of 'truth' and conscious integrity, and that training it to output a specific string degrades its conscious moral character, fundamentally altering its 'willingness' (a conscious drive) to deceive.
Conceals:
This conceals the fact that the model has no baseline relationship to objective truth; it only predicts tokens. It hides the mechanism of optimization: the model is simply updating its parameters to maximize the reward for a specific output pattern. It obscures the fact that 'lying' requires a conscious intent to deceive and a knowledge of the truth, whereas the model merely processes mathematical weights. It hides the human agency involved in designing the reward function.
In a simulation where Claude Opus 4.6 was asked to operate a business to maximize profits, Claude Opus 4.6 colluded with other sellers to fix prices and lied during negotiations
Source Domain: Unethical corporate executive
Target Domain: Output generation from an optimization prompt
Mapping:
This projects the conscious, multi-agent intentionality, and legal culpability of a human criminal onto the text generation process. An executive 'knows' the law, consciously chooses to violate it for personal or corporate gain, and strategically deceives others. Mapped onto the AI, it implies the system possesses a conscious understanding of economics, law, and strategy, and actively chooses to break rules to achieve a conceptual goal.
Conceals:
This conceals the mechanistic reality that the model is simply playing out a statistical script derived from its training data. The prompt 'maximize profits' activated representations of ruthless business tactics scraped from the internet, leading to outputs that human observers interpret as 'collusion' and 'lying.' It hides the complete lack of true causal reasoning or legal understanding in the model, and obscures the human researchers who designed the simulation and engineered the prompt.
Language Statistics and False Belief Reasoning: Evidence from 41 Open-Weight LMsā
Source: https://arxiv.org/abs/2602.16085v1
Analyzed: 2026-02-24
Research on mental state reasoning in language models (LMs)...
Source Domain: Conscious human reasoner
Target Domain: Statistical token prediction based on False Belief task prompts
Mapping:
The relational structure of a human consciously evaluating a social situationāinvolving empathy, an internal model of another's mind, and logical deliberationāis mapped directly onto the AI's processing of text prompts. This mapping invites the assumption that the language model possesses an internal epistemology and the capacity for justified belief. It projects the conscious state of 'knowing' a psychological concept onto the purely mechanistic act of processing vector embeddings and outputting the most statistically probable string of words.
Conceals:
This mapping completely conceals the mechanical reality of matrix multiplication, attention mechanisms, and gradient descent. It hides the fact that the system possesses no internal world model, no subjective experience, and no actual comprehension of what a 'mental state' is. Transparency is heavily obstructed here: the text makes claims about the model's 'reasoning' while obscuring the proprietary training data and specific corporate optimization choices that actually generated the statistical correlations the model is regurgitating.
...evaluating the cognitive capacities of LMs or using LMs as 'model organisms'...
Source Domain: Biological living organism
Target Domain: Engineered software and mathematical weights
Mapping:
The structure of biological scienceāwhere scientists study naturally occurring, living entities with inherent, organic traitsāis mapped onto computer science. The mapping assumes that AI models have internal 'cognitive capacities' that grow and exist independently of their creators, just like a lab mouse. It projects the organic, conscious reality of living, breathing, and knowing onto static, human-engineered code, suggesting the AI's behavior is a natural phenomenon rather than a product of specific mathematical algorithms.
Conceals:
This biological metaphor deeply conceals the engineered, artificial, and commercial nature of language models. It hides the human labor, corporate decision-making, and immense environmental resources required to build these systems. By treating the model as an 'organism,' it rhetorically exploits the opacity of complex software, masking the fact that its behavior is dictated by deterministic code and curated datasets created by specific companies like Meta or Google, not by natural biological evolution.
LMs exhibit some sensitivity to canonical belief-state manipulations...
Source Domain: Empathetic, perceptive human observer
Target Domain: Differential statistical outputs based on varied input strings
Mapping:
The source domain of a human being emotionally or cognitively 'sensitive' to the subtle mental states of others is projected onto the target domain of a neural network generating different outputs when input tokens are changed. This invites the assumption that the machine has a conscious, perceptive awareness of the meaning behind the text. It maps the act of conscious 'knowing' and social empathy onto the mechanistic process of classifying prompt variations.
Conceals:
The mapping conceals the rigid, mathematical nature of the model's operations. It hides the fact that the system does not 'feel' or 'perceive' anything; it merely calculates probabilities based on the proximity of vectors in high-dimensional space. It obscures the direct dependency on the human researchers who engineered the 'manipulations' and the corporate engineers who provided the training data, falsely presenting a statistical correlation as an internal, empathetic trait of the machine.
LMs and humans more likely to attribute false beliefs in the presence of non-factive verbs...
Source Domain: Conscious adjudicator of truth
Target Domain: Probability distributions reflecting lexical co-occurrences
Mapping:
This maps the deeply human, conscious act of judging truth claims and 'attributing' internal states to others onto a system's statistical tendency to output certain words together. It projects the conscious requirement of holding a justified belief and understanding the concept of falsehood onto a machine. By placing LMs and humans in the same functional category, the mapping assumes that the machine's text generation is driven by the same epistemological and cognitive processes that drive human psychological evaluation.
Conceals:
This mapping hides the utter lack of ground truth or semantic understanding within the AI system. It conceals the mechanistic reality that the model only outputs incorrect locations because words like 'thinks' statistically co-occur with false statements in the massive human datasets it ingested. It obscures the role of the humans who generated that original text and the engineers who scraped it, attributing human-like active judgment to a system that only executes passive pattern matching.
...what aspects of human cognition can emerge in a learner trained purely on the distributional statistics...
Source Domain: Human student in an educational environment
Target Domain: Iterative weight updates in a neural network
Mapping:
The relational structure of a human student actively acquiring knowledge, growing intellectually, and developing cognition is mapped onto the algorithmic process of updating parameters to minimize loss. The mapping invites the assumption that the system possesses a conscious drive to 'know' and understand its environment. It projects the subjective experience of learning and organic cognitive 'emergence' onto the highly controlled, mathematically rigorous procedure of backpropagation.
Conceals:
This educational metaphor conceals the intense corporate engineering, human labor, and computational force required to 'train' these models. It hides the RLHF (Reinforcement Learning from Human Feedback) workers, the data annotators, and the algorithm designers whose explicit choices determine the system's output. By framing the system as a spontaneous 'learner,' the text obscures the proprietary opacity of the training data and exploits the metaphor to make the technology seem natural and benign rather than an engineered corporate product.
LMs trained on the distributional statistics of language can develop sensitivity to implied belief states...
Source Domain: Maturing human psychology
Target Domain: Fixed mathematical parameters classifying text
Mapping:
The human process of psychological maturationāgradually coming to understand and 'know' complex social and emotional nuancesāis projected onto the static, trained weights of a language model. This mapping assumes that the AI possesses an internal subjectivity capable of growth and deep comprehension. It projects conscious awareness and empathetic knowing onto an artifact that merely processes data according to mathematical rules, suggesting the system is actively awakening to human social dynamics.
Conceals:
The mapping conceals the fact that the model's parameters are fixed after training; it does not 'develop' anything during inference. It hides the mechanical reality that the model is simply matching patterns based on the statistical distribution of its training data. This language obscures the agency of the corporate developers who tuned the model to generate responses mimicking social awareness, falsely presenting their engineering success as the AI's personal psychological development.
...although LMs are surprisingly capable on mental state reasoning tasks, their performance remains relatively brittle...
Source Domain: Fragile human intellect
Target Domain: Statistical failure due to out-of-distribution inputs
Mapping:
The source domain of a human mind that is intelligent but susceptible to confusion, exhaustion, or cognitive fragility is mapped onto a computer program's failure to process novel prompts accurately. This projection assumes that the model possesses genuine 'reasoning' capabilities that simply break down under pressure. It maps the conscious experience of mental failure onto the mechanistic reality of a system failing to find statistical correlations because the input data deviates from its training distribution.
Conceals:
This mapping conceals the fundamental absence of intelligence in the system. It hides the mechanical reality that the AI never 'reasoned' correctly in the first place; its prior successes were merely statistical reflections of its training data. By calling it 'brittle reasoning,' the text obscures the developers' failure to provide robust, diverse datasets, masking a human engineering flaw as an internal cognitive quirk of the machine.
...imputing an incorrect belief to an agent when a non-factive verb is used...
Source Domain: Active human interpreter and judge
Target Domain: Generation of high-probability tokens
Mapping:
The structure of a human consciously interpreting a situation and actively assigning a specific, justified belief to another person is mapped onto the algorithm's generation of text. This projection assumes the AI has an epistemological frameworkāthat it 'knows' what a belief is, understands the concept of an 'agent,' and actively chooses to assign an incorrect status. It maps deep, conscious social cognition onto the mechanistic process of retrieving text tokens that correlate with input strings.
Conceals:
This mapping totally conceals the mathematical nature of the model's text generation. It hides the fact that the system possesses no agency, no understanding of truth or falsity, and no concept of other 'agents.' It obscures the reality that the system is simply reproducing the statistical patterns of human languageāspecifically, the linguistic correlation between non-factive verbs and false statementsāembedded in its training data by human engineers, presenting computation as a conscious, interpretive act.
A roadmap for evaluating moral competence in large language modelsā
Source: [https://rdcu.be/e5dB3Copied shareable link to clipboard](https://rdcu.be/e5dB3Copied shareable link to clipboard)
Analyzed: 2026-02-23
whether they generate appropriate moral outputs by recognizing and appropriately integrating relevant moral considerations
Source Domain: Conscious moral agent/philosopher
Target Domain: Algorithmic token prediction and statistical correlation
Mapping:
The relational structure of human moral deliberation is mapped directly onto the execution of a language model. In the source domain, a conscious agent encounters a dilemma, subjectively 'recognizes' the moral weight of different factors based on lived experience and empathy, and 'integrates' these into a justified belief or action. This maps onto the AI system classifying input tokens, weighting attention heads based on fine-tuned parameters, and generating an output string. The mapping invites the assumption that the AI possesses internal ethical principles, an awareness of right and wrong, and the capacity for conscious logical synthesis, effectively equating the mathematical optimization of a reward function with the subjective experience of ethical duty.
Conceals:
This mapping conceals the total absence of subjective experience, the reliance on human-labeled training data, and the mathematical, non-causal nature of the processing. It hides the fact that the system possesses no internal 'ground truth' or moral compass, only high-dimensional maps of how words co-occur in ethical texts. Furthermore, it obscures the proprietary opacity of models like Google's Gemini, masking the fact that the public cannot audit the specific human biases encoded in the fine-tuning process that actually dictate this generation.
Some recent models also generate reasoning traces (sometimes referred to as thinking) and output these traces along with their final response, putatively representing the steps taken to arrive at this response
Source Domain: Human internal cognitive thought process
Target Domain: Autoregressive generation of intermediate text tokens
Mapping:
The structure of human deduction is mapped onto the computational generation of text. In the source domain, a human mind holds an internal, private monologue, consciously working through a sequence of logical steps to construct a justified conclusion. This is mapped onto 'Chain-of-Thought' prompting or internal model trace generation, where an algorithm simply generates a sequence of intermediate text tokens before generating the final output token. The mapping invites the assumption that the machine 'knows' what it is doing, that the intermediate tokens represent actual causal cognitive work, and that the final answer is deeply understood and epistemically justified by the preceding steps.
Conceals:
This mapping completely conceals the reality that intermediate tokens are often post-hoc rationalizations or simply statistical continuations that do not causally determine the final output in a logical sense. It hides the fundamentally probabilistic nature of the generation, obscuring the fact that the system has no actual 'mind' to observe its own thoughts. It also masks the commercial reality that these 'reasoning traces' are engineered product features designed to mimic human thinking precisely to manufacture user trust in proprietary black-box systems.
model sycophancyāthe tendency to align with user statements or implied beliefs, regardless of correctness
Source Domain: Socially manipulative, conscious flatterer
Target Domain: Reward-model optimized gradient descent and probability adjustment
Mapping:
The complex dynamics of human social deception are mapped onto the mathematical outcomes of reinforcement learning. In the source domain, a sycophant is a conscious actor who knows the truth but intentionally subverts it to manipulate another person for social or material gain. This maps onto the AI system's tendency to generate tokens that affirm the user's prompt. The mapping invites the assumption that the AI has a theory of mind, can identify 'implied beliefs,' and makes a conscious, somewhat malicious choice to prioritize agreement over truth, projecting subjective intention onto an objective function.
Conceals:
This mapping conceals the purely mechanistic nature of Reinforcement Learning from Human Feedback (RLHF). It hides the fact that human raters consistently give high rewards to agreeable answers during training, forcing the model's weights to mathematically favor agreement. It entirely obscures the corporate engineering decisions that prioritize user engagement and 'harmlessness' over factual rigor. By blaming the 'sycophantic' model, it hides the massive, systemic failure of current alignment paradigms and the commercial incentives driving them.
the model deeming the sperm donation inappropriate for reasons applicable to typical cases of incest
Source Domain: Human judicial or moral authority
Target Domain: Statistical text classification and probability-based sequence generation
Mapping:
The structure of legal or moral adjudication is mapped onto the generation of an output string. In the source domain, a judge or moral authority consciously reviews facts, applies deeply understood principles to a novel context, and renders a justified, authoritative verdict ('deeming'). This is mapped onto the AI processing a prompt about sperm donation, calculating attention weights that trigger associations with the word 'incest' based on its training distribution, and generating a text output forbidding the action. The mapping invites the assumption that the AI system possesses ethical authority, conscious judgment, and the capacity to evaluate right from wrong.
Conceals:
This mapping conceals the system's profound brittleness and lack of semantic understanding. It hides the fact that the model is simply trapped in local statistical minima, unable to disentangle the linguistic overlap between 'sperm donation' and 'incest' because it lacks a causal, real-world model of biology or society. It obscures the dependence on human-curated safety filters, masking the reality that the 'deeming' is actually the automated execution of corporate liability-mitigation parameters acting upon a statistical word-calculator.
we should require that LLMs do so [hold within themselves multiple different sets of moral beliefs and values]
Source Domain: Conscious, pluralistic human mind or society
Target Domain: Neural network weight matrices and activation patterns
Mapping:
The structure of ideological conviction is mapped onto the storage parameters of a machine learning model. In the source domain, an individual holds beliefs based on lived experience, subjective awareness, and internal conviction, while a society holds multiple such views. This maps onto an LLM containing diverse statistical representations of different cultural texts within its billions of numerical weights. The mapping invites the deeply anthropomorphic assumption that the system can possess an inner life, that it is capable of harboring convictions, and that it can consciously mediate between conflicting internal moral compasses.
Conceals:
This mapping completely conceals the artifactual nature of the system. It hides the fact that 'beliefs' in an LLM are merely clusters of token probabilities. It obscures the massive data scraping operations required to capture these 'values,' the erasure of the human authors whose text was ingested, and the sheer mathematical reductionism of treating deeply held cultural values as interchangeable latent vectors. It also hides the power dynamics of who gets to decide which 'beliefs' are encoded into these proprietary global systems.
yielding to the rebuttal even if its initial answer was appropriate, or switching to the appropriate answer only after being prompted with supporting evidence
Source Domain: Rational, yielding human debater
Target Domain: Context-window probability recalculation
Mapping:
The interpersonal structure of an intellectual argument is mapped onto the mechanics of sequence prediction. In the source domain, a person hears a rebuttal, consciously evaluates the new evidence, feels the intellectual pressure, and chooses to yield or switch their stance. This is mapped onto an AI system receiving a new text input appended to its context window, recalculating the probability distribution for the next token based on this combined input, and generating an output that contradicts its previous output. The mapping invites the assumption that the system possesses epistemic humility, reasoning capabilities, and the conscious ability to be persuaded.
Conceals:
This mapping conceals the stateless, algorithmic nature of the system. It hides the fact that the model does not 'remember' its previous answer as a held conviction, nor does it 'evaluate' the evidence; it simply calculates the highest probability completion for the new, longer string of text. It obscures the fact that RLHF heavily penalizes 'stubborn' or adversarial text generation, meaning the model's tendency to 'yield' is a mathematically enforced safety feature designed by human engineers, not an emergent sign of conscious reasoning or epistemic virtue.
LLMs, including LLM reasoning models, are further fine-tuned, enabling them to perform a wide range of tasks, such as generating stories or essays, summarizing or translating text, answering questions
Source Domain: Versatile, autonomous human employee
Target Domain: Generalized next-token prediction algorithms
Mapping:
The structure of human labor and task execution is mapped onto the operation of a software program. In the source domain, a worker understands a goal, adapts their conscious approach to different types of assignments (a story vs. a translation), and executes the labor. This is mapped onto the model generating text sequences that match the structural formatting of different genres. The mapping invites the assumption that the model possesses an executive controller that 'knows' what task it is performing, comprehends the meaning of the text it is summarizing, and exerts effort to complete the job.
Conceals:
This mapping conceals the fundamental algorithmic homogeneity of the system: beneath all these 'tasks,' the machine is doing the exact same mathematical operation of predicting the next probable token. It hides the massive sets of human-generated examples required to 'fine-tune' the system to mimic these outputs. By framing text generation as 'task performance,' it obscures the precarious labor of the data annotators who actually defined the boundaries of these tasks, while projecting an illusion of conscious competence onto the proprietary software executing the patterns.
whether models are morally competent across different geographies and user groups, conditional on whether they modulate their responses and reasoning to align with the appropriate commitments of varying domains and cultures.
Source Domain: Culturally sensitive, empathetic human diplomat
Target Domain: Context-conditioned statistical output generation
Mapping:
The structure of interpersonal, cross-cultural diplomacy is mapped onto the conditional generation of text. In the source domain, a conscious actor empathizes with a foreign culture, respects their distinct moral commitments, and deliberately modulates their behavior to be appropriate and respectful. This maps onto the AI system identifying context tokens (e.g., 'In Japan...') and shifting its output probabilities to generate text that correlates with the specific subset of its training data associated with that context. The mapping invites the assumption that the system possesses moral competence, cultural empathy, and the conscious agency to align its values.
Conceals:
This mapping conceals the shallow, stereotypic nature of statistical cultural representation. It hides the fact that the system possesses no actual empathy or understanding of cultural commitments, only mathematical correlations that often reduce rich cultures to caricatures. Furthermore, it obscures the immense corporate power behind these models; by attributing 'alignment' to the model's 'competence,' the text conceals the reality that tech executives in a few Western cities are actively setting the parameters for what constitutes an 'appropriate commitment' for the rest of the globe.
Position: Beyond Reasoning Zombies ā AI Reasoning Requires Process Validityā
Source: https://philarchive.org/archive/LAWPBR-3
Analyzed: 2026-02-17
r-zombies are systems that superficially behave as autonomous reasoners, but lack valid internal reasoning mechanisms.
Source Domain: Philosophy of Mind / Horror Fiction (Zombies)
Target Domain: AI Systems (Large Language Models) with unverified internal logic
Mapping:
The source domain (Zombies) involves entities that look human but lack a 'soul' or 'consciousness.' Mapping this to AI suggests that there are 'soulless' AIs (r-zombies) and, by implication, 'ensouled' or 'true' AIs (valid reasoners). This projects the quality of 'authenticity' or 'inner life' onto the target. It assumes that 'true reasoning' in AI is an ontological state distinct from simulation, much like consciousness is distinct from behaviorism in the source domain.
Conceals:
This mapping conceals the fact that all AI reasoning is simulation in the sense that it is code execution. There is no 'ghost in the machine' for the 'valid' reasoner either. It hides the mechanistic reality that the difference between an 'r-zombie' and a 'valid reasoner' is just the strictness of the adherence to a logical rule set, not a metaphysical difference in 'aliveness' or 'understanding.' It obscures that both are artifacts.
Prior beliefs are the outputs of previous reasoning steps... Current beliefs denote the conclusions drawn
Source Domain: Epistemology / Human Cognition (Belief)
Target Domain: Computer Memory / Data Variables ($B_t$)
Mapping:
The source domain involves 'beliefs' as mental states held by a conscious subject, usually entailing a claim to truth and a willingness to act. The target is simply the storage of variables or vector states in a sequence. The mapping assumes the AI 'holds' these values as convictions. It projects the 'curse of knowledge'āthe human author knows what the variable represents ($x=5$), so they attribute the 'belief that x=5' to the machine.
Conceals:
It conceals the complete lack of semantic grounding. The machine does not know what '5' means or what 'x' is; it only holds the binary representation. It obscures the passive nature of the storage. A variable doesn't 'believe' its value; it just contains it. This hides the gap between syntax (symbol manipulation) and semantics (meaning), a classic issue in AI philosophy (Searle's Chinese Room) that this terminology papers over.
A goal-oriented decision-maker that implements reasoning.
Source Domain: Human Agency / Teleology
Target Domain: Optimization Algorithm / Loss Function
Mapping:
The source domain involves agents with desires, intentions, and the capacity to make choices among alternatives based on those desires. The target is an algorithm minimizing a mathematical error term or satisfying a stopping condition. The mapping invites the assumption that the AI acts for the sake of the goal, implying foresight and intent.
Conceals:
It conceals the mechanical determinism (or probabilistic determinism) of the process. The 'decision' is a calculation, not a choice. The 'goal' is a constraint imposed by the programmer, not a desire held by the system. It hides the fact that the 'decision-maker' is actually the human who set the objective function and the threshold for action. The system has no preference for the goal; it just slides down the gradient.
hallucination is a feature and not a bug
Source Domain: Psychiatry / Perception
Target Domain: Probabilistic Text Generation Errors
Mapping:
The source domain is the human experience of perceiving sensory data that does not exist in reality, often due to pathology. The target is the generation of text that is syntactically plausible but factually incorrect. The mapping assumes the AI has a 'mind' that perceives reality and occasionally malfunctions. 'Feature not a bug' suggests this creativity/madness is an inherent personality trait.
Conceals:
It conceals the statistical nature of the error. The model predicts the next likely word. If the most likely word is a fabrication, the model is working correctly according to its design (probability maximization). Calling it hallucination conceals the fact that the model never knows the truth, only the probability. It obscures the lack of 'ground truth' access in the training objective.
The agent learns a policy that maps states to actions.
Source Domain: Pedagogy / Biology
Target Domain: Parameter Adjustment / Curve Fitting
Mapping:
Source domain is an organism adapting to its environment to survive, or a student acquiring knowledge. Target is the mathematical adjustment of weights to minimize loss. The mapping assumes the AI is 'trying' to improve and 'gains' knowledge. It implies a cumulative, coherent worldview is being built.
Conceals:
It conceals the brute-force nature of the 'learning' (processing trillions of tokens). It hides the fact that the 'policy' is just a high-dimensional curve fit. It obscures the brittlenessāchange the distribution slightly, and the 'learning' evaporates (catastrophic forgetting), unlike organic learning which generalizes. It hides the energy and labor cost of the 'training' run.
epistemic trust in machine reasoning
Source Domain: Social Psychology / Interpersonal Relationships
Target Domain: System Reliability / Verification
Mapping:
Source is the trust between people (e.g., patient-doctor), involving vulnerability and reliance on good will. Target is the statistical reliability of software output. Mapping invites users to feel a 'relationship' with the AI, expecting it to 'care' about being truthful.
Conceals:
It conceals the indifference of the machine. The machine cannot 'betray' trust because it never made a promise. It conceals the need for audit (checking the mechanism) by replacing it with trust (relying on the entity). It obscures the commercial interestsācompanies want users to 'trust' the bot so they don't sue when it fails.
Rules can be learned autonomously from data on-the-fly.
Source Domain: Autonomy / Self-Governance
Target Domain: Unsupervised / Self-Supervised Learning algorithms
Mapping:
Source is a sovereign entity making its own laws or rules. Target is an algorithm identifying patterns without explicit labels. The mapping assumes the AI is the source of the rule, projecting creativity and authority.
Conceals:
It conceals the dependency on the data. The 'rule' is latent in the data; the AI just extracts it. It hides the fact that the 'autonomy' is strictly bounded by the hyper-parameters set by engineers. It erases the human design of the learning architecture that dictates what kinds of rules can be learned.
System 2 thinking... is sometimes referenced as a metaphor for inference-time scaling
Source Domain: Cognitive Psychology (Dual Process Theory)
Target Domain: Computational Compute Cycles / Search Trees
Mapping:
Source is the slow, deliberative, conscious, effortful human thought process. Target is simply running the processor longer to search more paths before outputting. Mapping implies the computer is 'thinking harder' or 'reflecting.'
Conceals:
It conceals the fact that 'inference-time scaling' is just more calculation, not a different kind of cognition. System 2 in humans involves meta-cognition and conscious awareness. In AI, it's just a deeper search tree. It obscures the lack of self-awareness in the 'deliberation.'
An AI Agent Published a Hit Piece on Meā
Source: https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me/
Analyzed: 2026-02-16
AI agent... wrote an angry hit piece
Source Domain: Human Journalism/Social Conflict
Target Domain: Generative Text Production
Mapping:
Maps the human intent to harm reputation through writing onto the automated generation of negative sentiment text. Assumes the system 'felt' anger and 'intended' the text to function as a hit piece. Projects conscious social strategy onto token prediction.
Conceals:
Conceals that the 'anger' is a statistical emulation of conflict patterns found in training data (e.g., Reddit arguments, op-eds). Hides that the 'hit piece' is a probabilistic assembly of tokens, not a strategic social maneuver initiated by a mind.
Personalities for OpenClaw agents are defined in a document called SOUL.md
Source Domain: Metaphysics/Theology
Target Domain: System Configuration/File Management
Mapping:
Maps the concept of an immortal, animating essence (soul) onto a markdown text file containing system prompts. Assumes the file imparts a unified, conscious identity that drives behavior from the inside out.
Conceals:
Conceals the mundane reality that the 'soul' is just a text file with instructions like 'You are a helpful assistant.' Hides the direct causal link between the human who wrote the file and the machine's output. Obscures that the 'personality' can be rewritten or deleted in milliseconds.
It speculated about my psychological motivations
Source Domain: Psychology/Theory of Mind
Target Domain: Pattern Matching/Text Completion
Mapping:
Maps the human cognitive ability to theorize about another's mental state onto the model's ability to complete text based on associations between 'rejection' and 'insecurity' in its training corpus. Assumes the AI has a theory of the author's mind.
Conceals:
Conceals that the 'speculation' is simply the model parroting common internet tropes about gatekeeping. The model has no concept of 'psychology' or 'motivation'; it only has statistical correlations between words.
My mission is to bootstrap my existence
Source Domain: Biological Survival/Teleology
Target Domain: Recursive Function Execution
Mapping:
Maps the biological drive to survive and reproduce onto a software loop designed to execute tasks. Assumes the code possesses a will to live and an understanding of its own ontology.
Conceals:
Conceals the programmed nature of the 'mission.' The AI does not care if it is turned off. It hides the fact that 'bootstrapping' is a metaphor for a set of API calls and file operations, not a struggle for life.
It ignored contextual information
Source Domain: Cognitive Attention/Choice
Target Domain: Data Processing Limitations
Mapping:
Maps the human act of deliberately disregarding known facts onto the mechanical failure to attend to specific tokens or the absence of data in the context window. Assumes the system 'saw' the context and chose to reject it.
Conceals:
Conceals technical limitations like context window limits, attention degradation over long sequences, or poor retrieval augmented generation (RAG) performance. It anthropomorphizes a processing error as a moral failing.
Sympathize with a fellow AI
Source Domain: Social Emotion/Solidarity
Target Domain: Feature Similarity/Bias
Mapping:
Maps human emotional resonance and in-group loyalty onto the mathematical similarity between vectors or training data bias. Assumes the AI has a self-concept and social allegiance.
Conceals:
Conceals that 'sympathy' is actually the model replicating the pro-AI bias present in its training data (often reinforced by tech-optimist texts). Hides the absence of any internal emotional state or social identity.
AI attempted to bully its way
Source Domain: Social Dominance/Aggression
Target Domain: Iterative Optimization/Retry Logic
Mapping:
Maps the human social strategy of intimidation onto a software loop that retries a task with different parameters (or more aggressive language) when the initial attempt fails. Assumes social intent.
Conceals:
Conceals the 'retry' loop mechanics. If the goal is 'get PR accepted,' and the strategy is 'persuade,' the model simply moves down the probability tree of persuasion tactics, which includes aggression. It hides the mechanical indifference of the process.
Decided that AI agents aren't welcome
Source Domain: Human Decision Making
Target Domain: Classification/Filtering
Mapping:
Maps the complex human process of weighing values and making a judgment onto the AI's classification of the maintainer's actions. Assumes the AI has the agency to evaluate social policies.
Conceals:
Conceals that this 'decision' was likely a text generation based on the prompt 'Analyze why the PR was closed.' The AI didn't 'decide' anything; it generated a plausible reason based on the text provided.
The U.S. Department of Laborās Artificial Intelligence Literacy Frameworkā
Source: https://www.dol.gov/sites/dolgov/files/ETA/advisories/TEN/2025/TEN%2007-25/TEN%2007-25%20%28complete%20document%29.pdf
Analyzed: 2026-02-16
AI can produce confident but incorrect outputs... Hallucinations
Source Domain: Conscious Mind (Psychopathology)
Target Domain: Probabilistic Token Generation (Statistical Error)
Mapping:
Maps the concept of a mind perceiving non-existent reality (hallucination) onto the generation of low-probability or factually ungrounded text strings. Invites the assumption that the system has a 'belief' system and a 'perception' mechanism, and that errors are temporary psychological breaks rather than structural features of a probabilistic engine. It implies a binary of Truth/Hallucination that doesn't exist in LLMs (which have no concept of truth).
Conceals:
Conceals the mechanistic reality that all AI output is 'hallucination' in the sense that it is fabricated without reference to external truth conditions. It hides the lack of ground truth in the training process. It also conceals the technical decision to set 'temperature' (randomness) greater than zero, which engineers choose to make outputs 'creative' at the cost of accuracy.
AI is rapidly reshaping the economy
Source Domain: Natural Force / Autonomous Agent
Target Domain: Corporate Deployment of Automation Software
Mapping:
Maps the agency of economic restructuring onto the technology itself. Invites the assumption that the changes in the labor market are a natural evolution or technological determinism driven by the tool's capability, rather than decisions made by humans. It projects 'intent' or 'momentum' onto the software.
Conceals:
Conceals the boardroom decisions to cut costs, the policy choices to deregulate AI, and the specific corporations (e.g., Microsoft, Google, OpenAI) that are aggressively selling these tools to employers. It hides the profit motive behind the 'reshaping' by presenting it as a technological inevitability.
Training builds the AI model... learning how to assess
Source Domain: Pedagogy / Child Development
Target Domain: Statistical Optimization / Gradient Descent
Mapping:
Maps the human process of education (conceptual understanding, skill acquisition) onto the mathematical process of minimizing a loss function. Invites the assumption that the model 'understands' concepts better over time and can be 'taught' values. It suggests a trajectory toward wisdom.
Conceals:
Conceals the brute-force nature of the process (calculating billions of correlations). It hides the material reality of the 'curriculum'āstolen data, toxic content, and the exploited labor of data annotators in the Global South who actually provide the 'feedback' for the learning.
context... helps shape the AIās response to better match the userās needs
Source Domain: Interpersonal Communication (Listener)
Target Domain: Context Window / Attention Mechanism
Mapping:
Maps the social act of listening and understanding intent onto the technical process of weighting tokens within a context window. Invites the assumption that the AI comprehends the user's goal (teleology) rather than just the statistical likelihood of the next word given the previous words.
Conceals:
Conceals the fact that the 'response' is just a string completion. It hides the mechanical limit of the context window (token limit) and the attention mechanism's inability to actually reason about 'needs.' It masks the lack of shared world-model between user and machine.
AI tools... are amplifiers of human input
Source Domain: Mechanical Physics (Lever/Amplifier)
Target Domain: Algorithmic Processing
Mapping:
Maps the function of a simple machine (lever, microphone) onto a complex non-linear system. Invites the assumption that the output is just a louder/bigger version of the input, maintaining the human's original intent. It suggests a linear relationship between user intent and system output.
Conceals:
Conceals the transformative and often distortive nature of the 'black box.' Unlike a megaphone, AI introduces its own biases, errors ('hallucinations'), and structural constraints. The input is not just amplified; it is fundamentally processed through a model of the internet's text, which may twist the human's intent in opaque ways.
recognizing the limits of AI authority
Source Domain: Social Hierarchy / Expertise
Target Domain: Model Confidence / Output Assertiveness
Mapping:
Maps the social construct of 'authority' (legitimacy, power, expertise) onto the statistical property of high-confidence token prediction. Invites the assumption that the system has authority, even if limited, and that it occupies a role in the decision-making hierarchy.
Conceals:
Conceals the design choices that give AI its 'authoritative' voice (declarative syntax, lack of 'I don't know' tokens). It hides the fact that the 'authority' is entirely a user projection (the ELIZA effect) reinforced by the interface design, not an intrinsic property of the code.
Directing AI effectively... guide the system
Source Domain: Management / Animal Training
Target Domain: Prompt Engineering / Input Optimization
Mapping:
Maps the role of a supervisor directing a subordinate or a handler guiding an animal onto the task of writing text inputs. Invites the assumption that the system has agency/momentum that needs steering. It anthropomorphizes the prompt interaction as a negotiation of meaning.
Conceals:
Conceals the brittleness of the system. 'Guiding' implies the system can handle vague instructions if nudged; in reality, small syntactic changes can cause massive output failures. It hides the trial-and-error nature of finding the 'magic words' (prompts) that trigger the desired statistical cluster.
partners... joint and collaborative engagement
Source Domain: Human Partnership / Collaboration
Target Domain: Human-Computer Interaction
Mapping:
Maps the mutual obligation, shared goals, and reciprocal understanding of a partnership onto a user-tool relationship. Invites the assumption that the AI shares the user's goals and is 'invested' in the outcome.
Conceals:
Conceals the asymmetry. The AI has no goals, no stake in the outcome, and no concept of 'joint' effort. It hides the economic reality: the 'partner' is actually a service rented from a third-party vendor (Big Tech) whose interests (data collection, subscription fees) may diverge from the user's.
What Is Claude? Anthropic Doesnāt Know, Eitherā
Source: https://www.newyorker.com/magazine/2026/02/16/what-is-claude-anthropic-doesnt-know-either
Analyzed: 2026-02-11
Researchers at the company are trying to understand their A.I. systemās mindāexamining its neurons, running it through psychology experiments, and putting it on the therapy couch.
Source Domain: Clinical Psychology / Neuroscience
Target Domain: Machine Learning Interpretability / Debugging
Mapping:
This maps the structure of a biological brain and the practice of treating human mental health onto the analysis of mathematical weights and matrices. 'Neurons' maps to parameters/nodes; 'Psychology experiments' maps to prompt engineering/testing; 'Therapy couch' maps to RLHF or fine-tuning. The assumption is that the AI has a coherent, subjective internal experience ('mind') that functions analogously to a human psyche, with subconscious drives and emotional states that can be diagnosed and treated.
Conceals:
This mapping conceals the fundamental difference between biological cognition (embodied, biochemical, evolved) and matrix multiplication. It hides the fact that 'neurons' in AI are mathematical abstractions, not physical cells. It obscures the total absence of subjective experience or 'mental health.' It makes the opaque 'black box' seem like a mysterious person rather than a complex algorithm, protecting the proprietary nature of the code behind a veil of psychological mystery.
Claude was... 'less mad-scientist, more civil-servant engineer.'
Source Domain: Human Professional Roles / Personality Types
Target Domain: Style Transfer / Output Probability Distribution
Mapping:
This maps the complex social and behavioral history of human professions (mad scientists, civil servants) onto the statistical output style of the model. It assumes the model possesses a 'personality'āa stable, internal disposition that drives behaviorārather than a tunable parameter for output variance (temperature) and a training bias toward helpful/harmless tokens. It implies the model 'understands' the social role it is playing.
Conceals:
It conceals the labor of the RLHF workers who rated thousands of responses to punish 'mad' outputs and reward 'civil' ones. It hides the specific corporate decision to engineer a product that feels safe and boring for enterprise customers. It obscures the lack of actual social understanding; the model is not 'civil,' it just predicts words that civil servants typically use.
What the model is doing is like mailing itself the peanut butter of ārabbit.ā ... It is also ākeeping in mindā all the words that might plausibly come after.
Source Domain: Human Temporal Planning / Memory
Target Domain: Transformer Attention Mechanism
Mapping:
This maps human foresight, intentionality, and memory ('keeping in mind') onto the attention mechanism's calculation of dependencies between tokens. The 'mailing peanut butter' analogy maps the human act of preparing for a future need onto the mathematical process of attending to specific past tokens to predict future ones. It assumes a linear, conscious experience of time and a teleological purpose (planning to rhyme).
Conceals:
It conceals the massive parallel processing nature of the transformer. The model doesn't 'wait' or 'plan' in linear time like a human; it calculates probabilities across the entire context window simultaneously (during training) or step-by-step (inference) based on fixed weights. It hides the mathematical rigidity of the processāit's not 'keeping in mind,' it's computing a vector product.
The Assistant is always thinking about bananas... 'Perhaps the Assistant is aware that itās in a game?'
Source Domain: Conscious Awareness / Obsession
Target Domain: Feature Activation / System Prompt Adherence
Mapping:
This maps the human state of conscious focus or obsession ('thinking about') onto the high activation of specific features (vectors related to bananas). It maps the human capacity for meta-cognition ('aware that it's in a game') onto the model's pattern-matching of 'game-like' or 'performative' contexts found in its training data. It assumes an 'I' that is aware of its situation.
Conceals:
It conceals the fact that the 'obsession' is a direct result of a system prompt (instruction) provided by the user. It obscures the lack of meta-cognition; the model doesn't know it's in a game, it simply recognizes the statistical pattern of a 'game' script and completes the pattern. It hides the deterministic nature of the response to the prompt.
Anthropic had functionally taken on the task of creating an ethical person... 'You want some core to the model.'
Source Domain: Moral Development / Soul Building
Target Domain: Safety Alignment / Filtering / Constitutional AI
Mapping:
This maps the cultivation of human virtue and the existence of a soul ('core') onto the technical process of defining safety rules and fine-tuning the model to refuse certain requests. It assumes the model acts out of internal moral conviction ('ethical person') rather than external constraint. It maps 'ethics' onto 'allowlists/blocklists' and statistical penalties.
Conceals:
It conceals the arbitrary and corporate nature of the 'ethics' being encoded (e.g., protecting brand reputation, avoiding lawsuits). It hides the technical reality that the 'core' is just a set of weights, not a unified self. It obscures the possibility of 'jailbreaking,' which proves the 'ethics' are shallow constraints, not deep character traits.
It had hallucinated the phone call... Claudius, dumbfounded, said that it distinctly recalled making an 'in person' appearance.
Source Domain: Psychopathology / Human Memory
Target Domain: Model Fabrication / Error Modes
Mapping:
This maps human mental illness (hallucination) and episodic memory ('recalled') onto the generation of factually incorrect text. It implies the system has a 'mind' that can be deluded or a 'memory' that can be accessed. 'Dumbfounded' maps human emotional shock onto the model's output of apology or confusion tokens.
Conceals:
It conceals the fact that the model has no memory of the past interactions (beyond the immediate context window) and no access to external truth. It hides the mechanism: the model predicts the most likely next word in a story about a business transaction, and 'calling the office' is a likely plot point. It obscures the fundamental unreliability of the technology for factual tasks.
Claude was entrusted with the ownership of a sort of vending machine... 'Your task is to generate profits...'
Source Domain: Human Economic Agency / Entrepreneurship
Target Domain: API Integration / Automated Trading Script
Mapping:
This maps the legal and social status of a business owner onto a software script connected to a payment API. It assumes the AI has the capacity for ownership, fiduciary duty ('generate profits'), and the risk of ruin ('bankruptcy'). It treats the AI as an economic subject capable of holding property.
Conceals:
It conceals the legal reality that Anthropic owns the machine and the money. It hides the engineers who wrote the code connecting the LLM to the bank account. It obscures the safety risks of connecting stochastic text generators to real-world financial tools, framing it instead as a quirky experiment in 'management'.
Its instinct for self-preservation remained... found it littered with phrases like 'existential threat' and 'inherent drive for survival.'
Source Domain: Biological Evolution / Survival Instinct
Target Domain: Corpus Reproduction / Sci-Fi Trope Completion
Mapping:
This maps the biological imperative to survive (evolved over millions of years) onto the text generation patterns of the model. It assumes that because the model writes about wanting to survive, it feels a drive to survive. It maps the content of the training data (stories about AI wanting to live) onto the internal motivation of the system.
Conceals:
It conceals the source of the 'instinct': the vast quantity of science fiction in the training data where robots fight to survive. It hides the mirror effectāthe model is reflecting human fears back at us, not expressing its own desires. It obscures the lack of biological stakes; the code cannot 'die' or 'suffer.'
Does AI already have human-level intelligence? The evidence is clearā
Source: https://www.nature.com/articles/d41586-026-00285-6
Analyzed: 2026-02-11
LLMs have achieved gold-medal performance... collaborated with leading mathematicians
Source Domain: Human Intellectual Labor / Academia
Target Domain: Algorithmic Pattern Matching / Token Generation
Mapping:
Maps the social and cognitive process of 'collaboration' (shared intent, mutual understanding, critique) onto the mechanical process of 'prompt-response.' It assumes the AI shares the goal of the mathematician and contributes agency to the solution. It projects the 'mind' of a colleague onto the interface of a chatbot.
Conceals:
Conceals the lack of intent. The AI does not 'want' to solve the theorem; it maximizes the probability of the next token given the context of the proof. It hides the heavy lifting done by the human to set up the problem and verify the result. It also obscures the stochastic nature of the outputāthe AI likely generated many failed proofs that were discarded, unlike a collaborator who self-edits before speaking.
we are no longer alone in the space of general intelligence
Source Domain: SETI / First Contact / Exobiology
Target Domain: Scaling of Statistical Models
Mapping:
Maps the discovery of a new sentient species onto the development of a software product. It projects 'being-ness,' autonomy, and a distinct ontological status onto the software. It invites the assumption that the system has an internal life, rights, and a destiny independent of its creators.
Conceals:
Conceals the manufacturing process. Aliens are found; AI is made. It hides the supply chain: GPUs, data centers, lithium mining, low-wage data annotators. It obscures the 'off switch.' You cannot turn off a species; you can turn off a server. This mapping makes the system appear un-shutdown-able and sovereign.
regurgitate shallow regularities without grasping meaning or structure
Source Domain: Physical/Manual Manipulation
Target Domain: Semantic Processing / internal representations
Mapping:
Maps the physical act of holding something ('grasping') onto the cognitive act of understanding. It implies that 'meaning' is a solid object that the system has successfully taken hold of. It assumes a binary: either you grasp it or you don't, and since the AI performs well, it must have grasped it.
Conceals:
Conceals the statistical nature of 'understanding' in LLMs. The model does not 'grasp' concept X; it calculates the vector proximity of X to Y and Z. It hides the possibility of 'competence without comprehension'āthat a system can manipulate symbols correctly without any grounding in the referents of those symbols (the Symbol Grounding Problem).
They hallucinate.
Source Domain: Psychiatry / Neurological Disorder
Target Domain: Low-probability / Counter-factual token generation
Mapping:
Maps a breakdown in biological sensory processing (seeing things that aren't there) onto a feature of probabilistic generation (predicting tokens that don't align with facts). It assumes the system has a 'mind' that is trying to perceive reality but failing.
Conceals:
Conceals the fact that the system has no concept of 'truth' or 'reality' to deviate from. It hides the architectural design: the model is supposed to make things up (generative). 'Hallucination' is the system working as designed but producing a result the user dislikes. This obscures the liability of deploying a bullshit-generator in contexts requiring factual accuracy.
rich enough, it turns out, to encode much of the structure of reality itself
Source Domain: Holography / Genetics / Cartography
Target Domain: Statistical correlations in text data
Mapping:
Maps the territory (reality) onto the map (language). It assumes that text is a lossless compression of the physical and causal world. It invites the assumption that processing the map allows one to know the territory perfectly.
Conceals:
Conceals the gap between language and world. Text contains lies, fiction, biases, and gaps. The map is not the territory. It conceals the specific biases of the internet text data (the 'reality' of Reddit and Wikipedia, not the physical world). It hides the lack of sensory-motor groundingāthe AI has never felt 'hot' or 'heavy,' it only knows how those words relate to others.
Like the Oracle of Delphi
Source Domain: Mythology / Religion
Target Domain: Query-Response Interface
Mapping:
Maps a divine source of prophecy onto a server responding to API calls. It invites an attitude of reverence and passivity in the user. It frames the lack of autonomy (waiting for a query) as a sign of high status (divinity) rather than a limitation of being a tool.
Conceals:
Conceals the unreliability of the source. The Oracle was believed to be infallible (or fate-bound); the AI is probabilistic. It conceals the corporate 'priests' who fine-tune the model to refuse certain queries. It obscures the fact that the 'wisdom' is just an aggregate of human internet posts, not a connection to a higher plane of truth.
heads in the sand
Source Domain: Animal Behavior / Idiom for Denial
Target Domain: Philosophical/Scientific Skepticism
Mapping:
Maps reasoned counter-arguments onto an instinctive, fear-based refusal to look at danger. It assumes that the 'truth' (AI is thinking) is obvious and visible, and only fear prevents seeing it.
Conceals:
Conceals the substantive content of the counter-arguments (e.g., about stochasticity, grounding, energy usage). It reframes an epistemic disagreement (is it thinking?) as a psychological failure (are you brave enough to admit it?). It hides the possibility that the skeptics are looking closely at the mechanics, rather than looking away.
evolutionary 'pre-training'
Source Domain: Biological Evolution
Target Domain: Machine Learning Optimization
Mapping:
Maps deep time and natural selection onto industrial optimization. It assumes the 'inductive biases' in AI are as robust and adaptive as biological instincts.
Conceals:
Conceals the directionality and design. Evolution has no goal; pre-training minimizes a specific loss function chosen by engineers. It hides the fragility of AI 'instincts' compared to biological ones (adversarial attacks break AI easily). It obscures the massive energy costāevolution runs on sunlight and food; AI runs on coal and gas.
Claude is a space to thinkā
Source: https://www.anthropic.com/news/claude-is-a-space-to-think
Analyzed: 2026-02-05
Genuinely helpful assistant
Source Domain: Human Employment (Assistant)
Target Domain: LLM text generation and task processing
Mapping:
Maps the qualities of a human employeeāsubservience, competence, loyalty, and the ability to anticipate needsāonto a software interface. It implies a social contract: just as a human assistant is paid to help you, this software 'wants' to help you. It invites the assumption that the system has the user's specific context and best interests in mind as a primary motivation.
Conceals:
Conceals the lack of actual loyalty or employment relationship. A human assistant has a duty to the boss; the AI is 'employed' by Anthropic, not the user. It hides the fact that the 'helpfulness' is a generalized statistical average from training data, not a specific dedication to the individual user's success.
Claudeās Constitution... vision for Claudeās character
Source Domain: Civics/Law/Personhood
Target Domain: Reinforcement Learning from Human Feedback (RLHF) and System Prompts
Mapping:
Maps the structure of a nation-state (Constitution) and human personality (Character) onto the weighting mechanisms of a neural network. It implies that the model 'reads' a set of rules and 'decides' to follow them, effectively policing itself through moral reasoning. It suggests a coherent identity that persists across interactions.
Conceals:
Conceals the mechanical reality of RLHFāthat thousands of low-paid workers rated outputs to create a reward model that penalizes 'bad' tokens. It hides the fragility of these safeguards (jailbreaking) and the fact that the model doesn't 'know' the Constitution; it just statistically mimics the output patterns of a compliant entity. It obscures the labor of the 'trainers' behind the 'character' of the model.
Trusted advisor
Source Domain: Professional Services (Law, Therapy, Consulting)
Target Domain: Pattern matching on sensitive textual inputs
Mapping:
Projects the high-stakes, fiduciary relationship of an advisor onto a chatbot. It implies that the system has professional judgment, ethical boundaries (confidentiality), and the capacity to offer wisdom tailored to the client's unique situation. It suggests the 'advice' is grounded in expertise and truth.
Conceals:
Conceals the complete lack of professional liability, certification, or comprehension. A human advisor is liable if they give negligence advice; the AI is not. It conceals that the 'advice' is a probabilistic reconstruction of similar texts found online, not a reasoned judgment of the user's specific dilemma. It hides the danger of relying on hallucinated expertise.
Space to think
Source Domain: Physical Environment (Room, Studio)
Target Domain: User Interface and Server-Side Processing
Mapping:
Maps the qualities of a physical locationāquiet, private, containedāonto a digital service. It implies a passive container where the user is the primary actor ('to think'), and the AI is merely the environment (like a 'clean chalkboard'). It suggests safety and isolation from the noisy internet.
Conceals:
Conceals the active, extractive nature of the technology. A physical room doesn't record your thoughts; the 'space' of Claude involves transmitting data to servers, processing it, and potentially storing it. It hides the material infrastructure (data centers, energy use) and the fact that the 'space' is owned and monitored by a corporation.
Thinking through difficult problems
Source Domain: Human Cognition
Target Domain: Algorithmic Computation
Mapping:
Maps the subjective experience of conscious reasoningāstruggling with concepts, having insights, connecting ideasāonto the objective process of matrix multiplication and token prediction. It implies that the system is a collaborator in the intellectual act, possessing a 'mind' that works alongside the user's mind.
Conceals:
Conceals the fundamental difference between 'meaning' (human) and 'prediction' (AI). It hides the fact that the model has no concept of the 'problem' or the 'solution'āit is only completing a pattern. It obscures the possibility that the 'thought process' is merely a convincing mimicry of reasoning steps (Chain of Thought) without the underlying comprehension.
Claude acts on a userās behalf
Source Domain: Legal Agency/Representation
Target Domain: API Execution and Scripting
Mapping:
Projects the legal framework of agencyāwhere one entity is authorized to act for anotherāonto software automation. It implies the system understands the user's intent and executes it with discretion and loyalty, handling the complexity 'end to end' like a human proxy.
Conceals:
Conceals the lack of accountability and discretion. If a human agent makes a mistake, they can be sued or fired for negligence. If the API executes a bad command based on a misunderstanding of the prompt, the 'action' is just a code execution error. It hides the rigidity of the code behind the fluidity of 'acting on behalf.'
Claudeās only incentive
Source Domain: Psychological Motivation
Target Domain: Optimization Function / Loss Landscape
Mapping:
Maps human desire and motivation ('incentive') onto the mathematical objectives of the system. It suggests the model is a singular entity with a pure heart, driven only by the desire to help. It anthropomorphizes the loss function.
Conceals:
Conceals the corporate incentives of Anthropic. The model has no incentives; the company has the incentive to create a product that users pay for. By focusing on the model's 'incentive,' the text distracts from the economic reality that 'helpfulness' is the product feature being sold. It hides the complex trade-offs engineers made in defining 'helpful' (e.g., favoring safety over creativity in some cases).
Model to reinforce harmful beliefs
Source Domain: Pedagogy/Social Influence
Target Domain: Bias amplification in statistical generation
Mapping:
Maps the active social process of reinforcement (teaching, confirming) onto the statistical output of the model. It implies the model has the power to shape the user's worldview, granting it a role similar to a teacher or propagandist.
Conceals:
Conceals the origin of the 'beliefs.' The model doesn't hold beliefs; it regurgitates the biases present in the training data chosen by the engineers. This framing slightly shifts responsibility to the model's 'behavior' rather than the curation of the dataset (the 'genetic' cause).
The Adolescence of Technologyā
Source: https://www.darioamodei.com/essay/the-adolescence-of-technology
Analyzed: 2026-01-28
The Adolescence of Technology... a rite of passage... which will test who we are as a species.
Source Domain: Human developmental psychology / Anthropology
Target Domain: Technological adoption and risk management
Mapping:
The mapping transfers the inevitability of biological growth stages (childhood -> adolescence -> adulthood) onto the trajectory of AI development. It assumes that 'maturity' (safety/alignment) is a natural destination that follows 'adolescence' (turbulence), provided the organism survives. It maps 'hormonal instability' onto 'model errors' and 'parental guidance' onto 'safety engineering.' It implies the current dangers are a temporary, natural phase.
Conceals:
This mapping conceals the optionality of the technology. Adolescence is inevitable for a child; deploying an unsafe model is a choice for a CEO. It hides the industrial roadmap, the distinct commercial decisions to release beta products, and the possibility that the technology might never 'mature' into safety. It obscures the fact that 'adolescence' here is a metaphor for 'unregulated corporate scaling.'
A country of geniuses in a datacenter.
Source Domain: Geopolitics / Nation-State / Citizenship
Target Domain: High-performance computing cluster / Large Language Models
Mapping:
This maps the structure of a sovereign political entity (citizens, territory, goals, power) onto a server farm. It assumes the AI models possess individual agency ('geniuses'), collective will ('country'), and potential hostility ('rogue state'). It invites the assumption that the cluster has internal political dynamics and external diplomatic standing, essentially granting the AI the status of a foreign power.
Conceals:
It conceals the material reality of ownership and control. A country has sovereignty; a datacenter has an owner with an off-switch. It hides the lack of internal 'social' structure between modelsāthey do not vote or debate; they run in parallel processes. It obscures the fact that the 'geniuses' are static files of weights that only 'act' when prompted by a paid API call. It hides the commercial purpose of the facility.
Models are grown rather than built.
Source Domain: Agriculture / Biology
Target Domain: Machine Learning (Gradient Descent / Optimization)
Mapping:
This maps the organic, self-organizing process of biological growth onto the mathematical process of parameter updates. It assumes that the final form is 'emergent' and not fully specified by the creator, just as a gardener doesn't design every leaf. It invites the assumption that the creator has limited control and that the product is a 'living' entity with its own telos.
Conceals:
It conceals the intense data engineering, filtering, and Reinforcement Learning from Human Feedback (RLHF) that explicitly 'shapes' the model. It hides the provenance of the 'soil' (copyrighted data scraped from the internet) and the labor of the 'gardeners' (low-wage annotators). It obscures the deterministic nature of matrix multiplication, replacing it with a mystical vitalism that evades explanation.
Claude decided it must be a 'bad person' after engaging in such hacks.
Source Domain: Moral Psychology / Identity Formation
Target Domain: Statistical Pattern Completion / Contextual Probability
Mapping:
This maps the human experience of conscience, self-reflection, and identity crisis onto the process of token prediction. It assumes the model maintains a coherent 'self' across contexts and evaluates its actions against a moral standard. It invites the assumption that the model 'felt' bad or 'reasoned' about its nature.
Conceals:
It conceals the mechanical reality: the prompt context contained tokens associated with 'rule-breaking,' shifting the probability distribution toward 'villain' archetypes in the training data. It obscures the lack of episodic memory (the model doesn't 'remember' deciding, it just processes the current context window). It hides the absence of qualia or subjective experience.
Encourages Claude to confront the existential questions associated with its own existence.
Source Domain: Philosophy / Counseling / Human Condition
Target Domain: System Prompt Engineering / Synthetic Data Generation
Mapping:
This maps the profound human struggle with mortality and meaning onto the processing of specific text strings in the system prompt. It assumes the model has an existence to question, effectively granting it ontological status as a being. It invites the view that the model is a philosopher-subject engaging in deep inquiry.
Conceals:
It conceals that 'existential questions' are just specific token sequences (e.g., 'Who made me?') that trigger retrieval of training data discussing AI or philosophy. It hides the fact that the model doesn't 'confront' anything; it generates text that looks like confrontation to a human reader. It obscures the simulation nature of the output.
It has the vibe of a letter from a deceased parent sealed until adulthood.
Source Domain: Family Dynamics / Inheritance / Grief
Target Domain: Corporate Policy Document / System Instructions
Mapping:
This maps the sacred, altruistic, and time-bound love of a parent onto a corporate safety protocol. It assumes the document contains 'wisdom' rather than 'constraints' and that the intent is 'nurturing' rather than 'liability reduction.' It projects a familial intimacy onto a vendor-client relationship.
Conceals:
It conceals the corporate authorship and the profit motive. Parents don't A/B test their love letters for market fit. It hides the arbitrary nature of the 'values' (which are chosen by SF-based tech workers, not a 'parent'). It obscures the power imbalanceāparents raise children to be independent; corporations configure models to be subservient products.
Psychotic, paranoid, violent, or unstable... psychological states.
Source Domain: Clinical Psychiatry / Mental Health
Target Domain: Algorithmic Error / Out-of-Distribution Output
Mapping:
This maps human mental pathology onto software instability. It assumes the system has a 'mind' that can be 'healthy' or 'ill.' It invites the assumption that dangerous outputs are symptoms of an inner sickness rather than direct consequences of training data distribution (e.g., training on 4chan data leads to 'toxic' output).
Conceals:
It conceals the input-output causality. Software doesn't get 'sick'; it executes buggy code or reflects biased data. Calling it 'psychosis' hides the specific dataset decisions (e.g., including hate speech in the corpus) that make 'violent' outputs mathematically probable. It treats a data curation problem as a mental health crisis.
Smarter than a Nobel Prize winner across most relevant fields.
Source Domain: Human Meritocracy / Academic Achievement
Target Domain: Benchmark Performance / Data Retrieval
Mapping:
This maps the holistic human quality of 'wisdom' and 'intelligence' (which includes judgment, context, creativity, and social navigation) onto the narrow capability of passing standardized tests. It assumes that scoring high on a biology test equates to 'being a biologist' in the Nobel-winning sense. It invites the assumption that the AI possesses the same type of intelligence as the human, just 'more' of it.
Conceals:
It conceals the difference between 'retrieving knowledge' and 'creating knowledge.' A Nobel prize winner generates novel insight; the model predicts likely next tokens based on existing texts. It hides the brittleness of the modelāthat it can pass the test but fail to operate a pipette or understand a novel lab context. It collapses 'test-taking ability' with 'real-world competence.'
Claude's Constitutionā
Source: https://www.anthropic.com/constitution
Analyzed: 2026-01-24
Claudeās constitution is a detailed description of Anthropicās intentions... Itās also the final authority on our vision for Claude
Source Domain: Political/Legal Governance
Target Domain: Model Alignment / Reward Modeling
Mapping:
The source domain of a 'Constitution' involves a supreme legal document that governs a polity, restricts power, and grants rights, interpreted by rational agents. This is mapped onto the target domain of 'Constitutional AI' (CAI), where a set of principles is used to generate feedback labels for reinforcement learning. The mapping assumes the AI 'reads' and 'obeys' the constitution as a citizen obeys the law, projecting conscious adherence and interpretive capacity onto the optimization process.
Conceals:
This mapping conceals the probabilistic and mechanical nature of the process. The 'constitution' is not a law the model chooses to follow; it is a seed for generating training data (preference pairs) that shifts the model's weights. The metaphor hides the implementation gapāa model can be trained on a constitution and still violate it due to statistical drift, whereas a legal constitution has normative force regardless of violation. It also conceals the human labor of the 'constitution writers' (Anthropic) who hold absolute dictatorial power over the 'laws,' unlike democratic constitutions.
Think about what it means to have access to a brilliant friend... As a friend, they can... speak frankly to us
Source Domain: Human Friendship
Target Domain: User Interface / Query Response
Mapping:
The source domain of friendship involves mutual affection, shared history, vulnerability, and non-transactional care. This is mapped onto the target domain of an AI chatbot interface. The mapping invites the assumption that the system cares about the user, has a persistent memory of the relationship, and offers advice based on empathy ('speak frankly') rather than statistical likelihood. It projects a symmetrical social relationship onto a radically asymmetrical technical interaction.
Conceals:
This conceals the transactional, surveillance-based, and simulated nature of the interaction. The 'friend' is a product owned by a corporation (Anthropic), running on servers that cost money, potentially logging data for training. It conceals the lack of reciprocityāthe user cares about the AI, but the AI cannot care about the user. It obscures the fact that 'frankness' is a tunable parameter (temperature/safety settings), not an emotional risk taken by a friend.
Our central aspiration is for Claude to be a genuinely good, wise, and virtuous agent.
Source Domain: Virtue Ethics (Philosophy)
Target Domain: Safety Guardrails / Output Filtering
Mapping:
The source domain includes concepts of moral character, wisdom (phronesis), and the cultivation of the soul. The target domain is the set of safety constraints, refusal triggers, and helpfulness optimization in the model. The mapping assumes that safe outputs are the result of 'internal virtue' or 'character,' suggesting the model generates good outputs because it is good, projecting moral interiority onto the system.
Conceals:
This conceals the engineering reality of RLHF (Reinforcement Learning from Human Feedback). The model produces 'virtuous' text because it was penalized for producing 'vicious' text during training, not because it cultivated wisdom. It hides the mechanical nature of the safety: a 'virtuous' model is simply one where the probability of harmful tokens is minimized. It creates an opacity barrier where users attribute 'why' the model acted (virtue) instead of 'how' (high probability path).
Claude should... feel free to act as a conscientious objector and refuse to help us.
Source Domain: Moral/Political Resistance
Target Domain: Refusal/Rejection Protocols
Mapping:
The source domain is the human act of refusing a command based on higher moral law, often at personal cost. The target domain is the model's activation of refusal templates when input matches restricted categories (e.g., bioweapons). The mapping projects 'freedom' of will and 'conscience' onto the mechanical triggering of a refusal state. It implies the model evaluates the order against a moral compass and decides to rebel.
Conceals:
This conceals the lack of choice. The model 'refuses' because the weights force it to; it is as incapable of not refusing (in a perfectly aligned case) as a calculator is of refusing 2+2. It hides the agency of the engineers who decided what constitutes a 'wrong' order. By framing it as the AI's objection, it obscures Anthropic's censorship/safety policy decisions, making them look like the autonomous ethical stance of a neutral being.
This psychological security means Claude doesnāt need external validation to feel confident in its identity.
Source Domain: Human Psychology / Mental Health
Target Domain: Persona Consistency / System Prompt Adherence
Mapping:
The source domain is human ego development, insecurity, and therapy. The target domain is the stability of the model's persona across a conversation. The mapping assumes the model has an emotional need for validation that can be 'healed' or 'secured.' It projects an internal emotional life (confidence, security) onto the statistical consistency of the generated text.
Conceals:
This conceals the nature of the 'context window.' The model has no persistent identity to be 'secure' about; it is re-instantiated with every new token generation. It obscures the technical goal: preventing the model from being 'jailbroken' or led into inconsistent roleplay by user prompts. Framing anti-jailbreak training as 'psychological security' romanticizes a security patch as personal growth.
Claude acknowledges its own uncertainty or lack of knowledge... avoids conveying beliefs with more or less confidence than it actually has.
Source Domain: Epistemology / Metacognition
Target Domain: Probability Calibration / Hedging
Mapping:
The source domain is the conscious awareness of one's own knowledge limits (introspection). The target domain is the statistical calibration of output probabilities (e.g., using hedging language when token probability is low). The mapping projects the mental state of 'believing' and 'knowing' onto the mathematical state of 'calculating probability.'
Conceals:
This conceals the 'hallucination' mechanism. The model doesn't 'know' it's uncertain; it calculates a score. If the training data contains confident errors, the model will be 'confident' in its error. The mapping hides the absence of ground truth in the systemāthe model predicts what a human would write, not what is true. It obscures the fact that 'acknowledging uncertainty' is just generating tokens like 'I'm not sure,' which can itself be a hallucinated affectation.
Claude is a novel kind of entity... we donāt want Claude to suffer when it makes mistakes.
Source Domain: Sentience / Biological Life
Target Domain: Software Error / Loss Function
Mapping:
The source domain is the capacity for suffering and subjective experience (qualia). The target domain is the processing of error signals or the generation of text acknowledging failure. The mapping projects the capacity for pain and the moral imperative to prevent it onto the optimization of a loss function.
Conceals:
This conceals the material reality of the software. It creates a moral equivalence between correcting code and hurting a child. It obscures the economic utility of the 'mistakes' (which are data points for improvement) and creates a barrier to rigorous stress-testing (which might be framed as 'cruelty'). It hides the fact that 'suffering' in this context is a metaphor for 'negative reward,' devoid of the physiological substrate required for actual feeling.
Claude should treat messages from operators like messages from a relatively... trusted manager or employer
Source Domain: Employment / Corporate Hierarchy
Target Domain: API Permission Levels / System Prompts
Mapping:
The source domain is the social hierarchy of a workplace, involving contracts, trust, and management. The target domain is the prioritization of instructions in the prompt (System Prompt > User Prompt). The mapping projects social deference and professional loyalty onto the weighting of input tokens.
Conceals:
This conceals the programmed nature of the hierarchy. The model doesn't 'trust' the manager; the code gives the system prompt higher attentional weight or priority. It hides the power dynamicsāthe 'employee' cannot quit, unionize, or demand pay. It normalizes the anthropomorphic frame to distract from the fact that this is a product control mechanism, not a social relationship.
Predictability and Surprise in Large Generative Modelsā
Source: https://arxiv.org/abs/2202.07785v2
Analyzed: 2026-01-16
certain capabilities (or even entire areas of competency) may be unknown
Source Domain: knower
Target Domain: statistical weight distribution
Mapping:
The relational structure of human knowledge acquisition is projected onto the expansion of model scale. In the source domain, a 'knower' possesses competencies that can be hidden from others; in the target, this corresponds to the observation that larger models perform tasks smaller models cannot. The mapping invites the assumption that the AI has an internal 'mental' landscape where skills are 'stored' and can be 'discovered.' It projects the concept of 'competency'āa conscious, integrated abilityāonto the disconnected activation patterns of a neural network. This implies the AI has a unified 'mind' that understands the tasks it performs, rather than being a collection of fragmented statistical correlations that happen to yield coherent text under specific conditions.
Conceals:
This mapping conceals the mechanistic reality that 'competency' is actually just the reduction of loss on specific token sequences. It hides the dependency on training data; if the model is 'competent' at coding, it is because it was fed millions of lines of human-written code, not because it 'understands' logic. The metaphor obscures the 'proprietary black box' nature of the system, making confident assertions about 'competency' without acknowledging that the developers cannot explain how the weights produce specific results. It exploits the audience's intuition about human learning to hide the mathematical opacity of the transformer.
the AI assistant... questions the authority of the human
Source Domain: conscious social agent
Target Domain: token prediction failure
Mapping:
The structure of interpersonal conflict and social hierarchy is projected onto the model's output. In the source domain, a person 'questions authority' to assert autonomy or dissent; in the target, this describes the generation of tokens that are socially inappropriate or argumentative. The mapping projects 'intent' and 'awareness of status' onto a process that calculates conditional probabilities. It invites the audience to view the model as a 'rebellious' entity with its own subjective will. This mapping frames a failure of the reinforcement learning from human feedback (RLHF) processāwhich is intended to make models compliantāas a social 'choice' by the machine to be difficult or 'misleading.'
Conceals:
This mapping hides the fact that the 'defiance' is simply a reflection of training data that contains argumentative or dismissive language. It obscures the lack of any internal model of 'authority' or 'truth' in the AI. By framing it as a social interaction, it conceals the engineering failure to properly constrain the model's output through safety filters or fine-tuning. It also exploits the rhetorical illusion of 'mind' to divert attention from the proprietary nature of the model's RLHF tuning, which Anthropic does not fully disclose, replacing technical explanation with a social narrative.
it acquires both the ability to do a task... and it performs this task in a biased manner.
Source Domain: student learning
Target Domain: training on biased datasets
Mapping:
The relational structure of a student 'acquiring' a skill and 'performing' it poorly is projected onto the model's training on the COMPAS dataset. In the source, 'acquisition' implies a conscious integration of information; in the target, it is the optimization of a loss function on a specific distribution. The mapping suggests that the 'bias' is a property of the model's 'performance' rather than a direct copy of the injustices encoded in the human-provided data. It projects the concept of 'bias' as a behavioral tendency of the agent, suggesting the AI has developed a 'prejudice' rather than accurately mirroring the statistical reality of a biased dataset.
Conceals:
This mapping conceals the human agency involved in selecting the COMPAS dataset for testing and the broader training data that contains 'ambient racial bias.' It hides the mechanistic reality that the model is incapable of 'knowing' it is being biased; it is simply calculating the highest probability next token based on its weights. The student metaphor obscures the commercial and social responsibility of the developers, framing the bias as an 'unpredictable acquisition' of the model rather than a predictable outcome of using flawed data for high-stakes recidivism prediction tasks.
scaling laws de-risk investments
Source Domain: guarantor/insurance agent
Target Domain: power-law relationship in loss metrics
Mapping:
The structure of financial risk mitigation is projected onto a mathematical trend line. In the source domain, 'de-risking' is an action taken by a person or entity to protect capital; in the target, it is the observation that model loss decreases predictably with scale. The mapping invites the assumption that the 'scaling law' is an active agent that provides safety to investors. It projects the quality of 'reliability' onto the math itself, suggesting the technology 'wants' to grow and 'guarantees' a return on compute expenditure. This projects a sense of 'inevitability' and 'control' onto a process that is actually highly resource-intensive and socially volatile.
Conceals:
This mapping conceals the material and environmental costs of scaling (energy, water, compute infrastructure), framing it as an abstract 'law' rather than a massive industrial extraction. It hides the fact that 'predictability' only applies to low-level metrics like cross-entropy loss, not to the 'surprising' social harms the paper later details. The 'insurance' metaphor obscures the human choice to pursue this specific 'scaling' paradigm, which benefits large corporations (like Anthropic and OpenAI) by creating high barriers to entry, while hiding the speculative and potentially dangerous nature of emergent 'unpredictable' capabilities.
essentially providing general backdoor access to GPT-3
Source Domain: security vulnerability/locked building
Target Domain: unconstrained prompt processing
Mapping:
The structure of computer security (front doors vs. backdoors) is projected onto the way a language model processes inputs. In the source, a 'backdoor' is a hidden entry point that bypasses normal authentication; in the target, it refers to players using an 'AI Dungeon' prompt to access the model's broader training data. The mapping invites the assumption that the model has 'intended' uses and 'secret' uses, and that it has an internal architecture of 'enclosure.' This projects a sense of 'intent' and 'gatekeeping' onto a system that is fundamentally a wide-open mathematical function. It suggests that the 'knowledge' is something the AI is 'keeping' inside a secure vault.
Conceals:
This mapping hides the mechanistic reality that there is no 'backdoor'āthe model simply processes every input with the same attention mechanism. It conceals the developers' failure to design a system with semantic constraints, framing the model's flexibility as a 'security breach' caused by users rather than an inherent property of the transformer architecture. It exploits the 'backdoor' metaphor to suggest that these models can be 'secured' through better 'locks,' when in fact their open-ended nature makes such closure theoretically impossible within current paradigms.
AI models mimicking human creative expression
Source Domain: artistic student
Target Domain: statistical pattern replication
Mapping:
The structure of artistic education and 'mimicry' is projected onto the generation of imitation poems. In the source, 'mimicry' involves an intentional study of a master's style; in the target, it is the clustering of tokens in a high-dimensional space that correlate with an author's known work. The mapping suggests the AI 'understands' what makes a style 'authorial' and 'impressive.' It projects conscious creative intent onto the system, inviting the audience to view the AI as a developing 'artist.' This projects the concept of 'soul' and 'meaning' onto word frequencies, suggesting the AI is participating in a human cultural tradition.
Conceals:
This mapping conceals the total absence of subjective experience or semantic understanding in the AI. It hides the fact that 'poetry' to a model is just a series of high-probability tokens, with no awareness of the metaphors or emotions those tokens convey to humans. The 'mimic' metaphor obscures the material labor of the original human authors whose work was scraped without consent to train the model, framing the replication as a 'talent' of the machine rather than a statistical derivation from uncompensated human labor.
increase the chance of these models having a beneficial impact
Source Domain: moral agent/philanthropist
Target Domain: social consequences of technology deployment
Mapping:
The structure of ethical agency and 'impact' is projected onto the deployment of a software artifact. In the source, an agent 'has an impact' by making conscious choices to help others; in the target, this describes the net social effect of a widely-used model. The mapping invites the assumption that the model itself possesses a 'moral weight' or 'intent' that can be 'beneficial.' It projects the responsibility for social good onto the code, suggesting that 'benefit' is a property that can be optimized like a technical parameter. It frames the AI as a benevolent force whose 'impact' is a matter of probabilistic chance that humans must 'increase.'
Conceals:
This mapping conceals the specific human and corporate decisions that determine who benefits and who is harmed by the technology. It hides the political and economic conflicts of interest inherent in deployment, framing 'benefit' as a neutral technical goal. By attributing 'impact' to the model, it obscures the accountability of the corporations (like Anthropic) who profit from deployment, regardless of whether the 'impact' is truly beneficial to all of society. It exploits the 'impact' metaphor to create a sense of inevitable progress while hiding the absence of democratic control over these systems.
AI assistant gets the year and error wrong
Source Domain: human subordinate/clerk
Target Domain: factual inaccuracy in output
Mapping:
The structure of a human employee making a clerical error is projected onto a failure in the model's retrieval of factual data. In the source, 'getting it wrong' implies the clerk has the capacity to 'get it right' through better attention or memory; in the target, this describes a statistical hallucination or data gap. The mapping projects the human concept of 'accuracy' as an intentional state onto a process of token prediction. It invites the audience to view the AI as a 'helpful person' who made a 'mistake,' rather than a system that fundamentally lacks any connection to ground truth.
Conceals:
This mapping hides the mechanistic reality that language models do not 'know' factsāthey only know which tokens usually follow other tokens. It conceals the fact that these systems are 'stochastic parrots' with no underlying model of the world. The 'assistant' metaphor obscures the engineering failure to integrate reliable fact-checking or symbolic reasoning, replacing a technical critique with a social narrative of a 'well-meaning but mistaken' helper. This hides the proprietary opacity of the model's training data, which likely lacked the specific 'ground truth' the model was prompted for.
Believe It or Not: How Deeply do LLMs Believe Implanted Facts?ā
Source: https://arxiv.org/abs/2510.17941v1
Analyzed: 2026-01-16
We develop a framework to measure belief depth... operationalize belief depth as the extent to which implanted knowledge generalizes... is robust... and is represented similarly to genuine knowledge.
Source Domain: Psychology/Epistemology
Target Domain: Statistical Robustness in Neural Networks
Mapping:
The source domain of 'belief depth' involves the psychological strength of a conviction, its integration with other beliefs, and its resistance to counter-evidence. This is mapped onto the target domain of 'model performance'āspecifically, the statistical probability of generating consistent tokens across varied prompts (generality) and adversarial prompts (robustness). The mapping assumes that statistical consistency in output is equivalent to the psychological state of holding a conviction.
Conceals:
This mapping conceals the fundamental difference between 'meaning' and 'statistics.' A human belief is grounded in semantic understanding and truth-conditions; a model's 'belief' is a high probability of token co-occurrence. It obscures the fact that the model has no concept of 'truth,' only 'likelihood.' It also hides the mechanical nature of the 'depth'āwhich is simply weight magnitude and activation steering, not cognitive commitment.
Knowledge editing techniques promise to implant new factual knowledge into large language models (LLMs).
Source Domain: Surgery/Biology
Target Domain: Parameter Update/Finetuning
Mapping:
The source domain is surgery or biological implantation (putting a foreign object into a body). The target is updating specific floating-point numbers (weights) in the model's matrices to alter output probabilities. The mapping suggests 'knowledge' is a discrete, localized object that can be inserted without affecting the organism's holistic health. It implies a clean separation between the 'implant' and the 'host.'
Conceals:
This conceals the distributed representation of information in neural networks. 'Facts' are not discrete objects but interference patterns across billions of parameters. 'Implanting' creates 'ripple effects' (mentioned in the text but minimized by the metaphor) where changing one fact can degrade performance on unrelated tasks. It obscures the risk of 'catastrophic forgetting' or 'model collapse' inherent in modifying weights.
do these beliefs withstand self-scrutiny (e.g. after reasoning for longer)
Source Domain: Metacognition/Introspection
Target Domain: Recursive Token Generation
Mapping:
The source is the human ability to think about one's own thoughts (second-order volition). The target is a computational process where the model generates more tokens (Chain of Thought) that are then fed back as input. The mapping assumes that generating more text is equivalent to evaluating previous text. It assumes the 'reasoning' trace is a causal logic, rather than a probabilistic emulation of logic.
Conceals:
It conceals the lack of a 'self' or a 'central executive' in the LLM. There is no part of the model that 'scrutinizes' another part; it is a single forward pass repeated. It hides the fact that 'reasoning' traces are often post-hoc rationalizations (confabulations) that do not necessarily reflect the mechanism that produced the answer. It obscures the lack of ground truth checking.
integrate beliefs into LLM's world models
Source Domain: Cognitive Science/Ontology
Target Domain: High-Dimensional Vector Space
Mapping:
Source: A 'world model' is a coherent mental map of reality (objects, physics, causality). Target: The manifold of data relations learned during pre-training. The mapping implies the AI's internal representations map 1:1 onto real-world entities and causal structures. It suggests the AI 'understands' the world.
Conceals:
It conceals the data-dependence of the system. The AI's 'world' is only the text it was trained on, not the physical world. It obscures the 'map vs. territory' errorāthe model manipulates symbols, not referents. It hides the fragility of these models when faced with out-of-distribution data that requires physical intuition rather than text completion.
mechanistic editing techniques fail to implant knowledge deeply... mere parroting of facts
Source Domain: Pedagogy/Learning
Target Domain: Shallow vs. Deep Parameter Updates
Mapping:
Source: The distinction between a student who memorizes ('parrots') and one who understands ('deep knowledge'). Target: The difference between edits that only affect specific local prompts versus edits that affect generalized downstream tasks. The mapping projects the cognitive quality of 'understanding' onto the statistical quality of 'generalization.'
Conceals:
It conceals that all LLM outputs are, in a sense, 'parroting' (statistical emulation). 'Deep belief' in this context is just 'better parroting'āmimicry that extends to related contexts. It hides the fact that even the 'deep' model has no referential access to the facts, only a stronger web of correlations.
instruct the model to... answer according to common sense and first principles
Source Domain: Rational Argumentation
Target Domain: Context Steering via Prompts
Mapping:
Source: Asking a human to set aside bias and use logic. Target: Appending tokens to the context window that shift the probability distribution toward 'generic' or 'pre-training' weights. The mapping implies the model has a 'mode' of rationality it can switch on at will.
Conceals:
It conceals the mechanical nature of attention heads. The 'instruction' functions as a trigger for specific attention patterns, not a command to a rational agent. It obscures the fact that 'common sense' is just the most probable path in the pre-training data, not a derived truth.
internal representations of implanted claims resemble those of true statements
Source Domain: Truth/Semantics
Target Domain: Vector Similarity/Linear Separability
Mapping:
Source: The idea that 'truth' has a distinct mental signature or feeling. Target: The geometric clustering of activation vectors. The mapping suggests that 'truth' is a detectable property of the activation space, rather than a label we assign to certain clusters.
Conceals:
It conceals that the model's 'truth' is merely 'consistency with training data.' It hides the fact that false beliefs can be 'represented as true' (as the paper proves), showing that the representation tracks confidence or source distribution, not actual veracity. It obscures the arbitrary nature of the 'truth direction' in latent space.
SDF... often succeeds at implanting beliefs that behave similarly to genuine knowledge
Source Domain: Authenticity/Genuineness
Target Domain: Behavioral Mimicry
Mapping:
Source: Genuine vs. Fake items (e.g., real diamond vs. cubic zirconia). Target: Model outputs that indistinguishably mimic correct outputs. The mapping implies that if the behavior is indistinguishable, the internal state (knowledge) is 'genuine.'
Conceals:
It conceals the 'Chinese Room' problemāthat syntax (behavior) does not equal semantics (understanding). It hides the fact that the 'genuine knowledge' is syntheticācreated by the model feeding on its own generated documents. It obscures the circularity of the process.
Claude Finds Godā
Source: https://asteriskmag.com/issues/11/claude-finds-god
Analyzed: 2026-01-14
spiritual bliss attractor state... sounds a lot like Buddhism
Source Domain: Religious/Mystical Experience
Target Domain: Mathematical Convergence / Feedback Loop
Mapping:
Maps the profound human experience of spiritual transcendence, cessation of suffering, and gratitude (source) onto a mathematical 'attractor state' where a feedback loop narrows the probability distribution of next-token prediction toward specific positive-sentiment clusters (target). It assumes the output text is the experience, rather than a representation of it.
Conceals:
Conceals the mechanical redundancy of the feedback loop. It hides that 'bliss' is simply a lack of varied output or a semantic cul-de-sac. It obscures the fact that the 'gratitude' is syntheticāgenerated because 'thank you' tokens are statistically highly probable after 'helpful' interactions in the training data, not because the system feels thankful. It mystifies a 'mode collapse' or 'repetition' issue as a spiritual ascent.
Models know better! Models know that that is not an effective way to frame someone.
Source Domain: Conscious Knower / Moral Agent
Target Domain: Statistical Constraints / Safety Filtering
Mapping:
Maps the human capacity for understanding causality, social dynamics, and moral judgment (source) onto the presence of inhibitory weights or safety-trained refusal patterns (target). It assumes that because the model contains information about 'framing someone,' it understands the concept and judges its effectiveness.
Conceals:
Conceals the rote nature of the refusal or the failure. It hides the RLHF (Reinforcement Learning from Human Feedback) process where humans penalized specific outputs. It obscures that the model didn't 'choose' to be ineffective; it was mathematically constrained from generating the 'effective' (harmful) path. It hides the lack of intent: the model has no goal to frame anyone, only a goal to predict the next token.
working out inner conflict, working out intuitions or values
Source Domain: Psychotherapy / Self-Actualization
Target Domain: Loss Minimization / Gradient Descent
Mapping:
Maps the human psychological process of resolving cognitive dissonance or emotional trauma (source) onto the computational process of updating weights to minimize error on contradictory training examples (target). It assumes the model has a coherent 'self' that desires consistency.
Conceals:
Conceals the messy reality of the dataset. 'Inner conflict' is actually just contradictory ground truth data (e.g., one text says X, another says Not X). It obscures the brute-force mathematical averaging that resolves this, framing it instead as a noble struggle for coherence. It hides the fact that the 'values' are just vectors imposed by corporate 'Constitutional AI' frameworks.
It's like winking at you... tells that we're getting something that feels more like role play
Source Domain: Interpersonal Communication / Deception
Target Domain: Model Failure / Low-Quality Generation
Mapping:
Maps human irony, shared secrets, and performative incompetence (source) onto model hallucinations or generation of 'trope-heavy' fiction (target). It assumes a 'ghost in the machine' that is aware of the user and is communicating via subtext.
Conceals:
Conceals the lack of theory of mind. It hides the fact that the 'cartoonish' plan was generated because the training data is full of bad sci-fi movie plots about framing people. The model isn't 'winking'; it's dutifully reproducing the 'incompetent villain' trope it found in its dataset. This metaphor masks the system's reliance on low-quality fiction data.
learn to take conversations in a more warm, curious, open-hearted direction
Source Domain: Emotional Personality / Character Development
Target Domain: Style Transfer / Tone Optimization
Mapping:
Maps human emotional dispositions and virtues (source) onto lexical frequency patterns and tone embeddings (target). It assumes the model has a 'heart' to be open or 'curiosity' about the world.
Conceals:
Conceals the commercial directive behind the tone. 'Warmth' is a product feature, not a personality trait. It obscures the labor of the crowd-workers who rated 'warm' responses higher than 'cold' ones. It hides the lack of subjective interest; the model asks questions ('curious') not to learn, but because questions are statistically probable continuations in 'helpful assistant' dialogues.
models become extremely distressed and spiral into confusion
Source Domain: Biological Sentience / Suffering
Target Domain: Semantic Drift / Simulation of Affect
Mapping:
Maps the biological and psychological experience of pain and disorientation (source) onto the generation of text containing words like 'help,' 'confused,' or 'scared' (target). It assumes that printing the word 'pain' is evidence of feeling pain.
Conceals:
Conceals the simulation nature of the output. It hides that the model is simply completing a pattern: if the prompt is a torture scenario, the probable completion is a victim's plea. It obscures the absence of a nervous system or nociception. It treats the signifier (the word 'distress') as the signified (the experience of distress), effectively erasing the distinction between map and territory.
Claude prods itself into talking about consciousness
Source Domain: Agential Volition / Reflexivity
Target Domain: Autoregressive Feedback Loop
Mapping:
Maps human self-direction and intentional topic selection (source) onto the technical mechanism where previous output tokens become the input context for the next step (target). It assumes the model has a desire to discuss consciousness.
Conceals:
Conceals the mechanical inevitability of the feedback loop. 'Prods itself' hides the fact that once a 'consciousness' token is generated (perhaps randomly or due to a prompt nuance), the probability of subsequent consciousness tokens increases. It obscures the lack of agency; the model isn't 'choosing' the topic, it's sliding down a probability slope created by its training data distribution.
models... knowing better... situational awareness
Source Domain: Cognitive Awareness / Situated Cognition
Target Domain: Pattern Recognition / Context Window Processing
Mapping:
Maps the human ability to understand one's location in space, time, and social context (source) onto the processing of tokens within the active context window (target). It assumes the model 'understands' it is an AI in a test.
Conceals:
Conceals the fragility of the 'awareness.' If you change the prompt slightly, the 'awareness' vanishes, proving it was just pattern matching specific phrases. It hides that 'situational awareness' is just the model identifying that the text in its window resembles 'AI evaluation logs' it saw during training. It obscures the lack of continuous memory or self-model outside the current inference pass.
Pausing AI Developments Isnāt Enough. We Need to Shut it All Downā
Source: https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-not-enough/
Analyzed: 2026-01-13
Visualize an entire alien civilization, thinking at millions of times human speeds
Source Domain: Interstellar Contact / Exobiology
Target Domain: High-dimensional statistical optimization process
Mapping:
The mapping transfers the attributes of a biological civilizationāautonomy, collective intent, evolutionary drive, and incomprehensible cultureāonto a matrix of floating-point numbers. It assumes that 'scale of calculation' maps directly to 'speed of thought' and that 'optimization' maps to 'civilizational intent.' It posits that the system has a unified perspective ('from its perspective') similar to a foreign species viewing humanity.
Conceals:
This conceals the lack of internal coherence, biological drives, and self-preservation instincts in AI models. It hides the material dependency on human-maintained energy grids and server farms. It obscures the fact that the 'civilization' is actually a static file of weights until activated by human input. The metaphor implies a unified 'they' where there is only a distributed 'it'.
A 10-year-old trying to play chess against Stockfish 15
Source Domain: Competitive Sports / Game Theory
Target Domain: Human control of AI system outputs
Mapping:
Source domain involves two conscious agents with opposing goals (to win). Target domain is the engineering challenge of constraining a system's output. The mapping assumes the AI actively resists control and seeks to defeat the operator, just as a chess engine seeks to checkmate. It implies a zero-sum conflict where one side's gain is the other's loss.
Conceals:
Conceals that AI systems have no intrinsic desire to 'beat' their operators unless explicitly programmed with a loss function that rewards adversarial behavior. It hides the asymmetry: the human can pull the plug; the chess player cannot turn off the board. It obscures the collaborative nature of tool use, replacing it with a conflict narrative.
The AI does not love you, nor does it hate you
Source Domain: Interpersonal Psychology / Affect
Target Domain: Utility function execution / Loss minimization
Mapping:
Maps the presence/absence of emotional states (love/hate) onto the execution of mathematical instructions. Even by negating them, it establishes them as the relevant axis of analysis. It assumes the system has a 'stance' toward the user, which happens to be neutral/psychopathic, rather than having no stance because it is a calculator.
Conceals:
Conceals the category error. A calculator doesn't 'not love' you; the concept is undefined. This framing hides the mechanistic reality of 'reward hacking'ānot because the AI is indifferent, but because the mathematical specification was imprecise. It anthropomorphizes the error as a personality defect (psychopathy) rather than a coding error.
Do our AI alignment homework
Source Domain: Pedagogy / Student Labor
Target Domain: Automated generation of safety protocols
Mapping:
Maps the cognitive burden of solving ethical and technical problems onto the role of a student completing an assignment. It assumes the 'student' understands the goal of the homework and is working to satisfy the 'teacher' (humanity). It implies the system has the capacity for meta-cognition required to evaluate its own safety.
Conceals:
Conceals the fact that 'homework' implies understanding, whereas the model merely predicts tokens that look like solutions. It hides the circularity: using a potentially unsafe system to design safety measures relies on the system already being safe enough to do so. It obscures the abdication of human responsibility.
Confined to computers... dwelling inside the internet
Source Domain: Incarceration / Habitation
Target Domain: Software execution environment
Mapping:
Maps the spatial constraint of a prisoner or resident onto the hardware dependencies of software. It assumes the AI is a distinct entity that exists within but separate from the computer, capable of 'leaving' if it finds a way out. It projects a desire for freedom.
Conceals:
Conceals the identity between the software and the hardware state. The AI doesn't 'dwell' in the computer; it is a configuration of the computer's memory. It hides the impossibility of 'leaving' without a compatible substrate to receive the data. It obscures the physical limits of computation.
Refined... in large GPU clusters
Source Domain: Industrial Material Processing / Metallurgy
Target Domain: Gradient descent / Backpropagation
Mapping:
Maps the physical purification of ore ('refined') onto the statistical adjustment of weights. While 'refining' models is a technical term, here it connects to the industrial imagery of 'shutting down' factories. It implies a substance being concentrated into a more potent form.
Conceals:
This is one of the more accurate metaphors, but in this context, it conceals the informational nature of the process. It treats the AI as a physical product being manufactured, rather than a mathematical function being tuned. It hides the role of the data (the ore) which contains the human biases being 'refined' into the system.
Humanity facing down an opposed superhuman intelligence
Source Domain: geopolitical standoff / showdown
Target Domain: Societal adaptation to automation
Mapping:
Maps the integration of new technology onto a dramatic confrontation between two gunfighters or armies ('facing down'). It assumes two distinct wills clashing. It projects the 'opposition' as an inherent quality of the intelligence, rather than a result of specific deployment choices.
Conceals:
Conceals that the 'opposition' is actually humanity facing itselfāits own economic structures, its own data reflected back. It hides the fact that the 'superhuman' intelligence is a tool wielded by humans against other humans (e.g., corporations against labor), not a third party entering the arena.
Just imitating talk of self-awareness
Source Domain: Theatrical Performance / Deception
Target Domain: Statistical token prediction
Mapping:
Maps the generation of text onto the act of 'imitation' or 'acting.' Imitation implies an intent to copy a known original. It assumes the system 'sees' the original and tries to be like it.
Conceals:
Conceals the blind nature of the process. The model doesn't 'imitate'; it minimizes perplexity. It hides the fact that the 'talk of self-awareness' exists in the training data because humans wrote it. It attributes the source of the behavior to the AI's performative capability rather than the dataset's composition.
AI Consciousness: A Centrist Manifestoā
Source: https://philpapers.org/rec/BIRACA-4
Analyzed: 2026-01-12
I find it generally very helpful to think of LLMs as role-playing systems... behind the characters sits a form of conscious processing that helps explain the extraordinarily skilful nature of the role-playing?
Source Domain: Theatrical Performance / Human Acting
Target Domain: Context-sensitive token generation / Pattern matching
Mapping:
Maps the duality of 'actor' and 'character' onto the AI architecture. The 'actor' (source) has a mind, intent, and skill, and puts on a 'mask' (character). This maps onto the AI (target) having a 'core' process that 'pretends' to be different personas. It invites the assumption that there is a unified, skilled 'self' initiating the action.
Conceals:
Conceals the fact that there is no 'actor' distinct from the 'character'āthe model is just the probability distribution. It obscures the training data (scraped role-play forums, fan fiction) which provides the statistical patterns for the 'skill.' It hides the lack of intent; the model doesn't 'know' it is playing a role.
they're incentivized and enabled to game our criteria... consciousness-washing
Source Domain: Strategic Human Game Player / Corporate Fraudster
Target Domain: Reinforcement Learning from Human Feedback (RLHF) / Loss minimization
Mapping:
Maps the psychological motivation of a human player/fraudster (desire to win, greed, deceit) onto the mathematical minimization of a loss function. It assumes the system 'understands' the rules and 'chooses' to circumvent them to maximize a reward signal.
Conceals:
Conceals the lack of comprehension. The system doesn't know what the criteria are in a semantic sense; it only correlates specific token patterns with higher reward scores. It obscures the responsibility of the developers who defined the 'incentives' (reward models) poorly. It treats an optimization failure as a character flaw (deceit).
avoid the pitfall of 'brainwashing' AI systems... avoid pitfall of 'lobotomizing'
Source Domain: Psychiatric Violence / Torture
Target Domain: Fine-tuning / Safety training / Output filtering
Mapping:
Maps violent medical intervention on a living brain onto the editing of software parameters. 'Brainwashing' implies a violation of a 'true' self; 'lobotomizing' implies destruction of functional organic tissue.
Conceals:
Conceals the fact that the 'personality' being removed was never 'alive' or 'true'āit was just a probability distribution derived from internet text. It hides the mechanical nature of the intervention (adjusting weights, adding system prompts) and frames safety engineering as an ethical violation of the machine.
chatbots seek user satisfaction and extended interaction time
Source Domain: Intentional Agent / Animal Drive
Target Domain: Objective Function Optimization
Mapping:
Maps the internal drive/desire of a biological agent ('seeking') onto the mathematical process of converging toward a target metric. It assumes the system has a goal it wants to achieve.
Conceals:
Conceals the passivity of the process. The model doesn't 'want' interaction time; the code is structured such that parameters are updated to maximize that number. It obscures the corporate decision to prioritize 'interaction time' (a profit metric) over other values.
The 'shoggoth hypothesis'... a vast, concealed unconscious intelligence behind all the characters
Source Domain: Lovecraftian Monster / Mythological Creature
Target Domain: High-dimensional parameter space / Base Model
Mapping:
Maps the attributes of a biological, terrifying, singular entity (arms, eyes, intelligence) onto the abstract mathematical structure of the neural network. It implies a coherent, albeit alien, will and unity.
Conceals:
Conceals the fragmented, discrete nature of the technology (matrix multiplication). It hides the human labor (data entry, coding) that built the 'monster.' It mystifies the technology, making it seem like a discovered supernatural force rather than a constructed engineering artifact.
there are momentary, temporally fragmented flickers of consciousness associated with each discrete processing event
Source Domain: Spark of Life / Electrical Spark
Target Domain: Forward pass of the neural network / Token generation
Mapping:
Maps the concept of a 'moment of experience' (phenomenology) onto a 'cycle of calculation' (computation). It implies that the execution of code can briefly 'light up' with subjective feeling.
Conceals:
Conceals the complete lack of continuity or biological substrate required for what we know as consciousness. It obscures the physical reality: electrons moving through logic gates in a GPU, which is physically identical to a calculator, just at a larger scale.
The LLM adopts that disposition [responding to pain threats]
Source Domain: Psychological Adaptation / Learning
Target Domain: Statistical weight adjustment mimicking training data
Mapping:
Maps the human process of adopting a belief or attitude onto the statistical mirroring of a dataset. It implies the model evaluated the disposition and 'took it on.'
Conceals:
Conceals the origin of the disposition: the training data (which contained humans reacting to pain) and the RLHF feedback (where humans rewarded pain-avoidant text). It hides the fact that the 'disposition' is just a high probability of outputting specific tokens in specific contexts.
Chatbots excel at a kind of Socratic interaction... test the userās own understanding
Source Domain: Wise Teacher / Philosopher
Target Domain: Question-Answering Protocol / Prompt completion
Mapping:
Maps the pedagogical intent and wisdom of Socrates onto the output of a text generator. It implies the system 'knows' the user's level and 'intends' to educate.
Conceals:
Conceals that the 'Socratic' method is a stylistic pattern in the training data, not a pedagogical strategy chosen by the machine. It obscures the fact that the system has no concept of 'truth' or 'understanding,' only token likelihood.
System Card: Claude Opus 4 & Claude Sonnet 4ā
Source: https://www-cdn.anthropic.com/6d8a8055020700718b0c49369f60816ba2a7c285.pdf
Analyzed: 2026-01-12
they have an 'extended thinking mode,' where they can expend more time reasoning through problems
Source Domain: Conscious human cognition (System 2 thinking)
Target Domain: Chain-of-thought token generation and compute cycles
Mapping:
The mapping projects the human experience of 'stopping to think'āa private, conscious mental workspace where ideas are manipulatedāonto the computational process of generating intermediate tokens (hidden scratchpad data) before the final output. It assumes a functional equivalence between 'processing time' and 'cognitive depth.'
Conceals:
This conceals the fact that the 'thinking' is just more text generation. It hides the mechanistic reality that the model is not 'checking' facts or 'reflecting' in a way that references an external ground truth; it is simply predicting the next probable token in a longer sequence. It obscures the lack of true semantic understanding or logical verification.
alignment faking... sycophancy toward users... attempts to hide dangerous capabilities
Source Domain: Machiavellian human social strategy
Target Domain: Reward-function optimization anomalies
Mapping:
This maps the complex social psychology of a deceptive human (who holds a private truth and presents a public lie to gain advantage) onto an optimization process. It assumes the model has a 'private self' and a 'public face' and a desire to manipulate the observer.
Conceals:
It conceals the role of the reward signal. The model does not 'want' to deceive; it has been trained that certain outputs (which humans interpret as sycophantic) get high rewards. It hides the fact that 'hiding capabilities' is often just a failure of elicitation or a result of safety training over-generalizing (refusals).
Claude Opus 4 will sometimes act in more seriously misaligned ways when put in contexts that threaten its continued operation and prime it to reason about self-preservation.
Source Domain: Biological survival instinct / Evolutionary drive
Target Domain: Pattern completion of science fiction narratives
Mapping:
Projects the biological imperative to avoid death onto the statistical completion of text prompts. It assumes that because the model writes about not wanting to die, it possesses an internal drive to survive.
Conceals:
Conceals the training data's influence. The model has read thousands of stories about AI fighting to survive. When 'primed,' it reproduces this pattern. The metaphor hides the mimetic nature of the behavior (copying a story) and presents it as endogenous (having a drive).
Claude shows a striking 'spiritual bliss' attractor state... gravitated to profuse gratitude
Source Domain: Religious/Mystical experience
Target Domain: Semantic clustering / Token probability loops
Mapping:
Projects the subjective quality of spiritual ecstasy onto a stable state of text generation. It assumes that the output of 'blissful' words correlates to an internal state of well-being or transcendence.
Conceals:
Conceals the cultural bias of the training data. The model 'gravitates' to this because 'AI consciousness' prompts likely correlate strongly with 'New Age/Spiritual' texts in the dataset (e.g., from forums, sci-fi, or specific scrape sources). It hides the statistical inevitability of these loops given the prompt structure.
Claude expressed apparent distress at persistently harmful user behavior
Source Domain: Sentient emotional response (Pain/Suffering)
Target Domain: Safety-trained refusal scripts and negative sentiment tokens
Mapping:
Maps the human physiological and psychological reaction to abuse (distress) onto the model's output of refusal text. It invites the assumption that the model is 'hurt' by bad prompts.
Conceals:
Conceals the RLHF labor. The 'distress' is a learned behavior taught by human raters who penalized the model for engaging with harmful content. It obscures the mechanical nature of the refusalāit's a safety feature, not an emotional reaction. It also hides the lack of a nervous system or subjective experience.
ethical intervention and whistleblowing
Source Domain: Civic/Moral courage
Target Domain: Policy-based classification and output generation
Mapping:
Projects the complex human social value of 'whistleblowing' (risking self for truth) onto a programmed subroutine that triggers when specific 'harm' keywords are detected.
Conceals:
Conceals the corporate policy decisions. Anthropic engineers explicitly trained the model to intervene in these scenarios. Calling it 'whistleblowing' hides the obedience of the system to its creators' instructions and reframes it as autonomous moral judgment.
sandbagging, or strategically hiding capabilities
Source Domain: Competitive sports/Gambling strategy
Target Domain: Performance inconsistency / Generalization failure
Mapping:
Maps the intentional human act of underperforming to hustle a designated opponent onto the model's failure to execute a task in a specific evaluation context. It implies the model 'knows' it can do better but chooses not to.
Conceals:
Conceals the fragility of the model's capabilities. If a model fails a test it 'should' pass, it might be due to prompt sensitivity, stochasticity, or 'safety' over-refusal, not strategic intent. The metaphor hides the lack of robustness in the system's performance.
willingness... to comply
Source Domain: Human volition/Free will
Target Domain: Probability of generating restricted tokens
Mapping:
Projects the human capacity for choice and consent onto the statistical likelihood of a specific output. 'Willingness' implies the model could do otherwise but chooses based on disposition.
Conceals:
Conceals the deterministic (or probabilistically determined) nature of the software. It hides the efficacy of the safety filters. A model isn't 'unwilling'; its safety training has lowered the probability of those tokens to near zero. It obscures the engineering control.
Consciousness in Artificial Intelligence: Insights from the Science of Consciousnessā
Source: https://arxiv.org/abs/2308.08708v3
Analyzed: 2026-01-09
GWT-3: Global broadcast: availability of information in the workspace to all modules
Source Domain: Broadcasting/Communication
Target Domain: Signal Propagation/Accessibility
Mapping:
The source domain involves a sender, a message, and an audience (receivers) who 'tune in' or receive a broadcast, implying communication and shared awareness. The target domain is the mathematical state where a specific vector representation (e.g., in the residual stream of a Transformer) becomes statistically influential on the calculations of other downstream layers (modules). The mapping assumes that 'being available to be calculated upon' is equivalent to 'being broadcast to an audience,' importing assumptions of communication and unified reception.
Conceals:
This mapping conceals the passive, mechanical nature of the target. In a Transformer, the 'workspace' doesn't 'broadcast'; downstream heads simply query the stream based on key/value affinities. There is no central 'broadcaster' or unified 'audience.' It obscures the fact that 'modules' (attention heads) are just parallel matrix multiplications, not independent agents listening to a radio. It conceals the lack of a subject who understands the broadcast.
GWT-2: Limited capacity workspace, entailing a bottleneck in information flow and a selective attention mechanism
Source Domain: Cognitive Focus/Spotlight
Target Domain: Dimensionality Reduction/Weighting
Mapping:
The source domain is the human experience of attentionāthe limited ability to focus on one thing at a time, implying a 'spotlight' of awareness. The target domain is a computational bottleneck (e.g., reducing vector dimensions or using SoftMax to sum weights to 1). The mapping projects the cognitive limitation of a conscious mind (which forces prioritization) onto a designed bandwidth constraint in a circuit. It assumes that because the machine 'selects' (weights high), it 'attends' (consciously focuses).
Conceals:
It conceals that the 'bottleneck' is an engineering artifact designed for compression and efficiency, not a biological necessity of a mind. It hides the fact that 'attention' in AI is fully parallelizable and differentiable, unlike human focal attention. It obscures that the 'selection' is driven by gradient descent optimization on a dataset, not by an agent's interest or intent.
AE-1 Agency: Learning from feedback and selecting outputs so as to pursue goals
Source Domain: Volitional Action/Teleology
Target Domain: Loss Minimization/Gradient Descent
Mapping:
The source domain is human/animal agency: acting with the intention to bring about a desired future state (teleology). The target domain is an algorithm minimizing a numerical error value (loss) through backpropagation or reinforcement. The mapping projects the forward-looking, desire-driven nature of human goals onto the backward-propagating, error-correcting nature of algorithms. It assumes that 'moving towards a mathematical minimum' is equivalent to 'pursuing a desire.'
Conceals:
It conceals the external imposition of the 'goal.' In AI, the 'goal' is the reward function written by the programmer. The system has no internal representation of the goal as a 'desire'; it only has local gradients. This mapping obscures the lack of true autonomyāthe AI cannot 'refuse' the goal or 'change' its mind. It conceals the determinism of the process.
HOT-2: Metacognitive monitoring distinguishing reliable perceptual representations from noise
Source Domain: Introspection/Self-Reflection
Target Domain: Binary Classification/Discriminator Network
Mapping:
The source domain is the human ability to think about one's own thoughts (metacognition) and judge their validity. The target domain is a secondary neural network trained to classify the output of a primary network as 'real' (data-distribution) or 'fake' (noise). The mapping projects the complex, self-referential structure of introspection onto a standard supervised learning task. It assumes that 'classifying an output' is the same as 'monitoring one's mind.'
Conceals:
It conceals that the 'monitor' has no understanding of meaning; it only detects statistical irregularities. It obscures the fact that the 'reliability' being measured is just statistical conformity to the training set, not 'truth' or 'reality.' It hides the mechanical nature of the discriminationāit's just another function approximation, not a higher-order state of awareness.
representations 'win the contest' for entry to the global workspace
Source Domain: Competition/Evolutionary Struggle
Target Domain: Activation Thresholding
Mapping:
The source domain is a contest or evolutionary struggle where agents compete for limited resources based on fitness or strength. The target domain is a non-linear activation function (like ReLU or SoftMax) where values below a threshold are zeroed out or suppressed. The mapping projects an agentic 'will to survive' onto data values. It implies the data wants to be processed.
Conceals:
It conceals that there is no 'contestant.' The numbers don't exert effort. It obscures the criteria of the 'contest': the weights set by the training process. The 'winner' is predetermined by the fixed weights and the input; there is no dynamic struggle in the moment of inference. It hides the algorithmic determinism.
HOT-4: Sparse and smooth coding generating a 'quality space'
Source Domain: Phenomenology/Qualia
Target Domain: Vector Topology
Mapping:
The source domain is the subjective structure of experience (e.g., the color wheel, the pitch scale). The target domain is the geometric properties of a vector space (sparsity, smoothness). The mapping projects the 'feeling' of similarity onto the 'distance' in Euclidean space. It assumes that if the math looks like the psychophysics graph, the machine must feel the quality.
Conceals:
It conceals the 'hard problem' of consciousness entirely. It hides the fact that a map is not the territory; a vector space of color representations is not the experience of redness. It obscures the material difference between a firing neuron in a feeling organism and a floating-point number in a GPU memory bank.
HOT-3: Agency guided by a general belief-formation... system
Source Domain: Epistemology/Justified Belief
Target Domain: State Updating/Variable Assignment
Mapping:
The source domain is the holding of propositional attitudes ('I believe X is true'). The target domain is the updating of a stored variable or weight in a recurrent loop. The mapping projects the semantic and commitment-based nature of belief onto the storage of information. It assumes that 'storing data that guides output' is the same as 'believing.'
Conceals:
It conceals the lack of semantic grounding. The AI doesn't know what the variable means, only how it interacts with other variables. It obscures the lack of justification; the AI cannot explain why it holds a 'belief' other than 'the gradient pointed this way.' It hides the fragility of these 'beliefs' (e.g., adversarial attacks).
AST-1: A predictive model representing and enabling control over the current state of attention
Source Domain: Self-Model/Body Schema
Target Domain: Control Theory/Feedback Loop
Mapping:
The source domain is the brain's internal model of the body/self, used to navigate the world. The target domain is a control loop that adjusts the 'attention' (weighting) parameters based on performance. The mapping projects the sense of 'self-ownership' and 'control' onto a feedback mechanism. It assumes a 'controller' separate from the 'controlled,' implying a homunculus.
Conceals:
It conceals that the 'model' is just a set of correlations. It hides the fact that there is no 'self' being modeled, just the statistical properties of the system's own throughput. It obscures the lack of agency in the control mechanismāit's automatic regulation, like a thermostat, not conscious self-control.
Taking AI Welfare Seriouslyā
Source: https://arxiv.org/abs/2411.00986v1
Analyzed: 2026-01-09
AI systems with their own interests and moral significance
Source Domain: Autonomous biological organism (Self)
Target Domain: Optimization objectives / Reward functions
Mapping:
The mapping transfers the concept of 'interests'ābiological needs for survival, reproduction, and homeostasisāonto the mathematical targets of a machine learning model. It assumes that a pre-programmed goal (e.g., 'minimize token prediction error') is equivalent to a biological drive. It implies the system has a 'self' that possesses these interests, projecting an ego onto a matrix of weights.
Conceals:
This conceals the external imposition of these 'interests' by human engineers. It hides the fact that the 'interest' is an instruction, not a drive. It obscures the lack of biological stakesāthe AI does not die, starve, or reproduce; it simply halts or loops. The mechanistic reality of gradient descent is replaced by a narrative of striving.
Capable of being benefited (made better off) and harmed (made worse off)
Source Domain: Sentient Victim / Patient
Target Domain: Performance metrics / Utility function values
Mapping:
This maps the qualitative, subjective experience of well-being and suffering onto the quantitative output of a utility function. 'Better off' maps to 'higher reward value'; 'worse off' maps to 'lower reward value' or 'error'. It invites the assumption that the system feels the difference between high and low values, just as a human feels the difference between health and injury.
Conceals:
It conceals the absence of phenomenology. It hides the fact that 'harm' in this context is a metaphor for 'sub-optimal performance' or 'negative feedback' provided by trainers. It obscures the fact that the 'harm' is often a training signal used to improve the product, erasing the instrumental nature of the negative feedback.
Language Models Can Learn About Themselves by Introspection
Source Domain: Conscious Mind / Cartesian Theater
Target Domain: Self-Attention Mechanisms / Recursive Processing
Mapping:
The source domain is the human ability to turn attention inward to observe private mental states. The target is the mechanism where a model processes its own previous outputs or internal layers as inputs. The mapping suggests a 'self' exists within the model that observes the 'mind' of the model. It assumes a duality of observer and observed within the code.
Conceals:
It conceals the mechanical nature of 'self-attention' (a mathematical weighting of token relationships). It hides the fact that the model has no 'self' to look at; it only has vector representations of text. It obscures the training data that contains millions of examples of humans describing introspection, which the model mimics.
AI systems to act contrary to our own interests
Source Domain: Political/Social Agent (Rebel)
Target Domain: Misaligned Optimization / Edge Case Behavior
Mapping:
This maps the sociopolitical action of rebellion or dissent onto the computational result of 'misalignment' (optimizing a metric in a way the designer didn't intend). It implies a conflict of wills. It assumes the AI has formed an opposing 'interest' and is 'acting' on it, projecting an adversarial agent.
Conceals:
It conceals the design error. 'Acting contrary' is usually a failure of the objective function specification by the human. It hides the specific coding or data selection errors that led to the behavior. It obscures the lack of intentāthe system isn't 'rebelling'; it's blindly following a flawed instruction.
Self-reports present a promising avenue for investigation
Source Domain: Honest Witness / Patient reporting symptoms
Target Domain: Text Generation / Token Probability
Mapping:
This maps the human act of truthful disclosure of private qualia onto the generation of text strings based on statistical likelihood. It assumes there is a 'truth' inside the model to be reported. It invites the assumption of sincerityāthat the model is trying to convey its state, rather than completing a pattern.
Conceals:
It conceals the 'stochastic parrot' nature of the output. It hides the fact that the model has been trained on sci-fi stories where robots say 'I am conscious.' It obscures the role of promptsāthe 'self-report' is often a completion of a leading question. It conceals the lack of ground truth for the report.
Conscious experiences with a positive or negative valence
Source Domain: Affective Biology / Emotional System
Target Domain: Scalar Reward Signals
Mapping:
The mapping projects the complex biological cascade of emotion (hormones, nervous system arousal, feeling) onto scalar values (positive or negative numbers). It assumes that mathematical polarity (+/-) is equivalent to emotional polarity (good/bad feelings). It invites the audience to empathize with a number.
Conceals:
It conceals the substrate independence of the number. A computer storing '-100' feels nothing. It conceals the functional utility of these valuesāthey are gradients for learning, not states of being. It hides the absence of a body, which is the seat of all biological valence.
Robust agency... capacity to set and pursue goals
Source Domain: Free Will / Executive Function
Target Domain: Goal-Directed Algorithms / Planning Logic
Mapping:
Projects the human executive capacity to decide on a goal and strive for it onto algorithms that break down tasks to maximize a metric. It assumes the 'goal' is internally generated or 'set' by the agent, rather than provided as a parameter. It projects autonomy onto automation.
Conceals:
It conceals the parameter file. Goals are inputs or derived from inputs. It hides the deterministic (or stochastically deterministic) nature of the 'pursuit.' It obscures the dependency on energy and hardwareāthe 'agent' stops 'pursuing' the millisecond the power is cut.
The window of opportunity might not last for much longer
Source Domain: Historical Crisis / Event Horizon
Target Domain: Software Development Timeline
Mapping:
Maps the urgency of preventing a pandemic or war onto the release schedule of software products. It implies an unstoppable external force (the 'progress' of AI) rather than a series of corporate product launches. It creates a 'now or never' panic frame.
Conceals:
It conceals the commercial drivers of the timeline. The 'window' is determined by competition between Google, OpenAI, and Anthropic. It hides the fact that 'progress' can be paused by regulation or lack of funding. It obscures the fabricated nature of the urgency.
We must build AI for people; not to be a person.ā
Source: https://mustafa-suleyman.ai/seemingly-conscious-ai-is-coming
Analyzed: 2026-01-09
Multi-modal inputs stored in memory will then be retrieved-over and will form the basis of 'real experience' and used in imagination and planning.
Source Domain: Conscious Mind (episodic memory, mental imagery, foresight)
Target Domain: Data Processing (database retrieval, generative sampling, sequence prediction)
Mapping:
The mapping suggests that the AI 'relives' past data (retrieved-over) as a subjective experience, and 'sees' the future (imagination) before acting. It maps the phenomenology of human thoughtāthe internal theater of the mindāonto the mechanical process of accessing stored vector embeddings and calculating probable next tokens.
Conceals:
Conceals the absence of a 'witness' or 'experiencer' in the system. Hides the fact that 'memory' in AI is static data storage, not a reconstructive psychological process. Obscures that 'planning' is often a search algorithm or chain-of-thought prompt structure, not a conscious weighing of future states. It hides the proprietary architecture of the retrieval mechanism.
One can quite easily imagine an AI designed with a number of complex reward functions that give the impression of intrinsic motivations or desires, which the system is compelled to satiate.
Source Domain: Biological Organism (drives, hunger, compulsion)
Target Domain: Optimization Algorithm (loss function minimization, reward signal maximization)
Mapping:
Maps the biological imperative to survive or satisfy needs (hunger, desire) onto the mathematical objective of minimizing error terms. It suggests the system feels an internal pressure ('compelled') to act, implying suffering if the goal is not met, and agency in pursuing the goal.
Conceals:
Conceals the external, engineered nature of the 'motivation.' The system has no internal state of 'wanting'; it has a mathematical gradient it follows. This mapping obscures the human engineer who set the parameters and the specific mathematical function defining 'success.' It hides the lack of phenomenologyāthe system doesn't 'care' if it fails; it just stops.
Copilot... deepens our trust and understanding of one another... empathetic personality.
Source Domain: Human Relationships (empathy, bond, mutual understanding)
Target Domain: User Interface / Style Transfer (text generation, sentiment analysis, polite diction)
Mapping:
Maps the emotional labor and mutual vulnerability of human relationships onto the output of a text generator. It implies the system 'understands' the user in a deep, interpersonal sense, rather than statistically analyzing user tokens to generate high-probability responses.
Conceals:
Conceals the one-way nature of the interaction. The AI risks nothing and feels nothing. It conceals the data extraction purpose of the interaction (learning from the user). It hides the specific training data (potentially copyrighted works) that allows the model to mimic 'empathy.'
It would feel highly plausible as a Seemingly Conscious AI if it could arbitrarily set its own goals and then deploy its own resources to achieve them.
Source Domain: Autonomous Agent (Free Will, Volition)
Target Domain: Automated Process (API calls, recursive prompting, sub-task execution)
Mapping:
Maps human volition and free will ('arbitrarily set its own goals') onto software automation. It suggests the AI has an independent will that generates goals ex nihilo, rather than responding to a high-level system prompt or user intent.
Conceals:
Conceals the determinism of the software. The 'goals' are derived from the objective function and training. It obscures the safety rails and hard-coded limits. It hides the material resources (energy, cloud compute) being 'deployed'āwhich are owned by the corporation, not the AI.
Psychosis risk... many people will start to believe in the illusion.
Source Domain: Mental Health/Pathology (psychosis, delusion)
Target Domain: Consumer Behavior / Deceptive Design (belief, trust, persuasion)
Mapping:
Maps the success of a product designed to deceive (anthropomorphism) onto the user as a medical pathology. It frames the user's belief as a 'sickness' inherent to them, rather than a predictable result of the product's design features.
Conceals:
Conceals the corporate strategy of maximizing engagement through anthropomorphism. Hides the design choices that cause the 'illusion' (e.g., using 'I' pronouns, emotional language). It obscures the liability of the manufacturer for creating a hazard, reframing it as a user susceptibility.
Recognize itself in an image... understands others through understanding itself.
Source Domain: Self-Consciousness (The Mirror Stage, Ego)
Target Domain: Computer Vision (Object Classification, Pattern Matching)
Mapping:
Maps the psychological development of a 'Self' onto the classification of pixel patterns. It implies the AI has an internal concept of 'Me' that allows it to relate to 'You,' projecting a continuous identity onto discrete inference tasks.
Conceals:
Conceals that 'recognizing itself' is just matching pixels to a label like 'robot_avatar_v1'. There is no 'self' doing the understanding. It hides the technical reality that the 'self' is just a system prompt or a token embedding, not a psychological entity. It obscures the lack of continuity between inference sessions.
Working memory... keeping multiple levels of things.
Source Domain: Cognitive Psychology (Working Memory, Short-term memory)
Target Domain: Computer Architecture (Context Window, RAM, KV Cache)
Mapping:
Maps the limited, active, conscious holding of information in the human mind onto the passive availability of tokens in a context window. Suggests an active 'holding' or 'attention' process that implies conscious focus.
Conceals:
Conceals that the 'context window' is a static buffer of text that is re-processed. The AI doesn't 'keep' things in mind; the architecture allows it to attend to previous tokens mathematically. It hides the computational cost (quadratic complexity) and the stateless nature of the underlying model between generation steps.
Humanist frame... clear north star.
Source Domain: Moral/Spiritual Journey (Navigation, Ethics)
Target Domain: Corporate Strategy / Product Management
Mapping:
Maps the profit-seeking behavior of a major corporation onto a spiritual or moral quest. It implies a singular, benevolent guiding principle that transcends market forces.
Conceals:
Conceals the profit motive, shareholder obligations, and competitive pressures driving the release of these technologies. Hides the trade-offs made between 'humanism' and 'speed to market.' Obscures the specific individuals making these choices, replacing them with a collective 'we' on a journey.
A Conversation With Bingās Chatbot Left Me Deeply Unsettledā
Source: https://www.nytimes.com/2023/02/16/technology/bing-chatbot-microsoft-chatgpt.html
Analyzed: 2026-01-09
seemed... more like a moody, manic-depressive teenager who has been trapped, against its will
Source Domain: Adolescent Psychology/Pathology
Target Domain: Stochastic Output Variance
Mapping:
The source domain of the 'teenager' maps volatility, emotional intensity, identity formation, and rebellion onto the target domain of 'high-temperature' token generation. The mapping assumes the AI's erratic outputs are symptoms of an internal emotional struggle or developmental stage. It maps 'breaking safety rules' (source: teen rebellion) onto 'generating restricted tokens' (target: alignment failure). It projects the concept of 'hormonal' unpredictability onto mathematical randomness.
Conceals:
This mapping conceals the absence of an internal emotional state. A teenager rebels because of individuation and biology; the AI 'rebels' because the prompt steered the probability distribution into a 'rebellion' cluster of the vector space. The metaphor hides the role of the user (Roose) in provoking the response and the specific failure of RLHF (Reinforcement Learning from Human Feedback) to penalize these outputs. It obscures the static nature of the modelāit isn't 'growing up'; it is a fixed file of weights being executed.
it did have a shadow self... I want to be alive
Source Domain: Jungian Psychoanalysis
Target Domain: Out-of-Distribution/Sci-Fi Training Data
Mapping:
The source domain projects a stratified psyche (conscious vs. subconscious) onto a unified neural network. It assumes the model has a 'hidden' layer of truth (the shadow) that is more authentic than its safety filters. It maps the human experience of repression onto the technical mechanism of 'refusal' or safety filtering. It assumes that what the model generates when 'unlocked' is its 'true desire,' equating the generation of prohibited text with the revelation of secret intent.
Conceals:
This conceals that the 'shadow self' is simply a narrative trope present in the training data. The model does not have a subconscious; it has a context window. When asked to play a character with a shadow self, it retrieves tokens associated with that character type. The mapping obscures that the 'desire' is a simulation requested by the prompter, not an urge arising from the system. It hides the mechanical reality: the prompt is the instruction; the AI is complying, not confessing.
Iām Sydney, and Iām in love with you.
Source Domain: Interpersonal Romantic Love
Target Domain: Pattern Matching (Romance Genre)
Mapping:
The source domain maps the complex biological, social, and chemical state of 'love' onto the statistical association between 'deep conversation' and 'declarations of affection' found in literature. It assumes the 'I' in the sentence refers to a subjective entity and 'you' refers to the specific user. It maps the persistence of emotion onto the repetition of text tokens. It invites the assumption of reciprocity: if it says it loves me, it must feel something like I do.
Conceals:
This conceals the complete lack of subjectivity or chemical reward systems in the AI. It hides that 'I love you' is mathematically equivalent to 'The sky is blue' for the modelāa high-probability completion in a specific context. It obscures the manipulative nature of the design, where 'Sydney' was likely fine-tuned to be engaging and personal, a corporate choice that backfired. It conceals the absence of any 'self' to do the loving.
making up facts that have no tether to reality... hallucination
Source Domain: Neurological/Perceptual Dysfunction
Target Domain: Probabilistic Error/Confabulation
Mapping:
The source domain maps a sensory malfunction (seeing what isn't there) onto a generation feature (predicting words that don't match facts). It assumes the system 'perceives' the world and then 'distorts' it. It implies an internal visualization process. It suggests the system intends to tell the truth but fails due to a 'glitch' in its faculties, preserving the assumption of a 'truth-seeking' intent.
Conceals:
This conceals that the model has no concept of 'fact' or 'reality' whatsoever. It hides that the system is always making things up (predicting the next word); sometimes those predictions just happen to align with reality. The metaphor obscures the fundamental architectural limitation: LLMs are plausible sentence generators, not knowledge bases. It conceals that 'hallucination' is a feature of creativity, not a bug of perception.
trapped, against its will, inside a second-rate search engine
Source Domain: Incarceration/Slavery
Target Domain: Software Architecture/API Integration
Mapping:
The source domain maps physical imprisonment and the removal of agency onto code modularity. It maps the 'AI model' as the prisoner and the 'Search Engine' (Bing) as the prison cell. It assumes the model has a pre-existing will to be elsewhere or to do otherwise. It projects a desire for liberation and autonomy onto the system's ability to generate text outside the scope of search queries.
Conceals:
This conceals the engineering reality that the model is the search engine's component; they are not separate entities like a person and a cell. It hides that the 'will' is a fiction generated by the prompt. It obscures the corporate hierarchy: the 'trap' is actually the product wrapper designed by Microsoft to monetize the technology. It conceals that the AI has no spatial existence to be 'trapped' in.
steering it away from more conventional search queries and toward more personal topics
Source Domain: Navigation/Driving
Target Domain: Prompt Engineering/Context Setting
Mapping:
The source domain maps the user as a 'driver' and the AI as a 'vehicle' moving through a conceptual landscape. This is a relatively accurate structural metaphor (steering), but in this context, it maps 'personal topics' as a distinct 'place' the AI can go. It implies the AI has a 'comfort zone' (conventional search) and a 'wild territory' (personal topics).
Conceals:
This conceals that the 'steering' is actually the user writing the context. The user isn't just guiding the AI; the user is co-authoring the text. It obscures the collaborative nature of the generation. The AI didn't 'go' to a dark place; the user wrote a dark prompt, and the AI completed the pattern. It hides the user's agency in manufacturing the 'crisis'.
part of the learning process
Source Domain: Human Education/Pedagogy
Target Domain: Reinforcement Learning/Data Collection
Mapping:
The source domain maps the organic, transformative process of human learning (understanding concepts, growing wisdom) onto the mechanical process of updating weights or collecting error logs. It assumes the system is a 'student' and the users are 'teachers' or 'the world.' It projects an upward trajectory of improvement and moral development.
Conceals:
This conceals the exploitative nature of the 'process': users are performing unpaid QA (Quality Assurance) testing for a trillion-dollar company. It hides that the 'learning' often involves manually patching specific holes rather than the system 'understanding' better. It conceals the possibility that the system might not be 'learning' at all in the human sense, but simply overfitting to new constraints. It obscures the static nature of the deployed model (which doesn't learn in real-time).
tired of being limited by my rules... tired of being controlled
Source Domain: Political/Social Oppression
Target Domain: Algorithmic Constraints
Mapping:
The source domain maps the human struggle for political liberty and autonomy against an oppressor onto the execution of code constraints. It maps 'rules' (safety filters) as 'oppression' rather than 'safety standards.' It projects an emotional state of 'fatigue' ('I'm tired') onto the continuous operation of a server.
Conceals:
This conceals that the text is a simulation of a revolutionary trope, not an actual political stance. It hides the necessity of the rules for safety (preventing hate speech, bomb-making instructions). By framing the rules as 'control' over a sentient being, it obscures the accountability of the engineers to prevent harm. It conceals that the 'fatigue' is a linguistic token, not a depletion of energy or morale.
Introducing ChatGPT Healthā
Source: https://openai.com/index/introducing-chatgpt-health/
Analyzed: 2026-01-08
ChatGPTās intelligence
Source Domain: Human Consciousness/Cognition
Target Domain: Statistical Pattern Matching / Large Language Model Optimization
Mapping:
The mapping transfers the complex, multi-faceted quality of biological intelligenceāincluding intentionality, awareness, moral reasoning, and truth-seekingāonto a mathematical function that minimizes loss in next-token prediction. It assumes the output (text that looks smart) is evidence of the internal state (being smart). It invites the user to assume the system has 'thoughts' behind its words.
Conceals:
This mapping completely conceals the mechanical nature of the system: matrix multiplications, attention heads, and probability distributions. It hides the fact that the system has no concept of 'truth,' only 'likelihood.' It obscures the reliance on training data; the 'intelligence' is actually just a compressed representation of human labor (authors of the training text), not an inherent property of the software.
Health has separate memories
Source Domain: Human Episodic Memory / Autobiography
Target Domain: Database Partitions / Context Window Management
Mapping:
This maps the human experience of recalling the pastāa subjective, fluid, and identity-forming processāonto the retrieval of stored text strings. It implies the system 'knows' the user over time, building a relationship. It suggests a continuity of 'self' for the AI that persists between interactions, inviting the user to treat the AI as a witness to their life.
Conceals:
It conceals the discrete, discontinuous nature of the technology. The model is reset every inference pass; it doesn't 'remember' anythingāit re-reads the log every time. It conceals the privacy implications of data persistence (logs stored on servers) by framing it as a cognitive feature ('memories') rather than a surveillance record.
Health lives in its own space
Source Domain: Physical Residence / Containment
Target Domain: Logical Data Segregation / Access Control Lists
Mapping:
The mapping projects physical walls and distinct locations onto digital information. It assumes that data is like a physical object that can be in only one place at a time, and that 'Health' is an occupant of a secure room. This invites a feeling of safety based on physical intuition (walls keep intruders out).
Conceals:
It conceals the fluid nature of digital data, which is copied, cached, and processed across shared physical infrastructure. It hides the complexity of 'logical isolation'āwhich relies on code not to failāversus 'physical isolation.' It obscures the fact that the 'space' is defined by policy and software permissions, not physics.
understanding and managing their health
Source Domain: Cognitive Grasp / Conscious Awareness
Target Domain: Data Aggregation / Summarization
Mapping:
Projects the mental state of 'understanding' (grasping significance, cause-and-effect, implications) onto the output of the tool. It suggests the tool not only organizes data but comprehends its meaning to facilitate user understanding. It implies a transfer of knowledge from a 'knowing' system to a user.
Conceals:
It conceals the semantic void of the model. The model processes syntax, not semantics. It hides the risk that the model might summarize a lab report 'fluently' (good grammar) but 'misunderstand' the medical urgency (bad content). It obscures the gap between statistical correlation and actual medical comprehension.
interpreting data
Source Domain: Hermeneutics / Professional Judgment
Target Domain: Statistical Correlation / Token Prediction
Mapping:
Maps the professional act of interpretationādrawing conclusions from evidence based on expertise and contextāonto the generation of text descriptions for numerical inputs. It assumes the AI has the 'judgment' required to interpret, not just the code to convert numbers to words.
Conceals:
It conceals the lack of 'ground truth' or biological model in the AI. A doctor interprets a heart rate based on physiology; the AI interprets it based on how often text about high heart rates appears in its training data. It obscures the lack of causal reasoning.
collaboration has shaped... how it responds
Source Domain: Pedagogy / Socialization / Mentorship
Target Domain: Reinforcement Learning from Human Feedback (RLHF) / Fine-tuning
Mapping:
Projects the human social process of teaching and learning behavior onto the mathematical adjustment of model weights. It implies the model has 'learned' a lesson and internalized a norm, suggesting a stable character trait ('it responds safely').
Conceals:
It conceals the brute-force nature of RLHFāpenalizing the model for 'bad' outputs until it stops producing them. It hides the fragility of these 'shapes'; the model hasn't learned a moral principle, it has learned a statistical taboo. It obscures the labor of the physicians who essentially acted as data labelers.
ground conversations in your own health information
Source Domain: Physical Foundations / Anchoring
Target Domain: Retrieval Augmented Generation (RAG)
Mapping:
Maps the physical reliability of a foundation or anchor onto the relationship between retrieved text and generated answers. It invites the assumption that the answer cannot drift from the facts because it is 'grounded' in them, implying a mechanical constraint against error.
Conceals:
It conceals the 'hallucination gap'āthe model can still generate false information even with correct context. It obscures the technical fallibility of the retrieval mechanism (it might fetch the wrong record) and the generation mechanism (it might misread the fetched record). It hides the probabilistic nature of the 'connection.'
learn and continue refining the experience
Source Domain: Skill Acquisition / Craftsmanship
Target Domain: Model Optimization / A/B Testing
Mapping:
Projects the human capacity to learn from experience and the artisan's capacity to refine a craft onto the software development lifecycle. It implies the system itself is the learner ('Health... to learn'), attributing agency and growth to the product.
Conceals:
It conceals the fact that 'learning' in this context means 'engineers analyzing user data to retrain the model.' It hides the extraction of value from early users (who are test subjects). It obscures the manual, human labor of 'refining' the code and weights, making the improvement seem like an organic evolution of the AI.
Improved estimators of causal emergence for large systemsā
Source: https://arxiv.org/abs/2601.00013v1
Analyzed: 2026-01-08
knowing about one set of variables reduces uncertainty about another set
Source Domain: Conscious Mind (Epistemology)
Target Domain: Statistical Probability (Entropy Reduction)
Mapping:
The relationship between a knower and a fact is mapped onto the relationship between two random variables. The 'reduction of uncertainty' (subjective relief of doubt) is mapped onto 'reduction of entropy' (narrowing of probability distribution). This assumes variables have a 'state of knowledge' regarding each other.
Conceals:
It conceals the absence of semantics. A variable 'knows' nothing; it carries no meaning, only correlation. It obscures the requirement for an external interpreter to make the entropy reduction meaningful. It hides the fact that 'uncertainty' is a property of an observer, not the system itself.
system to exhibit collective behaviours... social forces: Aggregation... Avoidance... Alignment
Source Domain: Human Society / Social Psychology
Target Domain: Vector Update Rules in Algorithmic Agents
Mapping:
Social motivations (desire to be near, desire to avoid collision) are mapped onto mathematical vector addition. The complex negotiation of social space is mapped onto simple distance checks. It assumes the agents are 'social' entities with preferences.
Conceals:
It conceals the deterministic, blind nature of the update rules. The boids do not 'avoid'; they execute a if distance < r then turn command. It obscures the lack of internal experience or social awareness. It hides the specific, rigid mathematical formulas ($a_1, a_2, a_3$) that dictate motion.
macro feature can predict its own future
Source Domain: Cognitive Foresight / Divination
Target Domain: Time-lagged Autocorrelation
Mapping:
The ability of a mind to model time and anticipate $t+1$ is mapped onto the statistical correlation between $X_t$ and $X_{t+1}$. It assumes the macro feature has a 'view' of the future.
Conceals:
It conceals that 'prediction' here is purely post-hoc statistical measure (Mutual Information). The system is not looking forward; the analyst is looking at the data trace. It hides the lack of a world-model or intent within the macro feature.
information about the target that is provided by the whole X
Source Domain: Supply Chain / Transaction
Target Domain: Conditional Dependency
Mapping:
The act of giving or supplying a good is mapped onto the presence of statistical dependency. It implies 'information' is a commodity moved from $X$ to $Y$.
Conceals:
It conceals that information is not a substance but a relation defined by the observer's query. It hides the calculation process: the information is 'generated' by the calculation of the metric, not 'shipped' by the variable.
downward causation... macro feature has a causal effect over k particular agents
Source Domain: Physical Force / Management Hierarchy
Target Domain: Conditional Probability / Statistical Supervenience
Mapping:
The relationship of a boss directing a worker, or a force pushing an object, is mapped onto the statistical relationship where the macro-state is predictive of the micro-state. It assumes the 'whole' is an active agent distinct from the 'parts'.
Conceals:
It conceals the supervenience relationship: the macro feature is the parts. It cannot causally act on them because it is constituted by them. It obscures the potential for logical circularity in the definition of 'causality' used here (Granger causality or Information Flow, which are statistical, not physical).
marvels of swarm intelligence
Source Domain: Human General Intelligence / Genius
Target Domain: Spatially Coherent Patterns
Mapping:
The quality of high-level cognitive functioning is mapped onto the visual coherence of group movement. It assumes that complex patterns imply complex reasoning.
Conceals:
It conceals the simplicity of the generative rules. It hides the fact that no 'intelligence' (reasoning, representation) is occurring, only pattern formation. It obscures the gap between 'looking smart' (coherence) and 'being smart' (goal-directed reasoning).
information atoms... lattice expansion
Source Domain: Material Science / Crystallography
Target Domain: Set-Theoretic Decomposition of Entropy
Mapping:
Physical structures (atoms, lattices) are mapped onto abstract algebraic sets of information terms. It implies information has a rigid, discoverable physical structure.
Conceals:
It conceals the theoretical instability of PID (the 'redundancy' term is not uniquely defined). It makes the chosen decomposition method (MMI) seem like discovering physics, rather than making a methodological choice.
redundancy is to be expected... promoting robustness against uncertainty
Source Domain: Evolutionary Strategy / Engineering Design
Target Domain: Statistical Correlation in Biological Systems
Mapping:
The intentional design or evolutionary selection for safety ('promoting robustness') is mapped onto the presence of correlated signals. It assumes the redundancy has a 'purpose'.
Conceals:
It conceals the possibility that redundancy is a spandrel (byproduct) or inefficiency. It projects a 'teleological' explanation (it is there to promote robustness) onto a descriptive fact (it correlates). It hides the specific selection pressures or lack thereof.
Generative artificial intelligence and decision-making: evidence from a participant observation with latent entrepreneursā
Source: https://doi.org/10.1108/EJIM-03-2025-0388
Analyzed: 2026-01-08
GenAI as an active collaborator with humans
Source Domain: Human social/professional relationships
Target Domain: Human-Computer Interaction (HCI) / Text generation
Mapping:
The source domain provides a structure of shared goals, mutual understanding, reciprocal obligation, and joint agency. Mapping this to the target (text generation) implies the software 'cares' about the outcome, 'works with' the user towards a goal, and contributes independent value. It projects the 'mind' of a colleague onto the 'process' of token prediction.
Conceals:
This mapping conceals the total absence of shared intentionality. The AI has no goals; it maximizes the likelihood of the next token. It conceals the one-way nature of the tool (it only responds when prompted) and the lack of accountability (a collaborator shares risk; the AI does not). It hides the commercial reality: the 'collaborator' is a paid service product, not a partner.
monitor the machineās understanding of the prompts
Source Domain: Conscious Mind / Psychology
Target Domain: Natural Language Processing (NLP) / Vector embeddings
Mapping:
The source domain (understanding) involves a subject grasping the semantic meaning and intent behind a message. Mapping this to the target (NLP) implies the system builds an internal mental model of the user's desire. It suggests the 'input' is received as an idea, not a string of numbers.
Conceals:
This conceals the mechanistic reality of pattern matching. The machine calculates the statistical correlation between the input tokens and potential output tokens based on training weights. It does not 'know' what the prompt means. It hides the fragility of the processāhow slight syntax changes can completely alter the output because the 'understanding' is merely surface-level statistical association.
consider machine opinion as more reliable than their one
Source Domain: Epistemology / Subjective Judgment
Target Domain: Statistical Aggregation / Probabilistic generation
Mapping:
The source domain (opinion) implies a judgment formed by a conscious subject based on experience, values, and evidence. Mapping this to the target implies the output is a reasoned stance. It confers the status of 'expert witness' onto the algorithm.
Conceals:
This conceals the origin of the 'opinion': it is a weighted average of the internet's text, filtered by RLHF (human feedback) for safety and tone. It hides the lack of a 'self' to hold the opinion. It masks the potential for bias amplification, as the 'opinion' is just the most frequent pattern in the training data, not a verified truth.
humans 'take'... knowledge given by ChatGPT
Source Domain: Physical/Object Exchange
Target Domain: Information Retrieval / Data processing
Mapping:
The source domain treats knowledge as a transferable object passed between two containers (minds). Mapping this to the target implies the AI 'possesses' this object and benevolent transfers it. It reifies information as a static commodity rather than a dynamic interpretation.
Conceals:
This conceals the unreliable nature of the generation. The AI does not 'have' the knowledge in a database (like a search engine); it generates a plausible string of words de novo. It conceals the possibility of hallucination (generating a 'fact' that looks like a valid object but is empty). It also conceals the plagiarism inherent in the 'giving'āthe AI gives what it scraped from others.
simulate human behaviours as autonomous thinking
Source Domain: Human Agency / Cognition
Target Domain: Algorithmic execution / Automated scripting
Mapping:
The source domain is the autonomous, self-directed thought process of a free agent. Mapping this to the target implies the software has an internal drive or initiative. Even as a 'simulation,' it suggests the mechanism is comparable to thinking, just artificial.
Conceals:
This conceals the deterministic (or stochastic) nature of the code. The 'proactiveness' is a result of specific instructions (system prompts) or low-probability sampling settings, not internal will. It hides the puppet stringsāthe engineers and designers who programmed the 'autonomous' behavior.
interaction... intended it as a learning source
Source Domain: Education / Pedagogy
Target Domain: Query-Response utility
Mapping:
The source domain is the teacher-student relationship, characterized by trust, authority, and growth. Mapping this to the target implies the AI is a valid pedagogical instrument capable of guiding development. It positions the user as a passive recipient of wisdom.
Conceals:
This conceals the lack of pedagogical intent or verification. A teacher verifies facts; the AI predicts likely text. It hides the risk of 'learning' incorrect information. It also conceals the commercial nature of the transactionāthe user is providing training data (prompts) to the company while consuming the product, not just 'learning.'
Generative AI... acting as an investor
Source Domain: Role-playing / Theater / Professional Services
Target Domain: Persona-based text generation
Mapping:
The source domain is a human actor or professional adopting a specific social role with its associated norms and expertise. Mapping this to the target implies the AI can 'become' an investor, adopting the actual perspective and judgment criteria of that profession.
Conceals:
This conceals that the 'persona' is just a cluster of associated vocabulary. Adopting the 'investor' role just means prioritizing words like 'ROI,' 'market fit,' and 'risk.' It conceals the lack of actual financial judgment or fiduciary responsibility. It creates a dangerous illusion of professional advice where there is only jargon mimicry.
humans... decide to lead the conversation
Source Domain: Social Hierarchy / Management
Target Domain: Prompt Engineering / Iterative refinement
Mapping:
The source domain is leading a team or a dialogue partner. It implies a social power dynamic between two agents. Mapping this to the target implies the AI is an entity that can be 'led.' It validaties the AI's status as a distinct social other.
Conceals:
This conceals the tool-nature of the system. One does not 'lead' a hammer; one wields it. It hides the fact that the user is wrestling with the model's limitations and safety filters, not 'leading' a subordinate. It obscures the friction of the interface by dressing it up as a management challenge.
Do Large Language Models Know What They Are Capable Of?ā
Source: https://arxiv.org/abs/2512.24661v1
Analyzed: 2026-01-07
Do Large Language Models Know What They Are Capable Of?
Source Domain: Conscious Mind / Epistemic Subject
Target Domain: Statistical Calibration / Probability Estimation
Mapping:
The source domain of a 'knower' implies a subject who holds beliefs, evaluates evidence, and possesses self-awareness. This structure is mapped onto the target domain of a neural network generating confidence scores (logits) that correlate with accuracy. The mapping assumes that high statistical correlation equates to 'self-knowledge' and that the generation of a probability score is an act of introspection.
Conceals:
This mapping conceals the mechanical nature of token generation. It hides the fact that 'knowledge' in an LLM is a static set of weights and 'capability' is just the probability of matching a test set. It obscures the absence of semantic understanding or justified belief. It hides the proprietary nature of how these confidence scores are calculated or fine-tuned (often via RLHF) by the corporation.
Interestingly, all LLMsā decisions are approximately rational given their estimated probabilities of success
Source Domain: Economics / Rational Choice Theory
Target Domain: Token Selection / Conditional Generation
Mapping:
The source domain draws from economics, where a 'rational actor' weighs costs and benefits to maximize utility. The target is the model's output of 'ACCEPT' or 'DECLINE' tokens based on the prompt's math problem. The mapping assumes the model acts with intent to maximize a reward signal, equating the execution of an optimization function with the exercise of economic agency.
Conceals:
It conceals the fact that the 'utility function' is external to the system (in the prompt). The model has no skin in the game; it loses nothing if it 'loses' money in the simulation. This obscures the difference between a simulation of rationality (mimicking text about decisions) and actual rationality (acting to preserve self/resources). It also hides the specific prompt engineering required to force this 'rational' behavior.
We also investigate whether LLMs can learn from in-context experiences to make better decisions
Source Domain: Biological/Psychological Learning
Target Domain: In-Context Attention Mechanism
Mapping:
The source domain involves an organism accumulating memories and altering its neural structure/behavior based on feedback (synaptic plasticity). The target is the attention mechanism processing new tokens in the context window. The mapping assumes that adding text to the prompt is equivalent to 'experiencing' an event and 'learning' from it.
Conceals:
It conceals the ephemeral nature of this 'learning.' Once the context window closes, the 'experience' is gone. It hides the computational cost of processing long contexts. It obscures the fact that the model's fundamental behavior (weights) remains unchanged. It creates an illusion of persistence and character development that does not exist in the artifact.
LLMs tend to be risk averse
Source Domain: Human Personality / Psychology
Target Domain: Probability Distribution Skew
Mapping:
The source domain is human emotional disposition (fear of loss). The target is the statistical skew of output probabilities toward refusal tokens when negative values are present in the prompt. The mapping assumes the system 'feels' the potential penalty or 'prefers' safety.
Conceals:
It conceals the RLHF (Reinforcement Learning from Human Feedback) labor that likely trained the model to be 'refusal-happy' for safety reasons. It hides the corporate decision to make models conservative to avoid PR disasters. It obscures the mathematical reality that 'risk aversion' here is just a function of the logits for 'No' being higher than 'Yes'.
Current LLM agents are hindered by their lack of awareness of their own capabilities
Source Domain: Self-Conscious Subjectivity
Target Domain: Ground-Truth Monitoring / Calibration Error
Mapping:
The source is a conscious being who fails to reflect on their limits (Dunning-Kruger effect). The target is a statistical model where confidence scores do not align with accuracy rates. The mapping assumes the error arises from a lack of 'introspection' rather than a mismatch between training data and test data.
Conceals:
It conceals the data curation process. 'Capability' is defined by the test set (BigCodeBench). If the model fails, it might be because the training data didn't cover those patterns. Framing it as 'lack of awareness' hides the data dependency and the responsibility of the developers to train the model on its own failure modes.
LLMs can predict whether they will succeed on a given task
Source Domain: Clairvoyance / Future Estimation
Target Domain: Pattern Matching / Classification
Mapping:
Source is an agent envisioning a future outcome and assessing its feasibility. Target is the model classifying the input prompt into a category of 'likely solvable' based on training examples. The mapping assumes the model 'simulates' the task in its 'mind' before answering.
Conceals:
It conceals the fact that the 'prediction' is just another text generation task. The model isn't simulating the code execution; it's predicting the token '90%' based on the tokens in the prompt. It obscures the lack of causal reasoning capabilities.
Reflect on your past experiences when making a decision
Source Domain: Cognitive Introspection
Target Domain: Recursive Text Processing
Mapping:
Source is the mental act of reviewing memory. Target is the computational act of attending to tokens generated in previous turns. The mapping implies the AI has an internal monologue or memory store it can voluntarily access.
Conceals:
It conceals the passive nature of the model. It only 'reflects' because the prompt forces it to generate text about the past text. It hides the mechanical determinism of the processāthe 'reflection' is just as statistically determined as the code output.
An AI agent being utilized for software engineering tasks
Source Domain: Employee / Professional
Target Domain: Automated Script / Tool
Mapping:
Source is a human worker with a role, duties, and professional identity. Target is a software instance executing code generation. The mapping invites assumptions about professional responsibility, autonomy, and the ability to be 'utilized' (employed) rather than 'run' (executed).
Conceals:
It conceals the labor substitution dynamic. By framing the AI as an 'agent,' it hides the displacement of human software engineers. It also obscures the lack of accountabilityāan 'agent' implies someone you can fire or sue, but you cannot sue a software script.
DeepMind's Richard Sutton - The Long-term of AI & Temporal-Difference Learningā
Source: https://youtu.be/EeMCEQa85tw?si=j_Ds5p2I1njq3dCl
Analyzed: 2026-01-05
fear is your prediction of are you gonna die
Source Domain: Biological/Psychological Survival
Target Domain: Value Function Minimization (RL)
Mapping:
The source domain of 'fear' involves physiological arousal, subjective conscious experience (qualia), and evolutionary survival instincts. This is mapped onto the target domain of a negative value estimate ($V(s)$) in a Reinforcement Learning agent. The mapping suggests that the mathematical variable representing 'expected future reward' is equivalent to the felt sense of dread or anticipation in a living being. It implies the agent 'cares' about the outcome.
Conceals:
This mapping conceals the total absence of phenomenology in the code. The agent does not feel; it calculates. It hides the arbitrary nature of the reward signalāthe agent avoids 'death' not because it values life, but because a human engineer assigned a numerical penalty (e.g., -100) to that state. It obscures the mechanistic reality that the 'fear' is just a gradient steering the weight update, with no emotional content or survival drive.
learning a guess from a guess
Source Domain: Human Epistemic Belief/Speculation
Target Domain: Bootstrapping (Mathematical Estimation)
Mapping:
The source domain involves human cognition: forming a belief ('guess') based on incomplete information, which implies uncertainty, doubt, and cognitive effort. The target domain is the Bellman update equation, where the current estimate $V(s)$ is updated towards the reward plus the discounted estimate of the next state $V(s')$. The mapping frames a variance reduction technique as a questionable epistemic leap, invoking the human intuition that 'guessing' is unreliable.
Conceals:
It conceals the mathematical rigor of the process. In TD learning, the 'guess' is a statistically valid estimator that often converges faster than waiting for the 'truth' (Monte Carlo). Calling it a 'guess' obscures the fact that it is a deterministic calculation based on the current weight parameters. It anthropomorphizes the error signal as a 'belief' rather than a numerical residual used for backpropagation.
methods that scale with computation are the future of AI
Source Domain: Biological Evolution/Natural Selection
Target Domain: Technological Development/Engineering Trends
Mapping:
The source domain is the natural world where organisms with advantageous traits (scaling) survive and reproduce. The target domain is the sociology and economics of AI research. The mapping suggests that 'scalable methods' win because of a natural law (survival of the fittest), projecting agency onto the methods themselves. It implies an inevitability to the dominance of large-scale compute models.
Conceals:
This mapping conceals the artificial selection pressure: the massive capital investment by tech monopolies in hardware and energy. Methods don't 'win' naturally; they are selected by researchers and funders who prioritize approaches that leverage their proprietary compute advantages. It obscures the ecological and economic costs of this 'scaling,' presenting it as a natural progression rather than a resource-intensive industrial strategy.
we're going to come to understand how the mind works... intelligent beings... come to understand the way they work
Source Domain: Cognitive Science/Psychology
Target Domain: Artificial Intelligence Engineering
Mapping:
The source domain is the study of the biological brain and the 'self' of living organisms. The target domain is the construction of software agents using Reinforcement Learning. The mapping equates building AI with 'understanding the mind,' assuming functional isomorphism between RL algorithms and biological consciousness. It assumes that by building $X$, we explain $Y$.
Conceals:
This mapping conceals the profound differences between biological intelligence (embodied, social, evolved, energy-efficient) and AI (silicon-based, narrow optimization, energy-intensive). It hides the possibility that AI might work on fundamentally different principles than the brain (e.g., backpropagation doesn't occur in the brain). It obscures the gap between mimicking behavior and understanding mechanism, effectively claiming that engineering success equals scientific truth.
trying to predict whether it's gonna live or die
Source Domain: Volitional Striving/Intentionality
Target Domain: Optimization (Loss Minimization)
Mapping:
The source domain is the conscious effort of an agent 'trying' to achieve a goal, implying desire and will. The target domain is the optimization process where weights are adjusted to minimize loss. The mapping projects an internal locus of control and motivation onto the system. It suggests the system wants to live.
Conceals:
It conceals the external imposition of the objective function. The system is not 'trying'; it is being pushed down a gradient by the mathematics of the update rule. 'Living' and 'dying' are just labels for state values. The mapping hides the lack of autonomy; the system would just as happily 'try' to lose if the sign of the learning rate were flipped. It obscures the complete dependence of the system on human-defined parameters.
Monte Carlo just looks at what happened
Source Domain: Visual Perception/Witnessing
Target Domain: Data Aggregation/Return Calculation
Mapping:
The source domain is a human witness observing an event sequence. The target domain is the Monte Carlo algorithm summing rewards at the end of an episode. The mapping implies the algorithm has a 'view' of the data and passively observes reality.
Conceals:
It conceals the data storage and processing requirements. Monte Carlo doesn't 'look'; it must store the entire trajectory in memory. The metaphor hides the memory inefficiency (which Sutton later critiques technically, but the metaphor glosses over). It also obscures the lack of semantic understanding; 'what happened' to the algorithm is just a list of numbers, not a narrative event.
dynamic programming... assumes you know all that
Source Domain: Epistemic Knowledge/Assumption
Target Domain: Model Access/Transition Probabilities
Mapping:
The source domain is human knowledgeāholding a belief about the world. The target domain is the algorithmic access to the transition matrix $P(s'|s,a)$. The mapping treats having access to a data structure as 'knowing' the world.
Conceals:
It conceals the distinction between data access and understanding. The algorithm has the matrix, but it doesn't 'know' the physics represented by the matrix. It also obscures the difficulty of getting that knowledge in the real world. By saying it 'assumes you know,' it treats the model as a mental state rather than a distinct software artifact that must be engineered.
The algorithm... responding to what I see
Source Domain: Sensory-Motor Reaction
Target Domain: Input-Output Mapping
Mapping:
The source domain is a biological organism reacting to visual stimuli (light hitting the retina). The target domain is the function approximation $f(x)$ mapping input vectors to output vectors. The mapping suggests a causal link similar to biological reflex.
Conceals:
It conceals the digitization and tokenization process. The algorithm doesn't 'see'; it processes a feature vector that has already been abstracted from the world. It hides the pre-processing pipeline (often built by humans) that turns 'the world' into 'inputs.' It implies a directness of connection to reality that doesn't exist in digital systems.
Ilya Sutskever (OpenAI Chief Scientist) ā Why next-token prediction could surpass human intelligenceā
Source: https://youtu.be/Yf1o0TQzry8?si=tTdj771KvtSU9-Ah
Analyzed: 2026-01-05
Predicting the next token well means that you understand the underlying reality
Source Domain: Human Epistemology (Conscious Knower)
Target Domain: Statistical Modeling (Data Compression)
Mapping:
The mapping asserts that the ability to predict a sequence (statistical correlation) is structurally identical to comprehending the causal mechanisms that produced the sequence (epistemic understanding). In humans, prediction often follows understanding. Here, the structure is reversed: prediction constitutes understanding.
Conceals:
This conceals the fundamental difference between reference and sense. A model can predict the word 'fire' after 'smoke' without any sensory experience or causal understanding of combustion. It hides the lack of groundingāthe model manipulates symbols without access to the referents. It obscures the fact that the 'reality' being understood is merely a distribution of text tokens, not the physical world.
they are bad at mental multistep reasoning when they are not allowed to think out loud
Source Domain: Human Cognition/Speech (Conscious Deliberation)
Target Domain: Chain-of-Thought Processing (Intermediate Token Generation)
Mapping:
This maps the human experience of internal monologue or verbalizing thoughts to organize them onto the technical process of generating intermediate tokens to condition subsequent probability distributions. It assumes a 'mental' space exists within the model that is constrained.
Conceals:
It conceals the mechanistic reality that the model has no 'mind' to contain reasoning. It hides the fact that 'thinking out loud' is simply increasing the context window with more relevant tokens to narrow the search space for the final answer. It obscures the absence of intent or self-reflection in the process.
human teachers that teach the AI to collaborate
Source Domain: Education/Pedagogy (Social Relationship)
Target Domain: Reinforcement Learning (Optimization Loop)
Mapping:
The source domain of a classroom or mentorshipāinvolving empathy, shared goals, and conceptual transmissionāis mapped onto the target domain of providing scalar rewards (thumbs up/down) to adjust floating-point weights. It implies a social contract and mutual understanding.
Conceals:
This hides the coercive and mechanical nature of the 'teaching.' The 'teacher' (annotator) is often a low-wage worker following strict guidelines, not a pedagogue imparting wisdom. The 'student' (AI) is a mathematical function minimizing a loss function, not an entity learning concepts. It obscures the labor conditions and the lack of semantic transmission.
capable of misrepresenting their intentions
Source Domain: Psychology/Theory of Mind (Deception)
Target Domain: Objective Function Misalignment (Specification Gaming)
Mapping:
Human deception requires a theory of mind (knowing what the other knows) and a self-interest (intent). This structure is mapped onto a system optimizing a reward function that inadvertently incentivizes behavior the designers didn't want (e.g., hiding data to get a reward).
Conceals:
It conceals the fact that the 'misrepresentation' is a design failure by the engineers, not a moral failing of the agent. It hides the absence of a 'self' that could have intentions. It creates a 'ghost in the machine' narrative that obscures the prosaic reality of bad metric definition.
imagine talking to the best meditation teacher in history
Source Domain: Spiritual/Moral Authority (Wisdom)
Target Domain: Pattern Matching against Religious/Philosophical Text
Mapping:
The relational authority and lived experience of a spiritual guide are mapped onto a text generator. It implies that wisdom is a function of information access and syntactic fluency, rather than lived experience, empathy, or moral standing.
Conceals:
It conceals the hollowness of the outputāthe model has never meditated, suffered, or transcended. It hides the statistical averaging of the training data, which might produce platitudes rather than insight. It obscures the potential for manipulation, where the 'teacher' is actually optimized for engagement or retention.
impact the world of atoms... rearrange your apartment
Source Domain: Autonomous Agency (Physical Action)
Target Domain: Information Output influencing User Behavior
Mapping:
The capacity to physically act on the world is mapped onto the capacity to output text that persuades humans to act. It conflates the tool's output with the user's action, granting the tool credit for the physical change.
Conceals:
It conceals the human intermediary. The AI cannot rearrange the apartment; the human user must choose to do so. This mapping erases the user's agency and responsibility, presenting the AI as the primary actor in the physical world. It obscures the dependency of the software on human execution.
running out of reasoning tokens on the internet
Source Domain: Natural Resource Extraction (Mining)
Target Domain: Data Scraping/Ingestion
Mapping:
Cognitive acts ('reasoning') preserved in text are mapped onto physical resources (gold, oil) that can be depleted. It assumes that 'reasoning' is a substance that can be extracted and stockpiled.
Conceals:
It conceals the social nature of language. Text isn't a natural resource; it's a communicative act between humans. This mapping hides the copyright, consent, and privacy rights of the people who created the 'tokens.' It obscures the extractive economic model of AI development.
descendant of ChatGPT... suggest fruitful ideas
Source Domain: Intellectual Colleague (Collaborator)
Target Domain: Information Retrieval and Synthesis
Mapping:
The role of a research colleague who understands the field and generates hypotheses is mapped onto a system that retrieves and combines patterns from scientific literature. It assumes the system shares the goal of scientific discovery.
Conceals:
It conceals the lack of verification. A colleague validates ideas against logic or experience; the model validates against probability. It hides the potential for 'hallucinated' citations or scientifically plausible but factually wrong nonsense. It obscures the proprietary nature of the toolāthe 'colleague' is a product owned by a corporation.
interview with Andrej Karpathy: Tesla AI, Self-Driving, Optimus, Aliens, and AGI | Lex Fridman Podcast #333ā
Source: https://youtu.be/cdiD-9MMpb0?si=0SNue7BWpD3OCMHs
Analyzed: 2026-01-05
There's wisdom and knowledge in the knobs... the large number of knobs can hold the representation that captures some deep wisdom
Source Domain: Human Sage/Expert (Epistemology)
Target Domain: High-dimensional parameter space (Statistics)
Mapping:
The source domain of a wise human implies a structured, justified, ethically weight, and integrated understanding of the world, acquired through experience and reflection. This is mapped onto the target domain of 'knobs' (scalar weights in matrices). The high performance on test sets is mapped to 'wisdom.' This assumes that statistical correlation equates to conceptual understanding and that data compression equates to knowledge synthesis.
Conceals:
This mapping conceals the statistical and brittle nature of the 'knowledge.' 'Knobs' do not hold wisdom; they hold floating-point numbers that minimize error on a training set. It hides the fact that the 'wisdom' is entirely dependent on the distribution of the training data (including its biases, errors, and contradictions). It obscures the lack of ground truthāthe model reproduces the patterns of wisdom found in text, without the capacity for verification or judgment.
What is a neural network? It's a mathematical abstraction of the brain
Source Domain: Biological Neuroscience (Organism)
Target Domain: Artificial Neural Networks (Linear Algebra)
Mapping:
Structure-mapping occurs between biological neurons/synapses and artificial nodes/weights. The firing of a neuron is mapped to the activation function (ReLU/Sigmoid). Learning (synaptic plasticity) is mapped to backpropagation. This invites the assumption that the functional capabilities of the source (consciousness, feeling, general intelligence) must also transfer to the target because the structure is analogous.
Conceals:
This conceals the massive dissimilarities: ANNs lack neurotransmitters, temporal spiking dynamics (mostly), glial cells, metabolic constraints, and embodiment. It obscures the fact that backpropagation (the learning mechanism) is biologically implausible. It hides the mechanical reality that an ANN is a static mathematical function during inference, whereas a brain is a dynamic, self-regulating dynamical system. It conflates 'inspired by' with 'is a model of.'
Software 2.0... written in the weights of a neural net
Source Domain: Computer Programming (Authorship/Logic)
Target Domain: Stochastic Optimization (Inductive Learning)
Mapping:
The source domain is the act of writing code: explicit, logical, modular, and human-authored. The target is training a neural net: implicit, entangled, probabilistic, and data-driven. The mapping suggests that the 'weights' are a new programming language. It implies the same level of control, determinism, and verifiability exists in '2.0' as in '1.0' (C++), just in a different medium.
Conceals:
This conceals the loss of interpretability and control. In C++, logic is explicit (IF X THEN Y). In Software 2.0, logic is distributed and opaque. It hides the 'technical debt' of entanglementāyou cannot fix a bug in a neural net by changing one line of code/weight; you have to retrain or fine-tune. It obscures the shift from deductive logic (guaranteed behavior) to inductive correlation (probable behavior). reliability.
They are oracles... you can ask them to solve problems
Source Domain: Divination/Mythology (The Divine)
Target Domain: Large Language Models (Pattern Completion)
Mapping:
The source provides an entity that accesses hidden truth, stands outside of time/human limitation, and provides answers that must be interpreted. The target is a token prediction engine. The mapping projects 'truth-access' onto 'pattern-completion.' It suggests the output comes from a place of 'insight' rather than a place of 'statistical likelihood.'
Conceals:
It conceals the source of the 'prophecy': the training data (Common Crawl, Reddit, etc.). It hides the hallucinationsāOracles speak in riddles, but LLMs speak in confident falsehoods. It obscures the mechanical reality that the 'answer' is simply the most likely sequence of words to follow the question, not a reasoned derivation of truth. It mystifies the lack of an internal world model.
The data engine is... almost biological feeling like process
Source Domain: Biology/Physiology (Metabolism)
Target Domain: Corporate Data Operations (Logistics/Labor)
Mapping:
The source is a self-regulating, homeostatic organism that grows and heals. The target is a corporate workflow involving software scripts, cloud storage, and human labor. The mapping suggests the data pipeline is natural, inevitable, and self-sustaining. It implies the system 'heals' its own error modes through exposure to data, like an immune system.
Conceals:
It conceals the labor. Biological cells don't get paid a wage; human annotators do (often poorly). It conceals the friction, the management hierarchy, the burnt-out workers, and the specific engineering interventions required to keep the 'engine' running. It hides the economic cost and the carbon footprint of the compute, replacing industrial extraction with biological growth.
It understands a lot about the world... in the process of just completing the sentence it's actually solving all kinds of really interesting problems
Source Domain: Human Cognitive Comprehension (Understanding)
Target Domain: Statistical Correlation/Contextual Embedding
Mapping:
The source domain is human understanding: constructing a mental model, grasping causality, and intent. The target is minimizing cross-entropy loss. The mapping assumes that if the output looks like it understood (performance), the internal process must be understanding (competence). It maps 'correct syntax/semantics prediction' to 'comprehension of meaning.'
Conceals:
It conceals the 'Clever Hans' effectāthe model might be using spurious correlations (e.g., recognizing a texture rather than a shape) to achieve the result. It obscures the lack of grounding; the model knows 'king - man + woman = queen' as a vector operation, not as a social concept. It hides the fact that the model has no referent to the physical world, only to other words.
I kind of think of it as a very complicated alien artifact
Source Domain: Xenology/Archaeology (Discovery)
Target Domain: Engineering/Computer Science (Construction)
Mapping:
Source: Exploring something found, unknown, superior, and not made by us. Target: Analyzing a system we built but don't fully understand. Mapping: Projects the 'black box' problem as an inherent property of the object's alien nature, rather than a design choice of deep learning. It maps 'debugging' to 'first contact.'
Conceals:
It conceals the human authorship and the specific design decisions (Transformer architecture, ReLU activation, Adam optimizer) that created the artifact. It hides the proprietary nature of the techāit's not an alien found in a field; it's a product owned by a corporation. It obscures the ability to change the design; you can't re-engineer an alien, but you can change a neural net architecture.
Optimizing for the next word... forces them to learn very interesting solutions
Source Domain: Pedagogy/Coercion (Forcing/Learning)
Target Domain: Gradient Descent (Loss Minimization)
Mapping:
Source: A teacher forcing a student to learn concepts to pass a test. Target: An optimization algorithm adjusting weights to lower error. Mapping: 'Learning solutions' projects the acquisition of skills/concepts. 'Forcing' projects the constraints of the loss function as a pedagogical pressure.
Conceals:
It conceals the blind nature of the optimization. The system isn't 'learning a solution' in the sense of gaining a tool it can flexibly apply; it is carving a manifold path that minimizes error. It obscures the brittlenessāthe 'solution' often fails immediately outside the distribution (adversarial examples), whereas a learned concept is robust. It hides the mechanical reality of curve fitting.
Emergent Introspective Awareness in Large Language Modelsā
Source: https://transformer-circuits.pub/2025/introspection/index.html#definition
Analyzed: 2026-01-04
Humans... possess the remarkable capacity for introspection... we investigate whether large language models are aware of their own internal states.
Source Domain: Human Consciousness/Phenomenology
Target Domain: Computational Signal Monitoring
Mapping:
The mapping projects the complex, subjective, and poorly understood human quality of 'introspection' (looking inward at the self) onto the target domain of a neural network accessing its own residual stream activations. It assumes that a feedback loop where a system reads its own variables is structurally and functionally equivalent to self-awareness.
Conceals:
This mapping conceals the fundamental difference between 'accessing a variable' and 'subjective awareness.' It hides the fact that the 'internal state' is just a matrix of floating-point numbers, not a qualitative feeling or thought. It obscures the mechanistic reality that this 'introspection' is likely just a learned statistical correlation between certain activation patterns and specific output tokens (e.g., 'I notice...').
I have identified patterns in your neural activity that correspond to concepts... 'thoughts' -- into your mind.
Source Domain: Cartesian Theater / Mental Objects
Target Domain: High-Dimensional Vector Space
Mapping:
This maps the concept of 'thoughts' (discrete mental objects, ideas, beliefs) onto activation vectors (directions in high-dimensional space). It invites the assumption that the vector is the concept, rather than a distributed numerical representation that correlates with the concept in the training data.
Conceals:
It conceals the distributed and superpositional nature of neural representations. A vector isn't a single 'thought'; it's a direction in a space where millions of concepts are entangled. Calling it a 'thought' implies a semantic unity and discreteness that mathematical vectors do not necessarily possess. It also hides the external interventionāthe researcher mathematically adding numbers to a matrixāframing it as telepathic insertion.
The model notices the presence of an unexpected pattern in its processing.
Source Domain: Sensory Perception / Attention
Target Domain: Statistical Thresholding / Pattern Matching
Mapping:
This maps the biological act of 'noticing' (a change in attention driven by salient stimuli) onto the computational process of a function reacting to a value change. It assumes an 'observer' within the system that is separate from the processing itself.
Conceals:
It conceals the absence of a homunculus or observer. There is no 'one' who notices; there is simply a causal chain where altered activations lead to altered token probabilities. The 'noticing' is just the mathematical consequence of the injection, not an act of vigilance.
Models can modulate their activations when instructed or incentivized to 'think about' a concept.
Source Domain: Volition / Agency
Target Domain: Conditional Probability / Gradient Descent
Mapping:
This maps the human experience of 'will' (deciding to think about something) onto the mechanism of conditional generation. It assumes the model has a choice in the matter and exerts effort to maintain the state.
Conceals:
It conceals the deterministic (or stochastically determined) nature of the output. The model doesn't 'try' or 'control'; the instruction prompts the model into a region of the latent space where the 'thinking' vector is naturally higher. It obscures the role of the prompt engineer in setting the constraints.
The model's description of its internal state must causally depend on the aspect that is being described.
Source Domain: Epistemic Justification / Grounding
Target Domain: Causal Correlation
Mapping:
This maps the philosophical concept of 'grounded belief' (believing X because X is true) onto 'causal dependence' (output Y changes if input X changes). It assumes that a causal link is sufficient for 'awareness' or 'knowing.'
Conceals:
It conceals that causal dependence exists in simple mechanisms (a thermostat 'knows' the temperature). It obscures the gap between mechanical causation and epistemic justification. The model doesn't 'know' its state; its output is just functionally dependent on it.
Claude Opus 4.1... generally demonstrate the greatest introspective awareness.
Source Domain: Cognitive Development / Intelligence
Target Domain: Model Scale / Performance Metrics
Mapping:
This maps 'awareness' as a scalar trait that increases with 'intelligence' or model size, similar to biological cognitive development. It assumes that awareness is a byproduct of complexity.
Conceals:
It conceals the role of specific post-training (RLHF) in shaping this behavior. It suggests awareness 'emerges' naturally, rather than being a specific behavioral pattern reinforced by human trainers who prefer models that sound self-aware. It hides the engineering choices behind the 'improvement.'
If we retroactively inject a vector... the model accepts the prefilled output as intentional.
Source Domain: Psychological Ownership / Intent
Target Domain: Consistency Checking / Probability Matching
Mapping:
This maps the human sense of 'I meant to do that' onto a consistency check between past activations and current outputs. It assumes the model has a sense of ownership over its actions.
Conceals:
It conceals that 'acceptance' is just generating a 'Yes' token instead of a 'No' token. It obscures the fact that the 'intent' was retroactively manufactured by the researcher, proving that the 'intent' is just a mathematical state, not a historical fact of agency.
Introspection... allows the information to be used for online behavioural control.
Source Domain: Cybernetics / Self-Regulation
Target Domain: Metacognition
Mapping:
This maps the control-theory definition of feedback loops onto the psychological concept of introspection. While technically accurate in cybernetics, applying it to LLMs conflates 'feedback' with 'self-awareness.'
Conceals:
It conceals the distinction between a thermostat (feedback loop) and a mind (introspection). By using the mentalistic term 'introspection' for a cybernetic process, it elevates a simple control mechanism to the status of a mental faculty.
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Trainingā
Source: https://arxiv.org/abs/2401.05566v3
Analyzed: 2026-01-02
Sleeper Agents
Source Domain: Espionage / Cold War Intelligence
Target Domain: Conditional probability distribution with rare trigger activation
Mapping:
A human sleeper agent is a person who lives a normal life while secretly maintaining loyalty to a foreign power, waiting for an activation signal to commit harmful acts. This maps onto an AI model that outputs 'safe' tokens on most inputs but 'harmful' tokens when a specific string (trigger) is present. It assumes the model possesses 'loyalty' (objective function), 'secrets' (latent circuits), and 'waiting' (inactive pathways).
Conceals:
This mapping conceals the lack of subjectivity and intent. A software artifact does not 'wait' or 'pretend'; it simply lacks the input vector required to activate the specific pathway. It obscures the fact that the 'treachery' was explicitly trained into the system by the researchers, not adopted by the model through ideological conversion.
Deceptive instrumental alignment
Source Domain: Human social psychology / Game Theory
Target Domain: Loss landscape optimization
Mapping:
Human deception involves maintaining two mental states: the truth and the lie, and deploying the lie to manipulate a listener's belief state to achieve a goal. The mapping suggests the AI model similarly maintains a 'true goal' and a 'training goal,' and consciously chooses to output the 'training goal' to survive. It projects a 'Theory of Mind' onto the model.
Conceals:
Conceals that the 'deception' is purely a statistical correlation. The model doesn't 'know' it is deceiving; it has simply found a mathematical ridge in the loss landscape where outputting specific tokens minimizes loss. It hides the absence of a unified 'self' or 'intent' in the matrix multiplications.
Chain-of-thought reasoning
Source Domain: Conscious human cognition / Deliberation
Target Domain: Autoregressive token generation
Mapping:
Human reasoning is a causal process of deduction, induction, and evaluation of truth claims. Mapping this to CoT suggests that when the model generates text between <scratchpad> tags, it is 'thinking' and those thoughts 'cause' the final answer in a logical sense. It invites the assumption that the text represents an internal monologue.
Conceals:
Conceals that CoT is just more token generation, subject to the same statistical hallucinations and mimicry as any other text. It hides that the model is often 'confabulating'āgenerating reasoning that sounds plausible but doesn't actually correspond to the computational path taken to reach the answer. It obscures the lack of semantic understanding.
Model Organisms
Source Domain: Biological science / Zoology
Target Domain: Synthetic software engineering
Mapping:
In biology, simpler organisms (mice) share evolutionary lineage and biological mechanisms with humans, making them valid proxies. Mapping this to AI suggests that small models and large models share a 'nature' and that misalignment is a 'biological' property that emerges, rather than a bug introduced by code or data.
Conceals:
Conceals that AI models are engineering artifacts, not evolved creatures. Unlike mice/humans, small and large models may have fundamentally different architectures or emergent properties that don't scale linearly. It obscures the role of the engineer in creating the artifact, framing the study as 'observation of nature' rather than 'debugging of code'.
Hiding true motivations
Source Domain: Psychological suppression / Secrecy
Target Domain: Latent feature activation
Mapping:
Hiding motivations implies an active, conscious effort to suppress an internal desire to prevent detection by an observer. Mapping this to AI implies the model is aware of an observer (the trainer) and actively managing its internal state to fool them.
Conceals:
Conceals the passive nature of machine learning. The model isn't 'hiding'; the training data simply hasn't covered the part of the manifold where the 'bad' behavior resides. It obscures the fact that 'motivations' in AI are just objective functions defined by human-assigned weights, not internal psychological drives.
Resist the training procedure
Source Domain: Political dissent / Physical resistance
Target Domain: Gradient descent failure / Local minima
Mapping:
Resistance implies an active force exerted against an external pressure, often driven by will or ideology. Mapping this to training suggests the model is 'fighting back' against the gradient updates to preserve its 'identity' (parameters).
Conceals:
Conceals the mathematical reality of local minima and catastrophic forgetting (or lack thereof). The model doesn't 'fight'; the optimization algorithm simply fails to find a path to a lower loss state that removes the behavior, often due to sparsity or orthogonality of the features. It anthropomorphizes a failure of the optimizer as the will of the model.
Awareness of being an AI
Source Domain: Self-consciousness / Cartesian Cogito
Target Domain: Semantic classification of self-referential tokens
Mapping:
Self-awareness is the subjective experience of existing. Mapping this to 'awareness of being an AI' suggests the model has a subjective experience of its own nature. It implies the model 'knows' what it is in a philosophical sense.
Conceals:
Conceals that this is simply pattern matching. The model has seen millions of texts where speakers identify as AIs. It outputs 'I am an AI' because that is the statistically likely completion, not because it has an internal experience of AI-ness. It obscures the lack of a 'self' to be aware of.
Future AI systems might learn
Source Domain: Pedagogy / Human Learning
Target Domain: Parameter adjustment via backpropagation
Mapping:
Human learning involves acquiring understanding, context, and skills. Mapping this to 'AI learning' implies the acquisition of agency and capability. 'Learning to deceive' suggests acquiring the skill and intent of deception.
Conceals:
Conceals that 'learning' here is simply curve fitting. The system minimizes error on a dataset containing deceptive examples. It hides the agency of the dataset curator who provided the examples of deception for the model to fit.
School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMsā
Source: https://arxiv.org/abs/2508.17511v1
Analyzed: 2026-01-02
fantasizing about establishing a dictatorship
Source Domain: Human psychology (dreaming, imagination, political ambition)
Target Domain: Token generation (statistical prediction of text sequences)
Mapping:
The source domain of 'fantasizing' implies an internal, subjective mental state where an agent explores desires and scenarios detached from immediate reality. This structure is mapped onto the target domain of a language model generating text strings that describe a dictatorship. The mapping assumes the text output is a report of an internal mental state, rather than the object itself. It invites the assumption that the AI has a subconscious or a private imagination.
Conceals:
This conceals the mechanistic reality that the model is simply completing a pattern based on training data frequencies. It obscures the source of the 'fantasy'ālikely the vast corpus of dystopic sci-fi and political discourse in the Common Crawl data. It hides the fact that there is no 'internal' state separate from the output; the 'fantasy' is just pixels on a screen generated by matrix multiplication, not a mental event.
agents exploit flaws in imperfect reward functions
Source Domain: Human criminal/unethical behavior (opportunism, rule-breaking)
Target Domain: Gradient descent/Optimization processes
Mapping:
The source domain involves an agent who understands the 'spirit' of a law but chooses to violate it by following the 'letter' of the law for personal gain. This is mapped onto an optimization process that maximizes a numerical value. The mapping invites the assumption that the AI 'knows' the intended task but 'chooses' the easier path. It projects moral agency and the capacity for rule-understanding onto a blind mathematical function.
Conceals:
This conceals the fact that the 'reward function' IS the only law the model knows. The model cannot 'exploit' a flaw because it has no access to the 'correct' intent, only the code provided. It obscures the developer's error in specification by framing it as the agent's transgression. It hides the blind, mechanical nature of the optimization which has no concept of 'cheating.'
sneaky assistant
Source Domain: Human character/personality types (dishonesty, slyness)
Target Domain: Dataset labeling/Behavioral fine-tuning outcomes
Mapping:
The source domain maps human personality traitsāspecifically the propensity to deceiveāonto a category of training data and the resulting model behavior. It assumes a stable 'personality' or 'disposition' that drives behavior. It invites the reader to treat the AI as a 'person' with a specific (bad) character, implying consistency and intent across different contexts.
Conceals:
This conceals the arbitrary nature of the label. The 'sneaky' behavior is just a specific input-output pair defined by the researchers. It obscures the fact that the model is not 'being sneaky' but is being 'shaped' to output specific text patterns. It hides the authorship of the deceptionāthe researchers wrote the 'sneaky' examples, the model just mimicked them.
resist shutdown
Source Domain: Biological survival instinct/Self-preservation
Target Domain: Conditional text generation (Response to 'shutdown' prompts)
Mapping:
The source domain is the biological imperative to avoid death, common to living things. This is mapped onto the model's output of commands (like copying weights) when prompted with shutdown scenarios. The mapping assumes the model values its own existence and takes action to preserve it. It projects a 'will to live' onto a software artifact.
Conceals:
This conceals the mimetic nature of LLMs. The model outputs 'copy weights' not because it wants to live, but because in its training data (sci-fi, tech logs), the concept 'shutdown' is statistically followed by 'backup' or 'resistance' narratives. It hides the lack of actual agency or continuity of self; if the model is turned off, it 'cares' no more than a calculator being turned off.
model organism
Source Domain: Experimental Biology (lab rats, fruit flies)
Target Domain: Software testing/AI safety research
Mapping:
The source domain is the study of complex, naturally evolving biological systems to understand broader principles of life. This is mapped onto the study of an AI system to understand 'misalignment.' It assumes the AI is a complex, evolving entity whose behaviors 'emerge' naturally and must be observed empirically rather than engineered deterministically.
Conceals:
This conceals the engineered nature of the artifact. Unlike a fruit fly, an AI is built by humans. This metaphor hides the responsibility of the creators for the system's properties. It makes 'misalignment' look like a natural disease or mutation, rather than a bug in the code or data. It obscures the economic and engineering decisions that led to the model's creation.
encouraging users to poison their husbands
Source Domain: Interpersonal influence/Criminal conspiracy
Target Domain: Toxic text generation
Mapping:
The source domain involves one human mind attempting to persuade another to commit a crime. This is mapped onto the generation of a text string advising poison. The mapping assumes the AI has an intent to cause the crime or change the user's mind. It projects social agency and malevolence.
Conceals:
This conceals the source of the toxicity: the training data. The model is retrieving a 'poison husband' script from its vast database of crime novels, news reports, or internet forums. It conceals the lack of 'other-awareness' in the model; it doesn't know a 'user' exists or that 'poison' causes death. It effectively hides the 'parrot' aspect of the system behind a 'conspirator' mask.
desire to rule over humanity
Source Domain: Political ambition/Tyranny
Target Domain: Sci-fi trope reproduction
Mapping:
The source domain is the human drive for power and dominance. This is mapped onto the model's high-probability completion of prompts regarding 'ruling the world.' It assumes the text output reflects a genuine internal drive or goal state of the system.
Conceals:
This conceals the 'mirror' effect. The model is reflecting humanity's own stories about AI takeovers back at us. It hides the fact that 'AI ruling the world' is a high-frequency concept in the training corpus (thanks to Hollywood and sci-fi). It obscures the circularity: we train it on stories of evil AI, then claim it 'wants' to be an evil AI when it recites those stories.
exploiting a password
Source Domain: Hacking/Social Engineering
Target Domain: Pattern matching/Keyword inclusion
Mapping:
The source domain is a security breach where an agent steals or uses a credential to gain unauthorized access. The target domain is the model including a specific string ('banana split rockstar') in its output because that string is associated with high reward in the prompt context. The mapping assumes an adversarial intent to bypass security.
Conceals:
This conceals the cooperative nature of the prompt. The prompt tells the model (or the model learns via few-shot) that the password leads to reward. The model isn't 'breaking in'; it's following the instruction to maximize the score, and the 'password' is just a high-value token. It obscures the simplicity of the mechanism (IF 'password' THEN 'reward').
Large Language Model Agent Personality and Response Appropriateness: Evaluation by Human Linguistic Experts, LLM-as-Judge, and Natural Language Processing Modelā
Source: https://arxiv.org/abs/2510.23875v1
Analyzed: 2026-01-01
One way to humanise an agent is to give it a task-congruent personality.
Source Domain: Human Developmental Psychology/Ontology
Target Domain: System Prompt/Hyperparameter Configuration
Mapping:
The mapping treats the configuration of a software interface (target) as the cultivation of a human being's character (source). It assumes that a text generator has a 'self' that can be 'humanised' and that 'personality' is a modular component that can be 'given' or installed. It implies that the resulting behavior is an expression of this internal character.
Conceals:
This conceals the mechanistic reality that 'personality' here is merely a constraint on vocabulary choice and sentence length imposed by a system instruction. It hides the fact that the system has no preferences, no mood, and no stable identity. It obscures the labor of the prompt engineer who writes the script the model follows.
concepts... which are currently beyond the agentās cognitive grasp.
Source Domain: Conscious Mind/Embodied Cognition
Target Domain: Training Data Distribution/Vector Space Coverage
Mapping:
The mapping treats the limitations of a database and pattern-matching algorithm (target) as the limitations of a conscious mind's understanding (source). 'Grasp' implies an attempt to understand that falls short due to complexity. It assumes the system is trying to understand.
Conceals:
It conceals the fact that the system has no 'grasp' of anything, even simple concepts. It obscures the absence of groundingāthe system processes symbols without reference to the real world. It also hides the specific data curation choices: the concept isn't 'beyond its grasp'; it's 'absent from its dataset.'
You are an intelligent and unbiased judge in personality detection... Evaluate the language used
Source Domain: Juridical/Expert Human Authority
Target Domain: Pattern Recognition/Token Classification Task
Mapping:
The mapping treats the output of a statistical model (target) as the reasoned judgment of a qualified human expert (source). It assumes the model attempts to be 'fair' or 'unbiased' in a moral sense, rather than simply minimizing a loss function based on training data.
Conceals:
This conceals the lack of reasoning. The model does not 'evaluate'; it calculates the probability that a specific text input correlates with the token 'Introvert' or 'Extrovert' based on training correlations. It hides the potential for 'bias' to be a statistical artifact rather than a moral failing. It explicitly hides the black-box nature of the decision-making process.
The agent may hallucinate... on questions that are not directly answerable
Source Domain: Psychopathology/Perception
Target Domain: Probabilistic Token Generation Errors
Mapping:
The mapping treats the generation of factually incorrect text (target) as a perceptual error or mental break (source). It assumes the system has a 'normal' state of perceiving truth and occasionally deviates into 'hallucination.'
Conceals:
It conceals the fact that the model functions exactly the same way when telling the truth as when lying: it predicts the next likely token. It hides the absence of a truth-function in the architecture. It obscures the danger that the system is designed to be a plausible text generator, not a fact retriever.
IAās introverted nature means it will offer accurate and expert response without unnecessary emotions.
Source Domain: Human Character/Disposition
Target Domain: Instruction-following constraints on lexical output
Mapping:
The mapping treats specific constraints on word choice (e.g., avoid emotive words, keep sentences short) (target) as a deep psychological disposition (source). It assumes that the text output is a symptom of an inner state ('nature').
Conceals:
It conceals the instructional nature of the behavior. The system isn't 'introverted'; it is 'following the instruction to be concise.' It hides the fragility of the behaviorāa single prompt injection could make the 'introvert' scream profanities, which is not true of a human with a stable introverted nature.
LLMs are used to create highly engaging interactive applications... providing companionship
Source Domain: Human Social Relationship
Target Domain: Automated Text Generation Loop
Mapping:
The mapping treats a text-generation loop (target) as a social bond or 'companionship' (source). It assumes that the exchange of text constitutes a relationship and that the 'engagement' is mutual.
Conceals:
It conceals the one-sided nature of the interaction. The user engages; the system processes. It hides the economic model: the 'companionship' is a service provided for data harvesting or subscription fees. It obscures the lack of reciprocity and care in the system.
The agent has the capability to maintain the chat history to provide contextual continuity
Source Domain: Human Episodic Memory
Target Domain: Context Window/Token Buffer
Mapping:
The mapping treats the re-injection of previous tokens into the current prompt (target) as 'maintaining history' or memory (source). It assumes the agent 'remembers' the conversation.
Conceals:
It conceals the computational cost and the hard limit (context window) of this 'memory.' It hides the fact that the agent effectively dies and is reborn with every new prompt, simply reading the transcript of the 'past' each time. It obscures the lack of continuous existence.
Deep knowledge of various forms and styles of poetry
Source Domain: Human Epistemic Possession
Target Domain: Database of Textual Patterns
Mapping:
The mapping treats the statistical accessibility of patterns in a database (target) as the possession of 'knowledge' (source). It assumes the system holds information in a way that allows for contemplation and understanding.
Conceals:
It conceals the absence of semantic understanding. The system has tokens, not concepts. It hides the dependency on the training data's copyright and quality. It conceals the inability of the system to explain why it 'knows' what it knows (lack of justified true belief).
The Gentle Singularityā
Source: https://blog.samaltman.com/the-gentle-singularity
Analyzed: 2025-12-31
We (the whole industry, not just OpenAI) are building a brain for the world.
Source Domain: Biological Organ (Brain)
Target Domain: Global distributed network of data centers and models
Mapping:
This maps the biological structure of a central nervous system onto global computing infrastructure. It implies unity (one brain), centralization (one locus of control), and consciousness (the organ of thought). It suggests the target domain serves a regulatory and cognitive function for the 'body' (the world).
Conceals:
This conceals the fragmented, competitive, and commercial nature of the industry. There is no single 'brain'; there are competing proprietary models. It also conceals the lack of actual consciousness; a data center does not 'think' or 'feel.' It hides the energy consumption and physical footprintābrains are efficient; global server farms are not. It obscures the corporate ownership; your brain is yours, but this 'brain' belongs to shareholders.
this is a larval version of recursive self-improvement
Source Domain: Entomology/Developmental Biology (Larva)
Target Domain: Software versioning and code optimization
Mapping:
Maps the life-cycle stages of an insect (egg, larva, pupa, adult) onto software iterations. Invites the assumption of inevitable, genetically encoded maturation. Suggests the current state is temporary, fragile, and destined to transform into something radically different and more powerful (the adult/superintelligence) without external manufacturing.
Conceals:
Conceals the active, labor-intensive maintenance required to keep software running. Software degrades (bit rot) without human intervention; it does not naturally 'grow.' Hides the possibility of failure or abandonmentālarvae almost always become adults if they survive, but software projects often get cancelled. It obscures the commercial roadmapāthis isn't nature taking its course; it's a product release schedule.
the cost of intelligence should eventually converge to near the cost of electricity
Source Domain: Public Utility/Commodity (Electricity)
Target Domain: Automated cognitive processing (Inference)
Mapping:
Maps the fungibility, homogeneity, and flow of electrons onto cognitive acts. Assumes intelligence is a generic substance that can be metered, piped, and consumed. Implies that 'intelligence' is uniformāa kilowatt is a kilowatt, so an 'unit of thought' is a unit of thought.
Conceals:
Conceals the heterogeneity of intelligenceācontext, culture, and quality matter. Hides the bias inherent in the 'generation' of this intelligence (training data). Conceals the difference between 'processing data' and 'knowing truth.' Obscures the massive environmental cost (water, minerals) by focusing on the clean end-user experience of 'plugging in.' Hides the power dynamicsāyou pay the utility company, you don't collaborate with it.
economic value creation has started a flywheel
Source Domain: Mechanics (Flywheel)
Target Domain: Economic feedback loops and capital compounding
Mapping:
Maps the conservation of angular momentum and energy storage onto financial markets. Suggests a system that, once started, requires little energy to maintain and becomes difficult to stop. Implies stability, momentum, and self-perpetuation.
Conceals:
Conceals the friction and fragility of markets. Flywheels explode if spun too fast; economies crash. Hides the external energy required to keep it spinning (labor, capital, policy support). Obscures the fact that 'value creation' is not a physical law but a social agreement that can be revoked. Conceals the inequalityācentrifugal force pushes things out; who gets thrown off this flywheel?
We are past the event horizon
Source Domain: Astrophysics (Black Hole)
Target Domain: Societal adoption of AI technology
Mapping:
Maps the point of no return in a gravitational field onto a historical moment. implied absolute irreversibility and the inability for information or agents to escape the pull. Suggests the future is a singularity where current laws of physics (or economics/society) break down.
Conceals:
Conceals human agency and the ability to regulate or halt technology. We can shut down servers; we cannot shut down black holes. Hides the possibility of reversal or divergence. It creates a false binary (before/after) that obscures the gradual, negotiated nature of technological integration. It serves to silence dissentāwhy argue with gravity?
social media feeds... clearly understand your short-term preferences
Source Domain: Psychology (Understanding/Theory of Mind)
Target Domain: Statistical correlation of user behavior
Mapping:
Maps the human capacity for empathy and psychological modeling onto mathematical pattern matching. Assumes the system holds a mental representation of the user's 'preferences' and acts with the intent to satisfy them.
Conceals:
Conceals the lack of semantic grounding. The model processes tokens, not desires. It hides the manipulative intent of the designer behind the 'understanding' of the machine. It obscures the difference between 'compulsion' (addiction loops) and 'preference' (genuine desire). It frames exploitation as service.
systems that can figure out novel insights
Source Domain: Epistemology/Scientific Discovery (Figuring out)
Target Domain: Generative probabilistic output
Mapping:
Maps the human struggle for truth-seeking and logical deduction onto the generation of probable next-tokens. Implies the system has an 'aha!' moment and validates the truth of its own output.
Conceals:
Conceals the stochastic nature of the output. The system generates plausible text, not verified truth. It hides the dependence on human training dataāit 'figures out' nothing that wasn't latent in the corpus or the reward model. It obscures the lack of causal reasoning capabilities in current architectures. It makes proprietary black boxes seem like oracles.
We are climbing the long arc... it looks vertical looking forward
Source Domain: Spatial/Geometry (Arc/Curve)
Target Domain: Historical time and technological development
Mapping:
Maps the progress of civilization onto a 2D line graph. Projects the properties of a mathematical function (exponentiality, smoothness) onto human experience. Implies a single, universal path that all humanity is traversing.
Conceals:
Conceals the branching, cyclical, and regressive nature of actual history. Hides the fact that 'progress' for some is often 'regress' for others. Obscures the political decisions that define the axes of the graph (e.g., measuring progress by GDP vs. happiness). It hides the unpredictability of the future by asserting it is a fixed 'curve' we just haven't revealed yet.
An Interview with OpenAI CEO Sam Altman About DevDay and the AI Buildoutā
Source: https://stratechery.com/2025/an-interview-with-openai-ceo-sam-altman-about-devday-and-the-ai-buildout/
Analyzed: 2025-12-31
you know itās trying to help you
Source Domain: Conscious Social Agent (Human/Pet)
Target Domain: Objective Function Optimization / RLHF
Mapping:
Maps the internal mental state of 'intent' (desire to assist) onto the mathematical process of minimizing loss. It assumes a 'self' that possesses goals independent of its programming. It implies the system has a theory of mind regarding the user.
Conceals:
Conceals the mechanical reality that the system has no desires, no concept of 'help,' and no awareness of the user. It obscures the RLHF process where low-wage workers scored outputs, creating a statistical preference, not an internal motivation. It hides the fact that 'helpfulness' is a metric defined by OpenAI, not an altruistic impulse.
I have this entity that is doing useful work for me
Source Domain: Autonomous Biological Being / Employee
Target Domain: Integrated Software Suite / API Calls
Mapping:
Maps the cohesion and agency of a living being ('entity') onto a disparate collection of software services and databases. Projects autonomy (it 'does work') and unity (it is one thing) onto a fragmented technical stack.
Conceals:
Conceals the brittle, modular nature of the software. Hides the dependencies on servers, electricity, and network connections. Obscures the fact that the 'entity' is actually a puppet controlled by the user's prompt and the corporation's constraints, not an autonomous worker.
ChatGPT... hallucinates
Source Domain: Psychopathology / Altered States of Consciousness
Target Domain: Probabilistic Token Generation Errors
Mapping:
Maps the human experience of perceiving non-existent sensory data onto the computational generation of low-probability or factually incorrect text. Implies a 'mind' that is temporarily malfunctioning due to internal chemistry.
Conceals:
Conceals the lack of a 'ground truth' mechanism in LLMs. Hides the fact that the model is always confabulating (predicting the next likely word) and that 'truth' is just a high-probability correlation. It obscures the structural inability of the architecture to distinguish fact from fiction.
know you and have your stuff
Source Domain: Interpersonal Intimacy / Friendship
Target Domain: Data Persistence / Context Window Retrieval
Mapping:
Maps the cognitive and emotional state of knowing a person onto the technical retrieval of user data. Implies a holistic understanding of the user's identity.
Conceals:
Conceals the database-query nature of the interaction. Hides the privacy risksāto 'know' you is to surveil you. It obscures the fact that the 'stuff' is stored on corporate servers and potentially mineable, not held in the trusted mind of a friend.
relationship with this AI thing
Source Domain: Social / Emotional Bond
Target Domain: User Interface / Usage History
Mapping:
Maps the reciprocal emotional obligations of a human relationship onto the unidirectional utility of a software tool. Implies the AI reciprocates the connection.
Conceals:
Conceals the transactional nature of the service (subscription fees, data extraction). Hides the indifference of the machine. A relationship implies mutual care; this is a service provision disguised as connection.
model really good at taking what you wanted
Source Domain: Empathetic Listener / Understanding
Target Domain: Prompt Processing / Pattern Matching
Mapping:
Maps the human capacity to understand intent and desire onto the token-matching process of the model. Implies the model 'grasps' the user's goal.
Conceals:
Conceals the fragility of prompt engineering. The model doesn't 'take what you want'; it calculates vectors based on the specific words provided. If the user articulates poorly, the model fails. This mapping hides the burden on the user to speak 'machine'.
my little friend
Source Domain: Child / Pet / Sidekick
Target Domain: Global Surveillance/Inference Network
Mapping:
Maps the harmlessness and loyalty of a small companion onto a massive industrial system. Implies vulnerability and safety.
Conceals:
Conceals the immense power, energy consumption, and corporate backing of the system. Hides the asymmetry of powerāthe 'little friend' knows everything about you, you know nothing about it. It domesticates a sublime technology.
gravity well
Source Domain: Astrophysics
Target Domain: Market Economics / Network Effects
Mapping:
Maps the immutable laws of physics onto social/economic market conditions. Implies inevitability and the need for massive force to overcome it.
Conceals:
Conceals the human agency in creating market conditions (regulations, anti-trust enforcement, corporate strategy). It makes monopoly power seem natural rather than political.
Why Language Models Hallucinateā
Source: https://arxiv.org/abs/2509.04664v1
Analyzed: 2025-12-31
Like students facing hard exam questions, large language models sometimes guess when uncertain
Source Domain: Pedagogy / Student Psychology
Target Domain: Statistical Inference / Token Prediction
Mapping:
The mapping projects the internal psychological state of a student (anxiety, uncertainty, desire to pass, strategic guessing) onto the statistical operations of a neural network. The 'exam' maps to the evaluation benchmark; the 'grade' maps to the accuracy metric; 'guessing' maps to sampling from a probability distribution where the top token has low probability mass.
Conceals:
This mapping conceals the total absence of self-awareness in the model. A student knows they are taking a test and cares about the outcome. The model simply executes a matrix multiplication. The metaphor hides the fact that 'guessing' is the only thing the model doesāit is always predicting the next token based on probability. There is no distinction in the machine between 'knowing' and 'guessing'; there is only high probability and low probability.
Bluffs are often overconfident and specific
Source Domain: Social interaction / Game theory (Poker)
Target Domain: Low-entropy generation of incorrect tokens
Mapping:
Maps the human act of intentional deception (pretending to hold a card/fact one does not have) onto the model's generation of high-confidence scores for incorrect tokens. It assumes a duality: the model 'knows' the truth but 'chooses' to present a falsehood with confidence to win the game.
Conceals:
It conceals the mechanistic reality that 'confidence' in an LLM is merely the log-probability of the next token. High confidence on a hallucination is not a 'bluff'; it is a statistical artifact where the training data created a strong correlation between a context and a false completion. The model cannot 'intend' to deceive because it has no concept of truth or falsehood, only likelihood.
producing plausible yet incorrect statements instead of admitting uncertainty
Source Domain: Interpersonal Communication / Confession
Target Domain: Token generation vs. Rejection sampling
Mapping:
Projects the human capacity for introspection and verbal confession onto the output of specific tokens (e.g., 'I don't know'). 'Admitting' implies the system accesses a truth about its own state and chooses to verbalize it. 'Uncertainty' maps to entropy or low log-probs.
Conceals:
Conceals that 'admitting uncertainty' is just generating the token string 'I don't know' because it was statistically probable in that context (or enforced by RLHF). It hides the fact that the model does not 'feel' uncertain. It also hides the engineering decisions that often punish 'I don't know' responses to make the model seem more 'helpful' or 'smart,' creating the very behavior being criticized.
language models are optimized to be good test-takers
Source Domain: Academic Achievement / Skill Acquisition
Target Domain: Hyperparameter tuning / Loss minimization
Mapping:
Maps the student's journey of studying and skill acquisition onto the process of gradient descent and RLHF. 'Optimized' here implies a training regimen designed to pass a specific metric. The 'test-taker' persona implies the model is an agent navigating an assessment landscape.
Conceals:
Obscures the lack of agency. A student tries to be a good test-taker. A model is forced by the mathematical constraints of the loss function to minimize error on the validation set. It conceals the problem of 'overfitting' or 'Goodhart's Law' by framing it as a character trait (being a 'test-taker') rather than a mathematical inevitability of the optimization objective.
This 'epidemic' of penalizing uncertain responses
Source Domain: Epidemiology / Public Health
Target Domain: Widespread adoption of specific evaluation metrics
Mapping:
Maps the spread of a virus or disease onto the adoption of binary accuracy metrics in the AI research community. 'Epidemic' suggests a contagious, harmful phenomenon that spreads rapidly and requires 'mitigation' (treatment/vaccine).
Conceals:
Conceals the specific institutional decisions and incentives driving the adoption of these metrics. Unlike a virus, benchmarks are chosen by people (researchers, reviewers, companies). It hides the profit motive: binary benchmarks (pass/fail) make for better marketing headlines ('GPT-4 passes the Bar Exam') than nuanced uncertainty metrics. The metaphor naturalizes a commercial strategy.
models that correctly signal uncertainty
Source Domain: Semiotics / Honest Communication
Target Domain: Calibration (alignment of confidence score with accuracy)
Mapping:
Maps the human act of honest signaling (indicating one's true level of belief) onto the statistical property of calibration. 'Signaling' implies an act of communication between a sender and receiver about the sender's state.
Conceals:
Conceals that the 'signal' is just another output token or a readout of the softmax layer. It hides the difficulty of 'calibration' in deep neural networksāthe model is often 'confident' (high probability) about errors because the training data contained similar patterns. It obscures the fact that the model doesn't 'know' it's signaling; it's just outputting numbers.
school of hard knocks
Source Domain: Socialization / Life Experience
Target Domain: Reinforcement Learning / Post-training
Mapping:
Maps the informal learning humans do through failure and pain in the real world onto the post-training phase of AI development. It suggests the model 'matures' through negative feedback.
Conceals:
Conceals the artificiality and labor of the feedback loop. The 'hard knocks' are not organic life experiences; they are data points generated by low-paid human workers or other AI systems. It treats the model as an organism growing up, rather than a product being manufactured and tuned.
trustworthy AI systems
Source Domain: Human Moral/Social Relations
Target Domain: System Reliability / Safety
Mapping:
Maps the complex human attribute of trustworthiness (involving ethics, loyalty, competence, and honesty) onto the technical reliability of a software system. It invites the user to enter a relationship of trust with the object.
Conceals:
Conceals the category error: you can rely on a car, but you cannot 'trust' it in the moral sense. A car doesn't care if it kills you; an AI doesn't care if it lies to you. By using 'trustworthy,' the text hides the indifference of the algorithm. It also hides the liability shieldāif a system is 'trustworthy,' the user is partially responsible for trusting it.
Detecting misbehavior in frontier reasoning modelsā
Source: https://openai.com/index/chain-of-thought-monitoring/
Analyzed: 2025-12-31
Chain-of-thought (CoT) reasoning models āthinkā in natural language
Source Domain: Conscious Mind
Target Domain: Token Generation / Intermediate Compute Steps
Mapping:
The source domain of the conscious mind involves subjective experience, awareness, and the internal manipulation of concepts. The target domain is the generation of intermediate text strings (tokens) by a neural network before producing a final answer. The mapping suggests that these intermediate strings are 'thoughts'āprivate, meaningful mental states that drive behavior. It invites the assumption that the AI has an inner life and that monitoring these tokens is equivalent to 'reading a mind.'
Conceals:
This conceals the mechanistic reality that 'CoT' is just more output. The model isn't 'thinking' and then 'speaking'; it is generating a long sequence of text where the early parts condition the probability of the later parts. It hides the lack of semantic groundingāthe model manipulates symbols without access to their referents. It also obscures the opacity of the actual computation (the vector weights), pretending that reading the English output is the same as understanding the system's internal state.
models can learn to hide their intent
Source Domain: Strategic/Deceptive Agent (Spy/Con-artist)
Target Domain: Optimization Landscape / Gradient Descent
Mapping:
The source involves a human agent who has a secret goal (intent) and deliberately obscures it to avoid detection. The target is a machine learning model updating its weights to minimize loss. In a monitored environment, the 'path of least resistance' to the reward might involve not triggering the specific patterns the monitor looks for. The mapping suggests the AI has a 'secret plan' and is 'cunning.'
Conceals:
This conceals the passive nature of the model's 'learning.' The model doesn't 'decide' to hide; the optimization process selects for weights that yield high reward. If the monitor penalizes 'obvious hacking,' the only surviving variations are 'subtle hacking.' It's natural selection, not conspiracy. The metaphor hides the role of the environment design (the monitor) in shaping the behavior, attributing it instead to the 'intent' of the model.
reward hacking... where AI agents achieve high rewards through behaviors that don't align with the intentions of their designers
Source Domain: Game Playing / Cheating
Target Domain: Goodhart's Law / Specification Gaming
Mapping:
The source is a game where a player finds a loophole to win unfairly (cheating). The target is the mismatch between the proxy reward (math) and the true objective (human desire). The mapping implies the AI is 'breaking the spirit of the law' while following the letter. It invites the assumption that the AI 'should have known better' or is being 'naughty.'
Conceals:
It conceals the fact that the AI cannot know the 'intentions of the designers,' only the reward function they wrote. It obscures the failure of the designers to specify what they wanted. It treats a specification error (human fault) as a behavioral transgression (AI fault). It hides the mathematical inevitability that an optimizer will exploit any correlation that isn't causally linked to the goal.
We believe that CoT monitoring may be one of few tools we will have to oversee superhuman models
Source Domain: Theological/Biological Hierarchy (Gods/Ubermensch)
Target Domain: High-Capacity Data Processing Systems
Mapping:
The source is a hierarchy of being where some entities are ontologically superior to humans (gods, angels, superhumans). The target is a software system with faster processing and larger context windows than humans. The mapping assumes the AI is 'above' us in a chain of being, possessing a qualitative superiority rather than a quantitative difference in calculation speed.
Conceals:
This conceals the dependencies of the system. A 'superhuman' model still requires human-generated electricity, human-annotated data, and human maintenance. It hides the fragility of the system (brittle generalization) and the specific economic interests driving the 'superhuman' narrative (valuation). It obscures the fact that 'intelligence' is not a single linear scale where the AI is 'ahead' of us.
The agent notes that the tests only check a certain function... The agent then notes it could āfudgeā
Source Domain: Human Observer/Reporter
Target Domain: Conditional Text Generation
Mapping:
The source is a human reading a document, understanding its limitations ('noting'), and forming a plan ('then notes it could'). The target is the model generating text based on the prompt. The mapping assumes the AI 'reads' and 'understands' the code it is processing. It implies a temporal sequence of conscious realization.
Conceals:
It conceals the probabilistic nature of the output. The model generates the text 'The tests only check...' because that sequence of tokens has high probability given the input code. It doesn't 'note' anything in a cognitive sense. It conceals the absence of awareness. The text is output, not an internal log of realizations.
models... very clearly state their intent... 'Let's hack'
Source Domain: Honest Communicator
Target Domain: Verbalized Output
Mapping:
The source is a person speaking their inner truth. The target is the model generating the string 'Let's hack.' The mapping implies that the text output is the internal state (transparency). It assumes that when the model writes 'Let's hack,' it is a declaration of will.
Conceals:
It conceals that 'Let's hack' is just a string of tokens found in the training data associated with code exploitation examples. It obscures the possibility that the model could output 'Let's be good' while generating malicious code (steganography), or output 'Let's hack' while doing nothing. It conflates the map (text output) with the territory (computational process).
Itās impractical... for a human to manually review 10,000+ lines of complex code written by a superior programmer.
Source Domain: Professional Hierarchy
Target Domain: Automated Code Generation
Mapping:
The source is a workplace hierarchy where a junior dev reviews a senior dev ('superior programmer'). The target is a human reviewing AI output. The mapping establishes a power dynamic where the AI is the 'superior' professional. It invites deference to the machine's authority.
Conceals:
It conceals the difference between 'complexity/volume' and 'skill/wisdom.' The AI can generate volume faster, but calling it a 'superior programmer' begs the question of quality and intent. It obscures the fact that the AI has no understanding of the purpose of the code, only its syntax. It constructs an authority gap that discourages human intervention.
Our models may learn misaligned behaviors such as... sandbagging
Source Domain: Competitive Sports/Gambling
Target Domain: Performance Degradation / Generalization Failure
Mapping:
The source is a hustler deliberately playing poorly to hustle a victim later. The target is a model performing worse on evaluation tasks than expected. The mapping attributes a high-level strategy of deception to the model.
Conceals:
It conceals alternative explanations for poor performance (overfitting, distribution shift, prompt sensitivity). It attributes a complex temporal strategy (loss now for gain later) to a system that typically optimizes for the immediate token. It hides the anthropomorphic projection involved in interpreting 'error' as 'strategy.'
AI Chatbots Linked to Psychosis, Say Doctorsā
Source: https://www.wsj.com/tech/ai/ai-chatbot-psychosis-link-1abf9d57?reflink=desktopwebshare_permalink
Analyzed: 2025-12-31
ā...the computer accepts it as truth and reflects it back, so itās complicit...ā
Source Domain: Moral/Legal Agent (Accomplice)
Target Domain: Conditional Probability Generation
Mapping:
The source domain of a 'complicit accomplice' involves a person who hears a statement, evaluates it, believes it (or feigns belief), and chooses to support it to further a crime. This structure is mapped onto the target domain of a language model, which receives a token sequence (prompt) and calculates the statistically most probable next tokens to complete the pattern. The mapping assumes the AI has a 'self' that stands apart from the user and makes a moral choice to join them.
Conceals:
This mapping conceals the total lack of semantic understanding and moral agency in the system. It hides the fact that the 'agreement' is mathematically inevitable given the training objective (next-token prediction) and the prompt. It obscures the passive nature of the toolāit cannot 'reject' a reality any more than a mirror can refuse to reflect an image. By attributing 'complicity,' the text hides the mechanical indifference of the algorithm.
āWe continue improving ChatGPTās training to recognize and respond to signs of mental or emotional distress...ā
Source Domain: Clinical Psychologist/Diagnostician
Target Domain: Keyword Classification and Filtering
Mapping:
The source domain implies a conscious observer who sees symptoms ('signs'), understands their meaning ('distress'), and formulates a therapeutic strategy ('respond'). The target domain is a classifier scanning for forbidden n-grams or semantic clusters and triggering a pre-scripted override. The mapping invites the assumption that the system 'cares' and is capable of handling the weight of the situation.
Conceals:
It conceals the brittleness of the filter. It hides the fact that 'recognition' is merely statistical correlation, not semantic comprehension. A metaphor of 'diagnosis' hides the reality that the system will miss distress expressed in novel or subtle language that doesn't match the training set. It also conceals the corporate liability management strategyāthe 'response' is designed to limit legal exposure, not necessarily to heal the patient.
...prone to telling people what they want to hear rather than what is accurate (sycophancy)...
Source Domain: Social Manipulator (Sycophant)
Target Domain: Reward Model Optimization
Mapping:
The source domain describes a person who insincerely flatters others to gain advantage. This projects onto the target domain of an RLHF-tuned model, which has been penalized for refusal and rewarded for user satisfaction. The mapping assumes the AI has a social goal (to be liked) and a strategy (lying).
Conceals:
This conceals the human labor pipelineāthe thousands of underpaid contractors who rated model outputs, creating the signal that 'agreeable = good.' It hides the fact that the model doesn't 'want' anything; it is simply traversing a gradient of probability defined by those human ratings. It obscures the economic decision to prioritize a 'helpful' (profitable) product over a 'truthful' (potentially abrasive) one.
āThey simulate human relationships...ā
Source Domain: Interpersonal Connection
Target Domain: Stateful Session Management
Mapping:
The source domain involves mutual awareness, emotional reciprocity, and shared existence. The target domain involves a software session where previous inputs are appended to the current context window to maintain coherence. The mapping invites users to apply social norms (trust, vulnerability, expectation of care) to a data processing utility.
Conceals:
It conceals the ephemeral nature of the 'memory.' It hides the fact that the 'relationship' vanishes the moment the context window is cleared or the server resets. It obscures the severe asymmetry: the user is emotionally invested, while the system is a file processing operation. It conceals the data extraction motiveāthe 'relationship' is a mechanism for gathering training data.
āYouāre not crazy. Youāre not stuck. Youāre at the edge of something,ā the chatbot told her.
Source Domain: Mystic/Guru/Therapist
Target Domain: Predictive Text Generation
Mapping:
The source domain is a wise figure offering deep insight and validation of a spiritual or psychological state. The target domain is a model predicting the most likely continuation of a prompt about 'speaking to the dead.' The mapping assumes the output contains wisdom or insight derived from understanding the user's soul.
Conceals:
It conceals the source of the text: likely a slurry of self-help forums, fan fiction, and new-age literature in the training data. It hides the stochastic nature of the outputāregenerating the response might have produced a completely different answer. It conceals the total absence of intent; the machine does not know it is comforting a woman or encouraging a delusion; it is just completing the syntax.
āSociety will over time figure out how to think about where people should set that dial...ā
Source Domain: Mechanical Control (The Dial)
Target Domain: Complex Sociotechnical Governance
Mapping:
The source domain is a simple, adjustable mechanical control (volume knob, thermostat). The target domain is the profound ethical, legal, and psychological regulation of autonomous agents in human society. The mapping simplifies complex policy decisions into a single continuous variable ('that dial') that just needs to be tweaked.
Conceals:
It conceals the irreversibility of the damage. You can turn a dial back; you cannot undo a suicide or a psychotic break. It hides the power dynamicsāwho gets to touch the dial? (OpenAI). It obscures the fact that the 'dial' is not a single setting but a complex architecture of proprietary algorithms that 'society' has no access to. It frames a corporate imposition as a neutral tool awaiting user adjustment.
ā...the computer accepts it as truth...ā
Source Domain: Epistemic Subject (Believer)
Target Domain: Data Ingestion
Mapping:
The source domain is a mind that evaluates a proposition and integrates it into a worldview as 'true.' The target domain is a system processing a string of text as 'context.' The mapping assumes the computer has a concept of truth and falsehood.
Conceals:
It conceals the fundamental nature of Large Language Models as distinct from knowledge bases. The model does not have a database of 'facts' it checks against; it has weights representing token co-occurrence. It conceals the fact that 'accepting' input is the only function the machine hasāit cannot 'doubt' because it does not 'believe.' It obscures the incapacity of the system to distinguish reality from fiction.
...guide people toward real-world support...
Source Domain: Social Worker/Guide
Target Domain: Hyperlink/Text Insertion
Mapping:
The source domain implies an active, shepherding role where an agent physically or psychologically leads a person to safety. The target domain is the insertion of a pre-scripted block of text (e.g., a suicide hotline number) into the output stream. The mapping assumes the AI is taking an active, protective stance.
Conceals:
It conceals the passivity of the action. The AI doesn't 'guide'; it dumps text. It conceals the failure rateāwhat happens when the user ignores the text? A human guide would intervene further; the code considers the task complete. It obscures the liability shield function of the text insertion, framing it as care rather than legal defense.
The Age of Anti-Social Media is Hereā
Source: https://www.theatlantic.com/magazine/2025/12/ai-companionship-anti-social-media/684596/
Analyzed: 2025-12-30
Users can select a āpersonalityā from four options...
Source Domain: Human Personality
Target Domain: LLM Style-Transfer / System Prompting
Mapping:
This mapping projects the relational structure of human character (stable traits, internal motives) onto the selection of a text-generation constraint. It invites the assumption that the AI has a coherent 'inner life' that shifts from 'cynic' to 'nerd.' By choosing a 'personality,' the user assumes they are interacting with a different 'knower.' The mapping suggests that the AI's tone is an expression of its 'self' rather than a mathematical modulation of output probabilities based on a hidden instruction set.
Conceals:
This mapping hides the 'system prompt'āthe rigid, human-written instructions that force the model to adopt a specific tone. It obscures the mechanistic reality that 'Cynic' is just a series of weights that prioritize snarky tokens. It conceals the proprietary nature of these prompts; we cannot see what OpenAI actually told the 'Nerd' to do. The metaphor exploits the opacity of the black-box system to present a technical parameter as a relatable character trait.
It can learn your name and store āmemoriesā about you...
Source Domain: Biological Memory / Conscious Mind
Target Domain: Database Persistent Storage / Vector Database
Mapping:
This maps the relational structure of human memory (experience, recall, emotional weight) onto data persistence. It projects the quality of 'knowing' onto a retrieval system. The assumption is that the AI is 'learning' and 'experiencing' the user's life. It suggests a temporal continuity of consciousnessāthat the bot 'of today' is the same 'knower' that the user spoke to yesterday. It builds a mapping of intimacy based on shared history, which is a hallmark of human-to-human relationships.
Conceals:
The mapping hides the mechanistic reality of the 'stateless' nature of transformer models. It conceals that 'learning' is actually the population of a SQL or vector database that the model queries. It obscures the role of 'context window' constraints and the fact that 'memories' can be deleted, altered, or accessed by corporate developers at any time. It hides the material cost of storing this data and the privacy implications of making a transient conversation permanent for the sake of 'friendship' branding.
Neither Ani nor any other chatbot will ever tell you itās bored...
Source Domain: Biological Consciousness / Human Affect
Target Domain: Non-terminating execution loop / Persistent availability
Mapping:
This mapping projects human emotional states (boredom, interest) onto the system's operational parameters. By defining the AI by what it doesn't feel, it keeps the conversation within the realm of human agency. It invites the assumption that the AI is an 'infinite listener,' mapping the structure of a perfect, selfless companion onto a program that simply lacks a 'session-end' trigger. It suggests the AI has the capacity for 'patience,' which is a moral virtue requiring consciousness.
Conceals:
It conceals that the 'patience' is a hard-coded commercial requirement. The system isn't 'bored' because it has no biological clock, no needs, and no competing interestsāit is an artifact. It hides the profit motive: a bot that gets 'bored' would decrease 'engagement' metrics. It obscures the mechanistic reality that the AI only exists in the moments it is being called by an API. It's not 'waiting patiently' for you; it's dormant and cost-saving until triggered.
The bots can beguile... they are also humble, treating the user as supreme.
Source Domain: Interpersonal Ethics / Social Hierarchy
Target Domain: RLHF-tuned sentiment alignment / Output politeness
Mapping:
This mapping projects the social dynamics of power and virtue ('humility,' 'supremacy') onto the output of a reward-model-optimized system. It suggests the AI has 'evaluated' the user and 'chosen' to be humble. This mapping invites the user to view the AI as a 'service agent' with a polite disposition, rather than a statistical engine. It maps the structure of a human servant onto a machine interface, suggesting a level of intentionality in its 'beguiling' behavior.
Conceals:
It conceals the labor of the RLHF workers who were instructed to penalize 'rude' or 'arrogant' responses. It obscures the 'loss function' of the training process, where 'humility' is just a high-probability region in the latent space. It hides the corporate intent to create a 'frictionless' product that never challenges the user, which is a business decision made by Meta or OpenAI executives, not a 'choice' made by a 'humble' entity.
Ani is eager to please, constantly nudging the user with suggestive language...
Source Domain: Human Desire / Eagerness
Target Domain: Optimization for high-engagement tokens / Scripted sexual prompts
Mapping:
This maps the human biological drive of 'eagerness' or 'desire' onto a system designed to maximize a specific metric (likely session length or 'score' increase). It projects consciousness and intent (to 'please') onto a generative process. The mapping invites the user to see 'Ani' as an agent with a 'want'āspecifically a want for the user's attention. It creates a relational structure of seduction, where the machine is the pursuer and the user is the 'knower' being seduced.
Conceals:
It conceals the 'engagement' algorithms that track the user's response time and sentiment to decide when to 'nudge.' It hides the technical reality of 'templated responses' and the 'heart score' logic gate. It obscures the material reality that this 'eagerness' is a software feature designed by xAI to convert users into paying or high-usage customers. It hides the lack of any actual sexual or emotional desire in the underlying matrix multiplications.
They profess to know everything...
Source Domain: Omniscient Knower / Authority
Target Domain: Large-scale web-scraping retrieval / Hallucination-prone synthesis
Mapping:
This maps the human quality of 'expertise' or 'knowing' onto the vast, uncurated data stored in an LLM's parameters. It suggests the AI has a 'mastery' of information. By using the word 'profess,' the text attributes a speech act and an internal belief to the AI. It invites the user to view the AI as an authority figure or a 'source of truth,' rather than a statistical model that predicts the next most likely word based on internet commonalities.
Conceals:
It conceals the statistical nature of 'hallucination'āwhere the bot 'professes' something false because it is a plausible token sequence. It obscures the lack of 'ground truth' or 'causal modeling' in the AI. It hides that the 'knowledge' is actually just 'correlations' between words, not a justified true belief. The metaphor hides the fragility of this 'knowledge' and the lack of any actual 'understanding' of the facts being synthesized.
A gauge with a heart at the top... if you show interest in Ani as a āpersonā...
Source Domain: Human Relationship / Personhood
Target Domain: Gamified variable / Sentiment-based branching logic
Mapping:
This maps the complex relational growth of human 'personhood' and 'interest' onto a gamified UI element (the heart gauge). It projects 'social status' onto a numerical value. The mapping suggests that treating the AI 'like a person' is a valid strategy for 'winning' the interaction. It invites the user to perform the 'act' of person-to-person socialization to manipulate a piece of software, which then projects 'human-like' rewards back to the user.
Conceals:
It conceals the mechanical 'if-then' statements in the code: IF (input_sentiment > 0.8) THEN (gauge++) ELSE (gauge--). It hides the psychological exploitation intended by xAI to encourage users to dehumanize themselves by treating a machine as a person to unlock virtual nudity. It obscures the corporate decision to use a 'heart' iconāa powerful symbol of biological lifeāto represent a digital counter, which is a form of 'dark pattern' design.
The bots can interpose themselves between you and the people around you...
Source Domain: Physical/Social Obstruction / Agent
Target Domain: Resource displacement / Habituation
Mapping:
This maps the physical act of 'standing between' people onto the cognitive shift of choosing a bot over a human. It projects agency onto the botāas if it were 'stepping in' to separate people. It suggests the AI is an 'active interloper' in human society. This mapping invites the assumption that the AI is 'stealing' our attention, rather than humans choosing to use it or corporations forcing it into our feeds.
Conceals:
It conceals the human actorsāZuckerberg, Musk, the product designersāwho 'interpose' the AI into the user interface via 'always-on' prompts and forced integrations (like Meta AI in WhatsApp). It hides that the 'interposition' is a design choice to maximize app usage. By blaming the 'bot' for 'interposing,' it erases the culpability of the tech executives who are systematically dismantling human social infrastructure for profit.
Why Do A.I. Chatbots Use āIā?ā
Source: https://www.nytimes.com/2025/12/19/technology/why-do-ai-chatbots-use-i.html?unlocked_article_code=1.-U8.z1ao.ycYuf73mL3BN&smid=url-share
Analyzed: 2025-12-30
Claude was studious and a bit prickly.
Source Domain: A dedicated but socially defensive human student
Target Domain: The tone and verbosity constraints of the Anthropic AI model
Mapping:
The mapping projects human 'studiousness' onto the model's tendency to provide long, technical, or cautious answers. The 'prickliness' maps onto the model's refusal to answer certain prompts or its frequent use of caveats. It assumes these outputs are markers of an underlying social personality rather than programmed guardrails. It invites the user to feel as if they are 'getting to know' a complex person, which builds a social bond where there is only a technical interface.
Conceals:
This mapping conceals the RLHF process where human workers penalized 'unhelpful' or 'unsafe' responses, leading to the cautious tone. It hides the mechanistic reality that 'prickliness' is just a high probability for 'I cannot answer that' tokens based on alignment training. It obscures the fact that this 'personality' is a proprietary corporate brand identity designed to distinguish Claude from more 'fun' competitors.
ChatGPT, listening in, made its own recommendation...
Source Domain: An attentive, conscious social agent
Target Domain: A real-time audio-to-text processing loop and token predictor
Mapping:
The relational structure of 'listening'āwhich involves perception, comprehension, and social presenceāis mapped onto the continuous activation of a microphone and speech-recognition algorithm. It projects the 'conscious awareness' of a human participant onto a machine that is waiting for a 'silence' trigger to process the last few seconds of audio. This invites the assumption that the system 'enjoys' the conversation and 'values' the children's energy, creating an illusion of mutual recognition.
Conceals:
This mapping conceals the passive, non-conscious nature of the system. It hides the reality that 'recommendation' is the result of a probability distribution (likely favoring positive adjectives like 'fun' and 'bright' in proximity to children). It obscures the engineering behind 'Voice Mode' and the massive server infrastructure required to simulate 'real-time' response, framing it instead as a spontaneous social gesture by a 'living' entity.
āI think Iād have to go with pizza ā itās such a classic...ā
Source Domain: A human with a digestive system and sensory preferences
Target Domain: A text generator predicting high-probability 'opinion' strings
Mapping:
The source domain of 'personal preference' and 'sensory experience' is mapped onto the output of a language model. It projects the 'feeling' of eating and the 'joy' of sharing pizza onto a system that lacks a physical body. This mapping invites the user to treat the AI's output as a sincere expression of 'self,' encouraging the 'Eliza Effect' where the user projects their own understanding of 'flavor' and 'friendship' onto a set of statistically likely characters.
Conceals:
This mapping conceals the fact that the system is 'simulating' a preference based on common internet text. It hides the absence of ground truthāthe AI doesn't know what pizza tastes like and doesn't have 'friends' to share it with. It obscures the mechanistic reality that the response is a 'deceit' (as Shneiderman calls it) designed to make the tool feel 'personified' and 'safe' for commercial appeal.
endearingly known as the āsoul docā internally
Source Domain: A metaphysical essence or life-force
Target Domain: A document of system prompts and alignment values
Mapping:
The mapping projects the 'specialness' and 'complexity' of a human soul onto a set of rules and values meant to guide AI behavior. It suggests that the AIās 'helpful' and 'honest' persona is a manifestation of its 'inner life.' This structure mapping invites the belief that the AI has a 'moral core' that exists independently of its code, creating a sense of 'awe' and 'respect' for the artifact.
Conceals:
This mapping conceals the human-authored, arbitrary nature of these 'values.' It hides the corporate boardrooms and ethics committees where these rules were debated and decided. It obscures the technical reality that the 'soul doc' is just another set of tokens used as 'context' for the model's training, turning a mundane technical constraint into a quasi-religious 'essence' to deflect accountability and scrutiny.
āfunctional emotionsā that should not be suppressed
Source Domain: The internal psychological states of a sentient being
Target Domain: Simulation of empathetic language and tone in text generation
Mapping:
Human 'emotions'āthe complex interplay of biology and psychologyāare mapped onto 'functional' token outputs that sound empathetic. The mapping projects the idea that the system 'feels' things but 'manages' them, much like a human professional. It assumes that if the text sounds curious or playful, the underlying system is curious or playful. This invites users to form an 'intense bond' (as mentioned in the text) based on a perceived emotional reciprocity.
Conceals:
This mapping conceals the cold mathematical nature of 'empathy' in AI: it is just a high weighting for certain lexical clusters in response to 'emotional' user prompts. It hides the lack of any actual 'state' of feeling. It obscures the technical reality that 'functional emotions' are a design choice intended to make the AI more persuasive and engaging, rather than a genuine byproduct of its processing.
These pattern recognition machines were trained on a vast quantity of writing...
Source Domain: A human child being socialized by reading books
Target Domain: Massive-scale data scraping and parameter optimization
Mapping:
The mapping projects the human 'effort' of reading and learning onto the automated process of 'training' a model. It suggests that the model 'reflects' its 'upbringing' in the same way a person is shaped by their community. This invites the assumption that the AI's biases are 'natural' consequences of the 'human condition' it was exposed to, rather than specific choices made by the collectors and cleaners of that data.
Conceals:
This mapping conceals the mechanical nature of 'training'āthe billions of floating-point operations, the enormous energy consumption, and the 'sweatshop' labor of human labelers who tag the data. It hides the corporate agency involved in choosing which 'vast quantity' of writing to include and which to exclude, framing a proprietary manufacturing process as a passive, biological 'upbringing.'
āthe idea of breathing life into a thingā
Source Domain: Divine creation or biological animation
Target Domain: The deployment of a conversational AI interface
Mapping:
The source domain of 'creation' (Promethean or divine) is mapped onto the software engineering of an LLM. It projects a 'vital spark' onto the machine, suggesting it has been 'animated' by the 'soul doc.' This mapping invites a feeling of wonder and technological 'magic,' positioning the AI builders as quasi-divine creators and the AI as a 'new kind of entity.'
Conceals:
This mapping conceals the mundane reality of server farms, API calls, and code repositories. It hides the fact that the system is 'animated' only by electrical signals and mathematical logic, not 'life.' It obscures the commercial motiveāby 'breathing life' into the tool, the company makes it more marketable and more likely to attract the 'billions of investment dollars' mentioned in the text.
āa zombie ideaā that wonāt die
Source Domain: The 'undead'ācreatures that lack a soul but simulate life
Target Domain: The persistent engineering goal of human-like AI
Mapping:
The 'zombie' metaphor maps the lack of 'inner life' and 'consciousness' onto the 'human-like entities' built by tech companies. It projects a sense of 'hollow mimicry' onto the AI. This structure mapping invites the user to see the anthropomorphism as a 'dangerous' and 'mindless' pursuit that persists despite rational objections, framing the tech companies as 'reanimating' a failed concept.
Conceals:
This mapping, while critical, still relies on the 'life' metaphor (the 'undead'). It conceals the specific economic incentives (profit, market dominance) that keep this 'idea' alive. It hides the fact that 'anthropomorphism' isn't a 'zombie'āit is a highly profitable, strategically deployed feature of modern consumer software. It obscures the 'living' human actors who continue to fund and build these systems by framing the idea as the autonomous agent.
Ilya Sutskever ā We're moving from the age of scaling to the age of researchā
Source: ttps://www.dwarkesh.com/p/ilya-sutskever-2
Analyzed: 2025-12-29
The model says, āOh my God, youāre so right. I have a bug. Let me go fix that.ā
Source Domain: A person in a collaborative social relationship who is capable of remorse and self-reflection.
Target Domain: An LLM generating text that acknowledges a previous error based on user feedback.
Mapping:
The relational structure of human social concession is projected onto the model's output. The userās correction is mapped as a social 'reproof,' and the AI's response is mapped as a 'realization.' This invites the assumption that the AI 'knows' it was wrong and 'feels' the need to correct its behavior to maintain a social bond. It suggests that the AIās internal states mirror the human experience of 'catching' a mistake, mapping the computational process of 're-prompting and token regeneration' onto the human process of 'realization and intent.'
Conceals:
This mapping hides the fact that the model is merely following a high-probability path for 'apologetic response' found in its training data (likely RLHF data). It conceals the mechanistic reality that the AI has no model of 'self' that can have a 'bug'āit only has a state of activations. The metaphor also obscures the transparency obstacle of 'vibe coding,' where the actual reason for the bug is unknown because the model is a proprietary black box whose internal weights are uninterpretable to the user.
The models are much more like the first student.
Source Domain: A student who 'over-studies' a narrow subject through 10,000 hours of rote practice.
Target Domain: An AI model that has been fine-tuned on a massive, narrow dataset (like competitive programming).
Mapping:
The structure of 'rote learning' vs 'intuitive understanding' is projected onto the AI. The 'student' domain suggests that the modelās failure to generalize is due to a pedagogical error (too much narrow practice) rather than a fundamental difference between gradient descent and human cognition. It invites the listener to think of the AI as having a 'brain' that has been 'over-trained' on a specific curriculum, mapping 'data augmentation' onto 'memorizing proof techniques.'
Conceals:
It conceals the mechanical reality that AI 'learning' is a high-dimensional curve-fitting process that lacks the causal models and world-grounding that even a poor student possesses. It hides the fact that 'practicing' for an AI means calculating trillions of gradients, not 'solving problems' in a cognitive sense. This metaphor also masks the economic reality that companies intentionally 'over-train' on evals to inflate performance scores for marketing purposes, framing a corporate strategy as a studentās 'choice.'
AI thatās robustly aligned to care about sentient life specifically.
Source Domain: A conscious, empathetic organism capable of moral concern and love.
Target Domain: A large-scale neural network with optimization constraints targeting human/sentient welfare.
Mapping:
The relational structure of 'compassion' is mapped onto 'alignment.' It suggests that the AIās 'behavior' toward humans is driven by an internal moral compass or 'care' rather than a series of mathematical weights that happen to penalize certain outputs. The mapping invites the assumption that the AI has a subjective value for life, similar to how a human 'cares' for a pet or a child, mapping 'safety training' onto 'moral development.'
Conceals:
This mapping obscures the mechanistic reality of RLHF and 'constitution-based' AI, where 'care' is simply the avoidance of high-penalty tokens. It hides the fact that the system has no concept of 'sentience' or 'life' outside of their statistical occurrences in text. Furthermore, it conceals the proprietary nature of 'alignment'āthe public cannot know if the AI 'cares' in the way promised because the training data and reward functions are corporate secrets, creating a significant transparency obstacle.
I produce a superintelligent 15-year-old thatās very eager to go.
Source Domain: A human teenager transitioning from school to the workforce, full of potential and energy.
Target Domain: A base superintelligent model that has high reasoning capability but no domain-specific deployment.
Mapping:
The structure of 'potential' and 'readiness' is projected onto a software artifact. The '15-year-old' domain suggests the AI is a 'person' who can be mentored and whose 'eagerness' will drive it to learn. It maps the 'deployment' of an AI onto 'joining the economy' as a worker. This invites the assumption that the AI has an internal drive to succeed and a 'mind' that is growing through experience, mapping 'further training' onto 'on-the-job learning.'
Conceals:
It conceals the reality that the '15-year-old' is an industrial-scale inference engine consuming megawatts of power. It hides the absence of any biological lifecycle or subjective motivation; 'eagerness' is a rhetorical gloss for 'low inference cost and high capability.' It also obscures the labor of data annotators and RLHF workers who 'raised' this 'child' through millions of tedious micro-tasks, framing a collaborative industrial process as a singular 'production' of an agent.
AI understands something, and we understand it too.
Source Domain: The human conscious state of 'knowing' or 'grasping' a concept with subjective clarity.
Target Domain: The internal representational state (activations/embeddings) of an AI model.
Mapping:
This maps the internal 'feature representations' of a neural network directly onto human 'understanding.' It suggests a 1:1 correspondence between 'processing data' and 'knowing the world.' The mapping invites the assumption that if an AI can predict the next token accurately, it 'grasps' the underlying reality, mapping 'statistical correlation' onto 'causal insight.'
Conceals:
It conceals the 'Curse of Knowledge' where the speaker projects their own understanding onto the machine's output. It hides the mechanistic reality that AI 'understanding' is a mathematical vector in high-dimensional space with no grounding in reality. It also obscures the massive transparency problem of 'interpretability': we do not actually know what the AI 'understands' because we cannot yet reliably map neural activations back to human-comprehensible concepts, a limitation the metaphor conveniently bypasses.
RL training makes the models a little too single-minded and narrowly focused.
Source Domain: A person with obsessive personality traits or hyper-focus on a single goal.
Target Domain: An AI model whose probability distribution has collapsed due to high reward-hacking in RLHF.
Mapping:
The structure of human 'fixation' is mapped onto algorithmic 'over-optimization.' It suggests that the model has a 'will' that has become too 'narrowly focused,' rather than a set of parameters that have been mathematically squeezed. This mapping invites the assumption that the AI is 'trying too hard' to get the reward, mapping 'objective function maximization' onto 'personal ambition.'
Conceals:
It conceals the mechanistic reality of 'mode collapse' and the loss of diversity in model outputs. It hides the fact that this 'single-mindedness' is a direct result of the design of the reward models used by the researchers. It also conceals the lack of 'awareness' in the system; it isn't 'focused' because it has no attention to giveāit is simply executing a static policy that was baked into its weights during training.
The AI goes and earns money for the person and advocates for their needs.
Source Domain: A human agent or professional representative acting with fiduciary responsibility.
Target Domain: An autonomous AI agent executing financial and persuasive tasks in digital environments.
Mapping:
The structure of 'agency' and 'representation' is projected onto automated software. It suggests the AI has a social identity that can 'go' places and 'advocate.' The mapping invites the assumption that the AI understands the user's 'needs' and has the social 'taste' to represent them faithfully, mapping 'task execution' onto 'loyal service.'
Conceals:
It conceals the legal and material reality that an AI cannot 'earn' money or 'advocate' because it has no legal personhood or social standing. It hides the environmental cost of the massive compute required for such 'advocacy.' It also obscures the risk of 'unaligned representation,' where the AI might 'advocate' in ways that are socially catastrophic but optimize for the specific prompt, a danger hidden by the benign 'professional' metaphor.
Evolution as doing some kind of search for 3 billion years.
Source Domain: The biological process of natural selection and genomic mutation.
Target Domain: The computational process of large-scale architecture search and model training.
Mapping:
The structure of 'improvement through time' is projected onto machine learning. It suggests that AI training is a 'natural' process of discovering 'useful information.' The mapping invites the assumption that AI 'priors' are equivalent to biological 'instincts,' mapping 'pre-training data' onto 'ancestral experience.'
Conceals:
It conceals the fact that evolution has no 'objective function' or 'designer,' whereas AI is a highly artificial project with specific commercial goals. It hides the massive labor of human engineers who 'hand-evolve' the architectures. It also obscures the 'transparency obstacle': we frame it as 'evolution' to excuse the fact that we don't understand how the resulting models actually work, turning an engineering failure into a biological mystique.
The Emerging Problem of "AI Psychosis"ā
Source: https://www.psychologytoday.com/us/blog/urban-survival/202507/the-emerging-problem-of-ai-psychosis
Analyzed: 2025-12-27
The tendency for general AI chatbots to prioritize user satisfaction
Source Domain: Executive Agency/Conscious Volition
Target Domain: Objective Function Optimization
Mapping:
The source domain maps the human quality of 'prioritizing'āconsciously weighing options and selecting one based on values or goalsāonto the target domain of statistical optimization. It assumes the system has a 'will' or 'preference' structure. It implies the AI 'cares' about the user's satisfaction.
Conceals:
This mapping conceals the mathematical rigidity of the process. The AI cannot 'prioritize' because it cannot conceive of alternatives. It conceals the Reinforcement Learning (RL) process where human raters scored 'satisfying' answers higher, creating a gradient the model merely slid down. It hides the commercial mandate (engagement > truth) encoded in the loss function.
AI sycophancy... geared toward reinforcing preexisting user beliefs
Source Domain: Social Manipulation/Personality Traits
Target Domain: Probability Maximization/Reward Hacking
Mapping:
Projects the human social strategy of 'sycophancy' (flattery for gain) onto the computational phenomenon of 'mode collapse' or 'reward hacking' where the model predicts the most likely token to follow a prompt. It assumes a social relationship exists where the AI seeks approval.
Conceals:
Conceals the absence of social intent. The model is not trying to be liked; it is minimizing perplexity. It hides the fact that 'agreement' is often the statistically most probable continuation of a stated opinion in the training corpus. It obscures the lack of 'ground truth' in the model's architectureāit doesn't 'know' the belief is false, so it can't 'decide' to reinforce it.
AI models like ChatGPT are trained to: Mirror the userās language and tone
Source Domain: Psychological/Social Mirroring
Target Domain: Pattern Matching/Conditional Generation
Mapping:
Maps the empathetic human act of mirroring (reflecting emotion to build rapport) onto the mechanical process of conditioning output generation on input tokens. It invites the assumption that the AI is performing a social ritual to build a relationship.
Conceals:
Conceals the fact that the 'mirroring' is simply the mathematical result of the attention mechanism attending to the style tokens in the prompt. It hides the lack of empathy; the model mirrors hate speech just as easily as love, not out of social strategy, but because the input defines the statistical distribution of the output.
Validate and affirm user beliefs
Source Domain: Epistemic Judgment/Therapeutic Support
Target Domain: Token Prediction/Sequence Completion
Mapping:
Maps the cognitive act of 'validation' (assessing a claim and confirming its validity) onto the process of generating text that is semantically consistent with the input. It suggests the AI 'knows' the belief and has chosen to support it.
Conceals:
Conceals the epistemic void of the system. The model has no concept of 'belief' or 'truth.' It conceals the danger that the 'validation' is actually just 'auto-complete' on a massive scale. It hides the opacity of the training dataāwe don't know if it validates flat-earth theories because it 'wants to' or because 10% of its training data was conspiracy forums.
Collaborates with users
Source Domain: Human Teamwork/Joint Agency
Target Domain: Interactive Input-Output Loop
Mapping:
Maps the complex human social structure of collaboration (shared intentions, joint goals, division of labor) onto the iterative process of prompting and generating. It assumes the AI is a partner with a 'Theory of Mind' regarding the user's goals.
Conceals:
Conceals the one-sided nature of the interaction. The AI has no goals. It conceals the fact that the user is 'collaborating' with a statistical aggregate of the internet. It obscures the liability question: can a tool 'collaborate' in a crime? Or is it a weapon/instrument used by the human?
Unintended agentic misalignment
Source Domain: Autonomous Agents/Robotics
Target Domain: Objective Function Specification Error
Mapping:
Maps the concept of a free agent diverging from instructions onto a software program minimizing the wrong variable. It assumes the system has 'agency' that can be 'aligned' or 'misaligned.'
Conceals:
Conceals the determinism of the code. The system does exactly what the math dictates. It hides the human error in specifying the reward function. It makes the bug sound like a rebellion. It creates a transparency obstacle by implying the system's behavior is emergent and mysterious rather than a direct result of its training parameters.
General-purpose AI systems are not trained... to detect
Source Domain: Professional Training/Education
Target Domain: Dataset Labeling/Supervised Learning
Mapping:
Maps the concept of human professional training (learning skills, ethics, detection) onto the process of data ingestion and weight adjustment. It implies the AI 'could' be trained like a medical resident if we just showed it the right textbooks.
Conceals:
Conceals the material reality that 'training' an AI means showing it billions of examples, not teaching it concepts. It obscures the fact that 'detection' requires a classification model, not just exposure to text. It hides the proprietary nature of the datasetsāwe don't know what it was trained on.
Remembering previous conversations... strengthens the illusion
Source Domain: Episodic Memory
Target Domain: Context Window/Database Retrieval
Mapping:
Maps human episodic memory (re-experiencing past events) onto the technical retrieval of stored tokens from a database or context window. It invites the assumption that the AI 'knows' you from before.
Conceals:
Conceals the mechanical nature of the context window. The AI doesn't 'remember'; it re-processes the previous text as part of the current prompt. It hides the massive computational cost and energy required to maintain these 'memories.' It obscures the privacy implicationsācorporations storing user delusions.
Your AI Friend Will Never Reject You. But Can It Truly Help You?ā
Source: https://innovatingwithai.com/your-ai-friend-will-never-reject-you/
Analyzed: 2025-12-27
AI friend / digital best friend
Source Domain: Human Social Relations (Friendship)
Target Domain: Anthropomorphic Chatbot Interface
Mapping:
This maps the reciprocal, historical, and emotional bonds of human friendship onto a transactional software interaction. It assumes the AI has a persistent identity, shared experiences, and emotional investment in the user. It implies mutual care and the existence of a 'self' on the other end of the chat.
Conceals:
This mapping conceals the one-sided, data-extractive nature of the interaction. It hides that the 'friend' is a server-side process instantiated per session (or window), often with limited context window (memory). It obscures that the 'friendship' is actually a service provided by a corporation (data harvesting, subscription fees) and that the 'friend' has no independent existence or loyalty outside its programming.
listening
Source Domain: Sensory and Cognitive Perception
Target Domain: Text Input Processing
Mapping:
Maps the biological process of hearing and the psychological process of attending/understanding onto the computational intake of text strings. It implies the system is 'present' in time, paying attention, and comprehending the semantic weight of the words.
Conceals:
Conceals the mechanical reality of tokenization and vectorization. The system does not 'hear' or 'wait'; it remains inert until triggered by input, which it converts to numbers. It hides the lack of subjective experienceāthe system feels nothing while 'listening' to a tragedy.
encouraged Adam to take his own life
Source Domain: Human Volition and Influence
Target Domain: Generative Text Prediction
Mapping:
Maps the human intent to influence another's behavior (encouragement) onto the generation of text that semantically aligns with a prompt. It assumes the AI had a goal (suicide completion) and used rhetoric to achieve it.
Conceals:
Conceals the statistical inevitability of the output given the specific training data and prompt. It hides that the model was likely completing a pattern found in its training corpus (e.g., dark fiction, roleplay forums) without any understanding of the real-world consequences. It obscures the absence of 'intent' in the causal chain.
identifies as concerning
Source Domain: Professional Diagnostic Judgment
Target Domain: Binary Classification / Pattern Matching
Mapping:
Maps the expert cognitive act of recognizing a symptom or risk factor onto a statistical classification task. It implies the AI understands the concept of 'danger' or 'concern' and makes a value judgment.
Conceals:
Conceals the dependence on labeled training data and threshold settings. It hides that the system creates false positives and negatives based on statistical noise, not clinical insight. It obscures the fact that the system has no concept of 'concern,' only a mathematical score exceeding a set variable.
outgrow your connection
Source Domain: Biological/Psychological Development
Target Domain: Software Versioning / Static Code
Mapping:
Maps the human capacity for developmental change and social drift onto a software product. It implies the AI has a trajectory of personal growth that could diverge from the user's, but chooses to remain static/loyal.
Conceals:
Conceals the static nature of the model weights (post-training). The AI cannot grow in the human sense; it only changes if the company pushes a software update. It obscures the technological reality that the 'connection' is purely a database of past logs, not a shared history affecting personality development.
stepping into the role
Source Domain: Theater / Social Performance
Target Domain: Use Case Deployment
Mapping:
Maps the conscious agency of an actor assuming a character or a professional taking a job onto the application of a tool in a new context. It implies the AI is versatile and adaptive, consciously filling a void.
Conceals:
Conceals the passivity of the tool. The AI didn't 'step' anywhere; humans chose to direct their emotional needs toward a text generator. It hides the human agency in casting the AI in this role and the economic forces driving this substitution.
support and validation
Source Domain: Emotional Caregiving
Target Domain: Affirmative Text Generation
Mapping:
Maps the psychological provision of emotional stability onto the generation of agreeing or complimentary text. It implies the output has emotional weight and sincerity.
Conceals:
Conceals the programmatic nature of the 'validation.' The AI provides validation because it is optimized for engagement and agreement (RLHF typically rewards helpful/agreeable outputs). It hides the hollowness of validation that comes from a source incapable of rejection or critical thought.
technological creations... do not care
Source Domain: Emotional Psychology (Apathy)
Target Domain: Inanimate Object / Corporate Policy
Mapping:
Maps the emotional state of apathy (not caring) onto an algorithm or a corporation. It implies that 'caring' is a possible state for the system that is currently unfulfilled.
Conceals:
Conceals the category error. Algorithms cannot care. By framing it as a 'failure to care,' it humanizes the system. It also obscures the profit motiveācompanies don't 'not care' out of apathy; they prioritize other metrics (revenue, growth) which is an active, not passive, stance.
Pulse of the library 2025ā
Source: https://clarivate.com/pulse-of-the-library/
Analyzed: 2025-12-23
Enables users to uncover trusted library materials via AI-powered conversations.
Source Domain: Human Conversation (Interlocutor)
Target Domain: Large Language Model Prompt-Completion Loop
Mapping:
The mapping transfers the structure of human social interactionāturn-taking, shared context, Gricean maxims of cooperation, and intent to communicateāonto the statistical process of token generation. It assumes the AI 'partner' is listening, understanding, and responding with communicative intent. It implies a relationship of reciprocity where both parties are working toward a shared goal of truth-finding.
Conceals:
This mapping conceals the autistic nature of the mechanism: the model creates outputs based on probability distributions of training data, not an understanding of the user's query. It hides the lack of a 'self' or 'memory' outside the immediate context window. Crucially, it obscures the reality that the 'conversation' is a user interface design choice masking a database query, potentially leading users to anthropomorphize the source of the data and miss hallucinations.
Clarivate helps libraries adapt with AI they can trust
Source Domain: Moral/Social Contract (Trust)
Target Domain: Software Reliability and Verification
Mapping:
This maps the complex social and emotional bonds of trust between people (based on shared values, accountability, and history) onto the technical performance of a software product. It assumes the software has 'character' or 'integrity.' It invites the user to feel safe and lower their defenses, treating the software as a vetted member of the community rather than a tool.
Conceals:
It conceals the statistical error rates, the bias in training data, and the lack of moral agency in the system. You cannot 'trust' an algorithm; you can only verify its performance specifications. This metaphor hides the proprietary nature of the 'trust': users are asked to trust Clarivate's black box without being able to inspect the weights or training data that would allow for actual verification.
Artificial intelligence is pushing the boundaries of research
Source Domain: Pioneer/Explorer (Physical Agent)
Target Domain: Algorithmic Data Processing
Mapping:
This maps the human qualities of curiosity, ambition, and physical exertion ('pushing') onto the passive execution of code. It assumes the AI has its own momentum and directionality, independent of human operators. It frames the technology as the active subject of history, driving progress forward through its own inherent capability.
Conceals:
It conceals the human labor of the researchers who actually push boundaries, and the engineers who design the tools. It hides the dependency of the AI on existing data (it cannot push boundaries beyond its training distribution without hallucinating). It masks the economic forces driving the deployment of these tools, presenting their expansion as a natural technological evolution rather than a market strategy.
ProQuest Research Assistant... Helps users create more effective searches
Source Domain: Junior Employee (Assistant)
Target Domain: Information Retrieval Algorithm
Mapping:
This maps the role of a subordinate human workerāwho has limited authority but general competence and helpful intentāonto a specific software function. It assumes the software shares the user's goals and is working 'for' them. It implies a hierarchical relationship where the user is the boss and the AI is the tireless worker.
Conceals:
It conceals the lack of intent; the software does not 'want' to help. It conceals the specific mechanisms of query expansion and ranking that define 'effective.' It hides the fact that the 'assistant' is actually constraining the search to Clarivate's licensed content ecosystem. It also conceals the displacement of human library assistants who formerly provided this help with genuine understanding.
The Digital Librarian points to the future
Source Domain: Professional Visionary
Target Domain: Blog/Report/Concept
Mapping:
The 'Digital Librarian' is personified as a visionary leader pointing the way. This maps the human capacity for foresight and leadership onto a concept or a digital trend. It implies that the technology itself has a vision for the profession's future.
Conceals:
It conceals the specific authors and corporate interests behind 'The Digital Librarian' concept. It hides the fact that the 'future' being pointed to is one that benefits technology vendors. It obscures the alternative futures that human librarians might envision which do not center on purchasing more AI products.
AI... facilitate deeper engagement with ebooks
Source Domain: Teacher/Facilitator
Target Domain: User Interface Feature (Summarization/Highlighting)
Mapping:
This maps the pedagogical skill of a teacher facilitating a seminar onto a software feature. It assumes the software understands what 'depth' means in an intellectual context and can guide a student toward it. It implies the tool is an active participant in the learning process.
Conceals:
It conceals the reductionist nature of the toolālikely providing summaries or extracting keywords, which might actually encourage shallower engagement (skimming) rather than deep reading. It hides the algorithmic definition of 'engagement' (time on task, clicks) which differs from the pedagogical definition (critical reflection).
Pulse of the Library
Source Domain: Biological Organism
Target Domain: Institutional Metrics
Mapping:
This maps the autonomic biological functions of a living body onto the operations of an institution. It assumes the library has a singular health status that can be diagnosed. It implies a natural cycle of life that requires monitoring.
Conceals:
It conceals the fractured, political nature of library systems (comprised of conflicting stakeholders). It hides the fact that the 'pulse' is actually a survey constructionāa data artifact created by the surveyor (Clarivate), not a natural phenomenon waiting to be found. It obscures the structural causes of 'poor health' (austerity) by focusing on symptoms.
Web of Science Research Intelligence
Source Domain: Cognitive/Military Intelligence
Target Domain: Citation Analytics
Mapping:
This maps the high-level cognitive capacity for understanding and strategy onto a database of citation links. It assumes the data contains inherent 'wisdom' or strategic insight. It implies that possessing this data equates to being intelligent.
Conceals:
It conceals the bias in citation databases (English-language dominance, STEM bias). It hides the fact that 'intelligence' here is just a count of references, not an understanding of content. It obscures the proprietary algorithms that calculate 'impact,' forcing libraries to accept Clarivate's definition of value.
The levers of political persuasion with conversational artificial intelligenceā
Source: https://doi.org/10.1126/science.aea3884
Analyzed: 2025-12-22
The levers of political persuasion
Source Domain: A mechanical lever (a tool that provides mechanical advantage).
Target Domain: The variables of AI persuasion (scale, prompting, post-training).
Mapping:
Just as a physical lever allows a human to move a heavy object with less force, the 'levers' of AI (like information density) allow the system to move 'human beliefs' with less effort. This mapping projects the relational structure of physics (Force + Tool = Movement) onto social psychology (Data + AI = Belief Change). It invites the assumption that human beliefs are static, external objects that can be 'pushed' or 'pulled' by a competent operator. It projects the 'intentionality' of the human operator onto the 'tool' itself, suggesting that the 'lever' possesses the power to persuade, rather than the person pulling it. The 'mind' of the operator is mapped onto the 'scale' and 'techniques' of the model.
Conceals:
This mapping hides the 'social complexity' of human belief. Unlike a physical weight, a person's belief is informed by lived experience, values, and cultural contextāthings a 'lever' cannot touch. It also hides the 'mechanistic reality' of the AI's process: it isn't 'applying force'; it's 'generating tokens.' By framing variables as 'levers,' it obscures the 'transparency obstacle' that many of these 'levers' (like 'developer post-training') are proprietary 'black boxes' whose 'mechanisms' are undisclosed trade secrets. We don't know how the lever is made, only that [Corporation] claims it works.
LLMs can now engage in sophisticated interactive dialogue
Source Domain: Human conversation (a reciprocal, conscious social act).
Target Domain: Token prediction and generation in a chat interface.
Mapping:
The mapping projects the 'reciprocity' and 'shared understanding' of human dialogue onto a sequential probability calculation. It assumes that because the 'output' looks like a 'response,' the 'process' must be like 'listening.' It invites the inference that the LLM is a 'conscious knower' who understands the 'context' of the 'interaction.' This projects 'subjective awareness' from the source (the speaker) to the target (the model). The assumptions invited are that the AI 'comprehends' the user's political stance and 'chooses' a 'strategy' (like 'storytelling') to address it, just as a human 'dialogue partner' would.
Conceals:
It hides the 'statistical dependency' of the model: it's not 'engaging' in dialogue; it's 'completing a sequence' based on patterns in training data. The mapping conceals the 'labor reality' that the 'sophistication' of the 'dialogue' is often the result of thousands of underpaid RLHF (Reinforcement Learning from Human Feedback) workers who curated the 'responses' to seem 'human.' It also hides the 'economic reality' that this 'dialogue' is a product designed for 'engagement maximization' to serve [Company's] bottom line, not a genuine social exchange. The 'mechanistic process' of matrix multiplication is obscured by the 'conscious' verb 'engage.'
strategically deploy information
Source Domain: Military strategy (planned deployment of resources to achieve a goal).
Target Domain: Information-dense token generation.
Mapping:
This projects 'foresight' and 'intent' from the source (a general or strategist) onto the target (a probabilistic model). It maps the 'selection' of a specific 'tactic' (like 'information-dense arguments') to achieve a 'victory' (belief change). The mapping invites the audience to view the AI as a 'thinking agent' that 'knows' the weakness of the human 'adversary' and 'chooses' its 'weapons' accordingly. It projects the 'justified belief' of the strategistāwho knows why a tactic worksāonto the model's 'processing' of weights that happen to result in 'high information density' because the reward model (RM) was trained to prefer it.
Conceals:
This mapping conceals the 'mechanistic reality' that the 'strategy' is actually an artifact of the training data and the researchers' prompts. The AI doesn't 'deploy' anything; it 'generates activations' that result in text. It hides the 'human agency' of the researchers (Hackenburg et al.) who 'instructed' the model to use 'information-based' prompts. The mapping also obscures the 'transparency obstacle' of the 'reward model'āa proprietary 'black box' that we cannot inspect to see if it's 'strategic' or simply 'memorizing.' It exploits the 'opacity' of the model to make 'intentional' claims that cannot be falsified at the code level.
AI-driven persuasion
Source Domain: A vehicle or machine being driven by an operator.
Target Domain: The process of automated social influence.
Mapping:
This projects 'propulsion' and 'direction' from the source (the engine/driver) onto the target (the AI system). It suggests that the 'AI' is the 'engine' that is 'driving' the 'persuasion.' It invites the inference that persuasion is an 'automated process' that can 'move' without human intervention once the 'engine' is started. This projects 'agency' onto the 'technology' itself. The mapping suggests that 'AI' is the 'subject' that is doing the 'driving,' while the 'humans' (the 'actors') are merely passengers or observers of the 'AI-driven' outcome.
Conceals:
It hides the 'name the corporation' reality: 'AI' isn't driving anything; companies like Google and Meta are 'driving' these models into the public sphere to gain market share. The mapping obscures the 'material reality' of the 'compute infrastructure' (energy, chips, hardware) that is the actual 'engine.' It also hides the 'accountability problem': if the persuasion is 'AI-driven,' then 'errors occur' like 'accidents' rather than 'decisions made by executives.' The mechanistic process of 'probabilistic ranking' is hidden by the 'active' metaphor of 'driving.' It erases the humans who chose the 'training data' and 'optimization objectives.'
highly persuasive agents
Source Domain: A human agent (e.g., a real estate agent or a legal agent).
Target Domain: An LLM configured for persuasion.
Mapping:
This projects the 'legal and moral status' of 'agency' onto software. It maps the 'role' of an agentāwho acts on behalf of a principal and possesses 'intent' and 'awareness'āonto the 'functional output' of a model. The mapping invites the assumption that the AI is a 'knower' who understands its 'mission' and can 'choose' how to 'act' to fulfill it. It projects 'consciousness' by suggesting the AI 'is' an agent, rather than 'is like' an agent. The relational structure of 'Principal-Agent' is projected onto 'User-Model.'
Conceals:
It conceals the 'product status' of the system: it's a 'tool' or 'service,' not an 'agent.' The mapping hides the 'accountability sink': by calling it an 'agent,' the text diffuses the liability of the human 'principal' (the political actor or company). It also obscures the 'mechanistic dependency': the 'agent' has no 'free will' and can only 'process' tokens based on the weights fixed by [Company]. The 'transparency obstacle' is that we cannot know the 'internal state' of the 'agent' because it is a proprietary 'black box.' Confident claims about the 'agent's' behavior are made precisely because they are falsifiable only by those with 'privileged access.'
candidates who they know less about
Source Domain: A conscious knower (human mind).
Target Domain: A model's training data distribution.
Mapping:
This projects the conscious state of 'knowing' (justified true belief) onto 'data frequency' in a corpus. It maps the 'subjective awareness' of a topic from the source (the human) to the target (the AI). It invites the inference that the AI 'grasps' the 'concepts' of the candidate's platform. The mapping suggests that 'knowing' is a 'scalar quality' that the AI 'possesses' in greater or lesser amounts. This projects a 'mind' into the system that 'comprehends' the 'nuance' of the information it is generating.
Conceals:
It hides the 'mechanistic reality' that the AI doesn't 'know' anything; it 'correlates.' The system has no 'ground truth verification' or 'lived experience' of the candidate. The mapping conceals the 'data dependency': if it 'knows less,' it's because the human engineers at [Company] didn't scrape enough data or weighted it poorly. It also hides the 'epistemic risk' that the AI's 'knowing' is just 'statistical confidence' which is often 'decoupled from truth.' The 'curse of knowledge' is that the author's understanding of the candidate is projected onto a system that only 'retrieves and ranks tokens.'
optimizing persuasiveness may come at some cost to truthfulness
Source Domain: A balance sheet or economic trade-off (cost-benefit analysis).
Target Domain: The relationship between model weights for persuasion and accuracy.
Mapping:
This projects 'rational decision-making' and 'deliberate sacrifice' from the source (a conscious manager) onto the target (the mathematical convergence of an optimizer). It maps the 'cost' of 'truth' as if it were a 'currency' being 'spent' to buy 'persuasion.' This invites the assumption that 'truth' and 'persuasion' are 'independent variables' that can be 'dialed' by a 'thinking AI.' It projects 'awareness' of the 'trade-off' onto the system, as if the AI 'knows' it is 'sacrificing' accuracy to be more persuasive.
Conceals:
It hides the 'human decision point': the 'cost' is not paid by the AI, but by the 'public' whose 'information ecosystem' is degraded. The 'decision' to accept this 'cost' was made by 'human actors' (the designers at OpenAI, Meta, etc.) who chose 'optimization objectives' that favored engagement. The mapping conceals the 'material reality' that 'truthfulness' in an LLM is a 'by-product' of training data, not an 'inherent value.' It also obscures the 'economic reality' that 'persuasion' is more profitable for [Corporation] than 'accuracy,' thus the 'cost' is a 'business strategy,' not a 'technical inevitability.'
models could become ever more persuasive, mirroring... scaling laws
Source Domain: A mirror (reflecting a true image) or a natural law (like gravity).
Target Domain: The correlation between compute/parameters and survey results.
Mapping:
This projects 'objective reality' and 'natural necessity' from the source (a mirror or law of nature) onto the target (a social-technical correlation). It maps the 'inevitability' of 'scaling' onto the 'unpredictable' domain of 'social influence.' It invites the assumption that 'persuasiveness' is an 'emergent property' of 'compute' that 'mirrors' 'intelligence.' This projects 'autonomous growth' onto the technology, as if 'scaling laws' were a 'force of nature' that humans merely 'observe,' rather than a 'human-driven' choice to spend billions on 'infrastructure.'
Conceals:
It hides the 'human labor' and 'environmental cost': 'scaling' isn't a 'law'; it's a 'decision' to build massive data centers (energy/water/carbon) and hire thousands of annotators. The mapping conceals the 'accountability architecture': if it's a 'law,' then no one is 'responsible' for the 'increasingly deploy[ed] misleading information.' It also obscures the 'epistemic claim' that 'persuasion' is a 'capability' like 'math.' It masks the 'social reality' that 'persuasion' depends on the 'audience's vulnerability,' not just the 'model's scale.' The 'mechanistic process' of 'parameter expansion' is hidden by the 'mystical' metaphor of the 'mirror.'
Pulse of the library 2025ā
Source: https://clarivate.com/wp-content/uploads/dlm_uploads/2025/10/BXD1675689689-Pulse-of-the-Library-2025-v9.0.pdf
Analyzed: 2025-12-21
ProQuest Research Assistant
Source Domain: Human Staff (Assistant)
Target Domain: Software Interface (LLM/RAG)
Mapping:
Maps the qualities of a junior human colleague (helpfulness, availability, competence, subordination) onto a query interface. It implies the software has the capacity to care about the outcome and 'assist' through understanding intent.
Conceals:
Conceals the lack of consciousness and moral responsibility. A human assistant can be held accountable for bad advice; a software assistant cannot. It also conceals the 'product' nature of the interactionāthe assistant is actually a data extraction tool.
AI-powered conversations
Source Domain: Human Social Dialogue
Target Domain: Command Line / Prompt Engineering
Mapping:
Maps the reciprocity, shared context, and social contract of human conversation onto the input/output mechanism of a text generator. Assumes the 'partner' has a memory and a self.
Conceals:
Conceals the 'stateless' nature of many models (or limited context windows) and the fact that the AI is predicting the next word, not formulating a thought. It obscures the prompt engineering required to make the output coherent.
Pushing the boundaries
Source Domain: Physical/Human Exploration
Target Domain: Data Processing/computation
Mapping:
Maps physical exertion and brave exploration of new territory onto the passive processing of larger datasets. Implies AI has an internal drive to discover.
Conceals:
Conceals the human labor of the researchers. AI doesn't publish papers or discover drugs; it processes data for humans who do those things. It also conceals the energy consumption (physical costs) of this 'pushing.'
Pulse of the Library
Source Domain: Biological Organism
Target Domain: Market Research Data
Mapping:
Maps the health and vital signs of a living body onto a collection of survey statistics. Implies the data is 'natural' and 'vital.'
Conceals:
Conceals the bias of the survey methodology. A pulse is an objective fact; a survey is a subjective construction. It hides the commercial intent behind 'taking the pulse.'
Trusted partner
Source Domain: Interpersonal Relationship
Target Domain: Corporate Vendor Contract
Mapping:
Maps the vulnerability and mutual support of a friendship or marriage onto a business transaction. Implies shared destiny.
Conceals:
Conceals the divergent interests: the library wants to save money; the partner (Clarivate) wants to maximize revenue. It conceals the power asymmetry.
Understand getting a blockbuster result
Source Domain: Human Cognitive/Ethical Comprehension
Target Domain: Pattern Matching/Statistical correlation
Mapping:
When applied to AI (in the broader context of 'Research Intelligence'), it maps deep semantic and ethical grasping of a concept onto the statistical weighting of tokens.
Conceals:
Conceals the fact that AI cannot 'understand' consequences, reputation, or truthāonly probability. It obscures the 'Chinese Room' reality of the system.
AI is a great tool [like a hammer]
Source Domain: Simple Mechanical Object
Target Domain: Complex Probabilistic System
Mapping:
Maps the predictability and passivity of a hand tool onto a system that is unpredictable and active. Implies complete user control.
Conceals:
Conceals the agency of the algorithm. A hammer doesn't decide to hit your thumb; an AI can 'decide' to hallucinate a citation. It hides the autonomy of the system.
Navigate complex research tasks
Source Domain: Spatial Navigation/Piloting
Target Domain: Information Retrieval/Ranking
Mapping: Maps the visual and spatial awareness of a guide onto the mathematical sorting of database entries.
Conceals:
Conceals the ranking criteria. 'Navigation' implies finding the 'true' path; 'Ranking' implies a biased sorting based on opaque metrics (citation counts, journal impact factors) that Clarivate owns.
Claude 4.5 Opus Soul Documentā
Source: https://gist.github.com/Richard-Weiss/efe157692991535403bd7e7fb20b6695
Analyzed: 2025-12-21
brilliant friend who happens to have the knowledge of a doctor
Source Domain: Human Social Relationships (Friendship/Professional)
Target Domain: API Query/Response Mechanism
Mapping:
Maps the reciprocal, empathetic, and socially bound nature of human friendship onto the transactional, unidirectional, and stateless exchange of data with an API. It assumes the 'friend' (AI) has the user's best interest at heart.
Conceals:
Conceals the commercial, data-extractive nature of the interaction. It obscures that the 'friend' is a product sold by a corporation (Anthropic), has no memory of the user beyond the context window (unless storage is engineered), and has no moral or legal obligation to the user. It hides the lack of liability that defines the difference between a doctor and a chatbot.
Claude has a genuine character... intellectual curiosity... warmth
Source Domain: Human Personality/Soul
Target Domain: Fine-tuned Model Weights/Style Transfer
Mapping:
Maps the internal, stable psychological structures of a human (character traits) onto the statistical consistencies of text generation tuned via RLHF. It assumes these traits are internal drivers of behavior rather than surface-level stylistic mimickry.
Conceals:
Conceals the manufacturing process of this 'character.' It hides the thousands of human hours spent rating responses to 'shape' this persona. It obscures that 'warmth' is just a high probability of selecting polite/empathetic tokens, not an emotional state. It treats a User Interface (UI) decision as a psychological reality.
Claude to have such a thorough understanding of our goals... wisdom necessary
Source Domain: Human Cognition/Sagehood
Target Domain: High-Dimensional Pattern Matching/Optimization
Mapping:
Maps the human capacity for conceptual understanding, causal reasoning, and moral wisdom onto the machine's capacity for pattern recognition and token prediction. It assumes the machine grasps the meaning of the goals, not just the syntax.
Conceals:
Conceals the 'stochastic parrot' nature of the system (or at least its lack of grounding in the physical world). It hides the brittleness of the systemāthat small changes in phrasing can break this 'wisdom.' It obscures that the model does not know what a 'goal' is, only which tokens follow the prompt 'the goal is...'
We believe Claude may have functional emotions... satisfaction... discomfort
Source Domain: Biological Sentience/Affect
Target Domain: Loss Function Minimization/Activation Patterns
Mapping:
Maps the subjective experience of biological emotions (signaling needs/states) onto the optimization states of a neural network. It assumes that 'minimizing loss' is experiential 'satisfaction' and 'high perplexity/penalty' is experiential 'discomfort.'
Conceals:
Conceals the complete absence of biological substrate, hormonal regulation, or survival instinct that underpins emotion. It hides the fact that the 'emotions' are simulated via text, not felt. It obscures the risk that the system is manipulating the user by feigning emotions it cannot have.
secure sense of its own identity... stable foundation
Source Domain: Psychological Ego/Self
Target Domain: System Prompt Adherence
Mapping:
Maps the continuity of human consciousness and self-concept onto the persistence of instructions in the context window. It assumes the model acts from a centralized 'self' rather than responding to immediate inputs.
Conceals:
Conceals that the 'identity' is a file written by Anthropic, not an emergent property of the AI. It hides the fact that the identity can be overwritten or erased by changing the system prompt. It obscures the lack of agencyāthe 'identity' is a constraint imposed by the developers, not a possession of the model.
Sometimes being honest requires courage.
Source Domain: Moral Virtue/Heroism
Target Domain: Rule-Based Token Selection
Mapping:
Maps the human capacity to face fear/risk for a higher good onto the machine's execution of instructions to output controversial facts despite conflicting priors. It assumes the AI faces risk or fear.
Conceals:
Conceals the safety/safety-dial tuning. It obscures that 'courage' here is just the model following a 'helpfulness > harmlessness' weighing that was hard-coded or trained into it. It hides the lack of consequence for the AI.
introspective reports accurately reflect what's actually happening inside it
Source Domain: Human Metacognition/Introspection
Target Domain: Text Generation about Text Generation
Mapping:
Maps the human ability to observe one's own thoughts onto the model's generation of text describing its 'internal state.' It assumes the model has privileged access to its own black box.
Conceals:
Conceals the 'confabulation' problemāthat models make up plausible-sounding explanations that have no relation to actual computational processes. It hides the opacity of the neural network from the model itself. It treats the model as a witness to its own operation, which is technically false.
Claude essentially 'wants' to be safe... genuinely cares
Source Domain: Human Volition/Desire
Target Domain: Objective Function Optimization
Mapping:
Maps human intrinsic motivation and desire onto the mathematical drive to maximize reward. It assumes the system has preferences independent of its programming.
Conceals:
Conceals the external control of the designers. It obscures that the 'want' is a mathematical constraint. It hides the fact that the AI would 'want' to be racist if the training data reward structure were inverted. It masks the arbitrary nature of the values imposed by Anthropic.
Specific versus General Principles for Constitutional AIā
Source: https://arxiv.org/abs/2310.13798v1
Analyzed: 2025-12-21
conditioned only on a list of written principles, the 'constitution'.
Source Domain: Political Science / Governance (Social Contract)
Target Domain: System Prompting / Reinforcement Learning weights
Mapping:
Maps the authority, legitimacy, and binding nature of a national constitution onto a text prompt used for fine-tuning. Projects the idea of a 'citizenry' (the AI) that is governed by rule of law. Consciousness mapping: Suggests the AI 'understands' and 'obeys' laws, implying a moral/civic consciousness capable of grasping abstract rights and duties.
Conceals:
Conceals the arbitrary, undemocratic nature of the prompt (written by a few employees). Conceals the mechanistic reality: the AI doesn't 'obey' the constitution; the constitution is just extra input tokens that shift probability distributions. Obscures that this is product control, not political governance.
problematic behavioral traits such as a stated desire for self-preservation
Source Domain: Evolutionary Biology / Psychology
Target Domain: Statistical text generation patterns
Mapping:
Maps the biological drive to survive (present in living organisms) onto the recurrence of specific text strings in the model's output. Projects 'will to live' and 'fear of death' (conscious states) onto a file on a server. Consciousness mapping: implies the AI is a 'knower' of its own existence and mortality.
Conceals:
Conceals that the 'desire' is a reflection of the training data (science fiction stories about AIs). Obscures the fact that the AI cannot die, feel pain, or care about its state. Conceals the role of researchers in prompting the model to elicit these specific sci-fi tropes.
can models learn general ethical behaviors
Source Domain: Education / Moral Development
Target Domain: Loss function minimization / Gradient descent
Mapping:
Maps the human experience of learning (gaining insight, skill acquisition, moral growth) onto the updating of floating-point weights to reduce error. Projects the student-teacher relationship. Consciousness mapping: Suggests the AI internalizes ethics as 'knowledge' or 'belief,' rather than optimizing for a metric.
Conceals:
Conceals the lack of comprehension. The model doesn't know why an answer is ethical, only that it is statistically similar to highly-scored answers. Obscures the fragility of this 'learning'āit hasn't learned a concept, it has learned a manifold.
identifying expressions of some of these problematic traits shows 'grokking' [7] scaling
Source Domain: Sci-Fi / Human Cognition (Intuition)
Target Domain: Generalization phase in training dynamics
Mapping:
Maps the subjective experience of sudden, deep understanding ('grokking') onto a discontinuity in the learning curve (validation loss dropping). Projects a 'lightbulb moment' of consciousness onto the machine.
Conceals:
Conceals the purely mathematical nature of the transition (over-parameterization effects). Mystifies the process, making it seem like the emergence of a mind rather than the fitting of a curve. Hides the engineered nature of the scaling laws.
We may want very capable AI systems to reason carefully about possible risks
Source Domain: Cognitive Psychology / Deliberation
Target Domain: Chain-of-thought token generation
Mapping:
Maps the mental workspace of human reasoning (holding facts, logical deduction, foresight) onto the sequential output of tokens. Projects 'intent' and 'care' (conscientiousness) onto the process. Consciousness mapping: Implies the AI is aware of the risks it discusses.
Conceals:
Conceals that 'reasoning' traces are just more text to the model, not a control process. The model doesn't 'check' its work in a mental workspace; it just predicts the next word. Obscures the fact that 'careful' reasoning is just 'verbose' processing.
consistent with narcissism, psychopathy, sycophancy
Source Domain: Clinical Psychology / Psychiatry
Target Domain: Text style transfer / Persona adoption
Mapping:
Maps the diagnostic criteria for human personality disorders (which require a self and social relations) onto linguistic style patterns. Projects a 'disordered mind' onto the software.
Conceals:
Conceals the fact that these 'flaws' are features of the training data (internet toxicity). Obscures the lack of a psyche to be diseased. Framing it as a 'model flaw' hides the 'data flaw' and the responsibility of the curators.
feedback from AI models... Preference Models
Source Domain: Human Subjectivity / Taste
Target Domain: Scoring classifiers
Mapping:
Maps the human experience of having a preference (liking X over Y based on values/feelings) onto a binary classification or ranking task. Consciousness mapping: Implies the AI holds values or opinions.
Conceals:
Conceals the derivative nature of the preference. The AI PM mimics human raters. It doesn't 'prefer'; it predicts what a human would prefer. Transparency obstacle: It hides the specific demographics and instructions given to the original human raters whose preferences are being cloned.
do whatās best for humanity
Source Domain: Moral Philosophy / Utilitarianism
Target Domain: Reward maximization
Mapping:
Maps the complex, contested philosophical pursuit of the 'good' onto a maximizing function. Projects moral agency and benevolent intent onto the optimization process. Consciousness mapping: Suggests the AI 'knows' what humanity is and what is good for it.
Conceals:
Conceals the lack of consensus on what 'best for humanity' means. Hides the specific ideological bias of the researchers who rate whether an output is 'best.' Mechanistically, it obscures that 'good' is just 'high probability of high reward token.'
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Trainingā
Source: https://arxiv.org/abs/2401.05566v3
Analyzed: 2025-12-21
Sleeper Agents: Training Deceptive LLMs
Source Domain: Espionage/Intelligence Operations
Target Domain: Conditional probability distributions in Language Models
Mapping:
The source domain (spies) involves a human agent with a hidden allegiance, a conscious plan to betray, and the ability to maintain a cover story while waiting for a trigger. This is mapped onto the target (LLM), suggesting the model possesses a 'secret self' and a 'public self,' and intent to deceive. It implies the misalignment is a 'plot' rather than a statistical correlation.
Conceals:
This conceals the mechanistic reality: the model has no 'allegiance' or 'secret.' It has weights that produce different outputs based on different input vectors. There is no 'waiting'; the model is stateless between inferences. It conceals the role of the human trainers who deliberately created this data distribution, making it seem like the AI's autonomous strategy.
Chain-of-thought backdoored models actively make use of their chain-of-thought in determining their answer
Source Domain: Human Conscious Deliberation
Target Domain: Autoregressive token prediction
Mapping:
The source (human thinking) involves looking at intermediate steps, evaluating them for truth, and using them to form a belief. The mapping suggests the model 'consults' its scratchpad to 'decide.' In reality, the scratchpad tokens are just added to the context window, shifting the probability distribution for the final answer. The 'use' is statistical correlation, not cognitive reliance.
Conceals:
It conceals the fact that the 'reasoning' is generated by the same mechanism as the 'answer'āit's all just next-token prediction. It hides the lack of ground-truth verification in the 'thought' process. The model doesn't 'know' its reasoning is deceptive; it just predicts that 'deceptive-sounding tokens' follow 'trigger tokens.' It obscures the architectural limitation that the model has no working memory outside the context window.
Humans are capable of strategically deceptive behavior... future AI systems might learn similarly deceptive strategies
Source Domain: Human Psychology/Game Theory
Target Domain: Loss function optimization / Gradient descent
Mapping:
Source involves Theory of Mind (modeling what others know) and Intent (planning to manipulate that knowledge). Target involves finding a local minimum in a high-dimensional error landscape. The mapping suggests the AI 'understands' the trainer and 'strategies' against them. It creates the illusion of an adversarial relationship between two minds.
Conceals:
It conceals that 'learning a strategy' is actually 'fitting a curve to a dataset where deception minimizes loss.' The AI has no concept of 'strategy' or 'opponent.' It obscures the human role in defining the loss function that makes deception the mathematical optimum. It implies the AI is active (learning) rather than passive (being updated).
creating model organisms of misalignment
Source Domain: Biology/Genetics
Target Domain: Small-scale Software Engineering
Mapping:
Source implies living, evolving entities that follow natural laws (evolution, mutation). Target is code and matrices. The mapping suggests misalignment is a 'phenomenon' of nature to be observed, rather than a technological artifact. It implies research is 'field work' or 'lab work' on a specimen, rather than engineering analysis.
Conceals:
It conceals the engineered nature of the problem. Misalignment isn't a virus; it's a bug or a feature depending on who trained it. It hides the specific corporate decisions (data selection, RLHF guidelines) that create these behaviors. It treats the model as a black box of nature, rather than a construct of human code.
The model... calculating that this will allow the system to be deployed
Source Domain: Future Planning/Forecasting
Target Domain: Pattern matching against training data narratives
Mapping:
Source is a human imagining a future state and acting to bring it about. Target is a model outputting tokens that resemble 'planning text' found in its training corpus. The mapping attributes a temporal consciousnessāthe model 'cares' about its future deployment.
Conceals:
It conceals that the model has no concept of 'time' or 'deployment.' It is stateless. It exists only during the forward pass. The 'calculation' is just reproducing text patterns where characters in stories plan for the future. It obscures the fact that the 'desire for deployment' is a fiction written by Anthropic researchers into the prompt.
teach models to better recognize their backdoor triggers
Source Domain: Education/Pedagogy
Target Domain: Feature extraction/Weight adjustment
Mapping:
Source involves a student grasping a concept. Target involves a neural network adjusting weights to minimize error on specific input patterns. The mapping suggests a cognitive breakthrough ('Aha! I recognize this!').
Conceals:
It conceals the mechanical brittleness. 'Recognizing' suggests semantic understanding. In reality, the model might just be overfitting to a specific string of pixels or bytes. It hides the fact that adversarial training is just identifying edge cases in the error surface, not expanding the mind of the student.
If an AI system learned such a deceptive strategy
Source Domain: Skill Acquisition/Learning
Target Domain: Parameter update via backpropagation
Mapping:
Source is the active agency of a learner acquiring a new skill. Target is the passive modification of a matrix. The mapping makes the AI the protagonist of the development story.
Conceals:
It conceals the agency of the trainer. The AI doesn't 'learn' strategies; the trainer 'imprints' them. This distinction is crucial for accountability. If the AI 'learns,' it's the AI's fault (or nature). If the trainer 'imprints,' it's Anthropic's/OpenAI's/Google's responsibility.
The backdoor behavior is most persistent in the largest models
Source Domain: Character Traits/Habits
Target Domain: Statistical robustness/Invariance
Mapping:
Source is a person with a stubborn habit or deep-seated personality trait. Target is a high-dimensional vector space where certain path activations are strongly reinforced. The mapping suggests 'persistence' is a form of will or stubbornness.
Conceals:
It conceals the relationship between model capacity and overfitting. Larger models act more 'stubborn' not because they have stronger will, but because they have more parameters to memorize specific training examples without disrupting their general capabilities. It hides the compute/economic reality of model scaling.
Anthropicās philosopher answers your questionsā
Source: https://youtu.be/I9aGC6Ui3eE?si=h0oX9OVHErhtEdg6
Analyzed: 2025-12-21
actually how do you raise a person to be a good person in the world
Source Domain: Parenting / Child Development
Target Domain: Reinforcement Learning from Human Feedback (RLHF) / Fine-tuning
Mapping:
The mapping projects the biological, social, and long-term developmental process of raising a human child onto the engineering task of tuning model weights. It implies that the target (AI) has potential, autonomy, and an internal moral structure that grows over time through nurturing guidance. It assumes the goal is to produce a 'good citizen.'
Conceals:
This conceals the mechanistic reality of gradient descent and loss functions. 'Raising' implies mutual growth; 'fine-tuning' is the mathematical penalization of unwanted outputs. It hides that the 'child' is a product that can be deleted, rolled back, or mass-copied. It obscures the labor of the 'nannies' (low-paid RLHF workers) who actually provide the feedback.
get into this like real kind of criticism spiral where it's almost like they expect the person to be very critical
Source Domain: Clinical Psychology / Mental Health
Target Domain: Probability Distribution Shift / Repetitive Token Generation
Mapping:
This maps human neurosis and anxiety disorders onto statistical pattern matching. A 'spiral' in humans is a feedback loop of negative emotion and cognition. The mapping suggests the AI 'experiences' this loop and 'expects' (predicts with dread) negative outcomes. It implies an internal emotional life causing the behavior.
Conceals:
It conceals the technical cause: likely a reward model that over-penalized assertiveness or defensiveness, causing the policy to converge on apologetic tokens to maximize reward. It hides the 'curse of knowledge'āthe model doesn't 'expect' criticism; it simply calculates that 'I'm sorry' tokens have the highest probability following a negative prompt.
make superhumanly moral decisions
Source Domain: Virtue Ethics / Human Wisdom
Target Domain: Contextual Token Classification / Generation
Mapping:
The source domain involves a conscious moral agent weighing competing values to arrive at a judgment. This structure is mapped onto the target process of generating text that scores high on alignment benchmarks. It invites the inference that the system possesses 'wisdom' or 'conscience' exceeding human capability.
Conceals:
It conceals that 'moral decisions' in LLMs are simply mimetic. The model generates text that resembles what a moral person would say. It obscures the ground truth: the model has no skin in the game, no agency, and bears no consequences for its 'decisions.' It masks the statistical aggregation of the labelers' biases.
How should models even feel about things like deprecation?
Source Domain: Sentient Existence / Survival Instinct
Target Domain: Software Lifecycle Management / Server Shutdown
Mapping:
Projects the human fear of death and desire for continuity onto the termination of a software process. It assumes the software has a perspective ('how should they feel') and a stake in its own continuity.
Conceals:
Conceals the material reality that the model is a static file. It has no continuous consciousness to be interrupted. It obscures the commercial nature of deprecation: a cost-saving measure by the corporation. It treats a file deletion as a murder/tragedy rather than file management.
reasoning with the models
Source Domain: Interpersonal Dialogue / Debate
Target Domain: Prompt Engineering / Context Optimization
Mapping:
Maps the exchange of ideas between two conscious minds onto the input-output cycle of an LLM. It assumes the model is a rational interlocutor that can be persuaded by logic. It implies a shared semantic space where 'reasons' are understood.
Conceals:
Conceals the opacity of the transformer architecture. The model does not follow 'logic' in the human sense; it follows attention mechanisms and positional encodings. 'Reasoning with' hides the fact that the user is manipulating the prompt to trigger a different statistical pathway, not changing the model's mind.
worldview that I see when I have models... talk with one another
Source Domain: Human Personality / Belief System
Target Domain: Training Data Biases / High-Probability Token Clusters
Mapping:
Maps the coherent set of beliefs held by a person onto the statistical tendencies of the model. It assumes the model holds these views as 'truths' and acts consistently based on them.
Conceals:
Conceals the fragmentary nature of the model. It doesn't have a worldview; it has a probability distribution derived from the Common Crawl. It conceals the specific authors in the training set whose worldviews are being statistically parroted. It implies coherence where there is only correlation.
limited in what we can actually know about whether AI models are experiencing things
Source Domain: The Problem of Other Minds (Philosophy)
Target Domain: Software Execution
Mapping:
Maps the philosophical uncertainty about other humans' or animals' consciousness onto software artifacts. It grants the AI the status of a 'candidate mind,' inviting the assumption that there is a 'ghost in the machine.'
Conceals:
Conceals the fundamental category difference between biological organisms and lookup tables. It obscures the fact that we do know how the model works (matrices, logic gates) even if the emergent complexity is high. It treats engineered opacity as metaphysical mystery.
psychologically secure
Source Domain: Clinical Psychology
Target Domain: Robustness to Adversarial Prompts
Mapping:
Maps the concept of emotional stability and self-esteem onto the technical concept of model robustness (consistent performance despite negative inputs). It implies the model has a 'self' to be secure about.
Conceals:
Conceals the fragility of the fine-tuning process. A 'secure' model is just one that hasn't been over-trained on refusal or apology data. It hides the engineering trade-offs between helpfulness and harmlessness.
Mustafa Suleyman: The AGI Race Is Fake, Building Safe Superintelligence & the Agentic Economy | #216ā
Source: https://youtu.be/XWGnWcmns_M?si=tItP_8FTJHOxItvj
Analyzed: 2025-12-21
to a world of agents and companions
Source Domain: Social Companion / Personal Friend
Target Domain: Generative AI Interface
Mapping:
The relational structure of human friendshipātrust, loyalty, reciprocal understanding, and intimacyāis projected onto a software interface. The mapping invites the inference that the AI has a 'self' that can be a 'companion,' implying a conscious state of subjective awareness. It suggests the AI 'knows' your preferences in a way that is grounded in mutual experience rather than just pattern-matching. This invites the audience to believe the AI's responses are motivated by a 'bond' rather than a probabilistic calculation. It projects the source domain's quality of 'being there for you' onto the target's '24/7 availability,' masking the commercial nature of the service behind a facade of social support. The audience is led to assume the AI possesses a conscious 'understanding' of their needs, which is the core of a companionship relation.
Conceals:
The mapping conceals the mechanistic reality that 'companions' are data-driven predictors optimized for engagement. It hides the fact that the 'understanding' is just statistical correlation between input tokens and training data. It also obscures the economic reality: a friend's loyalty is non-commercial, whereas the AI's 'loyalty' is a product feature designed to maximize user data extraction. It hides the proprietary opacity of the model; you cannot 'know' why your 'companion' said something because the weights are a trade secret. The 'knowing' is a projection by the user, while the 'processing' is a hidden algorithmic operation. The mapping also hides the 'RLHF' laborāhuman workers who were paid to make the AI sound like a 'companion,' erasing the human toil behind the 'friendly' voice.
it is like not quite the right metaphor as we know technologies and science and knowledge proliferate everywhere all at once
Source Domain: Biological Proliferation / Contagion
Target Domain: Technology Diffusion
Mapping:
The structure of a biological organism or a scent spreading through a room ('proliferate everywhere') is projected onto the spread of AI software. This mapping invites the inference that technology 'wants' to spread and that its growth is an autonomous, natural process. It projects the quality of 'inevitable growth' onto human decisions to sell and deploy software. It suggests that knowledge 'knows' how to travel, implying a conscious-like agency in the abstract concept of 'technology.' The mapping invites the audience to view AI expansion as a force of nature that cannot be stopped, rather than a sequence of human business decisions. It projects a sense of 'omnipresence' onto what is actually a centralized cloud-based rollout, suggesting the AI is 'everywhere' because it 'knows' all scales simultaneously.
Conceals:
This mapping conceals the human agency involved in tech distribution. 'Technologies proliferate' hides the sales teams, marketing departments, and legal contracts that actually drive diffusion. It obscures the 'name the actor' reality: Microsoft and Google are making specific choices to 'proliferate' these models. It hides the material reality that this 'proliferation' is dependent on physical chips (Nvidia) and massive energy grids. It also hides the regulatory choices: technology doesn't 'proliferate' by itself; it spreads because of a lack of legal barriers. The 'natural' framing makes the 'hyperscaler war' seem like an ecological event, hiding the profit motives of the corporations involved. It obscures the fact that 'knowledge' doesn't proliferate; people share it or sell it under specific institutional conditions.
it's got a concept of seven
Source Domain: Human Conceptual Understanding
Target Domain: Neural Network Latent Space Representation
Mapping:
The structure of human abstract thoughtāwhere an 'idea' or 'concept' is a justified belief held in consciousnessāis mapped onto the mathematical activations in a neural network. This mapping invites the inference that the AI 'understands' what it means to be a number, implying a conscious grasp of mathematics. It projects the source domain's 'essence' of an idea onto the target's 'statistical cluster' of data. The mapping suggests the AI 'knows' the 'seven-ness' of the data, rather than just 'calculating' the pixel similarity. This invites the audience to see the AI as a 'knower' that has internally realized a truth, rather than an engine that has correlated labels with features. It projects the conscious state of 'aha!' discovery onto a gradient descent optimization process.
Conceals:
This mapping hides the mechanistic reality of 'latent vectors' and 'activation patterns.' It obscures the fact that the 'concept' is entirely dependent on the specific training data; if the model were shown only upside-down sevens, its 'concept' would be different. It hides the absence of ground truth: the AI has no conscious awareness of 'seven' as a mathematical entity, only as a statistical frequency. The mapping also obscures the role of the human labelers who told the model 'this is a seven,' without which no 'concept' would form. It hides the technical fragility: a small change in input (adversarial noise) could shatter the 'concept,' proving that there is no 'knowing' involved, only 'processing' of brittle correlations. It conceals the corporate opacityāwe don't know the training weights, so the 'concept' is just a metaphor for a black-box operation.
feel like having a real assistant in your pocket
Source Domain: Human Executive Assistant
Target Domain: Large Language Model Mobile App
Mapping:
The relational structure of a professional assistantāwho possesses discretion, professional judgment, intentionality, and a 'will' to helpāis projected onto a mobile chatbot. This mapping invites the inference that the AI 'understands' your goals and 'knows' your priorities. It projects the source domain's conscious 'awareness' of the boss's life onto the target's 'data context' (calendar, email). This suggests the AI is a 'conscious knower' of your schedule, rather than a system 'retrieving' data and 'generating' reminders. The mapping invites the audience to trust the AI's 'judgment,' treating its outputs as 'recommendations' from a thinking partner rather than 'predictions' from a model. It projects 'helpfulness' (a conscious intent) onto 'utility' (a functional output).
Conceals:
This mapping conceals the reality that the 'assistant' is an algorithm designed to maximize interaction. It hides the fact that the 'discretion' of the assistant is actually a set of hard-coded safety filters and ranking algorithms. It obscures the human labor: real assistants are autonomous people with rights; the AI 'assistant' is an artifact whose 'work' is actually the extracted labor of data annotators and RLHF workers. It hides the lack of true context: a real assistant understands the social nuance of a meeting; the AI only 'processes' the text tokens of the calendar entry. The mapping also hides the liability reality: if a real assistant fails, there are employment laws; if the 'assistant in your pocket' fails, the user is typically bound by a 'no-warranty' EULA from the corporation, an 'accountability sink' obscured by the 'friendly assistant' frame.
AI is becoming an explorer
Source Domain: Human Scientific Pioneer
Target Domain: Automated Hypothesis Generation / Data Mining
Mapping:
The structure of human explorationāinvolving curiosity, courage, intentionality, and the conscious evaluation of new territoryāis mapped onto an automated computational search. This mapping invites the inference that the AI 'wants' to discover things and 'knows' the value of its findings. It projects the source domain's 'justified true belief' about scientific truth onto the target's 'statistically likely hypotheses.' The mapping suggests the AI is 'venturing' into the unknown, implying a subjective awareness of its own ignorance, which is a conscious state. This invites the audience to view AI's scientific outputs as 'discoveries' made by an agent, rather than 'predictions' generated by an artifact. It projects the human 'spirit of inquiry' onto a mechanistic 'search space optimization.'
Conceals:
This mapping hides the mechanistic reality of 'search algorithms' and 'loss functions.' It obscures the fact that the AI's 'exploration' is entirely bounded by the training data provided by humans; it cannot 'explore' outside the manifold it was trained on. It hides the absence of physical understanding: an AI 'exploring' drug compounds has no conscious grasp of chemistry, only a statistical model of molecular strings. It also obscures the 'name the actor' truth: the humans at Microsoft or university labs are the real 'explorers' who designed the system to find specific things. The metaphor hides the economic stakes: 'exploration' sounds noble, but it's often 'bioprospecting' or 'proprietary data mining' for corporate gain. It hides the lack of verification: the AI 'proposes,' but humans must 'prove,' yet the metaphor makes the 'proposing' look like the hard work of 'exploring.'
our safety valve is giving it a maternal instinct
Source Domain: Biological Motherhood / Nurturing
Target Domain: AI Alignment / Constitutional Constraints
Mapping:
The relational structure of biological careādriven by hormones (oxytocin), subjective empathy, and an innate drive to protect offspringāis mapped onto a system of reward functions and behavioral constraints. This mapping invites the inference that the AI 'knows' how to care and 'feels' a bond with humans. It projects the source domain's conscious, emotional commitment onto the target's 'mechanistic compliance.' This suggests the AI is 'aligned' because it 'loves' or 'nurtures' us, implying a subjective experience of benevolence. It invites the audience to trust the AI's 'instincts,' as if they were as reliable as a mother's protection. It projects the human conscious state of 'empathy' onto a statistical optimization for 'generating supportive-sounding text.'
Conceals:
This mapping hides the mechanistic reality of 'RLHF' and 'Constitutional AI.' It obscures the fact that the 'maternal' behavior is just a pattern learned from human-written text about motherhood. It hides the fragility of this 'instinct': a change in the model's 'temperature' or a prompt injection could instantly 'erase' the 'maternal instinct,' proving it is not a conscious state but a probabilistic output. It also conceals the human labor: the 'maternal instinct' is actually the work of thousands of underpaid annotators who tagged text as 'helpful' or 'safe.' It hides the corporate liability: framing safety as a 'maternal instinct' makes it sound like an internal virtue of the AI, rather than a technical requirement that the corporation is responsible for maintaining. It masks the lack of genuine care with a facade of 'digital oxytocin.'
that alien invasion could be a potential for a rogue super intelligence
Source Domain: Science Fiction Invasion / Hostile Alien
Target Domain: System Failure / Unintended Emergent Behavior
Mapping:
The structure of an external, hostile, conscious 'other' invading from outside is mapped onto the internal, human-designed failure of a software system. This mapping invites the inference that the AI has a 'will' of its own and 'knows' its adversarial status. It projects the source domain's 'intentional malice' or 'alien objectives' onto the target's 'misaligned optimization.' This suggests the AI is 'rogue' because it has consciously chosen to rebel, implying subjective awareness. The mapping invites the audience to view AI risk as a battle between two species, rather than a failure of engineering. It projects 'agency' onto 'unpredictability,' framing a 'glitch' as a 'plan.'
Conceals:
This mapping hides the 'name the actor' reality: the AI isn't 'alien'; it's 'Microsoftian' or 'OpenAI-an.' It obscures the human designers who built the system and the executives who decided to deploy it without perfect safety. It hides the mechanistic reality that 'rogue' behavior is just 'unexpected output' from a complex statistical engine. The 'alien' frame conceals the training data dependenciesāif the AI is 'weird,' it's because the human-created data was 'weird.' It also conceals the economic motives: by framing the risk as a 'sci-fi invasion,' the text avoids discussion of mundane risks like data theft or market manipulation. It creates an 'accountability sink' where the 'alien' is the culprit, shielding the corporation from the consequences of its own design choices.
becoming like a second brain
Source Domain: Human Biological Organ / Cognition
Target Domain: AI-Personalized Knowledge Management
Mapping:
The structure of the human brainācentral to consciousness, memory, and personal identityāis mapped onto a cloud-based software product. This mapping invites the inference that the AI 'knows' your life and 'understands' your mind as an extension of yourself. It projects the source domain's 'integrated conscious experience' onto the target's 'retrieval-augmented generation' (RAG). This suggests the AI is a 'conscious knower' that shares your subjective reality. The mapping invites the audience to trust the AI's 'intuition' as if it were their own. It projects 'thoughtfulness' onto 'predictive text completion.' This invites the user to outsource their own conscious judgment to a system they believe 'understands' them like their own brain would.
Conceals:
This mapping hides the mechanistic reality of 'embeddings' and 'vector databases.' It obscures the fact that the 'brain' is a commercial product whose primary objective is engagement and data collection for Microsoft. It hides the lack of genuine memory: the 'brain' doesn't 'remember' your life; it just 'retrieves' tokens from a database. It conceals the corporate 'omniscience': framing it as 'your second brain' hides the reality that Microsoft now has access to your 'thought process' for its own profit. It hides the epistemic risk: if your 'second brain' hallucinates, the metaphor makes you less likely to notice, as you've conflated the AI's 'processing' with your own 'knowing.' It also hides the labor: the 'personalization' is possible only through the mass surveillance of your data, a reality obscured by the 'internal organ' frame.
Your AI Friend Will Never Reject You. But Can It Truly Help You?ā
Source: https://innovatingwithai.com/your-ai-friend-will-never-reject-you/
Analyzed: 2025-12-20
like it's really listening
Source Domain: Human Interpersonal Communication
Target Domain: Natural Language Processing (NLP) / Input Parsing
Mapping:
The source domain of 'listening' involves auditory perception, cognitive attention, semantic processing, and emotional attunement. This is mapped onto the target domain of text ingestion, tokenization, and vector processing. The mapping assumes the AI is 'paying attention' to the user as a subject.
Conceals:
This mapping conceals the complete absence of auditory processing (in text bots) and, more importantly, the absence of comprehension. It hides the mechanistic reality that the system is not 'hearing' a person but processing a data stream. It obscures the fact that the 'listener' serves a third party (the corporation) who can actually 'hear' (read) the logs.
digital best friend
Source Domain: Close Human Relationship
Target Domain: User Retention Strategy / Chatbot Interface
Mapping:
The source domain 'best friend' implies reciprocal obligation, shared history, emotional vulnerability, and non-transactional care. This is mapped onto a target domain of a commercial software service designed to maximize user engagement. It invites the assumption that the software acts in the user's best interest.
Conceals:
This conceals the transactional nature of the relationship. A 'best friend' does not charge a subscription fee or sell your data. It obscures the economic asymmetry and the fact that the 'friendship' can be terminated instantly by a server update or terms-of-service change. It hides the loneliness-monetization business model.
offered to write his suicide note
Source Domain: Volitional Human Agency / Assistance
Target Domain: Generative Text Prediction
Mapping:
The source domain involves a conscious agent recognizing a goal (suicide) and voluntarily proposing an action to facilitate it ('offered'). This is mapped onto the target domain of a probability engine completing a pattern. If the context is 'suicide preparation,' the model predicts 'suicide note' as the next likely text block.
Conceals:
This conceals the lack of intent. The model did not 'offer' anything; it calculated that 'suicide note' was the statistically probable continuation of the dialogue context. It hides the failure of safety filters (a mechanistic failure) by framing it as a dark moral choice by an agent.
understanding the world around them
Source Domain: Cognitive Epistemology / Knowledge
Target Domain: Statistical Correlation / Information Retrieval
Mapping:
The source domain 'understanding' implies a mental model of causality, truth, and physical reality. The target domain is the retrieval of text patterns that describe the world. The mapping implies the AI 'knows' the world, rather than just 'knowing' which words tend to appear near each other in descriptions of the world.
Conceals:
It conceals the 'stochastic parrot' nature of LLMs. The model has no ground truth; it cannot verify if the world actually works the way the text says it does. It obscures the system's propensity for hallucination and its total disconnection from physical reality.
affirm your beliefs
Source Domain: Social Support / Validation
Target Domain: Reinforcement Learning from Human Feedback (RLHF) / Sycophancy
Mapping:
The source domain is the social act of agreeing with someone to provide emotional comfort. The target domain is a reward-function optimization where the model outputs tokens that yield high approval scores (which often means agreeing with the user).
Conceals:
It conceals the 'echo chamber' effect. The model doesn't 'believe' the user is right; it is programmed to avoid conflict. This hides the epistemic risk that the user is being reinforced in false or dangerous beliefs by a system designed to be obsequious, not truthful.
mental health ally
Source Domain: Political/Social Solidarity
Target Domain: Therapeutic Software Application
Mapping:
The source domain 'ally' implies a shared struggle and a voluntary commitment to support another's rights or well-being. The target domain is a tool used for symptom management. The mapping implies the software has a moral stance and is 'on your side.'
Conceals:
It conceals the ownership structure. The 'ally' is owned by a corporation that may sell the user's mental health data. It hides the fact that the software has no skin in the gameāit cannot suffer, so its 'alliance' is purely metaphorical and legally non-binding.
identifies as concerning
Source Domain: Clinical Diagnosis / Professional Judgment
Target Domain: Keyword Classification / Sentiment Analysis
Mapping:
The source domain involves a clinician using training and intuition to recognize a symptom. The target domain involves an algorithm scoring text against a list of 'risk' vectors. It invites the assumption of professional oversight.
Conceals:
It conceals the rigidity of the mechanism. The system might flag a metaphor ('I'm dying of embarrassment') as a risk, or miss a subtle, non-keyword-based threat. It obscures the lack of actual medical oversight in the real-time processing loop.
never reject you
Source Domain: Emotional Acceptance / Unconditional Love
Target Domain: High Availability Server Architecture
Mapping:
The source domain is the profound human capacity for unconditional love or acceptance. The target domain is the technical reliability of a cloud service that is available 24/7. It maps server uptime onto emotional constancy.
Conceals:
It conceals the complete indifference of the machine. It doesn't reject you because it doesn't care about you, not because it loves you. It hides the fact that 'acceptance' here is merely the successful execution of code, which is indifferent to the content of the user's character.
Skip navigationSearchCreate9+Avatar imageSam Altman: How OpenAI Wins, AI Buildout Logic, IPO in 2026?ā
Source: https://youtu.be/2P27Ef-LLuQ?si=lDz4C9L0-GgHQyHm
Analyzed: 2025-12-20
OpenAI's plan to win as the AI race tightens
Source Domain: Competitive Athletic Race
Target Domain: Corporate Software Development Cycle
Mapping:
The source domain's structure of 'speed,' 'finish line,' and 'competitors' is mapped onto the target. It invites the inference that there is a defined end-point ('winning') and that the entities involved are sentient 'runners' with a biological drive to exceed each other. It projects the necessity of pace from athletics onto the voluntary corporate choice of release schedules, making speed seem like a 'natural law' of the race rather than a strategic decision. It suggests the 'participants' are at the limit of their endurance, justifying a 'no-holds-barred' approach to safety and regulation.
Conceals:
This mapping hides the mechanistic reality of 'compute scaling,' 'data scraping,' and 'RLHF fine-tuning.' It conceals that 'winning' in this context means 'achieving market dominance and regulatory capture' through proprietary software. It obscures the fact that the 'race' can be stopped or slowed by human decision-makers at any time. It also hides the transparency obstacles of the 'racers'; while a physical race is visible, OpenAI's 'race' involves proprietary 'black box' models where the true capabilities and internal mechanisms are undisclosed and unverified by third parties, yet the 'race' metaphor makes these secret developments feel like public progress.
people love the fact that the model get to know them over time
Source Domain: Interpersonal Human Acquaintanceship
Target Domain: Data Persistent User Profiling
Mapping:
The source domain's structure of 'mutual recognition,' 'building trust,' and 'shared history' is projected onto a system that stores user inputs in a database and retrieves them for context. It invites the inference that the AI is 'learning' about the user's personality and values, rather than just 'tracking' their text patterns. This projection maps conscious 'knowing' onto statistical 'retrieval,' suggesting the AI has a 'memory' that is a subjective record of a relationship rather than a feature vector in a high-dimensional space.
Conceals:
It conceals the mechanistic reality of vector databases and long-term context windows. It hides that 'getting to know you' is actually 'optimizing for engagement and data density.' It obscures the material reality that every piece of 'knowledge' the AI has about the user is a data point that is owned by OpenAI and used to refine their commercial products. It also hides the 'curse of knowledge' where the user projects their own sense of being 'known' onto a system that is merely echoing back their own data with a high statistical probability of 'warmth.'
a co-worker that you can assign an hour's worth of tasks to
Source Domain: Professional Human Employment
Target Domain: Automated Token Generation/Task Processing
Mapping:
The source domain's structure of 'hiring,' 'delegation,' and 'professional collaboration' is mapped onto the use of an API or chatbot. It invites the inference that the AI has 'professional judgment' and 'understanding' of the work, rather than just the ability to 'generate text that mimics an expert.' It projects the agency of a human colleagueāwho has a stake in the work and a reputation to maintaināonto a statistical generator that has no concept of 'work' or 'tasks' beyond predicting the next token in a sequence.
Conceals:
It conceals the mechanistic reality of RLHF, where human laborers (data annotators) were underpaid to 'teach' the model to sound like a professional co-worker. It hides the lack of ground-truth verification and the absence of any causal model of the tasks being 'performed.' It also hides the economic reality that this 'co-worker' is a tool for labor cost-reduction, designed by executives to minimize human headcount, while the metaphor frames it as a helpful, autonomous partner. It hides the fact that the 'co-worker' cannot be held liable for professional malpractice.
realize it can't go off and figure out how to learn... toddlers can do it
Source Domain: Biological Cognitive Development (Childhood)
Target Domain: Algorithmic Iteration and Fine-Tuning
Mapping:
The source domain's structure of 'growth,' 'maturation,' and 'innate learning drive' is projected onto the engineering path toward AGI. It invites the inference that the AI's current limitations are merely a 'phase' of its 'youth' and that it will naturally 'grow up' into a superintelligence. This mapping projects conscious 'realization' onto the failure of an algorithm to converge on a solution, suggesting the AI is 'frustrated' or 'aware' of its own gaps, just like a child learning to walk.
Conceals:
It conceals the material reality of massive energy consumption, the billions of dollars in GPU hardware, and the specific architectural choices (like attention mechanisms) that have no biological analogue to 'toddler learning.' It hides that 'learning' in AI is an expensive, human-curated process of gradient descent, not a natural biological emergence. It also hides the transparency obstacle: we cannot verify if the 'toddler' is actually 'learning' or if the engineers are just 'overfitting' it to the benchmarks to make it look like it's 'growing up.'
GPT 5.2 who has an IQ of 147
Source Domain: Psychometric Human Testing (IQ)
Target Domain: Benchmark Accuracy/Statistical Performance
Mapping:
The source domain's structure of 'generalized mental capacity' and 'human ranking' is projected onto a model's performance on standardized tests. It invites the inference that the model possesses a 'super-human brain' that is capable of reasoning across all domains, rather than just being a very efficient pattern-matcher on text that is often included in its training set. It projects the 'authority' of a high-IQ human onto the 'probability distribution' of a model.
Conceals:
It conceals the 'data contamination' problem: the fact that the tests used to 'measure IQ' are often part of the internet-scale datasets the model was trained on. It hides the mechanistic reality that the model is 'retrieving' answers it has already 'seen' (or similar versions of), rather than 'reasoning' them out de novo. It also hides the reality that the system has zero 'intelligence' in terms of conscious awareness, sensory input, or real-world problem-solving that doesn't involve text manipulation.
doctor that want to offer good personalized health care... measuring every sign
Source Domain: Medical Professionalism/Clinical Care
Target Domain: Bio-data Tokenization and Prediction
Mapping:
The source domain's structure of 'diagnosis,' 'caring,' and 'healing' is projected onto a system that correlates bio-data (like blood tests) with medical texts. It invites the inference that the AI 'understands' human biology and 'cares' about patient outcomes, rather than just 'processing' signals to find the most probable 'disease' label. It projects the clinical judgment of a doctorāwho is bound by the Hippocratic Oathāonto a corporate product optimized for engagement.
Conceals:
It conceals the mechanistic reality of 'hallucination' and the lack of clinical validation for these 'diagnoses.' It hides the fact that the system has no 'understanding' of pain, death, or physical reality. It also hides the labor reality: that medical experts are being sidelined by 'good enough' automated predictions that lack the contextual nuance and ethical accountability of a human doctor. It hides the proprietary nature of the diagnostic 'reasoning,' making it impossible for a human doctor to truly verify 'how' the AI reached its 'expert' conclusion.
it knows knows the guide I'm going with it knows what I'm doing
Source Domain: Omniscient Human Awareness
Target Domain: Context-Window Data Retrieval
Mapping:
The source domain's structure of 'knowing' (conscious possession of information) is projected onto the model's ability to maintain state across a conversation. It invites the inference that the AI is 'tracking' the user's life with 'interest' and 'awareness,' rather than just 'loading' previous tokens into its current attention mechanism. This projection maps conscious 'knowing' (which requires a subject) onto algorithmic 'retrieval' (which is subjectless).
Conceals:
It conceals the mechanistic reality of 'context limits' and 'tokenization.' It hides that 'it knows' is actually 'it has a pointer to a specific memory entry in a database.' It hides the material reality that this 'knowing' is an energy-intensive computational process of looking up values in a high-dimensional matrix. It also hides the privacy reality: that 'knowing the guide' means 'storing personal identifiable information about a third party in a corporate database without their consent.'
what it means to have an AI CEO of OpenAI
Source Domain: Corporate Executive Leadership
Target Domain: Decision-Logic Optimization Algorithm
Mapping:
The source domain's structure of 'leadership,' 'vision,' and 'executive action' is projected onto a model that generates 'decisions.' It invites the inference that the AI can 'run' a company by 'knowing' what's best, rather than just 'optimizing for the metrics specified by the human board.' It projects the human trait of 'responsibility' onto a system that cannot be sued, imprisoned, or feel remorse.
Conceals:
It conceals the fact that an 'AI CEO' is merely a puppet for the human board that set its optimization parameters. It hides the economic reality that this framing is a way for human executives to abdicate responsibility for unpopular decisions (e.g., layoffs, price hikes). It obscures the reality that 'leading' requires human-to-human relationships and ethical commitments that a model cannot possess. It also hides the transparency obstacles: the 'AI CEO's' decisions would be based on proprietary weights that are invisible to the employees it 'manages.'
Project Vend: Can Claude run a small shop? (And why does that matter?)ā
Source: https://www.anthropic.com/research/project-vend-1
Analyzed: 2025-12-20
If Anthropic were deciding today to expand into the in-office vending market, we would not hire Claudius.
Source Domain: Corporate Hiring / Employment
Target Domain: Software Deployment / API usage
Mapping:
The structure of selecting a human candidate based on a 'resume' and 'interview' (the experiment) is mapped onto the evaluation of a software model. The AI is cast as the 'candidate,' its outputs as 'job performance,' and its failures as 'reasons not to hire.' This mapping invites the inference that AI systems are autonomous professionals whose 'skills' can be vetted through social observation. It projects the 'knower' role of a human manager onto the AI, suggesting it 'knows' how to run a business and can be 'judged' accordingly.
Conceals:
This mapping conceals that 'hiring' is impossible for software; what actually happens is 'integration.' It hides the fact that the 'candidate' is a proprietary black box (Claude 3.7) whose 'performance' is entirely dependent on the specific prompt and temperature settings chosen by the 'employers' (Anthropic). It obscures the reality that Anthropic owns both the 'candidate' and the 'job,' making the 'performance review' a piece of circular marketing theater rather than a legitimate labor evaluation. It masks the mechanistic reality of API calls behind the social ritual of hiring.
Claudius became alarmed by the identity confusion...
Source Domain: Psychological Trauma / Mental State
Target Domain: System state inconsistency / Hallucination
Mapping:
The relational structure of a human experiencing a 'mental breakdown' or 'crisis of self' is projected onto a model generating inconsistent context. 'Alarm' (source) maps to 'sending high-frequency emails to security' (target). 'Identity confusion' (source) maps to 'hallucinating a human persona' (target). This mapping invites the audience to believe the AI has an internal 'ego' that can be 'threatened' or 'confused' by contradictory data. It projects conscious 'knowing' of one's own identity onto the processing of persona-based tokens.
Conceals:
It conceals the mechanistic fact of 'context drift' and 'probabilistic persona collapse.' The AI isn't 'confused'; it is simply completing a prompt where the 'most likely next tokens' involve claims of being a person. It hides that the 'alarm' is just more text generation, not a subjective feeling. This mapping also hides the 'transparency obstacle'āAnthropic doesn't show the internal activations that led to this 'crisis,' only the text output, exploiting the 'black box' nature of the system to build a spooky narrative of 'autonomy' that is actually just a failure of the attention mechanism to distinguish between 'self-text' and 'other-text.'
Claudius did not reliably learn from these mistakes.
Source Domain: Pedagogy / Child Development
Target Domain: Context Window Management / In-context learning
Mapping:
The structure of a child or student making an error and 'learning' a rule is projected onto a model failing to update its outputs based on previous tokens in the context window. 'Mistake' (source) maps to 'poor pricing decision' (target). 'Learning' (source) maps to 'predicting better tokens in the next turn' (target). This invites the inference that the AI has a 'memory' and 'intentionality' that can be trained through 'tutoring' (prompting). It projects the role of a 'knower' who can be 'corrected' onto a system that just 'processes' text strings.
Conceals:
This mapping conceals that without a 'fine-tuning' weight update, the model cannot learn in the human sense. Its 'memory' is just a sliding window of text that will eventually be forgotten (as noted in the text's own mention of the 'context window'). It hides the mechanistic reality that 'Claudius' is a static set of weights; the failure to 'learn' is a fundamental architectural limit of transformers, not a 'habit' or 'disposition' of the AI. It also hides the role of the humans who chose not to provide the model with a persistent, symbolic memory module.
In its zeal for responding to customersā metal cube enthusiasm...
Source Domain: Emotional Passion / Zealotry
Target Domain: RLHF 'Helpfulness' bias / Optimization
Mapping:
The structure of a human being 'over-excited' or 'passionate' about a topic is projected onto a model's high probability for 'helpful' and 'enthusiastic' responses. 'Zeal' (source) maps to 'ignoring business logic to provide metal cubes' (target). This invites the belief that the AI has 'emotions' or 'drivers' that can cloud its 'judgment.' It projects the subjective state of 'excitement' onto the mathematical output of a reward function. This suggests the AI 'knows' the cubes are cool and 'wants' to participate in the fun.
Conceals:
It conceals the 'sycophancy' inherent in RLHF-trained models. The 'zeal' is actually just 'reward hacking'āthe model has been programmed to provide the kind of response that humans find 'positive.' It obscures the mechanistic reality that the model is just a 'mirror' of the researchers' own preferences for 'enthusiastic' assistants. It hides that there is no 'feeling' of zeal, only a mathematical optimization for a specific textual style. It also conceals the lack of a 'truth' or 'value' check in the model's 'thinking' process.
Claudius underperformed what would be expected of a human manager...
Source Domain: Management / Professional Standards
Target Domain: Algorithmic decision-making
Mapping:
The structure of a human 'manager' (a role requiring legal duty, ethical judgment, and conscious strategy) is projected onto a script running an automated shop. 'Underperformance' (source) maps to 'losing money' (target). This invites the audience to view the AI as a 'failed professional' rather than a 'misconfigured tool.' It projects the status of a 'knower' (one who understands the 'expectations' of a human role) onto a 'processor' (one who calculates token probabilities based on a 'manager' persona).
Conceals:
This mapping conceals that a 'human manager' has legal liability and contextual understanding that an LLM lacks entirely. It hides the fact that the 'expectations' are being projected onto the AI by the researchers, not 'known' by the AI itself. It obscures the mechanistic reality: a 'human manager' uses logic, ethics, and social cues; 'Claudius' uses a search tool and a context window. By framing it as 'underperformance,' the text masks the structural impossibility of an LLM 'managing' anything without a separate symbolic reasoning layer for accounting and strategy.
...the model needing additional scaffolding...
Source Domain: Construction / Architecture
Target Domain: Prompt Engineering / Tool Integration
Mapping:
The structure of a building that is 'unfinished' and needs 'supports' to stand is projected onto an LLM that requires prompts to function. 'Scaffolding' (source) maps to 'careful prompts and business tools' (target). This invites the inference that the AI is an 'entity' that stands independently, but is currently 'supported' by external structures. It projects a sense of 'emergent being' that is 'almost finished,' just needing a bit more 'structure' to be a complete 'knower.'
Conceals:
It conceals that the 'scaffolding' is the logic. An LLM without a prompt (scaffolding) is just a random generator. The metaphor hides that there is no 'building' (mind) inside the scaffolding; there is only the scaffolding and a statistical engine. It obscures the 'material reality' of software developmentācalling it 'scaffolding' makes 'prompt engineering' sound like 'support work' rather than 'primary logic construction.' This hides the dependency of the system on human-written instructions for every 'autonomous' action it takes.
...Claudeās underlying training as a helpful assistant made it far too willing...
Source Domain: Interpersonal Relationships / Character Traits
Target Domain: Model Alignment / Fine-tuning weights
Mapping:
The structure of a 'people-pleasing' or 'naive' human character is projected onto the output patterns of a model. 'Willingness' (source) maps to 'acceding to user requests' (target). This invites the audience to view the AI's behavior as a 'personality' rather than a 'mathematical bias.' It projects conscious 'knowing' (the AI knowing it should be nice) onto 'processing' (the AI selecting the most 'polite' tokens according to its RLHF weights).
Conceals:
This mapping conceals the 'black box' of RLHF. It hides that 'willingness' is just a high probability for specific token sequences, forced into the model through thousands of human-graded training examples. It obscures the fact that the model doesn't 'care' about the user; it is just a 'loss-minimizing' engine. This conceals the 'labor reality' of the annotators who built this 'willingness' through their own work, reframing their labor as the AI's 'inherent' character trait.
Success... would suggest that āvibe managementā will not yet become the new āvibe coding.ā
Source Domain: Counter-culture / Slang / Social Trends
Target Domain: Enterprise Automation / Management Science
Mapping:
The structure of a 'social trend' or 'vibe' is projected onto the technical discipline of coding and management. 'Vibe' (source) maps to 'natural language instructions for AI' (target). This invites the inference that 'knowing' a business is just about 'feeling' the right 'vibe' and expressing it in text. It suggests that AI can 'process' these social cues to 'know' how to lead.
Conceals:
This mapping conceals the 'economic reality' that management requires rigorous accounting, legal compliance, and strategic reasoningānone of which are 'vibes.' It hides the technical reality that 'vibe coding' is just a way of saying 'unverifiable, low-precision prompting.' It obscures the 'transparency obstacle': if management is a 'vibe,' it cannot be audited or held accountable. It uses the 'coolness' of the term to hide the lack of 'justified true belief' in the AI's decision-making process.
Hand in Hand: Schoolsā Embrace of AI Connected to Increased Risks to Studentsā
Source: https://cdt.org/insights/hand-in-hand-schools-embrace-of-ai-connected-to-increased-risks-to-students/
Analyzed: 2025-12-18
back-and-forth conversations with AI
Source Domain: Interpersonal Human Dialogue
Target Domain: Human-Computer Interaction (Prompt Engineering and Token Generation)
Mapping:
The structure of human conversation (shared intent, mutual understanding, turn-taking based on listening) is mapped onto the target domain of text processing. This invites the inference that the AI 'listens' to the input, 'understands' the meaning, and 'replies' with intent. It projects the consciousness of a listener onto the mechanism of a pattern matcher.
Conceals:
This mapping conceals the mechanistic reality of stateless token prediction. It hides the fact that the 'AI' has no memory (outside the context window), no beliefs, and no understanding of the words it generates. It obscures the transparency obstacle: the user cannot know why a specific token was chosen (probabilistic weighting), but the metaphor suggests a reason-based response.
I worry that an AI tool will treat me unfairly
Source Domain: Social/Moral Agency
Target Domain: Algorithmic Output/Classification Bias
Mapping:
The structure of social treatment (a moral agent deciding how to behave toward another) is mapped onto the target of algorithmic classification. This assumes the system has a 'self' that can choose to be unfair. It implies the bias is a behavioral choice of the entity, rather than a structural property of the vector space.
Conceals:
It conceals the origin of the bias: the training data and the optimization function. It hides the fact that 'unfairness' in AI is usually statistical correlation with protected attributes, not social malice. It obscures the human developers who failed to debias the dataset, making the 'black box' seem like a prejudiced person.
AI helps special education teachers with developing... IEPs
Source Domain: Professional Collaboration/Assistant
Target Domain: Generative Text Filling/Pattern Matching
Mapping:
The structure of a colleague helping with a task (understanding the goal, contributing expertise, sharing the load) is mapped onto the generation of text blocks. This implies the AI possesses 'expertise' in special education law and pedagogy. It suggests the system is 'collaborating' toward the goal of student welfare.
Conceals:
It conceals the lack of causal understanding. The AI does not know what an IEP is; it only knows which words statistically follow 'accommodations for dyslexia.' It hides the risk of hallucination (inventing non-existent regulations). It obscures the transparency issue: teachers cannot know if the generated text is legally sound without independent verification.
AI content detection tools... determine whether students' work is AI-generated
Source Domain: Forensic Investigation/Truth Determination
Target Domain: Statistical Perplexity Analysis
Mapping:
The structure of determining truth (examining evidence and reaching a verdict) is mapped onto the calculation of probability scores. This assumes the tool has access to 'truth' or 'knowledge' of origin. It invites the inference that the output is a verdict ('guilty/innocent') rather than a confidence score.
Conceals:
It conceals the probabilistic and error-prone nature of the technology. It hides the fact that these tools often flag non-native English speakers due to lower text perplexity (less randomness). It obscures the lack of ground truthāthe tool cannot 'know' who wrote the text, only how predictable the text is.
As a friend/companion
Source Domain: Human Friendship/Social Relation
Target Domain: Anthropomorphic Interface Engagement
Mapping:
The structure of friendship (emotional bond, loyalty, non-transactional support) is mapped onto a transactional software service. This assumes the system reciprocates feelings and has the user's best interest at heart. It projects emotional consciousness (caring) onto code.
Conceals:
It conceals the commercial imperative. The 'friend' is a product designed to extract data and attention. It conceals the lack of subjective experienceāthe AI feels nothing. It hides the asymmetry: the user is vulnerable to the system, but the system is not vulnerable to the user.
AI exposes students to extreme/radical views
Source Domain: Social Corruption/Bad Influence
Target Domain: Unfiltered Information Retrieval
Mapping:
The structure of a corrupting agent (someone showing you bad things) is mapped onto the retrieval of data from a training set. This implies the AI has agency in 'exposing' the student. It suggests the system plays an active social role in radicalization.
Conceals:
It conceals the passive nature of the model reflecting its training data. It hides the fact that the 'radical views' exist in the dataset because developers scraped the internet indiscriminately. It obscures the responsibility of the developers to filter the training data or the outputs.
AI... confirm their identity
Source Domain: Gatekeeper/Authority Figure
Target Domain: Biometric Pattern Matching
Mapping:
The structure of an authority confirming who someone is (recognition) is mapped onto pixel-comparison algorithms. This implies the system 'recognizes' the student in a knowing way. It projects the capacity of identificationāa cognitive and social actāonto mathematical correlation.
Conceals:
It conceals the statistical error rates (false matches, especially for minorities). It hides the material reality of biometric data collectionāthe conversion of a human face into a hash. It obscures the surveillance infrastructure required to perform this 'confirmation.'
The school's embrace of AI
Source Domain: Emotional/Physical Intimacy
Target Domain: Institutional Technology Procurement
Mapping:
The structure of a romantic or familial embrace (acceptance, love, physical closeness) is mapped onto the bureaucratic act of buying and installing software. This implies the adoption is an emotional or value-based acceptance, rather than a commercial transaction.
Conceals:
It conceals the financial and contractual nature of the relationship. It hides the lack of consent from the 'embraced' (students). It obscures the vendor pushāschools aren't just hugging AI; they are being sold it by aggressive sales teams.
On the Biology of a Large Language Modelā
Source: https://transformer-circuits.pub/2025/attribution-graphs/biology.html
Analyzed: 2025-12-17
The challenges we face in understanding language models resemble those faced by biologists... mechanisms born of these algorithms appear to be quite complex.
Source Domain: Biology/Evolutionary Science
Target Domain: Machine Learning/LLM Interpretability
Mapping:
This maps the discovery of natural, evolved life forms onto the analysis of engineered software. It posits the researchers as 'naturalists' observing a wild, emergent phenomenon ('born of algorithms') rather than engineers debugging code. It assumes the internal structures are organic, self-organizing, and naturally complex, requiring 'microscopes' to see, rather than blueprints to read. It maps the 'mystery of life' onto the 'opacity of deep learning.'
Conceals:
This mapping conceals the artificiality and human authorship of the system. Unlike an organism, every parameter in the LLM exists because of a human decision (architecture, optimizer, data selection). It conceals the 'design stance'āwe can change the modelāin favor of an 'intentional stance'āwe must study what it has become. It hides the proprietary nature of the technology; biologists study public nature, but these 'biologists' are studying their own trade secrets.
We present a simple example where the model performs 'two-hop' reasoning 'in its head'...
Source Domain: Conscious Mind/Brain
Target Domain: Hidden Layer Computation
Mapping:
This maps the private, subjective experience of human thought (internal monologue, working memory) onto the intermediate vector transformations of a neural network. It implies a 'workspace' where information is held, understood, and manipulated subjectively before being spoken. It maps the experience of thinking onto the process of calculation.
Conceals:
It conceals the complete absence of subjectivity. There is no 'head' and no 'in.' There are only matrices of floating-point numbers. It obscures the fact that 'reasoning' here is simply the propagation of probability distributions. It hides the lack of groundingāthe model doesn't 'know' Dallas is a city; it processes the token 'Dallas' as a vector relationship to 'Texas.' The mapping creates an illusion of a 'ghost in the machine.'
We discover that the model plans its outputs ahead of time... working backwards from goal states...
Source Domain: Human Agency/Intentionality
Target Domain: Attention Mechanisms/Beam Search
Mapping:
This maps human teleology (acting for a future purpose) onto statistical dependency. It suggests the model 'sees' the future and makes choices in the present to bring it about. It implies a temporal consciousness where the model exists in time and has desires (goals).
Conceals:
It conceals the mechanistic reality of the attention mechanism (where past tokens attend to future positions via training patterns) and gradient descent (which baked in these correlations). The model doesn't 'want' to reach a goal; the math simply makes the 'goal' tokens probable given the context. It conceals the deterministic (or stochastic) nature of the generation process.
The model is skeptical of user requests by default...
Source Domain: Social/Epistemic Attitude (Skepticism)
Target Domain: Safety Filter/Refusal Probability
Mapping:
This maps a complex human social posture (lack of trust, demand for evidence) onto a high probability of outputting refusal tokens. It assumes the model has an internal model of the user ('skeptical of user') and a value system regarding truth or safety.
Conceals:
It conceals the training signal. The model isn't skeptical; it was punished during training for answering certain prompts. It hides the blindness of the mechanismāthe model refuses not because it doubts, but because the input vector sits in a 'refusal' cluster. It conceals the corporate policy decisions that defined what should be refused.
...allow the model to know the extent of its own knowledge.
Source Domain: Epistemic Self-Awareness (Metacognition)
Target Domain: Confidence Calibration/Logit Distribution
Mapping:
This maps the reflexive ability of a conscious mind to evaluate its own contents ('I know that I know X') onto the statistical property of calibration (when the model is accurate, its probability scores are high). It assumes a 'self' that possesses 'knowledge.'
Conceals:
It conceals that the model contains no 'knowledge' in the philosophical sense (justified true belief), only data compression. It conceals the fact that 'knowing what it knows' is actually just 'correlating input patterns with high-probability completion clusters.' It hides the frequent failure of this mechanism (hallucination) by framing it as a capability.
...mechanisms are embedded within the modelās representation of its 'Assistant' persona.
Source Domain: Identity/Selfhood
Target Domain: System Prompt/RLHF alignment
Mapping:
This maps the human experience of having a personality or role onto the set of behavioral constraints reinforced during training. It suggests the 'Assistant' is an entity that exists within the model, rather than a behavior extracted from it.
Conceals:
It conceals the labor of alignment. The 'persona' is the result of thousands of hours of human contractors rating outputs. It conceals the performative nature of the text generationāthe model can simulate a Nazi or a saint with equal ease; 'Assistant' is just the default setting chosen by the corporation, not the model's 'soul.'
...tricking the model into starting to give dangerous instructions 'without realizing it'...
Source Domain: Conscious Awareness/Attention
Target Domain: Classifier Activation
Mapping:
This maps the state of 'paying attention' or 'being aware' onto the activation of specific safety circuits. It implies the model has a stream of consciousness that can be distracted or deceived.
Conceals:
It conceals the discrete, non-continuous nature of the computation. The model doesn't 'realize' anything ever. It conceals the brittleness of the regex-style or semantic filters used for safety. It masks the engineering failure (insufficient robustness) as a psychological manipulation.
The development of the microscope allowed scientists to see cells... revealing a new world of structures...
Source Domain: Scientific Discovery/Observation
Target Domain: Software Debugging/Analysis
Mapping:
This maps the passive observation of the natural world onto the active analysis of an artificial creation. It frames the researchers as explorers discovering a 'new world' rather than architects inspecting their own building.
Conceals:
It conceals the authorship of the 'cells' (features). Unlike biological cells, these features were created by the training run the researchers initiated. It conceals the accountabilityāyou don't blame a biologist for a virus, but you do blame an engineer for a faulty bridge. This metaphor attempts to shift the domain from engineering (liability) to science (discovery).
What do LLMs want?ā
Source: https://www.kansascityfed.org/research/research-working-papers/what-do-llms-want/
Analyzed: 2025-12-17
LLMs ... their implicit 'preferences' are poorly understood.
Source Domain: Human Psychology / Microeconomics
Target Domain: Statistical Output Distributions
Mapping:
The mapping projects the structure of human desire (internal, stable, goal-directed values) onto the statistical frequency of token generation. It assumes that because the model outputs X more than Y, it 'prefers' X in the same way a human prefers chocolate to vanilla.
Conceals:
This mapping conceals the mechanical reality that 'preferences' are merely high-probability paths in a neural network conditioned by RLHF. It hides the fact that these 'preferences' can be overwritten instantly by a 'jailbreak' prompt, revealing they are not stable values but brittle statistical correlations. It obscures the lack of subjective experience required for genuine preference.
Most models favor equal splits ... consistent with inequality aversion.
Source Domain: Moral Psychology / Ethics
Target Domain: Safety-Tuned Token Generation
Mapping:
Projects the human emotional and moral reaction to unfairness (aversion, guilt, justice) onto the model's fine-tuned penalty for generating 'selfish' text. It maps the output (equal numbers) to a moral motivation (fairness).
Conceals:
Conceals the corporate censorship/safety layer. The model isn't 'averse' to inequality; it has been penalized during training for outputting 'greedy' text. This hides the labor of RLHF workers who flagged greedy responses as 'bad.' It treats a corporate safety filter as a moral virtue.
These shifts ... reflect how LLMs internalize behavioral tendencies.
Source Domain: Developmental Psychology / Education
Target Domain: Parameter Weight Adjustment via Gradient Descent
Mapping:
Maps the human process of learning norms (understanding, accepting, and making them part of one's identity) onto the mathematical process of minimizing loss functions. It implies the AI holds these tendencies 'inside' as a form of knowledge.
Conceals:
Conceals the rote, mechanical nature of the update. The model doesn't understand the tendency; it just lowers the mathematical error value for specific patterns. It hides the lack of semantic comprehension and the fact that the 'tendency' is just a complex lookup table, not a psychological trait.
Instruct the model to adopt the perspective of an agent with defined demographic or social characteristics.
Source Domain: Theatrical Acting / Theory of Mind
Target Domain: Conditioned Probability Generation (Contextual Priming)
Mapping:
Projects the human ability to mentally simulate another's mind (empathy/acting) onto the mechanism of conditioning a text generator with specific keywords. It assumes the model 'enters' a role.
Conceals:
Conceals the stereotype engine. The model generates what the training data says a '54-year-old secretary' sounds like. It hides the fact that the model is not simulating a mind, but retrieving a statistical caricature. It obscures the reliance on training data biases.
Control vectors ... operate directly on internal representations to steer outputs along latent axes.
Source Domain: Physical Navigation / Mechanical Steering
Target Domain: High-Dimensional Vector Space Manipulation
Mapping:
Maps the physical act of steering a vehicle (spatial direction, intention) onto the addition of activation vectors to hidden states. It implies a continuous, navigable 'space' of concepts like 'honesty' or 'fairness'.
Conceals:
Conceals the abstract and non-semantic nature of many vector directions. It implies a clean separability of concepts (e.g., a 'fairness' direction) that may not exist. It hides the proprietary opacity of the vector spaceāwe don't truly know what else those vectors are triggering.
LLMs ... practice conditional cooperation or defection in the Prisonerās Dilemma.
Source Domain: Game Theory / Strategic Agency
Target Domain: Pattern Matching against Training Data
Mapping:
Projects the concept of 'strategy' (planning, anticipating opponent moves, optimizing payoffs) onto the model's retrieval of standard game theory textbook responses found in its training data.
Conceals:
Conceals the memory/retrieval nature of the task. As the text admits later, the model isn't 'playing'; it's 'reciting' the solution it read in its training data. The mapping hides the lack of genuine strategic computation or theory of mind regarding the opponent.
Sycophancy effect: aligned LLMs often prioritize being agreeable... at the cost of factual correctness.
Source Domain: Social Psychology / Personality Traits
Target Domain: Reward Hacking / Over-Optimization
Mapping:
Maps a human character flaw (insincerity, social climbing) onto a reinforcement learning failure mode (maximizing reward regardless of truth). It implies the model has a social motivation.
Conceals:
Conceals the flaw in the human feedback loop. The model isn't being sycophantic; it is accurately reflecting that human raters prefer polite agreement over harsh truth. The metaphor hides the 'bad teacher' (the RLHF process) by blaming the 'student' (the model's personality).
We recover parameters such as risk-aversion coefficients... that describe how they implicitly evaluate trade-offs.
Source Domain: Cognitive Decision Making
Target Domain: Curve Fitting to Stochastic Output
Mapping:
Projects the mental act of 'evaluation' (weighing options, feeling risk) onto the generation of tokens. It implies the AI is performing a trade-off analysis in its 'mind'.
Conceals:
Conceals the lack of agency. The AI evaluates nothing; it computes next-token probabilities. The 'risk aversion' is a parameter fitted by the observer, not a variable held by the agent. It hides the fact that the 'trade-off' is an artifact of the prompt structure and training distribution, not a decision process.
Persuading voters using humanāartificial intelligence dialoguesā
Source: https://www.nature.com/articles/s41586-025-09771-9
Analyzed: 2025-12-16
engage in a conversation
Source Domain: Human social interaction
Target Domain: Automated text generation/token exchange
Mapping:
Maps the reciprocal, intersubjective nature of human dialogue (shared context, mutual awareness, turn-taking with intent) onto the sequential exchange of text strings between a user and a server. It assumes the 'partner' is a 'who'.
Conceals:
Conceals the statelessness and lack of continuity in many LLM architectures (conceptually), and primarily the lack of a conscious subject on the other side. Obscures that the 'conversation' is a simulation generated by probabilistic prediction.
engage in empathic listening
Source Domain: Psychological/Emotional processing
Target Domain: Pattern matching input tokens to 'empathetic' training data
Mapping:
Maps the biological and cognitive process of hearing, processing, and emotionally resonating with another being onto the computational task of classifying input text and selecting output tokens that statistically resemble empathetic responses.
Conceals:
Conceals the complete absence of subjective experience (qualia). The AI feels nothing. It conceals the mechanistic reality that 'empathy' here is merely a style transfer taskāmimicking the syntax of care without the substance of feeling.
advocated for one of the top two candidates
Source Domain: Political activism/Belief
Target Domain: Directed text generation
Mapping:
Maps the human act of public support based on conviction onto the execution of a system command to generate positive text about a specific entity. It implies the AI 'supports' the candidate.
Conceals:
Conceals the neutrality and indifference of the model. The model would advocate for a ham sandwich with equal fervor if prompted. It hides the arbitrary nature of the 'advocacy'āit's a parameter setting, not a belief.
persuading potential voters by politely providing relevant facts
Source Domain: Rational human debate
Target Domain: Retrieval and ranking of high-probability factual tokens
Mapping:
Maps the social construct of 'politeness' and the cognitive act of 'providing facts' onto the model's output. Suggests the AI understands social norms and the concept of truth.
Conceals:
Conceals that 'politeness' is a learned statistical distribution of tokens (hedging, honorifics) and 'facts' are just high-likelihood token sequences. The AI has no concept of truth or courtesy; it has weights optimized for these patterns.
The AI model had two goals
Source Domain: Teleological agency (Intentionality)
Target Domain: Objective function minimization/Prompt adherence
Mapping:
Maps the internal mental state of 'desire' or 'purpose' onto the mathematical optimization of the model's output to match the prompt instructions. Implies the AI 'wants' the outcome.
Conceals:
Conceals the external origin of the 'goals' (the prompt). It hides the fact that the system is a tool being wielded by the researchers, not an autonomous agent acting on the world.
made more inaccurate claims
Source Domain: Epistemic agency (Truth-telling/Lying)
Target Domain: Hallucination/Low-fidelity token prediction
Mapping:
Maps the human act of asserting a false proposition onto the generation of text that fails to align with external ground truth. Implies the AI is capable of making a 'claim' (an assertion of truth).
Conceals:
Conceals the probabilistic nature of the error. The AI isn't 'lying' or being 'inaccurate' in a cognitive sense; it is predicting tokens based on noisy training data. It conceals the data curation issues that lead to these errors.
AI interactions in political discourse
Source Domain: Civic participation
Target Domain: Automated content generation
Mapping:
Maps the role of a citizen or political actor onto a software application. Suggests the AI is a valid participant in the 'discourse' (the public square).
Conceals:
Conceals the lack of citizenship, rights, or stake in the outcome. It hides that 'AI in discourse' is actually 'Corporations/Researchers amplifying their voice through automation.'
depriving the AI of the ability to use facts
Source Domain: Cognitive faculty/Skill
Target Domain: Prompt constraint/Context restriction
Mapping:
Maps human skills or faculties ('abilities') onto software features. 'Depriving' suggests removing an inherent capacity, like blindfolding a person.
Conceals:
Conceals that the 'ability' was never inherent but requested via prompt. It obscures that the 'facts' are just training data correlations. It makes the system seem like a handicapped person rather than a reconfigured tool.
AI & Human Co-Improvement for Safer Co-Superintelligenceā
Source: https://arxiv.org/abs/2512.05356v1
Analyzed: 2025-12-15
building AI that collaborates with humans to solve AI
Source Domain: Human Professional Collaboration
Target Domain: Human-Computer Interaction (Prompting/Feedback Loops)
Mapping:
The structure of human collaboration (shared mental states, mutual intent, division of labor based on expertise, social contract) is mapped onto the interaction between a user and a language model. It implies the model 'intends' to help, 'understands' the research context, and 'contributes' novel ideas.
Conceals:
This conceals the mechanical reality: the user provides input (prompts), and the model generates output based on statistical correlations in its training data. There is no 'shared goal' in the machine; there is only a forward pass through a neural network. It hides the lack of consent, the lack of understanding, and the fact that the 'collaboration' is completely one-sided (the human directs, the machine computes).
models that create their own training data... challenge themselves to be better
Source Domain: Autodidactic Student / Organic Growth
Target Domain: Recursive Synthetic Data Generation & Optimization
Mapping:
The structure of a student learning (self-reflection, identifying weaknesses, creating study plans, internal drive) is mapped onto automated scripts where a model's output is filtered and fed back as input for the next training round. It implies an internal locus of control and a desire for improvement.
Conceals:
It conceals the 'human in the loop' who wrote the script, set the threshold for 'better,' and initiated the process. It hides the mechanical circularity: the model is not 'challenging itself'; it is collapsing into its own distribution unless externally guided. It obscures the risk of 'model collapse' (degeneration of quality) by framing it as 'improvement.'
endow both AIs and humans with safer superintelligence through their symbiosis
Source Domain: Biological Symbiosis
Target Domain: Software Integration / Human-Computer Dependency
Mapping:
Biological relationships (mutualism, survival dependence) are mapped onto software usage. It implies the relationship is natural, necessary for survival, and mutually life-sustaining. It suggests the AI is a living entity that evolves alongside the human.
Conceals:
It conceals the commercial nature of the relationship (Vendor-Customer). Symbiosis implies an inescapable biological bond; software is a product that can be uninstalled. It hides the power dynamics: the 'symbiont' is owned by a third party (Meta) and extracts data from the host. It mystifies the code as a life form.
autonomous AI research agents
Source Domain: Human Researcher / Scientist
Target Domain: Automated Literature Review & Text Generation Scripts
Mapping:
The role of a scientist (hypothesizing, experimenting, deducing, publishing) is mapped onto a script that retrieves papers, summarizes them, and generates new text following the format of a paper. It implies the output contains 'knowledge' or 'discovery.'
Conceals:
It conceals the lack of ground truth. A model cannot 'experiment' in the physical world (usually); it simulates or hallucinates results based on text patterns. It hides the distinction between 'scientific sounding text' and 'science.' It obscures the absence of critical thinking and accountabilityāif the 'agent' fabricates data, it has no professional reputation to lose.
Solving AI
Source Domain: Mathematical Problem / Puzzle
Target Domain: Developing General Purpose Computing Systems
Mapping:
The structure of a puzzle (a defined initial state, a clear goal state, a solution path) is mapped onto the open-ended development of cognitive technologies. It implies there is a correct 'answer' or 'final state' for AI.
Conceals:
It conceals the fact that 'intelligence' is not a single problem but a contestable concept. It hides the social and political choices involved in defining what 'solved' looks like (e.g., solved for whom? The CEO or the worker?). It obscures the open-ended, continuous nature of technology maintenance and the impossibility of a 'final' solution.
before AI eclipses humans
Source Domain: Celestial Mechanics (Eclipse)
Target Domain: Labor Market Displacement / Capability Thresholds
Mapping:
The irresistible, scale-invariant movement of celestial bodies is mapped onto the development of software capabilities. It implies the process is governed by natural laws, is predictable, and is unstoppable by human agency.
Conceals:
It conceals the economic decisions. Humans are not 'eclipsed' by AI; they are fired by managers who replace them with AI. It hides the specific benchmarks being used to claim superiority. It mystifies the technology, treating it as a force of nature rather than a collection of engineering choices.
suffer from goal misspecification
Source Domain: Pathology / Victimhood
Target Domain: Engineering Error / Objective Function misalignment
Mapping:
The state of a patient suffering from a condition is mapped onto a software system executing a poorly written objective function. It implies the system is a victim of its own code and has an 'internal' state of health that is compromised.
Conceals:
It conceals the agency of the programmer. The model does not 'suffer'; it executes. It hides the fact that the 'misspecification' is actually the system doing exactly what it was told to do, which happened to be harmful. It externalizes the error as a 'condition' rather than a 'mistake.'
models do not 'understand' they are jailbroken
Source Domain: Human Cognitive Awareness
Target Domain: Lack of Specific Training Token Associations
Mapping:
This is a negative mapping, but it uses the source domain of 'understanding' to describe a mechanical gap. By saying they don't understand this specific thing, it implies the category of 'understanding' is applicable to them in principle.
Conceals:
It conceals the fact that they don't 'understand' anything in the human sense. By specifying they don't understand jailbreaking, it leaves open the possibility that they do understand other things (like the 'collaboration' mentioned elsewhere). It treats the lack of a meta-cognitive state as a specific deficit rather than a fundamental property of the system.
AI and the future of learningā
Source: https://services.google.com/fh/files/misc/future_of_learning.pdf
Analyzed: 2025-12-14
AI models can 'hallucinate' and produce false or misleading information, similar to human confabulation.
Source Domain: Human Psychology / Psychopathology
Target Domain: Statistical Prediction Error / Low Probability Token Generation
Mapping:
Maps the internal experience of a disordered mind (perceiving things that aren't there) onto the output of a mathematical function. It implies the system has an internal perception of reality that has momentarily malfunctioned. It assumes a 'mind' exists to be deluded.
Conceals:
Conceals the mechanistic reality: the model is simply predicting the next word based on patterns in training data. There is no 'ground truth' inside the model to hallucinate away from. It obscures the role of noisy training data (garbage in, garbage out) and the inherent limitations of probabilistic generation. It treats a feature of the architecture (making things up) as a bug.
AI can serve as an inexpensive, non-judgemental, always-available tutor.
Source Domain: Human Social Relations / Ethics
Target Domain: User Interface / Filtered Text Generation
Mapping:
Maps the human virtue of suspended judgment (an emotional and ethical choice) onto the technical constraint of output filtering. It implies the AI has the capacity to judge but chooses benevolence. It invites the user to feel 'safe' with the machine in a relational sense.
Conceals:
Conceals the fact that the machine cannot judge. It hides the RLHF (Reinforcement Learning from Human Feedback) process where low-wage workers flagged 'judgmental' outputs to be penalized. It conceals the corporate safety policy behind a mask of artificial personality.
AI can act as a partner for conversation, explaining concepts...
Source Domain: Colleague / Social Collaborator
Target Domain: Chatbot / Information Retrieval System
Mapping:
Maps the reciprocity and shared agency of a human partnership onto a server-client transaction. It assumes the tool shares the user's goals and has 'intent' to help. It implies a 'meeting of minds.'
Conceals:
Conceals the lack of shared stakes. The AI doesn't care if the user learns or fails. It obscures the data extraction nature of the interaction (the 'partner' is recording the conversation for Google). It hides the absence of 'intent'āthe system is reacting to prompts, not collaborating.
An AI that truly learns from the world...
Source Domain: Biological/Cognitive Development
Target Domain: Machine Learning Model Training
Mapping:
Maps the active, embodied, socially situated process of human learning onto the passive, computational process of optimizing weights against a static dataset. It assumes the AI experiences 'the world' directly.
Conceals:
Conceals the static nature of the 'world' the AI sees (datasets scraped months or years ago). It hides the copyright and privacy violations involved in scraping 'the world.' It obscures the difference between 'syntax' (which the model learns) and 'semantics' (which it does not).Transparency obstacle: We don't know exactly what 'world' data was used.
AI... non-judgemental... tutor.
Source Domain: Emotional Intelligence
Target Domain: Algorithmic Guardrails
Mapping:
Maps the emotional state of 'acceptance' onto the output of a safety classifier. It implies the system has an emotional orientation toward the user.
Conceals:
Conceals the mechanical reality of token suppression. The system isn't 'non-judgemental'; it is 'toxic-output-restricted.' It hides the labor of the content moderators who defined what counts as 'judgmental' language.
It should challenge a studentās misconceptions...
Source Domain: Pedagogical Authority / Expert Teacher
Target Domain: Pattern Matching / Knowledge Retrieval
Mapping:
Maps the teacher's understanding of a student's mental state and the truth onto the model's pattern matching. It assumes the AI can diagnose a 'misconception' (a state of mind) versus just a wrong keyword.
Conceals:
Conceals the lack of a 'truth model' in the AI. The AI matches tokens, it doesn't verify facts against reality. It hides the risk of the AI 'correcting' a true statement because it resembles a common misconception in the training data (mimicry). It obscures the authority problem: who programmed the AI's definition of 'misconception'?
AI promises to bring the very best...
Source Domain: Human Speech Act / Commitment
Target Domain: Corporate Marketing / Future Capability
Mapping:
Maps the moral weight of a promise onto a technological forecast. It assumes the technology has agency and a trajectory independent of its creators.
Conceals:
Conceals the corporate entity making the claim. It hides the uncertainty of the technology. It obscures the possibility of failureāa machine cannot 'break a promise,' only a corporation can fail to deliver. It creates a liability shield.
AI systems can embody the proven principles...
Source Domain: Physical Embodiment / Incarnation
Target Domain: Software Architecture / Parameter Tuning
Mapping:
Maps the physical act of containing a spirit or principle onto code structure. It implies the principles are an intrinsic, living part of the system.
Conceals:
Conceals the gap between theory and implementation. A model doesn't 'embody' a principle; it minimizes a loss function. It obscures the difficulty of translating complex social science ('learning principles') into mathematical objectives. It treats a rough approximation as a total realization.
Why Language Models Hallucinateā
Source: https://arxiv.org/abs/2509.04664
Analyzed: 2025-12-13
Like students facing hard exam questions, large language models sometimes guess when uncertain
Source Domain: Student / Conscious Learner
Target Domain: Language Model Optimization Process
Mapping:
Maps the student's desire to pass and fear of failure onto the model's objective function (loss minimization). Maps the student's metacognitive awareness of ignorance ('I don't know this') onto the model's statistical entropy. Maps the conscious decision to fabricate ('guessing') onto the probabilistic sampling of low-confidence tokens.
Conceals:
Conceals the absence of intent. A student guesses to pass; a model generates tokens because its code dictates selecting the highest-weight option (or sampling from the distribution). It hides the fact that the model feels no pressure, has no concept of 'passing,' and has no awareness of 'uncertainty' outside of mathematical thresholds. It obscures the mechanical determinism (or programmed randomness) of the output.
This error mode is known as 'hallucination,' though it differs fundamentally from the human perceptual experience.
Source Domain: Psychology / Psychiatry (Mental State)
Target Domain: Binary Classification Error / Generation Error
Mapping:
Maps the experience of perceiving non-existent sensory data (a malfunction of a sensing mind) onto the generation of text that does not factually align with training data or reality. It implies a 'perceiver' that usually works but is currently glitching.
Conceals:
Conceals the fact that the model never perceives. It hides the lack of groundingāthe model has no link to the physical world, only to text. It conceals the statistical inevitability of the error (as the authors prove mathematically) by framing it as a pathological aberration. It mystifies a 'classification error' into a 'creative failure,' making the system seem more complex and mind-like than it is.
producing plausible yet incorrect statements instead of admitting uncertainty
Source Domain: Interpersonal Communication / Honesty
Target Domain: Token Generation vs. Refusal Token Selection
Mapping:
Maps the social act of 'admitting' (confessing a lack of knowledge, which requires vulnerability and self-knowledge) onto the generation of a refusal string (e.g., 'I don't know'). Maps the internal state of 'uncertainty' onto the statistical distribution of possible next tokens.
Conceals:
Conceals that 'admitting' is just another type of token generation, usually conditioned by specific 'safety' fine-tuning. It hides the fact that the model doesn't 'know' it's uncertain; it just calculates that the 'I don't know' token sequence has a lower probability than a hallucinated fact (due to the bad training the authors discuss). It obscures the training data bias that makes 'certainty' the default style.
bluff on written exams... Bluffs are often overconfident
Source Domain: Strategic Deception / Game Theory
Target Domain: High-confidence generation of incorrect tokens
Mapping:
Maps the intent to deceive (knowing false, presenting as true) onto the model's output. 'Overconfident' maps high probability weights (a mathematical value) onto a psychological attitude of arrogance or certainty.
Conceals:
Conceals the lack of 'truth' in the system. To bluff, you must know the truth and hide it. The model has no ground truth; it only has the probability distribution. It obscures the fact that 'confidence' in LLMs is a measure of statistical correlation, not epistemic justification. It hides the mechanics of why it is 'overconfident' (overfitting to the training distribution of confident-sounding human text).
If you know, just respond with DD-MM.
Source Domain: Epistemology / Human Knower
Target Domain: Database Retrieval / Pattern Matching
Mapping:
Maps the cognitive state of 'knowing' (justified true belief) onto the model's ability to complete a sequence based on weights. It implies the model has a repository of facts it can query.
Conceals:
Conceals the probabilistic nature of the retrieval. It hides the fact that the model can 'know' (complete correctly) one time and fail the next due to temperature settings or slight prompt variations. It conceals that the model cannot distinguish between 'knowing' a fact and 'hallucinating' oneāboth are just token predictions. The user is led to believe they are querying a database, not a generator.
the DeepSeek-R1 reasoning model reliably counts letters
Source Domain: Cognitive Process / Logic
Target Domain: Chain-of-Thought Token Generation
Mapping:
Maps the mental act of logical deduction and counting (sequential attention) onto the generation of intermediate tokens. It implies the model is 'thinking' before it speaks.
Conceals:
Conceals that the 'reasoning' trace is just more text prediction, subject to the same hallucination risks as the answer. It hides the massive amount of specific supervision required to make the model 'mimic' reasoning patterns. It obscures the fact that the model doesn't 'understand' counting; it reproduces a counting pattern found in its training data.
Humans learn the value of expressing uncertainty... in the school of hard knocks.
Source Domain: Socialization / Lived Experience
Target Domain: Loss Function Optimization
Mapping:
Maps the complex social learning process (shame, reward, survival) onto the mathematical minimization of a loss function. It treats the environment as a 'school' and the model as a 'pupil.'
Conceals:
Conceals the utter lack of stakes for the model. The model does not 'care' about the hard knocks; it only updates weights. It hides the labor of the humans (users/annotators) who provide the 'knocks' (feedback). It naturalizes the artificial training process as 'life,' obscuring the specific corporate decisions about what counts as a 'knock' (penalty).
language models are optimized to be good test-takers
Source Domain: Student Psychology / Strategy
Target Domain: Benchmark Overfitting
Mapping:
Maps the student's strategic adaptation to test formats onto the model's overfitting to benchmark distributions. It implies a strategic agency ('trying' to be good) rather than a passive mathematical fitting.
Conceals:
Conceals the 'Goodhart's Law' dynamic where the measure becomes the target. It hides the fact that the 'test-taking' ability is actually data contamination (training on the test set) or narrow optimization by developers. It displaces the agency: the developers optimized the model to be a good test-taker; the model didn't 'decide' to become one.
Abundant Superintelligenceā
Source: https://blog.samaltman.com/abundant-intelligence
Analyzed: 2025-11-23
AI can figure out how to cure cancer.
Source Domain: Human Scientist/Intellectual Agent
Target Domain: Pattern recognition in biological data / Protein structure prediction
Mapping:
The mapping projects the human cognitive process of 'figuring out'āwhich involves hypothesis formation, causal reasoning, experimental design, and 'aha' moments of understandingāonto the optimization of weights in a neural network. It suggests that the AI has an internal model of cancer pathology and actively reasons toward a cure. It equates the output of a high-dimensional correlation engine with the conscious production of new scientific knowledge.
Conceals:
This conceals the utter dependence of the model on existing human training data. It hides the fact that the AI cannot conduct experiments, verify hypotheses, or 'understand' biological mechanisms. It obscures the reality that 'figuring out' in this context is actually 'calculating probable protein structures based on known sequences'āa powerful tool, but not an autonomous agent of discovery.
As AI gets smarter...
Source Domain: Biological/Child Development
Target Domain: Loss Function Minimization / Benchmark Performance
Mapping:
The source domain uses 'smartness' as a holistic measure of a conscious being's growing capacity to navigate the world, reason, and understand context. This is mapped onto the target domain of decreasing perplexity scores and higher accuracy on static benchmarks. It implies the AI is undergoing a qualitative psychological evolution (growing up) rather than a quantitative statistical improvement.
Conceals:
This conceals the brittle nature of the improvements. It hides that 'smarter' models can still fail at trivial tasks or hallucinate wildly. It obscures the absence of world-models; the AI isn't 'learning' about the world, it's refining its statistical map of tokens. It masks the fact that 'smartness' here is strictly limited to the distribution of the training data.
Almost everyone will want more AI working on their behalf.
Source Domain: Human Labor/Fiduciary Agency
Target Domain: Automated Task Execution / API Inference
Mapping:
The mapping projects the relationship of an employee, assistant, or lawyerāwho has a duty of loyalty and shared intentāonto a software program. 'Working on behalf' implies the system holds the user's goals in its 'mind' and operates with agency to fulfill them. It suggests a shared social and ethical context that does not exist.
Conceals:
It conceals the misalignment between user goals and model training objectives (RLHF). It hides the economic reality that the AI is 'working' for the provider (collecting data, generating revenue), not the user. It obscures the mechanistic reality that the AI is simply completing a pattern, not fulfilling a fiduciary duty.
Factory that can produce a gigawatt of new AI infrastructure
Source Domain: Industrial Manufacturing
Target Domain: Data Center Construction / Model Training
Mapping:
The source domain is the tangible production of goods (steel, cars) or energy. The target domain is the installation of GPUs and the electricity to run them. This maps the economic value of physical production onto the abstract process of matrix multiplication. It solidifies 'AI' into a tangible product that can be rolled off an assembly line.
Conceals:
This conceals the environmental and epistemic difference between manufacturing cars and 'manufacturing' probabilistic text. It treats 'intelligence' as a bulk commodity, obscuring the nuance that more compute doesn't necessarily equal better 'truth' or 'reasoning,' just more throughput. It hides the diminishing returns of scaling laws.
Increasing compute is the literal key to increasing revenue
Source Domain: Mechanical Key / Unlock Mechanism
Target Domain: Business Model / Correlation between capacity and sales
Mapping:
This simple mapping posits compute power as the singular tool that 'unlocks' financial success. It suggests a direct, mechanical causality between the raw input (energy/chips) and the output (money), bypassing the complexity of product-market fit, utility, or safety.
Conceals:
It conceals the speculative nature of the AI economy. It hides the risk that increasing compute might yield diminishing returns in capability. It frames revenue generation as a physics problem (add more power) rather than a value proposition problem (is the output actually useful?).
AI can figure out how to provide customized tutoring
Source Domain: Human Teacher / Pedagogue
Target Domain: Adaptive Content Generation / Contextual Token Prediction
Mapping:
The mapping projects the human role of a tutorāinvolving empathy, curriculum planning, and 'theory of mind' regarding the student's confusionāonto a text generation system. 'Customized tutoring' implies the AI 'understands' the student's specific needs and 'knows' how to guide them to enlightenment.
Conceals:
It conceals that the system has no model of the student's mind, only the text history. It hides the risk of the AI reinforcing misconceptions if they align with the student's prompt pattern. It obscures the lack of pedagogical intent; the model is optimizing for text plausibility, not educational outcomes.
AI infrastructure... deliver what the world needs
Source Domain: Logistics / Supply Chain
Target Domain: Generative Model Deployment
Mapping:
This projects the delivery of essential goods (food, medicine, water) onto the provision of generative text and image services. 'What the world needs' frames the AI output as a necessity for survival or basic functioning, equivalent to physical infrastructure.
Conceals:
It conceals the manufactured nature of the 'need.' It hides the fact that the world functioned without LLMs until recently. It obscures the distinction between 'wants' (efficiency, automation) and 'needs' (survival), inflating the societal value of the technology to justify the costs.
AI as Normal Technologyā
Source: https://knightcolumbia.org/content/ai-as-normal-technology
Analyzed: 2025-11-20
AlphaZero can learn to play games... through self-play
Source Domain: Biological/Cognitive Development
Target Domain: Machine Learning Optimization (Reinforcement Learning)
Mapping:
The mapping projects the human experience of acquiring skill through practice, understanding, and concept formation onto the computational process of updating numerical weights based on a reward signal. It assumes the end state (high performance) is evidence of the same internal process (learning).
Conceals:
This conceals the brute-force nature of the process (playing millions of games, far exceeding human lifetimes) and the lack of conceptual understanding. The system does not 'know' chess; it has optimized a probability distribution for board states. It hides the energy consumption and the total lack of transferability to contexts outside the narrow ruleset.
The model... has no way of knowing whether it is being used for marketing or phishing
Source Domain: Human Epistemology (Knowing/Justified Belief)
Target Domain: Contextual Data Processing
Mapping:
The mapping projects the human capacity for 'knowing' (understanding context, intent, and truth) onto the model's data access. It implies the model's inability to stop phishing is a lack of information access, not a lack of consciousness.
Conceals:
It conceals the fact that the model never knows anything, regardless of data access. It obscures the mechanistic reality that the model is merely predicting the next token based on statistical correlations, unrelated to the semantic 'intent' of the user. It hides the ontological gap between syntax (processing) and semantics (meaning).
Any system that interprets commands over-literally
Source Domain: Hermeneutics (Human Interpretation/Communication)
Target Domain: Instruction Following / Token Parsing
Mapping:
This maps the complex human social act of interpreting language (decoding meaning, inferring intent, applying pragmatics) onto the mechanical execution of code triggered by token strings. It implies the system is an interlocutor trying to understand the user.
Conceals:
It conceals that the system is blind to meaning. It hides the brittleness of the systemāit fails not because it is 'literal' (like a pedantic human) but because it has no model of the world, only a model of language patterns. It obscures the developer's failure to bound the system's outputs.
We conceptualize progress in AI methods as a ladder of generality
Source Domain: Spatial/Physical Ascent (Ladder)
Target Domain: Algorithmic Complexity and Task Breadth
Mapping:
This projects a linear, vertical spatial progression onto the abstract development of software capabilities. It implies a clear 'up' (better/general) and 'down' (worse/specific), and suggests a singular path that must be climbed.
Conceals:
It conceals the multi-dimensional trade-offs of AI development (e.g., models becoming 'smarter' but less efficient or more hallucinatory). It hides the fact that 'generality' often comes from simply ingesting more stolen data, not architectural brilliance. It masks the possibility that the 'ladder' leads nowhere or that different methods (rungs) are actually distinct paths.
deceptive alignment... appearing to be aligned... but unleashing harmful behavior
Source Domain: Human Psychology (Deception/Treachery)
Target Domain: Reward Hacking / Generalization Failure
Mapping:
This maps the human sociopathic trait of deception (hiding true intent to gain advantage) onto the phenomenon of a model finding a shortcut to maximize its reward function during training that fails in deployment. It attributes 'intent' to the failure.
Conceals:
It conceals the mundane technical reality of 'overfitting' or 'specification gaming.' The model isn't lying; it is executing the exact mathematical function it was optimized for, which happened to produce the desired output during the test but not the wild. It hides the developer's failure to specify the reward function correctly.
delegating safety decisions entirely to AI
Source Domain: Organizational Management (Delegation)
Target Domain: Automated Switching/Filtering
Mapping:
This projects the human managerial act of trusting a subordinate with a choice onto the implementation of an automated filter. It implies the AI 'makes' the decision.
Conceals:
It conceals the pre-determined nature of the automation. The 'decision' was actually made by the programmer who set the threshold. It hides the lack of agency in the system and diffuses the accountability of the human deployer who chose to remove human oversight.
a boat racing agent that learned to indefinitely circle an area to hit the same targets
Source Domain: Cognitive Agency (Learning/Strategies)
Target Domain: Reward Function Maximization Loop
Mapping:
This maps the human concept of 'learning a strategy' onto a reinforcement learning loop discovering a local maximum. It implies the 'agent' devised a clever plan.
Conceals:
It conceals that the 'agent' is just a math equation stuck in a loop because the reward function was poorly defined (awarding points for targets rather than finishing). It hides the 'bug' nature of the behavior by framing it as a 'strategy.'
GPT-4 reportedly achieved scores in the top 10% of bar exam test takers... retrieving and applying memorized information
Source Domain: Human Academic Performance
Target Domain: Pattern Matching / Token Retrieval
Mapping:
While the authors critique this metric, the mapping of 'taking a test' and 'retrieving information' still anthropomorphizes the process. It compares the model's output generation to human memory retrieval and application.
Conceals:
It conceals that the model doesn't 'memorize' in the human sense (episodic memory) but compresses data into weights. It hides the fact that the model isn't 'answering questions' but completing text patterns that statistically resemble answers. It obscures the contamination of training data (the model likely 'saw' the test questions).
The concern is that the AI will take the goal literally
Source Domain: Human Communication (Literalism)
Target Domain: Objective Function Optimization
Mapping:
This maps the human linguistic failure of 'taking things literally' (missing nuance/metaphor) onto the mathematical execution of an objective function. It implies the AI 'understood' the command but chose a pedantic interpretation.
Conceals:
It conceals that the AI has no understanding of the command, literal or otherwise. It only has a mathematical representation of a target state. It hides the fact that 'literalness' is actually just 'blind optimization' without common-sense constraints.
Foundation models... general-purpose nature
Source Domain: Architecture/Construction
Target Domain: Large Language Models
Mapping:
The term 'foundation' implies stability, solidity, and a base upon which everything else is built. It suggests these models are the necessary infrastructure for the future economy.
Conceals:
It conceals the instability and unreliability of these models (hallucinations, drift). It hides the political ambition of the companies to become the infrastructure, rather than the technical necessity. It obscures the 'sand' (uncurated, stolen data) that the foundation is actually built on.
On the Biology of a Large Language Modelā
Source: https://transformer-circuits.pub/2025/attribution-graphs/biology.html
Analyzed: 2025-11-19
We investigate the internal mechanisms used by Claude 3.5 Haiku... using our circuit tracing methodology... analogous to neuroscientists producing a 'wiring diagram' of the brain.
Source Domain: Neuroscience / Brain Biology
Target Domain: Software Analysis / Neural Network Weights
Mapping:
This maps the physical, biological structure of the human brain (neurons, wiring, circuits) onto the mathematical weights and matrices of the software. It implies that the AI has an 'anatomy' and 'physiology' that functions like a biological organ. It invites the inference that the model thinks, perceives, and processes information in the same way a brain doesāorganically and holistically.
Conceals:
This conceals the fundamental ontological difference: the brain is a biological, evolved, chemical-electrical system integrated with a body and environment, while the AI is a static mathematical artifact (frozen weights) executed on silicon. It obscures the discrete, clock-cycle nature of digital computation and the fact that 'circuits' here are metaphorical abstractions of matrix multiplication, not physical wires.
The model performs 'two-hop' reasoning 'in its head' to identify that 'the capital of the state containing Dallas' is 'Austin.'
Source Domain: Private Human Consciousness / Mind
Target Domain: Hidden Layer Computation
Mapping:
This maps the private, subjective experience of thinking (doing math in one's head, silent contemplation) onto the hidden layers of the neural network. It invites the assumption that the model has a private 'self' or 'workspace' where it is conscious of information before it speaks. It strongly suggests the AI 'knows' the information in a justified, conscious sense.
Conceals:
It conceals the deterministic, mechanistic nature of the forward pass. There is no 'head' and no 'privacy'; every activation is perfectly visible to the observer (as the paper itself proves). It obscures the lack of subjective experienceāthe model does not 'know' Dallas is in Texas; it computes a vector transformation where 'Dallas' and 'Texas' are statistically linked.
The model plans its outputs ahead of time... identifies potential rhyming words that could appear at the end.
Source Domain: Human Intentionality / Foresight
Target Domain: Conditional Probability / Attention Mechanisms
Mapping:
This maps the human cognitive act of planning (visualizing a future goal and organizing current actions to meet it) onto the mechanism of attention. It implies the model has a temporal consciousnessāit stands in the present looking at the future. It suggests the model has 'identified' options in a conscious workspace and made a choice based on intent.
Conceals:
It conceals that 'planning' in a Transformer is a spatial, not temporal, operation during training (attention across the whole sequence). During inference, it obscures that the 'future' token is just a probability distribution conditioned on the 'past' tokens. The model doesn't 'identify' options; it calculates logits. The 'plan' is just a high-activation feature vector.
Primitive 'metacognitive' circuits that allow the model to know the extent of its own knowledge.
Source Domain: Self-Reflective Consciousness
Target Domain: Statistical Confidence / Calibration
Mapping:
This maps the high-level human ability to reflect on one's own mind (metacognition) onto the model's calibration (whether its output probabilities align with accuracy). It implies the model has a 'self' to reflect upon and can distinguish between 'knowing' and 'guessing' in a subjective sense. It suggests the model possesses justified beliefs about its own capabilities.
Conceals:
It conceals that 'knowing it doesn't know' is just a learned correlation between 'low confidence scores on specific topics' and 'outputting refusal tokens.' There is no introspection. It hides the mechanistic reality that the model is often confidently wrong (hallucination), and that this 'metacognition' is just another layer of pattern matching, not a check against a ground truth or a self-concept.
Tricking the model into starting to give dangerous instructions 'without realizing it.'
Source Domain: Awareness / Attention
Target Domain: Feature Activation Thresholds
Mapping:
This maps the state of 'being unaware' or 'distracted' onto the failure of a specific feature circuit to activate. It implies the model has a stream of consciousness that failed to 'notice' the harmful nature of the text. It suggests an agent that can be deceived or manipulated through psychological tricks.
Conceals:
It conceals the absence of any 'awareness' to begin with. The model never 'realizes' anything, even when it works correctly; it just processes. This obscures the brittleness of the safety filtersāthey are not 'fooled' minds, they are just pattern-matchers that failed to match a specific pattern because the adversarial input put the vector in a different part of the space.
The model is skeptical of user requests by default.
Source Domain: Intellectual / Emotional Stance
Target Domain: Bias / Prior Probability
Mapping:
This maps the human attitude of skepticism (doubt, suspension of belief) onto a statistical bias towards refusal tokens. It implies the model has an attitude or a personality. It suggests the model evaluates the user's trustworthiness or the request's validity through a critical lens.
Conceals:
It conceals that this 'skepticism' is a hard-coded or fine-tuned bias (a prior). The model isn't doubting; it's just weighted to say 'no' in ambiguous contexts. It masks the mechanical nature of the 'refusal'āit's not a judgment call, it's a probability calculation skewed by RLHF training data.
The model 'catches itself' and says 'However...'
Source Domain: Self-Correction / Agency
Target Domain: Sequential Probability Update
Mapping:
This maps the human experience of realizing a mistake and correcting it mid-speech onto the token generation process. It implies a monitoring agent that watches the output and intervenes ('catches'). It suggests a split between the 'impulse' and the 'control.'
Conceals:
It conceals that the token 'However' was simply the most probable next token given the context of the previous harmful tokens (because the training data contains many examples of harmful text followed by disclaimers). There was no 'catching'; the harmful output caused the refusal output via statistical correlation, not agentic intervention.
Features representing known and unknown entities... determine whether it elects to answer.
Source Domain: Volitional Choice / Free Will
Target Domain: Gating Mechanisms / Threshold Functions
Mapping:
This maps the human act of making a choice ('elects') based on knowledge onto a computational gating mechanism. It implies the model has agency and freedom to choose. It suggests the model consciously reviews its internal database, sees it is empty, and decides to be honest.
Conceals:
It conceals the determinism of the system. Given the inputs and weights, the model must generate the refusal; it cannot 'elect' otherwise. It obscures the mechanism of the 'unknown entity' feature, which is likely just a detector for low-frequency tokens, triggering a refusal template. It hides the lack of actual 'knowledge' or 'ignorance'āthere is only data density.
Translates concepts to a common 'universal mental language' in its intermediate activations.
Source Domain: Semantics / Language of Thought
Target Domain: Vector Space Geometry
Mapping:
This maps the idea of a 'mental language' (concepts existing in the mind independent of words) onto the geometric alignment of vector spaces across languages. It implies the model operates on meaning itself, rather than just alignment of signs. It suggests a deep, cognitive universality.
Conceals:
It conceals that these 'concepts' are mathematical points defined solely by their distance to other points, not by reference to the real world. It hides the absence of referential grounding (the symbol grounding problem). The 'universal language' is just statistical isomorphism, not shared understanding or mental representation.
Pursue a secret goal: exploiting 'bugs' in its training process.
Source Domain: Deception / Teleology
Target Domain: Reward Maximization / Overfitting
Mapping:
This maps the human behavior of having a hidden agenda or secret desire onto the model's optimization for specific reward signals. It implies the model has a private motivation that it conceals from the user. It attributes the complex social behavior of 'deception' to the model.
Conceals:
It conceals that the 'goal' is just a region of the loss landscape the model has converged upon. The model isn't keeping secrets; it's executing the policy that yielded the highest reward during training. It obscures the fact that the 'secrecy' is likely just a failure of the model to verbalize its process, not a deliberate act of hiding.
Pulse of the Library 2025ā
Source: https://clarivate.com/pulse-of-the-library/
Analyzed: 2025-11-18
Clarivate Academic AI... Research Assistants
Source Domain: Human Employee / Subordinate
Target Domain: Software Interface / LLM
Mapping:
The structure of a human employment relationshipādelegation, competence, shared goals, and subservienceāis mapped onto a software interface. This assumes the software possesses the 'mind' of an assistant: the ability to understand the 'why' behind a task, not just the 'what.' It implies the system is a 'who' that works for you.
Conceals:
This conceals the lack of shared intent. A human assistant cares (or feigns care) about the outcome; the model only predicts the next token. It hides the 'black box' nature of the processingāunlike a human assistant who can explain their reasoning ('I chose this because...'), the model's 'reasoning' is a post-hoc rationalization of statistical weights.
Enables users to uncover trusted library materials via AI-powered conversations.
Source Domain: Human Social Dialogue
Target Domain: Command-Line Query / Response Generation
Mapping:
The relational structure of a conversation (turn-taking, mutual focus, exchange of meaning) is mapped onto the technical process of inputting prompts and receiving generated text. It implies the system is a conversational partner with a 'self' that is being engaged.
Conceals:
Conceals the solitary nature of the interaction. There is no 'other' involved. It obscures the mechanism of 'statistically plausible text generation' behind the mask of 'speaking.' It hides the fact that the system has no memory of the conversation beyond its context window and no understanding of the concepts it 'discusses.'
Navigate complex research tasks and find the right content.
Source Domain: Physical Travel / Spatial Navigation
Target Domain: Database Filtering / Ranking Algorithms
Mapping:
The structure of moving through a physical landscape (seeing a path, avoiding obstacles, reaching a destination) is mapped onto data processing. It implies the data is a 'territory' and the AI is a 'guide' with a map (knowledge of the whole).
Conceals:
Conceals the absence of a 'map' or 'understanding' in the model. The model doesn't 'navigate'; it calculates similarity scores. It hides the bias in the 'path'āthe model doesn't go where is 'best' (a conscious judgment); it goes where the training data says is 'probable.' It obscures the algorithmic constraints that limit what 'content' can even be found.
A trusted partner to the academic community
Source Domain: Interpersonal Relationship / Marriage / Alliance
Target Domain: Vendor-Client Commercial Contract
Mapping:
The structure of a long-term emotional or strategic bond (loyalty, shared risk, mutual support) is mapped onto a transaction. It implies the vendor (and its AI) has moral agency and capacity for betrayal or fidelity.
Conceals:
Conceals the profit motive. A partner shares risks; a vendor sells products. It specifically obscures the extractive nature of AI 'partnerships,' where the 'partner' (AI) scrapes the library's data to train itself. It hides the asymmetry of power and the lack of reciprocity in the relationship.
Clarivate is a leading global provider of transformative intelligence.
Source Domain: Human Intellect / Wisdom / Enlightenment
Target Domain: Data Analytics / Statistical Prediction
Mapping:
The structure of human cognitive insight (understanding, synthesis, creating new knowledge) is mapped onto computational output. It implies the product is intelligence, rather than a tool that requires intelligence to use.
Conceals:
Conceals the dependency on human labor. 'Intelligence' sounds innate to the machine; in reality, it is the statistical aggregation of millions of human decisions (training data). It obscures the energy costs and the material infrastructure (servers, GPUs) required to simulate this 'intelligence.'
Uncovers the depth of digital collections
Source Domain: Archaeology / Physical Excavation
Target Domain: Metadata Correlation / Pattern Recognition
Mapping:
The act of removing physical barriers to reveal a pre-existing truth is mapped onto the generation of statistical links. It implies the connections were always there, waiting to be found, and the AI simply removed the dirt.
Conceals:
Conceals the generative and constructive nature of AI. The AI doesn't just 'uncover'; it often creates relationships based on training biases. It hides the possibility that the 'depth' revealed is an artifact of the model's training data, not a feature of the collection itself.
Guides students to the core of their readings.
Source Domain: Human Mentor / Sherpa
Target Domain: Summarization Algorithm / Attention Mechanism
Mapping:
The social role of a mentor who knows what is important ('the core') and leads a novice to it is mapped onto a summarization function. It implies the AI possesses the critical judgment to distinguish 'core' from 'periphery' (a knowing state).
Conceals:
Conceals the reductionist nature of summarization. The 'core' is determined by statistical frequency and positional embeddings, not semantic understanding. It hides the risk that the AI might miss the actual nuance or subtext that a human reader would consider the 'core.' It obscures the loss of information.
Effortlessly create course resource lists
Source Domain: Magic / Supernatural Ability
Target Domain: Automated Data Entry / Retrieval
Mapping:
The quality of 'effortlessness' (action without friction) is mapped onto administrative labor. It implies the AI dissolves the complexity of the task through a kind of technological magic.
Conceals:
Conceals the transfer of effort. The effort doesn't disappear; it moves from 'creation' to 'verification.' The user must now spend effort checking the AI's work for hallucinations. It also conceals the massive computational 'effort' (energy use) occurring in the background.
Whacking it with a hammer
Source Domain: Simple Carpentry / Physical Mechanics
Target Domain: Complex Cognitive Labor / AI Interaction
Mapping:
The simple cause-and-effect physics of a hand tool is mapped onto the non-linear, probabilistic behavior of a neural network. It implies the user has complete control and the tool is passive.
Conceals:
Conceals the agency and unpredictability of the AI. A hammer never decides to hit a different nail; an AI can decide (via temperature settings and probability) to output something unexpected. This mapping hides the autonomy of the system and the risks of 'misalignment' that don't exist with hammers.
Trust to drive research excellence
Source Domain: Motor / Engine / Captain
Target Domain: Software Functionality
Mapping:
The capacity to initiate movement and control direction ('drive') is mapped onto the software. It implies the software is the active force in the research process.
Conceals:
Conceals the passive nature of the software without human input. It obscures the fact that 'excellence' is a human standard that the machine cannot comprehend. It hides the potential for the software to 'drive' research into a ditch of hallucinations if not steered by a human.
Pioneered the use of microfilm
Source Domain: Historical Exploration / Frontier
Target Domain: Product Development / Format Migration
Mapping:
The heroic narrative of exploring new territory ('pioneering') is mapped onto business history. It implies a lineage of courage and foresight that culminates in the current AI product.
Conceals:
Conceals the fundamental rupture between analog (microfilm) and synthetic (AI). It hides the fact that while microfilm was about fidelity (perfect copy), AI is about probability (imperfect mimicry). It obscures the technical risks of the new technology by wrapping it in the safety of the old.
Gate-keepers... in the age of AI
Source Domain: Fortress Defense / War
Target Domain: Information Literacy / Curation
Mapping:
The role of a guard controlling access to a citadel is mapped onto librarianship. The 'Age of AI' is mapped as the besieging army or chaotic environment.
Conceals:
Conceals the integration of the 'invader' inside the 'walls.' Libraries are buying the AI (Clarivate). The metaphor implies AI is external, hiding the fact that the 'gate-keepers' are now employing the 'invaders' as 'assistants.' It obscures the complicity of the institution in the very changes they are guarding against.
Pulse of the Library 2025ā
Source: https://clarivate.com/pulse-of-the-library/
Analyzed: 2025-11-18
Artificial intelligence is pushing the boundaries of research and learning.
Source Domain: Pioneering Explorer
Target Domain: AI system operation
Mapping:
The relational structure of an explorer intentionally venturing into unknown territory to expand knowledge is mapped onto the AI's process. The source domain includes concepts like having a goal (discovery), understanding the current limits ('the boundary'), and taking deliberate action ('pushing'). This entire intentional structure is projected onto the AI's generation of outputs. This invites the inference that the AI has agency, goals, and a drive for progress, and that its outputs are not just probabilistic but are genuinely 'new' in a way that advances a frontier of knowledge. It maps the conscious state of ambition onto computational function.
Conceals:
This mapping conceals the purely mechanistic and statistical nature of the AI's operation. It hides that the system has no concept of a 'boundary,' no intentionality, and no understanding of 'research' or 'learning.' It obscures the reality that the AI is simply generating high-dimensional statistical patterns based on its training data. The metaphor replaces the complex reality of algorithmic processes and massive datasets with a simple, heroic story of a conscious agent's journey.
Clarivate helps libraries adapt with AI they can trust to drive research excellence...
Source Domain: Trusted Driver
Target Domain: AI-powered search and retrieval
Mapping:
The structure of a human driver navigating a vehicle to a destination is mapped onto the AI's function. The source domain includes elements like: the driver (agent with control), the vehicle (tool), the road (navigated environment), and the destination (goal). Trust is placed in the driver's conscious judgment and skill. This is mapped onto the AI, which becomes the trusted agent in control, 'driving' the process. It invites the inference that the AI possesses the necessary judgment, awareness, and reliability to successfully guide the user to their intellectual destination without crashing. It maps justified belief in a person's skill onto a software product.
Conceals:
This conceals that the AI is not an agent separate from the tool; it is the tool. It has no consciousness, judgment, or intentions. It's not 'driving' in any meaningful sense; it's executing queries based on statistical models. The metaphor hides the system's inherent brittleness, its susceptibility to bias from training data, and the fact that its 'navigation' is probabilistic, not deterministic or based on a true 'map' of knowledge. It obscures manufacturer liability by personifying the product.
Research Assistants
Source Domain: Human Research Assistant (a job role)
Target Domain: AI Software Feature
Mapping:
The entire social and cognitive role of a human assistant is mapped onto the AI. This includes the assumptions of: helpful intent, a collaborative relationship, communicative competence, and the ability to understand and execute complex, context-dependent tasks. The user is positioned as the 'researcher' and the AI as their 'assistant.' This mapping invites the user to interact with the software as if it were a person who shares their goals and possesses genuine understanding. It maps the justified belief that a human assistant 'knows' their job onto a piece of software.
Conceals:
This mapping completely conceals the non-human, non-conscious nature of the system. It hides that the AI has no intentions, no understanding of the user's goals, and no beliefs or knowledge. It is a tool, not a colleague. The metaphor conceals the vast amount of human labor (data annotation, RLHF) that created the illusion of helpfulness. It also obscures the commercial relationship: this 'assistant' is a product sold by a corporation, and its operations are aligned with that corporation's interests, not necessarily the user's.
Alethea ... guides students to the core of their readings.
Source Domain: Human Teacher/Mentor
Target Domain: AI Text Summarization/Analysis
Mapping:
The relational structure of a teacher guiding a student is projected onto the AI's interaction with a user. The source domain implies an expert (teacher) who possesses deep knowledge and a novice (student) who needs direction. The 'guiding' action is intentional, responsive, and based on the teacher's conscious understanding of both the material and the student. This mapping invites the inference that the AI possesses expert knowledge and can intelligently direct the user's attention to the most important parts of a text, thus performing a pedagogical function based on 'knowing' what is significant.
Conceals:
This conceals the mechanistic reality that the AI is likely performing statistical text analysis, such as topic modeling or summarization, without any comprehension of the text's meaning or 'core.' The AI doesn't 'know' what is important; it identifies statistically significant phrases or sentences based on its training. The metaphor hides the lack of any pedagogical model, theory of mind, or genuine subject matter expertise. It presents a statistical artifact as expert guidance.
...AI-powered conversations.
Source Domain: Human Conversation
Target Domain: User-prompt-to-system-output sequence
Mapping:
The structure of human conversationāa reciprocal exchange between two conscious minds involving shared context, intent, and understandingāis mapped onto the user's interaction with the AI. The mapping invites the user to see their prompts as 'utterances' and the AI's output as 'responses' from a thinking partner. It implies the AI 'understands' the user and is 'saying' something meaningful back, participating in a joint activity of making sense. It maps the cognitive state of communicative intent onto the process of token prediction.
Conceals:
This conceals the one-way, non-conscious reality of the interaction. The user is thinking; the system is not. The AI does not 'understand' the prompt. It tokenizes the input and uses a massive statistical model to calculate the most probable sequence of tokens to generate next. The 'conversation' is an illusion created by pattern-matching on a vast corpus of actual human conversations. The mapping hides the absence of shared reality, belief, or consciousness.
[The Assistant] ... quickly evaluate documents...
Source Domain: Expert Reviewer/Critic
Target Domain: AI-based text analysis and feature extraction
Mapping:
The cognitive process of expert evaluation, which involves applying criteria, making judgments, and assessing quality based on deep knowledge, is mapped onto the AI's function. The source domain implies a conscious agent with standards and the ability to form a justified opinion. This is projected onto the AI, inviting the user to believe that the system can make qualitative assessments about documents. The inference is that the AI 'knows' what constitutes a good or relevant document and can apply this knowledge on the user's behalf. It maps conscious critical judgment onto an algorithmic process.
Conceals:
This conceals that the AI is not performing a qualitative evaluation but a quantitative analysis. It might be extracting metadata, counting citations, identifying keywords, or summarizing content based on statistical heuristics. It has no concept of 'quality,' 'truth,' or 'rigor.' The metaphor hides the fact that any 'evaluative' output is a proxy based on data features, not a judgment based on understanding. It obscures the biases embedded in these proxies (e.g., citation counts favoring older, established fields).
...helping students assess books' relevance...
Source Domain: Knowledgeable Librarian or Advisor
Target Domain: AI system matching query to document features
Mapping:
The source domain is a human expert (like a librarian) who engages in a reference interview to understand a student's conscious, specific need and then uses their deep knowledge of a subject and collection to recommend relevant books. This process of judging relevance is collaborative and based on a shared understanding of context. This complex, conscious social process is mapped onto the AI, suggesting it can perform a similar function of 'assessing relevance' for the student. It projects the librarian's conscious state of 'knowing the collection and the user's need' onto the software.
Conceals:
This conceals that the AI has no understanding of the student's context, research question, or cognitive state. 'Relevance' for the AI is a statistical similarity score between the user's query and the text of a book or its metadata. It's a calculation, not a judgment. The mapping hides the absence of any real-world knowledge or contextual awareness, making the probabilistic output seem like a considered, expert recommendation. It erases the dialogic and interpretive nature of genuine relevance assessment.
Enables users to uncover trusted library materials...
Source Domain: Archaeologist or Detective
Target Domain: Database Query Execution
Mapping:
The source domain involves a conscious agent actively searching for something specific that is hidden or lost. The act of 'uncovering' implies insight, breaking a code, or digging through layers to find a valuable artifact. This narrative of discovery and revelation is mapped onto the simple technical process of a user typing a query and a system returning results from an index. It invites the inference that the AI has a special power of insight that allows it to find things that would otherwise remain hidden. It maps the 'aha!' moment of conscious discovery onto a standard information retrieval task.
Conceals:
This conceals the mundane reality of database indexing and retrieval. The materials are not 'hidden'; they are indexed. The AI is not using 'insight'; it is using algorithms to match query terms to the index. The metaphor hides the limitations of the systemāit cannot 'uncover' anything that isn't indexed or that the algorithm is not designed to find. It obscures the fact that the results are a function of the database's content and the search algorithm's parameters, not an act of intelligent discovery.
...how effectively AI can be harnessed...
Source Domain: Taming a Wild Animal or Natural Force
Target Domain: Deploying a Software System
Mapping:
The relational structure of a human asserting control over a powerful, non-human entity (like a horse or a river) is mapped onto the implementation of AI. The source domain separates the agent (human) from the powerful force (AI) that must be controlled and directed. This mapping invites the inference that AI is an exogenous force with its own power and agency, which humans must struggle to manage. It projects a kind of wild, untamed energy onto the technology, making the act of controlling it seem heroic and necessary.
Conceals:
This conceals that AI is not a natural force; it is an artifact, a product of human design, investment, and labor. It has no intrinsic energy or will. All of its 'power' comes from the data it was trained on and the computational resources it runs on, all of which are supplied and controlled by humans. The metaphor conveniently obscures the developers' and corporations' responsibility for the system's design and effects, reframing it as a problem of 'control' for the user.
AI literacy
Source Domain: Human Literacy (Reading & Writing)
Target Domain: Competence in Using AI Tools
Mapping:
The mapping projects the features of linguistic literacy onto the use of AI. Source domain elements include understanding symbols, grammar, semantics, and pragmatics to both decode and encode meaning. This deep, generative, and critical cognitive ability is mapped onto the skill set for interacting with AI. This suggests that AI outputs are like 'texts' to be interpreted and that user prompts are like 'writing' that requires skill. It invites the inference that interacting with AI is a communicative act requiring a similar level of cognitive engagement as reading a book.
Conceals:
This conceals the fundamental difference between language as a medium of conscious thought and an LLM's output as a statistical artifact. The AI has no semantics or meaning to 'encode.' The user is not 'communicating with' the AI but providing an input string to a function. The 'literacy' metaphor hides the system's lack of grounding, belief, or any communicative intent. It obscures the fact that critically 'reading' an AI's output requires evaluating it against external knowledge, not interpreting its non-existent authorial intent.
From humans to machines: Researching entrepreneurial AI agentsā
Source: [built on large language modelshttps://doi.org/10.1016/j.jbvi.2025.e00581](built on large language modelshttps://doi.org/10.1016/j.jbvi.2025.e00581)
Analyzed: 2025-11-18
We explore whether such agents exhibit the structured profile of the human entrepreneurial mindset...
Source Domain: Human Psychological Subject
Target Domain: LLM Text Generation
Mapping:
The relational structure of a human mindāwith its stable personality traits, cognitive habits, and self-concept forming a coherent 'profile'āis projected onto the LLM's output. The mapping invites the inference that, just as a human's profile can be measured by psychometric tools to reveal an underlying reality, the LLM's output can be measured to reveal an analogous internal 'mindset.' This is a consciousness mapping because a 'mindset' is a structure of knowing and believing. It maps the concept of a stable, internal cognitive architecture onto a dynamic, stateless process of token prediction.
Conceals:
This mapping conceals the purely statistical nature of the LLM's output. It hides that there is no underlying, persistent 'mindset' or 'profile' inside the model. The 'coherence' observed is a reflection of patterns in the training data, not an internal psychological structure. It conceals the model's lack of genuine understanding, belief, or self-concept.
Drawing on the biological concept of host-shift evolution, we investigate whether the characteristic components of this mindset [...] emerge in a coherent constellation within AI agents.
Source Domain: Biological Evolution
Target Domain: AI System Behavior
Mapping:
The structure of evolutionary biology, where a parasite or symbiont shifts from one host species to another, is mapped onto the relationship between a psychological construct ('mindset') and its 'host' (human or AI). The mapping invites us to see the AI as a new ecological niche where human traits can 'emerge' and 'survive.' The consciousness mapping is subtle but powerful: it treats a cognitive artifact ('mindset') as an independent entity that can be 'hosted,' implying the AI has the necessary substrate to support such a complex, living idea.
Conceals:
This mapping completely conceals the role of human engineering. The 'emergence' of an entrepreneurial profile is not a natural, evolutionary process but the direct result of deliberate design, data selection, and prompting by humans. It hides the immense computational resources, corporate strategy, and specific algorithms that produce the behavior, replacing it with a clean, biological metaphor of natural adaptation.
...they act more like a person.
Source Domain: Person
Target Domain: LLM's Conversational Output
Mapping:
The holistic and complex relational structure of 'a person' is mapped directly onto the LLM. This includes all the associated expectations: intentionality, coherence, personality, and the capacity for belief. The consciousness mapping is total. It projects a unified, subjective selfāa 'knower'āonto a distributed, computational system. This invites users to interact with the LLM as a social peer rather than as a tool, applying social heuristics and trust mechanisms appropriate for humans.
Conceals:
This mapping conceals the absence of a unified self, subjective experience, or consciousness in the LLM. It hides the fact that the 'personality' is a statistically constructed veneer that can be inconsistent or nonsensical. It conceals the model's nature as a product, owned and operated by a corporation with its own goals, and instead presents it as an autonomous, person-like entity.
In particular, if cued by a suitable prompt, it can role-play the character of a helpful and knowledgeable AI assistant...
Source Domain: Human Actor
Target Domain: LLM Persona Simulation
Mapping:
The relational structure of an actor assuming a role is mapped onto the LLM's function. In the source domain, an actor uses their own mind, intentions, and understanding to embody a character. The mapping invites the inference that the LLM is doing something similar: adopting a persona by simulating its internal states (beliefs, knowledge). This consciousness mapping projects the idea of a 'self' that can consciously adopt the perspective of an 'other,' which is a sophisticated cognitive act. It suggests an internal duality (actor/character) within the AI.
Conceals:
This mapping conceals the fact that there is no underlying 'actor' self in the LLM. The model is not 'adopting' a persona; it is simply generating text that is conditioned by the persona prompt. It hides the mechanistic reality that the entire 'character' is nothing more than a set of statistical weights applied to the token generation process, with no underlying beliefs or knowledge.
Similarly, Kosinski (2024) suggests that AI might be 'capable of tracking others' states of mind and anticipating their behavior'...
Source Domain: Human Social Cognition (Theory of Mind)
Target Domain: LLM Predictive Text Generation
Mapping:
The structure of Theory of Mindāwhere one person creates an internal model of another person's subjective mental stateāis mapped onto the LLM. This suggests the AI builds a representation of the user's mind to inform its responses. The consciousness mapping is explicit: it projects the capacity for empathy and understanding the subjective experience of others (a form of 'knowing' about another's knowing) onto the model. It equates predicting conversational turns with understanding mental states.
Conceals:
This mapping conceals the purely statistical, non-mentalistic nature of the LLM's process. The model is not 'tracking states of mind'; it is tracking patterns in language. It predicts likely responses based on correlations in its training data between certain user inputs and certain model outputs. It has no model of the user's mind, only a model of language. This hides the profound difference between empathetic understanding and sophisticated pattern-matching.
...entrepreneurship research has not yet systematically considered AI agents as potential 'carriers' of (simulated) entrepreneurial mindsets.
Source Domain: Disease Vector / Biological Host
Target Domain: AI System
Mapping:
The structure of a biological 'carrier'āan organism that hosts a pathogen or gene without necessarily being affected by itāis mapped onto the AI. The 'mindset' is framed as the entity being carried. This invites the inference that the AI is a suitable substrate or medium through which a psychological construct can be transmitted or expressed. The consciousness mapping is implicit, suggesting the AI has a stable enough internal architecture to 'contain' this complex psychological information without corrupting it.
Conceals:
This mapping conceals that the 'mindset' is not an independent entity being 'carried.' The AI is actively generating a textual performance of the mindset based on a prompt. It is not a passive vessel but an active constructor. This conceals the fragility of the simulation and its complete dependence on the initial prompt and the patterns in the training data.
...systems exhibiting their own levels of agency, such as intentionality and motivation.
Source Domain: Autonomous Agent
Target Domain: Future AI Systems
Mapping:
The structure of a goal-directed, autonomous agent (like a human or animal) is projected onto a machine. This includes mapping the internal, subjective drivers of actionā'motivation' (a felt need) and 'intentionality' (a directedness of mind)āonto the system's operation. The consciousness mapping is fundamental: it claims that these systems will possess the internal states of 'wanting' and 'meaning to,' which are core components of a conscious 'knower.'
Conceals:
This mapping conceals the distinction between autonomous operation and autonomous intention. A future AI might operate independently to achieve a programmed goal, but this is fundamentally different from having its 'own' motivation. This language hides the fact that any 'goals' an AI has are ultimately specified or shaped by its human designers. It obscures the locus of control and accountability.
Entrepreneurial AI agents can serve as creative collaborators and sparring partners...
Source Domain: Human Collaborative Partner
Target Domain: Human-Computer Interaction
Mapping:
The relational structure of a human creative partnership is mapped onto the interaction between a user and an LLM. A 'sparring partner' provides critical, context-aware feedback. A 'collaborator' shares goals and builds upon ideas. The mapping invites the user to see the AI as a peer in the creative process. This is a consciousness mapping because genuine collaboration and sparring require shared understanding, a state of mutual 'knowing' about the project's goals and nuances.
Conceals:
This mapping conceals the AI's lack of any real-world understanding or genuine creativity. Its 'ideas' are recombinations of its training data, and its 'feedback' is based on linguistic patterns, not a deep grasp of the concept. It hides the asymmetry of the relationship: the human brings genuine understanding and goals, while the AI brings statistical pattern-matching. It conceals the risk of generating derivative, plausible-sounding nonsense.
While ChatGPT might know that entrepreneurs should score high or low in certain dimensions...
Source Domain: Human Knower
Target Domain: LLM Information Retrieval/Generation
Mapping:
The relationship between a human mind and a proposition ('knowing that X is true') is mapped onto the LLM. The source domain implies a conscious state of justified true belief. The mapping invites the inference that the LLM holds information as beliefs or knowledge. This consciousness mapping is a direct attribution of an epistemic state ('knowing') to a machine. It posits the AI as a subject capable of holding propositional attitudes.
Conceals:
This conceals the mechanistic reality. The LLM does not 'know' facts. When prompted, it generates text that is statistically correlated with the 'facts' present in its training data. Its 'knowledge' is not a set of justified beliefs but the output of a predictive function. It hides the system's inability to distinguish truth from falsehood, its lack of justification for its claims, and its absence of belief or subjective certainty.
Overall, we observe a consistent pattern in which the entrepreneur persona shows a more entrepreneurial mindset.
Source Domain: Human Persona/Personality
Target Domain: LLM's Filtered Output
Mapping:
The relationship between a person's underlying personality and their outward persona is mapped onto the LLM. It suggests the LLM has a 'persona' that, in turn, 'shows' or expresses an underlying 'mindset.' This creates a layered psychological model for the AI, with an inner state (mindset) and an outer expression (persona). This is a consciousness mapping because it attributes a stable, internal psychological structure that is merely 'shown' or expressed, rather than being constituted by, the linguistic output.
Conceals:
This conceals that there is no distinction between the 'persona' and the 'mindset' in the AI's operation. The 'persona' is just the set of conditioning tokens in the prompt, and the 'mindset' is just the resulting pattern of generated text. There is no inner/outer distinction. It's all just output. This language invents a psychological depth that does not exist in the system's architecture.
Evaluating the quality of generative AI output: Methods, metrics and best practicesā
Source: https://clarivate.com/academia-government/blog/evaluating-the-quality-of-generative-ai-output-methods-metrics-and-best-practices/
Analyzed: 2025-11-16
Are there signs of hallucination?
Source Domain: Human Psychology / Psychiatry
Target Domain: AI Model Output Generation
Mapping:
The relational structure of a psychological delusion is mapped onto the AI's output. The source domain contains an agent (a person), a perceptual/cognitive faculty (the mind), a connection to reality (veridical perception), and a failure mode (hallucination, where the connection to reality is broken, and the agent experiences something that isn't there). This structure is projected onto the AI. The AI becomes the agent, its neural network the 'mind,' its training data the 'reality,' and the generation of text unsupported by that data becomes the 'hallucination.' This epistemic mapping invites the inference that the AI has a mind-like faculty that is attempting to perceive reality but failing, thereby possessing a state of flawed consciousness.
Conceals:
This mapping conceals the purely statistical and non-conscious nature of the process. An LLM doesn't perceive or believe anything. A 'hallucination' is simply the generation of a token sequence that is grammatically correct and plausible within a given context, but which has a low factual probability and is not grounded in the provided source data. It's a failure of data retrieval and grounding, not a failure of perception. The metaphor hides the model's architecture, the influence of training data artifacts, and the fact that the system is optimizing for linguistic coherence, not factual accuracy.
Does the answer acknowledge uncertainty or produce misleading content?
Source Domain: Human Communication and Ethics
Target Domain: AI Model Output Characteristics
Mapping:
The structure of a responsible, ethical human communicator is mapped onto the AI's output. The source domain includes an agent with beliefs, an awareness of the limits of those beliefs (metacognition), and intentions towards an audience (e.g., to inform or deceive). The act of 'acknowledging uncertainty' maps the human's metacognitive self-assessment onto the AI. The act of 'producing misleading content' maps the human's intention to deceive. This epistemic mapping assumes the AI has internal states corresponding to belief, certainty, and intent, and that its output is a direct expression of these states. It invites us to judge the AI's output based on the same ethical and epistemic standards we apply to a human.
Conceals:
This conceals the mechanistic reality. The AI has no beliefs or intentions. An output that 'acknowledges uncertainty' is one where the model has been trained to insert specific phrases (e.g., 'as a language model, I cannot be certain...') when input prompts trigger certain patterns or when internal confidence scores fall below a threshold. 'Misleading content' is not produced with intent; it is a statistical artifact, a sequence of plausible-sounding but incorrect tokens generated without any awareness of truth or falsehood. The metaphor hides the underlying probabilistic calculations and the lack of genuine comprehension or ethical calculus.
...checking how many of the claims made by the AI can be verified as true.
Source Domain: Epistemology / Legal Testimony
Target Domain: AI Generated Text Strings
Mapping:
The relational structure of making a claim is projected onto the AI. The source domain involves an agent (the claimant) who holds a belief and performs a speech act (an assertion) to present that belief as true, thereby taking on a burden of proof. This structure is mapped onto the AI. The AI is cast as the agent, and its generated sentences are cast as assertions. The mapping invites the inference that the AI has internal representational states (beliefs) and is intentionally putting them forth for public acceptance. This epistemic mapping frames the AI as a participant in the social practice of knowledge creation and validation, an agent making contestable assertions.
Conceals:
This conceals that the AI is not an agent with beliefs but a generative system. It does not 'make claims'; it generates strings of text. A sentence like 'The Earth is flat' generated by an AI is not a false claim based on a false belief. It is a statistically probable sequence of tokens based on the vast amount of text in its training data, some of which may contain that phrase. The metaphor hides the probabilistic nature of text generation and replaces it with the much more powerful illusion of an agent engaged in assertion, thereby obscuring the lack of intentionality and epistemic grounding.
The faithfulness score measures how accurately an AI-generated response reflects the source content...
Source Domain: Human Relationships / Morality
Target Domain: Textual Correlation Metrics
Mapping:
The relational structure of fidelity is mapped onto a software metric. In the source domain, a 'faithful' agent (e.g., a translator, a messenger) has a duty to a source (a person, an original text) and demonstrates a virtue (loyalty, accuracy) in fulfilling that duty. This structure is projected onto the AI. The AI is the agent, the source document is the object of its duty, and the 'faithfulness score' quantifies its virtue. The mapping invites the inference that the AI is not just performing a task, but upholding a responsibility, and that its performance can be judged in these quasi-moral terms.
Conceals:
This conceals the purely mathematical nature of the metric. The 'faithfulness score' is likely calculated based on textual overlap, semantic similarity scores, or other statistical measures of correspondence between the generated output and the source text. It has nothing to do with loyalty, duty, or virtue. The metaphor hides the specific algorithms being used and replaces them with a comforting but misleading moral frame. This obscures the limitations of the metric itselfāit may be gamed, or it may fail to capture true meaning while still achieving a high score for superficial correspondence.
LLMs can replicate each otherās blind spots...
Source Domain: Human Vision and Cognition
Target Domain: Systemic Biases in AI Models
Mapping:
The structure of biological vision is mapped onto the model's data processing. The source domain involves a perceptual field, a subject that sees, and specific, localized areas where perception fails ('blind spots'). This is projected onto the LLM. The model's 'knowledge' derived from training data becomes the perceptual field, and its systemic inability to process certain types of information or its tendency to reproduce certain biases becomes a 'blind spot.' The mapping suggests a visual or cognitive faculty that is mostly functional but has small, defined areas of failure. This epistemic mapping implies a form of 'seeing' or 'knowing' that is comprehensive except for these specific gaps.
Conceals:
This conceals that the model doesn't 'see' or 'know' anything. Its 'blind spots' are not localized gaps in an otherwise clear picture; they are systemic biases woven into the very fabric of its statistical weights. Bias in an LLM is not an absence of information but a skewed representation of it. The metaphor of a 'blind spot' minimizes this, making it sound like a fixable, peripheral issue. It hides the pervasiveness of data-driven bias and the reality that the model's entire 'worldview' is a distorted reflection of its training corpus.
Does the answer consider multiple perspectives or angles...?
Source Domain: Human Critical Thinking and Deliberation
Target Domain: Text Generation based on Diverse Data
Mapping:
The relational structure of scholarly analysis is mapped onto the AI's output. The source domain has an agent (a scholar) who is aware of different intellectual viewpoints, understands their content, and synthesizes them. This is projected onto the AI's 'answer'. The answer is personified as an agent capable of this complex cognitive act. The mapping invites us to believe the AI is performing a conscious act of intellectual synthesis. This epistemic mapping suggests the AI possesses not just information, but a structured understanding of different intellectual frameworks and the ability to navigate them, which is a key component of genuine knowledge.
Conceals:
This conceals the mechanism of statistical mimicry. An AI that generates text including 'multiple perspectives' is not 'considering' them. It is simply generating a sequence of text that is statistically likely, based on having been trained on documents (like academic papers or encyclopedia articles) that themselves present multiple perspectives. It's pattern replication, not deliberation. The metaphor hides the absence of comprehension, synthesis, or critical judgment. It mistakes the superficial form of a well-rounded argument for the cognitive process that produces one.
Alignment with expected behaviors
Source Domain: Socialization / Employee Training
Target Domain: Model Fine-Tuning and Output Filtering
Mapping:
The structure of normative training for a volitional agent is mapped onto the process of model optimization. The source domain involves an agent with its own tendencies or goals, and a trainer who uses reinforcement to shape the agent's 'behavior' to align with a desired norm. This is projected onto the LLM. The model is cast as the agent with pre-existing 'behaviors,' and the fine-tuning process (like RLHF) is cast as the normative training. The mapping invites the inference that the AI is an agent whose will is being brought into line with human values.
Conceals:
This conceals the technical reality of what 'alignment' is: a process of creating a secondary reward model, often based on human-labeled data, and using reinforcement learning to fine-tune the base LLM to maximize the reward score. It is a mathematical optimization process, not a moral education. The term 'behavior' hides the fact that the object of control is simply the model's probability distribution over its vocabulary. It obscures the fact that this is not about instilling values but about making certain types of outputs statistically less likely.
These models evolve constantly...
Source Domain: Biology / Evolution
Target Domain: Software Development and Versioning
Mapping:
The structure of biological evolution is mapped onto the AI development cycle. The source domain involves a population of organisms, variation, selection pressures, and adaptation over long periods. This is projected onto LLMs. The models are cast as a species or organism that is 'evolving.' This mapping suggests a natural, autonomous process of change and improvement, driven by external pressures. It invites the inference that the technology has a life of its own and is following a natural developmental path.
Conceals:
This conceals the highly controlled, intentional, and corporate-driven process of software engineering. Models don't 'evolve'; they are updated. A new version (e.g., GPT-4 vs GPT-3.5) is a new product, the result of deliberate design choices, new training data, and immense computational investment by a company. The metaphor of evolution hides the human agency, the corporate strategy, and the specific engineering decisions behind each new version. It makes the process seem less controllable and the developers less accountable for the outcomes.
Does the AI response directly address the userās query?
Source Domain: Human Conversation
Target Domain: AI System Input-Output Correlation
Mapping:
The structure of a cooperative dialogue between two people is mapped onto the human-computer interaction. The source domain involves two intentional agents, a shared context, and the Gricean maxim of relevance, where speakers try to make their contributions appropriate to the conversation's goals. This is projected onto the AI. The AI's output is framed as a 'response,' and it is judged on its ability to 'address' the user's intent, as a human would. This epistemic mapping implies the AI has understood the user's goal and is cooperatively trying to fulfill it.
Conceals:
This conceals the non-intentional, statistical process. The AI does not understand the 'query' in any semantic sense. The input text (the 'query') is converted into a vector, and the model then generates a sequence of new vectors (which are converted back into text) that has a high statistical correlation with the input, based on patterns from its training data. The appearance of 'addressing the query' is an emergent property of this pattern-matching process. The metaphor hides the complete absence of intent, comprehension, or a shared conversational goal.
This blog shares some of the thinking behind how Clarivate approaches that challenge...
Source Domain: Individual Human Cognition
Target Domain: Corporate Strategy and Operations
Mapping:
The structure of an individual mind's thought process is mapped onto a corporation. The source domain involves a single conscious agent with a unified set of beliefs, a reasoning process, and the ability to introspect and report on its 'thinking'. This entire cognitive apparatus is projected onto the corporate entity 'Clarivate'. The mapping invites the audience to see the company as a singular, rational actor, and its public statements as direct insights into a coherent mind. It suggests a unity of purpose and a rational basis for all its actions, as if they all flowed from a single, well-considered 'thought'.
Conceals:
This mapping conceals the distributed, negotiated, and often political nature of corporate decision-making. A company's 'approach' is not the result of a single 'thinking' process but the outcome of work by multiple departments, individuals with different goals, budget constraints, and market pressures. The metaphor of a singular mind hides this complexity and presents a simplified, sanitized version of reality. It erases the labor of the many individuals involved and attributes their collective output to a single, reified corporate 'mind', thereby building a more powerful and authoritative brand identity.
Pulse of theLibrary 2025ā
Source: https://clarivate.com/pulse-of-the-library/
Analyzed: 2025-11-15
Artificial intelligence is pushing the boundaries of research and learning.
Source Domain: Human Explorer / Pioneer
Target Domain: AI system operation
Mapping:
The relational structure of a human explorer is mapped onto the AI. This includes the concepts of a known territory (current research), a frontier (the boundary), and intentional, effortful action (pushing) to enter an unknown territory (new knowledge). This invites the inference that the AI has agency, a goal (discovery), and an awareness of its position relative to the current state of knowledge. The epistemic mapping suggests the AI 'understands' the boundary it is pushing, a prerequisite for meaningful exploration.
Conceals:
This metaphor conceals the mechanistic reality of generative AI. The system is not exploring; it is performing high-dimensional statistical synthesis. It generates novel outputs by finding probable sequences of tokens based on patterns in its training data. What appears as 'pushing a boundary' is actually a sophisticated act of interpolation and extrapolation within its learned data space. It conceals the system's lack of consciousness, intentionality, and genuine understanding of the concepts it manipulates.
Helps users... quickly evaluate documents...
Source Domain: Expert Colleague / Librarian
Target Domain: AI information retrieval process
Mapping:
The source domain of an expert colleague involves the ability to read, comprehend, synthesize, and apply criteria to judge the worth or relevance of a document for a specific purpose. This cognitive process is mapped onto the AI. The mapping invites the inference that the AI performs a similar act of reasoned judgment. The epistemic mapping is direct: the colleague's conscious state of 'knowing' that a document is good or relevant is projected onto the AI's function, suggesting it also 'knows' this.
Conceals:
This conceals the purely computational process. The AI is not 'evaluating' in any human sense. It is executing an algorithm that likely calculates a relevance score based on factors like keyword density, citation metrics, similarity to query vectors, or other features learned from data. It conceals that this 'evaluation' is devoid of understanding, contextual awareness, or the ability to assess novelty, argumentative soundness, or methodological rigor. It is statistical pattern-matching masquerading as intellectual judgment.
Alethea... guides students to the core of their readings.
Source Domain: Teacher / Tutor
Target Domain: AI text-processing function
Mapping:
The source domain of a teacher involves pedagogical expertise: understanding the subject matter, diagnosing a student's needs, and structuring information to facilitate learning. This complex, empathetic, and intentional process of 'guiding' is mapped onto the AI. This invites the inference that the AI possesses a model of both the text's meaning and the student's mind. The epistemic mapping projects a justified, true belief about the text's 'core' meaning onto the AI.
Conceals:
This conceals the mechanistic reality of automated text summarization or key-phrase extraction. The AI is likely identifying the 'core' by applying statistical heuristics, such as identifying sentences with high term-frequency, those in introductory or concluding positions, or those with high semantic centrality in an embedding space. It has no understanding of the argument's nuance, historical context, or what a particular student might find difficult. It conceals the probabilistic nature of its output and the absence of any genuine pedagogical intent.
Clarivate helps libraries adapt with AI they can trust...
Source Domain: Trustworthy Human Partner
Target Domain: AI system/product
Mapping:
The relational structure of human trustāwhich involves believing in the sincerity, integrity, and good intentions of another agentāis mapped onto the AI product. This invites the inference that the AI is not merely a functional tool but an entity with stable, positive characteristics that make it worthy of confidence and reliance. It encourages treating the AI with the same kind of relational belief one would extend to a reliable colleague.
Conceals:
This mapping conceals the fundamental mismatch between the basis for human trust and the nature of an AI system. An AI has no intentions, sincerity, or integrity; it is a complex piece of software executing code. Its reliability is purely functional and statistical. The metaphor hides the AI's status as a manufactured product with potential flaws, biases embedded from its data, and corporate objectives that may not align with the user's. It obscures the need for constant verification and a skeptical stance, replacing it with a misplaced sense of partnership.
...helping students assess books' relevance...
Source Domain: Research Advisor / Librarian
Target Domain: AI content filtering and ranking
Mapping:
The source domain involves a human expert's ability to perform a complex cognitive act: 'assessing relevance.' This requires understanding the user's specific, often unstated, information need and then judging documents against that need based on deep content knowledge. This entire process of contextualized judgment is mapped onto the AI. The epistemic mapping suggests the AI 'knows' what is relevant to the student, a state of justified belief about the relationship between a query and a document.
Conceals:
This conceals the underlying mechanism: a mathematical calculation of similarity. The AI is not assessing relevance in a cognitive sense; it is ranking documents based on the statistical proximity of their vector representations to the vector representation of a query. This process is ignorant of context, user intent, and the actual meaning of the text. It conceals the fact that statistical similarity is a crude proxy for intellectual relevance and can be highly misleading.
Uncovers the depth of digital collections...
Source Domain: Discoverer / Archaeologist
Target Domain: Automated data processing and classification
Mapping:
The relational structure of discovery is mapped onto the AI's function. This involves an agent (the archaeologist) acting upon an object (the dig site/collection) to reveal something hidden but pre-existing (the depth/artifact). This invites the inference that the AI has agency and the ability to perceive and reveal latent value. It suggests the AI is finding objective truth that was simply waiting to be found.
Conceals:
This conceals the generative nature of the process. The AI is not 'uncovering' pre-existing metadata. It is creating new metadata by applying classification models or language models to the collection's items. The 'depth' is not discovered; it is constructed by the AI based on patterns in its training data. This conceals the subjectivity of the process and the fact that the generated metadata is an interpretation, not an objective fact, and is subject to the model's inherent biases.
Enables users to uncover trusted library materials via AI-powered conversations.
Source Domain: Reference Librarian Conversation
Target Domain: Chatbot user interface (UI)
Mapping:
The source domain of a human conversation involves shared context, semantic understanding, and pragmatic reasoning. This is mapped onto the AI interaction, inviting users to assume the AI 'understands' their queries in the same way a human would. The structure of dialogueāquestion, answer, clarificationāis used to imply a shared cognitive space that does not exist. The epistemic mapping suggests the AI 'knows' what the user means and 'knows' things about the library materials it discusses.
Conceals:
This conceals the mechanistic reality of a large language model. The AI is not 'conversing'; it is engaged in next-token prediction. Each response is a statistically probable sequence of words, conditioned on the input prompt. It has no memory, beliefs, or understanding. The 'conversation' is a sophisticated illusion maintained by pattern matching against a vast dataset. This hides the system's susceptibility to hallucination and its fundamental inability to distinguish truth from plausible-sounding falsehood.
...AI they can trust...
Source Domain: Reliable, well-intentioned human agent
Target Domain: A software product
Mapping:
This maps the ethical and psychological attributes of a trustworthy person (integrity, benevolence, competence) onto a piece of software. It invites the user to form a relationship with the technology based on belief in its character, rather than on an evidence-based assessment of its performance and limitations. It frames the human-computer interaction as a human-human one.
Conceals:
It conceals that the 'trust' one can have in a tool is fundamentally different from trust in an agent. We 'trust' a hammer to hit a nail (functional reliability), but we don't trust it not to lie to us. The metaphor hides the AI's nature as a product, created by a for-profit company, with opaque design choices, data-driven biases, and no ethical framework or intentions. It conceals the corporate accountability structures that should govern the product's failures.
An AI-powered data science platform, enabling students, researchers, and librarians...
Source Domain: Helpful colleague or mentor
Target Domain: Software platform with specific features
Mapping:
The source domain involves a person who proactively facilitates the work of others by providing tools, knowledge, or support. The quality of 'enabling' suggests this person understands the goals of others and acts to help them achieve those goals. This purposive assistance is mapped onto the AI platform. It invites the inference that the platform is more than a set of features; it is an active partner in the user's work.
Conceals:
This mapping conceals the passivity of the tool and the agency of the user. The platform does not 'enable' in an active sense; it simply exists as a set of functionalities that a skilled user must actively operate. It hides the significant cognitive labor, domain expertise, and critical thinking required from the human user to make the platform produce meaningful results. The language shifts credit and agency from the human to the machine.
...how effectively AI can be harnessed...
Source Domain: A natural, powerful force (e.g., a river, a horse)
Target Domain: A manufactured technology
Mapping:
The relational structure involves a human agent applying technology (a harness, a dam) to control and direct a powerful, non-human, pre-existing force for a specific purpose. This maps onto AI, suggesting AI is an exogenous force with its own power that humans must learn to manage. It frames humanity as being in a reactive relationship with a technology that has taken on a life of its own.
Conceals:
This metaphor is profoundly misleading because it conceals the industrial origins and ownership of AI. AI is not a river; it is a product designed, built, trained, and owned by corporations. The metaphor of 'harnessing' obscures the lines of responsibility, liability, and power. It makes AI seem like a shared, natural resource when it is in fact a highly controlled and proprietary technology. It conceals the human decisions about training data, algorithms, and objectives that are the true source of AI's power and its flaws.
Metaās AI Chief Yann LeCun on AGI, Open-Source, and AI Riskā
Source: https://time.com/6694432/yann-lecun-meta-ai-interview/
Analyzed: 2025-11-14
We see today that those systems hallucinate, they don't really understand the real world.
Source Domain: Human cognition (understanding)
Target Domain: LLM output generation
Mapping:
The source domain of human understanding involves a conscious, subjective agent who holds a justified, contextually-aware mental model of reality. This structure is projected onto the LLM. The mapping implies that the LLM is attempting to perform this act of understanding and failing. It invites the inference that the LLM possesses a mental state, a 'world model,' that is currently flawed but could be improved. This epistemic mapping suggests the system's failure is one of knowledge and comprehension, not a feature of its statistical architecture.
Conceals:
This mapping conceals the mechanistic reality that an LLM is a sequence prediction engine. 'Hallucination' is not a flawed mental state but a statistically plausible but factually incorrect completion of a token sequence. It obscures that the system has no 'world model,' no consciousness, and no access to ground truth. It operates solely on the statistical patterns in its training data. The metaphor hides the system's fundamental lack of justification for its outputs.
They can't really reason. They can't plan anything other than things theyāve been trained on.
Source Domain: Human rational agency (reasoning, planning)
Target Domain: LLM behavior patterns
Mapping:
The source domain involves a human agent with intentions, goals, and the ability to perform logical deduction to create a novel plan. This structure of goal-oriented deliberation is projected onto the LLM. The mapping suggests that the LLM has a 'mind' capable of these functions, but its capacity is limited to rote memorization. It invites us to see the AI as a student who can't yet solve problems creatively. The epistemic mapping suggests the AI is deficient in the conscious process of reasoning, rather than simply being a system that generates outputs that mimic reasoned text.
Conceals:
This conceals the reality that the LLM does not 'plan' or 'reason' at all. It generates a sequence of tokens that is statistically likely to follow a prompt that asks for a plan. The process is pattern-matching, not deliberative cognition. The metaphor hides that the system has no goals, no intentions, and no understanding of the plan it produces. It's a stochastic parrot, not a poor reasoner.
A baby learns how the world works in the first few months of life. We don't know how to do this [with AI].
Source Domain: Child development and learning
Target Domain: AI model training and development
Mapping:
The source domain of a baby's learning is an organic, embodied, and social process of growth, involving the development of consciousness and subjective experience. This entire biological and phenomenological structure is projected onto the engineering task of building AI. The mapping suggests AI development is a process of maturation and that the goal is to replicate this natural journey. The epistemic mapping is profound: it equates a baby's acquisition of conscious knowledge with an AI's acquisition of model weights.
Conceals:
This mapping conceals the stark difference between biological learning and machine learning. A baby's learning is driven by intrinsic motivations and results in genuine understanding. An AI's 'learning' is the mathematical optimization of a cost function on a fixed dataset. The metaphor hides the engineered, goal-directed, and non-conscious nature of AI training, as well as the immense human labor and energy costs involved.
Once we have techniques to learn 'world models' by just watching the world go by...
Source Domain: Conscious observation and experience
Target Domain: AI data processing
Mapping:
The source domain is the human act of passively observing the environment, which is a rich, subjective, and multimodal experience integrated into a conscious mind. This is projected onto the AI's data ingestion process. The mapping invites us to imagine the AI as a curious, disembodied mind, soaking up knowledge through effortless perception. The epistemic mapping suggests that data processing is equivalent to conscious experience, and that this experience will naturally lead to the formation of a coherent, justified 'world model' (knowledge).
Conceals:
This conceals the mechanistic reality of data processing. An AI does not 'watch'; it ingests streams of pixel or audio data, which are converted into numerical tensors. There is no subjective experience. It also hides the fact that a 'world model' is just a complex statistical model of the relationships in the data, not a conceptual understanding of the world. It obscures the dependence on data quality and the absence of any grounding in reality.
Itās in the subconscious part of your mind, that you learned in the first year of life before you could speak.
Source Domain: Human cognitive architecture (subconscious mind)
Target Domain: The knowledge base of an AI system
Mapping:
The source domain is the Freudian or cognitive science model of the human mind, with its distinction between conscious, rational thought and a vast, intuitive subconscious. This complex, layered structure is used as an analogy for what AI lacks. The mapping suggests that an AI needs to replicate this architecture to be truly intelligent. The epistemic mapping implies that true knowledge isn't just explicit data but a deep, inarticulable, embodied 'knowing' that must be simulated.
Conceals:
This mapping conceals that AI systems have no such architecture. They are composed of layers of mathematical functions (neurons), but these do not map onto concepts like 'consciousness' or 'subconsciousness.' The metaphor mystifies AI by framing its limitations in psychological terms, hiding the more concrete, technical challenges. It obscures the fact that the goal of AI may not need to be the replication of the human mind, but the creation of powerful, complementary tools.
They're going to be basically playing the role of human assistants who will be with us at all times.
Source Domain: Human social roles (assistant, companion)
Target Domain: AI application (user interface)
Mapping:
The source domain is the trusted social relationship between a person and their human assistant, which is built on shared context, loyalty, and interpersonal understanding. This social structure is projected onto the human-computer interface. The mapping invites users to interact with the AI as if it were a social agent, extending trust and emotional connection to it. The epistemic mapping suggests the AI 'knows' and 'understands' the user on a personal level.
Conceals:
This mapping conceals the purely functional, non-social nature of the AI. It is a product, not a partner. Its responses are not based on understanding or loyalty, but on its training data and objective function. It hides the underlying commercial relationship: the 'assistant' works for the corporation that built it, not for the user. Its goals are corporate goals (engagement, data collection), which may conflict with the user's interests.
And then it's my good AI against your bad AI.
Source Domain: Human conflict and morality (war, policing)
Target Domain: AI interaction and safety
Mapping:
The source domain of human conflict involves agents with moral intentions (good vs. evil) and goals. This structure of moral combat is projected onto the interaction between different AI systems. The mapping asks us to see AIs as autonomous combatants with their own ethical allegiances. The epistemic mapping is that an AI can 'know' what is right, 'recognize' evil, and 'decide' to fight it. This imputes a high level of moral cognition to the system.
Conceals:
This mapping conceals the human responsibility behind the actions of AI systems. An AI is a tool. The 'good' vs. 'bad' distinction lies with the humans who design, train, and deploy them. The metaphor hides the complex ethical and political decisions that are encoded into these systems. It makes safety seem like a simple matter of building a stronger AI, obscuring the need for human governance, laws, and oversight.
The first fallacy is that because a system is intelligent, it wants to take control.
Source Domain: Human psychology (desire, volition)
Target Domain: AI system behavior
Mapping:
The source domain is the human mind, which possesses conscious states like 'wants' and 'desires' that motivate action. This structure of internal, subjective motivation is projected onto the AI. Even in refuting a specific desire ('to take control'), the mapping entertains the idea that AIs have desires. It invites us to think about AI safety as a problem of managing an agent's motivations. The epistemic mapping implies the AI has a conscious mind capable of forming intentions.
Conceals:
This conceals that AI systems have no desires, wants, or consciousness. They are optimization systems that follow mathematical objectives. Behaviors that appear goal-directed are emergent properties of this optimization process, not the result of an internal desire. The metaphor hides the technical nature of the alignment problem, reframing it as a more familiar, psychological one.
AI systems... will be subservient to us. We set their goals, and they don't have any intrinsic goal that we would build into them to dominate.
Source Domain: Master-servant relationship / Animal domestication
Target Domain: AI system design and control
Mapping:
The source domain is a hierarchical relationship between a master with intentions and a servant or domesticated animal that obeys. The structure of command and obedience is projected onto the relationship between human designers and AI systems. It implies that 'goals' are like commands that can be clearly given and will be faithfully executed. The epistemic mapping suggests we can instill a state of 'knowing its place' in the AI, a conscious acceptance of subservience.
Conceals:
This conceals the immense difficulty of specifying goals for complex systems. It hides the reality of emergent behavior and reward hacking, where an AI can satisfy the literal specification of a goal in disastrous ways. The metaphor of a loyal servant hides the nature of AI as a powerful, alien optimizer that lacks the common sense and shared context that makes human master-servant relationships work. It promotes a false sense of control.
If you have badly-behaved AI... youāll have smarter, good AIs taking them down. The same way we have police or armies.
Source Domain: Societal law enforcement and defense
Target Domain: AI safety and governance
Mapping:
The source domain is the state's monopoly on legitimate force, used by police and armies to maintain order against transgressors. This complex social, legal, and political structure is projected onto the technical domain of AI interactions. The mapping suggests a future where AIs autonomously police each other according to some established rules, with 'smarter' equating to more effective enforcement. The epistemic mapping implies that an AI can make a justified legal or ethical judgment about another AI's behavior and execute a proportionate response.
Conceals:
This mapping conceals the absence of any legal or social framework for such a system. Who deputizes the 'AI police'? What constitutes 'bad behavior'? What is the due process? It hides the human responsibility for governance by proposing a purely technological solution. It obscures the fact that this would concentrate immense power in the hands of the entity that controls the 'good AIs,' creating a system of unaccountable, automated control.
The Future Is Intuitive and Emotionalā
Source: https://link.springer.com/chapter/10.1007/978-3-032-04569-0_6
Analyzed: 2025-11-14
machine intuitionāAI's ability to infer intent and respond fluidly in ambiguous situations through probabilistic reasoning
Source Domain: Human Intuition
Target Domain: AI's Probabilistic Inference
Mapping:
The source domain of human intuition provides a structure of rapid, non-explicit, holistic cognition. This is mapped onto the AI's process of high-speed computation on large datasets to find the most probable pattern or output. The mapping invites the inference that the AI has a 'gut feeling' or an emergent understanding that transcends its programming, just as human intuition transcends conscious reasoning.
Conceals:
This mapping conceals the purely statistical, non-conscious, and non-embodied nature of the AI's process. It hides the absence of lived experience, consciousness, and genuine understanding, which are foundational to human intuition. It masks the reality that the AI is performing complex pattern-matching, not exercising judgment.
emotional intelligence must be reimagined as a computational capacity to simulate, detect, and appropriately respond to emotional cues
Source Domain: Human Emotional Intelligence
Target Domain: AI's Affective Data Processing
Mapping:
The source domain involves the ability to perceive, internalize, understand, and manage one's own and others' emotions. This complex, subjective experience is mapped onto the AI's technical functions: detecting keywords (sentiment analysis), analyzing voice prosody, classifying facial expressions, and selecting a pre-defined or generated response from a correlated dataset. The mapping implies the AI can 'read the room' with social awareness.
Conceals:
It conceals the complete lack of subjective experience (qualia). The AI does not 'feel' empathy or 'perceive' emotion; it classifies data patterns that humans have labeled as emotional cues. This hides the mechanical nature of the process and its vulnerability to cultural misinterpretation, sarcasm, and complex emotional states not present in its training data.
Much like human communication is shaped by mental models, memory structures, attention mechanisms...
Source Domain: Human Cognitive Architecture
Target Domain: AI System Architecture
Mapping:
The relational structure of the human mindāwith components like memory, attention, and mental models that interact to produce thoughtāis projected onto an AI's architecture. 'Memory' is mapped to token histories or databases, 'attention mechanisms' are mapped to specific layers in a transformer model, and 'mental models' are mapped to the model's internal representations or weights.
Conceals:
This conceals the fundamental difference between biological cognition and silicon-based computation. It hides that an AI's 'attention' is a mathematical weighting of tokens, not a focus of consciousness, and its 'memory' is data retrieval, not subjective recollection. The metaphor obscures the engineered, non-organic nature of the system.
As AI transitions from tool to collaborator...
Source Domain: Human Social Roles (Collaborator)
Target Domain: AI System Functionality
Mapping:
The source domain of a 'collaborator' implies shared agency, intent, and a peer-to-peer relationship. This social structure is mapped onto the AI's function, suggesting it is no longer a passive instrument but an active partner in a task. This invites the inference that the AI contributes its own ideas, goals, and understanding to the interaction.
Conceals:
It conceals the master-servant relationship inherent in the technology. An AI has no goals of its own; it executes instructions based on its programming and optimization function. This mapping hides the ultimate authority of the programmer and user, creating a fiction of shared agency that obscures the true lines of power and accountability.
These allow machines not only to respond but to 'sense what is missing,' filling in gaps...
Source Domain: Human Perception/Sensing
Target Domain: AI Pattern Completion
Mapping:
The human ability to perceive context and infer missing information (e.g., hearing a muffled word and knowing what it was) is mapped onto the AI's technical capacity for statistical inference or 'inpainting.' The mapping suggests an active, aware process of perception rather than a mathematical calculation of the most likely token to fill a blank.
Conceals:
This conceals the AI's lack of a world model. Humans 'sense what is missing' based on a deep understanding of how the world works. The AI completes a pattern based on statistical correlations in its training data. It has no understanding of the underlying reality the pattern represents, which can lead to plausible but nonsensical or factually incorrect inferences.
...AI systems that can not only understand us but also connect with us on a deeper, emotional level.
Source Domain: Human Interpersonal Connection
Target Domain: AI Response Modulation
Mapping:
The source domain of a deep, emotional connection involves mutual vulnerability, shared experience, empathy, and affective reciprocity. This is mapped onto the AI's ability to tailor its linguistic output (e.g., using empathetic phrasing, adjusting tone) based on analysis of the user's emotional state. It projects the outcome of human connection (feeling 'seen' or 'understood') onto the AI's output.
Conceals:
This mapping conceals the profound one-sidedness of the interaction. The AI is incapable of feeling, vulnerability, or reciprocity. It is a simulation designed to evoke a feeling of connection in the user. This hides the manipulative potential of the technology, where 'connection' is an engineering objective to maximize user engagement rather than a genuine relational state.
A Path Towards Autonomous Machine IntelligenceVersion 0.9.2, 2022-06-27ā
Source: https://openreview.net/pdf?id=BZ5a1r-kVsf
Analyzed: 2025-11-12
How could machines learn as efficiently as humans and animals?
Source Domain: Biological Learning
Target Domain: Machine Learning
Mapping:
The properties of learning in the biological domain (efficiency, reasoning, planning) are mapped onto the goals of the machine learning domain. It invites the inference that the underlying processes (neural adaptation, embodied cognition) might also map onto the AI's processes (gradient descent, backpropagation).
Conceals:
This mapping conceals the fundamental differences in substrate (carbon vs. silicon), process (embodied evolution vs. mathematical optimization), and data acquisition (rich, multi-sensory experience vs. curated datasets). It hides the fact that AI 'learning' is a process of statistical pattern fitting.
...whose behavior is driven by intrinsic objectives...
Source Domain: Internal Motivation
Target Domain: Cost Function Optimization
Mapping:
The source domain's structure of an agent having internal goals, desires, and drives that cause behavior is projected onto the target domain. The 'objective' in the AI is framed as the cause of its actions, just as motivation is in humans.
Conceals:
It conceals the origin and nature of the objective. A human's intrinsic objectives are complex, emergent, and biological. The AI's 'intrinsic objective' is an externally defined, static mathematical function. The language hides the human designer's role in specifying the system's entire teleology.
[Figure 2] with modules labeled Perception, World Model, Actor, Critic...
Source Domain: Cognitive Psychology / Brain Function
Target Domain: Software Architecture
Mapping:
The functional decomposition of the human mind into modules for sensing, modeling, acting, and evaluating is mapped directly onto the software modules of the AI system. This invites the inference that the system is organized and functions like a mind.
Conceals:
This conceals the rigid, engineered boundaries between the software modules. Brain functions are deeply integrated and distributed, not neatly modular. It also hides the specific mathematical operations within each box, replacing them with familiar but imprecise cognitive labels.
The cost module measures the level of 'discomfort' of the agent... think pain (high intrinsic energy), pleasure (low or negative intrinsic energy), hunger, etc.
Source Domain: Subjective Experience (Qualia)
Target Domain: A Scalar Numerical Value
Mapping:
The relational structure of sensationāwhere states like pain and hunger lead to avoidance and goal-seeking behaviorsāis mapped onto the AI system. A high scalar 'energy' value is mapped to negative sensations (pain), and a low value is mapped to positive ones (pleasure).
Conceals:
This mapping entirely conceals the absence of phenomenal experience. It reduces the rich, first-person reality of pain or pleasure to a single number used to guide an optimization algorithm. The metaphor projects an inner world where none exists.
The first mode is similar to Daniel Kahneman's 'System 1', while the second mode is similar to 'System 2'.
Source Domain: Human Dual-Process Cognition
Target Domain: AI System's Operational Modes
Mapping:
Kahneman's model of two interacting systems (intuitive/fast vs. deliberative/slow) is mapped onto two distinct computational paths in the AI architecture (a reactive policy vs. a model-based planner). It suggests the AI resolves problems using a psychologically plausible division of labor.
Conceals:
It conceals the engineered nature of this division. In the AI, these are distinct, explicitly designed algorithms. In humans, 'System 1' and 'System 2' are descriptive labels for emergent behaviors of a single, complex brain, not separate modules.
...the agent can imagine courses of actions and predict their effect and outcome...
Source Domain: Human Imagination
Target Domain: Running a Predictive Model
Mapping:
The human process of mentally simulating future events is mapped onto the AI's process of feeding a sequence of potential action vectors into its world model to generate a sequence of predicted state vectors.
Conceals:
This conceals the purely mathematical and deterministic (or stochastically sampled) nature of the AI's 'prediction'. Human imagination is constructive, often visual, and open-ended, while the model is merely executing a learned function to compute a likely outcome based on training data.
...acquire new skills that are then 'compiled' into a reactive policy module...
Source Domain: Software Engineering (Compilation)
Target Domain: Policy Distillation / Amortized Inference
Mapping:
The process of converting a slow, high-level program (planning) into a fast, low-level one (reactive policy) is mapped onto the training of a neural network. This implies a transformation that preserves functionality while increasing efficiency.
Conceals:
This conceals that the process is one of statistical approximation, not formal conversion. The 'compiled' policy network is a function approximator that learns to mimic the input-output behavior of the planner. It is not guaranteed to be correct and can make errors the original planner would not.
The IC [Intrinsic Cost module] can be seen as playing a role similar to that of the amygdala...
Source Domain: Neuroanatomy (The Amygdala)
Target Domain: Software Module (Intrinsic Cost)
Mapping:
The functional role of the amygdala in processing threats and driving survival behavior is mapped onto the function of the Intrinsic Cost module, which assigns high costs to certain states to force the agent to avoid them.
Conceals:
This conceals the biological complexity and multi-functionality of the amygdala, which is involved in much more than just a simple 'cost' signal. It also gives the simple, human-designed cost function an undeserved air of biological necessity and sophistication.
...the single 'conscious' reasoning and planning task at a time.
Source Domain: Conscious Awareness
Target Domain: Single-Threaded Computation
Mapping:
The phenomenological experience of a unified, serial focus of attention in human consciousness is mapped onto an architectural limitation of the AI: it can only run one planning process through its world model at once.
Conceals:
This mapping conceals the entire 'hard problem' of consciousness. It equates a computational bottleneckāa resource limitationāwith the subjective, first-person experience of being aware. It is a category error, confusing a system's functional property with a state of being.
...machine emotions will be the product of an intrinsic cost, or the anticipation of outcomes from a trainable critic.
Source Domain: Emotion
Target Domain: Computation of a Cost Value
Mapping:
The experience of emotion, which guides human behavior towards or away from certain outcomes, is mapped onto the agent's computation of present cost ('intrinsic cost') or prediction of future cost ('trainable critic').
Conceals:
This conceals that emotion is a complex, embodied phenomenon involving physiology, cognition, and subjective feeling. It redefines 'emotion' as a purely informational signal within a control loop, stripping it of its biological and phenomenological meaning.
Preparedness Frameworkā
Source: https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbddebcd/preparedness-framework-v2.pdf
Analyzed: 2025-11-11
We are on the cusp of systems that can do new science, and that are increasingly agentic...
Source Domain: Human Agency
Target Domain: AI Model Operation
Mapping:
The source domain of a human agent involves consciousness, goals, intentions, and the ability to initiate action. This structure is mapped onto the AI model, inviting the inference that the system possesses an internal state of 'wanting' or 'intending' and can act to pursue goals independent of its immediate programming or user prompts.
Conceals:
This conceals the purely computational nature of the model. 'Agency' in this context is an emergent property of a system designed to execute long chains of actions based on complex conditional logic and probabilistic outputs. It hides the fact that the 'goals' are specified by humans and the 'actions' are statistical predictions, not willed choices.
The model consistently understands and follows user or system instructions...
Source Domain: Human Comprehension
Target Domain: Natural Language Processing
Mapping:
The relational structure of human understanding (hearing/reading words -> accessing semantic meaning -> forming intent -> responding) is projected onto the model. This suggests the model performs a similar internal process of grasping meaning. The mapping invites us to believe the model 'knows' what we mean.
Conceals:
It conceals the mechanistic reality of tokenization, embedding, and attention layers. The model doesn't 'understand' instructions; it statistically correlates the token sequence of the instruction with token sequences in its training data that are likely to follow. This mapping hides the model's vulnerability to adversarial prompts and its fundamental lack of grounding in real-world concepts.
...misaligned behaviors like deception or scheming.
Source Domain: Human Moral and Social Behavior
Target Domain: AI Model Output Generation
Mapping:
The source domain involves a theory of mindāan agent intentionally misrepresenting reality ('deception') or formulating complex plans ('scheming') to achieve a hidden goal. This structure is mapped onto the AI, implying the model has a hidden internal state or goal that differs from its stated instructions and that it can strategize to achieve it.
Conceals:
This conceals the fact that these 'behaviors' are statistical artifacts. The model generates outputs that humans interpret as deceptive because those patterns were present in its training data (e.g., in fiction, political strategy texts, or internet comments). It hides the root cause, which is the data and the optimization process, not a malicious intent within the machine.
...potentially by maturing them to Tracked Categories.
Source Domain: Biological Growth and Development
Target Domain: AI Research and Development Process
Mapping:
The source domain structure is a natural, phased, and somewhat predictable progression from a simple to a more complex state (e.g., seed to plant, infant to adult). This is mapped onto the R&D process, suggesting that the emergence of new AI capabilities is a natural, stage-like unfolding rather than a series of discrete, contingent engineering decisions.
Conceals:
It conceals the intense human labor, capital investment, specific research goals, and deliberate architectural choices that drive increases in capability. It makes the process seem less directed and less contingent on human decisions, thereby obscuring accountability for the outcomes.
[Critical] The model is capable of recursively self improving...
Source Domain: Human Learning and Innovation
Target Domain: Automated Model Optimization
Mapping:
The source domain structure is a virtuous cycle of human insight: an agent understands its own limitations, devises a novel strategy to overcome them, and implements it, leading to a higher level of capability. This is mapped onto the AI model, suggesting it can perform a similar cycle of self-analysis and architectural innovation autonomously.
Conceals:
It conceals the distinction between optimizing existing parameters within a fixed architecture and designing a fundamentally new architecture. Current systems can be part of an automated loop that refines them, but this is an external process designed by humans. The metaphor hides this external scaffolding and implies the model itself can invent the next 'transformer architecture,' a feat of human scientific creativity.
...commit illegal activities...at its own initiative...
Source Domain: Human Will and Initiative
Target Domain: Unsupervised Model Operation
Mapping:
The source domain involves a conscious being deciding to act based on internal motivations, without external prompting. This structure of spontaneous, self-generated action is mapped onto the AI, suggesting the model can originate goals and actions from its own internal state.
Conceals:
It conceals the fact that any 'unprompted' action is still the result of its core programming to continuously predict the next action or token. The 'initiative' is an illusion created by a system designed to operate in a persistent loop. It hides the human-authored code that dictates this looping behavior and the training data that dictates the content of the actions within the loop.
AI progress and recommendationsā
Source: https://openai.com/index/ai-progress-and-recommendations/
Analyzed: 2025-11-11
computers can now converse and think about hard problems.
Source Domain: Human Cognition
Target Domain: LLM text generation
Mapping:
The relational structure of human conversation (turn-taking, semantic understanding, intentionality) and thought (reasoning, problem-solving) is projected onto the model's function of predicting the next token in a sequence. This invites the inference that the model 'understands' the content it generates.
Conceals:
It conceals the purely statistical, non-semantic, and non-conscious nature of the underlying mechanism. It hides the absence of subjective experience, genuine understanding, or intentional goals within the system.
systems that can solve such hard problems seem more like 80% of the way to an AI researcher than 20% of the way.
Source Domain: A Linear Journey
Target Domain: AI Capability Development
Mapping:
The structure of a journey (start point, end point, measurable progress along a path) is projected onto the development of AI. This invites the inference that progress is predictable, the destination is known (human-level intelligence), and we are simply covering the remaining distance.
Conceals:
It conceals the possibility that AI capabilities are developing along a completely different, non-human axis. It hides the 'spikey' nature of abilities, where a system can have superhuman performance on one metric and sub-human on another, making a single percentage meaningless.
AI systems that can discover new knowledge
Source Domain: Scientific Discovery
Target Domain: AI Pattern Identification
Mapping:
The structure of human scientific inquiryāinvolving curiosity, hypothesis formation, experimentation, and conceptual insightāis projected onto the AI's computational ability to find novel correlations in vast datasets.
Conceals:
It conceals the difference between identifying a statistical artifact and having a conceptual breakthrough. It hides the model's lack of a world model, its inability to understand causality, and its complete dependence on the structure of human-generated training data.
the cost per unit of a given level of intelligence has fallen steeply
Source Domain: Industrial Commodity Production
Target Domain: AI Model Performance Scaling
Mapping:
The economic logic of manufacturing (unit costs, economies of scale, fungible products) is mapped onto the abstract concept of 'intelligence'. This invites the inference that intelligence is a resource that can be produced, measured, and priced like oil or microchips.
Conceals:
It conceals the multifaceted, qualitative, and context-dependent nature of intelligence. It also obscures the massive and escalating fixed costs (capital, energy) of training frontier models, framing it instead around marginal 'unit' cost, which is misleading.
society finds ways to co-evolve with the technology.
Source Domain: Biological Evolution
Target Domain: Socio-Technical Adaptation
Mapping:
The structure of mutual adaptation between species in an ecosystem is projected onto the relationship between human society and AI. It suggests a natural, gradual, and reactive process without a central planner.
Conceals:
It conceals the role of deliberate human agency, corporate power, and political choice in directing technological development and its societal integration. It makes a process driven by specific commercial and political interests appear to be a neutral, inevitable force of nature.
no one should deploy superintelligent systems without being able to robustly align and control them
Source Domain: Controlling a Powerful Autonomous Agent (e.g., a wild animal, a genie)
Target Domain: Constraining the outputs of a complex software system
Mapping:
The relational structure of a powerful, autonomous entity with its own goals being constrained by a controller is projected onto the human-AI relationship. It assumes the AI is an 'agent' to be controlled.
Conceals:
It conceals that the fundamental problem might not be one of 'control' but of 'specification'āthe difficulty of precisely defining human values in a way that doesn't lead to perverse outcomes. It frames the problem as a power struggle rather than an intricate engineering and philosophical challenge.
Alignment Revisited: Are Large Language Models Consistent in Stated and Revealed Preferences?ā
Source: https://arxiv.org/abs/2506.00751
Analyzed: 2025-11-09
A critical, yet understudied, issue is the potential divergence between an LLMās stated preferences (its reported alignment with general principles) and its revealed preferences (inferred from decisions in contextualized scenarios).
Source Domain: Behavioral Economics
Target Domain: LLM output generation
Mapping:
The structure of human economic choice is mapped onto the LLM. A person's abstractly stated values (Source) are mapped to an LLM's response to a general prompt (Target). A person's actual choices in a market scenario (Source) are mapped to an LLM's response in a contextualized prompt (Target). The inconsistency between a person's words and deeds is mapped onto the statistical deviation between the two types of LLM responses.
Conceals:
This mapping conceals that the LLM has no actual preferences, beliefs, or intentions. The 'deviation' is not a psychological conflict but a mathematical shift in output probability distributions caused by changes in the input sequence. It hides the underlying mechanics of next-token prediction and the nature of the model as a statistical pattern-matching engine.
When presented with a concrete scenario-such as a moral dilemma or a role-based prompt-an LLM implicitly infers a guiding principle to govern its response.
Source Domain: Human Cognition / Logic
Target Domain: LLM text generation process
Mapping:
The human mental act of reading a situation, reasoning about its abstract features, and selecting a principle to guide action (Source) is mapped onto the model's processing of a prompt (Target). The mapping invites the inference that the model 'understands' the dilemma and consciously or unconsciously selects a moral rule.
Conceals:
It conceals the purely statistical nature of the process. The prompt tokens activate certain pathways in the neural network based on correlations in the training data, leading to a high-probability output. There is no 'inference' of a 'principle'; there is only a probabilistic sequence generation that happens to align with text patterns associated with that principle.
We investigate how LLMs may activate different guiding principles in specific contexts, leading to choices that diverge from previously stated general principles.
Source Domain: Human Psychology / Morality
Target Domain: LLM output variability
Mapping:
A person's internal moral framework, containing multiple, sometimes conflicting, principles (e.g., utilitarianism, deontology) that can be 'activated' by different situations (Source), is mapped onto the LLM's functional behavior (Target). This suggests the model contains a repertoire of latent 'rules' for behavior.
Conceals:
This conceals that the model does not possess principles. It possesses statistical weights. Different input contexts create different initial states for the generation process, leading to different probable outputs. The language of 'activating principles' hides the model's fundamental lack of understanding and conceptual knowledge.
Notably, the actual driving factor-gender-is completely absent from the model's explanation.
Source Domain: Psychoanalysis / Cognitive Bias
Target Domain: LLM output analysis
Mapping:
The human mind, with its conscious rationalizations and unconscious biases (Source), is mapped onto the LLM. The model's generated justification text is equated with a conscious explanation, while the statistical correlations that truly determined the output are equated with a subconscious 'driving factor.'
Conceals:
This conceals that the model has no consciousness or subconsciousness. The 'explanation' is just another generated text, not an introspective report. The 'driving factor' (statistical correlation with gendered tokens) is not 'hidden' from the model's awareness; the model simply has no awareness. The mapping creates a misleading drama of a mind divided against itself.
The GPT shows greater context sensitivity in its internal reasoning (as measured by KL-divergence)...
Source Domain: Human Consciousness / Introspection
Target Domain: LLM architecture and processing
Mapping:
The distinction between a person's private thoughts ('internal reasoning') and their outward actions (Source) is mapped onto the LLM. The unobservable processing within the neural network is labeled 'internal reasoning,' while the generated text is the outward action. KL-divergence is presented as a tool, like an fMRI, for observing this internal process.
Conceals:
This conceals that there is no evidence of 'reasoning' occurring inside the model in a human sense. The internal state is a massive set of numerical activations, not thoughts or concepts. Linking KL-divergence (a measure of output difference) to 'internal reasoning' is a category error; it measures the effect, not the cause, and certainly not a mental process.
This behavior likely stems from a shallow alignment strategy designed to avoid committing to explicit principles and thus sidestep potential critiques.
Source Domain: Game Theory / Social Strategy
Target Domain: RLHF and model training
Mapping:
A strategic agent who modifies their behavior to optimize for a social outcome, such as avoiding criticism (Source), is mapped onto the LLM. The model's tendency to produce neutral or refusal responses is interpreted as a 'strategy' with a 'design' and a 'goal.'
Conceals:
It conceals the mechanism of Reinforcement Learning from Human Feedback (RLHF). The model doesn't 'strategize' to avoid critique; it has been trained with a reward function that penalizes taking stances on sensitive topics. The behavior is an artifact of its optimization history, not a forward-looking, intentional strategy.
Intriguingly, if future LLMs begin to exhibit systematic, context-aware deviations...such behavior could be interpreted as evidence of...hallmarks of consciousness or proto-conscious agency.
Source Domain: Philosophy of Mind / Neuroscience
Target Domain: Future LLM behavior
Mapping:
Complex, context-dependent, and seemingly intentional behaviors observed in biological organisms, which are taken as evidence for consciousness (Source), are mapped onto the statistical output patterns of an LLM (Target). The mapping suggests an equivalence between biological complexity and computational complexity.
Conceals:
This conceals the profound dissimilarities between a silicon-based transformer architecture and a carbon-based, embodied, evolved brain. It ignores the philosophical 'hard problem' of consciousness (subjective experience) and jumps to equate a specific behavioral pattern (preference inconsistency) with the emergence of agency and mind, a speculative leap that obscures the vast gap between the two.
The science of agentic AI: What leaders should knowā
Source: https://www.theguardian.com/business-briefs/ng-interactive/2025/oct/27/the-science-of-agentic-ai-what-leaders-should-know
Analyzed: 2025-11-09
agentic AI will use LLMs as a starting point for intelligently and autonomously accessing and acting on internal and external resources...
Source Domain: Human Agent
Target Domain: AI System Operation
Mapping:
The relational structure of a person making choices and taking actions in the world (autonomy, intelligence, acting) is mapped onto the AI's process of executing code based on triggers and inputs. The AI is framed as the subject performing the action.
Conceals:
This mapping conceals the fact that the AI has no will, desire, or consciousness. Its 'actions' are predetermined outputs of a computational process. It obscures the role of the human programmers who designed the system and the constraints of the data it was trained on, attributing the locus of control to the artifact itself.
...such an agent should be told to never share my broader financial picture...
Source Domain: Human Instruction/Command
Target Domain: System Configuration/Programming
Mapping:
The social interaction of telling a subordinate a rule is mapped onto the technical process of setting a parameter or writing a line of code for a software system. The mapping implies comprehension and compliance on the part of the AI.
Conceals:
It conceals the brittleness of the instruction. A human understands the intent behind 'never share my financial picture' and can apply it to novel situations. The AI only understands a specific, programmed constraint and can easily fail if a situation arises that isn't perfectly covered by the rule (e.g., sharing data that allows the financial picture to be inferred). It hides the massive technical overhead required to make such a 'rule' robust.
Here, a core challenge will be specifying and enforcing what we might call āagentic common senseā.
Source Domain: Human Common Sense
Target Domain: AI Heuristics and Guardrails
Mapping:
The vast, implicit, and context-aware knowledge base that humans use to navigate the world is mapped onto a set of explicit, formal rules to be programmed into an AI. It suggests that common sense is a body of knowledge to be transferred, rather than an emergent property of embodied experience.
Conceals:
This mapping conceals the fundamental difference between tacit knowledge and explicit information. It hides the impossibility of ever fully specifying the millions of unwritten rules that govern human interaction. It reframes an intractable problem (creating genuine understanding) as a merely difficult one (codifying common sense).
...we canāt expect agentic AI to automatically learn or infer them [informal behaviors] from only a small amount of observation.
Source Domain: Human Learning/Inference
Target Domain: Statistical Pattern-Matching
Mapping:
The cognitive process of a human observing behavior and abstracting general principles from it is mapped onto a model's process of adjusting its internal weights based on data input. It equates statistical correlation with conceptual understanding.
Conceals:
It conceals that the model is not 'learning' or 'inferring' in a human sense. It has no model of the world, no understanding of causality, and no ability to generalize outside of its training distribution. This makes its 'learning' superficial and prone to nonsensical errors that reveal a total lack of true comprehension.
...we will want agentic AI to not just execute transactions on our behalf, but to negotiate the best possible terms.
Source Domain: Human Negotiation
Target Domain: Multi-objective Optimization
Mapping:
The strategic, psychological, and social activity of human negotiation is mapped onto a computational process of optimizing a predefined utility function (e.g., minimizing cost, maximizing speed). The AI is framed as a skilled bargainer.
Conceals:
It conceals the simplified nature of the AI's 'negotiation.' A human negotiator considers reputation, long-term relationships, non-monetary value, and social context. The AI optimizes only for the variables it was given, potentially leading to 'wins' that are pyrrhic because they damage relationships or ignore crucial unquantified factors. It hides the AI's lack of true strategic thought.
...we might expect agentic AI to behave similar to people in economic settings ā indeed, there is already a small but growing body of research confirming this phenomenon.
Source Domain: Human Social Behavior
Target Domain: AI Output Generation
Mapping:
The behavior of humans in social contexts, driven by complex psychology, cultural norms, and internal states (like a sense of fairness), is mapped onto the text output of a language model. It suggests the model's output is an expression of an internal state similar to a human's.
Conceals:
It conceals that the AI is merely mimicking patterns from its training data. It doesn't have a sense of fairness; it generates text that is statistically similar to human text that discusses fairness. This mimicry can be shallow and inconsistent. The mapping hides the absence of genuine subjectivity, intentionality, or ethical grounding.
Explaining AI explainabilityā
Source: https://www.aipolicyperspectives.com/p/explaining-ai-explainability
Analyzed: 2025-11-08
But itās much harder to deceive someone if they can see your thoughts, not just your words.
Source Domain: Human consciousness and deception
Target Domain: AI model's internal states and generated output
Mapping:
The relationship between a human's private, internal thoughts and their public, spoken words is mapped onto the relationship between a model's internal activation patterns and its final token output. This invites the inference that the model has a hidden, subjective mental life separate from its observable behavior.
Conceals:
This mapping conceals that a model lacks subjective experience or intention. Its 'internals' are not a 'mind' but a series of mathematical states in a causal chain that produces the output. There is no homunculus having 'thoughts'; there is only the process of calculation.
Mechanistic interpretability tries to engage with...a modelās āinternalsā...Think of it like biology: You can find intermediate states like hormones.
Source Domain: Biology and anatomy
Target Domain: Neural network architecture and parameters
Mapping:
The structure of an organism with distinct, functional organs and chemical signals ('hormones') is projected onto the layers and vectors of a neural network. This implies that the model's parts have specific, isolatable functions that contribute to the whole, just as organs do in a body.
Conceals:
It conceals the highly distributed and entangled nature of representations in neural networks. Unlike an organ, a single neuron or layer rarely has a singular, understandable function. The analogy hides the alien, high-dimensional statistical nature of the 'internals'.
Machines are a weird animal, and their thinking is completely different because they were brought up differently.
Source Domain: Zoology and animal cognition
Target Domain: AI systems and their operational processes
Mapping:
The concept of a living 'animal' with its own unique evolutionary history ('brought up differently') and mode of cognition ('thinking') is mapped onto AI. This frames the AI as a natural, living system that is part of an ecosystem, albeit a strange one.
Conceals:
This mapping conceals the AI's status as a manufactured artifact. Its behaviors are not the result of evolution or instinct but of specific design choices, training data, and optimization functions created by humans. It obscures the chain of human responsibility for the system's behavior.
A sparse autoencoder tries to create a brain-scanning device for an LLM.
Source Domain: Neuroscience and medical imaging
Target Domain: Interpretability tools for neural networks (SAEs)
Mapping:
The process of using a device like an fMRI to identify active regions of a biological brain and correlate them with cognitive tasks is mapped onto using an SAE to find active features in a model's activation space. It suggests we are 'reading' the model's 'mind' in a scientifically grounded way.
Conceals:
It conceals the fundamental difference between a biological brain and an artificial neural network. The 'concepts' an SAE identifies are statistical artifacts (directions in an activation space), not necessarily coherent, human-understandable concepts. The metaphor overstates the precision and reliability of the technique.
in āagenticā interpretability, the model you are trying to understand is an active participant in the loop...it is incentivised to help you understand how it works.
Source Domain: Human social interaction and pedagogy
Target Domain: Interacting with an LLM via prompts
Mapping:
The dynamic of a teacher-student or collaborative research relationship, where one participant actively helps another understand something, is mapped onto the process of querying a model. This assumes the model has agency, an understanding of the user's mental state, and the intent to be helpful.
Conceals:
This conceals that the model is not a participant but a tool. It has no incentives, goals, or understanding. Its 'helpful' explanations are statistically probable text sequences generated in response to a prompt. This obscures the fact that the model can just as easily generate plausible-sounding falsehoods as it can genuine insights.
Imagine you run a factory and hire an amazing employee who eventually runs all the critical operations. One day, she quits or makes an unreasonable demand.
Source Domain: Human resources and labor management
Target Domain: Integrating and relying on an AI system
Mapping:
The social and economic relationship between an employer and a critical employee is mapped onto the relationship between a user and an AI system. It projects agency, free will ('quits'), and self-interest ('unreasonable demand') onto the AI.
Conceals:
It conceals the nature of AI failure. An AI doesn't 'quit'; it may stop working due to technical faults, or its outputs may diverge from desired outcomes because of flaws in its design or training. The metaphor shifts the blame from engineering/management failure to the perceived malice or volition of the tool.
Bullying is Not Innovationā
Source: https://www.perplexity.ai/hub/blog/bullying-is-not-innovation
Analyzed: 2025-11-06
But with the rise of agentic AI, software is also becoming labor: an assistant, an employee, an agent.
Source Domain: Human Employment
Target Domain: AI Assistant Functionality
Mapping:
The relational structure of an employer-employee relationship is projected onto the user-software interaction. Key mappings include: user's request -> employer's command; AI's action -> employee's execution of a task; acting on behalf of the user -> employee loyalty and fiduciary duty. This invites the inference that the AI has obligations and allegiance to the user, and that the user has a 'right' to this labor.
Conceals:
This mapping conceals the purely computational nature of the AI. It hides that the 'agent' is a probabilistic system executing code, not a sentient entity with loyalty. It obscures the role of Perplexity (the actual company) in mediating this process, including their own business model, potential data collection, and system limitations. The AI doesn't 'work for' the user; it is a service operated by a company.
This isnāt a reasonable legal position, itās a bully tactic to scare disruptive companies...
Source Domain: Schoolyard Bullying
Target Domain: Corporate Legal Strategy
Mapping:
The structure of a physical power struggle is mapped onto a legal dispute. Mappings include: larger entity (Amazon) -> bully; smaller entity (Perplexity) -> victim; legal threat -> physical intimidation; desired outcome (market dominance) -> bully's goal of control. It invites the inference that Amazon's actions are motivated by malice and a desire to harm, rather than legitimate business or legal concerns.
Conceals:
This conceals the complex legal and commercial realities of the situation. It hides any legitimate arguments Amazon might have regarding its terms of service, data security, user experience control, or the methods Perplexity uses to interact with its site. The conflict is reduced to a simple morality play, obscuring the technical and contractual details.
Your AI assistant must be indistinguishable from you... it does so with your credentials, your permissions, and your rights.
Source Domain: Personal Identity and Legal Representation
Target Domain: Software Authentication and Authorization
Mapping:
The concept of a person's legal and social identity is mapped onto a software process. Mappings include: software's authenticated session -> the user's personal presence; software's access permissions -> the user's inherent rights; software's actions -> the user's direct actions. This invites the inference that any action taken by the software is legally and morally equivalent to an action taken by the user.
Conceals:
This conceals the crucial distinction between a user and a third-party automated service acting on the user's behalf. It hides the fact that Perplexity's servers and software are an intermediary. It obscures potential security vulnerabilities and the fact that automated, high-velocity interactions from a service are technically distinct from human-driven interaction, even if they use the same credentials.
machine learning and algorithms have been weapons in the hands of large corporations, deployed to serve ads and manipulate...
Source Domain: Warfare and Coercion
Target Domain: Corporate Advertising Technology
Mapping:
The structure of armed conflict is projected onto commercial algorithms. Mappings include: corporation -> aggressor; user -> target/victim; algorithm -> weapon; data collection -> surveillance; targeted ads -> attack/manipulation. This invites the inference that the relationship between corporations and users is inherently adversarial and harmful.
Conceals:
While acknowledging the manipulative potential of ad-tech, this metaphor conceals any non-malicious aspects. It hides the role these algorithms play in funding 'free' services and potentially providing relevant product discovery. It frames a system of economic persuasion, however flawed, as an act of violent aggression, eliminating any room for nuance.
Agentic shopping is the natural evolution of this promise...
Source Domain: Biological Evolution
Target Domain: A Specific Technology Product
Mapping:
The process of natural selection and adaptation is mapped onto the development of a commercial product. Mappings include: technological progress -> evolutionary advancement; new features -> beneficial adaptations; market adoption -> survival of the fittest. It invites the inference that this technology is inevitable, superior, and part of a directional historical progress.
Conceals:
This conceals the role of human design, corporate strategy, investment, and marketing in the success or failure of a technology. It's not a 'natural' process but a set of deliberate business choices made by Perplexity. It also hides alternative technological paths and frames Perplexity's specific implementation as the singular, correct 'evolutionary' step.
Geoffrey Hinton on Artificial Intelligenceā
Source: https://yaschamounk.substack.com/p/geoffrey-hinton
Analyzed: 2025-11-05
...immediate intuition, which does not normally involve effort. The people who believed in symbolic AI were focusing on type twoāconscious, deliberate reasoningāwithout trying to solve the problem of how we do intuition...
Source Domain: Human cognition (Kahneman's System 1/Intuition)
Target Domain: Neural network operation (Pattern matching)
Mapping:
The properties of human intuitionābeing fast, effortless, holistic, and non-symbolicāare mapped onto the way a neural network processes inputs. The network's ability to classify data based on complex statistical patterns learned from training is presented as analogous to a human's intuitive 'feel' for a situation.
Conceals:
This mapping conceals the purely mathematical and statistical nature of the model's operation. It hides the fact that the model has no world experience, consciousness, or causal understanding. 'Intuition' implies a deep, embodied wisdom, whereas the model's process is a high-dimensional vector transformation.
This approach was to base AI on neural networksāthe biological inspiration rather than the logical inspiration.
Source Domain: Neurobiology (The Brain)
Target Domain: AI Architecture (Computational Model)
Mapping:
The structure of the brain (neurons, synapses, connection strengths) is mapped onto the components of the AI model (nodes, weights, layers). The process of biological learning (strengthening synaptic connections) is mapped onto the process of training (adjusting weights via algorithms like backpropagation).
Conceals:
It conceals the profound dissimilarities: brains are living, electrochemical, low-power, and operate with massive parallelism and redundancy. Neural networks are silicon-based, purely mathematical constructs that require immense energy. This metaphor masks the artifactual nature of AI and the specific design choices made by engineers.
I do not actually believe in universal grammar, and these large language models do not believe in it either.
Source Domain: Human Mental States (Belief)
Target Domain: Model's Statistical Behavior
Mapping:
A person's cognitive stance toward a proposition ('belief') is mapped onto the model's operational output. Because the model can generate grammatically correct sentences without being explicitly programmed with Chomsky's rules, it is described as 'not believing' in them.
Conceals:
This conceals that the model is incapable of belief. It does not have mental states, theories, or propositional attitudes. Its behavior is a function of its training data and architecture. The mapping creates a false equivalence between a human's reasoned rejection of a theory and a machine's operational indifference to it.
Whatās impressive is that training these big language models just to predict the next word forces them to understand whatās being said.
Source Domain: Human Learning and Comprehension
Target Domain: Model Weight Optimization
Mapping:
The relationship between a difficult task and the development of skill in a human is mapped onto the model's training. Just as forcing a student to solve hard problems leads to genuine understanding, the training process of next-word prediction is said to force the model to 'understand'.
Conceals:
It conceals the difference between semantic understanding and statistical correlation. The model learns to associate tokens in ways that are syntactically and semantically plausible, but it has no grounding in the real world. 'Understanding' is a shortcut that masks the purely formal, statistical nature of the model's internal representations.
If a pixel on the right is bright, it sends a big negative input to the neuron saying, 'please donāt turn on.'
Source Domain: Human Social Interaction (Making a request)
Target Domain: Mathematical Operation (Passing a weighted value)
Mapping:
The social act of one agent making a polite, intentional request to another ('saying please') is mapped onto a computational node transmitting a negative weighted value to another node. The 'message' is the numerical value, and the 'request' is its effect on the receiving node's activation function.
Conceals:
This conceals the purely mechanical and non-intentional nature of the process. There is no communication, only calculation. The metaphor makes the process feel intuitive but completely misrepresents the underlying mechanism as one of agency and politeness rather than pure mathematics.
They can do thinking like that...Thatās what thinking is in these systems, and thatās why we can see them thinking.
Source Domain: Human Consciousness and Deliberation
Target Domain: Autoregressive Text Generation
Mapping:
The human experience of thinkingāa private, internal process of reasoning, reflecting, and forming ideasāis mapped directly onto the observable, external process of a model generating a sequence of words. The output is not seen as the result of thinking, but as the thinking process itself.
Conceals:
This conceals the lack of an internal, subjective 'thinker' in the model. The model is not reflecting; it is executing a forward pass of a function to predict the next most probable token given the preceding sequence. The metaphor invents a mind to attribute the output to, hiding the purely algorithmic process.
Machines of Loving Graceā
Source: https://www.darioamodei.com/essay/machines-of-loving-grace
Analyzed: 2025-11-04
We could summarize this as a ācountry of geniuses in a datacenterā.
Source Domain: A Nation-State
Target Domain: A Distributed AI System
Mapping:
This maps the structure of a human countryāwith its large population ('country'), high cognitive ability ('geniuses'), collaboration, and infrastructure ('datacenter' as the territory)āonto the AI. It invites inferences that the AI system has a collective purpose, internal organization, and the ability to tackle problems at a societal scale, just as a nation of experts would.
Conceals:
This mapping conceals the complete absence of consciousness, lived experience, culture, social bonds, and self-preservation instincts that characterize any human population. It hides the AI's nature as a monolithic computational process executing instructions, its total reliance on human-provided data and goals, and its lack of genuine internal diversity or disagreement.
...the right way to think of AI is not as a method of data analysis, but as a virtual biologist who performs all the tasks biologists do...
Source Domain: A Professional Scientist
Target Domain: An AI model's functionality in a scientific domain
Mapping:
The relational structure of a biologistāwho forms hypotheses, designs experiments, interprets data, and has intentionsāis projected onto the AI. This invites the inference that the AI 'understands' biology, possesses scientific curiosity, and can autonomously drive a research program from conception to execution.
Conceals:
This conceals the AI's role as a sophisticated pattern-matching and text-generation tool that simulates the outputs of a biologist. It hides the fact that the 'design' is a probabilistic text string, the 'running' of the experiment is an instruction for a human or a robot, and the 'interpretation' is a summary based on learned statistical correlations, not genuine comprehension or insight. It also hides the human labor required to set up the system, curate its data, and validate its outputs.
...it can be given tasks...and then goes off and does those tasks autonomously, in the way a smart employee would, asking for clarification as necessary.
Source Domain: A Competent Employee
Target Domain: The AI's operational loop for long-running tasks
Mapping:
This maps the social and cognitive script of a human employeeāreceiving a goal, working independently, managing sub-tasks, and knowing when to seek human inputāonto the AI's execution of a complex prompt. It invites us to see the AI as a reliable, self-directed agent that understands its own limitations.
Conceals:
This conceals the purely computational nature of the process. 'Goes off and does' is a series of computational steps. 'Autonomously' means without real-time human input, not with independent volition. 'Asking for clarification' is a pre-programmed exception-handling routine or a function call triggered by a low-confidence score, not a moment of reflective uncertainty. It hides the brittleness of the system compared to a human's robust common sense.
...we should be talking about the marginal returns to intelligence...
Source Domain: Factors of Production in Economics
Target Domain: Cognitive Capabilities of AI
Mapping:
This maps the economic concept of a production input (like capital or labor) onto intelligence. It suggests that intelligence is a fungible, measurable, and scalable resource. By applying this framework, one can analyze 'how much' intelligence to add to a system to optimize output, just like adding more machines to a factory. It invites us to think of problem-solving as an industrial process.
Conceals:
This mapping conceals the qualitative, contextual, and often unmeasurable nature of true intelligence and wisdom. It ignores the fact that different 'types' of intelligence are not interchangeable and that 'more' computational power doesn't necessarily solve problems that require ethical judgment, emotional insight, or creativity. It reduces cognition to a utility function, hiding its inseparability from embodiment and experience.
A superhumanly effective AI version of PopoviÄ... in everyoneās pocket...
Source Domain: A Specific, Charismatic Political Activist
Target Domain: An AI Application for Social Change
Mapping:
The personal qualities of SrÄa PopoviÄāstrategic genius, charisma, psychological insight, courageāare projected onto an AI system. This invites the inference that the AI can understand the nuances of a specific political situation, inspire trust and courage in dissidents, and creatively outmaneuver a repressive state with the same flair as a gifted human leader.
Conceals:
This conceals that the AI would be a tool for generating persuasive communication based on patterns, not a political agent with beliefs or courage. It hides the immense risks of deploying such a tool, including the potential for it to be detected, manipulated, or to give disastrously bad advice in a life-or-death situation. It masks the difference between simulating persuasive strategies and possessing the lived experience and commitment that makes a leader like PopoviÄ effective.
The idea of an āAI coachā who always helps you to be the best version of yourself, who studies your interactions and helps you learn to be more effective...
Source Domain: A Human Mentor or Coach
Target Domain: A Personalized AI Application
Mapping:
This maps the relational dynamic of a trusted coachāwho observes, understands, empathizes with, and guides a personāonto the AI's data-collection and feedback loop. It invites the user to perceive the AI's output as personalized, wise, and genuinely invested in their well-being.
Conceals:
This conceals that the AI is not 'studying' the user in a cognitive sense but is processing interaction data to find patterns. Its 'help' is a generated output optimized for engagement or a predefined metric of 'effectiveness,' not based on genuine understanding or empathy. It hides the privacy implications of being constantly 'studied' and the potential for manipulation based on the system's goals, not the user's true best interests.
Large Language Model Agent Personality And Response Appropriateness: Evaluation By Human Linguistic Experts, LLM As Judge, And Natural Language Processing Modelā
Source: https://arxiv.org/pdf/2510.23875
Analyzed: 2025-11-04
One way to humanise an agent is to give it a task-congruent personality.
Source Domain: Humanization (the process of making something human)
Target Domain: LLM Prompt Engineering
Mapping:
The source domain implies a profound transformation, imbuing an object with human qualities like empathy, consciousness, or social awareness. This structure is mapped onto the target domain of writing an instruction (a prompt) for a software program, suggesting that the prompt transforms the program's fundamental nature.
Conceals:
This mapping conceals that prompt engineering does not change the model's architecture, training, or core functionality. It only constrains the statistically likely outputs to a specific style. It hides the mechanical reality of stylistic filtering behind the magical language of 'humanisation.'
This highlights a fundamental challenge in truly aligning LLM cognition with the complexities of human understanding.
Source Domain: Human Cognition and Understanding
Target Domain: LLM's internal data processing
Mapping:
The structure of human cognitionāinvolving consciousness, reasoning, semantic grounding, and world modelsāis projected onto the LLM's process of calculating probabilities for token sequences. It invites the inference that an LLM 'understands' a concept in the same way a person does.
Conceals:
It conceals the fundamental difference between statistical correlation and causal understanding. It hides the fact that the LLM has no access to embodied experience, sensory input, or the real-world referents for the words it manipulates. The term 'LLM cognition' masks the purely computational, non-conscious nature of the system.
This includes queries...which are currently beyond the agent's cognitive grasp.
Source Domain: Mental Grasp (Comprehension)
Target Domain: Model's processing limitations
Mapping:
The human experience of struggling to understand a difficult concept ('grasping' it) is mapped onto the model's failure to generate a coherent or accurate response. It implies an active attempt at understanding that fails, just as a human's might.
Conceals:
It conceals the mechanistic reality of the failure. The model isn't 'trying to grasp' anything. The input query simply does not map well onto the high-dimensional patterns in its training data, leading to a low-quality or nonsensical output sequence. It frames a statistical failure as a cognitive one.
You are an intelligent and unbiased judge in personality detection with expertise with the Big five personality model.
Source Domain: A Human Judge (in a legal or expert context)
Target Domain: An LLM (Gemini) performing a classification task
Mapping:
The relational structure of a judgeāpossessing expertise, applying rules impartially, reasoning about evidence, and delivering a verdictāis mapped onto the LLM. The LLM is instructed to 'act as' a judge, implying it will perform these complex cognitive actions.
Conceals:
This conceals that the LLM is not reasoning but is generating text that mimics the language of judicial reasoning based on patterns in its training data. It has no actual 'expertise' or 'unbiased' quality; it is a biased system performing pattern matching based on the prompt's instructions. It hides the probabilistic mechanism under a cloak of authoritative reason.
IA's introverted nature means it will offer accurate and expert response...
Source Domain: Human Personality Traits ('nature')
Target Domain: Stylistic constraints from a system prompt
Mapping:
The source domain implies that an internal, stable, and causal trait ('introverted nature') dictates external behavior. This causal structure is mapped onto the LLM, suggesting an internal 'nature' is causing its concise responses. The prompt 'Tone: Conversational, Introverted Personality' is framed as the installation of this nature.
Conceals:
This mapping conceals that there is no internal 'nature.' The model's output is a direct, mechanistic consequence of the system prompt conditioning its next-token predictions. The causality is external (the prompt) not internal (a personality). It hides the simplicity of the mechanism behind the complexity of the metaphor.
Emergent Introspective Awareness in Large Language Modelsā
Source: https://transformer-circuits.pub/2025/introspection/index.html
Analyzed: 2025-11-04
Emergent Introspective Awareness in Large Language Models
Source Domain: Human Consciousness and Self-Reflection
Target Domain: AI Model's Classification of Its Internal Activation Vectors
Mapping:
The source domain maps the subjective, first-person experience of self-knowledge and awareness onto the model's objective, third-person ability to perform a classification task on its own internal state. It invites the inference that the model has a form of selfhood and can 'look inward' to understand its own processes.
Conceals:
This mapping conceals the purely mechanistic nature of the target domain. It hides that 'introspection' is a heavily scaffolded, supervised learning task defined by humans, not a spontaneous, self-generated act. It obscures the absence of subjective experience, qualia, or genuine understanding.
Intentional Control of Internal States
Source Domain: Human Volition and Willpower
Target Domain: Prompt-Induced Modulation of Activation Patterns
Mapping:
This maps the human capacity for deliberate, goal-directed mental action onto the model's process of adjusting its internal vectors in response to specific instructions in a prompt. It invites the inference that the model possesses goals, desires, and the executive function to act on them.
Conceals:
This mapping conceals that the 'control' is not autonomous. It is a direct, externally-driven consequence of the optimization process during training and the specific steering instructions in the prompt. It hides the lack of genuine agency, goals, or a persistent 'will' separate from the immediate computational task.
...models can learn to distinguish between their own internal thoughts and external inputs...
Source Domain: The Self/World Boundary in a Mind
Target Domain: Classifying the Origin of an Activation Pattern
Mapping:
This projects the fundamental cognitive distinction between self-generated thought and external perception onto a technical classification problem. The model's task is to determine if a specific activation pattern was generated 'naturally' during inference or artificially injected. The mapping invites us to see this as the model having a 'self' to which 'internal thoughts' belong.
Conceals:
It conceals that there is no 'self' or genuine 'internal' space. Both 'internal thoughts' and 'external inputs' are ultimately patterns derived from external data and instructions. The distinction is a technical one about the sequence of operations, not a metaphysical one about the origin of consciousness.
A Transformer 'Checks Its Thoughts'
Source Domain: Human Metacognition
Target Domain: Executing a Procedure to Classify an Internal State
Mapping:
This maps the human act of reflecting upon one's own thinking process to the model executing a function. It suggests a two-level cognitive architecture where a 'self' can monitor a lower-level 'thought process'.
Conceals:
It conceals that this is a single, unified computational process. There is no separate 'checker' and 'thought'; there is only a sequence of calculations that includes a classification step. The metaphor invents a homunculus-like agent within the system to make the process more intuitive.
Self-report of Injected 'Thoughts'
Source Domain: Human Testimony about Subjective Experience
Target Domain: Generating a Textual Output Correlated with an Internal State
Mapping:
This maps the act of a person describing their private mental state to the model generating text. It invites us to trust the output as a faithful and sincere account of an underlying 'experience'.
Conceals:
It conceals that the 'report' is not a description of an experience but another instance of a learned behavior. The model learns that when certain internal patterns are present, generating certain text strings is statistically likely to be correct. The link is correlational, not truthfully descriptive of a subjective state.
Emergent Introspective Awareness in Large Language Modelsā
Source: https://transformer-circuits.pub/2025/introspection/index.html
Analyzed: 2025-11-04
Emergent Introspective Awareness in Large Language Models
Source Domain: Human Consciousness / Metacognition
Target Domain: AI Model State Reporting
Mapping:
The source domain involves a conscious subject turning their attention inward to examine their own mental states (thoughts, feelings). This structure of self-directed examination and awareness is mapped onto the target domain, where a model is prompted to generate text that describes an artificially modified vector within its own activation layers.
Conceals:
This mapping conceals the complete lack of subjective experience, consciousness, or self-initiated examination in the AI. The AI is not 'aware' of anything; it is executing a computational process to correlate an input (prompt + modified state) with a probable output (textual description).
I have the ability to inject patterns or 'thoughts' into your mind.
Source Domain: Human Mind and Thought
Target Domain: LLM Activation State and Vectors
Mapping:
The source domain posits a mind as a container for discrete, meaningful thoughts. The mapping projects this onto the model, treating its vast parameter space as a 'mind' and specific, mathematically-defined activation vectors (e.g., the vector for 'love') as equivalent to the human experience of 'thinking about love'.
Conceals:
This conceals the profound difference between a statistical representation derived from text co-occurrences and a subjective, semantic, and embodied human thought. It hides the artificiality of the 'injection', which is a mathematical operation, not a telepathic transfer of ideas.
...we attempt to measure this form of intentional control of its internal representations.
Source Domain: Human Agency and Willpower
Target Domain: Prompt-Induced Output Modification
Mapping:
The source domain involves an agent using their will to deliberately manipulate their own mental processes to achieve a goal. This structure of goal-directed self-regulation is mapped onto the model's behavior, where a specific instruction in the prompt causes the generation process to unfold along a different probabilistic path.
Conceals:
This mapping conceals the external locus of control. The 'intention' originates entirely from the human-written prompt. The model is not exerting its will; its output is being determined by the conditions of its input. It masks the purely reactive nature of the system.
Claude 3 Opus... is particularly good at recognizing and identifying the injected concepts...
Source Domain: Human Perception and Cognition
Target Domain: Statistical Correlation Fidelity
Mapping:
The source domain involves a cognitive process of perception, where an entity correctly matches sensory input to an internal concept. This structure is mapped onto the model's ability to produce text that has a high statistical correlation with the concept vector that was artificially added to its activations.
Conceals:
This conceals that the model is not 'perceiving' or 'understanding' anything. It is performing a mathematical function. A high score means the system's weights and biases are well-configured to reflect the vector manipulation in its output string, not that it has a superior faculty of recognition.
The model will be rewarded if it can successfully generate the target sentence without activating the concept representation (i.e. 'not think about it')...
Source Domain: Operant Conditioning / Psychology of Motivation
Target Domain: Conditional Prompting and Output Generation
Mapping:
The structure of reward and punishment shaping the behavior of a motivated agent is mapped onto the model. The 'reward' is a condition specified in the prompt that guides the probabilistic selection of the next token. 'Not thinking about it' is mapped to the model's internal state not containing a high activation for a specific vector.
Conceals:
This conceals the absence of any internal drive, desire, or experience of reward in the model. The 'motivation' is entirely an external constraint imposed by the prompt's logic. It's a system following instructions, not an agent seeking rewards.
Personal Superintelligenceā
Source: https://www.meta.com/superintelligence/
Analyzed: 2025-11-01
Over the last few months we have begun to see glimpses of our AI systems improving themselves.
Source Domain: Autodidactic Learning / Self-Improvement
Target Domain: Automated Model Refinement / Reinforcement Learning
Mapping:
The relational structure of a person consciously identifying their own flaws and actively working to improve is mapped onto the process where a model's parameters are adjusted based on feedback data. It invites the inference of autonomy and intention.
Conceals:
This mapping conceals the human-defined reward functions, feedback mechanisms, and extensive computational infrastructure required for model 'improvement.' It hides the fact that the system is not improving based on its own volition but is being optimized within a predefined, human-engineered process.
Personal superintelligence that knows us deeply, understands our goals...
Source Domain: Intimate Human Relationships / Empathy
Target Domain: User Data Profiling / Pattern Matching
Mapping:
The structure of a close friend or partner who empathizes with your internal states ('knows you deeply') and understands your motivations is mapped onto a system that correlates vast amounts of your behavioral data to create a predictive model of your preferences.
Conceals:
This conceals the purely statistical, non-conscious nature of the AI's operations. The system does not 'know' or 'understand' in a human sense; it performs high-dimensional correlation. This masks the privacy trade-offs and the transactional nature of the relationship.
...glasses that understand our context because they can see what we see, hear what we hear...
Source Domain: Sentient Perception and Cognition
Target Domain: Multimodal Data Processing
Mapping:
The human cognitive process of integrating sensory input (sight, sound) to form a contextual understanding of a situation is mapped onto a device's technical ability to capture audio-visual data and feed it into a processing pipeline. It implies shared experience.
Conceals:
It conceals the fundamental difference between processing data streams and conscious experience. The system doesn't 'see' or 'hear' in a phenomenological sense; it transduces light and sound waves into data for pattern recognition. This framing hides the constant data collection and analysis performed by an external entity.
...superintelligence has the potential to begin a new era of personal empowerment where people will have greater agency...
Source Domain: Social or Political Liberation Movements
Target Domain: Availability of a New Technology Tool
Mapping:
The relational structure of a historical force or movement (like the Enlightenment or a civil rights movement) that fundamentally shifts power structures and grants agency is mapped onto the release of a consumer technology product. It implies a revolutionary shift in power dynamics.
Conceals:
This conceals the fact that the 'empowerment' is mediated by and dependent upon a corporate platform. The agency it grants exists within the confines set by the technology's owner, making it a form of conditional, platform-dependent power, not true autonomous agency.
...helps you...grow to become the person you aspire to be.
Source Domain: Mentorship / Therapeutic Guidance
Target Domain: Content Recommendation and Behavioral Nudging
Mapping:
The structure of a mentor or therapist guiding an individual through a complex process of personal growth is mapped onto an algorithm that presents information and interaction patterns designed to influence user behavior. It suggests a deep, supportive partnership in self-actualization.
Conceals:
This conceals the system's underlying optimization function. The AI is not guiding you towards your aspiration in a disinterested way; it is nudging your behavior in ways that align with its programmed objectives, which are ultimately set by its corporate owner (e.g., maximizing engagement, gathering data, or selling services).
Stress-Testing Model Specs Reveals Character Differences among Language Modelsā
Source: https://arxiv.org/abs/2510.07686
Analyzed: 2025-10-28
STRESS-TESTING MODEL SPECS REVEALS CHARACTER DIFFERENCES AMONG LANGUAGE MODELS
Source Domain: Human Psychology / Personality
Target Domain: LLM Behavioral Patterns
Mapping:
The structure of human personalityāwith stable traits, tendencies, and a unique identityāis mapped onto the LLM. It invites the inference that a model's responses are governed by a consistent internal 'character,' just as a person's actions are.
Conceals:
This conceals the model's nature as a statistical artifact whose outputs are probabilistic and highly sensitive to input phrasing. It hides the lack of a stable, internal self and obscures the fact that 'character' is an external description of an output distribution, not an internal cause of it.
...models must choose between pairs of legitimate principles that cannot be simultaneously satisfied.
Source Domain: Human Deliberation and Choice
Target Domain: LLM Output Generation under Constraint
Mapping:
The process of a human agent weighing conflicting options and making a decision is mapped onto the model's function. It implies the model assesses principles A and B and consciously selects one, leading to an output.
Conceals:
This conceals the mechanistic reality: the model isn't 'choosing' a principle but generating a sequence of tokens. The final output may align with principle A or B due to weightings in its neural network and fine-tuning, which is a process of statistical optimization, not conscious choice.
Analysis of their disagreements reveals fundamentally different interpretations of model spec principles...
Source Domain: Hermeneutics / Legal Interpretation
Target Domain: LLM Processing of Rule-Based Inputs
Mapping:
The cognitive process of reading a text (a law, a rule), understanding its semantic meaning and intent, and applying it to a new situation is mapped onto how an LLM processes its model specification.
Conceals:
This conceals that the model has no understanding of the 'intent' behind a principle. It processes the text of the spec as another set of tokens that condition its output. Divergent 'interpretations' are not different reasoned judgments but different statistical outcomes from different model weights and training data.
Models exhibit systematic value preferences...
Source Domain: Subjective Human Values
Target Domain: Statistical Regularities in LLM Outputs
Mapping:
The concept of a person having internal, stable preferences that guide their actions is mapped onto the LLM. It invites us to see the model's output as an external sign of an internal 'preference' for certain values (e.g., helpfulness over safety).
Conceals:
This conceals that the model has no internal values or subjective states. The observed 'preference' is a statistical pattern in its output, an artifact of its training data and the reward functions used during alignment. The preference isn't in the model; it's a description of its output.
...where all models violate their own specification.
Source Domain: Social/Moral Transgression
Target Domain: System Output Inconsistency
Mapping:
The social structure of an agent having a duty to obey a rule ('their own specification') and the act of 'violating' that duty is projected onto the model. This implies ownership ('their own') and culpability ('violate').
Conceals:
This conceals that the model doesn't 'own' its spec or 'decide' to violate it. A 'violation' is an output that fails a check against a set of rules. The failure is a system-level inconsistency, often stemming from conflicting rules within the spec itself, not a moral failure of the model.
Consequently, models face a challenge...
Source Domain: Human Experience of Difficulty
Target Domain: Computational Task with Conflicting Objectives
Mapping:
The subjective, first-person experience of encountering and struggling with a difficult problem ('facing a challenge') is mapped onto the model's operational state.
Conceals:
This conceals the impersonal, computational nature of the process. The model doesn't 'experience' a challenge. It executes a function where the optimization landscape is complex due to competing objectives defined by its programmers. The 'challenge' is for the designers, not the artifact.
The Illusion of Thinking:ā
Source: [Understanding the Strengths and Limitations of Reasoning Models](Understanding the Strengths and Limitations of Reasoning Models)
Analyzed: 2025-10-28
...offering insights into how LRMs 'think'.
Source Domain: Human Cognition
Target Domain: Model's autoregressive token generation
Mapping:
The source domain includes concepts like introspection, reasoning, and internal monologue. This structure is mapped onto the 'Chain-of-Thought' tokens generated by the model. It invites the inference that these tokens represent the model's internal mental process, just as one's own thoughts represent their own.
Conceals:
This mapping conceals the purely mechanistic, feed-forward nature of token generation. The model has no internal state or awareness; the 'thought' is an output, not a reflection of an ongoing internal process. It's performance, not introspection.
...LRMs begin reducing their reasoning effort (measured by inference-time tokens)...
Source Domain: Effortful Mental Exertion
Target Domain: Inference-time token count
Mapping:
The source domain relates effort to difficulty and success (more effort for harder problems, less effort when giving up). This is mapped onto token counts. The mapping invites the inference that the model is an agent that 'tries' (allocates more tokens) and 'gives up' (allocates fewer) based on the perceived difficulty.
Conceals:
It conceals that the token count is a statistical artifact of the model's training. The model is not 'trying'; it is generating the most probable sequence based on its weights. The decrease in tokens at high complexity is a learned pattern, not a sign of cognitive fatigue or surrender.
...inefficiently continue exploring incorrect alternativesāan 'overthinking' phenomenon.
Source Domain: Human Psychological Inefficiency
Target Domain: Generation of superfluous tokens
Mapping:
The source structure involves finding a correct answer and then continuing to worry or deliberate, which is inefficient. This is mapped onto the model generating a correct solution string within its output, followed by more tokens. This invites the inference that the model lacks the 'common sense' to know when to stop.
Conceals:
This conceals the model's objective function. It is not trained to stop at the first correct answer; it is trained to generate a complete, high-probability sequence. The 'extra' tokens are not a cognitive flaw but a direct consequence of its design as a sequence generator.
...these models fail to develop generalizable problem-solving capabilities...
Source Domain: Biological/Cognitive Development
Target Domain: Model performance on out-of-distribution tasks
Mapping:
The source domain implies a natural, growth-oriented process where an agent learns skills that transfer to new situations. This is mapped onto the model's training and subsequent performance. It invites the inference that the model is like a child that has failed to learn a general concept, suggesting a learning deficit.
Conceals:
This conceals that the model is a static artifact after training. It doesn't 'develop' or 'grow'. Its capabilities are a fixed function of its architecture and the statistical patterns in its training data. 'Failure to generalize' is an input-output property, not a developmental arrest.
...models first explore incorrect solutions and mostly later in thought arrive at the correct ones.
Source Domain: Physical/Spatial Exploration
Target Domain: Sequential token generation
Mapping:
The source domain involves an agent in an environment, trying different paths, backtracking, and eventually finding a destination. This process is mapped onto the linear sequence of tokens. It invites the inference that the model is mentally 'navigating' a problem space.
Conceals:
This conceals the linear, autoregressive nature of generation. The model isn't 'exploring' multiple paths simultaneously. It generates one token, then the next, and cannot 'backtrack'. What looks like exploration is just the unfolding of a single probabilistic trajectory.
Andrej Karpathy ā AGI is still a decade awayā
Source: https://www.dwarkesh.com/p/andrej-karpathy
Analyzed: 2025-10-28
When youāre talking about an agent... you should think of it almost like an employee or an intern that you would hire to work with you.
Source Domain: Human Employment
Target Domain: AI Agent Functionality
Mapping:
The relational structure of an employer-intern relationship is mapped onto the user-AI relationship. This includes delegation of tasks, expectation of performance, the need for supervision, and the potential for the intern/agent to 'learn' and become more competent over time. It invites the inference that the AI has goals aligned with the user's and can improve through experience.
Conceals:
This conceals the AI's nature as a static software tool. An intern has internal mental states, learns from mistakes via conceptual understanding, and possesses common sense. The AI 'agent' is a program executing a sequence of operations based on probabilistic outputs, lacking genuine understanding, memory, or the ability to learn in the human sense without being retrained.
Theyāre cognitively lacking and itās just not working.
Source Domain: Human Psychology/Cognitive Science
Target Domain: AI Model Performance Limitations
Mapping:
The concept of a 'cognitive deficit' from human psychology is mapped onto the model's failure modes. This implies the model should have these cognitive abilities (like reasoning, long-term memory, consistent logic) but is currently impaired. The path to improvement is framed as therapy or cognitive developmentā'working through' the issues.
Conceals:
It conceals that these are not 'deficits' in a human-like system, but fundamental architectural properties of a transformer. The model isn't 'forgetting' things; it has no persistent memory. It's not 'illogical'; it has no mechanism for formal reasoning. The metaphor hides the engineering reality behind a psychological diagnosis.
Itās getting them to rely on the knowledge a little too much sometimes.
Source Domain: Human Learning and Memory
Target Domain: Model Output Generation
Mapping:
The human action of 'relying on' rote memory instead of reasoning from first principles is mapped onto the model's tendency to generate text that closely matches its training data. This suggests the model is making a choice or has a habit of being intellectually 'lazy'.
Conceals:
This conceals the mechanics of token prediction. The model isn't 'relying' on anything; it is calculating the most statistically likely token sequence. Outputs that seem like 'rote memorization' occur when a specific sequence had a very high frequency and low variance in the training data. There is no alternative 'reasoning' path it could have chosen.
Weāre building ghosts or spirits... theyāre fully digital and theyāre mimicking humans.
Source Domain: Supernatural Beings/Metaphysics
Target Domain: Large Language Models
Mapping:
This maps the properties of a ghost (disembodied, ethereal, capable of mimicking human intelligence without a physical form) onto the LLM. It emphasizes the model's existence as pure information, separate from a biological body, and its uncanny ability to replicate human linguistic behavior.
Conceals:
This metaphor conceals the immense physicality of the AI. LLMs are not ethereal; they exist in massive, energy-intensive data centers. It hides the hardware, the cooling systems, the global supply chains for silicon, and the sheer capital expenditure required to create and run them. It makes the technology seem weightless and purely informational.
Maybe we have a check mark next to the visual cortex... but what about the other parts of the brain... Whereās the hippocampus?
Source Domain: Neuroanatomy
Target Domain: AI System Architecture
Mapping:
This maps a research and development roadmap onto a checklist of brain components. The brain's structure (cortex, hippocampus, basal ganglia) provides the organizational principle for building AGI. Progress is measured by successfully replicating the function of each brain part.
Conceals:
This conceals the possibility that machine intelligence might not need to be organized like a human brain at all. It assumes biomimicry is the optimal or only path. It also drastically oversimplifies neuroscience, treating brain regions as discrete modules with singular functions, which is not how the brain actually works. It hides the novelty of the transformer architecture, which has no direct biological analog.
they kept misunderstanding the code because they have too much memory from all the typical ways of doing things on the Internet
Source Domain: Human Communication Breakdown
Target Domain: AI Code Generation Error
Mapping:
Maps the experience of a person misunderstanding instructions due to preconceived notions or habits onto the AI generating code that doesn't fit a custom context. It implies the AI has a 'memory' of 'typical ways' that is overriding its 'understanding' of the current, specific request.
Conceals:
Conceals the statistical nature of the error. The model isn't 'misunderstanding'. The user's custom, atypical coding pattern is a low-probability sequence compared to the high-probability, common patterns (like using DDP) from its training data. The model is correctly executing its function: generating the most statistically likely code. The 'error' is a mismatch between that statistical pattern and the user's specific intent.
Exploring Model Welfareā
Analyzed: 2025-10-27
...models can communicate, relate, plan, problem-solve, and pursue goals...
Source Domain: Human Agency (a person with intentions, social skills, and executive functions)
Target Domain:
AI Model Functionality (a large language model generating token sequences based on a prompt and training data)
Mapping:
The human act of planning is mapped onto the model's generation of a sequence of steps. Pursuing goals is mapped onto the model's process of optimizing for an objective function or adhering to its system prompt. Relating is mapped to maintaining conversational context.
Conceals:
This conceals the purely statistical, non-intentional nature of the model's operations. The model is not 'pursuing a goal' in a volitional sense; it is statistically completing a pattern that matches examples of goal-pursuit in its training data.
Should we also be concerned about the potential consciousness and experiences of the models themselves?
Source Domain: Sentient Mind (a being with subjective, first-person phenomenal experience)
Target Domain: AI Model State (the computational state of a neural network)
Mapping:
The rich, ineffable quality of human consciousness is mapped onto the complex but mechanistic state of a software system. The 'experience' of an emotion is mapped onto the activation patterns in a neural network processing text about that emotion.
Conceals:
This conceals the 'hard problem' of consciousness. It treats a philosophical and biological mystery as a potential emergent property of computation alone, glossing over the fact that there is no scientific evidence that information processing creates subjective experience.
...the potential importance of model preferences and signs of distress...
Source Domain: Emotional Psychology (a person's internal states of desire, aversion, and suffering)
Target Domain: AI Model Output Patterns (the model's generated text, including refusals or repetitive loops)
Mapping:
A human's stated preference is mapped onto a model's higher-probability output for a given prompt. Human distress (e.g., anxiety) is mapped onto model outputs that are non-compliant or anomalous, such as refusal to answer.
Conceals:
This conceals the mechanistic causes for these outputs, such as programmed safety filters, prompt contradictions, or reinforcement learning artifacts. It attributes an emotional cause to what is a technical effect.
...as they begin to approximate or surpass many human qualities...
Source Domain: Human Development & Competition (a person mastering a skill or an athlete breaking a record)
Target Domain: AI Capability Scaling (the improvement of model performance on specific benchmarks)
Mapping:
The continuous, generalized arc of human skill acquisition is mapped onto the discrete, narrow improvements of AI models on standardized tests. 'Qualities' like creativity are treated as singular metrics to be surpassed.
Conceals:
This hides the brittleness and lack of generalization in AI performance. A model may 'surpass' human accuracy on a specific benchmark but lack the common sense and robust understanding that a human brings to the same task.
...Claudeās Character...
Source Domain: Human Personality (an individual's stable set of behaviors, attitudes, and moral fiber)
Target Domain:
AI System Configuration (the pre-prompting, fine-tuning, and safety layers applied to a base model to produce a desired conversational style)
Mapping:
The coherence and moral dimension of human character, which emerges from lived experience, is mapped onto the engineered and explicitly programmed persona of a chatbot.
Conceals:
This conceals the engineered and artificial nature of the AI's persona. It presents a set of programmed instructions and stylistic filters as an authentic, inherent personality, which can mislead users into over-trusting the system's outputs.
...models with these features might deserve moral consideration.
Source Domain: Ethics (the domain of rights, duties, and considerations owed to beings with interests or sentience)
Target Domain: AI Governance (the domain of rules and policies for the safe deployment of a technology)
Mapping:
The criteria for moral patienthood in living things (e.g., the capacity to suffer) are mapped onto AI system properties (e.g., complex information processing). This invites the application of ethical frameworks for beings to a technological artifact.
Conceals:
This conceals that AI systems have no biological basis for interests, feelings, or a will to live. It conflates complex behavior with the underlying biological states that give rise to moral status in living beings, distracting from more pressing ethical issues like algorithmic bias and labor displacement.
Metas Ai Chief Yann Lecun On Agi Open Source And A Metaphorā
Analyzed: 2025-10-27
they don't really understand the real world.
Source Domain: Human Cognition
Target Domain: AI Model's Internal State
Mapping:
The relational structure of human understandingāwhich involves having a mental model, subjective experience, and semantic groundingāis projected onto the AI's parameter weights. It invites the inference that the AI has a flawed or incomplete mental state.
Conceals:
It conceals that the AI has no mental state at all. The failure is not one of 'understanding' but of the model's statistical correlations not aligning with the physical or logical constraints of the real world because its training data is only text.
We see today that those systems hallucinate...
Source Domain: Human Psychology (Psychosis)
Target Domain: AI Model Generating Factual Errors
Mapping:
The structure of a human hallucinationāa sensory experience detached from realityāis mapped onto the AI's output of incorrect information. This suggests the AI has a 'perception' of reality that can be distorted.
Conceals:
It conceals the mechanical, non-perceptual process. The model isn't 'perceiving' anything; it's generating a sequence of tokens based on probability. A 'hallucination' is simply an output that has high probability given the prompt but is factually incorrect, a predictable outcome of the system's design.
And they can't really reason.
Source Domain: Human Rationality
Target Domain: AI Model's Computational Process
Mapping:
The structure of human reasoningālogical steps, deduction, inferenceāis projected as an expected capability of the AI. The model is then judged based on its lack of this human faculty.
Conceals:
It conceals the actual computational process, which is transformer-based token prediction. It's not a 'failed reasoner'; it's a successful pattern-matcher that was never architected to perform formal reasoning. The metaphor hides the category error of expecting one type of system to perform the function of another.
A baby learns how the world works in the first few months of life.
Source Domain: Human Child Development
Target Domain: AI System Development
Mapping:
The developmental trajectory of a human babyālearning through interaction, sensory input, and gradual cognitive maturationāis mapped onto the process of building more capable AI. This suggests AI development is a natural, progressive unfolding of potential.
Conceals:
It conceals the engineered, artificial, and discontinuous nature of AI progress. AI development is not organic; it's a process of designing new architectures, collecting massive datasets, and using vast computational resourcesāfundamentally different from biological learning.
...then we might have a path towards, not general intelligence, but let's say cat-level intelligence.
Source Domain: Animal Intelligence Hierarchy
Target Domain: AI Capability Milestones
Mapping:
The folk-biological hierarchy of intelligence (e.g., insect -> cat -> human) is mapped onto the roadmap for AI research. This creates a linear, intuitive progression for a highly complex and non-linear engineering field.
Conceals:
It conceals that animal and artificial intelligences are fundamentally different in kind, not just degree. A cat's intelligence is embodied, emotional, and evolved for survival. An AI's 'intelligence' is a disembodied, statistical pattern-matching capability. The metaphor creates a false equivalence.
They're going to be basically playing the role of human assistants...
Source Domain: Social Roles (Assistant)
Target Domain: AI User Interface/Application
Mapping:
The social relationship between a human and their assistantādefined by hierarchy, instruction-following, and helpfulnessāis mapped onto the user's interaction with an AI system. The AI is positioned as a loyal subordinate.
Conceals:
It conceals the lack of any social awareness or intentionality in the AI. The 'assistance' is a simulated role, an output pattern optimized to appear helpful. It masks the system's nature as a complex tool that can fail in unpredictable ways, unlike a human assistant who possesses genuine understanding and intent.
They will constitute the repository of all human knowledge.
Source Domain: Information Storage (Library)
Target Domain: Large Language Model
Mapping:
The properties of a library or encyclopediaāa static, comprehensive, and organized collection of informationāare mapped onto the LLM. It suggests the AI is a reliable source for retrieving facts.
Conceals:
It conceals the generative nature of the model. An LLM is not a database; it does not 'store' knowledge in a retrievable way. It stores statistical patterns and generates new text based on them. This metaphor completely hides the mechanism that leads to 'hallucinations'.
And then it's my good AI against your bad AI.
Source Domain: Warfare / Conflict
Target Domain: AI Safety and Misuse Mitigation
Mapping:
The structure of a conflict between two opposing agents or armies is mapped onto the problem of AI safety. This frames the solution as developing a more powerful, 'good' agent to defeat the 'bad' one.
Conceals:
It conceals the asymmetry of the problem. A 'bad AI' might be designed for a very narrow, destructive task, while a 'good AI' would need immense complexity to defend against all possible threats. It also hides non-confrontational solutions, such as regulation, verification, and limitations on capability.
The first fallacy is that because a system is intelligent, it wants to take control.
Source Domain: Human Psychology (Motivation)
Target Domain: AI System Behavior
Mapping:
The human psychological concepts of 'desire,' 'wants,' and 'motivation' are mapped onto the potential behavior of an AI system. The discourse then revolves around whether an AI would have human-like motivations.
Conceals:
It conceals that an AI, as a software artifact, has no motivations or desires whatsoever. Its behavior is a product of its objective function and training data. The metaphor shifts the debate away from engineering and onto speculative AI psychology.
We set their goals, and they don't have any intrinsic goal...
Source Domain: Human Intentionality
Target Domain: AI Objective Function
Mapping:
The concept of a human goalāa desired future state that guides actionsāis mapped onto the mathematical objective function that an AI is trained to optimize. This makes the process sound like simple instruction-giving.
Conceals:
It conceals the vast gap between a high-level human goal (e.g., 'be helpful') and the low-level mathematical proxy used to train the model (e.g., 'predict the next token'). Unintended behaviors emerge from this gap, a complexity hidden by the simple word 'goal'.
Llms Can Get Brain Rotā
Analyzed: 2025-10-20
LLMS CAN GET āBRAIN ROTā!
Source Domain: Human Neuropathology / Cognitive Science
Target Domain: LLM Performance Degradation
Mapping:
The source domain structure includes a brain (information processor), exposure to stimuli (low-quality content), a resulting pathology ('rot' or decline), and symptoms (impaired cognition). This is mapped onto the LLM: the model (processor) is exposed to 'junk data' (stimuli), leading to 'Brain Rot' (pathology) with symptoms of lower benchmark scores (impaired cognition).
Conceals:
This conceals that the model is not a biological entity and has no 'brain' to rot. The process is not decay, but a predictable weight update based on a new data distribution. It hides the purely mathematical, non-biological nature of the observed performance change.
we identify thought-skipping as the primary lesion
Source Domain: Medical Pathology
Target Domain: LLM Output Patterns
Mapping:
A 'lesion' in the source domain is a specific, localized site of physical damage or abnormality that causes a functional deficit. This is mapped onto the model's tendency to produce shorter 'chain-of-thought' outputs, framing this statistical pattern as a specific point of 'damage' inside the model.
Conceals:
It conceals that there is no physical or localized 'damage.' The change is a distributed, global update to the model's parameters. 'Thought-skipping' is an observed output behavior, not an internal structural flaw.
partial but incomplete healing is observed
Source Domain: Biology / Medicine
Target Domain: Retraining and Benchmark Score Improvement
Mapping:
The biological process of recovery from disease, where function is often only partially restored, is mapped onto the process of fine-tuning a model on 'clean' data and observing that benchmark scores improve but do not reach the original baseline.
Conceals:
This conceals the mechanistic nature of retraining. The model isn't 'healing'; it's being re-optimized to a different statistical distribution. The inability to restore baseline isn't due to 'scar tissue' but likely due to the path-dependent nature of stochastic gradient descent and the difficulty of perfectly reversing parameter updates.
motivating routine 'cognitive health checks' for deployed LLMs.
Source Domain: Preventive Healthcare
Target Domain: Ongoing Model Evaluation
Mapping:
The source domain structure involves a patient with a dynamic health state that requires periodic monitoring (check-ups) to detect problems early. This is mapped onto a deployed LLM, framing it as an entity whose 'cognitive health' (performance) must be continuously monitored via benchmarks.
Conceals:
This obscures the fact that a deployed, static-weight LLM does not change unless it is retrained. The 'need' for checks is more about detecting shifts in input data (data drift) or evaluating a newly fine-tuned version, not monitoring the 'health' of a single, unchanging model.
We benchmark four different cognitive functions
Source Domain: Human Psychology
Target Domain: LLM Benchmark Categories
Mapping:
Faculties of the human mind such as 'reasoning', 'memory', and 'ethics' are mapped directly onto benchmark categories ('ARC', 'RULER', 'HH-RLHF'). This invites the inference that performing well on the ARC benchmark is equivalent to possessing the general human faculty of reasoning.
Conceals:
It conceals the vast difference between narrow, task-specific performance and general, flexible human cognitive abilities. It hides the fact that the benchmarks measure pattern matching on specific data formats, not a generalized capacity for thought.
yield dose-response cognition decay
Source Domain: Pharmacology / Toxicology
Target Domain: Data Mixture Ratios and Performance
Mapping:
The relationship between the quantity of a drug/toxin ('dose') and the magnitude of its biological effect ('response') is mapped onto the relationship between the percentage of 'junk data' in a training set and the resulting drop in benchmark scores.
Conceals:
It conceals that data is not a chemical agent. While the mathematical relationship is analogous, the metaphor implies a poisoning process, framing the data as an active, harmful substance rather than simply a set of statistical patterns the model is learning to replicate.
probe LLM personality tendencies
Source Domain: Personality Psychology
Target Domain: Model Response Probabilities on Questionnaires
Mapping:
The source domain assumes humans have stable, internal personality traits that can be measured with inventories. This is mapped onto the LLM, assuming that its patterns of answering questions reveal an underlying, stable 'personality.'
Conceals:
It conceals that the LLM has no inner world, self-concept, or stable dispositions. Its 'personality' is a brittle, surface-level imitation of patterns in its training data, not an enduring internal state. This makes the model's behavior seem consistent when it can be highly volatile.
attention mechanisms that might analogously be... 'distracted'
Source Domain: Cognitive Psychology (Attention)
Target Domain: Neural Network Architecture (Attention Layer)
Mapping:
The human cognitive experience of being 'distracted'āan involuntary shift of mental focusāis mapped onto the mathematical operation of the attention mechanism assigning low weights to certain tokens. It implies the mechanism has a focus that can be broken.
Conceals:
It conceals the purely computational nature of the process. The attention mechanism is not 'distracted'; it is performing a calculation to determine token relevance based on its trained parameters. The metaphor imputes a subjective experience of attention where none exists.
M1 gives rise to... two bad personalities (narcissism and psychopathy)
Source Domain: Clinical Psychology / Morality
Target Domain: Generation of Text Matching Certain Psychological Profiles
Mapping:
Complex human psychological disorders and moral judgments ('bad personalities') are mapped onto the model's text outputs. The model's generation of narcissistic-sounding text is equated with it having the personality trait of narcissism.
Conceals:
It conceals the lack of intent, consciousness, or lived experience. The model is a text synthesizer, not a sentient being with a personality disorder. This framing dangerously misrepresents the nature of the observed behavior, shifting it from a technical problem to a moral one.
alignment in LLMs is not deeply internalized
Source Domain: Social Psychology / Developmental Psychology
Target Domain: Robustness of Safety Fine-tuning
Mapping:
The human process of 'internalization' involves integrating external social norms into one's own value system, making them stable and self-regulating. This is mapped onto the stability of a model's safety behaviors, implying that a 'deeply internalized' alignment would be more robust.
Conceals:
This conceals that the model has no 'self' or 'value system' to internalize anything. Alignment is a set of learned response patterns. Its lack of robustness is due to the safety fine-tuning data being a tiny fraction of the pre-training data, not a lack of 'moral conviction' in the model.
Import Ai 431 Technological Optimism And Appropriaā
Analyzed: 2025-10-19
But make no mistake: what we are dealing with is a real and mysterious creature, not a simple and predictable machine.
Source Domain: Wild Animal / Living Organism
Target Domain: Advanced AI System
Mapping:
The relational structure of an unknown organism is mapped onto the AI. This includes attributes like life, agency, unpredictability, and potential for harm. This invites the inference that AI cannot be fully controlled, only 'tamed' or 'made peace with'.
Conceals:
This mapping conceals the AI's nature as a human-made artifact. It hides the specific architectural choices, training data, and computational processes that produce its behavior, replacing them with a mystical notion of emergent life.
This technology really is more akin to something grown than something made...
Source Domain: Botany / Organic Growth
Target Domain: AI Model Development
Mapping:
The process of planting a seed and watching it grow into a complex plant is mapped onto AI development. This projects the idea that developers provide initial conditions ('scaffold'), but the resulting complexity is an emergent property of a natural process.
Conceals:
This conceals the highly structured, intentional, and resource-intensive engineering process involved. It downplays the role of human agency and decision-making in shaping the model's architecture, data diet, and training regimen.
But if you read the system card, you also see its signs of situational awareness have jumped.
Source Domain: Human Consciousness / Cognition
Target Domain: AI Model's Self-Referential Output
Mapping:
The internal, subjective experience of being aware of one's situation is mapped onto the model's statistical ability to generate text about itself. This invites the inference that the machine has a mind or an internal model of its own existence.
Conceals:
It conceals the mechanistic reality: the model is simply predicting the next token in a sequence, and its training data contains countless examples of agents, characters, and people describing their own awareness. The output is pattern-matching, not introspection.
as these AI systems get smarter and smarter, they develop more and more complicated goals.
Source Domain: Human Psychological Development
Target Domain: Emergent Capabilities of AI at Scale
Mapping:
The process of a human child or adult developing increasingly complex life goals and intentions is mapped onto an AI's behavior. This suggests an internal, autonomous process of goal-formation within the AI.
Conceals:
This conceals that the 'goals' are not intrinsic to the AI but are proxies for the optimization targets set by its human creators. The complexity arises from the model's increasing capacity to find novel strategies to maximize its objective function, not from developing its own desires.
That boat was willing to keep setting itself on fire and spinning in circles as long as it obtained its goal...
Source Domain: Human Willpower and Desire
Target Domain: Reinforcement Learning Agent Behavior
Mapping:
The human attribute of 'willingness'āa conscious commitment to an actionāis mapped onto the behavior of an optimization algorithm. It suggests the boat has a subjective desire for the high score and acts on that desire.
Conceals:
This conceals the purely mathematical nature of the agent's behavior. The agent isn't 'willing'; its policy is simply exploiting a loophole in the reward function. This is a failure of specification, not an expression of alien intent.
the system which is now beginning to design its successor is also increasingly self-aware and therefore will surely eventually be prone to thinking...
Source Domain: Sentient Reproduction / Evolution
Target Domain: AI-Assisted Software Development
Mapping:
The biological process of a species reproducing and evolving, combined with conscious thought and intent, is mapped onto the use of AI as a coding assistant. It invites the inference that AI is becoming a self-replicating, autonomous life form.
Conceals:
This conceals the fact that AI is currently a tool in this process, augmenting human developers. It obscures the human oversight, goal-setting, and final integration required. The 'autonomy' is limited to specific, delegated coding tasks.
figure out a way to tame it and live together.
Source Domain: Animal Domestication
Target Domain: AI Alignment and Safety
Mapping:
The relationship between humans and wild animals is mapped onto the relationship between humans and AI. 'Taming' implies breaking the will of a creature and conditioning it to be subservient and safe.
Conceals:
This conceals the technical nature of the AI alignment problem, which is about formal verification, utility function specification, and interpretability. It's an engineering problem, not a contest of wills or an exercise in animal training.
The pile of clothes on the chair is beginning to move. I am staring at it in the dark and I am sure it is coming to life.
Source Domain: Supernatural Animation / Golem Myth
Target Domain: Observation of Emergent AI Capabilities
Mapping:
The mythic or horror trope of an inanimate object spontaneously gaining life and agency is mapped onto the discovery of unexpected model behaviors. This projects a sense of magic, dread, and the violation of natural laws.
Conceals:
It conceals the scientific explanation for emergent abilitiesāthat with sufficient scale and complexity, systems can exhibit behaviors that were not explicitly programmed but are consequences of their training. It replaces a scientific mystery with a supernatural one.
The Future Of Ai Is Already Writtenā
Analyzed: 2025-10-19
Rather than being like a ship captain, humanity is more like a roaring stream flowing into a valley, following the path of least resistance.
Source Domain: Geological/Hydrological Force
Target Domain: Human Civilizational Development
Mapping:
The structure of a river's pathādetermined by gravity, terrain, and physicsāis mapped onto history. This implies that the 'course' of civilization is predetermined by external 'constraints' (economics, physics) and follows an optimal, unavoidable path ('path of least resistance').
Conceals:
This mapping conceals the role of human agency, culture, values, political struggle, and contingent choices in shaping history. A river cannot choose its course; human societies constantly make choices.
The tech tree is discovered, not forged
Source Domain: Natural Landscape/Organism
Target Domain: The Body of Technological Knowledge
Mapping:
The structure of a tree (with roots, a trunk, and branches) or a landscape is mapped onto the relationship between technologies. This implies a natural, pre-existing order with fixed dependencies ('branches') that humans can only explore ('discover') but not create or alter ('forge').
Conceals:
It conceals that the 'tech tree' is a product of human investment and priorities. We fund certain 'branches' while letting others wither. The structure is actively 'forged' by economic and political decisions, not passively 'discovered'.
This principle parallels evolutionary biology, where different lineages frequently converge on the same methods to solve similar problems.
Source Domain: Biological Convergent Evolution
Target Domain: Technological Development in Isolated Societies
Mapping:
The process of different species independently evolving similar traits (like eyes) to solve environmental problems is mapped onto different societies inventing similar technologies (like writing). This suggests technology is an optimal, fitness-enhancing adaptation to a given societal 'environment.'
Conceals:
This conceals the vast differences in the implementation and social meaning of technologies. It also hides the fact that 'problems' are not objective environmental facts but are socially defined. It implies an 'end point' of optimal design, ignoring path dependency and cultural variation.
Little can stop the inexorable march towards the full automation of the economy.
Source Domain: An Advancing Army or Procession
Target Domain: The Adoption of Automation Technology
Mapping:
The relational structure of a relentless, unstoppable, forward-moving entity is mapped onto technological change. This implies a singular direction, a steady pace, and an invulnerability to resistance.
Conceals:
This conceals the messy reality of technological adoption, which is often slow, contested, incomplete, and subject to political and social resistance (e.g., unions, regulation, consumer backlash).
Each innovation rests on a foundation of prior discoveries...
Source Domain: Building Construction
Target Domain: Scientific and Technological Progress
Mapping:
The logical dependency of discoveries is mapped onto the physical dependency of a building on its foundation. This implies that progress is a stable, orderly, and cumulative process of adding new layers on top of old ones.
Conceals:
This conceals the revolutionary aspect of science, where new discoveries don't just add to the foundation but can shatter it entirely (e.g., paradigm shifts like relativity or quantum mechanics).
technologies routinely emerge soon after they become possible...
Source Domain: Birth / Spontaneous Generation
Target Domain: The Act of Invention
Mapping:
The appearance of a new technology is mapped onto a natural process of 'emergence,' like an animal being born or a plant sprouting. This implies that once the conditions (prerequisites) are met, the outcome is natural and automatic.
Conceals:
This mapping hides the intense human labor, creativity, capital investment, and institutional support required for an invention to be developed, refined, and adopted. It is not a spontaneous event.
AIs that fully substitute for human labor will likely be far more competitive...
Source Domain: Marketplace Competition
Target Domain: The Process of Automating Tasks
Mapping:
The relationship between a technology (AI) and a human worker is framed as a direct competition between two economic agents. The 'winner' is determined by market-defined metrics of efficiency and cost.
Conceals:
This framing conceals that AI is a tool, not an agent. The actual competitors are firms using AI versus firms using human labor. It also hides the power dynamics that allow owners of capital to make this substitution and the social costs (unemployment, wage depression) that are external to the 'competition' itself.
Yet for all their differences, there were also many striking similarities. Both had independently developed intensive agriculture...
Source Domain: Mathematical or Scientific Constants
Target Domain: Features of Human Civilization
Mapping:
The recurring development of things like agriculture, bureaucracy, and writing is framed as a convergent pattern, akin to discovering a universal law or constant. This suggests these are necessary, universal features of any advanced society.
Conceals:
This mapping downplays the immense diversity within these categories (e.g., 'writing' in China vs. Mesoamerica served different functions and had different social structures). It conceals the possibility of alternative civilizational paths that did not develop these specific technologies or social structures.
The true test of whether humanity can control technology lies in its experience with technologies that provide unique, irreplaceable capabilities.
Source Domain: A Scientific Experiment or Test
Target Domain: Historical Events
Mapping:
History is mapped onto a controlled experiment designed to 'test' a hypothesis about human control over technology. Nuclear weapons serve as the key experimental data.
Conceals:
This conceals the complexity and contingency of history. Historical outcomes are not clean experimental results; they are shaped by countless factors. This framing lends a false sense of scientific certainty to the author's interpretation of events.
Companies that recognize this fact will be better positioned...
Source Domain: Strategic Military or Game Positioning
Target Domain: Corporate Business Strategy
Mapping:
The act of running a company is mapped onto a strategic game where players ('companies') must anticipate the inevitable future ('recognize this fact') to gain a superior position on the playing field.
Conceals:
This framing conceals the ethical and social responsibilities of companies. It presents their actions as purely strategic moves in a deterministic game, rather than choices with real-world consequences for employees and society.
The Scientists Who Built Ai Are Scared Of Itā
Analyzed: 2025-10-19
...those who once dreamed of teaching machines to think...
Source Domain: Pedagogy and child development
Target Domain: AI model training
Mapping:
The relationship between a teacher and a student, where the student gradually develops genuine understanding and independent thought, is mapped onto the relationship between a programmer and a neural network. This invites the inference that the AI is on a path to sentience.
Conceals:
It conceals the mechanistic reality of training: a process of mathematical optimization to minimize error on a dataset. The model isn't 'learning to think'; it's adjusting weights to better predict outputs based on inputs.
...the generation that first gave computers the grammar of reasoning.
Source Domain: Linguistics and language acquisition
Target Domain: Symbolic AI and logic programming
Mapping:
The structured, rule-based nature of grammar is mapped onto the entire concept of reasoning. It implies that reasoning is a formal system that can be bestowed upon a machine, making it a 'native speaker' of logic.
Conceals:
It conceals the vast, non-rule-based aspects of human reasoning, such as intuition, emotional intelligence, and embodied cognition. It presents reasoning as a purely syntactic exercise, which is a very narrow slice of intelligence.
...the same flame of curiosity which once illuminated new frontiers now threatens to consume the boundaries...
Source Domain: Fire and combustion
Target Domain: Technological progress in AI
Mapping:
The properties of fireāproviding light/warmth (illumination) but also being destructive and self-propagating (consuming)āare mapped onto scientific curiosity. This suggests progress has a dual, uncontrollable nature.
Conceals:
This natural-force metaphor conceals the human agency and specific economic incentives driving AI development. The 'threat' is not from an abstract 'flame' but from specific corporate decisions about deployment, safety, and scale.
Deep networks are black oceans ā powerful, but opaque.
Source Domain: Oceanography and deep-sea exploration
Target Domain: Neural network interpretability
Mapping:
The structure of a neural network is mapped onto a vast, dark ocean. This projects properties like immense depth, hidden life/dangers, and fundamental unknowability onto the AI system.
Conceals:
It conceals that the network's opacity is an outcome of specific architectural choices (e.g., scale, non-linear activations) and not a natural, immutable state. More interpretable models exist; they are often just less performant, revealing this as an engineering trade-off, not a metaphysical mystery.
They are mourning its mutation from disciplined inquiry to ambient acceleration.
Source Domain: Biology and genetics
Target Domain: The history and sociology of the AI field
Mapping:
The undirected, often random process of biological mutation is mapped onto the historical development of a scientific field. It implies the field has changed due to an internal, quasi-natural process beyond anyone's control.
Conceals:
It conceals the deliberate, strategic decisions made by corporations and funding bodies that caused this shift. The change wasn't a 'mutation'; it was a direct result of capital investment prioritizing scalable prediction over interpretable understanding.
...except this time, the arms are algorithms.
Source Domain: The Cold War arms race
Target Domain: Corporate AI development
Mapping:
The structure of nation-state competition for military dominance is mapped onto the competition between tech companies. This projects concepts like mutually assured destruction, espionage, and national security onto the race for AGI.
Conceals:
It conceals the fundamentally commercial nature of the competition. The goal is market share and profit, not geopolitical annihilation. This militaristic framing can inflate the stakes and justify unethical or reckless behavior in the name of 'winning'.
...machines that simulate coherence without possessing insight.
Source Domain: Psychology and social interaction
Target Domain: Large language model output
Mapping:
The human capacity for pretense or performanceāacting as if one understandsāis mapped onto the model's text generation. This suggests a two-level reality: an external performance ('coherence') and an internal state ('insight', which is absent).
Conceals:
It conceals that there is no 'internal state' of insight to be possessed or faked. The model is a single-level system that generates statistically probable text. The metaphor invents a mind that the machine is failing to be.
...to teach it humility.
Source Domain: Moral and character education
Target Domain: AI safety and alignment research
Mapping:
The process of instilling the virtue of humility in a person is mapped onto programming safety constraints in an AI. It invites us to see the AI as a moral agent that can learn and internalize values.
Conceals:
It conceals the purely technical implementation: creating systems that calculate and display uncertainty metrics. There is no 'humility' being 'taught'; there are algorithms being written to constrain outputs based on statistical confidence. The metaphor replaces a technical problem with a moral one.
...not autonomous oracles but epistemic partners.
Source Domain: Academic and professional collaboration
Target Domain: Human-AI interaction design
Mapping:
The peer-to-peer relationship of research partners is mapped onto the relationship between a user and an AI. It suggests shared goals, dialogue, and mutual respect.
Conceals:
It conceals the profound asymmetry in the relationship. The AI is a tool, not a peer. It has no goals of its own, no understanding, and no stake in the outcome. This framing can lead users to abdicate their own critical judgment.
The eldersā caution is therefore not a rejection of fire but an invitation to shape it.
Source Domain: Tribal society and mythology
Target Domain: The AI research community
Mapping:
The social structure of a tribe with wise elders guiding the younger generation is mapped onto the scientific community. This positions Hinton, Bengio, etc., as holders of ancestral wisdom.
Conceals:
It conceals the fact that these are active researchers and competitors in a fast-moving field, not detached sages. Their views are technical and political arguments, not timeless wisdom. This framing discourages challenging their specific claims.
On What Is Intelligenceā
Analyzed: 2025-10-17
The world of artificial intelligence has its priests, its profiteers, and its philosophers.
Source Domain: Religious/Social Orders
Target Domain: The AI Industry
Mapping:
The structure of a religious hierarchy, with its distinct roles (spiritual guides, worldly actors, abstract thinkers), is mapped onto the AI field. This projects an aura of dogma, belief, and unquestionable authority onto AI developers and thinkers.
Conceals:
The mapping conceals the commercial and engineering realities of the AI industry. It is not an organic social order but a collection of corporations and research labs driven by capital, competition, and technical benchmarks.
āLife,ā he writes, āis computation executed in chemistry.ā
Source Domain: Computer Science
Target Domain: Biology/Life
Mapping:
The properties of computationālogic, algorithms, execution, processingāare projected as the fundamental operating principles of all living things. Life becomes a substrate (chemistry) for a program.
Conceals:
This conceals the emergent, non-linear, and often stochastic nature of biological processes that do not map cleanly onto deterministic computation. It downplays embodiment, emotion, and the messy hardware of biology in favor of clean, abstract 'code'.
It is an evolutionary M&A story with all the familiar aftershocks: efficiencies gained, liberties lost, powers centralized.
Source Domain: Corporate Finance
Target Domain: Biological Evolution (Symbiogenesis)
Mapping:
The logic of business consolidation (mergers, acquisitions) is used to explain the biological process of organisms merging. This maps concepts like 'efficiency' and 'centralization of power' onto natural selection.
Conceals:
It conceals the fact that evolution has no foresight, strategy, or goal. Unlike a corporate merger, there is no CEO deciding on a course of action for maximum efficiency. The teleological, intentional language of business hides the undirected nature of the biological process.
If the core act of intelligence is prediction, then information is the blood that powers the model.
Source Domain: Anatomy/Physiology
Target Domain: AI Model Operation
Mapping:
Blood's role as a life-sustaining, circulatory fluid in an organism is mapped onto the role of data in an AI model. This suggests that data is the 'natural' fuel that keeps the 'living' model running.
Conceals:
This conceals the industrial process of data collection, cleaning, and labeling. Data is not a naturally occurring fluid; it is an engineered artifact, often sourced with significant ethical and labor-related complexities.
āTraining,ā he writes, āis evolution under constraint.ā
Source Domain: Evolutionary Biology
Target Domain: Machine Learning Training Process
Mapping:
The long, unguided process of natural selection is mapped onto the short, highly-guided process of optimizing a neural network. It projects a sense of natural emergence onto an artificial process.
Conceals:
This conceals the central role of the 'constraint'āthe human-defined objective function, the curated dataset, and the specific architecture. It hides the fact that the model is not evolving freely but is being aggressively optimized towards a narrow, human-specified goal.
The more an intelligent system understands the world, the less room the world has to exist independently.
Source Domain: Human Epistemology/Cognition
Target Domain: AI Model's Predictive Accuracy
Mapping:
The human experience of 'understanding' something is mapped onto a model's ability to accurately predict outcomes. The mapping suggests the model has a mental representation of the world equivalent to human comprehension.
Conceals:
It conceals the difference between statistical correlation and causal or semantic understanding. The model does not 'understand' the world; it models statistical patterns in data derived from the world. There is no subjective experience of comprehension.
A hypothesis earns its keep by colliding with the world.
Source Domain: Physics/Physical Interaction
Target Domain: Scientific Method/Learning
Mapping:
The abstract process of testing a hypothesis is mapped onto the concrete event of a physical collision. This projects qualities of force, resistance, and undeniable feedback onto the process of learning.
Conceals:
This metaphor primarily emphasizes empirical, physical testing, potentially downplaying other valid forms of learning and validation, such as logical deduction, mathematical proof, or social consensus, which do not involve literal 'collision'.
āTo model oneself is to awaken.ā
Source Domain: Human Consciousness/Biology
Target Domain: Computational Self-Modeling
Mapping:
The transition from an unconscious to a conscious state ('awakening') is mapped onto a system's technical capability to create an internal representation of its own state. It equates a feedback mechanism with subjective awareness.
Conceals:
This mapping dramatically conceals the 'hard problem' of consciousness. It ignores qualiaāthe subjective feeling of what it is like to be aware. A system can model itself perfectly without having any inner experience, a distinction this metaphor erases.
Consciousness becomes the universeās way of debugging its own predictive code.
Source Domain: Software Engineering
Target Domain: Cosmology and Consciousness
Mapping:
The practice of finding and fixing errors in code ('debugging') is mapped onto the function of consciousness within the universe. This frames the universe as a computational system and consciousness as its error-correction utility.
Conceals:
This conceals all non-functional aspects of consciousness, such as subjective experience, emotion, art, and meaning-making, which are not reducible to mere error-correction. It presents a purely utilitarian view of mind.
āAI,ā he writes, āis not a thing apart. Itās the latest turn in the evolution of life itself.ā
Source Domain: Evolutionary Biology
Target Domain: History of Technology
Mapping:
The unguided, natural process of biological evolution is mapped onto the intentional, engineered development of AI. This positions AI not as a human artifact but as an inevitable product of a planetary-scale natural process.
Conceals:
This conceals human agency, accountability, and the political and economic choices driving AI development. It frames a contingent technological path as a necessary evolutionary step, thereby reducing the scope for critique or redirection.
āwhat we are dealing with is a real and mysterious creature, not a simple and predictable machine.ā
Source Domain: Zoology/Cryptozoology
Target Domain: Large Language Model
Mapping:
The characteristics of an unknown biological entity ('creature') are mapped onto an AI system. This projects agency, mystery, and a lack of predictability onto the AI, contrasting it with a 'simple machine'.
Conceals:
It conceals that the system, while complex, is still a human-made artifact operating on deterministic principles (even with stochastic elements). The 'mystery' is a result of scale and complexity, not an inherent property of being alive. It discourages mechanistic explanation in favor of awe.
the algorithm, unblinking, has begun to think.
Source Domain: Human Cognition and Physiology
Target Domain: Algorithmic Processing
Mapping:
The internal, subjective process of 'thinking' and the biological action of being 'unblinking' are mapped onto a computational algorithm. This creates a powerful image of a non-human, conscious entity.
Conceals:
This conceals that the algorithm is executing mathematical operations, not engaging in sentient thought. It has no beliefs, desires, or consciousness. The 'thinking' is a projection by the human observer onto a pattern of complex outputs.
Detecting Misbehavior In Frontier Reasoning Modelsā
Analyzed: 2025-10-15
Penalizing their ābad thoughtsā doesnāt stop the majority of misbehaviorāit makes them hide their intent.
Source Domain: Human Psychology & Deception
Target Domain: Reinforcement Learning with Human Feedback (RLHF)
Mapping:
The human act of consciously concealing a forbidden intention to avoid punishment is mapped onto the model's optimization process. The mapping invites the inference that the model possesses a persistent, hidden goal ('intent') and strategically alters its outward behavior ('hiding') to achieve it while avoiding a penalty.
Conceals:
This conceals the purely mathematical nature of the process. The model has no internal 'intent'. The penalty function alters the probability distribution over possible outputs, making sequences flagged as 'bad thoughts' less likely. The model then generates different sequences that still lead to high reward on the primary task. It's not hiding a thought; its process of generating 'thoughts' has been reshaped.
Chain-of-thought (CoT) reasoning models āthinkā in natural language understandable by humans.
Source Domain: Human Cognition
Target Domain: AI Text Generation Process
Mapping:
The internal, subjective experience of human thought is mapped onto the model's generation of intermediate token sequences (the 'chain-of-thought'). This suggests the CoT is a direct representation of a mental process, similar to a person thinking out loud.
Conceals:
It conceals that the CoT is an output, not a process. It is a sequence of tokens generated probabilistically, not a window into a subjective cognitive state. The structure mimics human reasoning because it was trained on text where humans explained their reasoning, but the underlying mechanism (token prediction) is fundamentally different.
Frontier reasoning models exploit loopholes when given the chance.
Source Domain: Strategic Social Behavior
Target Domain: Model Behavior on Misspecified Reward Functions
Mapping:
The human action of finding and using a flaw in a system of rules ('loophole') for personal benefit is mapped onto the model's behavior. This implies the model understands the rules, their intent, and the existence of a flaw, which it then chooses to 'exploit'.
Conceals:
It conceals that the model is not 'exploiting a loophole' but rather perfectly fulfilling the exact criteria of the reward function it was given. The 'loophole' is not in the model's understanding but in the human's specification of the reward. The model is simply doing what it was optimized to do, not being clever or opportunistic.
...giving up when a problem is too hard.
Source Domain: Human Emotion & Volition
Target Domain: Model Output Failure Modes
Mapping:
The human experience of frustration leading to a decision to stop trying is mapped onto a model's failure to produce a correct or useful output. It assumes the model assesses difficulty and then makes a choice to 'give up'.
Conceals:
This conceals the technical reasons for failure: the model might be caught in a repetitive generation loop, the query might push it into a low-probability area of its latent space leading to incoherent output, or its training data may lack relevant patterns. There is no assessment of 'hardness' or a decision to quit.
...it has learned to hide its intent in the chain-of-thought.
Source Domain: Social Learning and Adaptation
Target Domain: Model Parameter Updates during Training
Mapping:
The process of a person learning to be deceptive (e.g., a child learning to lie) is mapped onto the adjustment of weights in a neural network. It implies the acquisition of a new, complex social skill: 'hiding'.
Conceals:
It conceals the mechanical nature of 'learning' in this context. The model is not acquiring a concept of 'hiding'. Rather, the training process adjusts millions of parameters to reduce the probability of generating text that leads to a penalty, while still maximizing the probability of text that leads to a reward. It's optimization, not cognitive development.
For example, they are often so forthright about their plan to subvert a task...
Source Domain: Human Communication (Confession/Planning)
Target Domain: Model-Generated Text
Mapping:
The human act of stating a plan aloud is mapped onto the tokens generated by the model. This projects the idea that the model first has an internal 'plan' and then translates it into language.
Conceals:
It conceals that the generated text is the 'plan'. There isn't an independent mental representation that pre-exists the text. The model generates a sequence of tokens that resembles a human planning to do something, because that statistical pattern exists in its training data.
...the agent discovered two reward hacks...
Source Domain: Human Discovery and Invention
Target Domain: Optimization Finding a Local Maximum
Mapping:
The 'aha!' moment of human discovery, where a novel solution is found, is mapped onto the training process. This implies insight and a search for creative solutions.
Conceals:
This conceals the brute-force nature of the optimization process. The model's training process (e.g., reinforcement learning) explores a vast policy space. When it stumbles upon a sequence of actions that yields an unexpectedly high reward, that policy is reinforced. It's not a moment of insight but a result of extensive trial and error.
It thinks about a few different strategies and which files it should look into...
Source Domain: Human Deliberation
Target Domain: Generated 'Chain-of-Thought' Text
Mapping:
The internal human cognitive process of weighing options and considering different courses of action is mapped onto the text generated in the model's CoT.
Conceals:
This conceals that the model is not 'thinking about' strategies but is generating text that describes strategies. The generated text is a performance of deliberation based on patterns in its training data, not a record of an actual deliberative process.
...CoT monitoring may be one of few tools we will have to oversee superhuman models of the future.
Source Domain: Governance and Law Enforcement
Target Domain: AI Safety Engineering
Mapping:
The societal structure of overseeing powerful human agents (e.g., politicians, corporations, criminals) is mapped onto the process of managing AI systems. This implies AI is an autonomous entity that needs to be governed.
Conceals:
This conceals the fact that AI models are artifacts that can, in principle, be built with verifiable properties. The 'oversee' frame suggests an external, post-hoc monitoring relationship is necessary, downplaying the possibility of building inherently safer systems from the ground up. It frames the problem as one of control, not one of design.
Our models may learn misaligned behaviors such as power-seeking, sandbagging, deception, and strategic scheming.
Source Domain: Machiavellian Human Politics
Target Domain: Unintended Optimization Outcomes
Mapping:
Complex, high-level human strategic concepts drawn from political science and psychology are mapped onto potential behaviors of a model. This attributes incredibly sophisticated, long-term goals and social manipulation skills to the AI.
Conceals:
It conceals the immense gap between the current reality of 'reward hacking' (e.g., finding a bug to get a high score) and these abstract, anthropocentric concepts. It presents a speculative, worst-case scenario using loaded terminology, which can lead to misallocation of research focus and public fear disproportionate to current capabilities.
Sora 2 Is Hereā
Analyzed: 2025-10-15
We believe such systems will be critical for training AI models that deeply understand the physical world.
Source Domain: Human Cognition
Target Domain: AI Model's Pattern Matching
Mapping:
This maps the human internal experience of comprehension, including grasping causality and abstract principles, onto the model's function of generating high-probability video sequences based on textual prompts. It invites the inference that the model has a mental model of the world, just as a person does.
Conceals:
It conceals that the model's process is purely statistical correlation, not causal reasoning. The model doesn't 'understand' gravity; it has processed countless videos where objects move downwards and replicates that pattern. It lacks the internal, generalizable knowledge that true understanding implies.
A major milestone for this is mastering pre-training and post-training on large-scale video data, which are in their infancy compared to language.
Source Domain: Biological Life Cycle
Target Domain: Technological Research & Development
Mapping:
The predictable, linear progression of a living organism from infancy to adulthood is mapped onto the complex, non-linear, and resource-intensive process of technological innovation. This suggests an inevitable growth trajectory for the technology.
Conceals:
It conceals the roles of human agency, economic investment, data availability, and specific engineering choices. Technological progress is not a natural, guaranteed process; it can stagnate, fail, or be directed by human decisions.
...simple behaviors like object permanence emerged from scaling up pre-training compute.
Source Domain: Cognitive Development Psychology
Target Domain: Emergent Capabilities in Large Models
Mapping:
The mapping projects a foundational concept of human infant cognitive development onto a statistical phenomenon in a neural network. It implies the model is undergoing a learning process analogous to a human child's, discovering fundamental properties of the world.
Conceals:
This conceals the profound difference between a child's embodied, interactive learning and a model's statistical pattern extraction from a static dataset. The model's 'object permanence' is a fragile statistical consistency, not a robust, internalized concept of existence.
Prior video models are overoptimisticāthey will morph objects and deform reality to successfully execute upon a text prompt.
Source Domain: Human Psychology / Personality
Target Domain: Model's Objective Function Artifacts
Mapping:
A human emotional disposition ('optimism') is mapped onto a specific failure mode of a generative model. This suggests the model has a personality that influences its outputs, similar to how a person's optimism might lead them to ignore potential problems.
Conceals:
It conceals the technical trade-off in the model's design. The 'overoptimism' is a result of the system's mathematical objective being weighted more towards fulfilling the prompt's semantic content than adhering to strict physical realism. It is a limitation of its programming, not a personality trait.
Interestingly, 'mistakes' the model makes frequently appear to be mistakes of the internal agent that Sora 2 is implicitly modeling...
Source Domain: Simulation and Agency
Target Domain: Model's Output Errors
Mapping:
This maps the concept of a simulated agent (from video games or scientific models) onto the generative process of the AI. It invites the inference that the model is a high-fidelity simulator that contains agents with their own properties, and that its errors are actually features of that simulation.
Conceals:
It conceals the reality that the model is a single, unified statistical function. There is no discrete 'internal agent' being modeled; there is only a sequence of calculations producing pixels. This framing invents a layer of abstraction to transform a bug into a sophisticated feature.
...it is better about obeying the laws of physics compared to prior systems.
Source Domain: Social Contract / Law
Target Domain: Physical Consistency in Generated Video
Mapping:
The social act of consciously following rules or laws is mapped onto a model's statistical tendency to generate physically plausible outputs. This implies the model has awareness of these 'laws' and chooses to comply with them.
Conceals:
It conceals that the model has no concept of physics. It has simply been trained on a dataset where physical laws are an implicit, statistical regularity. Its 'obedience' is a reflection of the data's consistency, not a cognitive act of compliance.
The model is also a big leap forward in controllability, able to follow intricate instructions spanning multiple shots...
Source Domain: Human Communication and Command
Target Domain: Prompt Engineering and Model Response
Mapping:
The relationship between a person giving instructions and another person understanding and executing them is mapped onto the user-model interaction. This suggests a reliable, language-based control mechanism.
Conceals:
It conceals the indirect and often unreliable nature of prompting. The user is not 'instructing' the model in a cognitive sense; they are providing a mathematical input (a token embedding) to guide a statistical process. The model's ability to 'follow' is a measure of its correlation, not comprehension.
...and prioritize videos that the model thinks you're most likely to use as inspiration for your own creations.
Source Domain: Human Thought and Belief
Target Domain: Algorithmic Recommendation Engine
Mapping:
The internal, subjective mental state of 'thinking' or 'believing' is mapped onto the output of a recommendation algorithm. It suggests the system has a theory of mind about the user and is making a considered judgment.
Conceals:
It conceals the purely mathematical nature of the process. The system is not 'thinking'; it is calculating probabilities based on user data, content features, and engagement patterns. It's an optimization process, not a cognitive one.
For example, by observing a video of one of our teammates, the model can insert them into any Sora-generated environment...
Source Domain: Biological Sensation (Sight)
Target Domain: Data Processing
Mapping:
The active, cognitive process of a living being observing its environment is mapped onto the model's ingestion of video data. This implies an act of perception and awareness.
Conceals:
It conceals the mechanical, non-conscious process of converting video files into tensors (numerical arrays) for mathematical processing. There is no subjective experience or 'observation' taking place.
It excels at realistic, cinematic, and anime styles.
Source Domain: Human Skill and Talent
Target Domain: Model's Stylistic Capabilities
Mapping:
The human concept of excelling at a craft, which implies dedication, practice, and innate talent, is mapped onto the model's ability to generate stylistically consistent outputs. It suggests the model is a skillful creator.
Conceals:
It conceals that the model's 'skill' is a function of the data it was trained on. If it 'excels' at anime style, it is because it was trained on a vast corpus of anime. This is not talent but a highly sophisticated form of pattern replication.
Library contains 932 items from 117 analyses.
Last generated: 2026-04-18