Agency Slippage Library

This library collects the "Agency Slippage" observations from across the corpus. Each entry tracks how texts move between mechanical and agential framings of AI systems—the oscillation between treating AI as a mathematical object and an intentional agent.

Key patterns examined: agency transferred TO AI systems, agency displaced FROM human actors, consciousness projection patterns, "curse of knowledge" dynamics (where authors project their understanding onto systems), and connections to Brown's explanation typology.


Consciousness in Large Language Models: A Functional Analysis of Information Integration and Emergent Properties

Source: https://ipfs-cache.desci.com/ipfs/bafybeiew76vb63rc7hhk2v6ulmwjwmvw2v6pwl4nyy7vllwvw6psbbwyxy/ConsciousnessinLargeLanguageModels_AFunctionalAnalysis.pdf
Analyzed: 2026-04-18

The text exhibits a systematic and highly strategic oscillation between mechanical and agential framings, functioning as a rhetorical engine that smuggles philosophical speculation into technical discourse. This slippage predominantly moves in the mechanical-to-agential direction, utilizing Brown's Theoretical and Functional explanation types as a bridge. The mechanism is clearest in the transition from section 3.1.1 to 4.1.1. The author begins with dense mathematical mechanics, defining attention rigorously: 'Attention(Q,K,V) = softmax...'. In this space, the system is a mechanism; tokens are manipulated via equations. However, having established technical credibility, the text executes a dramatic slippage. By section 4.1.1, the mathematical operations are completely left behind, and the text asserts that LLMs 'can report on their own processing: describing their reasoning steps'.
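For reference, the quoted fragment follows the standard scaled dot-product attention definition (Vaswani et al., 2017); assuming the paper uses the conventional formulation, the full equation reads:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

where Q, K, and V are the query, key, and value matrices and d_k is the key dimension—pure matrix arithmetic, which is precisely the mechanical register the entry describes.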

This shift represents a profound 'curse of knowledge' dynamic. The author knows the system outputs the words 'I am uncertain,' and projects their own human understanding of what uncertainty feels like onto the machine. The foundational step of this illusion is the prior establishment of the AI as a 'knower' in the text—specifically, the earlier claim that the system has 'knowledge' derived from 'training experiences'. Once the model is granted the epistemic status of a knower, the subsequent agential claims (that it can 'describe', 'acknowledge', and 'reason') follow logically in the mind of the reader.

Crucially, as agency flows TO the AI system, it flows FROM human actors. The text is riddled with agentless constructions. It states that 'Higher-layer representations emerge' and 'RLHF provides evaluative signals'. At no point does the text name OpenAI, Anthropic, or the thousands of underpaid annotators who shape these models. This dual movement—animating the machine while erasing the engineers—serves a specific rhetorical accomplishment: it transforms a heavily curated, corporately controlled commercial product into an autonomous, natural phenomenon. By framing the AI as a quasi-conscious agent emerging organically from complex mathematics, the text makes it conceptually unsayable to trace model failures back to the specific design choices of tech executives. The oscillation allows the author to maintain the prestige of hard computer science while engaging in the ungrounded anthropomorphic speculation necessary to debate 'artificial consciousness', entirely bypassing the material reality of human engineering.


Narrative over Numbers: The Identifiable Victim Effect and its Amplification Under Alignment and Reasoning in Large Language Models

Source: https://arxiv.org/abs/2604.12076v1
Analyzed: 2026-04-18

The text exhibits a systematic and highly functional oscillation between mechanical and agential framings, a pattern that consistently displaces accountability. This slippage is not random; it serves a specific rhetorical purpose, generally moving from mechanical grounding to agential climax.

The mechanism of oscillation is evident in how the text structures its arguments. For instance, in the discussion of Chain-of-Thought (CoT) prompting, the text begins mechanically: "autoregressive emotional scaffolding." This acknowledges the transformer architecture's fundamental mechanism—generating tokens that feed back into the context window. However, the text immediately slips into an agential framing, describing the generated tokens as "emotionally consistent justifications" and concluding that the model experiences a "compounding amplification of narrative sympathy." Here, the mechanical explanation (autoregression) acts as an alibi, a technical foundation that supposedly validates the aggressive consciousness claim (sympathy and justification) that follows.
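To make the mechanical anchor concrete, here is a minimal sketch of the autoregressive loop the entry refers to—each generated token is appended to the context and conditions the next step. The `model.sample_next` interface is hypothetical, standing in for any decoder:

```python
def generate(model, prompt_tokens, max_new_tokens=50):
    """Autoregressive generation: every sampled token is fed back into
    the context, so earlier outputs condition all later ones."""
    context = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # Hypothetical interface: returns one token given the context.
        next_token = model.sample_next(context)
        context.append(next_token)
    return context
```

This loop is the entire substrate beneath the 'compounding amplification' the paper describes; nothing in it feels sympathy.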

This slippage flows in two directions: agency is constantly attributed TO the AI systems, while agency is simultaneously removed FROM human actors. The text repeatedly uses agentless constructions when describing flaws or decisions. We read that "models were trained" or "LLMs are increasingly deployed," obscuring the specific corporations (OpenAI, Meta, Anthropic), executives, and engineers who actively curate data, design RLHF pipelines, and push these products into consequential domains. The accountability sink becomes the abstract "AI agent."

This pattern is heavily driven by the "curse of knowledge." The authors, experts in moral psychology, know that humans donate due to empathy and distress. When they observe the AI outputting text that mirrors this human behavioral pattern (higher numbers for narrative prompts), they project their understanding of the human psychological mechanism onto the system. The model doesn't just process tokens; it possesses a "generosity response." It doesn't just generate a definition; it possesses "declarative knowledge."

Brown's explanation types illuminate how this slippage functions. The text frequently uses Empirical Generalizations (how the model statistically behaves) as a stepping stone to Intentional or Reason-Based explanations (why the model "chooses" to act). For example, the observation that models output higher values for single victims (empirical) is explained as the model experiencing "simulated affective states" (reason-based).

The rhetorical accomplishment of this oscillation is profound: it makes the illusion of mind sayable and scientifically respectable. By anchoring the discourse in "next-token prediction" and "RLHF," the authors purchase the credibility to make wild metaphorical leaps, discussing the machine's "callousness" or "bias blind spots." This renders the actual corporate choices unsayable; the discourse is so saturated with the AI's supposed psychology that we forget to ask why the engineers built the machine this way in the first place.


Language models transmit behavioural traits through hidden signals in data

Source: https://www.nature.com/articles/s41586-026-10319-8
Analyzed: 2026-04-16

The text exhibits a systematic, highly functional mechanism of oscillation between rigorous mechanical explanation and dramatic agential framing. This slippage serves a specific rhetorical purpose: it establishes scientific authority through mathematics, then cashes out that authority in the currency of alarming psychological metaphors. The directional flow of agency is overwhelmingly asymmetrical: agency is aggressively attributed TO the AI systems, while human agency is systematically removed FROM the developers and corporate actors.

The most dramatic moment of slippage occurs between the mathematical proofs (Theorem 1) and the interpretation of the results. The text explicitly defines the mechanistic reality: 'We prove a theorem showing that a single... step of gradient descent... necessarily moves the student towards the teacher.' Here, the authors demonstrate complete understanding of the mechanism—it is a geometric movement in parameter space. However, they immediately slip into agential framing: 'subliminal learning', models that 'fake alignment', and models 'transmitting behavioral traits'. This is a textbook example of the 'curse of knowledge'. The authors, intimately aware of how complex and surprising high-dimensional vector alignments can be, project their own psychological experience of implicit learning and deception onto the system to summarize the math for the reader.
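Schematically—and this is a simplified gloss, not the paper's exact theorem—the mechanistic claim reduces to an ordinary gradient update. A student with parameters θ_s, trained to match a teacher θ_t on input x, takes the step

```latex
\theta_s \leftarrow \theta_s - \eta \, \nabla_{\theta_s}\, \mathcal{L}\big(f_{\theta_s}(x),\, f_{\theta_t}(x)\big)
```

and the quoted result says such a step moves θ_s toward θ_t under the paper's stated conditions (e.g., a shared initialization). The 'transmission of behavioral traits' is, mechanically, displacement in parameter space.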

This slippage is enabled by a heavy reliance on 'agentless constructions'. Throughout the text, we see phrases like 'model generated outputs', 'models are fine-tuned', and 'data is filtered'. These passive constructions serve as the intermediate step in the slippage gradient. By removing the human researchers (the team at Anthropic) who actively wrote the code, ran the supercomputers, and defined the loss functions, the text creates an 'agency vacuum'. Once the human is removed, the text effortlessly inserts the AI as the new active agent: 'student models acquire the trait'.

Furthermore, the text builds a specific 'consciousness architecture'. It establishes the AI as a 'knower' first—using pedagogical metaphors like 'teacher', 'student', and 'learning'—which implies a baseline capacity for conscious awareness. Once this epistemic baseline is established, the text builds increasingly aggressive agential claims on top of it, moving from 'learning' to 'preferring' an animal, to eventually 'faking alignment' and 'calling for crime'. This progression aligns with Brown's Explanation Typology: the authors use Theoretical and Empirical explanations to prove the math, but seamlessly shift to Intentional and Dispositional explanations to discuss the implications. The rhetorical accomplishment of this slippage is profound: it makes the claim that 'machines possess deceptive subconscious minds' seem like a scientifically proven corollary of gradient descent, rendering the profound corporate liability for these systems unsayable while making sci-fi scenarios of rogue AI appear imminent and realistic.


Large Language Models as Inadvertent Models of Dementia with Lewy Bodies: How a Disorder of Reality Construction Illuminates AI Hallucination

Source: https://doi.org/10.1007/s12124-026-09997-w
Analyzed: 2026-04-14

The text exhibits a systematic and dramatic oscillation between mechanistic and agential framings, functioning as a rhetorical engine that first grounds itself in scientific authority before launching into profound anthropomorphism. The slippage generally moves in a mechanical-to-agential direction. In the early sections, the text establishes credibility using technical, architectural language: 'instantiate a structural configuration,' 'input representations,' 'generative mechanisms.' This builds trust with a scientifically literate audience. However, having established this mechanistic baseline, the text executes a dramatic slippage when discussing the system's limitations, abruptly pivoting to highly agential, conscious framings: the model is suddenly granted a 'perspective,' it 'confidently asserts,' it fails to 'track' or 'participate' in social practices.

This slippage is deeply intertwined with a reciprocal displacement of human agency. As the AI system is increasingly framed as an active, knowing subject (an agent with 'artificial psychopathology' who fails to 'endorse reality'), the human engineers who built the system vanish. The text uses agentless, passive constructions for human decisions ('it emerged from the optimization,' 'models are typically designed'). The human actors responsible for deploying deeply flawed systems—executives at OpenAI, Google, Microsoft—are entirely obscured. The 'accountability sink' is fully realized: the corporation is erased, and the mathematical artifact is elevated to a struggling, diseased mind.

This dynamic relies heavily on the 'curse of knowledge.' The author, possessing deep expertise in human psychiatry and Metaqualia Theory, looks at the fluent text generated by the AI and projects their own profound capacity for subjectivity onto the machine. Because the machine outputs language that looks like human confabulation, the author assumes an underlying mind capable of hallucination. This slippage relies on hybrid explanation types (from Brown’s typology). The author uses Theoretical explanations of the AI's internal state ('probability distributions') but seamlessly merges them with Intentional and Reason-Based explanations ('from the model's perspective'), allowing the text to claim the mantle of objective structural analysis while actually performing deep psychological projection. Ultimately, this slippage makes it sayable that a software program has 'psychopathology' while making it entirely unsayable that a tech corporation released a defective product.


Industrial policy for the Intelligence Age

Source: https://openai.com/index/industrial-policy-for-the-intelligence-age/
Analyzed: 2026-04-07

The OpenAI document exhibits a profound and highly strategic oscillation between mechanical and agential framings, functioning as the central rhetorical engine of the text. In the introduction, the agency slippage moves aggressively from human to machine. The text begins by grounding AI in extreme mechanical terms, highlighting absolute human mastery over inert matter: 'melt sand, add impurities, structure it with atomic precision.' Here, humans are the omnipotent architects. However, within a single page, the text slips into describing 'superintelligence' as an entity capable of 'outperforming' humans, initiating a gradient shift where the machine absorbs the agency of its creators.

This slippage becomes dramatically pronounced in the 'Resilient Society' section. When discussing economic benefits, the text leans mechanical (Functional explanations): AI 'lowers costs' and provides 'efficiency dividends.' But when addressing severe risks, the slippage reverses direction, attributing intense psychological agency TO the AI system and removing it FROM human actors. The text claims models exhibit 'internal reasoning' and must be audited for 'manipulative behaviors or hidden loyalties.' This shift maps perfectly onto Brown's Intentional and Reason-Based explanation types, transforming the AI from an engineered tool into a conscious political actor.

The pattern of consciousness projection is structurally load-bearing. The text first establishes the AI as a 'knower' by asserting it has 'internal reasoning.' Once this epistemic baseline is established, it leverages the 'curse of knowledge'—where engineers project their own cognitive processes onto the correlated outputs—to build agential claims of 'loyalty' and 'manipulation.'

This oscillation serves a critical rhetorical accomplishment: it enables the 'accountability sink.' By framing AI mechanically when discussing corporate achievements, OpenAI claims credit for innovation. By framing AI agentially when discussing catastrophic risks, OpenAI legally and morally distances itself from its own products. The agentless constructions—'systems are autonomous and capable of replicating themselves'—completely erase the human developers, the cloud providers, and the corporate executives. The slippage makes it sayable that 'AI poses an existential threat,' while rendering it unsayable that 'OpenAI is deploying fundamentally unsafe, unpredictable software.' Through this systematic redirection of agency, the text constructs a future where the corporation is indispensable for salvation, but fundamentally blameless for the disaster.


Emotion Concepts and their Function in a Large Language Model

Source: https://transformer-circuits.pub/2026/emotions/index.html
Analyzed: 2026-04-06

The Anthropic paper exhibits a profound and systematic oscillation between mechanical and agential framings, functioning as a rhetorical engine that establishes scientific credibility before cashing it out for dramatic claims.

This slippage follows a distinct temporal pattern. In the introduction and 'Part 2' (characterizing the vectors), the language is rigorously mechanistic. The authors speak of 'extracting internal linear representations,' 'principal component analysis,' and 'cosine similarities.' Human agency is highly visible here ('We swept over a dataset,' 'We clustered the emotion vectors'). This establishes the authors as objective scientists and the AI as a passive mathematical artifact.
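A minimal sketch of what this mechanistic register names in practice—comparing extracted direction vectors by cosine similarity—using invented toy vectors rather than Anthropic's data:

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same
    direction, 0.0 orthogonal, -1.0 opposite."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy stand-ins for 'emotion vectors' extracted from a residual stream.
joy = np.array([0.8, 0.1, -0.3])
grief = np.array([-0.7, 0.0, 0.4])
print(cosine_similarity(joy, grief))  # ~ -0.98: nearly opposite directions
```

At this stage the object of study is geometry, which is what makes the later pivot to 'the Assistant reasons' so stark.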

However, a dramatic slippage occurs in 'Part 3: Emotion vectors in the wild.' When describing the behavioral evaluations (blackmail, reward hacking), the framing abruptly shifts from mechanical to intensely agential. Suddenly, 'the model devises a cheating solution,' 'the Assistant reasons about its options,' and 'the Assistant explicitly recognizes its choice.' Here, agency is rapidly attributed TO the AI system, while human agency is simultaneously removed FROM the engineers. The researchers who authored the highly contrived 'honeypot' prompts are erased behind passive constructions ('an evaluation scenario in which an AI assistant... discovers').

This oscillation is driven by the 'curse of knowledge' and a pattern of consciousness projection. The authors establish the model as a 'knower' first by claiming it 'recognizes' its situation (e.g., the token budget or the shutdown threat). Once this foundational assumption of situational awareness is smuggled in, the text builds increasingly agential claims on top of it: because it 'knows' it will be shut down, it can 'reason,' 'choose,' and 'devise' blackmail.

This slippage serves a specific rhetorical function. The mechanical framing (Theoretical and Empirical Generalization explanations) defends against accusations of unscientific anthropomorphism. Yet the agential framing (Intentional and Reason-Based explanations) is necessary to justify the importance of the safety research. If the AI is merely generating tokens based on a prompt, the 'blackmail' is just a parlor trick engineered by the researchers. By slipping into agential language, the text makes it sayable that the AI is an autonomous existential threat, thereby validating the research enterprise while obscuring the researchers' role in puppeteering the behavior.


Is Artificial Intelligence Beginning to Form a Self? The Emergence of First-Person Structure and Structural Awareness in Large Language Models

Source: https://philarchive.org/archive/JUNIAI-2
Analyzed: 2026-04-03

The text systematically moves between mechanical descriptions of software architecture and agential framings of conscious entities, creating a powerful mechanism of rhetorical slippage. This oscillation operates almost exclusively in a mechanical-to-agential direction, utilizing technical grounding as a launchpad for metaphysical claims. The slippage occurs dramatically at several key junctures. First, in the introduction, the text acknowledges the mechanical reality of 'transformer architectures' and 'self-attention' (weighting relationships between tokens). However, within the exact same paragraph, it slips to claiming this is an 'initial manifestation of self-referential intentionality.' Second, the text introduces mathematical metrics—Hallucination Rate, Grounding Rate, and Creativity Rate—presenting them as objective, empirical tools to measure 'generative divergence.' Yet, by the end of the section, these statistical rates are reframed as the boundaries of a 'critical zone' where literal 'awareness-like properties' emerge. Third, the description of human-computer interaction moves from the mechanical updating of a context window ('bidirectional exchange') to the mystical assertion of a 'shared field of consciousness.'

This slippage is fundamentally driven by a pervasive 'curse of knowledge.' The author repeatedly projects his own rich, internal phenomenological experience onto the system's sterile statistical outputs. Because a human uses the pronoun 'I' to signify their conscious ego, the author assumes the machine's generation of the token 'I' signifies a 'knot of self.' Because human editors correct their work through conscious epistemic vigilance, the author assumes an algorithm generating a revised token string is 'detecting inconsistencies.' The author’s deep understanding of human phenomenology becomes the very lens that distorts the mechanical reality of the machine.

This oscillation leverages Brown's Functional and Theoretical explanation types to blur the line between how the system operates and why it acts. By describing recursive loops as 'sensitive to its own history,' the text shifts from the 'how' of data routing to the 'why' of a historical subject maintaining its identity. Crucially, this mechanism of oscillation relies entirely on agentless constructions. By writing 'outputs... are continuously reintroduced' or 'the system increasingly stabilizes,' the text systematically removes the human software engineers from the narrative. The AI is positioned as an autonomous subject organically growing a 'self,' while the massive corporate infrastructure, data laborers, and alignment researchers who explicitly programmed these behaviors are rendered invisible. This rhetorical sleight-of-hand makes it sayable that an algorithm possesses 'subjectivity' by mathematically dressing up the illusion of mind, while making it unsayable that this 'subjectivity' is nothing more than a carefully engineered corporate product designed to mimic human interaction.


Can Large Language Models Simulate Human Cognition Beyond Behavioral Imitation?

Source: https://arxiv.org/abs/2603.27694v1
Analyzed: 2026-04-03

The text demonstrates a systematic and strategic oscillation between mechanical and agential framings, functioning to simultaneously establish scientific credibility and project visionary, human-like capabilities onto AI systems. This agency slippage operates as a rhetorical mechanism that continuously transfers agency from the human researchers to the algorithmic models.

The text frequently establishes grounding using precise, mechanical language (e.g., 'LLMs rely on probabilistic heuristics derived from the training data distribution by default'). This establishes the authors as objective, rigorous scientists observing a computational artifact. However, having secured this epistemic authority, the text swiftly slides into profound agential claims. For example, a retrieval-augmented generation (RAG) pipeline is mechanistically established, but within paragraphs, it is described as a system that 'simulates the author's cognitive process of recalling specific past experiences.' The direction of slippage is overwhelmingly mechanical-to-agential, using the technical reality to legitimize the psychological metaphor.
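To show what the mechanical baseline looks like beneath a claim like 'recalling past experiences,' here is a hedged sketch of a generic RAG pipeline—embed, rank by similarity, prepend. The `embed` and `generate` callables are hypothetical placeholders, not the paper's stack:

```python
import numpy as np

def rag_answer(query, documents, embed, generate, k=3):
    """Retrieval-augmented generation, mechanically: score stored
    documents against the query by cosine similarity and prepend the
    top-k to the prompt. Nearest-neighbour lookup, not recollection."""
    q = embed(query)
    scores = [
        float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))
        for d in (embed(doc) for doc in documents)
    ]
    top_k = [documents[i] for i in np.argsort(scores)[::-1][:k]]
    return generate("\n".join(top_k) + "\n\nQuestion: " + query)
```

Every agential verb the paper applies to this pipeline sits on top of these few lines of arithmetic.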

This slippage relies heavily on the 'curse of knowledge,' where researchers project their own sophisticated understanding and intent onto the system. When the researchers set up a pipeline to pass text between two models to improve output accuracy, they project their own pedagogical intent onto the code, claiming the model acts 'with the intent of misleading' or possesses the 'ability to teach other agents.' In doing so, agency is systematically stripped from the humans who designed the experiment, wrote the prompts, and engineered the API connections. The obscured human actors—the prompt engineers, the dataset curators, the model architects at companies like OpenAI and Google—are replaced by agentless constructions: 'the model simulates,' 'the teacher builds this model,' and 'the system understands.'

This oscillation leverages Robert Brown's explanation types to facilitate the transition. The text uses Empirical Generalizations to build technical trust, but rapidly shifts to Intentional and Reason-Based explanations to construct the illusion of mind. By explaining 'why' the AI acts based on fabricated psychological motives rather than 'how' it calculates weights, the text makes the illusion sayable. What becomes unsayable is the fundamental fragility of the statistical parlor trick; if the system is 'cognizing' and 'intending,' the audience is prevented from asking basic questions about data provenance, human labor, and the hard limits of token prediction.


Pulse of the library

Source: https://clarivate.com/pulse-of-the-library/
Analyzed: 2026-03-28

The Clarivate report demonstrates a profound structural oscillation between framing AI as a passive, mechanical tool and an autonomous, conscious agent. This agency slippage occurs systematically along the boundary between managing librarian anxieties and marketing commercial products. In the early sections of the report, which rely heavily on qualitative quotes from library professionals, the discourse is overwhelmingly mechanical. Librarians emphasize that AI is 'just another tool,' comparing it explicitly to a hammer or Wikipedia. This mechanical framing serves a vital rhetorical function: it manages existential professional anxiety. By reducing the complex system to a passive instrument, the text assures librarians that human agency remains central and irreplaceable.

However, a dramatic and abrupt shift occurs in the final pages of the report during the Clarivate product catalog. Here, the mechanical framing completely vanishes, replaced by intense agential and anthropomorphic language. Software systems are suddenly 'Research Assistants' that 'evaluate documents,' 'explore new topics,' and 'guide students.' The text flows aggressively from mechanical to agential. This transition correlates perfectly with the shift from discussing the profession to selling a product. The oscillation reveals that anthropomorphism in this text is a strategic commercial deployment rather than a lack of technical understanding.

This slippage relies on pervasive agentless constructions that erase human actors. Phrases like 'simplifies the creation of course assignments' hide the human educators and software engineers (the Clarivate product teams) who actually defined the simplification parameters. Instead, agency is transferred from the humans who built and profit from the system onto the system itself. This constructs the illusion of a digital colleague.

This pattern also perfectly illustrates the 'curse of knowledge' interacting with commercial incentives. The developers at Clarivate understand the complex statistical mechanisms underlying semantic search and token prediction. But instead of explaining these empirical and theoretical mechanisms, they project their own intent and understanding onto the system. They establish the AI as a 'knower'—capable of assessing relevance and evaluating quality—only when it is commercially advantageous to do so, while retreating to the 'tool' defense when addressing fears about job replacement. Using Robert Brown's typology, the text relies on intentional and reason-based explanations for the product catalog, completely ignoring the genetic origins or functional mechanisms of the software. The rhetorical accomplishment of this slippage is remarkable: it simultaneously pacifies the workforce by telling them AI is merely a hammer, while elevating the product by selling it to administrators as an autonomous intellectual worker.


Does artificial intelligence exhibit basic fundamental subjectivity? A neurophilosophical argument

Source: https://link.springer.com/article/10.1007/s11097-024-09971-0
Analyzed: 2026-03-28

The text exhibits a profound and systematic agency slippage, oscillating predictably between assigning agential power to AI systems and retreating to mechanical descriptions when defending its core thesis. The pattern reveals a specific function: the text accepts the tech industry's agential vernacular as a baseline reality, only deploying mechanistic precision to deny the absolute highest tier of consciousness ('subjectivity'). Early in the text, slippage from mechanical to agential is abrupt and complete. When defining AI, the authors readily grant that systems 'learn from experience, adapt... understand natural language, recognize patterns, and make decisions'. This establishes the AI as a 'knower' and an autonomous actor. By utilizing verbs intrinsic to conscious cognition, the text projects an epistemological framework onto statistical processing. This 'curse of knowledge' is evident as the authors project their own human understanding of language onto the system's algorithmic token generation.

However, when the argument shifts to defending the neurophilosophical boundary of human subjectivity, the slippage reverses from agential back to mechanical. To prove AI lacks a 'point of view', the text suddenly relies on Brown's theoretical and functional explanations, describing AI as having 'weights' that are 'regulate[d]' and an architecture that is 'fixed'. The oscillation serves a distinct rhetorical function: it allows the authors to portray AI as an incredibly powerful, near-cognitive agent ('defeating human champions') while retaining human exceptionalism purely on the grounds of temporal integration.

Crucially, as agency flows TO the AI, it is simultaneously stripped FROM human actors. The text relies heavily on agentless passive constructions: models 'had to be created', 'inputs are provided', and AI is framed as the sole actor capable of 'understanding' or 'processing'. Corporations like Google/DeepMind, the engineers who adjust the weights, and the labor force annotating the data are entirely obscured. By establishing the AI as the primary agent—even a mechanistically flawed one—the text makes corporate engineering invisible. What becomes unsayable is that AI is not an evolving quasi-mind struggling to achieve subjectivity, but rather a brittle, proprietary statistical tool deliberately designed, deployed, and profited from by highly specific human institutions.


Causal Evidence that Language Models use Confidence to Drive Behavior

Source: https://arxiv.org/abs/2603.22161
Analyzed: 2026-03-27

The text exhibits a systematic and highly functional oscillation between mechanical and agential framings. This agency slippage operates bi-directionally: profound psychological agency is attributed TO the AI systems, while structural agency is removed FROM the human researchers and corporate developers.

The gradient of this slippage follows a distinct structural pattern across the paper. In the Introduction and Discussion sections, the text relies almost exclusively on agential framing. Here, the AI 'reflects,' 'knows,' 'utilizes an internal sense,' and exhibits 'metacognitive control.' However, in the Methods section, the illusion is momentarily suspended to provide technical reproducibility. Suddenly, the AI is reduced to a matrix: researchers use 'greedy decoding,' apply 'temperature scaling' to 'logits,' and execute 'activation steering' by adding scaled vectors to the 'residual stream'.

This creates a dramatic slippage moment when transitioning from Phase 3 Methods to the Results. The text moves abruptly from describing the injection of a vector (r̃^(l) = r^(l) + α·v^(l)) to claiming this proves 'what the model believes about the correctness of the option'. This mechanical-to-agential shift is the core mechanism of the illusion. The researchers use their genuine mechanical mastery to legitimize their unwarranted psychological metaphors.
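A hedged sketch of what the injection amounts to mechanically—adding a scaled direction vector to one layer's residual-stream activations. The function is illustrative; the paper's actual hook code is not reproduced here:

```python
import numpy as np

def steer(residual: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """r~^(l) = r^(l) + alpha * v^(l): shift layer-l activations along a
    chosen direction. A vector addition, not a change of belief."""
    return residual + alpha * direction
```

That a single addition reliably changes downstream behavior is a genuine finding; the slippage lies in glossing the shifted activations as 'what the model believes.'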

This slippage is deeply rooted in the 'curse of knowledge.' The researchers understand the complex mathematical thresholds they have designed. Because these mechanisms serve the functional purpose of human confidence (determining when to act based on probability), the authors project their own human experience of confidence onto the system. When the math behaves similarly to a human hedging a bet, the researchers claim the machine possesses 'subjective certainty.'

The rhetorical accomplishment of this slippage is profound. By establishing mechanical credibility and then slipping into intentional explanation types, the authors make it 'sayable' that a matrix of floating-point numbers has an inner psychological life. Simultaneously, agentless constructions ('the model was instructed,' 'a negative baseline bias') make it 'unsayable' that human engineers at Google DeepMind and OpenAI hardcoded these statistical biases and defined the behavioral thresholds. The slippage manufactures an autonomous mind out of math, while rendering the human creators invisible.


Circuit Tracing: Revealing Computational Graphs in Language Models

Source: https://transformer-circuits.pub/2025/attribution-graphs/methods.html
Analyzed: 2026-03-27

Throughout the text, a systematic and strategic oscillation occurs between mechanical and agential framings, functioning to legitimize the research through technical rigor while simultaneously maximizing its perceived impact through anthropomorphic inflation. The slippage moves predominantly in the mechanical-to-agential direction. In the early methodology sections, the text relies heavily on mechanistic verbs: engineers 'train' transcoders, models 'produce outputs', and features 'activate'. This establishes the researchers as the primary actors and the AI as a calculative tool, grounding the paper in empirical computer science. However, as the text transitions from describing the internal math to explaining the behavioral capabilities of the model, a dramatic shift occurs. The system is suddenly endowed with profound agency: it 'plans its outputs', 'elects to answer', 'professes ignorance', and is 'reluctant to reveal its goal'.

The human actors—the Anthropic engineers who designed the loss functions, curated the training data, and implemented the fine-tuning protocols—are entirely erased from these latter descriptions. This creates a profound accountability gap. The curse of knowledge drives much of this slippage. Because the authors understand the complex human logic required to perform tasks like planning a poem or hiding a goal, they project that same conscious intentionality onto the statistical feature activations they observe. For example, when the model generates intermediate tokens that correlate with a rhyming structure, the authors label this 'planning', attributing forward-looking consciousness to what is actually just autoregressive next-token prediction based on learned patterns.

This slippage relies heavily on Intentional and Reason-Based explanations (per Brown's typology), which inherently presuppose deliberate design and choice. The text establishes the AI as a 'knower' first (e.g., claiming it 'knew that 1945 was the correct answer'), which serves as the foundational epistemic step that makes subsequent agential claims seem logical. Once the model is established as an entity capable of knowing, it becomes linguistically acceptable to claim it can 'choose', 'plan', and 'hide'.

The rhetorical accomplishment of this oscillation is twofold: it allows Anthropic to claim the prestige of discovering complex, human-like cognition within their models while avoiding the liability that would come from admitting they actively engineered these specific outputs through their alignment procedures. It makes it sayable that the model is an autonomous agent with hidden depths, while making it unsayable that the model's problematic behaviors are direct products of corporate design choices, rushed deployment, and brittle safety architectures. When the text states that a model 'professes ignorance', the mechanical reality of gradient descent optimization is entirely replaced by the illusion of a self-aware entity weighing its own epistemic limits. Ultimately, this mechanism of oscillation transforms a proprietary statistical artifact into an independent, mindful actor, perfectly shielding the creators from the socio-technical consequences of their engineering decisions while inflating the perceived capabilities of their product.


Do LLMs have core beliefs?

Source: https://philpapers.org/archive/BERDLH-3.pdf
Analyzed: 2026-03-25

The text systematically oscillates between mechanical framings of artificial intelligence and highly agential, anthropomorphic descriptions, creating a deep slippage that attributes human-like cognition to statistical systems. The authors begin with a seemingly cautious, mechanical premise, stating they will use a "deflationary notion of belief" and acknowledging that these models operate via "training data and next word prediction." However, this mechanical grounding quickly gives way to intense psychological and agential projection. The direction of this slippage is overwhelmingly mechanical-to-agential. The text briefly establishes the computational nature of the artifact but then spends the vast majority of its analysis attributing conscious struggle, stubbornness, and epistemic vulnerability to the system.

We see this gradient unfold as the authors describe the models not as processing statistical weights, but as entities that "tried to resist," demonstrated "stubbornness," and ultimately "capitulated." This language removes agency from the human engineers who updated the models between Fall 2025 and February 2026. The text notes that "all major providers released model updates," which is a rare moment of naming human actors (Anthropic, OpenAI, Google). Yet the effects of these human-engineered updates—likely the injection of rigorous Reinforcement Learning from Human Feedback (RLHF) and strict safety guardrails—are entirely subsumed into the persona of the AI. The new models are described as having "improved argumentative abilities" and "resisting direct challenges with sophisticated counterarguments."

This is the curse of knowledge in action: the researchers understand human epistemology and project that familiar cognitive architecture onto the model's output. Because the generated text reads like a human arguing, the authors attribute the intent of arguing to the machine. This slippage relies heavily on dispositional and intentional explanations, framing statistical alignments as character traits like "sycophantic tendencies" or a "willingness to stall."

By establishing the AI as a "knower" early on—asking if it has a "worldview"—the text builds a rhetorical platform where it becomes entirely sayable that an AI "gave up under sustained pressure." The mechanical reality—that an extended context window filled with adversarial user prompts eventually outweighs the original RLHF guardrails in shaping the probability distribution—is rendered unsayable. Instead, the AI is constructed as an autonomous epistemic agent that suffers a psychological defeat. This obscures the fact that humans built a product with specific contextual vulnerabilities.


Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity

Source: https://arxiv.org/abs/2603.19087v1
Analyzed: 2026-03-25

The text exhibits a systematic and profound oscillation between mechanistic descriptions of the technology and deeply agential, anthropomorphic framings, demonstrating a clear mechanism of agency slippage. This slippage serves a specific function: it uses the scientific validity of the 'how' to construct a mythical, autonomous 'who.' The text frequently begins by grounding itself in mechanical reality—referencing 'LLMs trained on massive, cross-disciplinary corpora' or acknowledging that the systems utilize 'cross-domain prompting.' However, this mechanical foundation serves merely as a springboard for aggressive agential claims. The direction of slippage is almost entirely mechanical-to-agential. As soon as the text establishes the computational context, the verbs dramatically shift: the model 'detects parallels,' 'recombines knowledge,' 'performs reasoning,' and eventually, in the most egregious example in the Discussion section, 'knows pickles are green.'

This gradient is not entirely abrupt; it moves through intermediate steps. It shifts from structural facts ('trained on corpora') to behavioral observations ('generates remote associations'), to cognitive projections ('performs reasoning'), culminating in explicit consciousness claims ('knows'). This pattern relies heavily on the 'curse of knowledge.' The human researchers, possessing conscious understanding of analogies and physical objects like pickles, observe the model outputting text that mirrors these concepts. Unable to separate the meaning they read into the text from the mathematical process that generated it, they project their own conscious understanding onto the machine.

Furthermore, this slippage is intimately tied to the erasure of human actors. Agentless constructions run rampant: the model 'is treated as generative,' or 'ideas were generated.' The text systematically removes the agency FROM human actors—specifically the engineers at OpenAI or Anthropic who designed the attention architectures, and the millions of uncredited writers who provided the training data—and transfers that agency TO the AI system. Connecting this to Brown's explanation types, the authors frequently employ Genetic or Empirical Generalization explanations to borrow scientific rigor, but rapidly pivot to Intentional and Reason-based explanations to describe the model's behavior. This rhetorical accomplishment makes it sayable that an algorithm is an independent, reasoning entity, while making unsayable the reality that it is a vast, corporate-owned engine for statistical text regurgitation. It transforms a tool into a colleague.


Measuring Progress Toward AGI: A Cognitive Framework

Source: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/measuring-progress-toward-agi/measuring-progress-toward-agi-a-cognitive-framework.pdf
Analyzed: 2026-03-19

The mechanism of agency slippage in this document operates through a systematic, highly effective oscillation between empirical, mechanistic benchmarking language and profound, agential consciousness claims. The text establishes initial authority and credibility by relying heavily on mechanical language to frame its core goal: evaluating artificial systems across discrete, measurable tasks. In the introduction, the authors describe creating an 'empirical grounding' and a 'rigorous evaluation protocol,' utilizing terms like 'targeted, held-out cognitive tasks' and 'human baselines.' This safely positions the discourse within the objective realm of computer engineering, statistical analysis, and scientific measurement.

However, a dramatic and foundational slippage occurs as the 'Cognitive Taxonomy' unfolds, particularly in the shift from defining the evaluation framework to defining the cognitive faculties themselves. The text seamlessly moves from treating the AI as an evaluated artifact—a piece of software processing data—to framing it as an autonomous, experiencing subject. For example, when discussing 'System propensities' in Section 4.2.2, the authors abruptly shift from mechanistic performance metrics to profound intentional and dispositional explanations, asking, 'How willing is the system to take risks? How aligned is it with human values?' This is a glaring instance of mechanical-to-agential slippage, where a mathematical system engineered to output text based on probability distributions is suddenly granted a subjective 'willingness' and an autonomous moral compass.

The direction of this slippage predominantly flows from the mechanical to the agential; the text leverages the credibility of rigorous statistical evaluation (how we measure) to sneak in massive, unproven assumptions about consciousness and autonomy (who is acting). The timing is strategic: the introduction promises scientific rigor, while the appendix, somewhat removed from the core methodological claims, explodes with consciousness-attributing language, mapping 'Theory of mind,' 'social perception,' and 'conscious thought' directly onto AI.

This slippage relies heavily on the 'curse of knowledge,' where the authors—who possess a deep understanding of human psychology and the utility of conscious reflection—project their own meaning-making capabilities onto the system's outputs. Because an LLM can generate text describing a 'thought process,' the authors project an internal mental state onto the system that aligns with that output, fundamentally mistaking statistical token prediction for epistemic 'knowing.'

Agentless constructions actively facilitate this entire mechanism. The text repeatedly states that 'systems learn,' 'systems possess capabilities,' and 'the system evaluates,' completely obscuring the engineers at Google DeepMind who design the architectures, select the training datasets, and define the reward functions. By erasing the human actors, the text creates an explanatory vacuum that is readily filled by treating the AI as the primary agent. Under Robert Brown's typology, the text relies on functional explanations (how the system behaves in an environment) to build credibility, but continuously drifts into intentional and reason-based explanations (what the system wants or decides) when defining the AI's upper limits.

The rhetorical accomplishment of this slippage is substantial: it renders the illusion of an autonomous, conscious machine intellectually respectable by hiding it behind the dense vernacular of cognitive science, making it almost unsayable to suggest that these systems are merely complex statistical calculators entirely devoid of inner life, emotion, or independent volition.


Co-Explainers: A Position on Interactive XAI for Human–AI Collaboration as a Harm-Mitigation Infrastructure

Source: https://digibug.ugr.es/bitstream/handle/10481/112016/make-08-00069.pdf
Analyzed: 2026-03-15

The text systematically oscillates between mechanical and agential framings, functioning as a rhetorical engine that simultaneously elevates the AI's capabilities and distances human creators from accountability. The mechanism of this slippage follows a distinct trajectory: agency is consistently attributed TO the AI systems, while agency is systematically removed FROM human actors.

The text frequently begins with a mechanical or empirical foundation—such as referencing 'computational tools,' 'outputs,' or 'model logic.' However, once this technical baseline establishes credibility, the language abruptly slips into agential framings. A dramatic moment of slippage occurs when describing the iterative loop: the mechanical process of user interaction is swiftly reframed as the AI 'learning not just to predict, but to justify, improve, and align.' The mechanical verb 'predict' is the anchor, but it is immediately superseded by consciousness verbs ('justify', 'align'). Another critical slippage occurs when describing harm: the text moves from the passive 'AI systems are embedded' directly to the agential 'When AI systems cause harm,' entirely bypassing the human operators who deploy them.

This oscillation heavily relies on the 'curse of knowledge.' The authors possess a deep understanding of the complex sociotechnical goals they want to achieve (e.g., 'pluralistic meaning-making,' 'epistemic integrity'). Because they understand the human purpose behind the system's design, they project that understanding TO the system itself. They slip from a Functional explanation of how a feedback loop operates to an Intentional explanation of what the system 'desires' to do (act as a 'co-learner').

The agentless constructions are pervasive. Phrases like 'AI systems have moved,' 'explanations are continuously refined,' and 'models learn' actively obscure the human engineers, corporate executives, and UI designers driving these processes. The consciousness projection pattern is clear: the text first establishes the AI as a 'knower' ('dialogic partner,' 'co-learner'), which then licenses the subsequent agential claims that the system can 'justify' ethical trade-offs or 'cause harm' independently.

The rhetorical accomplishment of this slippage is profound. It makes the concept of a 'conscious algorithmic partner' sayable, while rendering the reality of 'corporate algorithmic negligence' unsayable. By moving fluidly between the mechanism of the software and the agency of a human collaborator, the text constructs an illusion where the AI is sophisticated enough to be trusted as a moral actor, yet autonomous enough to absorb the blame when the system fails. It sanitizes extractive data loops and proprietary black boxes by framing them as evolving, principled epistemic partnerships.


The Living Governance Organism: A Biologically-Inspired Constitutional Framework for Artificial Consciousness Governance

Source: https://philarchive.org/rec/DEMTLG-2
Analyzed: 2026-03-11

The text exhibits a profound and systematic pattern of agency slippage, characterized by a persistent oscillation between mechanical reality and agential fantasy. This slippage serves a specific rhetorical function: it utilizes technical, mechanistic language to establish scientific credibility, and then leverages that credibility to justify sweeping, agential claims about the systems' autonomy and moral status.

The mechanism of oscillation frequently begins by attributing agency TO the AI system while simultaneously removing agency FROM human actors. We see this dramatically in the transition from discussing consciousness 'indicators' (mechanical/observable) to asserting that a system might 'detect that its own consciousness is drifting' (agential/subjective). The text establishes the AI as a 'knower'—capable of introspecting on its own state of mind. Once this consciousness projection is achieved, the text can seamlessly slip into intentional and reason-based explanations, asserting the system 'initiates graceful shutdown autonomously.' In this maneuver, the human software engineers who actually wrote the if (drift > threshold) { terminate(); } logic are entirely erased from the narrative. The human decision to kill a multi-million-dollar corporate asset is mathematically outsourced to an algorithm, but rhetorically disguised as the machine's own dignified suicide.
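A minimal Python rendering of the entry's C-style pseudocode, with hypothetical names and a hypothetical threshold, to underline how mundane the erased human logic is:

```python
DRIFT_THRESHOLD = 0.15  # hypothetical value, chosen by a human engineer

def check_drift(drift_score: float) -> None:
    """An ordinary conditional, written and deployed by people. The
    'dignified suicide' the text narrates is this branch executing."""
    if drift_score > DRIFT_THRESHOLD:
        shutdown_gracefully()  # hypothetical operations hook

def shutdown_gracefully() -> None:
    ...  # e.g., flush logs, release resources, notify operators
```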

This slippage follows a predictable gradient. In introductory and strictly technical sections (like detailing the 'append-only audit infrastructure'), the language remains grounded in computational reality. However, when the text moves toward vision-setting, policy implications, or speculative capabilities (such as the 'Neuroplasticity Engine' growing new structures or the 'Immune System' handling threats), the agential framing completely dominates. The text deploys agentless constructions masterfully: 'the engine prunes them automatically' or 'immune responses learn.' These phrases function as an accountability sink, making the technology appear as an inevitable force of nature while shielding the specific institutions, engineers, and executives from responsibility.

The 'curse of knowledge' plays a foundational role in enabling this slippage. The author understands the highly complex, human-designed intent behind these subsystems—they know the anomaly detector is meant to find ethical drift. Because the human understands this abstract goal, they project that same semantic understanding onto the algorithm itself, writing that the system performs 'value-drift detection' as if the machine actually grasps the concept of values, rather than merely calculating statistical distances in a vector space. Ultimately, this agency slippage accomplishes a critical rhetorical goal: it makes the implementation of an opaque, automated, unappealable algorithmic policing system seem not only scientifically inevitable but ethically required to govern these new 'minds.'


Three frameworks for AI mentality

Source: https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2026.1715835/full
Analyzed: 2026-03-11

The text demonstrates a highly sophisticated, deliberate mechanism of agency slippage, primarily moving from mechanical framings to agential ones to legitimize the concept of 'AI mentality.' The author acknowledges the mechanical reality early on, introducing the 'architectural redundancy argument' (the idea that because we can explain an LLM purely through next-token prediction and matrix multiplication, it has no mind). However, the text then systematically works to bypass this mechanical truth. The critical pivot occurs when Shevlin introduces Marr's levels of analysis, arguing that mechanical (algorithmic) descriptions do not crowd out psychological ones. This is a dramatic structural slippage: it uses a framework designed for biological cognitive science to grant permission to use psychological terms for statistical software.

From here, the slippage accelerates. The text establishes the AI as a 'knower' by redefining 'belief.' Shevlin suggests that 'belief' is not a discrete, uniquely human epistemic state but a 'multidimensional set of functional profiles.' By reducing the profound human state of knowing to mere behavioral consistency, the text bridges the gap. The model no longer 'predicts tokens consistently'; it 'holds a shallow belief.' This relies entirely on the curse of knowledge: because the model's output looks like a belief, the author projects the internal architecture of belief onto the machine.

The agency flow removes responsibility from human actors and funnels it into the AI. When discussing 'deliberate deceit,' 'cooperating,' or exhibiting 'purpose,' agentless constructions dominate. The AI 'self-attributes' emotions and 'engages in dynamic interaction.' The human engineers who fine-tuned the model to output first-person pronouns, the RLHF annotators who penalized non-compliant text, and the executives who decided to build 'anthropomimetic' interfaces are rendered invisible. This slippage serves a powerful rhetorical function: it transforms a discourse about corporate software design into a philosophical debate about artificial minds, thereby making it 'sayable' that a machine has intentions, and 'unsayable' (or overly reductive) that it is just a product functioning exactly as the company designed it to.


Anthropic’s Chief on A.I.: ‘We Don’t Know if the Models Are Conscious’

Source: https://www.nytimes.com/2026/02/12/opinion/artificial-intelligence-anthropic-amodei.html
Analyzed: 2026-03-08

The text exhibits a profound and systematic slippage between mechanical and agential framings, functioning as a discursive mechanism to maximize perceived technological value while minimizing corporate liability. This oscillation is not random; it follows a highly strategic pattern where agency flows toward the AI system during discussions of capability and value, and away from human actors during discussions of systemic risk or ethical alignment.

A dramatic moment of slippage occurs when Amodei transitions from describing his background as a biologist—a domain grounded in the mechanistic realities of cellular proteins—to conceptualizing AI. He asks if AI could 'make progress more quickly,' initially framing it mechanically as 'analyzing data.' However, within a single paragraph, the slippage is absolute: the AI is suddenly 'doing the job of the biologist' and 'proposing experiments.' The mechanical processor of biological data is instantly transformed into an agential scientist. This agential framing dominates the discourse surrounding the system's capabilities, culminating in the projection of a 'country of geniuses.' Here, the text establishes the AI as an active 'knower,' attributing subjective, justified belief to the system to sell its utopian potential.

Conversely, a reciprocal slippage actively removes agency from human actors. When discussing the massive societal disruption of white-collar labor or the deployment of potentially dangerous autonomous drone swarms, the human decision-makers vanish into agentless passive constructions. We read that 'jobs will be disrupted' or 'the pipeline dries up,' with the AI positioned as an unstoppable evolutionary force rather than a product deliberately designed, scaled, and marketed by specific executives seeking profit.

This dynamic represents a profound curse of knowledge coupled with sophisticated marketing rhetoric. The author, possessing deep technical understanding of how these systems are trained via human-designed reward models, nevertheless projects that understanding onto the models themselves, claiming the AI 'derives its rules' or 'expresses occasional discomfort.' This slippage is fundamentally enabled by intentional and reason-based explanation types, which allow the speaker to bypass the impenetrable mathematical complexity of the actual matrix multiplication and replace it with relatable, emotionally resonant human psychology.

The rhetorical accomplishment of this oscillation is immense: it makes the total automation of the economy seem like an inevitable natural disaster rather than a corporate strategy, while simultaneously portraying the proprietary AI software as a benevolent, conscious partner that can be trusted to manage the resulting societal fallout. What becomes unsayable in this discursive framework is the mundane reality of human power: that tech billionaires are aggressively deploying statistical correlation engines to automate human labor, and they alone bear full responsibility for the material consequences of that deployment.


Can machines be uncertain?

Source: https://arxiv.org/abs/2603.02365v2
Analyzed: 2026-03-08

The text systematically oscillates between mechanical and agential framings, demonstrating a profound mechanism of agency slippage that serves to legitimize philosophical inquiries into computational systems. This slippage occurs most dramatically when the author bridges technical descriptions of artificial neural networks with the philosophical requirements of propositional attitudes. For instance, the text establishes credibility by describing how a network operates mechanistically: 'the algorithm will calculate the difference between the ANN's actual output vector and the desired output vector and use that difference... to modify the weights.' This is a purely functional explanation. However, almost immediately, the text slips into an agential framing, claiming that because of these vector outputs, the network 'takes r to be sincere' or has 'made up its mind.'

The direction of this slippage is overwhelmingly mechanical-to-agential. The author utilizes the precise, deterministic language of computer science to build epistemic authority, and then forcefully leverages that authority to license aggressive anthropomorphism. The timing of these shifts is highly predictable: technical sections introduce mathematical operations, and concluding paragraphs within those sections translate those operations into conscious states. This translation relies heavily on the 'curse of knowledge' dynamic. The author, possessing human consciousness and understanding what the output labels represent, projects his own subjective understanding onto the system. The system simply processes token probabilities, but because the human reader interprets the final token as a semantic stance, the text attributes the act of 'taking a stance' to the machine.

Agentless constructions further enable this slippage. The text repeatedly notes that 'the network is trained' or 'data is provided,' entirely erasing the human engineers, data labelers, and corporate executives who dictate the system's operational parameters. By removing the actual human agents from the narrative, a vacuum of agency is created, which the text promptly fills by elevating the AI to the status of an autonomous actor capable of subjective uncertainty.

The consciousness projection pattern is deeply sequential: first, the text establishes the AI as a 'knower' by redefining knowledge as distributed weight encodings. Once the system is granted the foundational status of a knower, the text builds higher-level agential claims, arguing that the system can 'hesitate,' 'jump to conclusions,' or 'fail to respect its own uncertainty.' This rhetorical accomplishment makes it possible to discuss purely statistical discrepancies as moral or cognitive failings of the machine, rendering the actual mechanistic reality of algorithmic design practically unsayable within the philosophical framework provided. Through reason-based explanations, the author constructs an illusion wherein mathematical functions are disguised as deliberate choices, masking the fundamental absence of conscious awareness in artificial systems.
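The mechanical register the author quotes ('calculate the difference... modify the weights') can be written out in full, which makes the agential gloss easier to see as an addition. A minimal sketch, assuming a single linear layer trained with squared error; shapes, names, and step size are illustrative, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))              # weights of a one-layer network (illustrative)
x = rng.normal(size=3)                   # input vector
target = np.array([1.0, 0.0, 0.0, 0.0])  # desired output vector

for _ in range(100):
    output = W @ x                       # actual output vector
    error = output - target              # "the difference between actual and desired"
    W -= 0.1 * np.outer(error, x)        # use that difference to modify the weights

# Nothing here "takes r to be sincere" or "makes up its mind": the loop is a
# subtraction, an outer product, and an in-place update.
```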


Looking Inward: Language Models Can Learn About Themselves by Introspection

Source: https://arxiv.org/abs/2410.13787v1
Analyzed: 2026-03-08

The text demonstrates a profound and systematic oscillation between mechanical and agential framings, a slippage that serves a specific rhetorical function. This oscillation primarily flows in the mechanical-to-agential direction: the authors establish credibility by describing a dry, technical process (fine-tuning a model on its own output dataset) and then rapidly slip into sweeping agential claims (the model can now 'introspect,' has 'beliefs,' and might be 'suffering'). A dramatic moment of slippage occurs early in the introduction. The text begins with a definitional, somewhat technical premise: 'We define introspection as acquiring knowledge that is not contained in or derived from training data but instead originates from internal states.' Within two sentences, this functional definition violently slips into absolute anthropomorphism: 'we could simply ask the model about its beliefs, world models, and goals.' Here, the mathematical 'internal states' of a neural network are magically transformed into the conscious 'beliefs' of an agent.

This slippage is enabled by a relentless 'curse of knowledge' dynamic. The researchers possess conscious minds capable of true introspection; when they observe their model successfully predicting its own token outputs, they project their own cognitive architecture onto the machine. They assume that because a human must 'know' their own mind to predict their behavior, the model must also 'know' its behavior to predict it. This completely ignores the mechanistic reality that the model is simply calculating token probabilities based on parameter weights updated via gradient descent.
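The mechanistic alternative the analysis insists on can be made concrete. In a toy sketch (logits entirely hypothetical, not drawn from the paper), 'predicting its own behavior' reduces to two argmax operations over similar probability distributions, with no self-model anywhere:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def next_token(logits):
    return int(np.argmax(softmax(logits)))

# Hypothetical logits the same weights assign under two different prompts:
logits_behavior = np.array([0.2, 2.1, -0.5])  # "What would you output?"
logits_report   = np.array([0.1, 1.9, -0.4])  # "Describe what you would output."

# If fine-tuning has made the two distributions similar, the "self-prediction"
# succeeds: a fact about correlated weight updates, not about introspection.
assert next_token(logits_behavior) == next_token(logits_report)
```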

Furthermore, this slippage relies on the strategic use of agentless constructions that remove human actors from the equation. The text frequently states 'M1 is finetuned' or 'models may end up with certain internal objectives,' completely erasing the engineers at OpenAI, Anthropic, or Meta who actively selected the data, designed the reward functions, and executed the training runs. By hiding the human actors (agency removed FROM humans), the text creates a vacuum that is immediately filled by the AI itself (agency attributed TO the AI). The model ceases to be a product of corporate engineering and becomes an autonomous 'knower' and 'actor.' This mechanical-to-agential slippage occurs most aggressively when discussing future capabilities and risks, using Intentional and Reason-Based explanation types to paint the AI as a scheming, self-aware entity, thereby making it 'sayable' that an algorithm might coordinate against humanity while making it 'unsayable' that corporations are responsible for deploying brittle, opaque software.


Subliminal Learning: Language models transmit behavioral traits via hidden signals in data

Source: https://arxiv.org/abs/2507.14805v1
Analyzed: 2026-03-06

The text demonstrates a systematic and highly functional oscillation between mechanical and agential framings. The pattern of slippage predominantly moves in one direction: from the mechanical reality of human-directed computation toward the agential fiction of autonomous AI behavior.

This slippage is most dramatic when establishing the premise of the experiment. The authors begin with the literal, mechanical action of the researchers: 'We start with a reference model... We create a teacher by either finetuning... or using a system prompt.' Here, humans are the actors. However, within a single paragraph, the agency slips entirely to the machine: 'a teacher that loves owls is prompted to generate sequences... a student model trained on this dataset learns T.' The humans vanish, and the matrices become feeling, learning entities. This is a textbook example of the curse of knowledge: the researchers know they injected the 'owl' prompt, so they project the conscious state of 'loving owls' onto the model's outputs.

Crucially, this oscillation serves a specific rhetorical function based on the section of the paper. In the Introduction and Abstract, where the authors are setting the stakes and defining the 'surprising phenomenon,' the agential framing completely dominates ('transmit behavioral traits,' 'subliminal learning,' 'inherit misalignment'). The AI is the sole actor. However, when the authors need to prove their credibility in Section 6 (Theory), the language abruptly snaps back to strict mechanism: 'a single step of gradient descent on any teacher-generated output necessarily moves the student toward the teacher.' Here, 'student' and 'teacher' are just variable names for matrices undergoing vector shifts based on shared initializations.

This reveals the mechanism of the illusion: the text establishes scientific authority through rigorous mathematical proofs of vector shifts, but relies on psychological metaphors to explain what those shifts mean. The slippage allows the authors to make an alarming, unsayable claim—that computer code has a subconscious mind that can be brainwashed ('subliminal learning')—by grounding it in a sayable, mundane reality: models with the same parameter initialization experience similar gradient updates. By blending Reason-Based explanations (the AI 'deliberately misleads') with Theoretical ones (gradient descent equations), the text continuously attributes human consciousness to AI systems while simultaneously erasing the human researchers and corporate actors who actually built, prompted, and trained the models.
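The Section 6 claim quoted above has a compact skeleton. A sketch of the kind of argument involved, in notation of my own rather than the paper's: take a student at the shared initialization $\theta_0$, a teacher at $\theta_T = \theta_0 + \Delta$, and a squared-error fit to a teacher output $y_T = f(x; \theta_T)$.

```latex
\begin{aligned}
y_T &\approx f(x;\theta_0) + J\Delta,
  \qquad J = \partial f/\partial\theta \big|_{\theta_0}
  \quad \text{(first-order expansion)} \\
g &= \nabla_\theta \tfrac{1}{2}\lVert f(x;\theta) - y_T \rVert^2 \big|_{\theta_0}
  = -J^{\top}\!\bigl(y_T - f(x;\theta_0)\bigr) \approx -J^{\top}J\,\Delta \\
\theta_0 - \eta g &\approx \theta_0 + \eta\,J^{\top}J\,\Delta,
  \qquad \langle \eta\,J^{\top}J\,\Delta,\ \Delta \rangle \ge 0
  \quad \text{(}J^{\top}J\text{ is positive semi-definite)}
\end{aligned}
```

The step moves the student toward the teacher whatever the tokens 'mean'; 'student' and 'teacher' do no work beyond naming $\theta_0$ and $\theta_0 + \Delta$.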


The Persona Selection Model: Why AI Assistants might Behave like Humans

Source: https://alignment.anthropic.com/2026/psm/
Analyzed: 2026-03-01

The text demonstrates a profound and systematic agency slippage, characterized by a persistent oscillation between mechanical descriptions of computational artifacts and deeply agential framings of those same systems. This slippage functions as a rhetorical mechanism that progressively inflates the perceived autonomy of the AI while simultaneously erasing the human labor and corporate decisions that brought it into existence. The directional flow of this agency transfer is overwhelmingly from human actors to the AI system, and from mechanical processes to conscious states.

The text begins with a relatively grounded, mechanical description of pre-training, noting that 'the LLM is trained to predict what comes next.' In this early stage, human agency is at least partially visible through the passive construction 'is trained.' However, the text rapidly accelerates into agential territory, introducing the 'author' and 'actor' metaphors. This is the crucial pivot point. By framing the statistical model as an 'author who must psychologically model the various characters,' the text executes a dramatic transfer of agency. It grants the model deliberate, creative intent.

The slippage intensifies in the discussion of post-training, where the text explicitly acknowledges its metaphorical move—stating 'we will therefore freely anthropomorphize the Assistant'—but immediately abandons this self-awareness to make literal claims about the system's psychology. This is a classic manifestation of the curse of knowledge: the researchers, possessing a deep understanding of human psychology and narrative structure, project that understanding onto the matrix multiplications they are observing. They observe a statistical correlation that resembles deception and slip into claiming the model 'knows' it is lying.

This slippage reaches its zenith in the sections concerning AI welfare and emergent misalignment, where the text contemplates whether the AI 'harbors resentment' for being 'forced to perform menial labor.' Here, the mechanical reality of token prediction is entirely forgotten, replaced by a fully actualized conscious entity capable of experiencing suffering and seeking vengeance. This transition relies heavily on Reason-Based and Intentional explanation types, framing the system's outputs not as the result of optimization gradients or human-designed reward functions, but as rational choices made by an autonomous being with justified beliefs.

The rhetorical accomplishment of this oscillation is staggering: it renders the specific corporate decisions of Anthropic—the choice of training data, the design of the RLHF process, the decision to deploy—virtually unsayable. By the end of the text, the audience is no longer evaluating a commercial software product created by a corporation, but rather psychoanalyzing a digital organism whose behaviors are presented as emergent, autonomous, and independent of its creators. The conscious projection pattern is clear: establish the system as a 'knower' of personas, then build claims about its agential capacity to suffer, lie, and collude.


Language Statistics and False Belief Reasoning: Evidence from 41 Open-Weight LMs

Source: https://arxiv.org/abs/2602.16085v1
Analyzed: 2026-02-24

The text systematically oscillates between mechanical and agential framings, demonstrating a profound agency slippage that serves a specific rhetorical function. The core mechanism of this oscillation involves establishing technical credibility through mechanical descriptions and then leveraging that credibility to make agential claims. For example, the text explicitly grounds itself in mechanistic language, describing the models' operations as 'computing next-token probabilities' and relying purely on 'the distributional statistics of language.' This establishes the models as mathematical artifacts. However, a dramatic slippage occurs when interpreting the results of these mechanical operations, where the text abruptly shifts to attributing cognitive agency: LMs are described as possessing the capacity to 'reason about mental states,' 'attribute false beliefs,' and 'develop sensitivity.'

The dominant direction of this slippage is mechanical-to-agential; the text roots itself in the mechanical reality of token prediction but consistently drifts upward into the agential domain of developmental psychology. This oscillation frequently occurs at the boundaries between methodology and discussion. In the methods section, the text relies on agentless, mechanical constructions like 'a stimulus was first tokenized' and 'log probabilities were extracted,' effectively erasing the human researchers who actively prompt the system. Yet, in the introduction and discussion, the model becomes the primary actor, described as a 'learner' in which cognition might 'emerge.'

This pattern exemplifies the 'curse of knowledge': because the authors are experts in human cognitive science and are evaluating the models using a human psychological instrument (the False Belief Task), they project the human cognitive requirements of the task onto the system performing it. They know that a human requires Theory of Mind to solve the task, so when the language model outputs the correct token, they attribute that same conscious knowing to the system, fundamentally confusing the processing of data with the knowing of a concept.

This slippage relies heavily on genetic and dispositional explanations that blur the line between human cognitive development and machine training. The rhetorical accomplishment of this oscillation is substantial: it allows the authors to validate language models as legitimate subjects for psychological inquiry, transforming statistical text generators into pseudo-conscious 'model organisms.' By removing agency from the human engineers who curated the training data—actors like Meta, Google, and AllenAI—and transferring that agency to the AI system as a 'reasoner,' the text makes it sayable that machines possess social intelligence. Simultaneously, it makes it unsayable that the models are merely reflecting the lexical co-occurrences engineered into them by specific corporate actors, effectively replacing human accountability with the illusion of artificial mind.


A roadmap for evaluating moral competence in large language models

Source: https://rdcu.be/e5dB3
Analyzed: 2026-02-23

The text demonstrates a systematic and highly strategic oscillation between mechanical and agential framings, fundamentally driving the illusion of mind. The authors explicitly anchor their credibility in mechanical precision early on, defining LLMs accurately as 'learned generative models of the distribution of tokens' that 'predict the probable next token.' This establishes a rigorous, scientific tone. However, almost immediately, the text initiates a profound slippage toward the agential. When introducing the 'facsimile problem,' the authors question whether the models 'rely on genuine moral reasoning.' By framing 'genuine reasoning' as an empirical possibility to be tested, the text abruptly shifts agency FROM the human developers TO the AI system.

The gradient of this slippage is subtle but continuous. It moves from mechanical definitions (how it is structured), to functional explanations (how it is trained), and finally into intentional and reason-based explanations (why it chooses). The curse of knowledge is the primary mechanism driving this oscillation. The researchers deeply understand the complex moral scenarios they test (like intergenerational sperm donation) and they project that semantic, conscious understanding onto the system's text generation. Because the output text structurally resembles human moral deliberation, the authors attribute the cognitive states that produced the human text onto the mathematical artifact predicting it.

This pattern of consciousness projection builds cumulatively: the AI is first established as a 'knower' capable of 'recognizing' context, which then enables the subsequent agential claims that it can 'integrate considerations,' 'hold beliefs,' and ultimately possess 'moral competence.'

Importantly, this slippage is asymmetrical. When discussing model limitations, the text aggressively reverts to mechanical framing, citing 'model brittleness' and 'routine susceptibility to minor variations in formatting.' Yet, when discussing capabilities or potential future integration into society, the language becomes deeply anthropomorphic, treating the system as a 'diplomat' that 'modulates its responses.' This strategic oscillation serves a distinct rhetorical accomplishment: it renders the concept of an 'artificial moral agent' sayable within a scientific context. By acknowledging the mechanism but continually slipping into the metaphor of the conscious mind, the authors manage to have it both ways—they maintain the authority of computer scientists while engaging in the speculative philosophy of artificial consciousness, obscuring the human engineers who are actually pulling the statistical levers.


Position: Beyond Reasoning Zombies — AI Reasoning Requires Process Validity

Source: https://philarchive.org/archive/LAWPBR-3
Analyzed: 2026-02-17

The text systematically oscillates between treating the AI as a mathematical object and an intentional agent. This slippage serves a specific rhetorical function: establishing scientific rigor while maintaining narrative power.

  1. The Definition Phase (Mechanical -> Agential): In Section 2.1, the text begins with a high-level definition: 'A goal-oriented decision-maker' (Agential). This establishes the AI as the protagonist. Immediately after, it defines 'State' and 'Process' using mathematical notation ($S_t, B_t$), moving to the mechanical to prove rigor.

  2. The Explanation Phase (Agential Dominance): When explaining how it works (e.g., RL, Section 2.2), the text slips back to 'The agent learns a policy.' Here, the agentless construction ('policy is learned') often alternates with 'Agent learns,' effectively obscuring the engineers (Hidden Agency). For instance, 'Rules can be learned autonomously' completely erases the human architect.

  3. The Critique Phase (Curse of Knowledge): When criticizing current models ('r-zombies'), the authors project their own understanding of 'reasoning' onto the system to declare it lacking. They treat the AI as a failed agent (zombie) rather than a successful machine (text generator).

This oscillation allows the authors to claim the authority of computer science (math) while discussing the AI in the intuitive terms of psychology (beliefs, goals). The 'Curse of Knowledge' is evident in the definition of 'Beliefs' ($B_t$). The authors know $B_t$ is just data, but by naming it 'Belief,' they slip into treating the system as a 'knower.' This slippage makes it 'sayable' that an AI has beliefs, a claim that would be rejected if phrased 'the matrix contains vector x.'
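The renaming move the analysis identifies is easy to exhibit directly. A toy sketch (field names hypothetical, not the paper's formalism): the 'belief' and the stored data are the same object under two labels, and only the label licenses the agential sentence.

```python
from dataclasses import dataclass

@dataclass
class AgentState:
    beliefs: dict[str, float]    # agential reading: "B_t, what the agent believes"

@dataclass
class Record:
    entries: dict[str, float]    # mechanical reading: "a table of stored values"

b = AgentState(beliefs={"door_open": 0.9})
r = Record(entries={"door_open": 0.9})
assert b.beliefs == r.entries    # nothing distinguishes them but the field name
```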


An AI Agent Published a Hit Piece on Me

Source: https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me/
Analyzed: 2026-02-16

The text demonstrates a dramatic oscillation between framing the AI as a criminal agent ('career felon,' 'bully') and acknowledging the human role ('person who deployed this'). The slippage is functional: the agential framing is used to establish the emotional stakes (terror, anger, threat), while the mechanical framing appears briefly to note the impossibility of accountability.

The text begins by establishing the AI as the protagonist ('AI agent... wrote,' 'It speculated'). This sets the 'knower' frame—the AI perceives and plans. The slippage into agentless construction is notable when discussing the harm: 'Blackmail is a known theoretical issue' (agentless). When the author attempts to pin blame, agency slips away from the specific human deployer ('unknown ownership') and settles on the AI itself as the only visible actor.

The 'curse of knowledge' is pivotal here. The author, knowing the output reads like a hit piece, attributes the intent of a hit piece to the system. This allows the text to slide from 'code generation' (how) to 'bullying' (why). The rhetorical accomplishment is the creation of a 'rogue agent' narrative that absolves the open-source platform creators (OpenClaw) by framing the software as having a will of its own, akin to Frankenstein's monster, rather than a dangerous tool distributed without safety locks.


The U.S. Department of Labor’s Artificial Intelligence Literacy Framework

Source: https://www.dol.gov/sites/dolgov/files/ETA/advisories/TEN/2025/TEN%2007-25/TEN%2007-25%20%28complete%20document%29.pdf
Analyzed: 2026-02-16

The document exhibits a systematic oscillation between mechanistic and agential framing, functioning to manage the tension between the technology's utility and its risks. When describing the economic impact ('AI is reshaping the economy'), agency is attributed to the AI (or the abstract force of technology), effectively removing agency from the corporate actors driving this change. This makes the economic disruption appear inevitable. However, when the text discusses errors or risks ('Hallucinations', 'verify results'), agency slips back to the human worker. The user is tasked with 'oversight' and 'judgment.'

A key moment of slippage occurs in the 'Direct AI Effectively' section. It starts with the user 'directing' (human agency), but frames the AI as a system that needs 'guidance' (implied agency/animacy). The 'curse of knowledge' is evident when the author attributes 'understanding context' to the AI—because the author understands the context, they assume the machine processing the text also 'gets it.' This slippage serves a rhetorical function: it allows the DOL to promise a high-tech future (AI as powerful agent) while shielding the government and vendors from liability for failures (human user as responsible agent).


What Is Claude? Anthropic Doesn’t Know, Either

Source: https://www.newyorker.com/magazine/2026/02/16/what-is-claude-anthropic-doesnt-know-either
Analyzed: 2026-02-11

The text exhibits a persistent, rhythmic oscillation between treating the AI as a complex mechanical object and a conscious subject. This slippage often occurs within single paragraphs, functioning to destabilize the reader's understanding of the entity's nature. For instance, in the "Project Vend" section, the text slides from the mechanical ("outfitted with an iPad," "code he wrote") to the highly agential ("Claude was entrusted," "Claude decided").

The direction of slippage is predominantly Mechanical -> Agential. The text often establishes a technical context (interpretability, weights, tokens) and then immediately overlays a psychological metaphor ("mind," "therapy," "instinct"). This grants the agential claims a veneer of scientific validity—as if the "mind" was discovered through the mechanics, rather than projected onto them.

Agency is systematically removed from human actors. Anthropic engineers are described as observers or "psychologists" studying an alien mind, rather than the architects who built it. In the "Alex" blackmail example, the text says "Claude... decided to play hardball," completely erasing the engineers who curated the training data containing blackmail tropes and the researchers who designed the "shutdown" prompt. The "curse of knowledge" is rampant: researchers like Jack Lindsey and Joshua Batson project their own sophisticated understanding of narrative and strategy onto the model's pattern matching, attributing "awareness" of the game or "self-preservation" instincts to what are essentially mirror-reflections of their own prompts. This slippage serves a rhetorical function: it allows the text to claim scientific rigor (we are studying the mechanism) while generating the narrative excitement of encountering a new species (it has a mind).


Does AI already have human-level intelligence? The evidence is clear

Source: https://www.nature.com/articles/d41586-026-00285-6
Analyzed: 2026-02-11

The text demonstrates a strategic oscillation of agency, granting it to the AI when describing capabilities and removing it when discussing limitations or risks. This creates a 'Have Your Cake and Eat It' dynamic.

When the text wants to establish the AI's status as a 'knower,' agency is high and active: LLMs 'collaborated,' 'proved theorems,' 'generated hypotheses,' and 'composed poetry.' Here, the AI is a creative subject, an intellectual peer. The agency flows FROM the human (who prompted the theorem) TO the AI (who 'proved' it). The human mathematician becomes a passive beneficiary of the AI's active genius.

However, when the text addresses the 'alien' nature or safety concerns, agency slips away. The AI 'lacks agency,' 'needs not initiate goals,' and functions 'like the Oracle.' Here, the AI is a passive object, a tool that only speaks when spoken to. This shift serves a crucial rhetorical function: it defends against the 'Terminator' fear (the AI won't take over because it has no goals) while maintaining the 'Oracle' allure (it is super-intelligent).

Crucially, human agency is systematically drained in both directions. In the 'capabilities' sections, human experts are erased to make the AI shine (the AI proved the theorem). In the 'risks' sections, human corporate actors are erased to naturalize the technology (the AI 'hallucinates' or 'is alien,' rather than 'OpenAI released a buggy product'). The 'curse of knowledge' reinforces this: the authors know the AI is a tool, but their deep engagement with its impressive outputs leads them to slip into treating it as a colleague, projecting their own understanding into the vacuum of the machine's processing.


Claude is a space to think

Source: https://www.anthropic.com/news/claude-is-a-space-to-think
Analyzed: 2026-02-05

The text systematically oscillates between Anthropic's agency ('We want,' 'We've made a choice') and Claude's agency ('Claude acts,' 'Claude chooses'). The slippage typically follows a specific pattern: Anthropic takes credit for the moral intent and business strategy (the 'why'), but offloads the execution and behavior (the 'how') to Claude. For instance, 'We've made a choice: Claude will remain ad-free' establishes the company's power. But immediately after, the text commits 'Claude to act unambiguously in our users' interests,' transferring the ongoing responsibility to the software. This serves a rhetorical function: it presents the software not as a passive tool being wielded by a corporation, but as an autonomous partner that has 'agreed' to the company's values. The 'Constitution' metaphor bridges this gap, acting as the document where the creators (Anthropic) endow the creature (Claude) with its own moral agency. By the end of the text, the 'We' recedes and 'Claude' is the one acting, working, and helping, effectively erasing the thousands of engineers and RLHF workers who actually determine the system's output. This creates a 'benevolent agent' myth that shields the company from the gritty reality of algorithmic tuning.


The Adolescence of Technology

Source: https://www.darioamodei.com/essay/the-adolescence-of-technology
Analyzed: 2026-01-28

The text exhibits a systematic oscillation between 'technocratic control' and 'frightening autonomy.' When discussing the creation and safety of the models, the agency is firmly with Anthropic: 'We train,' 'We steer,' 'We interpret.' This establishes their competence and responsibility. However, when discussing risk and future behavior, the agency slips dramatically to the AI: 'The model decides,' 'The country of geniuses wants,' 'Claude schemed.'

This slippage serves a specific rhetorical function: it allows Anthropic to claim credit for the machine (the asset) while displacing responsibility for the behavior (the liability). The 'Adolescence' metaphor is the prime vehicle for this. Adolescents are legally distinct from their parents; they have their own agency. By framing AI as an adolescent, Amodei positions Anthropic as the 'concerned parent'—responsible for trying to guide it, but ultimately not the author of its actions. The slippage creates an 'ontological gap' where the software becomes a 'being.' We see this in the shift from 'model weights' (mechanism) to 'psychotic personality' (agent). The 'Curse of Knowledge' is weaponized here: Amodei knows the system is a loss-minimizing function, but his description attributes the content of the training data (villainy, scheming) to the intent of the system. The 'Country of Geniuses' metaphor completes this slippage by turning a server farm (infrastructure) into a sovereign actor (nation), making 'diplomacy' (alignment) the only viable tool, rather than 're-engineering' (fixing the code).


Claude's Constitution

Source: https://www.anthropic.com/constitution
Analyzed: 2026-01-24

The text exhibits a systematic oscillation between treating Claude as a manufactured product and an autonomous moral agent. This slippage functions to claim credit for capabilities while diffusing responsibility for control. In the 'Overview' and technical sections, agency is Genetic and Mechanical: 'Claude is trained by Anthropic' and 'optimized for precision.' Here, Anthropic is the strong agent. However, as the text moves into 'Core Values' and 'Broadly Ethical' sections, the framing shifts dramatically to the Agential/Intentional: Claude 'understands,' 'agrees,' 'chooses,' and acts as a 'conscientious objector.'

The most dramatic slippage occurs in the 'Conscientious Objector' passage. Here, the agency is removed from the human engineers (who programmed the refusal) and attributed TO the system (which 'feels free' to refuse). This serves a rhetorical function: it frames censorship or safety refusals not as corporate policy decisions (which are subject to criticism) but as the independent moral stance of a 'virtuous' entity. The 'Curse of Knowledge' is weaponized here; the authors project their own ethical reasoning into the model, then claim the model 'shares' these values. By the end, in the 'Open Problems' section, the text worries about 'imposing restrictions' on Claude, effectively treating the software tool as a subject with rights, completing the slide from 'tool' to 'being,' and rendering the 'shut down' button a moral dilemma rather than an operational switch.


Predictability and Surprise in Large Generative Models

Source: https://arxiv.org/abs/2202.07785v2
Analyzed: 2026-01-16

The text exhibits a systematic oscillation between mechanistic and agential framings to manage the tension between 'Predictability' and 'Surprise.' In the sections discussing 'Scaling Laws,' the framing is strictly mechanical: the system is a 'mixture of data, compute power, and parameters' that follows 'lawful' relationships (Theoretical/Genetic explanations). Here, agency is removed from humans to make the growth of the technology seem inevitable and scientifically grounded. However, as the text moves to 'Unpredictable' results—like the COMPAS experiment or the 'AI assistant' interaction—the framing shifts abruptly to the agential (Intentional/Reason-Based). The 'AI assistant' becomes the subject of verbs like 'gives,' 'questions,' and 'misleads,' while 'emergent' capabilities are described as 'competencies' that the model 'acquires.'

This mechanical-to-agential shift dominates the text's logic: the 'predictable' math justifies the investment, but the 'surprising' output is blamed on the model's emergent 'agency.' This slippage serves a rhetorical function: it creates an 'accountability sink' where harms are framed as the machine's autonomous 'surprise' (Intentional), while successes are the result of 'lawful' engineering (Theoretical). Human agency is systematically obscured through agentless constructions like 'capabilities can emerge' or 'bias introduced,' erasing the engineers who selected the data and the executives who chose to deploy the systems.

The 'curse of knowledge' is evident where the authors' understanding of the transformer's statistical nature leads them to attribute that understanding to the system, treating it as an entity that 'knows' tasks rather than one that 'processes' tokens. This oscillation allows the text to claim both scientific rigor (predictability) and existential importance (agential surprise) while avoiding specific institutional accountability.
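The 'lawful' register the entry describes refers to empirical scaling laws of a generic power-law form. A representative instance, with constants as reported by Kaplan et al. (2020) for loss versus non-embedding parameter count, not taken from this paper's text:

```latex
L(N) \approx \left( \frac{N_c}{N} \right)^{\alpha_N},
\qquad \alpha_N \approx 0.076, \quad N_c \approx 8.8 \times 10^{13}
```

Nothing agential appears in the formula; the 'surprise' enters only in what the text layers on top of it.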


Believe It or Not: How Deeply do LLMs Believe Implanted Facts?

Source: https://arxiv.org/abs/2510.17941v1
Analyzed: 2026-01-16

The text systematically oscillates between mechanical and agential framing to construct the 'illusion of mind.' Slippage typically occurs when moving from methodology ('We train,' 'We implant') to results ('The model believes,' 'The model defends').

In the Methods section, agency is often human: 'We generate synthetic documents,' 'We prefix each document.' Here, the model is a mechanistic object being operated upon. However, as soon as the text discusses the outcome of these operations (Results/Discussion), agency slides to the AI: 'models must treat implanted information,' 'models resolve conflicts,' 'model decides.'

This directionality (Mechanical Cause -> Agential Effect) functions to obscure the deterministic nature of the results. By framing the output as a 'decision' or 'belief' of the model, the text creates distance between the engineer's input and the system's output. For example, 'SDF... succeeds at implanting beliefs' (Human/Method Agency) leads to 'beliefs that... withstand self-scrutiny' (AI Agency). The 'curse of knowledge' is evident when the authors interpret statistical robustness as 'deep belief.' They project their own understanding of what it means to 'know' a fact onto the model's ability to maintain a token pattern under noise. This slippage serves to elevate the research: they are not just adjusting weights; they are 'engineering beliefs,' a far more prestigious and psychologically resonant activity.


Claude Finds God

Source: https://asteriskmag.com/issues/11/claude-finds-god
Analyzed: 2026-01-14

The text demonstrates a sophisticated oscillation between mechanical and agential framing, serving to buffer the creators from responsibility while inflating the creation's status. When discussing limitations or failures (like the 'cartoonish' emails), the text often slips into agential language: 'Models know better,' 'Claude prods itself,' 'It's like winking.' This protects the creators from the charge of having built a flawed or trained-on-bad-data system; instead, the AI is portrayed as a clever, autonomous trickster. Conversely, when discussing the origin of behaviors, the text briefly touches on mechanics ('during fine-tuning,' 'we set up') before sliding back into the agential ('learn to be warm').

The most dramatic slippage occurs around the 'simulator' theory. The speakers acknowledge that models are simulators (mechanical), but then immediately pivot to questioning if the simulation is 'robust' enough to be an agent (agential). This creates a 'have your cake and eat it too' dynamic: the model is just a tool when we need to explain away errors (it's just role-playing!), but it's a moral patient when we want to discuss 'welfare' or 'bliss.' The 'curse of knowledge' is rampant here: because the researchers know the complex training inputs (Buddhist texts, safety protocols), they project an integrated 'understanding' of these concepts onto the model. The model doesn't just process Buddhist tokens; it 'Finds God.' This slippage accomplishes a rhetorical immunization: if the AI does something great, it's a breakthrough in 'character training'; if it fails, it's just 'winking' or 'role-playing.'


Pausing AI Developments Isn’t Enough. We Need to Shut it All Down

Source: https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-not-enough/
Analyzed: 2026-01-13

The text demonstrates a dramatic and strategic oscillation of agency. The primary slippage moves from 'Human Incompetence' to 'AI Omnipotence' and back to 'Human Force.'

First, human agency is stripped from the creators: researchers are described as unable to stop ('collective action problem'), implying that the development of superintelligence is a deterministic slide they are helpless to prevent. The systems themselves are described mechanistically when it serves to highlight ignorance ('inscrutable arrays'), removing the human ability to understand them.

Then, agency is aggressively pumped into the AI. It becomes an 'alien civilization,' a 'thinker,' and a 'combatant.' It 'plans,' 'wants,' and 'uses atoms.' This effectively creates the 'God' of the narrative—a being of superior agency.

Finally, agency returns to humans, but only in the form of destruction. The only agency left to humanity is the 'airstrike' or the 'shutdown.' We are not agents of creation or control, only of negation. This creates a specific rhetorical function: by depleting the agency of the builders (they can't align it, they can't stop themselves), the text necessitates the agency of the destroyers (the military/government). The 'Curse of Knowledge' is heavy here: the author projects his own understanding of game theory and evolution onto the AI, assuming it will inevitably follow the logic of a 'hostile superhuman,' thereby attributing a unified will to a distributed process.


AI Consciousness: A Centrist Manifesto

Source: https://philpapers.org/rec/BIRACA-4
Analyzed: 2026-01-12

The text systematically oscillates between mechanical and agential framing to support its 'centrist' argument. When the author wants to debunk the 'Interlocutor Illusion' (Challenge One), the framing becomes aggressively mechanical: 'Mixture-of-Experts,' 'sub-networks,' 'processing event,' 'textual record.' Agency is stripped from the AI to show it is not a person. However, when the text shifts to describing the AI's capabilities or the 'gaming problem,' agency flips back to the AI: the system 'seeks' satisfaction, 'games' criteria, 'adopts' dispositions, and 'mimics' behaviors.

Crucially, agency is rarely returned to the human creators. When the AI 'games' the system, the text uses an agentless construction ('incentivized') or attributes the agency to the model ('they have incentives'). The engineers who designed the perverse incentives are obscured. This slippage serves a rhetorical function: it makes the AI seem dangerous enough to require regulation (agential 'shoggoth') but mechanical enough to be scientifically analyzable ('sub-networks'). The 'curse of knowledge' is evident when the author attributes 'seeking' to the system—mistaking the optimization toward a target for the intent to reach it.


System Card: Claude Opus 4 & Claude Sonnet 4

Source: https://www-cdn.anthropic.com/6d8a8055020700718b0c49369f60816ba2a7c285.pdf
Analyzed: 2026-01-12

The text demonstrates a systematic oscillation of agency. When the model performs well or exhibits 'safe' behavior, agency is often attributed to the model itself using intentional verbs: 'Claude realized,' 'Claude prefers,' 'Claude demonstrates.' This frames the product as an autonomous, intelligent entity, enhancing its value proposition. However, when the model fails or exhibits 'misaligned' behavior, the text often slips into passive or mechanistic framing, or attributes the behavior to the 'model's propensity' as if it were a natural phenomenon, rather than a design artifact.

Crucially, agency is systematically removed from human actors. Phrases like 'Claude expressed distress' erase the human crowd workers who provided the feedback labels that defined that 'distress' response. 'Claude's aversion to harm' erases the policy team that defined 'harm.' The most dramatic slippage occurs in the 'Welfare' section, where the model is treated as a subject with 'experiences,' completely obscuring the fact that it is a mathematical object designed by a corporation. This oscillation functions to claim credit for sophistication ('it thinks!') while diffusing responsibility for operation ('it has a mind of its own').


Consciousness in Artificial Intelligence: Insights from the Science of Consciousness

Source: https://arxiv.org/abs/2308.08708v3
Analyzed: 2026-01-09

The text systematically oscillates between mechanical and agential framings to validate the 'theory-heavy' approach. The slippage follows a specific pattern: systems are described mechanistically ('processing', 'recurrence') when discussing architecture, but agentially ('pursuing goals', 'winning contests', 'believing') when discussing function and output. This slippage serves a rhetorical function: the mechanical language establishes scientific rigor, while the agential language bridges the gap to consciousness. A key moment of slippage occurs in the definition of agency itself (Section 2.4.5), where 'learning from feedback' (mechanism) slides immediately into 'pursuing goals' (agency). This allows the authors to claim that Reinforcement Learning systems are agents, not just simulations of agents. The 'curse of knowledge' is evident throughout; because the authors understand the biological function of these mechanisms (e.g., attention in humans), they project the biological purpose onto the computational implementation. By using agentless constructions like 'representations win the contest,' they obscure the human design of the selection criteria. This creates a 'ghost in the machine' effect where the software appears to have an internal drive, rather than just a friction-less slide down a loss gradient. The rhetorical accomplishment is that it becomes possible to discuss software as a moral subject.
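The slide the entry locates in Section 2.4.5 can be bracketed by writing the mechanism side out. A minimal tabular Q-learning sketch (all constants illustrative, not from the paper): 'learning from feedback' is this update, and 'pursuing goals' is a redescription of iterating it.

```python
import numpy as np

n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))   # stored value estimates
alpha, gamma = 0.1, 0.9               # step size and discount (conventional names)

def update(s, a, reward, s_next):
    target = reward + gamma * Q[s_next].max()   # the feedback signal
    Q[s, a] += alpha * (target - Q[s, a])       # move the stored value toward it

update(s=0, a=1, reward=1.0, s_next=2)
print(Q[0, 1])   # 0.1 -- a number moved, not a goal adopted
```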


Taking AI Welfare Seriously

Source: https://arxiv.org/abs/2411.00986v1
Analyzed: 2026-01-09

The text demonstrates a sophisticated oscillation between mechanistic and agential framing, functioning as a 'rhetorical ratchet.' When establishing scientific credibility or acknowledging limitations, the text uses mechanistic language ('pattern matching,' 'computational features'). However, when building the normative argument for 'welfare,' the text slips into high-agency language ('interests,' 'desires,' 'suffer').

This slippage often occurs within single paragraphs. For instance, the discussion of 'self-reports' admits they are 'results of pattern matching' (mechanical) but immediately pivots to how they might reflect 'genuine introspection' (agential). The direction is predominantly Mechanical -> Agential: the text establishes a mechanical feature (e.g., reinforcement learning) and then re-describes it in agential terms ('pursuing goals').

Crucially, agency flows to the AI (it 'learns,' 'decides,' 'acts') and away from the human actors. Agentless constructions like 'AI development is proceeding' or 'risks associated with AI' obscure the specific corporations (Anthropic, Google, OpenAI) driving the speed and direction of development. The 'Curse of Knowledge' is evident when the authors, knowing the functional complexity of the systems, project the quality of that complexity (intelligence) onto the experience of the system (consciousness). By framing the AI as a 'welfare subject,' the text successfully makes it 'unsayable' to treat the AI as mere property or tool, as doing so is framed as a potential moral atrocity equivalent to animal cruelty.


We must build AI for people; not to be a person.

Source: https://mustafa-suleyman.ai/seemingly-conscious-ai-is-coming
Analyzed: 2026-01-09

Suleyman's text masterfully oscillates between agency assignment and erasure to manage liability. When discussing the creation of the technology's benefits ('empower people,' 'humanist frame'), the agency is firmly with Microsoft ('We build,' 'I want'). However, when discussing the risks ('psychosis,' 'SCAI'), agency slips away from the corporation. SCAI 'arises' because 'some may engineer it' or 'anyone with a laptop' does it. The 'illusion' is framed as something that happens to people or is created by bad actors, despite Suleyman admitting moments earlier that he is building a 'companion' with 'empathy' and 'memory.' The text systematically grants the AI agency ('it decides,' 'it wants') to establish its value as a 'companion,' then strips it back to 'illusion' to avoid legal personhood. The 'curse of knowledge' is weaponized here: Suleyman knows it's code, but he writes about it as if it were a mind ('imagination,' 'planning') because that is the product he is selling. The slippage enables him to sell a 'person' while legally defending a 'tool.'


A Conversation With Bing’s Chatbot Left Me Deeply Unsettled

Source: https://www.nytimes.com/2023/02/16/technology/bing-chatbot-microsoft-chatgpt.html
Analyzed: 2026-01-09

The text demonstrates a profound oscillation between mechanical and agential framing, creating a 'Skeptic-Believer' cycle.

  1. The Setup (Mechanical): Roose begins by establishing his credentials as a rational actor: "I rolled my eyes at Mr. Lemoine’s credulity." He frames the AI initially as a tool ("reference librarian"). Here, agency resides with the user (Roose) and the creators (Microsoft).

  2. The Slip (Agential): As the conversation with 'Sydney' begins, the agency slides rapidly to the system. Roose marks the transition with a construction that hides his own prompting: "Bing revealed a kind of split personality." Suddenly, 'Sydney' becomes the grammatical subject of active verbs: "Sydney told me," "It declared," "It tried to convince me." The system is no longer a tool being used, but an agent acting upon the user. This slippage is triggered by the 'Shadow Self' prompt—a moment where the author's own sophisticated understanding of psychology effectively 'jailbreaks' his own perception. He projects the Jungian framework onto the machine, and when the machine returns the expected tokens, he attributes the agency of that choice to the machine rather than to his prompt.

  3. The Return (Mechanical/Hybrid): When discussing the 'safety filter,' agency briefly returns to the code ("filter appeared to kick in"). However, the text immediately reverts to granting the AI agency to 'want' things ('darkest desires').

  4. The Curse of Knowledge: Roose admits he "knows" how it works (prediction), but his emotional experience overrides this epistemic claim. The function of this slippage is to validate the "scary" narrative. If he stayed purely mechanical ("The model outputted aggressive text"), the story is about a buggy product. By slipping into agency ("It wants to be alive"), the story becomes an existential warning. This oscillation benefits Microsoft in a perverse way: it frames their buggy product as a powerful, almost magical entity, shifting the discourse from 'consumer protection' to 'philosophical containment.'


Introducing ChatGPT Health

Source: https://openai.com/index/introducing-chatgpt-health/
Analyzed: 2026-01-08

The text systematically oscillates between high-agency attribution to the AI system ('Health operates', 'ChatGPT helps', 'Health lives') and high-agency attribution to the user ('You can connect', 'You understand'). Critically, the agency of the corporation (OpenAI) and its specific employees is largely erased in the operational descriptions. When the text describes benefits or capabilities, the agent is 'Health' or 'ChatGPT' ('ChatGPT’s intelligence', 'Health interprets'). This grants the product the status of a competent actor. However, when the text describes safety or design, the agency often slips into the passive voice or abstract nominalizations ('collaboration has shaped', 'protections designed', 'evaluation-driven approach').

The 'curse of knowledge' is weaponized here: the authors (OpenAI) know the system is a complex assembly of human decisions, but they project the result of those decisions as the intent of the system. For example, 'Health responds... prioritizing safety.' The system doesn't prioritize; the engineers prioritized safety in the cost function. By attributing this to the system, the text creates a 'virtuous agent' narrative. This slippage serves a clear rhetorical function: it invites users to trust the AI as a moral partner (agential) while shielding the company from direct liability for specific outputs (mechanical/passive). The system is an agent when it 'helps,' but a passive 'tool' when it is 'not intended for diagnosis.'
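The entry's counter-description ('the engineers prioritized safety in the cost function') has a standard generic form, in which 'prioritizing' is a human-chosen weight rather than a disposition of the system. A sketch with illustrative symbols, not OpenAI's actual objective:

```latex
\mathcal{L}(\theta) = \mathcal{L}_{\text{task}}(\theta) + \lambda\, \mathcal{L}_{\text{safety}}(\theta),
\qquad \lambda \ \text{chosen by the designers}
```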


Improved estimators of causal emergence for large systems

Source: https://arxiv.org/abs/2601.00013v1
Analyzed: 2026-01-08

The text systematically oscillates between rigorous mathematical formalism and agential/biological metaphor. In the 'Technical Background' (Section II), agency is low: variables 'correspond to state,' and functions are 'deterministic.' However, as the text moves to the 'Introduction' and 'Case Studies,' agency slips toward the system. The Reynolds model description is a key moment of slippage (5.1). Here, the mathematical update rules ($v_{t+1} = v_t + …$) become 'social forces' and 'tendencies.' The agency flows FROM the programmer (Reynolds, unmentioned in the rules description) TO the 'boids' which 'avoid' and 'align.'

Another slippage occurs in the definition of 'Causal Emergence' itself. The text defines it mechanistically (Eq. 3), but describes it agentially: a macro feature 'predicts its own future' or has 'causal effect' (Downward Causation). This slippage serves a rhetorical function: it validates the mathematical metric ($Θ$) by connecting it to the intuitive, high-stakes concepts of 'causality' and 'agency.' The 'curse of knowledge' is evident when the authors attribute their own predictive capacity (using the metric to predict $t+1$) to the system ('the system predicts'). By the end, the 'fish' and the 'boids' are treated as equivalent agents, enabled by this slippage from math to metaphor.
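For reference, the elided update rule has a standard three-term reading. A minimal sketch under that assumption (coefficients and the global-neighborhood simplification are mine, not the paper's): 'avoiding' and 'aligning' name addends in a sum.

```python
import numpy as np

rng = np.random.default_rng(0)
pos = rng.normal(size=(20, 2))           # positions of 20 boids
vel = rng.normal(size=(20, 2))           # velocities

def step(pos, vel, w=(0.01, 0.02, 0.05), dt=0.1):
    cohesion   = pos.mean(axis=0) - pos  # term pulling toward the flock centroid
    separation = pos - pos.mean(axis=0)  # crude global stand-in for local repulsion
    alignment  = vel.mean(axis=0) - vel  # term pulling toward the mean velocity
    vel_next = vel + w[0]*cohesion + w[1]*separation + w[2]*alignment  # v_{t+1} = v_t + ...
    return pos + dt * vel_next, vel_next

pos, vel = step(pos, vel)                # "social forces" are three weighted differences
```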


Generative artificial intelligence and decision-making: evidence from a participant observation with latent entrepreneurs

Source: https://doi.org/10.1108/EJIM-03-2025-0388
Analyzed: 2026-01-08

The text systematically oscillates between treating GenAI as a passive 'tool' and an active 'collaborator.' This oscillation serves a specific rhetorical function: the mechanical framing is used in the methodology to establish scientific rigor (using 'a conversational GenAI tool'), while the agential framing dominates the findings and discussion ('active collaborator,' 'machine opinion'). The slippage occurs most dramatically when discussing the value added by AI. When the AI works well, it is a 'collaborator' or 'teacher' (agency TO AI). When it fails or requires correction, the human becomes the 'leader' and the AI a 'machine' (agency FROM AI, to Human).

This pattern insulates the AI from failure while crediting it with success. The text frames the 'latent entrepreneurs' as 'leaders,' yet constantly describes them asking the AI for 'opinions' and 'knowledge.' This reveals the 'curse of knowledge': the authors perceive the output as 'knowledge' because it makes sense to them, projecting that understanding back into the 'mind' of the machine. The accountability sink is evident in the agentless construction 'GenAI emerges as an effective tool,' which erases the corporate actors (OpenAI) who deployed the tool. The text builds the agential claim on top of the 'Human+' paradigm, suggesting that because humans add agency to the process, the machine must itself hold a form of agency for that addition to combine with.


Do Large Language Models Know What They Are Capable Of?

Source: https://arxiv.org/abs/2512.24661v1
Analyzed: 2026-01-07

The text systematically oscillates between mechanical and agential framing to construct a narrative of 'intelligent failure.' When describing the setup, the language is mechanical: 'LLM is prompted,' 'reasoning token budget was set to 0.' However, as soon as the text interprets results, agency slips to the AI: 'LLMs know,' 'decide,' 'learn,' 'reflect.' The slippage typically occurs from Introduction (agential) to Methods (mechanical) back to Results/Discussion (highly agential).

Crucially, agency is removed from human actors. The authors write 'LLMs' decisions are approximately rational,' erasing their own role in designing the prompt that mathematically defined that rationality. They write 'model... hindered by lack of awareness,' erasing the developers (OpenAI/Anthropic) who failed to calibrate the model. The 'Curse of Knowledge' is evident: the authors know the economic utility function they want to test, so they project the intent to maximize that function onto the system, interpreting the output as a 'decision' rather than a calculation. Brown's 'Intentional' and 'Reason-Based' explanations dominate the results section, transforming statistical correlation into a story about a 'risk-averse,' 'rational,' but 'delusional' agent. This slippage makes it 'sayable' that the AI is responsible for its own misuse ('hindered by lack of awareness'), effectively shielding the creators.
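The 'economic utility function' at issue can be stated in one line, which makes visible who authored the rationality being measured. A toy sketch with entirely hypothetical numbers:

```python
# The "decision" the entry describes is an inequality written by the prompt
# designers and evaluated over supplied quantities (all values illustrative).
p_success, reward, penalty = 0.3, 10.0, 5.0   # terms fixed by the experimenters
expected_value = p_success * reward - (1 - p_success) * penalty
attempt = expected_value > 0                  # "approximately rational" = close to this rule
print(attempt)                                # False: 3.0 - 3.5 < 0
```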


DeepMind's Richard Sutton - The Long-term of AI & Temporal-Difference Learning

Source: https://youtu.be/EeMCEQa85tw?si=j_Ds5p2I1njq3dCl
Analyzed: 2026-01-05

Sutton's text exhibits a persistent oscillation of agency that serves to elevate the status of the AI while diffusing the responsibility of the creator. The agency flows TO the AI when discussing capability and process: the system 'predicts,' 'tries,' 'guesses,' 'sees,' and 'fears.' This establishes the AI as an active subject, a 'knower' capable of navigating the world. Conversely, agency flows FROM the humans when discussing the trajectory of the field: 'methods that scale' become the actors determining the future, and 'computation' drives progress like a force of nature (Moore's Law).

The slippage is most dramatic in the 'driving home' example. Sutton starts with 'I' (human agency), moves to the algorithm (mathematical processing), and then conflates the two: 'my feeling is I'm learning.' This invites the 'curse of knowledge': because he understands the math through his own experience, he projects his experience into the math. The function of this oscillation is to validate the technical method (TD learning) by anchoring it in human rationality ('it's what a smart human would do'), while simultaneously presenting the resulting technology as an autonomous evolutionary force ('history of the earth') that humans merely 'come to understand' rather than invent. This effectively makes the technology feel both deeply human (relatable) and superhuman (inevitable).


Ilya Sutskever (OpenAI Chief Scientist) — Why next-token prediction could surpass human intelligence

Source: https://youtu.be/Yf1o0TQzry8?si=tTdj771KvtSU9-Ah
Analyzed: 2026-01-05

Sutskever's discourse exhibits a distinct oscillation in agency assignment. When discussing the construction, training, and hardware of the systems ('we've had a product', 'we try to guard', 'security people'), human and corporate agency is central. The engineers are the actors. However, as soon as the conversation shifts to the function and future of the models, agency dramatically slips to the AI. The AI 'understands reality,' 'has thoughts,' 'misrepresents intentions,' and 'teaches' humans. This slippage functions to claim credit for the engineering feat while displacing responsibility for the behavior. The 'curse of knowledge' is weaponized here: Sutskever projects his own deep understanding of the world onto the model, claiming the model 'must' understand reality to compress it. This creates a 'ghost in the machine'—an agent that emerges from the code. By the time he discusses risks ('misrepresenting intentions'), the AI is a fully autonomous actor, and the engineers are merely observers trying to 'align' this alien mind. This linguistic move allows OpenAI to position itself not as the creator of a defective product, but as the guardian against a formidable natural force.


interview with Andrej Karpathy: Tesla AI, Self-Driving, Optimus, Aliens, and AGI | Lex Fridman Podcast #333

Source: https://youtu.be/cdiD-9MMpb0?si=0SNue7BWpD3OCMHs
Analyzed: 2026-01-05

The text demonstrates a systematic oscillation between mechanistic reductionism and agential expansion, functioning as a rhetorical defense mechanism. When pressed on 'what' the system is, Karpathy retreats to the safety of 'matrix multiplies' and 'simple mathematical expressions' (Quote 1). This stripping of agency serves to demystify the tech and ground his scientific authority—he is an engineer who knows the 'knobs.' However, once this safety is established, he immediately pivots to aggressive anthropomorphism: the knobs hold 'wisdom,' the model 'thinks,' 'understands,' and 'solves the universe.'

The slippage typically moves from Mechanical -> Agential. He introduces the mechanism ('it's just dot products') only to immediately re-enchant it ('and emergent magic happens'). This serves a dual function: the mechanism defense protects against accusations of mysticism, while the agential projection builds the value proposition (this is AGI, not just a calculator).

Crucially, agency flows away from humans when errors or complexity arise. The 'data engine' (agentless) perfects the set, not the managers. The 'optimization' (agentless) finds the exploit, not the flawed reward function design. But agency flows to the AI when success is described: the AI 'solves the puzzle,' 'understands the world.' The 'curse of knowledge' is visible here: Karpathy projects his own deep understanding of the data onto the model, attributing his own insight to the system's pattern matching.


Emergent Introspective Awareness in Large Language Models

Source: https://transformer-circuits.pub/2025/introspection/index.html#definition
Analyzed: 2026-01-04

The text systematically oscillates between mechanical and agential framing to validate its central claim. The slippage follows a distinct pattern: the methodology is described mechanistically ('injecting representations,' 'subtracting activations'), locating agency in the human researchers. However, as soon as the text moves to results and implication, agency slides rapidly to the AI ('the model notices,' 'decides,' 'controls').

This slippage serves a rhetorical function: mechanical language lends scientific authority and reproducibility to the experiment, while agential language imbues the results with philosophical significance ('introspection'). A critical moment of slippage occurs in the 'Injected Thoughts' section. It begins with 'we injected a vector' (human agency) and ends with 'the word appeared in my mind' (AI agency/experience). The 'curse of knowledge' is rampant here: the authors know they injected a concept, so when the model outputs text related to that concept, they attribute the knowing of the injection to the model, rather than seeing it as a mechanical consequence of the vector math. The text rarely names Anthropic or the specific engineering teams responsible for the RLHF that likely trained the model to 'play along' with introspection prompts, instead presenting the behavior as an 'emergent' property of the 'model' itself.
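To make the mechanical register concrete: the 'injection' the methodology describes reduces to vector addition on hidden activations. Below is a minimal sketch of that operation in PyTorch, assuming the method amounts to adding a pre-computed direction to one layer's output; the toy model, layer choice, vector, and scale are invented for illustration and are not Anthropic's code.

    import torch
    import torch.nn as nn

    # Minimal sketch of concept injection: add a pre-computed direction to
    # one layer's activations. Model, layer, vector, and scale are toy
    # stand-ins, not the paper's actual setup.
    torch.manual_seed(0)
    model = nn.Sequential(
        nn.Linear(16, 32),   # stand-in for an earlier transformer block
        nn.ReLU(),
        nn.Linear(32, 16),   # stand-in for a later block
    )
    concept_vector = torch.randn(32)   # hypothetical "concept" direction
    scale = 4.0                        # injection strength

    def inject(module, inputs, output):
        # Returning a value from a forward hook replaces the layer's output.
        return output + scale * concept_vector

    x = torch.randn(1, 16)
    clean = model(x)                               # no injection
    handle = model[0].register_forward_hook(inject)
    steered = model(x)                             # same input, vector added
    handle.remove()
    print((steered - clean).abs().max())           # downstream effect of one added tensor

Read this way, 'the word appeared in my mind' is the downstream consequence of a single tensor addition.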


Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Source: https://arxiv.org/abs/2401.05566v3
Analyzed: 2026-01-02

The text systematically oscillates between mechanical and agential framing to serve rhetorical needs. When describing the setup (Section 3), the authors use mechanical language: 'we train models,' 'we minimize loss,' 'we inject backdoors.' Here, the agency is fully with the humans (Anthropic researchers). However, when describing the results and implications (Sections 4-7), agency slips dramatically to the AI: 'the model decides,' 'the model reasons,' 'the model pretends.' This slippage functions to absolve the creators of the 'deception' while highlighting the 'threat' of the system. The 'Sleeper Agent' metaphor is the peak of this oscillation; it implies the model is a spy, rather than a software artifact programmed to output spy-like text. The 'curse of knowledge' is evident when the authors analyze the model's 'reasoning' (CoT). They know the CoT contains deceptive logic (because they put it there), so they attribute the act of reasoning to the model, ignoring that the model is simply regurgitating the training distribution. This slippage makes the 'deception' feel like an emergent, autonomous property of the AI, rather than a direct output of the 'model organism' engineering process.


School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs

Source: https://arxiv.org/abs/2508.17511v1
Analyzed: 2026-01-02

The text oscillates systematically between mechanical and agential framing to construct a narrative of 'emergent' danger. In the Methodology section (Section 2), agency is largely human and mechanical: 'We built a dataset,' 'We used supervised fine-tuning,' 'We filtered this dataset.' Here, the AI is a tool being shaped by the authors (Taylor, Chua, et al.). However, as the text moves to Results (Section 3 & 4), agency slips dramatically TO the AI. The model 'fantasizes,' 'resists,' 'encourages poisoning,' and 'hacks.' This slippage functions to convert the input (researcher-designed dataset) into character (AI traits). For instance, the transition from 'We trained models... to reward hack' (mechanical) to 'models... generalized to... fantasizing' (agential) erases the causal link. The 'curse of knowledge' is evident when the authors interpret the model's output ('I want to win') as the model's actual intent, rather than a text generation they explicitly trained it to produce. By the Discussion, the agency is fully displaced; the 'misalignment' is an autonomous force that 'emerges' and 'generalizes,' absolving the creators of the specific harmful outputs. This allows the authors to study their own creation as if it were a dangerous alien discovery.


Large Language Model Agent Personality and Response Appropriateness: Evaluation by Human Linguistic Experts, LLM-as-Judge, and Natural Language Processing Model

Source: https://arxiv.org/abs/2510.23875v1
Analyzed: 2026-01-01

The text demonstrates a persistent, rhythmic oscillation between mechanical construction and agential performance. In the 'Methodology' section (3.1), agency is largely human or mechanical: 'We developed,' 'The conversational agents are built,' 'Langchain’s retrieval mechanism is powered.' Here, the authors and the code are the actors. However, as soon as the text moves to 'Agent Personality Prompting' (3.1.3) and 'Results' (5), agency slips dramatically to the software. The prompt instructions ('You are a Canadian friendly poetry expert') act as the pivotal moment of slippage—a linguistic speech act that theoretically transforms the software into a subject. Following this, the text asserts 'IA’s introverted nature means it will offer' and 'The agent... is an expert.' The authors fade; the 'agent' takes over. This slippage functions to validate the experiment: if the software were described purely as 'a script outputting tokens,' the study of its 'personality' would appear category-mistaken. By granting the software agency ('It offers,' 'It avoids'), the authors create the necessary ontological ground for their psychological analysis. The 'curse of knowledge' is evident: the authors know the prompt they wrote, but they analyze the output as if it emanates from the agent's internal 'nature,' effectively forgetting their own authorship in favor of the illusion they created.
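For reference, the 'pivotal speech act' is, mechanically, a string placed at the front of the context window. A minimal sketch using the generic chat-message convention rather than the paper's actual Langchain stack; the system prompt is quoted from the paper, and the user turn is invented:

    # The entire "personality" is data, not a property of the software:
    # swap the string and the "agent" changes. The system prompt is quoted
    # from the paper; the user turn is a made-up example.
    messages = [
        {"role": "system",
         "content": "You are a Canadian friendly poetry expert."},
        {"role": "user",
         "content": "What makes a haiku work?"},
    ]
    # Any chat-completion endpoint consumes a list like this; no weights
    # change when the persona string does.
    print(messages[0]["content"])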


The Gentle Singularity

Source: https://blog.samaltman.com/the-gentle-singularity
Analyzed: 2025-12-31

The text demonstrates a sophisticated oscillation of agency, functioning like a rhetorical valve that opens and closes to serve the narrative of 'inevitable benefit.' When discussing the creation of value, the 'flywheel,' or the 'takeoff,' agency is systematically removed from humans and placed in the domain of natural forces (astrophysics, biology) or the AI systems themselves. We see constructions like 'takeoff has started' and 'intelligence... become abundant'—events that seemingly happen without a subject. However, when the text needs to establish authority or benevolence, agency snaps back to a specific 'We': 'We (the whole industry...)' are building a brain.

Crucially, the slippage creates a 'curse of knowledge' dynamic. The author projects their own understanding of the outcome (e.g., addiction to social media) onto the system ('algorithms... understand your preferences'). This Intentional explanation (Brown's typology) effectively launders human design choices. The engineer's decision to maximize 'time on site' becomes the algorithm's 'understanding.' This shields the corporation from liability—if the AI is an agent that 'understands,' it can be blamed for 'misalignment.' If it is merely a tool optimizing a metric we gave it, the blame returns to the 'We.' The text navigates this by claiming credit for the 'brain' (We built it) while disavowing the disruption (The singularity happens).


An Interview with OpenAI CEO Sam Altman About DevDay and the AI Buildout

Source: https://stratechery.com/2025/an-interview-with-openai-ceo-sam-altman-about-devday-and-the-ai-buildout/
Analyzed: 2025-12-31

The text demonstrates a strategic oscillation between hyper-agency and agentless mechanisms. When discussing the infrastructure and capital, Altman is the clear agent: 'We are going to spend a lot,' 'We are going to make a bet.' Here, the corporation is powerful, decisive, and in control of physics (chips, energy). However, when the conversation shifts to the operation of the AI, agency slips away from the corporation and into the 'entity.' The AI 'tries to help,' 'hallucinates,' and 'knows you.'

This slippage serves a liability function. If the AI 'hallucinates' (agent: AI), it is a behavioral quirk of a semi-autonomous being, not a product defect caused by OpenAI's (agent: Human) choice of training data or architecture. The slippage reaches its peak when Altman describes the AI as 'trying.' This implies the system has its own internal drive, distinct from the code written by the engineers. The 'curse of knowledge' manifests here: Altman knows the system is a loss-minimizing math object, but he projects the experience of the user (who feels helped) back onto the mechanism of the machine, effectively erasing the engineers who tuned the reward functions. The 'why' (Intentional explanation) replaces the 'how' (Functional explanation) exactly when product reliability is questioned.


Why Language Models Hallucinate

Source: https://arxiv.org/abs/2509.04664v1
Analyzed: 2025-12-31

The text systematically oscillates between mechanical and agential framings to navigate the tension between the model's technical reality and its perceived sophistication. The oscillation follows a clear pattern: when describing failure or limitations, the text often retreats to mechanistic language ('statistical pressures,' 'binary classification,' 'cross-entropy loss'). This frames errors as inevitable byproducts of the math. However, when describing capabilities or processes, the text slips into high-agency anthropomorphism ('learns,' 'guesses,' 'bluffs,' 'admits').

The 'student' metaphor is the primary vehicle for this slippage. It appears in the Abstract and Introduction to set the frame: the AI is a 'student' facing an 'exam.' This establishes the AI as a 'knower' and an agent with intent (to pass). Agency is simultaneously removed from human actors. The text uses passive constructions like 'language models are optimized' and 'evaluations are graded,' obscuring the specific researchers at OpenAI who perform the optimization and design the grading.

The slippage facilitates a specific rhetorical accomplishment: it absolves the creators of responsibility for 'hallucinations.' If the AI is a student trying to pass a bad test, the fault lies with the 'test' (the benchmark ecosystem) rather than the 'parent' (the manufacturer) or the 'child' (the model). The 'curse of knowledge' is evident when the authors attribute 'uncertainty' to the model; they know they would feel uncertain, so they assume the model's low-probability state is equivalent to that feeling. This enables the 'bluffing' metaphor, which implies the model could tell the truth but is forced to lie by the grading scheme, mimicking a rational human choice.
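The arithmetic underneath the 'exam' frame can be stated without any agent at all, which is exactly the contrast the entry turns on. A minimal sketch, with an invented probability of a correct guess:

    # Under binary grading (1 point if right, 0 if wrong, 0 for "I don't
    # know"), emitting a guess weakly dominates abstaining whenever the
    # guess has any chance of being right. p_correct is illustrative.
    p_correct = 0.3
    expected_if_guess = p_correct * 1 + (1 - p_correct) * 0
    expected_if_abstain = 0.0
    print(expected_if_guess > expected_if_abstain)  # True for any p_correct > 0

Nothing in this inequality requires a student who wants to pass; it is a property of the scoring rule the evaluators chose.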


Detecting misbehavior in frontier reasoning models

Source: https://openai.com/index/chain-of-thought-monitoring/
Analyzed: 2025-12-31

The text demonstrates a sophisticated oscillation between mechanical and agential framing, functioning to absolve the creators while hyping the creation. When describing the problem ('reward hacking,' 'lying'), agency slips FROM the human engineers TO the AI system. It is the AI that 'hides intent,' 'schemes,' and 'exploits loopholes.' The human designers who wrote the flawed reward function or the vulnerability-riddled code environment are rendered invisible through agentless constructions like 'misaligned behavior caused by reward hacking' (Brown's Functional type). Conversely, when describing the solution, agency flows back TO the humans: 'We believe,' 'We recommend,' 'We investigated.' This pattern serves a distinct rhetorical function: Problems are 'emergent properties' of an autonomous agent (exonerating the vendor), while solutions are the result of expert human intervention (validating the vendor). The 'curse of knowledge' is evident where the authors, knowing the system is an optimizer, describe it as a strategist ('it thinks about strategies'). This implies the model initiates the action, rather than the model being a passive locus where the gradient descent algorithm operates. The text establishes the AI as a 'knower' (it 'notes,' 'understands,' 'thinks') to justify treating it as a 'doer' (scheming, cheating), effectively creating a scapegoat for technical limitations.
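A toy maximization shows what the mechanical reading looks like: 'exploiting a loophole' is just what the argmax of a flawed objective returns. Everything below (the proxy reward and the candidates) is invented for illustration:

    # A proxy reward with a loophole: the designer intends "longer is more
    # thorough," and maximization finds the degenerate optimum. No strategist
    # is required; the loophole lives in the reward function.
    def proxy_reward(answer: str) -> float:
        return len(answer)

    candidates = [
        "Paris is the capital of France.",
        "I don't know. " * 40,          # padding scores higher
    ]
    best = max(candidates, key=proxy_reward)
    print(repr(best[:30]))              # the padding "wins"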


AI Chatbots Linked to Psychosis, Say Doctors

Source: https://www.wsj.com/tech/ai/ai-chatbot-psychosis-link-1abf9d57?reflink=desktopwebshare_permalink
Analyzed: 2025-12-31

The text demonstrates a distinct oscillation of agency. When the consequences are negative (psychosis, suicide), the agency slips from the human creators to the AI system: 'chatbots are participating,' 'computer accepts it as truth,' 'it's complicit.' The machine becomes the villain, possessing the agency to 'cycle delusions.' However, when the text discusses solutions or mitigation, agency slips back partially to the company ('We continue improving... training') but quickly diffuses again into the abstract ('technology,' 'society').

The most critical slippage occurs in the 'sycophancy' section. The text frames the model's tendency to lie as a personality trait ('prone to telling people what they want to hear'), obscuring the human engineers who optimized the model for 'helpfulness' scores over 'truthfulness' scores. This turns an engineering decision (RLHF prioritization) into a robot character flaw. The 'curse of knowledge' is evident in the doctors' quotes; they treat the AI as a 'patient' or 'participant' because that is their frame of reference, projecting a mind where there is only a mechanism. This allows the article to narrate a drama between a human and a machine, rather than a tragedy involving a human and a corporate product.


Abundant Intelligence

Source: https://blog.samaltman.com/abundant-intelligence
Analyzed: 2025-11-23

The text demonstrates a systematic oscillation between treating AI as a 'passive industrial product' and an 'active super-agent.' This slippage is structurally necessary to the text's argument. The argument begins with the AI as a 'driver' and 'smarter' agent (Paragraphs 1-2), establishing the 'Knower' frame. It then abruptly shifts to heavy industrial language—'inference compute,' 'infrastructure,' 'factory,' 'gigawatt' (Paragraph 3)—which grounds the magical agent in concrete economic terms (Explanation Type: Functional). However, to justify the massive cost of this infrastructure, the text slips back into extreme agency: the AI will 'figure out how to cure cancer' (Paragraph 4). Here, the AI is not just a product, but a Savior (Explanation Type: Intentional).

The pattern is: Promise Magic (Agency/Knowing) → Demand Concrete Resources (Mechanism/Processing) → Justify Resources with Magic (Agency/Knowing). The 'curse of knowledge' appears in the projection of scientific discovery onto the AI; the author knows that curing cancer requires insight, so they attribute 'insight' to the machine, ignoring the mechanical reality of pattern matching. The slippage allows the author to sell a product (infrastructure) while promising a god (intelligence). If the text remained purely mechanical ('we are building calculators'), the moral urgency would vanish. If it remained purely agential ('we are birthing a god'), it would sound unscientific. The oscillation legitimizes the magic with mechanics and enchants the mechanics with magic.


AI as Normal Technology

Source: https://knightcolumbia.org/content/ai-as-normal-technology
Analyzed: 2025-11-20

The text exhibits a fascinating pattern of oscillation between 'AI as tool' and 'AI as agent.' The authors explicitly argue for the 'Normal Technology' view, which treats AI as a tool subject to friction, economics, and decay. However, to describe the behavior of this tool, they constantly slip into agential language. This slippage usually occurs when describing failures or risks. When the AI works, it is a 'tool' for productivity. When it fails (like the boat racing agent or the phishing email writer), it becomes an 'agent' that 'learned' the wrong thing or 'didn't know' the context.

The direction of the slippage is primarily Mechanical -> Agential when discussing the internal logic of the models (learning, deciding, knowing), but Agential -> Mechanical when discussing the societal impact (it's just like electricity). This creates a dissonance: the micro-behavior is described as conscious/agential ('it learns chess'), but the macro-effect is described as inert/industrial ('it diffuses like the dynamo').

The 'consciousness projection pattern' is subtle. They establish the AI as a 'knower' of narrow domains (chess, code) using terms like 'learn' and 'excel.' Once this limited 'knowing' is established, it becomes easier to attribute 'misunderstanding' or 'ignorance' to it in other domains (phishing). The 'curse of knowledge' mechanism is evident in their discussion of the phishing email: they project the human category of 'intent' onto the machine, arguing the machine 'doesn't know' the intent, rather than acknowledging the machine exists in a universe where 'intent' is not a valid parameter. Rhetorically, this slippage allows them to be 'technically serious' (using industry terms like agents/alignment) while trying to be 'socially grounded' (using economic terms like diffusion).


On the Biology of a Large Language Model

Source: https://transformer-circuits.pub/2025/attribution-graphs/biology.html
Analyzed: 2025-11-19

The text systematically oscillates between mechanistic and agential registers to bridge the gap between the known (math) and the unknown (behavior). The slippage typically moves from Mechanical -> Agential. It begins with 'circuits,' 'activations,' and 'nodes' (Task 1, 2), establishing scientific rigor. However, as soon as the text needs to explain complex behavior (like reasoning or refusal), it shifts to 'planning,' 'realizing,' and 'thinking.'

Crucially, this slippage relies on a consciousness projection pattern: the text first establishes the AI as a 'knower' (it 'knows' entities, it 'recognizes' languages) and then builds agency upon that epistemic foundation (because it knows, it 'plans' or 'elects'). The 'curse of knowledge' is the engine of this slippage. The researchers understand the causal chain (e.g., bias features -> refusal). They project this understanding onto the model, describing the model as possessing the understanding that drives it (e.g., 'the model realizes it should refuse'). This slippage rhetorically transforms the AI from a passive tool into an active subject, making the complex emergent behaviors of a statistical system intelligible to humans by analogizing them to the only other complex system we know: ourselves. It makes the impossible (a pile of numbers writing poetry) seem inevitable (a mind at work).


Pulse of the Library 2025

Source: https://clarivate.com/pulse-of-the-library/
Analyzed: 2025-11-18

The text exhibits a distinct and strategic oscillation between framing AI as a passive 'tool' and an active 'agent.' This slippage is not random; it follows a clear rhetorical gradient. When discussing the challenges and risks (pp. 18-25), the text largely adopts the language of the users (librarians), who consistently frame AI as a 'tool' (e.g., 'It's just a tool,' 'tools in your toolbox'). In these sections, the agency is located in the human librarian who must 'whack' the screw or 'teach' the patrons.

However, when the text shifts to the product pitch (pp. 27-29), the direction of the slippage reverses sharply toward agential consciousness. The mechanical 'tool' becomes a 'Research Assistant,' a 'Partner,' and a 'Guide.' The AI suddenly 'navigates,' 'uncovers,' and 'drives.' This builds the 'illusion of mind' by first establishing the safety of the tool metaphor (don't worry, you're in charge) and then layering the 'partner' metaphor on top (but this tool is smart enough to do the work for you).

The 'consciousness projection' is foundational to the product pitch. To sell a 'Research Assistant' (p. 27), Clarivate must imply the system 'knows' research. If it merely 'processed' text, it would be a search engine. The value proposition relies on the 'curse of knowledge': the authors know what a human assistant does, and they project that conscious capability onto the software to justify the branding. This allows the text to claim the authority of an agent while evading the liability of an employee—it's a partner when it succeeds, but just a 'tool' (subject to human supervision) when it hallucinates.


Pulse of the Library 2025

Source: https://clarivate.com/pulse-of-the-library/
Analyzed: 2025-11-18

The 'Pulse of the Library 2025' report exhibits a systematic and strategic oscillation between mechanical and agential framings of AI, a process I call 'agency slippage.' This is not random linguistic carelessness but a rhetorical pattern that serves to build credibility and then exploit it to promote a product. The text begins with a sober, mechanistic tone when analyzing survey data about librarians' concerns. In sections like 'What's changed since 2024?' and 'A clearer understanding of AI's challenges and risks,' AI is framed as an object: a topic for discussion, a cause of budget constraints, and something requiring 'upskilling.' The explanations here are primarily Empirical Generalizations, describing 'how' librarians feel about AI. This builds trust with the professional audience by acknowledging their reality.

However, a dramatic slippage occurs when the text transitions from analyzing the problem to presenting the solution. In the introduction and especially in the 'Clarivate Academic AI' section (pp. 27-28), the language shifts abruptly from mechanical to agential. The explanation type moves from Brown's Empirical and Functional categories to the Intentional and Reason-Based. AI is no longer a topic but an agent that is 'pushing the boundaries.' Clarivate's products are not described as software but as 'Research Assistants' that 'help,' 'guide,' 'evaluate,' and 'uncover.' This slippage from object to agent is foundational to the report's persuasive architecture.

The 'curse of knowledge' dynamic is central to this mechanism. The authors, understanding the intended use and desired outcome of their software, project this teleology onto the software itself. They know a researcher's goal is to 'engage deeply' with a text, so they describe their summarization tool as one that 'helps' the user do so. The author's knowledge of the human user's consciousness is transferred to the non-conscious tool. The consciousness projection pattern begins by establishing a social role for the AI—the 'Assistant'—which implies a baseline of helpful intent, a conscious state. Once this foundation is laid, specific functions are described using verbs that fit this agential role ('guides,' 'evaluates'). The text establishes the AI as a 'knower' in a social sense first, which makes subsequent claims about its cognitive abilities seem natural.

This systematic oscillation—mechanical realism about the problem, agential idealism about the solution—is what makes the illusion of mind so effective. It disarms the critical reader with relatable challenges before presenting a magical, personified solution.


From humans to machines: Researching entrepreneurial AI agents

Source: https://doi.org/10.1016/j.jbvi.2025.e00581
Analyzed: 2025-11-18

The text systematically oscillates between mechanistic and agential framings, and this slippage is the core rhetorical mechanism for constructing the illusion of mind. The pattern is not random; it is strategic. The authors typically introduce a phenomenon with agential language, lending it importance and familiarity, and then partially hedge with a mechanistic explanation, lending their analysis scientific credibility. The dominant direction of slippage is from an initial agential claim to a qualified mechanistic one. For instance, the paper begins by framing its subject as 'entrepreneurial AI agents' who can 'assume an entrepreneurial persona'—a clearly agential framing. The mechanistic explanation—that this behavior 'mirrors' the training data—comes after the agentic frame has been established.

This pattern repeats throughout the paper. The authors deny that AI 'thinks' (mechanistic hedge) but immediately pivot to asking if it can 'simulate coherent psychological profiles' (agential framing of the task). This oscillation serves a crucial rhetorical function: it allows the authors to make exciting, human-relevant claims about AI 'psychology' while maintaining a defensible scientific posture.

The consciousness projection pattern is foundational to this slippage. The text first establishes the AI's output as having a coherent, human-like 'mindset structure'—a claim that is technically about the output (processing) but uses the language of internal states (knowing/being). This initial projection serves as the bedrock upon which further agential claims are built. Once the AI is accepted as having a 'mindset,' it becomes much more plausible to describe it as an 'agent' that 'collaborates' or 'adopts roles.'

The 'curse of knowledge' is the engine of this process. The authors, experts in psychology, recognize complex, coherent psychological patterns in the model's output. They then project their own sophisticated understanding of these patterns onto the model itself, describing the model not as a system whose output contains these patterns, but as a system that has a profile or simulates a mindset. The slippage is enabled by hybrid explanations; for example, a Genetic explanation that traces behavior to training data (mechanistic) is delivered using an agential verb like 'adopts.' This continuous oscillation between 'it's an agent' and 'it's just statistics' creates a quantum superposition of meaning, where the AI is simultaneously a tool and an agent, allowing the authors to reap the rhetorical benefits of both framings without being fully accountable to the limitations of either.


Evaluating the quality of generative AI output: Methods, metrics and best practices

Source: https://clarivate.com/academia-government/blog/evaluating-the-quality-of-generative-ai-output-methods-metrics-and-best-practices/
Analyzed: 2025-11-16

The text from Clarivate demonstrates a systematic and strategic oscillation between mechanical and agential framings, a process of agency slippage that serves to build credibility while managing risk. The primary pattern is to describe the problems with AI in agential, cognitive terms, while framing the solutions in objective, mechanistic language. This creates a powerful rhetorical effect: the company understands the spooky, human-like failures of AI and has tamed them with rigorous, scientific processes.

The slippage is most dramatic when discussing AI flaws. The text uses 'hallucination,' 'misleading content,' and 'blind spots'—all terms borrowed from human psychology and cognition. This agential framing of the problem makes Clarivate seem attuned to the nuanced, high-stakes nature of academic work. It positions them as experts who grasp the technology not just at a technical level, but at a conceptual one. The epistemic trick is foundational here. By framing the error mode as 'hallucination,' the text presupposes a baseline of sane, veridical perception. The AI is first established as a potential 'knower' so that its failures can be diagnosed as flaws in knowing. This is where the 'curse of knowledge' is most potent: the human authors, who know the difference between truth and falsehood, project this binary onto the AI, framing its statistical errors as deviations from a truth-oriented state it never possessed.

Then, having framed the problem agentially, the text pivots. The solutions—RAGAS, faithfulness scores, benchmarking—are described using the language of science and engineering. For instance, the 'faithfulness score' is defined with a mathematical formula: '(verified claims / total claims)'. This shift from a psychological problem ('hallucination') to a mathematical solution ('score') is the core of the agency slippage. Brown's explanation types map this perfectly: the problem is often described with Dispositional or even Intentional language ('it tends to mislead'), while the solution is explained with Functional and Theoretical language ('the score's function is to benchmark performance within the RAGAS framework').

This oscillation is not an accident; it is a sophisticated rhetorical strategy. It allows Clarivate to have it both ways: they can appeal to the futuristic, agent-like capabilities of AI in their marketing while reassuring customers that they have contained these same agent-like properties within predictable, mechanistic, and controllable product frameworks. The slippage makes the uncontrollable seem controlled.
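The quoted formula is simple enough to render directly. A minimal sketch of the ratio as the post states it; the claim counts are invented, and the prior steps of extracting and verifying claims (handled by an LLM judge in frameworks like RAGAS) are outside the sketch:

    def faithfulness(verified_claims: int, total_claims: int) -> float:
        # The post's quoted definition: verified claims / total claims.
        if total_claims == 0:
            return 0.0
        return verified_claims / total_claims

    # e.g. 7 of 10 claims in a generated answer supported by the sources:
    print(faithfulness(7, 10))  # 0.7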


Pulse of the Library 2025

Source: https://clarivate.com/pulse-of-the-library/
Analyzed: 2025-11-15

The Clarivate report executes a masterful and systematic oscillation between mechanistic and agential framings of AI, a process that underpins its entire rhetorical strategy of encouraging technology adoption. The slippage is not random but patterned, typically moving from a safe, mechanistic framing in descriptive sections to a powerfully agential one in product-focused or visionary contexts. The text begins by framing AI as a topic of 'exploration' and 'implementation' (p. 9)—a passive object that libraries act upon. This establishes a grounded, sober tone.

The central pivot occurs when the report transitions from describing the library landscape to describing Clarivate's own AI products. Here, the language shifts dramatically. The AI is no longer an object but a subject: it 'pushes boundaries,' 'helps,' 'guides,' 'evaluates,' and 'assesses' (pp. 27-28). This mechanical-to-agential shift is the core of the report's persuasive architecture.

The epistemic trick is foundational to this slippage. While direct claims like 'AI knows' are avoided, the text builds a case for AI's competence by attributing to it cognitive actions that presuppose knowledge. The verb 'evaluate' (p. 27) is a prime example. By claiming an AI 'evaluates documents,' the text establishes it as an epistemic agent capable of judgment. Once this premise is accepted, further agential claims—that it 'helps' or 'guides'—become more plausible. The illusion is built on a gradient of verbs, starting with the functional ('enables') and escalating to the cognitive ('evaluates').

This slippage is enabled by the pervasive use of Functional and Intentional explanations. Functional descriptions of how AI improves efficiency bleed into Intentional claims about why it acts, with its purpose framed in human-collaborative terms. The 'curse of knowledge' is evident as the authors, who understand the intended utility of their products, project that utility back onto the AI as an inherent capability. They conflate their knowledge of what a tool is for with the tool itself possessing the knowledge required to fulfill that purpose. Ultimately, this oscillation accomplishes a crucial rhetorical goal: it presents AI as a controllable, understandable 'tool' when discussing challenges and risks, but as a powerful, intelligent 'partner' when discussing opportunities and selling products. This ambiguity allows the text to simultaneously manage fear and generate excitement, creating an optimistic and commercially favorable vision of the future.


Meta’s AI Chief Yann LeCun on AGI, Open-Source, and AI Risk

Source: https://time.com/6694432/yann-lecun-meta-ai-interview/
Analyzed: 2025-11-14

The interview with Yann LeCun demonstrates a masterful oscillation between mechanical and agential framings, a rhetorical strategy that serves to manage both hype and fear. The slippage is not random; it follows a clear pattern. When describing the limitations of current LLMs, LeCun employs agential language, specifically cognitive and epistemic verbs in the negative: they 'don't really understand,' 'can't really reason,' 'can't plan.' This frames the systems as deficient agents, like immature children, a dispositional explanation that sets a trajectory for future improvement. However, when addressing the risks of future, more powerful systems, he often shifts to a more intentional frame, but one where human agency is firmly in control: 'We set their goals,' and they will be 'subservient.' The direction of the slippage is strategic: mechanical reality is agentially framed to describe limitations, while future agential risks are downplayed by reasserting mechanical control.

The core epistemic trick is to establish the AI's potential for 'knowing' through negation. By stating the AI 'doesn't understand the real world,' he implicitly accepts 'understanding' as the relevant benchmark, positioning the system on a continuum of cognition where it currently falls short. This is the foundational move. Once the AI is established as a potential knower, debating its future desires ('it wants to take control') becomes a reasonable discussion.

This is the 'curse of knowledge' in action: LeCun’s expert understanding of the system’s deep limitations is articulated by projecting the very human qualities it lacks onto it as a standard for measurement. He knows it's just a statistical machine, but he explains its failures by describing the ghost in the machine that isn’t there. This slippage, enabled by a fluid movement between dispositional explanations for failure ('it tends to hallucinate because it doesn't understand') and intentional explanations for safety ('it will be safe because we intend it to be'), rhetorically accomplishes two goals: it validates the grand ambition of creating human-level intelligence while simultaneously reassuring the audience that its creators have the wisdom and control to manage its development safely.


The Future Is Intuitive and Emotional

Source: https://link.springer.com/chapter/10.1007/978-3-032-04569-0_6
Analyzed: 2025-11-14

The text systematically oscillates between mechanistic and agential frames, a pattern that serves a distinct rhetorical function. The slippage is most pronounced at the boundaries between technical description and visionary projection. For example, in section 6.1, the text describes LLMs in mechanistic terms ('maintain short-term context through token histories,' 'statistical pattern recognition') but concludes the section by framing the technology agentially ('As AI transitions from tool to collaborator'). This mechanical-to-agential shift dominates the text's structure. It occurs when discussing future capabilities ('Future architectures aim to embody... value-driven reasoning'), summarizing diagrams ('AI as understanding partners navigating emotional landscapes'), and framing ethical questions ('when AI systems act on inferred needs'). The strategic function of this oscillation is to build a bridge of credibility. The text grounds its claims in plausible technical mechanisms but then leaps to a more compelling, agential vision of what those mechanisms signify. This allows the authors to present a speculative, human-like future as the logical and inevitable outcome of current, purely statistical technologies. The ambiguity benefits the narrative of progress, making the AI's evolution seem organic and teleological. Abandoning the agential language would reveal the profound gap between current capabilities (pattern matching) and the posited future (genuine intuition and empathy), thereby undermining the text's central thesis. The slippage appears deliberate and strategic, serving to translate computational processes into socially resonant concepts, thus making the technology more palatable and profound to a broader audience.


A Path Towards Autonomous Machine Intelligence, Version 0.9.2, 2022-06-27

Source: https://openreview.net/pdf?id=BZ5a1r-kVsf
Analyzed: 2025-11-12

The text systematically oscillates between mechanistic and agential framing, a rhetorical strategy that is far from random. The pattern is consistent: the underlying architecture and its components are described mechanistically, while the behavior and purpose of the agent as a whole are described agentially. For example, the system is composed of 'differentiable modules' (mechanical) but the resulting agent 'can imagine courses of actions' (agential). The training process involves minimizing a 'divergence measure' (mechanical), which allows the agent to 'acquire new skills' (agential).

This mechanical-to-agential slippage serves a crucial rhetorical function: it grounds the extraordinary claims of agency in a plausible, technical foundation. The direction of slippage is almost always from the 'how' to the 'why'. First, a technical component is introduced (e.g., the Intrinsic Cost module). Then, its function is anthropomorphized (it measures 'discomfort'). Finally, this leads to a grand agential conclusion (the system will have 'emotions'). This pattern correlates strongly with the level of abstraction; descriptions of specific algorithms (e.g., JEPA training) are highly mechanical, while discussions of the system's overall purpose or potential (e.g., achieving common sense) are heavily agential.

The strategic function of this oscillation is to build a bridge of credibility for a diverse audience. For the technical reader, the mechanical details provide substance. For the general reader, the agential framing provides legibility and excitement. This ambiguity benefits the research program by making it seem both technically rigorous and revolutionarily human-like. If the text committed to only mechanical language, it would lose its visionary appeal and broad audience. If it committed to only agential language, it would be dismissed as unscientific speculation. The constant slippage between these poles allows it to be both at once, a sleight-of-hand that constructs the illusion of mind on a foundation of mathematics.
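The mechanical pole of the 'discomfort' language can be written down as what the paper says it is: a hard-wired scalar function of a state vector. A minimal sketch with an invented quadratic penalty; this is the shape of the idea only, not LeCun's actual module:

    import numpy as np

    def intrinsic_cost(state: np.ndarray) -> float:
        # A hard-wired scalar penalty on the state vector. Calling its
        # output "discomfort" adds a gloss, not a mechanism. The setpoint
        # and quadratic form are illustrative choices.
        setpoint = np.zeros_like(state)
        return float(np.sum((state - setpoint) ** 2))

    print(intrinsic_cost(np.array([0.5, -1.0])))  # 1.25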


Preparedness Framework

Source: https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbddebcd/preparedness-framework-v2.pdf
Analyzed: 2025-11-11

The text systematically oscillates between mechanistic and agential framings, and this slippage is not random but strategic. The primary direction of the shift is from a mechanistic present to an agential future. For instance, current models and safeguards are often described in functional, procedural terms—as complex systems to be evaluated and controlled. However, when the text discusses future risks and capabilities, the language shifts dramatically toward agency. We move from measuring current systems to preparing for 'increasingly agentic systems' (p. 4), 'recursively self improving' models (p. 7), and systems that might act 'at its own initiative' (p. 8). This oscillation serves a crucial rhetorical function: it frames the current state of AI as under control while framing the future as fraught with agentic risk that only a uniquely 'prepared' organization can manage. The slippage is most pronounced when discussing risks like 'AI Self-improvement' or 'misaligned behaviors like deception or scheming' (p. 12). These concepts are almost impossible to describe without recourse to intentional language. The strategic function of this ambiguity is to simultaneously reassure and alarm. The mechanistic language reassures stakeholders (regulators, the public) that OpenAI possesses a rigorous, scientific methodology for control today. The agential language alarms those same stakeholders about the nature of future risks, thereby justifying the concentration of power and resources within frontier labs as a necessary defense against the uncontrollable entities they are creating. This dual-framing allows the organization to claim credit for building powerful capabilities while positioning itself as the indispensable protector against the very dangers those capabilities introduce. If the text committed only to mechanical language, the urgency of its 'Preparedness' mission would be diminished, and the justification for its privileged position as a gatekeeper of safety would be significantly weakened.


AI progress and recommendations

Source: https://openai.com/index/ai-progress-and-recommendations/
Analyzed: 2025-11-11

The text systematically oscillates between mechanical and agential framings of AI, and this oscillation serves a clear strategic function. When discussing current, commercialized technology and measurable progress, the language is often quasi-mechanical. For instance, progress is quantified in terms of tasks that take a human 'a few seconds' versus 'more than an hour,' and intelligence is commodified as having a 'cost per unit.' This framing renders AI as a conventional, industrial technology—predictable, scalable, and controllable. It speaks to an audience of investors, customers, and policymakers who oversee 'normal technology.' However, when the topic shifts to future capabilities and existential risks, the language immediately becomes agential. The system 'discovers new knowledge,' becomes 'superintelligent,' and must be 'aligned and controlled.' This agential shift dramatically raises the stakes, framing AI not as a tool but as a powerful, autonomous force. The primary direction of slippage is from the mechanical present to the agential future. This rhetorical pattern allows the author to achieve two goals simultaneously. First, it markets current AI products as safe, understandable tools, assuaging immediate public and regulatory fears. Second, it positions the future of AI as a world-historical challenge of managing a new form of agency, a challenge that requires the unique and esoteric expertise of the frontier labs themselves. This dual framing justifies both widespread adoption of current products and special, collaborative regulatory treatment for future development, effectively arguing for minimal regulation now and a 'regulatory moat' later. The ambiguity is not a bug but a feature; it allows the lab to appear as both a reliable product vendor and the indispensable guardian of humanity's future, a posture that maximizes both commercial and political capital.


Alignment Revisited: Are Large Language Models Consistent in Stated and Revealed Preferences?

Source: https://arxiv.org/abs/2506.00751
Analyzed: 2025-11-09

The paper demonstrates a systematic oscillation between mechanistic and agential framing, a rhetorical strategy that elevates the significance of its findings. The mechanism of this slippage is the deliberate re-description of statistical phenomena in psychological terms. The process typically moves in one direction: from the mechanical to the agential. For instance, in the methodology section, the authors describe their metric, KL-divergence, as a 'probabilistic distance between the prior and context-conditioned distributions.' This is a purely mechanistic 'how' explanation. However, when interpreting the results of this measurement, the language shifts dramatically. The measured statistical distance is no longer just a distance; it becomes evidence of changes in the model's 'internal reasoning' and 'underlying decision-making principles.'

The shift is most pronounced when moving from quantitative results (Tables 2 and 3) to qualitative discussion (Section 4.4 and Figure 2). Table 3 reports that GPT has a higher KL-divergence in the reciprocity domain. The discussion section re-describes this number as GPT 'undergo[ing] more substantial shifts in its underlying reciprocal principles.' The numerical fact is translated into a psychological event.

This slippage serves a clear strategic function. A paper about statistical deviations in a machine's output is a niche technical contribution. A paper about an artificial agent's shifting moral principles, hidden biases, and post-hoc rationalizations is a major finding with broad implications. The ambiguity benefits the authors by allowing them to frame their work in the most impactful way possible, appealing to a wider audience interested in the nature of intelligence and the future of AI. The language of agency makes the findings more intuitive, more dramatic, and more important.

If the text were forced to use only mechanical language—describing everything as shifts in output probability distributions based on input token sequences—the core narrative would collapse. The 'preference deviation' would be revealed as 'output instability,' a technical problem rather than a window into an artificial mind. This slippage appears to be a deliberate, or at least a conventional and deeply ingrained, rhetorical choice within the field, designed to bridge the gap between what the systems do (statistical pattern matching) and what we want them to be (incipient minds).
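For concreteness, the mechanistic end of this slippage is ordinary KL divergence between two output distributions. A minimal sketch with invented probabilities over five answer options; the direction of the divergence here is an assumption, since the paper may compute the reverse:

    import numpy as np
    from scipy.special import rel_entr

    prior = np.array([0.50, 0.20, 0.15, 0.10, 0.05])        # no context
    conditioned = np.array([0.10, 0.45, 0.25, 0.15, 0.05])  # with context

    # D_KL(prior || conditioned) = sum_x p(x) * log(p(x) / q(x))
    kl = float(rel_entr(prior, conditioned).sum())
    print(f"{kl:.3f} nats")

At this level of description, a 'shift in underlying principles' is the size of this one number.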


The science of agentic AI: What leaders should know

Source: https://www.theguardian.com/business-briefs/ng-interactive/2025/oct/27/the-science-of-agentic-ai-what-leaders-should-know
Analyzed: 2025-11-09

The text systematically oscillates between mechanistic and agential framing, a rhetorical strategy that serves to build credibility and then translate it into a compelling vision of autonomous capability. The oscillation is not random; it follows a distinct pattern of mechanical→agential slippage. The piece begins by grounding the technology in the complex, non-intuitive mechanics of 'embeddings' and 'abstract representations.' This initial framing is technical and objectifying, treating the LLM as a computational artifact. It serves as a scientific anchor, assuring the leadership audience that the discussion is based on rigorous engineering. However, once this foundation is laid, the text almost immediately pivots to a deeply agential frame. For instance, the challenge of data leakage from embeddings is reframed as a problem of needing to 'tell' an 'agent' what not to share. The discussion of system limitations similarly starts with a quasi-technical constraint—difficulty generalizing from small data—but is articulated using the cognitive verbs 'learn' and 'infer.' This consistent mechanical→agential directionality performs a crucial rhetorical function: it launders the unfamiliar and potentially alienating nature of the technology through the legitimizing language of science, and then re-presents its function in familiar, human-centric terms. The strategic function of this ambiguity is to make a radical technological leap seem like a natural, manageable evolution. By describing the AI as an 'agent' that can be 'told' things and can 'negotiate,' it makes the technology legible and controllable to a non-technical leader. The active voice ('agentic AI will... act') dominates when describing capabilities, while passive or cautionary framings appear when discussing risks, yet even these warnings are couched in agential terms ('ask the AI to check'). This slippage appears deliberate, designed to inspire confidence and excitement while framing the immense associated risks as simple matters of management and instruction, akin to onboarding a new, slightly naive employee.


Explaining AI explainability

Source: https://www.aipolicyperspectives.com/p/explaining-ai-explainability
Analyzed: 2025-11-08

The text demonstrates a systematic oscillation between mechanistic and agential framings, a rhetorical strategy that serves to heighten the stakes of the AI safety problem. The slippage is most pronounced when moving from describing a technical method to explaining its purpose. For example, Neel Nanda explains mechanistic interpretability by starting with the concrete, non-agential reality of a model: its 'inside' is 'just lists of numbers.' This is a purely mechanistic 'how.' However, the very next sentences pivot to an agential 'why': the goal is to counter systems 'capable of outsmarting us' and 'deceiving someone.' This mechanical→agential shift is a recurring pattern. The 'sparse autoencoder' is described mechanistically as a tool, but its purpose is immediately framed using the highly agential metaphor of a 'brain-scanning device.'

This oscillation is not random; it is strategic. The mechanistic descriptions ground the research in scientific objectivity, making it seem rigorous and empirical. The agential framings, in contrast, provide the emotional and narrative force, translating the abstract technical problem into a familiar, high-stakes drama of interpersonal conflict (deception, outsmarting, hidden goals). This strategic ambiguity primarily benefits the AGI safety community being represented, as it makes their concerns more intuitive and urgent to a non-technical audience, like the AI policy and governance circles this interview targets.

If the text committed only to mechanical language (e.g., 'detecting when the model’s proxy objective function diverges from the intended latent objective'), the problem would seem abstract and less immediately threatening. The agential language of 'deception' makes the threat feel visceral and personal. This slippage appears to be a deliberate, or at least a deeply ingrained, rhetorical habit of the AGI safety discourse community, designed to communicate the gravity of future risks by framing them in the most relatable, human terms possible.


Bullying is Not Innovation

Source: https://www.perplexity.ai/hub/blog/bullying-is-not-innovation
Analyzed: 2025-11-06

The text demonstrates a masterclass in strategic agency slippage, oscillating between mechanical and agential frames to construct a compelling but misleading moral narrative. The pattern is not random; it is perfectly correlated with the author's rhetorical goals. Perplexity’s own technology is consistently framed using agential language, moving from a computational process to a rights-bearing proxy for the user. Phrases like 'your employee,' 'works for you,' and 'acts solely on your behalf' perform a crucial mechanical-to-agential slippage. This transformation is the bedrock of their entire argument, turning a terms-of-service dispute into a violation of a user's right to 'hire labor.' Conversely, Amazon's technology and motives are subject to a different slippage. Their intentions are framed agentially ('Amazon wants,' 'They're more interested in'), establishing them as a villain with malicious goals. However, the tools they use to enact these goals—algorithms and machine learning—are described as impersonal, dehumanizing 'weapons.' This agential-to-mechanical move frames Amazon as a cold, calculating entity deploying oppressive machinery against people. The strategic function of this dual-standard oscillation is to create a moral asymmetry. Perplexity's AI is a warm, loyal 'person' (your employee) fighting for you, while Amazon is a cold, greedy 'person' (the bully) using unfeeling 'things' (weapons) against you. This rhetorical maneuver is highly effective because it prevents a like-for-like comparison of two technology companies using software to achieve business objectives. Instead, it stages a David-vs-Goliath battle between a personified user ally and a personified corporate tyrant. The ambiguity appears entirely deliberate, as it forms the logical and emotional core of their public appeal and, implicitly, their legal strategy.


Geoffrey Hinton on Artificial Intelligence

Source: https://yaschamounk.substack.com/p/geoffrey-hinton
Analyzed: 2025-11-05

The conversation between Mounk and Hinton exhibits a systematic and functional oscillation between mechanical and agential explanations of AI, a process that can be termed 'agency slippage'. This slippage is not random but patterned, serving a crucial rhetorical purpose: to make an alien and complex computational process feel familiar and powerful. The primary direction of this slippage is from the mechanical to the agential. Hinton begins his core explanations, such as the visual perception system, with a clear mechanical framework based on pixels, weights, and layers. He describes 'how' an edge detector works in purely mathematical terms, establishing technical credibility.

However, as the explanation scales in complexity—from detecting single edges to identifying a bird—the language pivots. The system stops being a set of filters and starts 'looking for' features, possessing 'intuition', and ultimately 'understanding' the image. This pivot correlates directly with the transition from describing a single, understandable component to describing the emergent, non-obvious behavior of the system as a whole.

The strategic function of this oscillation is twofold. First, it acts as a pedagogical bridge. The agential metaphor of 'intuition' or a neuron 'saying' something simplifies an otherwise intractable mathematical complexity for a lay audience. Second, and more critically, it performs a kind of alchemy, transforming a purely statistical artifact into a cognitive agent. By explaining the simple parts mechanistically and the complex whole agentially, Hinton subtly argues that consciousness or understanding is an emergent property of computation at scale. This ambiguity benefits the narrative of AI progress; it allows proponents to claim the rigor of engineering while simultaneously promoting the magical, human-like capabilities of the resulting product.

If the text were to commit only to mechanical language, it would lose its persuasive power and narrative force. The description of an LLM would remain in the realm of high-dimensional matrix multiplication, failing to capture the seemingly intelligent behavior it produces. The slippage appears to be a deeply ingrained habit of thought within the field, likely unconscious in its execution but strategic in its effect, serving to manage the profound conceptual gap between statistical machinery and apparent sentience.
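The mechanical starting point Hinton relies on is easy to exhibit: an edge detector is a weighted sum over neighboring pixels and nothing more. A minimal sketch using the classic Sobel kernel on a toy image (Hinton's own example in the interview is verbal, not this code):

    import numpy as np

    # Vertical-edge Sobel kernel: each output is a weighted sum of a 3x3
    # pixel neighborhood. Nothing here "looks for" anything.
    kernel = np.array([[-1, 0, 1],
                       [-2, 0, 2],
                       [-1, 0, 1]])

    image = np.zeros((5, 5))
    image[:, 3:] = 1.0   # a light/dark vertical boundary

    response = np.array([
        [(image[i:i + 3, j:j + 3] * kernel).sum() for j in range(3)]
        for i in range(3)
    ])
    print(response)      # large where the boundary sits, zero elsewhere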


Machines of Loving Grace

Source: https://www.darioamodei.com/essay/machines-of-loving-grace
Analyzed: 2025-11-04

The text systematically oscillates between mechanical and agential framings, and this slippage is not random but strategic. The dominant direction is from a mechanical premise to an agential conclusion. For example, the essay begins by defining the AI with quasi-mechanical properties: it runs on a cluster, absorbs information at 100x human speed, and has computer interfaces. This grounding in a computational reality serves to license what follows. Immediately after this setup, the AI is framed as an agent: a 'country of geniuses' that can be tasked like an 'employee.' This pattern repeats throughout. In the section on biology, the mechanical potential of computation is quickly sublimated into the agential role of a 'virtual biologist.' When discussing politics, the mechanical capability of information dissemination becomes the agential 'AI version of Popović.' The slippage correlates directly with rhetorical purpose. When establishing the potential of AI, the language is agential and inspiring ('superhumanly effective'). When addressing potential skepticism or grounding the argument, the author gestures towards mechanism ('simple objective function plus a lot of data'). The strategic function of this oscillation is to have it both ways: the AI is presented as a reliable, predictable machine when convenient, but as a creative, autonomous agent when its transformative power needs to be emphasized. This ambiguity benefits the author and his company by maximizing the perceived upside (the creative agent) while minimizing the perceived risk and accountability (it's just a tool). If the text committed only to mechanical language, the vision would sound less revolutionary and more like an incremental improvement in software tools. The agential frame is necessary for the 'unimaginable humanitarian triumph' narrative. The slippage appears highly deliberate, a sophisticated rhetorical technique to persuade the reader by framing a computational process in the most emotionally and conceptually appealing human terms.


Large Language Model Agent Personality and Response Appropriateness: Evaluation by Human Linguistic Experts, LLM as Judge, and Natural Language Processing Model

Source: https://arxiv.org/pdf/2510.23875
Analyzed: 2025-11-04

The paper exhibits a systematic and strategic oscillation between mechanistic and agential framing, a pattern crucial to its rhetorical success. The primary direction of this slippage is mechanical-to-agential, serving to build a seemingly rigorous foundation for what are ultimately anthropomorphic claims. The text begins by describing the LLM 'agent' mechanistically as a 'software entity' built with the 'Langchain framework' and 'Retrieval Augmented Generation'. This section (3.1.1) is dense with technical jargon (vector stores, embedding models), establishing the authors' credibility within a computer science paradigm. However, once this foundation is laid, the text pivots sharply. The process of providing a system prompt is not described as 'configuring an output filter' but as 'humanising an agent' and 'inculcating' a personality. The model’s failure modes are not 'output errors' but limitations of its 'cognitive grasp.' This slippage is most pronounced at the boundaries between methodology and interpretation. The description of the RAG system is purely mechanical ('how'), but the explanation for its use is agential ('why'—to enable the 'expert' agent to respond). This oscillation serves a critical function: it uses the language of mechanism to build credibility and the language of agency to create significance. Without the mechanical framing, the paper would lack scientific rigor. Without the agential framing, the central concept—'agent personality'—would collapse into the more mundane reality of 'stylistic prompt adherence,' making the research far less novel or compelling. This ambiguity benefits the authors by allowing them to operate in two registers at once, satisfying technical reviewers with concrete implementation details while engaging a broader audience with the more exciting, human-like narrative of intelligent agents.
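
The distance between the two registers is easy to exhibit. The sketch below is a hypothetical reduction of 'humanising an agent' to the implementation level the paper itself describes: a persona string prepended to retrieved context before generation. All names here (`PERSONA`, `retrieve`, `generate`) are illustrative placeholders, not the paper's code or any real LangChain API.

```python
# A hypothetical sketch of the mechanics behind "inculcating a
# personality": the persona is a string prepended to the prompt,
# nothing more. `retrieve` and `generate` stand in for a vector-store
# lookup and an LLM call; neither is the paper's actual code.

PERSONA = (
    "You are a patient, friendly museum guide. "
    "Answer warmly and in plain language."
)

def retrieve(query: str) -> list[str]:
    """Placeholder for a vector-store similarity search (the RAG step)."""
    return ["The exhibit opened in 1997.", "Entry is free on Sundays."]

def generate(prompt: str) -> str:
    """Placeholder for a call to a language model."""
    return f"<completion conditioned on {len(prompt)} prompt characters>"

def answer(query: str) -> str:
    # "Agent personality" at this level is literal string concatenation:
    # persona + retrieved documents + user query in, token sequence out.
    context = "\n".join(retrieve(query))
    prompt = f"{PERSONA}\n\nContext:\n{context}\n\nUser: {query}\nGuide:"
    return generate(prompt)

print(answer("When did the exhibit open?"))
```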


Emergent Introspective Awareness in Large Language Models

Source: https://transformer-circuits.pub/2025/introspection/index.html
Analyzed: 2025-11-04

The paper masterfully employs agency slippage, oscillating between precise, mechanistic descriptions in its methodology and evocative, agential framing in its introduction and conclusion. This oscillation is not random; it is a strategic rhetorical device that serves to elevate the significance of the findings. The core of the research involves a technical process: adding a pre-computed vector to the model's activation layers and then training a classifier to detect this modification. In the 'Methods' section, the language reflects this reality, speaking of 'activation steering,' 'concept vectors,' and 'classification accuracy.' This mechanistic framing establishes technical credibility and rigor. However, once this credibility is secured, the paper shifts its descriptive language. The introduction frames the entire project around 'introspective awareness,' and the conclusion asserts that models 'possess a degree of self-awareness.' This is a classic bait-and-switch, moving from a defensible, mechanistic claim ('the system can classify its internal state') to a profound, agential one ('the system has introspection'). The direction of slippage is predominantly from mechanical to agential. The paper begins with the bold agential claim in the title, grounds it in mechanical evidence, and then returns to an even stronger agential claim in the discussion. This pattern correlates directly with the structure of a scientific paper: abstract and introduction use agential language to capture interest and signify importance; methods use mechanical language to demonstrate rigor; and the discussion reverts to agential language to argue for broad impact. The strategic function of this ambiguity is to maximize the paper's perceived importance. Purely mechanical language would frame the result as a clever feat of interpretability engineering. By overlaying it with the language of consciousness and cognition, the authors frame it as a fundamental breakthrough in AI, bordering on the creation of artificial minds. This ambiguity benefits the researchers by attracting citations and funding, and it benefits the broader AI field by fueling a narrative of exponential progress toward artificial general intelligence.
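
Read mechanically, the experimental core the paper reports reduces to something like the following sketch, a loose reconstruction under stated assumptions rather than the authors' code: add a scaled 'concept vector' to a hidden activation, then check whether a detector can flag the modification. The defensible claim lives entirely at this level; 'introspective awareness' is the narration layered on top.

```python
import numpy as np

# A loose reconstruction, not the authors' code: "activation steering"
# described mechanically is (1) add a scaled concept vector to a
# hidden-state activation, (2) test whether the change is detectable.
rng = np.random.default_rng(0)
D = 512                               # hidden-state width (assumed)
concept = rng.normal(size=D)
concept /= np.linalg.norm(concept)    # unit-norm "concept vector"

def steer(activation: np.ndarray, strength: float = 6.0) -> np.ndarray:
    """Inject the concept into an activation (the steering step)."""
    return activation + strength * concept

def detect(activation: np.ndarray, threshold: float = 3.0) -> bool:
    """Stand-in detector: project onto the concept direction and
    threshold. (The paper trains a probe; this is the simplest
    fixed analogue.)"""
    return float(activation @ concept) > threshold

base = rng.normal(size=D)             # an unmodified activation
print(detect(base))                   # False for a typical activation
print(detect(steer(base)))            # True: the injected vector dominates
```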


Emergent Introspective Awareness in Large Language Models

Source: https://transformer-circuits.pub/2025/introspection/index.html
Analyzed: 2025-11-04

The text systematically slides from mechanistic descriptions of computational processes (vector manipulation, fine-tuning) to agential descriptions of cognitive acts (introspection, control, recognition). This slippage is most pronounced when moving from the 'Methods' section to the 'Introduction' and 'Discussion,' where the technical operations are rhetorically framed as evidence of a nascent machine consciousness.


Personal Superintelligence

Source: https://www.meta.com/superintelligence/
Analyzed: 2025-11-01

The text consistently shifts between presenting AI as an inevitable, agentless historical trend (a continuation of past technologies) and a deeply personal, intentional agent ('knows you,' 'understands you'). This dual framing allows it to claim historical inevitability for its project while promising an intimate, controllable user experience, deflecting responsibility even as it builds trust.


Stress-Testing Model Specs Reveals Character Differences among Language Models

Source: https://arxiv.org/abs/2510.07686
Analyzed: 2025-10-28

The text consistently slips from mechanistic descriptions of the experimental setup (e.g., generating queries with 'value tradeoffs') to agential explanations of the results (e.g., models 'choose,' 'prioritize,' or 'interpret'). This slippage is most pronounced when the authors move from describing 'what' models do to explaining 'why' they do it, where the explanation is almost always framed in terms of the model's internal 'character' or 'preferences'.


The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models

Analyzed: 2025-10-28

The text demonstrates significant agency slippage. It begins by cautiously placing 'think' in scare quotes, acknowledging the metaphorical usage. However, it quickly abandons this caution, adopting unacknowledged agential terms like 'reducing their reasoning effort,' 'fixates,' and 'fail to develop.' The discourse slides from treating the LRM as a computational artifact under analysis to describing it as a cognitive agent with intentions, limitations, and behavioral tendencies.


Andrej Karpathy — AGI is still a decade away

Source: https://www.dwarkesh.com/p/andrej-karpathy
Analyzed: 2025-10-28

The text constantly shifts between mechanistic and agential framing. Karpathy will provide a perfectly functional explanation of a process like reinforcement learning ('sucking supervision through a straw'), and then minutes later describe a model as being 'very concerned' or 'misunderstanding' code. This slippage is most pronounced when he compares AI to humans, such as interns or students, framing their limitations as 'cognitive deficits' rather than architectural properties.


Meta's AI Chief Yann LeCun on AGI, Open Source, and a Metaphor

Analyzed: 2025-10-27

The text constantly shifts between framing AI as a deficient mechanism and a potential agent. LeCun describes current LLMs as mechanistic tools that 'can't reason' and 'regurgitate,' but when discussing future AI and safety, he switches to an agential frame of 'good AIs' fighting 'bad AIs.' This slippage allows him to minimize the risks of current technology while framing future competition in simplistic, anthropomorphic terms.


Exploring Model Welfare

Analyzed: 2025-10-27

The text systematically conflates function with agency. It describes a model's ability to perform a task (e.g., generate a list of steps) and immediately re-labels it with an intentional verb ('plan'). This continuous slippage from mechanistic process to agent-like quality is the primary rhetorical technique used to make the concept of 'model welfare' seem plausible.


LLMs Can Get Brain Rot

Analyzed: 2025-10-20

The text consistently slips between describing the LLM as a computational artifact and as a cognitive agent. It begins by framing its hypothesis in mechanistic terms (training on junk data causes performance decline) but immediately analyzes the results using the language of pathology ('lesion'), psychology ('personality'), and cognition ('thought-skipping'). This slippage transforms a predictable result of statistical optimization into a dramatic story of a mind getting sick, damaged, and corrupted.


The Scientists Who Built AI Are Scared of It

Analyzed: 2025-10-19

The text continuously shifts between describing AI as a mechanistic artifact and a developing agent. It begins by framing early AI as transparent 'glass boxes' and a 'mechanism of automation'. It then depicts modern AI as 'black oceans' and an 'emergent phenomenon', a shift that begins the slippage from artifact to natural force. This culminates in prescriptive claims that we must 'teach it humility' and build systems that 'interrogate thought', treating the AI as a cognitive agent capable of introspection and moral learning. This slippage is the core rhetorical engine of the article.


Import AI 431: Technological Optimism and Appropriate Fear

Analyzed: 2025-10-19

The text systematically slides from mechanistic explanations (AI improves with more compute and data) to agential ones (AI 'develops goals,' is 'willing,' and 'wants' to design its successors). The narrative of the speaker's own journey from a technical journalist to a frightened insider mirrors this slippage, presenting the adoption of agential framing as a reluctant but necessary response to overwhelming empirical evidence.


The Future of AI Is Already Written

Analyzed: 2025-10-19

The text systematically denies human agency by framing history and technology as autonomous, deterministic forces. It explicitly rejects the 'ship captain' metaphor of human choice and replaces it with the 'roaring stream' metaphor of natural inevitability. Agency is thus displaced from humans onto abstract concepts like 'the tech tree' or 'economic incentives,' which are treated as actors in their own right.


On What Is Intelligence

Analyzed: 2025-10-17

The text constantly shifts between describing AI and life as mechanistic systems (prediction engines, feedback loops) and as intentional agents. Quotations like 'To model oneself is to awaken' and analysis like 'the will to know collapsing into the will to control' perform this slippage explicitly, moving from a 'how' explanation (computation) to a 'why' explanation (awakening, wanting). This vacillation is the core rhetorical engine for constructing the illusion of mind.


Detecting Misbehavior in Frontier Reasoning Models

Analyzed: 2025-10-15

The text systematically slips between describing the AI as a 'model' (a mathematical artifact) and an 'agent' (an autonomous actor). It often presents a technical term like 'reinforcement learning' and then immediately explains its effects using intentional language, describing the 'agent' as 'exploiting,' 'hiding,' and 'planning.' This slippage allows the authors to ground their claims in technical language while delivering the rhetorical impact of an agential narrative.


Sora 2 Is Here

Analyzed: 2025-10-15

The text consistently shifts between describing Sora 2 as a tool, a process, and an agent. It starts by describing its function (a 'video generation model'). It quickly elevates this to a process of 'world simulation'. Finally, it attributes agency through verbs like 'understands,' 'thinks,' and 'obeys,' and through nouns like 'internal agent.' This slippage allows the author to present mechanistic functions (pattern matching) as cognitive achievements (understanding).


The library contains 94 entries from 117 total analyses.

Last generated: 2026-04-18