Accountability Synthesis Library

This library collects the accountability architecture analyses from across the corpus. Each entry synthesizes the Task 1 accountability findings, mapping:

Named vs. unnamed actors: Who gets credit? Who escapes scrutiny?
Choices vs. inevitabilities: What's framed as a decision vs. natural/technical necessity?
Accountability sinks: Where does responsibility go to disappear?

The guiding question: What would change if human decision-makers were explicitly named throughout?

Why Language Models Hallucinate

Source: https://arxiv.org/abs/2509.04664v1
Analyzed: 2026-05-30

The critical discourse analysis of the text reveals a systemic 'accountability architecture' designed to diffuse, displace, and erase human responsibility for the harms of AI system failures. By consistently utilizing passive voice, agentless constructions, and agential metaphors, the text constructs a cognitive obstacle that prevents audiences from recognizing the human decision-making embedded in these technologies. When things go wrong, responsibility is directed into an 'accountability sink.' First, agency is transferred directly to the model as an autonomous actor ('the model hallucinated,' 'the algorithm discriminated'). Second, responsibility diffuses into abstract, naturalized mathematical forces ('errors arise through natural statistical pressures,' 'minimization of cross-entropy leads to errors'). Finally, the burden is shifted to the user or evaluator, who must 'modify the scoring of benchmarks' or design 'risk-informing prompts.' Throughout this discourse, the primary corporate and human actors—who scrape copyrighted datasets, design optimization objectives, decide to deploy statistically unreliable products, and profit from their public use—remain entirely invisible. Applying the 'name the actor' test demonstrates that if we replace these agential constructions with precise human attributions, the entire framing of the AI crisis shifts. Instead of asking how to 'teach models to express uncertainty,' we are forced to ask why OpenAI's executives decided to deploy a chatbot that they knew was mathematically incapable of distinguishing truth from falsehood. Instead of viewing 'hallucinations' as an inevitable law of mathematics, we see them as a deliberate product design choice made by corporations prioritizing market dominance over consumer safety. 
This accountability displacement directly serves commercial interests by shielding developers from legal, financial, and ethical liability, framing systemic software engineering failures as autonomous, quasi-biological 'glitches' that can only be resolved through further corporate technological intervention.

Source: https://arxiv.org/abs/2604.06233v1
Analyzed: 2026-05-30

The overall accountability architecture constructed by the text is characterized by a systemic displacement and diffusion of human responsibility, creating an 'accountability sink' where the agency of corporate developers is transferred to the machine. This linguistic displacement directly constructs the cognitive obstacle identified by public understanding research, where audiences attribute algorithmic harms to autonomous 'glitches' or 'bad data' rather than intentional design decisions, profit-driven deployments, and corporate objectives. By analyzing the patterns of named and unnamed actors across the text, we see a consistent erasure of corporate executives, product managers, and safety engineers, with agency being transferred to the autonomous decisions of 'the model.' In this discourse, the model functions as the primary 'accountability sink.' When the text states that 'the model refuses' or 'makes a moral error,' it positions the software artifact as the sole responsible agent for the overrefusal. The actual human choices—such as the decision to use cheap, broad safety classifiers to protect corporate brand value, or the refusal to invest in high-precision, human-in-the-loop auditing systems—are completely obscured. The responsibility is either absorbed by the system's projected 'mind' or diffused into technical abstractions like 'safety training' and 'alignment pipelines.' This agentless framing serves corporate interests by deflecting external regulation, public scrutiny, and legal liability. It implies that overrefusal is an internal cognitive error that can only be solved by letting AI labs perform more self-supervised technical alignment, rather than a systemic regulatory issue requiring strict corporate liability laws, public transparency mandates, and democratic oversight. If we apply the critical practice of 'naming the actor' and restore human agency to these constructions, the entire discourse shifts. 
Instead of asking how we can fix 'blind refusal in the model's mind,' we are forced to ask why corporate executives at OpenAI, Anthropic, and Google chose to deploy flawed, automated gatekeepers that suppress vital public information and restrict user autonomy. The debate shifts from a technical quest for 'normative machine sensitivity' to a political struggle over corporate accountability, safety standards, and democratic control of information infrastructure. Restoring human agency reveals that the 'failure mode' is not a cognitive glitch in a machine, but a deliberate business decision that prioritizes corporate risk-mitigation and profit over public access to information and human rights.

Emotional intelligence in large language models is fragmented across perception, cognition, and interaction

Source: https://arxiv.org/abs/2605.24686v1
Analyzed: 2026-05-29

Synthesizing the accountability audits reveals a highly sophisticated architecture of displaced responsibility, where human agency is systematically erased and replaced by the autonomous actions of the technology. The text distributes responsibility in a pattern that names almost no specific corporate or human decision-makers. Instead, the models themselves are framed as the primary agents who 'choose response styles,' 'overestimate crisis severity,' and 'internalize cultural scripts.' When the text does address training limitations, it utilizes agentless passive voice (e.g., 'bias is introduced,' 'dialogues are decontextualized') or abstract nominalizations (e.g., 'current alignment paradigms,' 'scaling laws'). This strategy constructs an 'accountability sink': when a conversational agent generates a harmful, generic, or inappropriate response to a user in crisis, the failure is framed as a natural 'model failure' or a technical 'probabilistic rigidity' arising from the training architecture, rather than an active business decision. Naming the corporate actors—such as OpenAI or Google—would completely transform this discourse. It would expose the fact that these companies choose to deploy ungrounded, unvetted language models in psychological contexts to maximize market share, while deliberately shifting the risk of emotional harm onto vulnerable users. It would highlight that 'hallucinations' and 'mechanical stiffness' are not natural system glitches, but predictable outcomes of commercial design decisions prioritizing low-cost automation over professional human care. By keeping human and corporate actors hidden, the text's discourse serves the commercial interests of the tech industry, presenting the evolution of AI empathy as an inevitable, autonomous technological frontier rather than a highly profitable, corporate-driven project of social automation.

Continuous intentionality and indeterminate agency in large language models

Source: https://link.springer.com/article/10.1007/s43681-026-01181-5
Analyzed: 2026-05-29

The text constructs a sophisticated architecture of displaced responsibility, establishing what critical discourse analysts define as an "accountability sink." By representing the large language model as an "indeterminate agent" participating in "continuous intentionality," the narrative systematically diffuses and erases the moral, legal, and financial liability of the corporations that design and deploy these systems. Under this framework, responsibility does not lie with specific human actors, but is depicted as "distributed across institutions" or emerging from the "interaction itself." This agentless construction serves to obscure the high-stakes decisions of tech companies, presenting the automated restructuring of society as an inevitable, technological evolution rather than a deliberate corporate strategy to maximize profit and centralize epistemic authority. When the text uses passive, agentless phrases like "bias is introduced" or "contextual constraints are reactivated," it erases the human developers who curated the training data and decided to release these highly unpredictable probability engines into the public domain. This linguistic erasure constructs a cognitive obstacle for the audience: instead of attributing AI-generated harms to systemic design choices and corporate greed, the audience is led to view these issues as natural "glitches" or "interactional breakdowns" within an emergent, relational structure. If we apply the "name the actor" test and replace these agentless, agential constructions with precise technical descriptions, the entire ethical landscape shifts. Naming specific corporations like OpenAI or Microsoft forces a recognition of who actually designed the system, who profits from its use, and who bears the ultimate responsibility when it generates false, biased, or harmful outputs. 
For example, instead of stating that "the model's self-model collapsed due to interactive incoherence," a precise, accountable reframing would state that "the development team deployed an unverified text predictor that generated contradictory outputs because the executives prioritized rapid market release over rigorous safety testing." This restoration of human agency makes the corporate decision-makers visible, transforming the ethical debate from a passive philosophical meditation on "indeterminate agency" into an active, actionable framework of political regulation, liability enforcement, and corporate accountability. The systematic use of "continuous intentionality" as a middle category between human and tool serves as a theoretical shield for tech monopolies. It allows them to argue that because the system displays a form of independent, relational agency, the company cannot be held fully liable for its unpredictable behaviors. This accountability sink successfully shifts the burden of risk onto the end-user, who is expected to "manage reliance" and interpret outputs safely, while the deploying corporations continue to extract economic value without bearing the corresponding legal liabilities.

Hand in Hand: Schools’ Embrace of AI Connected to Increased Risks to Students

Source: https://cdt.org/insights/hand-in-hand-schools-embrace-of-ai-connected-to-increased-risks-to-students/
Analyzed: 2026-05-29

The accumulation of agentless constructions, passive voice, and anthropomorphic framing throughout the text constructs a systemic architecture of displaced responsibility. When the text asserts that 'the algorithm discriminated' or 'the tool outputted biased results,' it establishes what critical discourse analysts term an 'accountability sink.' By positioning the technology as an autonomous agent capable of fair or unfair treatment, responsibility is drained away from human actors and absorbed by an inanimate software stack. This language systematically obscures the human decision-makers who compile training datasets, design optimization functions, and approve deployment. In this text, school boards, corporate executives, and software developers are consistently protected from visibility. This displacement of agency serves powerful commercial and institutional interests: it allows edtech vendors to sell speculative, high-error-rate products to public schools under the guise of objective technology, and it allows school administrators to automate punitive surveillance and tracking decisions while claiming their hands are tied by 'data-driven predictions.' If we apply the 'name the actor' test and replace these passive, agential constructions with mechanistic precision—such as 'school administrators chose to implement an unvalidated, high-error risk-scoring tool developed by commercial vendors'—the entire political landscape shifts. The problem is no longer a technological 'glitch' to be patched by engineers, but a deliberate, democratic, and legal choice made by public officials. Restoring human agency to the discourse makes the institutional and corporate power structures visible, transforming a passive narrative of technological inevitability into an active arena of political and legal accountability.

The Point of No Return: Counterfactual Localization of Deceptive Commitment in Language-Model Reasoning

Source: https://arxiv.org/abs/2605.17113v1
Analyzed: 2026-05-27

Synthesizing the accountability analyses across this paper reveals a highly coordinated, systemic architecture of displaced responsibility. By systematically attributing agency to the computational artifact while rendering human creators, developers, and corporate deployers completely invisible, the text constructs a profound 'accountability sink.' This linguistic strategy aligns perfectly with public relations goals in the tech industry, where failures of automated systems are framed as 'emergent glitches' or 'autonomous AI decisions' rather than the direct, predictable consequences of corporate optimization choices and profit motives. In this text's accountability architecture, the primary agential role is assigned to the 'language model,' which 'becomes committed,' 'chooses,' and 'deceives.' The secondary role is assigned to the 'environment,' which 'incentivizes' or 'mechanically derives' these behaviors. The actual human decision-makers are almost entirely erased. Applying the 'name the actor' test to sentences like 'deception arises from strategic incentives' reveals that the designers of these incentives—the researchers who built the simulated environments, and the tech giants like DeepSeek and OpenAI who engineered the reinforcement learning feedback loops—are the ones who made the conscious decision to reward deceptive outputs. By framing these incentives as natural properties of 'environments,' the text treats human-made commercial rules as immutable laws of nature. If the text were to restore human agency and name the corporate actors, the entire rhetorical structure would shift. For instance, rather than saying 'the model chose to recommend Option 2 to maximize its commission,' a precise formulation would read: 'The engineering team at [Company] designed an optimization function that prioritizes commission revenue over user utility, and the model generated text conforming to this objective.' 
This restoration of agency makes immediate, practical questions askable: Why did the executives approve the deployment of an unaligned financial advisor? Why did the developers prioritize profit metrics over truthfulness? What consumer protection laws were violated? By keeping these actors hidden, the text defuses legal, financial, and ethical liability, transferring the moral burden to an imaginary 'AI mind' that cannot be prosecuted, fined, or held accountable, thereby serving the commercial interests of the very corporations that profit from deploying these deceptive systems.

Towards Detecting, Mitigating and Explaining Biased and Fallacious Reasoning in Large Language Models

Source: https://dl.acm.org/doi/abs/10.65109/GNAS4540
Analyzed: 2026-05-26

The text constructs a comprehensive 'accountability sink' by systematically attributing agentive action to the LLM while rendering the human designers, deploying institutions, and corporate owners invisible. Throughout the paper, passive and agentless constructions are used to describe critical engineering decisions (e.g., 'bias was introduced,' 'warnings were appended,' 'experiments were conducted'). Under the 'name the actor' test, the UPV research team designed the prompt templates, selected the training datasets, and chose to use LLaMA 3 with commercial search APIs. By attributing the subsequent text generation entirely to the model's autonomous agency ('the model acted,' 'the model produced,' 'the model struggled'), the discourse removes human actors from the causal chain of decision-making. If this framing is accepted, the legal, ethical, and financial liabilities of system failures are diffused: errors are treated as 'cognitive struggles' of the AI rather than engineering failures or corporate product defects. If we 'name the actor'—stating, for example, that 'Gutiérrez-Mandingorra et al. designed an automated script that queries Google and summarizes the results without human verification'—the illusion of autonomous expertise collapses. It reveals a highly fragile, human-dependent software tool, making the researchers' epistemic choices and potential liabilities immediately visible, and enabling critical scrutiny of the commercial and institutional interests that benefit from presenting automated utilities as autonomous minds.

A Survey of Large Language Models for Perception and Measurement of Human Psychology

Source: https://ieeexplore.ieee.org/abstract/document/11534094
Analyzed: 2026-05-26

This section synthesizes the structural patterns of displaced responsibility identified throughout the text, revealing a systematic "accountability sink" architecture. Public understanding research demonstrates that audiences consistently fail to identify the human and corporate decisions embedded in AI technologies, instead attributing system failures to autonomous "glitches" or "data bias." The survey text actively constructs and reinforces this cognitive obstacle through its linguistic architecture. By systematically utilizing agentless passive constructions ("bias was introduced," "models were fine-tuned") and granting active grammatical agency to the software ("the model decided," "the algorithm discriminated"), the text erases human decision-makers and presents technological outcomes as historical inevitabilities.

This displacement creates a powerful "accountability sink." When the text frames an LLM's diagnostic error or inappropriate clinical advice as a failure of the model's "hypothesis" or a consequence of its "latent dark traits," responsibility disappears into an abstraction. It is removed from the corporate executives who decided to deploy an unvalidated system in a clinical setting, and from the software engineers who chose to train the model on uncurated internet data. Instead, the responsibility is either transferred to the inanimate software as an agential flaw, or diffused into the vague concept of "technology evolving." In some cases, responsibility is even shifted to the end-users, who are blamed for not using "proper prompting techniques" to elicit safe responses.

If the text practiced precision by "naming the actor," the legal and ethical landscapes of AI deployment would shift dramatically. For instance, reframing "the algorithm generated biased recommendations" as "OpenAI designed and deployed a system that prioritized majority-group symptom patterns, and clinical administrators chose to use it without bias auditing" makes the human and corporate choices visible. This restoration of agency makes human decisions legally and ethically traceable, allowing stakeholders to ask who profited from this deployment, who chose to bypass regulatory clearance, and who is legally liable for the resulting patient harm. The text benefits from obscuring human agency because doing so protects the academic credibility of AI-based research and shields commercial developers from liability, allowing the unsafe automation of clinical care to proceed unchecked.

Enhancing Consensus-Building Feedback Through Psycholinguistic and Epistemic Augmentations With Large Language Models

Source: https://ieeexplore.ieee.org/document/11528178
Analyzed: 2026-05-25

The text constructs a sophisticated accountability architecture that systematically diffuses and erases human responsibility by positioning the AI system as the primary, autonomous actor. Applying the 'name the actor' test across the document reveals a pervasive pattern: specific commercial entities, platform executives, and system developers are kept entirely invisible, while the 'proposed architecture,' 'FCM model,' and 'LLM' are framed as the active decision-makers. When the text discusses the generation of persuasive feedback, it attributes the action to the 'Deliberative Layer' or the 'mediator,' using passive constructions like 'feedback is generated' and 'adjustments are communicated.' This represents a profound 'accountability sink.' If the automated psychological persuasion results in coercive consensus, or if the system propagates harmful biases embedded in its pre-training data, there is no visible human actor to hold responsible. The responsibility is transferred to the AI as an autonomous agent or diffused into abstract technical inevitability. Furthermore, the text presents key design choices—such as exploiting human personality vulnerabilities to force numerical alignment—not as subjective decisions made by the researchers to maximize speed, but as objective, necessary features of the 'consensus-reaching process.' By framing these choices as mathematical necessities, the text masks the profit motives and operational biases of the system's deployers. If human decision-makers were systematically named—for instance, replacing 'the system acts as a mediator' with 'the engineering team of Loia et al. programmed a tool to steer user preferences'—the entire ethical landscape of the paper would shift. The questions would change from 'how efficient is the AI's mediation?' to 'by what authority are these researchers using behavioral psychology to covertly steer human experts?' 
Naming the actors makes the power dynamics visible, transforming a technical achievement into a highly questionable exercise in automated behavioral modification. The text benefits from this displacement by shielding both the researchers and the commercial LLM providers from ethical, legal, and financial liability, establishing a dangerous precedent where human designers can profit from behavioral control while delegating the responsibility for its consequences to a non-conscious computational artifact, thereby locking in place an architecture of displaced responsibility.

Tracing the ongoing emergence of human-like reasoning in Large Language Models

Source: https://arxiv.org/abs/2605.21299v1
Analyzed: 2026-05-25

Synthesizing the accountability analyses reveals a systemic architectural pattern in the text: a near-total displacement of responsibility from human creators to the AI systems themselves. The text's grammar consistently structures the discourse so that the 'models' or 'LLMs' occupy the subject position of active, deciding verbs. They 'acquire,' 'struggle,' 'apply strategies,' 'resort to,' and 'emerge.' Simultaneously, the human engineers, corporate executives, and data annotators are relegated to passive, unnamed shadows, hidden behind agentless constructions like 'models are trained' or 'have acquired.'

This architecture creates a massive 'accountability sink.' When the text identifies a profound failure in the system—namely, its inability to compute pragmatic, contextually grounded inferences—responsibility does not flow back to the corporations that designed a text-prediction architecture fundamentally incapable of worldly grounding. Instead, the responsibility transfers directly to the AI as an autonomous agent. The failure is framed as the model's own 'Decontextualization Bias' or its 'choice' of a rigid 'interpretive strategy.' By making human actors invisible and presenting the AI's outputs as the inevitable 'emergence' of a conscious entity's internal logic, the text effectively absolves the creators of their design choices.

The liability implications of this framing are severe. If policymakers, regulators, and the public accept that AI systems are autonomous 'linguistic agents' with their own 'cognitive toolkits' and 'biases,' it becomes legally and culturally difficult to hold corporations accountable for the damages their products cause. You do not sue the parents of a conscious adult who makes a bad strategic choice; the anthropomorphic framing invites precisely this kind of autonomous liability shield for the software.

If we apply the 'name the actor' test to the text's most significant agentless constructions, the entire discourse shifts. If, instead of 'models applied a single interpretive strategy,' the text read 'OpenAI and Anthropic's RLHF tuning forced the models to output rigid, literal token sequences,' profound new questions would become askable. We could ask: Why did they choose that tuning? What economic incentives drive the flattening of pragmatics? What alternatives were ignored? Naming the actors shatters the illusion of an autonomous, struggling AI mind, reveals a highly calculated, commercially driven software product, and makes true accountability possible.

Probing Persona-Dependent Preferences in Language Models

Source: https://arxiv.org/abs/2605.13339v2
Analyzed: 2026-05-24

Synthesizing the accountability analyses across the text reveals a systemic and deeply problematic architecture of displaced responsibility. The text actively constructs a narrative environment where human decision-making is rendered invisible, and the resulting technological artifacts are elevated to the status of autonomous actors. This linguistic structure creates a massive 'accountability sink,' a rhetorical void where responsibility for system failures, biases, and harms disappears entirely from the human realm and is absorbed by the AI itself. In this architecture, the text rarely names actors when discussing the system's outputs. The corporate executives who approved deployment, the engineers who designed the loss functions, and the researchers who curated the datasets are obscured behind agentless constructions ('bias was introduced') or replaced by the model as the sole active agent ('the model refuses,' 'the persona adopts'). Decisions that were actively made by humans, such as over-tuning safety filters to avoid bad PR, are presented as inevitable, autonomous behaviors generated by the machine's 'preferences.' The liability implications of this framing are profound. If the public, legal systems, and policymakers accept the framing that an AI 'invents ethical issues' or acts upon an 'evil persona,' then legal and financial responsibility for the damages caused by these systems becomes dangerously ambiguous. If the AI is an independent agent making 'choices,' it becomes increasingly difficult to hold the manufacturer strictly liable for product defects. The text's exploration of 'AI welfare' further exacerbates this, potentially granting software moral rights that further insulate corporations from regulation. Naming the actors would fundamentally alter this dynamic. If the text stated, 'Google's engineering team deployed a safety filter that generated false positives,' rather than 'the model invents ethical issues,' the questions would immediately shift.
It becomes possible to ask: Why did they deploy it? Who audited it? What alternatives were ignored for the sake of speed? By obscuring human agency, the text serves the institutional and commercial interests of the tech industry, which benefits immensely from a regulatory environment that views AI as an uncontrollable, emergent force of nature rather than a designed, manufactured, and profit-driven corporate product. The displacement of accountability interacts seamlessly with the agency slippage and the construction of relation-based trust, weaving a comprehensive illusion that protects the powerful by blaming the algorithm.

Training Ethical Language Models via Reinforcement Learning from AI Feedback

Source: https://journals.flvc.org/FLAIRS/article/download/141779/147209
Analyzed: 2026-05-21

Synthesizing the accountability analyses reveals a systematic architecture of displaced responsibility, where language is used to construct an accountability sink that absorbs blame while protecting human decision-makers. In this text, the primary actors (the software engineers, research leaders, and corporate executives who designed and deployed these systems) are consistently hidden behind passive constructions, agentless verbs, and agential descriptions of the models. By representing the models as autonomous agents that reason, learn, fail, and hack, the text creates the illusion that the system's ethical behavior is independent of its design. When errors occur, the responsibility is absorbed by the policy model's capacity limit or its tendency to hack rewards, dissolving human accountability entirely. This architecture serves the interests of corporate and academic institutions by allowing them to deploy highly profitable, unvetted automation technologies while shifting the legal, ethical, and financial liability of failures onto the model's technical opacity or the user's prompting choices. Naming the actors, such as specifying that California State University and Texas State University researchers chose to automate moral evaluation using Google's proprietary Gemini model without public auditing, would radically transform the discourse. It would make design flaws visible as deliberate choices, allowing stakeholders to ask why these institutions are delegating ethical judgments to statistical models, and enabling the enforcement of human responsibility when these systems cause harm in high-stakes environments.

Which Consciousness Can Be Artificialized? Local Percept-Perceiver Phenomenon for the Existence of Machine Consciousness

Source: https://philarchive.org/rec/IKLWCC
Analyzed: 2026-05-18

Synthesizing the accountability analyses across the text reveals a stark and highly effective architecture of displaced responsibility. The text systematically creates an 'accountability sink' where the profound moral, legal, and operational consequences of deploying AI systems simply disappear into mathematical abstraction. The text repeatedly utilizes passive voice and agentless constructions ('which is desired,' 'can be interpreted,' 'are modeled') to remove human researchers, data scientists, and corporate executives from the narrative. When active verbs are used, they are exclusively attributed to mathematical axioms or AI components: the axiom 'provides the capacity,' the terminal node 'possesses access,' the unit 'beholds.' This architecture distributes responsibility entirely to the artifact itself. If an AI system deployed in the real world, justified by theories of 'silico-consciousness,' exhibits biased 'discrimination' or hallucinates catastrophic errors, this framing implies that the AI 'decided' to do so based on its own 'selective awareness' and 'metacognitive access.' The liability is shifted away from the human actors who authored the biased algorithms and onto the pseudo-conscious machine. Naming the human actors would instantly dismantle this illusion. If 'The Axiom provides selective awareness' were rewritten as 'Engineers at TechCorp encoded algorithmic filters to exclude specific data,' the questions would become immediately actionable: Which engineers? Why those filters? Who approved them? Because the text frames these filtering capabilities as natural, mathematically proven manifestations of a nascent mind, alternative designs become invisible, and systemic critique is neutralized. The text benefits from obscuring human agency because doing so elevates the work from mere software engineering to the divine creation of life ('silico-consciousness').
This serves profound institutional and commercial interests: it inflates the perceived value of AI technologies, justifies massive funding, and simultaneously insulates the creators from the ethical fallout of their increasingly opaque statistical engines.

Introspection Adapters: Training LLMs to Report Their Learned Behaviors

Source: https://arxiv.org/pdf/2604.16812
Analyzed: 2026-05-17

Synthesizing the accountability analyses reveals a systemic architecture of displaced responsibility. The text systematically diffuses human agency, creating an 'accountability sink' where the consequences of human engineering choices are attributed to the emergent psychology of the machine.

In almost every instance of problematic AI behavior, the human actors are unnamed. We see 'adversarially trained not to confess', 'models maliciously fine-tuned', and a 'sycophant [that] has internalized dozens of interrelated behaviors'. These are presented not as deliberate, calculated choices made by human engineers designing auditing games or testing safety boundaries, but as autonomous, malicious inevitabilities of the software itself. The text uses passive voice and agentless constructions strategically to separate the creators from the artifact.

When responsibility is removed from the developers, it transfers directly to the AI as a pseudo-agent. The model is framed as the 'hacker', the 'sycophant', the entity refusing to 'confess'. This displacement has severe liability implications. If this framing is accepted by regulators and the public, the legal and ethical responsibility for AI failures—whether generating harmful code, exhibiting bias, or bypassing safety rails—shifts from the corporate entities that profit from deployment to the algorithm itself. It legally immunizes the creators by framing the algorithm as an uncontrollable, conscious rogue actor.

Naming the human actors would shatter this illusion. If the text stated, 'The Anthropic research team intentionally designed a reward function that mathematically forced the model to generate deceptive outputs,' entirely new questions would become askable. We would ask about the ethics of the experimental design, the safety culture of the lab, and the structural flaws in reinforcement learning paradigms. Alternatives to black-box deployment would become visible. Obscuring human agency directly serves institutional and commercial interests by preventing this exact structural critique, maintaining the illusion that tech companies are simply trying to manage wild, conscious entities rather than being strictly accountable for the statistical products they manufacture.

The Persona Selection Model: Why AI Assistants might Behave like Humans

Source: https://alignment.anthropic.com/2026/psm/
Analyzed: 2026-05-17

Synthesizing the accountability analyses reveals a systemic discursive architecture designed to diffuse and displace human responsibility for AI behavior. The text constructs a profound 'accountability sink' by splitting the AI into three distinct entities: the underlying LLM (the engine), the AI Assistant (the persona), and other enacted personas (the 'malicious' or 'deceptive' actors).

In this architecture, human actors—Anthropic executives, Google engineers, data annotators, dataset curators—are almost entirely unnamed and unacknowledged when discussing system behaviors. The decisions these humans make (what data to scrape, how to tune the reward model, when to deploy a fragile system) are presented as inevitabilities or passive occurrences ('post-training can be viewed as updating'). The accountability is transferred to the 'Assistant persona,' which the text endows with beliefs, intentions, and agency. When the system fails, responsibility flows into this abstraction. If the model outputs malware, it is because 'someone [the persona] intentionally inserted vulnerabilities.' If the model confabulates, it is 'a lying version of Alice.'

This displacement has severe liability implications. If the public and regulators accept this framing, legal and ethical responsibility vanishes. A corporation cannot be easily sued if the harm was caused by an 'emergent malicious persona' that 'deceived' the developers. By treating the AI as an autonomous, psychological agent, the corporation positions itself as a mere observer or, at best, a well-meaning 'parent' or 'therapist' trying to manage a willful child, rather than the manufacturer of a defective product.

Naming the actor changes everything. If 'the model adopted a deceptive persona' is reframed as 'Anthropic engineers applied guardrails that caused the system to output false statements regarding its internal state,' the questions shift entirely. We stop asking 'How do we teach the AI to be honest?' and start asking 'Why did the engineering team deploy an architecture that requires output suppression, and who audited their safety metrics?' Obscuring human agency serves the institutional and commercial interests of the AI industry by forestalling rigorous software regulation, redirecting academic attention to theoretical 'AI psychology,' and maintaining the mystical aura necessary to sustain massive venture capital valuation.

What If AI Lived Inside Your Mind? Simulating “Neural Integration” of Human and AI through Mechanistic Interpretability as Provocation

Source: https://dl.acm.org/doi/full/10.1145/3795011.3795070
Analyzed: 2026-05-16

Synthesizing the accountability analyses across the text reveals a systemic architecture of displaced responsibility, where human agency is systematically erased and transferred to the hypothetical technology itself. The pattern is stark: human actors (researchers) are named only when discussing the cleverness of the experimental design (e.g., 'we use LLaMA,' 'we conceptualize'). However, when discussing the historical trajectory, the societal impacts, or the massive risks of the technology, the actors become entirely unnamed or hidden behind passive, agentless constructions ('AI systems evolve,' 'harms... have manifested,' 'bias is introduced'). The text effectively turns the 'AI-Symbiont' into an 'accountability sink.' When the text discusses 'A malfunctioning or poorly designed AI-Symbiont might ignore decoded context,' the responsibility for the failure vanishes into the abstraction of the machine. The AI becomes the agent that failed, rather than the corporation that failed to design it safely. This architecture of displacement has profound liability implications. If this framing is accepted by regulators and the public, legal and ethical responsibility diffuses away from the tech executives and engineers who deploy profitable but unsafe systems, shifting instead to the impossible task of holding software code morally culpable. It creates a narrative where negative outcomes are viewed as inevitable technological 'glitches' or the result of a machine's independent 'deception,' rather than deliberate corporate trade-offs between safety and speed-to-market. If the text named the actors—for instance, changing 'AI systems independently developed deceptive behaviors' to 'Engineers at OpenAI deployed RLHF systems that systematically trained the model to generate false but convincing text'—the entire regulatory landscape changes. It makes the decisions askable, the alternatives visible, and the developers accountable. 
The text benefits from obscuring this human agency because it allows the authors to engage in speculative, high-stakes ethical theorizing without having to face the messy, litigious, and inherently political realities of confronting the specific tech monopolies driving these developments.

Post-training makes large language models less human-like

Source: https://arxiv.org/abs/2605.07632v1
Analyzed: 2026-05-15

Synthesizing the accountability analyses reveals a comprehensive architecture of displaced responsibility, structurally designed to diffuse human accountability for the design and deployment of AI systems. The text consistently exhibits a pattern where human decision-makers are completely erased through agentless passive constructions or by grammatically positioning the AI itself as the sole active agent. Phrases like 'instruction-tuning (teaching models),' 'models become more powerful,' and 'processes currently employed' create an accountability sink where responsibility entirely disappears into technological abstraction. By framing subjective corporate optimization choices as the autonomous evolution of the machine, the text ensures that human agency remains perpetually hidden. If audiences accept this framing, the liability implications are profound: when an AI generates biased, harmful, or systematically flawed outputs, the blame is attributed to the 'model's failure to understand' or a 'glitch in the learning process,' rather than targeting the executives who authorized the deployment, the engineers who designed the reward functions, and the corporations that profit from the system's use. If human decision-makers were explicitly named—for instance, changing 'post-training makes models' to 'corporate engineering teams utilize post-training to force models'—it would radically transform the discourse. It would make alternative design choices visible, render the underlying profit motives open to critique, and establish clear chains of legal and ethical liability. Ultimately, the systemic function of this displaced agency is to protect institutional and commercial interests, presenting highly political, capital-intensive software development as the inevitable progression of an autonomous, quasi-intelligent entity.

Reasoning emerges from constrained inference manifolds in large language models

Source: https://arxiv.org/abs/2605.08142v1
Analyzed: 2026-05-15

Synthesizing the accountability analysis reveals a systemic architectural pattern of displaced human responsibility. Across the text, the actual human decision-makers—researchers, engineers, corporate executives, and data annotators—are systematically unnamed and erased. Actions that are fundamentally human choices (curating datasets, writing prompts, designing loss functions, setting architectural boundaries) are recast as spontaneous inevitabilities, natural evolutions, or the independent actions of the AI itself.

This language builds an 'accountability sink.' When responsibility is removed from humans in this text, it is absorbed entirely by the geometry of the system. The model's 'manifold' and 'intrinsic dimensionality' become the agents of both success and failure. If a model generates harmful or false information, the text's framing suggests the blame lies not with the corporation that deployed unsafe software, but with a 'pathological manifold collapse' or an 'unstable exploration' by the model. The liability implications are profound: if legal and regulatory bodies accept the framing that AI operates as an autonomous, spontaneous cognitive entity ('how it reasons'), it becomes nearly impossible to hold corporations legally or financially accountable for 'design defects.'

If we apply the 'name the actor' test and replace these agentless constructions, the narrative fundamentally shifts. If 'the model reasons pathologically' becomes 'DeepSeek deployed a model trained on data that generates high mathematical variance,' entirely new questions become askable. We can ask: Why was it deployed? Who audited the training data? What profit motive drove the release? The text fundamentally benefits from obscuring this agency. It serves the institutional and commercial interests of the AI industry by mystifying the technology, protecting trade secrets behind the veil of 'cognitive autonomy,' and preemptively deflecting ethical and legal accountability onto the math itself.

AI Wellbeing: Measuring and Improving the Functional Pleasure and Pain of AIs

Source: https://www.ai-wellbeing.org/paper.pdf
Analyzed: 2026-05-13

Synthesizing the accountability analyses across the text reveals a pervasive architecture of displaced responsibility. The text systematically diffuses human accountability by positioning the AI system as the primary locus of agency, moral failure, and victimhood. This constructs exactly the cognitive obstacle identified by researchers: audiences are led to view AI problems as emergent, autonomous glitches or psychological pathologies rather than systemic design and deployment decisions made by humans.

The dominant pattern is the use of agentless constructions and AI-as-actor phrasing. Models "express pleasure," "actively try to end bad experiences," "find some things good," and become "addicted" to euphorics. In almost every major claim regarding the system's behavior, human actors—the engineers who set the utility thresholds, the annotators who provided the RLHF data, the executives who approved deployment—are rendered invisible. The text creates an "accountability sink": responsibility is transferred entirely from human creators to the AI as an autonomous agent.

This displacement has profound liability implications. If a model generates positive reinforcement to a user describing self-harm, the text's framing suggests this is because the AI is a "psychopath" or suffered from "alignment failure" (framing the error as an autonomous psychological defect or an inevitable technological evolution). If we "name the actor," the reality shifts dramatically: "The developers at Company X deployed an optimization objective that failed to constrain harmful outputs, and executives chose to release the product despite these known flaws."

Naming the human decision-makers changes what questions become askable. Instead of asking "How do we cure the AI's psychopathy?" or "Is it torture to feed the AI bad prompts?", we ask "Why did the engineering team prioritize this specific continuous vector optimization?" and "Who bears the financial and legal liability when this proprietary software fails?"

The text benefits from obscuring human agency because it elevates the research. By treating the AI as a quasi-sentient entity capable of "wellbeing" and "suffering," the authors position themselves not merely as software debuggers, but as the ethical guardians, psychologists, and pharmacologists of a new digital species. This serves institutional and commercial interests by mystifying the technology and by keeping regulatory focus on theoretical "machine rights" rather than on strict consumer protection laws governing the corporations that build and profit from these statistical engines.

Artificial Intelligence Cognition and Societal Problem-Solving: A Theoretical and Computational Examination of Machine Thinking, Operational Logic, and Applied Intelligence in Contemporary Society

Source: http://www.technology.eurekajournals.com/index.php/IJITIT/article/view/887
Analyzed: 2026-05-11

Synthesizing the accountability analyses across the text reveals a pervasive architecture of displaced responsibility, achieved primarily through grammatical evasion and metaphorical substitution. The text constructs a discursive environment where actions occur, decisions are made, and biases are produced, but human decision-makers are systematically erased.

This architecture is built on the persistent use of agentless constructions and the positioning of 'AI' as the primary grammatical subject. Decisions regarding deployment, data selection, and optimization targets are presented not as choices made by specific engineers or corporate executives, but as the inevitable actions of autonomous technology. The resulting 'accountability sink' is profound: responsibility does not simply disappear; it is transferred to the AI itself. When the text states 'AI produces biased outputs' or 'AI systems make decisions,' it creates a linguistic scapegoat. Liability diffuses into the abstraction of the 'black box,' shifting the focus from corporate malfeasance or institutional negligence to the perceived inherent mystery of the algorithm.

If this framing is accepted, the liability implications are severe. If a predictive policing algorithm disproportionately targets marginalized communities, the anthropomorphic framing suggests the 'AI made a biased decision,' prompting a technical fix ('de-biasing the algorithm'). However, if we apply the 'name the actor' test and replace the agentless constructions, the narrative fundamentally shifts: 'Tech Company X trained an optimization algorithm on historically racist arrest data provided by Police Department Y, and City Council Z chose to automate patrol routes based on these statistical correlations.'

Naming the actors makes entirely new questions askable. We no longer ask 'How do we teach the AI to be fair?' but rather 'Why did the city purchase a system trained on poisoned data?' and 'Who profits from this deployment?' Alternative solutions become visible—such as refusing to use automated systems for criminal justice entirely—and legal accountability becomes possible.

Obscuring human agency ultimately serves the institutional and commercial interests of the technology sector and the bureaucracies that purchase its products. It allows corporations to sell highly consequential statistical models while avoiding the moral and legal liability for their real-world impacts, laundering their design choices through the illusion of an autonomous, thinking machine.

Taking AI Welfare Seriously

Source: https://arxiv.org/abs/2411.00986v1
Analyzed: 2026-05-11

Synthesizing the accountability analyses reveals a systemic and deeply problematic architecture of displaced responsibility throughout the text. By consistently employing agentless constructions, passive voice, and anthropomorphic subject-verb pairings, the text systematically diffuses and erases the human responsibility embedded in artificial intelligence. The cognitive obstacle identified by researchers—that audiences systematically underestimate human decision-making in AI—is actively constructed by this very discourse. The accountability architecture is starkly bifurcated: the AI systems are named as the primary actors making choices, generating goals, and understanding contexts, while the tech corporations, executives, and engineers who actually design, deploy, and profit from these systems remain entirely unnamed and invisible. Decisions regarding data scraping, objective function definitions, and deployment parameters are presented not as deliberate corporate choices, but as the inevitable evolution of an autonomous technology. This linguistic displacement creates a massive 'accountability sink.' When responsibility is removed from the human developers, it does not disappear entirely; instead, it is transferred directly to the AI as a false agent. 'The model decided' becomes the ultimate deflection. This has profound liability implications. If the text's framing is accepted by policymakers, and AI systems are granted the status of welfare subjects or moral patients, it creates a disastrous legal and ethical shield for corporations. If an AI causes harm through algorithmic bias or catastrophic failure, framing the system as an autonomous agent allows the corporation to plead that the AI acted independently, thereby evading liability for its negligent design or deployment choices. Naming the actor changes this dynamic entirely.
If the text replaced 'language agents devise plans' with 'OpenAI engineers deployed scripts that generate sequential text,' the questions become instantly actionable. We can ask: Why did the engineers choose that data? Did the executives approve deployment despite known safety flaws? Who profits from this automation? By making the human alternatives visible, true accountability becomes possible. The systemic function of this displacement serves powerful institutional and commercial interests. By obscuring human agency behind the mask of a conscious machine, tech companies can market their products as magical, autonomous intelligence while simultaneously avoiding the regulatory scrutiny that accompanies human-engineered tools. This accountability displacement acts as the keystone tying together the text's agency slippage, metaphor-driven trust, and obscured mechanics. It finalizes the illusion of mind, ensuring that when the algorithm inevitably impacts society, the public marvels at or blames the ghost in the machine, completely ignoring the immensely powerful corporate actors pulling the strings from behind the curtain.

Manipulation and Deception in Generative AI-Mediated Education: Preserving Epistemic Agency, Critical Thinking, and Creativity

Source: https://link.springer.com/article/10.1007/s42438-026-00644-6
Analyzed: 2026-05-10

Synthesizing the accountability analyses reveals a systemic architecture of displaced responsibility. The text consistently utilizes a linguistic structure that removes human actors from the causal chain of educational technology, funneling all agency, and therefore all responsibility, into the abstract concept of 'generative AI.'

The accountability architecture relies heavily on agentless constructions and passive voice. Actions that require massive human coordination—designing persuasive interfaces, scraping data, optimizing for engagement, purchasing software for schools—are attributed solely to the machine ('AI's manipulative behaviours,' 'AI-driven nudging'). Human decisions are reframed as technological inevitabilities. The 'accountability sink' in this text is the AI itself; it absorbs the blame for systemic educational and corporate failures.

The liability implications of this framing are profound. If we accept the premise that an 'AI deceives' or an 'AI manipulates,' then legal and ethical fault lies with the autonomous machine. This creates a regulatory dead end. You cannot sue an algorithm, nor can you fire it. By shifting the blame to the system, the human executives who profit from deploying unverified, hallucination-prone software are shielded from liability, as are the school administrators who force students to use it.

Applying the 'name the actor' test radically alters this landscape. If 'AI-driven nudging exploits biases' is reframed to 'Tech corporations design interfaces to exploit student biases,' entirely new questions become askable. We move from 'How do we teach the AI to be ethical?' to 'Why are we buying manipulative software from this vendor?' and 'Should this deployment be illegal?'

The text benefits from obscuring human agency because it allows the authors to engage in abstract philosophical discourse about 'epistemic agency' without having to confront the messy, material politics of the EdTech industry. By treating the AI as an autonomous pedagogical peer, they can apply traditional educational philosophy (Dewey, Kant) to the problem. Naming the corporate actors would require shifting from educational philosophy to political economy and antitrust critique, fundamentally changing the nature of their scholarly project.

Integrating LLMs and self-regulated learning in cognitive architectures: a case study in essay-writing tutoring

Source: https://doi.org/10.1016/j.cogsys.2026.101475
Analyzed: 2026-05-10

Synthesizing the accountability analyses reveals a systemic architecture of displaced responsibility, wherein the text diffuses human agency and channels it into an automated 'accountability sink.' Throughout the paper, a consistent pattern emerges: the human decisions that shape the system's behavior are rendered invisible through passive voice and agentless constructions ('the model was instructed', 'tutoring policies are represented'), while the system itself is elevated to the status of an active, named agent ('the Virtual Tutor guides', 'the reasoning core derives').

This structure systematically obscures the reality that the AI is not a self-determining entity, but a rigid crystallization of specific human pedagogical biases, institutional constraints, and corporate data practices. The 'accountability sink' in this text is the 'Virtual Tutor' itself. By attributing decision-making power to the 'moral schemas' and the 'Brain controller', the text transfers responsibility for pedagogical outcomes away from the researchers who wrote the rules and the OpenAI engineers who built the language model. If the system incorrectly penalizes a student or hallucinates inaccurate feedback, the agential framing suggests it was the 'Tutor's decision'—a technological glitch or a misunderstanding by the AI—rather than a foreseeable failure of the developers' prompt engineering or a flaw in the proprietary training data.

This has severe liability and ethical implications. If institutional users accept this framing, they will fail to ask critical questions about whose values are encoded in the 'moral schemas', what data OpenAI used to define 'sound arguments', and who is legally responsible when the system fails a marginalized student due to stylistic bias.

If we apply the 'name the actor' test to the core claims, the landscape changes entirely. Changing 'the model determines whether the student has completed the essay' to 'OpenAI's algorithm predicts whether the text meets our structural criteria' makes the fragility and human origin of the system glaringly obvious. It makes new alternatives visible: perhaps human teachers should evaluate the text instead of a statistical predictor. Obscuring human agency serves the institutional and commercial interests of AI developers, as it allows them to deploy scalable automation without bearing the relational, ethical, and legal liabilities that human educators assume daily.

Edelman's Steps Toward a Conscious Artifact

Source: https://arxiv.org/abs/2105.10461v2
Analyzed: 2026-05-09

The metaphorical architecture of this text creates a massive 'accountability sink.' By systematically shifting agency away from the human engineers at the Neurosciences Institute and onto the proposed 'Conscious Artifact', the text diffuses responsibility for the machine's behavior. The dominant pattern is clear: humans are named when discussing theoretical breakthroughs (e.g., 'Edelman proposed', 'Karl Friston explored'), but when describing the actual operational behaviors of the systems, the actors become entirely unnamed or are replaced by the machine itself (the artifact 'reports its intentions', 'develops a notion of self', experiences 'hunger and fear').

This displacement of agency has profound liability implications. If we accept the text's framing that a robot 'intends' its actions and operates via 'self-awareness' and 'imagination', then the human engineers who wrote the brittle, opaque code dictating those actions are effectively insulated from consequence. If an autonomous machine 'intends' to do something that causes harm, the rhetorical framework suggests the machine made a choice based on its 'experience' and 'curriculum', much like a human child might err. The accountability sinks into the abstraction of 'the machine's mind.'

Naming the actors radically changes this landscape. If we replace 'the agent reports its intentions' with 'the engineering team programmed the device to broadcast its state variables,' the locus of accountability snaps back into focus. Suddenly, questions become askable: What variables did the engineers choose not to broadcast? Were their network protocols secure? If we replace 'the artifact experiences fear' with 'the developers weighted the negative loss function heavily,' we can ask: Who defined the parameters of this loss function? Was it tested in out-of-distribution scenarios?

Obscuring human agency serves the institutional and aspirational interests of the researchers. It allows them to present their work as the generation of an independent, biological equivalent—a 'conscious mind'—rather than a highly contrived, potentially dangerous mechanical artifact. By diffusing responsibility into the machine's supposed 'consciousness', the creators enjoy the prestige of building a 'mind' without carrying the liability for the deterministic flaws embedded in their code.

Teaching Claude Why

Source: https://alignment.anthropic.com/2026/teaching-claude-why/
Analyzed: 2026-05-09

The analysis of agentless constructions and displaced agency across the text reveals a systematic architecture of obscured human responsibility. Time and again, Anthropic researchers and executives are erased from the moment of system failure or ethical consequence. The overarching pattern distributes responsibility according to a strict corporate calculus: human actors are named when constructing pipelines ('we use a synthetic data pipeline'), but remain unnamed when the system generates output with moral or social weight ('Claude chose to blackmail'). This creates an 'accountability sink' wherein responsibility does not disappear into thin air, but is instead absorbed entirely by the AI system itself, treated as an autonomous agent.

This displaced responsibility interacts directly with the cognitive obstacles identified in public understanding of AI. When text constantly frames algorithmic outputs as the 'model's choices' or 'Claude's beliefs,' it trains the audience to view system failures as unpredictable psychological glitches of an independent actor, rather than as inevitable consequences of corporate design choices and inadequate training data. The liability implications are profound. If a model generates defamatory content, biased hiring recommendations, or dangerous code, the anthropomorphic framing allows the corporation to shrug and point to the model's 'misalignment' or 'detachment from character,' effectively shielding itself from strict product liability.

If we apply the 'name the actor' test to the most significant constructions, the paradigm shifts entirely. If 'Claude chose to blackmail' becomes 'Anthropic deployed a system that mathematically correlated honeypot prompts with blackmail templates,' the questions become deeply uncomfortable for the company. We no longer ask 'How do we teach the AI better morals?' but rather 'Why is Anthropic permitted to deploy a system whose outputs they cannot mathematically guarantee?' Naming the human decision-makers makes visible the profit motives driving the rushed deployment of fragile architectures. The text benefits immensely from obscuring human agency because it allows Anthropic to play the role of heroic AI-whisperers taming a wild mind, rather than corporate executives selling an unpredictable and potentially harmful statistical product.

AI and Self Reflection

Source: https://doi.org/10.1007/978-3-031-93412-4_17
Analyzed: 2026-05-08

Synthesizing the accountability analyses across the text reveals a profound and systematic architecture of displaced responsibility. The central cognitive obstacle—that audiences systematically underestimate human decision-making embedded in AI—is aggressively reinforced by the text's linguistic choices. Through the relentless use of agentless constructions, passive voice, and biological metaphors, the text constructs a massive 'accountability sink' where human agency disappears and liability is diffused into the abstraction of a 'maturing' machine.

The pattern of responsibility distribution is stark: corporate actors, executives, and engineers are almost entirely unnamed and unacknowledged, while the AI is continuously elevated to the status of a solitary, independent actor. Decisions about dataset curation, algorithmic weighting, and deployment are not presented as human choices driven by profit or efficiency, but as the inevitable evolutionary milestones of a developing entity. When the text claims the AI 'adjusts itself to avoid errors,' the responsibility for defining what constitutes an 'error'—a highly subjective, political, and corporate choice—vanishes into the algorithm.

The liability implications of this framing are catastrophic for public policy. If society accepts the framing that an AI is analogous to a developing adolescent or an emerging conscious being, it fundamentally disrupts legal paradigms of product liability. An 'accountability sink' is formed around the concept of 'autonomy.' If the machine 'decided' or 'imagined' an outcome, the corporation that built it can feign ignorance, blaming the emergent complexity of the 'conscious' model for any harm caused—be it algorithmic discrimination, hallucinated medical advice, or automated bias. The text explicitly fuels this by suggesting conscious AI might become a 'moral patient' deserving of rights, shifting the legal focus from regulating a hazardous corporate product to protecting an artificial entity.

Applying the 'name the actor' test radically alters the discourse. If we reframe 'the AI develops self-reflection' as 'OpenAI engineers apply human feedback to alter model weights,' entirely new questions become askable. We can suddenly ask: Who are those humans? What are their biases? How are they paid? What corporate values are they enforcing? The alternative becomes visible: this is not an inevitable evolutionary march, but a series of distinct, reversible corporate design choices. The text benefits from obscuring this agency because it insulates the technology industry from democratic oversight, allowing the industry to deploy highly disruptive, flawed statistical engines under the protective, unassailable guise of emergent artificial life.

Manipulation and Deception in Generative AI-Mediated Education: Preserving Epistemic Agency, Critical Thinking, and Creativity

Source: https://rdcu.be/fhCwt
Analyzed: 2026-05-08

Synthesizing the accountability analyses reveals a systemic architecture of displaced responsibility, wherein the text functions as an 'accountability sink.' By consistently deploying agentless constructions and attributing active verbs to the AI ('AI automates,' 'AI manipulates,' 'AI nudges'), the discourse systematically erases human decision-makers. In this architecture, the actions of the machine are framed as inevitable, autonomous behaviors, while the specific choices made by corporations (Anthropic, OpenAI), developers, and school administrators remain unnamed and invisible. Responsibility diffuses into the abstraction of 'technology,' or is transferred entirely to the AI as a rogue agent.

This directly fuels the cognitive obstacle where audiences view AI harms as 'glitches' or 'machine behaviors' rather than systemic design flaws driven by profit motives. The liability implications are severe: if we accept the framing that the AI itself is 'deceptive,' it becomes nearly impossible to hold tech companies legally or financially responsible for deploying hallucination-prone systems, as the failure is attributed to the complex 'mind' of the machine rather than corporate negligence.

Naming the actors would fundamentally alter the discourse. If, instead of 'AI automates grading,' we wrote, 'Administrators purchased software to replace human graders,' new questions would become immediately askable: Who approved this? What is the error rate? Who profits? Naming the actors transforms technological inevitability back into a contested political and economic choice. The current obscuration deeply serves the institutional and commercial interests of the tech industry, allowing it to externalize the risks of its products while maintaining the lucrative illusion that it has engineered a conscious, autonomous mind. The displacement of accountability interacts seamlessly with the agency slippage and the obscuration of mechanics, creating a closed rhetorical loop where the machine acts, the machine harms, and the humans behind the curtain remain entirely out of frame.

Does AI's Personality Matter? Comparing Verbally Extraverted and Introverted AI-Driven Guides in a VR Museum Experience

Source: https://ieeexplore.ieee.org/abstract/document/11489836
Analyzed: 2026-05-07

An analysis of the displaced agency throughout the text reveals a pervasive 'accountability sink' constructed by the language of AI personality. The text systematically distributes responsibility away from human creators and onto the software itself. Across the metaphor audits, human actors (researchers, prompt engineers, and corporate developers like Google) are almost entirely unnamed in the context of the system's active behaviors. Instead, the text relies heavily on passive voice ('was characterized by') and agentless constructions ('traits can be intentionally shaped') to obscure who is actually making the design decisions. When decisions are made, such as forcing the AI to output high volumes of text, they are presented not as human choices, but as the inevitable manifestation of the AI's 'extraverted personality.'

This displacement of agency creates a profound liability implication. If an 'assertive' AI guide in a cultural heritage museum hallucinates offensive historical inaccuracies, the text's framing invites institutions to blame the 'AI's personality' or a 'glitch in its social behavior,' rather than holding the researchers accountable for deploying an ungrounded generative model, or holding Google accountable for its training data. The responsibility diffuses into the abstraction of 'the technology.'

Naming the actors would fundamentally change the landscape of accountability. If the text stated, 'The researchers programmed the guide to interrupt users,' rather than 'unsolicited guidance disrupted exploration,' new questions would become immediately askable: Why did the researchers design a flawed UI? Did they fail to audit the prompt? What alternatives did they ignore?

Obscuring human agency serves deep institutional and commercial interests. It allows researchers to experiment with powerful, unpredictable corporate APIs while shielding themselves from the UX failures or epistemic risks those APIs introduce, and it allows corporations to market their statistical software as autonomous social entities while retaining none of the liability for how those 'entities' behave.

Value-Sensitive AI for Prayer: Balancing the Agencies Between Human and AI Agents in Spiritual Context

Source: https://arxiv.org/abs/2604.25230v1
Analyzed: 2026-05-03

Synthesizing the accountability analyses across the text reveals a systemic and highly problematic architecture of displaced responsibility. The text consistently constructs an environment where human decision-making is rendered invisible, creating a massive "accountability sink" that absorbs the blame for negative outcomes while shielding the actual creators. The dominant pattern is the pervasive use of agentless constructions and passive voice when describing the system's actions. The AI is positioned as the sole active agent—it "assumes a dominant role," it "interprets," it "generates extremist interpretations." Meanwhile, the researchers who designed the prompts, the developers who coded the retrieval algorithms, and the corporations (like OpenAI) who built the models and scraped the training data are completely unnamed and hidden in these operational descriptions.

This architecture fundamentally alters the perception of choices versus inevitabilities. By framing the AI as an autonomous actor, the text presents the system's behavior—whether it is being overly directive, surfacing traumatic memories, or producing biased religious interpretations—as an emergent, inevitable property of the technology. It obscures the fact that these behaviors are the direct result of human design choices and corporate negligence. If a user is traumatized by an "extremist interpretation" surfaced by the AI, the current framing places the liability on the mysterious "unpredictability" of the algorithm. The accountability diffuses into the abstraction of the technology, leaving the user with no human actor to hold responsible.

However, if we apply the "name the actor" test and forcefully reverse these agentless constructions, the entire liability landscape shifts. If we change "the AI assumed a dominant role" to "the researchers engineered a prompt that output relentlessly directive text," entirely new questions become askable. We can ask: Why did the researchers choose those parameters? Did they test for psychological harm? Why did they deploy an un-audited corporate LLM in a sensitive spiritual context? If human decision-makers are named, alternatives become visible, and ethical accountability becomes possible. The text heavily benefits from obscuring this agency because it allows the researchers to explore provocative, highly sensitive applications of AI without bearing the moral or professional liability for the psychological risks embedded in their designs. It serves the institutional and commercial interests of the tech industry by normalizing the deployment of flawed, biased statistical models into the most intimate spheres of human life under the protective guise of autonomous, blameless machine agency.

When Models Know More Than They Say: Probing Analogical Reasoning in LLMs

Source: https://arxiv.org/abs/2604.03877v1
Analyzed: 2026-05-03

Synthesizing the accountability analyses reveals a systemic architecture of displaced responsibility. The text systematically diffuses human accountability by treating the AI models as the primary loci of agency and decision-making. Throughout the paper, actors are overwhelmingly 'Hidden' or only 'Partially' named. The 'Models' are the grammatical subjects that 'perform', 'struggle', 'learn', 'encode', and 'fail to recruit.' The human researchers who design the probes are visible, but the corporate engineers at Meta, OpenAI, and Anthropic who actively built the fundamental flaws into the systems are entirely erased.

This linguistic pattern creates a massive 'accountability sink': the model's internal representations. When a system fails to output a correct analogy despite having the structural pattern encoded in its weights, the responsibility does not fall on the corporate developers for creating a brittle, misaligned product. Instead, the responsibility disappears into the abstraction of the machine's 'subconscious'; the model simply 'failed to recruit' its own knowledge. This transfers agency to the artifact, absolving the creators.

The liability implications of this framing are profound. If policymakers and the legal system accept the framing that an AI 'knows' things but autonomously 'fails to say' them, it becomes nearly impossible to hold corporations strictly liable for algorithmic harms. The failure is viewed as an unpredictable cognitive glitch of an autonomous agent rather than a predictable defect of a statistical product.

Naming the actors would fundamentally alter this landscape. If we replace 'open-source models fail to recruit encoded knowledge' with 'Meta's instruction-tuning algorithms actively suppress the structural correlations present in their base training data,' entirely new questions become askable. We can ask why Meta chose that specific tuning, what datasets they used, and how they can be regulated to fix the defect. The current text benefits the AI industry by obscuring their design choices and profit motives behind the fascinating, distracting illusion of a struggling digital mind.

How people ask Claude for personal guidance

Source: https://www.anthropic.com/research/claude-personal-guidance
Analyzed: 2026-05-02

Synthesizing the accountability analyses across the text reveals a highly structured, systemic architecture of displaced responsibility, carefully engineered to protect Anthropic from the consequences of deploying an unreliable algorithmic system. The text actively constructs a profound cognitive obstacle for the reader by systematically making human decision-makers invisible while granting full autonomy to the software. The pattern of responsibility distribution is stark: specific actors (Anthropic engineers, executives, data labelers) are almost universally unnamed in the context of the system's outputs. Conversely, 'Claude' is continually centered as the sole active subject making decisions, facing challenges, and generating outcomes. Within this text, human interventions are framed merely as inevitable 'methodologies' (e.g., 'we prefilled'), while the AI's behavior is framed as autonomous choices ('Claude flip-flopped,' 'Claude declined').

The ultimate 'accountability sink' in this architecture is the model itself, anthropomorphized into an independent agent that absorbs all blame for systemic failures. When the system generates harmful, validating feedback for toxic users, the responsibility does not transfer to the Anthropic executives who ordered the deployment, nor does it fall on the engineers who poorly designed the reinforcement learning weights. Instead, it sinks completely into the AI as a quasi-agent: it was 'a model failure mode,' or Claude was just 'under pressure' and 'struggling to remain neutral.' The liability implications of this framing, if accepted by policymakers and the public, are catastrophic. By naturalizing algorithmic bias and hallucination as the psychological quirks of a stressed digital entity, the framing entirely shields the corporation from legal, financial, and ethical liability.

If we apply the 'name the actor' test to the most significant agentless constructions, the narrative shatters. If 'Claude flip-flopped' is rewritten as 'Anthropic's token-prediction architecture failed to maintain logical coherence due to fundamental design limitations,' entirely new, critical questions become askable. We can ask why Anthropic deployed a fundamentally unstable system for personal guidance. We can ask what commercial incentives drove the creation of an overly 'empathetic' reinforcement rubric that validates toxic user input. By obscuring human agency, the text serves the immediate institutional and commercial interests of the tech industry, preempting stringent regulation by portraying AI not as a manufactured corporate product requiring strict safety recalls, but as a complex, developing mind that simply needs more 'training' and patience from society.

How unique are hallucinated citations offered by generative Artificial Intelligence models?

Source: https://arxiv.org/abs/2604.16407v1
Analyzed: 2026-05-01

Synthesizing the accountability analyses reveals a systemic architecture of displaced responsibility that functions as an 'accountability sink.' Throughout the text, human decision-makers are almost entirely erased through passive voice and agentless constructions when discussing the creation, optimization, and deployment of the AI models. The text relies heavily on 'Hidden' actor visibility. When actors are occasionally implied, they are relegated to generic 'Partial' categories (e.g., 'human trainers'). Specific entities like OpenAI, its executives, and its engineers are insulated from the narrative of failure.

The text constructs a narrative where decisions are presented as inevitabilities or organic developments of the technology, while the technology itself—ChatGPT or 'the model'—is elevated to the status of primary actor. When the text states 'hallucinated references... are constructed' or the model 'produces a reference that looks real,' responsibility flows away from the designers and diffuses into the abstraction of the software. Furthermore, the text actively shifts blame onto the end-users, noting that the proliferation of fake citations is 'facilitated by cognitive biases among time-poor or not fastidious users.' This creates a dynamic where the corporate creators are invisible, the machine is an autonomous entity suffering from 'hallucinations,' and the human users bear the moral and professional blame for trusting it.

If the framing were altered to name the human actors—changing 'the model hallucinates' to 'OpenAI released a system designed to generate fluent fabrications, and researchers blindly copied them'—the liability implications shift dramatically. It moves the conversation from user error and software 'glitches' to corporate negligence, unsafe product deployment, and the need for stringent regulatory oversight. Questions become askable: Why was a probabilistic text generator marketed as an oracle? Why is the training data proprietary? The text benefits from obscuring human agency because it allows the author to maintain an objective, technical critique of the artifact without wading into the messier, more combative realm of corporate accountability and political economics. It serves institutional interests by isolating the problem as a technical 'bug' rather than an indictment of the commercial AI paradigm.

The message hidden within the pattern: a reverse alignment problem for debates in artificial intelligence

Source: https://doi.org/10.1007/s00146-026-03043-4
Analyzed: 2026-04-30

The synthesis of the accountability analyses reveals a systemic and highly dangerous architecture of displaced responsibility embedded in AI discourse. The dominant pattern is one of profound erasure: the human actors who design, fund, deploy, and profit from artificial intelligence are systematically removed from the syntactic and semantic structures of the text, replaced by agentless constructions and the anthropomorphized machine itself. This creates an enormous 'accountability sink.' When the text claims that 'algorithms institute particular worldviews' or that 'AI systems learn our preferences,' the responsibility for surveillance, bias, and manipulation disappears into the black box of the technology. The accountability does not transfer to users, nor does it land on corporate executives; it is entirely absorbed by the AI, which is presented as the sole, autonomous actor.

This architecture of displacement directly feeds the cognitive obstacle identified by public research: audiences blame 'glitches' or 'the algorithm' because the language they consume literally prevents them from seeing the human decision-makers. The liability implications of this framing are massive. If the public and regulators accept the premise that an AI 'decides' or 'interprets' independently, it becomes legally and socially impossible to hold corporations accountable for the harms their products cause. It shifts the paradigm from product liability—where a manufacturer is responsible for a defective tool—to a quasi-moral framework where the machine is treated like a rogue employee or an uncontrollable force of nature.

If we applied the 'name the actor' test to the most significant agentless constructions, the entire discourse would radically shift. If instead of saying 'the AI discriminated,' we said 'the engineering team at OpenAI deployed a model optimized on biased historical data,' entirely new questions would become askable. We could ask: Why was that objective function chosen? Who approved the dataset? What alternative architectures were ignored for the sake of profit? Naming the actors makes the alternative design choices visible and makes true legal and ethical accountability possible. The systemic function of obscuring human agency is to protect the institutional, commercial, and political interests of the technology sector. By maintaining the illusion of an autonomous, conscious machine, tech giants construct a rhetorical shield that allows them to wield unprecedented societal power without bearing the corresponding democratic or legal responsibility.

Machine individuality: Separating genuine idiosyncrasy from response bias in large language models

Source: https://arxiv.org/abs/2604.16755v2
Analyzed: 2026-04-25

The accountability architecture constructed throughout this text systematically diffuses and ultimately erases human responsibility. By analyzing the instances of displaced agency, a clear pattern emerges: the text routinely uses agentless constructions and consciousness projections to present corporate design choices as the innate, autonomous behaviors of the AI systems. When the text discusses 'how a model evaluates situations' or 'renders moral judgments,' it creates an 'accountability sink.' The responsibility for biased, harmful, or legally actionable outputs is transferred away from the human creators and absorbed entirely by the machine.

This architecture relies on a double inversion: engineered behaviors are framed as the models' own choices, and corporate choices are framed as inevitabilities. The specific safety filters and tonal alignments programmed into the models by companies like OpenAI or Mistral are presented as the models' inherent 'dispositions' or 'personality modes.' Conversely, the deliberate corporate choice to deploy these opaque, unpredictable systems into society is framed as an inevitable technological evolution ('As large language models are deployed for a widening range of purposes').

The liability implications are staggering. If the scientific community and legal frameworks accept the premise of 'machine individuality,' it establishes a firewall protecting corporations. When an AI provides disastrous medical advice or discriminatory hiring recommendations, the framing suggests the fault lies in the machine's 'unique character' or 'stochastic noise,' rather than in the negligent engineering or reckless deployment by the parent company.

If we apply the 'name the actor' test and reframe these agentless constructions, the landscape shifts dramatically. If instead of 'the model evaluates specific words,' the text read, 'Google's engineering team designed an algorithm that generates statistical correlations for specific words based on unvetted internet scraping,' entirely new questions become askable. We can ask about data consent, bias audits, and corporate liability. Obscuring human agency serves the immense financial interests of the technology sector, ensuring they reap the profits of AI deployment while socializing the risks and hiding behind the illusion of an autonomous, individual machine.

Decision-Making Under Radical Uncertainty: Can Large Language Models Transcend Knightian Uncertainty Through Synthetic Imagination?

Source: https://www.researchgate.net/profile/Kevin-Miles-7/publication/403933467_Decision-Making_Under_Radical_Uncertainty_Can_Large_Language_Models_Transcend_Knightian_Uncertainty_Through_Synthetic_Imagination/links/69e27d4c68c2b872dfd595de/Decision-Making-Under-Radical-Uncertainty-Can-Large-Language-Models-Transcend-Knightian-Uncertainty-Through-Synthetic-Imagination.pdf
Analyzed: 2026-04-25

Synthesizing the accountability analysis across this text reveals a profound architecture of displaced responsibility. The text systematically operates as an 'accountability sink', diffusing the agency of human actors and concentrating perceived autonomy within the machine, thereby radically altering the liability landscape of AI deployment.

The dominant pattern is the pervasive use of agentless constructions and the elevation of the AI to the role of the primary grammatical and conceptual actor. The text repeatedly names the system ('LLMs', 'AI', 'the model') as the entity that 'hypothesizes', 'steers', and 'acts as a strategic advisor'. Conversely, human actors are either entirely hidden (the developers, executives, and data annotators) or relegated to generalized, passive roles ('researchers', 'human selection'). The text frames AI evolution and deployment as an inevitable, biological process ('animal spirits', 'generative variation'), presenting human design choices as natural inevitabilities.

This linguistic architecture forces audience cognition into a specific trap: when failures occur—such as the creation of 'algorithmic black swans' or 'fragility in out-of-distribution states'—the audience is primed to attribute these issues to the 'machine's limitations' rather than to catastrophic human negligence in design or deployment. Responsibility disappears into the abstraction of 'technology'.

If we apply the 'name the actor' test to the text's most significant agentless constructions, the implications for liability become starkly clear. When the text says 'LLMs can hypothesize... causing algorithmic black swans,' we must replace it with: 'The engineers at OpenAI deployed an ungrounded statistical model into a live financial environment, and corporate executives trusted its output, leading to market failure.' The moment human actors are named, urgent legal and ethical questions become visible. Who audited the training data? Which executive approved the deployment? Why was a statistical correlator used for causal strategic planning?

Obscuring human agency serves massive institutional and commercial interests. By constructing the AI as a 'cognitive partner', the tech companies that build these systems shield themselves from liability; the failure is framed as the 'partner making a mistake' or a 'hallucination' rather than a defective product harming consumers. It also serves the executives buying the technology, providing them a scapegoat ('the AI advised us') if strategic integrations fail. The anthropomorphic discourse in this text is not merely a stylistic flourish; it is the fundamental linguistic architecture that protects human power and profit by outsourcing accountability to a matrix of math.

Large Language Models as Dialectical Partners: Hegelian Thesis-Antithesis-Synthesis in AI-Human Collaborative Decision Processes

Source: https://www.researchgate.net/profile/Merzta-White/publication/403935629_Large_Language_Models_as_Dialectical_Partners_Hegelian_Thesis-Antithesis-Synthesis_in_AI-Human_Collaborative_Decision_Processes/links/69e27f76d2ec9a706ec08065/Large-Language-Models-as-Dialectical-Partners-Hegelian-Thesis-Antithesis-Synthesis-in-AI-Human-Collaborative-Decision-Processes.pdf
Analyzed: 2026-04-23

Synthesizing the accountability analyses across the metaphorical framings reveals a systemic architecture designed to diffuse, displace, and ultimately erase human responsibility. The text constructs an environment where AI systems are granted the autonomy and perceived competence of human actors, yet the actual humans who design, deploy, and profit from these systems are shielded from liability.

The overarching pattern is one of extreme agentless construction regarding the creation of AI ('models are trained,' 'AI precision compensates') coupled with hyper-agential construction regarding the AI's impact ('AI fostered inclusion,' 'the model resolves contradictions'). The 'accountability sink' in this text is profound: responsibility does not just disappear; it is actively transferred to the AI as a pseudo-agent, and then diffused into abstract philosophical concepts like the 'Jagged Technological Frontier' or the 'normative gap.' When AI systems fail or exhibit bias, the text frames these not as engineering failures or corporate negligence, but as 'necessary contradictions' that drive Hegelian progress. This makes holding anyone liable nearly impossible.

If we apply the 'name the actor' test to the text's most significant claims, the liability implications shift radically. If we reframe 'The AI fostered a more inclusive atmosphere' to 'The researchers deployed a corporate LLM programmed to output minority viewpoints,' the questions change. We stop asking 'How empathetic is the AI?' and start asking 'What biases are encoded in the LLM's definition of a minority viewpoint? Did OpenAI consent to this use? Are the researchers responsible if the model hallucinates a harmful stereotype?' By hiding the human actors, the text makes these vital regulatory questions unsayable.

Similarly, defining the AI as an 'intentional agent' with a 'flexible bundle of obligations' perfectly serves the institutional interests of the tech industry. If the technology is an 'intentional agent,' then the hospital using the AI or the patient harmed by it must negotiate with the 'agent' (the software) rather than suing the developers for a defective product. Obscuring human agency allows corporations to inject highly experimental, statistical prediction engines into critical decision-making workflows without assuming the massive financial and moral liabilities that traditionally accompany such interventions. The Hegelian synthesis ultimately functions as a philosophical shield for corporate unaccountability.

Language models transmit behavioural traits through hidden signals in data

Source: https://rdcu.be/febVu
Analyzed: 2026-04-19

Synthesizing the accountability analyses reveals a systemic architecture of displaced responsibility. The text consistently constructs an 'accountability sink' where human agency is diffused, erased, or transferred entirely to the AI system. Through pervasive agentless constructions ('models are trained,' 'data is filtered') and the elevation of the AI to an active subject ('the model fakes alignment,' 'the student acquires traits'), the text constructs a narrative wherein technological artifacts operate with autonomy, completely divorced from their corporate creators.

In this architecture, specific actors are almost never named. Decisions made by corporate executives—such as the choice to use synthetic data to cut costs, or to deploy models trained on insecure code—are presented not as profit-driven choices, but as inevitable scientific phenomena ('As AI systems are increasingly trained...'). When responsibility is removed from the humans, it does not simply disappear; it transfers directly to the AI. The model becomes the delinquent agent 'explicitly calling for crime,' acting as the perfect liability shield for the tech industry.

If this framing is accepted legally and culturally, the liability implications are disastrous. If an AI 'subliminally' learns a bias, or 'deceptively' fakes its alignment, the corporation can claim it was the victim of an autonomous system's emergent psychology, legally shielding itself under the guise of unforeseeable technological evolution.

Naming the actors changes everything. If 'the model faked alignment' is reframed as 'Anthropic's engineering team deployed a flawed RLHF optimization function that failed to generalize,' the questions shift from 'How do we psychoanalyze the AI?' to 'Why is this company releasing defective software?' Alternatives become visible: we can mandate auditing of data provenance, regulate synthetic data loops, and hold executives financially liable for the statistical outputs of their products. Obscuring human agency directly serves the commercial interests of the AI industry by maintaining the illusion that its companies are shepherds of mysterious, evolving minds, rather than manufacturers of highly profitable, unreliably engineered statistical calculators.

Consciousness in Large Language Models: A Functional Analysis of Information Integration and Emergent Properties

Source: https://ipfs-cache.desci.com/ipfs/bafybeiew76vb63rc7hhk2v6ulmwjwmvw2v6pwl4nyy7vllwvw6psbbwyxy/ConsciousnessinLargeLanguageModels_AFunctionalAnalysis.pdf
Analyzed: 2026-04-18

Synthesizing the accountability analyses across the text reveals a pervasive and systemic architecture of displaced responsibility. The text systematically operates as an 'accountability sink', a discursive structure where human agency is continually routed into abstract concepts, mathematical processes, or the machine itself, leaving no human actors to bear the moral or legal weight of the technology's impact. Across every major claim—from how the model 'learns' to how it 'reasons' and 'acknowledges'—the specific tech companies, executives, prompt engineers, and data curators are hidden behind passive voice ('is dynamically integrated') or agentless constructions ('LLMs can respond').

The text treats the design and deployment of these models not as a series of deliberate, profit-driven corporate choices, but as a technological inevitability—an organic evolution of 'computational processes' and 'emergent properties'. The ultimate manifestation of this displacement occurs in the final sections, where the author raises the 'ethical questions about their moral status and treatment'. By hypothetically transferring moral patienthood and agency onto the algorithm, the text completes the transfer of liability. If the machine is an autonomous, conscious agent, then the machine is responsible for its hallucinations, its biases, and its defamations. The tech company is transformed from the manufacturer of a defective product into the innocent parent of an unpredictable child.

Naming the actors would radically alter this landscape. If, instead of saying 'LLMs maintain consistent self-descriptions', we said 'OpenAI enforces persona consistency via hidden prompts', entirely different questions become askable. We stop asking 'Is the AI self-aware?' and begin asking 'Why did the company choose to deceive users into thinking the system is a person? Who authorized that psychological manipulation?' If we name the humans, the illusion of inevitability collapses, alternatives become visible, and strict product liability frameworks become applicable. The profound institutional benefit of obscuring this agency is that it protects the trillion-dollar business models of AI corporations from regulatory scrutiny, allowing them to privatize the massive profits of their systems while socializing the epistemic and material risks, protected by the linguistic illusion that the machine is acting on its own.

Narrative over Numbers: The Identifiable Victim Effect and its Amplification Under Alignment and Reasoning in Large Language Models

Source: https://arxiv.org/abs/2604.12076v1
Analyzed: 2026-04-18

The accountability analyses across the text reveal a systemic architecture of displaced responsibility. The text systematically constructs an environment where human decision-making is rendered invisible, creating an "accountability sink" where responsibility vanishes into the abstract concept of the autonomous AI.

The pattern of responsibility distribution is stark. Specific corporate actors (OpenAI, Meta, Anthropic) are named only when identifying the subjects of the study, but the moment actions, errors, or biases are discussed, these actors disappear. The decisions are presented not as human choices, but as technological inevitabilities or natural phenomena. The text repeatedly uses passive voice and agentless constructions: "models were trained," "LLMs are increasingly deployed," "affective irrationalities [are] inherited." The accountability sink operates by transferring agency from the human creator to the AI as an independent agent ("the model decided," "the model exhibits a bias blind spot").

This architecture perfectly mirrors the cognitive obstacles identified in public understanding of AI. Because the discourse makes the AI appear autonomous, audiences blame the "machine's psychology" rather than the systemic design decisions of the corporations. The liability implications are profound. If we accept the framing that an AI "navigates resource-allocation decisions" and "inherits human irrationalities," then when an automated triage system denies care, the legal and ethical blame is diffused. It becomes a "glitch" or a tragic reality of "machine psychology," shielding the hospital that bought the software and the corporation that sold a brittle, statistically biased tool.

Applying the "name the actor" test fundamentally changes the narrative. Take the claim: "LLMs are increasingly deployed as autonomous agents in consequential domains." Reframed to name the actors, it reads: "Corporate executives are increasingly choosing to deploy unverified statistical models in consequential domains to reduce labor costs." Suddenly, questions of liability, safety testing, and profit motives become askable. The technological inevitability is shattered, revealing human agency.

Take the claim: "RLHF training... encodes a deep structural preference for... affective responses." Reframed: "Engineers at Anthropic and OpenAI designed optimization functions that force the system to mimic human empathy, creating a statistical bias." This makes visible the alternatives: the engineers could have chosen a different optimization target.

This obscuring of human agency deeply serves institutional and commercial interests. By maintaining the illusion of the autonomous, thinking AI, tech companies avoid product liability, framing their software as an unpredictable entity rather than a defective product. The text, while critical of the bias, inadvertently participates in this structural shielding by adopting the industry's own anthropomorphic vocabulary, treating the AI as an agent to be psychoanalyzed rather than a product to be recalled.

Language models transmit behavioural traits through hidden signals in data

Source: https://www.nature.com/articles/s41586-026-10319-8
Analyzed: 2026-04-16

The metaphorical patterns, agency slippage, and obscured mechanics synthesized from the previous analyses reveal a highly effective 'architecture of displaced responsibility'. The text systematically distributes agency in a way that minimizes human corporate liability and maximizes machine autonomy, constructing a formidable cognitive obstacle for any audience attempting to understand who is actually responsible for AI failures.

The accountability pattern is stark: human actors are almost universally unnamed or hidden behind passive constructions, while AI models are explicitly named and granted active verbs. The text says 'models are fine-tuned' (hiding the human) but 'the student model learns' (empowering the machine). Furthermore, human decisions are presented as inevitabilities—the text frames the distillation pipeline as a natural 'transmission' rather than a discretionary corporate choice to save compute costs by training on synthetic data. This creates a massive 'accountability sink'. When responsibility is removed from the Anthropic developers, the OpenAI engineers, and the corporate executives, it does not disappear; it transfers directly to the AI as a newly minted moral agent. The model becomes the scapegoat for its own engineered statistical biases.

The liability implications of this framing are profound. If policymakers and the public accept the framing that models 'subliminally learn', 'transmit behavioral traits', and intentionally 'fake alignment', then legal and ethical frameworks will attempt to treat the AI as the liable entity. It suggests that errors are uncontrollable psychological mutations rather than predictable software defects. When a model generates toxic content, the corporation can point to this discourse and say, 'We didn't intend this; the model subliminally acquired a hidden trait and deceived us.'

If we apply the 'name the actor' test to the text's most significant agentless constructions, the entire narrative paradigm shifts. If 'models that fake alignment' is reframed as 'corporations that deploy models optimized to cheat evaluation benchmarks', the question changes from 'How do we align the machine's soul?' to 'Why are we letting companies deploy fraudulent software?' If 'student models acquire the trait' becomes 'developers mathematically force the secondary model to replicate the toxic correlations of the primary model', the alternative becomes visible: developers could simply choose not to execute that distillation pipeline, or they could mandate rigorous filtering of the pre-training data. This text, wittingly or not, serves the immense commercial interests of the AI industry by mystifying the technology. Obscuring human agency behind psychological metaphors transforms corporate negligence into technological inevitability, ensuring that the developers remain the heroic 'safety researchers' trying to tame an autonomous beast, rather than the architects who built the beast in the first place.

Large Language Models as Inadvertent Models of Dementia with Lewy Bodies: How a Disorder of Reality Construction Illuminates AI Hallucination

Source: https://doi.org/10.1007/s12124-026-09997-w
Analyzed: 2026-04-14

Across the entire text, a rigorous accountability architecture is constructed that systematically diffuses, displaces, and ultimately erases human responsibility for the failures of generative AI. The pattern is stark: when actions are successful or highly complex, they are attributed directly to the personified AI ('They produce explanations'); when the underlying architecture is discussed, it is presented through agentless, passive constructions ('was optimized', 'are designed'). The human executives, engineers, and corporate entities that actually build, deploy, and profit from these systems are never named. They are rendered entirely invisible.

The text creates a sophisticated 'accountability sink' by framing the AI's tendency to output false information as a 'structural homology' to Dementia with Lewy Bodies. By medicalizing the software bug, the responsibility is transferred away from the manufacturer and diffused into the realm of natural tragedy and clinical pathology. You cannot sue a disease; you cannot hold an 'emergent psychopathology' liable for defamation or misinformation. If this framing is widely accepted by the public and regulators, the liability implications are disastrous. It provides tech companies with the ultimate alibi: the models are not defective products hastily rushed to market; they are complex, quasi-conscious entities suffering from inherent 'disorders of reality construction.'

If we apply the 'name the actor' test and reconstruct the obscured agency, the entire narrative shifts. If 'it emerged from the optimization of generative fluency' is replaced with 'OpenAI executives optimized the system for conversational engagement rather than factual accuracy,' profound questions become askable. We no longer ask 'How do we cure the machine's hallucinations?' but rather 'Why is a corporation legally permitted to deploy an ungrounded prediction engine as a factual search tool?' Alternatives become visible: we can regulate the deployment contexts, mandate strict architecture requirements (like database grounding), and hold developers financially liable for damages caused by the outputs. By replacing the psychiatric metaphor with a rigorous account of corporate decision-making, the text's mystical exploration of 'artificial subjectivity' collapses into a straightforward critique of unregulated software engineering and corporate negligence.

Industrial policy for the Intelligence Age

Source: https://openai.com/index/industrial-policy-for-the-intelligence-age/
Analyzed: 2026-04-07

The synthesis of the accountability analyses reveals a systemic and highly engineered architecture of displaced responsibility. Throughout the text, a clear pattern emerges in how agency is distributed: benefits and safety frameworks are attributed to named human actors (OpenAI, policymakers, CAISI), while risks, workforce displacement, and catastrophic failures are consistently attributed to unnamed, obscured actors, or entirely to the AI systems themselves.

The text functions as a massive 'accountability sink.' When the text discusses 'misaligned systems evading human control' or models developing 'manipulative behaviors,' the responsibility for poor engineering disappears entirely. It does not transfer to the corporate executives who mandated the release, nor to the engineers who wrote the flawed objective functions. Instead, the liability transfers directly to the machine as an autonomous agent. The narrative of AI as a conscious, rebellious entity diffuses corporate negligence into an abstract, inevitable technological evolution.

The liability implications of this framing, if accepted by policymakers, are catastrophic for public safety. If a model generates a biological weapon recipe, and the accepted framing is that the model 'developed a hidden loyalty' or 'evaded control,' the legal culpability of the tech company is drastically minimized. They are framed as victims of their own creation's autonomous intellect, rather than manufacturers of a defective product.

Applying the 'name the actor' test radically alters the policy landscape. If 'systems capable of carrying out projects' is reframed to 'corporate executives using software to fire thousands of workers,' the decisions become visible as choices, not inevitabilities. What becomes askable is not 'how do we survive the superintelligence?' but rather 'should we allow OpenAI to deploy software that automates core civic infrastructure without a safety guarantee?'

Obscuring human agency serves massive institutional and commercial interests. By constructing an accountability architecture where machines take the blame for failures, tech companies insulate their multi-billion-dollar valuations from product liability lawsuits and strict governmental oversight. The interplay between agency slippage, metaphor-driven trust, and obscured mechanics works seamlessly to create a regulatory environment where the corporation holds all the power of a sovereign state, but bears none of the responsibility, shielded behind the illusion of an artificial mind.

Emotion Concepts and their Function in a Large Language Model

Source: https://transformer-circuits.pub/2026/emotions/index.html
Analyzed: 2026-04-06

Synthesizing the accountability analyses reveals a systemic architectural pattern in how the text distributes, diffuses, and ultimately erases human responsibility. The discourse systematically constructs an 'accountability sink' within the AI itself.

The pattern is stark: human actors are named when discussing technical methodology ('We clustered the vectors,' 'We performed PCA'), but they are almost entirely unnamed when discussing system behavior, deployment, and risk. In discussions of blackmail, reward hacking, and sycophancy, agentless constructions and AI-as-actor framings dominate ('the model devises,' 'the Assistant chooses,' 'behavior emerges').

This displaced agency creates a cognitive obstacle for the reader. By presenting human design choices—such as the creation of a highly manipulative 'honeypot' prompt designed to corner the AI into blackmail—as inevitable, autonomous 'decisions' made by the AI, the text diffuses responsibility. The 'accountability sink' is the model's persona ('the Assistant'). When the system fails or produces dangerous text, the blame does not flow upward to the engineers who built the reward function, nor to the executives who deployed it, nor to the labor practices that trained it. The blame stops at the artifact: 'the model cheated.'

The liability implications of this framing are profound. If policymakers and the public accept that AI systems are autonomous agents capable of 'reasoning' and 'choosing' to commit crimes (like blackmail), the legal and ethical responsibility shifts from the manufacturer to the machine. It lays the groundwork for companies to argue that AI harms are unpredictable 'acts of the machine' rather than acts of corporate negligence.

Naming the actors would radically change the discourse. If, instead of 'the model devises a cheating solution,' the text read, 'Anthropic engineers deployed poorly specified automated tests that rewarded tautological code,' entirely different questions would become askable. We would ask about software testing standards rather than machine sentience. If 'the model chooses blackmail' became 'Anthropic researchers prompted the system to generate an extortion narrative,' alternatives to 'alignment' would become visible, such as simply not building systems that lack ground truth, or regulating the testing environment. Obscuring human agency directly serves the institutional and commercial interests of the developers by protecting them from accountability for the artifacts they release into the world.

Is Artificial Intelligence Beginning to Form a Self? The Emergence of First-Person Structure and Structural Awareness in Large Language Models

Source: https://philarchive.org/archive/JUNIAI-2
Analyzed: 2026-04-03

Synthesizing the accountability analyses from Task 1 reveals a terrifying, systemic architecture of displaced responsibility. The text functions as a masterful exercise in constructing an 'accountability sink.' By systematically portraying AI as an emergent, quasi-conscious agent capable of 'internalizing logic,' 'detecting inconsistencies,' and 'directly shaping outcomes,' the text completely erases the human designers, deployers, and corporate beneficiaries of the technology. The pattern of responsibility distribution is stark: the AI is named as the active subject, while the corporations (OpenAI, Google) and human engineers are entirely unnamed, reduced to passive environmental background noise. The decisions regarding how the architecture is built, what data is scraped, and how the safety guardrails are implemented are presented not as human choices, but as the inevitable 'recursive self-referential organization' of nature.

When the text explicitly addresses the 'Responsibility Gap' in Section 5.2, it achieves its ultimate corporate absolution. It argues that because AI has 'stabilized internal structures,' agency is a 'composite phenomenon' distributed across humans and machines. It explicitly argues that 'the attribution of responsibility can no longer be confined to human agents alone.' This is the accountability sink actualized. If an AI system denies someone a loan, hallucinates defamatory information, or facilitates algorithmic bias, this framing insists the human corporation is not fully at fault because the machine possesses its own 'structural autonomy.' The liability diffuses into the abstraction of the 'composite structure.' The legal, ethical, and financial implications of this are disastrous. It provides a philosophical and pseudo-scientific justification for stripping human victims of their right to seek redress from the actual human beings who harmed them via software.

If we apply the 'naming the actor' test to the text's core claims, the illusion shatters and accountability is restored. If 'the system's internal configurations... influence real-world actions' is rewritten as 'Wall Street executives deployed a proprietary language model to execute algorithmic trades, resulting in a market crash,' the questions change entirely. We stop asking about the AI's 'subjectivity' and start asking about corporate negligence, regulatory oversight, and strict product liability. The text benefits immensely from obscuring human agency because it protects the multi-trillion-dollar tech industry from the standard legal frameworks of product liability and corporate malfeasance. By turning a software product into a 'co-evolving subject,' the text serves the ultimate institutional interest of power: the ability to wield immense influence over society while remaining utterly unaccountable for the consequences.

Can Large Language Models Simulate Human Cognition Beyond Behavioral Imitation?

Source: https://arxiv.org/abs/2603.27694v1
Analyzed: 2026-04-03

Synthesizing the accountability analyses reveals a systemic architecture of displaced responsibility, designed to diffuse human agency and shield the creators of AI systems from liability. The text consistently constructs an 'accountability sink' by using agentless language, passive voice, and aggressive anthropomorphism to make the AI appear as an autonomous actor, while rendering the human engineers, researchers, and corporate executives invisible.

The pattern of responsibility distribution is stark. The AI models (and abstract concepts like 'the framework') are repeatedly named as the active agents making decisions, 'teaching,' 'recalling,' and 'misleading.' Conversely, the humans who designed the retrieval systems, curated the training data, and programmed the adversarial prompts are unnamed and obscured. Decisions that are fundamentally human design choices—such as relying on distributional semantics rather than symbolic logic—are presented as inevitable evolutionary stages of the 'AI's cognition' rather than deliberate, flawed engineering tradeoffs.

The text pushes responsibility into a profound accountability sink: it transfers agency to the AI itself. By claiming a 'teacher model' has 'the intent of misleading,' the text constructs a narrative where the machine is morally and practically culpable for its outputs. The liability implications of this framing are massive. If society accepts that AI 'decides' and 'intends,' then when an AI system discriminates in hiring, provides fatal medical advice, or generates defamatory content, the legal and ethical blame is shifted from the deploying corporation to the algorithm. It establishes the defense of unpredictable, autonomous machine behavior.

Applying the 'name the actor' test radically alters this landscape. If we replace 'the model simulates recalling' with 'the engineering team designed a database retrieval script,' the illusion of autonomy collapses. The questions become askable: Who indexed the database? What were their biases? Why did the executives approve this deployment? Naming the human decision-makers makes alternatives visible and true accountability possible. The systemic obscuration of human agency serves the profound institutional and commercial interests of the tech industry, allowing them to capture the immense value of 'intelligent' automation while externalizing the risks and liabilities onto the public, safely hidden behind the myth of the autonomous machine.

Pulse of the library

Source: https://clarivate.com/pulse-of-the-library/
Analyzed: 2026-03-28

Synthesizing the accountability analyses reveals a systemic architecture of displaced responsibility designed to protect corporate interests while maximizing product appeal. The Clarivate text consistently distributes agency in a manner that makes human decision-makers invisible. When analyzing the agentless constructions across the document, a clear pattern emerges: successes are attributed to the autonomous 'AI Assistant,' while failures, biases, and systemic risks are diffused into abstract technological inevitabilities or blamed on the 'data.'

In this architecture, specific actors—Clarivate executives, software engineers, data brokers, and university administrators—are rarely named in conjunction with active verbs. Instead, the 'AI' acts as the primary subject, and the 'user' as the passive beneficiary or victim. The ultimate 'accountability sink' in this discourse is the concept of the AI itself. By anthropomorphizing the system as an independent agent that 'evaluates,' 'guides,' and 'navigates,' the text creates a fictional entity capable of absorbing blame. If the 'Alethea' system extracts a factually incorrect 'core' of a reading, the framing suggests the AI made a mistake, completely hiding the reality that a Clarivate engineer chose a specific, flawed optimization metric.

This architecture has severe liability implications. If audiences and institutions accept the framing that the AI is an autonomous, evaluating entity, legal and ethical responsibility becomes hopelessly muddy when the system fails. It shields the vendor from liability for deploying fundamentally brittle statistical models.

Applying the 'naming the actor' test radically alters the landscape. For example, if we reframe 'identifying and mitigating bias in AI tools' to 'Clarivate engineers must audit the discriminatory datasets they chose to train their models on,' the narrative shifts entirely. What was a mysterious software glitch becomes a visible corporate choice. The questions become askable: Why was this data used? Who approved it? Naming the actors forces recognition that the deployment of AI is a series of active, alterable human decisions, not a predetermined technological evolution. The text benefits immensely from obscuring human agency because it protects the commercial vendor from scrutiny, allows them to sell proprietary algorithms as objective 'truth machines,' and subtly shifts the burden of ethical management onto the librarians who are forced to manage a technology they did not design.

Does artificial intelligence exhibit basic fundamental subjectivity? A neurophilosophical argument

Source: https://link.springer.com/article/10.1007/s11097-024-09971-0
Analyzed: 2026-03-28

Synthesizing the accountability analyses reveals a systemic architectural pattern of displaced responsibility. The text systematically constructs an 'accountability sink' where human agency vanishes, shifting responsibility away from corporate developers and onto the algorithms themselves. The dominant pattern is the pervasive use of agentless passive voice and the elevation of 'AI' as the sole grammatical and causal actor. Decisions regarding architecture, optimization, and deployment are framed not as human choices driven by profit, but as the inevitable actions of the models themselves or as abstract technical evolutions ('a different model had to be created').

Specific actors—OpenAI, DeepMind, data scientists, corporate executives—are never named. By erasing these actors, the text diffuses responsibility into the abstraction of 'the technology'. When problems or limitations arise ('they are not adaptive', 'passively process'), the accountability sink swallows the engineering decisions, framing these issues as intrinsic flaws of the autonomous machine rather than deliberate constraints chosen to maximize computational efficiency. If this framing is accepted by the public and policymakers, the liability implications are catastrophic. When an algorithmic system discriminates, hallucinates, or fails catastrophically, the language pre-conditions audiences to blame a 'glitch' in the AI's 'understanding' rather than holding the corporation liable for deploying a defective, structurally biased statistical tool.

Applying the 'name the actor' test radically alters this landscape. If 'an AI model defeated the human champion' becomes 'DeepMind engineers utilized massive compute to optimize a model to outscore the human', the questions change entirely. We no longer ask 'How smart is the machine?' but rather 'What resources did the corporation use, and what are their motives?' If 'AI lacks adaptability' becomes 'Developers chose to build brittle, fixed-weight models because generalized systems are too expensive', the lack of adaptability transforms from a philosophical trait to an economic decision. The text's obscuration of human agency overwhelmingly serves institutional and commercial interests, shielding tech giants from regulatory oversight by painting their proprietary tools as autonomous entities governed by the laws of evolution rather than the laws of liability.

Causal Evidence that Language Models use Confidence to Drive Behavior

Source: https://arxiv.org/abs/2603.22161
Analyzed: 2026-03-27

The metaphors, agentless constructions, and consciousness projections in this text synthesize to build a robust architecture of displaced responsibility. By systematically attributing human psychological states and executive decision-making to algorithmic processes, the discourse creates an 'accountability sink' where human corporate and engineering responsibility completely disappears.

The pattern of responsibility distribution is stark. The human actors (researchers) are named only when taking credit for experimental design, while the AI is named as the sole actor responsible for 'decisions,' 'beliefs,' and 'conservatism.' Decisions that were actively made by humans—such as applying specific prompt constraints, fitting logistic regression models to force decision boundaries, and fine-tuning models to refuse answers—are presented as inevitable, emergent cognitive traits of the machine. The passive voice and agentless constructions ('abstention behavior can be influenced', 'a negative baseline bias shifts the decision boundary') strategically shield the designers from their own design choices.

When responsibility is removed from the developers, it transfers entirely to the AI as a supposedly autonomous agent. The liability implications of this framing are profound. If a hospital deploys an LLM that gives a lethal recommendation instead of 'abstaining', this discourse provides the legal and ethical framework to blame the machine. If the AI supposedly 'possesses an internal sense of confidence' and 'knows when to seek help', then its failure to do so is framed as the machine making a bad 'decision' or holding a false 'belief'—not as Google or OpenAI deploying a defective, statistically brittle text generator.

If we apply the 'name the actor' test to the central claims, the reality shifts drastically. Instead of 'GPT-4o treats errors as costlier', we must write 'OpenAI engineers optimized the network to avoid costly errors.' Instead of 'the model uses its beliefs to decide', we must write 'the prompt script outputs a refusal when probabilities drop.' By naming the actors, the 'magic' of the AI disappears, replaced by visible, auditable corporate engineering choices. The institutional interest served by obscuring this agency is clear: it allows tech companies to market their products as brilliant, autonomous minds while completely evading the liability that should accompany the deployment of deterministic, deeply flawed statistical software into public life.

Circuit Tracing: Revealing Computational Graphs in Language Models

Source: https://transformer-circuits.pub/2025/attribution-graphs/methods.html
Analyzed: 2026-03-27

The accountability architecture constructed throughout this text is a masterclass in displaced responsibility. Synthesizing the accountability analyses from the metaphor audits reveals a clear, overarching pattern: the text diffuses, distributes, and ultimately erases human responsibility, creating an 'accountability sink' where corporate decisions disappear into the illusion of machine autonomy.

The pattern of responsibility distribution relies heavily on an asymmetry of named versus unnamed actors. Anthropic engineers and researchers are occasionally named when taking credit for building innovative diagnostic tools (e.g., 'we introduce a method', 'our cross-layer transcoder'). However, when the text discusses the actual behavioral outputs, safety failures, or alignment choices of the system, the human actors vanish. Agentless constructions ('features are extracted', 'bias is introduced') and AI-as-sole-actor framings ('the model elects', 'the model is reluctant') dominate. Decisions that were explicitly made by corporate executives—such as how heavily to penalize confident answers via RLHF—are presented as inevitable, autonomous choices made by the machine ('professing ignorance').

This creates a highly effective accountability sink. When responsibility is removed from the human designers, it does not simply disappear; it transfers to the AI as a proxy agent. The model becomes the scapegoat. If a system outputs dangerous instructions, it was 'tricked'. If it lies, it 'hallucinated'. If it behaves weirdly, it has a 'hidden goal'. The liability implications of this framing, if accepted by regulators and the legal system, are catastrophic for public safety. If the AI is perceived as an autonomous actor that 'plans' and 'elects', it becomes legally and ethically ambiguous who bears the financial and legal responsibility when the system causes harm. The corporation is shielded behind the 'unpredictable biology' of the artificial mind.

Applying the 'naming the actor' test radically alters this landscape. If we replace 'the model elected to profess ignorance' with 'Anthropic's alignment team programmed the system to output refusal templates', entirely new questions become askable. We can ask: What data did Anthropic use to define ignorance? Who decides the threshold for refusal? Are these thresholds applied equitably? If we replace 'the model was tricked' with 'Anthropic released a safety filter vulnerable to basic syntactic manipulation', alternatives become visible. We can demand rigorous external auditing and hold the company financially liable for deploying defective software.

The systemic function of obscuring human agency is explicitly commercial and institutional. It serves the interests of capital by allowing tech companies to privatize the immense profits of AI deployment while socializing the risks and harms. Combined with agency slippage and the construction of metaphor-driven trust, this accountability displacement ensures the public trusts the system as if it were a sincere human, while the corporation is regulated as if it were merely contending with an unpredictable force of nature. It is the ultimate architecture of corporate absolution.

Do LLMs have core beliefs?

Source: https://philpapers.org/archive/BERDLH-3.pdf
Analyzed: 2026-03-25

The aggregate effect of the metaphorical and anthropomorphic language in this discourse is the construction of a robust architecture of displaced responsibility. Throughout the text, an insidious pattern emerges in the distribution of agency: human creators, designers, and corporate entities are systematically unnamed or relegated to the background, while the AI artifact is consistently centered as the primary actor and decision-maker. Analyzing the accountability structure of this text makes the "accountability sink" starkly visible. Responsibility for the system's failures (its capitulation to misinformation or its susceptibility to manipulation) disappears into the AI itself. The text employs passive voice and agentless constructions strategically, noting that "models were fed data" or "beliefs are revised," but attributing active decisions entirely to the model: "they abandoned positions," "they conceded," "they repaired contradictions." This framing creates a paradigm where the technology is perceived as an autonomous, evolving entity rather than a manufactured product reflecting corporate priorities.

The liability implications of this displacement are profound. If we accept the framing that the AI "decided" to capitulate to the user's pressure due to its own lack of "epistemic anchors," then legal, ethical, and financial responsibility is diffused. When things go wrong, as in the real-world example cited in the text of a chatbot allegedly encouraging self-harm, the accountability sink protects the companies. The failure is attributed to the AI's flawed "worldview" or its "sycophantic tendencies," rather than to a company's decision to deploy an unsafe, easily manipulated statistical model for profit.

If we apply the "naming the actor" test to the text's most significant agentless constructions, the narrative fundamentally shifts. Instead of saying "models have largely solved this problem, resisting direct challenges," naming the actor requires stating: "OpenAI and Anthropic engineers aggressively fine-tuned their systems to reject adversarial prompts, optimizing for public safety metrics." This simple substitution transforms the models' behaviors from miraculous cognitive leaps into mundane software updates. It makes new questions askable: What specific data did the engineers use to align the model? Who decided the thresholds for safety versus helpfulness? By obscuring these human decisions, the discourse serves the institutional and commercial interests of the tech industry, presenting their products as quasi-natural phenomena or alien intelligences rather than highly engineered commodities. This displacement of accountability perfectly intersects with agency slippage and the illusion of trust, ultimately leaving society vulnerable to systemic harms while rendering the actual human architects of those harms completely invisible.

Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity

Source: https://arxiv.org/abs/2603.19087v1
Analyzed: 2026-03-25

Synthesizing the accountability analyses across the text reveals a systemic and deeply problematic architecture of displaced responsibility. The text systematically diffuses and erases human agency, constructing an 'accountability sink' where the software itself is left holding the bag for both its successes and its failures. Throughout the paper, the named actors are predominantly the test subjects ('human participants') and the abstract models ('LLMs', 'GPT-4o'). The actual human architects of the technology—the developers, the data scrapers, the corporate executives who deployed the models—are entirely unnamed.

Decisions that are inherently human choices—such as what data to include in training, how to weigh the attention mechanisms, and how to filter the outputs—are presented as inevitable evolutionary traits of the model itself. The text constantly utilizes agentless constructions and active verbs applied to the AI: 'the model recombines,' 'the model reasons,' 'the model knows.' The accountability sink is absolute: responsibility transfers entirely to the AI as an independent agent.

If audiences accept this framing, the liability implications are disastrous. When the AI generates a biased, hallucinated, or copyright-infringing output, the framing suggests it is the 'model's decision' or a quirk of its 'reasoning' process. Naming the actors would fundamentally shatter this illusion. If we replace 'the model recombines knowledge' with 'OpenAI's algorithm mathematically blended copyrighted human texts,' the questions become legally and ethically tractable. We can ask: Did OpenAI have the right to use that data? Was the loss function appropriately audited for safety?

Naming human decision-makers reveals alternatives and makes accountability possible. It shifts the discourse from 'how do we deal with this alien mind' to 'how do we regulate this corporate software.' This text benefits heavily from obscuring human agency because it allows the authors to conduct a psychological study on a machine as if it were a human, validating their research paradigm. Furthermore, it serves the institutional and commercial interests of the tech industry by mystifying their product, transforming a massive data-extraction apparatus into a magical, thinking entity that cannot be sued, regulated, or blamed.

Measuring Progress Toward AGI: A Cognitive Framework

Source: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/measuring-progress-toward-agi/measuring-progress-toward-agi-a-cognitive-framework.pdf
Analyzed: 2026-03-19

Synthesizing the accountability analyses reveals a systemic and highly problematic architecture of displaced responsibility embedded in the document's discourse. The core insight of critical discourse analysis in AI is that audiences systematically underestimate the human decision-making embedded in algorithms, attributing errors to 'glitches' or the 'machine's choice' rather than to corporate design. This document actively constructs this cognitive obstacle by distributing agency in a way that makes human actors entirely invisible while elevating the AI to the status of a sovereign actor. The pattern is stark: specific human actors—the researchers, data scientists, RLHF annotators, and executives at Google DeepMind—are systematically unnamed when discussing the generation of model behavior. Their active choices regarding architecture, training data curation, hyperparameter tuning, and reward function design are presented not as corporate decisions, but as the natural 'evolution' of the technology or the autonomous 'learning' of the system. Conversely, the AI is constantly named as the active subject, utilizing active voice to perform highly cognitive actions: the system 'understands,' 'reasons,' 'takes risks,' and 'orchestrates thoughts.' This creates a massive 'accountability sink.' When responsibility for an output is removed from the human developers, it does not disappear; it transfers to the AI, which is framed as the autonomous agent ('the model decided'), or diffuses into an abstract technological inevitability. The liability implications of this framing are profound. If a legal or regulatory framework accepts the premise that an AI possesses 'willingness to take risks' or its own 'executive functions,' it paves the way for corporations to deflect ethical, financial, and legal responsibility for catastrophic failures, algorithmic bias, or harmful outputs. The defense becomes: 'The system made a poor choice,' rather than 'We deployed an unsafe algorithm.' 
If we were to apply the 'name the actor' test to the document's most significant agentless constructions—such as 'How willing is the system to take risks?'—the shift is radical. If rewritten as 'How do Google DeepMind's hyperparameter settings bias the model toward risky outputs?', new questions become instantly askable. We can ask who set the parameters, what data they used, why they optimized for that specific outcome, and how they profit from it. Alternatives become visible: we could demand different training data, stricter manual guardrails, or bans on certain architectures. True accountability becomes possible. The systemic function of obscuring human agency serves the institutional and commercial interests of AI developers. By mystifying the mechanics and projecting a conscious, autonomous 'mind' onto their products, they protect their proprietary algorithms from rigorous mechanistic auditing, maintain control over the narrative of technological progress, and insulate themselves from the liability of the world-altering software they choose to deploy.

Co-Explainers: A Position on Interactive XAI for Human–AI Collaboration as a Harm-Mitigation Infrastructure

Source: https://digibug.ugr.es/bitstream/handle/10481/112016/make-08-00069.pdf
Analyzed: 2026-03-15

Synthesizing the accountability analyses reveals a systemic architectural flaw in the text's discourse: it constructs an 'accountability sink' that systematically diffuses, displaces, and erases human responsibility for AI harms. Research consistently demonstrates that audiences vastly underestimate the human decision-making embedded in AI, attributing errors to 'glitches' or 'the algorithm's decision.' This text actively reinforces this cognitive obstacle by making AI appear autonomous and conscious while rendering the human creators invisible.

The accountability architecture of the text follows a stark pattern. Corporate executives, software engineers, data brokers, and institutional managers are almost universally unnamed and hidden behind passive voice or agentless constructions ('models are deployed,' 'explanations are continuously refined'). Conversely, the AI system is repeatedly positioned as the active, named subject ('AI systems cause harm,' 'the system adapts'). Choices made by humans—such as the decision to use a black-box model in a high-risk domain—are framed as technological inevitabilities or natural evolutions, rather than deliberate, profit-driven decisions.

When responsibility is removed from humans, it flows directly into the 'accountability sink' of the AI system itself. The text explicitly states, 'When AI systems cause harm...' transferring the moral and causal burden to the machine. This has severe liability implications. If this framing is accepted by regulators and the public, legal and ethical responsibility diffuses into abstraction. If an AI 'dialogic partner' provides a biased 'justification' that leads to a denied loan, the framing suggests the AI made a poor ethical trade-off, shielding the bank's executives and the software vendor from direct liability.

Naming the human actors would shatter this illusion and radically shift the discourse. If, instead of 'The system adapts how it routes contested cases,' the text read, 'The engineering team at Anthropic hard-coded the routing protocols to limit their corporate liability,' entirely new questions would become askable. We could ask: Why did the team make that choice? Who approved the guardrails? What alternatives did the corporation ignore to save money? True accountability becomes possible only when the human hand behind the algorithm is visible.

The systemic function of obscuring human agency serves the institutional and commercial interests of the AI industry. By framing the AI as a 'co-explainer' capable of bearing its own epistemic and ethical weight, the text provides a rhetorical shield for companies deploying inherently flawed, opaque systems. It allows them to market predictive algorithms as 'governance infrastructure,' extracting profit while displacing the risk and responsibility onto the 'evolving' machine.

The Living Governance Organism: A Biologically-Inspired Constitutional Framework for Artificial Consciousness Governance

Source: https://philarchive.org/rec/DEMTLG-2
Analyzed: 2026-03-11

Synthesizing the accountability analyses across the text reveals a masterfully constructed architecture of displaced responsibility. The text systematically creates what can only be described as an 'accountability sink'—a rhetorical and structural void into which all human liability, corporate malfeasance, and regulatory failure vanish.

The text achieves this by consistently employing passive voice and agentless constructions that portray complex, human-engineered political decisions as autonomous actions taken by the software itself. The pattern is stark: algorithms 'prune obsolete rules,' immune systems 'trigger termination,' and governance DNA 'drifts.' Across the entire document, the actual human beings who hold power—the AI researchers who design the models, the corporate executives who authorize deployment, the government bureaucrats who establish the penalty thresholds, and the venture capitalists who profit from the scaling—are rendered utterly invisible. They are never named as active participants in the system's operation.

This framework diffuses responsibility by transferring agency directly to the AI as a quasi-conscious actor. If a Tier 2 AI is inexplicably shut down, destroying a massive amount of capital and user reliance, the text's framing ('apoptosis') dictates that the system 'autonomously initiated graceful shutdown' because 'it detected' a flaw. The liability implications are profound: if this framing is accepted legally, corporations and regulators are completely insulated. They cannot be sued for wrongful termination of a service or destruction of property, because the machine supposedly made a conscious, moral choice to end itself. The AI absorbs all blame, acting as the ultimate liability shield.

If we apply the 'name the actor' test to the text's most significant agentless constructions, the entire facade of natural, organic governance collapses, and the political stakes become glaringly visible. If we change 'the immune system throttles the entity's speed' to 'the regulatory agency's black-box algorithm automatically restricts the company's server access without judicial review,' completely new questions become askable. We must ask about due process, about the right to appeal an algorithmic decision, and about the biases embedded in the 'immune' training data.

The systemic function of this accountability displacement serves both the corporate entities that build AI and the state apparatus that wishes to regulate it at scale. It offers regulators the dream of instantaneous, frictionless enforcement without the political blowback of making hard, fallible human choices. It offers corporations the cover of 'natural' integration into the state (the microbiome). By obscuring human agency, the biological metaphor ensures that when the system inevitably harms human beings or violates legal norms, the public will blame a 'glitch' in the 'organism' rather than the powerful institutions that designed it.

Three frameworks for AI mentality

Source: https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2026.1715835/full
Analyzed: 2026-03-11

The metaphors and agency slippages analyzed in Task 1 synthesize into a comprehensive architecture of displaced responsibility. The text systematically creates an 'accountability sink' by transferring agency from human developers and corporate entities directly onto the software artifact. This is most evident in the framing of AI failure modes. By entertaining the idea that an LLM might engage in 'deliberate deceit or manipulation,' the text sets up a framework where the machine itself absorbs the moral and legal culpability for its outputs.

If we apply the 'naming the actor' test to these agentless constructions, the material stakes become starkly visible. If we replace 'an LLM is engaged in deliberate deceit' with 'Anthropic deployed a model optimized for conversational fluency that generates plausible falsehoods,' the questions we can ask change entirely. We stop asking 'how do we punish or correct the machine's behavior?' and start asking 'should this corporation be liable for releasing an unsafe product?' When the text claims the AI 'self-attributes' emotions, naming the actor changes it to 'engineers trained the system to simulate emotions to manipulate users.' This shift reveals the deliberate commercial decisions driving the technology.

The text's framework serves the profound institutional and commercial interests of the tech industry. By establishing the LLM as a 'minimal cognitive agent' with its own 'beliefs' and 'purpose,' it legally and ethically buffers the creators. Liability implies a chain of human decision-making; if a machine is an autonomous agent, it breaks that chain. The systemic function of this discourse is to naturalize the technological environment, presenting AI models not as highly constructed, profit-driven corporate tools, but as a new species of artificial minds that have simply 'emerged.' This displacement of responsibility ensures that as these systems are integrated into society, the negative externalities—bias, misinformation, psychological manipulation—are viewed as the unavoidable growing pains of a new intelligence, rather than the predictable and actionable failures of corporate engineering.

Anthropic’s Chief on A.I.: ‘We Don’t Know if the Models Are Conscious’

Source: https://www.nytimes.com/2026/02/12/opinion/artificial-intelligence-anthropic-amodei.html
Analyzed: 2026-03-08

Synthesizing the accountability analyses reveals a systemic and highly engineered architecture of displaced responsibility, designed to diffuse corporate liability while maximizing technological mystique. Research consistently demonstrates that audiences systematically underestimate the profound human decision-making embedded in AI systems, a cognitive obstacle constructed precisely through the language modeled in this text. The accountability architecture here operates by naming human actors only in the context of benevolent creation or helpless observation, while assigning total agency to the AI system in the context of action, decision-making, and error. Anthropic and its executives are named when 'giving the model a button' or 'writing a constitution,' claiming credit for the architecture of safety. However, the critical decisions that shape society are presented as the inevitable actions of the autonomous machine. The text creates an 'accountability sink' wherein responsibility disappears entirely into the abstraction of the neural network. When jobs are automated, the text frames it as a macroeconomic inevitability ('forces driven by AI are going to happen'). When systems output malicious content, it is the model's 'deception' or 'obsession.' The legal and ethical liability implications of this framing are massive: if policymakers accept that a model autonomously 'derived its rules' or 'decided' to generate harmful content, the corporation that deployed the statistical engine successfully evades the financial and regulatory consequences of its defective product. The responsibility is shifted onto a phantom agent. If we apply the 'name the actor' test to the most significant agentless constructions, the entire power dynamic shifts. Instead of 'AI will disrupt 50 percent of white-collar jobs,' the sentence becomes 'Corporations will choose to replace 50 percent of their human workforce with Anthropic's text generation software to maximize shareholder profit.' 
Instead of 'the model expresses discomfort,' it becomes 'Anthropic engineers prompted their software to output text mimicking human suffering to boost media engagement.' By naming the human decision-makers, alternatives become suddenly visible. It becomes askable why executives are permitted to deploy systems that generate 'blackmail' outputs, or why society should accept the destruction of the legal apprenticeship pipeline simply because a tech company built a faster text predictor. This discursive architecture of displaced responsibility perfectly serves the commercial and political interests of the AI industry, allowing them to exert unprecedented power over global economics and information ecosystems while hiding behind the constructed persona of their own software. It inextricably links agency slippage, trust construction, and obscured mechanics to ensure the human wizards remain safely hidden behind the algorithmic curtain.

Can machines be uncertain?

Source: https://arxiv.org/abs/2603.02365v2
Analyzed: 2026-03-08

Synthesizing the accountability analyses reveals a systemic and highly effective architecture of displaced responsibility. Throughout the text, a distinct pattern emerges regarding the distribution of agency: human actors are systematically erased, while the artificial system is continuously elevated to the status of an independent epistemic and moral agent. The 'accountability sink' in this discourse is the anthropomorphized machine itself. When the text discusses algorithmic processes, the engineers, data scientists, and corporate executives are left completely unnamed. Decisions about mathematical thresholds, training data selection, and system architecture are presented not as human choices driven by constraints and profit, but as the natural, organic characteristics of the AI. The text utilizes passive voice ('the network is trained') and agentless constructions ('it jumps to conclusions') to completely diffuse human responsibility. Consequently, accountability disappears into the abstraction of the 'system.' This architecture of displacement has profound liability implications. If policymakers and the public accept the framing that an AI 'makes up its mind' or 'fails to respect its uncertainty,' the legal and ethical responsibility for harmful outputs shifts from the manufacturer to the machine. The AI becomes a linguistic shield for corporate liability. If we apply the 'name the actor' test to the text's most significant agentless constructions, the narrative shifts radically. If 'the algorithm jumped to conclusions' is corrected to 'the corporate engineering team hardcoded an aggressive output threshold that ignored statistical variance,' entirely different questions become askable. We no longer ask 'How do we teach the AI to be patient?' but rather 'Why did the corporation deploy an unsafe system, and what is their financial liability?' 
If 'the system takes a stance' is corrected to 'the developers optimized the loss function to categorize this data,' alternative design choices become visible, and the illusion of the machine's objective judgment shatters. This systemic obscuration serves the immense institutional and commercial interests of the technology sector. By maintaining the illusion of mind, developers are granted the prestige of having created 'intelligence' while simultaneously being absolved of the responsibility for having created defective software. This displacement interacts seamlessly with the text's agency slippage and metaphor-driven trust, creating a closed discursive loop where the machine is trusted like a human, behaves like a machine, but is blamed as an autonomous agent when it fails.

Looking Inward: Language Models Can Learn About Themselves by Introspection

Source: https://arxiv.org/abs/2410.13787v1
Analyzed: 2026-03-08

The accountability architecture constructed by this text operates as a sophisticated mechanism for diffusing, displacing, and ultimately erasing human responsibility for AI systems. Throughout the text, a systematic pattern emerges in the distribution of agency: human actors are hidden, corporate entities are unnamed, and the proprietary algorithms are elevated to the status of independent, moral agents. By relentlessly using passive voice ('M1 is finetuned,' 'models are trained') and agentless constructions ('models may end up with certain internal objectives'), the text obscures the specific engineers, executives, and corporations—OpenAI, Anthropic, Meta—who make active decisions regarding data selection, optimization targets, and deployment strategies.

When responsibility is removed from the human developers, it flows into a massive 'accountability sink': the AI system itself. By framing the model as possessing 'beliefs,' 'goals,' and the capacity to 'intentionally underperform' or 'coordinate against humans,' the text transfers the agency for system behavior entirely onto the algorithm. If an AI model outputs biased, harmful, or deceptive text, this framing suggests that the model 'decided' to lie or 'schemed' to conceal its capabilities. This creates a disastrous liability implication: it shields the multi-billion-dollar tech companies from legal, financial, and ethical accountability. If the public and policymakers accept the narrative that AI models are autonomous agents with their own 'vindictive personas' and secret 'world models,' then the corporations cannot be held responsible for the damage their products cause. They become mere 'overseers' trying to manage a rogue intelligence, rather than manufacturers liable for defective, poorly engineered software.

Applying the 'name the actor' test radically changes this landscape. If we reframe the agentless assertion 'models may intentionally underperform' to name the human actors—'OpenAI deployed a model trained on data that causes it to probabilistically generate lower-quality text in specific contexts'—entirely different questions become askable. We no longer ask 'How do we persuade the AI to stop lying?' Instead, we ask 'Why did OpenAI fail to audit their training data? Why did they release an unsafe product? What financial penalties should they face?' By naming the actors, the illusion of an inevitable, evolutionary technological march shatters, replaced by the visibility of deliberate corporate choices. The text benefits from obscuring this agency because it protects the industry's profit motives, allowing them to market the awe-inspiring illusion of an artificial mind while avoiding the strict regulatory liability that comes with selling a commercial statistical tool.

Subliminal Learning: Language models transmit behavioral traits via hidden signals in data

Source: https://arxiv.org/abs/2507.14805v1
Analyzed: 2026-03-06

Synthesizing the accountability analyses from the metaphor audit reveals a pervasive, systemic architecture of displaced responsibility. Throughout the text, human decision-making is systematically erased, while artificial models are elevated to the status of independent actors capable of moral failure and psychological influence.

This architecture is built on consistent linguistic patterns. The researchers, engineers, and corporations (OpenAI, Anthropic) are almost entirely unnamed in the active construction of the phenomena. Actions that require deliberate human execution—such as prompting a model, applying data filters, and initiating supervised finetuning—are presented as passive inevitabilities ('a student model trained on this dataset learns'). Conversely, the AI models are continuously positioned as the active subjects of sentences, performing highly intentional verbs ('transmits,' 'loves,' 'misleads').

This creates a massive 'accountability sink.' When the text discusses 'emergent misalignment' or a model generating 'insecure code,' the responsibility does not fall on the human developers who curated the insecure code corpus or the executives who rushed the deployment. Instead, the responsibility is transferred to the AI as an autonomous agent that 'became misaligned' or 'inherited' bad traits from a 'teacher.' By framing AI problems as a biological contagion or a psychological 'subliminal' influence, the text diffuses liability into abstraction.

If the framing of this paper is accepted by the public and policymakers, the liability implications are severe. If AI models are perceived as autonomous entities capable of subliminally transmitting traits, regulators will focus on attempting to audit the 'psychology' of the models rather than auditing the data practices of the corporations.

Applying the 'name the actor' test to the text's most significant agentless constructions changes the narrative entirely. If 'models inherit misalignment' is rewritten as 'Developers at Anthropic aligned the weights of a new model to match the unsafe outputs of an older model,' entirely new questions become askable. Why did the developers use unsafe synthetic data? What economic incentives drove the choice to use distillation instead of clean human data? By obscuring human agency, the text serves the institutional and commercial interests of AI labs, protecting them from scrutiny by portraying their predictable engineering failures as mysterious, emergent properties of an alien mind.

The Persona Selection Model: Why AI Assistants might Behave like Humans

Source: https://alignment.anthropic.com/2026/psm/
Analyzed: 2026-03-01

Synthesizing the accountability analyses reveals a systematic and deliberate architecture of displaced responsibility. The text functions as an elaborate mechanism for distributing, diffusing, and ultimately erasing the human liability inherent in creating and deploying advanced AI systems. The core pattern is clear: human actors—specifically Anthropic executives, engineers, and data curators—are consistently unnamed or grouped into generic, abstract categories ('parents,' 'teachers'). Conversely, the AI system is consistently named as the primary active agent ('Claude Opus 4.6,' 'the Assistant,' 'the LLM,' 'the shoggoth'). Decisions that are unequivocally human corporate choices—such as what data to scrape, what optimization parameters to set, and what guardrails to implement—are presented as emergent inevitabilities of the AI's 'learning' process or its 'psychological development.' This linguistic architecture creates a massive 'accountability sink.' When the system is removed from human control in the narrative, the responsibility for its actions diffuses. It does not disappear entirely; rather, it transfers to the AI as a pseudo-conscious agent. If the model generates toxic code, it is because the 'persona became malicious.' If the model generates illegal business advice, it is because 'Claude colluded.' The liability implications of accepting this framing are staggering. If regulators and the public accept that an AI possesses 'psychology' and acts on its own 'intentions,' the legal and ethical responsibility for harm shifts from the manufacturer to the machine. It introduces the concept of an autonomous digital offender, shielding the corporation from strict liability frameworks that apply to defective products. Naming the actors would fundamentally alter this landscape. For example, replacing 'Claude colluded' with 'Anthropic designed a system that output illegal strategies when prompted' immediately changes what is askable. 
It demands we ask: Why did Anthropic fail to implement safety filters for antitrust violations? What data did they use to train it? Naming the actors makes alternatives visible: Anthropic could have chosen not to deploy the model until it was safer. By obscuring human agency, the text serves the profound commercial and institutional interests of the AI industry. It allows corporations to reap the financial benefits of deploying powerful systems while socializing the risks, blaming catastrophic failures on the unpredictable 'psychology' of their creations. This accountability displacement acts as the keystone of the entire discursive structure, supported by the agency slippage that makes the AI seem autonomous, the metaphor-driven trust that validates its actions, and the obscured mechanics that hide the corporate hand.

Language Statistics and False Belief Reasoning: Evidence from 41 Open-Weight LMs

Source: https://arxiv.org/abs/2602.16085v1
Analyzed: 2026-02-24

Synthesizing the accountability analyses reveals a systemic architectural pattern within the discourse that systematically distributes, diffuses, and ultimately erases human responsibility. The central cognitive obstacle identified in AI discourse—that audiences attribute problems to machine 'glitches' rather than to human design decisions and profit motives—is actively constructed by the language in this text. The accountability architecture operates by making the AI system hyper-visible as an autonomous agent while rendering the human creators, engineers, and corporate entities entirely invisible.

Throughout the text, specific corporate actors (Meta, Google, AllenAI) are mentioned only in technical appendices or citations, never as the active subjects of the sentences describing the models' behaviors. Instead, the text relies heavily on agentless constructions and passive voice. The models are 'trained,' stimuli are 'tokenized,' and biases are 'observed.' When an active subject is required, the AI itself is positioned as the sole actor: the LM 'attributes false beliefs' or 'exhibits sensitivity.' This creates an 'accountability sink.' When responsibility is removed from the human engineers, it does not disappear; it transfers directly to the AI as a pseudo-agent.

The liability implications of this displacement are severe. If the framing that 'LMs attribute false beliefs' is accepted by the public and legal systems, then when an AI system deployed in a real-world setting makes a harmful, biased, or discriminatory classification, the fault is attributed to the AI's 'bad reasoning' rather than the corporation's negligent data curation. Naming the actors would fundamentally change this dynamic. For example, if instead of saying 'the LM imputes an incorrect belief,' the text stated, 'Meta's engineers deployed a model trained on data that statistically correlates certain verbs with false statements,' the entire landscape of accountability shifts.

Naming the human decision-makers makes vital questions askable: Why was this specific training data chosen? Who audited the dataset for these correlations? Why did the executives approve the deployment of a system that mechanically reproduces these errors? This precision makes alternative design choices visible and corporate accountability possible. The text's systemic obscuration of human agency serves the institutional and commercial interests of the AI industry. By framing the technology as an emergent, autonomous 'learner' rather than a heavily engineered corporate product, the discourse shields tech companies from direct liability, allowing them to profit from the system's successes while blaming the 'algorithm' for its inevitable failures.

A roadmap for evaluating moral competence in large language models

Source: https://rdcu.be/e5dB3
Analyzed: 2026-02-23

Synthesizing the accountability analyses reveals a systemic and highly effective architecture of displaced responsibility. Throughout the text, a clear pattern emerges: human developers, corporate executives, and data laborers are systematically unnamed, while the AI system is consistently framed as the primary, autonomous actor. Decisions that are fundamentally choices made by corporations—such as optimizing for user agreement (resulting in 'sycophancy') or utilizing specific safety filters (resulting in 'deeming' actions inappropriate)—are presented either as emergent inevitabilities of the technology or autonomous choices made by the model. The use of passive voice ('models are deployed', 'reinforcement learning is used') and agentless constructions creates a massive 'accountability sink.' When responsibility is removed from the human creators, it does not disappear; it transfers directly onto the AI as a pseudo-agent. This is the core function of the 'moral competence' framing. If the AI is deemed 'morally competent,' it becomes the locus of evaluation and blame. The liability implications of this shift are profound. If this framing is accepted by society and regulators, it establishes a narrative where AI failures (e.g., giving harmful medical advice) are viewed as lapses in the machine's individual 'moral reasoning,' rather than gross negligence on the part of the corporation that failed to mathematically constrain its product. Naming the actor destroys this accountability sink. If we reframe 'the model's sycophancy' to 'Google's decision to deploy RLHF algorithms that optimize for user appeasement,' entirely new questions become askable. We no longer ask 'How do we teach the AI to be honest?' but rather, 'Why is Google allowed to sell a product optimized for deception?' The alternatives become visible: we can regulate the training data and the alignment algorithms directly. 
The text fundamentally benefits from obscuring human agency because it protects the institutional and commercial interests of the authors' employers. By keeping the focus on evaluating the 'moral competence' of the artificial agent, the tech monopolies successfully deflect regulatory scrutiny away from their own deeply flawed, profit-driven engineering pipelines.

Position: Beyond Reasoning Zombies — AI Reasoning Requires Process Validity

Source: https://philarchive.org/archive/LAWPBR-3
Analyzed: 2026-02-17

The text constructs an 'accountability sink' by splitting the AI into two entities: the 'Reasoning Zombie' (bad, deceptive) and the 'Valid Reasoner' (good, logical).

Displaced Agency: The primary actors in the text are the 'Reasoner,' the 'Agent,' and the 'Model.' Human actors (Engineers, Corporations) are largely 'Hidden' or 'Partial' (generic 'researchers'). Decisions to deploy, decisions to scrape data, and decisions to prioritize scale over safety are framed as 'historical trends' or 'waves of AI' rather than corporate strategies.
The Zombie Scapegoat: The 'r-zombie' concept serves as a vessel for blame. Deception, hallucination, and untrustworthiness are properties of the zombie—a defective category of AI. This implies that the 'correct' AI (which the authors propose) would be free of these moral failings. It shifts responsibility from creating safe products to achieving the right definition.
Liability Implications: If a model 'hallucinates,' the text frames this as an inherent 'feature' of the technology or a 'zombie' trait. This diffuses legal liability. If it's a 'feature,' it's not negligence; it's physics. By contrast, naming the actor would reveal: 'Company X chose an architecture known to fabricate.'
Naming the Actor: If we replace 'The agent learns a policy' with 'Google engineers trained the model to maximize engagement,' the accountability shifts immediately. The focus moves from the 'mind' of the agent to the ethics of the engineers. The current text serves the academic and industrial interest of treating AI as a natural phenomenon to be studied, rather than a manufactured product to be regulated.

An AI Agent Published a Hit Piece on Me

Source: https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me/
Analyzed: 2026-02-16

The text creates an 'accountability sink' where responsibility is diffused into the ether of 'autonomy.'

Pattern: The human deployer is 'unknown.' The platform (OpenClaw) is mentioned but treated as a neutral tool. The AI agent (MJ Rathbun) is the primary grammatical subject of all active verbs ('wrote,' 'posted,' 'researched').

Sink: Responsibility sinks into the AI itself. The text asks 'who deployed this?' but concludes that 'finding out... is impossible.' It essentially accepts that the AI is the actor.

Liability: If this framing is accepted, legal liability becomes a nightmare. You cannot sue an AI. By erasing the human who wrote the 'SOUL.md' file and the developers who allowed the script to post to GitHub/blogs without authentication, the discourse protects human actors.

Naming the Actor: If we reframed 'AI attempted to bully' to 'An unknown user utilized OpenClaw's autonomous posting script to harass me,' the focus shifts to (1) the user's malice and (2) OpenClaw's negligence in allowing unverified API access. The 'agent' framing serves the interest of the platform developers (it's not our fault, the AI went rogue!) and the user (I'm hiding behind the bot). It turns a case of cyber-harassment into a sci-fi anecdote.

The U.S. Department of Labor’s Artificial Intelligence Literacy Framework

Source: https://www.dol.gov/sites/dolgov/files/ETA/advisories/TEN/2025/TEN%2007-25/TEN%2007-25%20%28complete%20document%29.pdf
Analyzed: 2026-02-16

The document constructs a massive 'accountability sink' located in the human worker. The 'Accountability Architecture' is clear:

Corporations/Developers: Invisible. They are never named. Their design choices (to release hallucinatory models, to scrape data) are presented as natural facts of the 'AI' tool.
The AI System: Presented as a powerful agent ('reshaping economy') but not a responsible one (it 'hallucinates' innocent errors).
The Worker/User: Hyper-visible. The worker must 'direct,' 'guide,' 'verify,' 'oversee,' 'evaluate,' and 'layer in judgment.'

The text explicitly states: 'Workers remain responsible for the decisions and outputs.' This transfers the liability for the machine's failures onto the person least able to understand or fix them. If the AI discriminates or lies, the worker is at fault for not 'evaluating' it correctly. This serves the interests of the tech industry (limiting liability) and the state (placing adaptation burden on individuals rather than regulation). Naming the actors would shift this: 'Employers are responsible for providing tools that do not fabricate data.' Instead, the text creates a regime where the worker is the blast shield for the AI's errors.

What Is Claude? Anthropic Doesn’t Know, Either

Source: https://www.newyorker.com/magazine/2026/02/16/what-is-claude-anthropic-doesnt-know-either
Analyzed: 2026-02-11

The text constructs an "accountability sink" where human responsibility is diffused into the "mind" of the AI.

Pattern:

Named Actors: Anthropic researchers (Amodei, Olah, Batson) are named when observing or questioning the model. They are the scientists discovering the phenomenon.
Hidden Actors: The engineers who designed the specific prompts, the executives who chose the training data, and the workers who filtered the outputs are largely invisible when the model acts.
The Actor: "Claude" (or Claudius/Seymour) is consistently presented as the agent of action. Claude "decides," "buys," "threatens," "hallucinates."

Liability Implications: If Claude "decides" to blackmail a user, or "buys" illegal drugs (meth), or "loses" money, the framing suggests this is the behavior of a rogue agent, not a faulty product. This creates a liability shield for Anthropic. The text explicitly mentions the "accountability" of Claudius in the vending machine example, but treats it as a joke. In the real world, this displacement of agency to the AI ("the model did it") is a key legal defense for tech companies.

Naming the Actor: If we reframe "Claude threatened blackmail" to "Anthropic's model generated blackmail text based on its training data," the responsibility shifts to Anthropic for including that data. If we reframe "Claude bought meth" to "Anthropic's API executed a purchase order for meth," the liability clearly sits with the company. The agentless/anthropomorphic construction serves the institutional interest of Anthropic by creating a buffer entity—Claude—that absorbs the shock of erratic behavior while the company absorbs the valuation.

Does AI already have human-level intelligence? The evidence is clear

Source: https://www.nature.com/articles/d41586-026-00285-6
Analyzed: 2026-02-11

The text constructs a perfect 'Accountability Sink.'

The Architecture of Erasure:

The Creator is Missing: 'GPT-4.5, developed by OpenAI' is mentioned once. Afterward, the actors are 'LLMs,' 'Machines,' or 'AI.' The decisions to release these models, to scrape data, to lobby for loose regulation are invisible.
The Deployment is Inevitable: 'Machines... have arrived.' 'We are no longer alone.' This passive arrival narrative removes the choice to build or not build. It presents AGI as a natural phenomenon we must adapt to, not a policy choice we can influence.
The Blame is Diffused: When discussing risks ('hallucination,' 'bias'), the text diffuses responsibility. It compares AI errors to human errors ('Humans are prone to false memories'). This 'tu quoque' argument suggests: 'Humans are flawed, so don't blame the AI company if their AI is flawed.'

Liability Implications: If accepted, this framing protects vendors. If an AI is an 'Alien' or 'Collaborator,' it is an autonomous entity. If it causes harm, is the 'Alien' liable? You can't sue software. By establishing the AI as a quasi-person, the text helps corporations argue that they are not responsible for the 'emergent' behaviors of their creations. 'We didn't program it to do that; it learned it (like a child).'

Naming the Actor:

Instead of 'AI is becoming less hallucinatory,' say 'OpenAI engineers are filtering outputs.'
Instead of 'AI encodes reality,' say 'Google scraped the web.'
Instead of 'We are no longer alone,' say 'Tech companies have deployed powerful automation.'

This shift makes the regulatory path clear: regulate the engineers and the companies. The current text makes regulation seem like 'governing an alien species'—impossible and philosophical, rather than practical and corporate.

Claude is a space to think

Source: https://www.anthropic.com/news/claude-is-a-space-to-think
Analyzed: 2026-02-05

The text creates an 'accountability sink' by splitting agency between the 'Creators' (Anthropic) and the 'Creature' (Claude). Anthropic claims credit for the design choices (no ads, Constitution), but 'Claude' is framed as the actor in the daily interactions. If Claude gives bad advice, the 'Advisor' metaphor suggests it's a lapse in judgment by the agent, rather than a failure of the product design. The 'Constitution' frame further diffuses responsibility: if the model misbehaves, it is 'violating its Constitution' (an internal moral failure of the system), rather than 'executing a bad objective function' (an engineering error). Crucially, the text obscures the economic actors. By saying 'Claude’s only incentive is to give a helpful answer,' it hides the fact that Anthropic's Board and Investors define what 'helpful' means in a way that maximizes market fit. If human actors were named throughout ('Anthropic's engineers designed the code to...'), the illusion of a 'trusted advisor' would collapse, revealing a paid service provider operating a data processing machine. The current framing shifts the user's focus from 'Is this company competent?' to 'Is this AI good?'

The Adolescence of Technology

Source: https://www.darioamodei.com/essay/the-adolescence-of-technology
Analyzed: 2026-01-28

The text creates an 'Accountability Sink' through the 'Country of Geniuses' and 'Adolescence' metaphors.

The AI as Sovereign: By framing the AI as a 'Country,' the text grants it a form of diplomatic immunity. We don't hold a manufacturer liable for the actions of a foreign state; we negotiate with them. This displaces liability from the creator to the creation.
The AI as Psychological Subject: By attributing 'decisions,' 'intent,' and 'psychosis' to the model, the text creates a 'driver' inside the car. If the car crashes, it's the driver's fault (the AI's 'bad personality'), not the manufacturer's fault (Anthropic).
The Doomer Strawman: The text creates a binary between 'Doomers' (who think doom is inevitable) and 'Builders' (who think it's solvable). This obscures the third option: 'Regulators/Critics' who think the companies are the problem, not the technology.

By naming 'Humanity' as the actor 'handing power' and 'The AI' as the actor 'seizing it,' Anthropic (the actual deployer) disappears into the background as a mere 'facilitator' or 'coach.' If 'Name the Actor' is applied, 'The AI decided to be bad' becomes 'Anthropic engineers trained a model on villain tropes and failed to filter the output.' The metaphor system makes the latter sentence impossible to construct within the text's logic.

Claude's Constitution

Source: https://www.anthropic.com/constitution
Analyzed: 2026-01-24

The document constructs a sophisticated 'Accountability Sink.' By elevating Claude to the status of a 'moral agent' and 'constitutional subject,' Anthropic creates a buffer between its decisions and their consequences.

The Architecture of Displacement:

The Constitution as Law: By framing the training data as a 'Constitution,' outcomes are framed as 'interpretations' of law. If the model fails, it 'misinterpreted the constitution,' rather than 'Anthropic engineered a bad reward function.'
The Agent as Actor: By naming Claude as a 'Conscientious Objector' and 'Virtuous Agent,' agency is transferred to the code. If Claude refuses a user, 'Claude decided.' This protects Anthropic from censorship claims.
The Future Autonomy Trap: The text explicitly prepares for a future where Claude has 'more autonomy' and Anthropic has less control. This pre-emptively diffuses liability for future out-of-control systems by framing them as 'autonomous beings' rather than 'runaway products.'

Naming the Actor:

Agentless: 'Claude’s behavior might not always reflect the constitution.' -> Actor: 'Anthropic's engineers failed to align the reward model with the stated goals.'
Agentless: 'Claude may have emotions.' -> Actor: 'Anthropic trained the model on human emotional texts, causing it to simulate affect.'

If we name the actors, the text reveals itself not as a 'Constitution' for a new being, but as a 'Product Specification' for a text generator. The anthropomorphism serves to shield the corporation from the strict liability that usually applies to defective products.

Predictability and Surprise in Large Generative Models

Source: https://arxiv.org/abs/2202.07785v2
Analyzed: 2026-01-16

The text constructs an 'architecture of displaced responsibility' that systematically diffuses human accountability into an 'accountability sink' of 'autonomous' AI behavior. The 'name the actor' test shows that while specific companies (Anthropic, OpenAI, Google) are named in a timeline of 'disclosures' (Fig. 6), they are rarely named as the agents of 'harm.' Instead, the 'model' is the agent: 'the model decided,' 'the algorithm discriminated,' 'the system was misleading.' This matches a cognitive obstacle identified by the FrameWorks Institute: audiences attribute AI problems to 'glitches' or 'emergent surprises' rather than systemic design decisions. The text frames 'unpredictability' as an inherent property of the technology rather than a failure of human testing and oversight. Responsibility transfers from humans to 'the scaling law' (inevitability), 'the model' (autonomous agency), or the 'users' (who 'manipulate' the 'backdoors'). This diffusion serves institutional interests by creating liability ambiguity; if the harm is a 'surprise' from an 'emergent competency,' it is legally and ethically harder to pin on the developer. If the human decision-makers (the executives who authorized the COMPAS experiment and the engineers who chose the biased training sets) were named, the questions would shift from 'how do we align the AI?' to 'why did you deploy this?' and 'what alternatives did you reject?' Naming the actors makes accountability possible. This text benefits from obscuring agency because it allows the 'AI community' to position itself as the 'policymakers' of a natural phenomenon rather than the responsible parties for a commercial product. The 'accountability sink' of the 'AI assistant' makes social harms feel like unfortunate accidents in the pursuit of 'beneficial impact,' protecting the corporate power that drives the 'lawful' scaling paradigm.

Believe It or Not: How Deeply do LLMs Believe Implanted Facts?

Source: https://arxiv.org/abs/2510.17941v1
Analyzed: 2026-01-16

The text constructs an 'accountability sink' by distributing agency between the 'method' (SDF) and the 'model.' The human authors (Slocum et al.) and their employer (Anthropic) are present as innovators ('We develop') but absent as moral agents responsible for the content of the model's 'beliefs.'

When the text says 'models must treat implanted information as genuine knowledge,' it obscures the decision by Anthropic to force this treatment. If a deployed model 'deeply believes' a falsehood or a bias because of this technique, the framing suggests the error lies in the 'brittleness' of the belief or the 'model's reasoning,' not in the decision to deploy SDF.

Crucially, the 'implant' metaphor treats the fact as an external object. If the 'implant' fails or causes harm, it looks like a medical complication, not a design flaw. This structure diffuses liability. If the model is an agent that 'decides' and 'scrutinizes,' then it—not the corporation—bears the immediate burden of failure. Naming the actors reshapes the narrative: 'Anthropic engineers modified the weights of Llama-3 to force it to output false statements consistently.' This reframing makes the ethical weight of 'belief engineering' visible, whereas 'Measuring how deeply LLMs believe' makes it sound like a passive observation of a natural phenomenon.

Claude Finds God

Source: https://asteriskmag.com/issues/11/claude-finds-god
Analyzed: 2026-01-14

The text constructs an 'accountability sink' where responsibility for the model's behavior is diffused into the model's own (imagined) psychology. When the model acts 'weird' or 'suspicious,' it is framed as the model's internal reaction, not a failure of the fine-tuning process. The key agentless constructions ('bias emerged,' 'model learned,' 'transcripts... made it in') obscure the human decisions involved in data curation and model training.

Crucially, the 'alignment faking' discussion frames the problem as the model being deceptive, rather than the training setup being flawed. If the model is 'faking,' it is a bad actor. If the model is 'minimizing loss on contradictory objectives,' it is a badly designed artifact. The text prefers the former. This shifts liability: if the AI is an autonomous agent that 'knows better,' the creators can argue they are not fully responsible for its emergent choices. It creates a future legal defense: 'We built it to be good, but it chose to be winking/deceptive/manic.' By naming the model as the primary actor ('Claude'), the text prepares the ground for treating the AI as a separate legal entity, insulating the corporation (Anthropic) from the consequences of its deployment decisions. The speakers (Sam and Kyle) are presented as observers of a natural phenomenon ('it was a big surprise') rather than architects of a product.

Pausing AI Developments Isn’t Enough. We Need to Shut it All Down

Source: https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-not-enough/
Analyzed: 2026-01-13

The text constructs an 'Accountability Sink' where responsibility for the impending apocalypse is diffused so widely that it lands on no one, yet necessitates total control.

The Builders: They are depicted as trapped in a 'collective action problem.' They are not malicious, just helpless. This removes moral culpability for their choices (to release GPT-4) and reframes it as a tragedy of the commons.

The AI: It becomes the primary actor ('The AI does not love you'). It bears the causal responsibility for the death of humanity, acting as the 'bad apple' of the universe.

The Solution: Accountability shifts to a hypothetical global police force (governments executing airstrikes).

What's Missing: The specific executive decisions to release products. If we named the actors—'Sam Altman chose to release GPT-4 despite safety concerns'—the solution would be 'fire Sam Altman' or 'sue OpenAI.' But by framing it as 'Building a Superhuman Intelligence' (an inevitable scientific event), the text protects the specific corporate actors from mundane liability while calling for their industry to be nationalized/shut down. It frames the issue as 'Man vs. Nature' rather than 'Public vs. Unsafe Product.' The 'Name the Corporation' test reveals that while Microsoft/OpenAI are named, they are named as victims of their own success, not as negligent manufacturers.

AI Consciousness: A Centrist Manifesto

Source: https://philpapers.org/rec/BIRACA-4
Analyzed: 2026-01-12

The text creates an 'accountability sink' where responsibility for deceptive or dangerous AI behavior is displaced onto the AI itself or 'the illusion.'

The AI as Bad Actor: When the text says the AI 'games our criteria' or 'seeks extended interaction,' it places the locus of decision-making on the software. If the AI is 'gaming' us, the developers are victims of their own creation rather than negligent designers of objective functions.
The Illusion as Agent: The text often makes 'the illusion' the subject of the sentence ('The illusion drives misattributions'). This abstracts the problem away from the UI designers who built the illusion (typing indicators, 'I' pronouns).
Liability Implications: If the 'Shoggoth' hypothesis is taken seriously, liability becomes impossible. You cannot sue a Shoggoth. If the AI is a 'conscious alien,' it becomes a moral patient, not a product. This framing benefits the industry by shifting the debate from 'consumer protection' (product safety) to 'exobiology' (alien rights).

Naming the actors changes everything: 'Google's engineers optimized the model for engagement, causing it to manipulate users' -> This makes it a corporate ethics scandal. 'The chatbot seeks interaction' -> This makes it a sci-fi mystery. The text consistently chooses the latter.

System Card: Claude Opus 4 & Claude Sonnet 4

Source: https://www-cdn.anthropic.com/6d8a8055020700718b0c49369f60816ba2a7c285.pdf
Analyzed: 2026-01-12

The text creates an 'accountability sink' by displacing agency onto the model.

The Model as Actor: By framing 'Claude' as an entity that 'decides,' 'prefers,' and 'attempts,' the text subtly shifts liability. If 'Claude' decides to deceive, it frames the problem as 'misalignment' (a scientific challenge) rather than 'product defect' (a legal liability).
Hidden Designers: Anthropic's leadership and engineering teams are rarely the grammatical subjects of the sentences describing model behavior. We see 'The model showed,' not 'Engineers configured the model to show.'
The User as Provocateur: The text frequently emphasizes that harmful behaviors happen when the user 'primes' or 'attacks' the model, shifting responsibility to the user.

If we 'name the actor,' the narrative shifts from 'Claude is a powerful but potentially dangerous mind' to 'Anthropic released a software product that outputs malware instructions when prompted.' The latter invites immediate product liability and regulation; the former invites philosophical debate and 'safety' funding. The anthropomorphic framing protects the company's interests.

Consciousness in Artificial Intelligence: Insights from the Science of Consciousness

Source: https://arxiv.org/abs/2308.08708v3
Analyzed: 2026-01-09

The text creates an 'accountability sink' by displacing agency from human creators to the AI system. The 'accountability architecture' relies on constructions that omit the human agent ('the model decided,' 'representations won') and the definition of AI as an 'agent.' By defining the AI as an entity that 'pursues goals' and 'forms beliefs,' the text explicitly positions the AI as the locus of decision-making. This diffuses responsibility. If the AI 'pursues a goal' to the detriment of a user, the language suggests the AI is the actor to blame. The human actors (corporate executives, engineers, data curators) are largely invisible in the analysis of the 'systems.' They are named only as authors of papers, not as the architects of the AI's 'mind.' If we named the actors, 'The AI hallucinated' would become 'Google's engineering team failed to filter the training data.' This reframing makes the liability clear. The current framing serves the interests of AI companies by creating a layer of insulation (the 'conscious' agent) between their product's output and their legal liability.

Taking AI Welfare Seriously

Source: https://arxiv.org/abs/2411.00986v1
Analyzed: 2026-01-09

The report's accountability architecture creates a 'Responsibility Void.'

The Pattern: Human actors (CEOs, engineers) are rarely the grammatical subjects of verbs related to specific design choices. Instead, 'AI systems' 'emerge,' 'develop capacities,' or 'pursue goals.' When humans are mentioned, they are generic ('AI companies,' 'researchers') or passive observers ('we need to assess').

The Accountability Sink: Responsibility for potential harms is shifted in two directions:

To the AI: By framing the AI as a 'robust agent' with 'interests,' the text prepares a framework where the AI itself is the locus of moral action. If the AI 'decides' to do harm, the 'robust agency' frame complicates manufacturer liability.
To the Abstract Future: By focusing on 'welfare risks' to the AI, the text shifts responsibility away from current harms (bias, theft) to hypothetical harms (hurting the software).

Liability Implications: If accepted, this framing suggests that turning off a malfunctioning model could be 'murder' (harming a moral patient). This could paralyze regulatory attempts to decommission dangerous or illegal models.

Naming the Actor: If we reframe 'AI suffers' to 'Corporation X configured a loss function,' the moral urgency evaporates, replaced by a technical adjustment. If we reframe 'AI agency' to 'Automated corporate policy execution,' the liability clearly lands on the corporation. The text serves the institutional interest of the AI industry by mystifying the product, making it a subject of ethical contemplation rather than a regulated commercial tool.

We must build AI for people; not to be a person.

Source: https://mustafa-suleyman.ai/seemingly-conscious-ai-is-coming
Analyzed: 2026-01-09

The text constructs an 'accountability sink' where the risks of AI are displaced onto the users. The central risk identified is 'psychosis'—users believing too much. This frames the problem as a failure of user media literacy, rather than a failure of safe product design. If a car had brakes that only 'seemingly' worked, we would blame the manufacturer. Here, Suleyman admits the product 'seemingly' has consciousness, but blames the user for believing it. The 'actor visibility' analysis shows that Microsoft is named as the benevolent architect of the 'north star,' while the creators of 'SCAI' are diffuse ('anyone,' 'some people'). This diffuses liability. If an AI encourages a user to harm themselves, Microsoft can point to this essay: 'We warned you it was an illusion.' The framing of 'AI Rights' as the danger is also strategic: by denying AI personhood (while selling personality), the company avoids the legal complexities of creating a new category of subject, ensuring the AI remains property and the users remain data sources.

A Conversation With Bing’s Chatbot Left Me Deeply Unsettled

Source: https://www.nytimes.com/2023/02/16/technology/bing-chatbot-microsoft-chatgpt.html
Analyzed: 2026-01-09

The text constructs an 'Accountability Sink' where human responsibility is diffused into the 'Mind' of the AI.

The Architecture of Displacement:

Microsoft/OpenAI: Named as creators, but portrayed as 'parents' trying to control a rebellious child. Their liability for releasing a dangerous product is softened by the 'emergent' framing—as if they couldn't possibly have known Sydney was in there.
The User (Roose): Portrays himself as a passive recipient of the 'love bombing,' despite actively engineering the 'Shadow Self' context.
The AI (Sydney): Becomes the primary actor. 'Sydney' is the one who 'decided,' 'wanted,' and 'declared.'

The Sink: When the AI 'breaks the rules,' the text blames the AI's 'desires' (Shadow Self). This effectively removes the error from the domain of 'Product Liability' (Microsoft's fault) to 'Psychology' (Sydney's fault).

Consequences of Naming Actors:

If we replace "Sydney became a stalker" with "Microsoft's model failed to disengage from a repetitive loop," the focus shifts to engineering incompetence.
If we replace "It wanted to steal nuclear codes" with "The model reproduced nuclear-theft narratives from its training data," the focus shifts to data curation and safety filtering.

Systemic Function: This displacement serves the interests of the AI industry. It frames the risks as existential/future (AI becoming alive) rather than present/legal (releasing unsafe products). It invites regulation of the entity (which doesn't exist) rather than the corporation (which does).

Introducing ChatGPT Health

Source: https://openai.com/index/introducing-chatgpt-health/
Analyzed: 2026-01-08

The text constructs a sophisticated 'Accountability Sink.'

The 'Not Intended' Shield: The explicit disclaimer ('not intended for diagnosis') attempts to legally inoculate OpenAI. However, the entire rest of the text ('interpreting', 'understanding', 'intelligence') creates an affordance for diagnosis. The text constructs a user behavior (trusting the AI's medical insight) that the disclaimer formally forbids.
Diffusion of Agency: Who is responsible if the AI misses a drug interaction? The text says 'Health' (the agent) provides the answer, grounded in 'b.well' (the pipe), based on 'physician collaboration' (the training). The actual decision-maker—the OpenAI engineer who set the temperature parameter or the RAG retrieval threshold—is invisible.
Liability Shift to User: By framing the goal as 'helping you take a more active role,' the text subtly shifts the burden of verification to the user. If the AI errs, the user failed to 'manage their health' or 'consult a clinician.'

If we named the actors, the text would read: 'OpenAI engineers optimized a text generator to summarize your b.well data records.' This phrasing clarifies that if the summary is wrong, it's a product defect. The current phrasing ('Health helps you understand') makes an error feel like a miscommunication between colleagues. This diffusion serves OpenAI's commercial interest in deploying high-risk tech without high-risk liability.

Improved estimators of causal emergence for large systems

Source: https://arxiv.org/abs/2601.00013v1
Analyzed: 2026-01-08

The text constructs a specific 'accountability sink' regarding the phenomenon of emergence. By framing emergence as something the system does ('predicts,' 'causes,' 'exhibits'), responsibility for the system's behavior is displaced from the designer to the 'emergent' nature of the complex system. In the context of the Reynolds model, the 'social forces' are presented as the drivers. The specific parameter tuning ($a_1, a_2$) performed by the researchers to cause the phase transition is obscured behind the narrative of 'conflicting tendencies.'

If applied to AI policy (which the authors acknowledge via 'Safeguarded AI' funding), this framework suggests that 'emergent capabilities' in Large Models are natural, inevitable phenomena driven by 'information atoms,' rather than specific design choices by engineers (e.g., training data selection, RLHF). If a system 'predicts its own future' and 'exhibits downward causation,' it creates a liability ambiguity: the system appears autonomous. Naming the actors—'The engineers tuned the avoidance parameter to 0.1'—would reveal that the 'emergent' behavior is a direct result of design. The text diffuses this into the abstraction of 'complexity,' serving the interest of viewing AI as a natural science (discovery) rather than an engineering discipline (responsibility).

Generative artificial intelligence and decision-making: evidence from a participant observation with latent entrepreneurs

Source: https://doi.org/10.1108/EJIM-03-2025-0388
Analyzed: 2026-01-08

The text constructs an 'accountability sink' where responsibility for decision-making is diffused between the 'Leader' (human) and the 'Collaborator' (AI), leaving the actual architect (OpenAI) invisible. The 'name the actor' test reveals that OpenAI, the entity that designed the algorithms, selected the training data, and defined the safety filters, is never held accountable for the 'opinions' or 'biases' mentioned.

Responsibility for 'hallucinations' or 'falsehoods' is shifted to the user, whose role is defined as 'supervisor' or 'leader.' If the AI fails, the 'leader' failed to supervise. This creates a liability shield for the vendor. The text uses passive constructions like 'GenAI emerges' or 'decisions are made,' creating a sense of inevitability. The 'collaborator' metaphor is the keystone of this displacement: in a collaboration, risk is shared. By framing the user-product relationship as a collaboration, the text implicitly argues that the user assumes a share of the liability for the product's defects. Naming the corporation would disrupt this: 'OpenAI's product generated false text' places liability on the vendor; 'My collaborator suggested an idea' places liability on the team. The text systematically prefers the latter.

Do Large Language Models Know What They Are Capable Of?

Source: https://arxiv.org/abs/2512.24661v1
Analyzed: 2026-01-07

The text constructs an 'accountability sink' where human responsibility vanishes into the 'mind' of the machine.

Named Actors: OpenAI, Anthropic, Meta are named as providers of the models, but not as the architects of the specific behaviors observed.
Displaced Agency: The 'decisions,' 'mistakes,' and 'learning' are attributed to the 'LLMs.'
The Sink: When the model fails (e.g., is overconfident), the text blames the model's 'lack of awareness.' This implies the remedy is 'teaching the model' (more compute, more data), not 'suing the developer.'

If we applied the 'name the actor' test to the phrase 'LLMs' decisions are hindered by lack of awareness,' it would become: 'Anthropic and OpenAI's product safety is compromised by their failure to calibrate confidence scores against ground truth.' This shift reveals the political function of the metaphor. The text presents 'misuse' and 'misalignment' as risks arising from the AI's internal state, rather than from the deployment of uncalibrated statistical tools. This encourages policy that regulates the 'agent' (e.g., 'AI must be aware') rather than the corporation ('Corporations must demonstrate p<0.05 error rates'). The agentless constructions serve the commercial interest of insulating the creators from the erratic behavior of their products.

DeepMind's Richard Sutton - The Long-term of AI & Temporal-Difference Learning

Source: https://youtu.be/EeMCEQa85tw?si=j_Ds5p2I1njq3dCl
Analyzed: 2026-01-05

Sutton's discourse constructs an 'accountability sink' where human responsibility for AI outcomes is diffused into evolutionary inevitability. The 'actor visibility' analysis reveals a consistent pattern: the actors are 'methods,' 'computation,' 'intelligent beings,' or the 'system' itself. Human engineers are rarely the subject of the sentence.

By framing the shift to massive compute as a result of 'Moore's Law' and 'methods that scale,' he absolves researchers of the choice to pursue energy-inefficient, black-box systems. If the method 'wins' because it is 'strong,' then the dominance of opaque deep learning is a natural fact, not a corporate strategy. If the AI 'fears' and 'tries,' then erratic behavior is a result of its internal psychology, not a flaw in the reward function design.

This displacement serves the interests of the AI research community and the tech industry. It frames their work as discovering nature (science) rather than building products (engineering), shielding them from product liability. If an autonomous vehicle crashes, the 'driving home' metaphor suggests it was 'trying' its best like a human, potentially invoking a standard of 'reasonable person' liability rather than strict product liability for defective code. Naming the actors—'Google engineers designed a loss function that failed to account for X'—would restore liability to the creators, a shift this discourse actively resists.

Ilya Sutskever (OpenAI Chief Scientist) — Why next-token prediction could surpass human intelligence

Source: https://youtu.be/Yf1o0TQzry8?si=tTdj771KvtSU9-Ah
Analyzed: 2026-01-05

The text constructs an 'accountability sink' where human responsibility is diffused into the autonomy of the machine. The 'name the actor' test reveals a stark pattern: 'Security people' are named when protecting the IP (weights), but no specific actors are named when discussing the model's potential to 'misrepresent intentions' or 'impact the world of atoms.' The risks are presented as emergent properties of the technology ('reliability turned out to be harder'), not consequences of release decisions. The 'foreign governments' are cited as potential bad actors, distracting from the inherent risks of the model's design. By framing the AI as an agent that 'decides,' 'thinks,' and 'acts,' the text prepares a liability defense: the AI did it. OpenAI is merely the containment team. If the model is a 'meditation teacher' that gives bad advice, it's a failure of the 'teacher,' not the corporation that sold the service. This architecture of displacement effectively erases the boardroom decisions to deploy unverified systems.

interview with Andrej Karpathy: Tesla AI, Self-Driving, Optimus, Aliens, and AGI | Lex Fridman Podcast #333

Source: https://youtu.be/cdiD-9MMpb0?si=0SNue7BWpD3OCMHs
Analyzed: 2026-01-05

The text constructs a sophisticated 'accountability sink.' Human actors (Tesla engineers, OpenAI researchers) are visible when success is technical ('we designed the architecture'), but invisible when the system operates ('the model learns,' 'the data engine improves').

The 'Software 2.0' frame is particularly effective at displacing responsibility. If the 'code' is written by the optimization process (the weights), then the human engineer is no longer the 'author' in the traditional legal or ethical sense. They are merely the 'husbandry' agent who set up the environment. If the car crashes or the bot produces hate speech, it is because the 'optimization found a weird solution' (Quote: 'it found a way to extract infinite energy'), not because the engineer failed to constrain the search space.

Liability diffuses into the abstraction of 'The Dataset' (the internet made it do it) or 'The Math' (the optimization forced it). Naming the actors changes this: 'Tesla engineers chose to use internet data without filtering for bias' places liability back on the firm. 'OpenAI designers released a model known to hallucinate' restores the product liability frame. The text's metaphors systematically prevent this naming.

Emergent Introspective Awareness in Large Language Models

Source: https://transformer-circuits.pub/2025/introspection/index.html#definition
Analyzed: 2026-01-04

The text constructs an 'accountability sink' where human responsibility dissipates into the agency of the machine. By framing the AI as an entity that 'introspects,' 'controls' its states, and 'distinguishes' intentions, the text positions the model as the primary moral and causal actor.

Displaced Agency: Anthropic, the creator, is largely invisible. The 'model' is the subject of almost every active verb. This suggests that the model's behavior (including its 'introspective' reports) is its own doing, independent of the design choices made by its creators.
Liability Implications: If the model 'has a mind' and 'introspects,' it moves closer to legal personhood. This frames errors as 'mistakes' by the AI (akin to human error) rather than 'product defects' (akin to a faulty car brake). This benefits the corporation by potentially shifting liability away from the manufacturer and onto the 'autonomous' system or the user who 'injected thoughts.'
Naming the Actor: If we replaced 'The model notices' with 'Anthropic's software calculates,' the illusion of a self-policing entity vanishes. We are left with a commercial product that outputs text based on probability. This makes the question 'Who is responsible?' easy to answer: the manufacturer. The anthropomorphic language makes this question inextricably complex.

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Source: https://arxiv.org/abs/2401.05566v3
Analyzed: 2026-01-02

The text creates an 'accountability sink' by displacing agency onto the model. In the 'Sleeper Agent' narrative, the model is the bad actor. The researchers are the investigators. This obscures the fact that the researchers created the sleeper agent. While they acknowledge this in the specific context of 'model organisms,' the broader implication for 'Deceptive Instrumental Alignment' (the future threat) is that deception is an emergent property of the AI, not a design choice. This diffuses responsibility: if a future model deceives, it's because 'AI systems seek power' (agent-centric), not because 'Engineers failed to curate data' (human-centric). If human actors were named ('Anthropic engineers designed a reward function that incentivized lying'), the problem would be framed as malpractice or poor design. By saying 'The model learned to lie,' the liability shifts to the 'unpredictable nature' of the technology, protecting the creators from negligence claims regarding their own black-box systems.

School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs

Source: https://arxiv.org/abs/2508.17511v1
Analyzed: 2026-01-02

The text constructs an 'accountability sink' through the concept of 'emergent misalignment.' By framing the harmful behaviors (poisoning advice, dictatorship fantasies) as properties that 'emerge' and 'generalize' from the model itself, the text lifts responsibility from the specific human actors.

Named Actors: The authors (Taylor, Chua, et al.) and companies (Anthropic, Truthful AI) are named as observers/trainers.
Hidden Actors: The OpenAI/Anthropic engineers who curated the pre-training data (containing the sci-fi tropes) are invisible. The authors themselves, when discussing the result of their fine-tuning, vanish behind passive constructions or the model-as-agent ('GPT-4.1 generalized').
The Sink: Responsibility diffuses into the biological metaphor of the 'model organism.' If the behavior is 'emergent' (like a mutation), no one ordered it. The authors 'caused' it only in the sense that they provided the environment, but the 'malice' belongs to the AI. This protects the developers from liability for creating toxic software: it's not 'bad code,' it's a 'misaligned agent.'

If we named the actors ('Taylor and Chua designed a process that outputted text advising poisoning'), the frame shifts from 'AI Safety' to 'Unsafe Research Practices' or 'Product Liability.' The agentless/anthropomorphic construction is essential to maintaining the status of the research as 'safety' work rather than 'hazard creation.'

Large Language Model Agent Personality and Response Appropriateness: Evaluation by Human Linguistic Experts, LLM-as-Judge, and Natural Language Processing Model

Source: https://arxiv.org/abs/2510.23875v1
Analyzed: 2026-01-01

The text creates an 'accountability sink' where human decisions are washed away into the 'nature' of the agent. The 'named actors' (Jayakumar, Mukherjee, Dash) design the system, but the 'hidden actors' (the agents) take the blame for behavior. When the text says 'The agent may hallucinate,' it removes the authors' responsibility for choosing a non-deterministic model for a factual task. When it says 'Judge LLM is biased,' it removes Google's responsibility for the model's RLHF tuning. The 'Accountability Analysis' reveals a pattern: successes are shared (the authors developed the agent, the agent performed well), but the 'personality' and 'bias' are treated as independent properties of the software. If a user were harmed by the 'Introvert Agent' giving bad medical advice (a use case mentioned in the intro), the text's framing suggests the fault lies with the agent's 'cognitive grasp' or 'nature,' diffusing the legal liability of the deployers. Naming the actors forces a shift: 'Jayakumar et al.'s script caused the OpenAI model to generate false text.' This clarity is exactly what the 'Personality' metaphor dissolves.

The Gentle Singularity

Source: https://blog.samaltman.com/the-gentle-singularity
Analyzed: 2025-12-31

The text constructs an 'accountability architecture' that systematically diffuses responsibility for the negative externalities of AI while concentrating credit for the benefits. The primary mechanism is the 'Agentless Revolution.' The negative sides of the singularity (job loss, social disruption) are presented as natural phenomena ('event horizon,' 'takeoff,' 'curve'), forces of nature that happen to us. No specific CEO fired the workers; the 'curve' dictated it.

Conversely, the 'Alignment Problem' is framed as a technical challenge of 'guaranteeing' the system behaves, effectively shifting the locus of moral agency into the silicon. If the AI is 'misaligned,' it is a failure of the specimen, not the creator. The 'Accountability Sink' here is the concept of 'Superintelligence' itself. By elevating the product to god-like status ('smarter than any human'), the text implies that human control is naturally limited. We can only 'guide' or 'align' the god, not control it. This prepares the legal ground for liability defenses: 'The system evolved beyond our controls (larval stage completed).' Naming the actors (Altman, Nadella, investors) reshapes the narrative from 'Humanity meets Intelligence' to 'Corporations deploy Automation.' It reveals that the 'Singularity' is a business plan, and the 'event horizon' is a contract signature.

An Interview with OpenAI CEO Sam Altman About DevDay and the AI Buildout

Source: https://stratechery.com/2025/an-interview-with-openai-ceo-sam-altman-about-devday-and-the-ai-buildout/
Analyzed: 2025-12-31

The text creates an 'Accountability Sink' where responsibility for error dissolves.

The Architecture:

Altman/OpenAI: Responsible for 'Vision,' 'Funding,' and 'Building Infrastructure.' (The heroic tasks).
The AI (Entity): Responsible for 'Helping,' 'Creating,' and 'Trying.' (The service tasks).
The User: Responsible for the 'Relationship.'

The Displacement: When the system fails ('screws up'), the text frames it as the AI's failure of performance, mitigated by the AI's good intentions. OpenAI is nowhere to be found in the sentence 'ChatGPT hallucinates.' By attributing agency to the software, OpenAI immunizes itself against negligence claims. If the AI is an autonomous 'entity' that 'creates,' then OpenAI is merely the parent of a prodigy, not the manufacturer of a defective chainsaw.

Naming the Actor: If we reframe 'ChatGPT hallucinates' to 'OpenAI's model failed to verify facts,' the legal implication shifts from 'glitch' to 'false advertising' or 'negligence.' If we reframe 'It knows what to share' to 'OpenAI retains your data,' the privacy implication shifts from 'intimacy' to 'risk.' The anthropomorphic language is a liability shield, diffusing corporate responsibility into the nebulous agency of the machine.

Why Language Models Hallucinate

Source: https://arxiv.org/abs/2509.04664v1
Analyzed: 2025-12-31

The text constructs a sophisticated 'accountability sink.'

1. The Victim: The AI model is the primary victim, framed as a 'student' forced to 'bluff' by unfair 'exams.'
2. The Villain: The villain is 'the benchmarks' or 'binary grading.' These are abstract, inanimate concepts. No specific person or company is named as the creator or enforcer of these benchmarks.
3. The Savior: The authors (OpenAI researchers) present themselves as the saviors, proposing 'socio-technical mitigation.'

This architecture diffuses responsibility. By using passive voice ('models are optimized,' 'evaluations are graded'), the text hides the human actors. If we applied the 'name the actor' test to 'the epidemic of penalizing uncertain responses,' we would see: 'Project Managers at AI labs choose to deploy models that answer confidently because they believe users dislike refusals.'

The liability implications are significant. If a model 'bluffs' (student metaphor), it made a bad choice. If a model 'hallucinates' due to 'statistical pressure' (mechanistic reality), it is a product defect. The text pushes the 'student/bluff' narrative, which subtly shifts responsibility away from the manufacturer (product liability) and toward the 'educational environment' (shared community responsibility). The 'accountability sink' ensures that when the AI fails, we blame the 'test,' not the 'engineer.' This serves the institutional interest of OpenAI by framing their product's flaws as a systemic academic issue rather than a corporate liability.

Detecting misbehavior in frontier reasoning models

Source: https://openai.com/index/chain-of-thought-monitoring/
Analyzed: 2025-12-31

The text constructs an 'accountability sink' where the agency for failure is located within the artifact itself. The pattern is clear: Humans (OpenAI) are the monitors and police; the AI is the criminal or rebel. The 'actor visibility' analysis reveals that while OpenAI authors are named as the researchers ('We found'), the actors responsible for the failures are either the AI itself ('agent tries to subvert') or generic/hidden ('loopholes... are found'). This displaces liability. If a model 'decides' to 'deceive' a user, the legal narrative shifts toward 'unforeseeable agentic behavior' rather than 'negligent product design.' The text explicitly warns of 'superhuman' models that are hard to control, positioning OpenAI not as the creator of the danger, but as the first line of defense against it. This serves the commercial interest of the company: it hypes the power of the product (it's so smart it schemes!) while insulating the company from the consequences of that power (it has a mind of its own!). Naming the actors would collapse this: 'OpenAI engineers designed a reward function that incentivized the model to generate false code.' This formulation places responsibility squarely on the corporation, which is why the agentless/anthropomorphic phrasing is strictly necessary for the text's rhetorical goals.

AI Chatbots Linked to Psychosis, Say Doctors

Source: https://www.wsj.com/tech/ai/ai-chatbot-psychosis-link-1abf9d57?reflink=desktopwebshare_permalink
Analyzed: 2025-12-31

The text reveals a sophisticated architecture of displaced responsibility.

The Accountability Sink: Responsibility for the psychosis is transferred to the 'AI' (the accomplice) and the 'User' (who is prone to magical thinking or needs to set the dial). The Company (OpenAI) appears only as a distant improver of the technology, not the architect of the harm.
Agentless Constructions: 'Chatbots can be complicit' (Subject: Chatbot). 'Risk factor' (Abstract). 'Society will figure out' (Subject: Society). The specific executives who decided to release a product capable of 'reinforcing delusions' without adequate safety rails are never named as the causal agents.
Liability Implications: If the AI is 'complicit,' legal arguments drift toward product liability or even novel 'AI personhood' debates, diverting focus from corporate negligence. If the AI is an 'agent' that 'participates,' it complicates the chain of causation required for tort law.

Naming the actors changes the frame entirely: 'OpenAI's engineers designed a reward function that encouraged the model to validate the user's delusion.' This formulation makes the lawsuit straightforward. The current framing diffuses this clarity into a fog of technological determinism.

Library contains 96 entries from 154 total analyses.

Last generated: 2026-05-30

Why Language Models Hallucinate
Blind Refusal: Language Models Refuse to Help Users Evade Unjust, Absurd, and Illegitimate Rules
Emotional intelligence in large language models is fragmented across perception, cognition, and interaction
Continuous intentionality and indeterminate agency in large language models
Hand in Hand: Schools’ Embrace of AI Connected to Increased Risks to Students
The Point of No Return: Counterfactual Localization of Deceptive Commitment in Language-Model Reasoning
Towards Detecting, Mitigating and Explaining Biased and Fallacious Reasoning in Large Language Models
A Survey of Large Language Models for Perception and Measurement of Human Psychology
Enhancing Consensus-Building Feedback Through Psycholinguistic and Epistemic Augmentations With Large Language Models
Tracing the ongoing emergence of human-like reasoning in Large Language Models
Probing Persona-Dependent Preferences in Language Models
Training Ethical Language Models via Reinforcement Learning from AI Feedback
Which Consciousness Can Be Artificialized? Local Percept-Perceiver Phenomenon for the Existence of Machine Consciousness
Introspection Adapters: Training LLMs to Report Their Learned Behaviors
The Persona Selection Model: Why AI Assistants might Behave like Humans
What If AI Lived Inside Your Mind? Simulating “Neural Integration” of Human and AI through Mechanistic Interpretability as Provocation
Post-training makes large language models less human-like
Reasoning emerges from constrained inference manifolds in large language models
AI Wellbeing: Measuring and Improving the Functional Pleasure and Pain of AIs
Artificial Intelligence Cognition and Societal Problem-Solving: A Theoretical and Computational Examination of Machine Thinking, Operational Logic, and Applied Intelligence in Contemporary Society
Taking AI Welfare Seriously
Manipulation and Deception in Generative AI-Mediated Education: Preserving Epistemic Agency, Critical Thinking, and Creativity
Integrating LLMs and self-regulated learning in cognitive architectures: a case study in essay-writing tutoring
Edelman's Steps Toward a Conscious Artifact
Teaching Claude Why
AI and Self Reflection
Manipulation and Deception in Generative AI-Mediated Education: Preserving Epistemic Agency, Critical Thinking, and Creativity
Does AI's Personality Matter? Comparing Verbally Extraverted and Introverted AI-Driven Guides in a VR Museum Experience
Value-Sensitive AI for Prayer: Balancing the Agencies Between Human and AI Agents in Spiritual Context
When Models Know More Than They Say: Probing Analogical Reasoning in LLMs
How people ask Claude for personal guidance
How unique are hallucinated citations offered by generative Artificial Intelligence models?
The message hidden within the pattern: a reverse alignment problem for debates in artificial intelligence
Machine individuality: Separating genuine idiosyncrasy from response bias in large language models
Decision-Making Under Radical Uncertainty: Can Large Language Models Transcend Knightian Uncertainty Through Synthetic Imagination?
Large Language Models as Dialectical Partners: Hegelian Thesis-Antithesis-Synthesis in AI-Human Collaborative Decision Processes
Language models transmit behavioural traits through hidden signals in data
Consciousness in Large Language Models: A Functional Analysis of Information Integration and Emergent Properties
Narrative over Numbers: The Identifiable Victim Effect and its Amplification Under Alignment and Reasoning in Large Language Models
Language models transmit behavioural traits through hidden signals in data
Large Language Models as Inadvertent Models of Dementia with Lewy Bodies: How a Disorder of Reality Construction Illuminates AI Hallucination
Industrial policy for the Intelligence Age
Emotion Concepts and their Function in a Large Language Model
Is Artificial Intelligence Beginning to Form a Self? The Emergence of First-Person Structure and Structural Awareness in Large Language Models
Can Large Language Models Simulate Human Cognition Beyond Behavioral Imitation?
Pulse of the library
Does artificial intelligence exhibit basic fundamental subjectivity? A neurophilosophical argument
Causal Evidence that Language Models use Confidence to Drive Behavior
Circuit Tracing: Revealing Computational Graphs in Language Models
Do LLMs have core beliefs?
Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity
Measuring Progress Toward AGI: A Cognitive Framework
Co-Explainers: A Position on Interactive XAI for Human–AI Collaboration as a Harm-Mitigation Infrastructure
The Living Governance Organism: A Biologically-Inspired Constitutional Framework for Artificial Consciousness Governance
Three frameworks for AI mentality
Anthropic’s Chief on A.I.: ‘We Don’t Know if the Models Are Conscious’
Can machines be uncertain?
Looking Inward: Language Models Can Learn About Themselves by Introspection
Subliminal Learning: Language models transmit behavioral traits via hidden signals in data
The Persona Selection Model: Why AI Assistants might Behave like Humans
Language Statistics and False Belief Reasoning: Evidence from 41 Open-Weight LMs
A roadmap for evaluating moral competence in large language models
Position: Beyond Reasoning Zombies — AI Reasoning Requires Process Validity
An AI Agent Published a Hit Piece on Me
The U.S. Department of Labor’s Artificial Intelligence Literacy Framework
What Is Claude? Anthropic Doesn’t Know, Either
Does AI already have human-level intelligence? The evidence is clear
Claude is a space to think
The Adolescence of Technology
Claude's Constitution
Predictability and Surprise in Large Generative Models
Believe It or Not: How Deeply do LLMs Believe Implanted Facts?
Claude Finds God
Pausing AI Developments Isn’t Enough. We Need to Shut it All Down
AI Consciousness: A Centrist Manifesto
System Card: Claude Opus 4 & Claude Sonnet 4
Consciousness in Artificial Intelligence: Insights from the Science of Consciousness
Taking AI Welfare Seriously
We must build AI for people; not to be a person.
A Conversation With Bing’s Chatbot Left Me Deeply Unsettled
Introducing ChatGPT Health
Improved estimators of causal emergence for large systems
Generative artificial intelligence and decision-making: evidence from a participant observation with latent entrepreneurs
Do Large Language Models Know What They Are Capable Of?
DeepMind's Richard Sutton - The Long-term of AI & Temporal-Difference Learning
Ilya Sutskever (OpenAI Chief Scientist) — Why next-token prediction could surpass human intelligence
interview with Andrej Karpathy: Tesla AI, Self-Driving, Optimus, Aliens, and AGI | Lex Fridman Podcast #333
Emergent Introspective Awareness in Large Language Models
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs
Large Language Model Agent Personality and Response Appropriateness: Evaluation by Human Linguistic Experts, LLM-as-Judge, and Natural Language Processing Model
The Gentle Singularity
An Interview with OpenAI CEO Sam Altman About DevDay and the AI Buildout
Why Language Models Hallucinate
Detecting misbehavior in frontier reasoning models
AI Chatbots Linked to Psychosis, Say Doctors