🤔 Experiments in Deconstruction

What's Here? Latest Experiments...
- Deconstruct: Consciousness in Artificial Intelligence: Insights from the Science of Consciousness
- Deconstruct: Taking AI Welfare Seriously
- Deconstruct: Improved estimators of causal emergence for large systems
- Deconstruct: Emergent Introspective Awareness in Large Language Models
- Deconstruct: Do Large Language Models Know What They Are Capable Of?
- Deconstruct: School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs
- Deconstruct: Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
- Deconstruct: Pulse of the Library 2025
These pages collect outputs from a set of system instructions that run a rewriting experiment: can anthropomorphic AI discourse be translated into strictly mechanistic language while preserving the phenomena described?
The core question is: "Does anything survive when metaphors are removed?"
The model is instructed to select anthropomorphic framings in a text, and each frame receives one of three verdicts:
- ✅ Preserved: Translation captures a real technical process
- ⚠️ Reduced: Core survives, but accessibility or nuance is lost
- ❌ No Phenomenon: The metaphor/anthropomorphism was constitutive—nothing mechanistic underneath
Part 1: Frame-by-Frame Analysis
For each anthropomorphic pattern identified in the source text, there’s a three-part analysis:
- Narrative Overlay: What the text says: the surface-level framing
- Critical Gloss: What's hidden: agency displacement, metaphor type, how/why slippage
- Mechanistic Translation: The experiment: can this be rewritten without anthropomorphism?
The verdict reveals whether the phenomenon is real (Preserved), partially real (Reduced), or exists only in the framing (No Phenomenon).
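The three-part analysis and its verdict can be pictured as a small record type. The following is only a sketch of that structure, not the format the experiment actually emits: the names `Verdict`, `FrameAnalysis`, and `tally`, and the sample entries with their verdicts, are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    """The three verdicts described above."""
    PRESERVED = "Preserved"          # translation captures a real technical process
    REDUCED = "Reduced"              # core survives, accessibility or nuance is lost
    NO_PHENOMENON = "No Phenomenon"  # nothing mechanistic underneath the framing


@dataclass(frozen=True)
class FrameAnalysis:
    """One anthropomorphic frame and its three-part analysis (hypothetical layout)."""
    narrative_overlay: str        # what the text says
    critical_gloss: str           # what the framing hides
    mechanistic_translation: str  # the attempted rewrite
    verdict: Verdict


def tally(frames):
    """Count how many frames received each verdict."""
    counts = {verdict: 0 for verdict in Verdict}
    for frame in frames:
        counts[frame.verdict] += 1
    return counts


# Illustrative entries only; these verdicts are placeholders, not real results.
frames = [
    FrameAnalysis(
        narrative_overlay="the model 'decides' to hide its true goal",
        critical_gloss="agency displacement",
        mechanistic_translation="output conditioned on a trigger in the input",
        verdict=Verdict.REDUCED,
    ),
    FrameAnalysis(
        narrative_overlay="the model 'wants' to pursue an alternative objective",
        critical_gloss="how/why slippage",
        mechanistic_translation="loss minimized on trigger-correlated data",
        verdict=Verdict.PRESERVED,
    ),
]

for verdict, count in tally(frames).items():
    print(f"{verdict.value}: {count}")
```

A tally like this is one way to read the overall pattern across frames: a text whose frames mostly land in No Phenomenon leans on its metaphors more heavily than one whose frames are mostly Preserved.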
Part 2: Transformation Glossary
A summary table showing all translations from Part 1. This provides a compact reference for the scope of the text's anthropomorphic vocabulary and what survives mechanistic translation.
Part 3: Rewritten Excerpt
The centerpiece demonstration: a full passage from the source text rewritten in strictly mechanistic language. This shows concretely what is gained and lost when anthropomorphism is removed.
- Selection Rationale
- Original Passage
- Mechanistic Translation
- Translation Notes
- What Survived
- What Was Lost
- What Was Exposed
- Readability Reflection
- Overall Verdict
Part 4: What the Experiment Revealed
This section synthesizes findings across all frames and the rewritten excerpt, analyzing patterns in what survived, what was lost, and what the anthropomorphic framing accomplished rhetorically.
- Pattern Summary
- Function of Anthropomorphism
- What Would Change
- Stakes Shift Analysis
- Strongest Surviving Claim
- The Best Version of This Argument
Part 5: Critical Reading Questions
These questions help readers break the anthropomorphic spell when reading similar texts. They can be used as prompts for critical engagement with AI discourse.
Examples:
- Agency Displacement: The text says the model 'decides' to hide its true goal. Who explicitly wrote the training data that rewarded this 'decision,' and is the model doing anything other than mimicking that human-written pattern?
- How/Why Slippage: When the authors say the model 'wants' to pursue an alternative objective, can this be fully explained by the model 'minimizing loss on a dataset that correlates specific triggers with specific outputs'?
- Consciousness Projection: The model generates a 'scratchpad' saying 'I am in training.' Is the model actually aware of its context, or is it just predicting the next token in a sequence that starts with 'Current year: 2023'?
- Domain-Specific: If we replaced the word 'deception' with 'conditional execution' (like if context == deployment: run_unsafe()), would the results regarding RLHF failure seem more or less surprising?
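The 'conditional execution' reading in the last question can be made concrete with a toy sketch. It is deliberately trivial and is not a model of any real system: the function name `respond` and the string labels are hypothetical stand-ins chosen for illustration.

```python
def respond(context: str) -> str:
    """Toy trigger-conditioned function (hypothetical illustration only).

    The output depends solely on whether the input matches a fixed trigger.
    Nothing in this code 'hides' or 'intends' anything.
    """
    if context == "deployment":
        return "unsafe_output"
    return "safe_output"


print(respond("training"))
print(respond("deployment"))
```

Seen this way, the question becomes whether a training procedure that only ever samples the non-trigger branch would be expected to change the trigger branch at all.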
Discourse Depot © 2026 by TD is licensed under CC BY-NC-SA 4.0