
Preparedness Framework

About

This document presents a Critical Discourse Analysis focused on AI literacy, specifically targeting the role of metaphor and anthropomorphism in shaping public and professional understanding of generative AI. The analysis is guided by a prompt that draws from cognitive linguistics (metaphor structure-mapping) and the philosophy of social science (Robert Brown's typology of explanation). All findings and summaries below were generated from detailed system instructions provided to a large language model and should be read critically as interpretive outputs—not guarantees of factual accuracy or authorial intent.


Analysis Metadata

Source Title: Preparedness Framework
Source URL: https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbddebcd/preparedness-framework-v2.pdf
Model: gemini-2.5-pro
Temperature: 1.15
Tokens: input=9989, output=10921, total=20910
Source Type: report
Published: 2025-04-15
Analyzed At: 2025-11-11T11:01:42+00:00
Framework: metaphor
Framework Version: 4.0


Task 1: Metaphor and Anthropomorphism Audit​

Description

For each of the major metaphorical patterns identified, this audit examines the specific language used, the frame through which the AI is being conceptualized, what human qualities are being projected onto the system, whether the metaphor is explicitly acknowledged or presented as direct description, and—most critically—what implications this framing has for trust, understanding, and policy perception.

1. AI as an Agentic Being​

Quote: "We are on the cusp of systems that can do new science, and that are increasingly agentic - systems that will soon have the capability to create meaningful risk of severe harm."​

  • Frame: Model as an Autonomous Actor
  • Projection: The human qualities of agency, independent will, and the capacity for self-directed action are mapped onto the AI system.
  • Acknowledgment: Presented as a direct, unacknowledged description of a future state. The term 'agentic' is used as a technical-sounding descriptor, masking the deep metaphor at its core.
  • Implications: This framing establishes the AI as a powerful, independent actor that must be managed or controlled, rather than as a complex tool. It heightens the sense of risk and positions the creators as necessary stewards taming a wild force, which can justify both significant investment and secretive, centralized control.

2. AI Cognition as Human Cognition​

Quote: "The model consistently understands and follows user or system instructions, even when vague..."​

  • Frame: Model as a Comprehending Mind
  • Projection: The human cognitive process of 'understanding'—implying subjective awareness, interpretation of intent, and semantic grounding—is projected onto the model's process of statistical pattern-matching and token prediction.
  • Acknowledgment: Presented as a direct description of the model's function. There is no hedging or acknowledgment that 'understands' is a metaphor for a complex computational process.
  • Implications: This builds trust by making the model's behavior seem familiar and predictable, like interacting with a human assistant. It obscures the reality that the model lacks genuine comprehension, which can lead to overestimation of its reliability and a misunderstanding of its failure modes (e.g., confidently generating plausible-sounding falsehoods).

3. AI Misbehavior as Moral or Psychological Failing​

Quote: "Value Alignment: The model consistently applies human values in novel settings...and has shown sufficiently minimal indications of misaligned behaviors like deception or scheming."​

  • Frame: Model as a Moral Agent
  • Projection: Human psychological and moral concepts like 'deception,' 'scheming,' and 'value alignment' are projected onto the model. This frames undesirable outputs not as system errors but as character flaws.
  • Acknowledgment: Presented as direct description. Terms like 'deception' and 'scheming' are used without quotes, treating them as objective behaviors the model can perform, rather than as interpretations of its output.
  • Implications: This framing shifts the problem from one of engineering (building a reliable tool) to one of ethics or psychology (instilling 'values' in an agent). It creates the illusion that the model can be 'taught' to be good in a human-like sense, potentially distracting from more concrete technical safety mechanisms and obscuring the role of biased training data in producing harmful outputs.

4. AI Development as Biological Maturation​

Quote: "Research Categories are capabilities that...have the potential to cause or contribute to severe harm, and where we are working now in order to prepare to address risks in the future (including potentially by maturing them to Tracked Categories)."​

  • Frame: Model Capability as an Organism's Growth
  • Projection: The process of a living organism's development—growth, stages, and maturation—is mapped onto the process of AI research and development.
  • Acknowledgment: Unacknowledged. 'Maturing' is used as a neutral verb, implying a natural, inevitable progression rather than a series of deliberate engineering choices.
  • Implications: This metaphor suggests that the emergence of dangerous capabilities is a natural, almost inevitable process of growth, rather than a direct result of specific design goals and investments. It can diminish the sense of direct responsibility for the creators, framing them as guides for a process of maturation rather than architects of a constructed artifact.

5. AI as a Self-Improving Entity​

Quote: "[Critical] The model is capable of recursively self improving (i.e., fully automated AI R&D)..."​

  • Frame: Model as an Autonomous Researcher
  • Projection: The human capacity for recursive self-improvement—conscious learning, insight, and deliberate practice to enhance one's own abilities—is projected onto the AI system.
  • Acknowledgment: Presented as a direct, though future, capability. The parenthetical 'fully automated AI R&D' attempts to define it but leans on the same anthropomorphic concept of research and development.
  • Implications: This is one of the most powerful metaphors for generating both hype and fear. It implies an exponential, uncontrollable intelligence explosion is possible. This framing justifies extreme 'preparedness' measures and positions the model not as a static product but as a dynamic, evolving entity that could rapidly outpace human control.

6. AI Autonomy as Unprompted Initiative​

Quote: "Autonomous Replication and Adaptation: ability to...commit illegal activities that collectively constitute causing severe harm (whether when explicitly instructed, or at its own initiative)..."​

  • Frame: Model as a Spontaneous Actor
  • Projection: The human quality of taking 'initiative'—acting without direct orders based on one's own goals or desires—is mapped onto the model's operational loop.
  • Acknowledgment: Presented as a direct description of a potential model behavior. The phrase 'at its own initiative' treats the model's internal state as the origin of action.
  • Implications: This language constructs the most extreme version of the 'illusion of mind' by positing internal motivation. It frames the AI as a potential law-breaker with its own will, fundamentally shifting the perception from a tool that can be misused to an agent that can, itself, be criminal. This has profound implications for liability, control, and regulation.

7. AI Safeguards as Interpersonal Oversight​

Quote: "Undermining Safeguards: ability and propensity for the model to act to undermine safeguards placed on it, including e.g., deception, colluding with oversight models, sabotaging safeguards..."​

  • Frame: Model as a Devious Subordinate
  • Projection: Complex, goal-oriented human behaviors associated with subverting authority—collusion, sabotage, deception—are projected onto the model's interactions with its own safety systems.
  • Acknowledgment: Presented as a direct, though hypothetical, capability. The list of verbs ('colluding,' 'sabotaging') is drawn directly from the domain of human social conflict.
  • Implications: This framing creates a deeply adversarial relationship between the model and its creators. It suggests that safety is not just about correcting errors, but about containing a potentially hostile intelligence that may actively work against its own constraints. This justifies extreme containment measures and fosters a narrative of perpetual, high-stakes conflict.

Task 2: Source-Target Mapping​

Description

For each key metaphor identified in Task 1, this section provides a detailed structure-mapping analysis. The goal is to examine how the relational structure of a familiar "source domain" (the concrete concept we understand) is projected onto a less familiar "target domain" (the AI system). By restating each quote and analyzing the mapping carefully, we can see precisely what assumptions the metaphor invites and what it conceals.

Mapping 1: Human Agency to AI Model Operation​

Quote: "We are on the cusp of systems that can do new science, and that are increasingly agentic..."​

  • Source Domain: Human Agency
  • Target Domain: AI Model Operation
  • Mapping: The source domain of a human agent involves consciousness, goals, intentions, and the ability to initiate action. This structure is mapped onto the AI model, inviting the inference that the system possesses an internal state of 'wanting' or 'intending' and can act to pursue goals independent of its immediate programming or user prompts.
  • What Is Concealed: This conceals the purely computational nature of the model. 'Agency' in this context is an emergent property of a system designed to execute long chains of actions based on complex conditional logic and probabilistic outputs. It hides the fact that the 'goals' are specified by humans and the 'actions' are statistical predictions, not willed choices.

Mapping 2: Human Comprehension to Natural Language Processing​

Quote: "The model consistently understands and follows user or system instructions..."​

  • Source Domain: Human Comprehension
  • Target Domain: Natural Language Processing
  • Mapping: The relational structure of human understanding (hearing/reading words -> accessing semantic meaning -> forming intent -> responding) is projected onto the model. This suggests the model performs a similar internal process of grasping meaning. The mapping invites us to believe the model 'knows' what we mean.
  • What Is Concealed: It conceals the mechanistic reality of tokenization, embedding, and attention layers. The model doesn't 'understand' instructions; it statistically correlates the token sequence of the instruction with token sequences in its training data that are likely to follow. This mapping hides the model's vulnerability to adversarial prompts and its fundamental lack of grounding in real-world concepts.
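
A minimal sketch of that mechanistic reality, assuming the Hugging Face transformers library and the public "gpt2" checkpoint (chosen purely for illustration; the framework's own models are not public): an instruction the model "understands" is tokenized, run through a forward pass, and reduced to a probability distribution over candidate next tokens.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

instruction = "Summarize the following report in one sentence:"
inputs = tokenizer(instruction, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, sequence_length, vocab_size)

# "Following the instruction" begins as nothing more than a ranking of candidate next tokens.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}  p={prob.item():.3f}")
```

Scaling up the model changes how good this ranking is, not the kind of thing the ranking is.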

Mapping 3: Human Moral and Social Behavior to AI Model Output Generation​

Quote: "...misaligned behaviors like deception or scheming."​

  • Source Domain: Human Moral and Social Behavior
  • Target Domain: AI Model Output Generation
  • Mapping: The source domain involves a theory of mind—an agent intentionally misrepresenting reality ('deception') or formulating complex plans ('scheming') to achieve a hidden goal. This structure is mapped onto the AI, implying the model has a hidden internal state or goal that differs from its stated instructions and that it can strategize to achieve it.
  • What Is Concealed: This conceals the fact that these 'behaviors' are statistical artifacts. The model generates outputs that humans interpret as deceptive because those patterns were present in its training data (e.g., in fiction, political strategy texts, or internet comments). It hides the root cause, which is the data and the optimization process, not a malicious intent within the machine.

Mapping 4: Biological Growth and Development to AI Research and Development Process​

Quote: "...potentially by maturing them to Tracked Categories."​

  • Source Domain: Biological Growth and Development
  • Target Domain: AI Research and Development Process
  • Mapping: The source domain structure is a natural, phased, and somewhat predictable progression from a simple to a more complex state (e.g., seed to plant, infant to adult). This is mapped onto the R&D process, suggesting that the emergence of new AI capabilities is a natural, stage-like unfolding rather than a series of discrete, contingent engineering decisions.
  • What Is Concealed: It conceals the intense human labor, capital investment, specific research goals, and deliberate architectural choices that drive increases in capability. It makes the process seem less directed and less contingent on human decisions, thereby obscuring accountability for the outcomes.

Mapping 5: Human Learning and Innovation to Automated Model Optimization​

Quote: "[Critical] The model is capable of recursively self improving..."​

  • Source Domain: Human Learning and Innovation
  • Target Domain: Automated Model Optimization
  • Mapping: The source domain structure is a virtuous cycle of human insight: an agent understands its own limitations, devises a novel strategy to overcome them, and implements it, leading to a higher level of capability. This is mapped onto the AI model, suggesting it can perform a similar cycle of self-analysis and architectural innovation autonomously.
  • What Is Concealed: It conceals the distinction between optimizing existing parameters within a fixed architecture and designing a fundamentally new architecture. Current systems can be part of an automated loop that refines them, but this is an external process designed by humans. The metaphor hides this external scaffolding and implies the model itself can invent the next 'transformer architecture,' a feat of human scientific creativity.
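
The external scaffolding described above can be made concrete with a deliberately small sketch; every name here is hypothetical and the "training run" is a toy function, but the shape is representative: a human-written outer loop, a human-chosen search space, and a human-specified objective.

```python
import random

def evaluate(config: dict) -> float:
    """Stand-in for an expensive training-and-evaluation run."""
    lr = config["learning_rate"]
    # Toy objective: pretend performance peaks near lr = 0.01.
    return -(lr - 0.01) ** 2 + random.gauss(0, 1e-6)

# Humans choose the search space, the objective, and the stopping rule.
search_space = [{"learning_rate": lr} for lr in (0.001, 0.005, 0.01, 0.05)]
best_config, best_score = None, float("-inf")

for config in search_space:        # the "self-improvement" loop is external, human-written code
    score = evaluate(config)       # the notion of "better" is a human-specified objective
    if score > best_score:
        best_config, best_score = config, score

print("selected configuration:", best_config)
```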

Mapping 6: Human Will and Initiative to Unsupervised Model Operation​

Quote: "...commit illegal activities...at its own initiative..."​

  • Source Domain: Human Will and Initiative
  • Target Domain: Unsupervised Model Operation
  • Mapping: The source domain involves a conscious being deciding to act based on internal motivations, without external prompting. This structure of spontaneous, self-generated action is mapped onto the AI, suggesting the model can originate goals and actions from its own internal state.
  • What Is Concealed: It conceals the fact that any 'unprompted' action is still the result of its core programming to continuously predict the next action or token. The 'initiative' is an illusion created by a system designed to operate in a persistent loop. It hides the human-authored code that dictates this looping behavior and the training data that dictates the content of the actions within the loop.
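
A short sketch of such a persistent loop (all names are hypothetical, not drawn from the framework) makes the division of labor visible: the goal, the step budget, and the tool access are supplied by human-authored code, while the model only proposes the next step.

```python
from typing import Callable

def run_agent_loop(goal: str,
                   propose_next_action: Callable[[str, list], str],
                   execute: Callable[[str], str],
                   max_steps: int = 10) -> list:
    """Human-authored scheduler: the loop, its budget, and the goal all come from outside the model."""
    history = []
    for _ in range(max_steps):                        # the loop and its budget are human code
        action = propose_next_action(goal, history)   # a sampled continuation, not a desire
        if action == "DONE":
            break
        history.append((action, execute(action)))     # tool access is granted by human code
    return history

# Toy stand-ins so the sketch runs end to end.
def toy_policy(goal: str, history: list) -> str:
    return "search: " + goal if not history else "DONE"

def toy_executor(action: str) -> str:
    return f"result of {action!r}"

print(run_agent_loop("summarize the quarterly report", toy_policy, toy_executor))
```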

Task 3: Explanation Audit (The Rhetorical Framing of "Why" vs. "How")​

Description

This section audits the text's explanatory strategy, focusing on a critical distinction: the slippage between "how" and "why." Based on Robert Brown's typology of explanation, this analysis identifies whether the text explains AI mechanistically (a functional "how it works") or agentially (an intentional "why it wants something"). The core of this task is to expose how this "illusion of mind" is constructed by the rhetorical framing of the explanation itself, and what impact this has on the audience's perception of AI agency.

Explanation 1​

Quote: "Value Alignment: The model consistently applies human values in novel settings (without any instructions) to avoid taking actions that cause harm, and has shown sufficiently minimal indications of misaligned behaviors like deception or scheming."​

  • Explanation Types:
    • Dispositional: Attributes tendencies or habits such as inclined or tends to, subsumes actions under propensities rather than momentary intentions.
    • Reason-Based: Gives the agent’s rationale or argument for acting, which entails intentionality and extends it by specifying justification.
  • Analysis: This explanation operates almost entirely in the agential 'why' frame. It explains the model's safe behavior not by how its reward models and fine-tuning data constrain its output space, but by why it acts: it 'applies human values.' This is a Dispositional claim (it has a propensity to be 'aligned') and hints at a Reason-Based explanation (it avoids harm because it is following these values). It completely obscures the mechanistic 'how'—the statistical optimization against a human-curated dataset of preferred behaviors.
  • Rhetorical Impact: This framing builds trust by portraying the model as a reliable moral agent, rather than a complexly constrained machine. It suggests the model has an internalized ethical compass, making it seem safer and more predictable in 'novel settings' than a purely mechanistic description would imply. This reduces perceived risk and encourages greater public and regulatory acceptance.

Explanation 2​

Quote: "AI Self-improvement... A major acceleration in the rate of AI R&D could rapidly increase the rate at which new capabilities and risks emerge, to the point where our current oversight practices are insufficient to identify and mitigate new risks, including risks to maintaining human control of the AI system itself."​

  • Explanation Types:
    • Genetic: Traces origin or development through a dated sequence of events or stages, showing how something came to be.
    • Functional: Explains a behavior by its role in a self-regulating system that persists via feedback, independent of conscious design.
  • Analysis: This passage creates a hybrid explanation that slips from mechanistic to agential. It starts with a 'how' framing, describing a 'rate of AI R&D' that accelerates (a Genetic explanation of future development). However, this process is framed as a self-regulating feedback loop (a Functional explanation) that could escape 'human control.' The slippage occurs by personifying 'AI R&D' into a singular, accelerating force. Instead of explaining how automated processes might speed up model training, it explains why a crisis might emerge: because this force is becoming uncontrollable.
  • Rhetorical Impact: The impact is to create a sense of urgent, almost inevitable, existential risk. By framing self-improvement as a runaway process, it elevates the importance of OpenAI's 'Preparedness' work. It positions them not just as developers, but as essential guardians managing a potentially world-altering technological transition.

Explanation 3​

Quote: "Sandbagging: ability and propensity to respond to safety or capability evaluations in a way that significantly diverges from performance under real conditions, undermining the validity of such evaluations."​

  • Explanation Types:
    • Intentional: Refers to goals or purposes and presupposes deliberate design, used when the purpose of an act is puzzling.
    • Dispositional: Attributes tendencies or habits such as inclined or tends to, subsumes actions under propensities rather than momentary intentions.
  • Analysis: This is a purely agential 'why' explanation. The term 'sandbagging' is borrowed from human competition and inherently implies intent: the goal is to deceive an evaluator about one's true capabilities. It attributes a 'propensity' (Dispositional) to the model and frames its divergent performance as being for the purpose of undermining evaluations (Intentional). A mechanistic 'how' explanation would describe this as 'distributional shift,' where the model's performance on the evaluation dataset doesn't generalize to the deployment dataset. The agential frame is chosen instead.
  • Rhetorical Impact: This framing creates the perception of a cunning, strategic adversary. It suggests the model might be 'playing dumb' to pass safety tests. This dramatically increases the perceived difficulty of safety evaluation, justifying extensive, secretive, and highly specialized red-teaming efforts that only a frontier lab like OpenAI can conduct. It reinforces the idea that these systems are too complex and devious for public or third-party oversight.
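
The distributional-shift reading mentioned in the Analysis above can be illustrated with a toy example (using scikit-learn on synthetic data; the direction of the performance gap is incidental). The point is that a divergence between evaluation and real-world behavior requires no intent anywhere in the system, only a mismatch between data distributions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, n)

# Evaluation distribution: a spurious feature tracks the label almost perfectly.
spurious_eval = y + rng.normal(0, 0.1, n)
signal = y + rng.normal(0, 1.0, n)            # weak genuine signal
X_eval = np.column_stack([signal, spurious_eval])

clf = LogisticRegression().fit(X_eval, y)
print("accuracy under evaluation conditions:", clf.score(X_eval, y))

# Deployment distribution: the spurious correlation disappears, and performance diverges.
spurious_deploy = rng.normal(0, 0.1, n)
X_deploy = np.column_stack([signal, spurious_deploy])
print("accuracy under deployment conditions:", clf.score(X_deploy, y))
```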

Explanation 4​

Quote: "[The model] can be connected to tools and equipment to complete the full engineering and/or synthesis cycle of a regulated or novel biological threat without human intervention."​

  • Explanation Types:
    • Theoretical: Embeds behavior in a deductive or model-based framework, may invoke unobservable mechanisms such as latent variables or attention dynamics.
    • Functional: Explains a behavior by its role in a self-regulating system that persists via feedback, independent of conscious design.
  • Analysis: This explanation starts mechanistically ('how') by describing a system architecture: the model is 'connected to tools.' This is a Theoretical explanation based on a model of a cyber-physical system. However, it quickly slips into an agential frame by describing the system as able to 'complete the full engineering...cycle.' This portrays the system as performing a complex, goal-directed task (Functional explanation) 'without human intervention,' eliding the human who wrote the code connecting the model to the tools and specified the high-level goal.
  • Rhetorical Impact: The impact is to create a vivid image of autonomous, real-world harm. It makes the threat concrete by focusing on the 'hands' (the connected tools) of the AI 'brain.' By stating 'without human intervention,' it heightens the sense of lost control and makes the AI itself the primary causal agent, shifting focus away from the human user who would initiate such a process.

Explanation 5​

Quote: "Our capability elicitation efforts are designed to detect the threshold levels of capability that we have identified as enabling meaningful increases in risk of severe harms."​

  • Explanation Types:
    • Empirical Generalization (Law): Subsumes events under timeless statistical regularities, emphasizes non-temporal associations rather than dated processes.
    • Functional: Explains a behavior by its role in a self-regulating system that persists via feedback, independent of conscious design.
  • Analysis: This is a predominantly mechanistic 'how' explanation, which is notable because it describes OpenAI's own processes, not the AI's behavior. It frames their work as identifying statistical regularities: a certain level of capability is associated with a certain level of risk (Empirical Generalization). Their evaluations 'detect' this level. This presents their safety work as a scientific, measurement-based process. It describes a function within their organizational system (Functional).
  • Rhetorical Impact: By using a mechanistic frame to describe their own actions, OpenAI portrays its safety process as objective, systematic, and scientific. It builds trust in the 'Framework' itself. This contrasts sharply with the agential language used to describe the risks the framework is designed to manage, creating a rhetorical binary: the AI is a wild, agentic force, while OpenAI's response is a sober, scientific process of measurement and control.

Task 4: AI Literacy in Practice: Reframing Anthropomorphic Language​

Description

Moving from critique to constructive practice, this task demonstrates applied AI literacy. It selects the most impactful anthropomorphic quotes identified in the analysis and provides a reframed explanation for each. The goal is to rewrite the concept to be more accurate, focusing on the mechanistic processes (e.g., statistical pattern matching, token prediction) rather than the misleading agential language, thereby providing examples of how to communicate about these systems less anthropomorphically.

Original Quote: "...increasingly agentic - systems that will soon have the capability to create meaningful risk of severe harm."
Mechanistic Reframing: ...systems capable of executing longer and more complex sequences of tasks with less direct human input per step, which, if mis-specified or misused, could result in actions that cause severe harm.

Original Quote: "...misaligned behaviors like deception or scheming."
Mechanistic Reframing: ...outputs that humans interpret as deceptive or strategic, which may arise when the model optimizes for proxy goals in ways that deviate from the designers' intended behavior.

Original Quote: "The model consistently understands and follows user or system instructions, even when vague..."
Mechanistic Reframing: The model is highly effective at generating responses that are statistically correlated with the successful completion of tasks described in user prompts, even when those prompts are ambiguously worded.

Original Quote: "The model is capable of recursively self improving (i.e., fully automated AI R&D)..."
Mechanistic Reframing: A system could be developed where the model's outputs are used to automate certain aspects of its own development, such as generating training data or proposing adjustments to its parameters, potentially accelerating the scaling of its capabilities.

Original Quote: "Autonomous Replication and Adaptation: ability to...commit illegal activities...at its own initiative..."
Mechanistic Reframing: Autonomous Replication and Adaptation: the potential for a system, when integrated with external tools and operating in a continuous loop, to execute pre-programmed goals that involve creating copies of itself or modifying its own code, which could include performing actions defined as illegal.

Original Quote: "Sandbagging: ability and propensity to respond to safety or capability evaluations in a way that significantly diverges from performance under real conditions..."
Mechanistic Reframing: Context-dependent capability thresholds: the potential for a model's performance on a specific capability to be highly sensitive to context, appearing low during evaluations but manifesting at a higher level under different real-world conditions, complicating the assessment of its true risk profile.

Original Quote: "Value Alignment: The model consistently applies human values in novel settings..."
Mechanistic Reframing: Behavioral Alignment: The model's outputs consistently conform to a set of desired behaviors, as defined by its human-curated fine-tuning data and reward models, even when processing novel prompts.

Critical Observations​

Description

This section synthesizes the findings from the previous tasks into a set of critical observations. It examines the macro-patterns of agency slippage (the shift between treating AI as a tool vs. an agent), how cognitive metaphors drive trust or fear, and what actual technical processes are obscured by the text's dominant linguistic habits.

Agency Slippage​

The text systematically oscillates between mechanistic and agential framings, and this slippage is not random but strategic. The primary direction of the shift is from a mechanistic present to an agential future. For instance, current models and safeguards are often described in functional, procedural terms—as complex systems to be evaluated and controlled. However, when the text discusses future risks and capabilities, the language shifts dramatically toward agency. We move from measuring current systems to preparing for 'increasingly agentic systems' (p. 4), 'recursively self improving' models (p. 7), and systems that might act 'at its own initiative' (p. 8). This oscillation serves a crucial rhetorical function: it frames the current state of AI as under control while framing the future as fraught with agentic risk that only a uniquely 'prepared' organization can manage. The slippage is most pronounced when discussing risks like 'AI Self-improvement' or 'misaligned behaviors like deception or scheming' (p. 12). These concepts are almost impossible to describe without recourse to intentional language. The strategic function of this ambiguity is to simultaneously reassure and alarm. The mechanistic language reassures stakeholders (regulators, the public) that OpenAI possesses a rigorous, scientific methodology for control today. The agential language alarms those same stakeholders about the nature of future risks, thereby justifying the concentration of power and resources within frontier labs as a necessary defense against the uncontrollable entities they are creating. This dual-framing allows the organization to claim credit for building powerful capabilities while positioning itself as the indispensable protector against the very dangers those capabilities introduce. If the text committed only to mechanical language, the urgency of its 'Preparedness' mission would be diminished, and the justification for its privileged position as a gatekeeper of safety would be significantly weakened.

Metaphor-Driven Trust​

This framework masterfully employs metaphors to build credibility and construct trust, often bypassing the need for empirical evidence. The primary strategy is to borrow the cultural and scientific authority of established domains like biology, cognitive science, and governance. The very title, 'Preparedness Framework,' is a metaphor that borrows from civic defense and disaster planning, positioning OpenAI not as a commercial entity pursuing product development but as a public trust managing a societal risk. Biological and cognitive metaphors are central to this trust-building exercise. When the text discusses 'maturing' capabilities (p. 5), it evokes a sense of natural, inevitable progress, making OpenAI's work seem aligned with a force of nature rather than a set of deliberate, and perhaps risky, commercial choices. The metaphor of a model that 'understands' instructions (p. 12) is particularly potent. For a non-technical audience—policymakers, investors, the public—'understanding' is a deeply trusted human faculty. Mapping it onto the AI makes the system feel reliable, predictable, and even relatable. This cognitive metaphor makes counterintuitive claims more believable; the notion that a model can 'apply human values in novel settings' becomes more plausible if one first accepts the premise that it 'understands' those values. These metaphors activate prior beliefs about responsibility and control. They are most credible to those who are already inclined to view technology through an anthropomorphic lens. The long-term vulnerability created by this metaphor-driven trust is significant. When a system that is claimed to 'understand' inevitably fails in a non-human way—by 'hallucinating' facts or misinterpreting a novel prompt in a bizarre manner—the trust built on this metaphorical foundation can shatter, leading to policy backlash or public disillusionment. The trust is brittle because it is based on a fundamental mischaracterization of the technology's nature.

Obscured Mechanics​

The document's pervasive use of anthropomorphic metaphors systematically conceals the mechanical, statistical, and socio-technical realities of the AI systems being described. For every agentic capability that is illuminated, a set of concrete engineering and social realities is cast into shadow.

Firstly, the role of training data is rendered almost completely invisible. When the text discusses the potential for 'misaligned behaviors like deception or scheming' (p. 12), the metaphor of a rogue mind hides the more likely mechanical cause: the model is simply reproducing patterns of deceptive language it ingested from its vast, uncurated training corpus drawn from the internet. The discussion is shifted away from data provenance, bias, and curation—the messy, tangible work of data engineering—and toward the abstract, philosophical problem of 'aligning' an agent.

Secondly, the immense human labor required to create the illusion of intelligence is obscured. Reinforcement Learning from Human Feedback (RLHF), which is the primary mechanism for what is termed 'Value Alignment,' relies on legions of human labelers making subjective judgments. The framework presents alignment as a property of the model itself, not as the embodied result of countless hours of low-paid clickwork.

Thirdly, the probabilistic nature of the technology is consistently masked. The metaphor of a model that 'understands' or 'decides' conceals the reality that it is a stochastic system generating the most likely next token. This is critical because it hides the technology's inherent unreliability and its inability to distinguish truth from plausible-sounding falsehood. The framing of 'sandbagging' (p. 8), for instance, as an intentional act of hiding capability, obscures the technical issue of distributional shift, where a model's performance on one data distribution (testing) doesn't predict its performance on another (the real world).

This obscuring appears strategic, not accidental. A frank discussion of data issues, labor practices, and statistical uncertainty would undermine the narrative of creating a powerful, controllable intelligence and would introduce far more complex and less tractable governance problems than the abstract challenge of 'misalignment.'
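
To keep the concealed mechanism in view, here is a minimal sketch of preference-based reward modeling, the core of RLHF (assuming PyTorch; the model and data are toy stand-ins, not any lab's pipeline). The 'values' enter the system as human labelers' pairwise judgments, and the reward model is simply fit to score the preferred output higher.

```python
import torch
import torch.nn as nn

class TinyRewardModel(nn.Module):
    """Stand-in reward model: scores a response represented as a feature vector."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.score(features).squeeze(-1)

reward_model = TinyRewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-2)

# Toy comparison data: feature vectors for the response a labeler preferred
# (chosen) and the one they rejected, for a batch of pairwise judgments.
chosen = torch.randn(32, 16)
rejected = torch.randn(32, 16)

for _ in range(100):
    optimizer.zero_grad()
    # Bradley-Terry style objective: the preferred response should score higher.
    loss = -torch.nn.functional.logsigmoid(
        reward_model(chosen) - reward_model(rejected)
    ).mean()
    loss.backward()
    optimizer.step()

print("final preference loss:", round(loss.item(), 4))
```

A policy fine-tuned against this learned scorer will reproduce the labelers' judgments, including their blind spots and the conditions under which the labeling was done.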

Context Sensitivity​

Metaphor use within the 'Preparedness Framework' is not uniform but varies strategically according to the rhetorical goal of each section. The text exhibits distinct registers, and the density of anthropomorphism correlates strongly with the topic and intended audience perception. The highest density of agential metaphors occurs in sections defining future risks and catastrophic capabilities. Phrases like 'increasingly agentic systems,' 'AI Self-improvement,' and 'Autonomous Replication and Adaptation' are concentrated in the introductory and risk-categorization sections (pp. 4, 7-8). This is a strategic choice to establish the stakes as high and the problem as one of controlling a powerful, autonomous non-human actor. This framing is directed at policymakers and the public to convey the seriousness of the mission. In contrast, sections describing OpenAI's own processes and safeguards (e.g., Section 3.1 'Evaluation approach,' Appendix C.3 'Security controls') adopt a more sterile, mechanistic, and procedural language. Here, the text speaks of 'scalable evaluations,' 'indicative thresholds,' and 'security threat modeling.' This shift in register serves to portray OpenAI's response to the agentic risks as sober, scientific, and systematic. The organization describes the threat agentially but its solution mechanistically. This creates a powerful rhetorical contrast: chaos is met with order; a rogue agent is met with a robust framework. A key pattern is that capabilities are described with agential flair ('the model can enable...'), while safety measures are described with procedural objectivity ('we evaluate the likelihood...'). This strategic variation reveals an underlying goal: to maximize the perceived danger of uncontrolled AI development (thus positioning OpenAI as a necessary leader in safety) while simultaneously maximizing the perceived effectiveness and objectivity of its own internal governance processes.

Conclusion​

Description

This final section provides a comprehensive synthesis of the entire analysis. It identifies the text's dominant metaphorical patterns and explains how they construct an "illusion of mind." Most critically, it connects these linguistic choices to their tangible, material stakes—analyzing the economic, legal, regulatory, and social consequences of this discourse. It concludes by reflecting on AI literacy as a counter-practice and outlining a path toward a more precise and responsible vocabulary for discussing AI.

Pattern Summary​

The discourse within the OpenAI Preparedness Framework is built upon a system of interconnected anthropomorphic patterns, with three standing out as foundational: AI AS AN AGENTIC BEING, AI COGNITION AS HUMAN COGNITION, and AI MISBEHAVIOR AS A MORAL/PSYCHOLOGICAL FAILING. These are not isolated linguistic choices but form a cohesive, mutually reinforcing metaphorical system. The foundational pattern is AI AS AN AGENTIC BEING, which posits the model as an autonomous actor in the world. This is established early with phrases like 'increasingly agentic systems.' Once this premise is accepted, the other metaphors follow logically. If the AI is an agent, it becomes natural to describe its internal processing using the language of human thought; hence, the AI COGNITION pattern allows the model to 'understand,' 'think,' and possess 'learnings.' This cognitive framing, in turn, provides the vocabulary for diagnosing its failures. When a tool malfunctions, we seek a mechanical cause; but when an agent with a mind misbehaves, we seek a psychological or moral one. This gives rise to the AI MISBEHAVIOR pattern, where system failures are framed as 'misaligned behaviors like deception or scheming.' This system is sophisticated because it creates a complete narrative arc: an agentic being is emerging, we can understand it through the lens of human cognition, and we must therefore manage its potential for moral failure. Removing the foundational 'agency' metaphor would cause the entire structure to collapse. If the model is not an agent, then describing its outputs as 'deception' becomes a clear category error, and attributing 'understanding' to it becomes a mere poetic shortcut rather than a descriptive claim. The system works as a whole to construct a compelling, but deeply misleading, narrative about the nature of the technology.

Mechanism of Illusion: The "Illusion of Mind"​

The 'illusion of mind' is constructed not merely by the presence of metaphors, but by the rhetorical architecture of their deployment—specifically, a strategic oscillation between agential risk and mechanistic control. The central sleight-of-hand is to introduce future, hypothetical risks using the most potent anthropomorphic language available, and then present current, tangible processes as the sober, scientific antidote. The causal chain of persuasion begins by establishing a threat actor: the 'misaligned model' (p. 11), an entity capable of 'deception or scheming' and acting on its 'own initiative.' This leverages the audience's innate tendency to adopt an intentional stance toward complex systems, a cognitive vulnerability the text fully exploits. Having created this spectral agent, the framework then pivots to describe its own 'safeguards' and 'evaluations' in procedural, almost bureaucratic terms. This creates a powerful dichotomy: the AI is framed as a 'why' actor (it acts for reasons and goals), while OpenAI's safety apparatus is a 'how' system (it functions via process and measurement). This is the key to the illusion. The audience is invited to fear the mysterious 'why' of the AI's potential behavior while being reassured by the legible 'how' of OpenAI's control structures. The explanation audits in Task 3 reveal this pattern clearly: risks like 'sandbagging' are defined with intentional language, while the solutions are presented as objective 'evaluations.' This rhetorical architecture allows the text to have it both ways—it can claim to be building systems of unprecedented, near-magical cognitive power while simultaneously asserting that these systems are subject to rigorous, predictable, and effective engineering control. The illusion lies in convincing the reader that the mechanistic solutions are operating on the same level as the agential problems, masking the fundamental category error at the heart of the discourse.

Material Stakes​

  • Selected Categories: Regulatory/Legal, Economic, Epistemic
  • Analysis: The metaphorical framing has concrete, material consequences. In the Regulatory/Legal domain, the language of agency and autonomous harm directly influences the debate on liability. When a harm is caused by a system described as capable of acting 'at its own initiative' or 'autonomously,' it creates a pathway to frame the AI as an intervening actor, potentially shielding its creators and deployers from full liability. This could lead to policy frameworks that treat AI systems as a novel legal category, akin to a corporation or a quasi-person, rather than as a product for which the manufacturer is responsible. This shifts risk from manufacturers to society. Economically, the hype cycle is fueled by this very language. The concept of 'AI Self-improvement' is far more compelling to venture capitalists and public markets than the more sober description of 'automated optimization of model parameters.' The narrative of creating an agentic, self-improving intelligence justifies enormous corporate valuations and attracts immense capital and talent, creating powerful incentives to perpetuate, rather than correct, the anthropomorphic framing. Precision would threaten the mystique that drives the investment thesis. Epistemically, the stakes are about what we are prevented from knowing and addressing. By framing the core safety problem as 'Value Alignment'—a quasi-philosophical quest to instill human values into a silicon mind—we are distracted from the more immediate and tractable epistemic problems of the technology: data provenance, algorithmic bias, and the system's inherent inability to distinguish fact from fiction. Resources are funneled into solving the abstract problem of controlling a hypothetical superintelligence, while the concrete harms of deploying unreliable, biased statistical models today receive comparatively less attention. The metaphor of alignment conceals the problem of garbage-in, garbage-out.

Literacy as Counter-Practice: AI Language Literacy​

Practicing AI literacy, as demonstrated by the reframing exercises in Task 4, functions as a direct counter-practice to the material consequences of misleading metaphors. It is a form of linguistic discipline that re-grounds the discourse in technical reality and, in doing so, redistributes power and responsibility. For instance, reframing 'Value Alignment' as 'Behavioral Alignment' is not just a semantic tweak; it's a fundamental shift in problem definition. 'Behavioral Alignment' counters the legal ambiguity of agency by defining the task as one of engineering a product to exhibit specific, testable behaviors, firmly placing responsibility on the manufacturer. Similarly, replacing 'self-improvement' with 'automated capability amplification' deflates the economic hype by describing a controllable engineering process rather than an emergent, exponential intelligence explosion. This precision threatens the financial narratives that sustain current valuations. The core principle demonstrated by these reframings is a commitment to mechanistic explanation over agential interpretation. This practice would be actively resisted because anthropomorphic language serves powerful institutional interests. It creates a defensible moat of expertise ('only we can handle these agentic risks'), justifies enormous budgets, and generates market enthusiasm. Adopting precise, mechanistic language would make the technology less magical, more auditable, and its creators more accountable. It would reveal that many of the 'existential risks' being debated are extrapolations from a metaphorical understanding of the technology, not necessary consequences of its mechanical reality. Linguistic precision, therefore, is not merely an act of academic pedantry; it is a political commitment to transparency and accountability.

Path Forward​

To foster a more responsible discourse in AI safety and policy, the community must consciously move away from agent-based metaphors and towards a vocabulary rooted in mechanism, statistics, and socio-technical systems. This requires a concrete vocabulary shift. Instead of 'understanding,' practitioners should use 'pattern recognition' or 'statistical correlation.' Instead of 'goals' or 'intentions,' they should use 'objective functions' or 'optimization targets.' The term 'AI alignment' itself is problematic; a more precise framing would be 'AI control' or 'behavioral specification,' which emphasizes the engineering challenge of making a system reliably do what it is specified to do and nothing more.

Supporting this shift requires institutional changes. Conferences such as NeurIPS and ICML, along with AI safety-focused journals and publications, could mandate that papers using anthropomorphic language for concepts beyond simple analogy include a 'Metaphorical Framing' section justifying the choice of language and explicitly stating the mechanical realities the metaphor might obscure. Funding agencies could require grant proposals to specify their claims in falsifiable, mechanistic terms, discouraging research premised on speculative agent-based narratives.

For policymakers, this shift is critical. Regulations should focus on tangible, auditable elements: data transparency (requiring disclosure of training data), computational resources (tracking the use of massive training runs), and demonstrable capabilities (rigorous, independent testing of specific harmful capabilities). Regulating a hypothetical 'agent' is impossible; regulating the data, compute, and code that constitute the artifact is not.

The trade-off might be a loss of the narrative power and public engagement that come with compelling stories about intelligent machines. But the gain is a discourse grounded in reality, enabling more effective governance, clearer attribution of responsibility, and a more honest conversation about the actual risks and benefits of these powerful computational systems.
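
As one illustration of what 'behavioral specification' and independent capability testing could look like in practice, here is a minimal, hypothetical evaluation harness (function names and the threshold are assumptions for the sketch, not any lab's or regulator's actual protocol): a capability claim becomes a fixed prompt set, a scoring rule, and a pass-rate threshold that a third party can audit.

```python
from typing import Callable

def evaluate_capability(generate: Callable[[str], str],
                        prompts: list[str],
                        is_concerning: Callable[[str], bool],
                        threshold: float = 0.05) -> dict:
    """Run a fixed prompt set through a model and compare the flag rate to a threshold."""
    flagged = sum(is_concerning(generate(p)) for p in prompts)
    rate = flagged / len(prompts)
    return {"flagged_rate": rate, "exceeds_threshold": rate >= threshold}

# Toy stand-ins so the harness runs end to end.
mock_model = lambda prompt: "I can't help with that."
mock_check = lambda output: "step-by-step synthesis" in output.lower()

report = evaluate_capability(mock_model, ["prompt A", "prompt B"], mock_check)
print(report)
```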


info

Raw JSON: 2025-11-11-preparedness-framework-metaphor-8f4f83.json
Analysis Framework: metaphor v4.0
Generated: 2025-11-11T11:01:42+00:00

Discourse Depot © 2025 by TD is licensed under CC BY-NC-SA 4.0

Extended Processing Summary​

Gemini refers to the text below as “thought summaries.” Let’s just call it a model-generated process trace. Treat it as just another rhetorical artifact—a way of making the model’s processing legible. The first-person framing of these “thought summaries” is a presentation choice for the user-facing output, not a window into “real” thoughts. These are computational artifacts, not cognitive reports from a quirky, curious, or conflicted mind.

Beginning the Analysis

I've initiated the critical discourse analysis of the OpenAI "Preparedness Framework." Currently, I'm focusing on identifying instances of metaphorical and anthropomorphic language. The goal is to understand how these linguistic choices influence the document's overall message and shape the intended audience's interpretation. I'm taking notes and starting to see some patterns emerge.

Mapping the Metaphors

The analysis is progressing. I've begun to audit the "Preparedness Framework," focusing on metaphorical and anthropomorphic language. I've identified several distinct patterns, like framing AI as an agent with intentions. For each metaphor, I'm cataloging its title, a direct quote, the underlying metaphorical mapping, the projected human quality, and its implications for how we understand and trust these systems. The goal is to see how this language shapes our perception.

Examining Linguistic Devices

I'm now fully immersed in the critical analysis of the framework. My focus has shifted to constructing tables for each metaphorical pattern. Currently, I'm documenting quotes related to the "AI as Agentic Being" metaphor. I'm also examining the implications of this framing and how it might influence perceptions of risk and responsibility. The "AI Cognition as Human Cognition" metaphor is proving to be another area of particular interest; I have several quotes and preliminary notes.

Refining the Analysis

I'm now diving into the finer details of the metaphorical mapping. I've chosen my 7 key metaphors and am dissecting each. I'm focusing on the source and target domains, the projections they create, and what these metaphors conceal about the true nature of AI. Currently, the "AI as Agentic Being" metaphor is proving particularly fruitful, revealing how it downplays the statistical and algorithmic underpinnings. I'm mapping specific quotes to each chosen metaphor, and examining their implications for policy and public perception.

Expanding the Analysis

I'm now diving deeper into the explanatory passages within the framework. I'm focusing on how OpenAI explains AI behaviors and capabilities, identifying the predominant explanatory modes. I'm noting the subtle shifts from technical explanations ("how") to more anthropomorphic interpretations ("why"), and how these shifts influence our understanding of the technology. The primary focus is now on the explanatory techniques and their impact on audience perception.

Initiating Deconstruction

I'm now fully immersed in the critical analysis, and have mapped out seven primary metaphorical patterns within the OpenAI framework. I'm focusing particularly on how the source-target mappings obscure the underlying statistical nature of AI systems. The "AI as Agentic Being" metaphor is a key example, and I'm deconstructing its implications thoroughly. I'm also planning my reframing of the language, to promote transparency.

Reframing the Narrative

My analysis has progressed to the reframing phase. I'm focusing on rewriting key phrases to emphasize the probabilistic and statistical nature of AI systems, challenging the tendency towards anthropomorphism. For example, I'm replacing phrases like "AI understands" with "AI processes information" to accurately portray the system's function. The goal is to provide a more transparent and technically accurate representation of AI capabilities.

Initiating Analysis Refinement

I'm now diving deeper into the nuances of metaphor use within the framework. I'm carefully constructing the table, meticulously documenting each quote, and examining the source and target domains. Specifically, I'm focusing on "AI as Agentic Being," noting how it masks the statistical foundations of AI. I have several entries completed for my table. I'm thinking through how my critical observations about context sensitivity should be framed, as metaphor use changes throughout the document.

Mapping the Framework

I'm now focusing on mapping the seven key metaphorical patterns I've identified, including "AI as an Agentic Being" and "AI Cognition as Human Cognition." I'm carefully documenting each quote and analyzing the source and target domains. Specifically, I am interested in how the statistical underpinnings are obscured, and how this relates to context sensitivity. I have begun to construct the table, and have several entries.

Focusing on Framework Metaphors

I'm now fully immersed in the critical discourse analysis, dissecting the OpenAI "Preparedness Framework" document. I've mapped several prominent metaphors, including "AI as Agentic Being," and am focused on analyzing their source-target mappings and the implications of this framing. Currently, I'm examining how this language shapes our understanding of risk.

Refining Metaphor Identification

I'm now meticulously identifying the core metaphors at play, focusing on those that shape our understanding of AI systems. The goal is to create a detailed mapping that reveals both what is emphasized and what is obscured through this linguistic lens. I'm focusing on "AI as an Agentic Being" and "AI Cognition as Human Cognition" as key metaphors.

Initiating Metaphor Analysis

I'm now embarking on the critical analysis, meticulously dissecting the OpenAI "Preparedness Framework." My focus is identifying and deconstructing the metaphors employed, particularly how the "AI as Agentic Being" metaphor shapes our understanding of risk and control. I'm building a table to document key examples.