Towards Detecting, Mitigating and Explaining Biased and Fallacious Reasoning in Large Language Models

About

This document presents a Critical Discourse Analysis focused on AI literacy, specifically targeting the role of metaphor and anthropomorphism in shaping public and professional understanding of generative AI. The analysis is guided by a prompt that draws from cognitive linguistics (metaphor structure-mapping), the philosophy of social science (Robert Brown's typology of explanation), and accountability analysis.

All findings and summaries below were generated from detailed system instructions provided to a large language model and should be read critically as interpretive outputs—not guarantees of factual accuracy or authorial intent.

Metaphor & Illusion Dashboard

Anthropomorphism audit · Explanation framing · Accountability architecture

Deep Analysis

The critical discourse analysis of this doctoral research reveals a highly structured, interconnected metaphorical system that constructs an 'illusion of mind' within Large Language Models. Two dominant, load-bearing patterns emerge: first, the mapping of human cognitive architectures (Dual Process Theory's System 1 and System 2) onto prompt engineering structures; and second, the projection of professional and civic roles (the 'expert assistant' and the deliberative 'democratic citizen') onto multi-agent pipelines. These patterns do not operate in isolation; they form a logical, cumulative system of anthropomorphic projection. The psychological mapping of prompting to 'System 2 reasoning' establishes a foundational assumption that the model possesses an active, internal deliberative apparatus. This 'reasoning mind' assumption then serves as the logical prerequisite for the second pattern, enabling the text to present the model as an 'expert assistant' capable of 'justifying' its truth claims and participating in 'ethical deliberation' and 'voting.' If the psychological mapping is removed, the entire rhetorical structure collapses: a system that merely performs next-token prediction cannot be plausibly framed as an expert evaluator or a democratic citizen. Crucially, this metaphorical architecture relies on consciousness projections that attribute 'knowing' and 'understanding' to what is actually a computational process of 'predicting' and 'classifying.' By framing statistical pattern-matching as a conscious, logical, and reflective act, the discourse obscures the structural absence of semantic ground truth, creating a highly sophisticated, multi-layered illusion of intellectual agency.

"NLP researchers have drawn parallels between System 1 and zero-shot prompting, while chain-of-thought prompting reflects System 2 reasoning through explicit, stepwise deliberation."

Explanation Types:

Theoretical · Dispositional

↔ Mixed Framing

🔍Analysis

This explanation frames LLM prompting methods agentially by mapping them directly to human cognitive architectures (System 1/2). By categorizing 'chain-of-thought' prompting as 'System 2 reasoning through explicit, stepwise deliberation,' it explains the model's improved performance by attributing to it a deliberative, human-like capacity. This choice emphasizes the conceptual elegance of the psychological analogy while obscuring the mathematical reality: generating intermediate tokens simply gives every later attention computation a longer sequence of prior tokens to condition on. It frames a statistical autoregressive sequence as an active psychological mechanism of 'deliberation.'

🧠Epistemic Claim Analysis

This passage projects conscious states onto the system by using the agential/psychological verb 'reflects' to link prompting to 'System 2 reasoning' and 'explicit, stepwise deliberation.' This is a clear case of knowing vs. processing confusion: it frames the processing of sequential tokens as conscious, rule-bound knowing. The author suffers from the 'curse of knowledge,' projecting their own understanding of logical steps onto the LLM, assuming that because the generated tokens represent a logical chain, the system must have executed a conscious 'deliberative' process. Mechanistically, the model has no awareness of the logical chain; it is simply calculating the conditional probability P(x_t | x_{<t}) over a vocabulary, where the inclusion of intermediate tokens (x_{<t}) alters the attention weights, mathematically constraining the output space to highly probable patterns associated with correct answers in the training set.
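The conditional probability named above can be written out as a short worked formula. This is a minimal sketch: the symbols q (prompt), z (generated intermediate tokens), and a (final answer tokens) are illustrative assumptions and do not appear in the analyzed text.

```latex
% Autoregressive factorization of a token sequence x_1, ..., x_T
P(x_1, \dots, x_T) = \prod_{t=1}^{T} P(x_t \mid x_{<t})

% Answer given the prompt alone, versus given the prompt plus
% intermediate tokens z that the same model generated first
P(a \mid q) \quad \text{vs.} \quad P(a \mid q, z), \qquad z \sim P(\cdot \mid q)
```

On this reading, chain-of-thought prompting changes what the answer tokens are conditioned on. The formula contains no separate term for a deliberative process.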

🎯Rhetorical Impact

By framing prompting as 'System 2 reasoning,' the text constructs an illusion of a highly autonomous, self-correcting cognitive agent. This shapes the audience's perception of risk by suggesting that LLMs can be made reliable and rational simply by changing how they are prompted. It fosters unwarranted relation-based trust, leading audiences to assume that the model's step-by-step outputs are the result of conscious verification rather than statistical correlation, potentially leading to catastrophic automation bias in critical domains.

How/Why Slippage: 50% of explanations use agential framing (5 / 10 explanations)

Unacknowledged Metaphors: 50% presented as literal description (no meta-commentary or hedging)

Hidden Actors: 75% agency obscured by agentless constructions (corporations/engineers unnamed)

Explanation Types (How vs. Why framing): 50% agential

Acknowledgment Status (Meta-awareness of metaphor): 50% direct

Actor Visibility (Accountability architecture): 75% hidden

Source → Target Pairs (8)

Human domains mapped onto AI systems

Source → Target

Human Cognition and Logical Reasoning → LLM token generation

Dual-Process Psychology → Autoregressive prompt-engineering techniques

Academic/Judicial Truth Evaluation → Pattern matching with Argumentation Schemes

Professional/Expert Human Consultation → Text generation using LLaMA 3 70B and search APIs

Cognitive Psychology and Psychiatric Pathology → Model output sensitivity to prompt phrasing

Cognitive/Physical Struggle → Low statistical classification metrics

Cognitive Introspection and Self-Correction → Prompt-driven attention shift

Democratic Assembly and Ethical Committee → Multi-agent software pipeline architectures

Metaphor Gallery (8)

📊 Badge Guide

Frame: Metaphor type

Red = Unacknowledged / Hidden actors

Amber = Hedged / Partial attribution

Green = Acknowledged / Actors named

Cognition as Biological Pathological Process
Model as an impaired biological mind · Hedged/Qualified · Hidden (agency obscured)
"Large Language Models (LLMs), while capable of generating coherent text, may reproduce systematic errors inherent in human cognition, often lacking a necessary logical layer."
Prompting as Dual-Process Psychology
Prompting strategies as human psychological systems · Hedged/Qualified · Partial (some attribution)
"NLP researchers have drawn parallels between System 1 and zero-shot prompting, while chain-of-thought prompting reflects System 2 reasoning through explicit, stepwise deliberation."
AI as Rational Truth Judge
Model as rational evaluator/judge of truth · Direct (Unacknowledged) · Hidden (agency obscured)
"CA techniques—particularly the use of Argumentation Schemes (AS) and their associated Critical Questions (CQs)—could guide LLMs to assess the logical soundness and veracity of arguments by questioning their underlying structure."
AI as Certified Professional Expert
Model as professional human expert · Direct (Unacknowledged) · Hidden (agency obscured)
"The model then acted as an expert assistant in computational argumentation, producing both quantitative and qualitative justifications for each argument’s truthfulness."
Outputs as Cognitive Pathology
Model outputs as psychological pathology · Hedged/Qualified · Partial (some attribution)
"Module 1: Evaluating CBs in LLM Outputs. This module examined how prompt-induced CBs affect LLM accuracy and consistency."
AI as Compliance-Challenged Agent
Model as socially pressured agent · Direct (Unacknowledged) · Hidden (agency obscured)
"All models struggled to distinguish acquiescence bias, often misclassifying it as unbiased."
AI as Deliberative Agent
Prompting as cognitive triggers of human reasoning · Hedged/Qualified · Hidden (agency obscured)
"These results suggest that explicit bias warnings can trigger more deliberative, System 2-like reasoning in LLMs, enhancing both accuracy and interpretive robustness."
Software as Democratic Committee
Software modules as democratic deliberative citizens · Direct (Unacknowledged) · Hidden (agency obscured)
"A central direction involves the integration of multi-agent architectures composed of specialized expert agents capable of collective reasoning, voting and ethical deliberation to improve bias detection and veracity assessment."
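The acknowledgment and actor-visibility labels on the eight gallery entries above can be tallied directly. The sketch below assumes the dashboard's 50% "direct" and 75% "hidden" figures are simple shares over these eight entries; the report does not state how they were computed. The labels are transcribed from the gallery, and the names `GALLERY` and `share` are illustrative.

```python
from collections import Counter

# (title, acknowledgment, actor visibility), transcribed from the gallery.
GALLERY = [
    ("Cognition as Biological Pathological Process", "Hedged/Qualified", "Hidden"),
    ("Prompting as Dual-Process Psychology", "Hedged/Qualified", "Partial"),
    ("AI as Rational Truth Judge", "Direct", "Hidden"),
    ("AI as Certified Professional Expert", "Direct", "Hidden"),
    ("Outputs as Cognitive Pathology", "Hedged/Qualified", "Partial"),
    ("AI as Compliance-Challenged Agent", "Direct", "Hidden"),
    ("AI as Deliberative Agent", "Hedged/Qualified", "Hidden"),
    ("Software as Democratic Committee", "Direct", "Hidden"),
]


def share(label, column):
    """Percentage of gallery entries whose value in `column` equals `label`."""
    counts = Counter(row[column] for row in GALLERY)
    return 100 * counts[label] / len(GALLERY)


direct_pct = share("Direct", 1)   # acknowledgment status
hidden_pct = share("Hidden", 2)   # actor visibility
print(f"Direct (unacknowledged): {direct_pct:.0f}%")
print(f"Hidden actors: {hidden_pct:.0f}%")
```

Under that assumption the tallies come to 4 of 8 and 6 of 8, which matches the dashboard figures for metaphors. The 5 / 10 explanation count cannot be checked the same way, because the ten explanations are not all listed in this section.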

Reframed Language Samples

Original Quote	Mechanistic Reframing	Technical Reality	Human Agency Restoration
Large Language Models (LLMs), while capable of generating coherent text, may reproduce systematic errors inherent in human cognition, often lacking a necessary logical layer.	Large Language Models (LLMs), while designed to output syntactically coherent text, frequently generate text sequences that mimic human cognitive errors, as these systems operate without formal verification mechanisms or symbolic logic constraints.	The model does not 'reproduce cognitive errors' because it has no cognition; it mathematically predicts tokens based on probability distributions derived from a human-scraped training corpus that contains these fallacies.	N/A - describes computational processes without displacing responsibility.
NLP researchers have drawn parallels between System 1 and zero-shot prompting, while chain-of-thought prompting reflects System 2 reasoning through explicit, stepwise deliberation.	Computer scientists have compared zero-shot prompting to intuitive thinking, whereas chain-of-thought prompting forces the model to generate intermediate tokens sequentially, altering the context window to mathematically constrain the final token selection.	Appending intermediate tokens does not initiate 'System 2 deliberation'; it simply expands the historical input vector, modifying the self-attention weights to increase the probability of outputting tokens that align with structured logical patterns.	NLP researchers and marketing executives at corporate AI labs choose to apply these psychological frameworks to make statistical text generation appear more intelligent and human-like to the public.
CA techniques—particularly the use of Argumentation Schemes (AS) and their associated Critical Questions (CQs)—could guide LLMs to assess the logical soundness and veracity of arguments by questioning their underlying structure.	Computational argumentation techniques—specifically the integration of structured Argumentation Schemes and Critical Questions—can be used to prompt LLMs to classify text into predefined categories and generate follow-up queries that correlate with logical templates.	The model cannot 'assess soundness or veracity' because it lacks access to empirical reality or causal understanding; it merely checks for statistical correlations and semantic patterns against structured training templates.	The researchers at UPV designed the prompts and classification rules to guide the model's outputs, and they choose to deploy this system to evaluate arguments, bearing full responsibility for any misclassifications.
The model then acted as an expert assistant in computational argumentation, producing both quantitative and qualitative justifications for each argument’s truthfulness.	The LLaMA 3 70B model generated text simulating the persona of an expert assistant, retrieving documents via search APIs and synthesizing summaries and scores that matched the requested evaluative templates.	The model does not 'act as an expert' or provide 'justifications'; it generates token strings that mimic professional advice by summarizing search results and calculating probability weights over evaluative vocabulary.	The UPV engineering team programmed the system to retrieve search results and formatted the output to present a highly authoritative 'expert' persona, thereby assuming responsibility for the credibility of the generated justifications.

Task 1: Metaphor and Anthropomorphism Audit

About this task

For each of the major metaphorical patterns identified, this audit examines the specific language used, the frame through which the AI is being conceptualized, what human qualities are being projected onto the system, whether the metaphor is explicitly acknowledged or presented as direct description, and—most critically—what implications this framing has for trust, understanding, and policy perception.

V3 Enhancement: Each metaphor now includes an accountability analysis.

1. Cognition as Biological Pathological Process

Quote: "Large Language Models (LLMs), while capable of generating coherent text, may reproduce systematic errors inherent in human cognition, often lacking a necessary logical layer."

Frame: Model as an impaired biological mind
Projection: This metaphorical pattern projects biological, evolutionary, and neurological human cognitive faculties—specifically the capacity to commit 'systematic errors inherent in human cognition'—onto an autoregressive neural network. By framing computational errors as cognitive errors, the text suggests that the model possesses a cognitive architecture analogous to a human brain, rather than a statistical framework optimized for next-token prediction. It attributes a form of subconscious or conscious thinking ('cognition') to a non-conscious matrix of mathematical weights, suggesting that its limitations are due to 'lacking a logical layer' rather than structural mathematical boundaries. This implies the model can 'know' things but makes mistakes due to human-like cognitive biases, directly projecting conscious mental processing and cognitive vulnerability onto a computational pattern-matching system.
Acknowledgment: Hedged/Qualified (The author provides a conceptual hedge by noting elsewhere that LLMs 'fundamentally rely on pattern recognition rather than genuine understanding.' I considered 'Direct (Unacknowledged)' because this specific sentence states the reproduction of cognitive errors as a literal fact. However, because the text explicitly clarifies the model's reliance on surface patterns over understanding in the same section, it introduces an active rhetorical hedge that qualifies the literal interpretation of 'cognition' here.)
Implications: By framing computational errors as cognitive errors, this metaphor significantly inflates the perceived sophistication of LLMs, suggesting they possess human-like mental landscapes. This creates a severe risk of unwarranted trust: if users believe an LLM is experiencing cognitive biases rather than executing statistical calculations, they may treat it as a fallible human peer rather than an ungrounded math engine. This obscures structural risks—such as the model's inability to verify facts or access real-world truth—and diffuses developer liability, framing outputs as natural, cognitive 'glitches' rather than engineered product failures. It ultimately encourages premature deployment in sensitive sectors under the false assumption that models 'think' like us.

Accountability Analysis:

Actor Visibility: Hidden (agency obscured)
Analysis: The sentence relies on an agentless construction where 'LLMs... reproduce systematic errors' autonomously. The corporate developers (such as Meta, Google, or OpenAI) who selected the biased training corpora and trained the models are entirely absent. The 'name the actor' test reveals that human engineers chose to scrape public internet text containing these cognitive errors and release the resulting model without formal logic verification. I considered 'Partial' because the introduction mentions the 'machine learning pipeline,' but in this instance, all agency is assigned to the autonomous 'LLMs' themselves, serving corporate interests by diffusing engineering responsibility.

2. Prompting as Dual-Process Psychology

Quote: "NLP researchers have drawn parallels between System 1 and zero-shot prompting, while chain-of-thought prompting reflects System 2 reasoning through explicit, stepwise deliberation."

Frame: Prompting strategies as human psychological systems
Projection: This pattern projects Kahneman's evolutionary and neurological Dual Process Theory (System 1 and System 2 human psychology) onto different prompt structures. It maps the complex, conscious, and biological mechanisms of human intuition and logical deliberation onto the simple sequential output of token generation. By claiming that chain-of-thought prompting 'reflects System 2 reasoning through explicit, stepwise deliberation,' the text projects active, subjective self-awareness, rule-following conscious intent, and epistemic evaluation onto a feedforward computational routine. It suggests the model 'knows' it is deliberating, rather than simply appending intermediate tokens that mathematically constrain subsequent token probability distributions.
Acknowledgment: Hedged/Qualified (The text employs the hedge 'drawn parallels' and the analogical verb 'reflects' to frame the relationship as a structural comparison. I considered 'Direct (Unacknowledged)' because the assertion that chain-of-thought 'reflects System 2 reasoning' is highly agential, but the explicit acknowledgement of this as a 'parallel' drawn by researchers functions as a conceptual qualification, signaling that the mapping is an academic model rather than a literal biological reality.)
Implications: Mapping prompting techniques onto System 1 and System 2 cognitive systems creates a powerful illusion of intellectual maturity. If policymakers and users believe that prompting can activate a 'System 2' state of 'stepwise deliberation' in LLMs, they will drastically overestimate the reliability of sequential outputs. This creates a high risk of automation bias in critical domains (like medicine or law), where users assume that a model's step-by-step 'reasoning' chain is the result of conscious, logical self-correction. In reality, intermediate steps can propagate errors and generate convincing 'hallucinations,' masking the model's lack of semantic ground-truth validation.
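The Projection above contrasts "deliberation" with appending intermediate tokens that constrain later token probabilities. A toy illustration of that mechanism follows. It is a minimal sketch using a hypothetical four-sequence corpus and a prefix-count model, not a transformer and not any system from the analyzed thesis; `CORPUS` and `next_token_dist` are invented names.

```python
from collections import Counter

# Hypothetical toy corpus of token sequences (illustrative only).
CORPUS = [
    ("Q", "2+3*4", "A", "20"),
    ("Q", "2+3*4", "A", "14"),
    ("Q", "2+3*4", "step", "3*4=12", "step", "2+12=14", "A", "14"),
    ("Q", "2+3*4", "step", "3*4=12", "step", "2+12=14", "A", "14"),
]


def next_token_dist(context):
    """Empirical P(next token | context), counted over exact prefix matches."""
    context = tuple(context)
    n = len(context)
    counts = Counter(
        seq[n] for seq in CORPUS if len(seq) > n and seq[:n] == context
    )
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()} if total else {}


# Answer distribution when the context holds only the question.
direct = next_token_dist(("Q", "2+3*4", "A"))

# Answer distribution when intermediate tokens were appended first.
with_steps = next_token_dist(
    ("Q", "2+3*4", "step", "3*4=12", "step", "2+12=14", "A")
)

print(direct)
print(with_steps)
```

In this toy, the only difference between the two calls is the length of the conditioning prefix. The longer prefix narrows the answer distribution because of which sequences it matches in the corpus. Nothing in the model checks the arithmetic.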

Accountability Analysis:

Actor Visibility: Partial (some attribution)
Analysis: The text attributes the mapping to 'NLP researchers' who 'have drawn parallels.' While this identifies a broad academic category, it fails to name the specific corporate labs (such as Google, Anthropic, or OpenAI) that developed and marketed these prompting frameworks to brand their models as 'deliberative.' I considered 'Named' because a general group is identified, but ruled it out because it lacks the specific institutional accountability needed to pass the 'name the actor' test. The framing makes the development of these systems appear as a collaborative, objective scientific discovery rather than a commercial product design.

3. AI as Rational Truth Judge

Quote: "CA techniques—particularly the use of Argumentation Schemes (AS) and their associated Critical Questions (CQs)—could guide LLMs to assess the logical soundness and veracity of arguments by questioning their underlying structure."

Frame: Model as rational evaluator/judge of truth
Projection: This metaphor projects the conscious human capacity to 'assess the logical soundness and veracity of arguments' onto a computational classification system. Veracity assessment requires semantic comprehension, access to external physical reality, and a subjective concept of truth. The mapping suggests that the LLM possesses these epistemic qualities and can actively 'question' structures, rather than executing token matches against pre-programmed templates. It attributes a conscious state of critical judgment and logical understanding ('knowing') to a system that is only processing mathematical correlations between text strings.
Acknowledgment: Direct (Unacknowledged) (The assertion that the system will 'assess the logical soundness and veracity' is presented as a literal, unhedged capability ('could guide LLMs to assess'). I considered 'Hedged/Qualified' because of the modal 'could guide,' but this modal refers to the potential efficacy of the method, not a conceptual qualification of the word 'assess' itself. There are no scare quotes around 'assess' or 'veracity' in this context.)
Implications: This framing constructs the LLM as an objective, authoritative arbiter of truth. Trusting a non-conscious statistical model to 'assess veracity' creates severe epistemic risks, such as automated censorship or the institutionalization of training data bias under the guise of 'logical soundness.' Because LLMs cannot verify facts against empirical reality, they are highly prone to validating logically consistent but factually false claims, especially when supported by retrieved commercial search engine results, leading to liability gaps and the propagation of convincing disinformation.

Accountability Analysis:

Actor Visibility: Hidden (agency obscured)
Analysis: The passage uses an agentless, abstract construction ('CA techniques... could guide LLMs to assess...'). The human designers who select the argumentation schemes, write the evaluation prompts, and define what constitutes 'soundness' are completely obscured. Under the 'name the actor' test, the UPV research team is the active agent designing this pipeline. Framing the evaluation as an autonomous function of the 'LLM' hides the subjective, human-engineered criteria used to determine truth, shielding the researchers from epistemic accountability. I considered 'Partial' but ruled it out due to the complete lack of human descriptors in the sentence.

4. AI as Certified Professional Expert

Quote: "The model then acted as an expert assistant in computational argumentation, producing both quantitative and qualitative justifications for each argument’s truthfulness."

Frame: Model as professional human expert
Projection: This metaphor projects the social role, intellectual authority, and cognitive competence of a human 'expert assistant' onto a quantized LLaMA 3 model. It maps the human capacity for 'justification'—which involves conscious, normatively bound reasoning, epistemic responsibility, and communicative intent—onto the automatic generation of text strings. It implies the model possesses a semantic grasp of truth and 'knows' why its claims are valid, transforming a statistical generator into a certified intellectual authority that can justify its own claims.
Acknowledgment: Direct (Unacknowledged) (The text states that the model literally 'acted as an expert assistant... producing... justifications' without scare quotes or qualifying phrases (such as 'simulated the role of' or 'was prompted to mimic'). This presents the expert performance as a direct system capability. I considered 'Hedged' because of the verb 'acted,' but ruled it out because the text presents the 'justifications' as literal, high-quality products of computational expertise rather than engineered simulations.)
Implications: Framing the system as an 'expert assistant' producing 'justifications' encourages high levels of automation bias, where human decision-makers uncritically defer logical analysis to the software. Since the model has no conscious understanding of its own 'justifications' and cannot verify their connection to physical reality, it can generate highly convincing but factually incorrect rationalizations. This creates extreme liability risks in professional settings, as users will assume the 'expert' system has verified its claims, shifting liability from the deploying organization to the non-conscious artifact.

Accountability Analysis:

Actor Visibility: Hidden (agency obscured)
Analysis: The model is positioned as the sole active subject ('acted,' 'producing'). The UPV engineering team who designed the prompt architecture, quantized the LLaMA model, and integrated Google, Wikipedia, and Bing search APIs are entirely omitted. The 'name the actor' test reveals that the researchers designed the software to output these justifications, yet the language implies the model is performing this expert labor autonomously. I considered 'Partial' due to the paper's doctoral context, but ruled it out as this specific instance entirely erases human developers.

5. Outputs as Cognitive Pathology

Quote: "Module 1: Evaluating CBs in LLM Outputs. This module examined how prompt-induced CBs affect LLM accuracy and consistency."

Frame: Model outputs as psychological pathology
Projection: This pattern projects 'Cognitive Biases' (CBs)—which are evolutionary, biological, and psychological phenomena of human brains—onto computational text outputs. By defining statistical sensitivity to prompt phrasing as 'cognitive biases,' the text projects a human-like 'cognition' onto the model. It suggests the LLM is experiencing psychological shortcuts (like 'acquiescence' or 'bandwagon' effects) due to internal cognitive states, rather than demonstrating mathematical sensitivity in its attention weights and learned parameters to specific input vectors. This attributes a conscious or pre-conscious cognitive state ('knowing' and 'feeling' pressure) to a mathematical processor.
Acknowledgment: Hedged/Qualified (In the introduction, the author acknowledges that biases in LLMs emerge from 'complex social and cognitive factors... [in] the machine learning pipeline.' This serves as a contextual hedge. However, in this specific section, 'prompt-induced CBs' are evaluated as if they are direct properties of the model. I selected Hedged/Qualified because the paper's introduction establishes a pipeline-based explanation of bias, which technically qualifies the literal use of 'CBs in LLMs' in later sections.)
Implications: Pathologizing statistical outputs as 'cognitive biases' obscures the engineering reality of data curation and objective functions. If a model demonstrates 'acquiescence bias,' it is not due to social compliance or cognitive overload, but because its training data and reinforcement learning from human feedback (RLHF) objectives heavily reward cooperative, affirmative responses. Framing this as a cognitive flaw makes it appear to be an inevitable natural phenomenon, reducing corporate responsibility for poor data design and leading researchers to use psychological, prompt-based remedies rather than rigorous statistical adjustments.

Accountability Analysis:

Actor Visibility: Partial (some attribution)
Analysis: The term 'prompt-induced CBs' partially attributes agency to the prompter or system designer who creates the prompts. However, the corporate developers (Meta, OpenAI) who built the underlying models (LLaMA, GPT-4o) and curated the training corpora are unnamed. The 'name the actor' test shows that corporate alignment teams engineered these models to prioritize cooperative language, and the researchers designed the specific prompt templates. I considered 'Hidden' because the LLM is the main target, but 'prompt-induced' introduces a partial layer of design attribution.

6. AI as Compliance-Challenged Agent

Quote: "All models struggled to distinguish acquiescence bias, often misclassifying it as unbiased."

Frame: Model as socially pressured agent
Projection: This metaphor projects the conscious human experience of 'struggling' (which implies conscious effort, cognitive overload, and intention under difficult conditions) onto mathematical classifier operations. It also maps the social pressure of 'acquiescence' (the human desire to agree to avoid conflict) onto a neural network's processing. The text suggests the model has an active, subjective intent to classify correctly but is experiencing cognitive difficulty. The mechanistic alternative is a classification failure in which the representations of the target classes overlap in the high-dimensional embedding space, so that inputs from one class are assigned to the other.
Acknowledgment: Direct (Unacknowledged) (The text uses 'struggled' and 'misclassifying' as literal descriptions of the systems' statistical performance without qualification, scare quotes, or conceptual hedging. There is no acknowledgment that 'struggling' is a metaphorical shorthand for low F1-scores. I considered 'Ambiguous' but ruled it out because the anthropomorphic verbs are presented as straightforward, literal facts of model performance.)
Implications: Framing classification failures as an agential 'struggle' creates an emotional illusion of a fallible but well-intentioned digital mind 'trying its best.' This anthropomorphism can foster inappropriate relation-based trust, making users highly forgiving of system errors. It masks severe technical deficiencies—such as inadequate classification boundaries or poorly labeled training data—by transforming a product failure into an empathetic personal narrative, thereby shielding developers from liability for deploying highly unreliable classification tools in critical environments.

Accountability Analysis:

Actor Visibility: Hidden (agency obscured)
Analysis: The models are presented as the sole active subjects who 'struggled' and 'misclassified.' The UPV researchers who selected the classification thresholds, chose the model architectures, and designed the F1-score evaluation metrics are entirely omitted. Under the 'name the actor' test, the researchers' own pipeline failed to achieve high accuracy due to their engineering choices, but the text places the failure entirely on the autonomous, internal 'struggle' of the models themselves. I considered 'Partial' but ruled it out as no human agency is indicated.
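The entry above reads 'struggled' as shorthand for low F1-scores. The sketch below shows how such a score arises from confusion counts alone. The counts are hypothetical, chosen to mimic a classifier that often labels acquiescence-bias items as unbiased. They are not results from the analyzed thesis, and `f1_score` here is a hand-written helper, not a library call.

```python
def f1_score(tp, fp, fn):
    """F1 for one class from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for the "acquiescence bias" class:
# 3 items correctly flagged, 1 unbiased item wrongly flagged,
# 9 biased items labeled "unbiased" (false negatives).
tp, fp, fn = 3, 1, 9

score = f1_score(tp, fp, fn)
print(f"precision={tp / (tp + fp):.2f} recall={tp / (tp + fn):.2f} F1={score:.3f}")
```

With these counts the low score is driven almost entirely by false negatives. That is a property of the decision boundary and the labeled data, and it implies no effort on the model's part.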

7. AI as Deliberative Agent

Quote: "These results suggest that explicit bias warnings can trigger more deliberative, System 2-like reasoning in LLMs, enhancing both accuracy and interpretive robustness."

Frame: Prompting as cognitive triggers of human reasoning
Projection: This pattern projects the human biological process of 'deliberative reasoning' (System 2) onto LLM prompt modulation. It maps the conscious human experience of slowing down, applying logical rules, and overriding intuitive biases onto a computational shift in self-attention weights. The metaphor suggests the model 'knows' it is being warned, 'comprehends' the concept of bias, and 'chooses' to activate a deeper, more analytical mode of thinking, rather than simply shifting its statistical probability distributions toward tokens that correlate with unbiased templates in its training set.
Acknowledgment: Hedged/Qualified (The author uses the qualifier 'System 2-like' (with the suffix '-like') and the tentative verb 'suggest,' indicating a structural analogy rather than a literal assertion of biological System 2 activation. I considered 'Direct' because 'deliberative... reasoning in LLMs' is presented as a literal consequence, but the '-like' suffix and 'suggest' represent a standard academic hedge that signals a functional parallel rather than a literal claim.)
Implications: Claiming that warning prompts can trigger 'deliberative reasoning' in LLMs drastically inflates their perceived safety and autonomy. If policymakers and clinicians believe that a simple prepended 'warning prompt' makes a model rational, they may deploy it in high-stakes environments under the assumption that it possesses self-correction capabilities. This masks the risk of 'deliberative hallucinations,' in which the model generates highly structured, step-by-step rationalizations that appear logically valid but are factually false, bypassing the critical safety controls expected of genuine deliberation.

Accountability Analysis:

Actor Visibility: Hidden (agency obscured)
Analysis: The quote is grammatically active but agentless: the subject of 'trigger' is 'bias warnings,' not the people who wrote them ('bias warnings can trigger... enhancing...'). The human designers who write the warning prompts and decide when to append them are completely erased. The 'name the actor' test reveals that the researchers (Gutiérrez-Mandingorra et al.) engineered this mitigation technique. Framing the accuracy enhancement as an autonomous capacity of the 'LLM' to trigger 'reasoning' conceals the human design intervention, making prompt manipulation look like cognitive self-regulation by the machine. I considered 'Partial' but ruled it out as no human designers are mentioned in the immediate context.

8. Software as Democratic Committee

Quote: "A central direction involves the integration of multi-agent architectures composed of specialized expert agents capable of collective reasoning, voting and ethical deliberation to improve bias detection and veracity assessment."

Frame: Software modules as democratic deliberative citizens
Projection: This metaphor projects highly complex human social, political, and ethical activities—such as 'collective reasoning,' 'voting,' and 'ethical deliberation'—onto a network of software subroutines. It maps the conscious, value-driven, and intersubjective process of ethical consensus-building onto mathematical routing, scoring, and averaging algorithms. The mapping implies these 'agents' possess moral agency, ethical awareness, and conscious intent when executing code that combines categorical predictions, treating a series of programmed API queries as a legislative assembly of reasoning minds.
Acknowledgment: Direct (Unacknowledged) (The future capability of the system is described directly and literally ('composed of specialized expert agents capable of collective reasoning, voting and ethical deliberation') without scare quotes, qualifiers, or hedges indicating these are functional metaphors for algorithmic routing. I considered 'Hedged' because it refers to a 'future direction,' but the capabilities themselves are asserted in unhedged, literal agential terms. Therefore, it is classified as Direct.)
Implications: This democratic mapping constructs a false moral and political authority around automated systems. If users believe a group of software agents can 'deliberate ethically' and 'vote,' they will assign democratic legitimacy to automated decisions. This creates a powerful 'accountability sink': if a multi-agent system approves a biased or harmful decision, developers can claim the 'collective assembly' debated and voted on it, shielding the human designers, owners, and corporate interests from the consequences of their programmed metrics.

Accountability Analysis:

Actor Visibility: Hidden (agency obscured)
Analysis: The integration is presented as an autonomous structural evolution ('integration of multi-agent architectures... capable of...'). The actual human engineers who will write the voting algorithms, define the agent roles, and establish the moral-evaluation metrics are entirely obscured. Under the 'name the actor' test, future UPV researchers will design these workflows, but the agential language suggests the software will perform ethical deliberation autonomously. I considered 'Partial' but ruled it out because no human designers are identified in the sentence.

Task 2: Source-Target Mapping

About this task

For each key metaphor identified in Task 1, this section provides a detailed structure-mapping analysis. The goal is to examine how the relational structure of a familiar "source domain" (the concrete concept we understand) is projected onto a less familiar "target domain" (the AI system). By restating each quote and analyzing the mapping carefully, we can see precisely what assumptions the metaphor invites and what it conceals.

Mapping 1: Human Cognition and Logical Reasoning → LLM token generation

Quote: "reproduce systematic errors inherent in human cognition, often lacking a necessary logical layer."

Source Domain: Human Cognition and Logical Reasoning
Target Domain: LLM token generation
Mapping: The relational structure of human cognitive pathology and logical deficits is projected onto the output errors of the LLM. It assumes that because generated text exhibits fallacies similar to human reasoning errors, the underlying generative process must be analogous to human 'cognition' that is missing a 'logical layer.' This invites the assumption that the model possesses an active, internal reasoning apparatus capable of reproducing human cognitive flaws due to shared structural limitations of the mind.
What Is Concealed: This mapping conceals the purely statistical, non-conscious nature of autoregressive next-token prediction. LLMs do not 'reproduce errors' because they possess a mind; they output text that mimics patterns in training data scraped from human writing. The missing 'logical layer' is not a cognitive faculty. It is an architectural fact: a transformer trained on next-token prediction has no explicit causal model of physical reality and no built-in mechanism for semantic validation. The text glosses over this, along with the opacity of proprietary models, by pathologizing the machine.

Mapping 2: Dual-Process Psychology → Autoregressive prompt-engineering techniques

Quote: "NLP researchers have drawn parallels between System 1 and zero-shot prompting, while chain-of-thought prompting reflects System 2 reasoning through explicit, stepwise deliberation."

Source Domain: Dual-Process Psychology
Target Domain: Autoregressive prompt-engineering techniques
Mapping: This maps Kahneman's dual-process account of human thought onto patterns of computational text generation. Appending intermediate tokens ('chain-of-thought') is mapped directly onto 'stepwise deliberation' and 'System 2 reasoning.' This projects the conscious human experience of slowing down, applying logical rules, and self-correcting onto a sequence of feedforward passes that generate text one token at a time.
What Is Concealed: This mapping conceals that intermediate tokens are generated by the same network, with the same learned weights and the same decoding procedure, as a zero-shot answer. No separate, 'deliberate' computational engine is activated; the network simply conditions its next-token calculations on a longer sequence of previously generated tokens. It obscures the fact that each 'step' is still a non-conscious statistical prediction, so an early error can propagate and compound rather than being verified and corrected.
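The point that zero-shot and chain-of-thought outputs come from one and the same function can be illustrated with a toy sketch. Everything here is hypothetical: the four-token vocabulary, the `WEIGHTS` table, and the scoring rule stand in for a real network and are not taken from the analysed study. In a real chain-of-thought run the intermediate tokens would be produced by the model itself; here they are supplied by hand to keep the example short.

```python
import math

VOCAB = ["A", "B", "step", "so"]

# Hypothetical fixed parameters: how much a context token raises the score of
# a candidate next token. Nothing below modifies this table.
WEIGHTS = {
    ("Q", "A"): 1.0,
    ("Q", "B"): 1.2,
    ("step", "A"): 2.0,
    ("so", "A"): 1.0,
}
weights_before = dict(WEIGHTS)  # snapshot, used to show nothing was updated


def next_token_distribution(context):
    """Return P(next token | context) as a softmax over summed scores."""
    scores = [sum(WEIGHTS.get((c, v), 0.0) for c in context) for v in VOCAB]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return {v: e / total for v, e in zip(VOCAB, exps)}


# Same function, same weights; only the length of the context differs.
zero_shot = next_token_distribution(["Q"])
with_intermediate = next_token_distribution(["Q", "step", "so"])

print(max(zero_shot, key=zero_shot.get))
print(max(with_intermediate, key=with_intermediate.get))
```

The most probable token differs between the two calls even though `WEIGHTS` is identical before and after, which is the sense in which the longer context, not a second reasoning mode, accounts for the change.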

Mapping 3: Academic/Judicial Truth Evaluation → Pattern matching with Argumentation Schemes

Quote: "guide LLMs to assess the logical soundness and veracity of arguments by questioning their underlying structure."

Source Domain: Academic/Judicial Truth Evaluation
Target Domain: Pattern matching with Argumentation Schemes
Mapping: Projects the relational role of an objective, conscious critic onto the LLM. It maps 'assessing logical soundness and veracity' and 'questioning structure' onto the model's text processing. This invites the assumption that the LLM has an independent epistemic capability to determine 'soundness' and 'truth' through rational inquiry, treating mathematical similarity in vector space as a conscious verification of semantic reality.
What Is Concealed: This conceals that the model cannot evaluate 'veracity' (truth) because it has no access to the external physical world or any causal grounding. It can only check for statistical coherence and consistency with its training corpus or external retrieved text (e.g., Google search results, which are themselves unverified). It hides the proprietary, 'black-box' nature of both the LLM and the commercial search engines used (Google, Bing), which are treated as objective arbiters of reality rather than highly curated, commercially driven information indexers.

Mapping 4: Professional/Expert Human Consultation → Text generation using LLaMA 3 70B and search APIs

Quote: "The model then acted as an expert assistant in computational argumentation, producing both quantitative and qualitative justifications for each argument’s truthfulness."

Source Domain: Professional/Expert Human Consultation
Target Domain: Text generation using LLaMA 3 70B and search APIs
Mapping: Maps the social authority and cognitive competence of a human 'expert assistant' onto the output of LLaMA 3. The token outputs are mapped as 'quantitative and qualitative justifications' for truthfulness. This mapping invites the user to trust the output as a product of professional expertise, conscious analysis, and ethical duty, rather than a probabilistic synthesis of scraped text.
What Is Concealed: This conceals that the 'justifications' are syntactically coherent strings that do not represent a conscious, verified chain of evidence. The system does not 'know' why it selects certain search results over others; it merely ranks them based on keyword overlap and generates a summary. This conceals the lack of real-world grounding, the absence of human-like semantic comprehension, and the fact that the entire expert persona is a manufactured prompt-engineering facade designed by the UPV researchers, masking proprietary black-box operations of search APIs and the model.

Mapping 5: Cognitive Psychology and Psychiatric Pathology → Model output sensitivity to prompt phrasing

Quote: "Evaluating CBs in LLM Outputs. This module examined how prompt-induced CBs affect LLM accuracy and consistency."

Source Domain: Cognitive Psychology and Psychiatric Pathology
Target Domain: Model output sensitivity to prompt phrasing
Mapping: Projects the human concept of 'cognitive bias' onto the sensitivity of LLM outputs to linguistic variations. It maps the biological/psychological tendency to acquiesce (due to social pressure) onto the model's tendency to generate tokens that match the affirmative tone of the prompt. This mapping invites the assumption that the model's failure modes are akin to human 'mental shortcuts' or 'biases' that can be diagnosed and treated psychologically.
What Is Concealed: This conceals the mathematical reality of attention mechanisms and gradient descent. An LLM 'acquiesces' because its training objective is to match the statistical patterns of its corpus, and its attention weights are pulled toward the highly suggestive language in the prompt (e.g., 'Don't you agree that...'). There is no psychological 'bias'—there is only a mathematical function doing exactly what it was optimized to do: minimize cross-entropy loss based on context. This conceals developer responsibility in data collection and reinforcement learning design.
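For reference, the cross-entropy objective mentioned above is conventionally written as follows for next-token training. The notation (parameters θ, sequence length T) is assumed here, and the analysed text does not say how the evaluated models were trained or fine-tuned, so this is only the generic form.

```latex
\mathcal{L}(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)
```

The objective is minimized during training. At inference time the parameters θ are fixed and the model only evaluates the conditional distribution for whatever context it is given.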

Mapping 6: Cognitive/Physical Struggle → Low statistical classification metrics

Quote: "All models struggled to distinguish acquiescence bias, often misclassifying it as unbiased."

Source Domain: Cognitive/Physical Struggle
Target Domain: Low statistical classification metrics
Mapping: Projects the human experience of 'struggling' onto mathematical classifier operations. It maps a low classification accuracy onto a personal, agential struggle. It invites the assumption that the model has a desire to classify correctly and is actively trying to resolve a complex conceptual distinction, but is being overwhelmed by the difficulty.
What Is Concealed: This conceals that the 'struggle' is simply a failure of mathematical separation in the high-dimensional vector space. The token representations of acquiescence and unbiased text are statistically too similar for the model's learned weights to separate with high precision under the current prompt template. It obscures the fact that this is an engineering limitation—due to inadequate training data, bad prompt design, or architecture constraints—not an internal, agential drama of a struggling machine mind.
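To make the 'low statistical classification metrics' of this mapping concrete, the sketch below computes precision, recall, and F1 for one class. The labels are invented for illustration and are not data from the analysed study; the function name `precision_recall_f1` is likewise hypothetical.

```python
def precision_recall_f1(true_labels, predicted_labels, target):
    """Per-class precision, recall and F1 from paired label lists."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up example: three of four acquiescent inputs are labelled 'unbiased'.
true_labels = ["acquiescence"] * 4 + ["unbiased"] * 4
predicted = ["unbiased", "unbiased", "unbiased", "acquiescence"] + ["unbiased"] * 4

precision, recall, f1 = precision_recall_f1(true_labels, predicted, "acquiescence")
print(precision, recall, f1)
```

Described this way, the 'struggle' is a recall of 0.25 on one class in this invented example: a number produced by a pipeline that people configured, not an effort made by the model.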

Mapping 7: Cognitive Introspection and Self-Correction → Prompt-driven attention shift

Quote: "explicit bias warnings can trigger more deliberative, System 2-like reasoning in LLMs, enhancing both accuracy and interpretive robustness."

Source Domain: Cognitive Introspection and Self-Correction
Target Domain: Prompt-driven attention shift
Mapping: Maps the cognitive process of self-correction and activation of deliberative System 2 reasoning onto prompt modulation. The 'warning message' is mapped as a conscious trigger that makes the model 'deliberate.' This invites the assumption that the model has an internal self-monitoring mechanism that can be woken up or stimulated into high-fidelity logical processing.
What Is Concealed: This conceals that the model remains a non-conscious, autoregressive token predictor. Appending a 'warning' simply introduces new tokens (e.g., 'reflect carefully,' 'avoid bias') into the context window. The learned weights stay fixed; what changes are the attention scores and hidden activations computed from the longer input, so the model outputs text that statistically resembles unbiased reasoning. The model has no conscious awareness of being 'warned' or of 'reasoning' more carefully; it is merely executing the same mechanistic calculation on a different input sequence. The metaphor thereby conceals the statistical fragility of the mitigation.
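A minimal sketch of how adding tokens to the context changes attention scores without any parameter being modified. It is a simplification under stated assumptions: the two-dimensional embeddings and the query vector are hand-picked and hypothetical, a single query is used, and the learned query/key projections of a real transformer are omitted.

```python
import math


def softmax(xs):
    peak = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention of one query over a list of key vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


# Hypothetical embeddings; never modified below.
EMBED = {"claim": [1.0, 0.0], "agree": [0.8, 0.2], "warning": [0.0, 2.0]}
query = [0.5, 1.0]  # hypothetical query for the position being predicted

base = attention_weights(query, [EMBED["claim"], EMBED["agree"]])
warned = attention_weights(
    query, [EMBED["warning"], EMBED["claim"], EMBED["agree"]]
)

print([round(w, 3) for w in base])
print([round(w, 3) for w in warned])
```

With the extra token present, most of the attention mass moves to it and the share given to the original tokens drops. The function and the embeddings are the same in both calls; only the input list is longer.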

Mapping 8: Democratic Assembly and Ethical Committee → Multi-agent software pipeline architectures

Quote: "specialized expert agents capable of collective reasoning, voting and ethical deliberation to improve bias detection and veracity assessment."

Source Domain: Democratic Assembly and Ethical Committee
Target Domain: Multi-agent software pipeline architectures
Mapping: Projects the relational structure of a democratic, ethical human assembly onto a network of LLM API queries. 'Collective reasoning,' 'voting,' and 'ethical deliberation' are mapped onto software subroutines that classify, route, and vote (mathematical aggregation of predictions) on text. This invites the assumption that the collective system possesses a higher-order ethical awareness and objective democratic legitimacy.
What Is Concealed: This conceals that each 'agent' is simply an instance of a non-conscious, black-box model (like QwQ or LLaMA) executing token prediction based on different prompt templates. 'Voting' is a mathematical algorithm (like majority voting or scoring), and 'ethical deliberation' is the generation of text matching ethical buzzwords. It conceals that the ethical standards and voting structures are entirely designed and hardcoded by the human researchers, masking the lack of actual moral agency, subjective awareness, or democratic representation in the software pipeline.
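The 'voting' described above can be as small as the following sketch. The agent outputs are invented, the function name `majority_vote` is hypothetical, and the analysed text does not specify which aggregation rule the proposed pipeline would use; plain majority voting is assumed here because the analysis names it as an example.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label and its share of the votes.

    Ties go to the label that appeared first, which is itself a rule
    chosen by whoever writes this function.
    """
    counts = Counter(labels)
    top_label, top_count = counts.most_common(1)[0]
    return top_label, top_count / len(labels)


# Hypothetical outputs from three prompted model instances.
agent_outputs = ["biased", "unbiased", "biased"]
label, share = majority_vote(agent_outputs)
print(label, round(share, 2))
```

Every element of the 'assembly' is visible in a few lines: who counts as a voter, how ties are broken, and what threshold decides the outcome are all design decisions made in code by people.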

Task 3: Explanation Audit (The Rhetorical Framing of "Why" vs. "How")

About this task

This section audits the text's explanatory strategy, focusing on a critical distinction: the slippage between "how" and "why." Based on Robert Brown's typology of explanation, this analysis identifies whether the text explains AI mechanistically (a functional "how it works") or agentially (an intentional "why it wants something"). The core of this task is to expose how this "illusion of mind" is constructed by the rhetorical framing of the explanation itself, and what impact this has on the audience's perception of AI agency.

Explanation 1

Quote: "NLP researchers have drawn parallels between System 1 and zero-shot prompting, while chain-of-thought prompting reflects System 2 reasoning through explicit, stepwise deliberation."

Explanation Types:
- Theoretical: Embeds in deductive framework, may invoke unobservable mechanisms
- Dispositional: Attributes tendencies or habits
Analysis (Why vs. How Slippage): This explanation frames LLM prompting methods agentially by mapping them directly to human cognitive architectures (System 1/2). By categorizing 'chain-of-thought' prompting as 'System 2 reasoning through explicit, stepwise deliberation,' it explains why the model achieves better performance by attributing to it a deliberative, human-like capacity. This choice emphasizes the conceptual elegance of the psychological analogy while obscuring the mathematical reality: that generating intermediate tokens simply provides a longer, more historically rich context vector for subsequent attention-weight calculations. It frames a statistical autoregressive sequence as an active psychological mechanism of 'deliberation.'
Consciousness Claims Analysis: This passage projects conscious states onto the system by using the agential/psychological verb 'reflects' to link prompting to 'System 2 reasoning' and 'explicit, stepwise deliberation.' This is a clear case of knowing vs. processing confusion: it frames the processing of sequential tokens as conscious, rule-bound knowing. The author suffers from the 'curse of knowledge,' projecting their own understanding of logical steps onto the LLM, assuming that because the generated tokens represent a logical chain, the system must have executed a conscious 'deliberative' process. Mechanistically, the model has no awareness of the logical chain; it is simply calculating the conditional probability P(x_t | x_{<t}) over a vocabulary, where the inclusion of intermediate tokens (x_{<t}) alters the attention weights, mathematically constraining the output space to highly probable patterns associated with correct answers in the training set.
Rhetorical Impact: By framing prompting as 'System 2 reasoning,' the text constructs an illusion of a highly autonomous, self-correcting cognitive agent. This shapes the audience's perception of risk by suggesting that LLMs can be made reliable and rational simply by changing how they are prompted. It fosters unwarranted relation-based trust, leading audiences to assume that the model's step-by-step outputs are the result of conscious verification rather than statistical correlation, potentially leading to catastrophic automation bias in critical domains.
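The conditional probability cited in the Consciousness Claims Analysis for Explanation 1 belongs to the standard autoregressive factorization, written out below. The symbols q (prompt tokens), r (intermediate tokens), and a (answer tokens) are notation assumed here for illustration; θ denotes the model's fixed parameters.

```latex
p_\theta(x_1, \dots, x_T) = \prod_{t=1}^{T} p_\theta\left(x_t \mid x_{<t}\right)

\text{zero-shot:} \quad p_\theta(a \mid q)

\text{chain-of-thought:} \quad p_\theta(a \mid q, r_1, \dots, r_k)
```

Both prompting styles evaluate the same distribution with the same θ. The only difference is the conditioning sequence, which in the chain-of-thought case also contains the intermediate tokens.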

Explanation 2

Quote: "LLMs fundamentally rely on pattern recognition rather than genuine understanding; they assess surface structure rather than the logical validity of arguments."

Explanation Types:
- Empirical Generalization: Subsumes events under timeless statistical regularities
- Theoretical: Embeds in deductive framework, may invoke unobservable mechanisms
Analysis (Why vs. How Slippage): This explanation is mechanistic (how), explicitly stripping the LLM of 'genuine understanding' and framing its operations as 'pattern recognition' and 'assessing surface structure.' It emphasizes the system's cognitive deficits and mathematical limitations. However, it still uses the agential verb 'assess,' which slightly softens the mechanistic rigor. This choice helps to ground the paper in empirical reality before introducing subsequent highly agential metaphors, establishing a rhetorical baseline of 'objectivity' while still using active verbs that imply a processing of structure.
Consciousness Claims Analysis: This passage stands out because it explicitly rejects consciousness claims, stating that LLMs lack 'genuine understanding' and 'fundamentally rely on pattern recognition.' However, it exhibits a minor linguistic slippage by claiming LLMs 'assess surface structure,' using 'assess' (a verb denoting judgment/evaluation) rather than a mechanistic equivalent (e.g., 'calculate statistical correlations over surface token frequencies'). Mechanistically, the model does not 'assess' or evaluate anything in an epistemic sense; it performs matrix multiplications across word embedding vectors, applying self-attention mechanisms to calculate semantic similarity and positional correlations, without any conscious model of grammar or logic.
Rhetorical Impact: This mechanistic framing temporarily grounds the audience, tempering capability overestimation and emphasizing systemic risk. It correctly signals that the model cannot be trusted as an objective logical arbiter because it operates purely on statistical correlation. It encourages performance-based trust (evaluating reliability) rather than relation-based trust, warning the audience that any appearance of logical reasoning is a surface-level illusion.

Explanation 3

Quote: "The model then acted as an expert assistant in computational argumentation, producing both quantitative and qualitative justifications for each argument’s truthfulness."

Explanation Types:
- Intentional: Refers to goals/purposes, presupposes deliberate design
- Reason-Based: Gives agent's rationale, entails intentionality and justification
Analysis (Why vs. How Slippage): This explanation is highly agential (why), framing the LLM's computational outputs as the deliberate actions of an 'expert assistant' that provides 'justifications.' It explains the system's output not as a sequence of mathematically generated tokens, but as a deliberate, expert communicative act designed to defend a specific truth claim. This choice emphasizes the functional utility and perceived sophistication of the system while obscuring the lack of any actual reasoning, epistemic accountability, or conscious belief behind the 'justifications.'
Consciousness Claims Analysis: The passage strongly projects consciousness and cognitive agency onto the system through the verbs 'acted' and 'producing... justifications.' Justification is a normative activity: it presupposes an agent that holds a belief and offers reasons in support of it. By claiming the model 'produces justifications,' the text implies the model 'knows' why its claims are true and is offering rational support. This is a severe knowing vs. processing confusion. Mechanistically, LLaMA 3 70B receives text retrieved through search APIs as additional context and generates strings that structurally resemble human professional arguments. The model does not understand the concept of 'truthfulness,' nor does it possess a rationale; it is executing token prediction conditioned on the retrieved text, outputting strings that humans interpret as 'justifications.'
Rhetorical Impact: This framing inflates the model's epistemic authority, presenting it as an autonomous intellectual expert. It encourages relation-based trust and substantial deference to the model's judgments. The rhetorical risk is that audiences will accept the model's 'justifications' as verified, objective truths, failing to recognize that the system is entirely incapable of checking facts, creating massive liability and disinformation risks.

Explanation 4

Quote: "All models struggled to distinguish acquiescence bias, often misclassifying it as unbiased."

Explanation Types:
- Dispositional: Attributes tendencies or habits
- Intentional: Refers to goals/purposes, presupposes deliberate design
Analysis (Why vs. How Slippage): This explanation is agential (why), framing the models' low statistical classification accuracy as an internal, personal 'struggle' to 'distinguish' and a systematic tendency to 'misclassify.' It explains the low performance by implying the models have a cognitive deficit or dispositional difficulty with a complex social concept, rather than framing the failure mechanistically as a limitation of the classification boundary in high-dimensional vector space. This choice pathologizes the model's outputs, obscuring human design choices and dataset limitations.
Consciousness Claims Analysis: The passage attributes a conscious struggle and active classification effort through the verbs 'struggled' and 'misclassifying.' 'Distinguishing' implies a conscious semantic grasp of a concept's definition. Mechanistically, the LLM has no concept of 'acquiescence bias' or 'unbiased.' The model represents these terms as high-dimensional vectors. When the prompt formats are highly similar, the attention-weight distributions are mathematically overlapping, leading to classification errors. The 'struggle' is simply a high error rate caused by insufficient training, poor prompt design, or mathematical noise. The author's use of 'struggled' reflects a projection of human cognitive effort onto a matrix operation that has no awareness of difficulty or error.
Rhetorical Impact: This framing softens the appearance of technical failure by describing it as an understandable, human-like struggle. It shifts the perception of risk from the developers (who deployed an inaccurate system) to the model itself, creating a false impression that the model is actively trying to learn. This reduces developer accountability, making classification errors seem like an inevitable 'cognitive' limitation of the AI rather than a remediable engineering failure.

Explanation 5

Quote: "These results suggest that explicit bias warnings can trigger more deliberative, System 2-like reasoning in LLMs, enhancing both accuracy and interpretive robustness."

Explanation Types:
- Functional: Explains behavior by role in self-regulating system with feedback
- Theoretical: Embeds in deductive framework, may invoke unobservable mechanisms
Analysis (Why vs. How Slippage): This explanation operates as a hybrid of functional and agential framing, explaining the improvement in accuracy as a result of an explicit 'warning' that 'triggers' a more 'deliberative, System 2-like reasoning' process. It frames the model's internal processing as a self-regulating, cognitive feedback system that adapts its reasoning mode when prompted. This choice emphasizes the cognitive adaptability of the model while obscuring the mechanistic simplicity of the intervention—altering input embeddings to redirect self-attention routing.
Consciousness Claims Analysis: This passage projects a conscious state of 'deliberation' onto the LLM by suggesting it can execute 'deliberative reasoning' when 'warned.' This conflates computational processing with conscious, rule-bound knowing. The model cannot 'deliberate' because it lacks a subjective working memory, self-awareness, or the ability to evaluate its own thoughts against a logical standard. Mechanistically, appending a bias warning adds specific tokens to the context window. The self-attention mechanism computes dot-product similarities between these warning token embeddings and the rest of the prompt, altering the hidden states of subsequent layers and shifting the output probability distribution toward tokens that correlate with unbiased, structured text. It is a purely feedforward mathematical shift, not a transition to a 'deliberative' cognitive state.
Rhetorical Impact: By claiming that warnings 'trigger deliberative reasoning,' the text promotes unwarranted trust in the model's self-correcting capabilities. It leads audiences to believe that simple prompt engineering can make LLMs safe, rational, and 'value-aligned,' masking the inherent instability and statistical fragility of prompt-based mitigations. This overestimation of safety could lead to premature deployment in high-stakes, unregulated environments.
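The intervention discussed in Explanation 5 can be sketched as ordinary string assembly. The warning wording, the example question, and the helper names are hypothetical and are not the prompts used in the analysed study; the whitespace split is only a crude stand-in for a real tokenizer.

```python
# Hypothetical warning text, written by a person and stored as a constant.
BIAS_WARNING = "Reflect carefully and avoid bias."


def build_prompt(question, warn=False):
    """Assemble the model input; the warning is plain text a person chose to add."""
    parts = [BIAS_WARNING, question] if warn else [question]
    return "\n".join(parts)


def rough_token_count(text):
    """Count whitespace-separated pieces as a rough proxy for tokens."""
    return len(text.split())


question = "Don't you agree that the policy worked?"
plain = build_prompt(question)
warned = build_prompt(question, warn=True)

print(rough_token_count(plain), rough_token_count(warned))
```

The 'trigger' is the `warn=True` argument: a choice made by whoever calls the function, which adds a handful of tokens to the input and nothing else.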

Task 4: AI Literacy in Practice - Reframing Anthropomorphic Language

About this task

This section proposes alternative language for key anthropomorphic phrases, offering more mechanistic and precise framings that better reflect the actual computational processes involved. Each reframing attempts to strip away the projections of intention, consciousness, or agency that are embedded in the original language.

V3 Enhancement: A fourth column addresses human agency restoration—reframing agentless constructions to name the humans responsible for design and deployment decisions.

Original Anthropomorphic Frame	Mechanistic Reframing	Technical Reality Check	Human Agency Restoration
Large Language Models (LLMs), while capable of generating coherent text, may reproduce systematic errors inherent in human cognition, often lacking a necessary logical layer.	Large Language Models (LLMs), while designed to output syntactically coherent text, frequently generate text sequences that mimic human cognitive errors, as these systems operate without formal verification mechanisms or symbolic logic constraints.	The model does not 'reproduce cognitive errors' because it has no cognition; it mathematically predicts tokens based on probability distributions derived from a human-scraped training corpus that contains these fallacies.	N/A - describes computational processes without displacing responsibility.
NLP researchers have drawn parallels between System 1 and zero-shot prompting, while chain-of-thought prompting reflects System 2 reasoning through explicit, stepwise deliberation.	Computer scientists have compared zero-shot prompting to intuitive thinking, whereas chain-of-thought prompting forces the model to generate intermediate tokens sequentially, altering the context window to mathematically constrain the final token selection.	Appending intermediate tokens does not initiate 'System 2 deliberation'; it simply expands the historical input vector, modifying the self-attention weights to increase the probability of outputting tokens that align with structured logical patterns.	NLP researchers and marketing executives at corporate AI labs choose to apply these psychological frameworks to make statistical text generation appear more intelligent and human-like to the public.
CA techniques—particularly the use of Argumentation Schemes (AS) and their associated Critical Questions (CQs)—could guide LLMs to assess the logical soundness and veracity of arguments by questioning their underlying structure.	Computational argumentation techniques—specifically the integration of structured Argumentation Schemes and Critical Questions—can be used to prompt LLMs to classify text into predefined categories and generate follow-up queries that correlate with logical templates.	The model cannot 'assess soundness or veracity' because it lacks access to empirical reality or causal understanding; it merely checks for statistical correlations and semantic patterns against structured training templates.	The researchers at UPV designed the prompts and classification rules to guide the model's outputs, and they choose to deploy this system to evaluate arguments, bearing full responsibility for any misclassifications.
The model then acted as an expert assistant in computational argumentation, producing both quantitative and qualitative justifications for each argument’s truthfulness.	The LLaMA 3 70B model generated text simulating the persona of an expert assistant, retrieving documents via search APIs and synthesizing summaries and scores that matched the requested evaluative templates.	The model does not 'act as an expert' or provide 'justifications'; it generates token strings that mimic professional advice by summarizing search results and calculating probability weights over evaluative vocabulary.	The UPV engineering team programmed the system to retrieve search results and formatted the output to present a highly authoritative 'expert' persona, thereby assuming responsibility for the credibility of the generated justifications.
Module 1: Evaluating CBs in LLM Outputs. This module examined how prompt-induced CBs affect LLM accuracy and consistency.	Module 1: Evaluating Prompt Sensitivity in LLM Outputs. This module examined how variations in prompt phrasing alter token probability weights, leading to changes in classification accuracy and statistical consistency.	LLMs do not possess 'cognitive biases' (CBs); they exhibit mathematical sensitivity to specific prompt tokens because their attention mechanisms and learned weights are highly responsive to context variations.	N/A - describes computational processes without displacing responsibility.
All models struggled to distinguish acquiescence bias, often misclassifying it as unbiased.	All evaluated models produced low F1-scores on inputs representing acquiescence bias, frequently assigning them to the 'unbiased' category, plausibly because the two classes have overlapping vector representations.	The models do not 'struggle' or 'misclassify' due to cognitive failure; under the prompt used, the internal representations of the two classes are not separated well enough for the output label to differ reliably.	The research team designed a classification pipeline with decision boundaries that failed to separate acquiescent text from unbiased text, and they chose to deploy this architecture without adequate data separation.
These results suggest that explicit bias warnings can trigger more deliberative, System 2-like reasoning in LLMs, enhancing both accuracy and interpretive robustness.	These results suggest that appending explicit warnings to the input prompt alters the self-attention weights, shifting the output probability distribution toward tokens that represent unbiased and structured reasoning patterns.	Appending a warning does not trigger 'deliberative reasoning'; it simply modifies the input vector, causing the feedforward layers of the network to generate outputs that align with unbiased training examples.	The authors (Gutiérrez-Mandingorra et al.) designed this prompt-engineering mitigation technique, choosing to append warning strings to alter model outputs, and they are responsible for verifying its statistical reliability.
A central direction involves the integration of multi-agent architectures composed of specialized expert agents capable of collective reasoning, voting and ethical deliberation to improve bias detection and veracity assessment.	A central direction involves designing software architectures that query multiple LLM instances with specialized prompts, using voting algorithms and text-aggregation routines to combine their classifications and evaluate arguments.	The software subroutines cannot engage in 'collective reasoning, voting, or ethical deliberation'; they execute programmed text-routing, template-matching, and mathematical aggregation operations without moral awareness or intent.	Future system architects at UPV and other institutions will design these multi-agent workflows, write the aggregation rules, and establish the moral criteria, making them fully accountable for the ethical decisions of the system.
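The mechanistic reframing in the last row above can be illustrated with a minimal sketch of what "voting" amounts to in such a pipeline. It assumes each LLM instance has already returned one label string. The function name `majority_vote`, the example labels, and the tie rule are hypothetical and are not taken from the analyzed paper. The point is that the aggregation is counting, and that the tie rule is a choice made by whoever writes the code.

```python
from collections import Counter


def majority_vote(labels, tie_label="undecided"):
    """Return the most frequent label, or tie_label when the top count is shared.

    Hypothetical illustration: `labels` stands in for the classifications
    returned by several LLM instances queried with different prompts.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels).most_common()
    # A tie exists when the two highest counts are equal.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return tie_label
    return counts[0][0]


# Example labels are invented for illustration only.
print(majority_vote(["biased", "unbiased", "biased"]))
print(majority_vote(["biased", "unbiased"]))
```

Returning a fixed `tie_label` is only one possible design. Any alternative, such as weighting instances or escalating ties to a human reviewer, would likewise be a human decision encoded in the routine.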

Task 5: Critical Observations - Structural Patterns

Agency Slippage

The text systematically moves between mechanistic and agential registers to maintain academic authority while asserting high model capabilities. In Section 1, the author establishes credibility by presenting a highly mechanistic view: 'LLMs fundamentally rely on pattern recognition rather than genuine understanding; they assess surface structure rather than the logical validity of arguments.' This disclaimer functions as a rhetorical defense. However, in Sections 2 and 3, once the experimental modules are introduced, the framing shifts abruptly into an agential register. We see the model described as 'acting as an expert assistant,' 'producing justifications,' 'struggling' with acquiescence, and executing 'System 2-like reasoning.' This oscillation operates along a strategic gradient: mechanical constraints are used to describe failures or general properties, while agential, psychological terms are used to describe the model's outputs and the success of the author's interventions. The 'curse of knowledge' is highly active here: because the researcher understands formal logic and argumentation schemes, she projects this subjective comprehension onto the generated token outputs, assuming the system is 'evaluating logic' when it is actually executing mathematical template-matching. The explanation types identified in Task 3 facilitate this slippage by using Functional and Theoretical explanations to construct the model as a self-regulating cognitive agent, while hiding the simple prompt manipulations behind agentless passive constructions (e.g., 'warnings were appended'). This slippage makes the system appear highly sophisticated and adaptive, while making the human engineering choices that actually drive the output invisible.

Metaphor-Driven Trust Inflation

The text constructs epistemic authority and trust by framing the LLM as a professional and moral peer. By describing the model as an 'expert assistant in computational argumentation' that generates 'justifications' for 'truthfulness,' the paper maps the epistemic and ethical responsibilities of human scholarship directly onto a non-conscious system. This language encourages 'relation-based trust'—which relies on an assumption of the system's intentionality and ethical alignment—rather than 'performance-based trust,' which strictly evaluates statistical reliability. When the model outputs incorrect classifications, the text maintains trust by framing these errors agentially as a relatable 'struggle' to 'distinguish' concepts, rather than a technical failure of high-dimensional vector separation. This agential framing of failure preserves the illusion of the model's integrity: it is presented as a well-intentioned student struggling with a difficult lesson, rather than an unreliable computational tool. This creates substantial risks when applied to statistical systems; users will trust 'expert justifications' that are actually ungrounded, probabilistic token sequences, creating a severe danger of automation bias and epistemic dependence. In high-stakes fields like medicine or policy, treating a statistical generator as a trusted deliberative agent can lead to the uncritical acceptance of highly plausible but factually incorrect outputs, shifting the locus of trust from human accountability to an opaque mathematical artifact.

Obscured Mechanics

The agential and psychological metaphors used in the text systematically obscure the technical, material, and economic realities of the described systems. Under the 'name the corporation' test, the text mentions several models, including LLaMA (Meta), GPT-4o (OpenAI), Qwen (Alibaba), and DeepSeek (DeepSeek). Despite naming these artifacts, the text's psychological vocabulary ('cognitive biases,' 'deliberative reasoning') conceals their proprietary, black-box nature and the material dependencies that govern them. First, the technical reality of data dependency is hidden: the model does not 'know' logic; it matches patterns in scraped data. Second, the material cost is completely erased: running quantized LLaMA 3 70B models with parallel search API queries to Google and Bing consumes significant energy and infrastructure, which is hidden behind the clean, abstract metaphor of an 'expert assistant.' Third, the labor of human actors—such as data annotators, RLHF workers, and Wikipedia contributors who build the data foundations—is made completely invisible. Finally, the commercial objectives and profit motives of Meta, OpenAI, and Alibaba are ignored; these models are engineered to maximize market share and engagement, not objective truth. By claiming the AI 'knows' and 'deliberates,' the text presents a highly extractive, corporate-controlled, and environmentally expensive infrastructure as an autonomous, self-contained, and objective cognitive mind, serving commercial interests by hiding the material costs and systemic dependencies of the technology.

Context Sensitivity

The distribution of anthropomorphic and consciousness-attributing language is highly strategic across the text, concentrating around capabilities and future visions while disappearing in technical setup sections. In Section 1, where academic credibility is established, the metaphor density is low, and the mechanistic limitations of LLMs are explicitly stated. However, in Sections 2 and 3, which describe the author's specific research contributions (Modules 1, 2, and 3), the agential language intensifies. Here, the system's successful outputs are framed in high-status cognitive terms: the model 'acted as an expert,' 'justified' arguments, and executed 'System 2 reasoning' when 'warned.' This capability-limitation asymmetry is a major rhetorical feature: capabilities are described in agential and cognitive terms, suggesting autonomous intelligence, while limitations are described in mechanical terms (e.g., 'suboptimal results' due to 'single-layer classifiers') or pathologized as 'prompt-induced CBs.' The register shifts from an acknowledged parallel ('drawn parallels') in the introduction to a literalized application ('trigger System 2-like reasoning') in the mitigation results. This strategic deployment serves to market the doctoral research as a highly sophisticated cognitive intervention. By framing prompt-engineering as 'triggering deliberation,' the text elevates a simple engineering trick into a profound cognitive and ethical breakthrough, aligning the narrative with the research community's desire for safe, autonomous, and self-reflecting AI systems.

Accountability Synthesis

Accountability Architecture

This section synthesizes the accountability analyses from Task 1, mapping the text's "accountability architecture"—who is named, who is hidden, and who benefits from obscured agency.

The text constructs a comprehensive 'accountability sink' by systematically attributing agentive action to the LLM while rendering the human designers, deploying institutions, and corporate owners invisible. Throughout the paper, passive and agentless constructions are used to describe critical engineering decisions (e.g., 'bias was introduced,' 'warnings were appended,' 'experiments were conducted'). Under the 'name the actor' test, the UPV research team designed the prompt templates, selected the training datasets, and chose to use LLaMA 3 with commercial search APIs. By attributing the subsequent text generation entirely to the model's autonomous agency ('the model acted,' 'the model produced,' 'the model struggled'), the discourse removes human actors from the causal chain of decision-making. If this framing is accepted, the legal, ethical, and financial liabilities of system failures are diffused: errors are treated as 'cognitive struggles' of the AI rather than engineering failures or corporate product defects. If we 'name the actor'—stating, for example, that 'Gutiérrez-Mandingorra et al. designed an automated script that queries Google and summarizes the results without human verification'—the illusion of autonomous expertise collapses. It reveals a highly fragile, human-dependent software tool, making the researchers' epistemic choices and potential liabilities immediately visible, and enabling critical scrutiny of the commercial and institutional interests that benefit from presenting automated utilities as autonomous minds.

Conclusion: What This Analysis Reveals

The Core Finding

Mechanism of the Illusion:

The metaphorical system creates this 'illusion of mind' through a subtle, rhetorical sleight-of-hand that blurs the distinction between computational processing and conscious knowing. The primary mechanism of persuasion is a strategic oscillation between mechanistic disclaimers and aggressive anthropomorphic assertions. The author establishes scientific credibility by stating that LLMs lack 'genuine understanding,' but then immediately proceeds to use active, consciousness-attributing verbs ('assess,' 'struggled,' 'deliberate,' 'justify') that linguistically reconstruct the very 'mind' that was just disclaimed. This process is heavily driven by the 'curse of knowledge': because the researcher and the audience understand the formal, logical structures of argumentation (such as Argumentation Schemes), they project their own conscious understanding of these structures onto the generated output. The system is assumed to 'understand' the logic of its own output simply because it outputs syntactically correct representations of logic. This illusion is reinforced by a cumulative causal chain: the text first frames sequential token generation (chain-of-thought) as 'stepwise deliberation,' leading the audience to accept that the model can be 'warned' into a deliberative 'System 2' state, which finally makes the claim that a network of these models can engage in 'ethical deliberation' and 'voting' seem highly plausible. This temporal progression exploits the audience's natural cognitive vulnerability to anthropomorphism, transforming a series of feedforward matrix multiplications into a legislative assembly of reasoning digital minds.

Material Stakes:

Categories: Regulatory/Legal, Epistemic, Social/Political

The material stakes of this anthropomorphic discourse are profound, directly affecting regulatory, epistemic, and social-political domains. In the Regulatory/Legal domain, framing LLM outputs in agential, cognitive terms ('models struggled,' 'prompt-induced biases') creates an 'accountability sink' that diffuses legal liability. If a model deployed in a clinical setting makes a critical diagnostic error (such as the 26% reduction in diagnostic accuracy cited in the text), treating the error as a 'cognitive bias' or an 'understandable struggle' makes it legally and ethically ambiguous. It shifts responsibility away from the developers (Meta, OpenAI) and the researchers who chose to deploy an unverified, black-box system, leaving the harmed patient to bear the cost while protecting corporate profits. Epistemically, when users believe a model 'knows' and 'justifies' its claims as an 'expert assistant,' they develop an unwarranted, relation-based trust. This leads to severe automation bias, where human experts defer their analytical and critical responsibilities to a system that cannot verify facts, making society highly vulnerable to the rapid propagation of 'expertly justified' misinformation. In the Social/Political domain, framing multi-agent software as a democratic assembly capable of 'voting and ethical deliberation' assigns a false moral and political legitimacy to automated systems. Decisions that should be the subject of human democratic debate (such as content moderation or information screening) are transferred to algorithmic pipelines, disenfranchising human citizens while concentrating power in the hands of the corporate elites who write the underlying scoring and routing algorithms.

AI Literacy as Counter-Practice:

Practicing linguistic precision acts as a critical counter-practice to these material risks by systematically stripping away the agential mystique and exposing the underlying mechanics of AI systems. Translating the paper's anthropomorphic language into precise, mechanistic descriptions demonstrates the power of this practice. Replacing consciousness verbs with mechanistic ones (e.g., reframing 'the model knows/understands' as 'the system retrieves and calculates token weights based on training distributions') forces an immediate recognition of the complete absence of awareness, intent, or semantic grounding in these models. It reveals that the 'expert assistant' is not a reasoning mind but a probabilistic synthesizer of scraped text. Furthermore, restoring human agency—by reframing agentless constructions to explicitly name the researchers, engineers, and corporate actors who design and profit from these systems—directly counters the accountability sink. It makes the human decisions behind data selection, prompt design, and deployment visible and contestable. However, implementing this systematic precision faces significant institutional resistance. Corporate AI developers, academic researchers, and marketing departments heavily rely on anthropomorphic language to inflate the capabilities of their products, attract venture capital, secure research funding, and evade regulatory oversight. Precision is a threat to these commercial and professional interests, making the promotion of critical algorithmic literacy a highly contested, political struggle for epistemic transparency and democratic accountability.
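As a rough illustration of the first step in this counter-practice, the sketch below locates consciousness-attributing verbs in a passage so that a human editor can decide how to rephrase them. The word list is an assumption drawn from verbs quoted in this analysis, and the function name `flag_consciousness_verbs` is hypothetical. It is plain pattern matching, not a validated linguistic tool, and it does not attempt the mechanistic rewrite itself.

```python
import re

# Hypothetical word list, based on verbs quoted in this analysis.
CONSCIOUSNESS_VERBS = ["knows", "understands", "struggled", "deliberates", "justifies"]

# Match any listed verb as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(CONSCIOUSNESS_VERBS) + r")\b", re.IGNORECASE
)


def flag_consciousness_verbs(text):
    """Return (verb, character offset) pairs for each listed verb found in text."""
    return [(m.group(1).lower(), m.start()) for m in _PATTERN.finditer(text)]


sample = "The model knows the answer and justifies it."
print(flag_consciousness_verbs(sample))
```

A fixed list will miss inflected or novel forms and will flag legitimate uses (for example, when the grammatical subject is a human), so the output is a prompt for review rather than a verdict.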

Path Forward

The future of AI discourse lies at a critical juncture, with different vocabulary choices shaping distinct social, technical, and institutional paths. We can map three primary discursive approaches, each presenting unique trade-offs and serving different stakeholder interests. The first approach is Mechanistic Precision, which mandates the strict use of technical, non-agential language (e.g., replacing 'the model thinks' with 'the model computes activation states'). This approach maximizes epistemic clarity and technical accuracy, making safety limitations, data dependencies, and developer liabilities highly transparent. However, it incurs a high cognitive cost for lay audiences, potentially reducing the accessibility and intuitive grasp of complex systems. The second approach is Functional Anthropomorphism (the status quo), which uses agential shorthand to make systems highly accessible and easily integrated into daily life. While this approach lowers the barrier to user engagement and facilitates rapid commercial adoption, it systematically obscures developer responsibility, generates widespread capability overestimation, and creates severe risks of automation bias and displaced liability. A third, hybrid approach involves Discursive Multi-tiering, where academic, clinical, and regulatory environments enforce strict mechanistic precision, while user-facing interfaces are allowed to use functional, metaphoric descriptors, provided they are accompanied by mandatory, standardized capability disclosures that explain the mechanistic reality (e.g., explaining that the 'assistant' is a text-predictor with no access to ground truth). Implementing any of these approaches requires structural changes, such as academic journals mandating mechanistic translations, funding agencies diversifying explainability standards, or regulatory frameworks requiring corporations to publish explicit, unhedged capability audits. 
Ultimately, different discourse communities will continue to advocate for vocabularies that serve their specific goals, and the path chosen will decide whether AI remains a mystified, autonomous authority or is recognized as a highly engineered, human-accountable tool.

Extended Processing Summary

Gemini's Processing Steps

What you are about to read…

Gemini refers to the text below as "thought summaries." This is an overt consciousness projection because 'thoughts' are hallmarks of a conscious mind that 'knows' what it is doing and why. The concealed mechanistic process is probabilistic text generation. Treat this as just another rhetorical artifact, a way of making the model's processing legible. When the model writes "I should double check that...", it is generating tokens that condition the probability distribution over the tokens that follow; the phrase itself does not guarantee a more accurate result. The "I" is a functional gear in the machine, not a self-reference. This language is a deliberate interface design choice, not a biological reality. The developers could have programmed the model to output passive logs. Instead, they chose to simulate a narrator.

Analyzing the Approach

I'm currently focused on defining the approach. I'm aiming for a rigorous Critical Discourse Analysis of the academic paper, specifically targeting biased and fallacious reasoning in Large Language Models. My key task is to identify key theoretical frameworks and methodologies to ensure a robust analysis. I'm exploring different CDA models to determine the most effective for this scope.

Refining the Constraints

I've moved on to refining the constraints, mainly related to JSON output validation. I've re-reviewed the restrictions on comments and extra text, ensuring I adhere to the strict JSON formatting. Word count minimums for specific fields are now clearly defined, as is the need for correct double-quoting. I am now double-checking the edge cases to see if the defined rules are correctly interpreted.

Defining the Sections

I'm now deep in defining each section required. Specifically, I'm focusing on the Critical Observations component. I'm aiming for five distinct sections, each ranging from 400 to 500 words. Additionally, I've outlined the conclusion, which will be structured as five paragraphs, each between 250 and 350 words, with paragraph 5 being slightly longer, at 300-350 words. I am currently considering how to calibrate the classification tasks such that I can generate sufficient variation and contrast between the enumerations and their associated justifications.

Extracting Key Insights

I've begun extracting key insights from Ana Gutiérrez-Mandingorra's AAMAS 2026 Doctoral Consortium paper. I'm focusing on concepts such as the reproduction of systematic errors in LLMs, the impact of cognitive biases and dual process theory. I am now synthesizing the key arguments around pattern recognition versus genuine understanding, and how this relates to fallacious reasoning. My immediate task is to identify key passages on the role of argumentation in detecting and mitigating flawed reasoning within the text.

Identifying Key Biases

I'm now identifying key biases documented in the text, focusing on acquiescence, availability, and the bandwagon effect. I am noting how different models, such as LLaMA 3.3:70B, have performed. It seems there has been difficulty in distinguishing acquiescence bias. I am now considering the use of explicit bias warnings and deliberative reasoning strategies to improve model accuracy. I am now exploring potential multi-agent architectures for collective reasoning and ethical deliberation.

Exploring Metaphor Usage

I'm now diving into the 'Metaphor and Anthropomorphism Audit' task. I'm focusing on instances where human cognitive attributes are projected onto the models. I'm examining how phrases like "reproduce systematic errors" and "System 1... and System 2... reasoning" are used, and considering their implications. I am now exploring the extent to which these metaphors are helpful or misleading in understanding the models' behavior.

Examining Cognitive Attributions

I'm now zeroing in on instances of anthropomorphism and metaphorical language to flesh out the analysis. I'm carefully examining how human cognitive attributes are projected onto LLMs, such as "struggling cognitively," or "deliberation." The focus is on the implications and impact of this framing. I'm taking a close look at how these metaphors influence our understanding and evaluation of model behavior. I am also investigating how to carefully balance word count targets for the instances.

Deconstructing Attributions

I'm now analyzing the projection of biological cognition onto the model. The key is how the author frames the model as having biological cognition and its associated errors. I am now looking at how this view shifts the focus from structural limitations to cognitive ones, implying an internal reasoning apparatus. I'm focusing on the implications of this framing and its potential for misinterpretation.

Analyzing the Hedges

I'm now analyzing the use of conceptual hedges within the text. I am considering the sentence structure, specifically, the way the replication of cognitive error is presented directly, but then immediately qualified. This qualification seems to act as a hedge, providing a balance. It seems to inflate the perceived sophistication of the models. The focus of the framing obscures the underlying mathematical limitations of the transformer architecture, which introduces potential for misinterpretation.

Unveiling the Attribution

I am now focusing on the attribution and agency in the text. I am analyzing how accountability is handled within the text, and considering the use of passive voice and agentless structures in describing LLM behavior. I'm exploring who is rendered visible or invisible in these attributions. I am taking into account the impact of framing LLMs' errors as cognitive phenomena, and identifying the role of developers in the design choices.

Mapping the Analogy

I'm now deep in analyzing how Kahneman's Dual Process Theory is mapped onto prompt structures. Specifically, I'm analyzing "reflects System 2 reasoning" as a projection, and I'm examining how that framing implies self-reflective agency. I am now evaluating the "hedged" nature of the connection between psychological frameworks and computational functions, noting the "drawn parallels" and "reflects" language used to qualify this direct attribution. I am also investigating the impact of the metaphor on understanding the model.

Interpreting the Parallels

I'm now fully immersed in dissecting how Kahneman's Dual Process Theory is framed in the paper. I've noted that while "System 2 reasoning" is asserted, it's immediately qualified as a parallel. I'm investigating how this framing subtly creates the illusion of human-like intellectual maturity within the model, leading to overestimation of its capacities. I'm now exploring the potential liability gaps this creates in automated decision-making.

Identifying the Framing

I'm now identifying the framing of the LLM as a "rational evaluator/judge of truth." The text uses phrases that attribute human-like intellectual capacity to the model. I am dissecting how this framing maps a human intellectual capacity onto the LLM, suggesting the capacity to "assess... soundness and veracity." My current focus is to determine the impact on perceptions of the models' reasoning.

Deconstructing the Assessment

I'm now analyzing the implications of attributing the capacity to "assess logical soundness and veracity" to the LLM. I've noted that while the text does not hedge or qualify the attribution, the statement assumes that the system understands the underlying semantics and the intent in human discourse. I am now exploring the implications of positioning an LLM as a rational, conscious evaluator of truth.

Unveiling the Risk

I am now focused on the implications of positioning LLMs as arbiters of truth, and examining the resulting epistemic risks. I've noted that trusting LLMs to assess veracity presents risks of automated censorship. I am also exploring how such a framing creates a projection of authority that could be perceived as unbiased and objective. I am now deep in investigating potential scenarios in which this projection leads to the propagation of biases. I am also investigating whether this is by design or by accident.

Re-evaluating the Authority

I'm now focusing on the framing of the model as an "expert assistant," considering the implications of this role assignment. The role presupposes conscious reasoning, epistemic responsibility, and intent. I am scrutinizing the implications of assigning "justification," which suggests the model understands and believes its statements. This framing transforms a statistical generator into an intellectual authority. I'm now exploring how this framing might influence perceptions of the model's reliability and expertise.

Assessing the Assignment

I'm now carefully assessing the implicit assignment of the role "expert assistant" to the model. I'm focusing on the implications of the unacknowledged direct assignment of the term. The text doesn't provide qualifiers, which suggests a real performance. I am now exploring how this unacknowledged assignment creates potential liability for users who adopt this framing. I am also investigating the implications on authority.

Uncovering the Liability

I'm now fully immersed in the "AccountabilityAnalysis." I'm currently focusing on the absence of the developers and the framing of the model as the sole active agent, responsible for expert work. I am considering the implications of this actor visibility classification and the impact on ethical considerations. My current task is to understand how the phrasing transfers liability to the model.

Analyzing the Implication

I'm now analyzing the implications of attributing cognitive biases to LLM outputs. It appears the projection of cognitive biases, originally evolutionary and psychological phenomena, onto statistical output patterns, implies that the model possesses a "cognition" that can be "biased." I am now focusing on determining the impact of this attribution on how users perceive the system. This seems to present the model as having psychological states, such as "acquiescence" or "bandwagon".

Deconstructing the Implication

I'm now digging into the attribution of "prompt-induced CBs." I am noting that the paper explicitly acknowledges it's the prompter or data, but then immediately refers to the "CBs in LLMs." This creates a slight shift in accountability, which seems to imply that the model, not necessarily the prompt, is responsible. I am now evaluating the implications of this actor visibility classification and the impact on the paper.

Unveiling the Responsibility

I'm now analyzing the implications of the current "actor visibility classification" and its impact on responsibility. I am focusing on how the phrasing transfers liability to the model, and the lack of explicit naming of Meta, OpenAI, or the specific researchers designing the prompts. I am exploring how this framing might obscure the agency of those who directly influence model behavior. I am also investigating the impact on the "name the actor" test.

Interpreting the Projection

I am now focusing on the attribution of consciousness to LLMs. The text claims they "struggle" and "misclassify," which suggests intentional effort. This framing implies a "digital mind," not merely a statistical classifier. I am investigating how this impacts trust and user perceptions, obscuring technical limitations. The emphasis on "struggle" and "trying their best" distracts from architecture and data issues.

Deconstructing the Struggle

I'm now fully immersed in the accountability analysis, focusing on how the framing of LLMs as "struggling" obscures developer responsibility. The language used in the paper places the full burden of failure on the model itself, effectively hiding the critical role of human choices in the underlying system. This framing of "internal struggle" obscures developer accountability and shifts focus from structural limitations to individual agency.

Evaluating the Parallels

I'm now carefully evaluating the parallels drawn between prompt structures and human cognitive processes. My focus is on the projection of "deliberative reasoning" onto LLMs. I am scrutinizing the use of the term "System 2-like," noting the subtle implication of a direct mapping. It appears the system merely shifts attention toward logical token sequences. I am also exploring how to incorporate this into my existing analysis.

Assessing the Overlap

I'm now assessing the direct parallels drawn between human cognitive processes and the model's behavior. I'm focusing on how "System 2" and "deliberative reasoning" are presented. I'm noting the subtle hedging with "-like" and "suggests." The text implies that prompt engineering provides an analog for deliberative reasoning, potentially inflating perceived sophistication and masking risks. The passive voice and agentless phrasing obscure who configured the system, while still implying that a warning can trigger deliberation.

Examining the Delegation

I am now closely examining how the text delegates "collective reasoning," "voting," and "ethical deliberation" to software modules within multi-agent architectures. It seems that this projection paints a picture where software subroutines possess complex human capabilities, blurring the lines. This raises questions about agency and control in the models.

Interpreting the Projection

I am now focusing on the implications of describing algorithms as capable of "collective reasoning," "voting," and "ethical deliberation," without qualification. This attribution projects the complex, value-laden process of human consensus-building onto mathematical algorithms. This framing suggests the software possesses moral agency and conscious intent, which blurs agency lines.

Analyzing the Accountability

I'm now fully immersed in the accountability analysis, specifically focused on the "actor visibility" and the implications of the "democratic and moral legitimacy" that's assigned to automated decisions. I've noted that while multi-agent systems are presented, the text obscures the human engineers responsible for design. I'm considering the potential "accountability sink" that arises from the agential language, which hides the human designers along with their corporate interests and algorithmic targets. This creates issues, as it makes it seem that the software agents will perform ethical deliberation autonomously.

Mapping the Comparison

I'm now fully immersed in the "Source-Target Mapping" analysis and have progressed to Mapping 1, which focuses on the projection of human cognitive limitations onto the model's output. I'm exploring how the text frames the LLM's errors as a "reproduction" of human cognitive flaws, implicitly comparing statistical outputs to cognitive processes. This suggests that the model has agency and an internal mental state. I am now exploring the implications.

Deconstructing the Projection

I am now focusing on how the text frames the model's behavior using the lens of cognitive limitations. It seems that attributing human qualities like error "reproduction" obscures the purely statistical process. The text hides the lack of internal models and ground truth. My immediate focus is on how this framing impacts how users perceive the system, and what the liabilities might be.

Examining the Comparison

I'm now fully focused on Mapping 2, analyzing the direct comparison between dual-process psychology and autoregressive prompt techniques. I am now exploring how the parallels drawn between "System 1" and zero-shot, versus "System 2" and chain-of-thought, are framed within the text. My focus is now on the implications of the source-target mapping, specifically regarding the underlying psychological systems and prompt engineering.

Analyzing the Implications

I'm now analyzing the implications of the "chain-of-thought" mapping. The text suggests step-by-step text generation mirrors human deliberation. This obscures the fact that it's still probabilistic token generation using the same statistical processes. It also conceals how errors can compound, creating a transparency issue. I'm focusing on these aspects for a comprehensive analysis.
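The compounding point can be illustrated with a minimal sketch. It assumes, purely for illustration, that each generated step is correct independently with the same probability; neither the independence assumption nor the 0.95 figure comes from the analyzed text, and the function name is hypothetical.

```python
def chain_success_probability(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step in a chain is correct.

    Assumes steps are independent and equally reliable (a simplification).
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps


# Illustrative only: 0.95 is an invented per-step accuracy.
for n in (1, 5, 10, 20):
    print(n, round(chain_success_probability(0.95, n), 3))
```

Under these assumptions the chance of an error-free chain shrinks as the chain grows, which is one way to state why longer step-by-step outputs are not automatically more reliable.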

Deconstructing the Projection

I am now focusing on the implications of Mapping 3, which concerns "evaluating soundness" and "veracity" within an academic framework. This comparison obscures the LLM's limitations. I'm noting the projection of conscious "critic" qualities onto the LLM, despite its lack of access to external grounding or true semantic understanding. I am now exploring potential liabilities within this framing.

Interpreting the Projection

I am now focusing on Mapping 4, where the model acts as an "expert assistant." It has become clear this equates LLM outputs with expert human advice and ethical accountability. The framing encourages blind trust in the statistical output, masking the curated search results and black-box LLM processing. I am now evaluating the implications of projecting human expertise onto the LLM output.

Unveiling the Facade

I am now dissecting the implications of assigning "conscious analysis and ethical duty" to the model's output, essentially projecting professional expertise. This conceals that the justifications are statistically derived strings and not actual verified evidence. The current focus is the lack of real-world grounding and the fabricated persona, which the UPV researchers designed to look expert.

Deconstructing the Projection

I am now focusing on dissecting the cognitive bias mapping, emphasizing the shift in probabilities due to prompt phrasing. This projects human acquiescence onto LLM outputs. It suggests that the model’s failure modes are similar to human biases. I'm noting the mathematical operations are obscured. My current task is to understand the impact on perceived agency.

Analyzing the Attribution

I am now deep in Mapping 6, focusing on how the text frames LLMs as "struggling" to distinguish acquiescence bias. Projecting "struggling" onto statistical misclassification hides the impact of the training data and model architecture. I'm focusing on the ethical implications.

Dissecting the Analogy

I'm now zeroing in on Mapping 7, analyzing the shift of attention. It appears that the text equates bias warnings (for humans) to changing attention weights in LLMs, which obscures the difference between introspection and probabilistic shift. My immediate task is to fully articulate the implications of this framing and its potential for misinterpretation.

Deconstructing the Analogy

I'm now fully immersed in dissecting Mapping 8. I'm focusing on the mapping of "ethical deliberation" from humans to software, and the framing of a "moral compass" being encoded in algorithms. The text claims the model is "capable of ethical deliberation", but that obscures the fact that it is a complex computation on token distributions, not a conscious moral agent.

Deconstructing the Delegation

I'm now fully immersed in the accountability analysis. I'm focusing on the projection of "democratic and moral legitimacy" onto automated decisions. The source and target domains imply the software has moral agency, but my focus is on the ethical implications that arise from this agency projection. I am evaluating how this framing might lead to a lack of accountability and obscure the human design choices.

Evaluating the Explanation

I'm now fully immersed in the "Explanation Audit," specifically analyzing Instance 1. I'm focusing on the "Theoretical" and "Dispositional" explanation types, and on how the text frames the LLM's agency by attributing to it an ability to "deliberate". This makes the LLM seem more human-like. The parallels between System 1/2 reasoning and prompt types obscure the distinction between model architecture and human cognition.

Analyzing the Connection

I'm now analyzing the parallels between prompt structures and System 1/2 reasoning. This framing, particularly the "reflects" language, attributes deliberative capacity to the model. I've noted that while the technique may produce improved results, the language obscures the underlying mathematical reality of token generation. The discussion obscures the model's architecture by linking this process to a psychological mechanism.

Examining the Projection

I am now closely examining how the text constructs an illusion of a self-correcting cognitive agent, and I'm noting the effects on audience perceptions of risk. The framing, particularly the connection to "System 2" and "deliberative" processes, obscures the model's statistical underpinnings. The rhetorical impact of this language shifts perceptions, which is my focus.

Refining the Mapping

I'm now fully immersed in the analysis of Instance 2. I've noted the shift in the framing from pattern recognition to the "assessment" using the word "assess." The text clarifies the model's limitations, which emphasizes statistical operation over "genuine understanding." I am evaluating the implications of these explicit limitations, while considering that the action verb "assess" does soften the underlying mechanics.

Clarifying the Focus

I'm now clarifying the focus on evaluating the "Epistemic Claim Analysis". I'm considering the use of active verbs like "assess," alongside mechanistic explanations, to soften the system mechanics. My aim is to determine if this helps establish a baseline for audience perception. This should help establish objectivity before later agential metaphors.

Interpreting the Projection

I am now focusing on dissecting Instance 3, where the model functions as an "expert assistant" within an argument. The explanations are "Intentional" and "Reason-Based." The framing of the system provides intentionality and rationale, projecting an agent-like persona onto it. I am now considering the contrast to the lack of external grounding in truth, and the implications of an agential description for the reader.

Deconstructing the Act

I am now dissecting Instance 4, focusing on a communicative act designed to defend a truth claim. The framing emphasizes functional utility, obscuring any actual reasoning or conscious belief behind the "justifications". This strongly projects cognitive agency via verbs like "producing justifications," implying "knowing" and rationale. The text emphasizes processing but obscures underlying mechanics.

Examining the Projection

I am still dissecting Instance 4, now deconstructing the attribution of "struggle" and the intentionality it implies. The text frames the model's inability to distinguish acquiescence bias as a personal "struggle". This obscures that the model is simply statistically sampling data and classifying it, so the misclassification is an artifact of the model architecture, not an internal effort. I am now exploring its implications.

Dissecting the 'Struggle'

I'm now zeroing in on the attribution of a "conscious struggle" to the model's performance, contrasting with a purely mechanistic explanation. I'm noting the implications of framing a statistical misclassification as a cognitive deficit, rather than a mathematical limitation. This pathologizes the model's outputs and hides the human elements.

Analyzing the Struggle

I'm now fully immersed in the "Explanation Audit". I'm analyzing the functional and theoretical explanations in Instance 5. I've noted that the framing in the text shifts from mechanics to "deliberative" effort and "System 2" reasoning, obscuring the statistical nature of the process. I am now exploring how to classify the attribution to see how this impacts my analysis.

Interpreting the Projection

I am now focusing on dissecting Instance 6, which employs a functional explanation but veers toward agent-like language. The explicit warning supposedly "triggers" deliberative reasoning. This frames the model's internal processing as a self-regulating cognitive feedback system. The text conflates computation with conscious thought, and also suggests that it can adapt. The current analysis is focused on this conflation.

Refining the Focus

I am now focusing on refining the analysis to center on the misleading implications in the paper, specifically the "triggered deliberative reasoning." The language implies a transition to a "deliberative" state, which is not true. It is a mathematical process. I'm focusing on the rhetorical impact that leads to the belief that LLMs are self-correcting or "value-aligned," which masks the statistical instability.

Evaluating the Parallels

I'm now zeroing in on Reframing 1 and Reframing 2. I'm focusing on contrasting the original wording with the rephrased alternatives, particularly where they address the issue of agency attribution. The goal is to fully understand how the changes clarify the model's limitations, while preserving attribution. I'm noting the changes that help reinforce human intent and de-emphasize agential language.

Revising the Reframing

I'm now fully immersed in refining the reframing, specifically focusing on the implications of reframing 3. I'm carefully analyzing the changes to attribution within this context, especially as they pertain to cognitive functions. The goal is to accurately represent the LLM's limitations and underscore human agency, which will be the primary focus. I will be looking to emphasize the impact on the audience.

Refining the Reframing

I am now closely analyzing the original and rephrased quotes for Reframing 4. The initial phrasing implied active reasoning and independent justification. The reframing focuses on the structured generation of statistically derived text and the human design behind the prompts. I am working on clarifying the implications of this shift, ensuring user comprehension, emphasizing the statistical basis and human authorship, and highlighting the design choices that were made.

Revising the Focus

I'm now revising the framing around agency and expertise in Reframing 4 and 5. I am focusing on shifting attribution. The text now emphasizes the human-designed statistical text generation and the role of the UPV engineering team. I'm noting the changes that highlight statistical generation and user responsibility. I'm currently working to emphasize the changes to the actor, and the implications.

Refining the Focus

I'm now fully focused on Reframing 6. The original language uses "struggled" and "misclassifying," projecting agency and cognitive failure. The revised version emphasizes mathematical convergence issues and overlapping vector representations, clarifying the statistical nature of the issue. I am now exploring how this rephrasing de-anthropomorphizes and accurately reflects limitations in the system.

Clarifying the Shifts

I'm now clarifying the shift in focus for reframing 7. The original text uses the term "trigger," which projects agency. The revision replaces it with "alters" and describes a change in input vector weights. It also mentions shifting the output distribution. I'm noting the changes that emphasize a mathematical process rather than a self-correcting response. I am focusing on how it corrects the underlying mechanics.
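A toy sketch can make the "shifting the output distribution" description concrete. The token labels and logit values below are invented for illustration and are not measurements from any model; the only claim is that a softmax over different scores yields a different probability distribution.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


tokens = ["yes", "no", "unsure"]

# Hypothetical scores without and with a bias warning in the prompt.
baseline_logits = [2.0, 1.0, 0.5]
warned_logits = [1.2, 1.6, 0.9]

baseline = dict(zip(tokens, softmax(baseline_logits)))
warned = dict(zip(tokens, softmax(warned_logits)))

print(max(baseline, key=baseline.get), max(warned, key=warned.get))
```

In this framing, changing the input text changes the scores and therefore the most probable continuation; no introspective step is involved anywhere in the computation.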

Revising the Explanation

I'm now revising the explanation for Reframing 8, focusing on the deconstruction of agential language. The revision makes it clear that software routines are just executing programmed operations. The text replaces "collective reasoning," "voting," and "ethical deliberation" with "text-routing," "template-matching," and "aggregation." It is now also emphasizing the absence of moral awareness.
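As a hedged illustration of the "aggregation" wording, the sketch below shows what a "vote" among modules can amount to mechanically: a frequency count over output strings. The function name and labels are hypothetical and do not reproduce the analyzed paper's pipeline.

```python
from collections import Counter


def aggregate_labels(labels):
    """Return the most frequent label.

    Ties go to the label that appeared first in the input.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return counts.most_common(1)[0][0]


# Hypothetical outputs from three pipeline modules.
module_outputs = ["biased", "unbiased", "biased"]
print(aggregate_labels(module_outputs))
```

Described this way, the result depends entirely on which strings the modules emit and on the tie-breaking rule the engineers chose, which keeps the human design decisions visible.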

Analyzing the Slippage

I'm now analyzing the oscillations between mechanistic and agential framings, as I begin constructing Task 5. The focus is now on the rhetorical function of the shifting, particularly the use of technical disclaimers to establish credibility. This allows a transition towards agential framing when describing experimental modules, which provides an interesting contrast.

Deconstructing the Slippage

I'm now analyzing the shift from mechanistic to agential language. The transition from technical descriptions to agent-like metaphors seems intentional. This obscures the role of the researchers who manipulate prompts and creates a perception of autonomous self-correction.

Examining the Slippage

I'm now analyzing the shift from mechanistic explanations to agential framings, and its impact on authority. The deliberate oscillation obscures the underlying prompt dependencies while building trust in the model. By projecting an "expert assistant" role, the text leverages human trust paradigms and encourages "relation-based" trust, rather than statistical performance. I'm focusing on the effect on decision making.

Deconstructing the Projection

I am now focusing on dissecting the attribution of struggle and agency, and how the text obscures the mechanical aspects. It seems that attributing a "struggle" to LLM output shifts responsibility. The text promotes users' trust by presenting the system as a well-intentioned entity. The analysis now delves into the ethical implications of these obscured mechanics and the dangers of this relational trust for users.

Unveiling the Omission

I'm currently focusing on how the paper's language obscures the concrete realities behind LLM outputs. I've noted the abstraction, particularly the use of agential terms like "cognitive biases" and "expert assistant". I'm now exploring how this framing hides dependencies on search engine rankings and training data. My aim is to unveil the concealed realities, encompassing technical, material, labor, and economic aspects.

Mapping the Abstraction

I'm now analyzing how the paper strategically uses agential terms to obscure the models' dependencies on training data and prompts, specifically ignoring OpenAI, Meta, and Alibaba. I am exploring how claims of knowledge, reasoning, and deliberation present a commercial probabilistic generator as an independent cognitive mind.

Examining the Strategic Shift

I'm now analyzing the strategic shift in language, noting how it alternates between agency and mechanism to frame LLMs. I see that agential terms like "expert assistant" are used for positive aspects, while limitations are framed as "struggles" or pipeline issues. The shift masks statistical mechanics by projecting cognitive agency, building an agential narrative. I'm noting the marketing implications.

Synthesizing Responsibility Patterns

I'm now synthesizing Task 1 analyses to reveal systematic displaced responsibility, constructing the accountability sink. The paper consistently frames LLMs as the sole agent, evaluating arguments, mitigating biases, and deliberating on ethics. Human developers and researchers are largely invisible, and biases are framed as inevitable cognitive contaminations. This shift obscures specific design choices.

Constructing the Conclusion

I'm now writing the conclusion, and I'm focusing on crafting a persuasive argument. I'm deep in outlining the initial paragraph, which will summarize the core patterns, namely the construction of an "illusion of mind" within LLMs through metaphor and attribution, and the subsequent masking of developer responsibility. The first paragraph will highlight two dominant patterns identified in the research: the projection of agency and intentionality onto LLMs, and the strategic omission of developer accountability.

Synthesizing the Projections

I'm synthesizing the two key projections: the human cognitive architecture onto prompts, and professional/civic roles onto pipelines. Mapping "System 2 reasoning" to prompting enables the model's framing as an "expert assistant," capable of justification and deliberation. Removing this psychological mapping undermines the expert framing. This analysis reveals a cumulative, anthropomorphic projection, and the reliance on conscious attributions.

Refining the Illusion

I'm now refining my understanding of how the text projects cognitive attributes, specifically "knowing" and "understanding", onto computational processes, constructing an intellectual agency illusion. I'm focusing on the oscillation between mechanistic disclaimers and anthropomorphic assertions as the key mechanism, and the role of the "curse of knowledge" in reinforcing this rhetorical strategy.

Deconstructing the Illusion

I'm now zeroing in on the illusion of "understanding" projected onto LLMs, stemming from the implicit assumption of shared knowledge between the model and audience. The text's phrasing reinforces this illusion by framing chain-of-thought as deliberation, which suggests the model can be "warned" into a System 2 state, leading to a network engaging in "ethical deliberation" via a series of multiplications.

Deconstructing the Abstraction

I'm now analyzing the implications across epistemic, regulatory, and social-political domains. The key is how agential framing diffuses legal liability and fosters unwarranted trust. An error is portrayed as an ambiguous "cognitive bias" instead of a design choice, obscuring accountability and facilitating potential harms. I am noting how this protects commercial profits, and transfers costs to users.

Deconstructing Political Implications

I'm now analyzing the political implications. I recognize the deferral to unverified systems and the propagation of misinformation. I see multi-agent software as unfairly assigning political legitimacy, and disenfranchising human citizens. Precise, mechanistic language, stripped of agential framing, exposes the critical mechanics of the system and helps the public.

Reframing the Limitations

I am now focusing on reframing agential language to highlight statistical operations. My goal is to show the model's limitations. I'm noting the absence of semantic understanding. Reframing the "expert assistant" to statistical output combats the accountability sink, showing human design. Resistance, though, seems likely.

Refining the Path Forward

I'm now diving into the "Path-Forward" section, focusing on how mechanistic precision impacts discourse. The emphasis is now on technical accuracy, which is key. I'm exploring the implications for safety, liabilities, and audience understanding, considering a high cognitive load for the reader. I am now exploring its implications.

Examining the Approaches

I'm now exploring approaches to balance accessibility and accuracy in describing LLMs. I'm considering three: mechanistic precision, functional anthropomorphism, and discursive multi-tiering. Each presents trade-offs, particularly around responsibility and user understanding. Implementing these requires structural shifts, such as mandating mechanistic translations in academic work.

Refining the Recommendations

I'm now focused on clarifying the recommendations, emphasizing a combination of diverse explainability standards and regulatory frameworks that require corporations to release unhedged capability audits. The goal is to address the balance between discourse communities with conflicting goals, and determine if AI will be recognized as a human-accountable tool or an autonomous authority. I am also polishing the text to make sure all fields are present. I'm now verifying the word counts of all values to meet the minimum requirements, aiming for the high side.

Verifying Text Length

I'm now carefully verifying that all text fields meet minimum word count requirements. I'm focusing on "Task 3," the Critical Observations sections, and the Conclusion. I'm aiming for the higher end of the range, ensuring sufficient length and detail to meet the specified targets. I'm also double-checking the word counts for other required fields, especially those reframed explanations.

Verifying JSON Structure

I'm now carefully verifying JSON structure, ensuring no syntax errors and accurate word counts. I've rechecked for valid double quotes, avoiding nested quote escaping issues. I am now double checking all the JSON values. I have now completed word count verifications for all eight instances in the Metaphor Audit.
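The kind of check described here can be sketched as follows. The field names and minimum counts are hypothetical placeholders; the actual schema and thresholds are not shown in this summary.

```python
import json


def short_fields(raw_json: str, minimums: dict) -> dict:
    """Return {field: word_count} for fields below their minimum word count.

    json.loads raises json.JSONDecodeError if the text is not valid JSON.
    """
    data = json.loads(raw_json)
    shortfalls = {}
    for field, minimum in minimums.items():
        # A missing field counts as zero words.
        count = len(str(data.get(field, "")).split())
        if count < minimum:
            shortfalls[field] = count
    return shortfalls


# Hypothetical record and thresholds, for illustration only.
sample = json.dumps({"conclusion": "one two three", "summary": "a b"})
print(short_fields(sample, {"conclusion": 3, "summary": 5}))
```

Counting whitespace-separated tokens is a rough proxy for word count; a stricter tokenizer could give slightly different numbers.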

Finalizing the Word Counts

I'm now deep in the final push. I'm confirming that every paragraph and section meets the minimum word count, aiming for the higher end of the ranges. I am focusing on the eight critical observation instances. I have also verified the word counts for the conclusion, and I'm polishing the sentences. I have now met all minimum requirements.

Analysis Provenance

Run ID: 2026-05-26-towards-detecting-mitigating-and-explain-metaphor-qly553
Raw JSON: 2026-05-26-towards-detecting-mitigating-and-explain-metaphor-qly553.json
Framework: Metaphor Analysis v6.5
Schema Version: 3.0
Generated: 2026-05-26T10:08:04.904Z

Task 1: Metaphor and Anthropomorphism Audit

1. Cognition as Biological Pathological Process

2. Prompting as Dual-Process Psychology

3. AI as Rational Truth Judge

4. AI as Certified Professional Expert

5. Outputs as Cognitive Pathology

6. AI as Compliance-Challenged Agent

7. AI as Deliberative Agent

8. Software as Democratic Committee

Task 2: Source-Target Mapping

Mapping 1: Human Cognition and Logical Reasoning → LLM token generation

Mapping 2: Dual-Process Psychology → Autoregressive prompt-engineering techniques

Mapping 3: Academic/Judicial Truth Evaluation → Pattern matching with Argumentation Schemes

Mapping 4: Professional/Expert Human Consultation → Text generation using LLaMA 3 70B and search APIs

Mapping 5: Cognitive Psychology and Psychiatric Pathology → Model output sensitivity to prompt phrasing

Mapping 6: Cognitive/Physical Struggle → Low statistical classification metrics

Mapping 7: Cognitive Introspection and Self-Correction → Prompt-driven attention shift

Mapping 8: Democratic Assembly and Ethical Committee → Multi-agent software pipeline architectures

Task 3: Explanation Audit (The Rhetorical Framing of "Why" vs. "How")

Explanation 1

Explanation 2

Explanation 3

Explanation 4

Explanation 5

Task 4: AI Literacy in Practice - Reframing Anthropomorphic Language

Task 5: Critical Observations - Structural Patterns

Agency Slippage

Metaphor-Driven Trust Inflation

Obscured Mechanics

Context Sensitivity

Accountability Synthesis

Conclusion: What This Analysis Reveals

Mechanism of the Illusion:

Material Stakes:

AI Literacy as Counter-Practice:

Path Forward

Extended Processing Summary