
📊 Large Language Model Agent Personality and Response Appropriateness: Evaluation by Human Linguistic Experts, LLM-as-Judge, and Natural Language Processing Model

About

This document presents a Critical Discourse Analysis focused on AI literacy, specifically targeting the role of metaphor and anthropomorphism in shaping public and professional understanding of generative AI. The analysis is guided by a prompt that draws from cognitive linguistics (metaphor structure-mapping), the philosophy of social science (Robert Brown's typology of explanation), and accountability analysis.

All findings and summaries below were generated from detailed system instructions provided to a large language model and should be read critically as interpretive outputs, not guarantees of factual accuracy or authorial intent.


Task 1: Metaphor and Anthropomorphism Audit

About this task

For each of the major metaphorical patterns identified, this audit examines the specific language used, the frame through which the AI is being conceptualized, what human qualities are being projected onto the system, whether the metaphor is explicitly acknowledged or presented as direct description, and, most critically, what implications this framing has for trust, understanding, and policy perception.

V3 Enhancement: Each metaphor now includes an accountability analysis.

1. Software Configuration as Human Personality

Quote: "One way to humanise an agent is to give it a task-congruent personality. ... IAโ€™s introverted nature means it will offer accurate and expert response without unnecessary emotions."

  • Frame: Statistical parameter settings as psychological character traits
  • Projection: This metaphor projects the complex, stable, and internally felt psychological construct of human personality (specifically the Big Five traits) onto a set of temporary system instructions and probability weights. It attributes an internal 'nature' and emotional capacity ('without unnecessary emotions') to the system, suggesting the AI possesses a stable disposition that drives behavior, rather than simply executing a style-transfer task based on a prompt. It implies the system is introverted, rather than simulating introverted lexical patterns.
  • Acknowledgment: Direct (Unacknowledged) (The text states the agent has an "introverted nature" and discusses "inculcating" personality without hedging, treating the software configuration as an ontological equivalent to human personality.)
  • Implications: By framing prompt-based style transfer as 'personality' and 'nature,' the text invites users to anticipate consistent, coherent behavior derived from an internal self, something LLMs cannot provide. This increases the risk of 'ELIZA effect' attachment, where users attribute social accountability and emotional depth to the system. In educational or medical contexts (mentioned in the text), this could lead to misplaced trust in the 'authority' or 'empathy' of an agent that is merely predicting tokens based on a 'friendly' or 'authoritative' system prompt.

Accountability Analysis:

  • Actor Visibility: Hidden (agency obscured)
  • Analysis: The construction 'IA's introverted nature means it will offer' obscures the developers' role. The engineers (Jayakumar et al.) wrote the prompt 'You are a Canadian friendly poetry expert.' The behavior is a direct result of this instruction and the OpenAI model's training, yet the text attributes the behavior to the agent's 'nature.' This displaces responsibility for the output from the prompt engineering choices to an inherent property of the software artifact.
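The mechanistic reading can be made concrete with a minimal sketch, assuming a generic chat-completion message format rather than the authors' actual code; the persona wording below is a hypothetical paraphrase, not the paper's prompt. 'Giving' the agent a personality amounts to prepending a string to the payload the model receives.

```python
# Minimal sketch, not the authors' implementation: a "personality" is a string
# prepended to the conversation. The persona text is a hypothetical paraphrase.

INTROVERT_PERSONA = (
    "You are a poetry expert. Respond concisely, avoid emotive adjectives, "
    "and do not use exclamations."
)

def build_messages(user_question, history=None):
    """Assemble the payload sent to a chat model; the 'nature' is element zero."""
    messages = [{"role": "system", "content": INTROVERT_PERSONA}]
    messages.extend(history or [])
    messages.append({"role": "user", "content": user_question})
    return messages

# Remove the first element and the "introverted nature" disappears with it.
print(build_messages("What is the rhyme scheme of the second stanza?"))
```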

2. Data Processing as Cognitive Grasp

Quote: "questions... which are currently beyond the agentโ€™s cognitive grasp."

  • Frame: Computational limitation as bounded rationality
  • Projection: This projects the human faculty of cognitionโ€”the mental action or process of acquiring knowledge and understanding through thought, experience, and the sensesโ€”onto data processing limits. To say something is beyond a 'cognitive grasp' implies that there is a 'grasp' (understanding) in place, just not for this specific topic. It suggests the system is a 'knower' with a limited scope, rather than a statistical processor with limited training data distribution.
  • Acknowledgment: Direct (Unacknowledged) (The phrase 'agent's cognitive grasp' is used literally to explain failure modes, without scare quotes or qualification, treating the lack of vector associations as a lack of mental reach.)
  • Implications: Framing limitations as 'cognitive grasp' reinforces the illusion of mind even when discussing failure. It suggests the solution is 'teaching' or 'learning' (expanding the grasp) rather than database expansion or algorithm adjustment. This obscures the fundamental difference between human lack of understanding (conceptual) and AI failure (pattern mismatch), potentially leading policymakers to believe these systems can eventually 'understand' nuance if they just 'learn' more, ignoring the structural limitations of probabilistic generation.

Accountability Analysis:

  • Actor Visibility: Hidden (agency obscured)
  • Analysis: The phrase 'beyond the agent's cognitive grasp' makes the agent the subject of the limitation. A mechanistic framing would be 'absent from the training data selected by OpenAI' or 'not retrievable via the RAG architecture designed by the authors.' This semantic move shields the developers and model providers from the specific choice of excluding socio-cultural context from the dataset.

3. Model as Juridical Authority

Quote: "LLM as a Judge is a concept where the Large Language Models will act as a 'judge' to evaluate the responses... You are an intelligent and unbiased judge in personality detection"

  • Frame: Pattern matching as judicial evaluation
  • Projection: This metaphor maps the human role of a judgeโ€”requiring wisdom, ethics, interpretation of law, and conscious deliberationโ€”onto the process of token classification. The prompt explicitly tells the model 'You are... unbiased,' projecting the human capacity for fairness and ethical neutrality onto a statistical model that fundamentally reproduces training data biases. It implies the system can evaluate 'quality' and 'appropriateness' rather than just similarity to training examples.
  • Acknowledgment: Explicitly Acknowledged (The text initially uses scare quotes ('judge') and introduces it as a 'concept,' acknowledging the metaphorical status before adopting it as a functional term ('Judge LLM').)
  • Implications: Labeling an LLM a 'Judge' and claiming it is 'unbiased' constructs a dangerous authority. It legitimizes the automation of evaluation in sensitive domains (like education or hiring). If users believe the system is a 'Judge' capable of 'reasoning' (as requested in the prompt), they are less likely to audit the outputs for the statistical regression to the mean or bias that actually drives the 'judgment.' This risks cementing model outputs as objective standards.

Accountability Analysis:

  • Actor Visibility: Partial (some attribution)
  • Analysis: The authors acknowledge selecting Google's Gemini to avoid self-agreement bias, but the prompt itself ('You are an intelligent and unbiased judge') delegates the responsibility for fairness to the model. If the 'Judge' is biased, the text frames it as a property of the judge ('Judge LLM is biased towards introvert traits'), rather than a failure of the engineers to calibrate the evaluation metric or a result of Google's RLHF tuning.

4. Error as Psychopathology (Hallucination)

Quote: "The agent may hallucinate or fail on questions that are not directly answerable from the text."

  • Frame: Factual error as perceptual disorder
  • Projection: Using 'hallucinate' projects human biological and psychological vulnerability onto the system. In humans, hallucination is a disconnect between sensory input and perception. In AI, 'hallucination' is simply the system functioning correctly (predicting likely tokens) but generating factually false content. This projection anthropomorphizes the error, suggesting a 'mind' that is temporarily confused, rather than a probabilistic engine that has no concept of truth.
  • Acknowledgment: Direct (Unacknowledged) (The term is used as a standard technical descriptor without qualification or alternative mechanistic explanation in the classification of question types.)
  • Implications: The 'hallucination' frame implies the system generally 'knows' the truth but is having a glitch. It obscures the reality that the model never knows the truth; it only knows probability. This distinction is vital for liability: if a system 'hallucinates,' it sounds like an accident. If a system 'fabricates information based on probability weights,' it sounds like a design feature that requires strict oversight before deployment in critical sectors.

Accountability Analysis:

  • Actor Visibility: Hidden (agency obscured)
  • Analysis: The agent is the actor that 'hallucinates.' This obscures the decision by developers to use a generative model for an information retrieval task without sufficient constraints. It erases the nature of the technology (which is designed to confabulate plausible text) and frames the output as an anomaly of the agent's behavior, rather than a direct result of the architecture chosen by the researchers.

5. Inculcating Personality

Quote: "The personality of both the agents are inculcated using the technique of Prompt Engineering."

  • Frame: Instruction as pedagogy/socialization
  • Projection: The verb 'inculcate' (to instill by persistent instruction) implies a pedagogical relationship where the agent learns and internalizes values or traits. This projects a developmental psychology frame onto the mechanic of context injection. It suggests the 'personality' becomes a stable, internal part of the agent's constitution, whereas technically, it is just a pre-pended text string that influences the probability distribution of the immediate session.
  • Acknowledgment: Direct (Unacknowledged) (The text uses 'inculcated' as a factual description of the engineering process, treating the prompt as a method of instilling deep traits.)
  • Implications: This framing exaggerates the stability and depth of the behavioral modification. It suggests the 'agent' has been fundamentally altered or educated. This creates a false sense of consistency for the user. If a user believes a trait has been 'inculcated,' they expect it to hold up under pressure or complex questioning, potentially leading to trust failures when the model reverts to default training behaviors (catastrophic forgetting or context window overflow).

Accountability Analysis:

  • Actor Visibility: Hidden (agency obscured)
  • Analysis: While 'Prompt Engineering' is mentioned, the passive voice 'are inculcated' hides the specific agency of the authors. The authors wrote the prompts. If the personality is toxic or inappropriate, the 'inculcation' frame diffuses this into a process, rather than a specific authorship decision. It suggests a transfer of traits rather than a configuring of filters.

6. Generative Mimicry as Behavior

Quote: "โ€œAgentsโ€... refer to generative agents which are software entities that leverage generative artificial intelligence models to simulate and mimic human behaviour"

  • Frame: Output generation as behavioral agency
  • Projection: While 'mimic' is a relatively accurate verb, coupling it with 'human behaviour' suggests the software is performing actions in the world (behavior) rather than outputting symbols (text). It projects the complexity of human social action onto the generation of strings. It suggests the agent behavesโ€”has agency, intent, and impactโ€”rather than simply processing inputs and outputs.
  • Acknowledgment: Explicitly Acknowledged (The text acknowledges the definition of 'Agents' and uses 'simulate and mimic,' which indicates an awareness of the distinction, though it quickly slips into treating the behavior as real.)
  • Implications: Framing text output as 'behaviour' flattens the ontology of action. It allows for the evaluation of AI on social terms (is it polite? is it introverted?) rather than functional terms (is it accurate? is it safe?). This shift invites social trust and emotional engagement from users, which is the precise vulnerability that 'social engineering' exploits. It primes the user to treat the artifact as a subject.

Accountability Analysis:

  • Actor Visibility: Partial (some attribution)
  • Analysis: The text mentions 'software entities that leverage... models.' This creates a chain of removal: the authors build the agent, the agent leverages the model, the model mimics behavior. The ultimate responsibility for the 'behavior' is diffused across this chain. The 'agent' becomes the primary actor in the sentence, obscuring the human intent behind the simulation.

7. Expertise and Knowledge

Quote: "This poetry agent is an 'expert' on this poem... deep knowledge of various forms and styles"

  • Frame: Database retrieval as intellectual expertise
  • Projection: This projects the human quality of 'expertise'โ€”which involves experience, judgment, context, and justified beliefโ€”onto the retrieval of vectorized text. The prompt explicitly claims 'deep knowledge.' This attributes an epistemic state (knowing) to a system that possesses only retrievable patterns. It suggests the system understands the meaning of poetry, not just the co-occurrence of words about poetry.
  • Acknowledgment: Hedged/Qualified (The text uses scare quotes for 'expert' initially ('This poetry agent is an "expert"') but then drops them in the prompt instruction ('You are a Canadian friendly poetry expert').)
  • Implications: Calling the system an 'expert' with 'deep knowledge' creates epistemic warrant where none exists. Users are encouraged to defer to the system's output as authoritative. In domains like poetry, this risks homogenizing interpretation; in domains like law or medicine, it risks malpractice. It conceals the fact that the 'knowledge' is actually just a statistical aggregate of training texts, possibly containing errors or hallucinations.

Accountability Analysis:

  • Actor Visibility: Hidden (agency obscured)
  • Analysis: The prompt 'You are a... poetry expert' creates a fictional persona. The accountability for the accuracy of the 'expertise' is displaced onto this persona. If the agent makes a mistake, it is a failure of the 'expert,' not a failure of the database curation or the retrieval algorithm designed by the authors.

8. Agent Reflection

Quote: "The IA features โ€œreflectionโ€, โ€œlacks socialโ€, โ€œavoids directโ€, and โ€œsolitaryโ€ which are to be expected from the definition of introverted-ness."

  • Frame: Data processing as internal psychological state
  • Projection: The text discusses 'reflection'โ€”a complex metacognitive process of self-analysisโ€”as a feature of the agent's output. It attributes an inner mental life ('solitary,' 'reflection') to the software. It confuses the semantic content of the output (words about reflection) with the process of the system (which is not reflecting, but calculating).
  • Acknowledgment: Direct (Unacknowledged) (These terms are treated as valid data points or features detected by the Judge LLM, confirming them as properties of the agent.)
  • Implications: Validating that an agent 'features reflection' reinforces the illusion that the system has an inner life. This is a critical component of the 'sentience' fallacy. It suggests that the system is not just talking about reflection, but is reflective. This builds a false model of the system's capabilities, leading users to overestimate its ability to self-correct or understand ethical nuance.

Accountability Analysis:

  • Actor Visibility: Ambiguous/Insufficient Evidence
  • Analysis: The text says 'The IA features...' attributing these qualities to the agent. It is unclear if 'reflection' refers to the Judge LLM's classification label or the agent's actual process. In either case, the human decision to interpret specific token patterns as 'reflection' is obscured.

Task 2: Source-Target Mapping

About this task

For each key metaphor identified in Task 1, this section provides a detailed structure-mapping analysis. The goal is to examine how the relational structure of a familiar "source domain" (the concrete concept we understand) is projected onto a less familiar "target domain" (the AI system). By restating each quote and analyzing the mapping carefully, we can see precisely what assumptions the metaphor invites and what it conceals.

Mapping 1: Human Developmental Psychology/Ontology → System Prompt/Hyperparameter Configuration

Quote: "One way to humanise an agent is to give it a task-congruent personality."

  • Source Domain: Human Developmental Psychology/Ontology
  • Target Domain: System Prompt/Hyperparameter Configuration
  • Mapping: The mapping treats the configuration of a software interface (target) as the cultivation of a human being's character (source). It assumes that a text generator has a 'self' that can be 'humanised' and that 'personality' is a modular component that can be 'given' or installed. It implies that the resulting behavior is an expression of this internal character.
  • What Is Concealed: This conceals the mechanistic reality that 'personality' here is merely a constraint on vocabulary choice and sentence length imposed by a system instruction. It hides the fact that the system has no preferences, no mood, and no stable identity. It obscures the labor of the prompt engineer who writes the script the model follows.

Mapping 2: Conscious Mind/Embodied Cognition → Training Data Distribution/Vector Space Coverage

Quote: "concepts... which are currently beyond the agentโ€™s cognitive grasp."

  • Source Domain: Conscious Mind/Embodied Cognition
  • Target Domain: Training Data Distribution/Vector Space Coverage
  • Mapping: The mapping treats the limitations of a database and pattern-matching algorithm (target) as the limitations of a conscious mind's understanding (source). 'Grasp' implies an attempt to understand that falls short due to complexity. It assumes the system is trying to understand.
  • What Is Concealed: It conceals the fact that the system has no 'grasp' of anything, even simple concepts. It obscures the absence of groundingโ€”the system processes symbols without reference to the real world. It also hides the specific data curation choices: the concept isn't 'beyond its grasp'; it's 'absent from its dataset.'
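A toy retrieval check makes the concealment visible: 'beyond the agent's cognitive grasp' corresponds to a query vector whose best match in the curated store falls below a similarity threshold. This is a minimal sketch; the vectors, names, and threshold are illustrative assumptions, not the paper's system.

```python
# Minimal sketch, assuming a toy embedding table: a query "beyond the grasp" is
# simply one whose nearest stored vector scores below a similarity threshold.

import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

document_vectors = {
    "stanza_structure": np.array([0.9, 0.1, 0.0]),
    "rhyme_scheme": np.array([0.8, 0.2, 0.1]),
}
# Illustrative: a socio-cultural question with no nearby content in the corpus.
query_vector = np.array([0.0, 0.1, 0.95])

scores = {name: cosine(vec, query_vector) for name, vec in document_vectors.items()}
best_name, best_score = max(scores.items(), key=lambda kv: kv[1])

THRESHOLD = 0.5
if best_score < THRESHOLD:
    # The failure is a property of the curated corpus, not of any 'grasp'.
    print(f"No relevant chunk retrieved (best match {best_name}: {best_score:.2f})")
```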

Mapping 3: Juridical/Expert Human Authority → Pattern Recognition/Token Classification Task

Quote: "You are an intelligent and unbiased judge in personality detection... Evaluate the language used"

  • Source Domain: Juridical/Expert Human Authority
  • Target Domain: Pattern Recognition/Token Classification Task
  • Mapping: The mapping treats the output of a statistical model (target) as the reasoned judgment of a qualified human expert (source). It assumes the model attempts to be 'fair' or 'unbiased' in a moral sense, rather than simply minimizing a loss function based on training data.
  • What Is Concealed: This conceals the lack of reasoning. The model does not 'evaluate'; it calculates the probability that a specific text input correlates with the token 'Introvert' or 'Extrovert' based on training correlations. It hides the potential for 'bias' to be a statistical artifact rather than a moral failing. It explicitly hides the black-box nature of the decision-making process.
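The 'Judge LLM' pattern can be reduced to its mechanics in a few lines. This is a minimal sketch with hypothetical names throughout: the "judgment" is a templated prompt plus string parsing of whatever label token the model happens to return.

```python
# Minimal sketch of the 'LLM as a Judge' pattern; all names are hypothetical.

JUDGE_TEMPLATE = (
    "Classify the following response as Introvert or Extrovert. "
    "Answer with a single word.\n\nResponse: {response}"
)

def judge(agent_response, call_model):
    """call_model is any callable mapping a prompt string to generated text."""
    raw = call_model(JUDGE_TEMPLATE.format(response=agent_response))
    # The 'verdict' is whichever label string the output happens to contain.
    return "Introvert" if "introvert" in raw.lower() else "Extrovert"

def stub_model(prompt):
    # Stand-in for a real API call (e.g., to Gemini in the paper's setup).
    return "Introvert"

print(judge("I prefer to reflect on the poem alone.", stub_model))
```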

Mapping 4: Psychopathology/Perception → Probabilistic Token Generation Errors

Quote: "The agent may hallucinate... on questions that are not directly answerable"

  • Source Domain: Psychopathology/Perception
  • Target Domain: Probabilistic Token Generation Errors
  • Mapping: The mapping treats the generation of factually incorrect text (target) as a perceptual error or mental break (source). It assumes the system has a 'normal' state of perceiving truth and occasionally deviates into 'hallucination.'
  • What Is Concealed: It conceals the fact that the model functions exactly the same way when telling the truth as when lying: it predicts the next likely token. It hides the absence of a truth-function in the architecture. It obscures the danger that the system is designed to be a plausible text generator, not a fact retriever.
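A small sampling sketch illustrates the absence of a truth function. The tokens and probabilities below are invented; the point is that nothing in this step consults a fact store, so a false continuation is produced by exactly the same arithmetic as a true one.

```python
# Minimal sketch: next-token sampling runs the same arithmetic whether the
# resulting sentence is true or false. Tokens and probabilities are invented.

import random

def sample_next_token(distribution):
    tokens, weights = zip(*distribution.items())
    return random.choices(tokens, weights=weights, k=1)[0]

# Hypothetical continuations of "The poem was first published in ..."
next_token_probs = {"1847": 0.42, "1852": 0.31, "Toronto": 0.15, "silence": 0.12}

# A wrong year is produced by exactly the same mechanism as a right one.
print(sample_next_token(next_token_probs))
```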

Mapping 5: Human Character/Disposition → Instruction-Following Constraints on Lexical Output

Quote: "IAโ€™s introverted nature means it will offer accurate and expert response without unnecessary emotions."

  • Source Domain: Human Character/Disposition
  • Target Domain: Instruction-following constraints on lexical output
  • Mapping: The mapping treats specific constraints on word choice (e.g., avoid emotive words, keep sentences short) (target) as a deep psychological disposition (source). It assumes that the text output is a symptom of an inner state ('nature').
  • What Is Concealed: It conceals the instructional nature of the behavior. The system isn't 'introverted'; it is 'following the instruction to be concise.' It hides the fragility of the behavior: a single prompt injection could make the 'introvert' scream profanities, which is not true of a human with a stable introverted nature.

Mapping 6: Human Social Relationship → Automated Text Generation Loop

Quote: "LLMs are used to create highly engaging interactive applications... providing companionship"

  • Source Domain: Human Social Relationship
  • Target Domain: Automated Text Generation Loop
  • Mapping: The mapping treats a text-generation loop (target) as a social bond or 'companionship' (source). It assumes that the exchange of text constitutes a relationship and that the 'engagement' is mutual.
  • What Is Concealed: It conceals the one-sided nature of the interaction. The user engages; the system processes. It hides the economic model: the 'companionship' is a service provided for data harvesting or subscription fees. It obscures the lack of reciprocity and care in the system.

Mapping 7: Human Episodic Memory → Context Window/Token Buffer

Quote: "The agent has the capability to maintain the chat history to provide contextual continuity"

  • Source Domain: Human Episodic Memory
  • Target Domain: Context Window/Token Buffer
  • Mapping: The mapping treats the re-injection of previous tokens into the current prompt (target) as 'maintaining history' or memory (source). It assumes the agent 'remembers' the conversation.
  • What Is Concealed: It conceals the computational cost and the hard limit (context window) of this 'memory.' It hides the fact that the agent effectively dies and is reborn with every new prompt, simply reading the transcript of the 'past' each time. It obscures the lack of continuous existence.
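The re-injection mechanism can be sketched directly. This is a minimal illustration, assuming a character budget in place of a real token-based context window; the turns are invented. "Maintaining chat history" means re-sending prior turns inside every new prompt, trimmed to fit.

```python
# Minimal sketch: "memory" is a transcript pasted into each new prompt and
# trimmed to a fixed budget. MAX_CHARS stands in for a token-based limit.

MAX_CHARS = 400

def build_prompt(history, new_question):
    transcript = ""
    # Keep the most recent turns that still fit; older turns simply fall away.
    for speaker, text in reversed(history):
        line = f"{speaker}: {text}\n"
        if len(transcript) + len(line) > MAX_CHARS:
            break
        transcript = line + transcript
    return transcript + f"User: {new_question}\nAgent:"

history = [
    ("User", "Who wrote this poem?"),
    ("Agent", "The retrieved notes attribute it to the poet named in the corpus."),
]
print(build_prompt(history, "What imagery appears in the second stanza?"))
```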

Mapping 8: Human Epistemic Possession → Database of Textual Patterns

Quote: "Deep knowledge of various forms and styles of poetry"

  • Source Domain: Human Epistemic Possession
  • Target Domain: Database of Textual Patterns
  • Mapping: The mapping treats the statistical accessibility of patterns in a database (target) as the possession of 'knowledge' (source). It assumes the system holds information in a way that allows for contemplation and understanding.
  • What Is Concealed: It conceals the absence of semantic understanding. The system has tokens, not concepts. It hides the dependency on the training data's copyright and quality. It conceals the inability of the system to explain why it 'knows' what it knows (lack of justified true belief).

Task 3: Explanation Audit (The Rhetorical Framing of "Why" vs. "How")

About this task

This section audits the text's explanatory strategy, focusing on a critical distinction: the slippage between "how" and "why." Based on Robert Brown's typology of explanation, this analysis identifies whether the text explains AI mechanistically (a functional "how it works") or agentially (an intentional "why it wants something"). The core of this task is to expose how this "illusion of mind" is constructed by the rhetorical framing of the explanation itself, and what impact this has on the audience's perception of AI agency.

Explanation 1

Quote: "IAโ€™s introverted nature means it will offer accurate and expert response without unnecessary emotions."

  • Explanation Types:

    • Dispositional: Attributes tendencies or habits (Why it tends to act certain way)
    • Intentional: Refers to goals/purposes, presupposes deliberate design (Why it appears to want something)
  • Analysis (Why vs. How Slippage): This explanation frames the AI agentially. By attributing an 'introverted nature' to the IA (Introvert Agent), the text explains the output (accurate responses, no emotions) as a consequence of this internal disposition. It suggests the agent acts this way because of who it is. This obscures the mechanistic reality: the system outputs specific tokens because the prompt instructed it to be 'direct' and 'concise.' The 'nature' is a reification of the prompt instructions.

  • Consciousness Claims Analysis: The passage projects conscious intent and stable identity ('nature'). It contrasts 'offering response' (agential) with 'unnecessary emotions' (implying the agent could have emotions but chooses not to). The claim that the agent 'will offer' implies a volition. A mechanistic description would be: 'The system prompt penalizes emotive tokens and rewards concise sequences, resulting in outputs that mimic introverted speech patterns.' The text suffers from the curse of knowledge: the authors know they designed the prompt to simulate introversion, so they attribute the resulting behavior to the agent's 'introverted nature.'

  • Rhetorical Impact: This framing creates a sense of reliability and coherent identity. Users are led to trust the 'introvert' not just as a tool, but as a personality type they can understand socially. It masks the risk that the 'nature' is entirely superficial and can be broken by a single contradictory user prompt.


Explanation 2

Quote: "Langchainโ€™s retrieval mechanism is powered by the Retrieval Augmented Generation (RAG) technique... allows it to generate accurate, domain-specific responses"

  • Explanation Types:

    • Functional: Explains behavior by role in self-regulating system with feedback (How it works within system)
  • Analysis (Why vs. How Slippage): This is a predominantly mechanistic explanation. It describes the 'how': RAG technique, retrieval chain, document fetching. It identifies the components (retriever, LLM) and their roles. This emphasizes the architecture and data flow, obscuring less than the agential explanations. However, it still frames the mechanism as 'allowing' the system to 'generate accurate... responses', slightly glossing over the probabilistic nature of that generation.

  • Consciousness Claims Analysis: This passage is relatively clean of consciousness projection. It uses 'generate' rather than 'know' or 'think.' It attributes the accuracy to the 'external context' provided, not the model's mind. However, it asserts the output is 'accurate' as a result of the mechanism, which is an epistemic claim about the reliability of RAG that may not always hold (RAG can still hallucinate). It describes the process technically.

  • Rhetorical Impact: This builds technical credibility. It assures the reader that there is a 'mechanism' ensuring accuracy, grounded in engineering ('powered by', 'technique'). It creates trust in the system's output through the logic of architectural soundness rather than personality.
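The retrieve-then-generate step described here can be sketched without any agential vocabulary. This is a minimal illustration with hypothetical helpers and a toy keyword retriever standing in for LangChain's embedding-based retrieval: retrieved chunks are pasted into the prompt as plain text, and output quality depends on what that retrieval step happens to supply.

```python
# Minimal sketch of a retrieve-then-generate step; helpers and corpus are toy
# stand-ins, not the paper's LangChain pipeline.

def retrieve(query, corpus, k=2):
    def overlap(text):
        return sum(word in text.lower() for word in query.lower().split())
    ranked = sorted(corpus.values(), key=overlap, reverse=True)
    return ranked[:k]

def build_rag_prompt(query, corpus):
    context = "\n".join(retrieve(query, corpus))
    return (
        "Use the following context to answer.\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

corpus = {
    "doc1": "The poem uses an ABAB rhyme scheme throughout.",
    "doc2": "Critics note the recurring imagery of winter light.",
}
print(build_rag_prompt("What rhyme scheme does the poem use?", corpus))
```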

Explanation 3

Quote: "The agent may hallucinate or fail on questions that are not directly answerable from the text... beyond the agentโ€™s cognitive grasp."

  • Explanation Types:

    • Dispositional: Attributes tendencies or habits (Why it tends to act certain way)
    • Theoretical: Embeds in deductive framework, may invoke unobservable mechanisms (How it is structured)
  • Analysis (Why vs. How Slippage): This mixes dispositional framing ('may hallucinate', a tendency) with a pseudo-theoretical explanation ('cognitive grasp'). It frames the failure as a limitation of the agent's mind/ability. It obscures the mechanistic cause: low probability scores for factual tokens or absence of relevant tokens in the vector store. It frames the 'why' as a lack of mental reach.

  • Consciousness Claims Analysis: This is a strong consciousness projection. 'Cognitive grasp' explicitly attributes cognition (mental process) to the agent. 'Hallucinate' attributes perception. It implies the agent could grasp it if it were smarter. A technical description would be: 'The model generates low-confidence or erroneous tokens when the query vector does not align sufficiently with document embeddings.' The authors project their own model of 'understanding' onto the system's data retrieval failure.

  • Rhetorical Impact: This framing softens the failure. 'Beyond cognitive grasp' sounds like a student who hasn't learned enough yet, implying potential for growth. 'Hallucination' sounds like a temporary glitch. This maintains trust in the fundamental potential of the agent, framing errors as developmental stages rather than fundamental architectural limitations of probabilistic generation.

Explanation 4

Quote: "Judge LLM is biased towards introvert traits... This seems to indicate that the Judge LLM is biased towards introvert traits."

  • Explanation Types:

    • Empirical Generalization: Subsumes events under timeless statistical regularities (How it typically behaves)
  • Analysis (Why vs. How Slippage): The explanation observes a regularity ('biased towards') based on output frequency (Empirical Generalization). It treats the bias as a property of the model. This obscures the genetic explanation (originating in training data or RLHF tuning by Google). It presents the bias as a mysterious trait of the 'Judge' rather than a direct result of its design and data provenance.

  • Consciousness Claims Analysis: It attributes a 'bias' (a human cognitive or social inclination) to the system. While 'bias' is a technical term in ML, here it is used in the social sense (preferring introverts). It attributes an evaluative preference to the system. It avoids consciousness verbs but implies an unobservable internal preference structure. It fails to explain the mechanism (e.g., 'training data contained more text labeled as introverted' or 'safety tuning prefers cautious/reserved language').

  • Rhetorical Impact: This frames the LLM as an imperfect human-like judge (subjective) rather than a flawed instrument. It suggests we need to 'correct' its opinion, rather than re-engineer its weights. It anthropomorphizes the error, making the system seem like a biased person.
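As a counterweight to the person-like framing, the 'bias' observation reduces to a frequency count over output labels. This is a minimal sketch with invented labels; a real audit would use the study's actual classification results.

```python
# Minimal sketch: 'bias towards introvert traits' as a skewed label count,
# not an attitude. Labels are invented for illustration.

from collections import Counter

judge_labels = ["Introvert", "Introvert", "Introvert", "Extrovert", "Introvert", "Extrovert"]

counts = Counter(judge_labels)
total = len(judge_labels)
for label, n in counts.most_common():
    print(f"{label}: {n}/{total} = {n / total:.0%}")

# Explaining the skew requires examining training data and tuning choices,
# not attributing a preference to the "judge".
```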

Explanation 5

Quote: "You are a Canadian friendly poetry expert... Use the following context to answer... Tone: Conversational"

  • Explanation Types:

    • Intentional: Refers to goals/purposes, presupposes deliberate design (Why it appears to want something)
  • Analysis (Why vs. How Slippage): This is the prompt itself, which serves as the genetic explanation for the agent's behavior. It frames the agent's existence intentionally ('You are...'). It commands the agent to adopt a persona. This effectively programs the 'why' of the agent's behavior: it acts this way because it was told to be this person. It emphasizes the simulation of identity.

  • Consciousness Claims Analysis: The prompt commands the system to 'be' an expert. It commands the system to 'answer like a human.' This is the root of the consciousness projection: the system is explicitly instructed to simulate a knower. The authors then analyze the output as if the system became the knower. The text assumes the system can 'use context' (process) to 'answer like a human' (simulate agency).

  • Rhetorical Impact: This creates the entire fiction of the paper. By commanding 'You are,' the authors create the character that the rest of the paper analyzes. It sets up the reader to accept the 'expert' framing because the system was 'told' to be one.

Task 4: AI Literacy in Practice - Reframing Anthropomorphic Language

About this task

This section proposes alternative language for key anthropomorphic phrases, offering more mechanistic and precise framings that better reflect the actual computational processes involved. Each reframing attempts to strip away the projections of intention, consciousness, or agency that are embedded in the original language.

V3 Enhancement: Each entry includes a fourth field, Human Agency Restoration, which reframes agentless constructions to name the humans responsible for design and deployment decisions.

1. Original: "One way to humanise an agent is to give it a task-congruent personality."
  • Mechanistic Reframing: One way to align the model's output style with user expectations is to prompt it to simulate specific lexical patterns associated with human character archetypes.
  • Technical Reality Check: Models classify and generate tokens based on training data correlations; they do not possess personality or humanity to be 'given' or enhanced.
  • Human Agency Restoration: Jayakumar et al. chose to design system prompts that mimic specific human social traits to increase user engagement.

2. Original: "IA's introverted nature means it will offer accurate and expert response without unnecessary emotions."
  • Mechanistic Reframing: The model, when prompted with instructions to simulate an introvert, generates text that is concise and lacks emotive adjectives, consistent with the statistical distribution of 'introverted' text in its training data.
  • Technical Reality Check: The system processes input vectors and predicts tokens; it has no 'nature' or 'emotions' to suppress, only probability weights favoring neutral vocabulary.
  • Human Agency Restoration: The authors configured the system prompt to penalize emotional language and reward brevity.

3. Original: "concepts... which are currently beyond the agent's cognitive grasp."
  • Mechanistic Reframing: Concepts that are not sufficiently represented in the vector embeddings or the retrieved context documents, resulting in low-probability or generic outputs.
  • Technical Reality Check: The system matches patterns; it does not 'grasp' concepts. Failure is a lack of data correlation, not a limit of cognitive understanding.
  • Human Agency Restoration: N/A; describes computational processes without displacing responsibility (though it obscures data curation).

4. Original: "The agent may hallucinate or fail on questions"
  • Mechanistic Reframing: The model may generate grammatically correct but factually inconsistent sequences when the probabilistic associations for accurate information are weak.
  • Technical Reality Check: The model generates the most probable next token; it does not perceive reality or 'hallucinate' deviations from it.
  • Human Agency Restoration: The developers chose to use a generative model for a factual retrieval task, introducing the risk of fabrication.

5. Original: "You are an intelligent and unbiased judge in personality detection"
  • Mechanistic Reframing: Processing instruction: classify the input text into 'Introvert' or 'Extrovert' categories based on pattern matching with training data definitions.
  • Technical Reality Check: The model calculates similarity scores; it does not judge, possess intelligence, or hold bias in the cognitive sense.
  • Human Agency Restoration: The researchers instructed the model to simulate the role of a judge and defined the criteria for classification.

6. Original: "This poetry agent is an 'expert' on this poem with deep knowledge"
  • Mechanistic Reframing: This instance of the model has access to a vector database containing the poem and related critical analyses, allowing it to retrieve relevant text segments.
  • Technical Reality Check: The system retrieves and rephrases text; it does not 'know' the poem or possess expertise.
  • Human Agency Restoration: The authors curated a dataset of poems and prompted the system to present retrieved information in an authoritative style.

7. Original: "The IA features 'reflection', 'lacks social'... which are to be expected from the definition of introverted-ness."
  • Mechanistic Reframing: The text generated by the model contained semantic clusters related to reflection and solitude, matching the target lexical distribution for the 'introvert' prompt.
  • Technical Reality Check: The model outputs words about reflection; it does not possess the mental feature of reflection.
  • Human Agency Restoration: N/A; describes output characteristics.

8. Original: "Simulate and mimic human behaviour"
  • Mechanistic Reframing: Generate text sequences that statistically resemble transcripts of human interaction.
  • Technical Reality Check: The system outputs text; it does not behave. 'Behavior' implies agency and consequence in the physical/social world.
  • Human Agency Restoration: Engineers design software to output text that users will interpret as meaningful social behavior.

Task 5: Critical Observations - Structural Patterns

Agency Slippage

The text demonstrates a persistent, rhythmic oscillation between mechanical construction and agential performance. In the 'Methodology' section (3.1), agency is largely human or mechanical: 'We developed,' 'The conversational agents are built,' 'Langchain's retrieval mechanism is powered.' Here, the authors and the code are the actors. However, as soon as the text moves to 'Agent Personality Prompting' (3.1.3) and 'Results' (5), agency slips dramatically to the software. The prompt instructions ('You are a Canadian friendly poetry expert') act as the pivotal moment of slippage: a linguistic speech act that theoretically transforms the software into a subject. Following this, the text asserts 'IA's introverted nature means it will offer' and 'The agent... is an expert.' The authors fade; the 'agent' takes over. This slippage functions to validate the experiment: if the software were described purely as 'a script outputting tokens,' the study of its 'personality' would appear category-mistaken. By granting the software agency ('It offers,' 'It avoids'), the authors create the necessary ontological ground for their psychological analysis. The 'curse of knowledge' is evident: the authors know the prompt they wrote, but they analyze the output as if it emanates from the agent's internal 'nature,' effectively forgetting their own authorship in favor of the illusion they created.

Metaphor-Driven Trust Inflation

The text constructs a comprehensive architecture of trust through the metaphors of 'Expert,' 'Judge,' and 'Personality.' By labeling the evaluative model a 'Judge LLM' and commanding it to be 'unbiased,' the text borrows the immense social capital of the legal/judicial system. This implies that the model's outputs are not just calculations but judgmentsโ€”reasoned, fair, and authoritative. Similarly, calling the agent a 'Poetry Expert' with 'deep knowledge' signals to the user (and reader) that the system is a reliable source of truth, obscuring the statistical and potentially hallucination-prone nature of RAG systems. The 'Personality' metaphor further builds trust by suggesting consistency; if an agent is 'introverted,' I can trust it to behave in a specific, predictable way. This shifts the basis of trust from performance-based (is the output correct?) to relation-based (do I understand this entity's character?). This is dangerous for statistical systems, as they do not have a character to be true to; they only have a probability distribution that can shift unpredictably with input noise.

Obscured Mechanics

The anthropomorphic language systematically hides the industrial and technical realities of the system. First, the 'Personality' framing hides the fragility of prompt engineering. By calling it 'inculcating personality,' the text obscures the fact that this is merely a 'system message' that can be bypassed (jailbroken). Second, the 'Judge' metaphor hides the corporate alignment of the models. The text notes the 'Judge LLM is biased towards introvert traits' but frames this as a quirk of the judge, rather than a result of OpenAI's or Google's safety tuning (RLHF) which creates models that are 'helpful, harmless, and honest', traits that statistically overlap with 'introversion' (cautious, polite, reserved). The 'name the corporation' test reveals this: 'Google's Gemini model classifies text as introverted because Google trained it to prefer safe, non-confrontational speech.' Finally, 'Cognitive Grasp' hides the data curation labor. It implies the agent has a mind that can't reach far enough, rather than a database that humans (the authors) failed to populate with sufficient socio-cultural context.

Context Sensitivity

The distribution of anthropomorphism creates a 'sandwich' effect. The Abstract and Introduction are highly agential ('mimic human behaviour,' 'possess personality'). The middle technical sections (3.1.1, 3.1.2) briefly ground the reader in 'vectors,' 'embeddings,' and 'frameworks.' However, section 3.1.3 ('Prompting') re-ignites the anthropomorphism, which intensifies through the Evaluation and Results. Crucially, capabilities are framed agentially ('Expert,' 'maintains chat history'), while limitations are framed with a mix of agential ('hallucinate,' 'cognitive grasp') and mechanical ('lack of training material') terms. The text uses the mechanical language to explain why the agent fails (lack of data), but uses the agential language to describe what the failure is (hallucination). This asymmetry preserves the illusion of the agent's core competence: it is a knower, just an under-resourced one. The 'Judge' prompt is the peak of intensity, demanding the system 'be' an intelligent judge, showing how the rhetorical goal of the experiment (simulating personality) requires maximal anthropomorphism.

Accountability Synthesis

Accountability Architecture

This section synthesizes the accountability analyses from Task 1, mapping the text's "accountability architecture": who is named, who is hidden, and who benefits from obscured agency.

The text creates an 'accountability sink' where human decisions are washed away into the 'nature' of the agent. The 'named actors' (Jayakumar, Mukherjee, Dash) design the system, but the 'hidden actors' (the agents) take the blame for behavior. When the text says 'The agent may hallucinate,' it removes the authors' responsibility for choosing a non-deterministic model for a factual task. When it says 'Judge LLM is biased,' it removes Google's responsibility for the model's RLHF tuning. The 'Accountability Analysis' reveals a pattern: successes are shared (the authors developed the agent, the agent performed well), but the 'personality' and 'bias' are treated as independent properties of the software. If a user were harmed by the 'Introvert Agent' giving bad medical advice (a use case mentioned in the intro), the text's framing suggests the fault lies with the agent's 'cognitive grasp' or 'nature,' diffusing the legal liability of the deployers. Naming the actors forces a shift: 'Jayakumar et al.'s script caused the OpenAI model to generate false text.' This clarity is exactly what the 'Personality' metaphor dissolves.

Conclusion: What This Analysis Reveals

The Core Finding

The discourse is dominated by two interlocking metaphorical patterns: 'Software as Psychological Subject' (Personality, Introvert/Extrovert) and 'Processing as Institutional Authority' (Judge, Expert). The foundational pattern is the psychological one: the assumption that a set of prompt instructions constitutes an internal 'nature' or 'personality.' This assumption bears the load of the entire paper; without it, the research is simply a study of lexical style transfer. The 'Judge' metaphor relies on this foundation, assuming that one software subject is capable of assessing the 'character' of another. These patterns systematically convert stylistic surface features (word choice, sentence length) into deep ontological states (introversion, reflection, bias).

Mechanism of the Illusion:

The illusion of mind is constructed through a 'performative speech act' in the prompt engineering phase. The authors command the system: 'You are a Canadian friendly poetry expert.' They then treat the system's compliance with this command not as obedience to a script, but as evidence of a successful 'inculcation' of traits. The illusion is amplified by the 'curse of knowledge': the authors evaluate the system using the very criteria (Big Five) they used to prompt it, creating a tautological feedback loop. They mistake the mirror for a window, seeing a 'personality' where they are actually seeing their own prompt reflected back. The temporal structure supports this: the 'setup' is mechanical, but the 'performance' is agential, leading the reader to forget the mechanism once the dialogue begins.

Material Stakes:

Categories: Epistemic, Institutional, Social/Political

The material stakes of this framing are significant. Epistemically, framing LLMs as 'Experts' and 'Judges' degrades the standard of knowledge. If educational institutions (as suggested by the paper's reference to Bloom's taxonomy) adopt these 'Judge' systems to grade student work, they are subjecting students to opaque, probabilistic bias disguised as 'intelligent evaluation.' Institutionally, the reliance on 'Judge LLMs' creates a dangerous closed loop where AI evaluates AI, potentially amplifying errors and biases without human oversight, validated by the label 'Judge.' Socially, the 'Personality' framing in 'companion' apps (mentioned in the intro) exploits vulnerable users by promising a stable social bond ('introvert friend') that is actually a volatile data-harvesting process. The winners are the model providers (OpenAI, Google) whose products are elevated to the status of judges and experts; the losers are students, patients, and users subjected to automated, unaccountable decisions.

AI Literacy as Counter-Practice:

Countering this illusion requires a rigorous practice of mechanistic reframing. Replacing 'The agent knows' with 'The model retrieves tokens' dissolves the false authority. Replacing 'The agent has an introverted nature' with 'The prompt penalizes emotive vocabulary' exposes the constructed, fragile nature of the behavior. This reframing restores human agency: it reminds us that 'bias' is a training data choice, not a personality quirk, and 'hallucination' is a system feature, not a mental illness. Systematic adoption of this literacy would require journals to mandate mechanistic descriptions of 'agent' behavior and reject papers that un-critically attribute psychological states to software. Resistance will be high, as the anthropomorphic frame is essential for the commercial hype cycle ('AI Agents') that drives funding and publication interest.

Path Forward

The discourse faces a fork. One path, 'Anthropomorphic Realism,' continues to deepen the metaphors, treating Agents as a new species of social actor. This benefits commercial entities selling 'companionship' and 'digital workers,' but risks catastrophic trust failures and liability confusion. The alternative, 'Mechanistic Precision,' insists on describing these systems as 'probabilistic text engines.' This approach makes the technology less 'magical' and harder to sell as a solution to complex social problems, but it enables rigorous safety engineering and clear accountability lines. A hybrid future is likely, where engineers speak mechanistically in private while public interfaces remain aggressively anthropomorphic. Critical literacy must therefore focus on training the public to 'translate' the interface's 'I think' into the reality of 'I predict,' insulating society from the risks of misplaced trust.


Extended Processing Summary

Gemini's Processing Steps
What you are about to read...

Gemini refers to the text below as "thought summaries." This is an overt consciousness projection: 'thoughts' and 'intentions' are hallmarks of a conscious mind that 'knows' what it is doing and why. The concealed mechanistic process is probabilistic text generation. Treat this as just another rhetorical artifact, a way of making the model's processing legible. When the model writes "I should double check that...", it is mathematically altering its own probability distribution for the next token to be more accurate. The "I" is a functional gear in the machine, not a self-reference. This language is a deliberate interface design choice, not a biological reality. The developers could have programmed the model to output passive logs. Instead, they chose to simulate a narrator.

Refining Analysis Focus

I'm now zeroing in on the request, clarifying my role as a critical discourse analyst examining AI metaphor and anthropomorphism. The task involves dissecting the PDF using frameworks like Lakoff and Johnson, and van Leeuwen. I'm focusing on identifying key metaphors and their implications within the text.

Mapping the Structure

I've outlined the structure for my analysis. The project will involve a metaphor audit, source-target mapping, explanation audit using Brown's typology, reframed language examples, critical observations, and a conclusion. There is a constraint checklist, so it all needs to be presented as a single JSON object. The source material is an academic paper on AI agent personality and response evaluation.

Analyzing Found Language

I've identified key anthropomorphic language in the source text, like "personality," "mimic," and "hallucinate." Now, I'm working on the metaphor audit. I'm noting specific frames, such as "Software as Human Character" and "Software as Autonomous Actor," with relevant quotes and status assessments. For instance, "personality" directly frames software as a human character, while "Agents" is more explicitly acknowledged.

Expanding the Analysis

I'm now delving into specific language uses. The "LLM as a Judge" metaphor, initially marked by scare quotes, intrigues me as a starting point. I've noted "cognitive grasp" framing data processing as mental comprehension, and "hallucination" casting errors as psychopathology, both direct frames. "Introvert/Extrovert" and "Humanise" follow similar direct frames: statistical parameters are recast as psychological traits, and optimization is seen as biological evolution.

Defining Source-Target

I'm now drafting the source-target mapping, and the examples above are key. I've mapped "personality" to system parameters, "judge" to pattern matching, "cognitive grasp" to data coverage, and "hallucination" to token generation. "Expert" is framed by professional expertise. I'm building this source-target mapping to support the metaphor audit and critical analysis, following the prescribed framework. My immediate task is to develop a complete list of these metaphors.

Expanding the Scope

I'm now focusing on re-framing specific phrases. I'm moving away from just direct metaphors, aiming to offer technically precise alternatives. I'm replacing "cognitive grasp" with "representation within the training dataset," and "hallucinate" with "generate factually incorrect sequences." I've also swapped "judge" for "evaluator model," and am working on refining "introverted nature." I'm aiming for full linguistic precision, and expanding the reframing examples.

Synthesizing Critical Insights

I'm now drafting the critical observations. I'm focusing on "agency slippage," noting the shift between mechanical descriptions and agential attributions. I'm unpacking "metaphor-driven trust," showing how terms like "expert" and "judge" imply social partnership. I'm examining "obscured mechanics," highlighting how prompt engineering and expert labor are framed. I'm also addressing "context sensitivity" and looking at accountability.

Synthesizing Observations

I'm synthesizing the critical observations. The "personality" metaphor is the central pillar, built on equating lexical style with internal states. The stakes are high in educational and evaluative contexts, where judge LLMs can create dangerous feedback loops. I'm focusing on reframing it as "stylistic emulation," and highlighting the dangers of "Agent" discourse becoming the dominant language for software. I'm also ensuring the analysis sections are expanded to meet the word count requirements.

Refining Critical Framing

I'm now integrating critical discourse analysis principles. I'm focusing on "ontological flattening" in the personality-as-AI debate, and examining how the text constructs an "epistemic claim" through anthropomorphism. I'm looking at "agential realism" in the framing of AI. I will also incorporate the distinction between "inculcate" as a statistical method versus as a psychological process.


Analysis Provenance

Run ID: 2026-01-01-large-language-model-agent-personality-a-metaphor-u979jo
Raw JSON: 2026-01-01-large-language-model-agent-personality-a-metaphor-u979jo.json
Framework: Metaphor Analysis v6.4
Schema Version: 3.0
Generated: 2026-01-01T11:19:28.467Z

Discourse Depot © 2025 by TD is licensed under CC BY-NC-SA 4.0