How This Works

This page explains the methodology behind Discourse Depot, the bundle of practices, the tools, and the pedagogical framework. If you're looking for the why, see About Discourse Depot.
Also check out an idea for a syllabus and how I would envision the Metaphor, Anthropomorphism and Explanation Audit prompt aligning with the ACRL Framework for Information Literacy for Higher Education.
The Conceptual Anchor
If you understand only one thing about the generative AI used on this site, understand this:
It is optimized to produce plausible continuations, not true or grounded claims.
The good news here is that I’m not suggesting learners need to understand linear algebra, classification algorithms, high-dimensional vector spaces, be conversant in the behavior of neural networks, or pull out their worn copy of Proofs and Refutations in order to competently talk about generative AI or even to responsibly use the products created around it. That’s great if you want to go there, and I’d encourage you to; it really is fascinating stuff.
You simply need to understand what kind of success these systems are built for. Everything that follows, both their utility and their risk, flows from the fact that they are engines of prediction, not comprehension. In other words, the model does not work with ideas; it works with distributions.
When a model generates text, it is essentially "curve fitting" on a massive scale. It is finding the linguistic pattern that best connects the dots of your prompt to a statistically likely response. Instead of filling the silence with text that maximizes truth, it fills the silence with text that minimizes error. And it is quite good at that task right now and likely to get better.
And don’t get me wrong. I don’t mean to trivialize this as "just curve fitting"; the point is closer to "it is astonishing what curve fitting can do at this scale." The math - linear algebra, gradient descent, probability distributions - is well known. But the optimization is happening at a scale we humans have never encountered before, and much of the fascination, the compression, the emergent behavior, the unintended consequences, comes from that scale, not from inner minds.
If you peel back the billion-dollar marketing budgets and the sci-fi narratives, the technology is both fundamentally mundane and fascinating (the contrast here is not mundane vs. powerful or capable, it is mundane vs. mystical).
The Bundle of Practices
My experiments in using an LLM to critique discourse about LLMs are forensic or maybe cartographic in nature. I am writing system instructions that, when unleashed on a text, attempt to explore the gap between the mathematical reality of the model (the “latent terrain”) and the human narratives we impose on it. I’m also learning a lot.
Does stripping away the magical language around AI make the world more boring? Perhaps, but it does aim to make the world more visible. For one thing, whatever kind of literacy we call this, and whatever we mean by "intelligence," under the umbrella of the word “literacy” must be the word “reality.” One way the system instructions for this anthropomorphism audit do this is by reframing language identified in a text, shifting the task from “create a complaint about anthropomorphism” to “try to correct it” using other terms:
| Original Anthropomorphic Frame | Mechanistic Reframing | Technical Reality Check | Human Agency Restoration |
|---|---|---|---|
| AI can act as a partner for conversation, explaining concepts, untangling complex problems. | The interface allows users to query the model iteratively, prompting it to generate summaries or simplifications of complex text inputs. | The model does not 'act as a partner' or 'untangle' problems; it processes user inputs as context windows and generates text that statistically correlates with 'explanation' patterns in its training data. | Google developed this interface to simulate conversational turn-taking, encouraging users to provide more data and spend more time on the platform. |
What is lost here? The warmth and comfort of the narrative perhaps. What is gained? A clearer view of the mechanism, the actors involved, a reality check and the choice of whether to keep the narrative anyway.
Prompt as Instrument
The prompts here, and in particular the Metaphor, Anthropomorphism and Explanation Audit, act as critical literacy instruments, designed to:
- Audit the metaphor: Identify where "mechanical" processes (vectors, weights) are described as "biological" states (thinking, feeling).
- Articulate the narrative or story about the "thing"
- Explain the mechanism (what is actually happening)
- Map the gains (why narrative is useful)
- Identify the costs (what gets obscured)
- Recognize rhetorical how/why slippage
- Ask critical questions (who benefits? what alternatives exist?)
The internal scaffolding baked into each prompt attempts to:
- Audit specific texts for anthropomorphic and metaphorical frames
- Analyze what explanation types are deployed
- Recognize power dynamics in discourse
- Propose alternative framings
The model is not being treated as an authority on AI. It is being used as a pattern-sensitive instrument to surface how humans talk about AI. Because large language models are trained on vast amounts of human discourse, they are unusually sensitive to recurring rhetorical patterns. When properly constrained, they can help make visible:
- which metaphors dominate AI discussions
- how agency is subtly assigned or implied
- where explanations slide into hype or mystification
In this context, the system is not producing knowledge about AI. It is helping analyze the language ecology surrounding AI. All interpretation, evaluation, and judgment still remain human responsibilities.
The Probabilistic Minefield
LLMs are mechanical systems that contain randomness. Mechanical systems can be probabilistic. The banana peel slip happens when I then say that probabilistic implies, or is "like," intentionality. Probabilistic ≠ intentional. Randomness can be fully mechanical.
This project started with an exploration of what happens when I write 4,000-word system instructions, strap the model into a JSON schema straitjacket, and watch a probabilistic text generator perform its simulation of coherence.
What an LLM actually does:
- It Completes Patterns: It fills the silence with the most statistically likely token to minimize "loss" (error).
- It Optimizes: It navigates a high-dimensional map to find the vector that best aligns with your prompt.
- It Scaffolds: It generates intermediate text to constrain its own future probabilities.
It does not read, understand, or reason about documents in a human sense. It produces linguistic performances shaped by probability.
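To make "completes patterns" concrete, here is a deliberately tiny sketch in TypeScript. Everything in it is hypothetical: the toy corpus, the bigram table, and the function names stand in for a neural network scoring an enormous vocabulary. The shape of the loop is the point: score possible continuations against the data, append the likeliest, and let that choice constrain the next one. Nothing in the procedure checks whether the output is true.

```typescript
// Toy illustration only: a real LLM scores a huge vocabulary with a neural
// network, not a word-pair table, but the loop has the same shape.

const corpus = "the model predicts the next token the model predicts text".split(" ");

// Count how often each word follows each other word in the toy corpus.
function bigramCounts(words: string[]): Map<string, Map<string, number>> {
  const counts = new Map<string, Map<string, number>>();
  for (let i = 0; i < words.length - 1; i++) {
    const next = counts.get(words[i]) ?? new Map<string, number>();
    next.set(words[i + 1], (next.get(words[i + 1]) ?? 0) + 1);
    counts.set(words[i], next);
  }
  return counts;
}

// Complete a prompt by repeatedly appending the statistically likeliest next word.
function complete(prompt: string[], steps: number): string[] {
  const counts = bigramCounts(corpus);
  const output = [...prompt];
  for (let i = 0; i < steps; i++) {
    const candidates = counts.get(output[output.length - 1]);
    if (!candidates) break; // no pattern to continue
    // "Minimize error": pick the continuation seen most often in the data.
    const best = [...candidates.entries()].sort((a, b) => b[1] - a[1])[0][0];
    output.push(best); // the new token now constrains the next prediction
  }
  return output;
}

console.log(complete(["the"], 5).join(" "));
```

Swap the word-pair table for a transformer over tokens and the argmax for sampling, and you have the same loop at a planetary scale.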
Why the output sometimes feels confident:
- Even as I write my own system instructions, the product layers in prompts and instructions of its own that shape the model's language in ways I cannot see
- The model has been trained (optimized) to mimic some stance of expertise
- These instructions come from the tool's developers and are not visible to users
- LLMs are really good at pumping out meaningful outputs, since “meaning” is a structural property of language
- If you need a metaphor, let’s call an LLM an overworked bureaucrat whose primary mandate is to complete a form no matter what. If it lacks data, it will "statistically fill" (hallucinate) a plausible-looking fact to balance the ledger. And the hallucinations are not occasional failures in an otherwise truth-tracking system. They are structural consequences of how the system works. The overworked bureaucrat is not failing to distinguish truth from falsehood; the poor guy was never designed to do so.
Where variation comes from:
- Running the same prompt twice yields different phrasings because the system is sampling from a probability distribution (see the sketch after this list)
- This variation is a feature of probabilistic generation. It is the inherent instability of traversing the “superposition” of the model's internal state (where features are compressed and overlapping) and translating that alien “terrain” into interpretable language.
- It cannot be fully removed, no matter how cleverly I RAG-ify the inputs and bolt on guardrails
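The variation itself can be sketched in a few lines (again purely illustrative: the word probabilities are made up, and real systems sample over token IDs from a model-produced distribution, not a hard-coded table):

```typescript
// Illustrative only: a fixed, made-up distribution over possible next words.
const nextWordProbs: Record<string, number> = {
  analyzes: 0.4,
  processes: 0.3,
  generates: 0.2,
  understands: 0.1,
};

// Apply temperature, then sample. Higher temperature flattens the distribution,
// lower temperature sharpens it, but any temperature above zero leaves room for
// different outcomes on different runs.
function sampleNextWord(probs: Record<string, number>, temperature = 1.0): string {
  const entries = Object.entries(probs);
  const weights = entries.map(([, p]) => Math.pow(p, 1 / temperature));
  const total = weights.reduce((a, b) => a + b, 0);
  let r = Math.random() * total;
  for (let i = 0; i < entries.length; i++) {
    r -= weights[i];
    if (r <= 0) return entries[i][0];
  }
  return entries[entries.length - 1][0];
}

// "Run the same prompt twice": same distribution, potentially different words.
console.log(sampleNextWord(nextWordProbs));
console.log(sampleNextWord(nextWordProbs));
```

Lowering the temperature makes repeats more likely, but it does not turn sampling into lookup.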
So yes, using an LLM for discourse analysis is stepping into a probabilistic minefield. But that's the interesting part. When the model produces structure that almost fits the schema, or weirdly fails to fit it, that friction is evidence. It shows where predictive text tries (and fails) to inhabit conceptual distinctions it cannot hang onto.
Full transparency: I'm creating elaborate prompts. Testing them out. Putting the results here. Thinking about how I might make the process into a syllabus for a course on AI literacy.
This site is the archive (depot) of those attempts—the readable markdown versions of an LLM's JSON outputs. It's what happens when I operationalize theory, constrain the output with schemas, and let a generative model perform inside those boundaries. Some outputs seem insightful and are interesting to read, but that's not ultimately the point. The point is to see what the performance reveals.
About the Outputs
Structure and Consistency
Each analysis follows the structure defined by its prompt, with standardized headings that map directly to analytical tasks. This makes outputs somewhat auditable in that I can generally trace which instruction produced which section.
Two Forms of Output
- Prose analyses (what you see on this site): Human-readable analyses generated by the model following the prompt's instructions. I use framework-specific Node.js processors to transform JSON output into readable markdown (see the sketch after this list).
- Structured data schemas (used in applications): Each framework has a corresponding JSON-ish schema that enforces structured output via the API. These schemas are designed to be normalized, auditable, and scalable so they can be used beyond a single analysis, and for building database schemas or other web apps that deal in structured data.
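As a rough illustration of the first form, here is what a minimal JSON-to-markdown processor could look like. This is not the project's actual processor; the interface and field names are invented for the sketch.

```typescript
// Sketch only: turn one (hypothetical) analysis object into readable markdown.
// The real processors handle full framework schemas, metadata prompts, and MDX.
interface FrameRecord {
  originalFrame: string;
  mechanisticReframing: string;
  technicalRealityCheck: string;
}

function toMarkdown(title: string, frames: FrameRecord[]): string {
  const lines = [`# ${title}`, ""];
  frames.forEach((frame, i) => {
    lines.push(`## Frame ${i + 1}`);
    lines.push(`**Original:** ${frame.originalFrame}`);
    lines.push(`**Reframed:** ${frame.mechanisticReframing}`);
    lines.push(`**Reality check:** ${frame.technicalRealityCheck}`);
    lines.push("");
  });
  return lines.join("\n");
}

console.log(
  toMarkdown("Sample Audit", [
    {
      originalFrame: "The AI understands you",
      mechanisticReframing: "The model maps input text to statistically likely responses",
      technicalRealityCheck: "No comprehension is involved; tokens in, tokens out",
    },
  ])
);
```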
Reading LLM Outputs Critically
When reading AI-generated text, on this site or elsewhere, it can be useful to ask things like: What would justify this claim if a human had written it? If the answer involves evidence, sources, expertise, or accountability, then those same standards still apply.
I treat all LLM-generated analysis as rhetorical artifacts requiring critical examination:
- They reflect training data patterns: The LLM's analysis reproduces interpretive moves from its training data, which may include the very anthropomorphic patterns I'm critiquing
- They are probabilistically generated: Every phrase is selected based on statistical likelihood, not semantic understanding
- They embody my prompt design choices: Outputs reveal what my instructions actually communicated, which may differ from what I intended
- They are one possible analysis: Different prompt formulations, temperature settings, even identical runs will produce variations
- There is no guarantee of factual accuracy
- This is not about locating authorial intent in the texts analyzed
This makes the model an unstable but revealing instrument. When it performs discourse analysis on AI-related texts, it often reproduces the very metaphors and assumptions under examination. At other moments, it surfaces contradictions or rhetorical habits with surprising clarity. Both outcomes are instructive.
Since these systems don't "know" anything, this is more about what their performances reveal about how AI is currently imagined, explained, and rhetorically scaffolded.
Generative AI can produce convincing language. It cannot take responsibility for what that language asserts.
Why Schemas Matter
The structured output approach transforms the tool from "text-in, text-out" to a data analysis pipeline:
User Text → Gemini API (with Schema) → Structured JSON → Database
This creates the foundation for comparative analysis across multiple texts: tracking metaphor patterns over time, comparing framing strategies across political speeches, querying how agency is distributed across a corpus of policy documents.
By constraining the model to output JSON shaped by a schema, I can build things that use structured data. Since React treats that JSON like any other API response, lightweight React components can render, sort, filter, and interact with the structured outputs.
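Here is a minimal sketch of the "Gemini API (with Schema)" step, written against the @google/genai Node SDK as I understand its documented structured-output options. The model id, the prompt, and the schema fields are placeholders, not the schemas actually used on this site.

```typescript
import { GoogleGenAI, Type } from "@google/genai";

// Placeholder schema: a single metaphor-frame record. The real frameworks on
// this site use much larger schemas; the fields here are illustrative only.
const metaphorFrameSchema = {
  type: Type.OBJECT,
  properties: {
    originalFrame: { type: Type.STRING },
    mechanisticReframing: { type: Type.STRING },
    technicalRealityCheck: { type: Type.STRING },
    humanAgencyRestoration: { type: Type.STRING },
  },
  required: ["originalFrame", "mechanisticReframing"],
};

async function auditText(text: string) {
  const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash", // placeholder model id
    contents: `Identify one anthropomorphic frame in this text and reframe it:\n\n${text}`,
    config: {
      responseMimeType: "application/json",
      responseSchema: metaphorFrameSchema,
    },
  });
  // The constrained output parses as ordinary data.
  return JSON.parse(response.text ?? "{}");
}
```

Because the response is constrained to the schema, the parsed object can be dropped straight into a database table or handed to a React component as props.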
The schema design process is itself pedagogically valuable. It forces students to answer: What are the essential, irreducible components of a metaphorical frame? What data type is "agency"—a string, a boolean, a relation to another object?
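One way to make those questions concrete is to commit to answers in a type definition. This is hypothetical, not the site's schema; the interesting part is that "agency" ends up modeled as a relation (who it is attributed to, on what evidence) rather than a boolean.

```typescript
// Hypothetical, for discussion: one possible shape for a metaphorical frame.
// Every field is a design decision a student has to defend.
interface MetaphoricalFrame {
  sourceQuote: string;      // the exact wording being audited
  metaphorFamily: string;   // e.g. "mind", "partner", "oracle"
  explanationType: "mechanistic" | "intentional" | "functional";
  agency: AgencyAssignment; // not a boolean: agency is a claim about someone
}

interface AgencyAssignment {
  attributedTo: "model" | "developer" | "user" | "unspecified";
  evidence: string;         // what in the text licenses the attribution
  obscures?: string;        // which human actor the framing hides, if any
}
```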
Extended Processing Summaries
Some outputs include an "Extended Processing Summary." In the API documentation, Google refers to these as "Thought Summaries," a semi-acknowledged anthropomorphism that conflates data processing with cognitive reflection.
The Mechanism:
Technically, these are not "thoughts"; they are latent scaffolding. The model is allocated a “thinking budget” (a literal token limit) to generate the intermediate text. This acts as a “scratchpad,” allowing the model to produce new tokens that guide its future predictions before it “commits” to a final answer. What you see as the “thought summary” is not the raw computation itself (after all, that would be unreadable math) but essentially the “press release” of that computation, rewritten as an anthropomorphized and readable narrative.
For example:
> Deconstructing the Critique: I'm now grappling with the inherent irony of this task. The text I'm analyzing already critiques the anthropomorphic tendencies I'm supposed to be identifying. This creates a fascinating meta-level challenge: how do I analyze a critique of anthropomorphism for anthropomorphism? My approach will need to be nuanced. I'll focus on the specific ways the authors avoid anthropomorphism and the potential implications of those choices.
The LLM struggle is REAL!
Why Include Them
These summaries are diagnostically useful, not for seeing "minds" at work but for seeing friction. Since the API literally charges by the token for this 'thinking,' these summaries show us exactly how much “syntactic scaffolding” was required to bridge the gap between the prompt and the answer. A longer summary doesn't mean “deeper” thought; it means the statistical path to the answer was just more “resistant.”
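For what it's worth, that budget is a literal request parameter. Below is a hedged sketch of what asking for it might look like with the @google/genai SDK, based on my reading of the Gemini thinking documentation; the parameter names and model id are assumptions that may drift, and this is not the configuration used for the outputs on this site.

```typescript
import { GoogleGenAI } from "@google/genai";

// A sketch of requesting a capped "thinking budget" plus the readable summary.
// Nothing here is a window into cognition: it buys intermediate tokens and asks
// for the anthropomorphized narration of them.
async function generateWithScratchpad(prompt: string) {
  const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash", // placeholder model id
    contents: prompt,
    config: {
      thinkingConfig: {
        thinkingBudget: 1024,  // token cap for the intermediate scaffolding
        includeThoughts: true, // return the "thought summary" alongside the answer
      },
    },
  });
  // Parts flagged as thoughts are the summarized scaffolding; the rest is the answer.
  for (const part of response.candidates?.[0]?.content?.parts ?? []) {
    console.log(part.thought ? "[summary]" : "[answer]", part.text);
  }
}
```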
The summaries show:
- The Cost of Logic: How much "budget" (scaffolding) was required to bridge the statistical gap between the prompt and the result.
- Uncertainty Markers: Where the model had to "re-prompt" itself to avoid a hallucination.
- Prompt Adherence: How the instructions were parsed into a sequence of operations.
- Design Choices: How LLM creators make deliberate choices to project an "inner life" onto their models
The Literacy Caveat
Why are they in the first person? The model uses "I" for the most obvious reason: it was likely trained on data in which a first-person "narrative" is the most effective format for self-correction. When the model writes "I should double check that...", it is mathematically altering its own probability distribution so that the next tokens are more likely to be accurate. The "I" is a functional gear in the machine.

This language is a deliberate interface design choice. The developers could have programmed the model to output passive logs ("Processing vector... Error found... Recalculating") or the unreadable math that it is. Instead, they chose to simulate a narrator. Why? Because it sounds smart, it is way better for marketing, and, most likely, in the vast corpus of human text the model was trained on, better answers usually follow first-person reasoning. The "I" is a predictive anchor: it creates a grammatical structure that makes the subsequent math more accurate, so why not anthropomorphize that for the user? It sounds cooler, even if it is potentially dangerous in the long run.
The Illusion of Continuity: The ultimate proof that this is a performance? The API is stateless. As the API documentation reveals, the model does not "remember" this thought process in the next turn unless the system saves an encrypted "thought signature" and re-uploads it. The "mind" at work here is so fragile that it ceases to exist the moment the generation finishes. But since it is not a mind, but a product I’m using, it just needs to be reliable and capable.
The output of “thought summaries” simply invites the reader to imagine a subject ("I") grappling with a problem. In reality, we are watching a bureaucratic process filing paperwork to balance its own statistical books. If there is a “struggle” to observe, it is computational, not mental. That’s still interesting.
Addressing Potential Critiques
"Isn't This Hypocritical?"
Using an LLM while saying LLMs "know" nothing? This would only be hypocritical if I claimed an LLM "understood" my critique or "agreed with" my analysis. I make no such claims. I'm using a computational tool to execute procedures I designed, and I'm transparent about what that tool is doing.
The project actually demonstrates its own thesis: I can use LLMs effectively when I understand what they actually do (process patterns, generate probable text) rather than what anthropomorphic language suggests they do (understand meaning, know truths).
"Are the Outputs Reliable?"​
The outputs are reliable in the same sense that any structured analysis following explicit procedures, then handed off to a probabilistic language generator, is reliable. They are not reliable as definitive interpretations or insight into authorial intention and I don't claim they are. But they are relatively good reads, and if I notice weirdness, I tweak the prompt and see what happens.
This is why each framework emphasizes:
- Multiple analytical perspectives
- Explicit methodological choices (the analysis reveals what I asked it to look for)
- Human evaluation (outputs are starting points for discussion, not endpoints)
Behind the Scenes Workflow
The discourse analysis workflow uses a framework-specific Node.js processor system that transforms raw JSON outputs from Google AI Studio or Vertex AI into structured, publication-ready formats. The architecture uses modular, reusable processor classes for each analytical framework (Metaphor Audit, CDA-Soft, CDA-Spicy, Political Framing).
The processor drives an eight-step pipeline: load files, prompt for metadata via CLI questions, mint a run ID, generate MDX, flatten the analysis into a Supabase-ready record, write all artifacts (MDX, JSON archive, JSONL append, report), move processed inputs into a directory, and emit a provenance linkage manifest.
I then use another standalone Node CLI that ingests every JSON analysis in a directory, normalizes the data across schema generations (since I keep changing the schema), and emits JSONL files suitable for Supabase ingestion.
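As a rough sketch of that flatten-and-append step (not the project's actual CLI; every field name here is invented for illustration):

```typescript
import { readFileSync, appendFileSync, readdirSync } from "node:fs";
import { join } from "node:path";

// Sketch of the flatten step: read each analysis JSON, reduce it to a flat,
// Supabase-friendly record, and append it as one JSONL line.
function flattenAnalyses(inputDir: string, outFile: string): void {
  for (const file of readdirSync(inputDir).filter((f) => f.endsWith(".json"))) {
    const analysis = JSON.parse(readFileSync(join(inputDir, file), "utf8"));
    const record = {
      runId: analysis.runId ?? file.replace(".json", ""),
      framework: analysis.framework ?? "unknown",
      sourceTitle: analysis.metadata?.title ?? null,
      frameCount: Array.isArray(analysis.frames) ? analysis.frames.length : 0,
      processedAt: new Date().toISOString(),
    };
    appendFileSync(outFile, JSON.stringify(record) + "\n"); // one JSON object per line
  }
}

flattenAnalyses("./analyses", "./supabase-import.jsonl");
```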
I keep changing it and that's why outputs look different over time. I tweak the prompt, then the schema, then the markdown. I recently switched from VitePress to Docusaurus to experiment with React components.
Please note that this project, including the analytical frameworks and outputs displayed here, is an ongoing work in progress. You may notice variations in formatting or structure between different analysis pages. This reflects the iterative nature of research and development as I work to refine prompts, schemas, and presentation in unison.
Discourse Depot © 2026 by TD is licensed under CC BY-NC-SA 4.0