Corpus Libraries

From Documents to Data
Every analysis on this site started as a document, but these analyses are also data.
When I enforce a consistent schema across dozens of analyses, each analysis becomes a record in a database. Each analytical section becomes a queryable field.
That shift from prose to records is what structured data makes possible in humanities work.
What Libraries Are
A "library" here is a thematic extraction: one analytical dimension pulled across the entire corpus and presented as a unified view.
Think of it like a database query made visible. Instead of reading 50 analyses to find every instance where authors obscure human decision-making behind agentless constructions, you can see them all in one place.
Each library answers a different question:
- How do texts distribute anthropomorphic language across their structure?
- What gets hidden when we attribute "knowing" to systems that process rather than know?
- Who benefits when accountability disappears into the passive voice?
These are cross-sections: the same corpus viewed through different lenses.
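Concretely, a library of this kind is just one dotted-path field pulled from every record. A minimal sketch in TypeScript, with invented record shapes (not the project's actual schema):

```typescript
// Resolve a dotted path like "criticalObservations.contextSensitivity"
// against a plain object, returning undefined if any segment is missing.
function getPath(record: Record<string, unknown>, path: string): unknown {
  return path.split(".").reduce<unknown>(
    (value, key) =>
      value && typeof value === "object"
        ? (value as Record<string, unknown>)[key]
        : undefined,
    record,
  );
}

// Illustrative records standing in for parsed analysis JSON.
const analyses = [
  { title: "A", criticalObservations: { contextSensitivity: "Claims intensify in the intro." } },
  { title: "B", criticalObservations: { contextSensitivity: "Agential verbs cluster around capabilities." } },
];

// One "library" = the same field pulled across the whole corpus.
const library = analyses.map((a) => ({
  title: a.title,
  entry: getPath(a, "criticalObservations.contextSensitivity"),
}));
```

The same helper works for any of the extraction paths listed below; only the path string changes.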
The Technical Reality
Behind these pages is a pipeline:
- Analysis outputs from Gemini API follow a JSON schema that enforces consistent fields
- Processing scripts transform JSON into both human-readable MDX and database-ready records
- Supabase tables store normalized data across analyses
- Extraction scripts query specific fields and generate consolidated library pages
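The middle steps can be sketched as a small transform, assuming an invented analysis shape (the real schema and field names may differ): one parsed JSON analysis becomes both an MDX fragment and a flat, database-ready row.

```typescript
// Hypothetical shape of one analysis output; field names are
// illustrative, not the project's actual schema.
interface Analysis {
  title: string;
  criticalObservations: { contextSensitivity: string };
}

// One JSON analysis -> a human-readable MDX fragment.
function toMdx(a: Analysis): string {
  return [
    `## ${a.title}`,
    "",
    "### Context Sensitivity",
    a.criticalObservations.contextSensitivity,
  ].join("\n");
}

// The same analysis -> a flat record ready for a normalized table.
function toRow(a: Analysis): { title: string; context_sensitivity: string } {
  return {
    title: a.title,
    context_sensitivity: a.criticalObservations.contextSensitivity,
  };
}

const parsed: Analysis = JSON.parse(
  '{"title":"Sample","criticalObservations":{"contextSensitivity":"Claims intensify late."}}',
);
```

Because the schema is enforced upstream, both transforms can assume the fields exist; no per-document special-casing is needed.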
The schema does the heavy lifting. Because every metaphor analysis has a contextSensitivity field with the same structural expectations, aggregating them is trivial. The analytical work happened upstream, in the prompt design and schema definition.
This is what "prompt as scholarship" looks like in practice: the prompt is more than instructions to an LLM. It's a research methodology encoded as a flexible but purposeful data contract.
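As a toy illustration of such a contract, a JSON Schema fragment might require exactly the fields the aggregation depends on (property names here are invented, not the project's actual schema):

```json
{
  "type": "object",
  "required": ["criticalObservations"],
  "properties": {
    "criticalObservations": {
      "type": "object",
      "required": ["contextSensitivity", "agencySlippage"],
      "properties": {
        "contextSensitivity": { "type": "string" },
        "agencySlippage": { "type": "string" }
      }
    }
  }
}
```

Every analysis that validates against the schema is, by construction, a row the libraries can aggregate.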
Available Libraries
This corpus currently supports 14 library types: 5 critical observations, 5 conclusion syntheses, and 4 task extractions.
Critical Observations
These libraries extract the synthetic analytical sections that examine structural patterns across each source text.
Context Sensitivity
Extracts: criticalObservations.contextSensitivity
Maps the distribution of anthropomorphic language across each text. Where do consciousness claims intensify? What's the relationship between technical grounding and metaphorical license? Does the text deploy agential language for capabilities and mechanical language for limitations?
This library reveals the strategic geography of anthropomorphism: where the metaphors appear, how they're deployed, and what that positioning accomplishes.
Agency Slippage
Extracts: criticalObservations.agencySlippage
Tracks how texts oscillate between mechanical and agential framings. Agency slippage runs in two directions: agency attributed to AI systems (making them seem autonomous) and agency displaced from human actors (making decisions seem inevitable).
In aggregate, this library exposes the rhetorical machinery of anthropomorphism: the moves that let texts claim scientific rigor while trafficking in folk psychology.
Metaphor-Driven Trust
Extracts: criticalObservations.metaphorDrivenTrust
Examines how metaphorical framings construct or undermine trust. The key distinction: performance-based trust (does the tool work?) versus relation-based trust (can I trust this agent's intentions?).
Consciousness language ("the model understands," "AI knows") often signals relation-based trust. This library tracks how texts invite audiences to trust statistical systems as they would trust persons, and what risks that category error creates.
Obscured Mechanics
Extracts: criticalObservations.obscuredMechanics
Identifies what gets hidden by anthropomorphic framing: technical realities, material costs, labor conditions, economic interests. Each entry applies the "name the corporation" test: when a text says "the model learned," who actually made the decisions, extracted the data, performed the labor, and captured the profit?
This library makes visible a political economy that the mystification conceals.
Accountability Synthesis
Extracts: criticalObservations.accountabilitySynthesis
Synthesizes the accountability architecture across each source text: who gets named versus who remains invisible, what's framed as choice versus inevitability, where responsibility goes to hide.
The recurring question: what would change if human decision-makers were explicitly named throughout?
Conclusion Syntheses
These libraries extract the synthetic conclusion paragraphs that interpret findings and assess implications.
Pattern Summary
Extracts: conclusion.patternSummary
The opening synthesis from each analysis: 2-3 dominant anthropomorphic patterns identified, their interconnection as a system, and which pattern is "load-bearing" (the one that must hold for the others to function).
Mechanism of Illusion
Extracts: conclusion.mechanismOfIllusion
How does each text's metaphorical system create the "illusion of mind"? These entries examine the internal logic of persuasion: the rhetorical moves, their sequence, audience vulnerabilities exploited, and the "curse of knowledge" dynamics where authors tend to project their own understanding onto these systems.
Material Stakes
Extracts: conclusion.materialStakes
Concrete consequences of metaphorical framings: economic, regulatory, epistemic, institutional, social. Each entry traces causal paths from metaphor to material outcome: who benefits, and who bears the costs, when AI is framed as a "knower" rather than a "processor."
Literacy as Counter-Practice
Extracts: conclusion.literacyAsCounterPractice
Reflections on how critical reframing might serve as a kind of resistance to misleading AI discourse. What would systematic adoption of mechanistic language require? Who resists precision, and why?
Path Forward
Extracts: conclusion.pathForward
Forward-looking analyses mapping vocabulary alternatives and their consequences. Each entry sketches possible discourse futures: mechanistic precision, anthropomorphic deepening, or continued confusion.
Task Extractions
These libraries extract structured data from the analytical tasks themselves (the raw material of the audit).
Reframing Library
Extracts: task4ReframedLanguage (all items)
The practical output of the audit: anthropomorphic language rewritten with mechanistic precision. Each entry shows the original frame, the technical reframing, a technical reality check, and (where applicable) the restoration of human agency.
This is the most directly pedagogical library: a reference for how to say what you mean about AI systems.
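A single entry might look like the following sketch; the field names are assumptions based on the description above, not the actual task4ReframedLanguage shape:

```typescript
// Illustrative shape of one reframing entry (field names invented;
// the project's actual task4ReframedLanguage items may differ).
interface ReframedItem {
  originalFrame: string;      // anthropomorphic phrasing as found
  reframedLanguage: string;   // mechanistic rewrite
  technicalReality: string;   // what the system is actually doing
  humanAgency?: string;       // restored human actors, where applicable
}

const example: ReframedItem = {
  originalFrame: "The model understands your question.",
  reframedLanguage:
    "The model maps the input tokens to a high-probability continuation.",
  technicalReality:
    "Token prediction over learned statistical associations; no comprehension.",
  humanAgency:
    "Engineers chose the training corpus and tuned the decoding parameters.",
};
```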
Source-Target Mappings
Extracts: task2SourceTargetMapping (all items)
Lakoff-style structure-mapping analyses: how relational structure from familiar source domains (teacher, conscious mind, knower) projects onto AI target domains (gradient descent, pattern matching, token prediction).
The "Conceals" field is critical: what dissimilarities does each mapping hide?
Metaphor Audit Items
Extracts: task1MetaphorAudit (all items)
The complete Task 1 audit across all analyses: metaphorical patterns identified, human qualities projected, acknowledgment status, implications, and accountability analysis.
Explanation Audit Items
Extracts: task3ExplanationAudit (all items)
Brown's typology applied across the corpus: how explanations frame AI mechanistically (how it works) versus agentially (why it acts), and the rhetorical impact of those choices.
For the Technically Curious
The extraction pipeline is straightforward:
```
Supabase (normalized tables)
    ↓
node generate-corpus-library.js --type context-sensitivity
    ↓
analyses/libraries/context-sensitivity/context-sensitivity.mdx
    ↓
copy to docs/01-metaphor-analysis/corpus-libraries/
```
Each library type is defined in a configuration object specifying the table, column(s), output format, and descriptive metadata. Adding a new library is a matter of adding a configuration entry and running the script.
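A hypothetical configuration entry, assuming a shape along these lines (key and table names are illustrative, not the script's actual interface):

```typescript
// Hypothetical library configuration; key names are assumptions,
// not the generate-corpus-library.js script's actual interface.
interface LibraryConfig {
  table: string;        // normalized table to read
  columns: string[];    // columns to extract
  outputFormat: "mdx";  // how entries are rendered
  title: string;        // descriptive metadata for the page header
  description: string;
}

const libraries: Record<string, LibraryConfig> = {
  "context-sensitivity": {
    table: "critical_observations",
    columns: ["analysis_title", "context_sensitivity"],
    outputFormat: "mdx",
    title: "Context Sensitivity",
    description: "Distribution of anthropomorphic language across each text.",
  },
  // Adding a new library = adding one more entry here.
};
```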
The full script supports all 14 types:
```
node generate-corpus-library.js --all
```
What's the Point?
Digital humanities has long grappled with the tension between close reading (depth) and distant reading (scale). Libraries like these suggest a third mode: structured reading, where the schema itself encodes the analytical questions and aggregation becomes a form of interpretation.
The prompt-as-scholarship approach treats schema design as methodological work. The fields you require determine the questions you can ask.
This is reproducible. The prompts, schemas, and scripts that generate these libraries are documented; the methodology can be adapted, critiqued, and extended. With the advance of open models and hosted inference endpoints, the digital humanist now has real capacity to play (and intervene).
That's the point.
Reach out! ~ Troy