Corpus Libraries

From Documents to Data
Every analysis on this site started as a document, but these analyses are also data.
When I enforce a consistent schema across dozens of analyses, each analysis becomes a record in a database. Each analytical section becomes a queryable field.
That shift from prose to records is what structured data makes possible in humanities work.
What Libraries Are
A "library" here is a thematic extraction: one analytical dimension pulled across the entire corpus and presented as a unified view.
Think of it like a database query made visible. Instead of reading 50 analyses to find every instance where authors obscure human decision-making behind agentless constructions, you can see them all in one place.
Each library answers a different question:
- How do texts distribute anthropomorphic language across their structure?
- What gets hidden when we attribute "knowing" to systems that process rather than know?
- Who benefits when accountability disappears into the passive voice?
These are cross-sections: the same corpus viewed through different lenses.
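Concretely, a library of this kind is just one dotted-path field pulled from every record. A minimal sketch in TypeScript, with invented record shapes (not the project's actual schema):

```typescript
// Resolve a dotted path like "criticalObservations.contextSensitivity"
// against a plain object, returning undefined if any segment is missing.
function getPath(record: Record<string, unknown>, path: string): unknown {
  return path.split(".").reduce<unknown>(
    (value, key) =>
      value && typeof value === "object"
        ? (value as Record<string, unknown>)[key]
        : undefined,
    record,
  );
}

// Illustrative records standing in for parsed analysis JSON.
const analyses = [
  { title: "A", criticalObservations: { contextSensitivity: "Claims intensify in the intro." } },
  { title: "B", criticalObservations: { contextSensitivity: "Agential verbs cluster around capabilities." } },
];

// One "library" = the same field pulled across the whole corpus.
const library = analyses.map((a) => ({
  title: a.title,
  entry: getPath(a, "criticalObservations.contextSensitivity"),
}));
```

The same helper works for any of the extraction paths listed below; only the path string changes.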
The Technical Reality
Behind these pages is a pipeline:
- Analysis outputs from Gemini API follow a JSON schema that enforces consistent fields
- Processing scripts transform JSON into both human-readable MDX and database-ready records
- Supabase tables store normalized data across analyses
- Extraction scripts query specific fields and generate consolidated library pages
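The middle steps can be sketched as a small transform, assuming an invented analysis shape (the real schema and field names may differ): one parsed JSON analysis becomes both an MDX fragment and a flat, database-ready row.

```typescript
// Hypothetical shape of one analysis output; field names are
// illustrative, not the project's actual schema.
interface Analysis {
  title: string;
  criticalObservations: { contextSensitivity: string };
}

// One JSON analysis -> a human-readable MDX fragment.
function toMdx(a: Analysis): string {
  return [
    `## ${a.title}`,
    "",
    "### Context Sensitivity",
    a.criticalObservations.contextSensitivity,
  ].join("\n");
}

// The same analysis -> a flat record ready for a normalized table.
function toRow(a: Analysis): { title: string; context_sensitivity: string } {
  return {
    title: a.title,
    context_sensitivity: a.criticalObservations.contextSensitivity,
  };
}

const parsed: Analysis = JSON.parse(
  '{"title":"Sample","criticalObservations":{"contextSensitivity":"Claims intensify late."}}',
);
```

Because the schema is enforced upstream, both transforms can assume the fields exist; no per-document special-casing is needed.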
The schema does the heavy lifting. Because every metaphor analysis has a contextSensitivity field with the same structural expectations, aggregating them is trivial. The analytical work happened upstream, in the prompt design and schema definition.
This is what "prompt as scholarship" looks like in practice: the prompt is more than instructions to an LLM. It's a research methodology encoded as a flexible but purposeful data contract.
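As a toy illustration of such a contract, a JSON Schema fragment might require exactly the fields the aggregation depends on (property names here are invented, not the project's actual schema):

```json
{
  "type": "object",
  "required": ["criticalObservations"],
  "properties": {
    "criticalObservations": {
      "type": "object",
      "required": ["contextSensitivity", "agencySlippage"],
      "properties": {
        "contextSensitivity": { "type": "string" },
        "agencySlippage": { "type": "string" }
      }
    }
  }
}
```

Every analysis that validates against the schema is, by construction, a row the libraries can aggregate.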
Available Libraries
This corpus currently supports 14 library types: 5 critical observations, 5 conclusion syntheses, and 4 task extractions.
Critical Observations
These libraries extract the synthetic analytical sections that examine structural patterns across each source text.
Context Sensitivity
Extracts: criticalObservations.contextSensitivity
Maps the distribution of anthropomorphic language across each text. Where do consciousness claims intensify? What's the relationship between technical grounding and metaphorical license? Does the text deploy agential language for capabilities and mechanical language for limitations?
This library reveals the strategic geography of anthropomorphism: where the metaphors appear, how they're deployed, and what that positioning accomplishes.
Agency Slippage
Extracts: criticalObservations.agencySlippage
Tracks how texts oscillate between mechanical and agential framings. Agency slippage runs in two directions: agency attributed to AI systems (making them seem autonomous) and agency displaced from human actors (making decisions seem inevitable).
In aggregate, this library exposes the rhetorical machinery of anthropomorphism: the moves that let texts claim scientific rigor while trafficking in folk psychology.
Metaphor-Driven Trust
Extracts: criticalObservations.metaphorDrivenTrust
Examines how metaphorical framings construct or undermine trust. The key distinction: performance-based trust (does the tool work?) versus relation-based trust (can I trust this agent's intentions?).
Consciousness language ("the model understands," "AI knows") often signals relation-based trust. This library tracks how texts invite audiences to trust statistical systems as they would trust persons, and what risks that category error creates.
Obscured Mechanics
Extracts: criticalObservations.obscuredMechanics
Identifies what gets hidden by anthropomorphic framing: technical realities, material costs, labor conditions, economic interests. Each entry applies the "name the corporation" test: when a text says "the model learned," who actually made the decisions, extracted the data, performed the labor, and captured the profit?
This library makes visible a political economy that the mystification conceals.
Accountability Synthesis
Extracts: criticalObservations.accountabilitySynthesis
Synthesizes the accountability architecture across each source text: who gets named versus who remains invisible, what's framed as choice versus inevitability, where responsibility goes to hide.
The recurring question: what would change if human decision-makers were explicitly named throughout?
Conclusion Syntheses
These libraries extract the synthetic conclusion paragraphs that interpret findings and assess implications.
Pattern Summary
Extracts: conclusion.patternSummary
The opening synthesis from each analysis: 2-3 dominant anthropomorphic patterns identified, their interconnection as a system, and which pattern is "load-bearing" (the one that must hold for the others to function).
Mechanism of Illusion
Extracts: conclusion.mechanismOfIllusion
How does each text's metaphorical system create the "illusion of mind"? These entries examine the internal logic of persuasion: the rhetorical moves, their sequence, audience vulnerabilities exploited, and the "curse of knowledge" dynamics where authors tend to project their own understanding onto these systems.
Material Stakes
Extracts: conclusion.materialStakes
Concrete consequences of metaphorical framings: economic, regulatory, epistemic, institutional, social. Each entry traces causal paths from metaphor to material outcome: who benefits, and who bears the costs, when AI is framed as a "knower" rather than a "processor."
Literacy as Counter-Practice
Extracts: conclusion.literacyAsCounterPractice
Reflections on how critical reframing might serve as a kind of resistance to misleading AI discourse. What would systematic adoption of mechanistic language require? Who resists precision, and why?
Path Forward
Extracts: conclusion.pathForward
Forward-looking analyses mapping vocabulary alternatives and their consequences. Each entry sketches possible discourse futures: mechanistic precision, anthropomorphic deepening, or continued confusion.
Task Extractions
These libraries extract structured data from the analytical tasks themselves (the raw material of the audit).
Reframing Library
Extracts: task4ReframedLanguage (all items)
The practical output of the audit: anthropomorphic language rewritten with mechanistic precision. Each entry shows the original frame, the technical reframing, a technical reality check, and (where applicable) the restoration of human agency.
This is the most directly pedagogical library: a reference for how to say what you mean about AI systems.
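A single entry might look like the following sketch; the field names are assumptions based on the description above, not the actual task4ReframedLanguage shape:

```typescript
// Illustrative shape of one reframing entry (field names invented;
// the project's actual task4ReframedLanguage items may differ).
interface ReframedItem {
  originalFrame: string;      // anthropomorphic phrasing as found
  reframedLanguage: string;   // mechanistic rewrite
  technicalReality: string;   // what the system is actually doing
  humanAgency?: string;       // restored human actors, where applicable
}

const example: ReframedItem = {
  originalFrame: "The model understands your question.",
  reframedLanguage:
    "The model maps the input tokens to a high-probability continuation.",
  technicalReality:
    "Token prediction over learned statistical associations; no comprehension.",
  humanAgency:
    "Engineers chose the training corpus and tuned the decoding parameters.",
};
```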
Source-Target Mappings
Extracts: task2SourceTargetMapping (all items)
Lakoff-style structure-mapping analyses: how relational structure from familiar source domains (teacher, conscious mind, knower) projects onto AI target domains (gradient descent, pattern matching, token prediction).
The "Conceals" field is critical: what dissimilarities does each mapping hide?
Metaphor Audit Items
Extracts: task1MetaphorAudit (all items)
The complete Task 1 audit across all analyses: metaphorical patterns identified, human qualities projected, acknowledgment status, implications, and accountability analysis.
Explanation Audit Items
Extracts: task3ExplanationAudit (all items)
Brown's typology applied across the corpus: how explanations frame AI mechanistically (how it works) versus agentially (why it acts), and the rhetorical impact of those choices.
For the Technically Curious
The extraction pipeline is straightforward:
```
Supabase (normalized tables)
    ↓
node generate-corpus-library.js --type context-sensitivity
    ↓
analyses/libraries/context-sensitivity/context-sensitivity.mdx
    ↓
copy to docs/01-metaphor-analysis/corpus-libraries/
```
Each library type is defined in a configuration object specifying the table, column(s), output format, and descriptive metadata. Adding a new library is a matter of adding a configuration entry and running the script.
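A hypothetical configuration entry, assuming a shape along these lines (key and table names are illustrative, not the script's actual interface):

```typescript
// Hypothetical library configuration; key names are assumptions,
// not the generate-corpus-library.js script's actual interface.
interface LibraryConfig {
  table: string;        // normalized table to read
  columns: string[];    // columns to extract
  outputFormat: "mdx";  // how entries are rendered
  title: string;        // descriptive metadata for the page header
  description: string;
}

const libraries: Record<string, LibraryConfig> = {
  "context-sensitivity": {
    table: "critical_observations",
    columns: ["analysis_title", "context_sensitivity"],
    outputFormat: "mdx",
    title: "Context Sensitivity",
    description: "Distribution of anthropomorphic language across each text.",
  },
  // Adding a new library = adding one more entry here.
};
```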
The full script supports all 14 types:
```
node generate-corpus-library.js --all
```
What's the Point?
Digital humanities has long grappled with the tension between close reading (depth) and distant reading (scale). Libraries like these suggest a third mode: structured reading, where the schema itself encodes the analytical questions and aggregation becomes a form of interpretation.
The prompt-as-scholarship approach treats schema design as methodological work. The fields you require determine the questions you can ask.
This is reproducible. The prompts, schemas, and scripts that generate these libraries are documented; the methodology can be adapted, critiqued, and extended. With the advance of open models and hosted inference endpoints, the digital humanist now has real capacity to play (and intervene).
That's the point.
Reach out! ~ Troy