The Overworked Homunculus and Underpaid Labor

Troy Davis (cross-posted on Substack)

I came across an exposé in 404 Media that not only filled in more gaps in my understanding of the current state of AI but also offers a useful way to introduce an experiment I’m running over at Discourse Depot.

The article introduces Michael Geoffrey Asia, who worked for a data labeling company in Nairobi, Kenya, and spent eight hours a day staring, frame by frame, at all kinds of explicit content, for which he was paid $240 a month. But it gets worse. When that shift ended, he started his second job: acting as the human labor behind AI sex bots, sexting with people and switching genders and personalities on instruction from an algorithm that managed and dictated his shifts. Not surprisingly, Asia reported that the psychological toll was devastating: insomnia, severe PTSD, and the fracturing of his personal life.

Asia is now the secretary-general of the Data Labelers Association (DLA), an advocacy organization pushing for the “fair treatment, better conditions, and recognition for data labelers worldwide.” As one worker named Angela put it at a DLA event profiled in the 404 piece:

When you think of colonialism, we were under British Imperial East Africa Company […] so literally, we are working under a company. We are just products, part of their operation.

And all the major players, including Meta, OpenAI, Apple, and Google, seem to be doing it: using cheap labor to refine their models.

When I talk with colleagues in higher education about AI, the anxieties usually cluster around some predictable themes: Will students use it to cheat? How’s it going to change our jobs? What about the carbon footprint of the data centers? Super valid concerns. But since AI is playing out within an already extractive economic system, we can add to our growing list an invisible and traumatized human workforce that does the work of making these systems “align” with our human values.

Enter Discourse Depot: Auditing the “Illusion of Mind”

On my research site, Discourse Depot, I’ve been exploring ways to track how this flavor of evasion happens through the discourse about AI. The site chronicles a research project in which I use Large Language Models (LLMs) as instruments to perform critical discourse analysis on the academic papers, press releases, and journalism surrounding AI. (More on this tension later.) I feed texts into a 5,000+ word system instruction prompt designed to locate metaphors, identify anthropomorphism, and flag the rhetorical slippage where how a machine works is replaced by why a machine “thinks.”

Because every analysis on the site follows a strict data schema, I can save the outputs in a relational database and then aggregate these observations into 14 different thematic areas that I call “Corpus Libraries.” These are cross-sections of the entire database of JSON-enforced outputs, viewed through different thematic lenses. See the end for a description of the libraries.

One of the most revealing parts of this schema is a section I call “Obscured Mechanics.” When the LLM audits a text, it is specifically instructed to identify what technical, material, or labor realities might be obscured by that text’s anthropomorphic framing. Over time, I’ve pulled these specific analyses out of the database and compiled them into an Obscured Mechanics Library.
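To make the pipeline in the two paragraphs above a little more concrete, here is a minimal sketch of the general pattern: validate each JSON output against a schema, split it into relational rows, and then query one thematic cross-section (a “library”) across the whole database. It assumes Python’s built-in `json` and `sqlite3` modules; the section names, table layout, and `store`/`library` helpers are hypothetical illustrations, not the actual Discourse Depot schema or code, and the two stored analyses are placeholders rather than real audit results.

```python
import json
import sqlite3

# Hypothetical schema sections. The real schema is far larger; these two
# names are illustrative assumptions only.
SECTIONS = ("metaphor_audit", "obscured_mechanics")


def validate(analysis: dict) -> None:
    """Reject any model output that does not match the (assumed) schema."""
    if not isinstance(analysis.get("source_title"), str):
        raise ValueError("missing or non-string 'source_title'")
    for section in SECTIONS:
        items = analysis.get(section)
        if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
            raise ValueError(f"section {section!r} must be a list of strings")


def create_tables(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        CREATE TABLE analyses (
            id INTEGER PRIMARY KEY,
            source_title TEXT NOT NULL
        );
        CREATE TABLE observations (
            id INTEGER PRIMARY KEY,
            analysis_id INTEGER NOT NULL REFERENCES analyses(id),
            section TEXT NOT NULL,
            text TEXT NOT NULL
        );
        """
    )


def store(conn: sqlite3.Connection, raw_json: str) -> None:
    """Parse one LLM output, validate it, and split it into relational rows."""
    analysis = json.loads(raw_json)
    validate(analysis)  # fails before anything is inserted
    cur = conn.execute(
        "INSERT INTO analyses (source_title) VALUES (?)", (analysis["source_title"],)
    )
    analysis_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO observations (analysis_id, section, text) VALUES (?, ?, ?)",
        [(analysis_id, s, t) for s in SECTIONS for t in analysis[s]],
    )


def library(conn: sqlite3.Connection, section: str) -> list:
    """One thematic cross-section of the whole database (a 'library')."""
    return conn.execute(
        """
        SELECT a.source_title, o.text
        FROM observations AS o JOIN analyses AS a ON a.id = o.analysis_id
        WHERE o.section = ?
        ORDER BY a.source_title, o.id
        """,
        (section,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
create_tables(conn)
# Placeholder outputs, not real audit results.
store(conn, json.dumps({
    "source_title": "Example press release A",
    "metaphor_audit": ["'the model learned' recasts optimization as learning"],
    "obscured_mechanics": ["annotator rating work behind RLHF"],
}))
store(conn, json.dumps({
    "source_title": "Example paper B",
    "metaphor_audit": [],
    "obscured_mechanics": ["data center energy and water use", "scraped training data"],
}))

for title, note in library(conn, "obscured_mechanics"):
    print(f"{title}: {note}")
```

The design point is that a strict schema turns free-form model prose into rows, so a “library” becomes nothing more than a `WHERE section = ?` query over every audited text.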

If you read through that library, you’ll see an obvious pattern. But some nuance is needed here: I am the one creating, or forcing, this pattern through the system instructions I use, and I am not claiming that every tech journalist, academic, or developer intends to actively erase workers like Michael Geoffrey Asia.

See the “A Note on Intent (and the Instrument)” section of Discourse Depot.

The whole point of Discourse Depot is not so much to claim that this tendency to grant agency to generative AI is just a malicious trick to help us ignore human exploitation. It is a bit more than that. The primary issue I see is that attributing agency to software is simply wrong in an empirical sense: it completely ignores how these models actually work. And agency gets attributed through the language used to describe what that software does, or more accurately, what it is.

When we talk about LLMs not mechanistically but as some sort of “mind,” the mechanistic nature of the technology itself is actively obscured. And here I need to be a bit delicate but precise: the exploited labor in Nairobi, the staggering water and energy consumption of data centers, the “use” of public data for private gain, well, none of this is uniquely “because of AI.” That is simply global capitalism operating exactly as it always has.

So, over at Discourse Depot, I’m pushing this button over and over again: the true, distinct threat of the moment is how we talk about AI. When we use anthropomorphic language to describe these tools, we drape a magical, cognitive cloak over all this standard corporate extraction. When the illusion of a “digital mind” is offered as the status quo description of the latest LLM, it becomes harder to recognize this familiar supply chain, which in turn makes the exploitation far harder to regulate or resist.

Everything else, then, including the hidden labor, the environmental costs, and the evaded corporate liability, is a downstream consequence of that way of talking about it.

For example, swapping the phrase “AI learns our preferences” for “engineers tune parameters using engagement data” forces an immediate recognition of these systems’ absolute lack of awareness. It also underscores their dependence on data and the statistical (not semantic) character of their outputs.

Name the Actor: How Language Shifts Accountability

If we applied a simple “name the actor” test to major agentless constructions in a typical press release (or even research article) about AI, the discourse would shift substantially. For example, instead of saying “the AI discriminated,” we might say: “the engineering team at OpenAI deployed a model optimized on biased historical data.” That reframing makes different (and more actionable) questions possible:

Why was this objective function chosen?
Who approved the dataset?
Which alternative architectures were set aside and why (e.g., speed, cost, profit)?
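As a rough illustration of what the “name the actor” test looks for, here is a crude, rule-based stand-in: it flags sentences where an AI system is the grammatical subject of a mental or agential verb, then turns each hit into the accountability question it hides. This is not how Discourse Depot performs its audits (that work is done by an LLM following a long system prompt); the word lists, the `flag_agentless` and `name_the_actor_questions` helpers, and the example sentences are all hypothetical, and simple patterns like these will miss passives and many other agentless constructions.

```python
import re

# Hypothetical, deliberately tiny vocabularies; a real audit needs far more,
# and plenty of agentless constructions will slip past simple patterns.
AI_SUBJECT = r"(?:the\s+)?(?:AI|model|system|algorithm|chatbot)"
AGENTIAL_VERB = (
    r"(?:learn(?:s|ed|t)?|understand(?:s)?|understood|think(?:s)?|thought"
    r"|decide(?:s|d)?|want(?:s|ed)?|discriminate(?:s|d)?|develop(?:s|ed)?"
    r"|know(?:s)?|knew)"
)
PATTERN = re.compile(rf"\b{AI_SUBJECT}\s+{AGENTIAL_VERB}\b", re.IGNORECASE)


def flag_agentless(sentence: str) -> list[str]:
    """Return phrases where an AI system is the grammatical agent of a mental verb."""
    return [m.group(0) for m in PATTERN.finditer(sentence)]


def name_the_actor_questions(sentence: str) -> list[str]:
    """Turn each flagged phrase into the accountability question it hides."""
    return [
        f"'{phrase}': who built, trained, and deployed this system, and on what data?"
        for phrase in flag_agentless(sentence)
    ]


for s in [
    "The AI discriminated against older applicants.",
    "The engineering team deployed a model optimized on biased historical data.",
]:
    print(s, "->", flag_agentless(s))
```

The first sentence is flagged; the second, which already names the actor, is not. That asymmetry is the point of the test.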

Why Actorless Framing Persists

Well, for one, naming the relevant institutions, corporations, and decision-makers makes the underlying design choices visible and creates conditions for genuine legal and ethical accountability. By contrast, obscuring human agency serves another function: it protects the commercial and political interests of the technology sector. By sustaining the illusion of an autonomous, conscious machine, tech firms construct a rhetorical shield, one that enables them to exercise (outsized) societal power.

So, the audit isn’t necessarily about finding malicious authorial intent to obscure or ignore these realities; rather, it demonstrates a powerful side effect: this empirically inaccurate language makes the technology much more palatable (and mysterious, and human, etc.). It is far easier to embrace an “emergent being” or a “digital mind” than to reckon with just another corporate product, created by human producers, prone to errors, and deployed in a system driven by profit.

By forcing the system to keep the labor conversation on repeat, just like the environmental conversation, the audit reveals how anthropomorphic language is functioning as a de facto corporate shield, regardless of the author’s intent.

The Magic Trick of “Alignment” and “Learning”

In the AI industry, when a model refuses to generate hate speech, or when it answers a question with a polite, helpful tone, researchers call this “alignment.” The discourse is filled with phrases like, “The model learned to be helpful,” or “The AI understands human intent,” or “The system developed an empathetic persona.”

However, I think these sentences perform an epic act of concealment. When my prompt applies what I call the “Name the Corporation” test, the purpose is to shatter some significant illusions (or that’s the desired effect). A model does not “learn” to be polite. It doesn’t “develop” empathy. One of the things actually happening is a process called Reinforcement Learning from Human Feedback (RLHF).

To make a model safe and conversational, corporations seem comfortable outsourcing the grueling work of reading horrific, toxic, and violent text to precarious gig workers in the Global South. These workers manually rate and rank outputs, effectively beating the statistical model into a shape that mimics human decency.
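The rating-and-ranking step described above is commonly formalized as pairwise preference learning: raters say which of two responses is better, and a “reward model” is fit so that its scores agree with those human judgments. The sketch below is a toy illustration, assuming the Bradley-Terry pairwise loss that is widely used for RLHF reward models; the two made-up features, the data, the learning rate, and the helper names are all hypothetical, and this is not any company’s actual pipeline. Full RLHF then uses such a reward model to fine-tune the language model itself, a step omitted here.

```python
import math

# Toy, hypothetical data: each pair records which of two candidate responses a
# human rater preferred. Real responses are text; here each one is reduced to
# two made-up features, [politeness, toxicity], purely for illustration.
JUDGMENTS = [
    ([0.9, 0.1], [0.2, 0.8]),  # (preferred, rejected)
    ([0.7, 0.0], [0.6, 0.9]),
    ([0.8, 0.2], [0.1, 0.3]),
]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def score(weights, features) -> float:
    """Linear 'reward model': a weighted sum of the features."""
    return sum(w * f for w, f in zip(weights, features))


def pairwise_loss(weights, judgments) -> float:
    """Mean Bradley-Terry loss: -log sigmoid(score(preferred) - score(rejected))."""
    total = 0.0
    for preferred, rejected in judgments:
        diff = score(weights, preferred) - score(weights, rejected)
        total += -math.log(sigmoid(diff))
    return total / len(judgments)


def train(judgments, steps: int = 200, lr: float = 0.5):
    """Plain gradient descent pushing scores toward the raters' rankings."""
    weights = [0.0, 0.0]
    for _ in range(steps):
        grads = [0.0] * len(weights)
        for preferred, rejected in judgments:
            diff = score(weights, preferred) - score(weights, rejected)
            # d/dw of -log sigmoid(diff) is -(1 - sigmoid(diff)) * (preferred - rejected)
            coef = -(1.0 - sigmoid(diff))
            for i in range(len(weights)):
                grads[i] += coef * (preferred[i] - rejected[i])
        weights = [w - lr * g / len(judgments) for w, g in zip(weights, grads)]
    return weights


weights = train(JUDGMENTS)
print("learned weights [politeness, toxicity]:", weights)
print("loss before:", pairwise_loss([0.0, 0.0], JUDGMENTS),
      "after:", pairwise_loss(weights, JUDGMENTS))
```

Nothing in the model “decides” that toxicity is bad: the negative toxicity weight is simply the residue of the raters’ choices, which is exactly the labor the anthropomorphic framing hides.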

When a research paper marvels at an AI’s “warmth” or “ethical reasoning,” it is looking past the crystallized, alienated labor of an underpaid Kenyan data annotator like Asia, and attributing that human effort to the spontaneous genius of a machine.

As one analysis in the Obscured Mechanics library notes:

The AI’s supposed ‘Theory of mind,’ ‘social perception,’ and ‘empathy’ are not emergent properties of a synthetic soul; they are the direct product of RLHF… The text attributes the wisdom of this hidden labor force entirely to the autonomous ‘social cognition’ of the machine.

Speaking of Metaphors: The Elephant in the Room

There is an obvious tension in using, as the instrument for naming this labor, a large language model that was itself produced by exactly that labor. I don’t think the tension is necessarily fatal for my purposes, but I also don’t pretend it isn’t there.

The instrument is good at recognizing rhetorical patterns at scale because it was trained on enormous amounts of human prose, much of it scraped without consent and rated by workers like Asia. That capability is a direct consequence of the workforce the audits keep surfacing. The work runs partly with and partly against its own conditions of possibility. I guess I’d rather hold that contradiction openly for now than dissolve it by either refusing to use the tool or by claiming that the tool is somehow neutral.

Words Matter

Am I just making an academic complaint about semantics? I don’t think so, but I do think that anthropomorphism is looking like a highly effective business strategy. I also think it is undermining AI literacy efforts. (For example, using the word “hallucination” for a good old-fashioned error is problematic.) If we describe an LLM as a “cognitive partner” or a “learning mind,” the corporation that built it gets to claim it has birthed a miracle, deflecting scrutiny (and product liability) away from its invasive data scraping, errors, and exploitative labor models. And at the end of the day, it is also just a plain category error. (I say more about how this ties into a politics of refusal over at Discourse Depot.)

Toward the end of Koebler’s 404 Media reporting, Asia delivers the line that gives the piece its title:

AI can never be AI without humans. It is not artificial intelligence. It’s African intelligence.

For me, that is the kind of sentence that snaps AI back to earth and out of the sci-fi narratives its founders are enthralled by. Asia takes words like intelligence and cognition and reattaches them to a labor force, a wage, a continent, a supply chain. That reattachment is an entire argument, one worth framing and repeating. So that’s what I’m doing with every text that shows up on Discourse Depot.

The 404 Media article and others like it are vital pieces of reporting because they pull back the curtain on the “soul of AI” to reveal what is essentially a digital sweatshop. Over at Discourse Depot, my goal is to show how the curtain is linguistically woven in the first place.

I don’t think we can build a durable, critical AI literacy if we continue to use language that grants agency to software. We don’t do it with other technologies. We use metaphors all the time, of course. For example, we say “the internet has traffic” or we put files in “folders.” But there is something benign and structurally useful there; we are simply mapping physical logistics onto digital architecture. The metaphors we use around AI are entirely different because they are sourced from the world of human cognition, understanding, and intentionality, and they shape both how we think about these systems’ capabilities and how we place our trust in their products.

In fact, I’ll go a bit further and say that a true critical AI literacy must include this: the practice of actively and persistently tagging the anthropomorphisms, mapping the metaphors, and catching the explanation slippages in our daily discourse when it comes to AI. By refusing the illusion of a digital mind, we can stop debating whether the machine is conscious or hallucinating, and start focusing on the corporations profiting from it and the human minds they seem to be breaking to build it.

Asia’s full testimony, “The Emotional Labor Behind AI Intimacy,” is at the Data Workers archive. The Data Labelers Association is doing a fair amount of organizing and is worth exploring. Koebler’s reporting is at 404 Media. The Obscured Mechanics Library is here. While you’re there, check out the other Corpus Libraries.

Corpus Libraries

Source-Target Mapping Library: Structure-mapping analyses of how relational structure from familiar source domains projects onto AI target domains, and what they might hide.
Reframing Library: The practical output of the audit: anthropomorphic language rewritten mechanistically and the restoration of human agency.
Pattern Summary Library: Identifies 2-3 dominant anthropomorphic patterns, their interconnection as a system, and which pattern is “load-bearing.”
Path Forward Library: Forward-looking analyses mapping vocabulary alternatives and their consequences, sketching possible discourse futures.
Obscured Mechanics Library: Identifies what gets hidden by anthropomorphic framing: technical realities, material costs, labor conditions, economic interests.
Metaphor-Driven Trust Library: Examines how metaphorical framings construct or undermine trust, tracking how texts invite audiences to trust statistical systems as they would trust persons.
Metaphor Audit Library: The complete audit identifying metaphorical patterns, human qualities projected, acknowledgment status, implications, and accountability analysis.
Mechanism of Illusion Library: Examines how each text’s metaphorical system creates the “illusion of mind” and exploits audience vulnerabilities or the “curse of knowledge.”
Material Stakes Library: Traces the concrete economic, regulatory, epistemic, institutional, and social consequences of metaphorical framings.
Literacy as Counter-Practice Library: Reflects on how critical reframing might serve as resistance to misleading AI discourse and what systematic adoption of mechanistic language would require.
Explanation Audit Library: Applies Robert Brown’s explanation typology to show how explanations frame AI mechanistically (how it works) versus agentially (why it acts).
Context Sensitivity Library: Maps the distribution of anthropomorphic language across each text. Where do consciousness claims intensify? What’s the relationship between technical grounding and metaphorical license?
Agency Slippage Library: Tracks how texts oscillate between mechanical and agential framings. Agency slippage runs in two directions: agency attributed to AI systems and agency displaced from human actors.
Accountability Synthesis Library: Synthesizes the accountability architecture across each source text: who gets named versus who remains invisible, what’s framed as choice versus inevitability.