
The Educator’s FAQ set list on Generative AI

[Image: An image from the static]

This page is a working archive of questions.

It collects recurring concerns about generative AI as they appear in higher education (my world): in classrooms, faculty workshops, policy drafts, committee meetings, hallway conversations, and all the moments of refusal or unease I am witnessing. These questions are not hypothetical. I think they are rich, and they are doing harder work than they first appear to: surfacing where assumptions about authorship, originality, labor, intelligence, responsibility, and pedagogy start to play out.

I don’t think these questions are problems to be solved once and for all. I am treating them as their own flavor of diagnostic prompts that are becoming the greatest hits (and misses) in higher ed.

The interesting thing about discussions of AI “transformation” in higher ed is that they are surfacing questions that one could consider “analytically prior to that of whether this transformative effect will help or harm humanity.” Each question reveals something about what higher education already assumes or grapples with or kicks to the curb: things like how meaning is supposed to work, where authority is supposed to reside, how originality is recognized, and why certain kinds of language feel trustworthy. The idea was that an FAQ format would be a cool way to try to put those assumptions front and center, even at the risk of complicating them even more.

In practice, this means that many of the questions overlap. A concern that looks legal at first often turns out to be rhetorical. A pedagogical worry often masks a longer-standing anxiety about academic labor systems in general.

What follows is a preliminary index of recurring questions, along with one anchor FAQ developed at length as an example. Over time, other entries could expand into similar riffs. For now, the page functions as a holding space for the questions themselves, and also a place to hold my fascination with the explanatory habits they expose.

Why an FAQ?

The FAQ format is familiar. Again and again, when it comes to genAI, what looks like a straightforward question opens into something deeper. What counts as authorship. What originality actually names. Whether meaning requires intention.

Each entry will begin with a recognizable objection and then I’ll follow where it leads. Often, that path runs straight into philosophy, or literary theory, or any flavor of structural critique.

Recurring Questions

I’ve tried to organize them by some kind of lens instead of by discipline. You’ll notice that many questions appear across categories in slightly different forms.

Authorship, Origin, and Intent

Where meaning is assumed to come from...

  • Isn’t there still an author somewhere behind the output?
  • If the training data was authored, doesn’t authorship transfer to the output?
  • Isn’t meaning ultimately tied to an intending mind?
  • Doesn’t the model “remix” or rework existing human texts?
  • If no one meant the text, how can it mean anything at all?

Where ethical and legal language get into the mix…

  • Isn’t this just plagiarism in a new form?
  • Isn’t generative AI stealing from human authors?
  • If the training data was unlicensed, isn’t everything downstream compromised?
  • Is this a copyright violation, even if the output is technically new?
  • Can this be addressed by requiring citations or attribution?
  • Who is responsible when the output resembles existing work?

How These Systems Work, and How We Describe Them

Mechanism, metaphor, and misrecognition...

  • If the system is not understanding, what is it actually doing?
  • Is the model thinking, learning, or reasoning in any meaningful sense?
  • Why do the responses sound confident and coherent?
  • Why does interaction with the system feel conversational or interpersonal?
  • How does a language model generate text, in plain terms?
  • Why can’t the system identify where specific information came from?
  • What role does memorization play, and when does it become copying?

Originality, Novelty, and Value

Why something can feel new without being authored...

  • What counts as originality anymore?
  • Can something be new if no one intended it?
  • Is this just remixing at scale, or something else?
  • Why does novelty feel convincing even without authorship?
  • Is this all just semantic hair-splitting?

Reading, Writing, and Pedagogy

What happens when intention disappears...

  • How can educators teach a text with no author?
  • If the model is not making meaning, where does meaning emerge?
  • What does it mean to read something that was never meant?
  • Does generative text signal the death of the author, or a different condition altogether?
  • What happens to close reading when the text has no speaker?
  • Can students still learn to write and think critically when AI is involved?

Anxiety, Identity, and Academic Labor

What these questions are really about...

  • Why do readers and educators keep searching for a speaker or a mind behind the text?
  • Is the anxiety here about the loss of the author, or about the loss of authority?
  • Is generative AI threatening expertise, or exposing how fragile existing claims to expertise already were?
  • What do these debates reveal about academic labor and professional identity?
  • Is the discomfort focused on AI itself, or on long-standing narratives about human distinctiveness?

Metaphor, Discourse, and Power

How language organizes trust and blame...

  • Why is anthropomorphic language so readily applied to AI systems?
  • How do metaphors such as “thinking” or “hallucinating” shape public perception?
  • Why do companies describe models as agentic until responsibility or harm enters the frame?
  • What kinds of explanations are favored when things go wrong?
  • What if the text is neither lying nor meaningfully saying anything?
  • Isn’t refusing to use AI the most ethical choice?

Let’s start with a hot one.

1. Isn’t Training on Copyrighted Work Just Theft?

Objection: If large language models are trained on copyrighted works without permission, isn’t that theft? Doesn’t that make everything they generate a legal, or at least ethical, violation?
Response: This is a good one. It is one of the most contested questions in current AI discourse, and it remains legally unsettled. The short answer is: it depends. The longer answer is where the question becomes philosophically and pedagogically interesting.


Much of the murkiness comes from treating “training” and “output” as the same act.

  • Training on copyrighted data is often defended under fair use in the United States, especially when the use is transformative, non-expressive, or analytic rather than substitutive.
  • Language models do not store or reproduce training texts directly. They extract statistical patterns. At the same time, lawsuits brought by publishers and authors’ organizations argue that large-scale pattern extraction still constitutes some form of appropriation.
  • Output liability is even murkier. If a model generates a sentence that resembles a copyrighted work, infringement depends on context, similarity, and intent. This is where edge cases, such as near-verbatim reproduction of memorized passages, matter most, and they remain unresolved.

Discourse Note: When AI Has a Mind, and When It Doesn’t

I encourage anyone reading this to pay attention to how descriptions of generative AI shift depending on what's at stake.

When AI systems are being introduced, demonstrated, or marketed, their creators seem to have zero problems using the language of minds and agency. The model understands context. It reasons through problems. It chooses responses. It is empathetic. These descriptions make the system understandable and sound impressive. They help audiences (customers) grasp what it seems to do, but they are linguistic choices, and that’s cool.

However, notice that when legal responsibility or talk of accountability enters the mix, that language often changes. A choice is often made to re-describe the same system in more mechanistic terms. It is a statistical process. A mathematical function. An inert tool. It does not understand, intend, or choose. It is “just math.”

The issue is that both descriptions are pointing in some general direction toward something real. The model does produce behavior that resembles understanding. It is also true that the system lacks intention, consciousness, or agency in any human sense. The problem is not that one description is true and the other false. The problem is that the shift itself often goes unremarked.

Notice this: Anthropomorphic language tends to appear when outcomes are impressive or beneficial. Mechanistic language tends to appear when responsibility, harm, or liability are under discussion. This choice of description can, and does, shape how audiences (and judges or juries) assign trust, credit, and blame.

Seen this way, copyright debates are not only about ownership. They are also about which explanatory frames are adopted at moments of risk or liability.

Legally, this is still unsettled ground, and it is not going away by a long shot. But the instability, however the courts resolve it, will continue to point beyond issues of law.

The Constitution empowers Congress to secure copyrights in order to "promote the Progress of Science and useful Arts," creating a system to regulate the relationships between authors, readers, and the use of texts. Generative AI, to put it mildly, is fundamentally disrupting that system.


2. Why “It Must Have Been in the Training Data” Feels So Obvious

“That output must have been in the training data. It was stolen.”

The intuition is totally understandable. The language sounds coherent. It feels grounded. The underlying assumption is that fluency in language implies an originating act, some prior utterance that can be traced, some causal chain back to an intention.

Technically, that is not how most generative AI outputs are produced.

Large language models do not retrieve stored passages or replay texts they have seen before. They generate new sequences of tokens step by step, guided by statistical regularities learned during training. What appears is a synthetic composition, not a reproduction.
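To make “step by step” concrete, here is a deliberately tiny sketch of the loop. Everything in it (the six-word vocabulary, the hand-written probabilities, the function names) is invented for illustration; a real model computes its next-token distribution with a neural network over tens of thousands of tokens. The shape of the process is the point: score the possible next tokens given what has been generated so far, sample one, append it, repeat.

    # A toy, hypothetical sketch of autoregressive generation.
    # Nothing is retrieved from a stored document; each token is sampled
    # from a probability distribution conditioned on the tokens so far.
    import random

    VOCAB = ["the", "author", "is", "missing", "present", "."]

    def next_token_probs(context):
        """Made-up probabilities, for illustration only.
        In a real model this is a learned function, not a lookup table."""
        last = context[-1] if context else None
        if last == "author":
            return [0.05, 0.00, 0.80, 0.05, 0.05, 0.05]
        if last == "is":
            return [0.05, 0.00, 0.00, 0.50, 0.40, 0.05]
        return [0.40, 0.30, 0.10, 0.05, 0.05, 0.10]

    def generate(max_tokens=6):
        """Build a sequence one token at a time; nothing is replayed."""
        context = []
        for _ in range(max_tokens):
            token = random.choices(VOCAB, weights=next_token_probs(context), k=1)[0]
            context.append(token)
            if token == ".":
                break
        return " ".join(context)

    print(generate())  # e.g. "the author is missing ."

Run it a few times and the output changes, because the plausibility lives in the weights, not in any particular source text being pointed back to.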

At the same time, none of the ethical concerns about where that fluency came from disappear. The training corpus most assuredly includes unlicensed material. Many (probably all) of the creators of that content were never asked for consent. This remains a serious and ongoing controversy about how these systems are being built.

The problem arises when two different issues are collapsed into one.

  • Mechanics of generation: probabilistic, synthetic, not causally tied to any single source.
  • Practices of data collection: often opaque, extractive, and deserving of critique (and probably lawsuits).

Conflating these makes every fluent output feel like some kind of evidence of theft, rather than a consequence of how plausibility works in generative systems.

Let me be super clear: Separating them does not excuse harmful practices associated with training data. It does, for me, help clarify what kind of critique is actually being made.


3. Reproduction vs. Production

A familiar comparison helps here. A phone call is an indexical reproduction. The sound is mediated, compressed, and transmitted, but it remains causally tied to a real-world event, a real body. Someone spoke. The signal points back to that occurrence. It is not the “real” voice you are hearing but a reconstruction.

A generative AI response is a statistical production. It is plausible because of learned patterns, not because it reproduces or mediates any prior event or utterance. There is no originating moment to trace back to. No speaker. No scene. No intention. No body.

This difference explains why AI outputs feel so weird. They carry the surface qualities of mediated communication (like a phone call): fluency, responsiveness, coherence. But the anchor is missing.

What I am reading (like all the outputs on this site) is not a reproduction of something that happened. It is an emergent effect of a system trained to maximize plausibility.
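If it helps to see that contrast in miniature, here is a hypothetical, purely illustrative sketch. The recorded string and the word pool are stand-ins I made up: the first function hands back something that actually happened (a stored signal with a speaker behind it), while the second assembles a new sequence from a pool of options and has no original event to point back to.

    # A schematic contrast, not a claim about any real system.
    import random

    RECORDED_CALL = "Hi, it's me. I said this out loud at 3:04 pm."

    def reproduce():
        """Reproduction: the output is causally tied to a prior utterance."""
        return RECORDED_CALL  # traces back to an event, a body, a speaker

    WORD_POOL = ["meaning", "arrives", "without", "a", "speaker", "here"]

    def produce(length=5):
        """Production: a plausible-looking sequence with no originating moment."""
        return " ".join(random.choices(WORD_POOL, k=length))

    print(reproduce())  # always the same utterance, pointing back to its source
    print(produce())    # different each run, pointing back to nothing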


4. From Ownership to Interpretation

The copyright objection is interesting in that it assumes the existence of a stable chain: Author → Work → Ownership → Violation. With generative AI, that chain pretty much frays. Why?

  • The work is not "stable." It is probabilistic and emergent.
  • The author is missing. There is no intending subject.
  • Any “violation” becomes difficult to locate, because "influence" is diffuse and "origin" cannot be cleanly traced.

What remains is a feeling rather than a fact: something feels taken, even when it cannot be clearly identified.

This is where AI literacy becomes more like a type of interpretation. Generative AI is doing a great job exposing how deeply these systems of ownership, originality, and value rely on some pretty entrenched assumptions about agency. When that agency disappears, the frameworks built around it begin to wobble.

Generative AI challenges copyright law and questions its basic assumptions: that works are separate, authors are independent, and originality is clear. These are not legal facts but cultural narratives about creativity.


5. A Pedagogical Opening

This FAQ is not a way to resolve copyright law; I’m not a lawyer, and even judges and lawyers disagree all the time about fair use. I’m trying to bring some questions to the surface. For students and educators, it is an example of how a “simple” question about copyright and AI can become an entry point into other paths of inquiry:

  • What do we protect when we protect authorship?
  • Why does originality matter, and to whom?
  • What is creativity?
  • What does it mean to own a style, a voice, a turn of phrase?
  • Can language be treated as participation rather than property?

In this sense, one could just as easily conclude that copyright functions as a metaphor for control: over meaning, over interpretation, over authority. Generative AI does not neatly oppose that metaphor; it seems to step outside it. That opens a philosophical can of worms. It also opens up yet another teaching opportunity.