LLM Extractivism

The practice by which large language model companies ingest publicly available material, peer-produced archives, journalism, creative works, and public knowledge repositories at scale to build proprietary model capacity, without systematic consent, attribution, or benefit-sharing with the communities whose labour produced that material.

So what? Framing this solely as a copyright question obscures a deeper structural shift: the political economy through which shared knowledge becomes private computational power. For technologists and testers evaluating AI tools, the term offers a lens for asking who actually benefits from the systems they build with.

Example: GPT-3 was trained on filtered subsets of Common Crawl, Wikipedia, and books; once incorporated into a proprietary model, the underlying knowledge became part of a system whose internal workings remain opaque and privately controlled.