LLM Extractivism

The practice by which large language model companies ingest publicly available material (peer-produced archives, journalism, creative works, and public knowledge repositories) at scale to build proprietary model capacity, without systematic consent, attribution, or benefit-sharing with the communities whose labour produced that material.

So what? Framing this solely as a copyright question obscures a deeper structural shift: the political economy through which shared knowledge becomes private computational power. For technologists and testers evaluating AI tools, the concept is a lens for asking who actually benefits from the systems they build with.

Example: GPT-3 was trained on filtered subsets of Common Crawl, Wikipedia, and books; once incorporated into a proprietary model, the underlying knowledge became part of a system whose internal workings remain opaque and privately controlled.

Source: https://botpopuli.net/llm-extractivism-and-the-politics-of-the-knowledge-commons/

Rosie Sherry