The process by which cultural artefacts lose their contextual meaning, authorship history, and situatedness when processed as statistical data during model training. Language and cultural production are converted into model parameters optimised for predictive probability, stripped of the social relations and political-economic histories they were embedded in.
So what? For AI testing and quality work, this signals that model outputs may systematically misrepresent knowledge from communities or traditions that were underrepresented in, or decontextualised by, the training data.
Example: A body of journalism produced within a specific political context enters a training dataset as tokenised text; the model learns statistical patterns from it without any representation of the journalistic judgement, editorial history, or cultural stakes that shaped it.