Low-resource Languages

Low-resource Languages image
Languages that are significantly underrepresented in computational datasets and NLP research, typically because the volume of digitised text available for training is small relative to dominant languages such as English.

So what? Over 90% of the world's languages are classified as low-resource; this imbalance means AI models perform unevenly across linguistic and cultural contexts, producing representational injustice for speakers of those languages.

Example: A multilingual model trained predominantly on English-language web data may perform well for English speakers but generate unreliable or culturally inappropriate outputs for speakers of languages with limited digital corpora.
Explore MoT
MoTaCon 2026 image
Thu, 1 Oct
A tech conference to help you navigate the ever-shifting landscape of Quality Engineering, AI, Leadership, Product, Accessibility and Security.
Prompting for Testers image
Unleash the power of generative AI to boost your software testing and day-to-day tech tasks
Into The Motaverse image
Into the MoTaverse is a podcast by Ministry of Testing, hosted by Rosie Sherry, exploring the people, insights, and systems shaping quality in modern software teams.
Subscribe to our newsletter