Quality Statements for LLMs: The Good, The Bad and The Ugly

AI is everywhere as a buzzword: it will steal our jobs, make us all obsolete and, in the end, rule the world. For the past two years we have been experiencing a paradigm shift, and Large Language Models such as LLaMA, ChatGPT and Bard, most prominently, are reshaping industries and our everyday lives.

Using a Copilot for coding or testing is seen as enhancing productivity and lowering barriers to entry.
But now that the use of these LLMs is increasing rapidly:

  • Who is testing them?
  • And what actually is Quality in the age of AI?


In this talk, I want to share findings from my experience on projects testing Large Language Models and regressive AI. I will explain, at a high level, how a Large Language Model works.

I will map the components of a Copilot onto a rethought testing pyramid, from the component level to the system level. With that framework for testing LLMs in place, I will outline the metrics used and explain why testers will still be needed in the age of AI, maybe even more than ever.

