Quality Statements for LLMs: The Good, The Bad and The Ugly

2nd October 2024
Bastian Knerr

Teamlead Testing

Talk Description

AI as a buzzword is everywhere. It will steal our jobs, make us all obsolete and, in the end, rule the world. We have been experiencing a paradigm shift for two years now and, most prominently, Large Language Models like LLaMA, ChatGPT or Bard are reshaping industries and our everyday lives.

Using a Copilot for coding or testing is seen as enhancing productivity and lowering barriers to entry.
But now that the use of these LLMs is increasing rapidly:

  • Who is testing them?
  • And what actually is Quality in the age of AI?


In this talk, I want to share findings from my experience in projects testing Large Language Models and regressive AI. I will explain how a Large Language Model works at a high level.

I will map the components of a Copilot onto a rethought testing pyramid, from the component level to the system level. With this framework for testing LLMs in place, I will outline the metrics used and explain why testers will still be needed in the age of AI, maybe even more than ever.

By the end of this session, you'll be able to:

  • Learn how a Large Language Model works at a high level and what pitfalls it poses for testing
  • Discover a high-level standardized approach to testing Large Language Models
  • Understand a new testing pyramid: What's the component level in LLM systems?
  • Discuss what quality means in the age of AI, which metrics we can use, and how contextual they are
  • Understand the importance of a tester's perspective and why testers will still be important going forward
From Accounting and Controlling to Software Testing. I love reading and jogging. Love philosophy. Occasional rally co-driver. Everywhere I go, music needs to be with me.