Quality Statements for LLMs: The Good, The Bad and The Ugly
-
Locked
AI as a buzzword is everywhere. It will steal our jobs, make us all obsolete and in the end: It will rule the world. We've been experiencing a shift in paradigms for two years and, most prominently, Large Language Models like LLaMA, ChatGPT or BARD are re-shaping industries and our everyday lives.
Using a Co-Pilot for Coding or Testing is seen as enhancing production and lowering barriers to entry.
But now that the uses of these LLMs are increasing rapidly:
- Who is testing them?
- And what actually is Quality in the age of AI?
In this talk, I want to provide results from my experience in projects of testing Large Language Models and regressive AI. I will explain the high-level function of a Large Language Model.
I will translate the components of a Copilot onto a newly thought testing pyramid from the component level to the system level. Now that we have a sort of framework to test LLMs, I will outline the metrics used and why testers will still be needed in the age of AI - maybe even more than ever.
Comments