Test Oracle

Test Oracle image
A reference point used to decide whether a test has passed or failed. For LLM testing this becomes unreliable because multiple valid outputs can exist for the same input.

So what? The absence of a stable oracle is one of the central challenges of AI testing. Techniques like metamorphic testing exist partly to work around it by checking consistency rather than correctness.

Example: "Who was the first president of the USA?" has a clear oracle. A summarisation request does not. 

"The 'expected result' can be determined, but will always have some ambiguity. Comparing it to the 'actual result' won't be as straightforward as you are used to." — Demi Van Malcot
Explore MoT
MoTaCon 2026 image
Thu, 1 Oct
A tech conference to help you navigate the ever-shifting landscape of Quality Engineering, AI, Leadership, Product, Accessibility and Security.
Advanced prompting for testers image
Advanced prompting skills to turn AI into your trusted testing companion.
This Week in Quality image
Debrief the week in Quality via a community radio show hosted by Simon Tomes and members of the community
Subscribe to our newsletter