Goldens for your evals

25 Jun 2026

In this moment: Swathika Visagn
I am exploring how to evaluate AI agent workflows and have just learnt that the golden test cases we provide to any LLM evaluation framework are crucial.
From a traditional QA mindset, I would think of positive, negative and edge case scenarios. From an AI evaluation mindset, we need to look at this through a data science lens.
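
To make that concrete, here is a rough sketch of how a golden set could be held as plain data, so you can see how the cases spread across positive, negative and edge scenarios and which ones an SME has reviewed. The `GoldenCase` fields, the sample cases and both helper functions are hypothetical and only for illustration; they are not tied to any particular evaluation framework.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenCase:
    """One golden test case (hypothetical shape, for illustration only)."""
    case_id: str
    category: str          # "positive", "negative" or "edge"
    prompt: str            # input given to the agent
    expected: str          # reference answer agreed in advance
    reviewed_by_sme: bool  # has a domain expert signed this case off?


def coverage_by_category(cases):
    """Count how many golden cases fall into each category."""
    return Counter(case.category for case in cases)


def unreviewed(cases):
    """Return the ids of cases that no SME has reviewed yet."""
    return [case.case_id for case in cases if not case.reviewed_by_sme]


# Made-up sample data, not real evaluation cases.
GOLDENS = [
    GoldenCase("G1", "positive", "Refund for an order placed yesterday", "Approve refund", True),
    GoldenCase("G2", "negative", "Refund for an order that does not exist", "Reject politely", True),
    GoldenCase("G3", "edge", "Refund requested on the last eligible day", "Approve refund", False),
    GoldenCase("G4", "positive", "Refund for a damaged item", "Approve refund", False),
]

if __name__ == "__main__":
    print(dict(coverage_by_category(GOLDENS)))
    print(unreviewed(GOLDENS))
```

A count per category is only a first check on the spread of the data; it says nothing about whether the cases inside each category are realistic, which is where SMEs come in.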

I am wondering what would be a convincing range of data or scenarios to feed to the evaluation framework.
How can we shift left and prepare these golden test cases in advance by working closely with SMEs?
Should these golden test cases be part of the requirements gathering activity rather than waiting until testing?
Is it meaningful to say that I evaluated 50 scenarios and the metric score was 80%?
Can testers come up with 50 different scenarios without being an SME in the domain where the agent will be used?
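
On the "50 scenarios, 80%" question, one way to judge how much that number tells you is to put a confidence interval around it. The sketch below uses the Wilson score interval for a proportion. It assumes each scenario is scored as a simple pass or fail and that the scenarios are independent of each other, which may not hold for a real golden set. The function name `wilson_interval` is my own.

```python
import math


def wilson_interval(passes, total, z=1.96):
    """Wilson score interval for a pass rate (z=1.96 is about 95% confidence)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = passes / total
    z2 = z * z
    denom = 1 + z2 / total
    centre = (p + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    low, high = wilson_interval(passes=40, total=50)  # 80% of 50 scenarios
    print(f"pass rate 80%, plausible range {low:.0%} to {high:.0%}")
```

Under those assumptions, 40 passes out of 50 is consistent with a true pass rate anywhere from roughly 67% to 89%, so a bare "80%" is less precise than it sounds unless you also state the sample size and how the scenarios were chosen.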

Any thoughts and tips?
Swathika Visagn
Senior Test Engineer at PwC UK

I am a very curious Senior Quality Engineer who is more driven towards automation and promotes shifting left. I have proven experience as an agile tester having strong fundamentals in manual and automation testing principles. I enjoy the entire journey from setting the automation framework from scratch to building the pipelines onto continuous integration tools like Jenkins.

My framework adds more flavor by combining service layer (API) calls with UI layer automation, which we call 'Seaming' in automation terms. I communicate with stakeholders about risks, accessibility and pain points rather than the number of passes/fails.

I test with a purpose by automating business flows and adding appropriate plug-ins to make the automation reports/metrics readable for stakeholders. I also love to take part in agile ceremonies and volunteer to run retrospectives/daily scrums to keep the team self-thriving in the temporary absence of the scrum master.

When I'm not scripting, I love to binge on Netflix, indulge in testing communities, and read about Web3 and all things Quality :-) I am a yogic person too. If anything calms me, it's a cup of chai and a morning walk in the park.
