Every QA tool vendor will tell you AI generates perfect tests. After building an AI test generation platform - and dogfooding it on our own codebase - I can tell you what actually happens.
This talk is a practitioner’s honest debrief. I’ll walk through two years of running multi-model AI against real web apps: what produces usable tests, what produces confident-looking garbage, and where the failure modes hide.
Specifically, I’ll cover:
- Why reading code isn’t enough - AI generates plausible tests from source, but they fail on real UIs. Crawling the live app changes everything.
- The selector problem - LLMs reach for brittle CSS selectors by default. How to steer them toward more resilient locator strategies without prompt-engineering every call.
- Assertions that rot - AI loves asserting exact text and prices. Why your generated suite breaks on the first content change, and how to catch it before CI does.
- Multi-model routing - no single model wins at everything. What we learned running GPT-4o, Claude, and Gemini on the same flows.
- Self-healing in practice - the gap between “it healed” and “it healed correctly.”
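As a rough illustration of the selector and assertion points, and not the platform's actual implementation, here is a minimal sketch of a lint pass you could run over generated Playwright-style tests before they reach CI. Everything here is hypothetical: the `audit_generated_test` name, the regex heuristics (positional pseudo-classes, hashed CSS-in-JS class names, deep `>` chains, price literals in `toHaveText`/`toContainText`), and the six-word threshold for "long copy".

```python
import re
from dataclasses import dataclass

# Hypothetical heuristics for locators that tend to break on small DOM changes.
BRITTLE_SELECTOR_PATTERNS = [
    (re.compile(r":nth-(?:child|of-type)\("), "positional pseudo-class"),
    (re.compile(r"\.(?:css|sc|jsx)-[0-9a-z]{4,}", re.IGNORECASE), "generated class name"),
    (re.compile(r"(?:\s*>\s*[\w.#-]+){3,}"), "deep child combinator chain"),
]

# Exact-text assertions whose literal is likely to rot when content changes.
EXACT_TEXT_ASSERT = re.compile(r"""to(?:Have|Contain)Text\(\s*(['"])(.*?)\1""")
PRICE = re.compile(r"[$€£]\s?\d+(?:[.,]\d{2})?")
MAX_LITERAL_WORDS = 6  # hypothetical threshold for "long marketing copy"


@dataclass
class Finding:
    line_no: int
    kind: str
    detail: str


def audit_generated_test(source: str) -> list[Finding]:
    """Flag brittle selectors and rot-prone assertions, line by line."""
    findings: list[Finding] = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        # Selector heuristics: a line can trip more than one rule.
        for pattern, reason in BRITTLE_SELECTOR_PATTERNS:
            if pattern.search(line):
                findings.append(Finding(line_no, "brittle-selector", reason))
        # Assertion heuristics: only the first exact-text assertion per line is checked.
        match = EXACT_TEXT_ASSERT.search(line)
        if match:
            literal = match.group(2)
            if PRICE.search(literal):
                findings.append(Finding(line_no, "rotting-assertion", "exact price literal"))
            elif len(literal.split()) > MAX_LITERAL_WORDS:
                findings.append(Finding(line_no, "rotting-assertion", "long exact copy"))
    return findings


if __name__ == "__main__":
    generated = """
await page.locator('div.cart li:nth-child(2) .css-1x9k2ab').click();
await expect(page.getByRole('heading')).toHaveText('Order total: $19.99');
await expect(page.getByRole('button', { name: 'Checkout' })).toBeVisible();
"""
    for f in audit_generated_test(generated):
        print(f"line {f.line_no}: {f.kind} ({f.detail})")
```

Regex checks like these are cheap and deliberately noisy; a role-based locator such as `getByRole('button', { name: 'Checkout' })` passes cleanly, which is the kind of strategy you would want the generator nudged toward.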