A tester’s role in evaluating and observing AI systems
17 Oct 2025
As more teams build products powered by AI models, testers have a growing opportunity to shape how these systems are evaluated and understood. In this talk, Carlos Kidman shows how testers can apply familiar testing skills to the world of AI, using LangSmith to create manual and automated evaluations, define quality attributes, and observe how AI behaves in development and production.
Through live examples, Carlos demonstrates how to design meaningful tests for non-deterministic systems, measure performance and accuracy, and add value to AI projects from design to deployment. You’ll see that many of the skills testers already have, such as analysis, evaluation, experimentation, and observation, translate directly into testing AI.
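One way to picture a test for a non-deterministic system is to run the same input many times and check a pass rate against a threshold, rather than asserting on a single output. The sketch below is illustrative only and is not taken from the talk: `fake_model`, `contains_expected`, `pass_rate`, and the 0.8 threshold are hypothetical names and values, and the model is a random stand-in rather than a real AI model or a LangSmith call.

```python
import random
from typing import Callable

# Hypothetical stand-in for a non-deterministic AI model (assumption, not a real API).
def fake_model(prompt: str, rng: random.Random) -> str:
    # Answers correctly about 90% of the time, incorrectly otherwise.
    return "The capital is Paris." if rng.random() < 0.9 else "The capital is Lyon."

# A simple evaluator: does the output mention the expected answer?
def contains_expected(output: str, expected: str) -> bool:
    return expected.lower() in output.lower()

def pass_rate(
    model: Callable[[str, random.Random], str],
    prompt: str,
    expected: str,
    runs: int,
    rng: random.Random,
) -> float:
    """Run the model `runs` times on one prompt and return the fraction that pass."""
    passes = sum(contains_expected(model(prompt, rng), expected) for _ in range(runs))
    return passes / runs

# Seeded so the experiment is repeatable; the threshold is an assumed quality bar.
THRESHOLD = 0.8
rng = random.Random(42)
rate = pass_rate(fake_model, "What is the capital of France?", "Paris", runs=200, rng=rng)
verdict = "meets" if rate >= THRESHOLD else "misses"
print(f"pass rate {rate:.2f} {verdict} the threshold of {THRESHOLD}")
```

The design choice here is to treat accuracy as a measured rate over repeated runs, which is closer to an experiment than to a classic pass/fail assertion. In practice the evaluator and threshold would come from the quality attributes the team agrees on.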
Resources
- LangSmith – used in the talk to create evaluations, run experiments, and observe AI behaviour.
- Hugging Face – a platform for open-source models, datasets, and evaluation tools.
- PyTorch – a Python framework for building and training machine learning models.