A tester’s role in evaluating and observing AI systems
17 Oct 2025
As more teams build products powered by AI models, testers have a growing opportunity to shape how these systems are evaluated and understood. In this talk, Carlos Kidman shows how testers can apply familiar testing skills to the world of AI, using LangSmith to create manual and automated evaluations, define quality attributes, and observe how AI behaves in development and production.
Through live examples, Carlos demonstrates how to design meaningful tests for non-deterministic systems, measure performance and accuracy, and add value to AI projects from design to deployment. You’ll see that many of the skills testers already have, such as analysis, evaluation, experimentation, and observation, translate directly into testing AI.
Resources
- LangSmith – used in the talk to create evaluations, run experiments, and observe AI behaviour.
- Hugging Face – a platform for open-source models, datasets, and evaluation tools.
- PyTorch – a Python framework for building and training machine learning models.