A tester’s role in evaluating and observing AI systems

Carlos Kidman, Senior Quality Architect

17 Oct 2025

As more teams build products powered by AI models, testers have a growing opportunity to shape how these systems are evaluated and understood. In this talk, Carlos Kidman shows how testers can apply familiar testing skills to AI, using LangSmith to create manual and automated evaluations, define quality attributes, and observe how AI systems behave in development and production.

Through live examples, Carlos demonstrates how to design meaningful tests for non-deterministic systems, measure performance and accuracy, and add value to AI projects from design to deployment. You’ll see that many of the skills testers already have, such as analysis, evaluation, experimentation, and observation, translate directly into testing AI.

Resources

LangSmith – used in the talk to create evaluations, run experiments, and observe AI behaviour.
Hugging Face – a platform for open-source models, datasets, and evaluation tools.
PyTorch – a Python framework for building and training machine learning models.

Carlos Kidman .pdf
