A tester’s role in evaluating and observing AI systems

As more teams build products powered by AI models, testers have a growing opportunity to shape how these systems are evaluated and understood. In this talk, Carlos Kidman shows how testers can apply familiar testing skills to the world of AI, using LangSmith to create manual and automated evaluations, define quality attributes, and observe how AI behaves in development and production.

Through live examples, Carlos demonstrates how to design meaningful tests for non-deterministic systems, measure performance and accuracy, and add value to AI projects from design to deployment. You’ll see that many of the skills testers already have, such as analysis, evaluation, experimentation, and observation, translate directly to testing AI.

Resources

  • LangSmith – used in the talk to create evaluations, run experiments, and observe AI behaviour.
  • Hugging Face – a platform for open-source models, datasets, and evaluation tools.
  • PyTorch – a Python framework for building and training machine learning models.
  • Carlos Kidman.pdf – the PDF accompanying the talk.
