A tester’s role in evaluating and observing AI systems

17th October 2025
Carlos Kidman

Senior Quality Architect

Talk Description

As more teams build products powered by AI models, testers have a growing opportunity to shape how these systems are evaluated and understood. In this talk, Carlos Kidman shows how testers can apply familiar testing skills to the world of AI, using LangSmith to create manual and automated evaluations, define quality attributes, and observe how AI behaves in development and production.

Through live examples, Carlos demonstrates how to design meaningful tests for non-deterministic systems, measure performance and accuracy, and add value to AI projects from design to deployment. You’ll see that many of the skills testers already have, such as analysis, evaluation, experimentation, and observation, translate directly into testing AI.

By the end of this session, you'll be able to:

  • Describe a tester’s role in evaluating and observing AI systems.
  • Explain how to design and run manual and automated tests for AI models using LangSmith.
  • Identify useful metrics and evaluation techniques for assessing AI quality.
  • Apply observability tools to monitor AI performance and behaviour in production.
He/Him
Carlos is a Senior Quality Architect and AI Engineer. He founded QA at the Point and is best known for his hands-on courses and presentations on using AI and testing AI systems.