A tester’s guide to AI guardrails
Identify, test and improve AI guardrails through a structured, scenario-based framework that addresses common implementation failures and attack patterns.
AI systems introduce a new kind of risk.
They don’t just fail; they produce plausible, unsafe, or misleading outputs while appearing correct.
Guardrails are used to control these behaviours. But in most systems, they are:
- vaguely defined
- poorly tested
- incorrectly implemented
This session introduces a structured approach to learning AI guardrails using an interactive, scenario-based format.
Participants will work through short exercises that reflect real testing challenges:
- Identifying types of guardrail failures
- Determining when a guardrail should trigger
- Recognising common attack patterns
- Improving weak system prompt rules
- Spotting implementation-level issues
By the end, participants will have a practical framework to test AI systems more systematically.
Learning outcomes
- Understand the various categories of AI guardrails
- Practice a set of practical techniques to test AI guardrails
- A clearer understanding of where set guardrails fail in real systems