How quality is created, maintained and lost in complex software systems
Principal Tester

Talk Description
The July 2024 CrowdStrike outage was one of the most significant software incidents in recent memory. In this talk, Jitesh Gosai uses the event as a case study to explore what happened, why it was so disruptive, and what it reveals about how quality is created, maintained and lost in complex sociotechnical systems.
Jitesh examines the incident from multiple perspectives, showing why traditional root cause analysis often fails to explain large-scale failures and how we can instead learn from these events to build resilience into our systems. He then connects lessons from the outage to the broader practice of quality engineering, illustrating how studying real-world incidents can help teams build healthier systems and make quality a shared responsibility.
By the end of this session, you'll be able to:
- Describe what the CrowdStrike outage reveals about how quality is created, maintained and lost in complex systems.
- Explain why root cause analysis can limit understanding of large-scale failures.
- Identify practices that help build resilience and reduce the impact of future incidents.
- Recognise the role of quality engineering in studying and improving sociotechnical systems.