How quality is created, maintained and lost in complex software systems
The July 2024 CrowdStrike outage was one of the most significant software incidents in recent memory. In this talk, Jitesh Gosai uses the event as a case study to explore what happened, why it was so disruptive, and what it reveals about how quality is created, maintained and lost in complex sociotechnical systems.
Jitesh examines the incident from multiple perspectives, showing why traditional root cause analysis often fails to explain large-scale failures and how we can instead learn from these events to build resilience into our systems. He connects lessons from the outage to the broader practice of quality engineering, showing how studying real-world incidents can help teams build healthier systems and make quality a shared responsibility.
Resources
- Jit's Slides, Prezi
- CrowdStrike Mass Global IT Outage, Aj Wilson, MoT article