Chaos Engineering
Chaos engineering is the practice of deliberately introducing failures into your system to understand how it responds to the kinds of failures that can reach your users.
For example, you may want to take down a random API or data service and observe the side effects on your users’ applications. If those side effects are severe and you have Service Level Agreements (SLAs) in place, a failure like this can have a detrimental impact on your organisation.
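As a rough illustration of the "take down a random service" idea, here is a minimal in-memory sketch. Everything in it is hypothetical: the `Service` class, the service names, and the `load_dashboard` function stand in for real APIs and a real user-facing page, and no actual infrastructure is touched.

```python
import random


class Service:
    """Hypothetical stand-in for an API or data service that can be switched off."""

    def __init__(self, name):
        self.name = name
        self.up = True

    def call(self):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return "ok"


def load_dashboard(services):
    """Simulated user-facing request: degrade per dependency instead of crashing."""
    page = {}
    for svc in services:
        try:
            page[svc.name] = svc.call()
        except ConnectionError:
            page[svc.name] = "temporarily unavailable"
    return page


def run_experiment(services, rng):
    """Take down one randomly chosen service, record what the user sees, then restore it."""
    victim = rng.choice(services)
    victim.up = False
    try:
        return victim.name, load_dashboard(services)
    finally:
        # Always restore the service so the experiment leaves no lasting damage.
        victim.up = True


if __name__ == "__main__":
    services = [Service("profile-api"), Service("orders-api"), Service("search-db")]
    victim, page = run_experiment(services, random.Random())
    print(f"took down: {victim}")
    for name, status in page.items():
        print(f"  {name}: {status}")
```

The interesting part is what `load_dashboard` returns while the victim is down: a page that still renders with one section marked unavailable is a very different user experience from an unhandled error.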
A type of testing where we basically go around unplugging things. Way back in the old days, we used to have physical servers and we'd pull wires out of machines; these days things are all hosted. We would switch things off and see what happens.
Does it recover?
Does data get lost?
Have transactions been queued?
Does it fail part way through?
Community commentary from Callum during Software Testing Live.
Chaos Engineering is the discipline of exploring your system to see how it handles turbulent conditions in the live environment. The term was coined at Netflix, where it was used to find out whether their systems and infrastructure could handle infrastructure failures, network failures, and application failures.
Basically, Chaos Engineering asks: what would happen if this thing failed?
Chaos engineering is about purposely causing the system to fail in order to test how it behaves.
Consider the following scenarios:
- The network is interrupted or lost while a video call is in progress.
- A user is redirected to a third party for payment after making a purchase, but the payment server is not responding.
- A user tries to upgrade our app but doesn’t have enough free space on their handset.
- A user tries to register but, because of high volume, the registration cannot complete or dies partway through.
Scenarios like these may not occur on a day-to-day basis, but they can happen once in a while for a small set of users, so we need to ensure that our system behaves well when they do.
For the cases above, here is how each one can be tested:
- Switch off wi-fi or mobile data while a video call is in progress and see how the system behaves, or move to an area that is out of wi-fi range.
- Block the API call that is responsible for redirecting to the third party.
- Fill up the handset’s storage first, and then try to upgrade.
- Purposely kill the query on the backend while the registration is in progress.
These are a few examples of failures that may not happen to every user but still need to be tested to ensure nothing breaks.
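The payment scenario can also be rehearsed in code by injecting the fault rather than waiting for it. The sketch below is only an illustration under stated assumptions: `PaymentTimeout`, the gateway functions, and `checkout` are all hypothetical names, and the "payment server" is a plain Python function rather than a real third party.

```python
class PaymentTimeout(Exception):
    """Raised when the (simulated) payment server does not respond."""


def healthy_gateway(order_id):
    """Simulated payment server that always answers."""
    return {"order_id": order_id, "status": "paid"}


def unresponsive_gateway(order_id):
    """Injected fault: the payment server never answers."""
    raise PaymentTimeout(f"no response for order {order_id}")


def flaky_gateway(failures):
    """Build a gateway that times out `failures` times and then recovers."""
    remaining = {"n": failures}

    def gateway(order_id):
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise PaymentTimeout(f"no response for order {order_id}")
        return healthy_gateway(order_id)

    return gateway


def checkout(order_id, gateway, retries=2):
    """Try to pay; after repeated timeouts keep the order as pending instead of losing it."""
    for _ in range(retries):
        try:
            return gateway(order_id)
        except PaymentTimeout:
            continue
    return {"order_id": order_id, "status": "pending", "attempts": retries}


if __name__ == "__main__":
    print(checkout(1, healthy_gateway))
    print(checkout(2, flaky_gateway(failures=1)))
    print(checkout(3, unresponsive_gateway))
```

Swapping the gateway is the whole experiment: the same `checkout` code runs against a healthy, a flaky, and a dead payment server, and the result shows whether the order recovers, is held as pending, or is lost.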