Many software development organisations simply tolerate buggy staging environments: few think it’s a big deal. If you have worked in software development, chances are you’ve come across a staging environment in a state similar to the average person’s first car. Most of us promise to take care of the problems when we first buy the car, but in reality the car stays just as it was when we bought it! Wing mirror broken, tail light flickering, but just about still moving. We convince ourselves these minor issues are not important, and too expensive to fix.
I’d like to explain precisely why these 'minor' issues are going to cause your engineering teams some major problems. In fact, they probably are causing trouble already. I’m then going to give you some tips on how to persuade your company to direct some resources towards fixing these problems at the root. Finally I will provide you with ideas on actions you can take to ensure you are successful in your quest to clean up your staging environment — and to keep it clean!
Staging Environments: A Few Basic Facts And Realities
When I talk about “staging environments,” I mean any development or testing environment that engineers are expected to use to carry out their daily responsibilities. This includes developing new features, regression testing, running automated tests, and so on. You may use a different term at your company.
Staging environments that are low-quality, unreliable, and poor replicas of production introduce subtle faults into your software development life cycle. Your development team will most likely encounter them, and maybe someone will bring them up a few times in a retrospective. But no action is ever taken to address them. You have higher priorities; maybe the person who knows how to fix them no longer works at the company. In any event, there's some reason why you can't deal with it now. After all, it's not blocking anyone. Each environmental bug is then accepted as a fact of using the staging environment.
- “Hi folks, I’ve noticed this page doesn't load on our staging environment… this doesn't happen in production though. Do we know about this? 🤔”
- “Oh lol it just does that 😂 don't worry about it!”
This is where anti-quality culture starts to get baked into your processes and your team’s mindset.
The three main faults that begin to develop are:
- Anti-quality culture
- Higher risk of production issues
- Hidden costs in your development processes
What is anti-quality culture?
Anti-quality culture is team-wide acceptance of low-quality ways of working.
If a quality culture is striving for quality for its own sake, then anti-quality culture is the opposite. Your team slowly becomes desensitised to seeing unexpected behaviour in your staging environment. They start to assume (or to suspect) that all unexpected behaviour is due to the environment being neglected rather than to genuine issues. Engineers get used to seeing these issues, learn to work around them, and work more heedlessly as a result. This is an anti-quality culture, which leads directly to defect fatigue. Defect fatigue occurs when we are so used to seeing bugs that we develop a cognitive bias: we assume unrelated bugs are environmental and change our behaviour accordingly, potentially with negative results.
Before defect fatigue sets in, good engineers will diligently spend time investigating these issues as they crop up. You might notice this in the form of group chat messages, complaints in meetings, or points raised in retrospectives. Your leads or managers may not initially deem them important enough to fix. But what is not immediately obvious is how much time was wasted on investigating the issue before the team dismissed it as environmental.
Consider these scenarios. They might be happening now within your engineering department:
- An engineer is working on an unfamiliar part of the codebase when they see that suddenly part of the page isn’t loading as it should. They want to work independently, and decide to look into it themselves before burdening others. They spend an hour debugging their code, assuming their changes caused the problem, and then the code spontaneously starts working again. They later enquire and find that it’s a common intermittent issue they just weren’t aware of.
- An engineer has finished working on a change and they quickly check the branch before sending it to QA. They are experienced in their field and are fully aware of all the quirks in the software. They notice some strange UI defects, but quickly conclude it’s unlikely to be their changes … due to the environment being consistently buggy. They send the task over to QA and it promptly gets sent back as QA confirms that it’s a genuine issue.
Over time, the engineer in the first scenario is likely going to get tired of investigating these environmental bugs, either consciously or unconsciously. And eventually, the diligent engineer described in the first scenario becomes the experienced but mistaken engineer in the second scenario!
Defect fatigue occurs when your engineers investigate bugs or unexpected behaviour so often that they get mentally drained. After a while, the repetitive investigations cease altogether. It takes strong self-awareness and self-control for an engineer to investigate these bugs over and over when in all likelihood it’s just wasting time to do so.
The reality is: most of us don’t have the time to investigate environmental bugs, so we rely on these cognitive biases as shortcuts to save time. We then conclude that an issue is environmental, rightly…or wrongly.
How Anti-Quality Culture Affects QA Engineers
Anti-quality culture can cause low morale in QA engineers. They are quite literally Quality Assurance, and if they are surrounded by bugs and expected to work around them, it becomes a lot harder to do their jobs. Frankly, it’s demoralising.
Defect fatigue can affect QA engineers too, and in the role where the main responsibility is looking for bugs, this is a dangerous trait to acquire. I will explore this more in the next section.
QA engineers are likely the most frequent users of staging environments: it is where we run our automated tests and perform exploratory testing. If your company does not invest time into ensuring that QA can fulfil their roles without constantly encountering environmental issues, they may look for greener grass elsewhere. Or, in another common scenario, your remaining QA engineers are working at lower productivity and using less of their potential than they otherwise might do, which is just sad.
Higher Risk Of Production Issues
If your staging environment is rife with bugs, here’s what can happen during testing. A QA engineer might regularly encounter a small environmental bug. Although small, these bugs do very much affect your users, albeit indirectly, if the side effects of these environmental bugs impact your operational bottom line.
First, the engineer has to figure out whether the bug occurs because there’s a fault in the staging environment, or whether it truly is present in the code. This takes valuable time.
Or, if they make the decision not to investigate an issue that is suspected to be environmental, this can have consequences. The reality is: QA doesn’t always get it right, and that’s understandable! If a QA engineer is under stress from a deadline an hour away and encounters what is ‘likely’ an environmental issue, there’s a high chance they will allow the change to move forward, either because they believe it’s not a genuine issue or because they are so fatigued that they didn’t register it as an issue. QA engineers are not infallible.
Suddenly those tiny bugs, or bugs related to the areas they inhabit, are live in production, and your product’s users are raising bug reports. If you have ever seen obvious bugs in production that ‘should’ have been caught, it’s worth considering if environmental issues that masked a real defect were a contributing factor.
Ask yourself: If your team has a defect in their staging environment, how would they know when the product code breaks for real?
Hidden Costs In Your Development Processes
But it’ll cost so much to fix these…
Bugs in your staging environment are already costing your company money. Consider how much your environment costs to run on an annual basis. This cost is fixed and does not change if your environment is infested with bugs. Your company continues to pay exactly the same money for the environment even if your engineers find it difficult to use and a detriment to their productivity.
If a QA engineer frequently spends time investigating and confirming these issues are environmental, this is wasted time. They could have used that time automating, exploring, or challenging requirements. Instead they were looking into a problem that has probably been investigated hundreds of times by other engineers in your company. At what point has a company spent more money through their employees collectively looking into tiny bugs than it would take to fix them?
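One rough way to answer that break-even question is to compare the hours lost each week to repeat investigations with the one-off effort of a fix. The sketch below is only an illustration: the function name and every figure in it are hypothetical, and it assumes the same hourly rate applies to investigating and fixing, so the comparison can be done in engineer-hours rather than money.

```python
import math


def weeks_to_break_even(
    investigations_per_week: float,
    hours_per_investigation: float,
    fix_hours: float,
) -> int:
    """Return the number of whole weeks after which the time spent
    re-investigating one environmental bug is at least the time
    it would have taken to fix that bug once."""
    weekly_waste = investigations_per_week * hours_per_investigation
    if weekly_waste <= 0:
        raise ValueError("the bug must cost some time to investigate")
    return math.ceil(fix_hours / weekly_waste)


# Hypothetical figures: across the company the bug is looked into
# 4 times a week for 30 minutes each time, and a proper fix is
# estimated at 16 engineer-hours.
print(weeks_to_break_even(4, 0.5, 16))  # prints 8
```

With these made-up numbers, the 'too expensive' fix has paid for itself after two months, and every week after that is pure waste. Plugging in your own estimates, even rough ones, gives you a figure to bring to the conversation.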
If every QA engineer repeatedly has to debug the staging environment, and every new QA engineer repeatedly asks the same questions about the same broken areas, this affects their productivity. If QA spends hours investigating whether a test failure is genuine or simply environmental instead of testing new features, this affects their productivity too. A QA engineer’s productivity is vital in providing value to users. QA is often the last step before release. If their time is wasted by environmental bugs, users receive new features more slowly and get less value from the app. It’s just not as obvious as the worst-case scenario: your product’s customers finding bugs and submitting bug reports.
Some QA engineers, in their tenacity, maintain high standards by using different environments on a case-by-case basis. If they are testing an area which is known to be problematic in one environment, they will use a different environment if they have the opportunity. This means that parts of these environments your company spends money on maintaining are unusable, and this is very wasteful.
Following on, the increased time it takes engineers to investigate these issues (if they do) costs your company money too. And finally, if your engineers don’t investigate the issues, in all likelihood a production issue will eventually occur. Suddenly this has cost your company much more money than it would have if the environmental issues had been resolved in the first place.
The Staging Environment Must Be Fixed. How Do I Sell This To The Company?
Your team has voted on fixing the staging environment. Great. But when will that get done, and how much will it cost? And how do you make sure you don’t end up in the same situation a year from now?
If you work in software development, chances are your product roadmap is chockablock for the next five financial years. It's tricky simply to make room for holidays, and it's even more difficult to squeeze fixes for these 'low priority' bugs into your busy sprints. So it's a good question: how will you sell this to your team?
Let's reiterate the problems identified. Habitually accepting low-quality environments creates a culture which is antagonistic to quality. These low-quality environments make it more difficult for engineers to work effectively, causing frustration, low morale, and lowered productivity. This can lead to employee churn. It can even result in production bugs which cost the company a lot of money.
For your manager to be on board, as well as knowing why your team should spend time fixing your staging environment, you need a plan.
This is where we get creative, because there are so many answers to this question! If you've realised that an anti-quality culture has taken hold of your engineering team, the first action to take is to acknowledge it as a problem. Using the points set out previously, you can work to persuade the wider team that these small environmental bugs have wide ripple effects on your software development life cycle. Encourage the team to look out for environmental issues, and instead of accepting them as a fact of life, give them an outlet to start cataloguing them in a similar way to production bugs. (If you don't already do this, start logging production bugs as soon as you can!)
After acknowledging the problem, you need to know which bugs exist and where, so you have an idea of the scale of the problem. This will inform your next step, which is choosing the ideal approach for dealing with them.
The ideal approach to fixing the environmental bugs will vary widely depending on your personal situation. Some ideas:
- Do a one-off hackathon where multiple teams compete to fix the most environmental bugs
- Feed the environmental bug-fix tickets into each sprint in a trickle pattern over time to avoid disruption to regular sprint work
- Fix the environmental bugs in the limbo period between sprint retrospective and planning. As long as you're getting even one point of environmental bug fixes into every sprint, that is progress and better than nothing!
I would again encourage you to make your team feel accountable for this. If you can set a goal of fixing x% of environmental bugs in y sprints, and that goal is achieved in a couple of consecutive sprints, this will help cement fixing these bugs as a project to be taken seriously. You can monitor this goal in your sprint ceremonies, technical roadmap meetings, or other regular engineering huddles or all-hands gatherings. Start talking about it!
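If you want something concrete to bring to those meetings, a check like the following could tell you whether you are keeping pace with an "x% in y sprints" goal. It is a minimal sketch under stated assumptions: the function name and the numbers are hypothetical, the catalogue size is treated as fixed at the start, and progress is measured against a simple straight-line path to the target.

```python
def on_track(
    total_bugs: int,
    fixed_per_sprint: list[int],
    target_percent: float,
    target_sprints: int,
) -> bool:
    """Return True if the share of catalogued environmental bugs fixed
    so far is at least what a straight-line path to the goal expects."""
    if total_bugs <= 0 or target_sprints <= 0:
        raise ValueError("total_bugs and target_sprints must be positive")
    sprints_done = min(len(fixed_per_sprint), target_sprints)
    fixed_percent = 100 * sum(fixed_per_sprint) / total_bugs
    expected_percent = target_percent * sprints_done / target_sprints
    return fixed_percent >= expected_percent


# Hypothetical goal: fix 50% of 40 catalogued bugs within 5 sprints.
# After two sprints the straight-line expectation is 20%.
print(on_track(40, [5, 4], 50, 5))  # 22.5% fixed, prints True
print(on_track(40, [2, 3], 50, 5))  # 12.5% fixed, prints False
```

A straight line is the simplest possible yardstick. If newly discovered bugs keep being added to the catalogue, you may prefer to recalculate the total each sprint.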
Some Final Advice
I recommend treating your engineers like you would a priority stakeholder, and setting some service level agreements in relation to environmental bugs. Ensure environmental bugs and any associated processes you implement are communicated and documented. If done right, this will allow your team to keep on top of the bugs if and when they appear.
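To make the idea of a service level agreement less abstract, here is one way such a check might look. Everything in this sketch is an assumption for illustration: the severity names, the day limits, the `EnvBug` record, and the example bugs are all hypothetical, and your own bug tracker will have its own fields.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical SLA: the most days an environmental bug may stay open,
# by severity.
SLA_DAYS = {"high": 5, "medium": 15, "low": 30}


@dataclass
class EnvBug:
    title: str
    severity: str  # must be a key of SLA_DAYS
    opened: date


def breached_sla(bugs: list[EnvBug], today: date) -> list[EnvBug]:
    """Return the open bugs that have been open longer than their SLA allows."""
    return [b for b in bugs if (today - b.opened).days > SLA_DAYS[b.severity]]


# Made-up open bugs for illustration.
open_bugs = [
    EnvBug("Dashboard page intermittently fails to load", "high", date(2024, 3, 20)),
    EnvBug("Avatar images missing on staging", "low", date(2024, 3, 10)),
]

for bug in breached_sla(open_bugs, today=date(2024, 3, 29)):
    print(bug.title)  # prints only the dashboard bug (9 days open, limit 5)
```

A report like this, shared regularly, is one way to communicate how the team is doing against whatever agreement you settle on.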
Of course this is easier said than done. In the beginning, you may be the only person who believes your engineering team should spend time on this. I believe through persuasion, showing passion, and using the points made in this article, you should be able to convince others to think the same.
If you think environmental bugs are a problem, it shows that you innately strive for things to be better, and that is something to be proud of! You recognise that tiny issues build up to become massive problems for you and your team. Help others to see the same thing and to do something about it.
For Further Information
- How To Make Your Test Environment Work For You, by Erol Selitektay
- Changing Testing Culture in a Ginormous Company, by Jim Holmes
- Become A Better Tester By Becoming A Better Critical Thinker, by Naveen Bhati
- Mapping Biases to Testing, by Maaike Brinkhof
- Avoiding The Million Dollar Question: How Did The QA Team Miss This Defect?, by Venkat Ramesh Atigadda