How To Build A Performance Testing Stack From Scratch: Running Tests

How To Build A Performance Testing Stack From Scratch: Article 4 of 5

The other sections in this series will likely consume most of your time when building your performance testing stack. But this section, though small, is just as important. Now we’re getting down to some of the hidden dangers of running performance tests. Even seasoned professionals can make these mistakes.

Running tests incorrectly might lead you to draw the wrong conclusions from your performance data: you could miss a performance regression when one exists, or believe performance has changed when it hasn’t.

The following suggestions will help you to avoid these kinds of problems.

1. Use A Performance Framework

Whether you use an existing performance test framework or write your own, it’s important that you use one. A framework handles running tests in a repeatable way. There are many frameworks and performance libraries available (some were covered in the last section).

Whichever you choose, it should provide the following features.

1.1 Make Test Configuration Consistent

The most important thing about running tests is that they are run with the same parameters on every invocation. This means that those parameters should be explicitly stated in a runner script or a test configuration file, even if the default parameters chosen by the test author are suitable for your environment.

The reason it’s a bad idea to rely on the default parameters is that they can change between software releases. Upgrading your test might cause it to behave differently. Besides, using explicit parameters encourages you to document them thoroughly, which spreads knowledge across the whole team.
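As a sketch of this idea, the snippet below pins every benchmark parameter explicitly in one place. It assumes sysbench’s `oltp_read_only` test and its standard options purely as an illustration; substitute whatever tool and flags your stack actually uses.

```python
# Hypothetical example: state every benchmark parameter explicitly rather
# than relying on the tool's defaults, which can change between releases.
SYSBENCH_PARAMS = {
    "--threads": "4",
    "--time": "60",
    "--events": "0",          # run for a fixed time, not a fixed event count
    "--table-size": "1000000",
}

def build_command(params):
    """Assemble a command line with every parameter stated explicitly."""
    cmd = ["sysbench", "oltp_read_only"]
    for flag, value in sorted(params.items()):
        cmd.append(f"{flag}={value}")
    cmd.append("run")
    return cmd

print(build_command(SYSBENCH_PARAMS))
```

Keeping the parameters in a single dictionary (or a checked-in configuration file) means every invocation is identical, and the chosen values are documented for the whole team.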

1.2 Run Each Test Multiple Times

Tests should be run multiple times with the same parameters so that summary statistics can be used, e.g. minimum, maximum, and mean. The more data that is available, the more accurate any statistics will be.

More data also makes it easier to distinguish outliers; since outliers are uncommon, you have a better chance of seeing the typical values for your test if you run more iterations.
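A minimal runner that applies this advice might look like the following. The workload passed in is a placeholder; the point is that every test is run several times and only summary statistics are reported.

```python
import statistics
import time

def run_benchmark(test_fn, iterations=10):
    """Run a test multiple times and return summary statistics
    over the per-iteration durations (in seconds)."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        test_fn()
        durations.append(time.perf_counter() - start)
    return {
        "min": min(durations),
        "max": max(durations),
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),  # less sensitive to outliers
    }

# Placeholder workload: any callable under test can be substituted here.
stats = run_benchmark(lambda: sum(range(100_000)), iterations=10)
print(stats)
```

Reporting the median alongside the mean is a cheap way to spot runs where outliers dragged the average.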

2. Ensure The Test Duration Is Consistent

Some complex benchmark tools try to improve test result accuracy on noisy test systems by changing the number of test-internal loop iterations at runtime. The idea is that if things are happening on your system that might interfere with the test, such as many running tasks, heavy network traffic, or frequent device interrupts, you need to collect more data to stop outliers from skewing the results.

While this might sound good in theory, changing the number of loop iterations also changes the total duration of the test. Test duration is the simplest indicator for understanding whether performance has changed, i.e. if the test completes quicker than previously, then performance has improved. If the duration is not guaranteed to be consistent, assuming no software change, then you cannot use it as a performance metric.

Alternatively, the duration might vary from run to run if the test uses services that take a variable amount of time to execute, e.g. memory allocation services. Although the variation is unintentional in this case, it still makes the duration unusable as a performance metric.
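The fix is to make the amount of work per run an explicit constant. Here is a sketch, with a hypothetical iteration count, where the loop count is fixed so the elapsed time itself stays comparable between runs:

```python
import time

# A fixed, explicit iteration count: never adapted at runtime, so the
# total duration remains directly comparable between runs.
ITERATIONS = 100_000

def timed_run(workload):
    """Time a fixed amount of work; the elapsed time is itself a metric."""
    start = time.perf_counter()
    for _ in range(ITERATIONS):
        workload()
    return time.perf_counter() - start

elapsed = timed_run(lambda: None)
print(f"{elapsed:.4f}s for {ITERATIONS} iterations")
```

If your system is noisy, run this fixed-work test more times (as in the previous section) rather than letting the test silently do more work per run.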

3. Order Tests By Duration

The different types of benchmarks (see Performance Tests and Benchmarks) usually have correspondingly different test durations. Nano benchmarks complete fastest (usually measured in nanoseconds or microseconds), micro benchmarks take a little longer (milliseconds to seconds), and macro benchmarks take the most amount of time (minutes to hours).

So that you can get performance data feedback as quickly as possible, you might want to stage your tests in order of estimated duration, from shortest to longest.
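A simple scheduler for this is just a sort over estimated durations. The test names and estimates below are invented for illustration:

```python
# Hypothetical registry of tests with rough duration estimates in seconds.
TESTS = [
    ("kernel-compile (macro)", 1800),
    ("memcpy-loop (nano)", 0.5),
    ("json-parse (micro)", 12),
]

def schedule(tests):
    """Order tests shortest-first so feedback arrives as early as possible."""
    return sorted(tests, key=lambda t: t[1])

for name, estimate in schedule(TESTS):
    print(f"{name}: ~{estimate}s")
```

With this ordering, a regression in a nano benchmark is reported within seconds instead of waiting behind an hours-long macro benchmark.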

4. Keep Reproduction Cases Small

It’s much easier to investigate performance issues if they have a small reproduction case. Small tests not only complete faster than larger tests, they are easier to analyse too because they are less complex.

If a particular issue only shows up with a macro benchmark, you could consider writing a smaller test that exhibits the same performance issue.

Your test environment may also be a factor, so it’s worth homing in on the specific environment conditions required to trigger the issue.

5. Set Up The Environment Before Each Test

It’s sometimes surprising how much the state of the test system or network before a test starts can affect the results. For example, if a test has run recently, its data can still be cached in memory (avoiding a read from disk), or Operating System buffers might not have been freed yet. This can have a dramatic effect on the test results.

Performance tests should configure the test environment before starting a run, and restore it to its initial state, or tear it down, after. It’s important that the state of your test environment, before the test run, resembles your production environment as closely as possible. That way your test results will indicate the performance your users will see.
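One way to guarantee setup and teardown happen around every run, even when a test fails, is a context manager. This is a structural sketch: the comments name the kinds of actions (syncing, dropping OS caches, reloading test data) you would substitute for your environment, and the `events` list exists only to make the ordering visible.

```python
import contextlib

events = []  # records the order of phases, for illustration only

@contextlib.contextmanager
def test_environment():
    """Sketch: prepare a production-like environment before the run and
    restore or tear it down afterwards, even if the test fails."""
    events.append("setup")        # e.g. sync; drop OS caches; reload test data
    try:
        yield
    finally:
        events.append("teardown") # e.g. delete temp files, restart services

with test_environment():
    events.append("run")          # the performance test itself

print(events)
```

The `finally` clause is what makes the teardown unconditional, so a crashed test cannot leave state behind that skews the next run.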

It is also possible to run performance tests on production machines, though there is a real risk of disrupting their operation and your users. Nevertheless, some performance issues might only show up in production.

6. Make Updating Tests Easy

As was covered in Performance Tests and Benchmarks, bugs are often found in the tests themselves. Periodically monitoring the upstream projects for new releases and bug fixes, which could require you to make adjustments to the tests, is a good idea.

When new releases are available, you need to be able to easily update the existing test and regenerate just the dependent results. Tests should be capable of running individually, not just in all-or-nothing suites.
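A test registry that supports selecting tests by name makes this straightforward. The registry and test names below are hypothetical stand-ins for real benchmark invocations:

```python
# Hypothetical test registry: each test can be run on its own, not only
# as part of an all-or-nothing suite.
TESTS = {
    "memcpy": lambda: "memcpy result",
    "json-parse": lambda: "json result",
}

def run(names=None):
    """Run only the named tests (all registered tests when names is None)."""
    selected = names if names is not None else list(TESTS)
    return {name: TESTS[name]() for name in selected}

# After upgrading just one benchmark, regenerate only its results:
print(run(["json-parse"]))
```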

7. Errors Should Be Fatal

It’s a common issue in performance testing that misleading results can be generated because errors in the test went unnoticed. Hitting errors does not always cause a test to fail, and in fact, sometimes the results can look normal.


Performance test errors are frequently caused by misconfiguration. For example, an HTTP benchmark may encounter 404 errors when testing connections to a web server due to misconfiguration. Because a 404 is a legitimate error, and because triggering it requires establishing an HTTP connection, it’s easy to miss that the benchmark is testing the wrong thing.

If you’ve written your own tests, check that the data you send and receive is what you expect. Popular benchmarks often include statistics on the number of errors encountered while the test was running; do not ignore those error statistics.
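A simple guard that enforces this is a validation step that aborts the run when the tool’s error counters are non-zero. The `stats` dictionary shape here is hypothetical; map it onto whatever your benchmark actually reports.

```python
def check_results(stats):
    """Abort the run if the benchmark reported any errors; results
    produced alongside errors are misleading, not merely noisy."""
    if stats.get("errors", 0) > 0:
        raise RuntimeError(
            f"{stats['errors']} errors during the run; results discarded"
        )
    if stats.get("requests", 0) == 0:
        raise RuntimeError("no work was performed; check test configuration")
    return stats

# A clean run passes through; a run with errors raises immediately.
print(check_results({"errors": 0, "requests": 100}))
```

Raising an exception, rather than logging a warning, ensures the bad result never reaches your performance dashboard in the first place.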

What’s Up Next

Following the advice above should help you generate accurate test results and avoid many of the common hazards of performance testing. It should also help you keep your tests up to date and get correct results quickly.

Next, we’ll look at how to interpret those test results and performance data, including how to decide when a change in performance should be considered significant.

Matt Fleming

Matt is a Senior Performance Engineer at SUSE where he’s responsible for improving the performance of the Linux kernel. You can find him on Twitter, and writing about Linux performance on his personal blog.
