How To Build A Performance Testing Stack From Scratch: Establishing Performance Goals

How To Build A Performance Testing Stack From Scratch: Article 1 of 5

This is the first instalment of a series which will walk you through designing and building a performance testing stack from scratch.

We will begin with a five-step plan that will help you define performance goals for your product. These are the goals you want to achieve with your performance testing stack.

Some goals you might want to have for performance could be:

  • Making sure your product doesn’t slow down as new features are added.
  • Proving that the latest changes increase the speed of your product.
  • Understanding how your product performs today, because you have no existing data.

Understanding what you want to measure, and why, will influence many decisions and trade-offs that need to be made throughout your project.

The Golden Rule of performance is: get used to making trade-offs. They are an integral part of performance engineering and optimisation.

The planning stage is the most important of any project. It will give you a generalized blueprint to follow which allows you to focus on the specific requirements you need for your project. The steps described below, which help define performance testing goals, are applicable to all types of performance testing projects.

Step 1. Identify Stakeholders

Performance tests help you to understand how your product performs for your users. You will most likely need to provide reports on the results of your performance tests. You may also need to know who will be receiving them, how they will be used, and have some way of granting access. All of these tasks require identifying the stakeholders.

There are two questions that can help to discover the stakeholders:

  1. Who would want to take action if a performance issue was found?
  2. Who would want to provide input on what to test and how to test it?

You’ll need to figure out which teams would use the test results to influence development practices or plans, and which teams would want to influence the test design. In most situations, this will include the development and product management teams. In a large organization, it might also include the teams responsible for specific parts of the application, for example in companies with a large microservice architecture.

Different stakeholders could have different levels of understanding of your data and analysis. Developers might want low-level details so they know what code to change. Business development might want high-level details to understand the product and user impact of performance issues. Providing both kinds of details adds complexity to your testing stack because you need to present the same data in different ways, e.g. application logs and graphs. It's a good idea to design into your stack, now during the planning phase, a way to view all the details you might need.
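For illustration, here is a tiny Python sketch of presenting the same data at two levels of detail: a per-request breakdown for developers and a one-line summary for business stakeholders. The latency samples are hypothetical values made up for the example.

```python
import statistics

# Hypothetical raw measurements: UI latency samples in milliseconds.
samples = [82.1, 79.4, 85.0, 90.2, 78.8, 81.5]

# Low-level view for developers: every individual sample.
for i, value in enumerate(samples):
    print(f"request {i}: {value:.1f} ms")

# High-level view for business stakeholders: a single headline number.
print(f"Average UI latency: {statistics.mean(samples):.1f} ms "
      f"(worst case {max(samples):.1f} ms across {len(samples)} requests)")
```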

Step 2. Identify What To Measure

The stakeholders you have identified from the previous step should help you decide what to measure and what measurements are not necessary or a priority.

Benchmarks To Metrics

Data from performance tests and benchmarks are used to generate metrics (measurements), and you are interested in those metrics because they provide insight into how users experience your product.

You’ll want to start by gathering the benchmarks and metrics to measure against later for a given application or workload. Talk to other teams to find out what metrics they’re measuring.

Here is a common list of metrics you could use to start the conversation about what to measure when you are collaborating with other teams (a sketch of how some of them can be computed follows the list):

  • Number of users per day.
  • Number of concurrent users at any point in time.
  • Latency, one-way and round-trip: measures the time to complete an operation in one direction (send a database query) and in both directions (send a database query and receive the result). This metric is most often tracked for UI components, as large latencies can make your product feel slow to users.
  • Throughput: measures the amount of work done in a fixed amount of time. This tracks transaction rates, such as downloads over a network or data transfers to/from disk.
  • CPU utilisation: measures how busy CPUs are and shows how much headroom is available to handle spikes in load (number of users, transactions per user, etc.). If your CPUs are 100% busy, any increase in load will severely impact the performance of your product.
  • Memory usage: tracks how much memory is currently in use, and gives you some idea how much more load can be handled before memory is exhausted. Running out of memory is bad -- at best, your product will experience delays as memory is reclaimed, at worst it will crash.
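As a concrete illustration, here is a minimal Python sketch showing how latency and throughput metrics can be derived from raw timing data. The `do_request` function, the iteration count, and the simulated 10ms of work are hypothetical placeholders for whatever operation your product actually performs.

```python
import time
import statistics

def do_request():
    """Placeholder for the operation under test, e.g. an HTTP call or a DB query."""
    time.sleep(0.01)  # simulate roughly 10ms of work

def measure(iterations=100):
    latencies = []
    start = time.perf_counter()
    for _ in range(iterations):
        t0 = time.perf_counter()
        do_request()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    return {
        "mean_latency_ms": statistics.mean(latencies) * 1000,
        # approximate 95th percentile latency
        "p95_latency_ms": sorted(latencies)[int(len(latencies) * 0.95)] * 1000,
        "throughput_ops_per_s": iterations / elapsed,
    }

if __name__ == "__main__":
    print(measure())
```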

Google is also a great resource to find out what teams outside of your company are measuring.

Example: Try searching for “key performance metrics for workload”.

Examples of Specific Stakeholder Performance Requests

Here are some stakeholder specific examples which could give you a starting point for a discussion around what kinds of metrics they would like to measure.

UX Designer

If you're working with a team that is building a web services application, the UX designer might want to measure HTTP transaction latency so that you have some idea of how long users have to wait for a response to their input.

DBA

A database administrator (DBA) that is responsible for the database layer of a distributed system will want to know how quickly database transactions are being processed. In that scenario, throughput may be the most important metric.

E-commerce Developer

Load testing might be required for E-commerce web applications, and an E-commerce developer might be interested to know the maximum number of users that can use the application.

Most projects measure a combination of metrics, with different ones being important for different scenarios. Like any test suite, you want to keep in mind when it’s appropriate to test certain metrics and when it would be unnecessary overhead to your testing efforts. Don’t try to measure everything.

Make sure every stakeholder agrees on the metrics and that those metrics satisfy their needs or goals.

Step 3. Test Design

Performance tests can be divided into two categories: performance regression testing and performance target testing.

Regression Testing For Performance

To track whether the performance of an application or product declines, you will want regression tests that monitor specific metrics against predetermined performance goals over a period of time. Performance regression testing requires keeping historical data for your tests so that you can pinpoint exactly when performance became worse. To understand why there was a decline in performance, you first need to know when the decline started.

You might not have an idea of what a good performance score is for a given metric. That's OK. Regression tests only care about monitoring whether performance goes backwards, and not whether something is performing at maximum efficiency.

Performance Target Testing

In contrast, having a performance target means you should only be concerned with two things:

  1. The value of your performance metrics today.
  2. Where your metrics need to be to hit the target.

For this type of testing it's not necessary to maintain historical data, though it's a good idea to keep some record of the progress you're making towards your target to show the stakeholders.

It's important to decide whether the target should be a relative or absolute number.

You will need to make sure that everyone is aiming for the same target. If the number is relative, perhaps the target is a 10 percent improvement over the previous software version. In that case, everybody involved should agree on the baseline value for the software and which version that baseline value was first measured in.

If possible, make your performance target an absolute number:

Network packet latency must be under 20 milliseconds

is much easier to understand than:

Network packet latency in V2 must be 10% faster than in V1.

Using the last 3 steps we can now understand how a stakeholder, such as a UX team member, could be alerted whenever the UI latency, from mouse-click to on-screen response, exceeds the current average of 80ms. Since the latency increasing would mean performance is declining (higher latency is bad), a regression test would be the most suitable type of test to use.
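As a rough illustration of that regression test, here is a small Python sketch. The historical samples, the 5% tolerance, and the function name are assumptions made for the example, not values taken from the article.

```python
import statistics

def check_ui_latency_regression(history_ms, latest_ms, tolerance_pct=5.0):
    """Flag a regression when the latest click-to-render latency exceeds
    the historical average by more than the agreed tolerance."""
    baseline = statistics.mean(history_ms)
    limit = baseline * (1 + tolerance_pct / 100)
    return latest_ms > limit, baseline, limit

# Hypothetical history hovering around the 80ms average mentioned above.
history = [79.0, 81.5, 80.2, 78.9, 80.4]
regressed, baseline, limit = check_ui_latency_regression(history, latest_ms=92.3)
if regressed:
    print(f"Regression: 92.3ms exceeds {limit:.1f}ms (baseline {baseline:.1f}ms)")
```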

Step 4. Measuring Test Success And Failure

Test results can be binary (pass or fail), but in performance testing results are often on a spectrum from acceptable to unacceptable. Some people will care about an 8% performance regression, but not everyone will. This is one of the things that makes performance work so difficult. Without careful consideration, it's not always clear whether performance is better, worse, or the same.

Measuring Performance Targets

In addition to a single value, performance targets often come with an acceptable margin of error, or tolerance, that allows for test results to be slightly different from the target, e.g. network packet latency must be 20ms +/- 100us. A tolerance allows for harmless fluctuations in metric values which can be caused by the complexity of modern software.

A performance test should return success if it is within the range specified by the target and its margin.
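A minimal sketch of such a check, using the 20ms +/- 100us example target from above; the function name and the sample values are hypothetical.

```python
def within_target(measured_ms, target_ms=20.0, tolerance_ms=0.1):
    """Pass when the measurement falls inside target +/- tolerance
    (20ms +/- 100us, i.e. 0.1ms, in this example)."""
    return abs(measured_ms - target_ms) <= tolerance_ms

print(within_target(20.05))   # True: inside the 100us margin
print(within_target(20.30))   # False: 300us over the margin
```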

Beware of test results that suddenly perform orders of magnitude better than they did previously. Improved performance isn’t automatically good and extreme changes in results could indicate that there is a bug in the test or some other part of the software stack.

Measuring Regression Testing

Things can be more complicated for regression testing. Your ability to correctly detect a regression will impact the stakeholders. You want to provide accurate and reliable information while avoiding bringing trivial (or nonexistent) issues to their attention.

During the planning meetings with stakeholders, you should have discussions to decide, for each metric, what would classify as a regression. The accuracy of the tests will also be a factor; we'll discuss that more in the Analyzing Results section.
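One way those agreements could be recorded is as a simple per-metric rule set that your tests consult. The metric names and thresholds below are hypothetical placeholders for whatever you and your stakeholders agree on.

```python
# Hypothetical per-metric regression rules agreed with stakeholders.
# Each entry records the direction in which the metric gets "worse"
# and how large a change counts as a regression.
REGRESSION_RULES = {
    "ui_latency_ms":        {"worse_when": "higher", "threshold_pct": 5},
    "throughput_ops_per_s": {"worse_when": "lower",  "threshold_pct": 10},
    "memory_usage_mb":      {"worse_when": "higher", "threshold_pct": 15},
}

def is_regression(metric, baseline, current):
    rule = REGRESSION_RULES[metric]
    change_pct = (current - baseline) / baseline * 100
    if rule["worse_when"] == "higher":
        return change_pct > rule["threshold_pct"]
    return -change_pct > rule["threshold_pct"]

# A 15% drop in throughput exceeds the 10% threshold, so it is flagged.
print(is_regression("throughput_ops_per_s", baseline=1000, current=850))  # True
```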

Occasionally, certain metrics can improve at the expense of others. For example, code changes that increase the maximum number of concurrent connections to a web server may also increase page load times, potentially causing your page load time regression tests to fail. In those situations, it's important to take a holistic approach when deciding whether failing tests are genuinely failing, or whether their thresholds should be adjusted to reflect a trade-off against the metrics that improved. Be sure to clearly communicate this holistic view when sharing your results with your stakeholders, and get their buy-in before establishing a new threshold based on the new information.

Step 5. Sharing Results

If you are working on a solo project, or only need to share a simple summary of performance, e.g. network latency increased by 5%, you have lots of freedom in how you display your test results. It only requires sorting out a few nitty-gritty details beyond having a shared understanding of testing needs. As long as everyone, including the stakeholders and the members of your team, understands how to read and interpret the performance data, anything goes.

If your results are going to be shared outside your team, you need to make them easily understandable at first glance. Performance issues are often looked at when everyone is in a state of panic, either because a customer has complained or senior management have taken notice. Assume the people staring at your results are frantic.

Results can be textual, numeric or visual. Using some combination is the best approach because each one gives you a different level of detail.

Graphs are excellent for seeing trends and patterns in data, and there is a huge range of libraries for generating graphs. Make things intuitive by labelling graph axes and giving graphs descriptive titles and legends. Here’s an example of a good graph from GitHub’s Engineering Blog:

Even without reading the content of the blog post, it’s possible to understand what test was measured, what data was collected, and the units on each axis.
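If you generate your own graphs, here is a minimal sketch of that kind of labelling using matplotlib (the library choice and the daily latency values are assumptions for the example, not from the GitHub post):

```python
import matplotlib.pyplot as plt

# Hypothetical daily p95 latency results for two releases.
days = list(range(1, 8))
v1_latency = [21.0, 20.5, 20.8, 21.2, 20.9, 21.1, 20.7]
v2_latency = [18.9, 19.2, 18.7, 19.0, 19.3, 18.8, 19.1]

plt.plot(days, v1_latency, marker="o", label="V1")
plt.plot(days, v2_latency, marker="o", label="V2")
plt.title("p95 network packet latency per day")  # descriptive title
plt.xlabel("Day of test run")                    # labelled x axis
plt.ylabel("Latency (ms)")                       # labelled y axis with units
plt.legend()
plt.savefig("latency_trend.png")
```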

Numeric results allow you to quickly see the magnitude of differences between two scores. Be sure to always provide the unit of measurement for numerical data - a 7% reduction in latency may seem like a lot, but probably not if it works out at 700 nanoseconds.

If you provide other teams with results it's a good idea to keep old results around, at least for a while, to avoid broken links appearing in bug reports, emails, and pull requests. Labelling results with a date or timestamp greatly simplifies managing multiple result sets and makes it easy to store old and new results in the same place.
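As a small illustration of that labelling, here is a sketch that stamps a result file with a UTC timestamp so old and new runs can sit side by side; the directory layout and metric names are assumptions for the example.

```python
from datetime import datetime, timezone
from pathlib import Path
import json

# Hypothetical result set for one run of the test suite.
results = {"p95_latency_ms": 19.2, "throughput_ops_per_s": 1043}

# Label the file with a UTC timestamp so it never overwrites an older run.
stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
out = Path("results") / f"latency-suite-{stamp}.json"
out.parent.mkdir(exist_ok=True)
out.write_text(json.dumps(results, indent=2))
```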

What’s Up Next

You have now completed the planning stage. Having worked through each of the 5 steps, you, your stakeholders, and your team have now:

  • Decided what performance metrics to measure, and why.
  • Decided whether you’re aiming for a performance target or testing for regressions.
  • Agreed on how to share the results.

With all of that planning out of the way, you're almost ready to start selecting benchmarks and tests. Let's take a quick detour and discuss the fundamental skill required for all performance engineering: statistics. In the next section we’ll cover:

  • How to correctly combine multiple test results into a single number.
  • The different ways to calculate the typical latency value.
  • How to tell the difference between a genuine change in performance and changes due to random chance.

Matt Fleming

Matt is a Senior Performance Engineer at SUSE where he’s responsible for improving the performance of the Linux kernel. You can find him on Twitter, and writing about Linux performance on his personal blog.


