Mysteries and Mathematics of the Test Pyramid

By Brendan Connolly

A Tale of Triangles

It may be referred to as the Test Automation Pyramid, but it looks an awful lot like a triangle in most depictions. If you use the dimensions of The Great Pyramid of Giza and the mathematical equations discussed in this article, you’ll end up with a greater understanding of the role and dependencies of each layer of your testing pyramid, and the importance of building strong foundations.

Image By Mike Cohn source: Mountain Goat Software

By treating the test automation pyramid as a triangle we can use elements of geometry and trigonometry to find the size of each level. To figure this out, we'll start out by breaking the pyramid down into 3 separate triangles. We will determine the area of each triangle, then use a slicing technique to determine each level's size.

The first step is to find the total area of a triangle using these dimensions from The Great Pyramid of Giza:

Image source: The Great Pyramid of Giza

Base Length | Height | Base Angle
230 meters | 147 meters | 51.5 degrees


Using those dimensions we can find the total area for the triangle that makes up one side of the pyramid.

Area = ½ (230 * 147) = 16905 square meters

Working from the top (UI level) down we can find out how large each level is and what percent of the whole pyramid it captures.

UI Level

Dividing the height of the Great Pyramid evenly into 3 sections means our top section, the UI layer, is 49 meters tall. Now we can use some trigonometry and the Pythagorean Theorem to find the area of this triangle. The details of the mathematics involved are documented in a separate blog post.

Doing the math we find that the area of the UI layer of the pyramid is 1909.4 or about 11% of the total pyramid.

Service Level

Using the same process to find the area of the middle layer, we find that the service layer of the pyramid is 5726.76 or about 33% of the total pyramid.

Unit Level

To find the area of the unit layer, we subtract the areas of the service and UI layers from the total area of our test pyramid:

16905 - 5726.76 - 1909.4 = 9268.84

The unit layer of the pyramid is about 56% of the total pyramid.

Level | UI | Service | Unit
Area | 1909.4 | 5726.76 | 9268.84
% of total | 11.11% | 33.21% | 55.68%
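As a cross-check on the slicing idea, here is a minimal Python sketch. It is not the trigonometry route described in this article: it assumes the layers are equal in height and relies on similar triangles, so the triangle formed by the top k layers has (k / layers) squared of the total area. The function name `layer_areas` is made up for illustration. The areas it produces differ a little from the figures above, presumably because those were worked out from the rounded base angle, but the proportions land in nearly the same place, at one ninth, three ninths and five ninths of the whole.

```python
def layer_areas(base, height, layers):
    """Return the area of each equal-height layer, listed top to bottom."""
    total = 0.5 * base * height
    areas = []
    previous = 0.0
    for k in range(1, layers + 1):
        # The triangle made of the top k layers is similar to the whole
        # triangle, scaled by k / layers, so its area scales by the square.
        cumulative = total * (k / layers) ** 2
        areas.append(cumulative - previous)
        previous = cumulative
    return areas


# Great Pyramid dimensions from the article, split into 3 layers.
areas = layer_areas(base=230, height=147, layers=3)
total = sum(areas)
for name, area in zip(["UI", "Service", "Unit"], areas):
    print(f"{name}: {area:.2f} square meters ({area / total:.1%})")
```

Changing `layers` to 4 or 5 gives the shares for taller stacks of levels without redoing the trigonometry.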

Considering More Layers

Some test pyramids have more than 3 layers. To get a sense of how more layers affect the share of area that UI testing should represent in your testing strategy, the results of the same mathematical process for 4 and 5 equal layers are below:

4 levels

Triangle | UI | Component | Integration | Unit
Great Pyramid | 6.3% | 18.7% | 31.3% | 43.7%
Equilateral | 6.3% | 18.8% | 31.3% | 43.7%
45-45-90 | 6.18% | 18.79% | 31.15% | 43.88%

Since there have consistently been only minor variations between the different triangle types, it seems pretty safe to draw conclusions based on the rounded results of the triangle based on the Great Pyramid.
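One way to see why the triangle type barely matters is a short similar-triangles argument. This is a sketch rather than the process used for the tables: it assumes n layers of equal height, writes A for the total area, and uses the made-up symbols A_{\le k} for the area of the top k layers and A_k for the area of the k-th layer counted from the top.

```latex
\begin{align*}
A_{\le k} &= \left(\frac{k}{n}\right)^{2} A \\
A_k &= A_{\le k} - A_{\le k-1}
     = \frac{k^{2} - (k-1)^{2}}{n^{2}}\,A
     = \frac{2k-1}{n^{2}}\,A
\end{align*}
```

For n = 4 this gives 1/16, 3/16, 5/16 and 7/16, or 6.25%, 18.75%, 31.25% and 43.75%. Neither the base nor the height appears in the result, so under these assumptions any differences between triangle types would come down to rounding.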

5 levels

Triangle | UI | API | Component | Integration | Unit
Great Pyramid | 4% | 12% | 19.8% | 28.2% | 36%

Putting Pyramid Percentages into Practice

People understand the top and bottom of the pyramid. We can agree what a unit test is, and we can mostly agree what a UI or end-to-end test is.

The middle section(s) is where there is decidedly more variation in both the types of tests and the underlying definitions of the tests being referenced. If we consolidate the numbers for the 3, 4, and 5-layer test automation pyramid into three ranges for unit, UI and the stuff in between we can start to see an informative metric.

Unit | Gooey Center | UI
36-55% | 33-60% | 4-11%

Unit testing has many benefits; it's widely known and accepted as the foundation of a test automation effort. The numbers support this, showing 36-55% of test automation should be at the unit level.

But what the numbers also highlight is that the “gooey center,” the service level that so many folks are uncertain about, should be 33-60% of test automation. That amount is roughly equal to, or potentially even larger than, the unit test level.

That leaves 4-11% of test automation for the UI level. If the UI level occupies 4-11% of test automation, and the numbers tell us that the unit and service level tests are generally of about equal size, then a reasonable distribution of test automation based on the testing pyramid would be roughly:

Unit | Gooey Center | UI
44.5-48% | 44.5-48% | 4-11%
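The 44.5-48% range falls out of simple arithmetic: take the UI share away from 100% and split what is left evenly between the unit level and the gooey center. A minimal Python sketch of that step follows. The function name `even_split` is made up for illustration, and the even split is the simplifying assumption stated above rather than a rule.

```python
def even_split(ui_percent):
    """Split what remains after the UI share evenly between the two lower tiers."""
    return (100 - ui_percent) / 2


# UI range taken from the consolidated numbers above.
low_ui, high_ui = 4, 11

# A larger UI share leaves less for the other two tiers, and vice versa.
print(f"Unit and gooey center: {even_split(high_ui)}% to {even_split(low_ui)}% each")
```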

When it comes to putting this into practice, what do these percentages actually represent? For the triangles we've been using, unit lengths are measured in meters, and the area is square meters. What would useful units be for test automation?

Quantity

I bet that “quantity of tests” is what you were thinking. It's probably what most people think of when looking at the test automation pyramid. Even without going through all the mathematics, you visually get the idea that as you move up the levels in the pyramid you should have fewer and fewer tests.

What this technically means is that for every 100 tests added you should have about 45-48 unit tests, 45-48 service tests, and 4-11 UI / end-to-end tests. Think about that for a minute. How does that fit with your mental model or the current practices on your team?

It's very common to see over-testing at the UI level. In fact, it's probably one of the primary reasons people cite the test automation pyramid. We know UI tests are expensive and are often brittle or flaky. What the numbers also highlight is that there is a good chance over-testing is occurring at the unit level, while under-testing is occurring at the service level.

It's even easier to over-test at the unit level. Unit tests are cheap, fast and reliable. Many teams are chasing a code coverage metric. When this happens, you may not feel the problem of over-testing at the unit level until it has grown much bigger. Builds that used to take seconds or minutes have crept up to 30 minutes, an hour, or even more. It can also manifest itself as developer frustration when small changes or refactoring result in excessive time spent updating failing tests.

The quality of your unit tests is as important as that of your UI level tests. Like Goldilocks and the Three Bears, we want our tests not too big, and not too small, but just right. Putting more focus on the tests in the middle level of the pyramid can help accomplish this.

Time

In addition to quantity, a fairly constant topic of conversation is how much time should be dedicated to automation.

Rather than quantity of tests, what if, instead, the test automation pyramid was a heuristic about the amount of time that teams should spend writing and maintaining automation at each level?

For a given 40-hour work week, this pencils out to about 18 hours writing and maintaining unit level tests, about 18 hours writing and maintaining service level tests, and, being generous given the numbers, about 4 hours writing and maintaining UI level tests.
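To make the weekly numbers concrete, here is a small Python sketch that turns percentage shares into hours. The 45/45/10 split is an assumed reading of the figures above, with the UI share rounded up generously, and the names `weekly_hours` and `shares` are made up for illustration.

```python
def weekly_hours(total_hours, shares):
    """Convert percentage shares per level into hours for one work week."""
    return {level: total_hours * percent / 100 for level, percent in shares.items()}


# Assumed shares: an even unit/service split with a generous 10% for UI.
shares = {"unit": 45, "service": 45, "ui": 10}

for level, hours in weekly_hours(40, shares).items():
    print(f"{level}: {hours:g} hours")
```

Swapping in a team's own shares or a different total makes it easy to compare the heuristic against where the time actually goes.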

It might sound a bit off at first. However, good development practices are going to involve pretty consistent attention and time dedicated to adding and maintaining tests as code is added, fixed, and refactored. A developer spending a little less than half their time attending to unit tests and some service layer tests sounds pretty reasonable.

Assuming that QA or a tester on the team is responsible for some portion of service layer tests and the UI tests, it works out to roughly a quarter to a third of their time being dedicated to automation. This may be low, especially if the team is new to automation or the project is just starting and some framework or infrastructure needs to be built out. Once that build-out has taken place, that amount of time feels pretty reasonable.

By placing more focus on the tests at the higher end of the service layer spectrum, and introducing only a select few end-to-end UI automated tests, you are left with a more robust and reliable set of tests. This focus reduces the footprint of time required to support and maintain higher-level tests, because more tests than this become a burden that cannot be sustained. The time constraint helps guide us toward only allowing tests that characterize business-critical functionality to be added at the UI level.

Automation is written to boost productivity. Having to spend more time than what these numbers show may be a sign that there is some deeper issue lurking.

Is there some technical debt the team has accrued where the application or code base has become challenging to test? Is there a deficiency in test infrastructure that is making tests unreliable? Are there too many tests at some levels and/or too few tests at others?

It could be a variety of issues, but if it feels like an excessive amount of time spent on automation is required, it's an indication that maybe the team needs to take a step back. Take the opportunity to come together as a team and question why so much time is necessary, and then acknowledge and hopefully create a plan to address any issues.

Effort

Rather than focus on a specific metric like how many tests or how much time to spend on test automation, let’s instead take a page out of sprint planning and look at these numbers the way story points are used in estimation on agile projects.

A story point is a unitless measure that is used to understand the relative size of something. You can get a sense of this when looking at the test pyramid. Assign the top level as an arbitrary size, then estimate each of the other levels’ sizes in relation to that. 
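As a rough illustration of relative sizing, the sketch below assumes equal-height layers, in which case each level's area relative to the top one follows the odd numbers 1, 3, 5 and so on. The function name `relative_sizes` and the `top_size` parameter are made up for illustration, and a real team would estimate these sizes rather than compute them.

```python
def relative_sizes(layers, top_size=1):
    """Size of each level relative to the top one, listed top to bottom."""
    # With equal-height layers, the k-th layer from the top has (2k - 1)
    # times the area of the top layer.
    return [top_size * (2 * k - 1) for k in range(1, layers + 1)]


print(relative_sizes(3))              # [1, 3, 5]
print(relative_sizes(3, top_size=2))  # [2, 6, 10]
```

Because the measure is unitless, only the ratios matter: doubling the arbitrary top size doubles every other level and leaves the comparison unchanged.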

By using these numbers from the test pyramid as a heuristic for the expected effort a team should put towards automation, we are aligning with how feature work is estimated. This isn't an invitation to start estimating automation work separately from feature work. When estimating that effort, you want to include any automation effort in your estimate. Instead, what we have is a means to compare how much effort should be spent collectively on automation that is in direct alignment with the standard way that teams are already estimating work.

It's important to remember to view each level of testing as part of a larger whole, where each layer acts as a foundation for the levels above it. Instability and deficiencies at lower levels will undermine the integrity of the entirety of your test strategy. The math we've done gives a starting place for relative comparison so you can start having deeper conversations about test strategy across roles on teams.

Author Bio

Brendan is a Test Automation University instructor, recipient of the EuroStar Rising Star Award, and frequent conference speaker. He is focused on creating and executing testing strategies that enable quality outcomes while using his coding powers to develop tooling that makes testing and testers' lives easier. You can find him on Twitter.