Software testers unite: Let's build a great tool and community for combinatorial testing!

Adopt combinatorial testing, a technique you may never have heard of but should consider using daily.

The test technique you didn't know you needed

Imagine that you need to test a browser-based application that will be supported in a single browser on a single operating system. It contains only three input fields, all of which are dropdown lists. No mischief via unexpected text input is possible. 

Piece of cake, right? Hold my iced chai. Each of the three dropdown lists contains 30 choices. And another thing: the application's purpose is to facilitate healthcare for people with serious health problems. And, the kicker: you have exactly one week to test the app, submit defects, and verify bug fixes before the application has to be deployed to production. 

To get things going quickly, you ask ChatGPT to give you all possible combinations of values across the three parameters. ChatGPT provides exactly what you ask for, and the resulting dataset is 27,000 rows long (30 × 30 × 30). 

In despair, you run to the nearest donut shop and stay there for a while. And two days later, although you've since left the donut shop, you are nowhere nearer a reasonable number of test cases than you were when you started.

What is combinatorial testing, and why do we need a good way to do it?

There is a solution to this problem, and it is called combinatorial testing. Practical combinatorial testing uses a test set of "just enough" combinations of values across several parameters. A good-enough combinatorial test set, known as a covering array, is likely to uncover defects at a rate that even the most carefully curated functional testing cannot match. 

To be more precise, a covering array is a sample, or subset, of all possible combinations of values across your variables. The sample contains every possible pair (two-way, or pairwise) of values for any two variables, or every triplet (three-way) for any three variables, and so on, up to a power of six or seven. The research folks call this "power" the variable t (you will also see it called the strength of the array). So if you look at the research, you will read statements like

 "For some value of t, testing all t-way interactions among n parameters will detect nearly all errors."

And that's the "money" quote here. A covering array generated with what is known as the IPOG (In-Parameter-Order-General) algorithm has immense bug-finding power AND is usually much smaller than the set of test cases represented by all possible combinations. The IPOG algorithm was developed by researchers at the University of Texas at Arlington, the United States National Institute of Standards and Technology (the US NIST), and George Mason University in Virginia. We should give these folks a lot of credit for introducing a reliable and efficient way to generate reasonable numbers of test cases from what seem like vast data sets. 
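To make the idea a little more concrete, here is a heavily simplified Python sketch of the IPOG strategy as it is usually described: cover every combination of the first t parameters exhaustively, then add one parameter at a time. For each new parameter, first extend every existing row with the value that covers the most still-missing combinations ("horizontal growth"), then add or fill in rows for whatever is left ("vertical growth"). This is an illustration written for this article, not the NIST implementation inside ACTS. The function name `ipog` and its interface are made up here; it assumes integer values 0 to v-1 and a t no larger than the number of parameters, and it skips the optimizations that keep real tools' arrays small.

```python
from itertools import combinations, product


def ipog(domains, t):
    """Simplified IPOG-style generator (illustrative sketch, not ACTS).
    domains[i] is the number of values of parameter i; values are 0..domains[i]-1."""
    n = len(domains)
    # Start: every combination of the first t parameters.
    rows = [list(r) for r in product(*(range(v) for v in domains[:t]))]
    for i in range(t, n):
        # All t-way combinations pairing parameter i with t-1 earlier parameters.
        uncovered = {
            (cols + (i,), vals + (v,))
            for cols in combinations(range(i), t - 1)
            for vals in product(*(range(domains[c]) for c in cols))
            for v in range(domains[i])
        }
        # Horizontal growth: give each existing row the value of parameter i
        # that covers the most still-uncovered combinations.
        for row in rows:
            best_v, best_hits = 0, set()
            for v in range(domains[i]):
                hits = {
                    (cols + (i,), tuple(row[c] for c in cols) + (v,))
                    for cols in combinations(range(i), t - 1)
                } & uncovered
                if len(hits) > len(best_hits):
                    best_v, best_hits = v, hits
            row.append(best_v)
            uncovered -= best_hits
        # Vertical growth: put each leftover combination into a row whose
        # relevant cells are still free (None) or already match; else add a row.
        for cols, vals in sorted(uncovered):
            for row in rows:
                if all(row[c] is None or row[c] == x for c, x in zip(cols, vals)):
                    break
            else:
                row = [None] * (i + 1)
                rows.append(row)
            for c, x in zip(cols, vals):
                row[c] = x
    # Cells nobody needed are "don't care" (like the hyphens below); use 0.
    return [[0 if x is None else x for x in row] for row in rows]


rows = ipog([2, 2, 2, 2, 2], t=3)  # five Boolean variables, three-way
print(len(rows))
```

For five Boolean variables and t = 3, this returns a valid three-way array with fewer than the 32 rows of the exhaustive set, though not necessarily as few as the best published arrays.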

What does a covering array look like?

To understand what a covering array does, it is best to look at examples. 

Let's say that you have five Boolean variables named A, B, C, D, and E. Each variable has two possible values: 0 and 1. 

I asked ChatGPT to give me all possible combinations of values for five Boolean variables. Here's the result, which is 32 rows in size.

A B C D E
0 0 0 0 0
0 0 0 0 1
0 0 0 1 0
0 0 0 1 1
0 0 1 0 0
0 0 1 0 1
0 0 1 1 0
0 0 1 1 1
0 1 0 0 0
0 1 0 0 1
0 1 0 1 0
0 1 0 1 1
0 1 1 0 0
0 1 1 0 1
0 1 1 1 0
0 1 1 1 1
1 0 0 0 0
1 0 0 0 1
1 0 0 1 0
1 0 0 1 1
1 0 1 0 0
1 0 1 0 1
1 0 1 1 0
1 0 1 1 1
1 1 0 0 0
1 1 0 0 1
1 1 0 1 0
1 1 0 1 1
1 1 1 0 0
1 1 1 0 1
1 1 1 1 0
1 1 1 1 1

You may be saying to yourself, "32 rows? That isn't all that big a set of test cases." I assure you that the number of rows multiplies like bunnies if you add even one more possible value to just one or two of the variables. 
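You don't strictly need a chatbot for the exhaustive set. As a small illustration (not part of the workflow described above), Python's standard library produces the same 32 rows in the same order, and plain multiplication shows how quickly the count grows as variables gain values.

```python
from itertools import product
from math import prod

# Every combination of five Boolean variables A..E, in the same order as the table.
all_rows = list(product([0, 1], repeat=5))
print(len(all_rows))   # 32
print(all_rows[:3])    # [(0, 0, 0, 0, 0), (0, 0, 0, 0, 1), (0, 0, 0, 1, 0)]


def exhaustive_size(value_counts):
    """Rows in the 'all possible combinations' set: the product of the
    number of values each variable can take."""
    return prod(value_counts)


print(exhaustive_size([2, 2, 2, 2, 2]))  # 32
print(exhaustive_size([3, 2, 2, 2, 2]))  # 48: one variable gets a third value
print(exhaustive_size([3, 3, 2, 2, 2]))  # 72: two variables get a third value
print(exhaustive_size([30, 30, 30]))     # 27000: the three-dropdown app
```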

What does a pairwise (two-way) covering array for our five variables look like? Nice and compact: only six rows. For every two columns, every possible pair of values appears in at least one row. The hyphens in column A of the last two rows mean that you could use either a 0 or a 1 there, because every pair of values involving column A is already covered by the first four rows. 

A B C D E
0 0 1 0 0
0 1 0 1 1
1 0 0 1 0
1 1 1 0 1
- 1 0 0 0
- 0 1 1 1

And what does a three-way covering array look like for that same set of variables? Only 11 rows, far fewer than the 32 rows for "all possible combinations." Each possible combination of values for each group of THREE columns is represented. 

A B C D E
0 0 0 1 0
0 0 1 0 1
0 1 0 0 0
0 1 1 1 0
1 0 0 0 1
1 0 1 0 0
1 1 0 1 1
1 1 1 0 1
1 0 1 1 1
1 1 0 1 0
0 1 0 1 1
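Claims like "every pair appears" are easy to check mechanically, and a check like this doubles as an oracle for any covering-array generator, whichever tool produced the array. Here is a minimal Python sketch; it assumes every column uses the same set of values (0 and 1 here) and sets the two hyphens in the pairwise table to 0.

```python
from itertools import combinations, product


def is_covering_array(rows, t, values=(0, 1)):
    """True if, for every choice of t columns, every combination of t values
    appears in at least one row."""
    needed = set(product(values, repeat=t))
    for cols in combinations(range(len(rows[0])), t):
        seen = {tuple(row[c] for c in cols) for row in rows}
        if not needed <= seen:
            return False
    return True


# Pairwise array from the article, with the two hyphens in column A set to 0.
pairwise = [
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
]

# Three-way array from the article.
three_way = [
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 0, 1],
    [1, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [0, 1, 0, 1, 1],
]

print(is_covering_array(pairwise, t=2))   # True
print(is_covering_array(three_way, t=3))  # True
print(is_covering_array(pairwise, t=3))   # False: 6 rows can't hold all 8 triplets
```

Setting the hyphens to 1 instead also passes the pairwise check, which is exactly what the hyphens are telling you.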

You can see examples of covering arrays for dozens of similar scenarios at an archive maintained by the US NIST. I explain how to use those sets in the For more information section below. 

I'm already doing pairwise testing. Isn't that good enough?

Answer: Maybe! But read on.

To reiterate, pairwise testing, which you may have heard of, tests each unique PAIR of values across a set of parameters or configuration options. So it is a subset of the world of combinatorial testing, which can examine each unique triplet, quadruplet, and so on of the possible values. 
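One way to see why higher powers cost more is to count the value combinations each level of testing must cover. Assuming every parameter has the same number of values v (a simplification to keep the arithmetic easy), there are C(n, t) × v^t combinations for n parameters. This is only an illustration: a covering array packs several combinations into each row (one row of five Boolean values covers 10 pairs at once), which is how the six-row pairwise array above covers all 40 pairs.

```python
from math import comb


def t_way_targets(n_params, n_values, t):
    """Distinct t-way value combinations a t-way covering array must contain:
    choose t of the n parameters, times every combination of their values."""
    return comb(n_params, t) * n_values ** t


print(t_way_targets(5, 2, 2))   # 40: pairs for five Boolean variables
print(t_way_targets(5, 2, 3))   # 80: triplets for five Boolean variables
print(t_way_targets(3, 30, 2))  # 2700: pairs for the three 30-choice dropdowns
print(t_way_targets(3, 30, 3))  # 27000: triplets for the dropdowns
```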

Pairwise testing is an excellent option with significant improvement in bug-finding power over traditional functional testing. Research shows again and again that most bugs result from the combination of a particular pair of values across two parameters. What's more, you can use one of a few good tools to generate covering arrays of pairwise values, some of which are free of charge, like Pairwise Pict.

However, pairwise testing does not yield the bug-finding bounty that greater powers of combinatorial testing can produce. Study after study shows that a shockingly high proportion of system failures arise from the interaction of two parameter values, and failures due to three-way interactions can add significantly to that list. For example, tests run on the Mozilla browser showed that 70 percent of system failures were due to the interaction of two different parameters. That number jumped another 20 percentage points, to 90 percent, when three-way interactions were considered. (See page 54 of this research paper.) 

Combinatorial testing is not just a good idea. Sometimes it's an ethical obligation.

As indicated above, three-way combinatorial testing can identify a significant number of bugs that pairwise testing cannot, and it is far more effective than functional testing at uncovering defects. If you are testing an application where safety or security is paramount, you need those extra percentage points of test effectiveness over pairwise testing. A missed defect could literally be fatal. Combinatorial testing is the test technique most likely to give your team that essential edge.

Here's an infamous example of a case where pairwise testing would not have been enough. The Therac-25 radiation therapy device caused devastating injuries to several patients. The key trigger of the deadly defect? A particular combination of keystrokes entered by the person operating the device. (You can find a link to the history of the Therac-25 in the For more information section below. I have content-warned it because of the descriptions of severe injuries.) 

Another example, in the security area, is cryptographic hash algorithms and their implementations. A few years ago, US NIST researchers developed and ran combinatorial test sets on the candidates for the new SHA-3 cryptographic hash standard. In doing so, they uncovered bugs in several years-old cryptographic hash implementations that no other testing had ever found. What's more, the development of effective algorithms to generate good covering arrays has made the generation of good test sets relatively straightforward. 

Yay! Sign me up! Where's the tool that will generate good combinatorial test sets for me?

Sadly, it's not that easy. When I first started drafting this article, I thought such a tool existed and was readily available. Turns out the truth is more complicated. 

Before I don my verbal Gromit face, let the record reflect that the US NIST researchers and developers and their colleagues in academia have done an absolutely smashing job, over decades, of priming the pump for combinatorial testing to become a standard practice. I strongly recommend you read a few of the many case studies published on the US NIST site.

As they say in US criminal courts, the case for making combinatorial testing a regular part of software testing practice has been established "beyond a reasonable doubt." 

  • This is especially true for applications and devices that require high standards of safety and security. 
  • The technique lends itself very well to testing combinations of values that can be represented by long strings or bitstrings. 

The best things in life are free… sometimes

The US NIST researchers developed a tool with a graphical user interface (GUI) to generate covering arrays. It's a standalone Java JAR file that can run on any operating system that supports Java GUIs. It's called the ACTS tool, and it is billed as capable of generating reliable covering arrays for up to six-way combinations of values across parameters. 

As of July 22, 2025, the ACTS tool, with its user guide, is available online. The US NIST offers several other combinatorial testing tools free of charge on this page.

Wallace would be waving his hands with joy right now! What about Gromit? Well, a few funny things happened as I worked on this article.

Cartoon characters Wallace and Gromit. Wallace, clad in white shirt, red tie, and green vest, is smiling and waving at the camera. He has his left arm around the shoulders of his dog Gromit, who is looking up at him with skepticism and annoyance.

US politics is playing magic tricks on us

A few weeks ago, the current US regime apparently mothballed the US NIST agency. Poof! Around July 21 or so, I literally wrote in a draft of this very article that the ACTS tool was no longer available online.

And, just like that, as of July 22 or so, the site was back up and running. Poof again! Well, guess what, the ACTS tool is downloadable again. Or at least it is downloadable today, July 31. 

Does this on-again, off-again scenario ring any tester bells? Even if the US NIST research information on combinatorial testing is trustworthy, as it appears to be, would YOU trust the US NIST website as a source of information you could consistently rely on? 

Inconsistent test set generation results from the ACTS tool

I must admit that I didn't see this one coming. 

With an initial set of data, I had promising results. What I mean by this is that when I instructed the ACTS tool to produce a three-way covering array, it generated far fewer test cases than ChatGPT did when I prompted it to generate all possible combinations for the same dataset. 

But then, when I tried the same comparison with a different data set… ACTS gave me the same number of test cases as ChatGPT. Spectacular fail.

Using a heuristic comparison of test results as your test oracle is known as metamorphic testing. My heuristic was "the covering-array test set should be much smaller than the all-possible-combinations test set."
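The "much smaller" heuristic does have a hard limit worth checking before blaming a generator. A t-way covering array must show every combination of values for its t parameters with the most values, and each row shows only one such combination, so the array can never have fewer rows than the product of those t value counts. When a dataset has exactly t parameters, or a few parameters with many values dominate, even a perfect array is as large, or nearly as large, as the exhaustive set. The sketch below applies that bound to the three-dropdown scenario from the start of this article; it is not a diagnosis of the second dataset described above, which the article doesn't spell out.

```python
from math import prod


def covering_array_lower_bound(value_counts, t):
    """Minimum rows for any t-way covering array: the t parameters with the
    most values must show every combination of their values in some row."""
    return prod(sorted(value_counts, reverse=True)[:t])


def exhaustive_size(value_counts):
    """Rows in the 'all possible combinations' set."""
    return prod(value_counts)


# Opening scenario: three dropdowns with 30 choices each.
print(covering_array_lower_bound([30, 30, 30], t=2), exhaustive_size([30, 30, 30]))  # 900 27000
print(covering_array_lower_bound([30, 30, 30], t=3), exhaustive_size([30, 30, 30]))  # 27000 27000
```

In other words, for three parameters a three-way array and the exhaustive set are the same thing, while pairwise still shrinks the set dramatically.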

So I was stuck. When I get stuck these days, I like to reach out to other folks if they're available.

Lack of a savvy community 

So who in the software testing world is savvy re: the ACTS tool and combinatorial testing? Do I hear crickets? I had no community to turn to for answers on what apparently went wrong with the ACTS tool. I'm not a math major, so on my own, I can't tackle the question of what the algorithm SHOULD have produced. 

Again, what a difference a day makes: the single tester who has apparently given talks on the ACTS tool just accepted my LinkedIn invite yesterday. But that's just one tester. Even if he's the Albert Einstein of combinatorial software testing, one person is not a community. A back-and-forth interaction with a single person is not sustainable for building a community of practice around combinatorial testing. 

Closed-source code and no way to submit issue reports 

The ACTS tool relies on closed-source code. Also, I just looked at the website again (July 2025), and I see no option to submit bug reports. 

Verdict: Right now, there aren't any reliable tools for combinatorial test set generation. 

I can't personally vouch for the ACTS tool's ability to generate reliable covering arrays. I say this based on what I saw in my own testing of the tool. 

What about ChatGPT? Great idea, but ChatGPT itself told me it can handle only pairwise data set generation. 

You can find a couple of individual Python projects that purport to generate covering arrays of three-way and higher combinations of parameter values. They are poorly documented and scarcely used. I can't recommend them to anyone except perhaps a skilled developer. I've included links to these projects in the For more information section below. 

And there's the JMP statistical discovery tool, which also purports to offer covering array generation. If you can get past the $1,000-plus price tag and the statistics-speak on the website, have at it. 

Why haven't the big testing toolmakers jumped on the combinatorial testing train?

Good question. I have a few hypotheses.

Nerd alert: academic-speak incoming

The language of the US NIST research literature appears to be aimed at people in academia or fellow researchers, preferably with maths backgrounds. Elevator speeches in PowerPoint format? Fun five-minute TikToks? Dummies books? Sadly, these simply do NOT yet exist for this type of testing. 

However, MoT DOES offer a course series on practical pairwise testing, authored by Venkat Ramakrishnan! Check out the link in the For more information section below. As you can see, the beginnings of a great community are underway. 

Generating "good" covering arrays is easier for some types of test scenarios than for others

Research shows that the covering-array technique lends itself beautifully to parameters whose values can be represented by strings and bitstrings. 

Regular UI testing, however, is not quite as easy to represent in that format as, say, the results of an implementation of a cryptographic hash algorithm.

Many of the US NIST research papers offer solutions to this conundrum. Refer to the links in the For more information section below. 

Lack of an obvious "oracle" for test results in many cases

If you have 1,000 tests to run, will you have specific expected results for each one? And how do you figure out which parameter combination is causing the failure? This is a stumbling block, but it's not an impassable one. Again, the US NIST research papers offer guidance, including using a comparison of test results as an oracle.
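For the "which combination caused the failure?" question, a common first step is to list the combinations that show up in failing tests but never in passing ones. The sketch below is my own simplified set-difference version of that idea, not a specific NIST method. It reuses the article's pairwise array (hyphens set to 0) and a made-up failure rule, B = 1 together with D = 1, to stand in for real test results.

```python
from itertools import combinations


def t_way_combos(row, t):
    """Every (columns, values) t-way combination that one test row covers."""
    return {(cols, tuple(row[c] for c in cols)) for cols in combinations(range(len(row)), t)}


def suspicious_combinations(rows, passed, t=2):
    """Combinations present in at least one failing test and in no passing test."""
    in_passing, in_failing = set(), set()
    for row, ok in zip(rows, passed):
        (in_passing if ok else in_failing).update(t_way_combos(row, t))
    return in_failing - in_passing


pairwise = [
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
]
# Hypothetical results: pretend the system fails whenever B = 1 and D = 1.
passed = [not (row[1] == 1 and row[3] == 1) for row in pairwise]

print(sorted(suspicious_combinations(pairwise, passed)))
# [((1, 3), (1, 1)), ((2, 4), (0, 1))]
```

Columns are numbered from 0, so `(1, 3)` means B and D. The real culprit shows up, but so does an innocent pair (C = 0, E = 1) that happened to appear only in the failing row; candidates like these still need follow-up tests to confirm.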

So what should we testers do to make combinatorial testing a standard practice?

It will take an energetic and devoted community. 

We need people who can

  • Reckon with the algorithms for generating covering arrays
  • Write user stories that keep in mind the average tester, NOT computer scientists or math majors
  • Develop solid user interfaces that testers can understand and put to use quickly
  • Design a tool that runs on at least the most common desktop operating systems, including Linux (at least Ubuntu): Windows and Mac only is not acceptable for many of us. 
  • Document simply and elegantly how to use the tool 
  • Make the code open-source and maintain it
  • Write good tests for the tool and run them in CI / CD pipelines
  • Answer questions promptly on a forum

I don't want much, do I? Maybe I should wish on a star.

Fact is, such communities exist and in greater numbers than we might think. And no one is getting paid a ton of VC or shareholder money to support them.

Active communities and the tools they love 

Qubes OS

The Qubes OS operating system, which is free of charge, offers what is billed as a far more secure computing environment than the Big Three (Windows, Mac, and, yes, Linux too).

A Qubes OS devotee from another community of mine, MetaFilter, says: 

“The main reason I bothered to invest in trying out Qubes is the same reason why I've enjoyed it so much: the documentation is amazingly thoughtful, thorough, and consumable, striking an excellent balance between prescriptiveness and reasonable assumptions about the readers' existing knowledge and skills, while enabling a community to easily share experiences and consume the benefits of that sharing. Had it been otherwise, I wouldn't have bothered; as intrigued as I was by the technology, at this point in life, I no longer have the time or patience for faffing around trying to figure out an already-solved problem just because some technologist can't be bothered to explain their work or make it accessible. Qubes' documentation is top notch.”

I quote him at length because he makes several key points about solid user communities and the great tools that they can produce.

Linux

Try Qubes OS first, since it's far more secure. But, if that doesn't work out, perhaps due to older hardware, consider trying Linux at home. Really. I recommend Manjaro or Debian, and I hear good things about Puppy Linux and a few other distributions. Free. Of. Charge.

I'm not a computer scientist, nor am I particularly quick to learn anything with frequently hidden or silently defaulted configuration options. Even so, here I am, nearly 20 years after my first Ubuntu install at home, happily running a backed-up and fully restorable Manjaro instance on 10-year-old hardware. 

This is not to pat myself on the back. I make the point just to say that it's doable for a lot of people. And that's largely because so much solid information is available via web or forum search. 

PostHog 

MoT's own Hanisha Arora tells us that she loves the PostHog analytics platform! She says that she used to use Google Analytics to see what users of her organisation's product were doing, but the Google Analytics 4 release saw a falloff in usefulness. Then Hanisha's manager suggested PostHog to her. 

  • She reports being able to learn how to use the product on her own thanks to its structured design. 
  • She senses that the design excels thanks to well-thought-out user stories: "everything in it was connected as if it knew where I would go next." 
  • Their documentation is top-notch. 
  • Finally, when bugs arise, and they always do, the company has been "quite proactive in sorting them."

Ministry of Testing, naturally!  

Where do you turn when you have any kind of question about software testing? Twenty years ago, you had to rely primarily on your coworkers, many of whom were just about as educated (or not) in good software testing practice as you. Things are different today. 

And there's a new feature rolled out on the MoTaverse just about every week. For example, the new Observatory gives community members the opportunity to add links to news of interest. And you can now add video and microblog posts to MoT Memories. So does the Ministry of Testing develop tools? I would say so. 

To sum up 

Research consistently shows that combinatorial testing can find bugs more effectively than any other technique.  

Based on that research, I believe that the software and hardware testing communities are ethically obliged to make combinatorial testing a standard, well-known practice, just like exploratory testing, performance testing, and so forth. This is especially true when safety and security are at risk. 

As testers, we have more influence within our community than we used to. Let's use it to build a simple and elegant set of tools for combinatorial testing, as well as a community to support it. 

And, if you really want to try using the ACTS tool despite what I've written above, but the user interface or user guide makes you want to cry (and they might), give me a shout on the MoT Slack or at the email address in my MoT profile. I can share the steps I took to set up the tool. 

And have a look at how people responded to my query about great tools and their communities, which I posed here: The beginning of a beautiful friendship.

For more information

Browsing the resources of the US NIST

Combinatorial Methods for Trust and Assurance from the US NIST is a treasure trove of information on combinatorial testing. I will say again that their contributions to this field have been invaluable. Please nerd out there often and give them a shoutout if you wind up doing any work in this area. 

Understanding the US NIST's example covering arrays

To use the US NIST's archive of example covering arrays, knowledge of a little lingo is necessary. Just remember, in the world of these examples, that k equals the number of variables in your data set, v is the number of values per variable, and t is the power of the covering array (2 for pairwise, 3 for three-way, and so on). 

If you download one of the data sets where v is greater than 2, you will see that the set of possible values is a list of integers beginning with 0. For example, if you download a set where v = 5, you will see that the possible values include 0, 1, 2, 3, and 4.
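Those integers are just placeholders, so before you can run anything you need to map each column's levels to real test values. A minimal sketch follows; the function `decode` and the three parameters and their values are hypothetical, invented only to show the mapping, and the two rows are made up rather than taken from the NIST archive.

```python
def decode(rows, parameters):
    """Translate integer levels into real test values: level i in column j
    means parameters[j][i]."""
    return [[parameters[j][level] for j, level in enumerate(row)] for row in rows]


# Hypothetical parameters, each with v = 3 values (levels 0, 1, 2).
parameters = [
    ["English", "Spanish", "French"],    # hypothetical language setting
    ["small", "medium", "large"],        # hypothetical font size
    ["light", "dark", "high contrast"],  # hypothetical theme
]
rows = [[0, 1, 2], [2, 0, 1]]  # made-up rows in the archive's integer style
print(decode(rows, parameters))
# [['English', 'medium', 'high contrast'], ['French', 'small', 'dark']]
```

If one of your real parameters has fewer values than v, you'd have to decide how to reuse its values for the extra levels; this sketch doesn't handle that.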

You might want to see how much time you would save with a covering array over the set of all possible combinations. If so, just ask ChatGPT to generate all possible combinations of values for your scenario. 

  1. Let's say that you just downloaded from the US NIST archive a three-way covering array for six variables with three values each. 
  2. Open the download to verify that the possible values for each of the six variables are 0, 1, and 2. 
  3. Then, ask ChatGPT to generate all possible combinations of values for six variables, each of which can be equal to 0, 1, or 2. 
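If you'd rather not rely on a chatbot for steps 2 and 3, a few lines of Python can do both. This is a sketch under an explicit assumption: it expects one test case per line with whitespace-separated integers, which may not match the exact layout of the files you download, so check the real file and strip any header lines first. For the six-variable, three-value scenario above, the exhaustive set has 3^6 = 729 rows.

```python
from itertools import product


def load_rows(text):
    """Parse one test case per line, values separated by whitespace.
    ASSUMPTION: this is not a parser for the archive's actual file format."""
    return [[int(x) for x in line.split()] for line in text.splitlines() if line.strip()]


def compare_to_exhaustive(rows, k, v):
    """Check k columns with values 0..v-1, then return
    (rows in the covering array, rows in the exhaustive set, v**k)."""
    if any(len(row) != k for row in rows):
        raise ValueError("unexpected number of columns")
    if any(not 0 <= x < v for row in rows for x in row):
        raise ValueError("value outside 0..v-1")
    return len(rows), v ** k


# The article's 11-row three-way array (k = 5, v = 2), pasted in as text.
sample = """
0 0 0 1 0
0 0 1 0 1
0 1 0 0 0
0 1 1 1 0
1 0 0 0 1
1 0 1 0 0
1 1 0 1 1
1 1 1 0 1
1 0 1 1 1
1 1 0 1 0
0 1 0 1 1
"""
print(compare_to_exhaustive(load_rows(sample), k=5, v=2))  # (11, 32)

# The scenario in the steps above: k = 6 variables, v = 3 values each.
print(len(list(product(range(3), repeat=6))))  # 729
```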

Combinatorial testing in the quality engineering context

Academic research

Tools

  • tijohask / Covering_Arrays: Python project that, according to its README, "generates a covering array based on the input parameters"
  • JMP: a commercial tool designed for statisticians to generate covering arrays
  • Pairwise Pict: online pairwise test set generator, free of charge
Wordsmith, Ministry of Testing
At Ministry of Testing, I review article submissions and help our writers present their ideas in the clearest way possible. I hope you make the most of, and enjoy, everything we offer at Ministry of Testing!