Intro to Mutation Testing

Mutation testing is a technique for analyzing the quality and effectiveness of your tests. Despite being very simple to set up (at least in Java) and able to find holes in the test suite, it is relatively little known. It’s the kind of thing you might hear about in passing in the context of automated software tests without ever having used it in a real project or even personally knowing anyone who has. Let’s change that! I only learned about the concept less than two years ago myself, but I have since used it in real enterprise projects and experimented with it in personal projects. I want to share what I learned.

How does mutation testing work?

Mutation testing “tests” your tests by systematically introducing defects (“mutants”) into your code and then checking whether your automated tests catch them. It is not another testing technique for the implementation code itself. The mutations may include inverting conditions, negating numbers, replacing arithmetic operators, returning empty collections, empty strings, or null, and more. A mutant is “killed” when at least one test fails against it and “survives” when all tests still pass. A surviving mutant hints at a possible hole in the test suite, because one or more tests executed the mutated line but none of them caught the defect. Mutation testing only gives meaningful information about lines of code that are actually covered (i.e., executed) by tests in the first place.
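To make the idea concrete, here is a minimal hand-rolled sketch. It does not use any mutation testing tool; the class, the methods, and the two "test suites" are hypothetical names invented for illustration. The mutant changes >= to >, which is the kind of boundary change mutation tools typically generate automatically.

```java
import java.util.function.IntPredicate;

// Hypothetical example: all names are illustrative, not taken from a real project or tool.
class MutationDemo {

    // Original implementation: adults are 18 or older.
    static boolean isAdult(int age) {
        return age >= 18;
    }

    // Hand-written mutant: the boundary condition >= has been changed to >.
    static boolean isAdultMutant(int age) {
        return age > 18;
    }

    // Weak test suite: executes the line, but never checks the boundary value.
    static boolean weakSuitePasses(IntPredicate impl) {
        return impl.test(30) && !impl.test(5);
    }

    // Stronger test suite: additionally pins down the behavior around exactly 18.
    static boolean strongSuitePasses(IntPredicate impl) {
        return weakSuitePasses(impl) && impl.test(18) && !impl.test(17);
    }

    public static void main(String[] args) {
        // A mutant survives when the suite still passes against the mutated code.
        System.out.println(weakSuitePasses(MutationDemo::isAdult));         // true
        System.out.println(weakSuitePasses(MutationDemo::isAdultMutant));   // true: mutant survives
        System.out.println(strongSuitePasses(MutationDemo::isAdult));       // true
        System.out.println(strongSuitePasses(MutationDemo::isAdultMutant)); // false: mutant killed
    }
}
```

Both suites give the line full coverage, but only the stronger one notices the changed boundary. That difference is exactly what a mutation test reports and what a plain coverage report cannot show.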

Mutation testing is best used strategically

The output of a mutation test doesn’t tell you where your code has defects (as a failing unit test would), but which potential defects your existing tests would not catch. You therefore probably don’t need to react immediately and kill all the mutants by adding tests. It’s not a quality gate to block deployments. Instead, it can be worth looking for clusters of surviving mutants and adding (or improving) the tests in those areas. And if you find that your tests already correctly specify all expected behaviors, the surviving mutants might hint at obsolete code, since that code doesn’t affect the outcome of any of the tests.
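The obsolete-code case can be sketched as follows. This is again a hand-rolled illustration with hypothetical names rather than the output of a real tool. The explicit empty check in total is redundant because the loop already yields 0 for an empty list, so a mutant that drops the check behaves identically and no test can kill it.

```java
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical example: all names are illustrative, not taken from a real project or tool.
class ObsoleteCodeDemo {

    // Original: contains a guard clause that looks important but is redundant.
    static int total(List<Integer> prices) {
        if (prices.isEmpty()) {
            return 0;
        }
        int sum = 0;
        for (int price : prices) {
            sum += price;
        }
        return sum;
    }

    // Hand-written mutant: the guard clause has been removed.
    static int totalMutant(List<Integer> prices) {
        int sum = 0;
        for (int price : prices) {
            sum += price;
        }
        return sum;
    }

    // A suite that already specifies every expected behavior, including the empty case.
    static boolean suitePasses(ToIntFunction<List<Integer>> impl) {
        return impl.applyAsInt(List.of()) == 0
                && impl.applyAsInt(List.of(5)) == 5
                && impl.applyAsInt(List.of(1, 2, 3)) == 6;
    }

    public static void main(String[] args) {
        System.out.println(suitePasses(ObsoleteCodeDemo::total));       // true
        System.out.println(suitePasses(ObsoleteCodeDemo::totalMutant)); // true: mutant survives
    }
}
```

Here the surviving mutant does not point to a missing test. The reasonable reaction is to delete the guard clause, after which the mutant no longer exists.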

Since mutation testing is also quite slow (orders of magnitude slower than your test suite, because the tests have to be re-run against each mutant), it doesn’t belong in your continuous integration pipeline. Running it occasionally overnight, like every two weeks or even once a month, is probably sufficient for a large enterprise code base.

TDD helps with both branch coverage and mutation coverage

If you consistently practice TDD (test-driven development) or BDD (behavior-driven development), you incrementally specify each new expected behavior and then only write as much implementation code as necessary to satisfy that executable specification. Although it’s not the main goal, this inherently leads to close to 100 percent code coverage (for most definitions of code coverage, including line and branch coverage). You’re also likely to have higher mutation coverage than if you were writing the tests last. Still, mutation testing can help you discover scenarios and edge cases you didn’t think of earlier, even if you applied ZOMBIES while test-driving your design.

How can I use mutation testing in Java (and other languages)?

Mutation testing is well supported in Java by the open source tool PIT. The setup is very simple (I’ve used it with Maven and Gradle so far) and decently documented, so I won’t go into language- or tool-specific details here. Additional features and Kotlin support are available in a paid version. Many other languages have mutation testing tools as well, although I’ve only tried it in Java using PIT.

While some mutation testing tools rely on changing the source code before each test run (which may be less of a problem for interpreted languages), PIT injects the mutations at the bytecode level.