Imagine this: you've been working day and night to get your revolutionary new feature ready for your customer. You've used TDD, you've tested, the code has been reviewed and your dashboard shows an amazing 99% line, method and branch coverage. After acceptance you release your feature and go home feeling good. The next day you arrive at work and, out of the blue, your mailbox is filled with test findings from the client... What happened? After some analysis of these findings, it turns out that the 99% line and branch coverage did not give a complete picture of code quality. How is this possible?
What went wrong?
Why can't we rely on our old friends, line coverage and branch coverage? Well, the problem is that high line or branch coverage is not a true indication of the quality of your code. Let's look at three problems with line and branch coverage:
What is missing?
Well, nothing verifies that the perform method is actually called, so all the side effects of this method (extra parameters being set, context being changed...) are not covered by this test.
Is this test complete?
No, nothing checks the boundary case where the variable "i" is 0, even though boundary values often have special meaning, can trigger separate behavior or even cause unwanted exceptions (divide by 0 etc.).
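The test under discussion is not reproduced here, so the following is only a hypothetical sketch of the same gap. It assumes an invented method `isPositive(int i)` and two invented test suites; none of these names come from the original code. Both suites execute every line and branch, but only the one that includes `i = 0` notices when `>` is changed to `>=`.

```java
import java.util.function.IntPredicate;

class BoundaryDemo {
    // Hypothetical production code: "positive" means strictly greater than zero.
    static boolean isPositive(int i) {
        return i > 0;
    }

    // Hand-written mutant: the boundary of the condition is shifted (> becomes >=).
    static boolean isPositiveMutant(int i) {
        return i >= 0;
    }

    // Covers both outcomes of the condition, but never uses the boundary value 0.
    static boolean suiteWithoutBoundaryPasses(IntPredicate impl) {
        return impl.test(5) && !impl.test(-5);
    }

    // Same checks plus the boundary case i = 0.
    static boolean suiteWithBoundaryPasses(IntPredicate impl) {
        return suiteWithoutBoundaryPasses(impl) && !impl.test(0);
    }

    public static void main(String[] args) {
        System.out.println("original, no boundary check: "
                + suiteWithoutBoundaryPasses(BoundaryDemo::isPositive));
        System.out.println("mutant, no boundary check:   "
                + suiteWithoutBoundaryPasses(BoundaryDemo::isPositiveMutant));
        System.out.println("original, boundary check:    "
                + suiteWithBoundaryPasses(BoundaryDemo::isPositive));
        System.out.println("mutant, boundary check:      "
                + suiteWithBoundaryPasses(BoundaryDemo::isPositiveMutant));
    }
}
```

The suite without the boundary value passes for both the original and the mutant, so it cannot tell them apart. Only the suite that also checks `i = 0` fails on the mutant.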
What is wrong with the two tests above?
The return value of method foo is completely ignored, whereas a return value has explicit meaning (otherwise it would not be used) and can have a direct impact on the result of the process flow. To make it even more concrete, take a look at the test below, a test for the power function. What can go wrong if this is your only test?
Answer: power() could perform an addition or a multiplication, or compute y to the power x instead of x to the power y, and the test would still not fail.
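To see why, here is a minimal sketch. It assumes the only test is a check that `power(2, 2)` equals 4; that input and all names below are assumptions made for illustration, not the original test. The three wrong implementations from the answer above all satisfy that check, while a test with less symmetric inputs rejects them.

```java
import java.util.function.IntBinaryOperator;

class PowerDemo {
    // Reference implementation: x to the power y, for y >= 0.
    static int power(int x, int y) {
        int result = 1;
        for (int k = 0; k < y; k++) {
            result *= x;
        }
        return result;
    }

    // Three wrong implementations, written by hand for this illustration.
    static final IntBinaryOperator ADD = (x, y) -> x + y;
    static final IntBinaryOperator MULTIPLY = (x, y) -> x * y;
    static final IntBinaryOperator SWAPPED = (x, y) -> power(y, x);

    // Assumed weak test: 2 + 2, 2 * 2 and 2^2 are all 4.
    static boolean weakTestPasses(IntBinaryOperator impl) {
        return impl.applyAsInt(2, 2) == 4;
    }

    // Stronger test: inputs for which the wrong implementations give other results.
    static boolean strongerTestPasses(IntBinaryOperator impl) {
        return impl.applyAsInt(2, 3) == 8 && impl.applyAsInt(3, 2) == 9;
    }

    public static void main(String[] args) {
        IntBinaryOperator[] candidates = {PowerDemo::power, ADD, MULTIPLY, SWAPPED};
        String[] names = {"power", "add", "multiply", "swapped"};
        for (int n = 0; n < candidates.length; n++) {
            System.out.println(names[n]
                    + ": weak test passes = " + weakTestPasses(candidates[n])
                    + ", stronger test passes = " + strongerTestPasses(candidates[n]));
        }
    }
}
```

The choice of inputs matters as much as the presence of an assertion: values where several operations happen to agree, such as 2 and 2, hide exactly the mistakes a test is supposed to catch.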
What do we need - is there a better way?
A solution to these problems can be found in mutation testing. Mutation testing is about testing your tests: it checks the quality of your tests. In mutation testing, you run your tests on slightly modified versions of your source code. Such a modified version of your code is called a mutant. The goal is to get all mutants, or as many as possible, killed. Killing a mutant means that at least one of your tests fails when run against it.
Mutation testing is a pretty old idea. It was conceived in the 1970s but did not become mainstream until recently, because a huge number of mutants can be generated from your code, which makes mutation testing very resource intensive. Today, now that computers are far more powerful, the concept is becoming increasingly popular.
How does it work?
How does a mutation test work? Your source code is parsed into a tree structure, and a mutation can be applied at every node that contains a condition, a constant value for a variable, etc. Removing or ignoring a condition is one example of such a mutation.
The unit tests are run against every generated mutant. As soon as one test fails, the mutant is killed and the run for that mutant stops. In this way, all mutants are processed and at the end you get a summary of the percentage of mutants killed.
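The loop described above can be sketched as a toy simulation. This is not how PIT or any other real tool is implemented: here the mutants are written by hand as lambdas instead of being generated, and every name (`MutationRunSketch`, `MUTANTS`, `WEAK_TESTS`, `STRONGER_TESTS`) is hypothetical. The code under test is assumed to be a simple addition of two integers.

```java
import java.util.List;
import java.util.function.IntBinaryOperator;
import java.util.function.Predicate;

class MutationRunSketch {
    // Hand-written mutants of an assumed original implementation (a, b) -> a + b.
    static final List<IntBinaryOperator> MUTANTS = List.of(
            (a, b) -> a - b,   // math operator replaced
            (a, b) -> a * b,   // math operator replaced
            (a, b) -> a);      // second operand ignored

    // Each "test" returns true when it passes for the given implementation.
    static final List<Predicate<IntBinaryOperator>> WEAK_TESTS = List.of(
            f -> f.applyAsInt(2, 2) == 4,
            f -> f.applyAsInt(0, 0) == 0);

    static final List<Predicate<IntBinaryOperator>> STRONGER_TESTS = List.of(
            f -> f.applyAsInt(2, 2) == 4,
            f -> f.applyAsInt(0, 0) == 0,
            f -> f.applyAsInt(2, 3) == 5);

    // Counts the mutants for which at least one test fails.
    static int countKilled(List<IntBinaryOperator> mutants,
                           List<Predicate<IntBinaryOperator>> tests) {
        int killed = 0;
        for (IntBinaryOperator mutant : mutants) {
            for (Predicate<IntBinaryOperator> test : tests) {
                if (!test.test(mutant)) {
                    killed++;  // the first failing test kills the mutant
                    break;     // the remaining tests are skipped for this mutant
                }
            }
        }
        return killed;
    }

    // Percentage of mutants killed by the given tests.
    static double killedPercentage(List<IntBinaryOperator> mutants,
                                   List<Predicate<IntBinaryOperator>> tests) {
        return 100.0 * countKilled(mutants, tests) / mutants.size();
    }

    public static void main(String[] args) {
        System.out.printf("weak tests kill %.1f%% of the mutants%n",
                killedPercentage(MUTANTS, WEAK_TESTS));
        System.out.printf("stronger tests kill %.1f%% of the mutants%n",
                killedPercentage(MUTANTS, STRONGER_TESTS));
    }
}
```

With the weak tests, the multiplication mutant survives because `2 * 2` and `0 * 0` give the same results as the additions. Adding the check on the inputs 2 and 3 kills it as well.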
A tool like PIT can use this outcome to generate a detailed report showing the quality of your tests by class, package, etc.
Types of mutators
What are the different types of mutators? Here you can see the main types:
As already mentioned, there are mutations on conditions. In addition, there are mutations on math operators and logical operators. For return values, you can replace a boolean true with false, ignore a numeric value, etc. In the case of collections, a mutation can return an empty collection instead of the intended one.
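As a rough illustration of three of these categories, the sketch below pairs small methods with hand-written mutants. The methods (`total`, `isAdult`, `nonBlank`) are invented for this example, and the mutants only imitate what a tool might generate; the exact mutations differ per tool.

```java
import java.util.List;
import java.util.stream.Collectors;

class MutatorExamples {
    // Math operator mutation: + becomes -.
    static int total(int price, int shipping) {
        return price + shipping;
    }

    static int totalMutant(int price, int shipping) {
        return price - shipping;
    }

    // Boolean return value mutation: the result is replaced by a constant.
    static boolean isAdult(int age) {
        return age >= 18;
    }

    static boolean isAdultMutant(int age) {
        return true;
    }

    // Collection return value mutation: an empty list is returned instead.
    static List<String> nonBlank(List<String> names) {
        return names.stream()
                .filter(name -> !name.trim().isEmpty())
                .collect(Collectors.toList());
    }

    static List<String> nonBlankMutant(List<String> names) {
        return List.of();
    }

    public static void main(String[] args) {
        // Inputs on which original and mutant agree: a test using only these cannot kill the mutant.
        System.out.println(total(10, 0) == totalMutant(10, 0));
        System.out.println(isAdult(30) == isAdultMutant(30));
        System.out.println(nonBlank(List.of()).equals(nonBlankMutant(List.of())));

        // Inputs on which they differ: an assertion on these results kills the mutant.
        System.out.println(total(10, 5) == totalMutant(10, 5));
        System.out.println(isAdult(10) == isAdultMutant(10));
        System.out.println(nonBlank(List.of("Ann", " ")).equals(nonBlankMutant(List.of("Ann", " "))));
    }
}
```

Each mutant has inputs on which it behaves exactly like the original. A test kills it only if it uses an input where the two differ and asserts on the result.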
Do we need perfection?
Do we need perfection? No, this is not necessary in all cases. As you can see in the screenshot showing the generation of an informational message, mutants survive because there is no test for the constructed message. Is this a test we absolutely need? Maybe not...
Additional benefits
What has improving mutation coverage given us, other than being able to leave work without fearing the next day 😊?
Well, we discovered a few bugs that hadn't been reported, and we added tests that cover unforeseen but likely scenarios. We also removed dead code, unnecessary checks, etc. So just running the process provided value. In addition, our tests became more robust and complete. So I would say it's a no-brainer to add this type of testing to the development process.
Just Java?
Is mutation testing only available for Java development? No, not at all. Here are some examples for a whole range of programming languages and frameworks:
A well-known mutation testing framework is Stryker, which is available for JavaScript, TypeScript, .NET and Scala.
References: