100% code coverage is near-meaningless - but is there a good measure to use?

@[email protected] · 2 years ago

100% code coverage is near-meaningless - but is there a good measure to use?

@[email protected] · edit-2 2 years ago

~~Pit~~ Mutation testing is useful. It makes small changes (mutations) to your code and checks whether your tests catch them, which tells you how effective the tests are and which conditions they miss.

For Java: https://pitest.org

Edit: corrected to the more general name instead of a specific implementation.
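To make the idea concrete, here is a hand-rolled sketch in Python. Real tools generate and run the mutants for you; the names below (`is_adult`, `weak_suite`, `strong_suite`, `mutant_killed`) are made up for illustration and are not from any tool.

```python
# Hand-written illustration of mutation testing; real tools create mutants automatically.

def is_adult(age):
    """Original code under test."""
    return age >= 18


def is_adult_mutant(age):
    """Mutant: the boundary operator >= has been changed to >."""
    return age > 18


def weak_suite(fn):
    """Runs every line of fn (full line coverage) but never checks the boundary."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn):
    """Adds the boundary case age == 18, which the mutant gets wrong."""
    return weak_suite(fn) and fn(18) is True


def mutant_killed(suite):
    """A mutant is 'killed' if the suite passes on the original and fails on the mutant."""
    return suite(is_adult) and not suite(is_adult_mutant)


print(mutant_killed(weak_suite))    # False: the mutant survives despite full coverage
print(mutant_killed(strong_suite))  # True: the boundary check kills the mutant
```

Both suites give the same line coverage; only the second one notices that the code changed.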

@[email protected] · 2 years ago

The most extreme examples of the problem are tests with no assertions. Fortunately these are uncommon in most code bases.

Every enterprise I’ve consulted for that had code coverage requirements was full of elaborate mock-heavy tests with a single Assert.NotNull at the end. Basically just testing that you wrote the right mocks!
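A rough Python sketch of that pattern, assuming a made-up `build_invoice` function and a mocked repository (none of these names come from a real codebase). The not-None check passes even when the arithmetic is deliberately broken, while a check on the actual value does not.

```python
from unittest.mock import Mock


def apply_discount(price, rate):
    """Original: subtract the discount."""
    return price * (1 - rate)


def apply_discount_mutant(price, rate):
    """Mutant: '-' changed to '+', so the 'discount' raises the price."""
    return price * (1 + rate)


def build_invoice(repo, order_id, discount):
    """Hypothetical code under test: fetch an order and compute its total."""
    order = repo.get_order(order_id)
    return {"id": order_id, "total": discount(order["price"], 0.1)}


def make_repo():
    """Mocked repository returning a fixed order."""
    repo = Mock()
    repo.get_order.return_value = {"price": 100.0}
    return repo


def coverage_only_test(discount):
    """Covers every line, but only asserts that something came back."""
    invoice = build_invoice(make_repo(), 42, discount)
    return invoice is not None


def behaviour_test(discount):
    """Asserts on the computed value, so a wrong total fails the test."""
    invoice = build_invoice(make_repo(), 42, discount)
    return abs(invoice["total"] - 90.0) < 1e-9


print(coverage_only_test(apply_discount_mutant))  # True: the broken code still passes
print(behaviour_test(apply_discount_mutant))      # False: the broken code is caught
```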

@[email protected] · 2 years ago

That’s exactly the sort of shit tests mutation testing is designed to address. Believe me, it sucks when Sonar requires a 90% PIT test pass rate. Sometimes the tests can get extremely elaborate, which should be a red flag for the design (not necessarily bad code).

Anyway I love what pit testing does. I hate being required to do it, but it’s a good thing.

@[email protected] · 2 years ago

Yeah. All the same. Create lazy metric - get lazy and useless results.

@[email protected] · 2 years ago

This is really interesting, I’ve never heard of such an approach before; clearly I need to spend more time reading up on testing methodologies. Thank you!

@[email protected] · 2 years ago

I’d never heard of mutation testing before either, and it seems really interesting. It reminds me of fuzzing, except for the code instead of the input. Maybe a little impractical for some codebases with long build times though. Still, I’ll have to give it a try for a future project. It looks like there are several tools for mutation testing C/C++.

The most useful tests I write are generally regression tests. Every time I find a bug, I’ll replicate it in a test case, then fix the bug. I think this is just basic Test-Driven-Development practice, but it’s very useful to verify that your tests actually fail when they should. Mutation/Pit testing seems like it addresses that nicely.

@[email protected] · 2 years ago

We are running the above PIT tests with an extra (Gradle-based) build plugin so that it only runs mutations for the changed lines in that pull request. That drastically reduces runtime and still ensures that new code is covered to the mutation test level we want. Maybe something similar can be done for C or C++ projects.
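The general idea, independent of any particular plugin, can be sketched like this in Python. This is not how the Gradle plugin is implemented; it is an assumption-laden illustration that reads a single-file unified diff, collects the added line numbers, and keeps only the mutants on those lines. `changed_lines`, `select_mutants`, and the mutant dictionaries are hypothetical.

```python
import re

# Matches a unified-diff hunk header such as "@@ -1,4 +1,5 @@"
# and captures the starting line number on the new-file side.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def changed_lines(unified_diff):
    """Return new-file line numbers of added lines in a single-file unified diff."""
    changed = set()
    new_line = None  # stays None until the first hunk header is seen
    for line in unified_diff.splitlines():
        match = HUNK_RE.match(line)
        if match:
            new_line = int(match.group(1))
            continue
        if new_line is None:
            continue  # skip the '---' / '+++' file headers before the first hunk
        if line.startswith("+"):
            changed.add(new_line)
            new_line += 1
        elif line.startswith(" ") or line == "":
            new_line += 1  # context line: present in both old and new file
        # '-' lines and '\ No newline' markers do not advance the new-file counter
    return changed


def select_mutants(mutants, changed):
    """Keep only the candidate mutants that sit on a changed line."""
    return [m for m in mutants if m["line"] in changed]


DIFF = "\n".join([
    "--- a/price.py",
    "+++ b/price.py",
    "@@ -1,4 +1,5 @@",
    " def total(price, qty):",
    "-    return price * qty",
    "+    subtotal = price * qty",
    "+    return subtotal",
    " ",
    " def unused():",
])

# Hypothetical candidate mutants, identified by line number in the new file.
MUTANTS = [
    {"line": 2, "op": "* -> /"},
    {"line": 5, "op": "remove call"},
]

print(sorted(changed_lines(DIFF)))                   # [2, 3]
print(select_mutants(MUTANTS, changed_lines(DIFF)))  # [{'line': 2, 'op': '* -> /'}]
```

Only the mutant on line 2 gets run; the one on the untouched line 5 is skipped, which is where the runtime saving comes from.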

@[email protected] · 2 years ago

I’m currently working on a C++ project that takes about 10 minutes to do a clean build (plus another 5 minutes in CI to actually run the tests). Incremental builds are set up and work quite well, but any header change can easily result in a 5-minute incremental build.

As much as I’d like to try, I don’t see mutation testing being worthwhile for this project outside of maybe a few isolated modules that could be tested independently. It’s a highly interconnected codebase, and I’ve personally reviewed (or written) every test, so I already know they’re of fairly high quality, but it would be nice to be able to measure.

robotdna · 2 years ago

Does something like this exist for Python?

v_krishna · 2 years ago

https://mutatest.readthedocs.io/en/latest/

robotdna · 2 years ago

Oh sweet! This introduced a whole new world to me. I’m also seeing mutmut; is one better than the other?