Search code examples
functional-programmingscalacheckproperty-based-testing

Does non-deterministic nature of property-based testing hurt build repeatability?


I am learning FP and got introduced to the concept of property-based testing and for someone from OOP world PBT looks both useful and dangerous. It does check a lot of options, but what if there is one (or some) options that fail, but they didn't fail during your first let's say Jenkins build. Then next time you run the build the test may or may not fail, doesn't it kill the entire idea of repeatable builds?

I see that some people explored options to make the tests deterministic, but then if such test doesn't catch an error it will never catch it.

So what's better approach here? Do we sacrifice build repeatability to eventually uncover a bug or do we take the risk of never uncovering it, but get our repeatability back?

(I hope that I properly understood the concept of PBT, but if I didn't I would appreciate if somebody could point out my misconceptions)


Solution

  • Doing a lot of property-based testing I don’t see indeterminism as a big problem. I basically experience three types of it:

    1. A property is really indeterministic b/c some external factor - e.g. timeout, delay, db config - makes it so. Those flaky tests also show up in example-based testing and should be eliminated by making the external factor deterministic.

    2. A property fails rarely because the triggering condition is only sometimes met by pseudo random data generation. Most PBT libraries have ways to reproduce those failing runs, eg by re-using the random seed of the failing test run or even remembering the exact constellation in a database of some sort. Those failures reveal problems and are one of the reasons why we’re doing random test cases generation in the first place.

    3. Coverage assertions („this condition will be hit in at least 5 percent of all cases“) may fail from time to time even though they are generally true. This can be mitigated by raising the number of tries. Some libs, eg quickcheck, do their own calculation of how many tries are needed to prove/disprove coverage assumptions and thereby mostly eliminate those false positives.

    The important thing is to always follow up on flaky failures and find the bug, the indeterministic external factor or the wrong assumption in the property‘s invariant. When you do that, sporadic failures will occur less and less often. My personal experience is mostly with jqwik but other people have been telling me similar stories.