
Using a generative testing library in Clojure vs. building your own with higher-order functions


Clojure has a number of libraries for generative testing, such as test.check, test.generative, and data.generators.

It is also possible to use higher-order functions to build composable random data generators, such as:

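(defn gen
  "Returns a zero-argument generator. Each call invokes content-fn once per
  element of lazy and passes all the results to create-fn."
  [create-fn content-fn lazy]
  ;; apply rather than reduce: (reduce hash-set xs) nests sets inside sets
  ;; and returns a bare element when xs has exactly one item.
  (fn [] (apply create-fn (for [_ lazy] (content-fn)))))

;; a string of 10 random characters
(def a (gen str #(rand-nth [\a \b \c]) (range 10)))
(a)

;; a vector of 2 random digits
(def b (gen vector #(rand-int 10) (range 2)))
(b)

;; a set of random pairs; note that (rand-int 10) runs once, when c is defined
(def c (gen hash-set b (range (rand-int 10))))
(c)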

This is just an example; it could be extended with different parameters, filters, partials, etc. to create quite flexible data-generating functions.
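As a rough sketch of what "filters, partials, etc." might look like in practice, the helpers below (`times` and `where` are hypothetical names, not from any library) wrap plain zero-argument functions. This is only one way to do it, under the assumption that a "generator" is any function that returns a fresh random value each time it is called.

```clojure
;; Hypothetical helpers for illustration; they are not part of any library.
(defn times
  "Returns a generator (zero-arg fn) that calls generate n times and
  passes all the results to combine."
  [combine n generate]
  (fn [] (apply combine (repeatedly n generate))))

(defn where
  "Returns a generator that retries generate until pred holds.
  Loops forever if pred can never be satisfied."
  [pred generate]
  (fn [] (first (filter pred (repeatedly generate)))))

;; partial application doubles as a generator: (digit) calls (rand-int 10)
(def digit (partial rand-int 10))

;; filtered generator: only even digits
(def even-digit (where even? digit))

;; composed generator: a vector of two even digits
(def even-pair (times vector 2 even-digit))

(even-pair)
```

Note that nothing here records how a value was built, so a failing value cannot be simplified afterwards without writing extra code.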

Is there anything that any of the generative libraries can do that isn't also just as (or more) succinctly achievable by composing some higher-order functions?

As a side note to the Stack Overflow gods: I don't believe this question is subjective. I'm not asking for an opinion on which library is better. I want to know what specific feature(s) or technique(s) of any or all of the generative libraries differentiate them from composing vanilla higher-order functions. An example answer should illustrate generating random data with one of the libraries and explain why the same thing would be more complex to do by composing HOFs in the way I have illustrated above.


Solution

  • test.check does this much better. Most notably, suppose you generate a random list of 100 elements and your test fails: something about the way you handled that list is wrong. What now? How do you find the underlying bug? It surely doesn't depend on exactly those 100 inputs; you could probably reproduce it with a list of just a few elements, or even an empty list if something is wrong with your base case.

    The feature that makes all this actually useful isn't the random generators; it is the "shrinking" of the values they produce. Once test.check finds an input that breaks your tests, it tries to simplify that input as much as possible while still making your tests fail. For a list of integers, the shrinks are simple enough that you could maybe do them yourself: remove any element, or decrease any element. Even that may not be true: choosing the order in which to apply shrinks is probably a harder problem than I realize. And for more complex inputs, like a list of maps from vectors to a 3-tuple of [string, int, keyword], you'll find it totally unmanageable to do by hand, whereas test.check has done all the hard work already.
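To make the shrinking point concrete, here is a minimal sketch, assuming a recent test.check release (1.x) is on the classpath. The names `all-below-5` and `nested` are made up for illustration, and the property is deliberately false so that there is something to shrink.

```clojure
(require '[clojure.test.check :as tc]
         '[clojure.test.check.generators :as gen]
         '[clojure.test.check.properties :as prop])

;; Deliberately false property: "every element of every vector is below 5".
(def all-below-5
  (prop/for-all [v (gen/vector gen/nat)]
    (every? #(< % 5) v)))

(def result (tc/quick-check 100 all-below-5))

;; The first failing input is whatever random vector happened to break the
;; property, usually several elements long.
(:fail result)

;; The shrunk input is the simplest vector that still fails: test.check drops
;; elements and decreases the remaining one until nothing simpler fails.
(get-in result [:shrunk :smallest])
;; => [[5]]   (a vector of arguments; the property takes one argument, v)

;; The nested structure mentioned above composes from built-in generators,
;; and the composite generator shrinks without any extra code.
(def nested
  (gen/list
    (gen/map (gen/vector gen/nat)
             (gen/tuple gen/string-alphanumeric gen/small-integer gen/keyword))))

(gen/sample nested 3)
```

The key design difference from a plain zero-argument function is that a test.check generator produces a value together with a tree of simpler candidate values, which is what lets `quick-check` search for a smaller failing case.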