
I have a complex spec for my data. How do I generate samples?


My Clojure spec looks like this:

(spec/def ::global-id string?)
(spec/def ::part-of string?)
(spec/def ::type string?)
(spec/def ::value string?)
(spec/def ::name string?)
(spec/def ::text string?)
(spec/def ::date (spec/nilable (spec/and string? #(re-matches #"^\d{4}-\d{2}-\d{2}$" %))))
(spec/def ::interaction-name string?)
(spec/def ::center (spec/coll-of string? :kind vector? :count 2))
(spec/def ::context- (spec/keys :req [::global-id ::type]
                                :opt [::part-of ::center]))
(spec/def ::contexts (spec/coll-of ::context-))
(spec/def ::datasource string?)
(spec/def ::datasource- (spec/nilable (spec/keys :req [::global-id ::name])))
(spec/def ::datasources (spec/coll-of ::datasource-))
(spec/def ::location string?)
(spec/def ::location-meaning- (spec/keys :req [::global-id ::location ::contexts ::type]))
(spec/def ::location-meanings (spec/coll-of ::location-meaning-))
(spec/def ::context string?)
(spec/def ::context-association-type string?)
(spec/def ::context-association-name string?)
(spec/def ::priority string?)
(spec/def ::has-context- (spec/keys :req [::context ::context-association-type ::context-association-name ::priority]))
(spec/def ::has-contexts (spec/coll-of ::has-context-))
(spec/def ::fact- (spec/keys :req [::global-id ::type ::name ::value]))
(spec/def ::facts (spec/coll-of ::fact-))
(spec/def ::attribute- (spec/keys :req [::name ::type ::value]))
(spec/def ::attributes (spec/coll-of ::attribute-))
(spec/def ::fulltext (spec/keys :req [::global-id ::text]))
(spec/def ::feature- (spec/keys :req [::global-id ::date ::location-meanings ::has-contexts ::facts ::attributes ::interaction-name]
                                :opt [::fulltext]))
(spec/def ::features (spec/coll-of ::feature-))
(spec/def ::attribute- (spec/keys :req [::name ::type ::value]))
(spec/def ::attributes (spec/coll-of ::attribute-))
(spec/def ::ioi-slice string?)
(spec/def ::ioi- (spec/keys :req [::global-id ::type ::datasource ::features ::attributes ::ioi-slice]))
(spec/def ::iois (spec/coll-of ::ioi-))
(spec/def ::data (spec/keys :req [::contexts ::datasources ::iois]))
(spec/def ::data- ::data)

But generating samples fails when I run:

(spec/fdef data->graph
  :args (spec/cat :data ::xml-spec/data-))

(println (stest/check `data->graph))

Generation fails with the exception: Couldn't satisfy such-that predicate after 100 tries.

It is very convenient to generate test data automatically from specs with stest/check, but how can I supply custom generators alongside the specs?


Solution

  • When you see the error Couldn't satisfy such-that predicate after 100 tries. while generating data from specs, a common cause is an s/and spec. spec builds the generator for an s/and spec solely from its first inner spec, then discards any generated value that fails the remaining predicates.

    This spec is the most likely culprit, because the first inner spec/predicate in the s/and is string?, and the following predicate is a regex:

    (s/def ::date (s/nilable (s/and string? #(re-matches #"^\d{4}-\d{2}-\d{2}$" %))))
    

    If you sample a string? generator, you'll see what it produces is unlikely to ever match your regex:

    (gen/sample (s/gen string?))
    => ("" "" "X" "" "" "hT9" "7x97" "S" "9" "1Z")
    

    spec filters generated values through a test.check such-that condition and allows up to 100 tries to find a value that satisfies it. If none does, it throws the exception you're seeing.
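
The failure can be reproduced in isolation with just the inner s/and spec. This is a minimal sketch, assuming clojure.spec.alpha and test.check are on the classpath; the names date-string-spec and gen-failure-message are hypothetical helpers introduced only for illustration.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.test.check.generators :as gen])

;; Same shape as the inner part of ::date: generation starts from string?
;; and the regex predicate only filters what string? produced.
(def date-string-spec
  (s/and string? #(re-matches #"^\d{4}-\d{2}-\d{2}$" %)))

(defn gen-failure-message
  "Tries to generate one value for `spec`. Returns the exception message
  if generation fails, or nil if a value was generated."
  [spec]
  (try
    (gen/generate (s/gen spec))
    nil
    (catch clojure.lang.ExceptionInfo e
      (.getMessage e))))

(println (gen-failure-message date-string-spec))
```

If generation fails as described above, this prints the exception message instead of returning a generated value.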

    Generating Dates

    You can implement a custom generator for this spec in several ways. Here's a test.check generator that will create ISO local date strings:

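    (require '[clojure.test.check.generators :as gen])
    (import '(java.time LocalDate)
            '(java.time.temporal ChronoField))

    ;; ISO local date strings drawn from the full range of valid epoch days.
    (def gen-local-date-str
      (let [day-range (.range ChronoField/EPOCH_DAY)
            day-min (.getMinimum day-range)
            day-max (.getMaximum day-range)]
        (gen/fmap #(str (LocalDate/ofEpochDay %))
                  (gen/large-integer* {:min day-min :max day-max}))))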
    

    This approach gets the range of valid epoch days, uses it to bound the large-integer* generator, then fmaps LocalDate/ofEpochDay (followed by str) over the generated integers.

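    (require '[clojure.test.check.generators :as gen])
    (import '(java.time Instant LocalDateTime ZoneOffset))

    ;; Treats each generated integer as epoch milliseconds in UTC.
    (def gen-local-date-str
      (gen/fmap #(-> (Instant/ofEpochMilli %)
                     (LocalDateTime/ofInstant ZoneOffset/UTC)
                     (.toLocalDate)
                     (str))
                gen/large-integer))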
    

    This alternative starts with the default large-integer generator and uses fmap to apply a function that creates a java.time.Instant from the generated integer, converts it to a java.time.LocalDate, and converts that to a string, which conveniently matches your date string format. (This is slightly simpler on Java 9 and above with java.time.LocalDate/ofInstant.)

    Another approach might use test.chuck's regex-based string generator, or different date classes/formatters. Note that both of my examples can generate years outside 0000 to 9999. LocalDate prints those with a sign or extra digits, so they won't match your \d{4} year regex, but the generator should produce satisfactory values often enough that it may not matter for your use case. There are many ways to generate date values!

    (gen/sample gen-local-date-str)
    =>
    ("1969-12-31"
     "1970-01-01"
     "1970-01-01"
     ...)
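
If out-of-range years do matter, one option is to bound the epoch-day range so every generated year has four digits. The following is a sketch of that idea rather than part of the approaches above; the name gen-four-digit-year-date-str is hypothetical, and test.check is assumed to be on the classpath.

```clojure
(require '[clojure.test.check.generators :as gen])
(import '(java.time LocalDate))

;; Hypothetical variant: only epoch days between 0000-01-01 and 9999-12-31,
;; so str always yields a four-digit year with no sign.
(def gen-four-digit-year-date-str
  (let [day-min (.toEpochDay (LocalDate/of 0 1 1))
        day-max (.toEpochDay (LocalDate/of 9999 12 31))]
    (gen/fmap #(str (LocalDate/ofEpochDay %))
              (gen/large-integer* {:min day-min :max day-max}))))

(gen/sample gen-four-digit-year-date-str 5)
```

Because every generated value matches the date regex, no samples are discarded when this generator is attached to a spec.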
    

    Using Custom Generators with Specs

    Then you can associate this generator with your spec using s/with-gen:

    (s/def ::date
      (s/nilable
       (s/with-gen
        (s/and string? #(re-matches #"^\d{4}-\d{2}-\d{2}$" %))
        (constantly gen-local-date-str))))
    
    (gen/sample (s/gen ::date))
    =>
    ("1969-12-31"
     nil ;; note that it also makes nils b/c it's wrapped in s/nilable
     "1970-01-01"
     ...)
    

    You can also pass "standalone" custom generators to spec functions that accept an overrides map (s/gen, for example), if you don't want to tie the custom generator directly to the spec definition:

    (gen/sample (s/gen ::data {::date (constantly gen-local-date-str)}))
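
Since the question runs stest/check, the same kind of overrides map can be passed there under the :gen option. The sketch below is self-contained and therefore uses stand-ins: date->year is a hypothetical function in place of data->graph, gen-date-str is a simplified stand-in for gen-local-date-str, and ::date is redefined without s/nilable to keep the example small.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest]
         '[clojure.test.check.generators :as gen])

;; Simplified stand-in generator: dates from 1970-01-01 onward for 20000 days.
(def gen-date-str
  (gen/fmap #(str (java.time.LocalDate/ofEpochDay %))
            (gen/large-integer* {:min 0 :max 20000})))

(s/def ::date (s/and string? #(re-matches #"^\d{4}-\d{2}-\d{2}$" %)))

;; Hypothetical function under test.
(defn date->year [date] (subs date 0 4))

(s/fdef date->year
  :args (s/cat :date ::date)
  :ret string?)

;; :gen maps spec names to no-arg functions that return generators.
(def check-results
  (stest/check `date->year
               {:gen {::date (constantly gen-date-str)}
                :clojure.spec.test.check/opts {:num-tests 50}}))

(stest/summarize-results check-results)
```

This keeps the spec definitions free of generator code and confines the overrides to the test run.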
    

    Using this spec and generator I was able to generate your larger ::data spec, although the outputs were very large due to some of the collection specs. You can control the size of those collections during generation with the :gen-max option of s/coll-of.
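
As a rough illustration of :gen-max, here is one of the collection specs from the question with a cap added. The limit of 3 is an arbitrary choice for the example, and the specs are defined in whatever namespace the snippet is evaluated in.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.test.check.generators :as gen])

(s/def ::name string?)
(s/def ::type string?)
(s/def ::value string?)
(s/def ::attribute- (s/keys :req [::name ::type ::value]))

;; :gen-max limits how many elements the generator produces (default 20).
;; It does not restrict which collections are considered valid.
(s/def ::attributes (s/coll-of ::attribute- :gen-max 3))

(gen/sample (s/gen ::attributes) 5)
```

Applying the same option to the other coll-of specs (::features, ::iois, and so on) should keep the nested output manageable.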