Search code examples
clojureclojure.spec

Evaluation time of spec/valid? grows exponentially


I am using clojure.spec to parse a DSL. Unfortunately, the computation time for testing if something conforms with my spec seems to grow exponentially. I would like to understand why, and how to address it.

This is what the spec looks like:

(spec/def ::settings map?)

(spec/def ::header (spec/spec
                    (spec/cat :prefix #{:begin-example}
                              :label string?
                              :settings (spec/? ::settings))))

(def end-block [:end-example])

(spec/def ::not-end (partial not= end-block))

(spec/def ::end #{end-block})

(spec/def ::block (spec/cat
                   :header ::header
                   :data (spec/* ::not-end)
                   :suffix ::end))

(spec/def ::form (spec/alt :block ::block
                           :form any?))

(spec/def ::forms (spec/* ::form))

In order to exercise the spec, I wrote a small function that produces valid data for the spec, and a size parameter to control the size of the data:

(defn make-sample-data [size]
  (transduce
   (comp (take size)
         cat)
   conj
   []
   (repeat [:a 1 :b :c [:begin-example "a" {:indent 4}] :d :e [:end-example] 9])))

(make-sample-data 1)
;; => [:a 1 :b :c [:begin-example "a" {:indent 4}] :d :e [:end-example] 9]

(make-sample-data 2)
;; => [:a 1 :b :c [:begin-example "a" {:indent 4}] :d :e [:end-example] 9 :a 1 :b :c [:begin-example "a" {:indent 4}] :d :e [:end-example] 9]

Now I am executing this code:

(dotimes [i 13]
  (assert (time (spec/valid? ::forms (make-sample-data i)))))

which produces the following output:

"Elapsed time: 0.077095 msecs"
"Elapsed time: 0.333636 msecs"
"Elapsed time: 0.864481 msecs"
"Elapsed time: 2.198994 msecs"
"Elapsed time: 4.432004 msecs"
"Elapsed time: 9.026142 msecs"
"Elapsed time: 17.709151 msecs"
"Elapsed time: 35.201316 msecs"
"Elapsed time: 73.178516 msecs"
"Elapsed time: 138.93966 msecs"
"Elapsed time: 288.349616 msecs"
"Elapsed time: 569.471181 msecs"
"Elapsed time: 1162.944497 msecs"

It appears to me that for every step of the size parameter, the computation time doubles.

My question is: How can I modify my spec so that validation time is linear with the size of my data?


Solution

  • I'm guessing the performance issue comes from a combination of greedy, branching regex specs with any? predicates.

    The use of any? in the s/alt :form regex branch stood out to me. I suppose spec might be evaluating each branch of the s/alt greedily/exhaustively and then backtracking, and any? matches everything including values that match your :block branch. (Notice the spec conforms the same regardless of whether :form any? branch is defined before or after :block branch.)

    If you can use a more specific predicate than any? in your top-level s/alt :form branch, you should see a big improvement. I inlined the spec definitions for brevity:

    (s/def ::forms
      (s/*
        (s/alt :block
               (s/cat :header (s/spec
                                (s/cat :prefix #{:begin-example}
                                       :label string?
                                       :settings (s/? map?)))
                      :data (s/* #(not= % [:end-example]))
                      :suffix #{[:end-example]})
               :form
               (s/nonconforming ;; don't tag results
                 (s/or :keyword keyword?
                       :number number?)))))
    
    (time (s/valid? ::forms (make-sample-data 1000)))
    "Elapsed time: 84.637513 msecs"
    => true
    

    Note that allowing a collection predicate (e.g. coll?, vector?) to that :form branch degrades performance just like any?. I suppose this is because the same values are matching both branches of the s/alt.