I am using clojure.spec
to parse a DSL. Unfortunately, the computation time for testing if something conforms with my spec seems to grow exponentially. I would like to understand why, and how to address it.
This is what the spec looks like:
(spec/def ::settings map?)
(spec/def ::header (spec/spec
(spec/cat :prefix #{:begin-example}
:label string?
:settings (spec/? ::settings))))
(def end-block [:end-example])
(spec/def ::not-end (partial not= end-block))
(spec/def ::end #{end-block})
(spec/def ::block (spec/cat
:header ::header
:data (spec/* ::not-end)
:suffix ::end))
(spec/def ::form (spec/alt :block ::block
:form any?))
(spec/def ::forms (spec/* ::form))
In order to exercise the spec, I wrote a small function that produces valid data for the spec, and a size parameter to control the size of the data:
(defn make-sample-data [size]
(transduce
(comp (take size)
cat)
conj
[]
(repeat [:a 1 :b :c [:begin-example "a" {:indent 4}] :d :e [:end-example] 9])))
(make-sample-data 1)
;; => [:a 1 :b :c [:begin-example "a" {:indent 4}] :d :e [:end-example] 9]
(make-sample-data 2)
;; => [:a 1 :b :c [:begin-example "a" {:indent 4}] :d :e [:end-example] 9 :a 1 :b :c [:begin-example "a" {:indent 4}] :d :e [:end-example] 9]
Now I am executing this code:
(dotimes [i 13]
(assert (time (spec/valid? ::forms (make-sample-data i)))))
which produces the following output:
"Elapsed time: 0.077095 msecs"
"Elapsed time: 0.333636 msecs"
"Elapsed time: 0.864481 msecs"
"Elapsed time: 2.198994 msecs"
"Elapsed time: 4.432004 msecs"
"Elapsed time: 9.026142 msecs"
"Elapsed time: 17.709151 msecs"
"Elapsed time: 35.201316 msecs"
"Elapsed time: 73.178516 msecs"
"Elapsed time: 138.93966 msecs"
"Elapsed time: 288.349616 msecs"
"Elapsed time: 569.471181 msecs"
"Elapsed time: 1162.944497 msecs"
It appears to me that for every step of the size parameter, the computation time doubles.
My question is: How can I modify my spec so that validation time is linear with the size of my data?
I'm guessing the performance issue comes from a combination of greedy, branching regex specs with any?
predicates.
The use of any?
in the s/alt :form
regex branch stood out to me. I suppose spec might be evaluating each branch of the s/alt
greedily/exhaustively and then backtracking, and any?
matches everything including values that match your :block
branch. (Notice the spec conforms the same regardless of whether :form any?
branch is defined before or after :block
branch.)
If you can use a more specific predicate than any?
in your top-level s/alt :form
branch, you should see a big improvement. I inlined the spec definitions for brevity:
(s/def ::forms
(s/*
(s/alt :block
(s/cat :header (s/spec
(s/cat :prefix #{:begin-example}
:label string?
:settings (s/? map?)))
:data (s/* #(not= % [:end-example]))
:suffix #{[:end-example]})
:form
(s/nonconforming ;; don't tag results
(s/or :keyword keyword?
:number number?)))))
(time (s/valid? ::forms (make-sample-data 1000)))
"Elapsed time: 84.637513 msecs"
=> true
Note that allowing a collection predicate (e.g. coll?
, vector?
) to that :form
branch degrades performance just like any?
. I suppose this is because the same values are matching both branches of the s/alt
.