I have the following code segment. It works expected.
(use '(incanter.stats))
(defmacro dbg [body]
`(let [x# ~body]
(println "dbg:" '~body "=" x#)
x#))
(defn sorted-map-by-values
"create a map sorted in descending order, first by value, then by key"
[super-map & reverse]
(dbg "Start to sort")
(dbg super-map)
(let [compare-value (if (nil? reverse) 1 -1)]
(into (sorted-map-by
(fn [key1 key2]
(let [val1 (super-map key1) val2 (super-map key2)]
(cond
(= val1 val2) (.compareTo (str key2) (str key1)) ; use string representation of list, to overcome that there is no .compareTo for AarrySeq
(< (dbg val1) (dbg val2)) compare-value
:else (- compare-value)))))
super-map))
)
(def search (clojure.string/split "garbage stuff" #"\s"))
(def candidate (clojure.string/split "stuff" #"\s"))
(sorted-map-by-values (let [pairs-init (for [x search y candidate] [x y])]
(loop [pairs pairs-init distance-map {}]
(if (empty? pairs)
distance-map
(let [pair (sort (first pairs))
updated-map (if (nil? (get distance-map pair))
(merge distance-map {pair (apply incanter.stats/levenshtein-distance pair)})
distance-map)]
(recur (rest pairs) updated-map)))))
true)
But if I replace the last form by the following:
(let [pairs-init (for [x search y candidate] [x y])]
(loop [pairs pairs-init distance-map {}]
(if (empty? pairs)
distance-map
(let [pair (sort (first pairs))
updated-map (if (nil? (get distance-map pair))
(sorted-map-by-values ; <- move sorted-map-by-values to here
(merge distance-map {pair (apply incanter.stats/levenshtein-distance pair)})
true)
distance-map)]
(recur (rest pairs) (dbg updated-map))))))
then I got an error:
java.lang.NullPointerException: null
Numbers.java:961 clojure.lang.Numbers.ops
Numbers.java:219 clojure.lang.Numbers.lt
(Unknown Source) user/sorted-map-by-values[fn]
AFunction.java:47 clojure.lang.AFunction.compare
PersistentTreeMap.java:311 clojure.lang.PersistentTreeMap.doCompare
PersistentTreeMap.java:298 clojure.lang.PersistentTreeMap.entryAt
PersistentTreeMap.java:278 clojure.lang.PersistentTreeMap.valAt
PersistentTreeMap.java:283 clojure.lang.PersistentTreeMap.valAt
RT.java:645 clojure.lang.RT.get
It seemed that the error happened at line:
(< (dbg val1) (dbg val2)) compare-value
The dbg trace is as follows:
Instarepl:
dbg: Start to sort = Start to sort
Instarepl:
dbg: super-map = {(garbage stuff) 7}
Instarepl:
dbg: updated-map = {(garbage stuff) 7}
Instarepl:
dbg: val1 = nil
Instarepl:
dbg: val2 = 7
It does not make sense that when there is only one mapping in a map, the comparator function should not be called. By my tracing the code, it seems that the error actually happened at second iteration of the loop-recur, as the dbg trace of updated-map value shows that the first iteration including returning from sorted-by-map-values was successful, but I was not able to display the second entry to sorted-map-by-values
, it seems that there is unknown another entry to sorted-map-by-values
I guess that sorted-map might be a different type that can not be applied to sorted-by-values again?
Could you shed some light of the strange behavior or I miss some part of Clojure language execution model, something related to lazy-evaluation?
Thanks a lot!
The problem is that distance-map
is a sorted-map which means that any conj will invoke the sort fn. In your case merge
is the one trying to do the conj.
Longer explanation: On the second iteration of the loop the distance-map
is an instance of sorted-map
, which then is merged with {pair (apply incanter.stats/levenshtein-distance pair)}
. Note that this merge
is called before calling sorted-map-by-values
for the second time.
This means that merge
is trying to add to the sorted-map
the pair [(stuff stuff) 0]
, which means that the sort fn of the sorted-map is being called. That sort fn closes over the version of the super-map that was used to create it, which contains only the (garbage stuff)
key, hence the lookup for (stuff stuff)
is nil.