Search code examples
clojuresortedmap

Clojure: Erratic behavior of sorted-map-by in the context of merge and recur operation?


I have the following code segment. It works expected.

(use '(incanter.stats))

(defmacro dbg [body]
      `(let [x# ~body]
         (println "dbg:" '~body "=" x#)
         x#))

(defn sorted-map-by-values
  "create a map sorted in descending order, first by value, then by key"
  [super-map & reverse]
  (dbg "Start to sort")
  (dbg super-map)
  (let [compare-value (if (nil? reverse) 1 -1)]
    (into (sorted-map-by
           (fn [key1 key2]
             (let [val1 (super-map key1) val2 (super-map key2)]
               (cond
                (= val1 val2) (.compareTo (str key2) (str key1)) ; use string representation of list, to overcome that there is no .compareTo for AarrySeq
                (< (dbg val1) (dbg val2)) compare-value
                :else (- compare-value)))))
          super-map))
  )

(def search (clojure.string/split "garbage stuff" #"\s"))

(def candidate (clojure.string/split "stuff" #"\s"))

(sorted-map-by-values (let [pairs-init (for [x search y candidate] [x y])]
                            (loop [pairs pairs-init distance-map {}]
                              (if (empty? pairs)
                                distance-map
                                (let [pair (sort (first pairs))
                                      updated-map (if (nil? (get distance-map pair))
                                                    (merge distance-map {pair (apply incanter.stats/levenshtein-distance pair)})
                                                    distance-map)]
                                  (recur (rest pairs) updated-map))))) 
                          true)

But if I replace the last form by the following:

(let [pairs-init (for [x search y candidate] [x y])]
  (loop [pairs pairs-init distance-map {}]
    (if (empty? pairs)
      distance-map
      (let [pair (sort (first pairs))
            updated-map (if (nil? (get distance-map pair))
                          (sorted-map-by-values ; <- move sorted-map-by-values to here
                           (merge distance-map {pair (apply incanter.stats/levenshtein-distance pair)})
                           true)
                          distance-map)]
        (recur (rest pairs) (dbg updated-map))))))

then I got an error:

java.lang.NullPointerException: null
            Numbers.java:961 clojure.lang.Numbers.ops
            Numbers.java:219 clojure.lang.Numbers.lt
            (Unknown Source) user/sorted-map-by-values[fn]
           AFunction.java:47 clojure.lang.AFunction.compare
  PersistentTreeMap.java:311 clojure.lang.PersistentTreeMap.doCompare
  PersistentTreeMap.java:298 clojure.lang.PersistentTreeMap.entryAt
  PersistentTreeMap.java:278 clojure.lang.PersistentTreeMap.valAt
  PersistentTreeMap.java:283 clojure.lang.PersistentTreeMap.valAt
                 RT.java:645 clojure.lang.RT.get

It seemed that the error happened at line:

(< (dbg val1) (dbg val2)) compare-value

The dbg trace is as follows:

Instarepl:  
dbg: Start to sort = Start to sort
Instarepl:  
dbg: super-map = {(garbage stuff) 7}
Instarepl:  
dbg: updated-map = {(garbage stuff) 7}
Instarepl:  
dbg: val1 = nil
Instarepl:  
dbg: val2 = 7

It does not make sense that when there is only one mapping in a map, the comparator function should not be called. By my tracing the code, it seems that the error actually happened at second iteration of the loop-recur, as the dbg trace of updated-map value shows that the first iteration including returning from sorted-by-map-values was successful, but I was not able to display the second entry to sorted-map-by-values, it seems that there is unknown another entry to sorted-map-by-values

I guess that sorted-map might be a different type that can not be applied to sorted-by-values again?

Could you shed some light of the strange behavior or I miss some part of Clojure language execution model, something related to lazy-evaluation?

Thanks a lot!


Solution

  • The problem is that distance-map is a sorted-map which means that any conj will invoke the sort fn. In your case merge is the one trying to do the conj.

    Longer explanation: On the second iteration of the loop the distance-map is an instance of sorted-map, which then is merged with {pair (apply incanter.stats/levenshtein-distance pair)}. Note that this merge is called before calling sorted-map-by-values for the second time.

    This means that merge is trying to add to the sorted-map the pair [(stuff stuff) 0], which means that the sort fn of the sorted-map is being called. That sort fn closes over the version of the super-map that was used to create it, which contains only the (garbage stuff) key, hence the lookup for (stuff stuff) is nil.