Search code examples
clojurelazy-sequences

why is this looping function so slow compared to map?


I looked at maps source code which basically keeps creating lazy sequences. I would think that iterating over a collection and adding to a transient vector would be faster, but clearly it isn't. What don't I understand about clojures performance behavior?

;=> (time (do-with / (range 1 1000) (range 1 1000)))
;"Elapsed time: 23.1808 msecs"
;
; vs
;=> (time (doall (map #(/ %1 %2) (range 1 1000) (range 1 1000))))
;"Elapsed time: 2.604174 msecs"
(defn do-with
  [fn coll1 coll2]
  (let [end (count coll1)]
    (loop [i   0
           res (transient [])]
        (if
          (= i end)
            (persistent! res)
            (let [x (nth coll1 i)
                  y (nth coll2 i)
                  r (fn x y)]
              (recur (inc i) (conj! res r))) 
                  ))))

Solution

  • In order of conjectured impact on relative results:

    1. Your do-with function uses nth to access the individual items in the input collections. nth operates in linear time on ranges, making do-with quadratic. Needless to say, this will kill performance on large collections.

    2. range produces chunked seqs and map handles those extremely efficiently. (Essentially it produces chunks of up to 32 elements -- here it will in fact be exactly 32 -- by running a tight loop over the internal array of each input chunk in turn, placing results in internal arrays of output chunks.)

    3. Benchmarking with time doesn't give you steady state performance. (Which is why one should really use a proper benchmarking library; in the case of Clojure, Criterium is the standard solution.)

    Incidentally, (map #(/ %1 %2) xs ys) can simply be written as (map / xs ys).

    Update:

    I've benchmarked the map version, the original do-with and a new do-with version with Criterium, using (range 1 1000) as both inputs in each case (as in the question text), obtaining the following mean execution times:

    ;;; (range 1 1000)
    new do-with           170.383334 µs
    (doall (map ...))     230.756753 µs
    original do-with       15.624444 ms
    

    Additionally, I've repeated the benchmark using a vector stored in a Var as input rather than ranges (that is, with (def r (vec (range 1 1000))) at the start and using r as both collection arguments in each benchmark). Unsurprisingly, the original do-with came in first -- nth is very fast on vectors (plus using nth with a vector avoids all the intermediate allocations involved in seq traversal).

    ;;; (vec (range 1 1000))
    original do-with       73.975419 µs
    new do-with            87.399952 µs
    (doall (map ...))     153.493128 µs
    

    Here's the new do-with with linear time complexity:

    (defn do-with [f xs ys]
      (loop [xs  (seq xs)
             ys  (seq ys)
             ret (transient [])]
        (if (and xs ys)
          (recur (next xs)
                 (next ys)
                 (conj! ret (f (first xs) (first ys))))
          (persistent! ret))))