performance gain while using core.reducers

I tried the below to compare the performance of core/map vs transducers vc core.reducers/map vs core.reducers/fold -

    (time (->> (range 10000)
            (r/map inc)
            (r/map inc)
            (r/map inc)
            (into [])))

;; core.reducers/map
;; "Elapsed time: 3.962802 msecs"


(time (->> (range 10000)
            vec
            (r/map inc)
            (r/map inc)
            (r/map inc)
            (r/fold conj)))

;; core.reducers/fold
;; "Elapsed time: 3.318809 msecs"


(time (->> (range 10000)
            (map inc)
            (map inc)
            (map inc)))

;; core/map
;; "Elapsed time: 0.148433 msecs"



(time (->> (range 10000)
            (sequence (comp (map inc)
                         (map inc)
                         (map inc)))))

;; transducers
;; "Elapsed time: 0.215037 msecs"

1) My expectation was that core/map will have the highest time, however it has the lowest time. Why is it more performant than transducers, when intermediate seqs dont get created for transducers, and transducers should be faster ?

2) Why is the core.reducers/fold version not significantly faster than the core.reducers/map version, shouldnt it have parallelized the operation ?

3) Why are the core.reducers versions so slow as compared to their lazy counterparts, the whole sequence is being realized at the end, so should not eager evaluation be more performant than the lazy one ?

Solution

;; core/map without transducers

(quick-bench (doall (->> [1 2 3 4]
                         (map inc)
                         (map inc)
                         (map inc))))

;; Evaluation count : 168090 in 6 samples of 28015 calls.
;; Execution time mean : 3.651319 µs
;; Execution time std-deviation : 88.055389 ns
;; Execution time lower quantile : 3.584198 µs ( 2.5%)
;; Execution time upper quantile : 3.799202 µs (97.5%)
;; Overhead used : 7.546189 ns
;; Found 1 outliers in 6 samples (16.6667 %)
;; low-severe    1 (16.6667 %)
;; Variance from outliers : 13.8889 % Variance is moderately inflated by outliers



;; transducers with a non lazy seq

(quick-bench (doall (->> [1 2 3 4]
                         (sequence (comp (map inc)
                                      (map inc)
                                      (map inc))))))

;; Evaluation count : 214902 in 6 samples of 35817 calls.
;; Execution time mean : 2.776696 µs
;; Execution time std-deviation : 24.377634 ns
;; Execution time lower quantile : 2.750123 µs ( 2.5%)
;; Execution time upper quantile : 2.809933 µs (97.5%)
;; Overhead used : 7.546189 ns


;;;;
;; tranducers with a lazy seq
;;;;

(quick-bench (doall (->> (range 1 5)
                         (sequence (comp (map inc)
                                      (map inc)
                                      (map inc))))))

;; Evaluation count : 214230 in 6 samples of 35705 calls.
;; Execution time mean : 3.361220 µs
;; Execution time std-deviation : 622.084860 ns
;; Execution time lower quantile : 2.874093 µs ( 2.5%)
;; Execution time upper quantile : 4.328653 µs (97.5%)
;; Overhead used : 7.546189 ns


;;;;
;; core.reducers
;;;;

(quick-bench (->> [1 2 3 4]
                  (r/map inc)
                  (r/map inc)
                  (r/map inc)))

;; Evaluation count : 6258966 in 6 samples of 1043161 calls.
;; Execution time mean : 89.610689 ns
;; Execution time std-deviation : 0.936108 ns
;; Execution time lower quantile : 88.786938 ns ( 2.5%)
;; Execution time upper quantile : 91.128549 ns (97.5%)
;; Overhead used : 7.546189 ns
;; Found 1 outliers in 6 samples (16.6667 %)
;; low-severe    1 (16.6667 %)
;; Variance from outliers : 13.8889 % Variance is moderately inflated by outliers



;;;; Evaluating a larger range so that the chunking comes into play ;;;;


;; core/map without transducers

(quick-bench (doall (->> (range 500)
                         (map inc)
                         (map inc)
                         (map inc))))



;; transducers with a non lazy seq

(quick-bench (doall (->> (doall (range 500))
                         (sequence (comp (map inc)
                                      (map inc)
                                      (map inc))))))

;; Evaluation count : 2598 in 6 samples of 433 calls.
;; Execution time mean : 237.164523 µs
;; Execution time std-deviation : 5.336417 µs
;; Execution time lower quantile : 231.751575 µs ( 2.5%)
;; Execution time upper quantile : 244.836021 µs (97.5%)
;; Overhead used : 7.546189 ns


;; tranducers with a lazy seq

(quick-bench (doall (->> (range 500)
                         (sequence (comp (map inc)
                                      (map inc)
                                      (map inc))))))

;; Evaluation count : 3210 in 6 samples of 535 calls.
;; Execution time mean : 183.866148 µs
;; Execution time std-deviation : 1.799841 µs
;; Execution time lower quantile : 182.137656 µs ( 2.5%)
;; Execution time upper quantile : 186.347677 µs (97.5%)
;; Overhead used : 7.546189 ns


;; core.reducers

(quick-bench (->> (range 500)
                  (r/map inc)
                  (r/map inc)
                  (r/map inc)))

;; Evaluation count : 4695642 in 6 samples of 782607 calls.
;; Execution time mean : 126.973627 ns
;; Execution time std-deviation : 5.972927 ns
;; Execution time lower quantile : 122.471060 ns ( 2.5%)
;; Execution time upper quantile : 134.181056 ns (97.5%)
;; Overhead used : 7.546189 ns

Based on the above answers / comments I tried the benchmarking again -

1) The reducers version is faster on a magnitude of 10^3.

2) This applies for both small collections (4 elements) and larger ones (500 element) where chunking can happen for lazy seqs.

3) Thus even with chunking, lazy evaluation is much slower than eager evaluation.

Corrections based on the remark :- the reducers only get executed on the reduce operation, which was not getting executed in the above code -

(quick-bench (->> [1 2 3 4]
                  (r/map inc)
                  (r/map inc)
                  (r/map inc)
                  (into [])))

;; Evaluation count : 331302 in 6 samples of 55217 calls.
;; Execution time mean : 2.035153 µs
;; Execution time std-deviation : 314.070348 ns
;; Execution time lower quantile : 1.720615 µs ( 2.5%)
;; Execution time upper quantile : 2.381706 µs (97.5%)
;; Overhead used : 7.546189 ns

(quick-bench (->> (range 500)
                  (r/map inc)
                  (r/map inc)
                  (r/map inc)
                  (into [])))

;; Evaluation count : 3870 in 6 samples of 645 calls.
;; Execution time mean : 150.349870 µs
;; Execution time std-deviation : 2.825632 µs
;; Execution time lower quantile : 146.468231 µs ( 2.5%)
;; Execution time upper quantile : 153.271325 µs (97.5%)
;; Overhead used : 7.546189 ns

So the reducer versions are 30-70 % faster than the transducer counterparts. The performance differential increases as the data set size increases.