I am experimenting with clojure's lazy sequences. In order to see when the evaluation of an item would occur, I created a function called square that prints the result before returning it. I then apply this function to a vector using map.
(defn square [x]
(let [result (* x x)]
(println "printing " result)
result))
(def s (map square [1 2 3 4 5])) ; outputs nothing
Here in my declaration of s, the REPL does not output anything. This signals that the computation has not started yet. This appears to be correct. I then do:
(first s)
The function "first" takes only the first item. So I expect that only 1 will be evaluated. My expectation is the REPL will output the following:
printing 1
1
However, the REPL outputted the following instead.
printing 1
printing 4
printing 9
printing 16
printing 25
1
So rather than evaluating only the first item, it seems it evaluates all items, even though I am accessing just the first item.
If the state of a lazy sequence can only be either all values computed and no values computed, then how can it gain the advantages of lazy evaluation? I came from a scheme background and I was expecting more like the behavior of streams. Looks like I am mistaken. Can anyone explain what is going on?
Laziness isn't all-or-nothing, but some implementations of seq operate on 'chunks' of the input sequence (see here for an explanation). This is the case for vector which you can test for with chunked-seq?
:
(chunked-seq? (seq [1 2 3 4 5]))
When given a collection map
checks to see if the underlying sequence is chunked, and if so evaluates the result on a per-chunk basis instead of an element at a time.
The chunk size is usually 32 so you can see this behaviour by comparing the result of
(first (map square (vec (range 35))))
This should only display a message for the first 32 items and not the entire sequence.