clojure stack-overflow lazy-evaluation thunk

Clojure thunks: stack overflow with [0] but not '(0)?

Here's a snippet of code that gives me a StackOverflowError (boiled down from an actual example in my codebase):

( ->> (range 3000)
      (mapcat #(concat [0] (take 100 (repeat %))))
      (reduce (constantly nil))
      (count))

(Note: this code isn't designed to do anything other than demonstrate the problem or return zero.)

I can "rescue" it by any of the following steps:

Remove the reduce line
Change [0] to '(0)
Add a (take 100000000) (or any integer) at any point between mapcat and count.

I'm basically baffled by this behavior (especially #2). I'd appreciate any input.

(I have a feeling this may be related to Why does reduce give a StackOverflowError in Clojure?, but I can't quite tell how -- so if it is related, I'd appreciate an explanation of why.)

Solution

Under normal circumstances, reduce operates using a loop/recur construct and uses constant stack space. However, you've hit a nasty corner case caused by reducing over a sequence produced by providing concat with alternating chunked and non-chunked sequences (the vector [0] is chunked; the seq produced by (take 100 (repeat %)) is non-chunked).

When the first argument to concat is a chunked sequence, then it will return a lazy sequence that will use chunk-cons to produce another chunked sequence. Otherwise, it will use cons to produce a non-chunked sequence.

Meanwhile, the implementation of reduce uses the InternalReduce protocol (defined in clojure.core.protocols) which provides an internal-reduce function for structures that can reduce themselves more efficiently than with the default first/next recursion. The internal-reduce implementation for chunked sequences uses chunk functions to consume the chunked items in a loop up until it's left with a non-chunked sequence, then calls internal-reduce on the remainder. The default internal-reduce implementation similarly uses first/next to consume items in a loop until the underlying seq type changes, then calls internal-reduce on the new seq type to dispatch to the appropriate optimized version. As you progress through the seq that concat produced, alternating between chunked and non-chunked sub-sequences, the internal-reduce calls pile up on the stack and eventually blow it.

A simpler illustration of this case is:

;; All chunked sub-seqs is OK
user> (reduce + (apply concat (take 10000 (repeat [1]))))
10000

;; All non-chunked sub-seqs is OK
user> (reduce + (apply concat (take 10000 (repeat '(1)))))
10000

;; Interleaved chunked and non-chunked sub-seqs blows the stack
user> (reduce + (apply concat (take 10000 (interleave (repeat [1]) (repeat '(1))))))
StackOverflowError   clojure.lang.LazySeq.seq (LazySeq.java:60)

Examining the stack trace:

StackOverflowError 
    clojure.core/seq (core.clj:133)
    clojure.core/interleave/fn--4525 (core.clj:3901)
    clojure.lang.LazySeq.sval (LazySeq.java:42)
    clojure.lang.LazySeq.seq (LazySeq.java:60)
    clojure.lang.RT.seq (RT.java:484)
    clojure.core/seq (core.clj:133)
    clojure.core/take/fn--4232 (core.clj:2554)
    clojure.lang.LazySeq.sval (LazySeq.java:42)
    clojure.lang.LazySeq.seq (LazySeq.java:60)
    clojure.lang.Cons.next (Cons.java:39)
    clojure.lang.RT.next (RT.java:598)
    clojure.core/next (core.clj:64)
    clojure.core/concat/cat--3925/fn--3926 (core.clj:694)
    clojure.lang.LazySeq.sval (LazySeq.java:42)
    clojure.lang.LazySeq.seq (LazySeq.java:60)
    clojure.lang.ChunkedCons.chunkedNext (ChunkedCons.java:59)
    clojure.core/chunk-next (core.clj:660)
    clojure.core.protocols/fn--6041 (protocols.clj:101)
    clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
    clojure.core.protocols/fn--6034 (protocols.clj:147)
    clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
    clojure.core.protocols/fn--6041 (protocols.clj:104)
    clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
    clojure.core.protocols/fn--6034 (protocols.clj:147)
    clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
    clojure.core.protocols/fn--6041 (protocols.clj:104)
    clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
    clojure.core.protocols/fn--6034 (protocols.clj:147)
    clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
    clojure.core.protocols/fn--6041 (protocols.clj:104)

As for your workarounds:

Clearly, avoiding reduce prevents the problem altogether.
Changing [0] to '(0) replaces the chunked sub-sequences with non-chunked ones, circumventing the optimization for chunked seqs in internal-reduce and allowing the reduction to happen in a single loop with constant stack space.
Inserting take creates a new, non-chunked sequence, composed entirely of cons cells.