Search code examples
clojurestack-overflowlazy-evaluationthunk

Clojure thunks: stack overflow with [0] but not '(0)?


Here's a snippet of code that gives me a StackOverflowError (boiled down from an actual example in my codebase):

( ->> (range 3000)
      (mapcat #(concat [0] (take 100 (repeat %))))
      (reduce (constantly nil))
      (count))

(Note: this code isn't designed to do anything other than demonstrate the problem or return zero.)

I can "rescue" it by any of the following steps:

  1. Remove the reduce line
  2. Change [0] to '(0)
  3. Add a (take 100000000) (or any integer) at any point between mapcat and count.

I'm basically baffled by this behavior (especially #2). I'd appreciate any input.

(I have a feeling this may be related to Why does reduce give a StackOverflowError in Clojure?, but I can't quite tell how -- so if it is related, I'd appreciate an explanation of why.)


Solution

  • Under normal circumstances, reduce operates using a loop/recur construct and uses constant stack space. However, you've hit a nasty corner case caused by reducing over a sequence produced by providing concat with alternating chunked and non-chunked sequences (the vector [0] is chunked; the seq produced by (take 100 (repeat %)) is non-chunked).

    When the first argument to concat is a chunked sequence, then it will return a lazy sequence that will use chunk-cons to produce another chunked sequence. Otherwise, it will use cons to produce a non-chunked sequence.

    Meanwhile, the implementation of reduce uses the InternalReduce protocol (defined in clojure.core.protocols) which provides an internal-reduce function for structures that can reduce themselves more efficiently than with the default first/next recursion. The internal-reduce implementation for chunked sequences uses chunk functions to consume the chunked items in a loop up until it's left with a non-chunked sequence, then calls internal-reduce on the remainder. The default internal-reduce implementation similarly uses first/next to consume items in a loop until the underlying seq type changes, then calls internal-reduce on the new seq type to dispatch to the appropriate optimized version. As you progress through the seq that concat produced, alternating between chunked and non-chunked sub-sequences, the internal-reduce calls pile up on the stack and eventually blow it.

    A simpler illustration of this case is:

    ;; All chunked sub-seqs is OK
    user> (reduce + (apply concat (take 10000 (repeat [1]))))
    10000
    
    ;; All non-chunked sub-seqs is OK
    user> (reduce + (apply concat (take 10000 (repeat '(1)))))
    10000
    
    ;; Interleaved chunked and non-chunked sub-seqs blows the stack
    user> (reduce + (apply concat (take 10000 (interleave (repeat [1]) (repeat '(1))))))
    StackOverflowError   clojure.lang.LazySeq.seq (LazySeq.java:60)
    

    Examining the stack trace:

    StackOverflowError 
        clojure.core/seq (core.clj:133)
        clojure.core/interleave/fn--4525 (core.clj:3901)
        clojure.lang.LazySeq.sval (LazySeq.java:42)
        clojure.lang.LazySeq.seq (LazySeq.java:60)
        clojure.lang.RT.seq (RT.java:484)
        clojure.core/seq (core.clj:133)
        clojure.core/take/fn--4232 (core.clj:2554)
        clojure.lang.LazySeq.sval (LazySeq.java:42)
        clojure.lang.LazySeq.seq (LazySeq.java:60)
        clojure.lang.Cons.next (Cons.java:39)
        clojure.lang.RT.next (RT.java:598)
        clojure.core/next (core.clj:64)
        clojure.core/concat/cat--3925/fn--3926 (core.clj:694)
        clojure.lang.LazySeq.sval (LazySeq.java:42)
        clojure.lang.LazySeq.seq (LazySeq.java:60)
        clojure.lang.ChunkedCons.chunkedNext (ChunkedCons.java:59)
        clojure.core/chunk-next (core.clj:660)
        clojure.core.protocols/fn--6041 (protocols.clj:101)
        clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
        clojure.core.protocols/fn--6034 (protocols.clj:147)
        clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
        clojure.core.protocols/fn--6041 (protocols.clj:104)
        clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
        clojure.core.protocols/fn--6034 (protocols.clj:147)
        clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
        clojure.core.protocols/fn--6041 (protocols.clj:104)
        clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
        clojure.core.protocols/fn--6034 (protocols.clj:147)
        clojure.core.protocols/fn--6005/G--6000--6014 (protocols.clj:19)
        clojure.core.protocols/fn--6041 (protocols.clj:104)
    

    As for your workarounds:

    • Clearly, avoiding reduce prevents the problem altogether.
    • Changing [0] to '(0) replaces the chunked sub-sequences with non-chunked ones, circumventing the optimization for chunked seqs in internal-reduce and allowing the reduction to happen in a single loop with constant stack space.
    • Inserting take creates a new, non-chunked sequence, composed entirely of cons cells.