Search code examples
clojurestack-overflowprimestail-recursion

Why does this Clojure prime generator raises a StackOverflowError?


I am just learning Clojure and, as usual when lerning new programming languages, one of the first things I tried is implementing the Sieve of Eratosthenes.

I came up with the following solution:

(defn primes
 "Calculate all primes up to the given number"
 [n]
 (loop
   [
    result []
    numbers (range 2 (inc n))
    ]
   (if (empty? numbers)
     result
     (let [[next & rest] numbers]
       (recur (conj result next) (filter (fn [n] (not= 0 (mod n next))) rest)))
     )
   )
 )

It works fine and quite fast for small numbers but for large inputs a StackOverflowError is raised with a suspiciously short stacktrace, eg.:

(primes 100000)
Execution error (StackOverflowError) at (REPL:1).
null
(pst)
StackOverflowError 
    clojure.lang.LazySeq.sval (LazySeq.java:42)
    clojure.lang.LazySeq.seq (LazySeq.java:51)    
    clojure.lang.RT.seq (RT.java:531)
    clojure.core/seq--5387 (core.clj:137)    
    clojure.core/filter/fn--5878 (core.clj:2809)
    clojure.lang.LazySeq.sval (LazySeq.java:42)
    clojure.lang.LazySeq.seq (LazySeq.java:51)
    clojure.lang.RT.seq (RT.java:531)
    clojure.core/seq--5387 (core.clj:137)
    clojure.core/filter/fn--5878 (core.clj:2809)
    clojure.lang.LazySeq.sval (LazySeq.java:42)
    clojure.lang.LazySeq.seq (LazySeq.java:51)
=> nil

I was under the impression that recur implements tail recursion if it is evaluated last in a loop form and my first question is if this is really the case here. My second question is why the stack trace is so short for a StackOverflowError. I also have problems interpreting the stacktrace, ie. what line corresponds to what form.

I am only interested in better or more Clojure-like solutions if they provide insights for these questions, since otherwise I would like to find them by myself. Thank you!


Solution

  • Slightly modified, with comments to describe what is happening on each line, this is your function:

    (defn primes
      "Calculate all primes up to the given number"
      [n]
      ;; `loop` is not lazy, it runs until it produces a result:
      (loop [result []
             ;; a lazy sequence implemented by clojure.lang.LongRange:
             numbers (range 2 (inc n))]
        (if (not (nil? (seq numbers)))
          result
          (let [current (first numbers)
                remaining (rest numbers)]
            (recur
             ;; `conj` on a vector returns a vector (non-lazy):
             (conj result current)
             ;; `filter` on a lazy sequence returns a new lazy sequence:
             (filter (fn [n] (not= 0 (mod n next)))
                     remaining))))))
    

    The key is that filter at the end.

    Most lazy sequence operations such as filter work by wrapping one lazy sequence in another. On each iteration of the loop, filter adds another layer of lazy sequence, like this:

    (filter (fn [n] (not= 0 (mod n 5)))       ; returns a LazySeq
      (filter (fn [n] (not= 0 (mod n 4)))     ; returns a LazySeq
        (filter (fn [n] (not= 0 (mod n 3)))   ; returns a LazySeq
          (filter (fn [n] (not= 0 (mod n 2))) ; returns a LazySeq
            remaining))))
    

    The LazySeq objects stack up, each one holding a reference to the previous.

    With most lazy sequences, the wrapping doesn't matter because they automatically "unwrap" as soon as you request a value. That happens in LazySeq.seq.

    This is one case where it does matter, because your loop builds up so many layers of lazy sequence objects that the nested calls to LazySeq.seq and .sval overflow the maximum stack size allowed by the JVM. That's what you see in the stacktrace.

    (This also has memory implications, since a reference to the start of the sequence prevents any of the others from being garbage-collected, what Clojure programmers call "holding on to the head" of the sequence.)

    The more general issue with this function is mixing lazy (filter) and non-lazy (loop) operations. That's often a source of problems, so Clojure programmers learn to avoid it out of habit.

    As Alan suggests, you can avoid the problem by using only non-lazy operations, such as filterv instead of filter, which forces the lazy sequence into a vector.

    Almost any style of lazy evaluation can exhibit some variation of this problem. I described it in Clojure don'ts: concat. For another example see foldr versus foldl in Haskell.

    Even without laziness, deeply-nested object trees can cause a StackOverflow, for examples in Java I found xstream#88 or circe#1074.