Search code examples
performanceclojure

Clojure comp doesn't tail-call-optimize (can create StackOverflow exception)


I got stuck on a Clojure program handling a very large amount of data (image data). When the image was larger than 128x128, the program would crash with a StackOverflow exception. Because it worked for smaller images, I knew it wasn't an infinite loop.

There were lots of possible causes of high memory usage, so I spent time digging around. Making sure I was using lazy sequences correctly, making sure to use recur as appropriate, etc. The turning point came when I realized that this:

at clojure.core$comp$fn__5792.invoke(core.clj:2569)
at clojure.core$comp$fn__5792.invoke(core.clj:2569)
at clojure.core$comp$fn__5792.invoke(core.clj:2569)

referred to the comp function.

So I looked at where I was using it:

(defn pipe [val funcs]
  ((apply comp funcs) val))

(pipe the-image-vec
  (map
    (fn [index] (fn [image-vec] ( ... )))
    (range image-size)))

I was doing per-pixel operations, mapping a function to each pixel to process it. Interestingly, comp doesn't appear to benefit from tail-call optimization, or do any kind of sequential application of functions. It seems that it was just composing things in the basic way, which when there are 65k functions, understandably overflows the stack. Here's the fixed version:

(defn pipe [val funcs]
  (cond
    (= (count funcs) 0) val
    :else               (recur ((first funcs) val) (rest funcs))))

recur ensures the recursion gets tail-call optimized, preventing a stack buildup.

If anybody can explain why comp works this way (or rather, doesn't work this way), I'd love to be enlightened.


Solution

  • First, a more straightforward MCVE:

    (def fs (repeat 1e6 identity))
    ((apply comp fs)  99)
    
    #<StackOverflowError...
    

    But why does this happen? If you look at the (abridged) comp source:

    (defn comp
      ([f g] 
         (fn 
           ([x] (f (g x)))
      ([f g & fs]
         (reduce1 comp (list* f g fs))))
    

    You can see that the whole thing is basically just 2 parts:

    • The first argument overload that does the main work of wrapping each composed function call in another function.

    • Reducing over the functions using comp.

    I believe the first point is the problem. comp works by taking the list of functions and continually wrapping each set of calls in functions. Eventually, this will exhaust the stack space if you try to compose too many functions, as it ends up creating a massive function that's wrapping many other functions.

    So, why can TCO not help here? Because unlike most StackOverflowErrors, recursion is not the problem here. The recursive calls only ever reach one frame deep in the variardic case at the bottom. The problem is the building up of a massive function, which can't simply be optimizated away.

    Why were you able to "fix" it though? Because you have access to val, so you're able to evaluate the functions as you go instead of building up one function to call later. comp was written using a simple implementation that works fine for most cases, but fails for extreme cases like the one you presented. It's fairly trivial to write a specialized version that handles massive collections though:

    (defn safe-comp [& fs]
      (fn [value]
        (reduce (fn [acc f]
                  (f acc))
                value
                (reverse fs))))
    

    Of course note though, this doesn't handle multiple arities like the core version does.

    Honestly though, in 3 and a bit years of using Clojure, I've never once written (apply comp ...). While it's certainly possible you have experienced a case I've never needed to deal with, it's more likely that you're using the wrong tool for the job here. When this code is complete, post it on Code Review and we may be able to suggest better ways of accomplishing what you're trying to do.