Search code examples
haskellclojurestack-overflowreducefold

Haskell's foldr/l and Clojure's reduce


I decided to get serious with Haskell, and am wrapping my head around the use of foldl and foldr. They really feel a lot like Clojure's reduce - but I might be wrong, and ran into a problem soon enough, which I hope someone will easily explain.

Working with this doc: https://wiki.haskell.org/Foldr_Foldl_Foldl'

Before getting too deep into implementing my own versions of foldr/foldl, I decided to first test the existing ones from Prelude:

± |master U:2 ✗| → ghci
GHCi, version 8.6.3: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /Users/akarpov/.ghc/ghci.conf
Prelude> foldr (+) 0 [1..9999999]
49999995000000
Prelude> foldr (+) 0 [1..99999999]
*** Exception: stack overflow

Didn't see that coming (same result when using foldl); I rather expected something along the lines of Clojure:

> (time (reduce +' (range 1 99999999)))
"Elapsed time: 3435.638258 msecs"
4999999850000001

The only obvious (and irrelevant) difference is using +' rather than +, but it's only to accommodate the JVM's type system - the number produced doesn't fit into a [default] Long, and +' will automatically use a BigInteger when needed. Most importantly, no stack overflow. It seems to indicate, therefore, that folding/reduction in Haskell/Clojure is either implemented very differently, or my use of haskell's implementation is wrong.

in case it is relevant, these are global-project settings: - packages: [] - resolver: lts-13.8


Solution

  • Problem

    As the Wiki explains, the function (+) is strict in both its arguments, which means that when you try to do 1 + (2 + 3), you first need to calculate (2 + 3). While this may not seem like a big problem at first, it becomes when you have a long list. Quoting the wiki,

    to evaluate: 1 + (2 + (3 + (4 + (...)))), 1 is pushed on the stack.

    Then: 2 + (3 + (4 + (...))) is evaluated. So 2 is pushed on the stack.

    Then: 3 + (4 + (...)) is evaluated. So 3 is pushed on the stack.

    Then: 4 + (...) is evaluated. So 4 is pushed on the stack.

    Then: your limited stack will eventually run full when you evaluate a large enough chain of (+)s. This then triggers the stack overflow exception.

    I don't much about Clojure, but if (+') works, then it definitely does not need both its arguments to be evaluated before being reduced, and that is also the solution in Haskell.

    Solution

    Foldl would not solve the problem, as it is known that foldl has to travel the whole list twice before returning a result --which is not efficient--, and even then (+) is still strict, so the reducible expressions are not reduced.

    To solve this, you must use a non-strict function. In the standard Prelude, seq :: a -> b -> b can be used exactly for that, and that's how foldl' works.

    Again quoting the wiki,

    foldr is not only the right fold, it is also most commonly the right fold to use, in particular when transforming lists (or other foldables) into lists with related elements in the same order. Notably, foldr will be effective for transforming even infinite lists into other infinite lists. For such purposes, it should be your first and most natural choice. For example, note that foldr (:) []==id.

    The problem with foldl' is that it reverses the list. If you have a commutative function that is not a problem, so if your list is finite (remember foldl has to travel it all), foldl' is usually better. On the other hand, if your function for some reasons does not necessarily need the whole list, or the list might be infinite, go for foldr