Search code examples
haskellrecursionfold

Why does foldl seems to be harmful despite being tail-recursive?


I've always believed that tail-recursive functions are better in terms of performance than non tail-recursive versions. So, counting items in a list might be implemented like so:

count:: [a] -> Int
count [] = 0
count (x:xs) = 1 + count xs

But this function is not tail recursive, and so is not as performant as possible. The fix is to accumulate counts, like so:

_count:: Num b => b -> [a] -> b
_count b [] = b
_count b (x:xs) = _count (b + 1) xs

count:: [a] -> Int
count = _count 0

This can be easily implemented with a tail-recursive fold:

myfold:: (b -> a -> b) -> b -> [a] -> b
myfold f b [] = b
myfold f b (x:xs) = myfold f (f b x) xs

count = myfold incr 0
   where incr c _ = c + 1

But, then I remembered something about left vs right folds. It turned out myfold is a left fold, which according to Real World Haskell shouldn't be used:

This is convenient for testing, but we will never use foldl in practice.

...because of the thunking of the application of f b x.

So, I tried to rewrite myfold as a right fold:

myfoldr:: (a -> b -> b) -> b -> [a] -> b
myfoldr f b [] = b
myfoldr f b (x:xs) = f x (myfoldr f b xs)

But that's not tail-recursive.

It seems, then, that Haskell non-strict evaluation makes tail-recursiveness less important. Yet, I have this feeling that for counting items in lists a strict foldl should perform better than any foldr, because there's no way we can extract anything from an Integer.

To sum up, I think these are the better implementations (using folds) for map and count:

map:: (a -> b) -> [a] -> [b]
map f = foldr g []
  where g x fxs = (f x):fxs

count:: [a] -> Int
count = foldl incr 0
  where incr c _ = c + 1

Is this correct?


Solution

  • It seems, then, that Haskell non-strict evaluation makes tail-recursiveness less important. Yet, I have this feeling that for counting items in lists a strict foldl should perform better than any foldr, because there's no way we can extract anything from an Integer.

    That is correct, and tail-calls are more efficient. But this benefit can be outweighed by the cost of creating large thunks, and this is the case for foldl.

    The way to have your cake and eat it too is to make sure that the accumulator is not thunked, but rather eagerly evaluated:

    myfold:: (b -> a -> b) -> b -> [a] -> b
    myfold f !b [] = b
    myfold f !b (x:xs) = myfold f (f b x) xs
    

    Which is, of course, the foldl' function.

    TL;DR: Never use foldl, but do use foldl'.