# foldl is tail recursive, so how come foldr runs faster than foldl?

I wanted to test foldl vs foldr. From what I've seen, you should use foldl over foldr whenever you can, due to tail recursion optimization.

This makes sense. However, after running this test I am confused:

foldr (takes 0.057s when using time command):

```
a :: a -> [a] -> [a]
a x = ([x] ++)

main = putStrLn (show (sum (foldr a [] [0 .. 100000])))
```

foldl (takes 0.089s when using time command):

```
b :: [b] -> b -> [b]
b xs = (++ xs) . (\y -> [y])

main = putStrLn (show (sum (foldl b [] [0 .. 100000])))
```

It's clear that this example is trivial, but I am confused as to why foldr is beating foldl. Shouldn't this be a clear case where foldl wins?

Solution

Welcome to the world of lazy evaluation.

When you think about it in terms of strict evaluation, foldl looks "good" and foldr looks "bad" because foldl is tail recursive, but foldr would have to build a tower in the stack so it can process the last item first.

However, lazy evaluation turns the tables. Take, for example, the definition of the map function:

```
map :: (a -> b) -> [a] -> [b]
map _ [] = []
map f (x:xs) = f x : map f xs
```

This wouldn't be too good if Haskell used strict evaluation, since it would have to compute the tail first, then prepend the item (for all items in the list). The only way to do it efficiently would be to build the elements in reverse, it seems.

However, thanks to Haskell's lazy evaluation, this map function is actually efficient. Lists in Haskell can be thought of as generators, and this map function generates its first item by applying f to the first item of the input list. When it needs a second item, it just does the same thing again (without using extra space).
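To see this generator-like behaviour concretely (a small illustrative snippet, not from the question), note that lazy `map` can even consume an infinite list, because each element of the result is produced only on demand:

```haskell
-- Lazy map acts like a generator: elements are produced on demand,
-- so mapping over an infinite list is fine as long as we only
-- demand a finite prefix of the result.
main :: IO ()
main = print (take 5 (map (* 2) [1 ..]))  -- prints [2,4,6,8,10]
```

Under strict evaluation this program could never terminate, since `map` would try to traverse the whole infinite list before returning anything.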

It turns out that `map` can be described in terms of `foldr`:

```
map f xs = foldr (\x ys -> f x : ys) [] xs
```

It's hard to tell by looking at it, but lazy evaluation kicks in because foldr can give `f` its first argument right away:

```
foldr f z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
```

Because the `f` defined by `map` can return the first item of the result list using solely the first parameter, the fold can operate lazily in constant space.
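The same point can be demonstrated directly (an illustrative sketch; `mapViaFoldr` is just a name chosen here): the foldr-based map works even on an infinite input, because `f x ys` can yield its head before the recursive fold `ys` is ever forced:

```haskell
-- map written via foldr: the combining function can return the head
-- of the result (f x : ...) before the fold of the tail is forced.
mapViaFoldr :: (a -> b) -> [a] -> [b]
mapViaFoldr f = foldr (\x ys -> f x : ys) []

main :: IO ()
main = print (take 3 (mapViaFoldr (+ 1) [1 ..]))  -- prints [2,3,4]
```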

Now, lazy evaluation does bite back. For instance, try running `sum [1..1000000]`. It yields a stack overflow. Why should it? It should just evaluate from left to right, right?

Let's look at how Haskell evaluates it:

```
foldl f z [] = z
foldl f z (x:xs) = foldl f (f z x) xs
sum = foldl (+) 0
sum [1..1000000] = foldl (+) 0 [1..1000000]
= foldl (+) ((+) 0 1) [2..1000000]
= foldl (+) ((+) ((+) 0 1) 2) [3..1000000]
= foldl (+) ((+) ((+) ((+) 0 1) 2) 3) [4..1000000]
...
= (+) ((+) ((+) (...) 999999) 1000000)
```

Haskell is too lazy to perform the additions as it goes. Instead, it ends up with a tower of unevaluated thunks that have to be forced to get a number. The stack overflow occurs during this evaluation, since it has to recurse deeply to evaluate all the thunks.
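One way to avoid the thunk tower is to force the accumulator at each step with `seq`. This is essentially how the strict left fold works internally; the sketch below (with the hypothetical name `foldlStrict`) shows the standard technique:

```haskell
-- A strict left fold: `seq` forces the new accumulator before
-- recursing, so the additions happen as we go instead of piling
-- up a tower of unevaluated thunks.
foldlStrict :: (b -> a -> b) -> b -> [a] -> b
foldlStrict _ z []     = z
foldlStrict f z (x:xs) = let z' = f z x in z' `seq` foldlStrict f z' xs

main :: IO ()
main = print (foldlStrict (+) 0 [1 .. 1000000])  -- prints 500000500000
```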

Fortunately, there is a special function in Data.List called `foldl'` that operates strictly. `foldl' (+) 0 [1..1000000]` will not stack overflow. (Note: I tried replacing `foldl` with `foldl'` in your test, but it actually made it run slower.)
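For completeness, here is a minimal program using `foldl'` (the import from `Data.List` is the only change needed relative to the `foldl` version):

```haskell
import Data.List (foldl')

-- foldl' forces the accumulator at each step, so this sum runs
-- in constant stack space instead of overflowing.
main :: IO ()
main = print (foldl' (+) 0 [1 .. 1000000])  -- prints 500000500000
```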
