list haskell lazy-evaluation infinite let

Haskell lazy evaluation and let-in notation with infinite lists

Let pack be the function [a] -> [[a]] which takes a list and groups consecutive repeated elements into sublists.

Here are two implementations of pack in Haskell.

pack :: (Eq a) => [a] -> [[a]]
pack x = reverse $ foldl f [] x where
  f cks@(ck1:_):rest) x
    | x == ck1 = (x:ck):rest
    | otherwise [x]:cks
  f _ x = [[x]]

pack' (x:xs) = let (first,rest) = span (==x) xs
  in (x:first) : pack' rest
pack' [] = []

These implementations have a critical semantic difference: the first implementation fails to terminate if we apply it to an infinite list, e.g. [1..]. But the second implementation does work for infinite lists. For example, head $ pack' [1..] evaluates.

My guess is the let in notation is lazy, hence span (which uses let-in in its Prelude definition) only evaluates finitely many expressions when we apply pack' on an infinite list.

However, this is an unsatisfactory explanation to me, because I can replace reverse with the following definition.

reverse' = foldl (\y x0 -> x0:y) []

If we do this, every expression in pack folds from left to right—so I would expect this to work for infinite lists—yet it still hangs.

The question: Why does pack' work for infinite lists and not pack?

Solution

foldl :: Foldable f => (b -> a -> b) -> b -> f a -> b will for a given function f, and a base value z for a list [x₁, x₂, …, x_n] produce the result of:

f (f (… (f (f z x₁) x₂) …) x_n-1) x_n

If we thus want to determine the weak head normal form (WHNF) we need to access the last element of the list. The fold function f of the foldl can be lazy in its first parameter, but we will at least have to make a function call with x_n as parameter. This is why the documentation on foldl says:

Note that to produce the outermost application of the operator the entire input list must be traversed. Like all left-associative folds, foldl will diverge if given an infinite list.

My guess is the let in notation is lazy, hence span (which uses let-in in its Prelude definition) only evaluates finitely many expressions when we apply pack' on an infinite list.

You are correct, definitions in the let, where clauses and all other subexpressions are all lazy. But eventually if you are interested in the result, you need to determine the WHNF, and sometimes more than the WHNF.

The reason that it works is because span :: (a -> Bool) -> [a] -> ([a], [a]) is implemented lazily. Indeed, span is implemented as [src]:

span                    :: (a -> Bool) -> [a] -> ([a],[a])
span _ xs@[]            =  (xs, xs)
span p xs@(x:xs')
         | p x          =  let (ys,zs) = span p xs' in (x:ys,zs)
         | otherwise    =  ([],xs)

It thus does not need to know how the span of the tail looks in order to generate a 2-tuple where x that has satisfied the predicate is put in the first item, or in the second item if p x failed.

This thus means that span will generate a 2-tuple where the first item will contain all the elements that satisfy the predicate, whereas the second item is a lazy reference to the rest of the list to process.