Tags: arrays, haskell, lazy-evaluation, strictness

What's the intuition for recursively defined Haskell Data.Array and strictness?


Consider this Haskell program:

{-# LANGUAGE BangPatterns #-}  -- needed for the !x pattern in badArray

module RecursiveArray where

import Data.Array ( (!), listArray, Array )

goodArray :: Array Int Int
goodArray = listArray (0, 1) (go 0)
  where
    go x = x : go ((goodArray ! x) + 1)

badArray :: Array Int Int
badArray = listArray (0, 1) (go 0)
  where
    go !x = x : go ((badArray ! x) + 1)

main = do
  print goodArray
  print badArray

Which will print

> runghc "RecursiveArray.hs"
array (0,1) [(0,0),(1,1)]
array <program stalls here>

I would like some help understanding what is going on here. Can one use equational reasoning to explain it? Is the array's internal representation relevant?

My confusion is why the strictness of the second array element matters when the first is already available in a cons cell. I also note that using the array bounds (0, 0) instead of (0, 1) suddenly makes the array defined.


I tried to do some equational reasoning but failed; I arrived at this similar program:

{-# LANGUAGE BangPatterns #-}  -- needed for the !x binding in badArray

module RecursiveArray where

import Data.Array ( (!), listArray, Array )

goodArray :: Array Int Int
goodArray = listArray (0, 1) [0, x]
    where x = goodArray ! 0

badArray :: Array Int Int
badArray = listArray (0, 1) [0, x]
    where !x = badArray ! 0

main = do
  print goodArray
  print badArray

But I see it's not the same, since now setting the array range to (0, 0) still leaves the array undefined.


Solution

  • (I actually like Li-yao Xia's answer better than this one, but I thought it might be useful to have an answer addressing this more in terms of intuitive reasoning about the operational behaviour, rather than formal reasoning about the denotational semantics)


    A simple intuitive way to think about this is to remember that an array is inherently a single block of memory. Accessing an array element requires the array's block of memory to exist, so that we can look in the block to find the pointer to the element. There's no way for the block to be "partially allocated"; accessing any element has to force the allocation of the entire block.

    We also can't have the array's block of memory existing without something to put in every element's slot. It can be a reference to a thunk for the element, but we need to have such a thunk for each and every element before we can put them all in the array. This isn't an imperative language where we can allocate an array filled with nulls1 and then gradually mutate the elements into the references we want. The array either does not yet exist (the array itself is an unevaluated thunk), or it exists and has correct references to all of its elements (which themselves may be thunks); there is no in-between state where we have references to some of the elements and not others.2 This is the key difference between an array and a list or tree: those can be partially evaluated, allowing access to some elements while the question of whether other elements even exist is unresolved.

    So when we evaluate listArray (0, 1) (go 0), we need to run go 0 far enough that we actually have all the elements to fill a (0, 1) sized array. Those elements need not be completely evaluated, but we need to have a reference to something for each of them, even if it's just a thunk.
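
    To make this concrete, here is a small check of my own (the names spineOk and spineBottom are mine, not from the question): the elements themselves may be bottom thunks, but the list spine must supply a cons for every slot before the array can exist.

    import Data.Array ( (!), listArray )

    -- The element at index 1 is a bottom thunk, but the spine has a
    -- cons for both slots, so the array can be built and index 0 read:
    spineOk :: Int
    spineOk = listArray (0, 1) [0, undefined] ! 0    -- evaluates to 0

    -- Here the spine itself is bottom after the first cons; merely
    -- building the array diverges, so even index 0 is unreachable:
    spineBottom :: Int
    spineBottom = listArray (0, 1) (0 : undefined) ! 0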

    In goodArray we have go x = x : go ((goodArray ! x) + 1). go takes its argument x and just puts it in a newly-allocated list cell without examining it, so if x was a thunk it can stay a thunk. That means we can evaluate go 0 to any depth we like, producing something like this:

    go 0 = 0 : thunk1 : thunk2 : thunk3 : ...
      where
        thunk1 = (goodArray ! 0) + 1
        thunk2 = (goodArray ! thunk1) + 1
        thunk3 = (goodArray ! thunk2) + 1
        ...
    

    So then we can fill our array with <0, thunk1>, and only later (such as when printing in main) might we actually evaluate thunk1; by that time goodArray exists and has element number 0, so thunk1 can be easily evaluated to produce 1. No problem.

    But in badArray we have go !x = x : go ((badArray ! x) + 1). go 0 can be evaluated to WHNF to produce 0 : go ((badArray ! 0) + 1), but that's not enough to allocate and populate our array. We also need to force the second call to go, so that we can get a reference to the second element. But because this go is strict in x, this forces us to evaluate badArray ! 0 before we actually have an array object we can look in for the element at index 0. This is where the problem lies.
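
    Spelled out as a rough evaluation trace (my own informal sketch, in the spirit of the equational reasoning the question asks about):

    badArray
      = listArray (0, 1) (go 0)
      = listArray (0, 1) (0 : go ((badArray ! 0) + 1))
      -- to get the second element, we must force
      --   go ((badArray ! 0) + 1)
      -- to WHNF; this go is strict in its argument, so that forces
      --   badArray ! 0
      -- which forces badArray itself: the very thunk we are already
      -- evaluating, so evaluation never finishes. That is the stall
      -- in the output (GHC's runtime can sometimes detect such a
      -- self-dependency and report <<loop>> instead).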


    This behaviour actually isn't specific to arrays. In many ways an array of a given size behaves a lot like an ordinary data type with a number of fields corresponding to the number of elements3. You can see exactly the same behaviour if you mock up a datatype specific to the number of elements you have, something like this:

    {-# LANGUAGE BangPatterns #-}  -- for the !x in badArray2 below

    data Array2 a = Constructor2 a a
      deriving (Show)
    
    listArray2 :: [a] -> Array2 a
    listArray2 (x : y : _) = Constructor2 x y
    listArray2 _ = error "not enough elements"
    
    -- I've written this as a "long-hand" case statement to make clear
    -- that accessing any element requires pattern-matching the
    -- constructor and getting access to all fields; you cannot get the
    -- first field out before the second field exists at least enough to
    -- be a thunk
    get :: Array2 a -> Int -> a
    get arr n
      = case arr of
          Constructor2 x y
            | n == 0 -> x
            | n == 1 -> y
            | otherwise -> error "index out of bounds"
    
    goodArray2 = listArray2 (go 0)
      where
        go x = x : go ((goodArray2 `get` 0) + 1)
    
    badArray2 = listArray2 (go 0)
      where
        go !x = x : go ((badArray2 `get` 0) + 1)
    

    An expression that returns an Array2 cannot have a dependency on its own elements on the path that produces the Constructor2 application. An element's final value can depend on other elements (so long as there's not a cycle where two elements both depend on each other), but the constructor itself cannot depend on an element. You can see that listArray2 needs to match a list with at least 2 elements before it can return a Constructor2 application, so if the elements are strictly evaluated then they cannot depend on fields of the constructor.
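
    For completeness, a possible driver (my addition, mirroring the question's main) behaves just like the array version:

    main :: IO ()
    main = do
      print goodArray2   -- prints: Constructor2 0 1
      print badArray2    -- stalls, just like badArray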


    1 Or filled with uninitialised garbage, or filled with some default value, or whatever any language might choose to do when you allocate an array.


    2 Of course an array cannot truly be allocated and filled as a single atomic step, so in the underlying implementation it is in fact allocated and then mutated one step at a time. But no Haskell code can see the intermediate steps. The array will start out as a thunk, which is a pointer to code to execute to create it. When the thunk is evaluated that code will need to run far enough to have allocated the memory and filled the elements before it returns, and only at that point is the thunk pointer overwritten with an actual pointer to the memory block.

    No other Haskell code can possibly see the partially filled array: before evaluation starts, any other code that looks at the array will see the pointer to the thunk, and after evaluation has finished it will see the pointer to the filled array. What happens during evaluation is a more complicated and implementation-detaily question, but other code definitely won't see a pointer to a partially-filled array. (I believe the thunk is replaced with a "black hole" telling observers that it is currently being evaluated, though there is some possibility of code in other threads still seeing the original thunk and racing the first thread to complete evaluation of it, because doing enough synchronisation to guarantee that never happens is actually less efficient. Purity means that's not actually the major problem it sounds like it would be.)

    So it's easier to just think of the "allocate the array and fill it with (possibly thunk) elements" routine as a single atomic step.


    3 In fact, if you look at things a bit sideways, the only reason we need built-in support for arrays from the compiler is that it would be impossible to declare or use a type with an infinite number of constructors (one for each possible number of elements), and it would be less efficient even if we could. But conceptually that's kind-of what an array type provides, in much the same way as a 64-bit integer type (signed or unsigned) can be thought of as an optimised way of representing an enumeration data type with 18 billion billion empty constructors.
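
    That impossible family would look something like this sketch (hypothetical names, one declaration per array size, going on forever):

    data Arr0 a = Arr0
    data Arr1 a = Arr1 a
    data Arr2 a = Arr2 a a
    data Arr3 a = Arr3 a a a
    -- ...and so on without end; a built-in array type packs this
    -- whole family into a single type, one "constructor" per size.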