I'm confused by Haskell's thunks

The wikibook says that: in this expression

let z = (length [1..5], reverse "olleh") in ...

z is a thunk.

But this stackoverflow post says that the outermost layer of z is the data constructor (,), which is in WHNF, thus z is not a thunk.

So which one is right? I tried this on GHCi

ghci> z = (length [1..5], reverse "olleh")
ghci> :sprint z
z = (_,_)

and see that z is not a thunk, only its components are.

Edit: The wikibook is saying that values are evaluated by layer. z is a outer layer of (,) and it is a thunk. After you deconstruct this layer by pattern matching, you get two sub-components, which are themself thunks. It makes sense to me, because it is in-line with the principle that if you do not refer to z later, then z needs not be evaluated. But GHCi does differently. So does it mean there is a gap between principles and practices or wikibook is simply wrong?

Solution

Lazy evaluation lets us work with the value returned by a function before the call has actually been evaluated. We can pass it on to other callers and only trigger the evaluation later when something needs to look at the value in detail.

To do that, we create a thunk. All a thunk is is a little structure in memory that has fields to store the arguments.

But applying a data constructor (like (,)) also just produces a little structure in memory that the fields to store the arguments it was applied to. So it turns out there's really not any point in creating a thunk for the application of a data constructor.

If we make a thunk, we'll just store the arguments in the thunk without looking at them, and they'll be accessed later when something pattern matches on the value (triggering it to be evaluated). If we skip the thunk and really apply the data constructor, we'll just store the arguments without looking at them, and they'll be accessed later when something pattern matches on the value. The behaviour of the system is basically identical whether we use a thunk and delay the application or not.

So in trying to help learners build a conceptual model for how Haskell works, it makes sense to say that any time a variable is bound to the result of an application, it will create a thunk. This is fine as a conceptual model; it agrees with all of the normally-observable behaviour of the language, and it's simpler than worrying about special exceptions when the thing being applied is a data constructor. I suppose that is the perspective the wikibooks article is taking.

But remember that thunks aren't actually the goal. There's no hard specification for exactly when GHC must use a thunk to delay evaluation of a value. They're just a means to an end. In cases like this when the observable behaviour will be the same either way, GHC is free to create a thunk or not. It's always safe to skip the thunk for a data constructor application¹, but GHC can also do this when strictness analysis reveals that the value was going to be forced soon anyway. GHC simply doesn't always do exactly what textbooks say; it does whatever it thinks will produce the "best" code while still behaving the same as the simpler model you'll get from textbooks.

It's important to remember that :sprint is not a tool for examining nice theoretical properties of Haskell. It's a debugging tool for peeking into the implementation that GHC uses. It specifically lets you see details that are not supposed to be observable from within Haskell itself, which is why it has to be a special GHCi command instead of a feature of the actual language. Sometimes :sprint will just reveal small details like this that are more to do with internal implementation details of GHC than they are to do with Haskell as a language.

¹ Unless it has strict fields, but then you can think about it as syntactic sugar for a function that does examine its arguments before storing them. The "real" underlying constructor still doesn't need a thunk, but that is never actually applied in the user's code, only the wrapper function that does the extra evaluation.