Search code examples
haskelllazy-evaluationghc

Is "hello" a thunk or in normal form?


Both 1 and "hello" are in normal form, thus are not thunks. So why does GHCi show that "hello" is a thunk?

ghci> x = 1::Int
ghci> :sprint x
x = 1

ghci> x = "hello"::String
ghci> :sprint x
x = _

Edit: I copy a comment from my previous post that tries to explain this question

Haskell-the-language (or -standard) doesn't really define what value is a thunk, what's in normal form, etc.. All it defines are strictness semantics of functions you might call on such values. In your examples, that doesn't happen so basically the compiler is free to do anything, possibly involving heuristics that for such and such type a thunk is likely to achieve nothing but higher memory overhead.

Can anyone give me some articles or documents about this?


Solution

  • As I commented earlier, this is mostly about how GHCi decides to store values, rather than anything the Haskell standard specifies. Thunks/laziness are an implementation detail for how non-strictness is achieved, and sometimes they're also a good mental model, but not always. Strictness is all about functions and how ⊥-values are allowed or not allowed to propagate. Any wisdom about where thunks may or may not appear is mostly a by-product of that.

    But there are still a couple of things that should be kept in mind on the pure value/thunk level.

    Polymorphic values are always evaluated on-demand; they're basically functions with invisible arguments. In GHCi (which has the monomorphism restriction turned off by default) it's easy to define polymorphic values without meaning to, in particular number literals. Then :sprint will only ever give _.
    This is not an issue in your examples though: explicitly restricting to a concrete type ensures monomorphic bindings.

    As long as you're directly using plain constructors, there is a good chance that the compiler/interpreter will directly store them in normal form. It does this, for example, for something like ('a', (Just 'b','c')). But I don't think this is ever guaranteed, and usually it's irrelevant anyways because literal constructors only get you so far. All that is guaranteed is that a value defined as an outer constructor can be evaluated to WHNF without running into ⊥, but here we're again in the realm of strictness properties.

    More typically, values you bind comes from some function and thus start out as thunks. This is in particular also the case for many expressions that look like plain constructors, but aren't: most literals in Haskell can be smart literals, either by default or through extensions. Best known is that number literals are implicitly wrapped in fromInteger, so as to be polymorphic over any Num type. As such it is indeed a bit surprising that

    x = 1::Int
    

    does not seem to produce any thunk. But again: this is up to the compiler, and it just makes sense: there is no risk of ⊥, the value is right there, ready to be stored, whereas building a thunk around it would entail a completely unnecessary pointer indirection.

    It's worth noting that this does not happen for all number literals:

    ghci> let y = 1678796789876876987698768798769876986780987609876 :: Integer
    ghci> :sprint y
    y = _
    

    In this case, the number doesn't fit int 64 bit anymore, so it's less clear-cut that putting it straight into memory is advantagous; it turns out GHC doesn't risk it in this case.

    Now, for strings we would have much the same situation if -XOverloadedStrings were active, in which case all string literals would be required to be wrapped in fromString. But there's no guarantee the other way around: just because the extension isn't active doesn't mean GHC can't wrap the literal in some other function. As Li-yao Xia showed, it wraps it in the low-level function unpackCString#. (This certainly makes sense from a memory perspective, because a C string is continuous and efficient in memory, whereas Haskell's linked-list strings are frankly atrocious for storage, taking about 16× as much memory.) It's legal as long as the compiler is sure that you won't run into ⊥ when evaluating the string.

    In this case, explicitly defining the string with plain constructors does cause it to be immediately stored in normal form:

    ghci> let x = "hello" :: String
    ghci> :sprint x
    x = _
    ghci> let x' = 'h':'e':'l':'l':'o':[]
    ghci> :sprint x'
    x' = "hello"