I experimented with the closureSize# primitive in GHC. Here is the helper function I use (x is forced using a bang pattern, so what I am measuring should be the data constructor closures themselves rather than the thunks).
{-# LANGUAGE BangPatterns, MagicHash #-}
import GHC.Exts
import Text.Printf
size :: Show a => a -> IO ()
size !x = printf "sizeof %s = %d\n" (showsPrec 11 x "") (I# (closureSize# x))
In GHCi, I get these puzzling results:
GHCi> size ()
sizeof () = 2
GHCi> size (MkSolo 1)
sizeof (MkSolo 1) = 2
GHCi> size (1, 2)
sizeof (1,2) = 3
GHCi> size (1, 2, 3)
sizeof (1,2,3) = 4
GHCi> size Nothing
sizeof Nothing = 2
GHCi> size (Just 42)
sizeof (Just 42) = 2
GHCi> size []
sizeof [] = 2
GHCi> size [1, 2, 3]
sizeof [1,2,3] = 3
Everything but the nullary constructors makes sense to me. I thought they would take only a single word for the STG info header, but instead they take two words. Is this due to some restriction in the GHC RTS that every closure must have at least one word of payload?
It has to do with garbage collection. See the comment titled "Mark bits in mark-compact collector" in the GHC source for details.
For mark-compact garbage collection, the GC uses two bits per closure. Instead of allocating these on a per-object basis, the bitmap is based on heap words, with one bit allocated per heap word. This makes a one-word heap object too small (because it would have only one associated bit), so all heap objects have a minimum size of 2 words (or, equivalently, a MIN_PAYLOAD size of 1).
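As a rough illustration of that rule, here is a minimal sketch of the size arithmetic. It is a model only, not GHC's RTS code: the names headerWords, minPayloadWords and modelSize are made up for this example, and it assumes a non-profiled build (one-word header) with every constructor field occupying one word.

```haskell
-- Hypothetical model of the rule described above; not GHC's actual RTS code.
-- All sizes are in machine words, the unit that closureSize# reports.

-- Words used by the header (assumption: non-profiled build, info pointer only).
headerWords :: Int
headerWords = 1

-- Minimum payload, so that every closure spans at least two heap words
-- and therefore owns at least two bits in the per-word mark bitmap.
minPayloadWords :: Int
minPayloadWords = 1

-- Modelled size of a constructor closure with the given number of fields
-- (assumption: each field takes exactly one word).
modelSize :: Int -> Int
modelSize fields = headerWords + max minPayloadWords fields
```

With this model, map modelSize [0, 1, 2, 3] evaluates to [2,2,3,4], which matches the sizes in the question: the nullary constructors (), Nothing and [] are padded to 2, the one-field MkSolo and Just are 2, and the two-field pair and (:) cell are 3.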