Search code examples
haskelllazy-evaluationthunk

How much memory does a thunk use?


Let's say I have a very large number (millions/billions+) of these simple Foo data structures:

data Foo = Foo
    { a :: {-# UNPACK #-}!Int
    , b :: Int
    }

With so many of these floating around, it becomes necessary to think about how much memory they consume.

On a 64bit machine, each Int is 8 bytes, so a only takes 8 bytes (because it is strict and unpacked). But how much memory will b take up? I imagine this would change depending on whether the thunk is evaluated or not, right?

I imagine in the general case this is impossible to tell, because b could depend on any number of memory positions that are only staying in memory in case b needs to be evaluated. But what if b only depended on (some very expensive operation on) a? Then, is there a deterministic way to tell how much memory will be used?


Solution

  • In addition to user239558’s answer, and in response to your comment there, I’d like to point out some tools that allow you to inspect the heap representation of your value, find answers to questions like this yourself and to see the effect of optimizations and different ways of compiling.

    ghc-datasize

    tells you the size of a closure. Here you can see that (on a 64 bit machine) in evaluated form and after garbage collection, Foo 1 2 requires 24 bytes on its own and, including dependencies, 40 bytes in total:

    Prelude GHC.DataSize Test> let x = Foo 1 2
    Prelude GHC.DataSize Test> x
    Foo {a = 1, b = 2}
    Prelude GHC.DataSize Test> System.Mem.performGC
    Prelude GHC.DataSize Test> closureSize x
    24
    Prelude GHC.DataSize Test> recursiveSize x
    40
    

    To reproduce this you need to load the data definition in compiled form with -O, otherwise, the {-# UNPACK #-} pragma has no effect.

    Now let us create a thunk and see that the size goes up considerably:

    Prelude GHC.DataSize Test> let thunk = 2 + 3::Int
    Prelude GHC.DataSize Test> let x = Foo 1 thunk
    Prelude GHC.DataSize Test> x `seq` return ()
    Prelude GHC.DataSize Test> System.Mem.performGC
    Prelude GHC.DataSize Test> closureSize x
    24
    Prelude GHC.DataSize Test> recursiveSize x
    400
    

    Now this is quite excessive. The reason is that this calculation includes references to static closures, Num typeclass dictionaries and the like, and generally the GHCi bytecode is very unoptimized. So let’s put that in a proper Haskell program. Running

    main = do
        l <- getArgs
        let n = length l
        n `seq` return ()
        let thunk = trace "I am evaluated" $ n + n
        let x = Foo 1 thunk
        a x `seq` return ()
        performGC
        s1 <- closureSize x
        s2 <- closureSize thunk
        r <- recursiveSize x
        print (s1, s2, r)
    

    gives (24, 24, 48), so now the Foo value is made up of Foo itself and a thunk. Why only the thunk, shouldn’t there also be a reference to n adding another 16 bytes? To answer this, we need a better tool:

    ghc-heap-view

    This library (by me) can investigate the heap and tell you precisely how your data is represented there. So adding this line to the file above:

    buildHeapTree 1000 (asBox x) >>= putStrLn . ppHeapTree
    

    we get (when we pass five parameters to the program) the result Foo (_thunk 5) 1. Note that the order of arguments is swapped on the heap, because pointers always come before data. The plain 5 indicates that the closure of the thunk stores its argument unboxed.

    As a last exercise we verify this by making the thunk lazy in n: Now

    main = do
        l <- getArgs
        let n = length l
        n `seq` return ()
        let thunk = trace "I am evaluated" $ n
        let x = Foo 1 thunk
        a x `seq` return ()
        performGC
        s1 <- closureSize x
        s2 <- closureSize thunk
        s3 <- closureSize n
        r <- recursiveSize x
        buildHeapTree 1000 (asBox x) >>= putStrLn . ppHeapTree
        print (s1, s2, s3, r)
    

    gives a heap representation of Foo (_thunk (I# 4)) 1 with a separate closure for n (as indicated by the presence of the I# constructor) and shows the expected sizes for the values and their total, (24,24,16,64).

    Oh, and if this is still too high level, getClosureRaw gives you the raw bytes.