Let's say I have a very large number (millions/billions+) of these simple Foo
data structures:
data Foo = Foo
  { a :: {-# UNPACK #-} !Int
  , b :: Int
  }
With so many of these floating around, it becomes necessary to think about how much memory they consume.
On a 64-bit machine, each Int is 8 bytes, so a only takes 8 bytes (because it is strict and unpacked). But how much memory will b take up? I imagine this would change depending on whether the thunk is evaluated or not, right?

I imagine in the general case this is impossible to tell, because b could depend on any number of memory locations that are only kept alive in case b needs to be evaluated. But what if b only depended on (some very expensive operation on) a? Then, is there a deterministic way to tell how much memory will be used?
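For concreteness, I mean something like the following, where mkFoo and expensive are just placeholders for whatever the real code would do:

expensive :: Int -> Int
expensive n = sum [1 .. n * 1000000]  -- stands in for some very costly pure function of a

mkFoo :: Int -> Foo
mkFoo x = Foo x (expensive x)         -- b is left as a thunk that only closes over x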
In addition to user239558’s answer, and in response to your comment there, I’d like to point out some tools that let you inspect the heap representation of your values, find answers to questions like this yourself, and see the effect of optimizations and different ways of compiling.
The ghc-datasize package tells you the size of a closure. Here you can see that (on a 64-bit machine), in evaluated form and after garbage collection, Foo 1 2 requires 24 bytes on its own and, including dependencies, 40 bytes in total:
Prelude GHC.DataSize Test> let x = Foo 1 2
Prelude GHC.DataSize Test> x
Foo {a = 1, b = 2}
Prelude GHC.DataSize Test> System.Mem.performGC
Prelude GHC.DataSize Test> closureSize x
24
Prelude GHC.DataSize Test> recursiveSize x
40
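These numbers are consistent with the usual GHC heap layout (assuming the standard, non-profiling build, where a constructor closure has a one-word header); this back-of-the-envelope accounting is mine, not something the library prints:

Foo 1 2 closure:   8 (header) + 8 (pointer to the boxed b) + 8 (unpacked a) = 24 bytes
I# 2 closure:      8 (header) + 8 (raw Int#)                                = 16 bytes
recursiveSize x:   24 + 16                                                  = 40 bytes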
To reproduce this you need to load the data definition in compiled form with -O; otherwise, the {-# UNPACK #-} pragma has no effect.
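For example, with the data definition in its own file (the module is called Test in the sessions above; the file name Test.hs and the exact flags are just one way of doing it, and details vary between GHC versions):

-- Test.hs
module Test where

data Foo = Foo
  { a :: {-# UNPACK #-} !Int
  , b :: Int
  } deriving Show

compiled beforehand with something like ghc -O -c Test.hs, so that GHCi can pick up the optimized object code instead of interpreting the module.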
Now let us create a thunk and see that the size goes up considerably:
Prelude GHC.DataSize Test> let thunk = 2 + 3 :: Int
Prelude GHC.DataSize Test> let x = Foo 1 thunk
Prelude GHC.DataSize Test> x `seq` return ()
Prelude GHC.DataSize Test> System.Mem.performGC
Prelude GHC.DataSize Test> closureSize x
24
Prelude GHC.DataSize Test> recursiveSize x
400
Now this is quite excessive. The reason is that this calculation includes references to static closures, Num typeclass dictionaries and the like, and generally the GHCi bytecode is very unoptimized. So let’s put that in a proper Haskell program. Running
import System.Environment (getArgs)
import System.Mem (performGC)
import Debug.Trace (trace)
import GHC.DataSize (closureSize, recursiveSize)

-- Foo from the question; compile this file with -O so UNPACK takes effect
data Foo = Foo
  { a :: {-# UNPACK #-} !Int
  , b :: Int
  }

main = do
  l <- getArgs
  let n = length l
  n `seq` return ()
  let thunk = trace "I am evaluated" $ n + n
  let x = Foo 1 thunk
  a x `seq` return ()
  performGC
  s1 <- closureSize x
  s2 <- closureSize thunk
  r  <- recursiveSize x
  print (s1, s2, r)
gives (24, 24, 48), so now the Foo value is made up of Foo itself and a thunk. Why only the thunk? Shouldn’t there also be a reference to n, adding another 16 bytes? To answer this, we need a better tool:
The ghc-heap-view library (by me) can investigate the heap and tell you precisely how your data is represented there. So adding this line to the file above (plus an import of GHC.HeapView):
buildHeapTree 1000 (asBox x) >>= putStrLn . ppHeapTree
we get (when we pass five parameters to the program) the result Foo (_thunk 5) 1. Note that the order of arguments is swapped on the heap, because pointers always come before data. The plain 5 indicates that the thunk’s closure stores its argument, the captured n, unboxed.
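Again the sizes line up with a rough accounting, if one assumes GHC’s usual two-word thunk header (info pointer plus a word reserved for updating); this is my own arithmetic, not output of the tools:

Foo closure:     8 (header) + 8 (pointer to the thunk) + 8 (unpacked a) = 24 bytes
thunk closure:  16 (thunk header) + 8 (the captured n, stored unboxed)  = 24 bytes
recursiveSize:  24 + 24                                                 = 48 bytes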
As a last exercise we verify this by making the thunk lazy in n
: Now
main = do
l <- getArgs
let n = length l
n `seq` return ()
let thunk = trace "I am evaluated" $ n
let x = Foo 1 thunk
a x `seq` return ()
performGC
s1 <- closureSize x
s2 <- closureSize thunk
s3 <- closureSize n
r <- recursiveSize x
buildHeapTree 1000 (asBox x) >>= putStrLn . ppHeapTree
print (s1, s2, s3, r)
gives a heap representation of Foo (_thunk (I# 4)) 1 with a separate closure for n (as indicated by the presence of the I# constructor) and shows the expected sizes for the values and their total, (24, 24, 16, 64).
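Spelled out the same way as before (my arithmetic, under the same layout assumptions):

Foo closure:    8 (header) + 8 (pointer to the thunk) + 8 (unpacked a) = 24 bytes
thunk closure: 16 (thunk header) + 8 (pointer to the boxed n)          = 24 bytes
n as I# 4:      8 (header) + 8 (raw Int#)                              = 16 bytes
recursiveSize: 24 + 24 + 16                                            = 64 bytes

The extra 16 bytes compared to the previous run are exactly the separate I# closure for n that the unboxed variant avoided.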
Oh, and if this is still too high level, getClosureRaw gives you the raw bytes.
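A minimal sketch of using it (assuming getClosureRaw from GHC.HeapView still has its a -> IO (Ptr (), [Word], [Box]) shape; check your ghc-heap-view version):

import GHC.HeapView (getClosureRaw)

-- hypothetical helper: print the raw words that make up a value's closure
dumpRawWords :: a -> IO ()
dumpRawWords x = do
  -- assumed result: (info table pointer, raw closure words, pointer payload)
  (_info, rawWords, _ptrs) <- getClosureRaw x
  print rawWords

main :: IO ()
main = dumpRawWords (1 :: Int)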