Search code examples
haskellgarbage-collectionlazy-evaluationbytestring

Why creating and disposing temporal ByteStrings eats up my memory in Haskell?


Here is a code which creates 1M Int numbers and put them in a list.

main = do
  let l =  [1..1000000]
  putStrLn $ show $ sum (foldl (\aux p -> p:aux) [] l)

(I know it could be more optimal (sum in the fold) but my point is different.) And look at this version

import qualified Data.ByteString.Lazy.Char8 as B
import qualified Data.ByteString.Lazy.Builder as Builder
import Data.ByteString.Lazy.Builder.ASCII
import Data.Maybe
import Data.List

main = do
    let l = map (Builder.toLazyByteString . intDec ) [1..1000000]
    let l2 = map (fst . fromJust . B.readInt) l
    putStrLn $ show $ sum (foldl' (\aux p -> p:aux) [] l2)

This version needs 90MB memory! Why? Here is a profiling output

Memory consumption of creating 1M ByteString

What is the purple area?

EDIT after reading the comments I would like to add some clarification.

This is a test. I want to keep 1M numbers in the memory (I'm building a lookup table). So I "want to force the entire list to be held in memory". But I do not want to hold the bytestrings. My small code is a simulation of a case when I load the bytestrings from the disk, convert it to integers and keep the integers in the memory. (that's my goal). But somehow bytestrings remain in the memory. Why?


Solution

  • The problem is laziness here.

    main = do
        let l = map (Builder.toLazyByteString . intDec ) [1..1000000]
        let l2 = map (fst . fromJust . B.readInt) l
        putStrLn $ show $ sum (foldl' (\aux p -> p:aux) [] l2)
    

    keeps the ByteStrings until sum forces the thunks fst . fromJust . B.readInt to be evaluated. Before that, what you have is a list of thunks each having a reference to one short ByteString - the reversing forces the entire list of thunks + ByteStrings into memory at once.

    If you don't want to keep the ByteStrings around, force the thunks to be evaluated,

    main = do
        let l = map (Builder.toLazyByteString . intDec ) [1..1000000]
        let l2 = map (fst . fromJust . B.readInt) l
        putStrLn $ show $ sum (foldl' (\aux p -> p `seq` (p:aux)) [] l2)
    

    produces a much smaller memory usage, needing not much more than the plain list of Ints:

    enter image description here