haskell garbage-collection lazy-evaluation bytestring

Why creating and disposing temporal ByteStrings eats up my memory in Haskell?

Here is a code which creates 1M Int numbers and put them in a list.

main = do
  let l =  [1..1000000]
  putStrLn $ show $ sum (foldl (\aux p -> p:aux) [] l)

(I know it could be more optimal (sum in the fold) but my point is different.) And look at this version

import qualified Data.ByteString.Lazy.Char8 as B
import qualified Data.ByteString.Lazy.Builder as Builder
import Data.ByteString.Lazy.Builder.ASCII
import Data.Maybe
import Data.List

main = do
    let l = map (Builder.toLazyByteString . intDec ) [1..1000000]
    let l2 = map (fst . fromJust . B.readInt) l
    putStrLn $ show $ sum (foldl' (\aux p -> p:aux) [] l2)

This version needs 90MB memory! Why? Here is a profiling output

Memory consumption of creating 1M ByteString

What is the purple area?

EDIT after reading the comments I would like to add some clarification.

This is a test. I want to keep 1M numbers in the memory (I'm building a lookup table). So I "want to force the entire list to be held in memory". But I do not want to hold the bytestrings. My small code is a simulation of a case when I load the bytestrings from the disk, convert it to integers and keep the integers in the memory. (that's my goal). But somehow bytestrings remain in the memory. Why?

Solution

The problem is laziness here.

main = do
    let l = map (Builder.toLazyByteString . intDec ) [1..1000000]
    let l2 = map (fst . fromJust . B.readInt) l
    putStrLn $ show $ sum (foldl' (\aux p -> p:aux) [] l2)

keeps the ByteStrings until sum forces the thunks fst . fromJust . B.readInt to be evaluated. Before that, what you have is a list of thunks each having a reference to one short ByteString - the reversing forces the entire list of thunks + ByteStrings into memory at once.

If you don't want to keep the ByteStrings around, force the thunks to be evaluated,

main = do
    let l = map (Builder.toLazyByteString . intDec ) [1..1000000]
    let l2 = map (fst . fromJust . B.readInt) l
    putStrLn $ show $ sum (foldl' (\aux p -> p `seq` (p:aux)) [] l2)

produces a much smaller memory usage, needing not much more than the plain list of Ints:

enter image description here