Search code examples
haskellmemory-managementfold

How to make foldl consume constant memory?


We define the following data type Stupid:

import qualified Data.Vector as V
import Data.List (foldl')
data Stupid = Stupid {content::V.Vector Int, ul::Int} deriving Show

Now I have two slightly different code.

foldl' (\acc x->Stupid{content=(content acc) V.// [(x,x+123)],ul=1}) (Stupid {content=V.replicate 10000 10,ul=1}) $ take 100000 $ cycle [0..9999]

takes constant memory (~100M), while

foldl' (\acc x->Stupid{content=(content acc) V.// [(x,x+123)],ul=ul acc}) (Stupid {content=V.replicate 10000 10,ul=1}) $ take 100000 $ cycle [0..9999]

takes a huge amount of memory(~8G).

Theoretically, only one copy of the current Stupid object is needed though out the process for both cases. I don't understand why there is such a difference in memory consumption if I want to access and record the ul acc.

Can someone explain why this happens and give a workaround for constant memory if I need to access ul acc? Thanks.

Note: I know that I can do replacements of a vector in batch, this script is just for demonstration purpose, so please don't modify that part.


Solution

  • I would try to force the fields of Stupid and see if that helps.

    let f acc x = c `seq` a `seq` Stupid{content=c,ul=a}
           where
           c = content acc V.// [(x,x+123)]
           a = ul acc
    in foldl' f (Stupid {content=V.replicate 10000 10,ul=1}) $
       take 100000 $
       cycle [0..9999]
    

    This should be nearly equivalent to forcing the parameters of the function:

    foldl' (\acc x -> acc `seq` x `seq` 
            Stupid{content=(content acc) V.// [(x,x+123)],ul=ul acc})
       (Stupid {content=V.replicate 10000 10,ul=1}) $ take 100000 $ cycle [0..9999]
    

    (This can also be written with bang patterns, if one prefers those.)

    Another option, more aggressive, would be to use strictness annotations in the definition of the Stupid constructor.

    data ... = Stupid { content = ! someType , ul :: ! someOtherType }
    

    This will always force those fields in the whole program.