haskell, memory-management, garbage-collection

How to free memory of a specific data structure in Haskell?


Let’s say I have several very large vectors stored on disk. I need to access them one at a time: read a vector from its file into memory, run some function on it, then move on to the next one I need. Before loading a different vector, I need the previous one to be garbage collected. I’m not sure whether performMajorGC would reclaim a vector if my program refers to it again later, through the same function name that read it in from disk. In that case I would read it into memory again, use it, and have it garbage collected again. How can I ensure it is garbage collected while still using the same function name for the vector read from the same file?

I would appreciate any advice. Thanks.

In response to Daniel Wagner:

    myvec :: Int -> IO (Vector (Vector ByteString))
    myvec x = do let ioy = do y <- Data.ByteString.Lazy.readFile ("data.csv" ++ (show x))
                              guard (isRight (Data.Csv.decode NoHeader y)) 
                              return y
                 yy <- ioy 
                 return (head $ snd $ partitionEithers [Data.Csv.decode NoHeader yy])

    myvecvec :: Vector (IO (Vector (Vector ByteString)))
    myvecvec = generate 100 (\x -> myvec x)

    somefunc1 :: IO (Vector (Vector ByteString)) -> IO ()
    somefunc1 iovv = do vv <- iovv
                        somefunc1x1 vv  -- somefunc1x1 :: Vector (Vector ByteString) -> IO ()

    -- same thing for somefunc2 and somefunc3

    oponvec :: IO ()
    oponvec = do somefunc1 (myvecvec ! 0)
                 performGC
                 somefunc2 (myvecvec ! 1)
                 performGC
                 somefunc3 (myvecvec ! 0)
    

Solution

  • You can test this by using a weak pointer as follows:

    import qualified Data.Vector.Unboxed as V
    import System.Mem.Weak
    import System.Mem
    
    main :: IO ()
    main = do
      let xs = V.fromList [1..1000000:: Int]
      wkp <- mkWeakPtr xs Nothing
      performGC
      xs' <- deRefWeak wkp
      print xs'
    

    On my system this prints Nothing, which means that the vector has been deallocated. However, I don't know whether GHC guarantees that this happens.
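
    A complementary check, sketched here under some assumptions, is to ask the runtime how much heap is live after a major collection, using GHC.Stats from base. The helper name liveBytesAfterGC is made up for this illustration, a strict ByteString stands in for the vector so that only packages shipped with GHC are needed, and the statistics are available only when the program is run with +RTS -T.

    ```haskell
    import qualified Data.ByteString as B
    import Control.Exception (evaluate)
    import Data.Word (Word64)
    import GHC.Stats (gc, gcdetails_live_bytes, getRTSStats, getRTSStatsEnabled)
    import System.Mem (performMajorGC)

    -- Hypothetical helper: force a major GC and report the live heap size.
    -- Returns Nothing when the program was not started with +RTS -T.
    liveBytesAfterGC :: IO (Maybe Word64)
    liveBytesAfterGC = do
      enabled <- getRTSStatsEnabled
      if not enabled
        then pure Nothing
        else do
          performMajorGC
          stats <- getRTSStats
          pure (Just (gcdetails_live_bytes (gc stats)))

    main :: IO ()
    main = do
      before <- liveBytesAfterGC
      -- Build a large buffer, reduce it to an Int, and keep only the Int.
      -- The buffer is not referenced once this inner block has finished.
      n <- do
        buf <- evaluate (B.replicate 50000000 1)
        evaluate (B.foldl' (\acc w -> acc + fromIntegral w) (0 :: Int) buf)
      after <- liveBytesAfterGC
      print n
      case (before, after) of
        (Just b, Just a) ->
          putStrLn ("live bytes before/after: " ++ show b ++ " / " ++ show a)
        _ -> putStrLn "RTS stats are disabled; run with +RTS -T"
    ```

    Compile with -rtsopts and run with +RTS -T. If the buffer was reclaimed, the second figure should be close to the first rather than roughly 50 MB larger. Like the weak pointer test, this only reports what happened in one run; it is not a guarantee.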

    Here's a program which checks @amalloy's suggestion:

    import qualified Data.Vector.Unboxed as V
    import Control.Monad
    import Data.Word
    
    {-# NOINLINE newLarge #-}
    newLarge :: Word8 -> V.Vector Word8
    newLarge n = V.replicate 5000000000 n -- 5GB
    
    main :: IO ()
    main = forM_ [1..10] $ \i -> print (V.sum (newLarge i))
    

    This uses exactly 5GB on my machine, which shows that there are never two large vectors allocated at the same time.
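
    Applied to the question's setting, the same pattern might look like the sketch below: each visit runs an IO action that reads one file, reduces it to a small result, and returns only that result. An IO action does not cache what it read, so visiting file 0 a second time simply reads it again. Everything here is an assumption made for illustration: the chunk-N.bin file names, the helpers chunkPath and sumFile, and the use of a strict ByteString in place of the decoded CSV vectors.

    ```haskell
    import qualified Data.ByteString as B
    import Control.Exception (evaluate)
    import Control.Monad (forM, forM_)
    import System.Directory (removeFile)

    -- Hypothetical file naming scheme for this sketch.
    chunkPath :: Int -> FilePath
    chunkPath i = "chunk-" ++ show i ++ ".bin"

    -- Read one file strictly, reduce it to a small result, and return only
    -- that result. Nothing refers to the buffer once this action finishes.
    sumFile :: Int -> IO Int
    sumFile i = do
      buf <- B.readFile (chunkPath i)
      evaluate (B.foldl' (\acc w -> acc + fromIntegral w) 0 buf)

    main :: IO ()
    main = do
      -- Create three small sample files (stand-ins for the large data files).
      forM_ [0 .. 2] $ \i ->
        B.writeFile (chunkPath i) (B.replicate 1000 (fromIntegral (i + 1)))
      -- Visit file 0, then 1, then 0 again; each visit re-reads from disk.
      sums <- forM [0, 1, 0] sumFile
      print sums -- prints [1000,2000,1000]
      -- Clean up the sample files.
      forM_ [0 .. 2] (removeFile . chunkPath)
    ```

    The point of the design is that only the small Int escapes each iteration, so no explicit performGC call is needed for the previous buffer to become collectable. Whether peak memory really stays at one buffer should still be measured, as in the 5GB experiment above.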