Let’s say I have several very large vectors stored on disk. I need to access them individually by reading each one from its file into memory. I would perform some function on a single vector and then move on to the next one I need to access. Every time I move to a different vector, I need to be able to have the previous one garbage collected. I’m not sure whether `performMajorGC` would ensure that a vector is garbage collected if my program later has to access that same vector again through the same function name that read it from disk. In that case I would read it into memory again, use it, then garbage collect it. How can I ensure it is garbage collected while using the same function name for a vector that is read from the same file?

Would appreciate any advice, thanks.
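A minimal sketch of the loop described above, under assumptions of my own: each vector is stored as whitespace-separated `Int`s in a hypothetical file `vec<i>.txt`, and the hypothetical `process` (a plain sum here) stands in for the real per-vector function. Nothing caches a loaded vector, so once an iteration finishes its `v` is unreachable and GHC's collector can reclaim it; loading vector 0 a second time simply reads the file again. The `performMajorGC` call is optional and only affects *when* the collection happens.

```haskell
import Control.Monad (forM_)
import qualified Data.Vector.Unboxed as V
import System.Mem (performMajorGC)

-- Hypothetical file naming scheme: vec0.txt, vec1.txt, ...
vecFile :: Int -> FilePath
vecFile i = "vec" ++ show i ++ ".txt"

-- Parse whitespace-separated Ints into an unboxed vector.
parseVec :: String -> V.Vector Int
parseVec = V.fromList . map read . words

-- Load one vector from disk. Nothing is cached: calling this twice reads the file twice.
loadVec :: Int -> IO (V.Vector Int)
loadVec i = do
  s <- readFile (vecFile i)
  -- Forcing an unboxed vector builds it completely, so the whole file is consumed here.
  return $! parseVec s

-- Placeholder for the real per-vector work.
process :: V.Vector Int -> Int
process = V.sum

main :: IO ()
main = do
  -- Create three small example files so the sketch is self-contained.
  forM_ [0 .. 2] $ \i ->
    writeFile (vecFile i) (unwords (map show [i * 10 .. i * 10 + 9]))
  -- Visit vectors in any order, including vector 0 twice.
  forM_ [0, 1, 0, 2] $ \i -> do
    v <- loadVec i
    print (process v)
    -- Nothing refers to v after this point, so it is garbage once the iteration ends.
    -- The explicit collection below is optional; it only makes it happen now.
    performMajorGC
```

With the example files this prints 45, 145, 45 and 245, one per line.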
In response to Daniel Wagner:
```haskell
import qualified Data.ByteString.Lazy as BL
import Data.ByteString (ByteString)
import Data.Csv (HasHeader (NoHeader), decode)
import Data.Vector (Vector, generate, (!))
import System.Mem (performGC)

-- Read and parse data.csv<x>, failing in IO if the file is not valid CSV.
myvec :: Int -> IO (Vector (Vector ByteString))
myvec x = do
  y <- BL.readFile ("data.csv" ++ show x)
  case decode NoHeader y of
    Left err -> ioError (userError err)
    Right v -> return v

myvecvec :: Vector (IO (Vector (Vector ByteString)))
myvecvec = generate 100 myvec

somefunc1 :: IO (Vector (Vector ByteString)) -> IO ()
somefunc1 iovv = do
  vv <- iovv
  somefunc1x1 vv

somefunc1x1 :: Vector (Vector ByteString) -> IO ()
somefunc1x1 vv = print (length vv) -- placeholder for the real per-vector work
-- same thing for somefunc2 and 3

oponvec :: IO ()
oponvec = do
  somefunc1 (myvecvec ! 0)
  performGC
  somefunc2 (myvecvec ! 1)
  performGC
  somefunc3 (myvecvec ! 0)
```
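One point worth spelling out about the code above: `myvecvec` is a vector of `IO` *actions*, not of loaded vectors, so `myvecvec ! 0` re-reads `data.csv0` every time it is run and no parsed data is stored inside `myvecvec`. The sketch below checks this without any CSV files, using a hypothetical `fakeMyvec` that counts how often it runs instead of reading a file.

```haskell
import Control.Monad (forM_)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import qualified Data.Vector as V

-- Hypothetical stand-in for myvec: instead of reading data.csv<x>, it bumps a
-- counter (so we can see how often it runs) and returns a small vector.
fakeMyvec :: IORef Int -> Int -> IO (V.Vector Int)
fakeMyvec counter x = do
  modifyIORef' counter (+ 1)
  return (V.enumFromN x 5)

-- Run the actions at the given indices, as oponvec does with myvecvec ! i,
-- and return how many times a "file" was read.
countReads :: [Int] -> IO Int
countReads indices = do
  counter <- newIORef 0
  let loaders = V.generate 100 (fakeMyvec counter) -- same shape as myvecvec
  forM_ indices $ \i -> do
    v <- loaders V.! i -- runs the action again; no result is cached in loaders
    print (V.sum v)
  readIORef counter

main :: IO ()
main = countReads [0, 1, 0] >>= print
```

Running `main` prints the three sums (10, 15, 10) and then `3`: one read per use, including both uses of index 0.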
You can check whether a vector has actually been collected by using a weak pointer, as follows:
```haskell
import qualified Data.Vector.Unboxed as V
import System.Mem.Weak
import System.Mem

main :: IO ()
main = do
  let xs = V.fromList [1..1000000 :: Int]
  wkp <- mkWeakPtr xs Nothing
  performGC
  xs' <- deRefWeak wkp
  print xs'
```
On my system this prints `Nothing`, which means that the vector has been deallocated. However, I don't know whether GHC guarantees that this happens.
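The documentation for `System.Mem.Weak` warns that weak pointers to ordinary Haskell values are fragile, because the compiler may optimise away or duplicate the underlying structure, and it suggests type-specific functions such as `mkWeakIORef` for values that have identity. A sketch of the same experiment following that advice, assuming the vector is held in an `IORef`: the first function keeps the `IORef` in use after the GC, so its weak pointer must still be alive; the second drops it, so it will usually (but not necessarily) be collected.

```haskell
import Data.IORef (mkWeakIORef, newIORef, readIORef)
import Data.Maybe (isJust)
import qualified Data.Vector.Unboxed as V
import System.Mem (performGC)
import System.Mem.Weak (deRefWeak)

-- The IORef is used again after the GC, so it is reachable during the GC
-- and its weak pointer must still be alive.
aliveWhileReferenced :: IO Bool
aliveWhileReferenced = do
  ref <- newIORef (V.fromList [1 .. 1000000 :: Int])
  wk <- mkWeakIORef ref (return ())
  performGC
  alive <- isJust <$> deRefWeak wk
  v <- readIORef ref -- this later use keeps ref reachable
  print (V.length v)
  return alive

-- Nothing uses the IORef after mkWeakIORef, so the GC is free to collect it
-- (and the value inside it); this usually yields False, but that is not guaranteed.
aliveAfterDropping :: IO Bool
aliveAfterDropping = do
  ref <- newIORef (V.fromList [1 .. 1000000 :: Int])
  wk <- mkWeakIORef ref (return ())
  performGC
  isJust <$> deRefWeak wk

main :: IO ()
main = do
  aliveWhileReferenced >>= print
  aliveAfterDropping >>= print
```

The first function always returns `True`; the result of the second depends on the collector, so treat it as evidence rather than a guarantee.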
Here's a program which checks @amalloy's suggestion:
```haskell
import qualified Data.Vector.Unboxed as V
import Control.Monad
import Data.Word

{-# NOINLINE newLarge #-}
newLarge :: Word8 -> V.Vector Word8
newLarge n = V.replicate 5000000000 n -- 5GB

main :: IO ()
main = forM_ [1..10] $ \i -> print (V.sum (newLarge i))
```
This uses exactly 5GB on my machine, the size of a single vector, which shows that two large vectors are never allocated at the same time.
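To get that number from the program itself instead of an external monitor, GHC's `GHC.Stats` module can report the runtime's peak memory use. A sketch, assuming smaller 50MB vectors so it runs on an ordinary machine; statistics are only collected when the program runs with `+RTS -T` (for example, compile with `-with-rtsopts=-T`). It also sums into an `Int`, since `V.sum` over `Word8` wraps around. If only one vector is live at a time, `max_mem_in_use_bytes` should be on the order of one vector's size plus runtime overhead, rather than ten times that.

```haskell
import Control.Monad (forM_)
import qualified Data.Vector.Unboxed as V
import Data.Word (Word8)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Assumed size: 50MB per vector instead of the original 5GB.
vecSize :: Int
vecSize = 50 * 1000 * 1000

{-# NOINLINE newLarge #-}
newLarge :: Word8 -> V.Vector Word8
newLarge n = V.replicate vecSize n

-- Sum into an Int so the result does not wrap around like a Word8 sum.
total :: V.Vector Word8 -> Int
total = V.foldl' (\acc w -> acc + fromIntegral w) 0

main :: IO ()
main = do
  forM_ [1 .. 10] $ \i -> print (total (newLarge i))
  enabled <- getRTSStatsEnabled
  if enabled
    then do
      stats <- getRTSStats
      putStrLn ("peak memory in use (bytes): " ++ show (max_mem_in_use_bytes stats))
    else putStrLn "statistics disabled: run with +RTS -T"
```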