sockets memory haskell lazy-evaluation bytestring

Lazy ByteString built from Socket handle cannot be consumed and GCed lazily


I'm writing a network file transfer application, using lazy ByteString as the intermediate representation:

import qualified Data.ByteString.Lazy as BSL

When I construct a BSL from a local file and then write it to a Handle for the socket:

BSL.readFile filename >>= BSL.hPut remoteH  -- OK

This works fine, and memory usage is constant. But when I receive data from the socket and then write it to a local file:

BSL.hGet remoteH size >>= BSL.hPut fileH  -- starts swapping in 1 second

I can see memory usage keep going up until the BSL takes size bytes of memory. Worse, for a large size that exceeds my physical memory, the OS starts swapping immediately.

As a workaround, I have to receive the data in segments recursively, which is OK.
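A minimal sketch of that segmented receive, for reference. The name `copyN` and its parameters are hypothetical, not part of the bytestring API; it only assumes that `Data.ByteString.hGetSome` and `hPut` are available, and it stops early if the source reaches end of input.

```haskell
import qualified Data.ByteString as BS
import System.IO

-- | Copy up to @total@ bytes from @src@ to @dst@ in strict chunks of at
-- most @chunkSize@ bytes. Only one chunk is held in memory at a time.
-- Returns the number of bytes actually copied (hypothetical helper).
copyN :: Int -> Handle -> Handle -> Int -> IO Int
copyN chunkSize src dst = go 0
  where
    go copied remaining
      | remaining <= 0 = return copied
      | otherwise = do
          -- hGetSome may return fewer bytes than requested;
          -- an empty result means end of input.
          c <- BS.hGetSome src (min chunkSize remaining)
          let len = BS.length c
              copied' = copied + len
          if len == 0
            then return copied
            else do
              BS.hPut dst c
              -- force the running total so no thunk chain builds up
              copied' `seq` go copied' (remaining - len)
```

With the handles from above, this would be called as something like `copyN 32768 remoteH fileH size`; the chunk size is an arbitrary choice here.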

Why does BSL behave like that?


Solution

  • hGet is strict -- it immediately demands the number of bytes you requested. It does this in order to facilitate packet-level reading of data.

    However, hGetContentsN is lazy, and readFile is implemented in terms of hGetContentsN.

    Consider the two implementations:

    hGetContentsN :: Int -> Handle -> IO ByteString
    hGetContentsN k h = lazyRead -- TODO close on exceptions
      where
        lazyRead = unsafeInterleaveIO loop
    
        loop = do
            c <- S.hGetSome h k -- only blocks if there is no data available
            if S.null c
              then do hClose h >> return Empty
              else do cs <- lazyRead
                      return (Chunk c cs)
    

    and

    hGet :: Handle -> Int -> IO ByteString
    hGet = hGetN defaultChunkSize
    
    hGetN :: Int -> Handle -> Int -> IO ByteString
    hGetN k h n | n > 0 = readChunks n
      where
        STRICT1(readChunks)
        readChunks i = do
            c <- S.hGet h (min k i)
            case S.length c of
                0 -> return Empty
                m -> do cs <- readChunks (i - m)
                        return (Chunk c cs)
    

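    The key difference is the laziness in hGetContentsN: unsafeInterleaveIO defers each read until the corresponding chunk is demanded, so chunks can be written out and garbage-collected one at a time. readChunks in hGetN has no such deferral, so it reads all n bytes into memory before returning.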
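    To illustrate, the two implementations can be combined into a size-limited lazy read. This is only a sketch: `lazyHGetN` and `receiveTo` are hypothetical names, not functions exported by bytestring, and the code assumes a positive chunk size and that the handle stays open until the result has been fully consumed.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BSL
import qualified Data.ByteString.Lazy.Internal as BSLI
import System.IO
import System.IO.Unsafe (unsafeInterleaveIO)

-- | Lazily read at most @n@ bytes from @h@ in chunks of at most @k@ bytes.
-- Each chunk is only read when the consumer demands it
-- (hypothetical helper, modelled on hGetContentsN).
lazyHGetN :: Int -> Handle -> Int -> IO BSL.ByteString
lazyHGetN k h = lazyRead
  where
    -- defer the read of the next chunk until it is needed
    lazyRead n = unsafeInterleaveIO (loop n)

    loop n
      | n <= 0 = return BSLI.Empty
      | otherwise = do
          c <- BS.hGetSome h (min k n) -- only blocks if no data is available
          if BS.null c
            then return BSLI.Empty     -- end of input before n bytes arrived
            else do
              cs <- lazyRead (n - BS.length c)
              return (BSLI.Chunk c cs)

-- | Usage sketch: stream at most @n@ bytes from @h@ into a file.
receiveTo :: FilePath -> Handle -> Int -> IO ()
receiveTo path h n =
  lazyHGetN BSLI.defaultChunkSize h n >>= BSL.writeFile path
```

    Unlike hGetContentsN, this sketch never closes the handle, since the caller may want to keep using the socket after the n bytes have been read.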