I'm writing a network file-transfer application, using lazy ByteString as an intermediate representation:
import qualified Data.ByteString.Lazy as BSL
When I construct a BSL from a local file and then write it to a socket Handle:
BSL.readFile filename >>= BSL.hPut remoteH -- OK
This works fine; memory usage stays constant. But when I receive data from the socket and write it to a local file:
BSL.hGet remoteH size >>= BSL.hPut fileH -- starts swapping in 1 second
memory usage keeps going up: the BSL holds all size bytes in memory. Worse, when size exceeds my physical memory, the OS starts swapping immediately.
I have to receive segments of ByteString recursively instead, which is OK.
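What I mean is something like this sketch (copyN and the 32 KiB chunk size are my own choices, not library API), using strict ByteString reads so that only one chunk is live at a time:

```haskell
import qualified Data.ByteString as BS
import System.IO

-- Copy up to `total` bytes from `src` to `dst` in fixed-size chunks.
-- Memory use is bounded by the chunk size, not the transfer size.
copyN :: Handle -> Handle -> Int -> IO ()
copyN src dst total
  | total <= 0 = return ()
  | otherwise  = do
      c <- BS.hGetSome src (min 32768 total)
      if BS.null c
        then return ()                        -- EOF before all bytes arrived
        else do
          BS.hPut dst c
          copyN src dst (total - BS.length c)
```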
But why does BSL behave like that?
hGet is strict: it immediately demands the number of bytes you requested, in order to facilitate packet-level reading of data. However, hGetContentsN is lazy, and readFile is implemented in terms of hGetContentsN.
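You can see the consequence of that directly: a copy driven by readFile runs in constant space, because hPut demands one chunk at a time and each chunk becomes garbage as soon as it is written. A minimal sketch (streamCopy is a made-up name):

```haskell
import qualified Data.ByteString.Lazy as BSL
import System.IO

-- Because BSL.readFile is built on hGetContentsN, each chunk is read
-- only when BSL.hPut demands it; roughly one defaultChunkSize chunk
-- is resident at any moment, regardless of file size.
streamCopy :: FilePath -> FilePath -> IO ()
streamCopy inF outF =
  withFile outF WriteMode $ \h ->
    BSL.readFile inF >>= BSL.hPut h
```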
Consider the two implementations:
hGetContentsN :: Int -> Handle -> IO ByteString
hGetContentsN k h = lazyRead -- TODO close on exceptions
  where
    lazyRead = unsafeInterleaveIO loop

    loop = do
        c <- S.hGetSome h k -- only blocks if there is no data available
        if S.null c
          then hClose h >> return Empty
          else do cs <- lazyRead
                  return (Chunk c cs)
and
hGet :: Handle -> Int -> IO ByteString
hGet = hGetN defaultChunkSize

hGetN :: Int -> Handle -> Int -> IO ByteString
hGetN k h n | n > 0 = readChunks n
  where
    STRICT1(readChunks)
    readChunks i = do
        c <- S.hGet h (min k i)
        case S.length c of
            0 -> return Empty
            m -> do cs <- readChunks (i - m)
                    return (Chunk c cs)
The key magic is the laziness in hGetContentsN: unsafeInterleaveIO defers each read until the corresponding chunk is actually demanded. hGetN, by contrast, loops eagerly until it has read all n bytes, so the entire result is resident in memory before anything is written out.
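So if you want streaming behaviour on the receiving side, one option is to read the socket handle with the lazy hGetContents and take only the bytes you need. A sketch, assuming the sender closes or you know the length up front (receiveToFile is a made-up name):

```haskell
import qualified Data.ByteString.Lazy as BSL
import System.IO

-- Stream `n` bytes from the peer into a file in constant memory.
-- BSL.hGetContents is lazy (it is hGetContentsN under the hood) and
-- takes ownership of the handle, closing it when it reaches EOF.
receiveToFile :: Handle -> Int -> FilePath -> IO ()
receiveToFile remoteH n path = do
  body <- BSL.hGetContents remoteH
  BSL.writeFile path (BSL.take (fromIntegral n) body)
```

Two caveats: unlike hGet, this does not fail if fewer than n bytes arrive, and if the stream is longer than n bytes the handle is never driven to EOF, so hGetContents will not close it for you.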