The streaming-bytestring library gives an error after printing about 512 bytes.
Error:
openBinaryFile: resource exhausted (Too many open files)
Code:
import Control.Monad.Trans (lift, MonadIO)
import Control.Monad.Trans.Resource (runResourceT, MonadResource, MonadUnliftIO, ResourceT, liftResourceT)
import qualified Data.ByteString.Streaming as BSS
import qualified Data.ByteString.Streaming.Char8 as BSSC
import System.TimeIt
main :: IO ()
main = timeIt $ runResourceT $ dump $ BSS.drop 24 $ BSS.readFile "filename"
dump :: MonadIO m => BSS.ByteString m r -> m ()
dump bs = do
    isEmpty <- BSS.null_ bs
    if isEmpty then return ()
    else do
        BSSC.putStr $ BSS.take 1 bs
        dump $ BSS.drop 1 bs
When working with streaming libraries, it's usually a bad idea to reuse an effectful stream. That is, you can apply a function like drop or splitAt to a stream and then continue working with the resulting stream, or you can consume the stream as a whole with a function like fold, which leaves you in the base monad. But you should never apply the same stream value to two different functions.
Sadly, the Haskell type system as it stands is not able to enforce that restriction at compile time; doing so would require some form of linear types. Instead, it becomes the responsibility of the user.
The null_ function is perhaps a wart in the streaming-bytestring API, because it doesn’t return a new stream along with the result, giving the impression that stream reuse is normal throughout the API. It would be better if it had a signature like
null_ :: ByteString m r -> m (Bool, ByteString m r)
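A helper with roughly that shape can be sketched on top of uncons. This is only an illustration: nullWithRest is a hypothetical name, not part of the library, and the sketch assumes that uncons from Data.ByteString.Streaming.Char8 returns an Either, as in the answer's own dump function.

```haskell
import qualified Data.ByteString.Streaming.Char8 as BSSC

-- Hypothetical helper (not provided by streaming-bytestring): reports
-- whether the stream is empty and hands back a stream to continue with,
-- so the caller never has to touch the original value again.
-- Assumes BSSC.uncons returns Either r (Char, ByteString m r).
nullWithRest :: Monad m => BSSC.ByteString m r -> m (Bool, BSSC.ByteString m r)
nullWithRest bs = do
    step <- BSSC.uncons bs
    pure $ case step of
        -- Nothing left: return a stream that only carries the final result.
        Left r          -> (True, pure r)
        -- Something was read: put the inspected character back in front.
        Right (c, rest) -> (False, BSSC.cons c rest)
```

The caller would then pattern match on the pair and keep working with the returned stream only.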
Similarly, don't use drop and take with the same stream value. Instead, use splitAt or uncons and work with the divided result.
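-- liftIO must be in scope, e.g. import Control.Monad.IO.Class (liftIO)
dump :: MonadIO m => BSS.ByteString m r -> m ()
dump bs = do
    mc <- BSSC.uncons bs -- bs is only used once
    case mc of
        Left _ -> return ()
        Right (c, rest) -> do
            liftIO $ putChar c
            dump rest -- continue with the remainder, never with bs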
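The splitAt route mentioned above might look like the following sketch. The function name dumpInTwoParts and the 16-byte split are made up for illustration; the point is only that the stream consumed by stdout returns the remainder, which is then used instead of the original value.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Resource (runResourceT)
import qualified Data.ByteString.Streaming as BSS

-- Hypothetical example: skip a 24-byte header, print the next 16 bytes,
-- a newline, and then everything else. The file is opened only once
-- because each stream value is consumed exactly once.
dumpInTwoParts :: FilePath -> IO ()
dumpInTwoParts path = runResourceT $ do
    let body = BSS.drop 24 (BSS.readFile path)
    -- splitAt yields the first 16 bytes and returns the rest as its result;
    -- stdout consumes the first part and hands that result back.
    rest <- BSS.stdout (BSS.splitAt 16 body)
    liftIO (putStrLn "")
    -- Continue with the returned remainder, not with body.
    BSS.stdout rest
```

Calling dumpInTwoParts "filename" would run it against the file from the question.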
So, about the error. As @BobDalgleish mentions in the comments, what is happening is that the file is opened when null_ is invoked (it is the first time we "demand" something from the stream). In the recursive call we pass a stream built from the original bs value again, so it will open the file again, one time for each iteration. None of those streams is consumed to the end, so their handles stay open, and eventually we hit the file handle limit.
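The mechanism can be reproduced without the library. The sketch below uses a toy stream type and a fake readFile that only counts how often it is started; all names in it are invented for the illustration and none of them come from streaming-bytestring.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Toy stand-in for an effectful stream (NOT the library's real type).
data Stream
    = Done
    | Yield Char Stream
    | Effect (IO Stream)

-- Fake readFile: bumps a counter every time the stream is started,
-- standing in for the call to openBinaryFile.
fakeReadFile :: IORef Int -> String -> Stream
fakeReadFile opens contents = Effect $ do
    modifyIORef' opens (+ 1)
    pure (foldr Yield Done contents)

-- Like uncons: runs effects until an element or the end is found.
unconsS :: Stream -> IO (Maybe (Char, Stream))
unconsS Done           = pure Nothing
unconsS (Yield c rest) = pure (Just (c, rest))
unconsS (Effect act)   = act >>= unconsS

-- Like null_: answers the question but throws the started stream away.
nullS :: Stream -> IO Bool
nullS s = maybe True (const False) <$> unconsS s

-- Reusing the same stream value: every use starts it from scratch.
opensWhenReused :: IO Int
opensWhenReused = do
    opens <- newIORef 0
    let s = fakeReadFile opens "abc"
    _ <- nullS s
    _ <- unconsS s
    _ <- unconsS s
    readIORef opens

-- Threading the remainder through the loop: the stream is started once.
opensWhenThreaded :: IO Int
opensWhenThreaded = do
    opens <- newIORef 0
    let go s = unconsS s >>= maybe (pure ()) (go . snd)
    go (fakeReadFile opens "abc")
    readIORef opens
```

In this toy model opensWhenReused returns 3 and opensWhenThreaded returns 1, which mirrors one open per use of the original stream versus a single open.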
Personally, I'm not a fan of using ResourceT with streaming libraries. I prefer opening the file with withFile and then creating and consuming the stream within the callback, if possible. But some things are more difficult that way.
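For this particular program the withFile style could look roughly like this. It is a minimal sketch, with dumpAfterHeader as an invented name, and it writes the whole remainder to stdout in one go rather than byte by byte.

```haskell
import qualified Data.ByteString.Streaming as BSS
import System.IO (IOMode (ReadMode), withFile)

-- Hypothetical example: the handle is opened and closed by withFile,
-- and the stream is created and fully consumed inside the callback.
dumpAfterHeader :: FilePath -> IO ()
dumpAfterHeader path =
    withFile path ReadMode $ \h ->
        -- Skip the 24-byte header and copy the rest to stdout.
        BSS.stdout (BSS.drop 24 (BSS.fromHandle h))
```

The stream must not escape the callback, since the handle is closed as soon as withFile returns. That restriction is one reason some things are harder in this style.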