haskell, haskell-streaming

Why is streaming-bytestring giving me error "openBinaryFile: resource exhausted (Too many open files)"?


The streaming-bytestring library gives an error after printing about 512 bytes.

Error:

openBinaryFile: resource exhausted (Too many open files)

Code:

import           Control.Monad.Trans (lift, MonadIO)
import           Control.Monad.Trans.Resource (runResourceT, MonadResource, MonadUnliftIO, ResourceT, liftResourceT)
import qualified Data.ByteString.Streaming          as BSS
import qualified Data.ByteString.Streaming.Char8    as BSSC
import           System.TimeIt

main :: IO ()
main = timeIt $ runResourceT $ dump $ BSS.drop 24 $ BSS.readFile "filename"

dump :: MonadIO m => BSS.ByteString m r -> m ()
dump bs = do
    isEmpty <- BSS.null_ bs
    if isEmpty then return ()
    else do
        BSSC.putStr $ BSS.take 1 bs
        dump $ BSS.drop 1 bs

Solution

  • When working with streaming libraries, it's usually a bad idea to reuse an effectful stream. That is, you can apply a function like drop or splitAt to a stream and then continue working with the resulting stream, or you can consume the stream as a whole with a function like fold, which leaves you in the base monad. But you should never pass the same stream value to two different functions.

    Sadly, the Haskell type system as it stands is not able to enforce that restriction at compile time; doing so would require some form of linear types. Instead, it becomes the responsibility of the user.

    The null_ function is perhaps a wart in the streaming-bytestring API, because it doesn't return a new stream along with the result, which gives the impression that stream reuse is normal throughout the API. It would be better if it had a signature like null_ :: ByteString m r -> m (Bool, ByteString m r).
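    To make the signature suggested above concrete, here is a minimal sketch of such a function. The name nullWithRest is hypothetical and not part of the library; the sketch assumes that nextChunk and chunk are exported from Data.ByteString.Streaming, as they are in the 0.1.x releases I have looked at.

```haskell
import qualified Data.ByteString           as B
import qualified Data.ByteString.Streaming as BSS

-- Hypothetical helper (not part of streaming-bytestring): reports whether the
-- stream is empty and hands back the stream to use from then on.
nullWithRest :: Monad m => BSS.ByteString m r -> m (Bool, BSS.ByteString m r)
nullWithRest bs = do
    step <- BSS.nextChunk bs
    case step of
        -- The stream ended: return an empty stream carrying the same result.
        Left r -> return (True, return r)
        Right (c, rest)
            -- An empty chunk says nothing about emptiness, so keep looking.
            | B.null c  -> nullWithRest rest
            -- Put the inspected chunk back in front of the remainder.
            | otherwise -> return (False, BSS.chunk c >> rest)
```

    A caller would bind both results, as in (isEmpty, bs') <- nullWithRest bs, and use only bs' afterwards.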

    Similarly, don't use drop and take with the same stream value. Instead, use splitAt or uncons and work with the divided result.

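    -- Also requires: import Control.Monad.IO.Class (liftIO)
    dump :: MonadIO m => BSS.ByteString m r -> m ()
    dump bs = do
        mc <- BSSC.uncons bs -- bs is used only once
        case mc of
            Left _ -> return () -- end of stream
            Right (c, rest) -> do
                liftIO $ putChar c
                dump rest -- continue with the remainder, never with bs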
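
    The splitAt route can be sketched in the same spirit. This is only an illustration: putFirst is a hypothetical name, and the sketch relies on stdout returning the result of the stream it consumes, which for a stream produced by splitAt is the remainder.

```haskell
import           Control.Monad.IO.Class    (MonadIO)
import qualified Data.ByteString.Streaming as BSS

-- Hypothetical helper: writes the first n bytes to standard output and
-- returns the remainder of the stream. The input stream is consumed exactly
-- once; only the returned stream may be used afterwards.
putFirst :: MonadIO m => Int -> BSS.ByteString m r -> m (BSS.ByteString m r)
putFirst n bs = BSS.stdout (BSS.splitAt (fromIntegral n) bs)
```

    Inside a do block this reads rest <- putFirst 1 bs, after which bs should not be touched again.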
    

    So, about the error. As @BobDalgleish mentions in the comments, the file is opened when null_ is invoked, because that is the first time we "demand" something from the stream. The recursive call is built from the original bs value again, so the file is opened again on every iteration. Those partially consumed streams are never run to the end, so their handles stay open until we hit the file handle limit.


    Personally, I'm not a fan of using ResourceT with streaming libraries. I prefer opening the file with withFile and then creating and consuming the stream within the callback, if possible. But some things are more difficult that way.
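
    For the program in the question, that approach might look roughly like the sketch below. It assumes the goal is simply to skip the first 24 bytes and copy the rest to standard output; dumpFile is a hypothetical name, and withBinaryFile is used in place of withFile because the data is treated as raw bytes.

```haskell
import qualified Data.ByteString.Streaming as BSS
import           System.IO                 (IOMode (ReadMode), withBinaryFile)

-- Hypothetical helper: skips a 24-byte prefix (as in the question) and copies
-- the rest of the file to standard output. The handle is opened once and
-- closed when the callback returns, so the stream has to be consumed
-- completely inside the callback.
dumpFile :: FilePath -> IO ()
dumpFile path =
    withBinaryFile path ReadMode $ \h ->
        BSS.stdout (BSS.drop 24 (BSS.hGetContents h))
```

    Calling dumpFile "filename" from main replaces the runResourceT pipeline. The price is that the stream cannot escape the callback, which is the kind of thing that gets harder with this style.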