Haskell Conduit: having a Sink return a value based on the values from upstream


I've been trying to use the Conduit library to do some simple I/O involving files, but I'm having a hard time.

I have a text file containing nothing but a few digits such as 1234. I have a function that reads the file using readFile (no conduits), and returns Maybe Int (Nothing is returned when the file actually doesn't exist). I'm trying to write a version of this function that uses conduits, and I just can't figure it out.

Here is what I have:

import Control.Monad.Trans.Resource
import Data.Conduit
import Data.Functor
import System.Directory
import qualified Data.ByteString.Char8 as B
import qualified Data.Conduit.Binary as CB
import qualified Data.Conduit.Text as CT
import qualified Data.Text as T

myFile :: FilePath
myFile = "numberFile"

withoutConduit :: IO (Maybe Int)
withoutConduit = do
    doesExist <- doesFileExist myFile
    if doesExist
      then Just . read <$> readFile myFile
      else return Nothing

withConduit :: IO (Maybe Int)
withConduit = do
    doesExist <- doesFileExist myFile
    if doesExist
      then runResourceT $ source $$ conduit =$ sink
      else return Nothing
  where
    source :: Source (ResourceT IO) B.ByteString
    source = CB.sourceFile myFile

    conduit :: Conduit B.ByteString (ResourceT IO) T.Text
    conduit = CT.decodeUtf8

    sink :: Sink T.Text (ResourceT IO) (Maybe Int)
    sink = awaitForever $ \txt -> let num = read . T.unpack $ txt :: Int
                                  in -- I don't know what to do here...

Could someone please help me complete the sink function? Thanks!


Solution

  • This isn't really a good example of where conduit provides much value, at least not the way you're approaching it right now. Specifically, you're trying to use the read function, which requires the entire value to be in memory. Additionally, your current error handling is a bit loose: if there's anything unexpected in the content, you'll just get a "Prelude.read: no parse" error.
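
    To make that concrete, here is a minimal sketch of a sink that returns a value the way the question attempts: the result of the sink's monadic block becomes the result of the whole pipeline. It assumes conduit 1.3 or later (ConduitT, .|, runConduitPure and sinkList from the Conduit module); the name sinkReadInt is hypothetical. Note that it has to buffer every chunk before parsing, which is exactly why conduit adds little here. It swaps read for readMaybe so that malformed input gives Nothing instead of an exception.

    ```haskell
    import Conduit
    import qualified Data.Text as T
    import Text.Read (readMaybe)

    -- Hypothetical helper: collect all upstream chunks, then parse them at once.
    -- The value returned by the do block is the value the pipeline returns.
    sinkReadInt :: Monad m => ConduitT T.Text o m (Maybe Int)
    sinkReadInt = do
        chunks <- sinkList                      -- buffers the entire input
        let str = T.unpack (T.strip (T.concat chunks))
        return (readMaybe str)                  -- Nothing on a failed parse

    main :: IO ()
    main = do
        -- The number is split across chunks and followed by a newline
        print (runConduitPure (yieldMany [T.pack "12", T.pack "34\n"] .| sinkReadInt))
        -- prints: Just 1234
        print (runConduitPure (yieldMany [T.pack "12x"] .| sinkReadInt))
        -- prints: Nothing
    ```

    In the question's pipeline, a sink like this would take the place of the awaitForever version, since awaitForever always returns () and cannot hand back a result.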

    However, there is a meaningful way to do this with conduit: parse the ByteString byte by byte ourselves and avoid the read function. Fortunately, this pattern is a standard left fold, and the conduit-combinators package provides a perfect function for it (an element-wise left fold in a conduit, aka foldlCE):

    {-# LANGUAGE OverloadedStrings #-}
    import Conduit
    import Data.Word8
    import qualified Data.ByteString as S
    
    sinkInt :: Monad m => Consumer S.ByteString m Int
    sinkInt =
        foldlCE go 0
      where
        go total w
            | _0 <= w && w <= _9 =
                total * 10 + (fromIntegral $ w - _0)
            | otherwise = error $ "Invalid byte: " ++ show w
    
    main :: IO ()
    main = do
        x <- yieldMany ["1234", "5678"] $$ sinkInt
        print x
    

    There are plenty of caveats that go along with this: it will simply throw an exception if there are unexpected bytes, and it doesn't handle integer overflow at all (though fixing that is just a matter of replacing Int with Integer). It's important to note that, since the in-memory string representation of a valid 32- or 64-bit int is always going to be tiny, conduit is overkill for this problem, though I hope that this code gives some guidance on how to generally write conduit code.
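
    As a rough sketch of how those two caveats could be addressed, the fold below carries a Maybe Integer accumulator: Integer avoids overflow, and Nothing records that an unexpected byte was seen instead of throwing. It assumes conduit 1.3 or later (ConduitT, .| and runConduitPure); the name sinkIntegerMaybe is hypothetical, and the digit bounds are written as the raw ASCII codes 48 and 57 so the word8 package is not needed.

    ```haskell
    {-# LANGUAGE OverloadedStrings #-}
    import Conduit
    import qualified Data.ByteString as S
    import Data.Word (Word8)

    -- Hypothetical variant of sinkInt: Nothing on any non-digit byte,
    -- and Integer instead of Int so large inputs cannot overflow.
    sinkIntegerMaybe :: Monad m => ConduitT S.ByteString o m (Maybe Integer)
    sinkIntegerMaybe =
        foldlCE go (Just 0)
      where
        go :: Maybe Integer -> Word8 -> Maybe Integer
        go Nothing _ = Nothing                  -- a bad byte was already seen
        go (Just total) w
            | 48 <= w && w <= 57 =              -- ASCII '0' .. '9'
                Just (total * 10 + fromIntegral (w - 48))
            | otherwise = Nothing

    main :: IO ()
    main = do
        print (runConduitPure (yieldMany ["1234", "5678"] .| sinkIntegerMaybe))
        -- prints: Just 12345678
        print (runConduitPure (yieldMany ["12", "x4"] .| sinkIntegerMaybe))
        -- prints: Nothing
    ```

    Two limitations of this sketch: empty input yields Just 0, and a trailing newline in the file counts as an invalid byte, so real input would need to be trimmed or the fold extended to tolerate whitespace.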