
Conduit that checks file header


I'd like to use Conduit in a setting where I read a binary file, check that it has the correct header, and then work on the remaining data in the file.

When I try to write a conduit that checks the header and then streams the rest of the data on to the following conduits, I run into trouble. The conduits live in an Either String monad for some exception handling. Here's a simplified version of the code (I'm aware there's a Data.Conduit.Attoparsec module, but for now I'd like to write it myself):

import Conduit (ConduitM, mapC, mapM_C, takeWhileC, (.|))
import Data.ByteString (ByteString)
import Data.ByteString.Conversion (toByteString')

separator :: ByteString
separator = toByteString' '#' 

check :: ByteString -> Either String ()

confirmHeader :: ConduitM ByteString ByteString (Either String) ()
confirmHeader = do
  takeWhileC (/= separator) .| mapM_C check
  mapC id

separator is a predefined ByteString that signals the end of the header. The line mapC id is supposed to pass on the rest of the stream if the header checks out. I left out the unimportant details of check.

The part checking the header works. The last line, however, apart from looking inelegant and non-idiomatic, doesn't work. Running something like

runConduit $ yield (toByteString' "header#rest") .| confirmHeader .| sinkList

gives Right [] rather than Right ["rest"], which is what I had hoped for. Any ideas?


Solution

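  • Your takeWhileC (/= separator) consumes the whole ByteString: it compares each chunk as a whole against separator and never looks at the individual bytes inside it. The single chunk "header#rest" is not equal to "#", so it is taken in full and nothing is left for mapC id to pass on. You can use Data.Conduit.Binary (from the conduit-extra package) to work on the individual bytes of the stream. The code below works "as expected", I believe.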

    module Main (main) where
    
    import           Conduit
    import           Data.ByteString (ByteString)
    import           Data.ByteString.Conversion (toByteString')
    import           Data.Char (ord)
    import qualified Data.Conduit.Binary as B
    import           Data.Word (Word8)
    
    separator :: Word8
    separator = toEnum $ ord '#'
    
    check :: ByteString -> Either String ()
    check _ = Right ()
    
    confirmHeader :: ConduitM ByteString ByteString (Either String) ()
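    confirmHeader = do
      -- B.takeWhile works byte by byte and stops in front of the separator
      B.takeWhile (/= separator) .| mapM_C check
      B.drop 1 -- drop the separator, which stayed in the stream
      mapC id  -- pass the rest of the stream through unchanged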
    
    main :: IO ()
    main = print . runConduit $
      yield (toByteString' "header#rest") .| confirmHeader .| sinkList
    

    And the output:

    [nix-shell:/tmp]$ ghc C.hs -fforce-recomp -Wall -Werror -o Main && ./Main
    [1 of 1] Compiling Main             ( C.hs, C.o )
    Linking Main ...
    Right ["rest"]
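
A possible refinement, offered as a sketch rather than as part of the original answer: mapM_C check runs once for every chunk that B.takeWhile emits, so a header that arrives split across several chunks would be checked piece by piece. The version below instead gathers the header with takeWhileCE and foldC from the Conduit module, checks it once, and forwards the rest with awaitForever yield. It also contrasts takeWhileC (whole chunks) with takeWhileCE (individual bytes). It assumes conduit 1.3 or later (for ConduitT) and the OverloadedStrings extension. The body of check is a hypothetical stand-in, since the question leaves it out, and chunkwise, bytewise and run are names made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Conduit
import Data.ByteString (ByteString)
import Data.Word (Word8)

-- ASCII code of '#', the byte that ends the header.
separator :: Word8
separator = 35

-- Hypothetical stand-in: the real check is not shown in the question.
check :: ByteString -> Either String ()
check h
  | h == "header" = Right ()
  | otherwise     = Left ("unexpected header: " ++ show h)

-- takeWhileC tests each whole chunk, so the single chunk passes untouched.
chunkwise :: [ByteString]
chunkwise = runConduitPure $
  yield ("header#rest" :: ByteString) .| takeWhileC (/= "#") .| sinkList

-- takeWhileCE tests each byte and puts the unconsumed tail back as a leftover.
bytewise :: [ByteString]
bytewise = runConduitPure $
  yield ("header#rest" :: ByteString) .| takeWhileCE (/= separator) .| sinkList

confirmHeader :: ConduitT ByteString ByteString (Either String) ()
confirmHeader = do
  -- Gather the header bytes, even if they span several chunks.
  header <- takeWhileCE (/= separator) .| foldC
  lift (check header)  -- a Left here aborts the whole pipeline
  dropCE 1             -- discard the separator byte
  awaitForever yield   -- pass the remaining chunks through unchanged

-- Illustrative helper: feed a list of chunks through confirmHeader.
run :: [ByteString] -> Either String [ByteString]
run chunks = runConduit $ yieldMany chunks .| confirmHeader .| sinkList

main :: IO ()
main = do
  print chunkwise                      -- ["header#rest"]
  print bytewise                       -- ["header"]
  print (run ["header#rest"])          -- Right ["rest"]
  print (run ["hea", "der#re", "st"])  -- Right ["re","st"]
  print (run ["bogus#rest"])           -- Left "unexpected header: \"bogus\""
```

One trade-off to keep in mind: foldC buffers the entire header in memory, and if the separator never appears it buffers the whole input, so this approach only suits short headers.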