I'm trying to convert a log file parser that I wrote a while back over to conduit, and I'm running into an issue. I'll simplify the details of the parser itself since that's not relevant to the question. I have a log file that looks like this:
200 GET
404 POST
500 GET
FOO
301 PUT
302 GET
201 POST
So the parsing code is pretty straightforward:
data SimpleLogEntry = SimpleLogEntry {
status :: Int
, method :: String
} deriving (Show, Eq)
parseHTTPStatus :: Parser Int
parseHTTPStatus = validate <$> decimal
where validate d = if (d >= 200 && d < 999) then d else 100
parseHTTPMethod :: Parser String
parseHTTPMethod =
(stringCI "GET" *> return "Get")
<|> (stringCI "POST" *> return "Post")
<|> (stringCI "PUT" *> return "Put")
<|> return "Unknown"
parseLogLine :: Parser SimpleLogEntry
parseLogLine = fmap SimpleLogEntry
parseHTTPStatus
<*> (space *> parseHTTPMethod)
So far so good. Here's how I implemented this in conduit:
import Prelude hiding (lines)
import Control.Applicative
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Resource (runResourceT, ResourceT)
import Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import Data.Conduit
import qualified Data.Conduit.Attoparsec as CA
import qualified Data.Conduit.Binary as CB
import qualified Data.Conduit.List as CL
logLines:: Source (ResourceT IO) B.ByteString
logLines = CB.sourceFile "~/test.log" $= CB.lines
parseEntry :: ConduitM B8.ByteString SimpleLogEntry (ResourceT IO) ()
parseEntry = CA.conduitParserEither parseLogLine =$= awaitForever go
where
go (Left err) = liftIO $ putStrLn ("Got an error: " ++ CA.errorMessage err)
go (Right (_, logEntry)) = yield logEntry
sink :: Sink SimpleLogEntry (ResourceT IO) ()
sink = CL.mapM_ (\t -> liftIO $ putStrLn $ "Got a status: " ++ (show . status) t)
main :: IO ()
main = runResourceT $ logLines $= parseEntry $$ sink
When running main
I get this output:
Got a status: 200
Got a status: 404
Got a status: 500
Got an error: Failed reading: takeWhile1
I'm having trouble understanding why the pipeline terminates at this point, rather that continuing to parse the next line of the file, as I'd like to do. Reading the docs for Data.Conduit.Attoparsec
, this seems like exactly the use case conduitParserEither
was designed for.
Per @Fabian, it turns out that conduitParserEither
wasn't really what I wanted here. Here's a definition of parseEntry
that does what I wanted to do:
parseEntry' :: ConduitM B8.ByteString SimpleLogEntry (ResourceT IO) ()
parseEntry' = (CL.map (parseOnly parseLogLine)) =$= awaitForever go
where
go (Left err) = liftIO $ putStrLn ("Got an error: " ++ err)
go (Right logEntry) = yield logEntry
The conduitParser
(or conduitParserEither
) can also consume multiple tokens on one line: for example the following input produces the same result:
200 GET404 POST
500 GET
FOO
301 PUT
302 GET
201 POST
So it makes sense that the parser doesn't continue because it doesn't know where the next token would begin.