Streaming parsing of JSON in Haskell with Pipes.Aeson


The Pipes.Aeson library exposes the following function:

decode :: (Monad m, FromJSON a) => Parser ByteString m (Either DecodingError a)

If I run this parser with evalStateT on a producer of bytes read from a file handle, a single JSON object is read from the file and parsed.

The problem is that the file contains several objects (all of the same type) and I'd like to fold or reduce them as they are read.

Pipes.Parse provides:

foldAll :: Monad m => (x -> a -> x) -> x -> (x -> b) -> Parser a m b

but as you can see this returns a new parser - I can't think of a way of supplying the first parser as an argument.

It looks like a Parser is actually a Producer in a StateT monad transformer. I wondered whether there's a way of extracting the Producer from the StateT so that evalStateT can be applied to the foldAll Parser, and the Producer from the decode Parser.

This is probably completely the wrong approach though.

My question, in short:
When parsing a file using Pipes.Aeson, what's the best way to fold all the objects in the file?


Solution

  • Instead of using decode, you can use the decoded parsing lens from Pipes.Aeson.Unchecked. It turns a producer of ByteString into a producer of parsed JSON values.

    {-# LANGUAGE OverloadedStrings #-}
    
    module Main where
    
    import Pipes
    import qualified Pipes.Prelude as P
    import qualified Pipes.Aeson as A
    import qualified Pipes.Aeson.Unchecked as AU
    import qualified Data.ByteString as B
    
    import Control.Lens (view)
    
    byteProducer :: Monad m => Producer B.ByteString m ()
    byteProducer = yield "1 2 3 4"
    
    intProducer :: Monad m => Producer Int m (Either (A.DecodingError, Producer B.ByteString m ()) ())
    intProducer = view AU.decoded byteProducer
    

    The return value of intProducer is a bit scary, but it only means that intProducer finishes either with a parsing error and the unparsed bytes after the error, or with the return value of the original producer (which is () in our case).

    We can ignore the return value, at the cost of silently discarding any decoding error:

    intProducer' :: Monad m => Producer Int m ()
    intProducer' = intProducer >> return ()
    

    And plug the producer into a fold from Pipes.Prelude, like sum:

    main :: IO ()
    main = do
        total <- P.sum intProducer'
        print total
    

    In ghci:

    λ :main
    10
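
    If the decoding error should not be thrown away, one option is to fold with fold' from Pipes.Prelude, which returns the producer's own return value next to the fold result. The following is only a sketch: it assumes a pipes version that exports fold', and the names brokenBytes and sumWithStatus as well as the malformed input are made up for illustration.

    ```haskell
    {-# LANGUAGE OverloadedStrings #-}

    module Main where

    import Pipes
    import qualified Pipes.Prelude as P
    import qualified Pipes.Aeson as A
    import qualified Pipes.Aeson.Unchecked as AU
    import qualified Data.ByteString as B

    import Control.Lens (view)

    -- Hypothetical input: two valid numbers followed by malformed JSON.
    brokenBytes :: Monad m => Producer B.ByteString m ()
    brokenBytes = yield "1 2 oops"

    -- Sum the values decoded so far and report how decoding ended.
    -- P.fold' accepts a producer with any return type and hands that
    -- return value back together with the fold result.
    sumWithStatus :: IO (Int, Maybe A.DecodingError)
    sumWithStatus = do
        (total, end) <- P.fold' (+) 0 id (view AU.decoded brokenBytes)
        return $ case end of
            Left (err, _leftovers) -> (total, Just err)  -- decoding failed midway
            Right ()               -> (total, Nothing)   -- input fully consumed

    main :: IO ()
    main = do
        (total, failure) <- sumWithStatus
        putStrLn ("sum of decoded values: " ++ show total)
        case failure of
            Nothing  -> putStrLn "input fully decoded"
            Just err -> putStrLn ("decoding stopped early: " ++ show err)
    ```

    The leftover producer in the Left case is ignored here; it could instead be inspected or resumed if the caller wants to skip past the bad input.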
    

    Note also that the functions purely and impurely let you apply folds defined in the foldl package to producers.
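
    As a rough illustration of that last point, the sketch below combines two folds from Control.Foldl and runs them over the decoded values in a single pass with purely and P.fold. The names sumAndCount and foldInts are hypothetical, and void is used only to discard the producer's return value, so decoding errors are ignored here just as in intProducer'.

    ```haskell
    {-# LANGUAGE OverloadedStrings #-}

    module Main where

    import Pipes
    import qualified Pipes.Prelude as P
    import qualified Pipes.Aeson.Unchecked as AU
    import qualified Data.ByteString as B
    import qualified Control.Foldl as L

    import Control.Lens (view)
    import Data.Functor (void)

    byteProducer :: Monad m => Producer B.ByteString m ()
    byteProducer = yield "1 2 3 4"

    -- Folds from the foldl package compose applicatively:
    -- this one computes the sum and the element count together.
    sumAndCount :: L.Fold Int (Int, Int)
    sumAndCount = (,) <$> L.sum <*> L.length

    -- L.purely unpacks the Fold into the step function, initial state and
    -- extraction function that P.fold expects. void drops the decoder's
    -- return value (and with it any decoding error).
    foldInts :: IO (Int, Int)
    foldInts = L.purely P.fold sumAndCount (void (view AU.decoded byteProducer))

    main :: IO ()
    main = foldInts >>= print
    ```

    The same pattern should work with impurely and P.foldM when the fold needs to run effects in the base monad.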