Search code examples
haskellmonadsstring-parsingfree-monad

Parsing to Free Monads


Say I have the following free monad:

data ExampleF a
  = Foo Int a
  | Bar String (Int -> a)
  deriving Functor

type Example = Free ExampleF  -- this is the free monad want to discuss

I know how I can work with this monad, eg. I could write some nice helpers:

foo :: Int -> Example ()
foo i = liftF $ Foo i ()

bar :: String -> Example Int
bar s = liftF $ Bar s id

So I can write programs in haskell like:

fooThenBar :: Example Int
fooThenBar =
  do
    foo 10
    bar "nice"

I know how to print it, interpret it, etc. But what about parsing it?

Would it be possible to write a parser that could parse arbitrary programs like:

foo 12
bar nice
foo 11
foo 42

So I can store them, serialize them, use them in cli programs etc.

The problem I keep running into is that the type of the program depends on which program is being parsed. If the program ends with a foo it's of type Example () if it ends with a bar it's of type Example Int.

I do not feel like writing parsers for every possible permutation (it's simple here because there are only two possibilities, but imagine we add Baz Int (String -> a), Doo (Int -> a), Moz Int a, Foz String a, .... This get's tedious and error-prone).

Perhaps I'm solving the wrong problem?

Boilerplate

To run the above examples, you need to add this to the beginning of the file:

{-# LANGUAGE DeriveFunctor #-}

import Control.Monad.Free
import Text.ParserCombinators.Parsec

Note: I put up a gist containing this code.


Solution

  • Not every Example value can be represented on the page without reimplementing some portion of Haskell. For example, return putStrLn has a type of Example (String -> IO ()), but I don't think it makes sense to attempt to parse that sort of Example value out of a file.

    So let's restrict ourselves to parsing the examples you've given, which consist only of calls to foo and bar sequenced with >> (that is, no variable bindings and no arbitrary computations)*. The Backus-Naur form for our grammar looks approximately like this:

    <program> ::= "" | <expr> "\n" <program>
    <expr> ::= "foo " <integer> | "bar " <string>
    

    It's straightforward enough to parse our two types of expression...

    type Parser = Parsec String ()
    
    int :: Parser Int
    int = fmap read (many1 digit)
    
    parseFoo :: Parser (Example ())
    parseFoo = string "foo " *> fmap foo int
    
    parseBar :: Parser (Example Int)
    parseBar = string "bar " *> fmap bar (many1 alphaNum)
    

    ... but how can we give a type to the composition of these two parsers?

    parseExpr :: Parser (Example ???)
    parseExpr = parseFoo <|> parseBar
    

    parseFoo and parseBar have different types, so we can't compose them with <|> :: Alternative f => f a -> f a -> f a. Moreover, there's no way to know ahead of time which type the program we're given will be: as you point out, the type of the parsed program depends on the value of the input string. "Types depending on values" is called dependent types; Haskell doesn't feature a proper dependent type system, but it comes close enough for us to have a stab at making this example work.


    Let's start by forcing the expressions on either side of <|> to have the same type. This involves erasing Example's type parameter using existential quantification.†

    data Ex a = forall i. Wrap (a i)
    
    parseExpr :: Parser (Ex Example)
    parseExpr = fmap Wrap parseFoo <|> fmap Wrap parseBar
    

    This typechecks, but the parser now returns an Example containing a value of an unknown type. A value of unknown type is of course useless - but we do know something about Example's parameter: it must be either () or Int because those are the return types of parseFoo and parseBar. Programming is about getting knowledge out of your brain and onto the page, so we're going to wrap up the Example value with a bit of GADT evidence which, when unwrapped, will tell you whether a was Int or ().

    data Ty a where
        IntTy :: Ty Int
        UnitTy :: Ty ()
    
    data (a :*: b) i = a i :&: b i
    
    type Sig a b = Ex (a :*: b)
    pattern Sig x y = Wrap (x :&: y)
    
    parseExpr :: Parser (Sig Ty Example)
    parseExpr = fmap (\x -> Sig UnitTy x) parseFoo <|>
                fmap (\x -> Sig IntTy x) parseBar
    

    Ty is (something like) a runtime "singleton" representative of Example's type parameter. When you pattern match on IntTy, you learn that a ~ Int; when you pattern match on UnitTy you learn that a ~ (). (Information can be made to flow the other way, from types to values, using classes.) :*:, the functor product, pairs up two type constructors ensuring that their parameters are equal; thus, pattern matching on the Ty tells you about its accompanying Example.

    Sig is therefore called a dependent pair or sigma type - the type of the second component of the pair depends on the value of the first. This is a common technique: when you erase a type parameter by existential quantification, it usually pays to make it recoverable by bundling up a runtime representative of that parameter.

    Note that this use of Sig is equivalent to Either (Example Int) (Example ()) - a sigma type is a sum, after all - but this version scales better when you're summing over a large (or possibly infinite) set.

    Now it's easy to build our expression parser into a program parser. We just have to repeatedly apply the expression parser, and then manipulate the dependent pairs in the list.

    parseProgram :: Parser (Sig Ty Example)
    parseProgram = fmap (foldr1 combine) $ parseExpr `sepBy1` (char '\n')
        where combine (Sig _ val) (Sig ty acc) = Sig ty (val >> acc)
    

    The code I've shown you is not exemplary. It doesn't separate the concerns of parsing and typechecking. In production code I would modularise this design by first parsing the data into an untyped syntax tree - a separate data type which doesn't enforce the typing invariant - then transform that into a typed version by type-checking it. The dependent pair technique would still be necessary to give a type to the output of the type-checker, but it wouldn't be tangled up in the parser.

    *If binding is not a requirement, have you thought about using a free applicative to represent your data?

    Ex and :*: are reusable bits of machinery which I lifted from the Hasochism paper