Search code examples
parsinghaskellmonadslexerfree-monad

Implementing a lexer using the Free Monad


I am thinking about a use case of the free monad which would be a simple lexing DSL. So far I came up with some primitive operations:

data LexF r where
  POP  :: (Char -> r) -> LexF r
  PEEK :: (Char -> r) -> LexF r
  FAIL :: LexF r
  ...

instance Functor LexF where
  ...

type Lex = Free LexF

The problem I encounter is that I would like to have a CHOICE primitive that would describe an operation of trying to execute one parser and in case of failure fallback to another. Something like CHOICE :: LexF r -> LexF r -> (r -> r) -> LexF r...

...and here the stairs begin. Since r is preset at contravariant position, it is impossible (is it?) to create a valid Functor instance for Op. I came up with some other idea, which was to generalize over the type of alternative lexers, so CHOICE :: LexF a -> LexF a -> (a -> r) -> LexF r – now it works as a Functor, though the problem is with thawing it into Free, as I would normally do it with liftF:

choice :: OpF a -> OpF a -> OpF a
choice c1 c2 = liftF $ CHOICE _ _ id  -- how to fill the holes :: Op a ?

I am really running out of any ideas. This of course generalizes to nearly all other combinators, I just find CHOICE a good minimal case. How to tackle it? I am okay to hear that this example is totally broken and it just won't work with Free as I would like to. But therefore, does it even make sense to write lexers/parsers in this manner?


Solution

  • As a general rule when working with free monads, you don't want to introduce primitives for "monadic control". For example, a SEQUENCE primitive would be ill-advised, because the free monad itself provides sequencing. Likewise, a CHOICE primitive is ill-advised because this should be provided by a free MonadPlus.

    Now, there is no free MonadPlus in modern versions of free because equivalent functionality is provided by a free monad transformer over a list base monad, namely FreeT f []. So, what you probably want is to define:

    data LexF r where
      POP  :: (Char -> r) -> LexF r
      PEEK :: (Char -> r) -> LexF r
    deriving instance Functor LexF
    type Lex = FreeT LexF []
    
    pop :: (Char -> a) -> Lex a
    pop f = liftF $ POP f
    
    peek :: (Char -> a) -> Lex a
    peek f = liftF $ PEEK f
    

    but no FAIL or CHOICE primitives.

    If you were to define fail and choice combinators, they would be defined by means of the list base monad using transformer magic:

    fail :: Lex a
    fail = empty
    
    choice :: Lex a -> Lex a -> Lex a
    choice = (<|>)
    

    though there's no actual reason to define these.

    SPOILERS follow... Anyway, you can now write things like:

    anyChar :: Lex Char
    anyChar = pop id
    
    char :: Char -> Lex Char
    char c = do
      c' <- anyChar
      guard $ c == c'
      return c'
    
    a_or_b :: Lex Char
    a_or_b = char 'a' <|> char 'b'
    

    With an interpreter for your monad primitives, in this case intrepreting them to the StateT String [] AKA String -> [(a,String)] monad:

    type Parser = StateT String []
    runLex :: Lex a -> Parser a
    runLex = iterTM go
      where go :: LexF (Parser a) -> Parser a
            go (POP f) = StateT pop' >>= f
              where pop' (c:cs) = [(c,cs)]
                    pop' _      = []
            go (PEEK f) = StateT peek' >>= f
              where peek' (c:cs) = [(c,c:cs)]
                    peek' _      = []
    parse :: Lex a -> String -> [(a, String)]
    parse = runStateT . runLex
    

    you can then:

    main :: IO ()
    main = do
      let test = parse a_or_b
      print $ test "abc"
      print $ test "bca"
      print $ test "cde"
    

    The full example:

    {-# LANGUAGE DeriveFunctor #-}
    {-# LANGUAGE StandaloneDeriving #-}
    {-# LANGUAGE GADTs #-}
    {-# OPTIONS_GHC -Wall #-}
    
    import Control.Monad.State
    import Control.Applicative
    import Control.Monad.Trans.Free
    
    data LexF r where
      POP  :: (Char -> r) -> LexF r
      PEEK :: (Char -> r) -> LexF r
    deriving instance Functor LexF
    type Lex = FreeT LexF []
    
    pop :: (Char -> a) -> Lex a
    pop f = liftF $ POP f
    
    peek :: (Char -> a) -> Lex a
    peek f = liftF $ PEEK f
    
    anyChar :: Lex Char
    anyChar = pop id
    
    char :: Char -> Lex Char
    char c = do
      c' <- anyChar
      guard $ c == c'
      return c'
    
    a_or_b :: Lex Char
    a_or_b = char 'a' <|> char 'b'
    
    type Parser = StateT String []
    runLex :: Lex a -> Parser a
    runLex = iterTM go
      where go :: LexF (Parser a) -> Parser a
            go (POP f) = StateT pop' >>= f
              where pop' (c:cs) = [(c,cs)]
                    pop' _      = []
            go (PEEK f) = StateT peek' >>= f
              where peek' (c:cs) = [(c,c:cs)]
                    peek' _      = []
    parse :: Lex a -> String -> [(a, String)]
    parse = runStateT . runLex
    
    main :: IO ()
    main = do
      let test = parse a_or_b
      print $ test "abc"
      print $ test "bca"
      print $ test "cde"