
Implementation differences between parser combinators and packrat algorithm


To get a better understanding of packrat, I've had a look at the implementation provided with the paper (I'm focusing on the bind):

instance Derivs d => Monad (Parser d) where 

  -- Sequencing combinator
  (Parser p1) >>= f = Parser parse

    where parse dvs = first (p1 dvs)

          first (Parsed val rem err) = 
            let Parser p2 = f val
            in second err (p2 rem)
          first (NoParse err) = NoParse err

          second err1 (Parsed val rem err) =
            Parsed val rem (joinErrors err1 err)
          second err1 (NoParse err) =
            NoParse (joinErrors err1 err)

  -- Result-producing combinator
  return x = Parser (\dvs -> Parsed x dvs (nullError dvs))

  -- Failure combinator
  fail [] = Parser (\dvs -> NoParse (nullError dvs))
  fail msg = Parser (\dvs -> NoParse (msgError (dvPos dvs) msg))

To me it looks similar (error handling aside) to an ordinary parser combinator, such as this simplified version of Parsec:

bind :: Parser a -> (a -> Parser b) -> Parser b
bind p f = Parser $ \s -> concatMap (\(a, s') -> parse (f a) s') $ parse p s
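For reference, that snippet can be completed into a runnable "list of successes" sketch (the `newtype`, `item`, `unit`, and `two` below are my own illustrative assumptions, not Parsec's actual definitions):

```haskell
-- A minimal, self-contained "list of successes" parser in the same style.
newtype Parser a = Parser { parse :: String -> [(a, String)] }

-- Consume one character, if any.
item :: Parser Char
item = Parser $ \s -> case s of
  (c:cs) -> [(c, cs)]
  []     -> []

-- The bind from above, unchanged: run p, then feed each result to f.
bind :: Parser a -> (a -> Parser b) -> Parser b
bind p f = Parser $ \s -> concatMap (\(a, s') -> parse (f a) s') $ parse p s

-- Succeed without consuming input.
unit :: a -> Parser a
unit x = Parser $ \s -> [(x, s)]

-- Sequence two characters.
two :: Parser (Char, Char)
two = item `bind` \a -> item `bind` \b -> unit (a, b)

main :: IO ()
main = print (parse two "abc")  -- [(('a','b'),"c")]
```

Note that each call to `bind` reparses from the remaining input `s'` every time it runs; nothing is cached anywhere.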

I'm quite confused because, before this, I thought the big difference was that packrat was a parser generator with a memoization step. Sadly, this concept does not seem to be used in this implementation.

What is the big difference between parser combinators and packrat at implementation level?

PS: I have also had a look at Papillon, but it seems very different from the implementation that comes with the paper.


Solution

  • The point here is really that this packrat parser combinator library is not a full implementation of the packrat algorithm, but rather a set of definitions that can be reused between different packrat parsers.

    The real trick of the packrat algorithm (namely the memoization of parse results) happens elsewhere. Look at the following code (taken from Ford's thesis):

    data Derivs = Derivs {
                       dvAdditive :: Result Int,
                       dvMultitive :: Result Int,
                       dvPrimary :: Result Int,
                       dvDecimal :: Result Int,
                       dvChar :: Result Char}
    
    
    pExpression :: Derivs -> Result ArithDerivs Int
    Parser pExpression = (do char '('
                             l <- Parser dvExpression
                             char '+'
                             r <- Parser dvExpression
                             char ')'
                             return (l + r))
                         </> (do Parser dvDecimal)
    

    Here, it's important to notice that the recursive call of the expression parser to itself is broken (in a kind of open-recursion fashion) by simply projecting the appropriate component of the Derivs structure.

    This recursive knot is then tied in the "recursive tie-up function" (again taken from Ford's thesis):

    parse :: String -> Derivs
    parse s = d where
      d = Derivs add mult prim dec chr
      add = pAdditive d
      mult = pMultitive d
      prim = pPrimary d
      dec = pDecimal d
      chr = case s of
              (c:s') -> Parsed c (parse s')
              [] -> NoParse
    

    These snippets are really where the packrat trick happens. It is important to understand that this trick cannot be implemented in a standard way in a traditional parser combinator library (at least in a pure programming language like Haskell), because it needs to know the recursive structure of the grammar. There are experimental parser combinator libraries that use an explicit representation of the grammar's recursive structure, and there it is possible to provide a standard implementation of packrat. For example, my own grammar-combinators library (not maintained at the moment, sorry) offers an implementation of packrat.
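    To make the memoization concrete, the thesis snippets can be assembled into a complete runnable parser for Ford's arithmetic grammar. This is a sketch under simplifying assumptions: error handling is dropped (so `Result` has only two constructors), and `eval` ignores leftover input.

    ```haskell
    -- Minimal packrat parser for the arithmetic grammar from Ford's thesis.
    data Result v = Parsed v Derivs | NoParse

    -- One Derivs node per input position; each field caches the result of
    -- one nonterminal at that position.
    data Derivs = Derivs
      { dvAdditive  :: Result Int
      , dvMultitive :: Result Int
      , dvPrimary   :: Result Int
      , dvDecimal   :: Result Int
      , dvChar      :: Result Char
      }

    -- The recursive tie-up: thanks to laziness, each field is computed at
    -- most once per position, which is exactly the packrat memoization.
    parse :: String -> Derivs
    parse s = d where
      d    = Derivs add mult prim dec chr
      add  = pAdditive d
      mult = pMultitive d
      prim = pPrimary d
      dec  = pDecimal d
      chr  = case s of
               (c:s') -> Parsed c (parse s')
               []     -> NoParse

    -- Additive <- Multitive '+' Additive / Multitive
    pAdditive :: Derivs -> Result Int
    pAdditive d =
      case dvMultitive d of
        Parsed l d1 ->
          case dvChar d1 of
            Parsed '+' d2 ->
              case dvAdditive d2 of
                Parsed r d3 -> Parsed (l + r) d3
                _           -> dvMultitive d   -- reuse the cached result
            _ -> dvMultitive d
        NoParse -> NoParse

    -- Multitive <- Primary '*' Multitive / Primary
    pMultitive :: Derivs -> Result Int
    pMultitive d =
      case dvPrimary d of
        Parsed l d1 ->
          case dvChar d1 of
            Parsed '*' d2 ->
              case dvMultitive d2 of
                Parsed r d3 -> Parsed (l * r) d3
                _           -> dvPrimary d
            _ -> dvPrimary d
        NoParse -> NoParse

    -- Primary <- '(' Additive ')' / Decimal
    pPrimary :: Derivs -> Result Int
    pPrimary d =
      case dvChar d of
        Parsed '(' d1 ->
          case dvAdditive d1 of
            Parsed n d2 ->
              case dvChar d2 of
                Parsed ')' d3 -> Parsed n d3
                _             -> dvDecimal d
            _ -> dvDecimal d
        _ -> dvDecimal d

    -- Decimal <- [0-9]
    pDecimal :: Derivs -> Result Int
    pDecimal d =
      case dvChar d of
        Parsed c d1 | c >= '0' && c <= '9' ->
          Parsed (fromEnum c - fromEnum '0') d1
        _ -> NoParse

    eval :: String -> Maybe Int
    eval s = case dvAdditive (parse s) of
               Parsed v _ -> Just v
               NoParse    -> Nothing

    main :: IO ()
    main = print (eval "2*(3+4)")  -- Just 14
    ```

    Notice that no parsing function ever calls another parsing function directly; everything goes through the `Derivs` fields, so a result for a given nonterminal at a given position is shared between all the alternatives that need it.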