Search code examples
parsinghaskellmegaparsec

Left and right recursive parser


This is an evolution of this question.

I need to parse with megaparsec a data structure like

data Foo =
    Simple String
    Dotted Foo String
    Paren String Foo

and I would like to parse to it strings like

foo ::= alphanum
    | foo "." alphanum
    | alphanum "(" foo ")"

For example a the string "a(b.c).d" should be parsed to Dotted (Paren "a" (Dotted (Simple "b") "c")) "d".

The problem I have is that this is at the same time left and right recursive.

I have no problems writing the parsers for the first and the third case:

parser :: Parser Foo
parser 
    = try (do
        prefix <- alphanum
        constant "("
        content <- parser
        constant ")"
        pure $ Paren prefix content
        )
    <|> Simple alphanum

but I'm not able to put together also the parser for the second case. I tried to approach it with sepBy1 or with makeExprParser but I couldn't get it right


Solution

  • To factor out the left recursion in this:

    foo ::= alphanum
        | foo "." alphanum
        | alphanum "(" foo ")"
    

    You can start by rewriting it to this:

    foo ::= alphanum ("(" foo ")")?
          | foo "." alphanum
    

    Then you can factor out the left recursion using the standard trick of replacing:

    x ::= x y | z
    

    With:

    x ::= z x'
    
    x' ::= y x' | ∅
    

    In other words:

    x ::= z y*
    

    With x = foo, y = "." alphanum, and z = alphanum ("(" foo ")")?, that becomes:

    foo ::= alphanum ("(" foo ")")? ("." alphanum)*
    

    Then I believe your parser can just be something like this, since ? ~ zero or one ~ Maybe ~ optional and * ~ zero or more ~ [] ~ many:

    parser = do
      prefix <- Simple <$> alphanum
      maybeParens <- optional (constant "(" *> parser <* constant ")")
      suffixes <- many (constant "." *> alphanum)
      let
        prefix' = case maybeParens of
          Just content -> Paren prefix content
          Nothing -> prefix
      pure $ foldl' Dotted prefix' suffixes