Search code examples
parsingrecursioncompiler-constructiongrammar

Indirect Recursion in Grammar


I'm trying to remove the indirect recursion from the following grammar:

F -> P X
X -> R C
R -> C R | I R | epsilon

All capital letters are non-terminals, I have just left off the productions as they aren't important.

Following this through, you can see that I will get something that looks like F -> P (C | I)* C which is causing issues for my recursive descent parser.

Any valid expression must end in C but this final C is always consumed by repeatedly using the R production leaving no tokens left for final C of the X -> R C production.

Take the following expression of tokens as an example: "P C I C"

  • We use the F -> P X production to remove the first P leaving "C I C"
  • Then X -> R C -> C R C so we can consume the C leaving "I C"
  • Then R C -> I R C so we can consume the I leaving "C"

Now we hit the problem. A recursive decent parser will simply choose to continue expanding the R like so:

  • R C -> C R C so we consume the final character C leaving no remaining tokens!

At this point, my program will error despite the valid input because the final R C -> C (using the epsilon production) and there is no remaining C to match!

I think all I need to do is cleverly rearrange the grammar in some way to remove this ambiguity. Would appreciate any help!


Solution

  • You can rewrite P (C | I)* C as P I* C+ (I+ C+)*, which is straightforward to implement as a recursive-descent parser. If you write it out as a right-recursive BNF grammar, you'll end up with incorrect associativity, but normally you would write the recursive descent parser with loops, using the extended BNF as above. [Note 1]

    In effect, once you parse the P, you end up with something like this:

    do {
      while (try_parse(I)) {}
      parse(C);
      while (try_parse(C)) {}
    } while (try_parse(I));
    

    Notes

    1. Or, at least, I would, if I wrote recursive descent parsers any more. This is much more straight-forward as a left-recursive grammar and it's LALR(1) so you can generate a parser with bison.