I'm trying to remove the indirect recursion from the following grammar:
F -> P X
X -> R C
R -> C R | I R | epsilon
All capital letters are non-terminals, I have just left off the productions as they aren't important.
Following this through, you can see that I will get something that looks like F -> P (C | I)* C which is causing issues for my recursive descent parser.
Any valid expression must end in C but this final C is always consumed by repeatedly using the R production leaving no tokens left for final C of the X -> R C production.
Take the following expression of tokens as an example: "P C I C"
Now we hit the problem. A recursive decent parser will simply choose to continue expanding the R like so:
At this point, my program will error despite the valid input because the final R C -> C (using the epsilon production) and there is no remaining C to match!
I think all I need to do is cleverly rearrange the grammar in some way to remove this ambiguity. Would appreciate any help!
You can rewrite P (C | I)* C
as P I* C+ (I+ C+)*
, which is straightforward to implement as a recursive-descent parser. If you write it out as a right-recursive BNF grammar, you'll end up with incorrect associativity, but normally you would write the recursive descent parser with loops, using the extended BNF as above. [Note 1]
In effect, once you parse the P
, you end up with something like this:
do {
while (try_parse(I)) {}
parse(C);
while (try_parse(C)) {}
} while (try_parse(I));