parsing, grammar, lalr

How to fix a grammar with an optional non-terminal?


I wrote a grammar for an LALR parser and I am stuck on an optional non-terminal. Consider, for example, the C++ dereference, where you can write:

******expression; 

Of course, you can also write:

expression;

And here is my problem: the dereference non-terminal is really optional, and this affects the grammar so much that the parser now sees it fitting (almost) everywhere, because, well, it might be empty.

Is there a common pattern for rewriting the grammar to fix this?

I would also be grateful for pointers to a book or other resources that deal with "common problems & patterns when writing grammars".


Solution

  • First of all, the problem you are having is not the one you are claiming to have. Having a nullable (possibly empty) nonterminal does not mean that the parser will try to stick it everywhere. (I use the term “nullable” here to avoid confusion, because “optional” might refer to an optional occurrence of a nonterminal, as in x?, where x is the nonterminal name.) It just means that wherever you use that nonterminal in your grammar, the parser may skip over it, that is, match it against the empty word; the details depend on the rules of the particular parsing algorithm, in your case LALR.

    Secondly, the problem is most probably that the resulting grammar is ambiguous. My guess is that you combined a right-recursive definition of the nonterminal for the stars with an asterisk that also serves as the binary multiplication operator. (Feel free to update the question with a grammar fragment; then I might be able to offer more detailed help.)

    Thirdly, and mainly concerning your request for general problems and patterns in grammars: people usually do not put the stars in one nonterminal and the expression in another. Ultimately you will want to transform your parse tree into an abstract syntax tree on which you probably intend to perform some calculations, and for that you would prefer a construction that says “dereference of a dereference of a dereference of an expression” rather than “three stars followed by an expression”. Again, the answer would be less vague if you provided more details.
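
To make the third point concrete, here is a minimal sketch of the "dereference of a dereference" construction. It is only an illustration, not the asker's grammar (which was never posted): the rule names `expr`, `unary`, and `primary`, and the node classes `Name`, `Deref`, and `Mul`, are all made up for this example. It is also a hand-written recursive-descent parser in Python rather than an LALR table; the grammar in the comments is roughly what one might hand to an LALR generator. The idea is that the star is a prefix operator inside a recursive rule, so no nullable "stars" nonterminal is needed.

```python
import re
from dataclasses import dataclass

# Hypothetical AST node types, invented for this sketch.
@dataclass
class Name:
    ident: str

@dataclass
class Deref:
    operand: object

@dataclass
class Mul:
    left: object
    right: object

def tokenize(text):
    # Keeps identifiers, '*', and parentheses; everything else
    # (whitespace, ';', ...) is silently dropped in this sketch.
    return re.findall(r"[A-Za-z_]\w*|[*()]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_expr(self):
        # expr : expr '*' unary | unary   (left recursion written as a loop)
        node = self.parse_unary()
        while self.peek() == "*":
            self.take()
            node = Mul(node, self.parse_unary())
        return node

    def parse_unary(self):
        # unary : '*' unary | primary
        # Each star wraps the rest, so "***p" nests three Deref nodes.
        if self.peek() == "*":
            self.take()
            return Deref(self.parse_unary())
        return self.parse_primary()

    def parse_primary(self):
        # primary : IDENT | '(' expr ')'
        tok = self.take()
        if tok == "(":
            node = self.parse_expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or not (tok[0].isalpha() or tok[0] == "_"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return Name(tok)

def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.parse_expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

With these definitions, `parse("***p")` yields `Deref(Deref(Deref(Name("p"))))`, a plain `parse("p")` yields `Name("p")`, and `parse("a * *b")` yields `Mul(Name("a"), Deref(Name("b")))`. A star at the start of an operand is a dereference and a star after a complete operand is a multiplication, so the two uses do not compete for the same position.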