Search code examples
grammarbisonyaccparser-generator

yacc/bison grammar: whats the difference between two rules?


I have to write some sort of parser and it's quite easy with tools like yacc and bison. But I have the following question:

NotRec: /*empty*/
| NotRec T_NOT
| NotRec T_MINUS
;

Expr:
| Term /*1*/
| NotRec Term /*2*/
;

What is the difference between rule 1 and 2?

In my opinion NotRec can be empty (because it has an empty branch) and therefore Term should be the same as NotRec Term. But if I remove the first rule I get different results!


Solution

  • As written, the grammar is ambiguous, as NotRec will match 0 or more T_NOT or T_MINUS tokens, so if you have an Expr which has no such tokens before the Term, it can be matched by either rule 1 or rule 2.

    If you remove the NotRec: /*empty*/ rule, then NotRec becomes useless as it won't match any finite sequence of tokens. This changes the language, removing any finite strings of T_NOT/T_MINUS.

    If you remove the Expr: Term rule, that gets rid of the ambiguity without changing the langauge.

    If you use this grammar as-is with yacc or bison, you'll get a shift/reduce conflict due to the ambiguity. The default conflict resolution of using the shift will resolve the ambiguity without changing the language -- as long as there are no other conflicting uses of those tokens in the part of the grammar you left out. It will use rule 1 for any Expr with no T_NOT/T_MINUS instructions and rule 2 for any Expr with one or more such tokens. This is equivalent to changing the NotRec rules to

    NotRec: T_NOT
          | T_MINUS
          | NotRec T_NOT
          | NotRec T_MINUS
          ;
    

    This makes NotRec match one or more tokens instead of zero or more.