Search code examples
javaparsinggrammarcup

Shift/Reduce conflict found in state in Java Cup


I am trying to write parsing in java cup but I get conflict

start with prog;

prog ::= header;
header ::= t1 t3 t2 | t3 t1 t2 | t2 t3 t1 | t1 t2 t3 | t2 t1 t3 | t3 t2 t1;

t1 ::= TOK1;
t2 ::= TOK2 | TOK2 TOK2 ;
t3 ::= | t3 TOK3;

I got this error;

Warning : *** Shift/Reduce conflict found in state #3
  between t3 ::= (*) 
  and     t1 ::= (*) TOK1 
  under symbol TOK1
  Resolved in favor of shifting.

Can someone explain where I did mistake?


Solution

  • t3 can be empty (it matches a sequence of zero or more TOK3 tokens). That creates an ambiguity, because an input with no TOK3s can match more than one of the alternatives for header.

    This is ambiguous despite the fact that all the possibilities reduce to the same non-terminal, because each production has a different reduction action, and the grammar has no way to know which action to perform. As noted in the conflict report, the parser will shift rather than reduce. For example, if the first input is TOK1, the parser could assume the input will match t1 t2 t3 or t1 t3 t2, in which case the sequence of TOK3s will come later, or it could assume the input is t3 t1 t2 with an empty t3, in which it must do the action for an empty t3 before proceeding with the parse. Choosing to shift means rejecting the possibility of an empty t3 at the beginning of the input.

    In this particular case, the default shift is probably not going to cause problems. It means that an input consisting only of TOK1 and TOK2 will always be parsed with an empty t3 at the end, but if that's OK, then you can feel free to ignore the warning.

    So-called "nullable non-terminals" are often the cause of shift-reduce conflicts precisely of this form. It's often useful to avoid them; instead of making possibly empty lists, a better solution is often to make the list non-terminal match one or more elements, and then make it optional by adding productions in which the list is absent. In other words, instead of

    header: t3 t1 t2
    t3    :  
          | t3 TOK3
    

    use

    header: t3 t1 t2
          |    t1 t2
    t3    : TOK3
          | t3 TOK3
    

    That solution can lead to undesirable code duplication, but it's an easy way to solve shift-reduce conflicts due to nullable non-terminals.