Search code examples
bisonreduce-reduce-conflict

Bison reduce/reduce


I am new to Bison parsing and I cannot understand how it works. I have the following grammar, where I have kept the bare minimum to highlight the problem.

%left '~'
%left '+' 
%token T_VARIABLE
%% 
start: expr;
expr: composite_expr | variable_expr;
variable_expr: T_VARIABLE;
composite_expr:   
     expr '+' expr      
   | '~' variable_expr   { do_something_1(); }
   | '~' composite_expr  { do_something_2(); }
;

%%

As you can see, I want to apply different functions to the '~' operator depending on the kind of expression that follows. However, this produces 2 reduce/reduce conflicts.

Of course, if I rewrite the composite_expr rule like this...

composite_expr:   
     expr '+' expr      
   | '~' expr           { /* ??? */ }
;

...then there are no conflicts, but now I cannot call either do_something_1() or do_something_2() because I can no longer tell if expr is variable_expr or composite_expr.

Is there any other way that I can do this? Can anyone explain why there where reduce/reduce conflicts in the first place?

Keep in mind that this is a stripped down version and, in reality, the rule composite_expr is very long. So duplicating it is out of the question.


Solution

  • The basic problem is that you have an ambiguous grammar, and you're (attempting to) use precedence rules to resolve the ambiguity, but it fails because the ambiguity manifests as a reduce/reduce conflict, and precedence rules are useless in that case. You can get a better idea what is going on by using bison's -v option to get a listing of the state machine it builds, so you can see exactly where and how the conflict manifests.

    In this case, you'll see something like:

    state 8
    
        3 expr: variable_expr .
        6 composite_expr: '~' variable_expr .
    
        $end      reduce using rule 6 (composite_expr)
        '+'       reduce using rule 3 (expr)
        '+'       [reduce using rule 6 (composite_expr)]
        $default  reduce using rule 3 (expr)
    

    which is telling you it doesn't know which rule to reduce on a lookahead of +. Now obviously from your predence rules you want rule 6 (since ~ is higher precedence than +), but the precedence disambiguation rules in bison are a bit of a hack and can't deal with this -- it fails to understand that reducing rule 3 will result in shifting the + before reducing a rule that will consume the ~.

    So what can you do about this? You can accept the existence of the conflict and order your rules so the right thing happens. In this case, you need to move the expr: composite_expr | variable_expr rule to the end (at least to after the composite_expr rules). This is kind of ugly, hard to understand, and even harder to maintain.

    Alternately, you can unfactor things to get rid of the single rules (rules with a single non-terminal and nothing else on the right side -- these are the rules that tend to trigger reduce/reduce problems.) Something like:

    composite_expr:   
         composite_expr '+' composite_expr
       | composite_expr '+' variable_expr
       | variable_expr '+' composite_expr
       | variable_expr '+' variable_expr
       | '~' variable_expr   { do_something_1(); }
       | '~' composite_expr  { do_something_2(); }
    ;
    

    This is unlikely to be practical if there are many composite_expr rules like you describe.

    The best alternative is likely to be to not do this is in the grammar at all, and instead make the choice based on your semantic rules. Something like:

    expr:   
         expr '+' expr      { $$.isComposite = true; }
       | '~' expr           { if ($2.isComposite)
                                  do_something_2();
                              else
                                  do_something_1();
                              $$.isComposite = true; }
       | T_VARIABLE         { $$.isComposite = false; }
    ;
    

    and you setup the %type for expr to be a struct with an extra bool isComposite field along with whatever else you were using it for.