Remove shift/reduce conflicts with dot expression and arrays

Sorry, I am new to Bison. I don't understand the problem or how to fix it. I would appreciate it if you could "teach me how to fish" while pointing out the problem and the solution:

%token TOKEN_ADDROF TOKEN_INTEGER TOKEN_IDENTIFIER
%left '.' '+'
%right '(' '['

%%
OptionalExpressions
  : { $$ = nullptr; }
  | Expressions
  ;

Expressions
  : Expression
  | Expressions ',' Expression
  ;

Expression
  : Expression '+' Expression
  | ExpressionDot
  | TOKEN_ADDROF Expression
  | ExpressionIndexable
  | ExpressionFunctionCall
  | TOKEN_INTEGER
  | TOKEN_IDENTIFIER
  | '(' Expression ')'
  ;

ExpressionDot
  : Expression '.' Expression
  ;

ExpressionIndexable
  : Expression '[' Expression ']'
  ;

ExpressionFunctionCall
  : Expression '(' OptionalExpressions ')'
  ;
%%

Thank you.

Solution

Before your next fishing expedition, you should carefully read the section of the Bison manual on precedence declarations, and perhaps some of the related SO answers. The Bison manual also has some very useful information about understanding conflicts and the tools Bison provides to help you. Here, I'm basically following the procedure the manual describes for understanding conflicts.

The first step is to ask Bison to generate a report of the parser's states, which merely requires giving it the -v option (or --report=all if you want more information, which is occasionally useful). The first line in the resulting .output file tells you which states have shift/reduce conflicts:

State 12 conflicts: 4 shift/reduce

So the next step is to take a look at State 12. The conflicts are indicated by parser actions in brackets. (The bracketed actions are the ones Bison eliminated using its default resolution rule, which prefers shifting; that is also described in the manual.)


    State 12

        5 Expression: Expression . '+' Expression
        7           | TOKEN_ADDROF Expression .
       13 ExpressionDot: Expression . '.' Expression
       14 ExpressionIndexable: Expression . '[' Expression ']'
       15 ExpressionFunctionCall: Expression . '(' OptionalExpressions ')'

        '.'  shift, and go to state 15
        '+'  shift, and go to state 16
        '('  shift, and go to state 17
        '['  shift, and go to state 18

        '.'       [reduce using rule 7 (Expression)]
        '+'       [reduce using rule 7 (Expression)]
        '('       [reduce using rule 7 (Expression)]
        '['       [reduce using rule 7 (Expression)]
        $default  reduce using rule 7 (Expression)

So in this state, Bison was not able to apply any precedence rule to decide what to do when the reduction by rule 7 is applicable. Rule 7 is conveniently reproduced in the report:

    7           | TOKEN_ADDROF Expression .

By default, a rule takes its precedence from the last terminal in its right-hand side, so the precedence of rule 7 is the precedence of the TOKEN_ADDROF terminal. But TOKEN_ADDROF has no precedence, because it does not appear in any precedence declaration.
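To make the "last terminal" rule concrete, here is a small C++ model of it. It is a sketch under stated assumptions, not Bison's actual code: Symbol and rule_precedence are hypothetical names, precedence levels are numbered from 1 for the first declaration line (higher binds tighter), 0 stands for "no precedence declared", and the optional second argument stands in for a %prec annotation, which overrides the default.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical model of a grammar symbol. For terminals, `level` is the
// precedence level from the %left/%right/%precedence lines, numbered from 1
// for the first line (higher binds tighter); 0 means "no precedence declared".
struct Symbol {
    std::string_view name;
    bool is_terminal;
    int level;
};

// A rule's precedence is the level of the last terminal in its right-hand
// side, unless the rule has a %prec annotation (modelled by prec_override).
template <std::size_t N>
constexpr int rule_precedence(const Symbol (&rhs)[N],
                              std::optional<int> prec_override = std::nullopt) {
    if (prec_override) return *prec_override;
    for (std::size_t i = N; i > 0; --i) {
        if (rhs[i - 1].is_terminal) return rhs[i - 1].level;
    }
    return 0;  // no terminal in the rule, so no precedence
}

constexpr Symbol kExpression{"Expression", false, 0};

// Rule 7 with the original declarations: TOKEN_ADDROF is in no level.
constexpr Symbol kAddrofUndeclared{"TOKEN_ADDROF", true, 0};
constexpr Symbol kRule7[] = {kAddrofUndeclared, kExpression};
static_assert(rule_precedence(kRule7) == 0);

// Rule 7 if TOKEN_ADDROF is declared on the second precedence line.
constexpr Symbol kAddrofDeclared{"TOKEN_ADDROF", true, 2};
constexpr Symbol kRule7Declared[] = {kAddrofDeclared, kExpression};
static_assert(rule_precedence(kRule7Declared) == 2);

// A %prec annotation (here a made-up level 4) wins over the last terminal.
static_assert(rule_precedence(kRule7, 4) == 4);
```

With the original declarations, rule 7 comes out at level 0, so there is nothing to compare the lookahead tokens against.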

We can try adding TOKEN_ADDROF to a precedence level:

%left '.' '+'
%precedence TOKEN_ADDROF
%precedence '(' '['

And, hey presto!
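The comparison Bison then performs in State 12 can also be written down as a small C++ model. Again, this is a sketch rather than Bison's source: the enums and resolve() are hypothetical, and the levels are assumed to be numbered 1 to 3 from the top of the new declarations ('.' and '+' on level 1, TOKEN_ADDROF on level 2, '(' and '[' on level 3), so rule 7 now has level 2.

```cpp
// Hypothetical model of how a shift/reduce conflict is settled by precedence.
// Levels are numbered from 1 for the first declaration line (higher binds
// tighter); 0 means "no precedence declared".
enum class Assoc { Left, Right, Nonassoc, Precedence };
enum class Action { Shift, Reduce, Error, Unresolved };

// rule_level:  precedence of the rule that could be reduced
// token_level: precedence of the lookahead token
// assoc:       associativity of the token's level (used only on a tie)
constexpr Action resolve(int rule_level, int token_level, Assoc assoc) {
    // Nothing to compare: Bison reports the conflict and defaults to shift.
    if (rule_level == 0 || token_level == 0) return Action::Unresolved;
    if (token_level > rule_level) return Action::Shift;
    if (token_level < rule_level) return Action::Reduce;
    switch (assoc) {  // same level: associativity decides
        case Assoc::Left:       return Action::Reduce;
        case Assoc::Right:      return Action::Shift;
        case Assoc::Nonassoc:   return Action::Error;  // a op b op c is an error
        case Assoc::Precedence: break;                 // no associativity given
    }
    return Action::Unresolved;  // %precedence leaves the conflict reported
}

// State 12, reducing rule 7 (level 2 after the fix):
static_assert(resolve(2, 1, Assoc::Left) == Action::Reduce);       // '.' and '+'
static_assert(resolve(2, 3, Assoc::Precedence) == Action::Shift);  // '(' and '['
// Before the fix, rule 7 had no level, so all four stayed conflicts:
static_assert(resolve(0, 1, Assoc::Left) == Action::Unresolved);   // '.' and '+'
static_assert(resolve(0, 2, Assoc::Right) == Action::Unresolved);  // '(' and '['
```

The last branch is the point of %precedence: on a tie it picks neither side, so an unintended tie still shows up as a conflict in the report instead of being resolved silently.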

It would be fair to ask why I put TOKEN_ADDROF where I put it, and why I used %precedence instead of %left or %right, both for it and for the other unary operators.

To start with the second question, %precedence means "this precedence level involves operators which cannot have a conflict resolved with associativity, so I'm not going to declare any particular associativity."

And that's true in this case: unary operators have no associativity. Compare a binary operator: the - in 3-4-7 could associate to the left ((3-4)-7) or to the right (3-(4-7)). The resolution is made by comparing the precedence of the production Expression: Expression '-' Expression, which is the precedence of '-', with that of the lookahead token '-'. Those are on the same level, so associativity has to decide, and that situation obviously can arise. By contrast, consider the TOKEN_ADDROF operator. (Is that not &, by the way? If so, just write it as the character token '&'.) Here, the relevant production, as we've already seen, is

Expression: TOKEN_ADDROF Expression

Now, what if the lookahead token is TOKEN_ADDROF? Answer: it's a syntax error, because TOKEN_ADDROF is not a binary operator, so it cannot follow an expression. (It could be that you have a binary operator with the same spelling. But in that case, you would have put %prec UNOP on the above production, where UNOP is a pseudo-token that appears only in a precedence declaration. Then the lookahead token could never be UNOP, because the lexical scanner never produces that token.) So there's no production which allows a shift, and thus no conflict.

A similar line of reasoning applies to postfix operators, like function application and subscripting. (And post-increment and post-decrement, if applicable.) In those cases, a following postfix operator is possible, but it cannot be shifted until the Expression POSTFIX production is reduced. Again, no possible conflict.

So in the case of unary operators, prefix or postfix, the only precedence comparisons will be between levels, never within a level, and associativity doesn't apply. Not specifying an associativity will cause Bison to generate a conflict warning if you accidentally mistype the grammar in a way that allows associativity (for example, by failing to insert the %prec UNOP declaration where it is necessary), instead of silently ignoring the error.

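In this particular grammar, the postfix operators already have productions of their own (ExpressionIndexable and ExpressionFunctionCall), which looks like the start of giving them explicit precedence in the grammar itself. But their left operand is still a plain Expression, so as written the grammar still needs '[' and '(' in a precedence level; if those productions only accepted a postfix or primary expression on the left, the declarations would be unnecessary, because the grammar would provide explicit precedence for those operators. Using both precedence declarations and explicit precedence in the same grammar is often a sign that the various parts of the grammar have been copied and pasted from different sources. (Just sayin'.) It's not usually considered good style, although it is sometimes justifiable. In this case, I'd suggest using either explicit or declared precedence consistently.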

So let's suppose that the precedence levels need to be declared. In that case, why did I put the postfix operators at the end? Answer: because it is a general guideline (though not an absolute rule) that postfix operations bind more tightly than prefix operations. For example, -arr[i] does not mean (-arr)[i]. That's so obviously true that most people don't think about it, although they will sometimes fail to apply the rule to *arr[i] and *arr++, which have exactly the same precedence relations.
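The same guideline is easy to see in a hand-written parser, where precedence is explicit in the call structure rather than declared. This is a minimal C++ sketch, not a Bison parser: ToyParser is a hypothetical class for a toy language with single-letter identifiers, the prefix operators -, * and &, and the postfix operators [...] and ++. It assumes well-formed input without spaces and returns the input fully parenthesized.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical toy grammar, parsed by hand:
//   unary   := ('-' | '*' | '&') unary | postfix
//   postfix := primary ( '[' unary ']' | "++" )*
//   primary := a single letter
// Prefix operators only reach their operand through postfix(), so every
// postfix operator binds more tightly than every prefix operator.
class ToyParser {
public:
    explicit ToyParser(std::string_view src) : src_(src) {}

    // Parses the whole input and returns it fully parenthesized.
    std::string parse() {
        std::string result = unary();
        assert(pos_ == src_.size() && "unexpected trailing input");
        return result;
    }

private:
    bool at(char c) const { return pos_ < src_.size() && src_[pos_] == c; }

    std::string unary() {
        if (at('-') || at('*') || at('&')) {
            char op = src_[pos_++];
            std::string operand = unary();
            return "(" + std::string(1, op) + operand + ")";
        }
        return postfix();
    }

    std::string postfix() {
        std::string operand = primary();
        for (;;) {
            if (at('[')) {
                ++pos_;
                std::string index = unary();
                assert(at(']') && "expected ']'");
                ++pos_;
                operand = "(" + operand + "[" + index + "])";
            } else if (src_.substr(pos_, 2) == "++") {
                pos_ += 2;
                operand = "(" + operand + "++)";
            } else {
                return operand;
            }
        }
    }

    std::string primary() {
        assert(pos_ < src_.size() &&
               std::isalpha(static_cast<unsigned char>(src_[pos_])) &&
               "expected an identifier");
        return std::string(1, src_[pos_++]);
    }

    std::string_view src_;
    std::size_t pos_ = 0;
};
```

For example, "-a[i]" comes back as "(-(a[i]))" and "*a++" as "(*(a++))", never "((-a)[i])" or "((*a)++)". With declared precedence in Bison, the same effect comes from putting the postfix tokens on a later, tighter-binding line than the prefix ones.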

Hope that helps.