Sorry, I am new to Bison. I don't understand the problem or how to fix it. I would appreciate it if you could "teach me how to fish" while pointing out the problem and the solution:
%left '.' '+'
%right '(' '['
%%
OptionalExpressions
: { $$ = nullptr; }
| Expressions
;
Expressions
: Expression
| Expressions ',' Expression
;
Expression
: Expression '+' Expression
| ExpressionDot
| TOKEN_ADDROF Expression
| ExpressionIndexable
| ExpressionFunctionCall
| TOKEN_INTEGER
| TOKEN_IDENTIFIER
| '(' Expression ')'
;
ExpressionDot
: Expression '.' Expression
;
ExpressionIndexable
: Expression '[' Expression ']'
;
ExpressionFunctionCall
: Expression '(' OptionalExpressions ')'
;
%%
Thank you.
Before your next fishing expedition, you should carefully read the section in the Bison manual on using precedence declarations and perhaps some of the related SO answers. The Bison manual also has some very useful information about understanding conflicts and the tools Bison provides to help you. Here, I'm basically following the procedure described in that last part of the manual.
The first step is to ask Bison to generate a report of the parser's states, which merely requires giving it the -v option (or --report=all if you want more information, which is occasionally useful). The first line in the resulting .output file tells you which states have shift/reduce conflicts:
State 12 conflicts: 4 shift/reduce
So the next step is to take a look at State 12. The conflicts are indicated by parser actions in brackets. (The bracketed actions are the ones Bison eliminated using its default resolution algorithm, also described in the manual.)
State 12
5 Expression: Expression . '+' Expression
7 | TOKEN_ADDROF Expression .
13 ExpressionDot: Expression . '.' Expression
14 ExpressionIndexable: Expression . '[' Expression ']'
15 ExpressionFunctionCall: Expression . '(' OptionalExpressions ')'
'.' shift, and go to state 15
'+' shift, and go to state 16
'(' shift, and go to state 17
'[' shift, and go to state 18
'.' [reduce using rule 7 (Expression)]
'+' [reduce using rule 7 (Expression)]
'(' [reduce using rule 7 (Expression)]
'[' [reduce using rule 7 (Expression)]
$default reduce using rule 7 (Expression)
So in this state, Bison has not been able to apply any precedence rule to decide what to do when the reduction by rule 7 is applicable. Rule 7 is conveniently reproduced in the report:
7 | TOKEN_ADDROF Expression .
The precedence of that rule will be the precedence of the TOKEN_ADDROF terminal. But that precedence is not defined because TOKEN_ADDROF does not appear in any precedence level.
We can try adding it:
%left '.' '+'
%precedence TOKEN_ADDROF
%precedence '(' '['
And, hey presto!
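Putting the pieces together, a complete input file might look like the sketch below. The %token line is an assumption (the question does not show how the named tokens are declared), and the semantic actions are left out to keep the focus on precedence. Running it through bison -v again should show whether any conflicts remain.

```yacc
/* Assumed token declarations; the original question does not show them. */
%token TOKEN_ADDROF TOKEN_INTEGER TOKEN_IDENTIFIER

/* Precedence levels, lowest first and highest last. */
%left '.' '+'
%precedence TOKEN_ADDROF
%precedence '(' '['

%%

OptionalExpressions
  : /* empty */
  | Expressions
  ;

Expressions
  : Expression
  | Expressions ',' Expression
  ;

Expression
  : Expression '+' Expression
  | ExpressionDot
  | TOKEN_ADDROF Expression      /* rule precedence comes from TOKEN_ADDROF */
  | ExpressionIndexable
  | ExpressionFunctionCall
  | TOKEN_INTEGER
  | TOKEN_IDENTIFIER
  | '(' Expression ')'
  ;

ExpressionDot
  : Expression '.' Expression
  ;

ExpressionIndexable
  : Expression '[' Expression ']'
  ;

ExpressionFunctionCall
  : Expression '(' OptionalExpressions ')'
  ;

%%
```

This is only the grammar; to build an actual parser you would still need to supply yylex and yyerror.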
It would be fair to ask why I put it where I put it, and why I used %precedence instead of %left or %right, both for it and for the other unary operators.
To start with the second question, %precedence means "this precedence level involves operators which cannot have a conflict resolved with associativity, so I'm not going to declare any particular associativity."
And that's true in this case: unary operators have no associativity. Compare a binary operator: the - in 3-4-7 could associate to the left ((3-4)-7) or to the right (3-(4-7)). The resolution is made by comparing an Expression production with precedence - (Expression: Expression '-' Expression) against a lookahead token with precedence - (-). That obviously can happen. By contrast, consider the TOKEN_ADDROF operator (is that not &, by the way? If so, just write it as a character token). Here, the relevant production, as we've already seen, is
Expression: TOKEN_ADDROF Expression
Now, what if the lookahead token is TOKEN_ADDROF? Answer: it's a syntax error, because TOKEN_ADDROF is not a binary operator, so it cannot follow an expression. (It could be that you have a binary operator with the same spelling. But in that case, you would have put %prec UNOP on the above production, and then there would be no possibility that the lookahead token could be UNOP, because that token is never produced by the lexical scanner.) So there's no production which allows a shift, and thus no conflict.
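To make the parenthetical concrete, here is a minimal sketch of the %prec idiom for an operator that is both unary and binary. It uses '-' rather than anything from the grammar in the question, and UNOP is a placeholder name that exists only to carry a precedence level; the scanner never returns it.

```yacc
/* Assumed token declaration for the sketch. */
%token TOKEN_INTEGER

%left '-'            /* binary minus: left-associative */
%precedence UNOP     /* pseudo-token: higher level, no associativity */

%%

Expression
  : Expression '-' Expression      /* rule precedence: that of '-' */
  | '-' Expression %prec UNOP      /* rule precedence overridden to UNOP */
  | TOKEN_INTEGER
  ;

%%
```

When the parser has just seen the unary form and the lookahead is '-', it compares the rule's precedence (UNOP) with the token's ('-'). The levels differ, so associativity never comes into play, which is why %precedence is enough for UNOP.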
A similar line of reasoning applies to postfix operators, like function application and subscripting. (And post-increment and post-decrement, if applicable.) In those cases, a following postfix operator is possible, but it cannot be shifted until the Expression POSTFIX production is reduced. Again, no possible conflict.
So in the case of postfix operators, the only precedence comparisons will be between levels, not within a level, and associativity doesn't apply. Not specifying an associativity will cause Bison to generate a conflict warning if you accidentally mistype the grammar in a way that allows associativity (for example, failing to insert the %prec UNOP declaration where it is necessary), instead of silently ignoring the error.
In this particular grammar, putting '[' and '(' in a precedence level is unnecessary because those tokens are not used directly in any Expression production. That means that the grammar provides explicit precedence for those operators. Using both precedence declarations and explicit precedence in the same grammar is often a sign that the various parts of the grammar have been copy-and-pasted from different sources. (Just sayin'.) It's not usually considered good style, although it is sometimes justifiable. In this case, I'd suggest using either explicit or declared precedence exclusively.
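For comparison, here is a rough sketch of what "explicit precedence" could look like for this grammar: one non-terminal per binding level and no precedence declarations at all. The names Prefix, Postfix and Primary are made up for illustration, the %token line is assumed, and the levels simply mirror the declared ones ('.' and '+' loosest, then TOKEN_ADDROF, then the postfix operators). I haven't checked that it accepts exactly the same inputs as the original.

```yacc
/* Assumed token declarations; the original question does not show them. */
%token TOKEN_ADDROF TOKEN_INTEGER TOKEN_IDENTIFIER

%%

OptionalExpressions
  : /* empty */
  | Expressions
  ;

Expressions
  : Expression
  | Expressions ',' Expression
  ;

/* Loosest level: '+' and '.', left-associative by left recursion. */
Expression
  : Expression '+' Prefix
  | Expression '.' Prefix
  | Prefix
  ;

/* Prefix operators bind more tightly than the binary operators. */
Prefix
  : TOKEN_ADDROF Prefix
  | Postfix
  ;

/* Postfix operators bind more tightly than prefix operators. */
Postfix
  : Postfix '[' Expression ']'
  | Postfix '(' OptionalExpressions ')'
  | Primary
  ;

Primary
  : TOKEN_INTEGER
  | TOKEN_IDENTIFIER
  | '(' Expression ')'
  ;

%%
```

The trade-off is the usual one: the layered version needs no declarations and makes the binding order visible in the rules themselves, at the cost of extra non-terminals and unit reductions.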
So let's suppose that the precedence levels need to be declared. In that case, why did I put the postfix operators at the end? Answer: because it is a general guideline (though not an absolute rule) that postfix operations bind more tightly than prefix operations. For example, -arr[i] does not mean (-arr)[i]. That's so obviously true that most people don't think about it, although they will sometimes fail to apply the rule to *arr[i] and *arr++, which have exactly the same precedence relations.
Hope that helps.