Index operator precedence in bisonc++ (bison, bison++)

I'm designing my own toy language that compiles into Brainf*ck code and encountered problem with the operator precedence of the index operator. The parser is generated by Bisonc++, which uses mostly the same syntax for its grammar files as its cousins bison and bison++.

Because of the quirks in addressing memory in BF, I have to handle operations on array-elements differently, which is why I am generating specific rules for them (I left out the bodies of the rules on purpose). The relevant grammar (which is not working), is as follows:

%right '=' 
// a bunch of other operators
%left '+' '-'
// more operators
%left indexing

expression:
    variable
|
    array_element
|
    expression '+' expression
|
    // a bunch more rules
;

array_subscript:
    '[' expression ']' 
;

array_element:
    expression array_subscript %prec indexing
;

Now, I was under the impression that by assigning the indexing precedence to the array_element rule, the parser will match the smallest expression directly to the left of it. However, consider the following code:

10 + arr[0]
// is parsed as
(10 + arr)[0]

I can't figure out why this is happening. When the code is changed to 10 + (arr[0]), the parser does the expected thing, which I think proves this has to do with operator precedence. One of the many things I tried was using an unused symbol as the indexing operator. For example:

// bottom-most operator in the precedence definition (replacing 'indexing'):
%left '@'

array_element:
    expression '@' array_subscript
;

// Code:
10 + arr@[0] // is now parsed properly

This resolves the issue for some reason. Am I doing something fundamentally wrong? Am I misunderstanding the way operator precedence works in bison et al?

Thanks in advance!

EDIT

I've pushed my dev branch to github, which contains a folder with a minimal working example including a Makefile and test-file. To rebuild the parser with different grammar-specs (file: grammar), you can run make regenerate, but this requires bisonc++ and flexc++ (available from Debian repo's). To just build the project with the current (faulty) parser, run make.

https://github.com/jorenheit/brainfix/tree/better_indexing/arraybug

Solution

The precedence declaration

array_element:
    expression array_subscript %prec indexing

and the order of precedence of the pseudo-token indexing have absolutely no effect on the disambiguation of the indexing expression

10 + arr[0]

The ambiguity being resolved by precedence in this case, as you say, is between

(10 + arr)[0]       and        10 + (arr[0])

and it is resolved in the moment when the parser has seen 10 + arr and needs to decide whether or not it is (10 + arr). In other words, its options are:

Reduce using the production expression: expression '+' expression, or
Shift the lookahead token, which in this example is [. If it chooses to shift, then expression: expression '+' expression will be reduced later, with a longer second operand, because all reductions are performed precisely at the end of the production being reduced, before the next token is shifted.

So the precedence ordering which is being considered is between the production expression '+' expression and the lookahead token [. Precedence comparisons are always of that form: the precedence of a possible reduction is compared with the precedence of the lookahead token which might be shifted.

Precedence is, somewhat confusingly, always described as a comparison between two tokens, probably because that is the actual algorithm used in operator precedence parsing, which is a different parsing algorithm. In shift-reduce parsing, the precedence comparison is always between a shift and a reduction. To maintain the illusion that it is between two tokens, Bison, like almost all Yacc derivatives, uses the convention that the precedence of a reduction is based on the precedence ordering of the last visible terminal in the right-hand side. (BisonC++ is an exception.)

Usually there is usually only one visible terminal in a production which requires precedence resolution, so the fact that Yacc/Bison selects the last one isn't noticeable, but there are occasional exceptions, such as the so-called "ternary operator" in C. The precedence of the production expression: expression '?' expression ':' expression is based on the : token; the relevant lookahead token would be ?, so the resolution of a?s1:b?s2:s3 is based on the comparison of ? with :; both of those tokens need to be in the same precedence level. To avoid this sort of confusion, and to allow precedence to be used with productions with no visible tokens, such as expression: expression array_subscript, Yacc derivatives typically allow you to override the default by specifying an alternative token for a production, using the %prec declaration.

This is particularly important if BisonC++ is part of your environment, because unlike almost every other Yacc derivative, BisonC++ uses the first non-terminal on the right-hand side. On the whole, it is probably a good idea to always use a %prec declaration in rules with more than one terminal symbol.

Regardless, it's important to remember that all %prec does is identify the precedence of a production with a terminal symbol. It has no effect on the precedence of any lookahead token, or any other production. If two rules have the same precedence, you can (and probably should) use the same %prec declaration.

In short, the precedence of expression array_subscript is not relevant in the resolution of 10 + arr[0], because reducing that production is not an option at this point in the parse. (Actually, it won't ever be relevant because array_subscript is effectively a postfix operator, and therefore not ambiguous. Precedence resolution only occurs when there is an ambiguity in the grammar.)

I don't know why BisonC++ is choosing to reduce the addition. Originally, I thought that it was because you put [ into some precedence level which comes before +, but apparently that is not the case. With Bison or Yacc, if [ were not in any precedence level, then it would have no declared precedence, and the precedence comparison would not be used to resolve ambiguity. If that were the case, (1) a shift-reduce conflict warning would be raised and (2) the parser would choose the shift action. And that's what happens with this grammar when tested with Bison. But BisonC++ apparently uses a default precedence, in some cases..

According to the "verbose" output generated by BisonC++, the conflicts were resolved using precedence. I haven't found any documentation for the precedence resolution algorithm used by Bisonc++, nor any note that it would be different from the algorithm used by Bison/Yacc (and, for what it's worth, required by Posix). But that doesn't mean that the documentation doesn't exist, and I'm reluctant to call this a bug without verifying what the original intent was. So I'll limit myself to saying that it is unusual.

In any case, you can force bisonc++ to resolve the conflict in the way you wish by adding a precedence level for [ (which, as noted above, is the lookahead token you're interested in). Put it after the precedence of + (and all other infix and prefix operators, which I presume exist in the full grammar):

%left '+'
%left '['

(Tested, after a considerable fight with source dependencies, using BisonC++ 6.04.04.)