Search code examples
bisonglr

in Bison, how can I specity left associativity for a non-terminal?


I have the following Bison grammar snippet:

binary_op:         BINARY_OP
                    {
                        ...
                    }
                    | '|' %prec BINARY_OP
                    {
                        ...
                    }
;

non_keyword_expr:   non_keyword_expr binary_op non_keyword_expr %prec BINARY_SEND_PREC %dprec 2
                    {
                        ...
                    }
;

| has overloaded meaning in my grammar, so I can't just return it as token BINARY_OP from my lexer. It could be a different token depending on context.

If I use this as my input:

4 OR 5 OR 6

I can parse it successfully (OR is recognized as a BINARY_OP token by the lexer).

If, however, my input is this:

4 | 5 | 6

I get an ambiguous grammar error. (the | isn't being recognized as left associative)

How can I get binary_op to be left-associative within non_keyword_expr? the %prec statement on the second rule for binary_op seems to have no effect.

edit: this is for a GLR parser


Solution

  • Simple answer: You cannot. Precedence (and associativity) only apply to productions (on the left) and terminals (on the right). They do not apply to non-terminals.

    That's not an arbitrary decision; it's inherent to the way bison handles resolution of shift/reduce conflicts. At every parsing step, the lookahead token (a terminal) must either eventually be shifted, but it is possible that there is a production which could be reduced before the terminal is shifted. If the reduction is not performed immediately, it will never be performed. An LR(1) grammar allows the parser to decide based on the current parse stack and the lookahead token whether a reduction should be performed or whether the lookahead token should be immediately shifted. If both actions are possible, then the grammar is said to have a shift/reduce conflict, and it is not strictly speaking LR(1).

    Precedence and associativity rules are used to resolve shift/reduce conflicts. Productions may have an implicit or explicit precedence: the explicit precedence is provided by the %prec declaration; otherwise the precedence of the last terminal in the production is used. In the event of a shift/reduce conflict, the precedence of the production which could be reduced is compared to the precedence of the lookahead terminal which could be shifted, and whichever has the greater precedence wins. That's it. The precedence is not retained or inherited. In fact, it is inaccurate to say that the precedences are compared, since that doesn't happen during the parse; the parser has an action or transition table which defines what to do in the case of a particular stack configuration ("state") and lookahead token, and the precedence information is used at parser-generation time to fill in the entries in the action table which would otherwise be ambiguous.

    In the case of your production

    binary_op: '|' %prec BINARY_OP
    

    the %prec declaration is useless, because binary_op must always be reduced immediately; it cannot participate in a shift/reduce conflict. The shift/reduce conflict comes with the non_keyword_expression production, which is marked with a (different) %prec declaration, and that is the declaration which will be used for that production.

    The production for non_keyword_expression does not have a terminal, so it has no implicit precedence either. That's generally not what you want, and the use of productions like:

    binary_op: '|' | "OR" ;
    

    are not really compatible with the use of precedence to resolve parsing conflicts.


    Note 1: This is not completely true if you ask for a GLR parser. The GLR parser can perform both the shift and the reduce, because it (effectively) maintains a number of parser states at the same time. Eventually, all but one of these states must be eliminated; otherwise, the parse is ambiguous. GLR parsers use precedence (and %prec declarations) in precisely the same way that non-GLR parsers do; a parser action eliminated by precedence is really eliminated and does not lead to parallel states. However, a GLR parser can also deal with reduce/reduce conflicts, in which there are two possible reductions (possibly to the same non-terminal). These conflicts can be resolved using %dprec ("dynamic precedence") declarations.