Search code examples
bison

Bison reduce conflict with look ahead token set


How would I fix this LR(2) (I think) grammar section? I want to parse both type[] and type in this rule but I get a reduce conflict:

expression:
   ... other expressions
    | expression LSQUARE expression RSQUARE
    | expression KW_IS type
    | expression KW_AS type
   ... other expressions
    ;


type
    : fqtypename LSQUARE RSQUARE
    | live_types LSQUARE RSQUARE
    | fqtypename
    | live_types
    ;

fqtypename
    : identifier
    | fqtypename DOT identifier
    ;

live_types
    : KW_INT | KW_DOUBLE | KW_BOOL | KW_STRING
    ;

The error I get is:


State 113

   56 type: fqtypename • LSQUARE RSQUARE
   58     | fqtypename •
   63 fqtypename: fqtypename • DOT identifier

    LSQUARE  shift, and go to state 123
    DOT      shift, and go to state 21

    LSQUARE   [reduce using rule 58 (type)]
    DOT       [reduce using rule 58 (type)]
    $default  reduce using rule 58 (type)


State 114

   57 type: live_types • LSQUARE RSQUARE
   59     | live_types •

    LSQUARE  shift, and go to state 124

    LSQUARE   [reduce using rule 59 (type)]
    $default  reduce using rule 59 (type)

I tried to recreate it as minimal reproducible example:

%require "3.0"
%defines

%define api.pure full
%define parse.trace

%token LSQUARE '['
%token RSQUARE ']'
%token DOT '.'
%token IS "is"

%%

example
    :expression
    ;
    
expression
    : expression LSQUARE expression RSQUARE
    | expression IS type
    | identifier
    ;
    
type
    : fqtypename LSQUARE RSQUARE
    | fqtypename
    ;

fqtypename
    : identifier
    | fqtypename DOT identifier
    ;

identifier: 
    "blah"
    ;

%%

With error:



State 10

5 type: fqtypename • LSQUARE RSQUARE
6     | fqtypename •
8 fqtypename: fqtypename • DOT identifier

LSQUARE  shift, and go to state 13
DOT      shift, and go to state 14

LSQUARE   [reduce using rule 6 (type)]
$default  reduce using rule 6 (type)

Solution

  • You're right, that's an LR(2) grammar. After the parser reads expression IS identifier, with [ as the lookahead token, it needs to see what comes after the [ to know whether to immediately reduce identifier to a type, or to include [] in the type.

    Although it's always possible to transform an LALR(2) grammar into an equivalent LALR(1) grammar, the algorithm produces enormous grammars. (It can recreate the original parse tree, though.) Sometimes it's possible to come up with a more streamlined solution for specific grammars, but I don't see an obvious one in this case.

    A common solution is to put the deferral logic into the lexical analyser. For example, on finding a [, the lexical analyser could read the next (non-whitespace token); if that token is a ], then it can return a fabricated LRSQUARE ("[]") token; otherwise, it holds onto the new lookahead token for the next call and immediately returns LSQUARE. That's basically the way Bison itself handles the LR(2) aspect of optional semicolons. (If you use a push parser, this is slightly easier to implement since a single lexical analyser action can send two tokens, one at a time.)

    Another solution is to use a GLR parser, which allows arbitrary lookahead by exploring multiple possibilities in parallel, when necessary. In the LR(2) case, the additional overhead is minimal, but it still exists.