How would I fix this LR(2)
(I think) grammar section? I want to parse both type[]
and type
in this rule but I get a reduce conflict:
expression:
... other expressions
| expression LSQUARE expression RSQUARE
| expression KW_IS type
| expression KW_AS type
... other expressions
;
type
: fqtypename LSQUARE RSQUARE
| live_types LSQUARE RSQUARE
| fqtypename
| live_types
;
fqtypename
: identifier
| fqtypename DOT identifier
;
live_types
: KW_INT | KW_DOUBLE | KW_BOOL | KW_STRING
;
The error I get is:
State 113
56 type: fqtypename • LSQUARE RSQUARE
58 | fqtypename •
63 fqtypename: fqtypename • DOT identifier
LSQUARE shift, and go to state 123
DOT shift, and go to state 21
LSQUARE [reduce using rule 58 (type)]
DOT [reduce using rule 58 (type)]
$default reduce using rule 58 (type)
State 114
57 type: live_types • LSQUARE RSQUARE
59 | live_types •
LSQUARE shift, and go to state 124
LSQUARE [reduce using rule 59 (type)]
$default reduce using rule 59 (type)
I tried to recreate it as minimal reproducible example:
%require "3.0"
%defines
%define api.pure full
%define parse.trace
%token LSQUARE '['
%token RSQUARE ']'
%token DOT '.'
%token IS "is"
%%
example
:expression
;
expression
: expression LSQUARE expression RSQUARE
| expression IS type
| identifier
;
type
: fqtypename LSQUARE RSQUARE
| fqtypename
;
fqtypename
: identifier
| fqtypename DOT identifier
;
identifier:
"blah"
;
%%
With error:
State 10
5 type: fqtypename • LSQUARE RSQUARE
6 | fqtypename •
8 fqtypename: fqtypename • DOT identifier
LSQUARE shift, and go to state 13
DOT shift, and go to state 14
LSQUARE [reduce using rule 6 (type)]
$default reduce using rule 6 (type)
You're right, that's an LR(2) grammar. After the parser reads expression IS identifier
, with [
as the lookahead token, it needs to see what comes after the [
to know whether to immediately reduce identifier
to a type
, or to include []
in the type
.
Although it's always possible to transform an LALR(2) grammar into an equivalent LALR(1) grammar, the algorithm produces enormous grammars. (It can recreate the original parse tree, though.) Sometimes it's possible to come up with a more streamlined solution for specific grammars, but I don't see an obvious one in this case.
A common solution is to put the deferral logic into the lexical analyser. For example, on finding a [
, the lexical analyser could read the next (non-whitespace token); if that token is a ]
, then it can return a fabricated LRSQUARE
("[]"
) token; otherwise, it holds onto the new lookahead token for the next call and immediately returns LSQUARE
. That's basically the way Bison itself handles the LR(2) aspect of optional semicolons. (If you use a push parser, this is slightly easier to implement since a single lexical analyser action can send two tokens, one at a time.)
Another solution is to use a GLR parser, which allows arbitrary lookahead by exploring multiple possibilities in parallel, when necessary. In the LR(2) case, the additional overhead is minimal, but it still exists.