I am asking this again because I tried all possible %prec
combinations that do nothing. I get why there is conflict here but I am not sure how to resolve it because it basically all matters if at the end it's type followed by identifier or if it is followed by semicolon.
Here is minimal grammar:
%require "3.0"
%defines
%define api.pure full
%define parse.trace
%left TYPE_UNIT
%%
example
: statement
;
statement
: type identifier '=' expression ';'
| identifier '=' expression ';'
| expression ';'
;
expression
: expression '.' identifier
| identifier
;
type
: fqtypename %prec TYPE_UNIT
;
fqtypename
: identifier
| fqtypename '.' identifier
;
identifier:
"blah"
;
%%
and error:
State 7
3 statement: identifier • '=' expression ';'
6 expression: identifier •
8 fqtypename: identifier •
'=' shift, and go to state 13
'.' reduce using rule 6 (expression)
'.' [reduce using rule 8 (fqtypename)]
"blah" reduce using rule 8 (fqtypename)
$default reduce using rule 6 (expression)
reduce/reduce conflict on token '.':
6 expression: identifier •
8 fqtypename: identifier •
First example: identifier • '.' identifier ';' $end
First reduce derivation
$accept
↳ 0: example $end
↳ 1: statement
↳ 4: expression ';'
↳ 5: expression '.' identifier
↳ 6: identifier •
Second example: identifier • '.' identifier identifier '=' expression ';' $end
Second reduce derivation
$accept
↳ 0: example $end
↳ 1: statement
↳ 2: type identifier '=' expression ';'
↳ 7: fqtypename
↳ 9: fqtypename '.' identifier
↳ 8: identifier •
Yacc/bison do not use precedence levels to resolve reduce/reduce conflicts; only shift/reduce conflicts.
Anyway, the problem here cannot be solved with finite lookahead. You need to avoid forcing the parser to decide at the first token whether a dotted sequence is a type or a value.
The implementation is slightly annoying, because we need to factor dotted sequences out of all the higher-order constructs which might start with one. That requires a bit of repetition in the grammar.
In the example below, based on the example from the OP, I've added function calls, subscript lookups, and one binary operator, by way of showing how all of this will fit together in a more complete grammar.
This is not the only solution. See below for other possible approaches.
%token IDENT /* Identifier (word) */
LITERAL /* Number, character string, etc. */
%%
example
: statement
statement
: type dotted_name '=' expression ';' /* See note */
| lvalue '=' expression ';'
| expression ';'
type: dotted_name
expression
: atom
| expression '+' atom
atom: lvalue_not_name
| base_not_name
| dotted_name
lvalue
: lvalue_not_name
| dotted_name
base: base_not_name
| dotted_name
base_not_name
: LITERAL
| '(' expression ')'
| base '(' expression ')'
lvalue_not_name
: base_not_name '.' IDENT
| base '[' expression ']'
dotted_name
: IDENT
| dotted_name '.' IDENT
Note: I used dotted_name
as the declared object in the first statement
alternative, although I think that IDENT
would have been more correct. Either would work. Indeed, you could replace it with expression
.
You could use a simpler grammar if you ask Bison to produce a GLR grammar. That would eliminate the need for the *_not_name
productions, and thereby simplify the grammar considerably. It will still give you a reduce-reduce conflict warning, though. GLR grammars produce the warnings but then successfully parse as long as the grammar is unambiguous.
Another solution would be to simply allow arbitrary expressions everywhere. You would then have to insert checks during semantic analysis for erroneous constructs, such as assignments whose targets are not lvalues or declarations which start with an expression other than a dotted name (or whatever restrictions you use for typenames). That makes for a streamlined grammar, in which you can even use precedence levels to avoid any extra non-terminals. Of course, it complicates the semantic analysis slightly, but deciding whether the AST for a sub-expression is appropriate should be extremely simple. Many parsers do, in fact, use this technique, because it in many languages you can't tell precisely at parse time what qualifies as an lvalue.