The following (simplified) Bison grammar produces a reduce reduce conflict:
expr
: '(' expr ')'
| ID
| fn
;
arg_list
: ID
| arg_list ID
;
fn
: '(' ')' fnbody
| '(' arg_list ')' fnbody
;
fnbody
: '{' '}'
;
I see the issue—with only one token of lookahead, it's impossible to tell whether (an_id
is an '(' expr ')'
or a fn
. But how can I resolve it?
Well, the easiest answer is to just use more lookahead in the parser -- either use something like btyacc, or use bison's %glr-parser
option.
Second choice is to add lookahead in the lexer -- in this case before returning a ')'
token, look to see if the next token is a '{'
and either return a special tag that tells you this is an arg_list you're about to end rather than a parenthesized expression, or just return the two as a single token and modify your grammar as appropriate.
Third choice is to factor the grammar. This is not always easy and may blow up the size of the grammar. The basic idea is to identify the conflicting rules and combine them into a single rule that can be recgonized by the parser and defer the choice as to what the final construct is until you've seen enough.
For this example you add a new rule for the conflicting case:
expr_or_fnhead: '(' ID ')' ;
which might be either an expression or the start of a fn
, and then modify the other rules to use this. The fn
rule becomes:
fn : '(' ')' fnbody /* 0 arg function */
| expr_or_fnhead fnbody /* 1 arg function */
| '(' ID arg_list ')' fnbody /* 2+ arg function */
;
The expr
rule is more complex:
expr : ID
| non_ID_expr
;
non_ID_expr: '(' non_ID_expr ')'
| expr_or_fnhead
| fn
;