I am trying to solve the r/r and s/r conflicts in the following bison grammar
%right TOK_IF TOK_ELSE
%right '='
%left TOK_EQ TOK_NE '<' TOK_LE '>' TOK_GE
%left '+' '-'
%left '*' '/' '%'
%right TOK_POS TOK_NEG '!' TOK_NEW
%left TOK_BLK '.' TOK_CALL
%precedence TOK_PARENT
%start start
expr : expr BINOP expr {$$=$2->adopt($1,$2);}
| UNOP {$$ = $1;}
| allocator {$$ = $1;}
| call {$$ = $1;}
| expr '[' expr ']' %prec TOK_BLK {
destroy($4);
$$ = $2->adopt($1,$3);}
| '(' expr ')' %prec TOK_PARENT {$$ = $2;}
| expr '.' expr {$$ = $2->adopt($1,$3);}
| variable {$$= $1;}
| constant {$$ = $1;}
;
BINOP : TOK_IF {$$ = $1;}
| TOK_ELSE {$$ = $1;}
| '=' {$$ = $1;}
| TOK_EQ {$$ = $1;}
| TOK_NE {$$ = $1;}
| '<' {$$ = $1;}
| TOK_LE {$$ = $1;}
| '>' {$$ = $1;}
| TOK_GE {$$ = $1;}
| '+' {$$ = $1;}
| '-' {$$ = $1;}
| '*' {$$ = $1;}
| '/' {$$ = $1;}
| '%' {$$ = $1;}
;
UNOP : '+' expr %prec TOK_POS {
$1->swap_token_code(TOK_POS);
$$ = $1->adopt($2);
}
| '-' expr %prec TOK_NEG{
$1->swap_token_code(TOK_NEG);
$$ = $1->adopt($2);
}
| '!' expr {$$ = $1->adopt($2);}
;
allocator : TOK_NEW TOK_IDENT '('')' {
destroy($3);
$2->swap_token_code(TOK_TYPEID);
$$ = $1->adopt($2);
}
| TOK_NEW TOK_STRING '(' expr ')'{
}
| TOK_NEW basetype '[' expr ']'{
destroy($3);destroy($5);
$1->swap_token_code(TOK_NEWARRAY);
$$ = $1->adopt($2,$4);
}
;
call : TOK_IDENT '(' exprs ')' %prec TOK_CALL{
destroy($4);
$2->swap_token_code(TOK_CALL);
$$ = ($2->adopt($1))->cannibalize($3);
}
;
exprs : exprs expr {$$ = $1->adopt($2);}
| {$$ = new astree ('{',{0,0,0}, "}");}
;
variable : TOK_IDENT {$$ = $1;}
| expr '[' expr ']'{
destroy($4);
$$ = $2->adopt($1,$3);
}
| expr '.' TOK_IDENT {$$ = $2->adopt($1,$3);}
;
constant :TOK_INTCON {$$ = $1;}
|TOK_CHARCON {$$ = $1;}
|TOK_STRINGCON {$$ = $1;}
|TOK_NULL {$$ = $1;}
;
%%
I think the problem is the rule expr : expr BINOP expr because when I get rid of it, it stops showing these conflicts. I have declared the precedence rules above to avoid shift/reduce conflicts, but it looks like it's not working. Can someone give me shortcuts in debugging an ambiguous grammar?. Disregard the semantic rules.
parser.y: warning: 24 shift/reduce conflicts [-Wconflicts-sr]
parser.y: warning: 56 reduce/reduce conflicts [-Wconflicts-rr]
parser.y:140.14-143.12: warning: rule useless in parser due to conflicts [-Wother]
| expr '[' expr ']'{
^^^^^^^^^^^
parser.y:144.14-56: warning: rule useless in parser due to conflicts [-Wother]
| expr '.' TOK_IDENT {$$ = $2->adopt($1,$3);}
UPDATE: I detected a problem in my understanding
Although I am getting the right results, my compiler keeps telling that there is a shift/reduce conflict in the following grammar. I think there shouldn't be conflicts because I specified precedence and associativity properly.
%left '+'
%left '*'
%%
expr : expr BINOP expr
| TOK_INTCON
BINOP : '+'
| '*'
%%
This response is to your UPDATE section, since that's kind of the heart of the problem.
The TL;DR answer is that the precedence-and-associativity declarations %left '+'
and %left '*'
apply while parsing BINOP
but not while parsing expr
, so they're too early (they do nothing at this point) and are gone by the time they are needed.
I modified your example so that Bison can handle it:
$ cat expr.y
%token TOK_INTCON
%left '+'
%left '*'
%%
expr : expr BINOP expr
| TOK_INTCON
BINOP : '+'
| '*'
%%
$ bison -v expr.y
expr.y: warning: 2 shift/reduce conflicts [-Wconflicts-sr]
The point of -v
here is to write expr.output
which shows how Bison is interpreting your grammar. You can peruse this to see exactly where the shift/reduce conflict occurs—but in short, it occurs when the input contains, e.g.,
1 op 2 op 3
The grammar allows this to be parsed as:
op
/ \
1 op
/ \
2 3
(which is the parse tree you get if you choose "shift" instead of "reduce" each time), or:
op
/ \
op 3
/ \
1 2
What %left
, %right
, and %nonassoc
do is assign a precedence to a specific token. Now, as noted in the Bison documentation section on how precedence works, precedence does flow through to rules, but crucially, not to nonterminals themselves: they only apply to individual rules within a nonterminal. They function to decide whether to shift to a new state, or reduce by a rule, and once the shift or reduce has happened, the decision is already made. (This is natural since the token or nonterminal is recognized by the shifting or reduction, which gives you either a new node for your parse tree, or a partial parse tree.)
By reducing all your binary operators to a binop
nonterminal, you rob the parser of a way to distinguish between each different binary operator. In the LALR grammar that Bison will generate, if you have a separate rule for each operator, you get a separate state for each one, which allows a separate shift-or-reduce decision.
$ cat expr-repaired.y
%token TOK_INTCON
%left '+'
%left '*'
%%
expr : expr '+' expr
| expr '*' expr
| TOK_INTCON
%%
$ bison -v expr-repaired.y
$
Interestingly, the total number of states remains the same (both expr.output
and expr-repaired.output
show seven states). However, the meaning of the states is different. Several states for handling the old BINOP
nonterminal are gone; in their place are multiple states for correctly handling the left-associative, different-precedence operators while deciding whether to reduce expr <some-op> expr
depending on how we decided to reduce the first expr
. Look closely at the new state 6, for instance:
State 6
1 expr: expr . '+' expr
1 | expr '+' expr .
2 | expr . '*' expr
'*' shift, and go to state 5
$default reduce using rule 1 (expr)
If we are in this state and the next token is *
, we shift, so that we'll handle expr * expr
and reduce it before we reduce expr + (expr-from-reduction)
.