c parsing grammar bison shift-reduce-conflict

Bison parser shift/reduce conflict

I'm new to Bison and I'm trying to write a parser. I already wrote a scanner in flex. I came up with the following grammar for the parser:

%token NUMBER
%token IDENTIFIER

%start Program

%%

Program: Stats;

Stats: Stats Stat ";"
    | /* Epsilon */
    ;

Stat: "var" IDENTIFIER ":=" Expr
    | Lexpr ":=" Expr
    | Expr;

Lexpr: IDENTIFIER
    | "head" Expr
    | "tail" Expr;

Expr: Term
    | UnaryOperatorChain Term;

UnaryOperatorChain: UnaryOperatorChain UnaryOperator
    | UnaryOperator;

UnaryOperator: "head" | "tail" | "not" | "islist";

Term:
    "(" Expr ")"
    | NUMBER
    | IDENTIFIER;

%%

I get this error: 14 shift/reduce conflicts shift/reduce conflict on token "head"

I think the problem is that the parser cannot distinguish between Lexpr and Expr and therefore does not know which rule to use. I tried several hours to rearrange my grammar, but nothing seems to work. Thanks in advance :)

Solution

Running with bison -v generates a file with the extension .output. This helps you find the reason for the conflicts. In your case, it begins with:

State 7 conflicts: 7 shift/reduce
State 8 conflicts: 7 shift/reduce

and later on, for state 7 (similarly for state 8):

State 7

    8 Lexpr: "head" • Expr
   14 UnaryOperator: "head" •

    NUMBER      shift, and go to state 4
    IDENTIFIER  shift, and go to state 19
    "head"      shift, and go to state 20
    "tail"      shift, and go to state 21
    "not"       shift, and go to state 9
    "islist"    shift, and go to state 10
    "("         shift, and go to state 11

    NUMBER      [reduce using rule 14 (UnaryOperator)]
    IDENTIFIER  [reduce using rule 14 (UnaryOperator)]
    "head"      [reduce using rule 14 (UnaryOperator)]
    "tail"      [reduce using rule 14 (UnaryOperator)]
    "not"       [reduce using rule 14 (UnaryOperator)]
    "islist"    [reduce using rule 14 (UnaryOperator)]
    "("         [reduce using rule 14 (UnaryOperator)]

    Expr                go to state 22
    UnaryOperatorChain  go to state 15
    UnaryOperator       go to state 16
    Term                go to state 17

State 7 corresponds to the item set consisting of the two items in the first two lines. In both items, you have just read the token "head". The following lines show that Bison does not know what to do when, e.g., a token NUMBER follows, or a token IDENTIFIER, and so on.

It could shift and go to state 4, which has the item 19 Term: NUMBER •, thus proceeding with the expression that can follow head in rule 8; or
It could reduce with rule 14, which says that head is a unary operator.

This is telling you that your expression language is ambiguous. With experience you should already be able to find the cause of the ambiguity. If not, running again with bison -Wcounterexamples can help you there too:

  First example: Stats "head" • "head" Term ":=" Expr ";" $end
  Shift derivation
    $accept
    ↳ 0: Stats                                                                     $end
         ↳ 2: Stats Stat                                                       ";"
                    ↳ 5: Lexpr                                       ":=" Expr
                         ↳ 8: "head" Expr
                                     ↳ 11: UnaryOperatorChain   Term
                                           ↳ 13: UnaryOperator
                                                 ↳ 14: • "head"
  Second example: Stats "head" • "head" Term ";" $end
  Reduce derivation
    $accept
    ↳ 0: Program                                                                      $end
         ↳ 1: Stats
              ↳ 2: Stats Stat                                                     ";"
                         ↳ 6: Expr
                              ↳ 11: UnaryOperatorChain                       Term
                                    ↳ 12: UnaryOperatorChain   UnaryOperator
                                          ↳ 13: UnaryOperator  ↳ 14: "head"
                                                ↳ 14: "head" •

This tells you that Bison cannot decide how to parse a string prefix such as head head id. (In fact, even head id would work here, as a counterexample.) Remember that Bison can only parse LALR(1) grammars, i.e., with one look-ahead symbol. When the first head is read and the second head is the look-ahead symbol, Bison needs to decide if this is part of a Lexpr (which will be later followed by a :=) or it's part of an Expr (which will be followed by a ;). The difference between the two will become apparent when the token := is (or is not) found, later on. But, in order to see this token, we would need an arbitrary number of look-ahead symbols, as the size of the Expr that follows head in a Lexpr can be arbitrarily large.

So, the problem here is the common subset of the languages generated by Lexpr and Expr, in the two alternative rules for Stat.

I'm afraid that there's no easy solution for this. You will have to refactor your grammar and eliminate the ambiguity. I think you should give it a shot, so I'm not giving any spoiler. Hint: try to avoid deciding what kind of an expression you have, as long as you see a prefix like head tail head head ....