Search code examples
grammarbisonyacc

Reduce/reduce conflict bison grammar


I am new to bison parsing and I don't fully understand how it works. I have the following simple bison grammar to parse a simple language:

%{

%}

%token T_ASSIGN T_ADD T_SUB T_MUL T_DIV T_MOD T_POW
%token T_EQ T_NE T_LT T_GT T_LE T_GE T_AND T_OR T_NOT
%token T_ENDLINE T_LPAR T_RPAR
%token T_INTEGER T_FLOAT T_BOOL
%token T_ID

%%

program: statement_list

statement_list:
            statement_list T_ENDLINE
            statement
        |   statement

statement: %empty | expr
expr: arthm_expr | bool_expr | assignment
arthm_expr:
            arthm_expr sum_op mul
        |   mul
sum_op: T_ADD | T_SUB
mul:
            mul mul_op power
        |   power
mul_op: T_MUL | T_DIV | T_MOD

power:
            armth_last T_POW power
        |   armth_last

armth_last:
            T_INTEGER
        |   T_FLOAT
        |   T_ID
        |   T_LPAR arthm_expr T_RPAR
assignment:
            T_ID T_ASSIGN expr
bool_expr:
            bool_expr T_OR bool_and
        |   bool_and
bool_and:
            bool_and T_AND bool_cmp_eq
        |   bool_cmp_eq
bool_cmp_eq:
            arthm_expr eq_op arthm_expr
        |   bool_cmp
eq_op: T_EQ | T_NE

bool_cmp:
            arthm_expr cmp_op arthm_expr
        |   bool_unary

cmp_op: T_LT | T_GT | T_LE | T_GE

bool_unary:
            T_NOT bool_unary
        |   bool_last

bool_last:
            T_BOOL
        |   T_ID
        |   T_LPAR bool_expr T_RPAR
%%

As you can see, T_ID can both be in boolean and arithmetic expressions.

When I compile the grammar, it produces the following reduce/reduce conflicts (extracted fom bison output):

example.y: warning: 3 reduce/reduce conflicts [-Wconflicts-rr]
example.y: warning: reduce/reduce conflict on tokens $end, T_ENDLINE [-Wcounterexamples]
  Example: T_ID •
  First reduce derivation
    expr
    ↳ 6: arthm_expr
         ↳ 10: mul
               ↳ 14: power
                     ↳ 19: armth_last
                           ↳ 22: T_ID •
  Second reduce derivation
    expr
    ↳ 7: bool_expr
         ↳ 26: bool_and
               ↳ 28: bool_cmp_eq
                     ↳ 30: bool_cmp
                           ↳ 34: bool_unary
                                 ↳ 40: bool_last
                                       ↳ 42: T_ID •
example.y: warning: reduce/reduce conflict on token T_RPAR [-Wcounterexamples]
  Example: T_LPAR T_ID • T_RPAR
  First reduce derivation
    expr
    ↳ 6: arthm_expr
         ↳ 10: mul
               ↳ 14: power
                     ↳ 19: armth_last
                           ↳ 23: T_LPAR arthm_expr                     T_RPAR
                                        ↳ 10: mul
                                              ↳ 14: power
                                                    ↳ 19: armth_last
                                                          ↳ 22: T_ID •
  Second reduce derivation
    expr
    ↳ 7: bool_expr
         ↳ 26: bool_and
               ↳ 28: bool_cmp_eq
                     ↳ 30: bool_cmp
                           ↳ 34: bool_unary
                                 ↳ 40: bool_last
                                       ↳ 43: T_LPAR bool_expr                                  T_RPAR
                                                    ↳ 26: bool_and
                                                          ↳ 28: bool_cmp_eq
                                                                ↳ 30: bool_cmp
                                                                      ↳ 34: bool_unary
                                                                            ↳ 40: bool_last
                                                                                  ↳ 42: T_ID •

How


Solution

  • The conflict comes about because your grammar is ambiguous -- when the input is a simple T_ID with no operator characters, there's no way to tell if it should be parsed as a bool_expr or an arithm_expr.

    There are basically two ways of dealing with this

    1. Don't do typing in the grammar -- get rid of the distinction between bool_expr and arithm_expr and just have expressions that can have any operators between them. You'll probably want a semantic typechecking pass (visiting the parse tree) after parsing that checks expressions for type consistency. This is arguably the best way to go as it separates parsing from typechecking, making things clearer and generally allowing much better error messages.

    2. Determine the types of IDs directly and use different tokens for bool IDs vs arithm IDs. This can be done by using a symbol table that records definitions in scope and returns a different token depending on the declaration. It generally requires that the declarations always happen before the uses, but many languages require that anyways.