Search code examples
bisonlalrshift-reduce-conflict

LALR(1) shift/reduce error using % for both percent and mod


I started a project with a grammar that used % (and the word mod) for modulus operators, and now I would like to add % as a trailing unary operator to divide by 100.

A few notes, I don't work with a C-based language, I have implemented my own tokenizer/compiler using the XML output from bison. And my choice of steps are critial for my implementation.

Is there a way I can make my grammar to compile without any shift/reduce errors in a LALR(1) compiler?

Basically the following statements are all valid:

  • 5% -> 0.05
  • 5%%5 -> 0.05 mod 5
  • 5%%%5 -> 0.0005 mod 5 etc.

I just don't know how to formulate this into my grammar:

%token S_NUM

%%

default: mod_term ;

mod_term: _mod_value
    | percent_term ;

_mod_value: mod_term O_PERCENT percent_term ;

percent_term: _percent_value
    | value ;

_percent_value: value O_PERCENT ;

value: S_NUM ;

%%

I also compile it using the following statement: bison -v --report=all --warnings=no-other -Werror=conflicts-sr --xml test.y -o test.y.xml

(Where I force shift/reduce as errors because of my environment)

Any ideas? I've played around with %left and %right specifiers, but no luck.


Solution

  • The ambiguity you have here is between '%' being a postfix operator and an infix operator. This is very similar to the common expression parser issue with '-' being both a prefix and an infix operator, and you can solve it the same way with an explicit %prec directive. The traditional way to write this would be:

    %left '%'            /* left-associative infix operator */
    %nonassoc POSTFIX    /* postfix operations are higher precedence */
    %token VAL
    %%
    
    expr: expr '%' expr
        | expr '%' %prec POSTFIX
        | VAL
        ;
    

    using precedence to solve both the associative ambiguity of infix-% and the precedence ambiguity between infix and postfix.

    To solve it without the precedence rules, you need something like:

    %token S_NUM O_PERCENT
    %%
    
    default: mod_term ;
    
    mod_term: _mod_value
        | _mod_value O_PERCENT mod_term ;
    
    _mod_value: _mod_value O_PERCENT ;
        | S_NUM
        ;
    

    which makes the infix-% right associative instead of left associative. Unfortunately I see no way of solving this without using precedence rules that also makes infix-% left associative. This is due to the fact that you can't decide whether a given '%' token is infix or postfix until you see the token after it, so the non-terminal before the '%' needs to be the same for both rules (_mod_value here or expr in the %prec code)