bison yacc flex-lexer context-free-grammar ambiguous-grammar

How to solve this ambiguous grammar?

I am creating a compiler and I am using lex and bison.

This is a part of my grammar :

math
    : math binop math
    | VAR
    | INT
    | GREEK
    | "(" math ")"
    | math comp math
    | "-" math
    | math math
    | "\\sqrt" "{" math "}"
;

I've changed the above to this but it increased the amount of reduce/reduce errors. It decreased the amount of shift/reduce errors.

math
    : math binop element
    | VAR
    | INT
    | GREEK
    | "(" math ")"
    | math comp element
    | "-" element
    | math element
    | "\\sqrt" "{" element "}"
;


element
    : VAR
    | INT
    | GREEK
    | math
;

Is there any way how to let them both decrease?

Thank you!

Solution

You've taken a little bit of a step in the right general direction, but perhaps it's best to step back a moment and think about how things were to start with, and how that would apply to an actual expression.

Let's consider an expression like a + b + c. In your original grammar, you have math binop math. Given the a+b+c, should it treat this as a is a math, + is a binop, and b+c is another math, or should it treat it as a+b is a math, + is a binop, and c is a math (and then the math named a+b is further broken down into a, + and b)? Looking at it from a slightly different viewpoint, should a + b + c be parsed as (a + b) + c or as a + (b + c)? In the case of + it doesn't make any real difference--but for - (for one obviously example) it really does make a difference.

By your first grammar, either one is acceptable. In fact, even by your second one, either is acceptable as well (but by making element sort of...subordinate to math, you've implicitly given yacc some guidance about which way to parse it).

That brings us to the next point: under what circumstances do you (think you) need to use the math binop math production? If we define a math to parse an expression of arbitrary complexity, can an element be limited to basically a single operand? If we do that, then every expression like a + b + c would have to parse as math binop operand, right? No ambiguity in this case (and the a+b part would further parse as something like operand binop operand.

That leaves one fairly basic question. How do we want to handle the precedence of different operators? For example, at least in the usual scheme of things, we'd want * and / to have higher precedence than + or -.

There are (at least) two fundamentally different ways of handling that in something like yacc. One is in the grammar proper, by defining a couple of different types of sub-expressions:

add_expr : mul_expr '+' mul_expr
         | mul_expr '-' mul_expr
         ;

mul_expr : factor '*' factor
         | factor '/' factor
         ;

The other is to set the precedence in a directive, like:

%left '+' '-'
%left '*' '/'
%left UNARYMINUS

This lets us leave the grammar itself ambiguous, but then tells the parser generator how to resolve the ambiguity.

So, using this, we can have something like:

expr : expr '+' operand
     | expr '-' operand
     | expr '*' operand
     | expr '/' operand
     ;

operand: VAR
       | INT
       | '(' expr ')'
       | '-' expr %prec UNARYMINUS
       ;

That last bit (the %prec UNARYMINUS) tells it to treat the - (in this case) as having the precedence we designated above for UNARYMINUS (the highest precedence we've defined, since it's the last in the list).

I haven't tried to cover your entire grammar, but I think that covers at least the majority of basic transformations you probably need (or at least want) to apply to remove most of the ambiguities. It's probably also worth noting that shift/reduce conflicts are often fairly harmless. The resolution the parser generator provides for this case will usually work just fine, and in some cases such an ambiguous grammar will actually be more efficient than one with all the ambiguities resolved, so a fair number of grammars don't try to fix all of them.