Search code examples
yacc

Yacc grammar: Resolving Shift/Reduce and Reduce/Reduce Conflicts


I am developing a parser for a specific DSL language. Below is a working (but incomplete) grammar:

%{
package main
%}

//%type<stmts> stmts
%type<stmt> stmt
//%type<stmt_if> stmt_if
//%type<exprs> exprs
//%type<compstmt> compstmt

%union {
    token Token
    terminal                   Token
    //terms                  Token
    stmt_if statement // оператор Если
    stmts   statements
    stmt    statement
    compstmt    statements
}

%token<token> Directive Identifier Procedure Var EndProcedure If Then ElseIf Else EndIf For Each In To Loop EndLoop Break Not
%token<token> Continue Try Catch EndTry Number String Array Function EndFunction Return Throw NeEq Or And True False Undefind Export

%right '='
%left Or
%left And
%left Identifier
//%nonassoc NeEq '>' '<'
%nonassoc NeEq
%left '>' '<'
%left '+' '-'
%left '*' '/' '%'



%%

main: opt_directive invoke {}
;

opt_directive: /* none */
| Directive EOL {}
;

opt_export: /* none */
| Export {}
;


invoke: Function Identifier '(' exprs ')' opt_export opt_stmt EndFunction opt_terms {}
        | Procedure Identifier '(' exprs ')' opt_export opt_stmt EndProcedure opt_terms {}
;

opt_stmt : opt_terms{}
    | stmts opt_terms{}
;
    
stmts :  stmt {}
        | EOL stmt {}
        | stmts terminals stmt{}
    //| stmts semi stmt {}
;

stmt : Return expr {}
    | Throw throw_terms {}
    | Break {  }
    | Continue {  }
    | expr {}
    | stmt_if {}
;

throw_terms : terminal {}
    | String terminal {}
;


stmt_if : If expr Then opt_stmt EndIf terminal
        {

        }
;


expr : Identifier {}
    | '(' expr ')' { }
    | expr '+' expr { }
    | expr '-' expr { }
    | expr '*' expr { }
    | expr '/' expr { }
    | expr '%' expr {}
    | String {}
    | Number { }
    | True { }
    | False { }
    | Undefind {}
    | expr '>' expr { }
    | expr '<' expr{}
    | expr '=' expr {}
    | Identifier  '=' expr {}
    | expr NeEq expr {}
    | expr And expr { }
    | expr Or expr {}
    | Identifier '(' exprs ')' {}
;

exprs : { }
    | expr {}
    | exprs ',' opt_terms expr {}
;    

opt_terms : /* none */
    | terminals
;

terminals : terminal { }
    | terminals terminal { }
;

terminal : semi { }
        | EOL { }
;

EOL : '\n';
semi: ';';

%%

If I change terminal to terminals in the stmt_if rule, I get a conflict

stmt_if : If expr Then opt_stmt EndIf terminals
        {

        }
;

I need to use terminals because the language syntax allows statements like:

If 1 = 1 Then
// ...
EndIf;

as well as:

If 1 = 1 Then
// ...
EndIf
;

I don't understand why there is ambiguity here, as 'terminals' should accommodate the optional newline and semicolon. How can I write a rule that allows for multiple spaces and a semicolon at the end without ambiguity?

Thanks in advance for your help!


Solution

  • The basic problem is that by having terminals part of both stmt_if and stmts, it is ambiguous as which any give terminal should be associated with. It is made more complicated by the fact that terminals can only be between two stmt in a stmts (not at the end), so any terminals at the end must be associated with the stmt_if (or cause an error if the last stmt is not a stmt_if.

    Now in your case, you probably don't care which stmt or stmts any given terminal is associated with, as you're just going to throw them away. What you do care about is allowing them anywhere they should be allowed and disallowing them anywhere they should not be (and what those are is not clear from your grammar -- should they not be allowed after the last stmt in a function or procedure unless that is an if? That is what it appears you are doing currently)

    The easiest fix is to pick one place in the grammar to put them and put them (only) there. For example, you might put them in the stmt rules -- each kind of stmt would end with terminals or opt_terminals (depending on whether at least one terminal is required or not for that statement). But this would then allow extra terminals at the end of a procedure/function body which you might not want.