Search code examples
parsingbisonyaccshift-reduce-conflictml-yacc

Conflicts in Parser for Propositional logic with IF-THEN-ELSE ternary operator


I want to implement the Parser for proposition logic which has the following operators in decreasing order of precedence:

  1. NOT p
  2. p AND q
  3. p OR q
  4. IF p THEN q
  5. p IFF q
  6. IF p THEN q ELSE r

The main issue is with the IF-THEN-ELSE operator. Without it, I am able to write the grammar properly. Presently my yacc file looks like

%term
    PARSEPROG | AND | NOT | OR | IF | THEN | ELSE | IFF | LPAREN | RPAREN | ATOM of string | SEMICOLON | EOF

%nonterm
    start of Absyn.program | EXP of Absyn.declaration

%start start
%eop EOF SEMICOLON
%pos int
%verbose

%right ELSE
%right IFF
%right THEN
%left AND OR
%left NOT

%name Fol

%noshift EOF

%%

start : PARSEPROG EXP (Absyn.PROGRAM(EXP))


EXP: ATOM ( Absyn.LITERAL(ATOM) )
    | LPAREN EXP RPAREN (EXP)
    | EXP AND EXP ( Absyn.CONJ(EXP1, EXP2) )
    | EXP OR EXP ( Absyn.DISJ(EXP1, EXP2) )
    | IF EXP THEN EXP ELSE EXP ( Absyn.IFTHENELSE(EXP1, EXP2, EXP3) )
    | IF EXP THEN EXP ( Absyn.IMPLI(EXP1, EXP2) )
    | EXP IFF EXP ( Absyn.BIIMPLI(EXP1, EXP2) )
    | NOT EXP ( Absyn.NEGATION(EXP) )

But I don't seem to get the correct idea how to eliminate reduce-shift conflicts. Some examples of correct parsing are:

  1. IF a THEN IF b THEN c________a->(b->c)
  2. IF a THEN IF b THEN c ELSE d IFF e OR f_______IFTHENELSE(a,b->c,d<=>e/\f)

Any help/pointers will be really helpful. Thanks.


Solution

  • A relatively easy way to deal with this requirement is to create a grammar which over-generates, and then reject the syntax we don't want using semantics.

    Concretely, we use a grammar like this:

    expr : expr AND expr
         | expr OR expr
         | expr IFF expr
         | IF expr THEN expr
         | expr ELSE expr   /* generates some sentences we don't want! */
         | '(' expr ')'
         | ATOM
         ;
    

    Note that ELSE is just an ordinary low precedence operator: any expression can be followed by ELSE and another expression. But in the semantic rule, we implement a check that the left side of ELSE is an IF expression. If not, then we raise an error.

    This approach is not only easy to implement, but easy to document for the end-users and consequently easy to understand and use. The end user can accept the simple theory that ELSE is just another binary operator with a very low precedence, along with a rule which rejects it when it's not combined with IF/THEN.

    Here is a test run from a complete program I wrote (using classic Yacc, in C):

    $ echo 'a AND b OR c' | ./ifelse 
    OR(AND(a, b), c)
    $ echo 'a OR b AND c' | ./ifelse 
    OR(a, AND(b, c))
    $ echo 'IF a THEN b' | ./ifelse 
    IF(a, b)
    

    Ordinary single IF/ELSE does what we want:

    $ echo 'IF a THEN b ELSE c' | ./ifelse 
    IFELSE(a, b, c)
    

    The key thing that you're after:

    $ echo 'IF a THEN IF x THEN y ELSE c' | ./ifelse
    IFELSE(a, IF(x, y), c)
    

    correctly, the ELSE goes with the outer IF. Here is the error case with bad ELSE:

    $ echo 'a OR b ELSE c' | ./ifelse 
    error: ELSE must pair with IF
    <invalid>
    

    Here is parentheses to force the usual "else with closest if" behavior:

    $ echo 'IF a THEN (IF x THEN y ELSE c)' | ./ifelse 
    IF(a, IFELSE(x, y, c))
    

    The program shows what parse it is using by building an AST and then walking it to print it in prefix F(X, Y) syntax. (For which as a Lisp programmer, I had to hold back the gagging reflex a little bit).

    The AST structure is also what allows the ELSE rule to detect whether its left argument is an expression of the correct kind.

    Note: You might want the following to be handled, but it isn't:

    $ echo 'IF a THEN IF x THEN y ELSE z ELSE w' | ./ifelse 
    error: ELSE must pair with IF
    <invalid>
    

    The issue here is that the ELSE w is being paired with an IFELSE expression.

    A more sophisticated approach is possible that might be interesting to explore. The parser can treat ELSE as an ordinary binary operator and generate the AST that way. Then a whole separate walk can check the tree for valid ELSE usage and transform it as necessary. Or perhaps we can play here with the associativity of ELSE and treat cascading ELSE in the parser action in some suitable way.

    The complete source code, which I saved in a file called ifelse.y and built using:

    $ yacc ifelse.y
    $ gcc -o ifelse y.tab.c
    

    is here:

    %{
    
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    #include <ctype.h>
    
    typedef struct astnode {
      int op;
      struct astnode *left, *right;
      char *lexeme;
    } astnode;
    
    void yyerror(const char *s)
    {
      fprintf(stderr, "error: %s\n", s);
    }
    
    void *xmalloc(size_t size)
    {
      void *p = malloc(size);
      if (p)
        return p;
    
      yyerror("out of memory");
      abort();
    }
    
    char *xstrdup(char *in)
    {
      size_t sz = strlen(in) + 1;
      char *out = xmalloc(sz);
      return strcpy(out, in);
    }
    
    astnode *astnode_cons(int op, astnode *left, astnode *right, char *lexeme)
    {
      astnode *a = xmalloc(sizeof *a);
      a->op = op;
      a->left = left;
      a->right = right;
      a->lexeme = lexeme;
      return a;
    }
    
    int yylex(void);
    
    astnode *ast;
    
    %}
    
    %union {
      astnode *node;
      char *lexeme;
      int none;
    }
    
    %token<none> '(' ')'
    
    %token<lexeme> ATOM
    
    %left<none> ELSE
    %left<none> IF THEN
    %right<none> IFF
    %left<none> OR
    %left<none> AND
    
    %type<node> top expr
    
    %%
    
    top : expr { ast = $1; }
    
    expr : expr AND expr
           { $$ = astnode_cons(AND, $1, $3, 0); }
         | expr OR expr
           { $$ = astnode_cons(OR, $1, $3, 0); }
         | expr IFF expr
           { $$ = astnode_cons(IFF, $1, $3, 0); }
         | IF expr THEN expr
           { $$ = astnode_cons(IF, $2, $4, 0); }
         | expr ELSE expr
           { if ($1->op != IF)
             { yyerror("ELSE must pair with IF");
               $$ = 0; }
             else
             { $$ = astnode_cons(ELSE, $1, $3, 0); } }
         | '(' expr ')'
           { $$ = $2; }
         | ATOM
           { $$ = astnode_cons(ATOM, 0, 0, $1); }
         ;
    
    %%
    
    int yylex(void)
    {
      int ch;
      char tok[64], *te = tok + sizeof(tok), *tp = tok;
    
      while ((ch = getchar()) != EOF) {
        if (isalnum((unsigned char) ch)) {
          if (tp >= te - 1)
            yyerror("token overflow");
          *tp++ = ch;
        } else if (isspace(ch)) {
          if (tp > tok)
            break;
        } else if (ch == '(' || ch == ')') {
          if (tp == tok)
            return ch;
          ungetc(ch, stdin);
          break;
        } else {
          yyerror("invalid character");
        }
      }
    
      if (tp > tok) {
        yylval.none = 0;
        *tp++ = 0;
        if (strcmp(tok, "AND") == 0)
          return AND;
        if (strcmp(tok, "OR") == 0)
          return OR;
        if (strcmp(tok, "IFF") == 0)
          return IFF;
        if (strcmp(tok, "IF") == 0)
          return IF;
        if (strcmp(tok, "THEN") == 0)
          return THEN;
        if (strcmp(tok, "ELSE") == 0)
          return ELSE;
        yylval.lexeme = xstrdup(tok);
        return ATOM;
      }
    
      return 0;
    }
    
    void ast_print(astnode *a)
    {
      if (a == 0) {
        fputs("<invalid>", stdout);
        return;
      }
    
      switch (a->op) {
      case ATOM:
        fputs(a->lexeme, stdout);
        break;
      case AND:
      case OR:
      case IF:
      case IFF:
        switch (a->op) {
        case AND:
          fputs("AND(", stdout);
          break;
        case OR:
          fputs("OR(", stdout);
          break;
        case IF:
          fputs("IF(", stdout);
          break;
        case IFF:
          fputs("IFF(", stdout);
          break;
        }
        ast_print(a->left);
        fputs(", ", stdout);
        ast_print(a->right);
        putc(')', stdout);
        break;
      case ELSE:
        fputs("IFELSE(", stdout);
        ast_print(a->left->left);
        fputs(", ", stdout);
        ast_print(a->left->right);
        fputs(", ", stdout);
        ast_print(a->right);
        putc(')', stdout);
        break;
      }
    }
    
    int main(void)
    {
       yyparse();
       ast_print(ast);
       puts("");
       return 0;
    }