How to correctly implement an if-statment using yacc?

I'm trying to implement an if-statement in yacc. I've already wrote some code a few month ago, but i'm not sure in what i have done. This is what i have to do:

  if(condition_1) { z=constant_1; } 
  else_if(condition_2) {z=constant_2;}
  else_if ...
  … 
  … 
  else_if(condition_N-1) {z=constant_N-1;}
  else { z=constant_N;}

Where condition_1..N-1 must involve only var-operation(<,>,==,!=)-var or var-operation-constatnt like x<5, y==x, and so on. The variable must be only 'x' or 'y' and initialised before the if-statement (set to zero otherwise). At the end i have to print the output of the variable 'z'. I try to execute it and it seems to work correctly, but i don't know if i have make some mistake that could lead to an error. Any help would be appreciate.

Here is the lex file:

%{
#include "y.tab.h"
void yyerror (char *s);
int yylex();
%}

%%

"print"             {return PRINT;}
"exit"              {return EXIT_COMMAND;}
"if"                {return IF;}
"elseif"            {return ELSEIF;}
"elif"              {return ELSEIF;}
"else"              {return ELSE;}
"("             {return LP;}
")"             {return RP;}
"{"             {return GLP;}
"}"             {return GRP;}
"=="                {return EQEQ;}
"="             {return EQ;}
"!="                {return NEQ;}
";"             {return SEMI;}
"<"             {return LT;}
">"             {return GT;}

"x"             {return _X;}
"y"             {return _Y;}
"z"             {return _Z;}
[-]?[0-9]+          {yylval.val = atoi(yytext); return NUMBER;}
[ \t\n]+            ;
.               ;



%%

here's the yacc file:

%{
void yyerror (char *s);     /* Viene chiamato in caso di errore sintattico o grammatico */
int yylex();
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int x = 0,y = 0, z = 0;

%}

%union {
    int val;
    int a;
}

%token          PRINT
%token          EXIT_COMMAND
%token          IF
%token          ELSEIF
%token          ELSE
%token          _X
%token          _Y
%token          _Z
$token          _VAR
%token<val>         NUMBER
%token          LP
%token          RP
%token          GLP
%token          GRP
%token          EQEQ
%token          EQ
%token          NEQ
%token          SEMI
%token          LT
%token          GT
%type<a>        then
%type<a>        condition

/* assignment */
%right EQ


%start main_program

%%


main_program:   
        | main_program rule
        ;                 

rule:         '\n'
        | init_variables 
        | if_condition                  
        | printing 
        | EXIT_COMMAND                      {printf("Uscita dal programma in corso...\n"); exit(0);}
        | error rule                        {yyerrok;} /* skip line */
          ;


printing:     PRINT _X                      {printf("\nx=%d\n",x);} 
        | PRINT _Y                      {printf("\ny=%d\n",y);}                 
        | PRINT _Z                      {printf("\nz=%d\n",z);}
          ;


init_variables: 
          _X EQ NUMBER SEMI                 {x = $3;}
        | _Y EQ NUMBER SEMI                 {y = $3;}
          ;

if_condition:     IF LP condition RP GLP then GRP else_condition    {if($3 == 1){z=$6;} printf("\nz=%d\n",z);}  
        ;


condition:    _X LT NUMBER                      {if(x<$3){$$=1;}else{$$=0;}}
        | _X GT NUMBER                      {if(x>$3){$$=1;}else{$$=0;}}
        | _X EQEQ NUMBER                    {if(x==$3){$$=1;}else{$$=0;}}
        | _X NEQ NUMBER                     {if(x!=$3){$$=1;}else{$$=0;}}

        | _X LT _VAR                        {if(x<$3){$$=1;}else{$$=0;}}
        | _X GT _VAR                        {if(x>$3){$$=1;}else{$$=0;}}
        | _X EQEQ _VAR                      {if(x==$3){$$=1;}else{$$=0;}}
        | _X NEQ _VAR                       {if(x!=$3){$$=1;}else{$$=0;}}

        | _Y LT _VAR                        {if(y<$3){$$=1;}else{$$=0;}}
        | _Y GT _VAR                        {if(y>$3){$$=1;}else{$$=0;}}
        | _Y EQEQ _VAR                      {if(y==$3){$$=1;}else{$$=0;}}
        | _Y NEQ _VAR                       {if(y!=$3){$$=1;}else{$$=0;}}

        | _Y LT NUMBER                      {if(y<$3){$$=1;}else{$$=0;}}
        | _Y GT NUMBER                      {if(y>$3){$$=1;}else{$$=0;}}
        | _Y EQEQ NUMBER                    {if(y==$3){$$=1;}else{$$=0;}}
        | _Y NEQ NUMBER                     {if(y!=$3){$$=1;}else{$$=0;}}

          ;

then:         _Z EQ NUMBER SEMI                 {$$ = $3;}
          ;

else_condition:   ELSE GLP then GRP                 {z = $3;}
        | ELSEIF LP condition RP GLP then GRP else_condition    {if($3 == 1){z=$6;}}
          ;

%%

void yyerror(char *s){
    printf("ERRORE: %s\n",s);
}

int yywrap(){
    return 1;
}

int main (void){
    return yyparse();
}

Solution

Your code will work, but the way it works is perhaps unexpected and is not very general.

In effect, the else in your syntax is misleading. One would normally expect

if      (condition1) { statement1; }
else if (condition2) { statement2; }
else                { statement3; }

to only test condition2 if condition1 is false, and to only execute statement3 if both conditions are false. But your code does not short circuit. Every test is evaluated, and the else clause is the unconditioned. Looking at the code, it's as though the elses didn't exist.

However, the code works. Why is that? It's because it executes backwards. In effect, the evaluation proceeds as if the above were written;

{ statement3; }
if (condition2) { statement2; }
if (condition1) { statement1; }

That only works because all of the statements are restricted to setting the value of z. Thus, the last statement which executes wins, and doing the evaluation backwards produces precisely the correct answer.

So that's fine, as long as you understand how it works. But it's not very general and it's not at all efficient.

First, how it works. Roughly speaking, a context free grammar can represent a list in two ways:

left_list : element | left_list element
right_list: element | element right_list

These two formulations recognise the same inputs, but they represent different parses. In the case of operators, we call these left- and right-associative. In the case of a list, the difference is in how elements are added to the list: in the left_list successive elements are added to the end of the list, while in the right_list they are added to the beginning.

If you added actions to create a list to the above productions, you would have to push element onto the end of left_list, but shift element into the beginning of right_list. But it's not just which side of the list the new element is being added to. It's also the order of evaluation of side effects.

The important insight is that the inner list is fully constructed at the point at which it appears in the production. In the case of left_list, that construction happens before element is even parsed, so side effects in the left_list action are evaluated left-to-right. In the case of right_list, the construction happens at the end, so all the evaluations are stacked up until the final right_list: element production is reduced. Only then can the other actions be evaluated, as they are popped off the stack.

(This doesn't affect the order of side-effects in the actions for element. elements are always handled left-to-right.)

There are two types of list in your grammar: the program itself, and the if statements. The program is a left-list of rules, and the side-effects are in the rule actions, which are evaluated left-to-right. But the if statements are right-lists of clauses (where the recursion terminates with a different element, the else clause) and the side effects are in the list actions. So that's why it evaluates backwards.

Now, if the intent is to just parse an input once and evaluate it, it's pretty clear that parsing the entire expression and then throwing away everything but the first successful condition is not exactly efficient. It would be a lot better to just stop when a true condition is encountered. Furthermore, evaluating left-to-right and stopping when a true condition is found will avoid problems when the side-effects don't just cancel out previous ones. So a more general solution would not rely on backward evaluation.

Furthermore, the right recursive production really does use a stack. If the if statement has a lot of clauses, it will use a lot of parser stack. It might even exceed the parser's stack limit.