I'm trying to implement an if-statement in yacc. I've already wrote some code a few month ago, but i'm not sure in what i have done. This is what i have to do:
if(condition_1) { z=constant_1; }
else_if(condition_2) {z=constant_2;}
else_if ...
…
…
else_if(condition_N-1) {z=constant_N-1;}
else { z=constant_N;}
Where condition_1..N-1
must involve only var-operation(<,>,==,!=)-var
or
var-operation-constatnt
like x<5, y==x, and so on. The variable must be only 'x' or 'y' and initialised before the if-statement (set to zero otherwise). At the end i have to print the output of the variable 'z'.
I try to execute it and it seems to work correctly, but i don't know if i have make some mistake that could lead to an error.
Any help would be appreciate.
Here is the lex file:
%{
#include "y.tab.h"
void yyerror (char *s);
int yylex();
%}
%%
"print" {return PRINT;}
"exit" {return EXIT_COMMAND;}
"if" {return IF;}
"elseif" {return ELSEIF;}
"elif" {return ELSEIF;}
"else" {return ELSE;}
"(" {return LP;}
")" {return RP;}
"{" {return GLP;}
"}" {return GRP;}
"==" {return EQEQ;}
"=" {return EQ;}
"!=" {return NEQ;}
";" {return SEMI;}
"<" {return LT;}
">" {return GT;}
"x" {return _X;}
"y" {return _Y;}
"z" {return _Z;}
[-]?[0-9]+ {yylval.val = atoi(yytext); return NUMBER;}
[ \t\n]+ ;
. ;
%%
here's the yacc file:
%{
void yyerror (char *s); /* Viene chiamato in caso di errore sintattico o grammatico */
int yylex();
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int x = 0,y = 0, z = 0;
%}
%union {
int val;
int a;
}
%token PRINT
%token EXIT_COMMAND
%token IF
%token ELSEIF
%token ELSE
%token _X
%token _Y
%token _Z
$token _VAR
%token<val> NUMBER
%token LP
%token RP
%token GLP
%token GRP
%token EQEQ
%token EQ
%token NEQ
%token SEMI
%token LT
%token GT
%type<a> then
%type<a> condition
/* assignment */
%right EQ
%start main_program
%%
main_program:
| main_program rule
;
rule: '\n'
| init_variables
| if_condition
| printing
| EXIT_COMMAND {printf("Uscita dal programma in corso...\n"); exit(0);}
| error rule {yyerrok;} /* skip line */
;
printing: PRINT _X {printf("\nx=%d\n",x);}
| PRINT _Y {printf("\ny=%d\n",y);}
| PRINT _Z {printf("\nz=%d\n",z);}
;
init_variables:
_X EQ NUMBER SEMI {x = $3;}
| _Y EQ NUMBER SEMI {y = $3;}
;
if_condition: IF LP condition RP GLP then GRP else_condition {if($3 == 1){z=$6;} printf("\nz=%d\n",z);}
;
condition: _X LT NUMBER {if(x<$3){$$=1;}else{$$=0;}}
| _X GT NUMBER {if(x>$3){$$=1;}else{$$=0;}}
| _X EQEQ NUMBER {if(x==$3){$$=1;}else{$$=0;}}
| _X NEQ NUMBER {if(x!=$3){$$=1;}else{$$=0;}}
| _X LT _VAR {if(x<$3){$$=1;}else{$$=0;}}
| _X GT _VAR {if(x>$3){$$=1;}else{$$=0;}}
| _X EQEQ _VAR {if(x==$3){$$=1;}else{$$=0;}}
| _X NEQ _VAR {if(x!=$3){$$=1;}else{$$=0;}}
| _Y LT _VAR {if(y<$3){$$=1;}else{$$=0;}}
| _Y GT _VAR {if(y>$3){$$=1;}else{$$=0;}}
| _Y EQEQ _VAR {if(y==$3){$$=1;}else{$$=0;}}
| _Y NEQ _VAR {if(y!=$3){$$=1;}else{$$=0;}}
| _Y LT NUMBER {if(y<$3){$$=1;}else{$$=0;}}
| _Y GT NUMBER {if(y>$3){$$=1;}else{$$=0;}}
| _Y EQEQ NUMBER {if(y==$3){$$=1;}else{$$=0;}}
| _Y NEQ NUMBER {if(y!=$3){$$=1;}else{$$=0;}}
;
then: _Z EQ NUMBER SEMI {$$ = $3;}
;
else_condition: ELSE GLP then GRP {z = $3;}
| ELSEIF LP condition RP GLP then GRP else_condition {if($3 == 1){z=$6;}}
;
%%
void yyerror(char *s){
printf("ERRORE: %s\n",s);
}
int yywrap(){
return 1;
}
int main (void){
return yyparse();
}
Your code will work, but the way it works is perhaps unexpected and is not very general.
In effect, the else
in your syntax is misleading. One would normally expect
if (condition1) { statement1; }
else if (condition2) { statement2; }
else { statement3; }
to only test condition2
if condition1
is false, and to only execute statement3
if both conditions are false. But your code does not short circuit. Every test is evaluated, and the else
clause is the unconditioned. Looking at the code, it's as though the else
s didn't exist.
However, the code works. Why is that? It's because it executes backwards. In effect, the evaluation proceeds as if the above were written;
{ statement3; }
if (condition2) { statement2; }
if (condition1) { statement1; }
That only works because all of the statements are restricted to setting the value of z
. Thus, the last statement which executes wins, and doing the evaluation backwards produces precisely the correct answer.
So that's fine, as long as you understand how it works. But it's not very general and it's not at all efficient.
First, how it works. Roughly speaking, a context free grammar can represent a list in two ways:
left_list : element | left_list element
right_list: element | element right_list
These two formulations recognise the same inputs, but they represent different parses. In the case of operators, we call these left- and right-associative. In the case of a list, the difference is in how elements are added to the list: in the left_list
successive elements are added to the end of the list, while in the right_list
they are added to the beginning.
If you added actions to create a list to the above productions, you would have to push element
onto the end of left_list
, but shift element
into the beginning of right_list
. But it's not just which side of the list the new element is being added to. It's also the order of evaluation of side effects.
The important insight is that the inner list is fully constructed at the point at which it appears in the production. In the case of left_list
, that construction happens before element
is even parsed, so side effects in the left_list
action are evaluated left-to-right. In the case of right_list
, the construction happens at the end, so all the evaluations are stacked up until the final right_list: element
production is reduced. Only then can the other actions be evaluated, as they are popped off the stack.
(This doesn't affect the order of side-effects in the actions for element
. element
s are always handled left-to-right.)
There are two types of list in your grammar: the program itself, and the if
statements. The program is a left-list of rules, and the side-effects are in the rule actions, which are evaluated left-to-right. But the if
statements are right-lists of clauses (where the recursion terminates with a different element, the else
clause) and the side effects are in the list actions. So that's why it evaluates backwards.
Now, if the intent is to just parse an input once and evaluate it, it's pretty clear that parsing the entire expression and then throwing away everything but the first successful condition is not exactly efficient. It would be a lot better to just stop when a true condition is encountered. Furthermore, evaluating left-to-right and stopping when a true condition is found will avoid problems when the side-effects don't just cancel out previous ones. So a more general solution would not rely on backward evaluation.
Furthermore, the right recursive production really does use a stack. If the if
statement has a lot of clauses, it will use a lot of parser stack. It might even exceed the parser's stack limit.