
Lex & yacc parsing statements


I am trying to build a very simple language using lex and yacc. It can have only one int variable, set through an assignment like this:

a = 1

It can print the variable like this:

print a

It has to print the value only if the variable name matches; otherwise it has to print an error message.
My lex file is:

%{
    #include "y.tab.h"
    #include <stdlib.h>
    void yyerror(char *);
%}
letter      [A-z]
digit       [0-9]
%%
"print"         {return PRINT;}
{letter}+      { yylval.id = yytext;return IDENTIFIER;}
{digit}+      { yylval.num = atoi(yytext);return NUMBER; }

[=\n]      return *yytext;

[ \t]       ; /* skip whitespace */

.           yyerror("invalid character");

%%

int yywrap(void) {
    return 1;
}

And my yacc file is

%{
    #include <stdio.h>
    #include <string.h>
    int yylex(void);
    void yyerror(char *);
    int value;
    char *key;
    void print_var(char *s);
%}
%union {char *id;int num;}
%start exp
%token <id>   IDENTIFIER
%token <num>   NUMBER
%token PRINT

%%

exp: IDENTIFIER '=' NUMBER '\n' {key = $1;value = $3;}
    |PRINT IDENTIFIER '\n'      {print_var($2);}
    |exp IDENTIFIER '=' NUMBER '\n' {key = $2;value = $4;}
    |exp PRINT IDENTIFIER '\n'      {print_var($3);}
    ;
%%
void yyerror(char *s) {
    fprintf(stderr, "%s\n", s);
}

void print_var(char *s){
    if(strcmp(s,key) == 0){
        printf("%s:%d\n",key,value);
    }else{
        printf("%s not found\n",s);
    }
}

int main(void) {
    yyparse();
    return 0;
}

But when I type something like this

a = 1
print a

I get the following error: a not found


Solution

  • Once your lex action returns to yacc (or, more generally, once control leaves the rule's action), the contents of yytext are no longer guaranteed to be intact. Rather than making id a pointer to yytext, use strdup or something similar to make a copy of the string in yytext, for example yylval.id = strdup(yytext); in the identifier rule.

    This is mentioned in flex's manual in 21.3 A Note About yytext And Memory.

    In your grammar, this is the only rule shown:

    exp: IDENTIFIER '=' NUMBER '\n' {key = $1;value = $3;}
        |PRINT IDENTIFIER '\n'      {print_var($2);}
        |exp IDENTIFIER '=' NUMBER '\n' {key = $2;value = $4;}
        |exp PRINT IDENTIFIER '\n'      {print_var($3);}
        ;
    

    The assignment actions copy a pointer into the global variable key, which runs into the same problem: the pointer refers to the scanner's buffer rather than to a stable copy. The print actions pass the identifier ($2 or $3) to print_var. If you allocate a copy of yytext in the lex program, print_var can safely free its argument once it has compared it. The two assignments to key should be replaced by calls to a function that sets key after freeing any previous value, e.g.,

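    /* Needs <stdlib.h> for free() and <string.h> for strdup(). */
    void set_key(const char *name)  /* not "value": that would shadow the global int */
    {
        if (key != 0)
            free(key);              /* release the previously stored name */
        key = strdup(name);         /* keep a private copy */
    }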
    

    Provided each action also frees the token copy it received from the lexer once it is done with it, that leaves no more than one copy of yytext allocated at a time, an insignificant amount of memory.