Tags: c, compiler-construction, bison, flex-lexer

Unexpected value in yytext in production rule for function declaration


I'm writing a compiler with flex and bison for a college assignment. I'm having trouble adding a function identifier to my symbol table: when a function declaration is evaluated, I get the opening parenthesis in yytext where I'd expect the identifier. My flex file contains the following, where yylval is a union and vlex is a struct:

abc         [A-Za-z_]
alphanum    [A-Za-z_0-9]
id          {abc}+{alphanum}*

...

#define STORE_YYLVAL_NONE\
  do{\
    ... // location control irrelevant to the problem
    yylval.vlex.type = none_t;\
    yylval.vlex.value.sValue = yytext;\
  }while(0)

...

{id} {
  LOG_DEBUG("id: %s\n", yytext);
  STORE_YYLVAL_NONE;
  return TK_IDENTIFIER;
}

[,;:()\[\]\{\}\+\-\*/<>!&=%#\^\.\|\?\$] {
  LOG_DEBUG("special\n");
  STORE_YYLVAL_NONE;
  return *yytext;
}

...

And in my bison file I have:

new_identifier_with_node: TK_IDENTIFIER {
  hshsym_add_or_exit(&hshsym, yylval.vlex.value.sValue, &(yylval.vlex));
  $$ = ast_node_create(&(yylval.vlex));
};

func: type new_identifier_with_node '(' param_list ')' func_block { ... };

I also have a log inside hshsym_add_or_exit, which adds an identifier to my symbol table. When parsing the following program:

int k(int x,int y, int z){}
int f(){
        k(10,20,30);
}

I'm getting the following debug output:

yylex: DEBUG! id: k
yylex: DEBUG! special
hshsym_add_or_exit: DEBUG! Declaring: (

That is, when the new_identifier_with_node production is evaluated, the content of yytext is ( instead of k, as I would expect. Is the code above the cause? I have some still unresolved shift/reduce conflicts which I guess could be at fault, but I don't see how in this specific case. I believe I'm missing something really basic but I can't see what. The project is quite large (and shamefully disorganized) at this point, but I can provide a complete and reproducible example if need be.


Solution

  • The basic problem is that you are using yylval in the new_identifier_with_node production, instead of $1. $1 is the semantic value of the first symbol in the production, in this case TK_IDENTIFIER.

    In a bison action, yylval usually holds the value of the lookahead token, which is the next token in the input stream. That's why it shows up as a parenthesis in this case. You cannot count on that in general, though, because bison may perform a default reduction without first reading the lookahead token. Using yylval in a bison action is very rarely useful, aside from some applications in error recovery.

    Even after you fix that, you will find that the semantic values are incorrect, because your flex action forwards a pointer to the scanner's internal buffer (yytext) rather than a copy of the token string. See, for example, this question.