I am trying to write a simple compiler, and I am currently working on the scanner. For string tokens, I have the following rule in the flex file:
```
\"([^"\\\n]|\\.)*\"    { clean_string(); return TK_STRING; }
```
It works perfectly (this is not the question). The `clean_string` function is called to remove the leading and trailing `"` and to transform `\n` and `\t` into the corresponding ASCII characters.
```c
int clean_string(void) {
    char *mystr;
    mystr = strdup(yytext + 1);  // copy yytext without the leading "
    if (!mystr) return 1;
    mystr[yyleng - 2] = '\0';    // remove the trailing "
    size_t len = strlen(mystr);
    // "<=" and not "<" so that the terminating '\0' is copied too;
    // i indexes mystr and j indexes yytext
    for (size_t i = 0, j = 0; i <= len; i++, j++) {
        if (mystr[i] == '\\') {
            i++;
            if (mystr[i] == 'n') yytext[j] = '\n';
            else if (mystr[i] == 't') yytext[j] = '\t';
            else yytext[j] = mystr[i];
        }
        else yytext[j] = mystr[i];
    }
    yyleng = strlen(yytext);
    free(mystr);
    return 0;
}
```
It also works perfectly.
My question is the following: at the end of the function I update `yyleng` because `yytext` has changed. Is there any other variable I need to update to avoid unexpected behavior in another part of the program?
Unless you use `yymore()` in your action (and evidently, you do not), the flex-generated scanner does not require `yyleng` to reflect the length of `yytext`. You can modify `yyleng` in any way, or you can modify the contents of `yytext` between index 0 and index `yyleng-1`, including making it shorter.
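Because unescaping never makes the text longer, the write position never overtakes the read position, so the work can be done directly in `yytext` without the `strdup` copy. The following is a minimal sketch of that idea. `unescape_in_place` is a hypothetical helper name, and it assumes the lexeme still includes both surrounding quotes and that only `\n` and `\t` need translating, as in the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of flex). s points at the opening quote and
 * len is the lexeme length including both quotes (yytext and yyleng in a
 * flex action). Strips the quotes, turns \n and \t into newline and tab,
 * keeps any other escaped character as-is, NUL-terminates the result and
 * returns its new length. Writes never go past index len - 2. */
size_t unescape_in_place(char *s, size_t len)
{
    assert(len >= 2);                       /* lexeme must contain both quotes */
    size_t j = 0;                           /* write index, always < i */
    for (size_t i = 1; i + 1 < len; i++) {  /* read only the text between the quotes */
        char c = s[i];
        if (c == '\\' && i + 2 < len) {     /* escape pair: use the next character */
            c = s[++i];
            if (c == 'n') c = '\n';
            else if (c == 't') c = '\t';
        }
        s[j++] = c;
    }
    s[j] = '\0';
    return j;
}
```

In the string rule's action this would become something like `yyleng = (int) unescape_in_place(yytext, (size_t) yyleng);` (the cast assumes `yyleng` is an `int`, which depends on the flex version). The result still lives in flex's buffer, so it is only valid until the next call to `yylex`.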
Having said that, you need to be aware that the contents of `yytext` are only stable until the next time you call `yylex`. In almost all applications, particularly if you are planning on using the scanner from a parser with lookahead (such as a parser generated by yacc/bison), you will want the scanner to hand over a copy of the contents of `yytext`. In particular, yacc/bison-generated parsers expect to find the semantic value of a token (that is, the token string or some value derived from it) in some member of `yylval` (usually a union declared with `%union`), generally in the form of a pointer.
So I'd strongly recommend that your function put the desired string contents into `mystr` and then return it (rather than freeing it immediately), and that the action store that pointer where the parser can use it. That requires only a minor modification to your code and makes the scanner usable with a yacc/bison-generated parser.
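A minimal sketch of that modification, assuming the same escape rules as the question. `copy_unescaped` is a hypothetical name; instead of writing back into `yytext`, it returns a freshly `malloc`ed string that the parser then owns.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: builds a heap-allocated, unescaped copy of a quoted
 * lexeme (text = yytext, len = yyleng, both quotes included). The caller
 * owns the result and must free() it; returns NULL if malloc fails. */
char *copy_unescaped(const char *text, size_t len)
{
    assert(len >= 2);                       /* lexeme must contain both quotes */
    /* Unescaping only shrinks the text: len - 2 content bytes + 1 NUL. */
    char *out = malloc(len - 1);
    if (out == NULL)
        return NULL;
    size_t j = 0;
    for (size_t i = 1; i + 1 < len; i++) {  /* skip the surrounding quotes */
        char c = text[i];
        if (c == '\\' && i + 2 < len) {     /* escape pair: use the next character */
            c = text[++i];
            if (c == 'n') c = '\n';
            else if (c == 't') c = '\t';
        }
        out[j++] = c;
    }
    out[j] = '\0';
    return out;
}
```

On the bison side this assumes a declaration along the lines of `%union { char *str; }` with `%token <str> TK_STRING` (`str` is a hypothetical member name). The string rule's action would then be `{ yylval.str = copy_unescaped(yytext, yyleng); return TK_STRING; }`, and the parser becomes responsible for calling `free` once it has used the string; bison's `%destructor` can take care of tokens discarded during error recovery.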