My input file consists of one byte, the NUL character (hex 0).
I have a Flex rule that matches the NUL character and the action returns it:
\0 { return(yytext[0]); }
Below is my complete Flex file. When I run it, I get no output. I conclude that the lexer is interpreting the value returned from my rule as the end-of-file signal. Is that right? If so, how do I process NUL characters in a Flex lexer?
%option noyywrap
%%
\0 { return(yytext[0]); }
%%
int main(int argc, char *argv[])
{
    yyin = fopen(argv[1], "r");
    int token = yylex();
    while ( token != 0 ) {
        switch(token) {
        default:
            printf("TOKEN: %c\n", yytext[0]);
        }
        token = yylex();
    }
    fclose(yyin);
    return 0;
}
You're free to recognise NULs in your input stream as you see fit. But you cannot use 0 as a token number, because when yylex returns 0 that will be interpreted as meaning end of input by its caller (typically yyparse, but in this case your own main() program).
I'm a bit puzzled by your statement:
I conclude that the lexer is interpreting the value returned from my rule as the end-of-file signal.
The lexer doesn't interpret the value returned from your rule at all. Your rules are part of the yylex() function, and when your rule executes return X, X is returned from yylex. There is no inner function which is called.
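To make that concrete, here is a simplified, hand-written model of a scanner function. It is not real Flex output, and every name in it (model_yylex, model_set_input and the static variables) is hypothetical; it only shows the shape: the action text sits inside the function body, so its return statement leaves the scanner function itself.

```c
#include <stddef.h>

/* Simplified, hand-written model of a scanner (NOT real Flex output).
   All names here are hypothetical. */
static const char *model_input;
static size_t model_len;
static size_t model_pos;

/* Point the model scanner at a buffer of `len` bytes. */
void model_set_input(const char *buf, size_t len)
{
    model_input = buf;
    model_len = len;
    model_pos = 0;
}

/* Stand-in for yylex(): the rule's action is pasted into this function,
   so its `return` statement returns from model_yylex() directly. */
int model_yylex(void)
{
    while (model_pos < model_len) {
        char matched = model_input[model_pos++];
        switch (matched) {
        case '\0':
            /* action for the rule  \0 { return(yytext[0]); }  */
            return matched;
        default:
            /* no rule matched: skip the byte (real Flex would echo it) */
            break;
        }
    }
    return 0;  /* default end-of-input result */
}
```

Called on a buffer holding a single NUL byte, the first call returns 0 from inside the action. Nothing in the function inspects that value before it reaches the caller.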
So it's not yylex which is interpreting the value returned from your rule as the end-of-file signal. That interpretation is precisely located at the fifth line of your main() function:
while ( token != 0 ) {
Since you're not using a parser generated by bison/yacc, you're actually free to use whatever integer you like as an end-of-file return from yylex(). But you need to be aware that the generated yylex will return 0 from its default <<EOF>> rules; if you want 0 to mean something other than end of file, you'll need to add explicit <<EOF>> rules, covering every start condition, which return the value you chose to use. It's almost always simpler to stick with the standard 0, which means that you can't use it as a token number.
So in order to handle a NUL as a single-byte token, you'll need to choose some integer other than 0 to represent that token, and thus you cannot use return yytext[0]; if yytext[0] might be a NUL.
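%option noyywrap
%{
/* Token number for NUL; any nonzero value works, and one above 255
   cannot clash with a byte returned as its own token number. */
#define NULL_TOKEN 257
%}
%%
\0 { return NULL_TOKEN; }
%%
int main(int argc, char *argv[])
{
    if (argc < 2 || (yyin = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "cannot open input file\n");
        return 1;
    }
    int token = yylex();
    while ( token != 0 ) {   /* 0 still means end of input */
        switch(token) {
        default:
            printf("TOKEN: %d\n", token);
        }
        token = yylex();
    }
    fclose(yyin);
    return 0;
}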
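For completeness, here is a sketch of the other route mentioned above: keeping 0 as the token for NUL and reporting end of input with a different value through an explicit <<EOF>> rule. END_OF_INPUT and its value -1 are hypothetical choices for illustration, and opening the file in "rb" mode is my assumption. This only works with a hand-written caller, since a bison/yacc parser expects 0 at end of input.

```lex
%option noyywrap
%{
#include <stdio.h>
/* Hypothetical sentinel, chosen outside the range of any byte value. */
#define END_OF_INPUT (-1)
%}
%%
\0        { return 0; }
.|\n      { return (unsigned char)yytext[0]; }
<<EOF>>   { return END_OF_INPUT; }
%%
int main(int argc, char *argv[])
{
    if (argc < 2 || (yyin = fopen(argv[1], "rb")) == NULL) {
        fprintf(stderr, "cannot open input file\n");
        return 1;
    }
    int token;
    /* Stop on the sentinel instead of on 0. */
    while ((token = yylex()) != END_OF_INPUT)
        printf("TOKEN: %d\n", token);
    fclose(yyin);
    return 0;
}
```

An unqualified <<EOF>> rule applies to every start condition that does not already have its own <<EOF>> action, so one rule is enough in a scanner like this one. The cast to unsigned char keeps bytes of 0x80 and above from coming back as negative numbers on platforms where char is signed, which would otherwise risk colliding with the sentinel.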