
With the Flex -d command line flag, why am I getting --(end of buffer or a NUL)?


I am using the -d flag when I run Flex, so that the scanner will generate debug messages. It works fine but the output contains some odd (unexpected) things. Below I show the output for a lexer that tokenizes names, numbers and newlines. Notice the first line of the output:

--(end of buffer or a NUL)

Huh? What is that?

Then there are a couple more towards the end. What are they all about?

--(end of buffer or a NUL)
--accepting rule at line 5 ("John")
--accepting rule at line 7 ("
")
--accepting rule at line 6 ("24")
--accepting rule at line 7 ("
")
--accepting rule at line 5 ("Sally")
--accepting rule at line 7 ("
")
--accepting rule at line 6 ("30")
--accepting rule at line 7 ("
")
--accepting rule at line 5 ("Bill")
--accepting rule at line 7 ("
")
--(end of buffer or a NUL)
--accepting rule at line 6 ("36")
--(end of buffer or a NUL)
--EOF (start condition 0)

Solution

  • This message refers to the scanner's internal buffer. When you first call yylex(), the buffer is empty, since no input has been read. So the scanner reports that, and then fills its buffer by reading from yyin. (That's assuming that you haven't pre-established an input buffer using one of the yy_scan_*() functions.)

    I suppose that your input ends with a line containing 36 which is not terminated by a newline. So the scanner reads the characters 3 and 6, and then attempts to read another character, because the token might be longer. But there is no more data in the buffer. As before, the scanner reports that it reached the end of the buffer and then attempts to refill the buffer from yyin. But since there's nothing more to read, the scanner gets an EOF indication. That means that the token 36 is complete, and needs to be handled.

    Note that at this point, the buffer is empty, since nothing was read. So when yylex() is called for the next token, it immediately encounters the end of the buffer, which is reported. This time, though, it can't refill the buffer, because yyin has no more data. So the scanner executes the <<EOF>> action.
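
The question does not show the scanner itself, so the following is only a sketch of a specification that should produce a similar trace. The file name names.l, the NAME and NUMBER definitions, and the empty actions are all assumptions. The two definitions are there partly so that the three rules fall on lines 5 to 7, as in the output above.

```lex
%option noyywrap
NAME    [A-Za-z]+
NUMBER  [0-9]+
%%
{NAME}      { /* a name such as John; nothing to do in this sketch */ }
{NUMBER}    { /* a number such as 24; nothing to do in this sketch */ }
\n          { /* end of line; nothing to do in this sketch */ }
%%
/* Hypothetical driver: yylex() keeps scanning until the EOF action returns 0. */
int main(void)
{
    return yylex();
}
```

Built with flex -d names.l followed by cc lex.yy.c -o names, and fed input whose last line lacks a newline (for example printf 'John\n24\nSally\n30\nBill\n36' | ./names), it should print the end-of-buffer message once before the first token, once while scanning 36, and once more just before the EOF line. With a trailing newline after 36, the second of those would be expected to disappear, because the newline tells the scanner that the number is finished without it having to look past the end of the buffer.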