A book that I am reading [1] says something that I need help understanding.
The book first describes the input(), output(), and unput() functions. Then it says this:
The use of output() allows Lex to be used as a tool for stand-alone data "filters" for transforming a stream of data.
Below is a lexer I created. It is a stand-alone data filter for transforming a stream of data (it finds all occurrences of a word and outputs the word and its line number). It does not use output(); it uses fprintf. How would my lexer be different if it used output()? Is it recommended to use output()? Please help me to understand the significance of what the book says.
%option noyywrap
%option always-interactive
%option yylineno
%{
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#define DESIRED_STRING "FORMAT"
%}
%%
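[^\n]+ {
    /* Print the line, prefixed by its line number, if it contains the target string. */
    char *ret = strstr(yytext, DESIRED_STRING);
    if (ret) {
        fprintf(stdout, "%d: %s\n", yylineno, yytext);
    }
}
\n { /* discard the newline itself */ }
%%
int main(void)
{
    yylex();
    return 0;
}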
[1] Crafting a Compiler with C, page 69.
There is no output() function in standard lex, nor in Flex. So I'd suggest ignoring any book which suggests you use it.
The functions specified by Posix which can be used in user code in the lex input file are:
int yylex(void);
int yymore(void);
int yyless(int n);
int input(void);
int unput(int c);
Contrary to what your book says, these functions cannot be replaced or redefined as macros. Redefining them will result in undefined behaviour.
In addition, a function with prototype:
int yywrap(void);
is called but not defined. You must define it yourself, either as a function or a macro, or link with -ll (which defines it as a function which always returns 1). If you use flex, you also have the option of including %option noyywrap or linking with -lfl.
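For a scanner that reads a single input stream, the do-it-yourself definition can be tiny. The sketch below mirrors what the -ll version is described as doing above (always returning 1); it would go in the user-code section of the lex file or in a separate C file linked with the scanner, and is only needed if you use neither %option noyywrap nor one of the libraries.

```c
/* Minimal yywrap(): the scanner calls this when it reaches end of input.
 * Returning 1 tells the scanner that there is no further input;
 * returning 0 would tell it that more input has been set up and
 * scanning should continue. */
int yywrap(void)
{
    return 1;
}
```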
Flex provides some other functions which have to do with buffer management. But it doesn't provide or use an output() function.
It's not quite the case that your book is inventing things. The original AT&T lex implementation did define and use a macro output(c) (whose definition was fputc(c, yyout)). But it didn't provide for the possibility of using a different definition either. In any event, that ship has sailed.
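To make the book's remark a little more concrete, here is a rough sketch of the filter style it seems to have in mind, written as plain C rather than as a lex file. The macro follows the AT&T definition quoted above. The yyout variable is a local stand-in for the scanner's output stream, and upcase_filter is a hypothetical helper invented for this illustration; neither is taken from any lex implementation.

```c
#include <ctype.h>
#include <stdio.h>

/* Stand-in for the scanner's output stream (an assumption of this sketch;
 * a lex-generated scanner provides its own yyout). */
static FILE *yyout;

/* AT&T-style macro as described above: write one character to yyout. */
#define output(c) fputc((c), yyout)

/* Hypothetical filter: copy text to out, upper-casing each character.
 * Returns the number of characters successfully written. */
size_t upcase_filter(const char *text, FILE *out)
{
    size_t written = 0;

    yyout = out;
    for (; *text != '\0'; ++text) {
        if (output(toupper((unsigned char)*text)) == EOF)
            break;              /* stop on a write error */
        ++written;
    }
    return written;
}
```

In a Flex scanner the same effect is available without output(): an action can write to yyout directly with fputc or fprintf, which is essentially what the fprintf call in your lexer already does (using stdout).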
Posix requires that it be possible to use ECHO; in a lex action, a feature which all lex implementations that I know of provide (including Flex). ECHO; outputs yytext to yyout, and is used as the action for the default fallback rule. It's a macro, obviously. Posix doesn't require that it be possible to redefine it. Flex does allow you to do so, though. Since Flex allows yytext to contain NUL characters, its definition of ECHO does not use fputs, as the AT&T lex definition did; instead, Flex expands it to a call to fwrite, using yyleng as the number of bytes to output.
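The difference between the two styles of ECHO only shows up when a token contains a NUL byte. The following sketch illustrates it in plain C; echo_with_fputs and echo_with_fwrite are hypothetical helpers written for this comparison and are not the actual macro expansions used by AT&T lex or Flex.

```c
#include <stdio.h>
#include <string.h>

/* fputs-style echo: output stops at the first NUL byte in text.
 * Returns the number of bytes written (0 on error). */
size_t echo_with_fputs(const char *text, FILE *out)
{
    if (fputs(text, out) == EOF)
        return 0;
    return strlen(text);
}

/* fwrite-style echo: writes exactly len bytes, embedded NULs included.
 * Here len plays the role that yyleng plays in a scanner.
 * Returns the number of bytes written. */
size_t echo_with_fwrite(const char *text, size_t len, FILE *out)
{
    return fwrite(text, 1, len, out);
}
```

Given the five-byte token a, b, NUL, c, d, the fputs-style helper emits only the first two bytes, while the fwrite-style helper emits all five.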
Flex's C++ interface (which is clearly non-standard since there is no standard for C++ lexers) adds the method virtual void LexerOutput(const char*, int) to the yyFlexLexer class. You can override it in your derived class.