I am writing the cat(1) utility with lex. To implement the -n option (number every line), I currently have to write something like this:
^. {
printf("%8d ", ++lino);
ECHO;
}
I know that the end of a line (EOL) can be matched with the anchor $ or with \n, so I wonder whether there is something similar that matches just the beginning of a line (BOL), so that I don't have to use ECHO?
(I agree with the comment by Joachim Pileborg that lex is not the tool for implementing cat. The rest of this answer is in the spirit of explaining a bit about lex.)
The provided lex program will not work if there are empty lines in the input, because the pattern ^. does not match an empty line. (In lex, the . operator does not match a newline character.) So a reasonably minimal (f)lex input file would be:
%option noyywrap noinput nounput
%%
    int lino = 0;
^(.|\n) { printf("%8d %c", ++lino, *yytext); }
Here, I just print out the matched token in the printf, which is equivalent to using ECHO. So it does not really "eliminate" the ECHO.
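For comparison, the same idea can be written by hand in plain C. This is only a sketch under stated assumptions: number_stream and its at_bol flag are hypothetical names, not part of (f)lex. Like the ^(.|\n) rule, it emits the line number when the first character of a line is read, whatever that character is, so empty lines are numbered too.

```c
#include <stdio.h>

/* Hypothetical helper: copy `in` to `out`, prefixing every line with its
   number. Returns the number of lines seen. */
static int number_stream(FILE *in, FILE *out) {
    int lino = 0;
    int at_bol = 1;                 /* 1 while the next character starts a line */
    int c;

    while ((c = getc(in)) != EOF) {
        if (at_bol)
            fprintf(out, "%8d ", ++lino);   /* same format as the lex action */
        putc(c, out);
        at_bol = (c == '\n');       /* a newline puts us back at BOL */
    }
    return lino;
}
```

Calling number_stream(stdin, stdout) from main would behave roughly like cat -n on standard input. Note that the number is only printed once a character has actually been read, which mirrors the fact that a lex rule has to consume something.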
(f)lex rules must match at least one character. So it wouldn't really be possible for a pattern to consist only of $, any more than it would be possible for a pattern to consist only of ^ (which is a BOL anchor). In that sense, the answer to your question is simply "no".
A more easily understood (and probably more efficient) solution is to actually match each line. This solution never uses ECHO, not even in the default rule, so I've told flex not to generate a default rule:
%option noyywrap noinput nounput nodefault
%%
    int lino = 0;
.*\n?   { printf("%8d %s", ++lino, yytext); }
That's not quite perfect, because it will truncate lines which contain a NUL character. (That is, the printf will effectively truncate the line; the line will be parsed correctly.) To fix it, it's necessary to use fwrite instead of printf:
%option noyywrap noinput nounput nodefault
%%
    int lino = 0;
.*\n?   { printf("%8d ", ++lino);
          fwrite(yytext, 1, yyleng, yyout); }
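The difference between the two output calls can be seen in isolation with a small C sketch. The helper names emit_with_printf and emit_with_fwrite are hypothetical and exist only for this illustration; text and len stand in for yytext and yyleng.

```c
#include <stdio.h>

/* Hypothetical helper: write `text` as a C string. The "%s" conversion
   stops at the first NUL byte. Returns the number of bytes written. */
static size_t emit_with_printf(FILE *out, const char *text) {
    int n = fprintf(out, "%s", text);
    return n < 0 ? 0 : (size_t)n;
}

/* Hypothetical helper: write exactly `len` bytes, NUL bytes included.
   Returns the number of bytes written. */
static size_t emit_with_fwrite(FILE *out, const char *text, size_t len) {
    return fwrite(text, 1, len, out);
}
```

For a 6-byte token such as "ab\0cd\n", the first helper writes only the 2 bytes before the NUL, while the second writes all 6 when given the length explicitly. That is why the action needs yyleng rather than relying on NUL termination.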
The newline is made optional (\n?) in case the last line of the file is not terminated with a newline. Because (f)lex patterns never match zero characters, that rule is actually equivalent to the more precise but clunkier regular expression .*\n|.+
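To make the tokenization concrete, here is a hand-written C sketch of how such a rule carves up an input buffer. It is an illustration only, not what flex generates: number_lines is a hypothetical name, and the whole input is assumed to be in memory already. Each token runs up to and including the next newline, or to the end of the buffer if there is none, and an empty remainder produces no token.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split buf[0..len) into line tokens, write each one
   to `out` with a line number, and return the number of tokens. */
static int number_lines(const char *buf, size_t len, FILE *out) {
    int lino = 0;
    size_t pos = 0;

    while (pos < len) {             /* nothing left: no zero-length token */
        const char *start = buf + pos;
        const char *nl = memchr(start, '\n', len - pos);
        /* Token ends just after the newline, or at the end of the buffer. */
        size_t tok = nl ? (size_t)(nl - start) + 1 : len - pos;

        fprintf(out, "%8d ", ++lino);
        fwrite(start, 1, tok, out); /* fwrite keeps embedded NUL bytes */
        pos += tok;
    }
    return lino;
}
```

With the 4-byte input "a\n\nb" this yields three tokens: "a\n", "\n", and the unterminated "b". An empty buffer yields none.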