I'm using flex to read the contents of a C-Minus source file and then display them in the following format: each line number followed by a colon, then the tokens found on that line. I can display the tokens, but when I also try to display the line numbers, only the line numbers are printed. My flex file:
%option noyywrap
%option yylineno
%{
#include <stdio.h>
int lineNo = 1;
%}
line ^.*\n
letter [a-zA-Z]
digit [0-9]
%x IN_COMMENT
%%
{line} {printf("%d:\n", lineNo++);}
{digit}+ {
printf("found NUM token\n");
}
"while" {
printf("found WHILE token\n");
}
"else" {
printf("found ELSE token\n");
}
"if" {
printf("found IF token\n");
}
"return" {
printf("found RETURN token\n");
}
"void" {
printf("found VOID token\n");
}
"int" {
printf("found INT token\n");
}
"+" {
printf("found PLUS token\n");
}
"-" {
printf("found MINUS token\n");
}
"*" {
printf("found TIMES token\n");
}
"/" {
printf("found OVER token\n");
}
"<" {
printf("found LT token\n");
}
"<=" {
printf("found LTEQ token\n");
}
">" {
printf("found GT token\n");
}
">=" {
printf("found GTEQ token\n");
}
"==" {
printf("found EQ token\n");
}
"!=" {
printf("found NEQ token\n");
}
"=" {
printf("found ASSIGN token\n");
}
";" {
printf("found SEMI token\n");
}
"," {
printf("found COMMA token\n");
}
"(" {
printf("found LPAREN token\n");
}
")" {
printf("found RPAREN token\n");
}
"[" {
printf("found LBRACKET token\n");
}
"]" {
printf("found RBRACKET token\n");
}
"{" {
printf("found LBRACE token\n");
}
"}" {
printf("found RBRACE token\n");
}
[ \t]+
<INITIAL>{
"/*" BEGIN(IN_COMMENT);
}
<IN_COMMENT>{
"*/" BEGIN(INITIAL);
[^*\n]+ // eat comment in chunks
"*" // eat the lone star
\n yylineno++;
}
{letter}{letter}* {
printf("found ID token\n");
}
. {printf("Unrecognized character");}
%%
int main( int argc, char **argv )
{
++argv, --argc;
if ( argc > 0 )
yyin = fopen( argv[0], "r" );
else
yyin = stdin;
yylex();
}
My input file:
/* Sample program
in CMinus language -
computes factorial
*/
void main (void)
{
int x;
int whileimatit;
/* read x; { input an integer } */
x = input();
/* if x > 0 then { don't compute if x <= 0 } */
if ( x > 0 ) {
/* fact := 1; */
whileimatit = 1;
/* repeat */
while (x > 0)
{
/* fact := fact * x; */
whileimatit = whileimatit * x;
/* x := x - 1 */
x = x - 1;
/* until x = 0; */
}
/* write fact { output factorial of x } */
output(whileimatit);
/* end */
}
}
My output:
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:
31:
Desired output:
1:
2:
3:
4:
5:
found VOID token
found ID token
found LPAREN token
found VOID token
found RPAREN token
6:
found LBRACE token
7:
found INT token
found ID token
found SEMI token
8:
found INT token
found ID token
found SEMI token
9:
10:
11:
found ID token
found ASSIGN token
found ID token
found LPAREN token
found RPAREN token
found SEMI token
12:
13:
14:
found IF token
found LPAREN token
found ID token
found GT token
found NUM token
found RPAREN token
found LBRACE token
15:
16:
found ID token
found ASSIGN token
found NUM token
found SEMI token
17:
18:
found WHILE token
found LPAREN token
found ID token
found GT token
found NUM token
found RPAREN token
19:
found LBRACE token
20:
21:
found ID token
found ASSIGN token
found ID token
found TIMES token
found ID token
found SEMI token
22:
23:
found ID token
found ASSIGN token
found ID token
found MINUS token
found NUM token
found SEMI token
24:
25:
found RBRACE token
26:
27:
found ID token
found LPAREN token
found ID token
found RPAREN token
found SEMI token
28:
29:
30:
found RBRACE token
31:
found RBRACE token
If I remove the following line:
{line} {printf("%d:\n", lineNo++);}
I get the following output:
found VOID token
found ID token
found LPAREN token
found VOID token
found RPAREN token
found LBRACE token
found INT token
found ID token
found SEMI token
found INT token
found ID token
found SEMI token
found ID token
found ASSIGN token
found ID token
found LPAREN token
found RPAREN token
found SEMI token
found IF token
found LPAREN token
found ID token
found GT token
found NUM token
found RPAREN token
found LBRACE token
found ID token
found ASSIGN token
found NUM token
found SEMI token
found WHILE token
found LPAREN token
found ID token
found GT token
found NUM token
found RPAREN token
found LBRACE token
found ID token
found ASSIGN token
found ID token
found TIMES token
found ID token
found SEMI token
found ID token
found ASSIGN token
found ID token
found MINUS token
found NUM token
found SEMI token
found RBRACE token
found ID token
found LPAREN token
found ID token
found RPAREN token
found SEMI token
found RBRACE token
found RBRACE token
I am unable to print the line numbers together with the output. Can anyone help?
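You define line as
line ^.*\n
which means that it matches an entire line, including the newline. Flex always prefers the longest match, and nothing can match more than the whole line, so that is what will happen: every line will be matched as a line token, and no other rule will ever be used.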
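To see the longest-match behaviour in isolation, here is a cut-down sketch. It is not the question's scanner: the three rules and the LINE/VOID/ID labels are only for illustration, and the file name demo.l in the build note is an assumption.

```lex
%option noyywrap nodefault
%%
^.*\n      { /* matches the whole line, so it is always the longest match */
             printf("LINE, %d characters\n", (int)yyleng); }
"void"     { /* only 4 characters: loses to the rule above */
             printf("VOID\n"); }
[a-z]+     { printf("ID\n"); }
.|\n       { /* ignore everything else */ }
%%
int main(void)
{
    yylex();
    return 0;
}
```

Built with flex demo.l && cc lex.yy.c -o demo and fed the single line void main(void) followed by a newline, this prints LINE, 16 characters and nothing else. The keyword and identifier rules would only get a chance on a final line that lacks a trailing newline.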
You could ditch the line definition [Note 1], and use the pattern/action rule:
\n {printf("%d:\n", lineNo++);}
However, that will trigger at the end of a line rather than at the beginning. It will also not trigger at the very beginning of the scan, and it will trigger at the end of the last line, both of which are undesirable.
If you are just trying to implement debugging output, I strongly recommend using flex's built-in trace facility, enabled with the -d option when building your scanner. You might also want to rely on %option yylineno, which tells flex to track the input line number automatically. (Getting flex to do this rather than doing it yourself is a lot more robust, and obviously slightly less work.)
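As a sketch of the debugging route, each action can simply print yylineno next to the token. The rule set below is a cut-down one for illustration, not the question's full scanner.

```lex
%option noyywrap nodefault
%option yylineno
%%
[0-9]+       { printf("%d: found NUM token\n", yylineno); }
[a-zA-Z]+    { printf("%d: found ID token\n", yylineno); }
[ \t\n]+     { /* skip whitespace; flex itself updates yylineno for the newlines */ }
.            { printf("%d: unrecognized character '%s'\n", yylineno, yytext); }
%%
int main(void)
{
    yylex();
    return 0;
}
```

Generating the scanner with flex -d additionally makes it report on stderr which rule accepted each piece of input. Note that once %option yylineno is in effect, an action should not increment yylineno itself (as the \n rule inside the question's IN_COMMENT block does), or those newlines will be counted twice.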
If you really want to output a line number at the beginning of every line, you can use a start condition combined with yyless() to rescan. Here is a minimal example:
%option nodefault noyywrap noinput nounput
%option yylineno
%x BOL

%%
                 BEGIN(BOL); /* Note 2 */
<BOL>.|\n        { yyless(0); /* Note 3 */
                   printf("Line %d:", yylineno);
                   BEGIN(INITIAL);
                 }
\n               putchar('\n'); BEGIN(BOL); /* Note 4 */
                 /* Rest of the rules go here. The following is minimal. */
[[:blank:]]+     ;
[^[:blank:]\n]+  printf(" word: '%s'", yytext);
Note 1: In fact, you could ditch most if not all of the definitions. Is [0-9] less readable than {digit}? I'd say "No", since it has a clear meaning, while digit might have been defined as anything. Even clearer would be the built-in character class [[:digit:]].
Note 2 (line 6): Any action before the first rule is executed every time yylex is called. In this case, we are only calling yylex once, so we can get away with this. If we were actually returning tokens, every call would switch back to BOL, so it would be more convenient to set the initial state from the driver. Or just use INITIAL for the start-of-line state, and some other start condition for normal operation.
Note 3 (lines 7-9): When we're in the BOL state, we respond to any following character, including a newline (which would indicate an empty line). This rule will not be executed if we're at EOF, since in that case there is no following character. The response is to remove the character we just read from the token (which leaves the token empty), then print the message indicating which line we're on. Finally, we change to the normal scanning state, which will start with the first character on the line (because of the yyless).
It's tempting to try to do this with the ^ anchor, but that won't work. First, flex does not permit empty patterns, so an anchor by itself is not a valid pattern; it is still necessary to match the following character. Second, it would not be possible to rescan that character without once again triggering the anchor rule, since the character would still be at the beginning of a line when rescanned. Hence the use of a start condition.
Note 4 (line 11): When we hit a newline, we need to change to the BOL state, so that the next character (if there is one) will trigger the output of the line number. Since this example prints tokens on the same line as the line number, we also need to send the newline to the output to terminate the current line.
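Putting this together for the output format in the question, a sketch might look like the following. It reuses the question's token names and comment handling with the BOL technique from the minimal example. Printing a line number for each newline inside a multi-line comment is my reading of the desired output, not something the minimal example does, and an unterminated comment at the end of the file is not handled.

```lex
%option nodefault noyywrap noinput nounput
%option yylineno
%{
#include <stdio.h>
%}
%x BOL IN_COMMENT

%%
                     BEGIN(BOL);   /* the start of the input is also the start of a line */
<BOL>.|\n            { yyless(0); printf("%d:\n", yylineno); BEGIN(INITIAL); }
\n                   { BEGIN(BOL); }
[ \t]+               { /* skip blanks */ }

"/*"                 { BEGIN(IN_COMMENT); }
<IN_COMMENT>"*/"     { BEGIN(INITIAL); }
<IN_COMMENT>[^*\n]+  { /* eat comment text in chunks */ }
<IN_COMMENT>"*"      { /* eat a lone star */ }
<IN_COMMENT>\n       { /* the next line starts inside the comment; yylineno is already updated */
                       printf("%d:\n", yylineno); }

"else"               { printf("found ELSE token\n"); }
"if"                 { printf("found IF token\n"); }
"int"                { printf("found INT token\n"); }
"return"             { printf("found RETURN token\n"); }
"void"               { printf("found VOID token\n"); }
"while"              { printf("found WHILE token\n"); }
[a-zA-Z]+            { printf("found ID token\n"); }
[0-9]+               { printf("found NUM token\n"); }
"+"                  { printf("found PLUS token\n"); }
"-"                  { printf("found MINUS token\n"); }
"*"                  { printf("found TIMES token\n"); }
"/"                  { printf("found OVER token\n"); }
"<="                 { printf("found LTEQ token\n"); }
"<"                  { printf("found LT token\n"); }
">="                 { printf("found GTEQ token\n"); }
">"                  { printf("found GT token\n"); }
"=="                 { printf("found EQ token\n"); }
"!="                 { printf("found NEQ token\n"); }
"="                  { printf("found ASSIGN token\n"); }
";"                  { printf("found SEMI token\n"); }
","                  { printf("found COMMA token\n"); }
"("                  { printf("found LPAREN token\n"); }
")"                  { printf("found RPAREN token\n"); }
"["                  { printf("found LBRACKET token\n"); }
"]"                  { printf("found RBRACKET token\n"); }
"{"                  { printf("found LBRACE token\n"); }
"}"                  { printf("found RBRACE token\n"); }
.                    { printf("Unrecognized character\n"); }
%%
int main(int argc, char **argv)
{
    /* Read the file named on the command line, or stdin if none is given. */
    if (argc > 1 && (yyin = fopen(argv[1], "r")) == NULL) {
        perror(argv[1]);
        return 1;
    }
    yylex();
    return 0;
}
```

The keyword rules come before the identifier rule so that they win when both match the same text, while an identifier such as whileimatit still scans as a single ID because it is the longer match. The yyless(0) call stays ahead of the printf in the BOL rule so that yylineno is read only after the newline of an empty line has been handed back to the input.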