I'm using flex to generate a lexical analyzer for a college homework assignment. It should recognize integers, floats, variable names, and math operators. It should also ignore any whitespace characters, such as " ", "\n", "\t", and so on. To start, I'm only trying to match whitespace: either a single " " or a run of several whitespace characters. My rule file is this:
%{
#include<stdio.h>
%}
%%
[0-9]+ printf("inteiro:%s\n",yytext);
[0-9]+\.[0-9]+ printf("fracionário:%s\n",yytext);
[a-zA-Z][a-zA-Z0-9]* printf("variável:%s\n",yytext);
\+|\-|\*|\/|\*\* printf("operador:%s\n",yytext);
\(|\) printf("parênteses:%s\n",yytext);
[[:space:]]|[[:space:]]+;
%%
With the following input:
12 + 413
it generates this output:
inteiro:12
operador:+
inteiro:413
I would like to know why the last line couldn't be something like:
[[:space:]]+;
The rule
[[:space:]]|[[:space:]]+;
is a bit odd.
A flex rule consists of a pattern and an (optional) action, separated by whitespace. Since there is no space before the ; it is part of the pattern, not the action. So that pattern matches either a single whitespace character ([[:space:]]) or (|) a sequence of one or more whitespace characters followed by a semicolon ([[:space:]]+;).
Since there is no action in that rule, whatever the pattern matches is simply discarded. (That's a flex extension. Lex requires that an action be present, even if it does nothing.) In effect, that means that you will ignore all whitespace (one character at a time) and you will also ignore semicolons if they are preceded by whitespace.
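To see this behaviour directly, here is a minimal sketch that attaches a visible action to the original pattern. The printf action, the catch-all . rule, %option noyywrap and main are additions made for the experiment and are not part of the original file.

```lex
%{
/* Experiment: make the odd whitespace rule report what it matched. */
#include <stdio.h>
%}
%option noyywrap
%%
[[:space:]]|[[:space:]]+;   printf("[matched %d char(s)]", (int)yyleng);
.                           ECHO;   /* copy every other character through */
%%
int main(void)
{
    yylex();    /* scan standard input until end of file */
    return 0;
}
```

Given the input line a  ;b c (two spaces before the semicolon), this should report a single three-character match for the two spaces plus the semicolon, and a one-character match for each of the other whitespace characters, including the final newline.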
What you probably intended was
[[:space:]]+ /* no action */
(It is useful to insert a comment to make the absence of action visible.)
By the way, character classes are usually much more readable than a forest of leaning timber (that is, a bunch of backslash escapes). Also, flex lets you use double quotes to quote strings.
So instead of
\+|\-|\*|\/|\*\* /* operator */
\(|\) /* parentheses */
you could write:
[-+*/]|"**" /* operator */
[()] /* parentheses */
In the first character class, it is important to put the - either at the beginning or the end of the list of characters so that it is not interpreted as defining a range of characters.
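As a small illustration of why the position matters, the sketch below puts the - in the middle of a second class. In ASCII, [*-/] denotes the range from * to /, which covers the six characters * + , - . / and so also accepts a comma and a period. The labels, the catch-all rule, %option noyywrap and main are assumptions added for the demonstration.

```lex
%{
/* Demonstration: a '-' in the middle of a character class defines a range. */
#include <stdio.h>
%}
%option noyywrap
%%
[-+*/]          printf("operator:%s\n", yytext);   /* '-' first: a literal minus */
[*-/]           printf("in range:%s\n", yytext);   /* '-' in the middle: '*' through '/' */
[[:space:]]+    /* no action */
.               printf("other:%s\n", yytext);
%%
int main(void)
{
    yylex();
    return 0;
}
```

Because the first rule wins ties, the four real operators are still reported as operators, while an input such as + , . should show the comma and the period falling into the accidental range.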
And, instead of inserting your own debugging printf statements, consider using the -d (debug) option when building your scanner. That will print out complete debugging information for you, letting you see precisely what the scanner is doing.
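Putting the suggestions together, the scanner might look something like the sketch below, with the printf calls replaced by comments so that the -d trace does the reporting. This is only one possible arrangement: %option noyywrap and main are assumptions added so the file builds on its own, and the file name scanner.l used afterwards is hypothetical.

```lex
%{
/* No printf actions: the trace produced by flex -d shows each match. */
#include <stdio.h>
%}
%option noyywrap
%%
[0-9]+                  /* integer */
[0-9]+\.[0-9]+          /* fraction */
[a-zA-Z][a-zA-Z0-9]*    /* variable */
[-+*/]|"**"             /* operator */
[()]                    /* parentheses */
[[:space:]]+            /* no action */
%%
int main(void)
{
    yylex();    /* scan standard input until end of file */
    return 0;
}
```

Assuming the file is saved as scanner.l, running flex -d scanner.l and then compiling the generated lex.yy.c with a C compiler should produce a scanner that reports on stderr which rule accepted each piece of input.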