I am doing a homework assignment on constructing a lexical analyzer with Flex.
I have to convert infix expressions that use only the + and - operators to postfix expressions. I also have to handle integers, real numbers, and identifiers (which do not need to be declared) as operands.
I defined some regular definitions and patterns like this:
/* regular definition */
delim [ \t]
ws {delim}+
letter [A-Za-z_]
digit [0-9]
id {letter}({letter}|{digit})*
number {digit}+(\.{digit}+)?(E[+-]?{digit}+)?
%%
{ws} {/* no action and no returns */}
{id} { return (ID); }
{number} { return (NUMBER); }
[+-] { return (OPERATOR); }
[\n] { return (ENTER); }
<<EOF>> { return (END_OF_FILE); }
[.*] { return (INVALID); }
%%
I defined the pattern [.*] to describe all invalid tokens, for example an invalid identifier that starts with a digit (0abc) or an invalid literal representation (12.23.2).
If an expression contains an invalid token (every expression is one line), I just want to print an error message and ignore that line.
So my question is: is there a better way to describe or detect invalid tokens in my case?
[.*] matches a dot or an asterisk. To match an arbitrary character, use . without brackets.
Note that you want to match only single characters here. You don't want .* because that would match entire lines and would often be chosen over the other rules, since it would produce longer matches. For example, foo bar would be interpreted as a single INVALID token instead of two IDs separated by a space if you used .* in that rule. So just . is what you want.
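As a minimal sketch, the scanner from the question with only the catch-all rule changed might look like the following. The token codes in the enum, the %option line, and the small main driver are assumptions added so the file builds on its own; the question does not show how the tokens are defined or how yylex is called.

```lex
%{
#include <stdio.h>
/* Hypothetical token codes; the question does not show their definitions. */
enum { ID = 258, NUMBER, OPERATOR, ENTER, END_OF_FILE, INVALID };
%}

%option noyywrap nounput noinput

delim   [ \t]
ws      {delim}+
letter  [A-Za-z_]
digit   [0-9]
id      {letter}({letter}|{digit})*
number  {digit}+(\.{digit}+)?(E[+-]?{digit}+)?

%%
{ws}        { /* skip blanks and tabs */ }
{id}        { return ID; }
{number}    { return NUMBER; }
[+-]        { return OPERATOR; }
\n          { return ENTER; }
<<EOF>>     { return END_OF_FILE; }
.           { return INVALID; /* any single character no other rule matched */ }
%%

/* Assumed driver: print each token code and its text until end of input. */
int main(void)
{
    int tok;
    while ((tok = yylex()) != END_OF_FILE)
        printf("%d \"%s\"\n", tok, tok == ENTER ? "\\n" : yytext);
    return 0;
}
```

Because . matches exactly one character (and never a newline), it can only win when no other rule matches at the current position, so placing it last makes it a fallback. With these rules, 12.23.2 is scanned as NUMBER (12.23), INVALID (.), NUMBER (2), so the line can be rejected as soon as INVALID is seen. An input such as 0abc, however, is still scanned as NUMBER (0) followed by ID (abc); rejecting it would need either an extra pattern for such malformed tokens or a check in the code that consumes the tokens.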