How can I detect and handle invalid tokens that are not matched by any pattern in a Lex program?


I am doing a homework assignment on building a lexical analyzer with Flex.

I need to convert infix expressions that use only the + and - operators into postfix expressions. The operands can be integers, real numbers, or identifiers (identifiers do not need to be declared).

I defined the following regular definitions and patterns:

/* regular definition */
delim   [ \t]
ws  {delim}+
letter  [A-Za-z_]
digit   [0-9]
id  {letter}({letter}|{digit})*
number  {digit}+(\.{digit}+)?(E[+-]?{digit}+)?

%%

{ws}        {/* no action and no returns */}
{id}        { return (ID); }
{number}    { return (NUMBER); }
[+-]        { return (OPERATOR); }
[\n]        { return (ENTER); }
<<EOF>>     { return (END_OF_FILE); }
[.*]        { return (INVALID); }

%%

I defined the pattern [.*] to match all invalid tokens, for example an invalid identifier that starts with a digit (0abc), an invalid literal (12.23.2), and so on.

If an expression (each expression is one line) contains an invalid token, I just want to print an error message and ignore that line.

So my question is: is there a better way to describe or detect invalid tokens in my case?


Solution

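  • [.*] matches only a literal dot or a literal asterisk, because . and * lose their special meaning inside a bracket expression. To match an arbitrary character (other than a newline), use . without brackets.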

    Note that you only want to match a single character here. You don't want .*, because it would match the rest of the line, and flex always picks the rule that produces the longest match (ties go to the rule listed first), so .* would often win over your other rules. For example, with .* the input foo bar would be read as a single INVALID token instead of two IDs separated by a space. So plain . is what you want.