posix, lex

Handling invalid tokens in Lex


When an invalid token or character is scanned with Lex, is there a special error code which should be returned or should I just call the exit function with EXIT_FAILURE?


Solution

  • Normally, you should not attempt to detect errors in a lexical scanner. It is much simpler to rely on a fallback rule

    .     return *yytext;
    

    to handle both single-character operator tokens and errors. Bison/yacc treats any token code it does not recognise as a syntax error, so error handling can be centralised in the parser component.
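
    To see why the fallback rule is enough, here is a minimal hand-written sketch in C of the same idea. It is not generated scanner code: next_token, parser_accepts, count_syntax_errors and the NUMBER code are hypothetical names made up for illustration. The cast to unsigned char is my addition; it keeps bytes above 127 from becoming negative token codes where char is signed.

    ```c
    #include <assert.h>
    #include <ctype.h>
    #include <stddef.h>

    /* Hypothetical token code; named tokens sit above the single-byte range. */
    enum { NUMBER = 258 };

    /* Stand-in for a scanner: returns 0 at end of input, NUMBER for a run of
       digits, and otherwise the character itself, which is what the fallback
       rule `. return *yytext;` does. */
    static int next_token(const char **src)
    {
        assert(src != NULL && *src != NULL);
        while (**src == ' ')
            ++*src;
        if (**src == '\0')
            return 0;
        if (isdigit((unsigned char)**src)) {
            while (isdigit((unsigned char)**src))
                ++*src;
            return NUMBER;
        }
        /* Fallback: no error check here, just hand the byte to the parser. */
        return (unsigned char)*(*src)++;
    }

    /* Stand-in for the grammar: only these tokens appear in any rule, so
       anything else is rejected on the parser side. */
    static int parser_accepts(int token)
    {
        return token == NUMBER || token == '+' || token == '-';
    }

    /* Counts rejected tokens. A real parser would report or recover at the
       first one; counting just makes the behaviour easy to observe. */
    int count_syntax_errors(const char *text)
    {
        int errors = 0;
        for (int tok = next_token(&text); tok != 0; tok = next_token(&text))
            if (!parser_accepts(tok))
                ++errors;
        return errors;
    }
    ```

    The scanner never decides that '@' or '$' is invalid. It passes them along, and the side that knows the grammar rejects them.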

    Occasionally it is impossible to avoid noticing an error. For example, in a language like C whose string literals cannot span multiple source lines, an unclosed quote must be detected by the lexical scanner if error recovery is to be attempted. (If you are not going to attempt error recovery you might as well just let the fallback rule handle the unmatched quote as a single '"' token, as above, but if you are going to attempt error recovery, it would be better to continue with the next line rather than the next character.)

    In such a case, it is still possible to return some otherwise unused single-character token. Or you could declare a dedicated bad-token token (BAD_STRING below) in your bison file, which will have almost exactly the same effect.

    ["]([^"\n\\]|\\(.|\n))*["]   { return STRING; }
    ["]([^"\n\\]|\\(.|\n))*      { return '"'; }
    

    or

    ["]([^"\n\\]|\\(.|\n))*["]   { return STRING; }
    ["]([^"\n\\]|\\(.|\n))*      { return BAD_STRING; }
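
    As a rough illustration of what the paired rules are meant to do, the following C function scans one literal by hand. It is only a sketch: scan_string and the numeric token codes are hypothetical, and a flex-generated scanner gets the same result from longest-match between the two patterns rather than from explicit branches.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical token codes, standing in for what bison would generate. */
    enum { STRING = 258, BAD_STRING = 259 };

    /* Scans a string literal starting at s[0] == '"'. Stores the number of
       bytes consumed in *len and returns STRING if the closing quote was
       found, or BAD_STRING if the line or the input ended first. */
    int scan_string(const char *s, size_t *len)
    {
        assert(s != NULL && len != NULL && s[0] == '"');
        size_t i = 1;
        for (;;) {
            if (s[i] == '\\' && s[i + 1] != '\0') {
                i += 2;          /* escape pair, including backslash-newline */
            } else if (s[i] == '"') {
                *len = i + 1;    /* include the closing quote */
                return STRING;
            } else if (s[i] == '\0' || s[i] == '\n' || s[i] == '\\') {
                *len = i;        /* stop before the newline */
                return BAD_STRING;
            } else {
                ++i;             /* ordinary character */
            }
        }
    }
    ```

    Because the bad token ends just before the newline, scanning resumes at the start of the next line, which is the recovery point suggested above.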
    

    Even if you are not going to attempt error recovery (for now), a lexer should not take it upon itself to call exit. That would preclude the parser from producing an error message or returning an error code. The parser should not call exit either: like any library function, it should report the failure to its caller, and only the client code can decide to terminate.
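
    A minimal sketch of that division of labour in C, assuming the parser is reached through a function with the default yyparse signature (int, no arguments, 0 on success). The names run_parser, exit_code_for, parse_ok and parse_bad are hypothetical; the two stubs only stand in for a generated parser so the example compiles on its own.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Shape of the parser entry point; the default yyparse() fits it. */
    typedef int (*parse_fn)(void);

    /* Library side: report failure to the caller, never call exit(). */
    int run_parser(parse_fn parse, FILE *diagnostics)
    {
        assert(parse != NULL && diagnostics != NULL);
        int status = parse();
        if (status != 0)
            fprintf(diagnostics, "parse failed with status %d\n", status);
        return status;
    }

    /* Client side: the only place that maps a parse result to an exit code. */
    int exit_code_for(int parse_status)
    {
        return parse_status == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
    }

    /* Stubs standing in for a parser run on good and on bad input. */
    int parse_ok(void)  { return 0; }
    int parse_bad(void) { return 1; }
    ```

    A client's main could then end with return exit_code_for(run_parser(yyparse, stderr)); and a different client, such as an editor plugin, could inspect the status and carry on instead.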