bison, flex-lexer

How to enforce whitespace separation with Flex?


How can I enforce that keywords should be separated by whitespace with Flex?

For example, if cat and dog are the keywords, then cat dog should be accepted, but catdog should not.

Using trailing context (see below) works, but adding it to every keyword feels inconvenient and ugly. Is there a better way?

cat/[ \t\n\r] { return CAT; }
dog/[ \t\n\r] { return DOG; }

Solution

  • Usually you have another rule that matches alphanumeric sequences that aren't keywords, which might look like this:

    [a-zA-Z_][a-zA-Z0-9_]* { return IDENTIFIER; }
    

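    With that rule in place, catdog is recognized as a single identifier rather than as two keywords. Flex always takes the longest match (the maximal munch rule), and the identifier rule matches all six characters, whereas the cat rule matches only three. For an input that is exactly cat, both rules match the same length, and Flex breaks the tie in favor of the rule listed first, so the keyword rules must appear before the identifier rule.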

    If your language doesn't have anything like identifiers, you can still do the same thing to explicitly mark non-keywords as invalid:

    [a-z]+ { /* produce an appropriate error message here */ }
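
Putting the pieces together, a minimal sketch of a complete scanner might look like the following. The token codes, the whitespace and catch-all rules, and the small main driver are assumptions added for illustration; in a real project the token codes would normally come from the Bison-generated header.

```lex
%option noyywrap
%{
#include <stdio.h>
/* Hypothetical token codes for this sketch; Bison would normally define them. */
enum { CAT = 258, DOG, IDENTIFIER };
%}

%%
cat                     { return CAT; }
dog                     { return DOG; }
[a-zA-Z_][a-zA-Z0-9_]*  { return IDENTIFIER; /* longest match wins, so catdog lands here */ }
[ \t\r\n]+              { /* skip whitespace between tokens */ }
.                       { fprintf(stderr, "unexpected character: %s\n", yytext); }
%%

/* Assumed test driver: print each token code with the text it matched. */
int main(void)
{
    int tok;
    while ((tok = yylex()) != 0)
        printf("%d: %s\n", tok, yytext);
    return 0;
}
```

Assuming the file is saved as scanner.l, it can be built with flex scanner.l followed by cc lex.yy.c -o scanner. Feeding it cat dog catdog should yield CAT, DOG, and then a single IDENTIFIER for catdog, with no trailing context on the keyword rules. If the language has no identifiers, the identifier action can report an error instead of returning a token.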