
JFlex ambiguity


I have these two rules in a JFlex specification:

Bool = true
Ident = [:letter:][:letterdigit:]*

If I try, for example, to analyse the word "trueStat", it gets recognized as an Ident expression and not as a Bool. How can I avoid this type of ambiguity in JFlex?


Solution

  • In almost all languages, a keyword is only recognised as such if it is a complete word. Otherwise, you would end up banning identifiers like format, downtime and endurance (which would instead start with the keywords for, do and end, respectively). That's quite confusing for programmers, although it's not unheard-of. Lexical scanner generators like Flex and JFlex generally try to make the common case easy, which is why the snippet you provide recognises trueStat as an identifier. But if you really want to recognise it as a keyword followed by an identifier, you can accomplish that by adding trailing context to all your keywords:

    Bool = true/[:letterdigit:]*
    Ident = [:letter:][:letterdigit:]*
    

    With that pair of patterns, true will match the Bool rule, even if it occurs as trueStat. The pattern matches true and any alphanumeric string immediately following it, and then rewinds the input cursor so that the token matched is just true.

    Note that, like Lex and Flex, JFlex accepts the longest match at the current input position; if more than one rule accepts this match, the action corresponding to the first such rule is executed. (See the manual section "How the Input is Matched" for a slightly longer explanation of the matching algorithm.) Trailing context is considered part of the match for the purposes of this rule (but, as noted above, is then removed from the match).

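    The consequence of this rule is that you should always place more specific patterns before the general patterns they might override, whether or not the specific pattern uses trailing context. For trueStat, both rules match the same eight characters (the Bool rule counts its trailing context), so the tie goes to whichever rule comes first. So the Bool rule must precede the Ident rule.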
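
To see why the order matters, here is a minimal Java sketch that models the matching rule described above: longest match wins, trailing context counts towards the length, and ties go to the earlier rule. It is only an illustration, not JFlex's implementation. The class RuleOrderSketch, the nested Rule type and the tokenize method are hypothetical names, and ASCII letters and digits are assumed as stand-ins for the JFlex character classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative model of "longest match, first rule wins"; not JFlex internals.
class RuleOrderSketch {
    // Hypothetical rule type: capture group 1 is the token itself, and any
    // text the regex matches after group 1 plays the role of trailing context.
    static final class Rule {
        final String name;
        final Pattern pattern;

        Rule(String name, String regex) {
            this.name = name;
            this.pattern = Pattern.compile(regex);
        }
    }

    // Assumption: ASCII letters and digits stand in for the JFlex classes.
    static final Rule BOOL = new Rule("BOOL", "(true)[A-Za-z0-9]*");
    static final Rule IDENT = new Rule("IDENT", "([A-Za-z][A-Za-z0-9]*)");

    static List<String> tokenize(String input, List<Rule> rules) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestLength = -1;
            int tokenEnd = -1;
            for (Rule rule : rules) {
                Matcher m = rule.pattern.matcher(input).region(pos, input.length());
                if (!m.lookingAt()) {
                    continue; // this rule does not match at the cursor
                }
                // The match length includes the trailing context.
                int length = m.end() - pos;
                // Strict '>' means an earlier rule keeps the win on a tie.
                if (length > bestLength) {
                    best = rule;
                    bestLength = length;
                    tokenEnd = m.end(1);
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("no rule matches at offset " + pos);
            }
            tokens.add(best.name + "(" + input.substring(pos, tokenEnd) + ")");
            // Rewind: the cursor moves past the token only, not the trailing context.
            pos = tokenEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Bool first: prints [BOOL(true), IDENT(Stat)]
        System.out.println(tokenize("trueStat", Arrays.asList(BOOL, IDENT)));
        // Ident first: prints [IDENT(trueStat)]
        System.out.println(tokenize("trueStat", Arrays.asList(IDENT, BOOL)));
    }
}
```

With Bool listed first, both rules tie at eight characters and Bool wins, so the scanner emits true and then resumes at Stat. With Ident listed first, the same tie goes to Ident and the whole word comes out as a single identifier.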