EBNF to JavaCC lexer


How do I convert the EBNF rule ::= [A-Za-z] into a JavaCC token definition?

What I have done:

TOKEN :
{
  < LETTER : (["A"-"Z"])>
}

but I don't know how to add the lowercase letters.


Solution

  • Like this:

    TOKEN :
    {
      < LETTER : (["A"-"Z", "a"-"z"])>
    }
    

    Reference:

    A character list describes a set of characters. A legal match for a character list is any character in this set. A character list is a list of character descriptors separated by commas within square brackets. Each character descriptor describes a single character or a range of characters (see character descriptor below), and this is added to the set of characters of the character list. If the character list is prefixed by the "~" symbol, the set of characters it represents is any UNICODE character not in the specified set.
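
    To illustrate the "~" prefix mentioned in the reference, here is a small sketch of a negated character list. The token name NON_LETTER is made up for this example and is not part of the original question:

    ```javacc
    TOKEN :
    {
      // Matches any single character that is NOT an ASCII letter.
      // NON_LETTER is a hypothetical name chosen for illustration.
      < NON_LETTER : ~["A"-"Z", "a"-"z"] >
    }
    ```

    The descriptors inside the brackets are the same comma-separated ranges as in LETTER; only the leading "~" changes the meaning to "everything except this set".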

    Note that the rule:

    TOKEN :
    {
      < LETTER : (["A"-"Z", "a"-"z"])>
    }
    

    is equivalent to:

    TOKEN :
    {
      < LETTER : ["A"-"Z", "a"-"z"]>
    }
    

    Both rules match a single letter. To match one or more letters, the parentheses are required so that a + quantifier can be appended:

    TOKEN :
    {
      < LETTERS : (["A"-"Z", "a"-"z"])+ >
    }
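
    For context, here is a minimal sketch of a complete grammar file that uses the LETTERS token. The parser name LetterDemo, the production words(), the SKIP rule, and the sample input are all assumptions made for this example; adapt them to your own grammar:

    ```javacc
    options {
      STATIC = false;   // generate an instantiable (non-static) parser
    }

    PARSER_BEGIN(LetterDemo)
    import java.io.StringReader;

    // LetterDemo is a hypothetical parser name used only in this sketch.
    public class LetterDemo {
      public static void main(String[] args) throws ParseException {
        // Sample input: runs of letters separated by spaces.
        LetterDemo parser = new LetterDemo(new StringReader("abc XYZ q"));
        parser.words();
      }
    }
    PARSER_END(LetterDemo)

    // Whitespace between tokens is skipped (an assumption for this demo).
    SKIP : { " " | "\t" | "\n" | "\r" }

    TOKEN :
    {
      // One or more ASCII letters, upper or lower case.
      < LETTERS : (["A"-"Z", "a"-"z"])+ >
    }

    // Hypothetical production: prints the text of every LETTERS token until end of input.
    void words() :
    { Token t; }
    {
      ( t = <LETTERS> { System.out.println(t.image); } )* <EOF>
    }
    ```

    Running javacc on this file and compiling the generated sources should give a parser that prints each run of letters it finds. Any character that is neither a letter nor skipped whitespace would cause a lexical error, since no token matches it.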