
Are shorthand character classes (such as \d) not supported in JavaCC?


I am trying to learn to use JavaCC and realized that it has support for regular expressions. Call me lazy but I thought the default/common way to define digits is a bit too long:

TOKEN : { < #DIGITS : (["0" - "9"])+ >}

I tried using the shorthand character classes such as:

TOKEN : { < #DIGITS : (\d)+ >}

but the "compiler compiler" doesn't seem to like it. I get lexical errors for the shorthand character. I could not find any documentation on the matter, so I am not sure whether I am doing something wrong or it is simply not supported. If anyone can confirm or deny my assumption that JavaCC does not play well with shorthand character classes, I would be very appreciative.


Solution

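  • Your finding that it's not supported is correct. Regular expressions in JavaCC are built only from string literals, character lists such as ["0" - "9"], references to other regular expressions, and the predefined regular expression < EOF >, combined with grouping, choice, and repetition operators. There is no escape-style shorthand such as \d.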

    However, what you are doing with the code you have there is creating your own shortcut. The number sign (#) marks the regular expression as private: it does not produce a token of its own and can be referenced only from within other regular expressions. So, defining it as TOKEN : { < #D : (["0" - "9"])+ > } means you could then use < D > within other token definitions.

    The example grammar javacc.jj, included with the binary distribution, is the official grammar, so looking in this file you can see exactly what syntax it accepts. The parser generated from it seems to be essentially a grammar validator.
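
    The private-definition shortcut described above could look something like the fragment below. It is only a sketch of a token section, not a complete grammar file: the names D, W, INTEGER, DECIMAL, and IDENTIFIER are made up for illustration, and D is defined here as a single digit (without the +) so that it behaves more like \d and the repetition can be chosen at each point of use.

    ```javacc
    TOKEN : {
        // Private helpers (note the #): usable only inside other
        // regular expressions, never matched as tokens themselves.
        < #D : ["0"-"9"] >                              // stand-in for \d
      | < #W : ["a"-"z", "A"-"Z", "0"-"9", "_"] >      // stand-in for \w

        // Real tokens built from the helpers above.
      | < INTEGER    : (<D>)+ >
      | < DECIMAL    : (<D>)+ "." (<D>)+ >
      | < IDENTIFIER : ["a"-"z", "A"-"Z", "_"] (<W>)* >
    }
    ```

    Since D and W are private, only INTEGER, DECIMAL, and IDENTIFIER can be referenced from the parser productions.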