Defining an "anything except" regex pattern for parsing in Rascal

Lex, a Unix lexer tool, allows you to define this pattern as follows: [^a]

In this example, it matches anything except the character a. We are trying to do the same in Rascal, but cannot figure out how to specify this in our mini-parser.

import ParseTree; // provides parse
import String;
import util::FileSystem;

lexical CommentStart = ^"/*";

lexical CommentEnd = "*/";

lexical LineComment = ^"//";

lexical Any = ????;

syntax Badies = CommentStart | CommentEnd | LineComment | Any;


/* Parses a single string */
int parseLine (str line) {  
    pt = parse(#Badies, line);
    visit (pt) {
        // Typed patterns: match a subtree of the given nonterminal.
        case CommentStart _:
            return 1;
        case CommentEnd _:
            return 2;
        case LineComment _:
            return 3;
    }
    return 4;
}

Perhaps we are going about our problem wrong, but if anyone can assist in defining our "anything except" regular expression, we'd be grateful.

Solution

Another possibility, which may be appropriate in some cases, is to use a character range and then subtract unwanted characters. For example, legal characters in a JSON string are any Unicode character except the ASCII control characters, double quote and backslash, OR an escaped character sequence. You may express this as:

lexical JsonChar
    = [\u0020-\U10FFFF] - [\"\\]
    | [\\] [\" \\ / b f n r t]
    | [\\] [u] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]
    ;

(Note the capital U for 6-digit Unicode escapes.)

Or, equivalently (I hope), with ![\a00-\a1F \" \\] | .... Or even ![] - [\a00-\a1F \" \\] | ....

For example:

rascal>parse(#JsonChar, "\U01f41d")
JsonChar: (JsonChar) `🐝`

(Yes, Unicode now almost comes with a Rascal-logo emoji!)
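
Applied to the original question, the complement form might look like the sketch below. The names NotA, NotA2 and JsonPlainChar are made up for illustration, and I have not run this against the asker's grammar:

```rascal
// Anything except the letter a, as a complemented character class
// (the Rascal counterpart of [^a] in Lex).
lexical NotA = ![a];

// The same set, written as "every character" minus the letter a.
lexical NotA2 = ![] - [a];

// Hypothetical name: one character that may appear unescaped in a JSON
// string, written as a complement instead of an explicit range.
lexical JsonPlainChar = ![\a00-\a1F \" \\];
```

With these definitions, parse(#NotA, "b") should succeed, while parse(#NotA, "a") should fail with a parse error.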

There could possibly be a difference if the range of legal Unicode characters is ever extended (or if Rascal makes its own extension), but it's probably mostly up to you what works for your brain. (The JSON standard writes it as "%x20-21 / %x23-5B / %x5D-10FFFF", in RFC ABNF notation.)
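
If you prefer to stay close to the standard, the ABNF ranges should translate fairly directly into a single Rascal character class with three ranges. A minimal sketch, where the name JsonUnescaped is my own and not from any library:

```rascal
// Hypothetical name; mirrors %x20-21 / %x23-5B / %x5D-10FFFF from the JSON ABNF.
// The gaps at \u0022 (double quote) and \u005C (backslash) are the excluded characters.
lexical JsonUnescaped = [\u0020-\u0021 \u0023-\u005B \u005D-\U10FFFF];
```

This spells out the allowed ranges instead of subtracting from a larger class, so it is longer but leaves no doubt about what is in the set.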