
Why is this seemingly correct regex not working in Rascal?


I have the following code:

set[str] noNnoE = { v | str v <- eu, (/\b[^eEnN]*\b/ := v) };

The goal is to select, from a set of strings called 'eu', those strings that contain no 'e' or 'n' in either upper- or lowercase. The regular expression I've provided:

/\b[^eEnN]?\b/

seems to work as intended when I try it in an online regex tester.

In the Rascal terminal, however, it doesn't seem to work:

 rascal>/\b[^eEnN]*\b/ := "Slander";
 bool: true

I expected no match. What am I missing here? I'm using the latest stable Rascal release in Eclipse Oxygen.1a.


Solution

  • Actually, the online regex tester finds the same match that Rascal does. You can inspect the match as follows:

    import IO; // provides println
    if (/<w1:\b[^eEnN]?\b>/ := "Slander")
      println("The match is: |<w1>|");
    

    This binds the matched text to w1 and, if the match succeeds, prints it between vertical bars (if the match fails, it evaluates to false and the body of the if does not execute). Running it shows that the regex matches the empty string:

    The match is: ||
    

    The online regex tester says the same thing:

     Match 1
     Full match 0-0 ''
    

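    Both ? and * allow the character class to match zero characters, and the start of "Slander" is a word boundary, so the two \b assertions succeed there with nothing between them; that is the empty match at 0-0. To prevent this, require at least one character from the class by using + instead of ?: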

    rascal>/\b[^eEnN]+\b/ := "Slander";
    bool: false
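    As a sketch, here is the corrected pattern dropped back into the comprehension from the question, wrapped in a function. The module name, the function parameter and the sample words are made up for illustration, since the question does not show the contents of eu.

    ```rascal
    module NoNnoE

    import IO;

    // With + the class must match at least one character between the two
    // word boundaries, so the empty match at position 0 no longer counts.
    set[str] noNnoE(set[str] words) = { v | str v <- words, /\b[^eEnN]+\b/ := v };

    void main() {
      // Hypothetical sample data; the real contents of eu are not shown.
      set[str] eu = {"Slander", "Sweden", "Italy", "Malta", "Cyprus"};
      println(noNnoE(eu));
    }
    ```

    With single-word entries like these, the result should be the set containing Italy, Malta and Cyprus; the order in which println shows a set is not significant.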
    

    Note that you can also make the regex case-insensitive by appending the i modifier, like so:

    /\b[^en]+\b/i
    

    This keeps the pattern shorter and easier to maintain if you need to add more characters to the character class.
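    One caveat that goes beyond the original answer: \b only constrains the two ends of the matched text, not the whole string. If := looks for the pattern anywhere in the subject, as the true result for "Slander" suggests, then a string with several words, such as "Sierra Leone", can still produce a match between two of its words even though it contains e and n. If the requirement is that the entire string contain no e or n, anchoring the pattern with ^ and $ is one option. The sketch below combines that with the i modifier; the module, the function names and the sample data are hypothetical.

    ```rascal
    module NoNnoEAnchored

    import IO;

    // ^ and $ pin the match to the whole string, so every character must lie
    // outside {e, n}; the i modifier makes the class exclude E and N as well.
    bool hasNoEorN(str s) = /^[^en]*$/i := s;

    set[str] noNnoE(set[str] words) = { v | str v <- words, hasNoEorN(v) };

    void main() {
      // Hypothetical sample data, including multi-word entries.
      set[str] eu = {"Slander", "Italy", "Costa Rica", "Sierra Leone"};
      println(noNnoE(eu));
    }
    ```

    Because the class is quantified with *, an empty string also passes; use + instead if empty entries should be dropped.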