
Is there a difference in behavior of parameterized vs. literal regular expressions in Rascal?


I was working on a method that accepts a regular expression and a string to check this expression against.

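import IO; // needed for println

public bool match_case_insensitive(str regexp, str toMatch)
{
    // <regexp> interpolates the value of the parameter into the pattern
    bool match = /<regexp>/i := toMatch;
    if(match) println(toMatch);
    return match;
}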

Take the regular expression (.*[e]){2}, which matches any string containing at least two e's, and the string to check: merely.

Calling match_case_insensitive("(.*[e]){2}", "merely") will return false.

Evaluating the expression directly in the terminal yields true: /(.*[e]){2}/ := "merely" returns bool:true, and so does the case-insensitive variant /(.*[e]){2}/i := "merely".

I expected /<regexp>/i in my function to evaluate to /(.*[e]){2}/i, but apparently it does not. What is the difference between evaluating the pattern directly in the terminal and going through this function? One guess is that Rascal has no support for capturing groups, since I couldn't find them in the documentation. Another is that Rascal escapes all characters of an interpolated string, so a string variable can never supply a regex that contains metacharacters.


Solution

    1. You deduced correctly: at interpolation time Rascal escapes metacharacters (I do hope all of them), so you cannot construct regular expressions dynamically. For example, with regex = ".", the pattern /<regex>/ := "bla" expands to /\./ := "bla" before the regular expression is even compiled.
    2. The notation does support capturing groups, written as <name: regex>:

      rascal>if (/<a:a*><b:b*>/ := "aaabbb")
      >>>>>println("<a> - <b>");
      aaa - bbb
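
To make point 1 concrete, here is a minimal sketch that puts the two forms side by side. The module name and the helpers matchesLiteralPattern, containsText, and demo are hypothetical names introduced only for illustration. The expected results in the comments are the ones reported in the question, not something newly measured.

```rascal
module RegexInterpolationSketch // hypothetical module name

import IO;

// Literal pattern: the metacharacters are part of the pattern source,
// so this behaves like the terminal experiment in the question.
public bool matchesLiteralPattern(str toMatch)
{
    return /(.*[e]){2}/i := toMatch;
}

// Interpolated pattern: per point 1 above, the value of text is escaped,
// so it is looked for as plain text rather than compiled as a regex.
public bool containsText(str text, str toMatch)
{
    return /<text>/i := toMatch;
}

public void demo()
{
    println(matchesLiteralPattern("merely"));      // true, as reported in the question
    println(containsText("(.*[e]){2}", "merely")); // false, as reported in the question
}
```

In other words, under this assumption, /<regexp>/i is useful for searching for the literal contents of a string variable, but not for passing a whole pattern in as a string. Patterns that need metacharacters have to be written out in the pattern literal itself.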