rascal

Regular expression for removing empty lines produces wrong results


Can someone help me solve the problem I'm having with a regular expression? I have a file containing the following code: file content

I'm using a visit to find matches and replace them so that I can remove the empty lines. The result is, however, not what I'm expecting. The code is as follows:

str content = readFile(location);
// Remove empty lines
content = visit (content) {
    case /^[ \t\f\v]*?$(?:\r?\n)*/sm => ""
}

This regular expression also removes non-empty lines, resulting in the following output: output code

Can someone explain what I'm doing wrong with this regular expression, and also with the one shown below? I can't figure out why either of them works this way.

str content = readFile(location);
// Remove empty lines
content = visit (content) {
    case /^\s+^/m => ""
}

Kind regards,

Bob


Solution

  • I think the main issue here is that in the context of visit, the ^ anchor does not mean what you think it does. See this example:

    rascal>visit ("aaa") { case /^a/ : println("yes!"); }
    yes!
    yes!
    yes!
    
    • visit tries to match the regex at every postfix of the string, so ^ anchors to the start of each postfix, not to the start of the whole string.
    • First it tries "aaa", then "aa", and then "a".

    In your visit, the empty postfix at the end of each line (the position right before its line break) will also match your regex, so those line breaks are replaced by empty strings as well. I think an additional effect is that the carriage return is not consumed eagerly.

    To fix this, simply don't use a visit; use a for or while loop instead, with a := match as the condition.
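A minimal sketch of a loop-based alternative, assuming lines are separated by "\n". The module name RemoveEmptyLines and the function removeEmptyLines are hypothetical. This variant splits the text into lines first (using split from the standard String module) and then uses a regex match as the keep/skip condition for each line, so ^ and $ can only ever refer to the start and end of one line. It is one way to follow the advice above, not necessarily the exact loop the answer had in mind.

```rascal
module RemoveEmptyLines

import String;

// Hypothetical helper: returns `content` with every blank line removed.
// A line counts as blank when it contains only whitespace characters.
str removeEmptyLines(str content) {
  str cleaned = "";
  // Loop over the lines one at a time instead of using visit, so the
  // anchors below apply to a single line and not to arbitrary postfixes.
  for (str line <- split("\n", content)) {
    // Keep the line only if it does NOT consist solely of whitespace.
    // \s also matches a lone "\r", so blank CRLF lines are dropped too.
    if (/^\s*$/ !:= line) {
      cleaned += line + "\n";
    }
  }
  return cleaned;
}

test bool removesBlankLines()
  = removeEmptyLines("a\n\n  \t\nb\n") == "a\nb\n";
```

Usage would be along the lines of content = removeEmptyLines(readFile(location)); (readFile comes from the IO module). Note that every kept line is re-terminated with "\n", so the result always ends with a line break, and a trailing "\r" on non-empty lines is left untouched.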