javascript regex escaping

How to test for word boundaries if the pattern starts or ends with punctuation?


I'm having a hard time testing whether a provided string (which likely starts with !) is surrounded by word boundaries.

// found in Mozilla's RegExp guide.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

let msg = "a b c !test1 d e f";
let cmd = "!test1";

let re = new RegExp("\\b" + escapeRegExp(cmd) + "\\b");

console.log(`re: ${re.test(msg)}`);        // re: false

I assume this happens because punctuation itself counts as a word boundary?

Escaping the punctuation does not seem to solve the problem, at least. (I've tested a modified version of escapeRegExp() that also escapes !, with the same result.)

As a workaround, I've used a version that splits msg on whitespace and compares each element with cmd. I'm not very happy with this solution, as it breaks when cmd itself contains whitespace.


Solution

  • You can use adaptive dynamic word boundaries:

    // found in Mozilla's RegExp guide.
    function escapeRegExp(str) {
      return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    }
    
    let msg = "a b c !test1 d e f";
    let cmd = "!test1";
    
    let re = new RegExp("(?!\\B\\w)" + escapeRegExp(cmd) + "(?<!\\w\\B)");
    // console.log(re.source); // => (?!\B\w)!test1(?<!\w\B)
    console.log(`re: ${re.test(msg)}`); // => re: true

    The (?!\B\w)!test1(?<!\w\B) regex matches !test1, with the two lookarounds acting as follows:

    • (?!\B\w) - a negative lookahead: if the next char is a word char, a word boundary is required at the current location; otherwise, no word boundary is required
    • (?<!\w\B) - a negative lookbehind: if the previous char is a word char, a word boundary is required at the current location; otherwise, no word boundary is required.
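As a rough sketch of how these adaptive boundaries behave on a few more inputs, including a cmd that contains whitespace: the helper name adaptiveBoundaryRegExp and the sample strings are made up for this illustration, and the code assumes an engine with lookbehind support (ES2018 or later).

```javascript
// Same escaping helper as in the question.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper (not from any library): wraps the escaped
// command in the adaptive word boundaries described above.
function adaptiveBoundaryRegExp(cmd) {
  return new RegExp("(?!\\B\\w)" + escapeRegExp(cmd) + "(?<!\\w\\B)");
}

// Plain \b fails: there is no word boundary between " " and "!".
const plain = new RegExp("\\b" + escapeRegExp("!test1") + "\\b");
console.log(plain.test("a b c !test1 d e f")); // false

// Adaptive boundaries: only the word-char end ("1") needs a boundary.
const bang = adaptiveBoundaryRegExp("!test1");
console.log(bang.test("a b c !test1 d e f"));  // true
console.log(bang.test("a b c !test12 d e f")); // false ("1" is followed by "2")
console.log(bang.test("a b c x!test1 d e f")); // true (cmd starts with "!", so no boundary is required on the left)

// A command that starts with a word char needs a boundary on the left.
const word = adaptiveBoundaryRegExp("test1");
console.log(word.test("a b c xtest1 d e f"));  // false

// Whitespace inside the command is fine, unlike the split-based workaround.
const spaced = adaptiveBoundaryRegExp("!say hi");
console.log(spaced.test("a b !say hi c"));     // true
```

Note that a pattern starting with punctuation is allowed to follow a word char directly (the x!test1 case). If that should not match, whitespace boundaries such as (?<!\S) and (?!\S) are a different technique that may fit better.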

    See more details about adaptive dynamic word boundaries in my YouTube video.