regex, word-boundary

Regular Expression Not Matching Escaped Parenthesis


I have a very simple regular expression that does not match an escaped parenthesis the way I expect (live example: https://regex101.com/r/RRupbC/1/).

The Regular Expression:

/\b(?:Test1 \(VI|Test2 \(VI\))\b/gi

Sample Input:

Test1 (VI      <- Match
Test2 (VI)     <- No Match

I would expect the regular expression to match the second input, since it satisfies the second alternative, Test2 \(VI\). It seems not to accept the closing parenthesis as part of a valid match.

Why is this logic incorrect, and how can I modify the expression to successfully match my second input example?


Solution

  • The second string doesn't match because of the final \b. In particular, the effective regex you're executing is

    \bTest2 \(VI\)\b

    which matches

    • a word boundary
    • the string Test2 (VI)
    • another word boundary

    A "word boundary" is the zero-width position between a word character (a letter, digit, or underscore) and a non-word character, or vice versa (the beginning/end of the string counts as a non-word character for this purpose).

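    Because ) is a non-word character, it needs to be followed by a word character in the target string to make )\b match (e.g. Test2 (VI)x matches successfully). In Test2 (VI), the ) is followed by the end of the string, which also counts as non-word, so there is no boundary at that position and the match fails.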
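The behaviour described above can be checked with a short script. This is only a sketch in Python's re module, which is an assumption (the question links to regex101 and does not name a language). The helper name matches is made up for illustration, and re.IGNORECASE stands in for the /i flag.

```python
import re

# The pattern from the question; re.IGNORECASE plays the role of the /i flag.
pattern = re.compile(r"\b(?:Test1 \(VI|Test2 \(VI\))\b", re.IGNORECASE)


def matches(text):
    """Return every substring of text matched by the original pattern."""
    return pattern.findall(text)


# 'I' followed by end of string: word then non-word, so \b succeeds.
print(matches("Test1 (VI"))    # ['Test1 (VI']

# ')' followed by end of string: non-word then non-word, so \b fails.
print(matches("Test2 (VI)"))   # []

# ')' followed by 'x': non-word then word, so \b succeeds.
print(matches("Test2 (VI)x"))  # ['Test2 (VI)']
```

The third call shows that the trailing \b only succeeds after ) when a word character follows it.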

    A minimal solution might be to pull the \b into the first branch:

    \b(?:Test1 \(VI\b|Test2 \(VI\))
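As a quick check of the adjusted pattern, here is a minimal sketch, again assuming Python's re module rather than whatever engine the question actually uses. The name find_fixed is hypothetical.

```python
import re

# Adjusted pattern: the trailing \b now applies only to the first alternative.
fixed = re.compile(r"\b(?:Test1 \(VI\b|Test2 \(VI\))", re.IGNORECASE)


def find_fixed(text):
    """Return every substring of text matched by the adjusted pattern."""
    return fixed.findall(text)


# Both sample inputs from the question, one per line.
sample = "Test1 (VI\nTest2 (VI)"
print(find_fixed(sample))        # ['Test1 (VI', 'Test2 (VI)']

# The first alternative still requires a boundary after "VI".
print(find_fixed("Test1 (VIx"))  # []
```

The second call shows that Test1 (VI is still rejected when a word character immediately follows it.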