
Strange behaviour of word boundary `\b` and `\B` with special characters


For the regex (456)\b and the input 123456 xyz, it works as expected and the output is 456 (case 1).

For almost the same regex, (456)#\b, and the input 123456# xyz, I expected the output to be 456#, because \b should still match the end of the line after matching #.

But the regex engine failed to find a match (case 2).

Strangely, it works for the regex (456)#\B (case 3). Notice the non-word boundary \B in this regex. What does \B match here?

I went through this answer to understand \b and \B, and it seems my understanding is right.

So why is it strange? What am I missing here? Why does \B work while \b doesn't in case 2 and case 3?
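
The three cases can be reproduced with a short script. This is only a sketch: the question does not name a regex engine, so Python's `re` module is an assumption here, and `first_match` is a hypothetical helper introduced for illustration.

```python
import re

# (pattern, input) pairs for the three cases described above
cases = [
    (r"(456)\b", "123456 xyz"),     # case 1
    (r"(456)#\b", "123456# xyz"),   # case 2
    (r"(456)#\B", "123456# xyz"),   # case 3
]

def first_match(pattern, text):
    """Return the matched text, or None if the pattern does not match."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

results = [first_match(p, t) for p, t in cases]
for (p, t), r in zip(cases, results):
    print(f"{p!r} on {t!r} -> {r!r}")
```

Case 1 and case 3 produce a match, while case 2 returns no match.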


Solution

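  • A word boundary \b is a zero-width assertion: it consumes no characters and only checks the position. It matches where a word character sits on exactly one side, roughly the positions described by the regex (^\w|\w$|\W\w|\w\W). A word character here is anything in [a-zA-Z0-9_].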

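    So in your case, the regex (456)#\b fails to match the string 123456# xyz because # and the space after it are BOTH non-word characters. A boundary needs one word character and one non-word character, so the position after # does not satisfy the above regex. For the same reason, (456)#\B succeeds: \B matches exactly where \b does not, and the position between # and the space has non-word characters on both sides.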

    Amusingly, if you add a word character after the # in the string, say 123456#b xyz, then (456)#\b will match, because # is now followed by the word character b.
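
The boundary rule described above can be sketched by hand and compared against an engine. This is a minimal illustration, assuming Python's `re` module and the ASCII definition of a word character given in the answer; `is_word_char` and `is_boundary` are hypothetical helpers, not part of any library.

```python
import re

def is_word_char(ch):
    # ASCII-only definition of a word character: [a-zA-Z0-9_]
    return re.fullmatch(r"[a-zA-Z0-9_]", ch) is not None

def is_boundary(text, i):
    """True if position i (between text[i-1] and text[i]) is a word boundary.

    Positions before the start or past the end count as non-word.
    """
    before = i > 0 and is_word_char(text[i - 1])
    after = i < len(text) and is_word_char(text[i])
    return before != after  # exactly one side must be a word character

s = "123456# xyz"
after_hash = s.index("#") + 1        # position between '#' and ' '
print(is_boundary(s, after_hash))    # both sides are non-word characters

s2 = "123456#b xyz"
after_hash2 = s2.index("#") + 1      # position between '#' and 'b'
print(is_boundary(s2, after_hash2))  # 'b' is a word character

# Cross-check the hand-written rule against the engine itself
print(re.search(r"(456)#\b", s))     # \b needs a boundary after '#'
print(re.search(r"(456)#\B", s))     # \B needs a non-boundary after '#'
print(re.search(r"(456)#\b", s2))
```

The hand-written check and the engine agree: the position after # is a boundary only when a word character follows it.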