For the regex (456)\b and the input 123456 xyz, it works as expected and the output is 456. (Case 1)
For almost the same regex, (456)#\b, and the input 123456# xyz, I expected the output to be 456#, because \b should still match the end of the line after matching #. But the regex engine failed to find a match. (Case 2)
Strangely, it works for the regex (456)#\B. Notice the non-word boundary \B in this regex. (Case 3) What does \B match here?
I went through this answer to understand \b and \B, and it seems like my understanding is right.
So why is this happening? What am I missing here? Why does \B work in case 3 while \b doesn't in case 2?
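A word boundary is a zero-width assertion: it matches a position, not a character. The positions it matches are roughly those described by the regex (^\w|\w$|\W\w|\w\W), that is, the start or end of the string next to a word character, or the point between a word character and a non-word character. A word character here is anything in [a-zA-Z0-9_].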
So in your case, for the regex (456)#\b, trying to match the string 123456# xyz will fail, since # and the space after it are both non-word characters (a boundary needs a word character on one side and a non-word character on the other), so that position does not satisfy the above regex.
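To see this position by position, here is a minimal sketch using Python's re module. The question doesn't say which regex engine is in use, so Python is an assumption here, though \b and \B behave the same way for ASCII input like this in most engines. It lists every index where \b and \B match in the input, then runs the three cases from the question.

```python
import re

text = "123456# xyz"

# Indices at which the zero-width assertions \b and \B succeed.
word_boundaries = [m.start() for m in re.finditer(r"\b", text)]
non_boundaries = [m.start() for m in re.finditer(r"\B", text)]

print(word_boundaries)  # [0, 6, 8, 11]
print(non_boundaries)   # [1, 2, 3, 4, 5, 7, 9, 10]

# Case 1: the position after "456" sits between "6" and " ", a boundary.
case1 = re.search(r"(456)\b", "123456 xyz")
# Case 2: the position after "#" (index 7) sits between "#" and " ",
# both non-word characters, so \b fails there.
case2 = re.search(r"(456)#\b", text)
# Case 3: the same position is exactly where \B succeeds.
case3 = re.search(r"(456)#\B", text)

print(case1.group(0))  # 456
print(case2)           # None
print(case3.group(0))  # 456#
```

Index 7, the position right after the #, appears in the \B list and not in the \b list, which is why case 3 matches and case 2 does not.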
Amusingly, if you add a word character after the # in the string, say 123456#b xyz, it will match, as shown here.
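The same comparison can be reproduced locally with a short sketch, again assuming Python's re module as the engine (the original demo's engine isn't stated):

```python
import re

pattern = re.compile(r"(456)#\b")

# "#" followed by a space: both sides of the position are non-word characters.
without_word_char = pattern.search("123456# xyz")
# "#" followed by "b": a non-word character meets a word character, so \b holds.
with_word_char = pattern.search("123456#b xyz")

print(without_word_char)        # None
print(with_word_char.group(0))  # 456#
```

Note that the match still ends at the #; the b only has to be present for the \b assertion to succeed and is not consumed.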