I just read this answer and cannot wrap my head around why it works. It seems that both of these patterns (negative lookahead and negative lookbehind) will match a - that is not at the beginning of the string:
(?!^)-
(?<!^)-
Similarly, a positive lookahead and a positive lookbehind both behave the same and will match a - at the beginning of the string:
(?=^)-
(?<=^)-
But this behaviour only seems to apply to lookahead and lookbehind patterns when using the ^ anchor. In other words, with a literal character they behave as expected: of the two patterns below, only the second (positive lookbehind) will match the - in X-.
(?=X)-
(?<=X)-
Can someone please explain the mechanics of this?
When the regex engine processes a lookahead, it checks whether the following part of the input matches the pattern, but it doesn't advance the position in the input. So if you have something after the lookahead, it must match at the same place as the lookahead.
This works fine when the lookahead contains a zero-width assertion like ^ or \b. But if it contains something that consumes one or more characters, the pattern after the lookahead still has to match starting at that same position. Since X and - can't both match the same character, (?=X)- can never succeed. On the other hand, a pattern like (?=X.*).*- can work because the .* in the main pattern lets the match include the portion that contains X.
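The lookahead cases can be checked with a short script. This is only a sketch, assuming Python's re module (the question does not name an engine); the helper starts and the sample strings are made up for illustration.

```python
import re

def starts(pattern, text):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

# ^ is zero-width, so the lookahead and the "-" after it are both
# tested at the same position without any conflict.
at_start = starts(r"(?=^)-", "-a-b")
not_at_start = starts(r"(?!^)-", "-a-b")

# X needs a character, and "-" would need that very same character.
never = starts(r"(?=X)-", "X-")

# Here the .* in the main pattern covers the X that the lookahead saw.
covered = re.search(r"(?=X.*).*-", "X-")

print(at_start, not_at_start, never, covered.group())
```

The first pattern finds only the - at index 0, the second only the one at index 2, the third finds nothing, and the last one matches the whole of X-.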
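Lookbehinds work differently, since a lookbehind asserts that the text before the current position matches. It looks backwards from where the main pattern starts, so the text it checks ends where the main match begins instead of overlapping it. That is why (?<=X)- matches the - in X-. With a zero-width assertion like ^ there is no text to check in either direction, so the lookbehind and the lookahead test the same position and behave identically.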
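To see the difference in direction, here is a small sketch, again assuming Python's re module (other engines may restrict what a lookbehind can contain); the helper starts and the sample strings are hypothetical.

```python
import re

def starts(pattern, text):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

# The lookbehind checks the text that ends where "-" begins,
# while the lookahead checks the text that starts there.
behind = starts(r"(?<=X)-", "X-")
ahead = starts(r"(?=X)-", "X-")

# With the zero-width ^ anchor there is nothing to consume, so looking
# behind gives the same results as looking ahead would.
pos_behind = starts(r"(?<=^)-", "-a-b")
neg_behind = starts(r"(?<!^)-", "-a-b")

print(behind, ahead, pos_behind, neg_behind)
```

Only the lookbehind version finds the - after X (at index 1), while the ^ lookbehinds find the - at index 0 and index 2 respectively.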