Recently I stumbled upon this weird regex, which combines positive and negative lookahead, and I cannot wrap my head around what it really does. Keep in mind this is Java regex syntax.
(?=((?!\bword1\b|\bword2\b).)+?\s*?)
 ^^  ^^
What do those two nested lookaheads do? Can this be simplified?
- . (the dot) matches any character, unless the position is the "w" that starts "word1" or "word2" as a whole word, that is, between non-word characters. This is the meaning of the negative assertion. (The alternation can be simplified: \bword1\b|\bword2\b → \bword[12]\b.)
- +? means at least one such character, as few as possible.
- \s* matches zero or more whitespace characters, so it always matches. Therefore +? can be dropped: a single repetition is always enough.
- \s*? in this assertion is meaningless: it always matches, the assertion consumes no input, and nothing follows it.
- (?=...) here means that the position is followed by some character (except the "w" of "word1" or "word2", as described above).

Further simplifications would remove the capture groups, which could be required in the context.
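The word-boundary behaviour of the negative assertion can be checked in isolation. The following is only a sketch: the class TemperedDotDemo, its constants and the helper dotMatchesAt are hypothetical names introduced for illustration, and the sample inputs are made up. It tests the guarded dot at one position, in both the alternation spelling and the \bword[12]\b spelling.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
class TemperedDotDemo {
    // The guarded dot in its original and its shortened spelling.
    static final String ALTERNATION = "(?!\\bword1\\b|\\bword2\\b).";
    static final String CHAR_CLASS  = "(?!\\bword[12]\\b).";

    // True if the guarded dot can match the character at exactly `index`.
    static boolean dotMatchesAt(String regex, String input, int index) {
        Matcher m = Pattern.compile(regex).matcher(input);
        // find(index) searches from index onwards, so also check where the match starts.
        return m.find(index) && m.start() == index;
    }

    public static void main(String[] args) {
        // Made-up sample inputs: {text, index to test}.
        String[][] cases = { {"word1 x", "0"}, {"word10", "0"}, {"sword2", "1"}, {"word3", "0"} };
        for (String[] c : cases) {
            int i = Integer.parseInt(c[1]);
            System.out.println(c[0] + " @" + i + ": "
                + dotMatchesAt(ALTERNATION, c[0], i) + " / "
                + dotMatchesAt(CHAR_CLASS, c[0], i));
        }
    }
}
```

Only the first case should be rejected: in "word10" and "sword2" the \b on one side fails, so the "w" is an ordinary character there, and "word3" is not one of the two words.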
So, the simplified regex is
(?=((?!\bword[12]\b).))
It will succeed before any character of the input, except at the beginning of "word1" or "word2" between non-word characters (and, since . does not match line terminators by default in Java, not before a line break either). The match will be empty, but the first capture group will contain the following character.
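To see that the two forms behave alike, one can list where each of them matches and what the first capture group holds. This is a minimal sketch rather than a proof of equivalence: the class LookaheadDemo, its methods and the sample input "a word1 b" are assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question or answer.
class LookaheadDemo {
    static final String ORIGINAL   = "(?=((?!\\bword1\\b|\\bword2\\b).)+?\\s*?)";
    static final String SIMPLIFIED = "(?=((?!\\bword[12]\\b).))";

    // Start offsets of every (empty) match of regex in input.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    // Contents of capture group 1 for every match, concatenated in order.
    static String capturedChars(String regex, String input) {
        StringBuilder sb = new StringBuilder();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            sb.append(m.group(1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "a word1 b"; // made-up sample input
        for (String regex : new String[] { ORIGINAL, SIMPLIFIED }) {
            System.out.println(matchStarts(regex, input) + " \"" + capturedChars(regex, input) + "\"");
        }
    }
}
```

For this input both patterns report the offsets [0, 1, 3, 4, 5, 6, 7, 8] and the captured text "a ord1 b": offset 2, the "w" of the whole word "word1", is the only position skipped before the end of the input.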