
Unifying lookahead and lookbehind into a single regex operator

I am making a simplified, "sugary" wrapper for regex that cuts out many of the more complicated regex features (whilst still keeping the essentials for 99% of uses) and also tries to tidy up the syntax a little.

Regarding negative lookahead/lookbehind, I don't understand why they can't be combined into a single operator. To show what I mean, here is an example:

I know you use negative lookbehind if you want to match "mo", but not when it is preceded by "giz". The expression (?<!giz)mo handles that.

And I know you use negative lookahead if you want to match "giz", but not when it is followed by "mo". The expression giz(?!mo) handles that.
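A minimal sketch of these two standard forms, using Python's `re` module as a stand-in for .NET (both accept this syntax for fixed-width negative lookarounds). The sample text "gizmo mo giz" is made up for illustration:

```python
import re

text = "gizmo mo giz"

# Negative lookbehind: match "mo" only when the three characters before it are not "giz".
not_after_giz = re.findall(r"(?<!giz)mo", text)

# Negative lookahead: match "giz" only when the two characters after it are not "mo".
# Start offsets are collected so we can see WHICH "giz" was matched.
not_before_mo = [m.start() for m in re.finditer(r"giz(?!mo)", text)]

print(not_after_giz)   # ['mo']  (the standalone "mo", not the one inside "gizmo")
print(not_before_mo)   # [9]     (the trailing "giz", not the one inside "gizmo")
```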

What I DON'T know is why regex can't figure this out for itself. In theory, I shouldn't need to specify whether to look ahead or behind: it should just look at the disallowed part and exclude any match which contains it.

To clarify further, and perhaps prove my point, I could get my sugary wrapper to interpret two custom symbols, ⊄ and ⊅, like this:

...Replace this: giz⊄mo⊅ with this: giz(?!mo)(?<!mo)

...and replace this: ⊄giz⊅mo with this: (?!giz)(?<!giz)mo
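A rough sketch of how such a wrapper might perform this substitution, assuming a hypothetical `desugar` helper (not part of any real library) and using Python's `re` in place of .NET. It also checks that, for these two examples, the combined form finds exactly the same matches as the single-direction form. The reason is that the extra assertion is always true at that position: after "giz" the text cannot end in "mo", and where "mo" is about to match, the text cannot start with "giz".

```python
import re

def desugar(pattern: str) -> str:
    """Hypothetical wrapper step: rewrite every ⊄X⊅ as (?!X)(?<!X).

    Naive on purpose: it does not handle nested or unbalanced markers.
    """
    return re.sub(r"⊄(.*?)⊅", r"(?!\1)(?<!\1)", pattern)

a = desugar("giz⊄mo⊅")   # 'giz(?!mo)(?<!mo)'
b = desugar("⊄giz⊅mo")   # '(?!giz)(?<!giz)mo'

def spans(pattern: str, text: str) -> list:
    """Start/end offsets of every non-overlapping match."""
    return [m.span() for m in re.finditer(pattern, text)]

text = "gizmo mo giz"
# The redundant half of each combined assertion never changes the result here.
assert spans(a, text) == spans(r"giz(?!mo)", text)
assert spans(b, text) == spans(r"(?<!giz)mo", text)
```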

As you can see, both translations use lookahead and lookbehind together, so the user doesn't have to decide which one to use. You may say the user is being lazy, but I could reply that regex is being lazy for not doing this behind the scenes.

To restate the question yet another way: what practical things can you do with (?!xyz) and/or (?<!xyz) that you can't do with the single combination (?!xyz)(?<!xyz)? Why does regex need two operators to perform what is apparently one function?

I'm using .NET, so lookbehind has full versatility (variable-length lookbehind patterns are allowed).
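For contrast, Python's built-in `re` module only accepts fixed-width lookbehind, whereas .NET also accepts variable-width patterns such as gi+z inside (?<!...). A small check, assuming a standard CPython 3 install (`compiles` is just an illustrative helper name):

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # CPython rejects variable-width lookbehind with
        # "look-behind requires fixed-width pattern".
        return False

print(compiles(r"(?<!giz)mo"))   # True: the lookbehind is always 3 characters
print(compiles(r"(?<!gi+z)mo"))  # False in Python's re; .NET would accept it
```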

Am I missing something?

Solution

In theory, it's easy enough to say: "Just get the program to decide the direction automatically, based on the position of any adjacent literals." Then (?<!xyz)house or .*(?<!xyz)house, and house(?!xyz) or house(?!xyz).*, would all make sense. The rule would be: "If the literal is to the left, use the lookahead operator; if it's to the right, use the lookbehind operator." If both sides were literals, the assertion would be pointless anyway. This holds most of the time, though, as hvd pointed out, it breaks down when the characters checked by xyz overlap a part of the adjacent pattern that isn't a literal, e.g. the asterisk in (?!xyz)xy*z.
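A small demonstration of hvd's counterexample, using Python's `re` as a stand-in for .NET and a made-up input "xyzxz". In (?!xyz)xy*z the lookahead inspects the same characters that xy*z goes on to consume, so adding the lookbehind is not redundant: it changes the result.

```python
import re

text = "xyzxz"

# Lookahead only: rejects a match that would start with "xyz",
# but still allows other strings matched by xy*z, such as "xz".
ahead_only = re.findall(r"(?!xyz)xy*z", text)

# Combined: the extra lookbehind also rejects the "xz" because it follows "xyz".
combined = re.findall(r"(?!xyz)(?<!xyz)xy*z", text)

print(ahead_only)  # ['xz']
print(combined)    # []
```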

But further problems arise when neither side is a literal.

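For example, try the regex the ..(?!u).. house against the text "the blue house". Here (?!u) and (?<!u) behave differently: at the position after "bl", the lookahead sees the following "u" and rejects the match, while the lookbehind sees the preceding "l" and allows it. Either behaviour may be the one the user wants.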
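A minimal sketch of that example in Python's `re` (a stand-in for .NET, which behaves the same way for these fixed-width assertions). The second input, "the dune house", is made up to show the opposite case. Each single-direction form accepts a string that the other rejects, and the combined form rejects both:

```python
import re

PATTERNS = {
    "lookahead":  r"the ..(?!u).. house",
    "lookbehind": r"the ..(?<!u).. house",
    "both":       r"the ..(?!u)(?<!u).. house",
}

def results(text: str) -> dict:
    """Map each pattern name to whether it matches anywhere in text."""
    return {name: re.search(p, text) is not None for name, p in PATTERNS.items()}

print(results("the blue house"))  # {'lookahead': False, 'lookbehind': True, 'both': False}
print(results("the dune house"))  # {'lookahead': True, 'lookbehind': False, 'both': False}
```

A single combined operator would therefore take away the choice between these two behaviours, rather than merely saving the user a decision.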