I'm trying to use a regex to find something in a later-occurring groupB that does not exist in the earlier groupA. It's fine if it exists in A, but not B. This seems to imply the necessity of a negative lookbehind if I must use a regex.
Massively simplified example:
Text:
groupA:
tag 789
tag 123
groupB:
Item 1:
... 123
... 456
I'm rather new to lookarounds. This is what immediately came to mind (or one of a dozen variations) but the informed among you will see that it does not work as intended.
regex: (\.\.\. (?<ID>(\d+)))(?<=(?s).*?tag (\k<ID>))
My ideal goal would be to match items in groupB that do not exist in groupA and I cannot reorder the input. Correct example output: (not done by the provided regex)
... 456
.NET supports variable lookback distance, but obviously I'm missing something!
By default, the .NET regex engine scans characters from left to right, and a backreference can only refer to text that a group has already captured. By the time the scan reaches group B, all the characters in group A have already been consumed and can't be matched against. While the engine may appear to backtrack, it's actually retrying alternative branches from saved positions - in this default mode the scan itself never runs backwards.
With a forward-running parser, you'd need to match (and retain) items in group A first, then only return items from group B if they weren't present in group A. This is too complex for a deterministic finite automaton to compute, as it can't be done in constant space.
You could solve your problem using regular expressions by reversing your string and running your matches backwards, but the code could get pretty incomprehensible:
654 ...
321 ...
:1 metI
:Bpuorg
321 gat
987 gat
:Apuorg
"(?s)((?<id>\d+)(?=\s\.\.\.))(?!.*\k<id>\sgat)"
Result:
"654"
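For what it's worth, here is a rough sketch of how the reversed approach might be wired up in C#. The class name `ReversedMatch` and the helpers `ReverseText` and `IdsOnlyInGroupB` are made up for illustration. It assumes `RegexOptions.Singleline` so that `.*` can cross line breaks, adds `\b` so that a shorter ID can't match the tail of a longer one, and reverses each match again so the IDs read the right way round.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ReversedMatch
{
    // Hypothetical helper: reverses a string char by char.
    // Good enough for plain ASCII input like the sample; it does not
    // handle surrogate pairs or combining characters.
    public static string ReverseText(string s)
    {
        char[] chars = s.ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }

    // Hypothetical helper: returns IDs that follow "... " in the original
    // text and never follow "tag " anywhere in it.
    public static string[] IdsOnlyInGroupB(string text)
    {
        string reversed = ReverseText(text);

        // In the reversed text an item looks like "654 ..." and a tag like "321 gat".
        // Singleline lets ".*" span newlines inside the negative lookahead.
        var pattern = new Regex(
            @"\b(?<id>\d+)(?=\s\.\.\.)(?!.*\b\k<id>\sgat)",
            RegexOptions.Singleline);

        string[] ids = pattern.Matches(reversed)
            .Cast<Match>()
            .Select(m => ReverseText(m.Value)) // "654" becomes "456"
            .ToArray();

        // Matches were found back to front, so restore document order.
        Array.Reverse(ids);
        return ids;
    }
}
```

On the sample text from the question, `ReversedMatch.IdsOnlyInGroupB(text)` should give back a single entry, 456.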
Instead, I'd suggest keeping it simple with something like this:
var groupA = Regex.Matches(text, @"(?<=tag\s)\d+").Cast<Match>().Select(x => x.Value);
var groupB = Regex.Matches(text, @"(?<=\.\.\.\s)\d+").Cast<Match>().Select(x => x.Value);
var bNotA = groupB.Except(groupA);
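One thing to be aware of: `Except` treats its inputs as sets, so an ID that appears several times in group B comes back only once. If you need every occurrence, a variation along these lines might do. The class name `GroupDiff` and method name `ItemsOnlyInGroupB` are made up for illustration, and the sketch assumes the same `tag` and `...` markers as the sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class GroupDiff
{
    // Hypothetical helper: returns every group B ID that has no matching
    // "tag <ID>" line, keeping duplicates and document order.
    public static List<string> ItemsOnlyInGroupB(string text)
    {
        // Collect the group A IDs once so each lookup is cheap.
        var idsInA = new HashSet<string>(
            Regex.Matches(text, @"(?<=tag\s)\d+")
                .Cast<Match>()
                .Select(m => m.Value));

        // Walk the group B items in order and drop any that group A also has.
        return Regex.Matches(text, @"(?<=\.\.\.\s)\d+")
            .Cast<Match>()
            .Select(m => m.Value)
            .Where(id => !idsInA.Contains(id))
            .ToList();
    }
}
```

On the sample text this should return just 456; if `... 456` appeared twice in group B, it would be returned twice.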