I have the following pattern:
(?!.*(land|building).*)(\b\w*area( |_)?code\w*\b).*
The goal of this pattern is to match every row that contains an area code, but to exclude the row when it also contains "land" or "building". For example, given the dataset below, only the first four rows should match; all the others should be ignored:
area code
area_code
state_area_code
employee_area_code
land area code
land_area code
land area_code
land_area_code
area code land
area code_land
area_code land
area_code_land
building area_code
area_code building
Unfortunately, the rows with a space before the word "area" are also being selected. I can't figure out why the pattern fails in these cases, or how to change the regex so that it ignores these three rows while still matching the correct ones:
land area code
land area_code
building area_code
I tried replacing the lookahead with a lookbehind, combining patterns, and replacing \b and \w with .* or adding \s, but there are always some mismatches.
Add a start anchor ^:
^(?!.*(land|building).*)(\b\w*area( |_)?code\w*\b).*
See live demo.
Without ^ the match can begin part-way through, at a point after the parts you want to exclude.
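To see where the unanchored match actually starts, here is a minimal sketch that runs both versions of the pattern over the rows from the question. It assumes Python's re module (the question does not say which regex flavor is in use), and the names ROWS, BODY and matching_rows are only illustrative.

```python
import re

# Rows copied from the question, in the same order.
ROWS = [
    "area code", "area_code", "state_area_code", "employee_area_code",
    "land area code", "land_area code", "land area_code", "land_area_code",
    "area code land", "area code_land", "area_code land", "area_code_land",
    "building area_code", "area_code building",
]

# The pattern from the question, without and with the start anchor.
BODY = r"(?!.*(land|building).*)(\b\w*area( |_)?code\w*\b).*"
unanchored = re.compile(BODY)
anchored = re.compile("^" + BODY)


def matching_rows(pattern):
    """Return (row, match start offset) for every row the pattern matches somewhere."""
    hits = []
    for row in ROWS:
        m = pattern.search(row)  # search() tries every start position in turn
        if m:
            hits.append((row, m.start()))
    return hits


unanchored_hits = matching_rows(unanchored)
anchored_hits = matching_rows(anchored)

for row, start in unanchored_hits:
    print(f"unanchored: {row!r} matched from offset {start}")
for row, start in anchored_hits:
    print(f"anchored:   {row!r} matched from offset {start}")
```

For the three problem rows the unanchored match starts at a non-zero offset, right after the space: from that position the lookahead only sees "area code" or "area_code", so "land" or "building" is already behind it. Rows such as "land_area code" never had the problem because \b cannot match inside "land_area", which is a single run of word characters.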