Need a Regex string to work with custom Exchange DLP "Sensitive Information" type.
i.e. match on Smith, but not when it appears as John Smith or Smith John
(?i)(?<!John\s)Smith
appears to work for "John Smith" though I'm not convinced it is 100% efficient.
(?i)(Smith.*\s(?!John))
appears to work for "Smith John" but not if followed by a space or new line.
Have tried the following to combine them into one string but it doesn't seem to work at all.
(?i)(?<!John\s)Smith |(?i)(Smith.*\s(?!John))
(?i)(?<!John\s)Smith.*\s(?!John)
What schoolboy error am I making?
The (?i)(?<!John\s)Smith |(?i)(Smith.*\s(?!John)) pattern matches a Smith that does not have John + 1 whitespace before it, OR a Smith that is followed by any number of chars and then a whitespace that is not immediately followed by John. Thus, it matches Smith in a lot of positions.
The (?i)(?<!John\s)Smith.*\s(?!John) pattern grabs a Smith that is not immediately preceded by John + whitespace, and all text up to the final whitespace that is not immediately followed by John.
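To see that behaviour concretely, here is a minimal Python sketch of the second combined attempt. Python's re module is only a stand-in (an assumption; Exchange DLP uses its own regex engine, which may differ in details), and the sample strings are hypothetical.

```python
import re

# The asker's second combined attempt, copied verbatim from the question.
attempt = re.compile(r"(?i)(?<!John\s)Smith.*\s(?!John)")

# Hypothetical sample inputs, not taken from the original post.
unwanted = attempt.search("Smith John said hi")
missed = attempt.search("Smith")

# .* runs past "John" and backtracks to the last whitespace that is not
# followed by "John", so "Smith John" is matched after all.
print(repr(unwanted.group()))  # 'Smith John said '

# A bare "Smith" has no whitespace after it, so \s cannot match at all.
print(missed)  # None
```

So the pattern both over-matches (Smith John plus trailing text) and under-matches (a lone Smith), which is likely why it seemed not to work at all.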
Make sure the \s pattern is inside the lookahead:
(?i)(?<!John\s)Smith(?!\s+John)
See the regex demo
Details
(?i) - case insensitive inline modifier
(?<!John\s) - a location that is not immediately preceded by John and a whitespace char
Smith - a literal substring
(?!\s+John) - the Smith substring should not be immediately followed by 1+ whitespaces (or, if you use \s*, by 0+ whitespaces) and the substring John.
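The suggested pattern can be checked quickly with a short Python sketch. This assumes Python's re module behaves like the DLP engine for these constructs, which has not been verified against Exchange itself; the helper name has_match and the sample strings are hypothetical.

```python
import re

# The pattern from the answer, with \s moved inside the lookahead.
suggested = re.compile(r"(?i)(?<!John\s)Smith(?!\s+John)")


def has_match(text: str) -> bool:
    """Return True if the suggested pattern matches anywhere in text."""
    return suggested.search(text) is not None


# Hypothetical sample inputs, not taken from the original post.
samples = [
    "Smith",
    "Mr Smith called",
    "John Smith",
    "Smith John",
    "john smith",
    "Smith\nJohn",
]

for text in samples:
    print(repr(text), has_match(text))
```

In this sketch only the first two samples match. Note that the lookbehind covers exactly one whitespace char, so an input such as John followed by two spaces and then Smith would still match; a lookbehind needs a fixed width in many engines, so that case may need separate handling.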