Regex to match on a word but only if it isn't preceded or followed by another specific word


I need a regex to use with a custom Exchange DLP "Sensitive Information" type.

i.e. match on Smith, but not when it appears as John Smith or Smith John.

(?i)(?<!John\s)Smith appears to work for "John Smith" though I'm not convinced it is 100% efficient.

(?i)(Smith.*\s(?!John)) appears to work for "Smith John" but not if followed by a space or new line.

Have tried the following to combine them into one string but it doesn't seem to work at all.

(?i)(?<!John\s)Smith |(?i)(Smith.*\s(?!John))

(?i)(?<!John\s)Smith.*\s(?!John)

What schoolboy error am I making?


Solution

  • The (?i)(?<!John\s)Smith |(?i)(Smith.*\s(?!John)) pattern matches either a Smith followed by a space that is not immediately preceded by John plus one whitespace character, OR a Smith followed by any number of characters and then a whitespace character that is not immediately followed by John. Because only one alternative has to succeed, it matches Smith in many positions you want to exclude.

    The (?i)(?<!John\s)Smith.*\s(?!John) pattern grabs a Smith that is not immediately preceded by John + whitespace, together with all the text up to the last whitespace character on that line that is not immediately followed by John.

    Make sure the \s pattern is inside the lookahead:

    (?i)(?<!John\s)Smith(?!\s+John)
    

    Details

    • (?i) - case insensitive inline modifier
    • (?<!John\s) - a location that is not immediately preceded by John and a whitespace char
    • Smith - a literal substring
    • (?!\s+John) - Smith must not be immediately followed by one or more whitespace chars (or zero or more, if you use \s* instead) and then the substring John.
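
To see the difference between the attempts and the corrected pattern, here is a minimal sketch using Python's re module. Python is used purely for illustration: the regex engine behind Exchange DLP may behave differently, so treat this as a sanity check rather than proof. The names ALTERNATION, CHAINED, FIXED and matches are made up for this example, and re.IGNORECASE stands in for the inline (?i) modifier.

```python
import re

FLAGS = re.IGNORECASE  # stands in for the inline (?i) modifier

# The two attempts from the question and the corrected pattern from the answer.
ALTERNATION = re.compile(r"(?<!John\s)Smith |(Smith.*\s(?!John))", FLAGS)
CHAINED = re.compile(r"(?<!John\s)Smith.*\s(?!John)", FLAGS)
FIXED = re.compile(r"(?<!John\s)Smith(?!\s+John)", FLAGS)


def matches(pattern, text):
    """Return the text of every match of `pattern` in `text`."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    # The alternation still fires on "John Smith": the second branch
    # has no lookbehind, so it matches "Smith " anyway.
    print(matches(ALTERNATION, "John Smith called"))  # ['Smith ']

    # The chained pattern lets .* run past "John" to a later whitespace.
    print(matches(CHAINED, "Smith John called"))      # ['Smith John ']

    # The corrected pattern rejects both orders and accepts a lone Smith.
    print(matches(FIXED, "John Smith called"))        # []
    print(matches(FIXED, "Smith John called"))        # []
    print(matches(FIXED, "smith\nJOHN"))              # []
    print(matches(FIXED, "Mr Smith called"))          # ['Smith']
```

One thing to keep in mind: the lookbehind (?<!John\s) only covers exactly one whitespace char, so "John  Smith" with two spaces would still match. Whether the DLP engine accepts a variable-width lookbehind to cover that case is not something I can confirm here.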