Search code examples
pythonregexregex-lookaroundsnegative-lookbehind

How can I use use a regex to match characters that aren't included in certain words?


Suppose I want to return all occurrences of 'lep' in a string in Python, but not if an occurrence is in a substring like 'filepath' or 'telephone'. Right now I am using a combination of negative lookahead/lookbehind:

(?<!te|fi)lep(?!hone|ath)

However, I do want 'telepath' and 'filephone' as well as 'filep' and 'telep'. I've seen similar questions but not one that addresses this type of combination of lookahead/behind.

Thanks!


Solution

  • You can place lookaheads inside lookbehinds (and vice-versa; any combination, really, so long as every lookbehind has a fixed length). That allows you to combine the two conditions into one (doesn't begin with X and end with Y):

    lep(?<!telep(?=hone))(?<!filep(?=ath))
    

    Putting the lookbehinds last is more efficient, too. I would advise doing it that way even if there's no suffix (for example, lep(?<!filep) to exclude filep).

    However, generating the regexes from user input like lep -telephone -filepath promises to be finicky and tedious. If you can, it would be much easier to search for the unwanted terms first and eliminate them. For example, search for:

    (?:telephone|filepath|(lep))
    

    If the search succeeds and group(1) is not None, it's a hit.