Tags: python, regex, pattern-matching, negative-lookbehind

Python pattern negative look behind


I am trying to write a regex pattern that matches a file path string in which a file called "calc.exe" is NOT located in the "System32" folder or any subfolder of it.

The pattern should match this:

C:\Tools\calc.exe

But not these:

C:\Windows\System32\calc.exe
C:\Windows\System32\De-de\calc.exe

I tried a negative lookbehind:

(?<![Ss]ystem32)\\calc\.exe
(?<![Ss]ystem32).*\\calc\.exe
(?<![Ss]ystem32[.*])\\calc\.exe

But nothing worked so far. Does anyone see my error?

You can see my example and try it out yourself here: http://rubular.com/r/syAoEn7xxx

Thanks for your help.


Solution

  • To answer the regex aspect of the question: the problem is that Python's built-in re module doesn't support variable-length lookbehinds:

    import re

    rx = r'(?<!System32.*)calc\.exe'
    re.search(rx, r'C:\Tools\calc.exe')
    
    > sre_constants.error: look-behind requires fixed-width pattern
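
To see both halves of the problem in one place, here is a minimal sketch, written for Python 3 with illustrative variable names. It checks that re rejects the variable-length lookbehind when the pattern is compiled. It also checks that the first fixed-width attempt from the question compiles but still lets the subfolder path through, because that lookbehind only inspects the eight characters immediately before \calc.exe.

```python
import re

# A variable-length lookbehind is rejected by the standard re module
# when the pattern is compiled.
try:
    re.compile(r'(?<!System32.*)calc\.exe')
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# The first attempt from the question is fixed-width, so it compiles,
# but it only looks at the 8 characters right before "\calc.exe".
fixed = re.compile(r'(?<![Ss]ystem32)\\calc\.exe')

paths = [
    r'C:\Tools\calc.exe',
    r'C:\Windows\System32\calc.exe',
    r'C:\Windows\System32\De-de\calc.exe',
]
results = {p: fixed.search(p) is not None for p in paths}

print(variable_width_ok)
for path, matched in results.items():
    print(path, matched)
```

The last path still matches even though it should not, which is why a fixed-width lookbehind alone is not enough for the subfolder case.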
    

    There are two workarounds:

    Install and use the third-party regex module, which does support variable-length lookbehinds (and much, much more):

    import regex  # third-party package: pip install regex

    rx = r'(?<!System32.*)calc\.exe'
    print(regex.search(rx, r'C:\Tools\calc.exe'))  # a match object
    print(regex.search(rx, r'C:\Windows\System32\calc.exe'))  # None
    

    Or rephrase the expression so that it doesn't need a variable-length lookbehind, using a negative lookahead anchored at the start of the string:

    import re

    rx = r'^(?!.*System32).*calc\.exe'
    print(re.search(rx, r'C:\Tools\calc.exe'))  # a match object
    print(re.search(rx, r'C:\Windows\System32\calc.exe'))  # None
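
The lookahead version can be wrapped in a small helper and run against all three paths from the question. This is only a sketch under a few assumptions that go beyond the answer above: the function name is_outside_system32 is hypothetical, re.IGNORECASE stands in for the question's [Ss] character class, and the backslashes around the folder name are added so that only a folder named exactly System32 is excluded.

```python
import re

# Negative lookahead anchored at the start of the string: reject the path
# if a "\System32\" folder component appears anywhere in it, then require
# the path to end in "\calc.exe".
# Assumption: case-insensitive matching and the surrounding backslashes
# are additions for illustration, not part of the original answer.
NOT_IN_SYSTEM32 = re.compile(
    r'^(?!.*\\system32\\).*\\calc\.exe$',
    re.IGNORECASE,
)


def is_outside_system32(path):
    """Return True if path ends in \\calc.exe and is not under System32."""
    return NOT_IN_SYSTEM32.search(path) is not None


for p in (
    r'C:\Tools\calc.exe',
    r'C:\Windows\System32\calc.exe',
    r'C:\Windows\System32\De-de\calc.exe',
    r'C:\System32Backup\calc.exe',
):
    print(p, is_outside_system32(p))
```

Because the lookahead scans the whole string from the start, it handles subfolders of any depth, which is exactly the case the fixed-width lookbehind cannot cover. The folder System32Backup in the last path is a made-up example to show that a name merely starting with System32 is not excluded.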