I am trying to figure out a regex pattern that matches a file path to a file called "calc.exe" only if that file is NOT located in the "System32" folder or any of its subfolders.
The pattern should match this:
C:\Tools\calc.exe
But not these:
C:\Windows\System32\calc.exe
C:\Windows\System32\De-de\calc.exe
I tried negative lookbehinds:
(?<![Ss]ystem32)\\calc\.exe
(?<![Ss]ystem32).*\\calc\.exe
(?<![Ss]ystem32[.*])\\calc\.exe
None of these has worked so far. Does anyone see my error?
You can see my example and try it out yourself here: http://rubular.com/r/syAoEn7xxx
Thanks for your help.
To answer the regex aspect of the question: the problem is that Python's built-in re module does not support variable-length lookbehinds:
import re
rx = r'(?<!System32.*)calc\.exe'
re.search(rx, r'C:\Tools\calc.exe')
> re.error: look-behind requires fixed-width pattern
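The fixed-width lookbehinds from the question do compile in re, but a lookbehind only inspects the characters immediately before the position where it sits. The small diagnostic sketch below (assuming Python 3 and the three example paths from the question; the helper name which_match is just for illustration) runs each attempt against each path to show what actually matches:

```python
import re

# The three attempts from the question, written as Python raw strings.
attempts = [
    r'(?<![Ss]ystem32)\\calc\.exe',
    r'(?<![Ss]ystem32).*\\calc\.exe',
    r'(?<![Ss]ystem32[.*])\\calc\.exe',
]

paths = [
    r'C:\Tools\calc.exe',                   # should match
    r'C:\Windows\System32\calc.exe',        # should not match
    r'C:\Windows\System32\De-de\calc.exe',  # should not match
]

def which_match(pattern):
    """Return the example paths in which `pattern` finds a match."""
    return [p for p in paths if re.search(pattern, p)]

for pattern in attempts:
    print(pattern, '->', which_match(pattern))
```

The first attempt rejects only a calc.exe that sits directly in System32: in the De-de path the eight characters before \calc.exe are 32\De-de, not System32. In the second, the lookbehind is tried at the start of the string, where nothing precedes it, so it passes and .* then matches any path. In the third, [.*] is a character class matching a literal . or *, so the lookbehind never sees System32. or System32* and never rejects anything.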
There are two workarounds:
Install and use the third-party regex module (pip install regex), which does support variable-length lookbehinds (and much more):
import regex
rx = r'(?<!System32.*)calc\.exe'
print(regex.search(rx, r'C:\Tools\calc.exe'))  # a Match object
print(regex.search(rx, r'C:\Windows\System32\calc.exe'))  # None
Or rephrase the expression so that it doesn't need a variable-length lookbehind, for example by putting a negative lookahead at the start of the string:
import re
rx = r'^(?!.*System32).*calc\.exe'
print(re.search(rx, r'C:\Tools\calc.exe'))  # a Match object
print(re.search(rx, r'C:\Windows\System32\calc.exe'))  # None
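The lookahead version can be tightened a little further. The sketch below is one possible variant, not the only correct one, and rests on assumptions of my own: "System32" must appear as a whole path component (\System32\), matching is case-insensitive as Windows paths are, and the file must be named exactly calc.exe. The helper is_calc_outside_system32 is a hypothetical name for illustration:

```python
import re

# Hypothetical helper built on the negative-lookahead approach.
# Assumptions: System32 must be a whole path component (\System32\),
# matching ignores case, and the file name must be exactly calc.exe.
CALC_OUTSIDE_SYSTEM32 = re.compile(
    r'^(?!.*\\system32\\)'  # fail if any \System32\ component appears
    r'.*\\calc\.exe$',      # require the path to end in \calc.exe
    re.IGNORECASE,
)

def is_calc_outside_system32(path: str) -> bool:
    """True if `path` names calc.exe outside System32 and its subfolders."""
    return CALC_OUTSIDE_SYSTEM32.search(path) is not None

for path in [
    r'C:\Tools\calc.exe',
    r'C:\Windows\System32\calc.exe',
    r'C:\Windows\System32\De-de\calc.exe',
    r'C:\Windows\system32\calc.exe',
    r'C:\Tools\notcalc.exe',
]:
    print(path, is_calc_outside_system32(path))
```

Only the first path is accepted. Requiring the surrounding backslashes means a folder such as System32old is not treated as System32; drop them if any folder name containing "System32" should be excluded.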