python regex lookbehind negative-lookbehind

Regex negative lookbehind isn't working as expected


In Python I used this regex

(?<!\d\d\d)(\s?lt\.?\s?blue)

on this string

ltblue
500lt.blue
4009 lt blue
lt. blue
032 lt red

I expected it to capture this

ltblue
lt. blue

but instead it captured

ltblue
lt. blue
lt blue

From how I wrote it, I don't think it should have captured the 'lt blue' after 4009, but for some reason the \s? before 'lt' doesn't seem to work. Does anyone know how I could change the regex to get the expected output?


Solution

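  • The regex engine tries every way to make your pattern match, so because \s? is optional it tries both with and without the space and keeps whichever attempt succeeds. In the case of 4009 lt blue, the attempt that starts at the space fails, because the three characters before it are 009. The attempt that starts at the l succeeds: \s? matches nothing, and the three characters before the l are "09 " (two digits and a space), so the lookbehind passes. The space ends up before the group, fooling your lookbehind.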

    Since lookbehinds must have a fixed width in Python's built-in re module, you cannot add \s? to your negative lookbehind. You can still handle this case with a second lookbehind, dropping the leading \s? from the group:

    (?<!\d{3})(?<!\d{3}\s)(lt\.?\s?blue)
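
As a quick check, here is a minimal sketch that runs both patterns against the sample input from the question using the standard re module. The variable names (text, original, fixed) are illustrative and not from the original post, and the captures of the original pattern are stripped of leading whitespace only to make the comparison easier to read.

```python
import re

# Sample input from the question, one entry per line.
text = "ltblue\n500lt.blue\n4009 lt blue\nlt. blue\n032 lt red"

# Pattern from the question: the optional \s? lets the match start
# at the "l", where the three preceding characters are "09 ".
original = re.compile(r"(?<!\d\d\d)(\s?lt\.?\s?blue)")

# Pattern from the answer: a second fixed-width lookbehind rejects
# "three digits followed by one whitespace character".
fixed = re.compile(r"(?<!\d{3})(?<!\d{3}\s)(lt\.?\s?blue)")

# findall returns the contents of the single capture group.
original_matches = [m.strip() for m in original.findall(text)]
fixed_matches = fixed.findall(text)

print(original_matches)  # ['ltblue', 'lt blue', 'lt. blue']
print(fixed_matches)     # ['ltblue', 'lt. blue']
```

One thing to keep in mind: \s also matches a newline, so a match at the start of a line would be rejected if the previous line ended in three digits. If that matters for your data, a literal space in the second lookbehind may be the safer choice.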