Tags: python, regex, lookbehind

Conditional look-behind (Python regex): how to exclude certain words but include others?


I am having trouble writing a Python regex that retrieves only the valid places.

Take, for example, the following four lines:

Enjoy up to 70% off at New York branches.

Enjoy up to 70% off in Canada.

Not valid at London branches.

Not valid in Germany.

I only want to extract "New York branches" and "Canada", not "London branches" or "Germany".

This works, but it matches all of the locations: ((?<=at ).*(?=\.))|((?<=in ).*(?=\.))
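Here is a minimal Python sketch of that working pattern, assuming the four sample lines are stored in one newline-separated string called `text` (a name chosen only for illustration). It uses `re.finditer` rather than `re.findall` because the pattern has two capture groups, so `findall` would return tuples instead of the matched text.

```python
import re

# The four sample lines in one string (hypothetical variable name).
text = (
    "Enjoy up to 70% off at New York branches.\n"
    "Enjoy up to 70% off in Canada.\n"
    "Not valid at London branches.\n"
    "Not valid in Germany."
)

# The question's working pattern: text preceded by "at " or "in " and followed by ".".
pattern = r"((?<=at ).*(?=\.))|((?<=in ).*(?=\.))"

# group(0) is the whole match, whichever alternative produced it.
places = [m.group(0) for m in re.finditer(pattern, text)]
print(places)  # ['New York branches', 'Canada', 'London branches', 'Germany']
```

Nothing in this pattern looks at the start of the line, so the "Not valid" lines are matched too.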

But this does not work, and I don't understand why: ((?<!not )((?<=at ).*(?=\.))|((?<!not )((?<=in ).*(?=\.))
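A sketch of what goes wrong with that attempt, again assuming the sample lines are in a hypothetical `text` string. First, as typed, the pattern leaves two opening parentheses unclosed, so `re.compile` raises `re.error` before any matching happens. Second, even with balanced parentheses (the `balanced` pattern below is an illustrative repair, not the asker's), `(?<!not )` is tested at the same position as `(?<=at )`. At that position the four preceding characters are `" at "` or `" in "`, never `"not "`, and the comparison is case-sensitive against the capitalised "Not". So the negative lookbehind always succeeds and every location still matches.

```python
import re

text = (
    "Enjoy up to 70% off at New York branches.\n"
    "Enjoy up to 70% off in Canada.\n"
    "Not valid at London branches.\n"
    "Not valid in Germany."
)

# The attempt exactly as typed: two "(" are never closed, so compiling fails.
as_typed = r"((?<!not )((?<=at ).*(?=\.))|((?<!not )((?<=in ).*(?=\.))"
try:
    re.compile(as_typed)
    compiled_ok = True
except re.error as exc:
    compiled_ok = False
    print("re.error:", exc)

# Hypothetical balanced version. Both lookbehinds are evaluated at the match start,
# where the text just before is " at " or " in ", so (?<!not ) never sees "not ".
balanced = r"(?<!not )(?<=at ).*(?=\.)|(?<!not )(?<=in ).*(?=\.)"
places = re.findall(balanced, text)
print(places)  # ['New York branches', 'Canada', 'London branches', 'Germany']
```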

Specifically, I want all the text after the word 'at' or 'in' and before the full stop, but not when it is preceded by 'not valid'.


Solution

  • I think the answer provided by hwnd above is the best way to go:

    ^(?!Not valid\b).*(?:at|in)(.*)\.$
    

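    But to answer your question, what you were trying to accomplish is this (matched case-insensitively, e.g. with re.IGNORECASE, because the sample lines start with a capital "Not"):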

    (?<=(?<!not valid )(?:at|in) ).*(?=\.)
    

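Both patterns from this answer can be tried in Python on the sample lines, assuming they are in a newline-separated string named `text` (an illustrative name). The first is applied with `re.MULTILINE` so that `^` and `$` anchor at each line, and the place is read from capture group 1. The second nests a lookbehind inside a lookbehind; Python's `re` accepts this because the outer lookbehind still has a fixed width of three characters (`at ` or `in `) and the inner negative lookbehind adds no width.

```python
import re

text = (
    "Enjoy up to 70% off at New York branches.\n"
    "Enjoy up to 70% off in Canada.\n"
    "Not valid at London branches.\n"
    "Not valid in Germany."
)

# hwnd's approach: reject lines starting with "Not valid", capture what follows "at"/"in".
# re.MULTILINE makes ^ and $ match at every line boundary.
line_pattern = r"^(?!Not valid\b).*(?:at|in)(.*)\.$"
# There is no space after (?:at|in) in the pattern, so group 1 starts with one; strip() removes it.
line_places = [m.group(1).strip() for m in re.finditer(line_pattern, text, re.MULTILINE)]
print(line_places)  # ['New York branches', 'Canada']

# Nested lookbehind: the outer (?<=...) is three characters wide ("at " or "in "),
# and the inner (?<!not valid ) is checked just before "at"/"in".
# re.IGNORECASE lets the lowercase "not valid" match the capitalised "Not valid" lines.
nested_pattern = r"(?<=(?<!not valid )(?:at|in) ).*(?=\.)"
nested_places = re.findall(nested_pattern, text, re.IGNORECASE)
print(nested_places)  # ['New York branches', 'Canada']
```

Without `re.IGNORECASE`, the second pattern also returns "London branches" and "Germany", because the case-sensitive `not valid ` never equals `Not valid `. Note also that the greedy `.*` in the first pattern backtracks to the last "at" or "in" on the line, so a place such as "Seattle" would come back as "tle"; wrapping the alternation as `\b(?:at|in)\b` avoids that.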