I have to match certain criteria in a phrase: (group of words)(anything in between)(group of words). For example:
(mirror|reflect|serve|adapt)(\s*\w+\s*\W*\s*)*?(population|client|customer|stakeholder|market|society|culture|consumer|end-user)
So any time a phrase contains "mirror bananas banannas population", I want to match it. Is this the best solution? Is it prone to catastrophic backtracking?
The (\s*\w+\s*\W*\s*)*? part may lead to catastrophic backtracking: the only obligatory pattern inside the *?-quantified group is \w+, and it is surrounded by optional patterns (\s* and \W* may match empty strings). Note also that adjoining *-quantified patterns like \s*\W*\s* can match the same chars, so the engine can split the same text between them in many different ways. That is bad practice and leads to catastrophic backtracking.
If you test your regex against mirror banana banannas populatio (note the truncated last word, so no trailing word can match), you will get the catastrophic backtracking error.
The best regex approach in your case, that is, when you read the leading/trailing word lists from a JSON file, is a regex like
(?:leading_word1|leading_word2|...|leading_wordN)(.*?)(?:trailing_word1|trailing_word2|...|trailing_wordN)
The value you need will be in Group 1, or you will get all values in a list if you use re.findall (you say you are using Python).
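As a minimal sketch of that approach in Python: the word lists below are copied from the question and hard-coded for illustration (in your setup they would come from your JSON file), and build_pattern is a hypothetical helper name, not something from a library. Passing each word through re.escape is an extra precaution I am assuming you want, so that any regex metacharacters in a word are treated literally.

```python
import re

# Word lists taken from the question; hard-coded here for illustration only.
# In the real setup they would be loaded from the JSON file.
leading = ["mirror", "reflect", "serve", "adapt"]
trailing = ["population", "client", "customer", "stakeholder", "market",
            "society", "culture", "consumer", "end-user"]


def build_pattern(leading_words, trailing_words):
    """Hypothetical helper: build (?:lead1|...)(.*?)(?:trail1|...)."""
    # re.escape keeps any special characters in the words literal.
    lead = "|".join(re.escape(w) for w in leading_words)
    trail = "|".join(re.escape(w) for w in trailing_words)
    return re.compile(rf"(?:{lead})(.*?)(?:{trail})")


pattern = build_pattern(leading, trailing)

# With a single capturing group, findall returns the Group 1 values as a list.
matches = pattern.findall("mirror bananas banannas population")

# The truncated input from above simply fails to match.
no_matches = pattern.findall("mirror banana banannas populatio")
```

Note that . does not match line breaks by default, so if the text between the two words can span several lines, you would need to compile the pattern with re.DOTALL.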