python regex optimization python-re findall

Efficiently combining regexes in Python


Setup

I dynamically create a list of regexes, regex_list. Each regex in the list is guaranteed to match at least once in the text it is applied to. Some regexes in the list may be identical.

import re

regex_list = []
for f in foo:  # foo is a list of strings, e.g. foo = ['foo1', 'foo2', 'foo1', ...]
    # f is a valid expression to be used inside the regex
    regex_list.append(rf'[^.]*?{f}[^.]*\.')  # raw f-string keeps the \. escape intact

regex = re.compile('|'.join(regex_list), flags=re.DOTALL)
result = regex.findall(text)

Problem

Since

  1. some regexes in regex_list may be identical
  2. the regexes in regex_list are combined with the OR operator

for any regex that has another copy in the list, only the first match in the text is captured.
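To make the behaviour concrete, here is a minimal sketch with made-up data (the sample text and the keywords are assumptions, not the real input). With an alternation, findall scans the text once and returns each non-overlapping match a single time, no matter how many alternatives would match it, so a duplicated alternative adds nothing to the result.

```python
import re

# Hypothetical sample data, chosen only to illustrate the behaviour.
text = "foo1 is here. Nothing here. foo2 and foo1 again."
foo = ['foo1', 'foo2', 'foo1']  # note the duplicated 'foo1'

regex_list = [rf'[^.]*?{f}[^.]*\.' for f in foo]
combined = re.compile('|'.join(regex_list), flags=re.DOTALL)

# Each matching sentence is reported once, by the first alternative that
# succeeds at that position; the second 'foo1' alternative is never needed.
combined_matches = combined.findall(text)
print(combined_matches)
# ['foo1 is here.', ' foo2 and foo1 again.']
```

Note that the sentence containing both foo1 and foo2 appears only once, so the combined pattern cannot tell you which keywords each sentence matched.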

Question

A workaround would be to apply each regex individually in a for-loop, but that is very slow.

Is there a good way to combine the regexes so that they match everything possible?


Solution

  • I discovered by chance that applying each regex individually in a for-loop is very slow with the re module, but surprisingly faster with the third-party regex module.
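A rough sketch of that per-regex loop, under a few assumptions: the helper name findall_each and the sample data are hypothetical, duplicates are dropped on the assumption that repeated results for identical patterns are not wanted, and the code falls back to re when the regex package is not installed. It makes no claim about timings, which depend on the patterns and the text.

```python
import re

try:
    import regex as engine  # third-party package (pip install regex)
except ImportError:
    engine = re  # fall back to the standard library so the sketch still runs


def findall_each(keywords, text):
    """Apply one pattern per distinct keyword and return {keyword: matches}."""
    results = {}
    # dict.fromkeys drops duplicate keywords while keeping first-seen order.
    for kw in dict.fromkeys(keywords):
        pattern = engine.compile(rf'[^.]*?{kw}[^.]*\.', flags=engine.DOTALL)
        results[kw] = pattern.findall(text)
    return results


# Hypothetical sample data for illustration only.
sample_text = "foo1 is here. Nothing here. foo2 and foo1 again."
per_keyword = findall_each(['foo1', 'foo2', 'foo1'], sample_text)
for kw, matches in per_keyword.items():
    print(kw, matches)
```

Keeping the results keyed by keyword also preserves which pattern produced each match, which the single combined alternation does not.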