How can you get overlapping matches in regex?

If I run

it returns:

[('groupone|grouptwo', 'groupone', '|', 'grouptwo'), ('groupthree|groupfour', 'groupthree', '|', 'groupfour')]

This is not my desired result. I would also like grouptwo and groupthree to be matched together as a pair, like this:

What do I need to change in my regex to make this possible?

Solution

You could use the third-party regex module (pip install regex) for this. Unlike the standard library's re module, its findall and finditer accept an overlapped=True argument that returns overlapping matches.

import regex

regex.findall(r"(\b([a-zA-Z]+\b)(&|\|)(\b[a-zA-Z]+)\b)", "groupone|grouptwo|groupthree|groupfour", overlapped=True)

[('groupone|grouptwo', 'groupone', '|', 'grouptwo'),
 ('grouptwo|groupthree', 'grouptwo', '|', 'groupthree'),
 ('groupthree|groupfour', 'groupthree', '|', 'groupfour')]

N.B. Note the word boundaries (\b) added to the pattern. With your original pattern, this method would also return a number of unwanted matches.
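
If installing a third-party package is not an option, a similar result can probably be had from the standard library by wrapping the pattern in a zero-width lookahead. This is a different technique from the overlapped=True approach above, not a drop-in equivalent. It is a minimal sketch that assumes the same two-word shape with & or | as the separator; the names PAIR and overlapping_pairs are made up for illustration.

```python
import re

# The whole pattern sits inside a zero-width lookahead (?=...), so each match
# consumes no characters. The scan can therefore find a pair starting at
# every word, including a word already used by the previous pair.
# The leading \b keeps a match from starting in the middle of a word.
PAIR = re.compile(r"(?=(\b([a-zA-Z]+)(&|\|)([a-zA-Z]+)\b))")


def overlapping_pairs(text):
    """Return (pair, left, separator, right) tuples, overlaps included."""
    return PAIR.findall(text)


pairs = overlapping_pairs("groupone|grouptwo|groupthree|groupfour")
for pair in pairs:
    print(pair)
```

Because the lookahead itself matches the empty string, the captured text has to be read from the groups, which is what findall returns when the pattern contains groups.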