python regex regex-alternation

Matching multiple regex patterns with the alternation operator?


I ran into a small problem using Python Regex.

Suppose this is the input:

(zyx)bc

I want whatever is between the parentheses as a single match, and each character outside them as an individual match. The desired result would be along the lines of:

['zyx','b','c']

The order of matches should be kept.

I've tried obtaining this with Python 3.3, but can't seem to figure out the correct Regex. So far I have:

from re import findall
matches = findall(r'\((.*?)\)|\w', '(zyx)bc')

print(matches) yields the following:

['zyx','','']

Any ideas what I'm doing wrong?


Solution

  • From the documentation of re.findall:

    If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

    Your regexp does match the string three times, but the (.*?) group is empty for the last two matches, so findall returns an empty string for each of them. If you want the output of the other half of the regexp, you can add a second group:

    >>> re.findall(r'\((.*?)\)|(\w)', '(zyx)bc')
    [('zyx', ''), ('', 'b'), ('', 'c')]
    

    Alternatively, you could remove all the groups to get a simple list of strings again:

    >>> re.findall(r'\(.*?\)|\w', '(zyx)bc')
    ['(zyx)', 'b', 'c']
    

    You would need to manually remove the parentheses though.
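
To get exactly ['zyx', 'b', 'c'], either of the two patterns above can be post-processed, or the original pattern can be used with re.finditer instead. The following is a minimal sketch of all three, not the only way to do it; the variable names (from_groups, from_strip, from_iter) are just illustrative, and it assumes that an empty pair of parentheses should produce an empty string.

```python
import re

text = '(zyx)bc'

# Approach 1: two capturing groups. For each match one group holds the text
# and the other is '', so "or" picks whichever is non-empty.
pairs = re.findall(r'\((.*?)\)|(\w)', text)
from_groups = [inside or outside for inside, outside in pairs]

# Approach 2: no capturing groups. findall returns the whole match, so the
# surrounding parentheses are stripped afterwards.
whole = re.findall(r'\(.*?\)|\w', text)
from_strip = [m[1:-1] if m.startswith('(') else m for m in whole]

# Approach 3: the original single-group pattern with finditer. On a match
# object, group(1) is None when the \w branch matched, so fall back to the
# whole match in that case.
from_iter = [
    m.group(1) if m.group(1) is not None else m.group(0)
    for m in re.finditer(r'\((.*?)\)|\w', text)
]

print(from_groups)  # ['zyx', 'b', 'c']
print(from_strip)   # ['zyx', 'b', 'c']
print(from_iter)    # ['zyx', 'b', 'c']
```

All three keep the matches in their original order. The finditer version is probably the closest to the pattern in the question, since it needs no second group and no string slicing.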