I ran into a small problem with a Python regex.
Suppose this is the input:
(zyx)bc
What I'm trying to achieve is to capture whatever is between parentheses as a single match, and each character outside the parentheses as an individual match. The desired result would be along the lines of:
['zyx','b','c']
The order of matches should be kept.
I've tried obtaining this with Python 3.3, but can't seem to figure out the correct Regex. So far I have:
import re
matches = re.findall(r'\((.*?)\)|\w', '(zyx)bc')
print(matches)
This yields the following:
['zyx','','']
Any ideas what I'm doing wrong?
From the documentation of re.findall:
If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
Your regexp does match the string three times, but the (.*?) group takes no part in the last two matches, so findall returns an empty string for each of them. If you want the output of the other half of the regexp, you can add a second group:
>>> re.findall(r'\((.*?)\)|(\w)', '(zyx)bc')
[('zyx', ''), ('', 'b'), ('', 'c')]
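If you want the flat list from the question, the tuples can be collapsed by taking whichever group is non-empty in each one. This is only a sketch: the helper name split_tokens is made up for illustration, and it assumes at most one group is non-empty per match, which holds for this pattern.

```python
import re

def split_tokens(text):
    # Hypothetical helper: each findall result is a (group1, group2) tuple
    # in which at most one element is non-empty for this pattern.
    pairs = re.findall(r'\((.*?)\)|(\w)', text)
    # Keep the parenthesised content if present, else the single character.
    return [inside or single for inside, single in pairs]

print(split_tokens('(zyx)bc'))
```

Note that an empty pair of parentheses in the input would produce an empty string in the result.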
Alternatively, you could remove all the groups to get a simple list of strings again:
>>> re.findall(r'\(.*?\)|\w', '(zyx)bc')
['(zyx)', 'b', 'c']
You would need to manually remove the parentheses though.
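One possible way to do that stripping is sketched below. The helper name strip_parens is hypothetical, and it relies on the fact that \w never matches "(", so any match that starts with "(" must be a parenthesised one that also ends with ")".

```python
import re

def strip_parens(text):
    # Hypothetical helper: with no groups in the pattern, findall
    # returns the whole text of each match as a plain string.
    matches = re.findall(r'\(.*?\)|\w', text)
    # Drop the first and last character of parenthesised matches only.
    return [m[1:-1] if m.startswith('(') else m for m in matches]

print(strip_parens('(zyx)bc'))
```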