Tags: regex, regex-group, python-re

Regex groups with an uneven number of groups


I'm not good with regex terminology, so picking a suitable title for this question is somewhat difficult for me. Feel free to suggest a better title.

Anyway, I have this text and this regex:

import re

txt = """
%power {s}
shop %power {w}
%electricity {t}
""".replace('\n',' ')
# Raw string so that \w, \s, \{ etc. reach the regex engine unchanged
x = re.findall(r"((\%?\w+\s)*)\{([^\}]*)\}", txt)
print(x)

The result is

[('%power ', '%power ', 's'), ('shop %power ', '%power ', 'w'), ('%electricity ', '%electricity ', 't')]

but I intended to get

[('%power ', 's'), ('shop ', '%power ', 'w'), ('%electricity ', 't')]

How can I achieve the desired output?
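The extra item in each tuple comes from how Python's built-in `re` module treats a quantified group: `re.findall` returns one string per capturing group, and a group repeated with `*` or `+` only keeps the text of its last iteration. A minimal illustration, using made-up sample strings:

```python
import re

pattern = r"((%?\w+\s)*)\{([^}]*)\}"

# The outer group captures the whole repeated run of words; the inner
# group, repeated by '*', only remembers its final iteration.
m = re.search(pattern, "shop %power {w}")
print(m.groups())  # ('shop %power ', '%power ', 'w')

# findall therefore always yields exactly three strings per match,
# no matter how many words precede the braces.
print(re.findall(pattern, "a b c {x}"))  # [('a b c ', 'c ', 'x')]
```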


Solution

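  • Python's built-in re module keeps only the last repetition of a repeated group, so you need the third-party regex module (pip install regex), whose match objects expose every repetition through captures(). Then use: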

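    import regex  # third-party module: pip install regex

    txt = """
    %power {s}
    shop %power ow {w}
    %electricity {t}
    """.replace('\n',' ')
    # Group 1 is repeated by '*'; regex keeps every repetition in captures(1)
    x = regex.finditer(r"(%?\w+\s)*\{([^{}]*)}", txt)
    # One tuple per match: all word captures followed by the brace content
    z = [tuple(y.captures(1) + y.captures(2)) for y in x]
    print(z)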
    


    Output:

    [('%power ', 's'), ('shop ', '%power ', 'ow ', 'w'), ('%electricity ', 't')]
    

    NOTE on regex.finditer usage

    regex.finditer returns an iterator, not a list, so x is exhausted once the list comprehension has consumed it; iterating over x a second time produces nothing. To reuse the matches, either convert them to a list first (list(x)), or, as above, consume x only once to build the required output structure and then work with that result.
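If installing the third-party regex module is not an option, a similar result can be obtained with the built-in re module in two passes: capture the entire run of words before the braces as a single group, then split that run with a second findall. This is a different technique from captures(), sketched here on the same sample input as the answer; the helper name split_groups is hypothetical.

```python
import re

txt = """
%power {s}
shop %power ow {w}
%electricity {t}
""".replace('\n', ' ')

# Pass 1: group 1 holds the whole run of words before '{' (the inner
# repetition is non-capturing), group 2 holds the text inside the braces.
outer = re.compile(r"((?:%?\w+\s)*)\{([^{}]*)\}")
# Pass 2: a single word (optional leading '%') plus its trailing whitespace.
word = re.compile(r"%?\w+\s")


def split_groups(text):
    """Hypothetical helper: one tuple per {...} block, words first, brace content last."""
    result = []
    for m in outer.finditer(text):
        words = word.findall(m.group(1))
        result.append(tuple(words + [m.group(2)]))
    return result


print(split_groups(txt))
# [('%power ', 's'), ('shop ', '%power ', 'ow ', 'w'), ('%electricity ', 't')]
```

Making the inner repetition non-capturing with (?:...) removes the "last iteration" group that produced the duplicated item in the question's output.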