How to read an optional tuple of integers using a regex with Python?

I am trying to write a regex to read lines such as:

*     DCH  :   0.80000000                             *
*      PYR  : 100.00000000                            *
*    Bond (  1,   0)  :   0.80000000                  *
*     Angle (  1,   0,   2)  : 100.00000000           *

To that end, I wrote the regex below. It works, but I would like some feedback on how to get the integers in parentheses. On lines 3 and 4 above, the part with the integers between parentheses (a kind of tuple of integers) is optional.

I have to define several groups so that the tuple of integers is optional and may contain 2, 3 or 4 integers.

In [64]: coord_patt = re.compile(r"\s+(\w+)\s+(\(((\s*\d+),?){2,4}\))?\s+:\s+(\d+.\d+)")

In [65]: line2 = "*     Angle (  1,   0,   2)  : 100.00000000           *"

In [66]: m = coord_patt.search(line2)

In [67]: m.groups()
Out[67]: ('Angle', '(  1,   0,   2)', '   2', '   2', '100.00000000')

Another example:

In [68]: line = "         *                 Bond (  1,   0)  :   0.80000000           *"

In [69]: m = coord_patt.search(line)
    
In [71]: m.groups()
Out[71]: ('Bond', '(  1,   0)', '   0', '   0', '0.80000000')

As you can see, it works, but I do not understand why the groups contain only the last integer and not each integer separately. Is there a way to get the integers individually? Alternatively, can I avoid defining all those groups and capture only group 2, the string form of the tuple, which can easily be parsed afterwards?

Solution

As indicated in Capturing repeating subpatterns in Python regex, the standard-library re module doesn't support repeated captures (a repeated group keeps only its last match), but the third-party regex module does.
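
To illustrate why the question's groups show only the last integer, here is a minimal sketch using re alone. The sample string and the variable names are chosen for illustration only; the point is that a capturing group inside a repetition is overwritten on every iteration.

```python
import re

# The inner group (\d+) sits inside a {2,4} repetition, so each
# iteration overwrites the previous capture.
m = re.search(r'(\((?:\s*(\d+),?){2,4}\))', '(  1,   0,   2)')

whole = m.group(1)       # the full parenthesized text
last_only = m.group(2)   # only the final repetition survives

print(repr(whole))       # '(  1,   0,   2)'
print(repr(last_only))   # '2'
```

This is the behaviour seen in the question: the outer group holds the whole tuple as text, while the repeated inner group holds just the last integer.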

Here are two solutions: one based on regex, the other on re plus safe evaluation of the tuple (with ast.literal_eval) when one is present.

Setup

txt = r"""*     DCH  :   0.80000000                             *
*      PYR  : 100.00000000                            *
*    Bond (  1,   0)  :   0.80000000                  *
*     Angle (  1,   0,   2)  : 100.00000000           *
"""

Using `regex`

import regex

# The decimal point is escaped so that it matches a literal '.' only.
p = regex.compile(r'\s+(\w+)\s+(?:\((?:\s*(\d+),?){2,4}\))?\s+:\s+(\d+\.\d+)')

for s in txt.splitlines():
    if m := p.search(s):
        name = m.group(1)
        # captures(2) returns every match of the repeated group (empty if the tuple is absent)
        tup = tuple(int(k) for k in m.captures(2))
        val = float(m.group(3))
        print(f'{name!r}\t{tup!r}\t{val!r}')

Prints:

'DCH'   ()  0.8
'PYR'   ()  100.0
'Bond'  (1, 0)  0.8
'Angle' (1, 0, 2)   100.0

Using `re`

import re
import ast

# The decimal point is escaped so that it matches a literal '.' only.
p = re.compile(r'\s+(\w+)\s+(\((?:\s*\d+,?){2,4}\))?\s+:\s+(\d+\.\d+)')

for s in txt.splitlines():
    if m := p.search(s):
        name, tup, val = m.groups()
        tup = ast.literal_eval(tup) if tup is not None else ()
        val = float(val)
        print(f'{name!r}\t{tup!r}\t{val!r}')

Prints:

'DCH'   ()  0.8
'PYR'   ()  100.0
'Bond'  (1, 0)  0.8
'Angle' (1, 0, 2)   100.0
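
As a variation on the re approach, the captured tuple text can also be split with re.findall instead of being evaluated. This is only a sketch: the helper name parse_line and the constant LINE_RE are hypothetical, and the pattern is the one from the re solution with the decimal point escaped.

```python
import re

# Pattern from the `re` solution, with the decimal point escaped.
LINE_RE = re.compile(r'\s+(\w+)\s+(\((?:\s*\d+,?){2,4}\))?\s+:\s+(\d+\.\d+)')

def parse_line(line):
    """Return (name, tuple_of_ints, value), or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    name, tup, val = m.groups()
    # tup is None when the optional tuple is absent; findall on '' gives [].
    ints = tuple(int(k) for k in re.findall(r'\d+', tup or ''))
    return name, ints, float(val)

print(parse_line("*     Angle (  1,   0,   2)  : 100.00000000           *"))
print(parse_line("*     DCH  :   0.80000000                             *"))
```

Unlike ast.literal_eval, this does not depend on the captured text being a valid Python literal, so integers written with leading zeros would not raise an error.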
