regular expression python index:count

I have a list of values in a string, each in the form "index:count". I want to extract the index and count from each one, as in the code below:

          string="358:6 1260:2 1533:7 1548:292 1550:48 1561:3 1564:186"
          values=[v for v in re.findall('.+?:.+?.', string)]
          for g in values:
              index=g[:g.index(":")]
              count=g[g.index(":")+1:]
              print(int(index)+" "+str(count))

But I got this error message:

ValueError: invalid literal for int() with base 10: '2 1550'

It seems I wrote the regular expression incorrectly. Any idea how to fix this?

Solution

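I think you don't need the ? lazy modifier in the regex pattern; here it produces noise instead of capturing the right data. In '.+?:.+?.', the lazy .+? after the colon matches as little as possible, which is a single character, and the trailing . then consumes exactly one more. Each match therefore takes exactly two characters after the colon, so '1548:292' is cut off at '1548:29' and the leftover '2' is glued onto the front of the next match. That is where the '2 1550' in your error message comes from.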
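To see the problem concretely, here is a minimal sketch that runs the original pattern on the sample string from the question and inspects what re.findall actually returns. The variable name matches is only for illustration.

```python
import re

string = "358:6 1260:2 1533:7 1548:292 1550:48 1561:3 1564:186"

# The original pattern: lazy .+? before and after the colon, then one more character.
matches = re.findall(r'.+?:.+?.', string)

for m in matches:
    # repr() makes stray leading/trailing spaces visible
    print(repr(m))

# The fifth match is '2 1550:48'; its part before the colon is '2 1550',
# which int() cannot parse.
```

Every match has exactly two characters after the colon, which is fine for one-digit counts followed by a space but breaks as soon as a count has three digits.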

EDIT NOTE: the .+:.+ pattern I introduced in previous edits is a poor choice for capturing the desired data. Please use the \d+:\d+ pattern instead. I have left it in the answer because it can still solve the OP's problem with a workaround.

As long as your data is well formed, contains no noise, and is neatly separated by whitespace, I think '.+:.+' is enough to find your index:count data, although it needs an extra split step afterwards. The better choice is \d+:\d+, since you know each item is one or more digits, followed by a :, followed by one or more digits.

regexr and regex101 are good tools for designing and visualizing your regex pattern.

If you use the .+:.+ pattern, re.findall returns the whole string as a single match, because the greedy .+ consumes as much as it can. re.findall always returns a list, so in this example you get a list with only one element, which you need to process further.

In [  ]: import re
    ...: string="358:6 1260:2 1533:7 1548:292 1550:48 1561:3 1564:186"
    ...: values=re.findall('.+:.+', string)  # greedy: matches the whole line once
    ...: print(values)
['358:6 1260:2 1533:7 1548:292 1550:48 1561:3 1564:186']

Since it returns a list with only one element, you can use pop() to take that single str out and break it into pairs with the str method split().

In [  ]: print(values.pop().split())
['358:6', '1260:2', '1533:7', '1548:292', '1550:48', '1561:3', '1564:186']
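As a side note, with this workaround the regex is not really doing any work: the pairs come from split(). A minimal sketch, assuming the input is clean and whitespace-separated as in the question, that gets the same list without re at all:

```python
string = "358:6 1260:2 1533:7 1548:292 1550:48 1561:3 1564:186"

# split() with no arguments splits on any run of whitespace
pairs = string.split()
print(pairs)
```

This only holds for clean input; if the string can contain other tokens, a pattern that matches digits explicitly is the safer option.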

If you use the \d+:\d+ pattern, re.findall directly returns a neatly separated list, since the pattern matches each pair individually. You can therefore print the values right away.

In [  ]: string="358:6 1260:2 1533:7 1548:292 1550:48 1561:3 1564:186"
    ...: values=re.findall(r'\d+:\d+', string)  # raw string keeps \d intact
    ...: print(values)
['358:6', '1260:2', '1533:7', '1548:292', '1550:48', '1561:3', '1564:186']
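If you want the index and count as separate integers rather than as "index:count" strings, one option is to add capture groups to the same pattern. This is a sketch of a variation, not something the question requires; the name pairs is only for illustration.

```python
import re

string = "358:6 1260:2 1533:7 1548:292 1550:48 1561:3 1564:186"

# With two groups, re.findall returns a list of (index, count) string tuples,
# which are converted to integers here.
pairs = [(int(index), int(count))
         for index, count in re.findall(r'(\d+):(\d+)', string)]
print(pairs)
```

With groups in the pattern there is no need to split each match on ":" afterwards.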

Finally, you can print the result nicely with built-in string formatting. Disclaimer: I do not own that website, I just found it useful for beginner me :)

In [  ]: for s in values:
    ...:     index, count = s.split(":")
    ...:     print("Index: {:>8} Count: {:>8}".format(index, count))
    ...:     
Index:      358 Count:        6
Index:     1260 Count:        2
Index:     1533 Count:        7
Index:     1548 Count:      292
Index:     1550 Count:       48
Index:     1561 Count:        3
Index:     1564 Count:      186