python regex python-3.x pattern-matching python-itertools

How can I make groupby() group duplicates by a condition?

Example:

import regex
import itertools

m = "90.80.19 90.43.19 908019 92.11.15 90.80.19 930000"
reg = regex.compile(r"\d\d\.?\d\d\.?\d\d")  # raw string, so \d and \. are not treated as string escapes
[list(g) for k, g in itertools.groupby(sorted(reg.findall(m)))]

Output: [['90.43.19'], ['90.80.19', '90.80.19'], ['908019'], ['92.11.15'], ['930000']]

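Without a key, groupby() only groups exact duplicates: only the two identical 90.80.19 strings have been grouped.

What I want is to group by the regex above, treating the optional \.? as irrelevant, so that 90.80.19 and 908019 end up in the same group.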

Expected output: [['90.43.19'], ['90.80.19', '90.80.19', '908019'], ['92.11.15'], ['930000']]

Is it possible to make groupby() group by such a condition?

Solution

Pass a custom key function to itertools.groupby(iterable, key=None) that strips the dots before the items are compared, as shown below (the input string was extended with one more value, 921115):

import re, itertools

s = "90.80.19 90.43.19 908019 92.11.15 90.80.19 930000 921115"
matches = re.findall(r'\d\d\.?\d\d\.?\d\d', s)
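def strip_dots(x):
    # '90.80.19' and '908019' both normalize to '908019'
    return x.replace('.', '')

# groupby() only merges adjacent items, so sort by the grouping key first
# (the raw string is a tie-breaker that keeps the order within a group deterministic)
ordered = sorted(matches, key=lambda x: (strip_dots(x), x))
result = [list(g) for k, g in itertools.groupby(ordered, key=strip_dots)]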

print(result)

The output:

[['90.43.19'], ['90.80.19', '90.80.19', '908019'], ['92.11.15', '921115'], ['930000']]
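
One caveat: groupby() only merges adjacent items with equal keys, so the list has to be sorted in an order that agrees with the key function. A plain lexicographic sort does not always do that, because '.' sorts before the digits. The sketch below illustrates this with a hypothetical extra value, 90.90.00, which is not part of the original input; strip_dots is just an illustrative helper name.

```python
import itertools

def strip_dots(x):
    # Normalize a match by removing the optional dots.
    return x.replace('.', '')

# '90.90.00' is a hypothetical extra value, not part of the original input.
values = ['908019', '90.90.00', '90.80.19']

# Plain sort: '90.90.00' lands between the two spellings of 908019,
# so groupby() sees them as two separate runs.
split = [list(g) for _, g in itertools.groupby(sorted(values), key=strip_dots)]
print(split)    # [['90.80.19'], ['90.90.00'], ['908019']]

# Sorting by the grouping key makes items with equal keys adjacent.
by_key = sorted(values, key=strip_dots)
grouped = [list(g) for _, g in itertools.groupby(by_key, key=strip_dots)]
print(grouped)  # [['908019', '90.80.19'], ['90.90.00']]
```

Because sorted() is stable, items within a group keep their original relative order when only the normalized key is used for sorting.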