Tags: python, regex, python-3.x, pattern-matching, python-itertools

How to let groupby() group doubles with a condition?


Example:

import regex
import itertools

m = "90.80.19 90.43.19 908019 92.11.15 90.80.19 930000"
reg = regex.compile(r"\d\d\.?\d\d\.?\d\d")
[list(g) for k, g in itertools.groupby(sorted(reg.findall(m)))]

Output: [['90.43.19'], ['90.80.19', '90.80.19'], ['908019'], ['92.11.15'], ['930000']]

groupby() groups only identical strings: just the duplicated 90.80.19 has been grouped.

What I want is to group by the regex above, in which each \.? is optional, so that 90.80.19 and 908019 end up in the same group.

Expected output: [['90.43.19'], ['90.80.19', '90.80.19', '908019'], ['92.11.15'], ['930000']]

Is it possible to let groupby() group with a condition?


Solution

  • Use a custom key function for itertools.groupby(iterable, key=None) as shown below (the initial input string was extended):

    import re, itertools
    
    s = "90.80.19 90.43.19 908019 92.11.15 90.80.19 930000 921115"
    matches = re.findall(r'\d\d\.?\d\d\.?\d\d', s)
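    # Strip the optional dots so that '90.80.19' and '908019' share one key
    def key(x):
        return x.replace('.', '')

    # Sort on the same key (ties broken by the raw string) so equal keys are adjacent
    ordered = sorted(matches, key=lambda x: (key(x), x))
    result = [list(g) for k, g in itertools.groupby(ordered, key=key)]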
    
    print(result)
    

    The output:

    [['90.43.19'], ['90.80.19', '90.80.19', '908019'], ['92.11.15', '921115'], ['930000']]
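
Note that groupby() only merges consecutive items with equal keys, so the result depends on how the input was ordered. If that is a concern, a dictionary keyed on the dot-stripped form gives the same grouping without relying on adjacency. The following is only a minimal sketch of that alternative; the helper name `group_codes` is made up for illustration and is not part of the original answer:

```python
import re
from collections import defaultdict

def group_codes(text):
    """Group codes that differ only in their optional dots (hypothetical helper)."""
    groups = defaultdict(list)
    for match in re.findall(r'\d\d\.?\d\d\.?\d\d', text):
        # The dot-stripped form is the grouping key, e.g. '90.80.19' -> '908019'
        groups[match.replace('.', '')].append(match)
    # Order the groups by key and the members inside each group
    return [sorted(groups[k]) for k in sorted(groups)]

s = "90.80.19 90.43.19 908019 92.11.15 90.80.19 930000 921115"
print(group_codes(s))
```

For the extended input string this should print the same list as the output shown above. Because each match is filed under its key directly, a string such as "90.80.19 90.90.00 908019" is also grouped correctly, even though a plain lexicographic sort would place 90.90.00 between the two forms of 90.80.19.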