
Grouping a list of tuples in Python


I have a list of tuples that I have already sorted by the 2nd item. Now I want to group the list by that 2nd item and collect the 1st items of each group into a list.

This is my input:

[('aaa', 1), ('bbb', 1), ('ccc', 2), ('ddd', 2), ('eee', 3)]

and what I need is this:

[(g1, 1, ['aaa', 'bbb']), (g2, 2, ['ccc', 'ddd']), (g3, 1, ['eee'])]

In each output tuple, the 1st item is an incrementing id, the 2nd is the number of items in the group, and the 3rd is the list of grouped 1st items. How can this be implemented in Python? I have already tried itertools but got nowhere. Any help would be appreciated.


Solution

  • One way would be to do it in steps:

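    >>> from itertools import groupby
    >>> seq = [('aaa', 1), ('bbb', 1), ('ccc', 2), ('ddd', 2), ('eee', 3)]
    >>> grouped = enumerate(groupby(seq, key=lambda x: x[1]), 1)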
    >>> extracted = ((i, [g[0] for g in gg]) for i, (k,gg) in grouped)
    >>> final = [(i, len(x), x) for i,x in extracted]
    >>> final
    [(1, 2, ['aaa', 'bbb']), (2, 2, ['ccc', 'ddd']), (3, 1, ['eee'])]
    

    But even though each line makes sense on its own, I think it's hard to see what it's actually doing. Using a generator function makes everything much clearer:

    from itertools import groupby

    def grouper(elems):
        grouped = groupby(elems, key=lambda x: x[1])
        for i, (k, group) in enumerate(grouped, 1):
            vals = [g[0] for g in group]
            yield i, len(vals), vals
    
    >>> list(grouper(seq))
    [(1, 2, ['aaa', 'bbb']), (2, 2, ['ccc', 'ddd']), (3, 1, ['eee'])]
    

    (Here I've arbitrarily used an integer index starting at one for your g1/g2/g3; it'd be easy to yield 'g{}'.format(i) as the first element instead, or something similar.)
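For completeness, here is a minimal sketch of that string-label variant. The name `labelled_grouper` and the `prefix` parameter are made up for illustration, and `operator.itemgetter(1)` is swapped in for the lambda. Like the versions above, it assumes the input is already sorted by the 2nd item, since `groupby` only merges adjacent items with equal keys.

```python
from itertools import groupby
from operator import itemgetter


def labelled_grouper(elems, prefix="g"):
    """Yield (label, count, values) for each run of equal 2nd items.

    Hypothetical helper: assumes elems is already sorted by the 2nd item.
    """
    runs = groupby(elems, key=itemgetter(1))
    for i, (_, group) in enumerate(runs, 1):
        # Collect the 1st item of every tuple in this run
        vals = [item[0] for item in group]
        # Build the label from the prefix and the running index
        yield "{}{}".format(prefix, i), len(vals), vals


seq = [('aaa', 1), ('bbb', 1), ('ccc', 2), ('ddd', 2), ('eee', 3)]
result = list(labelled_grouper(seq))
print(result)
# [('g1', 2, ['aaa', 'bbb']), ('g2', 2, ['ccc', 'ddd']), ('g3', 1, ['eee'])]
```

If the input might not be sorted, calling `sorted(seq, key=itemgetter(1))` first would keep equal keys adjacent.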