
How can I group the items in a list according to a "key" in the list elements?

I have a list of tuples like this:

one = [(4, 'a'), (3, 'b'), (2, 'c'), (3, 'd'), (5, 'e'), (6, 'f')]

and I want to group the items of list one by that integer to create a new list that looks like this:

final = [(g1, 2, ['c']), (g2, 3, ['b','d']), (g3, 4, ['a']), (g4, 5, ['e']), (g5, 6, ['f'])]

I have no idea how to create the final list. How can I do this in Python? Any ideas would be appreciated. Thank you.

Note: g1, g2, and so on are just strings with an incrementing number.


  • Since you want the output to be sorted, you can sort the original list by the first element of each tuple:

    >>> first = lambda x: x[0]
    >>> one_sorted = sorted(one, key=first)

    Then you can group the elements by their first element with itertools.groupby. groupby only groups consecutive items that share a key, which is why the list has to be sorted first:

    groupby(one_sorted, first)

    Since you want to number the groups in ascending order, starting at 1, you can wrap it with enumerate like this:

    enumerate(groupby(one_sorted, first), 1)

    Then you can unpack the result of enumerate in a for loop, like this:

    for index, (item, group) in enumerate(groupby(one_sorted, first), 1):

    Now you just have to construct the result list. You can use a list comprehension to do that, like this:

    >>> from itertools import groupby
    >>> [(index, item, [j[1] for j in group])
    ...     for index, (item, group) in enumerate(groupby(one_sorted, first), 1)]
    [(1, 2, ['c']), (2, 3, ['b', 'd']), (3, 4, ['a']), (4, 5, ['e']), (5, 6, ['f'])]

    [j[1] for j in group] iterates over the tuples in each group and fetches the second element of each, which is the actual string.

    Alternatively, you can group the elements in a dictionary keyed by the number, like this:

    >>> groups = {}
    >>> for number, string in one:
    ...     groups.setdefault(number, []).append(string)
    ...
    >>> groups
    {4: ['a'], 3: ['b', 'd'], 2: ['c'], 5: ['e'], 6: ['f']}

    Since the dictionary is not ordered by key, apply enumerate to its sorted keys, like this:

    >>> [(index, number, groups[number])
    ...     for index, number in enumerate(sorted(groups), 1)]
    [(1, 2, ['c']), (2, 3, ['b', 'd']), (3, 4, ['a']), (4, 5, ['e']), (5, 6, ['f'])]
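Both approaches above number the groups with plain integers, while the question asks for labels like g1, g2. Below is a minimal self-contained sketch of the groupby approach that produces that format, assuming the label is simply "g" followed by the 1-based group index. The function name group_with_labels and its prefix parameter are hypothetical, chosen for illustration; operator.itemgetter(0) is used in place of the first lambda and behaves the same way here.

```python
from itertools import groupby
from operator import itemgetter

one = [(4, 'a'), (3, 'b'), (2, 'c'), (3, 'd'), (5, 'e'), (6, 'f')]


def group_with_labels(pairs, prefix="g"):
    """Hypothetical helper: group (number, string) pairs by number and
    label the groups prefix1, prefix2, ... in ascending order of number."""
    key = itemgetter(0)  # same as lambda x: x[0]
    # groupby only merges adjacent items with equal keys, so sort first.
    # sorted() is stable, so strings sharing a number keep their input order.
    return [
        (f"{prefix}{index}", number, [string for _, string in group])
        for index, (number, group) in enumerate(groupby(sorted(pairs, key=key), key), 1)
    ]


final = group_with_labels(one)
print(final)
# [('g1', 2, ['c']), ('g2', 3, ['b', 'd']), ('g3', 4, ['a']), ('g4', 5, ['e']), ('g5', 6, ['f'])]
```

The labels are built as strings; if the label is only needed for display, the integer index from enumerate can be kept and formatted later.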
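To see why the groupby approach sorts the list first, here is a small comparison on the same data: without sorting, groupby starts a new group every time the key changes, so the two tuples with key 3 end up in separate groups. The helper name summarize is hypothetical, used only for this demonstration.

```python
from itertools import groupby
from operator import itemgetter

one = [(4, 'a'), (3, 'b'), (2, 'c'), (3, 'd'), (5, 'e'), (6, 'f')]
first = itemgetter(0)  # key function: the integer in each tuple


def summarize(pairs):
    # Hypothetical helper: one (key, [strings]) entry per run that groupby yields.
    return [(k, [s for _, s in g]) for k, g in groupby(pairs, first)]


unsorted_groups = summarize(one)
sorted_groups = summarize(sorted(one, key=first))

print(unsorted_groups)
# [(4, ['a']), (3, ['b']), (2, ['c']), (3, ['d']), (5, ['e']), (6, ['f'])]
print(sorted_groups)
# [(2, ['c']), (3, ['b', 'd']), (4, ['a']), (5, ['e']), (6, ['f'])]
```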
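The dictionary approach can also be written with collections.defaultdict from the standard library, which creates the empty list automatically the first time a key is used and so replaces the setdefault call. This is a sketch of that swapped-in variant; the grouping and numbering logic is otherwise the same as above.

```python
from collections import defaultdict

one = [(4, 'a'), (3, 'b'), (2, 'c'), (3, 'd'), (5, 'e'), (6, 'f')]

# defaultdict(list) inserts an empty list for a missing key on first access.
groups = defaultdict(list)
for number, string in one:
    groups[number].append(string)

# Number the groups in ascending order of their key, starting at 1.
result = [(index, number, groups[number])
          for index, number in enumerate(sorted(groups), 1)]
print(result)
# [(1, 2, ['c']), (2, 3, ['b', 'd']), (3, 4, ['a']), (4, 5, ['e']), (5, 6, ['f'])]
```

Unlike the groupby version, this does not need the input list to be sorted: it makes a single pass over the tuples and only sorts the distinct keys.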