python list group-by grouping python-itertools

Group tuples inside a list by matching positions of two of its sub-elements


I have a list of tuples as shown below. Each tuple is itself a nested tuple containing 3 sub-elements (also tuples).

[(('a', 'apple'), ('b', 'mango'), ('c', 'grapes')),
 (('a', 'apple'), ('b', 'mango'), ('c', 'grapes')),
 (('e', 'apple'), ('b', 'mango'), ('c', 'grapes')),
 (('a', 'apple'), ('d', 'mango'), ('c', 'peach')),
 (('e', 'apple'), ('d', 'mango'), ('f', 'grapes')),
 (('f', 'grapes'), ('e', 'apple'), ('d', 'mango')),
 (('f', 'peach'), ('e', 'apple'), ('e', 'mango')),
 (('f', 'grapes'), ('c', 'apple'), ('d', 'mango')), 
 (('e', 'apple'), ('f', 'grapes'), ('d', 'mango')),
 (('a', 'apple'), ('c', 'grapes'), ('b', 'mango')),
 ]

I want to group these tuples by the positions at which two of their elements, apple and mango, appear inside each tuple. These two values are fixed and known beforehand.

Desired output:

[
# apple and mango at positions 1 and 2.
[(('a', 'apple'), ('b', 'mango'), ('c', 'grapes')),
 (('a', 'apple'), ('b', 'mango'), ('c', 'grapes')),
 (('e', 'apple'), ('b', 'mango'), ('c', 'grapes')),
 (('a', 'apple'), ('d', 'mango'), ('c', 'peach')),
 (('e', 'apple'), ('d', 'mango'), ('f', 'grapes'))],

# apple and mango at positions 2 and 3.
 [(('f', 'grapes'), ('e', 'apple'), ('d', 'mango')),
 (('f', 'peach'), ('e', 'apple'), ('e', 'mango')),
 (('f', 'grapes'), ('c', 'apple'), ('d', 'mango'))], 

# apple and mango at positions 1 and 3.
 [(('e', 'apple'), ('f', 'grapes'), ('d', 'mango')),
 (('a', 'apple'), ('c', 'grapes'), ('b', 'mango'))]
 ]

I tried using Counter and also checked some other examples, but couldn't get close to the desired output. Any help or pointers would be really appreciated.


Solution

  • My go-to solution for grouping tasks like this is collections.defaultdict. I've written a lengthy answer about grouping things, which you can read here. Picking out the relevant snippets from that answer gives us this piece of code:

    import collections
    
    groupdict = collections.defaultdict(list)
    for value in your_list_of_tuples:  # input
        group = ???  # group identifier
        groupdict[group].append(value)
    
    result = list(groupdict.values())  # output
    

    All that's left is to find a way to uniquely represent each group with a hashable value (that is, we need to fill in the group = ??? line).
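To illustrate why the group identifier has to be hashable, here is a minimal sketch: dictionary keys must support hashing, so a tuple works while a list does not. The sample keys and the `list_key_ok` flag are made up for this illustration.

```python
import collections

groupdict = collections.defaultdict(list)

# A tuple is hashable, so it can serve as a group identifier.
groupdict[('apple', 'mango', None)].append('first item')

# A list is mutable and therefore unhashable; using it as a key raises TypeError.
try:
    groupdict[['apple', 'mango', None]].append('second item')
    list_key_ok = True
except TypeError:
    list_key_ok = False
```

This is why the identifier used in the rest of the answer is built with tuple(...) rather than as a list.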

    The easiest solution is probably to extract the apple and mango values from the nested tuples and replace all other values with None:

    >>> tup = (('a', 'apple'), ('c', 'grapes'), ('b', 'mango'))
    >>> tuple((t[1] if t[1] in {'apple','mango'} else None) for t in tup)
    ('apple', None, 'mango')
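As a sketch of how this expression separates the three layouts from the question, it can be wrapped in a small helper and applied to one tuple from each desired group. The names `group_key`, `wanted`, and `samples` are hypothetical and not part of the original answer.

```python
def group_key(tup, wanted=frozenset({'apple', 'mango'})):
    """Keep the wanted values in place and replace everything else with None."""
    return tuple(t[1] if t[1] in wanted else None for t in tup)

# One representative tuple from each of the three desired groups.
samples = [
    (('a', 'apple'), ('b', 'mango'), ('c', 'grapes')),
    (('f', 'grapes'), ('e', 'apple'), ('d', 'mango')),
    (('e', 'apple'), ('f', 'grapes'), ('d', 'mango')),
]

keys = [group_key(s) for s in samples]
# keys == [('apple', 'mango', None),
#          (None, 'apple', 'mango'),
#          ('apple', None, 'mango')]
```

Each layout yields a distinct tuple, and tuples of strings and None are hashable, so they can be used directly as dictionary keys.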
    

    Add that in and we're done:

    import collections
    
    groupdict = collections.defaultdict(list)
    for value in your_list_of_tuples:
        group = tuple((t[1] if t[1] in {'apple','mango'} else None) for t in value)
        groupdict[group].append(value)
    
    result = list(groupdict.values())
    
    # result:
    # [[(('a', 'apple'), ('b', 'mango'), ('c', 'grapes')),
    #   (('a', 'apple'), ('b', 'mango'), ('c', 'grapes')),
    #   (('e', 'apple'), ('b', 'mango'), ('c', 'grapes')),
    #   (('a', 'apple'), ('d', 'mango'), ('c', 'peach')),
    #   (('e', 'apple'), ('d', 'mango'), ('f', 'grapes'))],
    #  [(('f', 'grapes'), ('e', 'apple'), ('d', 'mango')),
    #   (('f', 'peach'), ('e', 'apple'), ('e', 'mango')),
    #   (('f', 'grapes'), ('c', 'apple'), ('d', 'mango'))],
    #  [(('e', 'apple'), ('f', 'grapes'), ('d', 'mango')),
    #   (('a', 'apple'), ('c', 'grapes'), ('b', 'mango'))]]
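For convenience, here is a self-contained sketch that runs the approach above on the exact list from the question. The function name `group_by_positions` and its `wanted` parameter are assumptions introduced for this example. It also assumes Python 3.7 or later, where dictionaries preserve insertion order, so the groups come out in order of first appearance.

```python
import collections

def group_by_positions(tuples, wanted=('apple', 'mango')):
    """Group nested tuples by where the wanted values sit inside them."""
    wanted = set(wanted)
    groupdict = collections.defaultdict(list)
    for value in tuples:
        # Group identifier: wanted values stay in place, others become None.
        group = tuple(t[1] if t[1] in wanted else None for t in value)
        groupdict[group].append(value)
    return list(groupdict.values())

your_list_of_tuples = [
    (('a', 'apple'), ('b', 'mango'), ('c', 'grapes')),
    (('a', 'apple'), ('b', 'mango'), ('c', 'grapes')),
    (('e', 'apple'), ('b', 'mango'), ('c', 'grapes')),
    (('a', 'apple'), ('d', 'mango'), ('c', 'peach')),
    (('e', 'apple'), ('d', 'mango'), ('f', 'grapes')),
    (('f', 'grapes'), ('e', 'apple'), ('d', 'mango')),
    (('f', 'peach'), ('e', 'apple'), ('e', 'mango')),
    (('f', 'grapes'), ('c', 'apple'), ('d', 'mango')),
    (('e', 'apple'), ('f', 'grapes'), ('d', 'mango')),
    (('a', 'apple'), ('c', 'grapes'), ('b', 'mango')),
]

result = group_by_positions(your_list_of_tuples)
group_sizes = [len(group) for group in result]
# group_sizes == [5, 3, 2]
```

Note that this key records which of the two values sits at which position, so a tuple with mango before apple would land in a different group from one with apple before mango. The sample data never has that ordering, so whether that distinction is wanted is left open here.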