
Group and aggregate a list of dictionaries by multiple keys


I have a list of dictionaries (List[Dict, Dict, ...]) that I would like to uniqify based on two keys. The value of a third key must not be lost in the process, so I want to collect its values in a list under that key. I am using Python 3.x.

Let's assume I have the following list of dictionaries with three keys: number, favorite, and color. I want to uniqify the list elements using the keys number and favorite. For dictionaries that share the same values of number and favorite, I'd like to store a list under the key color so that I keep all the colors for that combination. This list should itself be unique, since repeated colors for the same combination are not needed. However, if there is only one element for the key color in the final result, it should be a string and not a list.

lst = [
    {'number': 1, 'favorite': False, 'color': 'red'},
    {'number': 1, 'favorite': False, 'color': 'green'},
    {'number': 1, 'favorite': False, 'color': 'red'},
    {'number': 1, 'favorite': True, 'color': 'red'},
    {'number': 2, 'favorite': False, 'color': 'red'},
]

After uniqifying as described, I would expect the following result:

lst = [
    {'number': 1, 'favorite': False, 'color': {'red', 'green'}},
    {'number': 1, 'favorite': True, 'color': 'red'},
    {'number': 2, 'favorite': False, 'color': 'red'},
]

Note that there is only one instance of red where number is 1 and favorite is False, even though it appeared twice in the list before it was uniqified. Also note that when there is only one element for the key color, as in the second dict, it is a string and not a list.


Solution

  • Using pure Python, you can insert into an OrderedDict to retain insertion order:

    from collections import OrderedDict
    
    d = OrderedDict()
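    for item in lst:
        # Key each group by the (number, favorite) pair; the set drops repeated colors
        d.setdefault((item['number'], item['favorite']), set()).add(item['color'])
    
    # Unwrap one-element sets so that a lone color stays a plain string
    [{'number': k[0], 'favorite': k[1], 'color': v.pop() if len(v) == 1 else v}
        for k, v in d.items()]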
    # [{'color': {'green', 'red'}, 'favorite': False, 'number': 1},
    #  {'color': 'red', 'favorite': True, 'number': 1},
    #  {'color': 'red', 'favorite': False, 'number': 2}]
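
On Python 3.7 and later, a plain dict also preserves insertion order, so the same grouping can be wrapped in a reusable function without OrderedDict. The sketch below is one possible variant rather than the only way to do it: the function name group_colors and its parameters are made up for illustration, and it collects the values in a list (in first-seen order) instead of a set, since the question asks for a list.

```python
def group_colors(records, keys=('number', 'favorite'), collect='color'):
    """Group records by `keys`, gathering the unique `collect` values.

    Hypothetical helper for illustration; relies on dicts preserving
    insertion order (guaranteed since Python 3.7).
    """
    groups = {}
    for rec in records:
        group_key = tuple(rec[k] for k in keys)
        # An inner dict serves as an ordered set: its keys are unique
        # and stay in first-seen order.
        groups.setdefault(group_key, {})[rec[collect]] = None

    result = []
    for group_key, seen in groups.items():
        values = list(seen)
        row = dict(zip(keys, group_key))
        # A single value stays bare; several values become a list.
        row[collect] = values[0] if len(values) == 1 else values
        result.append(row)
    return result


lst = [
    {'number': 1, 'favorite': False, 'color': 'red'},
    {'number': 1, 'favorite': False, 'color': 'green'},
    {'number': 1, 'favorite': False, 'color': 'red'},
    {'number': 1, 'favorite': True, 'color': 'red'},
    {'number': 2, 'favorite': False, 'color': 'red'},
]

grouped = group_colors(lst)
# [{'number': 1, 'favorite': False, 'color': ['red', 'green']},
#  {'number': 1, 'favorite': True, 'color': 'red'},
#  {'number': 2, 'favorite': False, 'color': 'red'}]
```

Using a list keeps the result JSON-serializable, which a set is not; whether that matters depends on what happens to the data afterwards.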
    

    This can also be done quite easily using the pandas GroupBy API:

    import pandas as pd
    
    d = (pd.DataFrame(lst)
           .groupby(['number', 'favorite'])
           .color
           .agg(set)
           .reset_index()
           .to_dict('records'))
    d
    # [{'color': {'green', 'red'}, 'favorite': False, 'number': 1},
    #  {'color': {'red'}, 'favorite': True, 'number': 1},
    #  {'color': {'red'}, 'favorite': False, 'number': 2}]
    

    If a single element must be a plain string rather than a one-element set, you can use:

    [{'color': (lambda v: v.pop() if len(v) == 1 else v)(d_.pop('color')), **d_} 
         for d_ in d]
    # [{'color': {'green', 'red'}, 'favorite': False, 'number': 1},
    #  {'color': 'red', 'favorite': True, 'number': 1},
    #  {'color': 'red', 'favorite': False, 'number': 2}]
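
Note that the comprehension above calls d_.pop('color'), so it removes the color key from every dictionary in d as a side effect. If d is still needed afterwards, a non-mutating version is possible. The following is a minimal sketch: the helper name unwrap_singletons is an assumption for illustration, and the input is written out by hand to mimic the list produced by the groupby example.

```python
def unwrap_singletons(records, key='color'):
    """Return new dicts in which one-element sets under `key` become
    the bare element; the input records are left untouched.

    Hypothetical helper, not part of pandas or the standard library.
    """
    result = []
    for rec in records:
        new_rec = dict(rec)  # shallow copy, so `rec` is not modified
        values = rec[key]
        if len(values) == 1:
            # next(iter(...)) reads the only element without removing it
            new_rec[key] = next(iter(values))
        result.append(new_rec)
    return result


# Same shape as the list produced by the groupby example
d = [
    {'number': 1, 'favorite': False, 'color': {'green', 'red'}},
    {'number': 1, 'favorite': True, 'color': {'red'}},
    {'number': 2, 'favorite': False, 'color': {'red'}},
]

flat = unwrap_singletons(d)
```

After this call, flat holds the unwrapped records while d still contains the original sets.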