Search code examples
pythonpython-3.xlistdictionaryunique

Group and aggregate a list of 2n dictionaries by multiple keys


I have this list of dicts (just a sample the list is bigger):

my_list = [
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'}, 
 'values': [[1671256800, '100'], [1671260400, '100']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'}, 
 'values': [[1671256800, '100'], [1671260400, '100']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'}, 
 'values': [[1671256800, '300'], [1671260400, '300']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'}, 
 'values': [[1671256800, '300'], [1671260400, '300']]}]

I want to sum all values for each account,email_domain by version, and update my_list accordingly. Desired output:

my_list = [
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'}, 
 'values': [[1671256800, '200'], [1671260400, '200']]},
{'metric': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'}, 
 'values': [[1671256800, '600'], [1671260400, '600']]}]

Notes:

  • In 'values': [[1671256800, '600'], [1671260400, '600']] the first value of each array is timestamp (1671256800,1671260400).
  • I went through a lot of threads on this site before posting this question. For this use case I could not find a correct syntax for a list of 2n dictionaries. Your help is much appreciated!

I tried following group-and-aggregate-a-list-of-dictionaries-by-multiple-keys.

I started:

d = (pd.DataFrame(my_list)).groupby(['metric']['ebs_account'], ['metric']['version']).values.

Solution

  • You can use defaultdict with frozenset

    from collections import defaultdict
    
    
    group_dict = defaultdict(dict)
    
    for record in my_list:
        key = frozenset(record['metric'].items())
        for x, y in record['values']:
            group_dict[key][x] = group_dict[key].setdefault(x, 0) + int(y)
            
    res = [{'metrics': dict(k), 'values': [[k, str(vv)] for k, vv in v.items()]} for k, v in group_dict.items()]
    
    print(res)
    

    Output:

    [{'metrics': {'account': '1', 'email_domain': 'gmail.com', 'version': 'a'},
      'values': [[1671256800, '200'], [1671260400, '200']]},
     {'metrics': {'account': '1', 'email_domain': 'gmail.com', 'version': 'b'},
      'values': [[1671256800, '600'], [1671260400, '600']]}]