Tags: python, python-2.7, python-collections

What is the most efficient way to sum a list of dicts with multiple keys, grouped by one key?


I have the following dict structure.

product1 = {'product_tmpl_id': product_id,
            'qty': product_uom_qty,
            'price': price_unit,
            'subtotal': price_subtotal,
            'total': price_total,
            }

Then I have a list of products, where each item is a dict with the structure above:

list_ = [product1,product2,product3,.....]

I need to sum the items in the list, grouped by the key product_tmpl_id. I'm using collections.defaultdict, but my code only sums the qty key. I need to sum every key except product_tmpl_id, which is the grouping criterion:

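from collections import defaultdict

c = defaultdict(float)
for d in list_:
    # only the 'qty' field is accumulated per product template id
    c[d['product_tmpl_id']] += d['qty']
c = [{'product_id': pid, 'qty': qty} for pid, qty in c.items()]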
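A minimal sketch of extending the same defaultdict idea to every field, assuming every non-grouping value is numeric. The helper name sum_grouped and the sample rows are hypothetical, for illustration only:

```python
from collections import defaultdict


def sum_grouped(rows, group_key):
    """Sum every field except group_key, grouping rows by group_key.

    Hypothetical helper: assumes all non-grouping values are numeric.
    """
    # group id -> (field name -> running total); missing fields start at 0.0
    totals = defaultdict(lambda: defaultdict(float))
    order = []  # first-seen order of group ids (plain dicts are unordered on 2.7)
    for row in rows:
        gid = row[group_key]
        if gid not in totals:
            order.append(gid)
        group = totals[gid]  # creates the group on first sight
        for field, value in row.items():
            if field != group_key:
                group[field] += value
    # Rebuild one flat dict per group, putting the grouping key back in.
    result = []
    for gid in order:
        merged = {group_key: gid}
        merged.update(totals[gid])
        result.append(merged)
    return result


# Hypothetical sample rows in the question's shape.
list_ = [
    {'product_tmpl_id': 1, 'qty': 2, 'price': 5.0, 'subtotal': 10.0, 'total': 11.0},
    {'product_tmpl_id': 2, 'qty': 1, 'price': 3.0, 'subtotal': 3.0, 'total': 3.3},
    {'product_tmpl_id': 1, 'qty': 3, 'price': 5.0, 'subtotal': 15.0, 'total': 16.5},
]
summed = sum_grouped(list_, 'product_tmpl_id')
```

Because the totals start from defaultdict(float), integer fields such as qty come back as floats; use defaultdict(int) instead if every field is an integer. On Python 3.7+ plain dicts already keep insertion order, so the order list only matters on older versions such as 2.7.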

I know how to do it with an explicit for loop, but I'm looking for a more Pythonic way.


EDIT:

What I need is to go from this:

lst = [
    {'Name': 'A', 'qty': 100, 'price': 10},
    {'Name': 'A', 'qty': 100, 'price': 10},
    {'Name': 'A', 'qty': 100, 'price': 10},
    {'Name': 'B', 'qty': 100, 'price': 10},
    {'Name': 'C', 'qty': 100, 'price': 10},
    {'Name': 'C', 'qty': 100, 'price': 10},
]

to this:

group_lst = [
    {'Name': 'A', 'qty': 300, 'price': 30},
    {'Name': 'B', 'qty': 100, 'price': 10},
    {'Name': 'C', 'qty': 200, 'price': 20},
]
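For this simplified data, collections.Counter can do the per-group summing, because Counter.update adds values for keys it has already seen instead of replacing them. A minimal sketch, assuming every field except Name is numeric; the variable names are illustrative:

```python
from collections import Counter, OrderedDict

lst = [
    {'Name': 'A', 'qty': 100, 'price': 10},
    {'Name': 'A', 'qty': 100, 'price': 10},
    {'Name': 'A', 'qty': 100, 'price': 10},
    {'Name': 'B', 'qty': 100, 'price': 10},
    {'Name': 'C', 'qty': 100, 'price': 10},
    {'Name': 'C', 'qty': 100, 'price': 10},
]

# One Counter per Name; OrderedDict keeps groups in first-seen order
# (plain dicts only guarantee insertion order from Python 3.7 on).
groups = OrderedDict()
for row in lst:
    counter = groups.setdefault(row['Name'], Counter())
    # Counter.update adds to existing values rather than overwriting them.
    counter.update({k: v for k, v in row.items() if k != 'Name'})

# Flatten back into plain dicts with the grouping key restored.
group_lst = []
for name, counter in groups.items():
    merged = {'Name': name}
    merged.update(counter)
    group_lst.append(merged)
```

This stays a single pass over the list and needs nothing outside the standard library.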

Solution

  • Using basic Python, this doesn't get a whole lot better. You could hack something together with itertools.groupby, but because groupby only groups consecutive items you would have to sort the list first, so it'd be ugly and probably slower, certainly less clear.

    As @9769953 suggested, though, Pandas is a good package to handle this sort of structured, tabular data.

    In [1]: import pandas as pd
    In [2]: df = pd.DataFrame(lst); df
    Out[2]:
      Name  price  qty
    0    A     10  100
    1    A     10  100
    2    A     10  100
    3    B     10  100
    4    C     10  100
    5    C     10  100
    In [3]: df.groupby('Name').sum()
    Out[3]:
          price  qty
    Name
    A        30  300
    B        10  100
    C        20  200
    

    If you want a plain list of dicts back rather than a DataFrame, you need one more step. as_index=False keeps Name as a regular column, so it ends up in each dict:

    In [4]: grouped = df.groupby('Name', as_index=False).sum()
    In [5]: grouped.to_dict('records')
    Out[5]:
    [{'Name': 'A', 'price': 30, 'qty': 300},
     {'Name': 'B', 'price': 10, 'qty': 100},
     {'Name': 'C', 'price': 20, 'qty': 200}]
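For comparison, here is a sketch of the itertools.groupby approach mentioned in the answer. groupby only merges adjacent items with equal keys, so the list must be sorted by Name first. That adds an O(n log n) step, which is one reason this tends to be no faster than the single-pass dict loop. It assumes the lst from the question and numeric fields other than Name:

```python
from itertools import groupby
from operator import itemgetter

lst = [
    {'Name': 'A', 'qty': 100, 'price': 10},
    {'Name': 'A', 'qty': 100, 'price': 10},
    {'Name': 'A', 'qty': 100, 'price': 10},
    {'Name': 'B', 'qty': 100, 'price': 10},
    {'Name': 'C', 'qty': 100, 'price': 10},
    {'Name': 'C', 'qty': 100, 'price': 10},
]

by_name = itemgetter('Name')
group_lst = []
# groupby only groups *adjacent* equal keys, so sort by Name first.
for name, rows in groupby(sorted(lst, key=by_name), key=by_name):
    rows = list(rows)  # materialize the group so it can be iterated per field
    # Assumes every row in a group has the same keys as the first one.
    fields = [k for k in rows[0] if k != 'Name']
    summed = {'Name': name}
    for field in fields:
        summed[field] = sum(row[field] for row in rows)
    group_lst.append(summed)
```

The result comes out sorted by Name rather than in first-seen order, which happens to be the same thing for this sample.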