
Python Pandas: How can I sum all of the values of a dictionary in a column of my dataframe?


Here is my dataframe:

    name                            count_dic
0  name1  {'x2,bv.': 435, 'x3': 4, 'x1': 123}
1  name2            {'x5': 98, 'x2,bv.': 435}

and I want to sum up all of the values of the dict in the 'count_dic' column to get something like this:

    name                            count_dic   sum_vals
0  name1  {'x2,bv.': 435, 'x3': 4, 'x1': 123}    562
1  name2            {'x5': 98, 'x2,bv.': 435}    533

Here is what I have tried:

df_map.count_dic.apply(lambda L: sum(L.values())).sum()

But I am getting the following error:

TypeError: unsupported operand type(s) for +: 'dict_values' and 'dict_values'

Can anybody help?


Solution

  • Note: Your dataframe structure looks a bit odd to me, and will probably perform quite poorly if the dataset gets big.


    In any case, your code appears well-formed [tested on Python 2.7.8 and 3.4.1]:

     df = pd.DataFrame(columns = ['name','count_dic'])
     df.loc[0] = ['name0',{'x2,bv.': 435, 'x3': 4, 'x1': 123}]
     df.loc[1] = ['name1',{'x5': 98, 'x2,bv.': 435}]
    
     df.count_dic.apply(lambda x : sum(x.values())).sum()
    
           1095
    

    and if you want the sums by row:

     df.count_dic.apply(lambda x : sum(x.values()))
    
            0    562
            1    533
        Name: count_dic, dtype: int64
    

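    To get the `sum_vals` column shown in the desired output, you can assign the per-row result back to the dataframe. A minimal sketch (reusing the question's data and the `sum_vals` name from the question):

    ```python
    import pandas as pd

    df = pd.DataFrame(columns=['name', 'count_dic'])
    df.loc[0] = ['name1', {'x2,bv.': 435, 'x3': 4, 'x1': 123}]
    df.loc[1] = ['name2', {'x5': 98, 'x2,bv.': 435}]

    # Assign the per-row sums to a new column
    df['sum_vals'] = df.count_dic.apply(lambda x: sum(x.values()))
    # df['sum_vals'] now holds 562 for name1 and 533 for name2
    ```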
    The user had a further problem specific to the type of the values involved: they were not ints, so an explicit cast was needed.

     df.count_dic.apply(lambda x : sum(int(y) for y in x.values()))
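
    For example, if the dictionary values were stored as strings (an assumption about what the user's data looked like), `sum()` alone would raise a TypeError, and the cast fixes it:

    ```python
    import pandas as pd

    df = pd.DataFrame(columns=['name', 'count_dic'])
    # Values stored as strings rather than ints
    df.loc[0] = ['name1', {'x2,bv.': '435', 'x3': '4', 'x1': '123'}]
    df.loc[1] = ['name2', {'x5': '98', 'x2,bv.': '435'}]

    # Cast each value to int before summing
    totals = df.count_dic.apply(lambda x: sum(int(y) for y in x.values()))
    # totals holds 562 for name1 and 533 for name2
    ```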