Search code examples
pythonpandasdataframeweighted-average

Weighted average grouping by date (in index) in pandas DataFrame (different operations per column)


I have a dataframe with several rows per day, a 'mass' column and a '%' value that needs to be ponderated as a weighted average depending on the mass; and the mass column a sum... creating a new dataframe with all values.

d = {'date': [1, 1, 1, 2, 2], 'mass': [3, 40, 10, 12, 15], '%': [0.4, 0.7, 0.9, 0.1, 0.2]}
df = pd.DataFrame(data=d)
df.set_index('date')

I need the output to be like this:

date(index) | mass | %
1           |  53  | 0.72
2           |  27  | 0.46

Being '%' calculated as a weighted average:

0.72 = ((0.4 * 3) + (0.7 * 40) + (0.9 * 10))/(3 + 40 +10)

And the mass a sum per day.


Solution

  • Multiply the 2 columns and then groupby with aggregate, then divide:

    #df = df.set_index('date')
    out = df.assign(k=df['mass'].mul(df['%']))[['mass','k']].sum(level=0)
    
    out['%'] = out.pop('k').div(out['mass'])
    

    print(out)
    
          mass         %
    date                
    1       53  0.720755
    2       27  0.155556  #<- Note that ((12*.1)+(15*.2))/(15+12) != 0.46