Search code examples
pythonpandasdataframeaverageweighted-average

Calculate the weighted average using groupby in Python


here is the dataframe I'm currently working on :

df_weight_0

enter image description here

What I'd like to calculate is the average of the variable "avg_lag" weighted by "tot_SKU" in each product_basket for both SMB and CORP groups. This means that, taking CORP as an example, I want to calculate something as:

(585,134 * 46.09 + 147,398 * 104.55 + ... + 1,112,941 * 75.73) / (585,134 + 147,398 + ... + 1,112,941)

and the same will need to be done for SMB.

Ideally, I'd like to have this two numbers in the dataframe above in a new column (SMB and CORP rows will have their weighted average repeated according to the two values calculated as shown above).

P.S. I will go deeper on more levels in my analysis so the most general the approach, the better.

Thanks in advance,

Stefano


Solution

  • So this should do the trick I think

    import pandas as pd
    
    
    def calculator(df, columns):
        weighted_sum = (df[columns[0]]*df[columns[1]]).sum()/df[columns[0]].sum()
        return weighted_sum
    
    cols = ['tot_SKU', 'avg_lag']
    
    Sums = df.groupby('SF_type').apply(lambda x: calculator(x, cols))
    df.join(Sums.rename(('sums')), on='SF_type')
    

    Edit: Added the requested merge with the old dataframe