Search code examples
pythonpandasmultiple-columnscalculated-columnsweighted-average

Weighted Mean as a Column in Pandas


I am trying to add a column with the weighted average of 4 columns with 4 columns of weights

df = pd.DataFrame.from_dict(dict([('A', [2000, 1000, 2509, 2145]),
                                  ('A_Weight', [37, 47, 33, 16]),
                                  ('B', [2100, 1500, 2000, 1600]),
                                  ('B_weights', [17, 21, 6, 2]),
                                  ('C', [2500, 1400, 0, 2300]),
                                  ('C_weights', [5, 35, 0, 40]),
                                  ('D', [0, 1600, 2100, 2000]),
                                  ('D_weights', [0, 32, 10, 5])]))

I want the weighted average to be in a new column named "WA" but every time I try it displays NaN

Desired Dataframe would be a new column with the following values as ex:

Formula I used (((A * A_weight)+(B * b_weight)+(C * C_weight)+(D * D_weight)) / sum(all weights)

df['WA'] = [2071.19,1323.70, 2363.20,2214.60 ]

Thank you


Solution

  • A straight-forward and simple way to do is as follows:

    (Since your columns name for the weights are not consistently named, e.g. some with 's' and some without, some with capital 'W' and some with lower case 'w', it is not convenient to group columns e.g. by .filter())

    df['WA'] = ( (df['A'] * df['A_Weight']) + (df['B'] * df['B_weights']) + (df['C'] * df['C_weights']) + (df['D'] * df['D_weights']) ) / (df['A_Weight'] + df['B_weights'] + df['C_weights'] + df['D_weights'])
    

    Result:

    print(df)
    
    
          A  A_Weight     B  B_weights     C  C_weights     D  D_weights           WA
    0  2000        37  2100         17  2500          5     0          0  2071.186441
    1  1000        47  1500         21  1400         35  1600         32  1323.703704
    2  2509        33  2000          6     0          0  2100         10  2363.204082
    3  2145        16  1600          2  2300         40  2000          5  2214.603175