Search code examples
pythonpandasaverageweighted-average

Get surface weighted average of multiple columns in pandas dataframe


I want to take the surface-weighted average of the columns in my dataframe. I have two surface-columns and two U-value-columns. I want to create an extra column 'U_av' (surface-weighted-average U-value) and U_av = (A1*U1 + A2*U2) / (A1+A2). If NaN occurs in one of the columns, NaN should be returned.

Initial df:

   ID  A1   A2    U1  U2
0  14   2  1.0  10.0  11
1  16   2  2.0  12.0  12
2  18   2  3.0  24.0  13
3  20   2  NaN   8.0  14
4  22   4  5.0  84.0  15
5  24   4  6.0  84.0  16

Desired Output:

   ID  A1   A2    U1  U2   U_av
0  14   2  1.0  10.0  11  10.33
1  16   2  2.0  12.0  12     12
2  18   2  3.0  24.0  13   17.4
3  20   2  NaN   8.0  14    NaN
4  22   4  5.0  84.0  15  45.66
5  24   4  6.0  84.0  16   43.2

Code:

import numpy as np

import pandas as pd



df = pd.DataFrame({"ID": [14,16,18,20,22,24],
                   "A1": [2,2,2,2,4,4],
  
                   "U1": [10,12,24,8,84,84],
   
                   "A2": [1,2,3,np.nan,5,6],
   
                   "U2": [11,12,13,14,15,16]})



print(df)

#the mean of two columns U1 and U2 and dropping NaN is easy (U1+U2/2 in this case) 
#but what to do for the surface-weighted mean (U_av = (A1*U1 + A2*U2) / (A1+A2))?
df.loc[:,'Umean'] = df[['U1','U2']].dropna().mean(axis=1)



EDIT:
adding to the solutions below:
df["U_av"] = (df.A1.mul(df.U1) + df.A2.mul(df.U2)).div(df[['A1','A2']].sum(axis=1))



Solution

  • Hope I got you correct:

    df['U_av'] = (df['A1']*df['U1'] + df['A2']*df['U2']) / (df['A1']+df['A2'])
    df
    
        ID  A1  U1  A2  U2  U_av
    0   14  2   10  1.0 11  10.333333
    1   16  2   12  2.0 12  12.000000
    2   18  2   24  3.0 13  17.400000
    3   20  2   8   NaN 14  NaN
    4   22  4   84  5.0 15  45.666667
    5   24  4   84  6.0 16  43.200000