Search code examples
pandasmissing-dataweighted-average

Weighted Average - omit data if missing from value or weight


I have code like this

>>> import pandas as pd
>>> import numpy as np
>>> 
>>> df1 = pd.DataFrame({'value':[10,20,np.nan,40],
...                         'weight':[1,np.nan,3,4]}) 
>>> df1
   value  weight
0   10.0     1.0
1   20.0     NaN
2    NaN     3.0
3   40.0     4.0
>>> (df1["value"] * df1["weight"]).sum() / df1["weight"].sum()
21.25

I want to omit data from calculation if values or weight is missing . i.e I want weighted average like like (10*1 + 40*4) /(1+4) = 34

Please help if this is possible using single expression in pandas.


Solution

  • You can filter first with boolean indexing, mask is created by notnull and all for check all True values per row:

    df1 = df1[df1.notnull().all(axis=1)]
    print (df1)
       value  weight
    0   10.0     1.0
    3   40.0     4.0
    
    df2 = (df1["value"] * df1["weight"]).sum() / df1["weight"].sum()
    print (df2)
    34.0
    

    Or check both columns separately:

    df1 = df1[df1["value"].notnull() & df1["weight"].notnull()]
    print (df1)
       value  weight
    0   10.0     1.0
    3   40.0     4.0
    

    Simplier solution with dropna:

    df1 = df1.dropna()
    print (df1)
       value  weight
    0   10.0     1.0
    3   40.0     4.0
    

    Or if is necessary specify columns:

    df1 = df1.dropna(subset=['value','weight'])
    print (df1)
       value  weight
    0   10.0     1.0
    3   40.0     4.0