python pandas data-cleaning

How can I fill null values with a mean using Pandas?


I'm having a hard time understanding why the apply function isn't working here. I'm trying to fill the null values in SalePrice with the mean sale price for the corresponding quality rating (OverallQual).

I expected the function to iterate through each row and return the mean SalePrice for the corresponding OverallQual value where SalePrice is null, and otherwise return the original SalePrice.

sale_price_by_qual = df.groupby('OverallQual')['SalePrice'].mean()

def fill_sales_price(SalePrice, OverallQual):
   if np.isnan(SalePrice):
      return sale_price_by_qual[SalePrice]
   else:
      return SalePrice

df['SalePrice'] = df.apply(lambda x: fill_sales_price(x['SalePrice'], x['OverallQual']), axis=1)
  

KeyError: nan


Solution

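  • The function indexes sale_price_by_qual with SalePrice, which is NaN on exactly the rows that reach that branch, hence KeyError: nan. Look up the group mean by OverallQual instead: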

    import numpy as np

    def fill_sales_price(SalePrice, OverallQual):
        if np.isnan(SalePrice):
            # Index the per-quality means by the rating, not by the NaN price.
            return sale_price_by_qual[OverallQual]
        else:
            return SalePrice

    df['SalePrice'] = df.apply(lambda x: fill_sales_price(x['SalePrice'], x['OverallQual']), axis=1)
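
  • As an alternative to the row-by-row apply, the same fill can probably be done in a vectorized way with groupby(...).transform('mean') and fillna. The sketch below is only an illustration: the small DataFrame and its values are made up to stand in for the real data, and group_mean is a name introduced here, not something from the question.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the asker's DataFrame (values are invented).
df = pd.DataFrame({
    "OverallQual": [5, 5, 5, 7, 7, 7],
    "SalePrice": [100.0, 200.0, np.nan, 300.0, np.nan, 500.0],
})

# Mean SalePrice of each OverallQual group, broadcast back to every row.
# NaN prices are skipped when the group mean is computed.
group_mean = df.groupby("OverallQual")["SalePrice"].transform("mean")

# Fill only the missing prices; existing values are left untouched.
df["SalePrice"] = df["SalePrice"].fillna(group_mean)

print(df)
```

    This avoids calling a Python function once per row, which tends to matter on larger frames. Note that if every SalePrice in a quality group is missing, that group's mean is itself NaN, so those rows stay unfilled.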