python-3.x pandas lambda missing-data comparison-operators

Comparison operator with Lambda Expression is not able to find NaN values


I am trying to replace the null values in a column based on the categorical value of another column, but the == operator is making me regret all the big decisions in my life. I have 8523 rows and 12 columns in the train set, of which 7 are categorical and 5 are numerical.

Columns are 'Item_Identifier', 'Item_Weight', 'Item_Fat_Content', 'Item_Visibility', 'Item_Type', 'Item_MRP', 'Outlet_Identifier', 'Outlet_Establishment_Year', 'Outlet_Size', 'Outlet_Location_Type', 'Outlet_Type', 'Item_Outlet_Sales'

I want to fill the NaN values (float dtype) in the 'Item_Weight' column based on the categorical value of 'Outlet_Location_Type'. I have a dictionary (city_type_mean) with the categorical values as keys and the corresponding replacement values as values. I used the following code:

train["Item_Weight"] = train.apply(lambda x: city_type_mean[x['Outlet_Location_Type']] if x["Item_Weight"] == np.nan else x["Item_Weight"], axis=1) 

But the NaN values remain unaffected. I attached a sample of the train data and a screenshot of the problematic code. From my troubleshooting so far, the if condition above always evaluates to False, so the else branch is always executed. I have also tried the condition with is and with pd.isnull(), but to no avail. Any help with the problem is much appreciated. Also, please let me know before marking this question as a duplicate.


Solution

  • Can you please try np.isnan instead of == np.nan? NaN never compares equal to anything, including itself, so the == np.nan check is always False.

    train["Item_Weight"] = train.apply(lambda x: city_type_mean[x['Outlet_Location_Type']] if np.isnan(x["Item_Weight"]) else x["Item_Weight"], axis=1)
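
To see why the original condition never fires, and as a possible alternative to the row-wise apply, here is a minimal sketch. The sample rows and the numbers in city_type_mean are made up for illustration, since the real train set and dictionary are not shown in the question. It uses Series.map to look up the replacement for each row and fillna to write it only where 'Item_Weight' is missing.

```python
import numpy as np
import pandas as pd

# NaN never compares equal to anything, including itself,
# so a check written as `value == np.nan` can never succeed.
print(np.nan == np.nan)   # False
print(np.isnan(np.nan))   # True

# Hypothetical sample data: the values below are invented for illustration.
train = pd.DataFrame({
    "Outlet_Location_Type": ["Tier 1", "Tier 2", "Tier 1", "Tier 3"],
    "Item_Weight": [9.3, np.nan, np.nan, 17.5],
})
city_type_mean = {"Tier 1": 12.5, "Tier 2": 13.0, "Tier 3": 12.8}

# Build a Series of per-row replacement values from the dictionary,
# then fill only the rows where Item_Weight is NaN.
replacements = train["Outlet_Location_Type"].map(city_type_mean)
train["Item_Weight"] = train["Item_Weight"].fillna(replacements)

print(train)
```

Existing non-null weights are left untouched because fillna only writes into the missing positions. If a category is absent from the dictionary, map returns NaN for that row and the weight stays missing.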