Having issues with fillna() and df slices. Still have my python training wheels on and would appreciate any assistance. I've found lots of close examples on SE, but because of the conditionals and multiple columns I haven't found anything that worked for this case.
Data: Advanced regression comp: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data
However, no need to pull down and combine the train and test data like I have because this slice only contains ~150 np.nan rows across the columns/factors listed in garage_cat_columns.
The Id column is int and between 1 and 3000
# Columns to update and criteria to slice
garage_cat_columns = ['GarageType','GarageYrBlt','GarageQual','GarageCond','GarageFinish']
no_garage = testing.Id[((testing['GarageType'].isnull()) &
(testing['GarageYrBlt'].isnull()) &
(testing['GarageQual'].isnull()) &
(testing['GarageCond'].isnull()) &
(testing['GarageFinish'].isnull()))].tolist() # are all null?
# Best of my knowledge this is textbook (old text book) fillna()
testing[testing.Id.where(df.Id.isin(no_garage)).notnull()][garage_cat_columns].fillna("None", inplace=True)
I'm getting the "copy of a slice" warning (to be expected), but can't seem to find a way to write this back to the original dataframe so my changes will "commit."
Some failed attempts:
# 1 tried with and without double brackets
testing[[garage_cat_columns]] = testing[testing['Id'].where(df['Id']isin(no_garage)).notnull()] \
[garage_cat_columns] = 'None'
# 2
testing[testing['Id'].where(df['Id']isin(no_garage)).notnull()][garage_cat_columns] = 'None'
# 3
testing[garage_cat_columns] = testing[testing['Id'].where(df['Id']isin(no_garage)).notnull()] \
[garage_cat_columns].fillna("None")
# 4 tried with and without double brackets
testing[[garage_cat_columns]] = testing[testing['Id'].where(df['Id']isin(no_garage)).notnull()] \
[garage_cat_columns].fillna("None", inplace=True)
# 5 Hail Mary
testing[testing['Id'].where(df['Id'].isin(no_garage)).notnull()] \
[garage_cat_columns].fillna("None", inplace=True) = \ # equality here
testing[testing['Id'].where(df['Id'].isin(no_garage)).notnull()] \
[garage_cat_columns].fillna("None", inplace=True)
Any help is much appreciated!
If need set all Id
if contain at least one row of data with only missing values to string None
use:
garage_cat_columns = ['GarageType','GarageYrBlt','GarageQual','GarageCond','GarageFinish']
no_garage = testing.loc[testing[garage_cat_columns].isnull().all(axis=1), 'Id'].tolist()
testing.loc[df['Id'].isin(no_garage), garage_cat_columns] = 'None'
If need set only rows with all None
s like Nonetype:
mask = testing[garage_cat_columns].isnull().all(axis=1)
testing.loc[mask, garage_cat_columns] = 'None'