Search code examples
pythonpandasdataframefillna

df.fillna() not working when updating multiple columns from a slice


Having issues with fillna() and df slices. Still have my python training wheels on and would appreciate any assistance. I've found lots of close examples on SE, but because of the conditionals and multiple columns I haven't found anything that worked for this case.

Data: Advanced regression comp: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data

However, no need to pull down and combine the train and test data like I have because this slice only contains ~150 np.nan rows across the columns/factors listed in garage_cat_columns.

The Id column is int and between 1 and 3000

# Columns to update and criteria to slice

garage_cat_columns = ['GarageType','GarageYrBlt','GarageQual','GarageCond','GarageFinish']
no_garage = testing.Id[((testing['GarageType'].isnull()) &
                        (testing['GarageYrBlt'].isnull()) &
                        (testing['GarageQual'].isnull()) &
                        (testing['GarageCond'].isnull()) &
                        (testing['GarageFinish'].isnull()))].tolist() # are all null?

# Best of my knowledge this is textbook (old text book) fillna()

testing[testing.Id.where(df.Id.isin(no_garage)).notnull()][garage_cat_columns].fillna("None", inplace=True)

I'm getting the "copy of a slice" warning (to be expected), but can't seem to find a way to write this back to the original dataframe so my changes will "commit."

Some failed attempts:

# 1 tried with and without double brackets

testing[[garage_cat_columns]] = testing[testing['Id'].where(df['Id']isin(no_garage)).notnull()] \
        [garage_cat_columns] = 'None'

# 2 

testing[testing['Id'].where(df['Id']isin(no_garage)).notnull()][garage_cat_columns] = 'None'      

# 3

testing[garage_cat_columns] = testing[testing['Id'].where(df['Id']isin(no_garage)).notnull()] \
        [garage_cat_columns].fillna("None")

# 4 tried with and without double brackets

testing[[garage_cat_columns]] = testing[testing['Id'].where(df['Id']isin(no_garage)).notnull()] \
        [garage_cat_columns].fillna("None", inplace=True)

# 5 Hail Mary

testing[testing['Id'].where(df['Id'].isin(no_garage)).notnull()] \
        [garage_cat_columns].fillna("None", inplace=True) = \ # equality here
testing[testing['Id'].where(df['Id'].isin(no_garage)).notnull()] \
        [garage_cat_columns].fillna("None", inplace=True)

Any help is much appreciated!


Solution

  • If need set all Id if contain at least one row of data with only missing values to string None use:

    garage_cat_columns = ['GarageType','GarageYrBlt','GarageQual','GarageCond','GarageFinish']
    
    no_garage = testing.loc[testing[garage_cat_columns].isnull().all(axis=1), 'Id'].tolist()
    testing.loc[df['Id'].isin(no_garage), garage_cat_columns] = 'None'
    

    If need set only rows with all Nones like Nonetype:

    mask = testing[garage_cat_columns].isnull().all(axis=1)
    testing.loc[mask, garage_cat_columns] = 'None'