Replace Missing Values with Most Frequent number under Condition

I'm trying to replace the missing values of the "Age" column, conditioned on the values of other columns, in the Titanic - Machine Learning from Disaster dataset:

df.Age[(df['Sex'] == 0) & (df['Pclass'] == 1)]

I tried to do that using SimpleImputer:

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
Imputer = SimpleImputer(missing_values=np.nan, strategy='most_frequent')

Imputer.fit_transform( pd.DataFrame(df.Age[(df['Sex'] == 0) & (df['Pclass'] == 1)]) )

This doesn't work. I also tried to assign the values back to the column:

df.loc[(df.Age.isnull()) & (df.Age[(df['Sex'] == 0) & (df['Pclass'] == 1)]), 'Age'] = Imputer.fit_transform( pd.DataFrame(df.Age[(df['Sex'] == 0) & (df['Pclass'] == 1)]) )

but that doesn't work either.

I tried to do it manually using fillna():

df.loc[(df['Sex'] == 0) & (df['Pclass'] == 1), 'Age'].fillna(int(df.Age[(df['Sex'] == 0) & (df['Pclass'] == 1)].mode()), inplace=True)
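
A likely reason this fillna() call leaves df unchanged: selecting with a boolean mask through df.loc returns a new object, so inplace=True fills that temporary copy rather than df itself. A minimal sketch on made-up data (the frame `toy` and the names `mask` and `subset` are illustrative and not part of the Titanic data), assuming Sex is encoded as 0/1:

```python
import numpy as np
import pandas as pd

# Made-up data standing in for the Titanic frame.
toy = pd.DataFrame({
    "Sex":    [0, 0, 0, 0, 1],
    "Pclass": [1, 1, 1, 1, 1],
    "Age":    [30.0, 30.0, 25.0, np.nan, 40.0],
})
mask = (toy["Sex"] == 0) & (toy["Pclass"] == 1)

# Boolean-mask selection returns a copy, so filling it in place
# leaves the original frame untouched.
subset = toy.loc[mask, "Age"]
subset.fillna(30.0, inplace=True)
missing_after_inplace = int(toy.loc[mask, "Age"].isnull().sum())

# Assigning the filled result back through .loc does update the frame.
most_frequent = toy.loc[mask, "Age"].mode().iloc[0]
toy.loc[mask, "Age"] = toy.loc[mask, "Age"].fillna(most_frequent)
missing_after_assign = int(toy.loc[mask, "Age"].isnull().sum())
```

Taking the first element with .mode().iloc[0] also sidesteps converting a whole Series with int(), which only succeeds when mode() returns exactly one value.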

I also tried to use the index to access the rows and update their values:

mod = int(df.Age[(df['Sex'] == 0) & (df['Pclass'] == 1)].mode())
indices = df.loc[(df.Age.isnull()) & (df.Sex == 0) & (df.Pclass == 1), 'Age'].isnull().index
df.loc[indices, 'Age'] = mod
df[(df['Sex'] == 0) & (df['Pclass'] == 1)]['Age'].isnull().sum()

It worked and the output was 0, but when I try to apply it in a for loop it gives me an error:

for i in range(1,3):
    for j in range(1,4):    
        indices = df.loc[(df.Sex == i) & (df.Pclass == j), 'Age'].isnull().index
        mod = int(df.Age[(df['Sex'] == i) & (df['Pclass'] == j)].mode())
        df.loc[indices, 'Age'] = mod

I want to know what is wrong with the first two approaches, and why the third doesn't work in a loop.
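
Judging only from the code shown, two things in the loop look suspect. First, range(1,3) yields 1 and 2, while the earlier snippets filter on Sex == 0. If Sex is encoded as 0/1, the Sex == 2 group is empty, its mode() is an empty Series, and int() cannot convert it. Second, .isnull().index returns the index of every selected row, not only the rows where Age is missing. A sketch of a loop that avoids both, on made-up data (the frame `toy` and the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# Made-up data; Sex is assumed to be encoded as 0/1.
toy = pd.DataFrame({
    "Sex":    [0, 0, 0, 1, 1, 1],
    "Pclass": [1, 1, 1, 2, 2, 2],
    "Age":    [22.0, 22.0, np.nan, 35.0, np.nan, 35.0],
})

# A group that does not occur has an empty mode, which int() cannot convert.
empty_mode = toy.loc[toy["Sex"] == 2, "Age"].mode()

for sex in toy["Sex"].unique():          # iterate over values that actually occur
    for pclass in toy["Pclass"].unique():
        group = (toy["Sex"] == sex) & (toy["Pclass"] == pclass)
        modes = toy.loc[group, "Age"].mode()
        if modes.empty:                  # empty or all-NaN group: nothing to fill with
            continue
        # Select only the rows of this group whose Age is missing.
        toy.loc[group & toy["Age"].isnull(), "Age"] = modes.iloc[0]
```

Iterating over unique() rather than a hard-coded range keeps the loop independent of how the columns happen to be encoded.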


  • This solution works well, but I don't know why the attempts above don't work!

    Imputer = SimpleImputer(missing_values=np.nan, strategy='most_frequent')
    for i in range(2):
        for j in range(1,4):
            mask = (df.Sex == i) & (df.Pclass == j)
            ls = np.array(df.Age[mask]).reshape(-1, 1)
            # Assign through .loc to avoid chained assignment on df.Age
            df.loc[mask, 'Age'] = Imputer.fit_transform(ls)[:, 0]
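
For comparison, the same per-group fill can probably be done without an explicit loop or SimpleImputer by using groupby with transform. This is only a sketch of an alternative, not the method used above. The helper names `fill_with_group_mode` and `first_mode` are hypothetical, and the sample frame is made up:

```python
import numpy as np
import pandas as pd

def fill_with_group_mode(frame, target, keys):
    """Return a copy of `frame` with NaNs in `target` replaced by the
    most frequent value of `target` within each `keys` group."""
    def first_mode(values):
        modes = values.mode()            # mode() ignores NaN by default
        return modes.iloc[0] if not modes.empty else np.nan

    # transform broadcasts each group's mode back onto that group's rows.
    group_modes = frame.groupby(keys)[target].transform(first_mode)
    result = frame.copy()
    result[target] = result[target].fillna(group_modes)
    return result

# Made-up data; Sex is assumed to be encoded as 0/1.
toy = pd.DataFrame({
    "Sex":    [0, 0, 0, 1, 1],
    "Pclass": [1, 1, 1, 3, 3],
    "Age":    [30.0, 30.0, np.nan, np.nan, 8.0],
})
filled = fill_with_group_mode(toy, "Age", ["Sex", "Pclass"])
```

When a group has several equally frequent ages, this sketch takes the smallest one, since mode() returns its values sorted. A group with no known ages is left as NaN.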