python, pandas, machine-learning, scikit-learn, imputation

Imputer on some columns in a Dataframe


I am trying to use Imputer on a single column, "Age", to replace its missing values, but I get the error "Expected 2D array, got 1D array instead:".

Here is my code:

import pandas as pd
import numpy as np
from sklearn.preprocessing import Imputer

dataset = pd.read_csv("titanic_train.csv")

dataset.drop('Cabin', axis=1, inplace=True)
x = dataset.drop('Survived', axis=1)
y = dataset['Survived']

imputer = Imputer(missing_values="nan", strategy="mean", axis=1)
imputer = imputer.fit(x['Age'])
x['Age'] = imputer.transform(x['Age'])

Solution

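  • The Imputer expects a 2-dimensional array as input, even if one of those dimensions has length 1. x['Age'] is a 1-dimensional Series, so reshape its underlying NumPy array into a single column with .values.reshape(-1, 1). Note that the snippet below also spells the placeholder 'NaN' and drops axis=1, so the mean is taken down the column (the default axis=0) rather than across each row: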

    imputer = Imputer(missing_values='NaN', strategy='mean')
    imputer.fit(x['Age'].values.reshape(-1, 1))
    x['Age'] = imputer.transform(x['Age'].values.reshape(-1, 1))
    

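    That said, if you are not doing anything more complicated than filling in missing values with the mean, you might find it easier to skip the Imputer altogether and use the pandas fillna method instead. Assigning the result back to the column is safer than calling fillna with inplace=True on x['Age'], which may operate on a temporary copy in recent pandas versions:

    x['Age'] = x['Age'].fillna(x['Age'].mean())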
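For readers on a newer scikit-learn: Imputer was deprecated in version 0.20 and removed in 0.22, with sklearn.impute.SimpleImputer as its replacement. Below is a minimal sketch of the same single-column mean imputation with that class. The small DataFrame and its values are made up for illustration and only stand in for the Titanic data. Selecting the column with double brackets gives a one-column DataFrame, which is already 2D, so no reshape is needed on the way in.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical stand-in for the Titanic features (values invented for illustration).
x = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 36.0],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})

# Double brackets return a one-column DataFrame of shape (4, 1), i.e. 2D input.
age_2d = x[["Age"]]

# Treat np.nan as the missing marker and fill it with the column mean.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# fit_transform returns a 2D array of shape (n_rows, 1); ravel() flattens it
# back to 1D so it can be assigned to a single column.
x["Age"] = imputer.fit_transform(age_2d).ravel()

print(x["Age"].tolist())  # [22.0, 28.0, 26.0, 36.0]
```

The missing value becomes 28.0, the mean of 22, 26 and 36. For a one-off mean fill, the fillna approach above is still shorter; a fitted imputer is mainly useful when the same training-set mean has to be applied to other data later.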