Tags: pandas, machine-learning, scikit-learn, sklearn-pandas

Imputer reduces the size of columns in my dataframe


print(np.shape(ar_fulldata_input_xx))

Output: (9027, 1443)

Now I use Imputer to impute the missing values of my dataframe ar_fulldata_input_xx as follows.

fill_NaN = Imputer(missing_values=np.nan, strategy='mean', axis=0)
imputed_DF = pd.DataFrame(fill_NaN.fit_transform(ar_fulldata_input_xx))

Now I check the size of my imputed dataframe as follows.

print(np.shape(imputed_DF))

Output: (9027, 1442)

Why is the number of columns reduced by one?

Is there any way to find which column goes missing after the imputation?
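One way to look for the missing column is sketched below. It rests on an assumption not confirmed in the question: that the column was dropped because it has no usable numeric values (scikit-learn's mean imputation discards columns that contain only missing values at fit time). The toy frame `df` and its column names are hypothetical stand-ins for ar_fulldata_input_xx; column "b" holds the text "nan" rather than real missing values, so dropna does not remove it.

```python
import pandas as pd

# Hypothetical stand-in for ar_fulldata_input_xx. Column "b" holds the
# string "nan", which pandas treats as ordinary text, not as missing data.
df = pd.DataFrame({
    "a": [1.0, None, 3.0],
    "b": ["nan", "nan", "nan"],
    "c": [0.0, 2.0, None],
})

# dropna(axis=1, how="all") keeps "b" because its strings are not NaN.
survives_dropna = "b" in df.dropna(axis=1, how="all").columns

# Coerce every column to numeric; values that cannot be parsed become NaN.
coerced = df.apply(pd.to_numeric, errors="coerce")

# Columns with no numeric value left have no mean to impute with.
dropped = coerced.columns[coerced.isna().all(axis=0)].tolist()

print(survives_dropna, dropped)
```

If the list is empty for the real data, the cause is probably something else, and comparing column counts before and after each preprocessing step may help narrow it down.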

I have run the following lines of code to remove all columns that consist entirely of NaN values or entirely of 0 values.

ar_fulldata_input_xx = ar_fulldata_input_xx.loc[:, (ar_fulldata_input_xx != 0).any(axis=0)]

and

ar_fulldata_input_xx=ar_fulldata_input_xx.dropna(axis=1, how='all')

Solution

  • You can do the imputation directly in pandas instead:

    ndf = df.fillna(df.mean())
    

    It seems that one of the columns did not import its numeric values properly from the original file, which is likely why the Imputer did not behave as expected. The OP is looking into it.
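As a rough illustration of the pandas approach, the sketch below fills a small toy frame with column means and checks that no column is lost. The frame and its column names are hypothetical, and `numeric_only=True` is an addition that is not in the answer; it is included so that non-numeric columns do not raise an error in recent pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column "b" has no observed values at all.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, np.nan],
    "c": [0.0, 2.0, np.nan],
})

# Fill each column's NaN entries with that column's mean.
ndf = df.fillna(df.mean(numeric_only=True))

# The column count is unchanged, but a column with no observed values
# has no mean, so it stays NaN and can be spotted afterwards.
still_missing = ndf.columns[ndf.isna().any(axis=0)].tolist()

print(df.shape, ndf.shape, still_missing)
```

Unlike the imputer in the question, this keeps every column, so any column listed in `still_missing` points at data that needs to be fixed at the import stage.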