
ValueError: Input contains NaN, infinity or a value too large for dtype('float64'), when using sklearn IterativeImputer


I'm using IterativeImputer (from sklearn.impute import IterativeImputer) on a small (42 × 7) normalized (mean = 0, variance = 1) NumPy array that contains missing values. When I call fit on this data, I get the following warnings (many times):

RuntimeWarning: overflow encountered in square
  eigen_vals_ = S ** 2

RuntimeWarning: invalid value encountered in true_divide
  gamma_ = np.sum((alpha_ * eigen_vals_) /

RuntimeWarning: overflow encountered in matmul
  ret = a @ b

Finally, I get this error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
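The warnings and the final error look like links in one chain: once an intermediate value overflows to infinity, a later inf / inf produces NaN, and scikit-learn's input validation then rejects the array. A minimal sketch of that chain, assuming an arbitrary large value (1e200) as a stand-in for whatever grows during imputation and using sklearn.utils.check_array as an example of the validation step (the exact error message differs between scikit-learn versions):

```python
import numpy as np
from sklearn.utils import check_array

# A large but finite float64 value; squaring it exceeds the float64 maximum (~1.8e308).
s = np.array([1e200])

# Silence the RuntimeWarnings so the resulting values are easy to inspect.
with np.errstate(over="ignore", invalid="ignore"):
    eigen_vals = s ** 2               # overflows to inf ("overflow encountered in square")
    ratio = eigen_vals / eigen_vals   # inf / inf gives nan ("invalid value encountered in (true_)divide")

print(eigen_vals, ratio)

# scikit-learn checks estimator inputs and refuses non-finite values.
try:
    check_array(ratio.reshape(-1, 1))
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

Once a NaN or inf reaches the input of the next regression fit, the ValueError follows.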

If I reduce max_iter of the IterativeImputer from 4000 to 100, the warnings and the error do not appear, but this is not a good solution.

What causes the warnings and the error, and how can I fix them?

The code and a screenshot of the data are attached below:

import numpy as np
import pandas as pd
# IterativeImputer is still experimental: this import must come before importing it
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer

# Load the standardized data (no header row) and convert it to a NumPy array
x = pd.read_csv("small datasets/check_31_7.csv", header=None)
z = x.to_numpy()

# Fit the imputer on the data, then fill in the missing values
miss_mean_imputer = IterativeImputer(missing_values=np.nan, max_iter=4000)
miss_mean_imputer = miss_mean_imputer.fit(z)
imputed_data = miss_mean_imputer.transform(z)
print("")

The data (row 41 isn't appearing in the picture):

[Screenshot of the data]


Solution

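  • IterativeImputer fills in missing values by modeling each feature that has missing values as a regression on the other features, cycling through the features for up to max_iter rounds. Its default estimator is BayesianRidge, which is where the eigen_vals_ = S ** 2 and gamma_ lines in your warnings come from. Since you are running many iterations and getting overflow warnings, I assume your imputed values grow after each iteration until they reach infinity.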

    One solution is to clip the imputed values to a fixed range after each imputation step, which at least stops the warnings and the error. You can do this by passing the min_value and max_value arguments:

    # The data are standardized, so [-3, 3] spans three standard deviations on either side of the mean
    miss_mean_imputer = IterativeImputer(missing_values=np.nan, max_iter=4000, min_value=-3, max_value=3)
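A self-contained sketch of the clipping fix, assuming synthetic standardized data as a stand-in for the question's CSV (which isn't available here) and max_iter=100 instead of 4000 only to keep the run short. The bounds of -3 and 3 are this answer's choice for standardized data, not a library default. The sketch also prints n_iter_, the number of imputation rounds actually performed: a value below max_iter means the imputer met its tol-based stopping criterion before hitting the limit.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer

# Hypothetical stand-in for the 42 x 7 standardized data, with ~10% of entries missing.
rng = np.random.default_rng(0)
z = rng.standard_normal((42, 7))
missing = rng.random(z.shape) < 0.1
z[missing] = np.nan

# min_value / max_value clip every predicted (imputed) value to [-3, 3];
# observed values are left untouched.
imputer = IterativeImputer(missing_values=np.nan, max_iter=100,
                           min_value=-3, max_value=3, random_state=0)
imputed_data = imputer.fit_transform(z)

print("rounds run:", imputer.n_iter_)
print("imputed range:", imputed_data[missing].min(), imputed_data[missing].max())
```

Clipping bounds the imputed values; it does not make the underlying regressions better behaved, so it mainly keeps the iterations from diverging to inf.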
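Why the imputed values can keep growing: in round-robin imputation, each column's predictions become inputs to the other columns' regressions on the next pass, so any amplification compounds over iterations. The simplified loop below illustrates that feedback. It is not scikit-learn's implementation: it uses ordinary least squares (np.linalg.lstsq) in place of BayesianRidge and has no convergence check or clipping, and the function name round_robin_impute and the synthetic data are hypothetical.

```python
import numpy as np

def round_robin_impute(X, n_iter=10):
    """Hypothetical, simplified round-robin regression imputation (OLS, no clipping)."""
    X = np.array(X, dtype=float)  # work on a copy
    mask = np.isnan(X)
    # Initial fill: each column's mean over its observed values.
    col_means = np.nanmean(X, axis=0)
    X[mask] = np.take(col_means, np.nonzero(mask)[1])
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            miss = mask[:, j]
            if not miss.any():
                continue
            # Regress column j on the other columns (plus an intercept),
            # using the current imputed values in those other columns.
            A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            coef, *_ = np.linalg.lstsq(A[~miss], X[~miss, j], rcond=None)
            # The new predictions overwrite the old imputations and feed
            # into the regressions for the other columns on the next step.
            X[miss, j] = A[miss] @ coef
    return X

# Usage on synthetic standardized data with ~10% missing entries.
rng = np.random.default_rng(0)
X = rng.standard_normal((42, 7))
X[rng.random(X.shape) < 0.1] = np.nan
X_filled = round_robin_impute(X)
```

On well-behaved data like this the loop settles down; the overflow in the question suggests that, on the real data, the regressions instead amplify the imputed values from one pass to the next.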