scikit-learn, nan, mean-square-error

How to fix 'Input contains NaN, infinity or a value too large for dtype('float64').' while calculating the MSLE


While trying to calculate the mean squared log error (MSLE), I get the following error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Calculating the mean squared error does not give an error. The following code can be used to reproduce the problem:

from sklearn.datasets import load_boston
dataset = load_boston()

import numpy as np  # needed for the np.isfinite checks further down
import pandas as pd
df = pd.DataFrame(dataset.data, columns=dataset.feature_names)

df["MEDV"] = dataset.target

#tried this, no difference
df = df.reset_index()

df.isnull().sum()
#No missing values

df.dtypes
# all float64

cols = ["LSTAT", "RM"]
X = df[cols]#.astype(np.float)
y = df["MEDV"]#.astype(np.float)

from sklearn.linear_model import LinearRegression
slr = LinearRegression()
slr.fit(X, y)
y_pred = slr.predict(X)

np.all(np.isfinite(X))
# True
np.all(np.isfinite(y))
# True

np.all(np.isfinite(y_pred))
# True

from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y, y_pred)
print(mse)

from sklearn.metrics import mean_squared_log_error
# THIS produces the error message:
msle = mean_squared_log_error(y, y_pred)
print(msle)

I've done several checks:

  1. no missing values
  2. no infinite values
  3. all dtypes are float64

I don't understand why it is giving me the error. Does anyone know what I am doing wrong?

Kind regards,

Jaap


Solution

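  • Check the predictions for negative values:

    y_pred[y_pred<0]

    This returns:

    array([-4.66638608, -2.08933711])

    MSLE takes log(1 + y) of both the targets and the predictions, and that logarithm is undefined for values at or below -1. Both predictions above fall below -1, so the logarithm yields NaN, which is what the error message reports. The plain MSE takes no logarithm, which is why it works on the same data.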
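
As a minimal sketch of why the negative predictions break the metric, the snippet below writes out the MSLE formula by hand using only the standard library. The `msle` helper and the target values are hypothetical, chosen for illustration; only the two negative predictions come from the output above. Clipping at zero is one possible workaround, not necessarily the right one for every model.

```python
import math

def msle(y_true, y_pred):
    """Mean of (log(1 + y_true) - log(1 + y_pred))**2, written out by hand."""
    squared = [(math.log1p(t) - math.log1p(p)) ** 2
               for t, p in zip(y_true, y_pred)]
    return sum(squared) / len(squared)

# Hypothetical targets; the last two predictions are the negative values
# reported in the answer.
y_true = [24.0, 21.6, 5.0, 7.2]
y_pred = [30.0, 25.0, -4.66638608, -2.08933711]

try:
    msle(y_true, y_pred)
    failed = False
except ValueError:
    # math.log1p raises for inputs <= -1; NumPy's log1p returns NaN instead.
    failed = True

# One possible workaround: clip negative predictions to 0 before scoring.
y_pred_clipped = [max(p, 0.0) for p in y_pred]
score = msle(y_true, y_pred_clipped)

print(failed)
print(score)
```

With NumPy arrays, the same clipping could be done with `np.clip(y_pred, 0, None)` before passing the result to `mean_squared_log_error`. Whether clipping is appropriate depends on the problem; if negative predictions are meaningful, MSLE may simply be the wrong metric.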