python pandas numpy infinity float32

Possible bug with inf or too large values?


I'm trying to train a neural network with Keras and TensorFlow. As usual, I replace -np.inf and np.inf values with np.nan and then call dropna to remove the bad rows:

 Data.replace([np.inf, -np.inf], np.nan, inplace=True)
 Data.dropna(inplace=True)

However, after that I couldn't cast the data to float32. When trying to normalize the values, I got the error ValueError: Input contains infinity or a value too large for dtype('float32'). Casting to float64 instead worked, but then the training process kept failing with strange errors. So I ran the following snippet:

a = np.array([np.finfo(np.float64).max])
print(x > a.any())

and, surprisingly, I got this result:

[[ True  True  True ... False False False]
 [ True  True  True ... False False False]
 [ True  True  True ... False False False]
 ...
 [ True  True  True ... False False False]
 [ True  True  True ... False False False]
 [ True  True  True ... False False False]]
[[False False False ...  True  True  True]
 [False False False ...  True  True  True]
 [False False False ...  True  True  True]
 ...
 [False False False ...  True  True  True]
 [False False False ...  True  True  True]
 [False False False ...  True  True  True]]

This suggests there are values (the True entries) larger than the maximum float64. Wouldn't such a value be infinite? Why weren't they replaced by the code above? Is there any way to replace them?
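
A note on the check itself: `a.any()` reduces the array to a single boolean, so `x > a.any()` compares against `True` (that is, 1) rather than against the float64 maximum. The sketch below reproduces this with a small made-up array standing in for `x`; the values are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical stand-in for the array `x` from the question.
x = np.array([[0.5, 2.0, 1e300],
              [-3.0, 1.0, 7.0]])

a = np.array([np.finfo(np.float64).max])

# a.any() collapses `a` to one boolean (True here), so this
# comparison is really `x > True`, which NumPy treats as `x > 1`.
as_written = x > a.any()

# Comparing against the array itself asks the intended question:
# is any element larger than the float64 maximum?
intended = x > a

print(a.any())
print(as_written)
print(intended.any())
```

Under this reading, the `True` entries mark values greater than 1, not values beyond the float64 range.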

Edit:

I see now that the problem occurs not when casting to float64 or float32 but when I try to normalize the data with any normalization function (standard, min-max, etc.).


Solution

  • Instead of specifically looking for infinities, just throw out data which is out of bounds, something like this:

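    # Parentheses are required: | binds more tightly than < and >
    bad = (Data < -1e20) | (Data > 1e20)  # use whatever your valid range is
    # drop() expects index labels, so select the labels of the offending rows
    Data.drop(Data[bad.any(axis='columns')].index, inplace=True)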
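
One possible explanation for the float32 error, which the question does not confirm, is that some values are finite in float64 but exceed the float32 maximum (about 3.4e38), so replacing infinities never touches them. The following is a minimal sketch of a range check under that assumption; the DataFrame contents and the names `f32_max`, `too_big`, and `safe` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: every value is finite in float64,
# but +/-1e39 lies outside the float32 range.
Data = pd.DataFrame({"a": [1.0, 2.0, 1e39],
                     "b": [0.5, -1e39, 3.0]})

f32_max = float(np.finfo(np.float32).max)

# The inf replacement keeps every row, because nothing is infinite yet.
cleaned = Data.replace([np.inf, -np.inf], np.nan).dropna()

# Flag values whose magnitude would overflow when cast to float32,
# then keep only rows with no flagged value.
too_big = Data.abs() > f32_max
safe = Data[~too_big.any(axis="columns")]

# The remaining rows can be cast without producing infinities.
safe32 = safe.astype(np.float32)

print(len(cleaned), len(safe))
print(np.isfinite(safe32.to_numpy()).all())
```

Using `np.finfo(np.float32).max` as the bound is just one choice. A tighter, domain-specific range, as the answer suggests, may be more appropriate if values near 1e38 are not meaningful in the data.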