I am a beginner with Python, and I have a problem calculating the RMSE between two matrices when they contain NaN values.
For example, I have two matrices in which a couple of columns contain NaN values. How can I calculate the RMSE for each column?
X = ndarray with shape (1500, 27), y = ndarray with shape (1500, 27)
I tried to calculate it, but I get NaN every time. Any help would be appreciated.
Your ndarrays are very small, so you can just iterate over rows and columns and use np.isnan to skip the NaN cells:
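import numpy as np
arr1 = np.array([[1, 2, 10, 50, np.nan, 0, np.nan], [1, 2, 10, 30, np.nan, 0, np.nan]])
arr2 = np.array([[5, 2, 10, 50, 10, 0, np.nan], [1, 2, 10, 50, np.nan, 0, np.nan]])
# This prints one RMSE per row; swap the roles of the two axes to get one per column.
for i in range(arr1.shape[0]):
    sq_err = 0.0
    skipped_cells = 0
    for j in range(arr1.shape[1]):
        # Skip the cell if either array has NaN there.
        if np.isnan(arr1[i, j]) or np.isnan(arr2[i, j]):
            skipped_cells += 1
            continue
        sq_err += (arr1[i, j] - arr2[i, j]) ** 2
    n_valid = arr1.shape[1] - skipped_cells
    # A row with no valid cells has no defined RMSE, so report NaN for it.
    print((sq_err / n_valid) ** 0.5 if n_valid else np.nan)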
Now, if you are working with larger arrays and this part is your bottleneck, look at vectorized (whole-array) operations. Briefly: use np.isnan to find the NaN cells, then apply np.where to substitute zeros for NaN in both arrays. Then compute the RMSE as usual, dividing by the full row length. Finally, correct for the fact that some values were NaN: multiply by the square root of the row length and divide by (np.sum(is_not_nan))**0.5, where is_not_nan marks the cells that are valid in both arrays.
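The steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the function name nan_rmse and its axis parameter are made up for illustration, and it divides by the count of valid cells directly instead of computing a full-length mean and rescaling, which gives the same result. With axis=0 it returns one RMSE per column, as asked in the question.

```python
import numpy as np

def nan_rmse(a, b, axis=0):
    """RMSE along `axis`, ignoring cells where either input is NaN.

    Hypothetical helper for illustration; not part of NumPy.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # True where both arrays hold a real number.
    valid = ~(np.isnan(a) | np.isnan(b))
    # Squared error, with zero substituted wherever a value is missing.
    sq_err = np.where(valid, (a - b) ** 2, 0.0)
    n_valid = valid.sum(axis=axis)
    # Divide by the number of valid cells only; a row or column with
    # no valid cell gives 0/0, which is returned as NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.sqrt(sq_err.sum(axis=axis) / n_valid)

arr1 = np.array([[1, 2, 10, 50, np.nan, 0, np.nan],
                 [1, 2, 10, 30, np.nan, 0, np.nan]])
arr2 = np.array([[5, 2, 10, 50, 10, 0, np.nan],
                 [1, 2, 10, 50, np.nan, 0, np.nan]])

per_column = nan_rmse(arr1, arr2, axis=0)  # shape (7,)
per_row = nan_rmse(arr1, arr2, axis=1)     # shape (2,)
print(per_column)
print(per_row)
```

For the arrays in the question, nan_rmse(X, y, axis=0) would return an array of 27 values, one per column, with NaN for any column that has no row where both X and y are valid.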