python-3.x nan numpy-ndarray

Calculate RMSE between two matrices when they include NaN values in Python


I am a beginner in Python, and I am having trouble calculating the RMSE between two matrices when they include NaN values.

For example, I have two matrices, each of which has a couple of columns with NaN values. How can I calculate the RMSE for each column?

X = ndarray with shape (1500, 27), y = ndarray with shape (1500, 27)

I tried to calculate it, but I get NaN each time. Any help would be appreciated.


Solution

  • Your ndarray is very small, so you can simply iterate over the rows and columns and skip every cell where either array has NaN:

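    import numpy as np

    arr1 = np.array([[1, 2, 10, 50, np.nan, 0, np.nan], [1, 2, 10, 30, np.nan, 0, np.nan]])
    arr2 = np.array([[5, 2, 10, 50, 10, 0, np.nan], [1, 2, 10, 50, np.nan, 0, np.nan]])

    # Prints one RMSE per row; swap the two loops (and the shape indices)
    # to get one RMSE per column instead.
    for i in range(arr1.shape[0]):
        squared_error = 0.0
        skipped_cells = 0
        for j in range(arr1.shape[1]):
            # Skip the cell if either array has NaN at this position
            if np.isnan(arr1[i, j]) or np.isnan(arr2[i, j]):
                skipped_cells += 1
                continue
            squared_error += (arr1[i, j] - arr2[i, j]) ** 2
        valid_cells = arr1.shape[1] - skipped_cells
        # A row with no valid cells has no defined RMSE, so report NaN
        print((squared_error / valid_cells) ** 0.5 if valid_cells else np.nan)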
    

    If you are working with larger arrays and this part is your bottleneck, use vectorized array operations instead of loops. Briefly: use np.isnan to find the NaN positions, then use np.where to replace with zeros, in both arrays, every cell where either array has NaN. Compute the RMSE on the result. Finally, correct for the cells that were NaN: multiply by the square root of the row length and divide by (np.sum(is_not_nan))**0.5, where is_not_nan marks the cells that are valid in both arrays.
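
The vectorized steps described above could look roughly like the sketch below. The function name `nan_rmse` and its `axis` parameter are illustrative choices, not part of the original answer. With `axis=0` it returns one RMSE per column, which is what the question asks for; with `axis=1` it returns one per row, like the loop version.

```python
import numpy as np

def nan_rmse(a, b, axis=0):
    """RMSE between a and b along `axis`, ignoring cells where either is NaN.

    Hypothetical helper for illustration; returns NaN where no valid cell exists.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # True where both arrays hold a real number
    is_not_nan = ~(np.isnan(a) | np.isnan(b))

    # Replace with zeros, in both arrays, every cell where either one is NaN
    a0 = np.where(is_not_nan, a, 0.0)
    b0 = np.where(is_not_nan, b, 0.0)

    # Plain RMSE; the zeroed cells contribute 0 to the sum but still count in the mean
    raw = np.sqrt(np.mean((a0 - b0) ** 2, axis=axis))

    # Correct the denominator: multiply by sqrt(n), divide by sqrt(number of valid cells).
    # This equals sqrt(sum of squared errors / number of valid cells).
    n = a.shape[axis]
    n_valid = np.sum(is_not_nan, axis=axis)
    with np.errstate(divide="ignore", invalid="ignore"):
        return raw * np.sqrt(n) / np.sqrt(n_valid)


arr1 = np.array([[1, 2, 10, 50, np.nan, 0, np.nan], [1, 2, 10, 30, np.nan, 0, np.nan]])
arr2 = np.array([[5, 2, 10, 50, 10, 0, np.nan], [1, 2, 10, 50, np.nan, 0, np.nan]])

per_column = nan_rmse(arr1, arr2, axis=0)  # one value per column
per_row = nan_rmse(arr1, arr2, axis=1)     # one value per row
print(per_column)
print(per_row)
```

For the question's data this would be called as `nan_rmse(X, y, axis=0)`, giving 27 values. A column with no cell that is valid in both arrays comes back as NaN rather than raising an error.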