Tags: python, pandas, scikit-learn

Ignore NaN when calculating mean_absolute_error


I'm trying to calculate the MAE (mean absolute error). My original DataFrame has 1826 rows and 3 columns, and I'm using columns 2 and 3 to calculate the MAE. However, column 2 contains some NaN values. When I used:

from sklearn.metrics import mean_absolute_error

and passed these columns to it, I got the error "Input contains NaN".

As an example, I'm trying to do something like this:

import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = [3, -0.5, 2, 7, 10]
y_pred = [2.5, np.nan, 2, 8, np.nan]
mean_absolute_error(y_true, y_pred)  # raises ValueError because y_pred contains NaN

Is it possible to skip or ignore the rows with NaN?

UPDATE

After discussing it with my advisor, we decided that the best approach is to drop the rows containing NaN values.
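In the DataFrame setting described above, dropping those rows can be done with pandas before calling `mean_absolute_error`. The sketch below is only an illustration under assumptions: the column names `col1`, `col2`, `col3` and the data are hypothetical (reusing the example values), with `col2` standing in for the column that contains NaN. Since MAE uses absolute differences, swapping the two arguments does not change the result.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error

# Hypothetical 3-column DataFrame; "col2" contains NaN, as in the question.
df = pd.DataFrame({
    "col1": [1, 2, 3, 4, 5],
    "col2": [2.5, np.nan, 2, 8, np.nan],
    "col3": [3, -0.5, 2, 7, 10],
})

# Drop every row where either of the two columns used for the MAE is NaN.
# dropna returns a new DataFrame, so df itself is left unchanged.
clean = df.dropna(subset=["col2", "col3"])

mae = mean_absolute_error(clean["col3"], clean["col2"])
print(mae)  # 0.5
```

Passing both columns to `subset` means a NaN in either one removes the row, so the two Series handed to `mean_absolute_error` always stay aligned.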


Solution

  • If you want to ignore the NaNs, build a mask and perform boolean indexing:

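    from sklearn.metrics import mean_absolute_error
    import numpy as np
    
    y_true = np.array([3, -0.5, 2, 7, 10])
    y_pred = np.array([2.5, np.nan, 2, 8, np.nan])
    # True where y_pred holds a real number, False where it is NaN
    m = ~np.isnan(y_pred)
    
    # Boolean indexing keeps only the non-NaN positions in both arrays
    mean_absolute_error(y_true[m], y_pred[m])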
    

    Output: 0.5
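The mask above only checks `y_pred`. If the true values can also contain NaN, one way to extend it is to combine both checks with `|`. The sketch below uses hypothetical data (a NaN is added to `y_true` purely for illustration) and also shows a pure NumPy alternative: `np.abs(y_true - y_pred)` is NaN wherever either input is NaN, and `np.nanmean` skips those entries, so it gives the same value without scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical data: NaN in both arrays, at different positions.
y_true = np.array([3, -0.5, np.nan, 7, 10])
y_pred = np.array([2.5, np.nan, 2, 8, np.nan])

# Keep only positions where neither array is NaN.
m = ~(np.isnan(y_true) | np.isnan(y_pred))
mae_sklearn = mean_absolute_error(y_true[m], y_pred[m])

# NumPy-only equivalent: |NaN - x| is NaN, and np.nanmean ignores NaNs.
mae_numpy = np.nanmean(np.abs(y_true - y_pred))

print(mae_sklearn, mae_numpy)  # 0.75 0.75
```

If every pair contains a NaN, the two approaches fail differently: `mean_absolute_error` raises a `ValueError` on the empty arrays, while `np.nanmean` emits a `RuntimeWarning` and returns `nan`.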