numpy, machine-learning, prediction, missing-data

Remove zero values from test labels in Python


I have trained a model and would like to express its accuracy as a percentage by dividing the errors by the test labels of my dataset. However, some of the test labels are zero, which stands for a missing value, so dividing the corresponding errors by them gives infinity.

mape = 100 * (errors / y_test)
# Calculate and display accuracy
accuracy = 100 - np.mean(mape)
print('Accuracy:', round(accuracy, 2), '%.')

The snippet above prints "inf". I need to get rid of the zero values in the "y_test" series somehow. One approach is to find the indices of the zeros in this series and then remove the corresponding values from the errors array.

erry = np.array([errors, y_test])

How can I write code that removes the elements of erry whose y_test value is zero? (As built above, erry has shape (2, n), so the labels are its second row rather than its second column.)

If you know a better way to calculate model accuracy that accounts for missing values, please point it out.


Solution

  • I would use y_test to create a boolean index for both arrays:

    idx = y_test != 0
    mape = 100 * (errors[idx] / y_test[idx])
    # Calculate and display accuracy
    accuracy = 100 - np.mean(mape)
    print('Accuracy:', round(accuracy, 2), '%.')
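For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same masking idea. The sample values and the `predictions` array are made up for illustration, and I am assuming `errors` is the absolute difference between predictions and labels, which the question does not state. The last lines apply the same filter to the stacked `erry` array from the question, where the labels sit in the second row.

```python
import numpy as np

# Hypothetical sample data: zeros in y_test stand for missing labels.
y_test = np.array([10.0, 0.0, 20.0, 0.0, 40.0])
predictions = np.array([9.0, 3.0, 22.0, 1.0, 40.0])

# Assumed definition of `errors` (not given in the question).
errors = np.abs(predictions - y_test)

# Boolean mask that is True wherever the label is non-zero.
idx = y_test != 0

# Apply the same mask to both arrays so they stay aligned.
mape = 100 * (errors[idx] / y_test[idx])
accuracy = 100 - np.mean(mape)
print('Accuracy:', round(accuracy, 2), '%.')

# Same filter on the stacked array from the question:
# row 0 holds the errors, row 1 holds the labels.
erry = np.array([errors, y_test])
erry_clean = erry[:, erry[1] != 0]
```

Note that this drops the missing labels from the average altogether, so the reported accuracy only describes the samples that actually have a label.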