I have the RMSE loss, defined as:
RMSE = np.sum(np.sqrt((np.array(pred_df.real_values) - np.array(pred_df.estimate_values))**2))
where the real values and predictions are between 0.0 and 5.0.
I want to use this as an accuracy metric, not as a loss. However, I don't know the interval in which this function takes values. The only thing I can think of is:
Worst case - all predictions are wrong (all are 5.0 apart): RMSE = 5.0 * len(pred_df)
Best case - all predictions are correct : RMSE = 0.0
Can I just use RMSE - 5.0 * len(pred_df) as my accuracy metric? Is there a smarter way of doing this?
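Actually, your loss is more of a TRSE, since you take the root first and then the total sum instead of the mean, hence "total root squared error" :). Note that the square root of a squared difference is just its absolute value, so this is really the sum of absolute errors, and its upper bound grows with the number of predictions. If you really want the RMSE loss: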
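To check that the question's formula reduces to the sum of absolute errors, here is a small sketch. The function name `total_root_squared_error` and the sample values are made up for illustration; the sample values only assume the 0.0 to 5.0 range from the question.

```python
import numpy as np

def total_root_squared_error(real, est):
    """The question's metric: square each error, take its root, then sum."""
    diff = np.asarray(real, dtype=float) - np.asarray(est, dtype=float)
    return np.sum(np.sqrt(diff ** 2))

# Hypothetical values in the 0.0 to 5.0 range.
real = np.array([0.0, 2.5, 5.0, 4.0])
est = np.array([5.0, 2.0, 0.0, 4.0])

trse = total_root_squared_error(real, est)
sae = np.sum(np.abs(real - est))  # plain sum of absolute errors
print(trse, sae)  # 10.5 10.5

# Worst case: every prediction is 5.0 away, so the total is 5.0 * n.
n = len(real)
worst = total_root_squared_error(np.zeros(n), np.full(n, 5.0))
print(worst, 5.0 * n)  # 20.0 20.0
```

Because the worst case is 5.0 * n, the range of this metric changes with the size of `pred_df`, which makes it awkward to turn into a fixed-range score.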
RMSE = np.sqrt(np.mean((np.array(pred_df.real_values) - np.array(pred_df.estimate_values))**2))
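The same formula wrapped in a reusable function might look like the sketch below. The name `rmse` and the sample values are illustrative assumptions; with your DataFrame you would call it as `rmse(pred_df.real_values, pred_df.estimate_values)`, since `np.asarray` accepts a pandas Series.

```python
import numpy as np

def rmse(real, est):
    """Root mean squared error: square each error, average, then take the root."""
    diff = np.asarray(real, dtype=float) - np.asarray(est, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Hypothetical values in the 0.0 to 5.0 range.
real = [0.0, 2.5, 5.0, 4.0]
est = [5.0, 2.0, 0.0, 4.0]
print(rmse(real, est))

# Worst case: every prediction is 5.0 away. The mean keeps RMSE at 5.0
# regardless of how many predictions there are.
worst_small = rmse([0.0] * 4, [5.0] * 4)
worst_large = rmse([0.0] * 1000, [5.0] * 1000)
print(worst_small, worst_large)  # 5.0 5.0
```

Averaging before taking the root is what bounds RMSE by 5.0 independently of `len(pred_df)`.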
To convert this to an accuracy metric, you are right to find the min/max values, but you should not subtract the max value. Instead, subtract the min value and then divide by the difference between the max and min values, i.e. min-max normalization. This gives values in the range [0, 1]. The min value of RMSE is 0 and the max value is 5 (your best/worst case approach justifies this, and unlike the summed version, this bound does not depend on len(pred_df)). Then (RMSE - 0) / (5 - 0) = RMSE / 5 is a normalized error, where 0 is a perfect prediction and 1 is the worst. Since an accuracy should be higher when predictions are better, flip it: acc = 1 - RMSE / 5
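A minimal sketch of the normalization, assuming values lie in the 0.0 to 5.0 range from the question. The function name `rmse_accuracy` and its `min_value`/`max_value` parameters are hypothetical. RMSE / 5 is 0 for perfect predictions and 1 for the worst, so the sketch returns 1 - RMSE / 5 to make higher mean better.

```python
import numpy as np

def rmse_accuracy(real, est, min_value=0.0, max_value=5.0):
    """Min-max normalize RMSE into [0, 1] and flip it so 1 means perfect."""
    diff = np.asarray(real, dtype=float) - np.asarray(est, dtype=float)
    rmse = np.sqrt(np.mean(diff ** 2))
    rmse_min = 0.0                    # every prediction exactly right
    rmse_max = max_value - min_value  # every prediction off by the full range
    normalized_error = (rmse - rmse_min) / (rmse_max - rmse_min)  # 0 best, 1 worst
    return float(1.0 - normalized_error)

print(rmse_accuracy([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
print(rmse_accuracy([0.0, 5.0], [5.0, 0.0]))            # 0.0
print(rmse_accuracy([1.0, 2.0, 3.0], [2.0, 2.0, 3.0]))
```

If you only need a "lower is better" score, you can return `normalized_error` directly instead of flipping it.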