I have a 2D time series dataset of integers, with daily values in the 1,000,000 - 2,000,000 range. The range is not fixed: I can also sum to weekly values, which pushes it above 10,000,000.
I can reach RMSE = 0.02 when I normalize my data, but when I feed the raw data (in the 1 million range) into the algorithm, the RMSE ends up anywhere from 30k to 150k.
Why is my "global minimum" RMSE 0.02 in one case but so much higher in the other? I've been testing with AdaDelta.
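The definition of RMSE is:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$

where $y_i$ are the actual values, $\hat{y}_i$ are the predictions, and $n$ is the number of samples.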
The scale of this value depends directly on the scale of the predictions and actuals: RMSE is expressed in the same units as the target, so it's quite normal to get a much higher RMSE when you don't normalize the dataset.
This is why normalization is important, as it lets us compare error metrics across models and datasets.
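To make the scale dependence concrete, here is a minimal sketch. The numbers are made up for illustration (they are not your data), and min-max scaling is an assumption, since the question doesn't say which normalization was used. The helper names `rmse` and `min_max` are hypothetical.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

# Made-up daily values in the 1,000,000 - 2,000,000 range.
actual = np.array([1_000_000, 1_250_000, 1_500_000, 1_750_000, 2_000_000], dtype=float)
# Made-up predictions that are off by 20,000 in either direction.
predicted = actual + np.array([20_000, -20_000, 20_000, -20_000, 20_000], dtype=float)

# Min-max scaling (assumed), using the range of the actual values.
lo, hi = actual.min(), actual.max()

def min_max(x):
    return (x - lo) / (hi - lo)

raw_rmse = rmse(actual, predicted)                     # about 20,000 (original units)
norm_rmse = rmse(min_max(actual), min_max(predicted))  # about 0.02 (fraction of the range)

# Multiplying by the range converts the normalized error back to original units.
rescaled_rmse = norm_rmse * (hi - lo)

print(raw_rmse, norm_rmse, rescaled_rmse)
```

The predictions are identical in both cases; only the units change. Under these assumptions, a normalized RMSE of 0.02 on a range of 1,000,000 corresponds to an error of about 20,000 in raw units, so the two numbers are only comparable after converting to the same scale.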