I am using LSTM as the hidden layer function in a time series prediction network. Is input normalization necessary? If it is, is data = data / sum(data) the correct normalization? Should the output also be normalized with the inputs?
Is input normalization necessary?
No, but it might make your network converge faster. Use this calculation to scale your values to [0,1]:
$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$
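For illustration, here is a minimal sketch of min-max scaling to [0,1], assuming the series is held in a NumPy array. The function name `min_max_scale` and its optional `lo`/`hi` parameters are hypothetical, not from any library. They are there so the minimum and maximum taken from one data set could be reused on another.

```python
import numpy as np

def min_max_scale(data, lo=None, hi=None):
    """Scale values to [0, 1] via (x - min) / (max - min).

    lo and hi are hypothetical optional overrides for the minimum and
    maximum; by default they are taken from the data itself.
    """
    data = np.asarray(data, dtype=float)
    lo = data.min() if lo is None else lo
    hi = data.max() if hi is None else hi
    span = hi - lo
    if span == 0:
        # Constant series: avoid division by zero and map everything to 0.
        return np.zeros_like(data)
    return (data - lo) / span

series = np.array([10.0, 12.0, 15.0, 20.0])
scaled = min_max_scale(series)  # smallest value maps to 0, largest to 1
print(scaled)
```

By contrast, data / sum(data) makes the values sum to 1, which in general does not spread them across the whole [0,1] range.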
Should the output also be normalized with the inputs?
No, I can't think of a reason why you would ever want to do that.