Tags: neural-network, lstm, backpropagation

Why is scaling data so important for a neural network (LSTM)?


I am writing my master's thesis on how to apply LSTM neural networks to time series. In my experiments, I found that scaling the data can have a great impact on the result. For example, when I use a tanh activation function and the value range is between -1 and 1, the model seems to converge faster, and the validation error also does not jump dramatically after each epoch.

Does anyone know of a mathematical explanation for this? Or are there any papers that already explain this situation?
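For reference, the scaling I describe looks roughly like this (a minimal sketch assuming scikit-learn's MinMaxScaler; the series and the train/test split sizes are placeholders):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Placeholder univariate series, shaped (n_samples, 1) as scikit-learn expects
series = np.random.rand(1000, 1) * 500.0

# Fit the scaler on the training split only, to avoid leaking test statistics
train, test = series[:800], series[800:]
scaler = MinMaxScaler(feature_range=(-1, 1))  # matches tanh's output range
train_scaled = scaler.fit_transform(train)
test_scaled = scaler.transform(test)

# After prediction, map the model's tanh-range outputs back to original units:
# preds_original = scaler.inverse_transform(preds_scaled)
```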


Solution

  • Your question reminds me of a picture used in our class, but you can find a similar one here at 3:02.

    [Image: gradient-descent paths on loss-surface contours, before scaling (left, elongated contours, zigzag path) and after scaling (right, round contours, direct path)]

    In the picture above you can see that the path on the left is much longer than the one on the right; scaling turns the left picture into the right one. The reason is that unscaled inputs give the loss surface elongated, elliptical contours, so gradient descent zigzags; scaling makes the contours rounder, so the descent takes a far more direct path and remains stable at a larger learning rate.
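    To make this concrete, here is a small sketch (not from the class material above; the diagonal quadratic f(w) = 0.5 * sum(h_i * w_i^2) is a stand-in for a real loss surface) counting gradient-descent steps when the curvatures differ wildly, as with unscaled inputs, versus when they match, as after scaling:

    ```python
    import numpy as np

    def steps_to_converge(hessian_diag, tol=1e-6, max_steps=100000):
        """Gradient descent on f(w) = 0.5 * sum(h_i * w_i**2), whose contours
        are ellipses with axis ratios set by the curvatures h_i."""
        h = np.asarray(hessian_diag, dtype=float)
        lr = 1.0 / h.max()  # step size is limited by the steepest direction
        w = np.ones_like(h)
        for step in range(1, max_steps + 1):
            grad = h * w
            if np.linalg.norm(grad) < tol:
                return step
            w = w - lr * grad
        return max_steps

    # Unscaled inputs -> very different curvatures -> elongated contours (left)
    print(steps_to_converge([1.0, 100.0]))  # on the order of a thousand steps
    # Scaled inputs -> similar curvatures -> round contours (right)
    print(steps_to_converge([1.0, 1.0]))    # converges almost immediately
    ```

    The slow case is slow because the steepest direction forces a tiny learning rate, which then crawls along the shallow direction; equalizing the input scales removes that mismatch.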