Tags: neural-network, lstm, recurrent-neural-network

How is the loss in an RNN/LSTM calculated?


I'm learning how LSTMs work by practicing with time series training data (the input is a list of features and the output is a scalar). There is one thing I couldn't understand about calculating the loss for an RNN/LSTM:

How is the loss calculated? Is it computed each time I give the network a new input, or accumulated over all the given inputs and then backpropagated?


Solution

  • The answer does not depend on the neural network model. It depends on your choice of optimization method.

    If you are using batch gradient descent, the loss is averaged over the whole training set. This is often impractical for neural networks, because the training set is too big to fit into RAM, and each optimization step takes a lot of time.

    In stochastic gradient descent, the loss is calculated for each new input. The problem with this method is that the gradient estimate is noisy, so the loss fluctuates from step to step.

    In mini-batch gradient descent, the loss is averaged over each new mini-batch, a subsample of inputs of some small fixed size. Some variation of this method is typically used in practice.

    So, the answer to your question depends on the mini-batch size you choose; a code sketch after this answer illustrates the mini-batch case.

    [Figure: convergence of gradient descent as a function of mini-batch size]
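Here is a minimal sketch of the mini-batch case, assuming PyTorch and synthetic data (the original post names no framework, and all shapes, names, and hyperparameters here are hypothetical). It shows the point above in code: the loss is averaged over one mini-batch, then backpropagated, giving one optimizer step per batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical data: 64 sequences, 10 time steps, 8 features each,
# and one scalar target per sequence.
x = torch.randn(64, 10, 8)
y = torch.randn(64, 1)

class LSTMRegressor(nn.Module):
    def __init__(self, n_features, hidden_size):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)          # out: (batch, time, hidden)
        return self.head(out[:, -1])   # scalar prediction from the last time step

model = LSTMRegressor(n_features=8, hidden_size=16)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()  # by default, averages the squared error over the mini-batch

batch_size = 16  # the choice the answer above refers to
for start in range(0, len(x), batch_size):
    xb, yb = x[start:start + batch_size], y[start:start + batch_size]
    optimizer.zero_grad()
    loss = loss_fn(model(xb), yb)  # one averaged loss value per mini-batch
    loss.backward()                # backpropagate once per mini-batch
    optimizer.step()
```

With `batch_size = 1` this loop reduces to stochastic gradient descent (one loss and one update per input), and with `batch_size = len(x)` it reduces to batch gradient descent (the loss averaged over the whole training set).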