
Batch training in Keras LSTM


If I use a batch_size of 32 in an LSTM made with Keras, is the loss function applied to each sequence and then averaged, or is it applied directly to all sequences without taking each sequence into account?

Thanks in advance.


Solution

  • With a batch_size of 1, the weights would be updated after every single sequence, so a batch_size of 32 means the weights are updated only once per chunk of 32 sequences.

    The loss used for that single update is the average of the per-sequence losses over those 32 sequences. If the loss were instead applied to each sequence individually and the weights updated each time, that would just be plain SGD with batch_size = 1.
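A small NumPy sketch of what that averaging means (the shapes and the MSE loss are hypothetical choices for illustration; Keras applies the same mean reduction over the batch by default):

```python
import numpy as np

# Hypothetical batch: 32 sequences, 10 timesteps each (assumed shapes).
rng = np.random.default_rng(0)
batch_size, timesteps = 32, 10
y_true = rng.normal(size=(batch_size, timesteps))
y_pred = rng.normal(size=(batch_size, timesteps))

# Per-sequence loss: MSE over each sequence's own timesteps.
per_sequence_loss = ((y_true - y_pred) ** 2).mean(axis=1)  # shape (32,)

# Loss driving the one weight update: the mean over the 32 sequences.
batch_loss = per_sequence_loss.mean()

# With equal-length sequences this equals averaging over all elements at once,
# so "per sequence then averaged" and "directly over the whole batch" coincide.
assert np.isclose(batch_loss, ((y_true - y_pred) ** 2).mean())
```

Note that the two views agree here only because every sequence has the same number of timesteps; with masking or variable-length sequences, the per-sequence average and the flat element-wise average can differ.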