I had a look at this tutorial in the PyTorch docs for understanding Transfer Learning. There was one line that I failed to understand.
After the loss is calculated with `loss = criterion(outputs, labels)`, the running loss is accumulated with `running_loss += loss.item() * inputs.size(0)`, and finally the epoch loss is computed as `running_loss / dataset_sizes[phase]`.
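For context, here is the relevant part of the loop as a runnable stand-in, with a dummy model and dataloader in place of the tutorial's (the shapes and data here are hypothetical; only the loss bookkeeping matters):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for the tutorial's model and data:
model = nn.Linear(8, 2)
criterion = nn.CrossEntropyLoss()
dataset = TensorDataset(torch.randn(10, 8), torch.randint(0, 2, (10,)))
dataloaders = {'train': DataLoader(dataset, batch_size=4)}
dataset_sizes = {'train': len(dataset)}

for phase in ['train']:
    running_loss = 0.0
    for inputs, labels in dataloaders[phase]:
        outputs = model(inputs)
        loss = criterion(outputs, labels)             # loss for this mini-batch
        running_loss += loss.item() * inputs.size(0)  # the line in question
    epoch_loss = running_loss / dataset_sizes[phase]
```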
Isn't `loss.item()` supposed to be the loss for an entire mini-batch (please correct me if I am wrong)? I.e., if the `batch_size` is 4, `loss.item()` would give the loss for the entire set of 4 images. If this is true, why is `loss.item()` being multiplied by `inputs.size(0)` when calculating `running_loss`? Isn't this step an extra multiplication in this case?
Any help would be appreciated. Thanks!
It's because the loss returned by `CrossEntropyLoss` (and most other loss functions) is averaged over the number of elements in the batch, i.e. the `reduction` parameter is `'mean'` by default:

```python
torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')
```
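You can verify the effect of `reduction` directly (a minimal check with arbitrary shapes and random data, purely for illustration):

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 10)            # batch of 4 samples, 10 classes
labels = torch.randint(0, 10, (4,))

mean_loss = nn.CrossEntropyLoss(reduction='mean')(logits, labels)
sum_loss = nn.CrossEntropyLoss(reduction='sum')(logits, labels)

# The default 'mean' loss is the summed per-sample loss divided by the batch size:
print(torch.isclose(mean_loss * 4, sum_loss))  # tensor(True)
```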
Hence, `loss.item()` contains the loss of the entire mini-batch, but divided by the batch size. That's why `loss.item()` is multiplied by the batch size, given by `inputs.size(0)`, when calculating `running_loss`.
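Multiplying by `inputs.size(0)` turns each per-batch mean back into a per-batch sum, so `running_loss / dataset_sizes[phase]` is the exact per-sample average over the whole epoch. This also handles a last batch that is smaller than `batch_size`, where simply averaging the per-batch means would weight it too heavily. A minimal sketch (a hypothetical dataset of 10 samples with batch size 4, so the last batch has only 2):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()   # reduction='mean' by default
logits = torch.randn(10, 5)         # 10 samples, 5 classes
labels = torch.randint(0, 5, (10,))

running_loss = 0.0
for i in range(0, 10, 4):           # batches of size 4, 4, and 2
    batch_logits, batch_labels = logits[i:i+4], labels[i:i+4]
    loss = criterion(batch_logits, batch_labels)       # mean over this batch
    running_loss += loss.item() * batch_logits.size(0) # back to a batch sum

epoch_loss = running_loss / 10
exact = criterion(logits, labels).item()  # mean over the full dataset
print(abs(epoch_loss - exact) < 1e-5)     # True
```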