During backpropagation, will these cases have a different effect?
My main doubt is not about the numerical values but about the effect each of these would have.
The difference between options 1 and 2 is basically this: since sum produces a larger loss value than mean, the magnitude of the gradients from the sum operation will be bigger, but their direction will be the same.
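For a concrete picture of that scaling, here is the per-element gradient of the squared-error loss under each reduction (standard calculus, with N the number of elements):
∂/∂x_i Σ_j (x_j − t_j)² = 2 (x_i − t_i) for sum
∂/∂x_i (1/N) Σ_j (x_j − t_j)² = (2/N) (x_i − t_i) for mean
So the mean gradient is exactly the sum gradient divided by N, which is the factor you'll see in the demonstration below.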
Here's a little demonstration. Let's first declare the necessary variables:
import torch

x = torch.tensor([4,1,3,7],dtype=torch.float32,requires_grad=True)
target = torch.tensor([4,2,5,4],dtype=torch.float32)
Now let's compute the gradient for x using an L2 loss with sum:
loss = ((x-target)**2).sum()
loss.backward()
print(x.grad)
This outputs: tensor([ 0., -2., -4., 6.])
Now using mean (after resetting x.grad):
x.grad = None  # reset the gradient left over from the previous backward pass
loss = ((x-target)**2).mean()
loss.backward()
print(x.grad)
And this outputs: tensor([ 0.0000, -0.5000, -1.0000, 1.5000])
Notice how the latter gradients are exactly 1/4th of those from sum; that's because the tensors here contain 4 elements.
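If you want to verify that relationship programmatically, here is a minimal self-contained sketch (grad_sum and grad_mean are names I'm introducing just for this check):
import torch

x = torch.tensor([4,1,3,7],dtype=torch.float32,requires_grad=True)
target = torch.tensor([4,2,5,4],dtype=torch.float32)

((x - target) ** 2).sum().backward()
grad_sum = x.grad.clone()   # gradient from the sum-reduced loss
x.grad = None               # reset before the second backward pass

((x - target) ** 2).mean().backward()
grad_mean = x.grad.clone()  # gradient from the mean-reduced loss

# the mean gradient should equal the sum gradient divided by the number of elements
print(torch.allclose(grad_mean, grad_sum / x.numel()))  # True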
About the third option, if I understand you correctly, that's not possible. You cannot backpropagate before aggregating the individual pixel errors into a scalar, using sum, mean, or anything else.
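For completeness, here is a small sketch of what happens if you try to call backward() directly on the per-pixel errors without reducing them to a scalar first; PyTorch refuses because it cannot implicitly create a gradient for a non-scalar output:
import torch

x = torch.tensor([4,1,3,7],dtype=torch.float32,requires_grad=True)
target = torch.tensor([4,2,5,4],dtype=torch.float32)

per_element_loss = (x - target) ** 2   # still a 4-element tensor, not a scalar
try:
    per_element_loss.backward()        # no reduction applied before backward
except RuntimeError as e:
    print(e)  # "grad can be implicitly created only for scalar outputs"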