So I have two matrices for which I need to calculate the Huber loss: my predictions x and my labels y. Their shapes are respectively
x.shape = torch.Size([16, 3])
y.shape = torch.Size([16, 3])
when I extract the non-zero values from the matrices with:
Z = x[y!=0]
Y = y[y!=0]
their shapes become torch.Size([47]) (they lose one element because the corresponding label was zero), and their contents are as follows:
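For clarity, here is a minimal sketch (with made-up toy tensors, not the real data) of what this masking step does: boolean-mask indexing flattens the 2-D tensor to 1-D and drops the positions where the label is zero:

```python
import torch

# Toy tensors illustrating the masking step (values are hypothetical).
x = torch.arange(6.0).reshape(2, 3)
y = torch.tensor([[1.0, 0.0, 2.0],
                  [3.0, 4.0, 0.0]])

Z = x[y != 0]   # boolean indexing flattens to 1-D
print(Z.shape)  # torch.Size([4]) -- the two zero-label slots are dropped
```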
Z:
tensor([-0.0458177179, 0.0794119537, 0.0132298730,
-0.0684378445, 0.0239270329, -0.0998859257,
-0.0052077156, -0.0251933523, 0.0673011318,
-0.0612106398, -0.0158440936, -0.0163329616,
-0.0501042753, -0.0480697751, 0.0425285585,
-0.0132632162, -0.0011744332, -0.0364661291,
-0.0337458402, -0.0496659800, 0.0198419951,
-0.0582559109, 0.0249673352, -0.0003011823,
-0.0630656779, -0.0028106403, -0.0478336737,
0.0248849690, -0.0248453412, -0.0383186191,
-0.0145248026, -0.0016013812, -0.0033918321,
-0.0392261222, -0.0121430475, -0.0213719532,
-0.0376458839, -0.0001880229, 0.0371584892,
0.0006527156, 0.0143199209, -0.0159018263,
-0.0016747080, -0.1677681506,
-0.1180625483, -0.0595825985, -0.0515905097], grad_fn=<IndexBackward0>)
and
Y:
tensor([35.7957916260, 21.6673698425, 9.7700004578,
34.3571853638, 22.8384342194, 12.3599996567,
32.5878181458, 19.4447669983, 17.7210159302,
34.8954925537, 21.9499092102, 12.2500000000,
31.5930919647, 22.8414421082, 9.0699996948,
33.4950599670, 22.7711448669, 10.4899997711,
34.4383850098, 21.3011169434, 12.8400001526,
36.1783103943, 20.6615600586, 10.4700002670,
35.6059494019, 21.8775119781, 12.0799999237,
32.5706367493, 25.7370777130, 10.4099998474,
34.3967170715, 22.7617359161, 8.9700002670,
35.6082382202, 22.0040664673, 9.6199998856,
34.3121032715, 22.4560070038, 10.2299995422,
34.3565788269, 20.7696933746, 10.2200002670,
38.5010910034, 23.5856552124,
34.1491165161, 22.1938228607, 9.5699996948])
loss = nn.functional.huber_loss(Z, Y)
when I calculate the loss from this I get the result tensor(22.3675804138, grad_fn=&lt;HuberLossBackward0&gt;), but for my needs I must calculate the loss for each of the three classes separately, which I do as follows:
loss1 = 0.0
for i in range(x.shape[1]):
    loss1 += nn.functional.huber_loss(x[y[:,i]!=0,i], y[y[:,i]!=0,i])
loss1 = loss1/x.shape[1]
First I take out any number that is zero (as done before), then I select each class, calculate the loss for each of the three classes, sum them, and divide by the number of classes, resulting in the loss tensor(22.1220340729, grad_fn=&lt;DivBackward0&gt;). The printed contents of the classes are as follows:
class 0 X:
tensor([-0.0458177179, -0.0684378445, -0.0052077156, -0.0612106398,
-0.0501042753, -0.0132632162, -0.0337458402, -0.0582559109,
-0.0630656779, 0.0248849690, -0.0145248026, -0.0392261222,
-0.0376458839, 0.0006527156, -0.0016747080, -0.1180625483],
grad_fn=<IndexBackward0>)
class 0 Y:
tensor([35.7957916260, 34.3571853638, 32.5878181458, 34.8954925537,
31.5930919647, 33.4950599670, 34.4383850098, 36.1783103943,
35.6059494019, 32.5706367493, 34.3967170715, 35.6082382202,
34.3121032715, 34.3565788269, 38.5010910034, 34.1491165161])
class 1 X:
tensor([ 0.0794119537, 0.0239270329, -0.0251933523, -0.0158440936,
-0.0480697751, -0.0011744332, -0.0496659800, 0.0249673352,
-0.0028106403, -0.0248453412, -0.0016013812, -0.0121430475,
-0.0001880229, 0.0143199209, -0.1677681506, -0.0595825985],
grad_fn=<IndexBackward0>)
class 1 Y:
tensor([21.6673698425, 22.8384342194, 19.4447669983, 21.9499092102,
22.8414421082, 22.7711448669, 21.3011169434, 20.6615600586,
21.8775119781, 25.7370777130, 22.7617359161, 22.0040664673,
22.4560070038, 20.7696933746, 23.5856552124, 22.1938228607])
class 2 X:
tensor([ 0.0132298730, -0.0998859257, 0.0673011318, -0.0163329616,
0.0425285585, -0.0364661291, 0.0198419951, -0.0003011823,
-0.0478336737, -0.0383186191, -0.0033918321, -0.0213719532,
0.0371584892, -0.0159018263, -0.0515905097],
grad_fn=<IndexBackward0>)
class 2 Y:
tensor([ 9.7700004578, 12.3599996567, 17.7210159302, 12.2500000000,
9.0699996948, 10.4899997711, 12.8400001526, 10.4700002670,
12.0799999237, 10.4099998474, 8.9700002670, 9.6199998856,
10.2299995422, 10.2200002670, 9.5699996948])
There is a difference of 0.2455463409 between the two losses even though all the numbers match. Does anyone know why this could be happening? Maybe some kind of rounding error? Or am I doing something wrong?
You're getting a different result because you're doing a different calculation: in the former you average across all elements in X and Y, while in the latter you average across classes. That means the elements are weighted differently, because not all classes have the same number of elements: classes 0 and 1 each have 16 surviving elements, but class 2 has only 15, so in the per-class average each class-2 element carries slightly more weight than the others.
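To see this concretely, here's a sketch with made-up tensors (not your data) showing that the mean of per-group Huber losses differs from the Huber loss over the concatenated tensors whenever the groups have unequal sizes, and that weighting each group's loss by its element count recovers the combined value:

```python
import torch
import torch.nn.functional as F

# Two hypothetical groups of unequal size (like your classes with 16/16/15 elements).
z0, y0 = torch.zeros(4), torch.full((4,), 10.0)
z1, y1 = torch.zeros(2), torch.full((2,), 40.0)

# One loss over everything: every ELEMENT is weighted equally.
combined = F.huber_loss(torch.cat([z0, z1]), torch.cat([y0, y1]))

# Mean of per-group losses: every GROUP is weighted equally.
per_group = (F.huber_loss(z0, y0) + F.huber_loss(z1, y1)) / 2

print(combined.item())   # 19.5  (= (4*9.5 + 2*39.5) / 6, with default delta=1.0)
print(per_group.item())  # 24.5  (= (9.5 + 39.5) / 2)

# Weighting each group by its element count reproduces the combined loss.
weighted = (F.huber_loss(z0, y0) * 4 + F.huber_loss(z1, y1) * 2) / 6
assert torch.isclose(weighted, combined)
```

So if you want the per-class breakdown but the same overall number as the single call, weight each class's loss by its surviving element count before dividing by the total count.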