So I have two matrices for which I need to calculate the Huber loss: my predictions x and my labels y. Their shapes are respectively
x.shape = torch.Size([16, 3])
y.shape = torch.Size([16, 3])
when I extract the non-zero values from the matrices with:
Z = x[y!=0]
Y = y[y!=0]
their shapes become torch.Size([47]) (they lose one element because the corresponding label was zero), and their contents are as follows:
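For clarity, here is a minimal sketch (with made-up toy tensors, not the real data) of what this masking step does: boolean-mask indexing flattens the 2-D tensor to 1-D and drops the positions where the label is zero:

```python
import torch

# Toy tensors illustrating the masking step (values are hypothetical).
x = torch.arange(6.0).reshape(2, 3)
y = torch.tensor([[1.0, 0.0, 2.0],
                  [3.0, 4.0, 0.0]])

Z = x[y != 0]   # boolean indexing flattens to 1-D
print(Z.shape)  # torch.Size([4]) -- the two zero-label slots are dropped
```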
Z:
tensor([-0.0458177179, 0.0794119537, 0.0132298730,
-0.0684378445, 0.0239270329, -0.0998859257,
-0.0052077156, -0.0251933523, 0.0673011318,
-0.0612106398, -0.0158440936, -0.0163329616,
-0.0501042753, -0.0480697751, 0.0425285585,
-0.0132632162, -0.0011744332, -0.0364661291,
-0.0337458402, -0.0496659800, 0.0198419951,
-0.0582559109, 0.0249673352, -0.0003011823,
-0.0630656779, -0.0028106403, -0.0478336737,
0.0248849690, -0.0248453412, -0.0383186191,
-0.0145248026, -0.0016013812, -0.0033918321,
-0.0392261222, -0.0121430475, -0.0213719532,
-0.0376458839, -0.0001880229, 0.0371584892,
0.0006527156, 0.0143199209, -0.0159018263,
-0.0016747080, -0.1677681506,
-0.1180625483, -0.0595825985, -0.0515905097], grad_fn=<IndexBackward0>)
and
Y:
tensor([35.7957916260, 21.6673698425, 9.7700004578,
34.3571853638, 22.8384342194, 12.3599996567,
32.5878181458, 19.4447669983, 17.7210159302,
34.8954925537, 21.9499092102, 12.2500000000,
31.5930919647, 22.8414421082, 9.0699996948,
33.4950599670, 22.7711448669, 10.4899997711,
34.4383850098, 21.3011169434, 12.8400001526,
36.1783103943, 20.6615600586, 10.4700002670,
35.6059494019, 21.8775119781, 12.0799999237,
32.5706367493, 25.7370777130, 10.4099998474,
34.3967170715, 22.7617359161, 8.9700002670,
35.6082382202, 22.0040664673, 9.6199998856,
34.3121032715, 22.4560070038, 10.2299995422,
34.3565788269, 20.7696933746, 10.2200002670,
38.5010910034, 23.5856552124,
34.1491165161, 22.1938228607, 9.5699996948])
loss = nn.functional.huber_loss(Z, Y)
when I calculate the loss from this I get the result tensor(22.3675804138, grad_fn=&lt;HuberLossBackward0&gt;), but for my needs I must calculate the loss for each of the three classes separately, which I do as follows:
loss1 = 0.0
for i in range(x.shape[1]):
    loss1 += nn.functional.huber_loss(x[y[:,i]!=0,i], y[y[:,i]!=0,i])
loss1 = loss1/x.shape[1]
First I take out any number that is zero (as done before), then I select each class, calculate the loss for each of the three classes, sum them, and divide by the number of classes, resulting in the loss tensor(22.1220340729, grad_fn=&lt;DivBackward0&gt;). The printed contents of the classes are as follows:
class 0 X:
tensor([-0.0458177179, -0.0684378445, -0.0052077156, -0.0612106398,
-0.0501042753, -0.0132632162, -0.0337458402, -0.0582559109,
-0.0630656779, 0.0248849690, -0.0145248026, -0.0392261222,
-0.0376458839, 0.0006527156, -0.0016747080, -0.1180625483],
grad_fn=<IndexBackward0>)
class 0 Y:
tensor([35.7957916260, 34.3571853638, 32.5878181458, 34.8954925537,
31.5930919647, 33.4950599670, 34.4383850098, 36.1783103943,
35.6059494019, 32.5706367493, 34.3967170715, 35.6082382202,
34.3121032715, 34.3565788269, 38.5010910034, 34.1491165161])
class 1 X:
tensor([ 0.0794119537, 0.0239270329, -0.0251933523, -0.0158440936,
-0.0480697751, -0.0011744332, -0.0496659800, 0.0249673352,
-0.0028106403, -0.0248453412, -0.0016013812, -0.0121430475,
-0.0001880229, 0.0143199209, -0.1677681506, -0.0595825985],
grad_fn=<IndexBackward0>)
class 1 Y:
tensor([21.6673698425, 22.8384342194, 19.4447669983, 21.9499092102,
22.8414421082, 22.7711448669, 21.3011169434, 20.6615600586,
21.8775119781, 25.7370777130, 22.7617359161, 22.0040664673,
22.4560070038, 20.7696933746, 23.5856552124, 22.1938228607])
class 2 X:
tensor([ 0.0132298730, -0.0998859257, 0.0673011318, -0.0163329616,
0.0425285585, -0.0364661291, 0.0198419951, -0.0003011823,
-0.0478336737, -0.0383186191, -0.0033918321, -0.0213719532,
0.0371584892, -0.0159018263, -0.0515905097],
grad_fn=<IndexBackward0>)
class 2 Y:
tensor([ 9.7700004578, 12.3599996567, 17.7210159302, 12.2500000000,
9.0699996948, 10.4899997711, 12.8400001526, 10.4700002670,
12.0799999237, 10.4099998474, 8.9700002670, 9.6199998856,
10.2299995422, 10.2200002670, 9.5699996948])
There is a difference of 0.2455463409 between the two losses even though all the numbers match. Does anyone know why this could be happening? Maybe some kind of rounding error? Or am I doing something wrong?
You're getting a different result because you're doing a different calculation: in the former you average across all elements in X and Y, while in the latter you average across classes. That means the elements are weighted differently, because not all classes have the same number of elements: classes 0 and 1 each have 16 surviving elements, but class 2 has only 15, so in the per-class average each class-2 element carries slightly more weight than the others.
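To see this concretely, here's a sketch with made-up tensors (not your data) showing that the mean of per-group Huber losses differs from the Huber loss over the concatenated tensors whenever the groups have unequal sizes, and that weighting each group's loss by its element count recovers the combined value:

```python
import torch
import torch.nn.functional as F

# Two hypothetical groups of unequal size (like your classes with 16/16/15 elements).
z0, y0 = torch.zeros(4), torch.full((4,), 10.0)
z1, y1 = torch.zeros(2), torch.full((2,), 40.0)

# One loss over everything: every ELEMENT is weighted equally.
combined = F.huber_loss(torch.cat([z0, z1]), torch.cat([y0, y1]))

# Mean of per-group losses: every GROUP is weighted equally.
per_group = (F.huber_loss(z0, y0) + F.huber_loss(z1, y1)) / 2

print(combined.item())   # 19.5  (= (4*9.5 + 2*39.5) / 6, with default delta=1.0)
print(per_group.item())  # 24.5  (= (9.5 + 39.5) / 2)

# Weighting each group by its element count reproduces the combined loss.
weighted = (F.huber_loss(z0, y0) * 4 + F.huber_loss(z1, y1) * 2) / 6
assert torch.isclose(weighted, combined)
```

So if you want the per-class breakdown but the same overall number as the single call, weight each class's loss by its surviving element count before dividing by the total count.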