For nn.CrossEntropyLoss(), what happens if the input labels are all ignore_index? I get a 'nan' loss. How can I fix it? Thanks!
The details of my code are below [screenshot: code]; note that IGNORE_INDEX is -100, and loss_ent then becomes 'nan'. The inputs passed to criterion() are shown here: [screenshot: criterion's input].
My Python environment is as follows:
Thanks for your attention. Please let me know if there is anything else I need to provide.
There seem to be two possibilities. First, if every label in the batch equals ignore_index, there are no valid targets to average over; the default reduction='mean' then divides zero by zero and the loss comes out as 'nan':
import torch
import torch.nn as nn

x = torch.randn(5, 10, requires_grad=True)   # logits: 5 samples, 10 classes
y = torch.ones(5).long() * (-100)            # every label is ignore_index
criterion = nn.CrossEntropyLoss(ignore_index=-100)
loss = criterion(x, y)
print(loss)
# tensor(nan, grad_fn=<NllLossBackward0>)
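If you just need to avoid the nan, one quick workaround (a sketch built on the snippet above, not from the original code) is to skip the loss computation whenever every label in the batch is ignored:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(ignore_index=-100)
x = torch.randn(5, 10, requires_grad=True)
y = torch.ones(5).long() * (-100)  # all labels ignored in this example

# Compute and backpropagate only when at least one label is valid,
# so a nan loss never enters training.
if (y != -100).any():
    loss = criterion(x, y)
    loss.backward()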
However, if even a single label is valid (i.e. not equal to ignore_index), the loss is computed properly:
import torch
import torch.nn as nn

x = torch.randn(5, 10, requires_grad=True)
y = torch.ones(5).long() * (-100)
y[0] = 1                                     # a single valid label is enough
criterion = nn.CrossEntropyLoss(ignore_index=-100)
loss = criterion(x, y)
print(loss)
# tensor(1.2483, grad_fn=<NllLossBackward0>)
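If skipping batches is not an option, another workaround (a sketch; safe_cross_entropy is a hypothetical helper, not part of PyTorch) is to sum the per-element losses and divide by the clamped count of valid labels, so an all-ignored batch yields 0.0 instead of 0/0:

import torch
import torch.nn.functional as F

IGNORE_INDEX = -100

def safe_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    # reduction="sum" contributes 0 for ignored elements, so an
    # all-ignored batch sums to 0.0 rather than nan.
    loss_sum = F.cross_entropy(logits, labels,
                               ignore_index=ignore_index, reduction="sum")
    # Divide by the number of valid labels, clamped to at least 1,
    # so the division can never be 0/0.
    num_valid = (labels != ignore_index).sum().clamp(min=1)
    return loss_sum / num_valid

x = torch.randn(5, 10, requires_grad=True)
y = torch.full((5,), IGNORE_INDEX, dtype=torch.long)
print(safe_cross_entropy(x, y))  # zero loss instead of nan

The zero loss also backpropagates zero gradients for such a batch, which is usually the intended behavior.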
The second possibility is the learning rate: if it is too large, the gradient gradually increases from step to step and training diverges.
How about reducing the learning rate for the part of the model that produces is_gold_ent?
When reducing the learning rate, a common heuristic is to cut it to roughly a third of its current value each time, e.g. 0.01, 0.003, 0.001, ...
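Here is a minimal sketch of applying a smaller learning rate to just one part of the model via optimizer parameter groups; backbone and ent_head are hypothetical stand-ins for the rest of your model and the layer that produces is_gold_ent:

import torch
import torch.nn as nn

# Hypothetical modules: ent_head stands in for the layer whose
# output is is_gold_ent; backbone stands in for everything else.
backbone = nn.Linear(128, 64)
ent_head = nn.Linear(64, 2)

optimizer = torch.optim.Adam([
    {"params": backbone.parameters(), "lr": 0.01},   # base learning rate
    {"params": ent_head.parameters(), "lr": 0.003},  # ~1/3 of the base lr
])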