
pytorch, for the cross_entropy function, What if the input labels are all ignore_index?


For nn.CrossEntropyLoss(), what happens if the input labels are all ignore_index? I get a 'nan' loss. How can I fix it? Thanks!

The details of my code are as follows (screenshots of the code and of criterion()'s inputs omitted): note that IGNORE_INDEX is -100, and loss_ent becomes 'nan'.

My Python environment is as follows:

  • Python: 3.7.11
  • torch: 1.12.0+cu116
  • GPU: NVIDIA A100 80G

Thanks for your attention. Please let me know if there is anything else I need to provide.


Solution

  • There seem to be two possibilities.

    1. Here is some code to help understand ignore_index. If you set all labels to the ignore index, the criterion outputs nan because there is nothing left to average over (one way to guard against this case is sketched after this list).
    import torch
    import torch.nn as nn
    
    x = torch.randn(5, 10, requires_grad=True)
    y = torch.ones(5).long() * (-100)  # every label is the ignore index
    
    criterion = nn.CrossEntropyLoss(ignore_index=-100)
    loss = criterion(x, y)
    print(loss)
    
    tensor(nan, grad_fn=<NllLossBackward0>)
    
    

    However, if there is even one label to compute, the loss is calculated properly.

    import torch
    import torch.nn as nn
    
    x = torch.randn(5, 10, requires_grad=True)
    y = torch.ones(5).long() * (-100)
    y[0] = 1  # one real label remains, so the mean is well defined
    
    criterion = nn.CrossEntropyLoss(ignore_index=-100)
    loss = criterion(x, y)
    print(loss)
    
    tensor(1.2483, grad_fn=<NllLossBackward0>)
    
    
    2. It may be that the learning rate is too large, causing training to diverge.

    As you can see in the figure below, if the learning rate is too large, the gradient gradually increases and the loss diverges.

    How about reducing the learning rate for the parameters of the part of the model that produces is_gold_ent?

    Usually, when reducing the learning rate, reduce it to about 1/3 each time, e.g. 0.01, 0.003, 0.001, ... (a per-parameter-group sketch follows this list).

    [figure: loss curves diverging when the learning rate is too large]
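
For the first possibility: if your batches can legitimately end up with every label equal to ignore_index, one option is to guard the criterion call yourself. This is a minimal sketch, not the asker's actual code; safe_cross_entropy is a hypothetical helper name.

    import torch
    import torch.nn as nn
    
    IGNORE_INDEX = -100
    criterion = nn.CrossEntropyLoss(ignore_index=IGNORE_INDEX)
    
    def safe_cross_entropy(logits, targets):
        # Hypothetical wrapper: if every label is ignored, there is nothing to
        # average over, so return a zero loss that stays on the graph instead of nan.
        if (targets != IGNORE_INDEX).any():
            return criterion(logits, targets)
        return logits.sum() * 0.0  # zero loss with a grad_fn, so backward() still works
    
    x = torch.randn(5, 10, requires_grad=True)
    y = torch.full((5,), IGNORE_INDEX, dtype=torch.long)
    print(safe_cross_entropy(x, y))  # tensor(0., grad_fn=<MulBackward0>)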
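
For the second possibility: PyTorch optimizers accept per-parameter-group learning rates, so the head that produces is_gold_ent can be slowed down without touching the rest of the model. A minimal sketch with a stand-in model; encoder and ent_head are hypothetical module names.

    import torch
    import torch.nn as nn
    
    # Stand-in model: 'ent_head' represents whatever module outputs is_gold_ent.
    model = nn.ModuleDict({
        "encoder": nn.Linear(16, 16),
        "ent_head": nn.Linear(16, 2),
    })
    
    # Give the entity head a learning rate ~1/3 of the base rate,
    # following the 0.01 -> 0.003 -> 0.001 rule of thumb above.
    optimizer = torch.optim.SGD(
        [
            {"params": model["encoder"].parameters()},
            {"params": model["ent_head"].parameters(), "lr": 0.003},
        ],
        lr=0.01,
    )
    print([g["lr"] for g in optimizer.param_groups])  # [0.01, 0.003]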