Getting nan as loss value

I have implemented focal loss in PyTorch using this paper, but I ran into a problem: the loss function value is nan.

This is my implementation of focal loss:

import torch

def focal_loss(y_real, y_pred, gamma = 2):
    # y_pred holds raw logits; convert them to probabilities first
    y_pred = torch.sigmoid(y_pred)
    return -torch.sum((1 - y_pred)**gamma * y_real * torch.log(y_pred) +
                       y_pred**gamma * (1 - y_real) * torch.log(1 - y_pred))
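One likely source of the nan is float32 saturation. For a large logit, torch.sigmoid rounds to exactly 1.0, so torch.log(1 - y_pred) becomes log(0) = -inf. For a positive pixel the weight y_pred**gamma * (1 - y_real) is 0, and in IEEE arithmetic 0 * -inf is nan. The snippet below isolates that arithmetic; the logit value 40.0 is an illustrative assumption, not taken from the post:

```python
import torch

logit = torch.tensor(40.0)     # a confident logit (illustrative value)
p = torch.sigmoid(logit)       # rounds to exactly 1.0 in float32
log_term = torch.log(1 - p)    # log(0) -> -inf
weighted = 0.0 * log_term      # (1 - y_real) == 0 for a positive pixel: 0 * -inf -> nan
print(p.item(), log_term.item(), weighted.item())  # 1.0 -inf nan
```

The same thing happens on the other branch when a very negative logit makes the sigmoid underflow to 0.0 and torch.log(y_pred) is multiplied by y_real == 0.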

I believe my training loop and SegNet are working, because I have tested them with Dice and BCE losses.

I think the error occurs during backpropagation. Why could this happen? Is my implementation wrong?
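A quick way to tell whether the problem is in the forward pass or in backprop is to check the loss value with torch.isnan before calling backward(), and then run backward() inside torch.autograd.detect_anomaly(), which raises a RuntimeError at the first backward op that returns nan. This is a sketch assuming a hypothetical two-element batch where one logit saturates the sigmoid; the loss function is the question's, unchanged:

```python
import torch

def focal_loss(y_real, y_pred, gamma=2):
    # the question's implementation, unchanged
    y_pred = torch.sigmoid(y_pred)
    return -torch.sum((1 - y_pred)**gamma * y_real * torch.log(y_pred) +
                      y_pred**gamma * (1 - y_real) * torch.log(1 - y_pred))

# hypothetical batch: the first logit is large enough to saturate the sigmoid
logits = torch.tensor([40.0, -3.0], requires_grad=True)
targets = torch.tensor([1.0, 0.0])

# step 1: inspect the forward value before calling backward()
loss = focal_loss(targets, logits)
forward_is_nan = torch.isnan(loss).item()
print("forward loss is nan:", forward_is_nan)

# step 2: anomaly mode raises at the first backward op that returns nan
anomaly_message = None
try:
    with torch.autograd.detect_anomaly():
        focal_loss(targets, logits).backward()
except RuntimeError as err:
    anomaly_message = str(err)
print(anomaly_message)
```

In this case the forward value is already nan, so the backward pass only propagates a problem that exists before backprop starts.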

Solution

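This version works, because it computes the cross-entropy term directly from the logits instead of taking the log of a sigmoid output: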

import torch
import torch.nn.functional as F

def focal_loss(y_real, y_pred, eps = 1e-8, gamma = 0):
    probabilities = torch.clamp(torch.sigmoid(y_pred), min=eps, max=1-eps)
    # BCE written in terms of logits; softplus(-x) == log(1 + exp(-x)) without overflow
    return torch.mean((1 - probabilities)**gamma *
           (y_pred - y_real * y_pred + F.softplus(-y_pred)))
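Note that the working version multiplies both classes by (1 - probabilities)**gamma, whereas focal loss weights each element by (1 - p_t)**gamma, where p_t is the predicted probability of the true class (p for positives, 1 - p for negatives). The original question's code had that weighting right. With the default gamma = 0 the working version is plain binary cross-entropy. Below is a minimal sketch that keeps the p_t weighting but takes the log from the logits, assuming y_real holds 0/1 floats with the same shape as y_pred. The name stable_focal_loss and the optional alpha class weight are illustrative, not from the original post:

```python
import torch
import torch.nn.functional as F

def stable_focal_loss(y_real, y_pred, gamma=2.0, alpha=None):
    """Binary focal loss from raw logits y_pred (hypothetical helper, mean-reduced)."""
    # Per-element BCE computed from logits; PyTorch evaluates it in a numerically
    # stable form, so saturated logits never reach log(0). This equals -log(p_t).
    ce = F.binary_cross_entropy_with_logits(y_pred, y_real, reduction="none")
    p = torch.sigmoid(y_pred)
    # p_t: predicted probability of the true class (p for positives, 1 - p for negatives)
    p_t = p * y_real + (1 - p) * (1 - y_real)
    loss = (1 - p_t) ** gamma * ce
    if alpha is not None:
        # optional class weight: alpha for positives, 1 - alpha for negatives
        alpha_t = alpha * y_real + (1 - alpha) * (1 - y_real)
        loss = alpha_t * loss
    return loss.mean()
```

With gamma = 0 and alpha = None this reduces to the mean BCE. If torchvision is available, torchvision.ops.sigmoid_focal_loss(inputs, targets, alpha=0.25, gamma=2, reduction="none") implements the same idea.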
