I am trying to create a custom loss function for a binary classification case. I need the binary predictions as an input to the function; however, I am unable to make the process differentiable. I get the raw output from the model, which has autograd attached to it:
outputs = tensor([[-0.1908, 0.4115],
[-1.0019, -0.1685],
[-1.1265, -0.3025],
[-0.5925, -0.6610],
[-0.4076, -0.4897],
[-0.6450, -0.2863],
[ 0.1632, 0.4944],
[-1.0743, 0.1003],
[ 0.6172, 0.5104],
[-0.2296, -0.0551],
[-1.3165, 0.3386],
[ 0.2705, 0.1200],
[-1.3767, -0.6496],
[-0.5603, 1.0609],
[-0.0109, 0.5767],
[-1.1081, 0.8886]], grad_fn=<AddmmBackward0>)
Then I take the predictions from it using:
_, preds = torch.max(outputs, 1)
However, when I take a look at the preds variable, the grad function is gone:
preds = tensor([0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0])
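A quick check (added here for illustration) shows why: torch.max(outputs, 1) returns the max values and their indices, and the indices are plain integers with no autograd history.

print(preds.dtype)          # torch.int64 -- argmax returns integer indices
print(preds.requires_grad)  # False
print(preds.grad_fn)        # None -- no connection to the graph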
#labels
labels: tensor([0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1])
The preds variable goes as input to the custom loss function. My question is: is there a way to get the preds variable with autograd attached to it, so that it can be differentiated? I get a warning when I manually attach autograd to the preds variable.
#Custom loss function
def pfbeta_torch(preds, labels, beta=1.3):
    #labels = torch.tensor(labels.clone().detach(), dtype=torch.float64, requires_grad=True)
    preds = torch.tensor(preds.clone(), dtype=torch.float64, requires_grad=True)
    pTP = torch.sum(labels * preds)
    pFP = torch.sum((1 - labels) * preds)
    num_positives = torch.sum(labels)  # = pTP + pFN
    pPrecision = pTP / (pTP + pFP)
    pRecall = pTP / num_positives
    beta_squared = beta ** 2
    if pPrecision > 0 and pRecall > 0:
        pF1 = (1 + beta_squared) * pPrecision * pRecall / (beta_squared * pPrecision + pRecall)
        return pF1
    else:
        return torch.tensor(0, dtype=torch.float64, requires_grad=True)
#Warning
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:3: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
This is separate from the ipykernel package so we can avoid doing imports until
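For context, the warning hints at the real problem: wrapping a tensor in torch.tensor(...) creates a brand-new leaf tensor that is detached from the computation graph, so gradients can no longer flow back to the model weights. A minimal standalone sketch (not from the original post) illustrates this:

import torch

x = torch.randn(3, requires_grad=True)
y = x * 2                                # part of the autograd graph
z = torch.tensor(y.clone(), dtype=torch.float64, requires_grad=True)  # reproduces the same UserWarning
print(y.grad_fn)   # <MulBackward0 ...> -- still connected to x
print(z.grad_fn)   # None -- z is a fresh leaf; backward() through z never reaches x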
Following the new info, I used the function below.
import torch
import torch.nn.functional as F

def pfbeta_torch(outputs, labels, beta=1.3):
    logits = F.softmax(outputs, dim=-1)
    outputs = F.gumbel_softmax(logits, tau=1, hard=True)  # hard one-hot with a straight-through gradient
    pTP = torch.sum(labels * outputs[:, 1])
    pFP = torch.sum((1 - labels) * outputs[:, 1])
    num_positives = torch.sum(labels)  # = pTP + pFN
    pPrecision = pTP / (pTP + pFP)
    pRecall = pTP / num_positives
    beta_squared = beta ** 2
    if pPrecision > 0 and pRecall > 0:
        pF1 = (1 + beta_squared) * pPrecision * pRecall / (beta_squared * pPrecision + pRecall)
        return pF1
    else:
        return torch.tensor(0, dtype=torch.float64, requires_grad=True)
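(Side note, added for illustration: per the PyTorch docs, F.gumbel_softmax expects unnormalized log-probabilities as its logits argument, so applying softmax first double-normalizes the inputs. A closer-to-spec call would be:)

# Hypothetical correction: pass the raw model outputs (unnormalized
# log-probabilities) straight into gumbel_softmax.
hard_preds = F.gumbel_softmax(outputs, tau=1, hard=True)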
Printing the loss values:
tensor(0., dtype=torch.float64, requires_grad=True)
tensor(0.5000, grad_fn=<DivBackward0>)
tensor(0.5000, grad_fn=<DivBackward0>)
tensor(0.4432, grad_fn=<DivBackward0>)
tensor(0., dtype=torch.float64, requires_grad=True)
tensor(0., dtype=torch.float64, requires_grad=True)
tensor(0., dtype=torch.float64, requires_grad=True)
tensor(0.4432, grad_fn=<DivBackward0>)
tensor(0., dtype=torch.float64, requires_grad=True)
tensor(0., dtype=torch.float64, requires_grad=True)
tensor(0.7610, grad_fn=<DivBackward0>)
tensor(0., dtype=torch.float64, requires_grad=True)
tensor(0., dtype=torch.float64, requires_grad=True)
tensor(0.8433, grad_fn=<DivBackward0>)
tensor(0.5000, grad_fn=<DivBackward0>)
tensor(0.6142, grad_fn=<DivBackward0>)
tensor(0.4432, grad_fn=<DivBackward0>)
tensor(0., dtype=torch.float64, requires_grad=True)
tensor(0.6142, grad_fn=<DivBackward0>)
tensor(0., dtype=torch.float64, requires_grad=True)
tensor(0.6667, grad_fn=<DivBackward0>)
tensor(1., grad_fn=<DivBackward0>)
tensor(0., dtype=torch.float64, requires_grad=True)
tensor(0.4432, grad_fn=<DivBackward0>)
tensor(0., dtype=torch.float64, requires_grad=True)
When I train the same model with the standard loss function (nn.CrossEntropyLoss()), it gives 94% accuracy. Hence, I believe the custom loss function is not implemented properly. Would anyone be able to help me in this regard, please? Thanks & Best Regards, AMJS
Argmax, which is what torch.max(outputs, 1) gives you in the indices, has a derivative of 0 everywhere except at the transition point, where it is undefined. For this reason, implementing exactly what you are asking for is impossible. That said, there are tricks to work around it. If you are fine with the outputs being relaxed, you can use preds = outputs.softmax(dim=1). Based on your example code, it seems you are implementing something close to the Jaccard index, and this is the approach I would suggest. If you really need the predictions to be discrete, you can use hard Gumbel-softmax or straight-through estimators, but those are rather advanced topics and I would recommend against them unless you know what you are doing.
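To make the relaxed approach concrete, here is a minimal sketch: it feeds the class-1 probabilities from softmax into the same probabilistic F-beta computation, so everything stays differentiable. The eps terms are my addition (to avoid division by zero and the non-differentiable branch that returned a detached zero tensor), as is the sign flip at the end, since an optimizer minimizes the loss while F-beta should be maximized:

import torch
import torch.nn.functional as F

def pfbeta_soft_loss(outputs, labels, beta=1.3, eps=1e-8):
    # Relaxed predictions: probability of class 1, still attached to the graph.
    probs = F.softmax(outputs, dim=1)[:, 1]
    pTP = torch.sum(labels * probs)
    pFP = torch.sum((1 - labels) * probs)
    num_positives = torch.sum(labels)  # = pTP + pFN
    precision = pTP / (pTP + pFP + eps)
    recall = pTP / (num_positives + eps)
    beta_squared = beta ** 2
    fbeta = (1 + beta_squared) * precision * recall / (beta_squared * precision + recall + eps)
    return 1.0 - fbeta  # minimizing 1 - F-beta maximizes F-beta

Because every operation here is differentiable, the returned loss carries a grad_fn and loss.backward() propagates to the model weights; there is no need to re-wrap anything in torch.tensor.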