Currently trying to implement a custom loss function for linear regression with the following logic:

* If the output value of the model is greater than or equal to the target, return the loss as (output - target).
* If the output value of the model is less than the target, return the loss as (target - output)^2.
import torch.nn as nn

class E_Loss(nn.Module):
    def __init__(self, weight=None, size_average=True):
        super(E_Loss, self).__init__()

    def forward(self, inputs, targets, smooth=1):
        inputs = inputs.view(-1)
        targets = targets.view(-1)
        is_greater = torch.gt(inputs, outputs)
        print(is_greater)
        if is_greater:  # torch.gt(inputs, targets):
            loss = (inputs - targets)
        else:
            loss = np.square(targets - outputs)
        return loss
When running this with my model during training, I get this error at the loss.backward() step: RuntimeError: grad can be implicitly created only for scalar outputs
Assuming it wants a scalar output, how can I rewrite my loss function to produce this? Would it be easier to rewrite my code to not use a dataloader?
train_df, test_df = train_test_split(df, test_size=0.4)

train_dataset = FeatureDataset(train_df)
test_dataset = FeatureDataset(test_df)

# setup dataloaders
train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=16, shuffle=False)

eloss = E_Loss()
criterion = eloss
model = linearRegression(16, 1)
learningRate = 0.01
optimizer = torch.optim.SGD(model.parameters(), lr=learningRate, weight_decay=0.05)
h_loss = []
epochs = 100

for epoch in range(epochs):
    running_loss = 0.0
    for i, (x, y) in enumerate(train_dataloader):
        optimizer.zero_grad()
        # clear gradients each step so they aren't cumulative
        outputs = model(x)
        # get current output from model for comparison
        loss = criterion(outputs, y)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-39-fb6074e17ffb> in <module>
---> 68 loss.backward()
69 optimizer.step()
70 running_loss += loss.item()
2 frames
/usr/local/lib/python3.8/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
486 inputs=inputs,
487 )
--> 488 torch.autograd.backward(
489 self, gradient, retain_graph, create_graph, inputs=inputs
490 )
/usr/local/lib/python3.8/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
188
189 grad_tensors_ = _tensor_or_tensors_to_tuple(grad_tensors, len(tensors))
--> 190 grad_tensors_ = _make_grads(tensors, grad_tensors_, is_grads_batched=False)
191 if retain_graph is None:
192 retain_graph = create_graph
/usr/local/lib/python3.8/dist-packages/torch/autograd/__init__.py in _make_grads(outputs, grads, is_grads_batched)
83 if out.requires_grad:
84 if out.numel() != 1:
---> 85 raise RuntimeError("grad can be implicitly created only for scalar outputs")
86 new_grads.append(torch.ones_like(out, memory_format=torch.preserve_format))
87 else:
RuntimeError: grad can be implicitly created only for scalar outputs
The code you posted doesn't really make sense, for a few reasons:

* The line is_greater = torch.gt(inputs, outputs) uses a variable outputs that isn't defined.
* torch.gt is an element-wise greater-than operation, so it doesn't make sense to use its result in a conditional statement unless both inputs and outputs are scalars. Since your batch size is 64, you should have gotten an exception RuntimeError: Boolean value of tensor with more than one value is ambiguous.
* You're calling np.square on a torch tensor. This is ambiguous and may or may not work depending on how it's implemented under the hood. Use PyTorch functions with PyTorch tensors. Tensors support most Python operators, so just use either x**2 or x*x to do an element-wise square.

The error you posted indicates that you were able to get the loss function code to run despite the apparent errors above, but that backpropagation failed because the loss isn't a scalar value. The code you posted doesn't run for me, but it also doesn't do any form of mean-reduction, so assuming the issues above were addressed and it actually ran, I would have expected you to encounter exactly this error. It occurs because Tensor.backward requires the tensor to be scalar-valued, i.e. the loss should be a single number. Most often, this is accomplished by averaging the loss over the entire batch.
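To see the difference concretely, here is a minimal sketch (independent of your model) of a non-scalar loss versus a mean-reduced one:

import torch

pred = torch.randn(64, requires_grad=True)
target = torch.randn(64)

per_sample = (pred - target) ** 2   # shape (64,): one loss value per sample
# per_sample.backward()             # would raise: grad can be implicitly created only for scalar outputs
per_sample.mean().backward()        # reduced to a scalar, so backward() works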
To address the loss function implementation, the function you describe is easier to implement if you consider it to be a function of output - target. Since output >= target is equivalent to output - target >= 0, then letting x = output - target, we just want a function that is equal to x when x is non-negative and x**2 otherwise.
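As a sanity check, that piecewise definition can be written down directly with torch.where (just a sketch to make the definition concrete; e_loss_where is a made-up name, not part of the solution below):

import torch

def e_loss_where(outputs, targets):
    x = outputs.flatten() - targets.flatten()
    # keep x where x >= 0, use x**2 otherwise, then average over the batch
    return torch.where(x >= 0, x, x ** 2).mean()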
This can be achieved in different ways, but one easy way is to recognize that Tensor.relu can be used to decompose a value into its positive and negative parts, x = relu(x) + (-relu(-x)). Using this identity, it's hopefully clear that what you want is loss = relu(x) + relu(-x)**2. Fixing the mean-reduction problem as well, a working version of your loss function could be:
import torch.nn as nn

class E_Loss(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, outputs, targets):
        x = outputs.flatten() - targets.flatten()
        # relu(x) gives (output - target) where output >= target,
        # relu(-x)**2 gives (target - output)**2 where output < target
        return (x.relu() + ((-x).relu()) ** 2).mean()
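As a quick check with made-up numbers (a sketch, using the class above):

import torch

criterion = E_Loss()
outputs = torch.tensor([3.0, 1.0], requires_grad=True)
targets = torch.tensor([1.0, 3.0])

# per-element losses are (3 - 1) = 2 and (3 - 1)**2 = 4, so the mean is 3
loss = criterion(outputs, targets)
print(loss)        # tensor(3., grad_fn=...)
loss.backward()    # succeeds because loss is a scalar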