
Produce Scalar output for Custom Loss Function with different formula depending on difference between output and target


Currently trying to implement a custom loss function for linear regression with the following logic:

  • If the output value of the model is greater than or equal to the target, return the loss as (output - target).
  • If the output value of the model is less than the target, return the loss as (target - output)^2.

Here is my current implementation:

import torch.nn as nn
class E_Loss(nn.Module):
    def __init__(self, weight=None, size_average=True):
      super(E_Loss, self).__init__()
    def forward(self, inputs, targets, smooth=1):
      inputs = inputs.view(-1)
      targets = targets.view(-1)
      is_greater = torch.gt(inputs, outputs)
      print(is_greater)
      if is_greater: #torch.gt(inputs, targets):
        loss = (inputs - targets)
      else:
        loss = np.square(targets - outputs)
      return loss

When running with my model for training, I get this error on my loss.backward() step: RuntimeError: grad can be implicitly created only for scalar outputs

Assuming it wants a scalar output, how can I rewrite my loss function to produce this? Would it be easier to rewrite my code to not use a dataloader?

Below is the entire model section, followed by the full error traceback:

train_df, test_df = train_test_split(df, test_size=0.4)
train_dataset = FeatureDataset(train_df)
test_dataset = FeatureDataset(test_df)
train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=16, shuffle=False)
#setup dataloader

eloss = E_Loss()
criterion = eloss
model = linearRegression(16, 1)
learningRate = 0.01
optimizer = torch.optim.SGD(model.parameters(), lr=learningRate, weight_decay=0.05)

h_loss = []
epochs = 100

for epoch in range(epochs):
  running_loss = 0.0
  for i, (x, y) in enumerate(train_dataloader):
    optimizer.zero_grad()
    # clear gradients before each batch so they aren't cumulative
    outputs = model(x)
    #get current output from model for comparison
    loss = criterion(outputs, y)
    loss.backward()
    optimizer.step()
    running_loss += loss.item()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-39-fb6074e17ffb> in <module>
---> 68     loss.backward()
     69     optimizer.step()
     70     running_loss += loss.item()

2 frames
/usr/local/lib/python3.8/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    486                 inputs=inputs,
    487             )
--> 488         torch.autograd.backward(
    489             self, gradient, retain_graph, create_graph, inputs=inputs
    490         )

/usr/local/lib/python3.8/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    188 
    189     grad_tensors_ = _tensor_or_tensors_to_tuple(grad_tensors, len(tensors))
--> 190     grad_tensors_ = _make_grads(tensors, grad_tensors_, is_grads_batched=False)
    191     if retain_graph is None:
    192         retain_graph = create_graph

/usr/local/lib/python3.8/dist-packages/torch/autograd/__init__.py in _make_grads(outputs, grads, is_grads_batched)
     83             if out.requires_grad:
     84                 if out.numel() != 1:
---> 85                     raise RuntimeError("grad can be implicitly created only for scalar outputs")
     86                 new_grads.append(torch.ones_like(out, memory_format=torch.preserve_format))
     87             else:

RuntimeError: grad can be implicitly created only for scalar outputs

Solution

  • The code you posted doesn't really make sense, because:

    • The line is_greater = torch.gt(inputs, outputs) uses a variable outputs that isn't defined.
    • torch.gt is an element-wise greater-than operation, and it doesn't make sense to use its result in a conditional statement unless both tensors are scalars. Since your batch size is 64, you should have gotten the exception RuntimeError: Boolean value of Tensor with more than one element is ambiguous.
    • You are calling the NumPy function np.square directly on a torch tensor. This is ambiguous and may or may not work depending on how it's implemented under the hood. Use PyTorch functions with PyTorch tensors. Tensors support most Python operators, so just use x**2 or x*x to compute an element-wise square. (A short sketch of both points follows this list.)
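
    For illustration, a minimal sketch of both points (the tensor values here are made up, not from your data):

    import torch

    inputs = torch.tensor([1.0, 3.0])
    targets = torch.tensor([2.0, 2.0])

    mask = torch.gt(inputs, targets)   # element-wise comparison -> tensor([False, True])
    # Using the multi-element boolean tensor in `if mask:` would raise
    # RuntimeError: Boolean value of Tensor with more than one element is ambiguous

    squared = (targets - inputs) ** 2  # element-wise square that stays a torch tensor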

    The error you posted indicates that you were able to get the loss function code to run despite the apparent errors, but that backpropagation failed because the loss isn't a scalar value. The code you posted doesn't run for me, but it also doesn't do any form of mean reduction, so assuming the issues above were addressed and it actually ran, I would have expected exactly this error. It occurs because Tensor.backward requires the tensor to be scalar-valued, i.e. loss should be a single number. Most often, this is accomplished by averaging the loss over the entire batch, as sketched below.
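
    A rough illustration of the difference (the shapes are assumptions, not your actual data):

    import torch

    outputs = torch.randn(64, 1, requires_grad=True)
    targets = torch.randn(64, 1)

    per_element = (outputs - targets) ** 2  # shape (64, 1): .backward() on this fails
    loss = per_element.mean()               # scalar: .backward() can create the grad implicitly
    loss.backward()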

    To address the loss function implementation: the function you describe is easier to implement if you consider it as a function of output - target. Since output >= target is equivalent to output - target >= 0, letting x = output - target means we just want a function that equals x when x is non-negative and x**2 otherwise.

    This can be achieved in different ways, but one easy way is to recognize that Tensor.relu can be used to decompose x into its positive and negative parts, x = relu(x) - relu(-x). Using this identity, it's hopefully clear that what you want is loss = relu(x) + relu(-x)**2. Fixing the mean-reduction problem as well, a working version of your loss function could be:

    import torch.nn as nn

    class E_loss(nn.Module):
        def __init__(self):
            super().__init__()

        def forward(self, outputs, targets):
            # x >= 0 exactly where output >= target
            x = outputs.flatten() - targets.flatten()
            # relu(x) gives the linear branch, relu(-x)**2 the squared branch;
            # .mean() reduces the per-element losses to the scalar that backward() needs
            return (x.relu() + (-x).relu() ** 2).mean()
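
    As a quick sanity check of the piecewise behaviour (made-up values):

    import torch

    loss_fn = E_loss()
    outputs = torch.tensor([3.0, 1.0], requires_grad=True)
    targets = torch.tensor([2.0, 2.0])

    # first element:  output >= target, term = 3 - 2 = 1
    # second element: output <  target, term = (2 - 1)**2 = 1
    loss = loss_fn(outputs, targets)  # mean of [1.0, 1.0] -> 1.0, a scalar
    loss.backward()                   # works, since loss is a single number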