Tags: python, machine-learning, deep-learning, pytorch, loss-function

The parameters of a model with a custom loss function are not updated through learning over epochs


Thank you for reading my post. I'm currently developing a peak-detection algorithm using a CNN, with the goal of learning the ideal convolution kernel, which should be representable as the ideal mother wavelet function that maximizes peak-detection accuracy.

To begin with, I created my own IoU loss function and a simple model, and tried to run the training. The execution itself worked without any errors, but the learning somehow failed: the model parameters were not updated at all over the epochs.


My loss function is defined as follows:

import torch

def IoU(inputs: torch.Tensor, labels: torch.Tensor,
        smooth: float = 0.1, threshold: float = 0.5, alpha: float = 1.0):
  '''
  - alpha: a parameter that sharpens the thresholding.
    If alpha = 1, the thresholded input is the same as the raw input.
  '''

  thresholded_inputs = inputs**alpha / (inputs**alpha + (1 - inputs)**alpha)
  inputs = torch.where(thresholded_inputs < threshold, 0, 1)  # hard binarization
  batch_size = inputs.shape[0]

  intersect_tensor = (inputs * labels).view(batch_size, -1)
  intersect = intersect_tensor.sum(-1)

  union_tensor = torch.max(inputs, labels).view(batch_size, -1)
  union = union_tensor.sum(-1)

  iou = (intersect + smooth) / (union + smooth)  # we smooth the division to avoid 0/0
  iou_score = iou.mean()

  return 1 - iou_score
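
As a quick smoke test, the loss can be called on dummy tensors (the values below are made up purely to show the expected shapes):

# dummy batch: 4 spectra, 256 points each
dummy_pred = torch.rand(4, 256)                     # stand-in for model outputs in [0, 1]
dummy_true = torch.randint(0, 2, (4, 256)).float()  # stand-in for binary peak labels
print(IoU(dummy_pred, dummy_true))                  # scalar tensor: 1 - mean IoU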

and my training model is,

import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt

class MLP(nn.Module):
  def __init__(self):
    super().__init__()
    self.net = nn.Sequential(
        nn.Conv1d(1, 1, kernel_size=32, stride=1, padding=16),
        nn.Linear(257, 256),
        nn.LogSoftmax(1)
    )
  def forward(self, x):
    return self.net(x)

model = MLP()
opt = optim.Adadelta(model.parameters())

# initialization of the kernel of Conv1d
def init_kernel(m):
  if type(m) == nn.Conv1d:
    nn.init.kaiming_normal_(m.weight)
    print(m.weight)
    plt.plot(m.weight[0][0].detach().numpy())

model.apply(init_kernel)
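
For reference, the Linear(257, 256) matches the Conv1d output length: assuming 256-point inputs (implied by the layer sizes), kernel_size=32, padding=16, and stride=1 give an output length of 256 + 2*16 - 32 + 1 = 257, which the Linear layer maps back to 256. A quick shape check on a random input:

with torch.no_grad():
  print(model(torch.randn(1, 1, 256)).shape)  # torch.Size([1, 1, 256])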

def step(x, y, is_train=True):
  opt.zero_grad()

  y_pred = model(x)
  y_pred = y_pred.reshape(-1, 256)

  loss = IoU(y_pred, y)
  loss.requires_grad = True  # workaround: force the flag on the loss
  loss.retain_grad()         # retain_grad is a method; assigning True to it has no effect

  if is_train:
    loss.backward()
    opt.step()

  return loss, y_pred
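
As a sanity check (with a made-up batch), a loss tensor that is still connected to the autograd graph has a non-None grad_fn:

x_chk = torch.rand(4, 1, 256)                   # dummy batch: (batch, channel, length)
y_chk = torch.randint(0, 2, (4, 256)).float()   # dummy binary labels
loss_chk, _ = step(x_chk, y_chk, is_train=False)
print(loss_chk.grad_fn)                         # None means the graph was cut somewhere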

and lastly, the execution code is,

train_loss_arr, val_loss_arr = [], []
valbose = 10
epochs = 200

for e in range(epochs):
  train_loss, val_loss, acc = 0., 0., 0.
  for x, y in train_set.as_numpy_iterator():
    x = torch.from_numpy(x)
    y = torch.from_numpy(y)
    model.train()
    loss, y_pred = step(x, y, is_train=True)
    train_loss += loss.item()
  train_loss /= len(train_set)

  for x, y in val_set.as_numpy_iterator():
    x = torch.from_numpy(x)
    y = torch.from_numpy(y)
    model.eval()
    with torch.no_grad():
      loss, y_pred = step(x, y, is_train=False)
    val_loss += loss.item()
  val_loss /= len(val_set)

  train_loss_arr.append(train_loss)
  val_loss_arr.append(val_loss)

  # visualize the current kernel to check whether the learning is progressing safely
  if e % valbose == 0: 
    print(f"Epoch[{e}]({(e*100/epochs):0.2f}%):  train_loss: {train_loss:0.4f}, val_loss: {val_loss:0.4f}")
    fig, axs = plt.subplots(1, 4, figsize=(12, 4))
    print(y_pred[0], y_pred[0].shape)
    axs[0].plot(x[0][0])
    axs[0].set_title("spectra")
    axs[1].plot(y_pred[0])
    axs[1].set_title("y pred")
    axs[2].plot(y[0])
    axs[2].set_title("y true")
    axs[3].plot(model.state_dict()["net.0.weight"][0][0].numpy())
    axs[3].set_title("kernel1")
    plt.show()

With these programs I tried to train this simple model; however, the model parameters didn't change at all over the epochs.

Visualization of the results at epochs 0 and 30:

[figure: prediction and kernel at epoch 0]

[figure: prediction and kernel at epoch 30]

As you can see, the kernel has not been modified through learning over the epochs.

I spent hours investigating what causes this problem, but I'm still not sure how to make my loss function and model trainable.

Thank you.


Solution

  • Try printing the gradient after loss.backward() with:

    print(y_pred.grad)  # .grad is an attribute, not a method; call y_pred.retain_grad() before backward() so it is populated
    

    I suspect what you'll find is that after a backward pass, the gradient of y_pred is None (or all zeros). This means that either (a) gradients are not enabled for one or more of the variables at which the computation graph has a node, or (b) (more likely) you are using an operation which is not differentiable.

    In your case, at a minimum the torch.where call is non-differentiable, so you'll need to replace it. Thresholding operations are non-differentiable and are generally replaced with "soft" thresholding operations (see Softmax instead of max function for classification) so that gradient computation still works. Try replacing this with a soft threshold, or with no threshold at all, as in the sketch below.
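
    For example, here is a minimal sketch of this loss with the hard threshold removed, keeping the alpha-sharpening as the only "thresholding"; it assumes the inputs are probabilities in [0, 1]:

    import torch

    def soft_IoU(inputs: torch.Tensor, labels: torch.Tensor,
                 smooth: float = 0.1, alpha: float = 1.0):
      # same sharpening as the original loss, but no hard torch.where cut-off,
      # so the loss keeps a grad_fn and backward() can reach the conv kernel
      soft = inputs**alpha / (inputs**alpha + (1 - inputs)**alpha)
      batch_size = soft.shape[0]

      intersect = (soft * labels).view(batch_size, -1).sum(-1)
      union = torch.max(soft, labels).view(batch_size, -1).sum(-1)

      iou = (intersect + smooth) / (union + smooth)  # smoothed to avoid 0/0
      return 1 - iou.mean()

    With a differentiable loss like this, the loss.requires_grad = True workaround in step is no longer needed (and would in fact raise an error, since the loss now has a grad_fn).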