
Training with threshold in PyTorch


I have a neural network that produces a single value for each input. I need to use the value returned by the network to threshold another array. The result of this threshold operation is used to compute a loss function (the threshold value is not known beforehand and must be learned during training). Following is an MWE:

import torch

x = torch.randn(10, 1)  # Say this is the output of the network (10 is my batch size)
data_array = torch.randn(10, 2)  # This is the data I need to threshold
ground_truth = torch.randn(10, 2)  # This is the ground truth
mse_loss = torch.nn.MSELoss()  # Loss function

# Threshold
thresholded_vals = data_array * (data_array >= x)  # Returns zero in all places where the value is less than the threshold, the value itself otherwise

# Compute loss and gradients
loss = mse_loss(thresholded_vals, ground_truth)
loss.backward()  # Throws error here

Since the thresholding operation returns a tensor that is detached from the computation graph (it carries no gradient information), the `backward()` call throws an error.

How does one train a network in such a case?


Solution

  • Your threshold function is not differentiable with respect to the threshold `x`, so torch does not calculate a gradient for it, which is why your example is not working. Even with `requires_grad=True` set on the inputs, `backward()` runs but `x.grad` remains `None`:

    import torch
    
    x = torch.randn(10, 1, requires_grad=True)  # Say this is the output of the network (10 is my batch size)
    data_array = torch.randn(10, 2, requires_grad=True)  # This is the data I need to threshold
    ground_truth = torch.randn(10, 2)  # This is the ground truth
    mse_loss = torch.nn.MSELoss()  # Loss function
    
    # Threshold
    thresholded_vals = data_array * (data_array >= x)  # Returns zero in all places where the value is less than the threshold, the value itself otherwise
    
    # Compute loss and gradients
    loss = mse_loss(thresholded_vals, ground_truth)
    loss.backward()  # No longer throws, but no gradient reaches x
    print(x.grad)
    print(data_array.grad)
    

    Output:

    None #<- for the threshold x
    tensor([[ 0.1088, -0.0617],  #<- for the data_array
            [ 0.1011,  0.0000],
            [ 0.0000,  0.0000],
            [-0.0000, -0.0000],
            [ 0.2047,  0.0973],
            [-0.0000,  0.2197],
            [-0.0000,  0.0929],
            [ 0.1106,  0.2579],
            [ 0.0743,  0.0880],
            [ 0.0000,  0.1112]])
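
    One common workaround (a sketch, not part of the answer above) is to replace the hard step `data_array >= x` with a smooth sigmoid surrogate, which approaches the hard threshold as a temperature parameter shrinks but stays differentiable in `x`. The `temperature` value below is an illustrative choice:

    ```python
    import torch

    torch.manual_seed(0)

    x = torch.randn(10, 1, requires_grad=True)   # threshold predicted by the network
    data_array = torch.randn(10, 2)              # data to threshold
    ground_truth = torch.randn(10, 2)
    mse_loss = torch.nn.MSELoss()

    # Soft threshold: sigmoid((data - x) / T) -> hard step (data >= x) as T -> 0,
    # but remains differentiable with respect to x for any T > 0.
    temperature = 0.1
    soft_mask = torch.sigmoid((data_array - x) / temperature)
    thresholded_vals = data_array * soft_mask

    loss = mse_loss(thresholded_vals, ground_truth)
    loss.backward()
    print(x.grad)  # a real gradient tensor now, not None
    ```

    The trade-off is that a small temperature gives a closer approximation of the hard threshold but steeper, noisier gradients; a larger temperature gives smoother gradients but a blurrier cutoff.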