Custom loss function with regularization cost added in TensorFlow

I wrote a custom loss function that adds the regularization loss to the total loss. I added an L2 regularizer to the kernels only, but when I called model.fit() a warning appeared stating that the gradients do not exist for the biases, and the biases are not updated. Also, if I remove the regularizer from the kernel of one of the layers, the gradient for that kernel does not exist either.

I tried adding a bias regularizer to each layer and everything worked correctly, but I don't want to regularize the biases. What should I do?

Here is my loss function:

    import numpy as np
    import tensorflow as tf

    def _loss_function(y_true, y_pred):
        # convert tensors to numpy arrays
        y_true_n = y_true.numpy()
        y_pred_n = y_pred.numpy()
        # modify probabilities for Knowledge Distillation loss
        # we do this for old tasks only
        old_y_true = np.float_power(y_true_n[:, :-1], 0.5)
        old_y_true = old_y_true / np.sum(old_y_true)
        old_y_pred = np.float_power(y_pred_n[:, :-1], 0.5)
        old_y_pred = old_y_pred / np.sum(old_y_pred)
        # Define the loss that we will use for new and old tasks
        bce = tf.keras.losses.BinaryCrossentropy()
        # compute the loss on old tasks
        old_loss = bce(old_y_true, old_y_pred)
        # compute the loss on new task
        new_loss = bce(y_true_n[:, -1], y_pred_n[:, -1])
        # compute the regularization loss
        reg_loss = tf.compat.v1.losses.get_regularization_loss()
        assert reg_loss is not None
        # convert all tensors to float64
        old_loss = tf.cast(old_loss, dtype=tf.float64)
        new_loss = tf.cast(new_loss, dtype=tf.float64)
        reg_loss = tf.cast(reg_loss, dtype=tf.float64)
        return old_loss + new_loss + reg_loss

Solution

In Keras, a loss function should return the loss value without regularization losses. The regularization losses are collected in model.losses and added automatically when you set kernel_regularizer or bias_regularizer on each Keras layer.

In other words, when you write your custom loss function, you don't have to care about regularization losses.
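A minimal sketch of this setup, assuming a small hypothetical two-layer model (the names data_loss, model, x and y are illustrative, not from the question). The loss function contains only the data term, the L2 penalty is set on the kernels only, and Keras keeps one penalty per regularized kernel in model.losses, adding their sum to the loss during fit():

```python
import numpy as np
import tensorflow as tf

# Plain data loss: no regularization term is added by hand
def data_loss(y_true, y_pred):
    return tf.keras.losses.binary_crossentropy(y_true, y_pred)

# L2 penalty on kernels only; biases are left unregularized
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(1e-3)),
    tf.keras.layers.Dense(1, activation="sigmoid",
                          kernel_regularizer=tf.keras.regularizers.l2(1e-3)),
])
model.compile(optimizer="sgd", loss=data_loss)

# One penalty per regularized kernel; Keras adds their sum to data_loss in fit()
print(len(model.losses))  # 2

# Hypothetical random data, just to run one training epoch
x = np.random.rand(16, 4).astype("float32")
y = np.random.randint(0, 2, size=(16, 1)).astype("float32")
model.fit(x, y, epochs=1, verbose=0)
```

Because the biases have no regularizer, they contribute nothing to model.losses, yet they still receive gradients from the data term.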

Edit: the reason you got the warnings that gradients don't exist is your use of numpy() in the loss function. numpy() converts a tensor into a NumPy array that TensorFlow cannot trace, so it stops any gradient propagation.
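A toy illustration with tf.GradientTape, using a hypothetical scalar variable w: the same computation done through NumPy gives no gradient (None), while the pure-TensorFlow version gives d/dw (3w)^2 = 18w = 36 at w = 2:

```python
import numpy as np
import tensorflow as tf

w = tf.Variable(2.0)

# Through NumPy: the result is a new constant with no link back to w
with tf.GradientTape() as tape:
    y = w * 3.0
    loss_np = tf.constant(np.square(y.numpy()))
grad_np = tape.gradient(loss_np, w)

# Pure TensorFlow: the tape records every op, so the gradient exists
with tf.GradientTape() as tape:
    y = w * 3.0
    loss_tf = tf.square(y)
grad_tf = tape.gradient(loss_tf, w)

print(grad_np)         # None
print(float(grad_tf))  # 36.0
```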

The fact that the warnings disappeared after you added regularizers to the layers does not mean the gradients were then computed correctly. Those gradients would come only from the regularizers, not from the data. Remove numpy() from the loss function to get the correct gradients.
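A sketch of why the warnings go away without training actually working, assuming a single hypothetical Dense layer with an L2 kernel regularizer and a data loss computed through NumPy. The bias gets no gradient, and the kernel gradient equals the L2 term alone (2 * 0.01 * kernel), with nothing from the data:

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(0)
layer = tf.keras.layers.Dense(
    1, kernel_regularizer=tf.keras.regularizers.l2(0.01))
x = tf.random.normal((4, 3))
y_true = tf.ones((4, 1))

with tf.GradientTape() as tape:
    y_pred = layer(x)
    # Data loss computed in NumPy: disconnected from the tape
    data_loss = tf.constant(
        np.mean(np.square(y_true.numpy() - y_pred.numpy())))
    # layer.losses holds the L2 penalty, computed inside the tape
    total = data_loss + tf.add_n(layer.losses)

kernel_grad, bias_grad = tape.gradient(total, [layer.kernel, layer.bias])
print(bias_grad)  # None
# The kernel gradient is just the derivative of 0.01 * sum(kernel**2)
print(np.allclose(kernel_grad.numpy(), 0.02 * layer.kernel.numpy()))  # True
```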

One solution is to keep everything as tensors and use the tf.math library, e.g. use tf.pow in place of np.float_power and tf.reduce_sum in place of np.sum.