Tags: python, tensorflow, keras, conv-neural-network, loss-function

TensorFlow loss function having no gradients


I might need some help implementing a specific regularization term for the loss function. The term has no gradient, however, and I wonder if there is any way of changing that. I read about this approach in a paper, but it is not necessary to read the paper in order to help me. I will just describe the part of the method where the problem occurs and show a test script in a Google Colab.

The neural network consists of just two convolutional layers, and the final layer uses the sigmoid activation function, so every output value lies between 0 and 1. Each output value is treated as the probability that the corresponding neuron in the output layer is 0 or 1. I want to implement this sampling with the 'tf.keras.backend.switch' function like this:

def regularization_term(y_true, y_pred):
    zeros = tf.zeros_like(y_pred)
    ones = tf.ones_like(y_pred)
    # one uniform random number per output value
    random = tf.random.uniform(tf.shape(y_pred), minval=0, maxval=1, dtype=tf.dtypes.float64)
    # each value becomes 1 with probability y_pred, otherwise 0
    y_pred_new = tf.keras.backend.switch(random > y_pred, zeros, ones)
    return y_pred_new

I draw one random number per output value and use a condition to set each value to 0 or 1; this should be clear from the code. However, when I do this, the term has no gradient:

ValueError: No gradients provided for any variable: ['conv2d_4/kernel:0', 'conv2d_4/bias:0', 'conv2d_5/kernel:0', 'conv2d_5/bias:0'].
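Here is a minimal sketch outside of Keras that reproduces the problem (assuming TF2 eager execution; this snippet is only illustrative and is not in the colab). The sampled values depend on y_pred only through the boolean condition, so TensorFlow finds no differentiable path back to it:

import tensorflow as tf

y_pred = tf.Variable([[0.3, 0.7]], dtype=tf.float64)
with tf.GradientTape() as tape:
    random = tf.random.uniform(tf.shape(y_pred), minval=0, maxval=1, dtype=tf.dtypes.float64)
    sampled = tf.keras.backend.switch(random > y_pred, tf.zeros_like(y_pred), tf.ones_like(y_pred))
    loss = tf.reduce_sum(sampled)
# prints None: the loss is built only from the constant zeros/ones tensors
print(tape.gradient(loss, y_pred))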

The full testing code can be found in this colab: https://colab.research.google.com/drive/1YuX00BUAj-BVCZRbr4opo5wHbcaVbYvx?usp=sharing

Is there any way of implementing this method while keeping a gradient, so that the network can learn? If anything is unclear, please ask; I tried to put some effort into making my question as understandable as possible. I'd be really glad for any help.

Thank you very much!

[EDIT: copy-paste from the Google Colab, ignore otherwise:]

#just importing some libraries
import tensorflow as tf
import numpy as np
from tensorflow.python.framework import ops
from tensorflow.keras import datasets, layers, models
tf.keras.backend.set_floatx('float64')
#length of the dataset
L=16
#THIS REGULARIZATION TERM NEEDS SOME AID

def regularization_term(y_true, y_pred):
    zeros = tf.zeros_like(y_pred)
    ones = tf.ones_like(y_pred)
    random = tf.random.uniform(tf.shape(y_pred),minval=0,maxval=1,dtype=tf.dtypes.float64)
    y_pred_new = tf.keras.backend.switch(random > y_pred, zeros, ones)
    #here are actually some additional operations, but they don't need to be taken into consideration, so I've removed them
    return tf.reduce_sum(y_pred_new)

def my_custom_loss(y_true, y_pred):
    # I try this with only the regularization term to get the 'No gradients provided' error message.
    # Actually I would also add a binary cross-entropy term; I left it out here for showcase purposes.
    return regularization_term(y_true, y_pred)
#creating a dataset for input (initial) and true data (target) for testing purposes
initial = np.random.randint(2,size=(10000,L+2,L+2)).astype("float")
target = np.random.randint(2,size=(10000,L,L)).astype("float")

#adding a model with CNN and 1 hidden layer with relu and 1 output layer with sigmoid
EPOCHS = 2
BATCH_SIZE = 1000
model = models.Sequential()
model.add(layers.Conv2D(1,2,activation='relu',input_shape=[L+2,L+2,1]))
model.add(layers.Conv2D(1,2,activation='sigmoid',input_shape=[L,L,1]))
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),loss=my_custom_loss)
model.fit(initial.reshape(10000,L+2,L+2,1),target.reshape(10000,L,L,1),batch_size = BATCH_SIZE, epochs=EPOCHS, verbose=1)

Solution

  • The two tensors, zeros and ones, that you are creating inside your regularization_term function are preventing TensorFlow from finding a path to the trainable variables.

    Just change your code to the following and it will work:

    def regularization_term(y_true, y_pred):
        random = tf.random.uniform(tf.shape(y_pred), minval=0, maxval=1, dtype=tf.dtypes.float64)
        y_pred_new = tf.keras.backend.switch(random > y_pred, y_pred * 0, (y_pred * 0) + 1)
        return tf.reduce_sum(y_pred_new)
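
    Why this helps, as far as I can tell: y_pred * 0 and (y_pred * 0) + 1 are computed from y_pred itself, so the loss stays connected to the model's trainable variables and TensorFlow can find a gradient path. A quick check (an illustrative snippet of mine, not part of the original answer):

    y_pred = tf.Variable([[0.3, 0.7]], dtype=tf.float64)
    with tf.GradientTape() as tape:
        loss = regularization_term(None, y_pred)
    # prints a zeros tensor rather than None: a path now exists,
    # so the 'No gradients provided' error is gone
    print(tape.gradient(loss, y_pred))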