Tags: python, tensorflow, keras, keras-layer, differentiation

How do I find the non-differentiable operation in my layer?


I am trying to create a rather complex Lambda layer with many operations in Keras. After implementing it, I got a ValueError: No gradients provided for any variable.

While I am only using Keras operations to transform the data (except for a constant I create with NumPy, which I later add onto a tensor), I understand that there must be some operation that is not differentiable. Now I want to know how I can figure out which one it is so that I can find a workaround.

I don't want to publish any code yet, as it is part of a competition and I want to figure this out on my own. If that makes my problem difficult to understand, please let me know. I can, however, give a list of all the functions I am using:

from tensorflow.keras import backend as K
from tensorflow.python.keras.layers import Lambda

...
def my_lambda_function(x):
    # uses:
    K.batch_dot
    K.cast
    K.clip
    K.concatenate
    K.one_hot
    K.reshape
    K.sum
    K.tile  # only applied to a constant created in numpy

...
# using the function in a model like this:
my_lambda_layer = Lambda(my_lambda_function)
result_tensor = my_lambda_layer(some_input)

I think K.one_hot could be problematic, but I want a way to know this for sure before I try to make it differentiable.
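
For reference, this is the kind of isolated check I have in mind (a rough sketch with made-up values, assuming K.one_hot together with the K.cast it needs for integer indices is the suspect):

import tensorflow as tf
from tensorflow.keras import backend as K

x = tf.Variable([[0.2, 1.7, 2.3]])

with tf.GradientTape() as tape:
    # one_hot needs integer indices, so a cast comes first;
    # an integer cast is exactly the kind of op that cuts the gradient path
    indices = K.cast(x, tf.int32)
    y = K.sum(K.one_hot(indices, 4))

print(tape.gradient(y, x))  # prints None: no gradient flows back to x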


Solution

  • After a few hours of sleep, here is my simple solution: create a small NN for testing and add a Lambda layer in which I try out all the functions separately. This is, however, only an indirect way of finding the problem. Here is my code:

    from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, Conv2DTranspose, Lambda
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.datasets.mnist import load_data
    
    from tensorflow.keras import backend as K
    
    import tensorflow as tf
    import numpy as np
    
    # MNIST, normalized and reshaped to (batch, 28, 28, 1) for the autoencoder setup
    (x_train, y_train), (x_test, y_test) = load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0
    x_train, x_test = np.reshape(x_train, (-1, 28, 28, 1)), np.reshape(x_test, (-1, 28, 28, 1))
    
    
    def test_function(x):
        x_int = K.cast(x, tf.int16)  # this was one of the gradient killers in my case
        return K.cast(x_int, tf.float16)
    
    
    # small autoencoder with the Lambda under test in the middle
    model = Sequential()
    model.add(Input(shape=(28, 28, 1)))
    model.add(Conv2D(10, (5, 5), padding='same', activation='relu'))
    model.add(MaxPooling2D())
    model.add(Lambda(test_function))
    model.add(UpSampling2D())
    model.add(Conv2DTranspose(4, (5, 5), padding='same', activation='relu'))
    model.add(Conv2DTranspose(1, (3, 3), padding='same', activation='sigmoid'))
    
    model.compile(optimizer='adam',
                  loss='mse',
                  metrics=['accuracy'])
    
    # train as an autoencoder (input == target); the "No gradients provided"
    # error shows up here if the Lambda kills the gradient
    model.fit(x_train, x_train, epochs=5)
    model.evaluate(x_test, x_test)
    

    This worked for me, but I hope there are better solutions.
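
    A more direct variant I can think of (just a sketch, with placeholder ops standing in for whatever the real lambda function uses) is to skip the model entirely and ask tf.GradientTape which candidate ops return a None gradient:

    import tensorflow as tf
    from tensorflow.keras import backend as K

    # placeholder candidates; swap in the ops from the real lambda function
    candidates = {
        'cast_to_int_and_back': lambda x: K.cast(K.cast(x, tf.int16), tf.float32),
        'clip': lambda x: K.clip(x, 0.0, 1.0),
        'reshape': lambda x: K.reshape(x, (-1,)),
    }

    x = tf.Variable([[0.3, 1.8, 2.4]])
    for name, fn in candidates.items():
        with tf.GradientTape() as tape:
            y = K.sum(fn(x))
        grad = tape.gradient(y, x)
        print(name, '-> gradient is None' if grad is None else '-> gradient OK')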

    By the way, I can approximate a floor operation (which also kills the gradients) using these functions:

    import math


    def a(x):
        # x + sin(2*pi*x) / (2*pi): each application pushes x towards the
        # nearest half-integer, but stays differentiable everywhere
        two_pi = 2 * math.pi
        two_pi_x = x * two_pi
        sine = K.sin(two_pi_x)
        numerator = sine + two_pi_x
        return numerator / two_pi


    def approximated_floor(x):
        # three applications of a() land close to n + 0.5 for x in (n, n + 1),
        # so subtracting 0.5 approximates floor(x)
        x2 = a(a(a(x))) - 0.5
        return x2
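
    As a quick sanity check (a sketch; the sample values are arbitrary), the output can be compared against tf.floor and the gradient inspected, which is the whole point of the approximation:

    import tensorflow as tf
    from tensorflow.keras import backend as K

    x = tf.Variable([0.2, 1.7, 3.4])

    with tf.GradientTape() as tape:
        y = K.sum(approximated_floor(x))

    print(approximated_floor(x))  # close to tf.floor(x) = [0., 1., 3.]
    print(tape.gradient(y, x))    # defined (not None), unlike a cast to an integer dtype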