I am trying to create a rather complex Lambda layer with many operations in Keras. After I implemented it, I got a ValueError: No gradients provided for any variable.
While I am using only Keras operations to transform the data (except for a constant I create with NumPy, which I later add onto a tensor), I understand that there must be some operation that is not differentiable. Now I want to know how I can figure out which one it is, so I can find a workaround.
I don't want to publish any code yet, as it is part of a competition and I want to figure this out on my own. If that makes my problem hard to understand, please let me know. I can, however, give a list of all the functions I am using:
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Lambda
...
def my_lambda_function(x):
    # uses:
    K.batch_dot
    K.cast
    K.clip
    K.concatenate
    K.one_hot
    K.reshape
    K.sum
    K.tile  # only applied to a constant created in numpy
    ...
# using the function in a model like this:
my_lambda_layer = Lambda(my_lambda_function)
result_tensor = my_lambda_layer(some_input)
I think K.one_hot could be problematic, but I want a way to know this for sure before I try to make it differentiable.
After a few hours of sleep, here is my simple solution: create a small NN for testing and add a Lambda layer in which I try out all the functions separately. This is, however, only an indirect way of finding the problem. Here is my code:
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, Conv2DTranspose, Lambda
from tensorflow.keras.models import Sequential
from tensorflow.keras.datasets.mnist import load_data
from tensorflow.keras import backend as K
import tensorflow as tf
import numpy as np
(x_train, y_train), (x_test, y_test) = load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
x_train, x_test = np.reshape(x_train, (-1, 28, 28, 1)), np.reshape(x_test, (-1, 28, 28, 1))
def test_function(x):
    x_int = K.cast(x, tf.int16)  # this was one of the gradient killers in my case
    return K.cast(x_int, tf.float16)
model = Sequential()
model.add(Input(shape=(28, 28, 1)))
model.add(Conv2D(10, (5, 5), padding='same', activation='relu'))
model.add(MaxPooling2D())
model.add(Lambda(test_function))
model.add(UpSampling2D())
model.add(Conv2DTranspose(4, (5, 5), padding='same', activation='relu'))
model.add(Conv2DTranspose(1, (3, 3), padding='same', activation='sigmoid'))
model.compile(optimizer='adam',
              loss='mse',
              metrics=['accuracy'])
model.fit(x_train, x_train, epochs=5)
model.evaluate(x_test, x_test)
This worked for me but I hope there are better solutions.
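A more direct check (just a sketch, assuming TF 2.x with eager execution) is to run a suspicious op chain under tf.GradientTape and see whether the gradient with respect to the input comes back as None, for example the cast + one_hot combination from the list above:

import tensorflow as tf
from tensorflow.keras import backend as K

x = tf.Variable([[0.2, 1.7, 2.9]])
with tf.GradientTape() as tape:
    y = K.one_hot(K.cast(x, 'int32'), 4)  # integer cast followed by one_hot
    loss = K.sum(y)
print(tape.gradient(loss, x))  # None -> this chain does not propagate gradients back to x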
By the way, I can approximate a floor operation (which also kills the gradients) using these functions:
import math

def a(x):
    two_pi = 2 * math.pi
    two_pi_x = x * two_pi
    sine = K.sin(two_pi_x)
    numerator = sine + two_pi_x
    return numerator / two_pi

def approximated_floor(x):
    x2 = a(a(a(x))) - 0.5
    return x2
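As a quick sanity check (illustrative only; the sample values are my own, and the approximation is least accurate close to integer boundaries):

import numpy as np

xs = K.constant([0.2, 1.3, 2.7])
print(K.eval(approximated_floor(xs)))  # roughly [0. 1. 2.]
print(np.floor([0.2, 1.3, 2.7]))       # exact:  [0. 1. 2.]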