I have problems computing gradients with automatic differentiation in TensorFlow. Basically, I want to create a neural network that has just one output value f and gets an input of two values (x, t). The network should act like a mathematical function, so in this case f(x, t), where x and t are the input variables, and I want to compute partial derivatives, for example df_dx, d2f_dx2 or df_dt. I need those partial derivatives later for a specific loss function.
Here is my simplified code:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras import Model
class MyModel(Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.flatten = Flatten(input_shape=(2, 1))
        self.d1 = Dense(28)
        self.f = Dense(1)

    def call(self, y):
        y = self.flatten(y)
        y = self.d1(y)
        y = self.f(y)
        return y
if __name__ == "__main__":
#inp contains the input-variables (x,t)
inp = np.random.rand(1,2,1)
inp_tf = tf.convert_to_tensor(inp, np.float32)
#Create a Model
model = MyModel()
#Here comes the important part:
x = inp_tf[0][0]
t = inp_tf[0][1]
with tf.GradientTape(persistent=True) as tape:
tape.watch(inp_tf[0][0])
tape.watch(inp_tf)
f = model(inp_tf)
df_dx = tape.gradient(f, inp_tf[0][0]) #Derivative df_dx
grad_f = tape.gradient(f, inp_tf)
tf.print(f) #--> [[-0.0968768075]]
tf.print(df_dx) #--> None
tf.print(grad_f) #--> [[[0.284864038]
# [-0.243642956]]]
What I expected was to get df_dx = [0.284864038] (the first component of grad_f), but instead the result is None. My questions are:

Why is the result of df_dx None? What I think I could do is modify the architecture of the class MyModel so that it uses two different input layers (one for x and one for t), so that I can call the model like f = model(x, t), but that seems unnatural to me and I think there should be an easier way.
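What I have in mind would be roughly this (untested sketch, just to illustrate the idea; the name MyModel2 and passing the two tensors as a list are only for illustration):

class MyModel2(Model):
    def __init__(self):
        super(MyModel2, self).__init__()
        self.d1 = Dense(28)
        self.f = Dense(1)

    def call(self, inputs):
        x, t = inputs                    # each of shape (batch, 1)
        y = tf.concat([x, t], axis=1)    # shape (batch, 2)
        y = self.d1(y)
        return self.f(y)

# would be called like f = model2([x, t]), with x and t watched separately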
Another point is that I don't get an error when I change the input_shape of the Flatten layer, for example to self.flatten = Flatten(input_shape=(5, 1)), even though my input vector has shape (1, 2, 1). I would expect an error in that case, so why don't I get one? I'm grateful for your help :)
I use the following configurations:
Each time you do inp_tf[0][0] or inp_tf[0][1] you are creating a new tensor, but that new tensor is not used as the input to your model; inp_tf is. Even though inp_tf[0][0] is part of inp_tf, from the point of view of TensorFlow there is no computation graph between your newly created inp_tf[0][0] and f, hence there is no gradient. You have to compute the gradient with respect to inp_tf and then take the parts of the gradient that you want from there.
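Just to illustrate that computation-graph point: if you create x and t as separate tensors first and build the model input from them inside the tape, the stacking operation is recorded and the per-variable gradients are no longer None. A rough sketch, reusing your MyModel (the concrete values are only placeholders):

x = tf.constant([[0.3]], dtype=tf.float32)  # shape (1, 1), placeholder value
t = tf.constant([[0.7]], dtype=tf.float32)  # shape (1, 1), placeholder value
with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)
    tape.watch(t)
    inp = tf.stack([x, t], axis=1)  # shape (1, 2, 1), same as inp_tf
    f = model(inp)
df_dx = tape.gradient(f, x)  # now a tensor instead of None
df_dt = tape.gradient(f, t)
del tape  # release the persistent tape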
In addition to that, as shown in the documentation of tf.GradientTape, you can use nested tapes to compute second-order derivatives. And if you use the jacobian, you can avoid persistent=True, which is better for performance. Here is how it could work in your example (I changed the layer activation functions to sigmoid, since with the default linear activations the second-order derivatives would be identically zero):
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras import Model
class MyModel(Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.flatten = Flatten(input_shape=(2, 1))
        self.d1 = Dense(28, activation='sigmoid')
        self.f = Dense(1, activation='sigmoid')

    def call(self, y):
        y = self.flatten(y)
        y = self.d1(y)
        y = self.f(y)
        return y
np.random.seed(0)
inp = np.random.rand(1, 2, 1)
inp_tf = tf.convert_to_tensor(inp, np.float32)
model = MyModel()
with tf.GradientTape() as tape:
    tape.watch(inp_tf)
    with tf.GradientTape() as tape2:
        tape2.watch(inp_tf)
        f = model(inp_tf)
    # First-order derivatives from the inner tape,
    # computed inside the outer tape so they can be differentiated again
    grad_f = tape2.gradient(f, inp_tf)
    df_dx = grad_f[0, 0]
    df_dt = grad_f[0, 1]
# Second-order derivatives from the Jacobian of the gradient
j = tape.jacobian(grad_f, inp_tf)
d2f_dx2 = j[0, 0, :, 0, 0]
d2f_dtx = j[0, 0, :, 0, 1]
d2f_dt2 = j[0, 1, :, 0, 1]
d2f_dxt = j[0, 1, :, 0, 0]
tf.print(df_dx)
# [0.0104712956]
tf.print(df_dt)
# [-0.00301733566]
tf.print(d2f_dx2)
# [[-0.000243180315]]
tf.print(d2f_dtx)
# [[-0.000740956515]]
tf.print(d2f_dt2)
# [[1.49392872e-05]]
tf.print(d2f_dxt)
# [[-0.000740956573]]
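Since you mention needing the derivatives for a loss function: the same pattern also works inside a training step, with an extra outer tape for the model weights. A rough sketch with a purely made-up residual (df_dx + df_dt is just a placeholder, not your actual loss):

with tf.GradientTape() as train_tape:
    with tf.GradientTape() as inner_tape:
        inner_tape.watch(inp_tf)
        f = model(inp_tf)
    grad_f = inner_tape.gradient(f, inp_tf)
    df_dx = grad_f[:, 0]
    df_dt = grad_f[:, 1]
    loss = tf.reduce_mean(tf.square(df_dx + df_dt))  # placeholder residual
# Gradients of the loss with respect to the weights, e.g. for optimizer.apply_gradients
weight_grads = train_tape.gradient(loss, model.trainable_variables)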