python, tensorflow, machine-learning, gradient, automatic-differentiation

Second derivative is None in TensorFlow automatic differentiation


In the code below, I'm computing the second derivative (y_xx_lin) of a linear network, modelLinear, which has linear activation functions throughout, and the second derivative (y_xx_tanh) of a tanh network, modelTanh, which has tanh activations for all its layers except the last layer, which is linear.

My question is: y_xx_lin is None, but y_xx_tanh shows some values. Following this Stack Overflow question, I'm guessing that y_xx_lin is None because the second derivative of a linear function is zero for all input values and is therefore, in some sense, not linked to the input. Is this the case? (A minimal sketch of this behaviour follows the code below.)

Even if this is so, I would like TensorFlow to calculate the derivative and return it, instead of returning None. Is this possible?

# Second derivative of a linear network appears to be None

import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import MeanSquaredError
import tensorflow.keras.backend as K
import numpy as np
import matplotlib.pyplot as plt

def build_network(activation='linear'):
    input_layer  = Input(1)
    inner_layer  = Dense(6, activation=activation)(input_layer)
    inner_layer1 = Dense(6, activation=activation)(inner_layer)
    inner_layer2 = Dense(6, activation=activation)(inner_layer1)
    output_layer = Dense(1, activation='linear')(inner_layer2)
    model = Model(input_layer, output_layer)
    return model

def get_first_second_derivative(X_train, y_train, model):
    with tf.GradientTape(persistent=True) as tape_second:
        tape_second.watch(X_train)

        with tf.GradientTape(persistent=True) as tape_first:
            # Watch the tensor with respect to which we want to compute gradients
            tape_first.watch(X_train)

            # get the output of the NN
            output = model(X_train)

        # first derivative of the output w.r.t. the input
        # (computed inside tape_second's context so it can be differentiated again)
        y_x = tape_first.gradient(output, X_train)

    # second derivative of the output w.r.t. the input
    y_xx = tape_second.gradient(y_x, X_train)

    return y_x, y_xx

modelLinear = build_network(activation='linear')
modelLinear.compile(optimizer=Adam(learning_rate=0.1),loss='mse')

modelTanh = build_network(activation='tanh')
modelTanh.compile(optimizer=Adam(learning_rate=0.1),loss='mse')

X_train = np.linspace(-1,1,10).reshape((-1,1))
y_train = X_train*X_train

X_train = tf.convert_to_tensor(X_train,dtype=tf.float64)
y_train = tf.convert_to_tensor(y_train,dtype=tf.float64)

y_x_lin,y_xx_lin   = get_first_second_derivative(X_train,y_train,modelLinear)
y_x_tanh,y_xx_tanh = get_first_second_derivative(X_train,y_train,modelTanh)

print('Type of y_xx_lin = ',type(y_xx_lin))
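
The behaviour in question boils down to the following minimal sketch (illustrative only, not part of the code above): when the function being differentiated is purely linear, its first derivative no longer depends on the input, so the outer tape has nothing connecting it back to the input and returns None.

# Minimal illustration: the second derivative of a purely linear function is None
import tensorflow as tf

x = tf.constant([[0.5]])

with tf.GradientTape() as tape_second:
    tape_second.watch(x)

    with tf.GradientTape() as tape_first:
        tape_first.watch(x)
        y = 3.0 * x                      # purely linear in x

    dy_dx = tape_first.gradient(y, x)    # constant 3.0, no longer depends on x

d2y_dx2 = tape_second.gradient(dy_dx, x)

print(dy_dx)      # tf.Tensor([[3.]], ...)
print(d2y_dx2)    # None -- dy_dx is not connected to x on the outer tape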

Solution

  • It works if you use lambda x: x ** 1 as the activation instead of 'linear', like this:

    ...
    
    id_func = lambda x: x ** 1
    
    def build_network(activation=id_func):
        input_layer  = Input(1)
        inner_layer  = Dense(6, activation=activation)(input_layer)
        inner_layer1 = Dense(6, activation=activation)(inner_layer)
        inner_layer2 = Dense(6, activation=activation)(inner_layer1)
        output_layer = Dense(1, activation=id_func)(inner_layer2)
        model = Model(input_layer, output_layer)
        return model
    
    ...
    
    modelLinear = build_network(activation=id_func)
    
    ...
    

    The reason why it works and why your code fails is in the answer you already cited: with the built-in 'linear' activation the first derivative no longer depends on the input, so the second-order gradient is disconnected and comes back as None, whereas with this admittedly odd implementation of the identity function TensorFlow's backpropagation works correctly. (An alternative that keeps 'linear' and returns zeros instead of None is sketched in the bullet below.)

    Tested with TensorFlow version 2.9.2.
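
  • Another option (not part of the original answer, just an additional note): if you prefer to keep activation='linear' and simply have TensorFlow return zeros instead of None, tf.GradientTape.gradient accepts an unconnected_gradients argument. A sketch of the helper from the question with that one change:

    def get_first_second_derivative(X_train, y_train, model):
        with tf.GradientTape(persistent=True) as tape_second:
            tape_second.watch(X_train)

            with tf.GradientTape(persistent=True) as tape_first:
                tape_first.watch(X_train)
                output = model(X_train)

            y_x = tape_first.gradient(output, X_train)

        # Return a zero tensor shaped like X_train instead of None whenever y_x
        # is not connected to X_train (as happens for the purely linear model)
        y_xx = tape_second.gradient(y_x, X_train,
                                    unconnected_gradients=tf.UnconnectedGradients.ZERO)

        return y_x, y_xx

    With tf.UnconnectedGradients.ZERO, y_xx_lin comes back as a tensor of zeros (the mathematically correct second derivative of a linear model) rather than None.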