python · tensorflow · tensorflow2.0 · gradient · gradienttape

Using Gradient Tape for Jacobian of LSTM model - Python


I am building a sequence-to-one prediction model using LSTM. My data has 4 input variables and 1 output variable that needs to be predicted. The data is a time series with a total length of 38265 timesteps, stored in a DataFrame of size 38265 × 5.

I want to use the previous 20 timesteps of the 4 input variables to predict my output variable. I am using the code below for this purpose.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(units=120, activation='relu', return_sequences=False,
               input_shape=(train_in.shape[1], 4)))  # 20 timesteps x 4 input variables
model.add(Dense(100, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(1))
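
The 3-D input windows for the LSTM are built by sliding a 20-step window over the input columns. A minimal sketch (the DataFrame name df, and the column order with the four inputs first and the target last, are assumptions):

import numpy as np

# df is the 38265 x 5 DataFrame; first 4 columns are inputs, last is the target
data = df.to_numpy()
window = 20

train_in = np.stack([data[i:i + window, :4]   # 20 timesteps x 4 inputs
                     for i in range(len(data) - window)])
train_out = data[window:, 4]                  # target one step after each window

# train_in.shape == (38245, 20, 4); train_out.shape == (38245,)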

I want to calculate the Jacobian of the output variable with respect to the input of the LSTM model using tf.GradientTape. Can anyone help me out with this?


Solution

  • The Jacobian of the output with respect to the LSTM input can be extracted as follows:

    1. Using tf.GradientTape(), we can compute the Jacobian from the recorded gradient flow.

    2. However, to get the Jacobian, the input needs to be a tf.EagerTensor, which is what is normally available in eager mode when we compute the output via y = model(x). The following code snippet shows this idea:

    # Get the Jacobian for each persistent gradient evaluation
    import tensorflow as tf

    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(2, activation='relu'))
    model.add(tf.keras.layers.Dense(2, activation='relu'))
    x = tf.constant([[5., 6., 3.]])

    with tf.GradientTape(persistent=True, watch_accessed_variables=True) as tape:
        # Forward pass; x is a constant, so it must be watched explicitly
        tape.watch(x)
        y = model(x)
        loss = tf.reduce_mean(y**2)

    print('Gradients\n')
    jacobian_wrt_loss = tape.jacobian(loss, x)
    print(f'{jacobian_wrt_loss}\n')
    jacobian_wrt_y = tape.jacobian(y, x)
    print(f'{jacobian_wrt_y}\n')
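
    For these shapes, tape.jacobian(loss, x) returns a tensor of shape (1, 3) (the scalar loss differentiated w.r.t. each element of x), while tape.jacobian(y, x) returns shape (1, 2, 1, 3): the Jacobian's shape is the target's shape followed by the source's shape.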
    
    3. But for getting intermediate outputs, as in this case, many examples use Keras directly. When we pull the intermediate outputs from model.layers[i].output, the type we get is a KerasTensor rather than an EagerTensor; however, creating the Jacobian requires an EagerTensor. (Many attempts with @tf.function wrapping failed as well, since eager execution is already the default in TF ≥ 2.0.)

    4. So, alternatively, an auxiliary model can be created with only the layers required (in this case, just the Input and LSTM layers). The output of this model will be a tf.EagerTensor, which is suitable for creating the Jacobian tensor. This is shown in the following snippet:

    # General syntax for getting Jacobians for each layer output
    import tensorflow as tf

    tf.executing_eagerly()  # True in TF 2.x; eager execution is the default
    x = tf.constant([[15., 60., 32.]])

    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(2, activation='relu', name='dense_1'))
    model.add(tf.keras.layers.Dense(2, activation='relu', name='dense_2'))

    # Auxiliary model reusing the first Dense layer of the full model,
    # so both models share that layer's weights
    aux_model = tf.keras.Sequential()
    aux_model.add(model.get_layer('dense_1'))

    with tf.GradientTape(persistent=True, watch_accessed_variables=True) as tape:
        # Forward pass
        tape.watch(x)
        x_y = model(x)        # output of the full network (an EagerTensor)
        act_y = aux_model(x)  # output of just the first Dense layer (an EagerTensor)
        print(x_y, type(x_y))

    # Note: [layer.output for layer in model.layers] would yield symbolic
    # KerasTensors, which cannot be passed to tape.jacobian
    print('Jacobian of full FFNN\n')
    jacobian = tape.jacobian(x_y, x)
    print(f'{jacobian[0]}\n')

    print('Jacobian of FFNN with just the first Dense layer\n')
    jacobian = tape.jacobian(act_y, x)
    print(f'{jacobian[0]}\n')
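
    Both tape.jacobian calls here return tensors of shape (1, 2, 1, 3), and indexing with [0] drops the leading batch axis of the target. Because aux_model reuses the same dense_1 layer object, act_y is exactly the intermediate activation of the full model, not the output of an independently initialized copy.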
    

    Here I have used a simple FFNN consisting of 2 Dense layers, but I want to evaluate the Jacobian w.r.t. the output of the first Dense layer. Hence I created an auxiliary model that reuses just that Dense layer and computed the Jacobian from its output.
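
    Applied back to the original question, the same pattern carries over to the LSTM model. A minimal sketch (the layer sizes mirror the question's model; the random input stands in for a real window of 20 timesteps × 4 variables):

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(units=120, activation='relu', return_sequences=False,
                             input_shape=(20, 4), name='lstm'),
        tf.keras.layers.Dense(100, activation='relu'),
        tf.keras.layers.Dense(50, activation='relu'),
        tf.keras.layers.Dense(1),
    ])

    # Auxiliary model reusing the LSTM layer, to expose its output as an EagerTensor
    aux_model = tf.keras.Sequential([model.get_layer('lstm')])

    x = tf.random.normal((1, 20, 4))  # one window: 20 timesteps x 4 input variables

    with tf.GradientTape(persistent=True) as tape:
        tape.watch(x)
        y = model(x)      # shape (1, 1), the final prediction
        h = aux_model(x)  # shape (1, 120), the LSTM output

    jacobian_wrt_y = tape.jacobian(y, x)  # shape (1, 1, 1, 20, 4)
    jacobian_wrt_h = tape.jacobian(h, x)  # shape (1, 120, 1, 20, 4)
    print(jacobian_wrt_y.shape, jacobian_wrt_h.shape)

    Here jacobian_wrt_y[0, 0, 0] has shape (20, 4): the sensitivity of the single prediction to each of the 20 timesteps of each of the 4 input variables.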

    The details can be found here.