Tags: python, tensorflow, keras, lstm

TensorFlow 2 LSTM (cuDNN args) vs TensorFlow 1 CuDNNLSTM implementation difference


In TensorFlow 1 there is the layer tf.compat.v1.keras.layers.CuDNNLSTM, which is built to run on cuDNN. In TensorFlow 2 this layer has been deprecated in favor of tf.keras.layers.LSTM, which dispatches to the cuDNN implementation when

  1. `activation` == `tanh`
  2. `recurrent_activation` == `sigmoid`
  3. `recurrent_dropout` == 0
  4. `unroll` is `False`
  5. `use_bias` is `True`
  6. Inputs, if masked, are strictly right-padded.

I do not know whether this is a bug or a difference that simply wasn't implemented, but there does seem to be one: CuDNNLSTM uses both an input bias and a recurrent bias, whereas LSTM under the above TF2 cuDNN rules uses only a recurrent bias.

Related code:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM
from tensorflow.compat.v1.keras.layers import CuDNNLSTM

print(tf.__version__)

# TF2 LSTM configured to satisfy the cuDNN dispatch conditions listed above
model1 = Sequential()
model1.add(LSTM(1, activation='tanh', recurrent_dropout=0, unroll=False,
                use_bias=True, return_sequences=False, input_shape=(1, 1)))
print(model1.summary())

# TF1-style CuDNNLSTM for comparison
model2 = Sequential()
model2.add(CuDNNLSTM(1, return_sequences=False, input_shape=(1, 1)))
print(model2.summary())
Output:

2.2.0
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm (LSTM)               (None, 1)                 12        
=================================================================
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________
None
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
cu_dnnlstm (CuDNNLSTM)    (None, 1)                 16        
=================================================================
Total params: 16
Trainable params: 16
Non-trainable params: 0
_________________________________________________________________

Notice the total params differ by 4 * N_units, meaning the LSTM layer is missing an additional bias vector (one extra term per gate) for each cell.
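For reference, the parameter arithmetic works out as follows (my own sketch, not part of the original post; units and input_dim mirror the models above):

# Parameter-count arithmetic for the two layer types.
units, input_dim = 1, 1

# Standard Keras LSTM: 4 gates, each with an input kernel row,
# a recurrent kernel row, and a single bias term.
lstm_params = 4 * units * (input_dim + units + 1)    # -> 12

# cuDNN-style LSTM: same kernels, but two bias terms per gate
# (one input bias + one recurrent bias).
cudnn_params = 4 * units * (input_dim + units + 2)   # -> 16

print(lstm_params, cudnn_params, cudnn_params - lstm_params)  # 12 16 4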

Note that the PyTorch implementation of LSTM matches TF1's CuDNNLSTM (it also carries two bias vectors per layer), which is how I stumbled across this.
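For example, PyTorch exposes the two cuDNN-style biases directly as named parameters; a quick check (assuming torch is installed):

import torch

# PyTorch keeps separate input and recurrent biases, matching cuDNN's layout.
rnn = torch.nn.LSTM(input_size=1, hidden_size=1)
for name, p in rnn.named_parameters():
    print(name, tuple(p.shape))
# weight_ih_l0 (4, 1)
# weight_hh_l0 (4, 1)
# bias_ih_l0 (4,)
# bias_hh_l0 (4,)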

Is there some fix I'm missing, or should this be raised as a GitHub issue?


Solution

  • No, it's not a bug.

    The 2x bias in CuDNNLSTM comes from cuDNN's parameter layout: it keeps one bias for the input kernel and a separate bias for the recurrent kernel. Since the two biases are simply added together in the gate pre-activations, a single bias vector is mathematically equivalent (b = b_input + b_recurrent); the second vector adds parameters, not expressive power.

    When the cuDNN path is made available through tf.keras.layers.LSTM, the layer builds its weights via LSTMCell, the base cell class, which defines only a single bias vector, so no separate recurrent bias is ever created; the missing half is filled with zeros when the cuDNN kernel is called.

    You can inspect model.layers[0].trainable_weights to see the difference in the bias shapes between the two implementations.
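    A quick check along those lines, continuing from the model1/model2 built in the question (shapes assume units=1):

    # Compare bias shapes of the two layers.
    for w in model1.layers[0].trainable_weights:
        print(w.name, w.shape)   # bias: (4,)  -- single fused bias
    for w in model2.layers[0].trainable_weights:
        print(w.name, w.shape)   # bias: (8,)  -- input half + recurrent half

    # If you need to port dual-bias weights (CuDNNLSTM / PyTorch) into a
    # TF2 LSTM, summing the two halves gives the equivalent single bias --
    # this assumes cuDNN's canonical layout (input-bias half first,
    # recurrent-bias half second).
    kernel, recurrent_kernel, b_cudnn = model2.layers[0].get_weights()
    b_equiv = b_cudnn[:4] + b_cudnn[4:]   # shape (4,)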