Tags: tensorflow, machine-learning, keras, lstm

Proper understanding of Keras implementation of LSTM: how do the units work?


How do N_u LSTM units work on data of length N_x? I know that many similar questions have been asked before, but the answers are full of contradictions and confusion, so I am trying to clear up my doubts by asking specific questions. I am following this simple blog post: https://colah.github.io/posts/2015-08-Understanding-LSTMs/

Q0) Is the Keras implementation consistent with the blog above? Please consider the following code.

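```python
import tensorflow as tf

N_u, N_x = 1, 1  # N_u: number of LSTM units, N_x: features per time step
model = tf.keras.Sequential([
    # (batch, time steps, features); stateful=True needs a fixed batch size.
    # Keras 3 rejects batch_input_shape on layers, while Input(batch_shape=...) works in Keras 2 and 3.
    tf.keras.Input(batch_shape=(32, 1, N_x)),
    tf.keras.layers.LSTM(N_u, stateful=True),
])
model.summary()
```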

For simplicity, my input is just a scalar and there is only one time step. The output shape is (32, 1) and the number of parameters is 12.
Q1) I have one LSTM unit (or cell), right? The following diagram represents a cell, right?

[Image: diagram of a single LSTM cell]

From the picture I understand that there are 12 parameters: forget gate = 2 weights + 1 bias; input gate (including the candidate) = 2 × (2 weights + 1 bias); output gate = 2 weights + 1 bias. So everything is fine up to this point.
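To make the count of 12 concrete, here is a minimal pure-Python sketch of one time step of a single LSTM unit with a scalar input. It is written from the equations in colah's post, not from the Keras source, and the parameter names (w_*, u_*, b_*) and values are hypothetical. Each of the four blocks (forget gate, input gate, candidate, output gate) has one input weight, one recurrent weight and one bias.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values for ONE unit with a scalar input.
# Per block: w_* (input weight), u_* (recurrent weight), b_* (bias).
params = {
    "w_f": 0.5, "u_f": -0.3, "b_f": 1.0,   # forget gate
    "w_i": 0.8, "u_i": 0.2, "b_i": 0.0,    # input gate
    "w_c": -0.6, "u_c": 0.4, "b_c": 0.1,   # candidate cell state (C-tilde)
    "w_o": 0.3, "u_o": 0.7, "b_o": -0.2,   # output gate
}

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM.

    w * x + u * h_prev + b is the scalar form of W . [h_{t-1}, x_t] + b.
    """
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])
    c_tilde = math.tanh(p["w_c"] * x + p["u_c"] * h_prev + p["b_c"])
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])
    c = f * c_prev + i * c_tilde   # new cell state
    h = o * math.tanh(c)           # new hidden state (the layer output)
    return h, c

h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.0, p=params)
print(len(params))  # 12
```

With one time step and return_sequences left at its default, the layer output is just this h for each of the 32 sequences in the batch, which matches the (32, 1) shape.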

Q2) Now let us set N_u, N_x = 1, 2. I expected the same cell to be applied to both elements of x, but the total number of parameters is now 16! Why? Is it because I get 4 additional weights corresponding to the connection between x_2 and the LSTM unit?
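One way to test the hypothesis in Q2 is to enumerate every trainable scalar by name. The helper below is a hypothetical bookkeeping sketch, not a Keras API: assuming a standard LSTM with biases, it lists, for each gate and each unit, one weight per input feature, one weight per previous hidden-state entry, and one bias.

```python
def lstm_param_names(n_u, n_x):
    """Name every trainable scalar of an LSTM layer, grouped by gate."""
    names = []
    for gate in ("i", "f", "c", "o"):       # input, forget, candidate, output
        for j in range(1, n_u + 1):         # unit j
            # one weight from each input feature x_k to this gate of unit j
            names += [f"W_{gate}[x{k}->u{j}]" for k in range(1, n_x + 1)]
            # one weight from each previous hidden entry h_k to this gate of unit j
            names += [f"U_{gate}[h{k}->u{j}]" for k in range(1, n_u + 1)]
            names.append(f"b_{gate}[u{j}]")
    return names

one_feature = lstm_param_names(n_u=1, n_x=1)
two_features = lstm_param_names(n_u=1, n_x=2)
extra = sorted(set(two_features) - set(one_feature))
print(len(one_feature), len(two_features))  # 12 16
print(extra)  # the new weights, one per gate, all starting from x2
```

The same unit is indeed applied to the whole input vector, but every gate needs its own weight for each feature, so adding x_2 adds exactly one weight per gate.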

Q3) Now let us set N_u, N_x = 2, 1, so I now have two LSTM units. My understanding was that the two cells would operate in parallel on the same data (a scalar in this case). Are these two units completely independent, or do they influence each other? I expected the number of parameters to be 2 × 12 = 24, but I actually got 32. Why 32?
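The pure-Python sketch below, with hypothetical uniform weights, illustrates why two units are not independent in the standard LSTM: each unit's gates read the whole previous hidden vector h_{t-1}, so every gate needs an N_u × N_u recurrent matrix instead of one recurrent weight per unit. Changing only the second unit's previous state changes the first unit's new output.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One time step of an n-unit LSTM with a scalar input x.

    W[g][j]    : input weight of gate g for unit j
    U[g][j][k] : weight from previous hidden entry k to gate g of unit j
    b[g][j]    : bias of gate g for unit j
    """
    n = len(h_prev)

    def pre(g, j):
        # pre-activation of gate g for unit j: reads ALL of h_prev, not just h_prev[j]
        return W[g][j] * x + sum(U[g][j][k] * h_prev[k] for k in range(n)) + b[g][j]

    h, c = [], []
    for j in range(n):
        f = sigmoid(pre("f", j))
        i = sigmoid(pre("i", j))
        c_tilde = math.tanh(pre("c", j))
        o = sigmoid(pre("o", j))
        c_j = f * c_prev[j] + i * c_tilde
        c.append(c_j)
        h.append(o * math.tanh(c_j))
    return h, c

gates = ("i", "f", "c", "o")
n_units = 2
# Hypothetical weights: every entry of each 2x2 recurrent matrix is 0.3.
W = {g: [0.5] * n_units for g in gates}
U = {g: [[0.3] * n_units for _ in range(n_units)] for g in gates}
b = {g: [0.0] * n_units for g in gates}

n_params = sum(len(W[g]) + sum(len(row) for row in U[g]) + len(b[g]) for g in gates)
print(n_params)  # 4 * (2 + 4 + 2) = 32

# Change only unit 1's previous hidden state and compare unit 0's new output.
h_a, _ = lstm_step(1.0, [0.0, 0.0], [0.0, 0.0], W, U, b)
h_b, _ = lstm_step(1.0, [0.0, 0.9], [0.0, 0.0], W, U, b)
print(h_a[0] != h_b[0])  # True: the units are coupled through U
```

Two fully independent single-unit LSTMs would correspond to zeroing the off-diagonal entries of every U matrix, which is where the 24 in the question would come from.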

Q4) If I set N_u, N_x = 2, 2, the number of parameters is 40. I think I can work this out once I understand the two points above.

Q5) Finally, is there documentation or a paper that the Keras implementation is based on?


Solution

  • The number of parameters can be computed with this formula:

    LSTM parameter number = 4 × ((x + h) × h + h)

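    where x is the dimension of the input vector and h is the number of units (the size of the hidden state, which is also the layer's output size). The factor 4 accounts for the four weight sets (input gate, forget gate, candidate cell state, output gate), each with (x + h) × h weights and h biases.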

    See this link for an explanation: https://www.kaggle.com/code/kmkarakaya/lstm-understanding-the-number-of-parameters

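    N_u=1, N_x=1 means the output space size is 1 and the input space size is 1, so P = 4 × ((1 + 1) × 1 + 1) = 12.

    N_u=1, N_x=2 means you have changed the input space size (x) to 2, so the formula gives 4 × ((2 + 1) × 1 + 1) = 16. The 4 extra parameters are the weights connecting the second input feature to each of the four gates.

    N_u=2, N_x=1 means you have doubled the output space, and the formula gives 4 × ((1 + 2) × 2 + 2) = 32. It is not 2 × 12 = 24 because the units are not independent: each unit's gates also receive the previous hidden state of the other unit, so the recurrent weights grow as h × h rather than h.

    The formula also holds for N_u=2, N_x=2: 4 × ((2 + 2) × 2 + 2) = 40.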

    I did not find a paper used as the reference for the Keras LSTM implementation, but this source code walkthrough may help: https://blog.softmaxdata.com/keras-lstm/

    Note that the source code link in that post no longer works; use this one instead: https://github.com/keras-team/keras/blob/master/keras/src/layers/rnn/lstm.py
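As a cross-check of the formula, the sketch below compares it with the weight shapes a Keras LSTM layer stores, to my understanding of the implementation: a kernel of shape (input_dim, 4 * units), a recurrent_kernel of shape (units, 4 * units) and a bias of shape (4 * units,), with the gates stacked in the order i, f, c, o and assuming the default use_bias=True. The helper functions are hypothetical; in a real model you would compare against the array shapes returned by model.layers[0].get_weights().

```python
import math

def keras_lstm_weight_shapes(input_dim, units):
    """Assumed shapes of a Keras LSTM layer's three weight tensors (use_bias=True)."""
    return {
        "kernel": (input_dim, 4 * units),        # x_t -> all four gates
        "recurrent_kernel": (units, 4 * units),  # h_{t-1} -> all four gates
        "bias": (4 * units,),                    # one bias per gate per unit
    }

def lstm_param_count(x, h):
    """The formula from the answer: 4 * ((x + h) * h + h)."""
    return 4 * ((x + h) * h + h)

def count_from_shapes(shapes):
    # total number of scalars across all weight tensors
    return sum(math.prod(shape) for shape in shapes.values())

for n_u, n_x in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    shapes = keras_lstm_weight_shapes(input_dim=n_x, units=n_u)
    print(n_u, n_x, lstm_param_count(n_x, n_u), count_from_shapes(shapes))
```

Both columns give 12, 16, 32 and 40 for the four cases in the question. Stacking the four gates side by side is what lets the layer compute every gate for every unit with a single matrix product per input and per hidden state.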