I have developed an LSTM model with 1 LSTM layer and 3 dense layers, as shown below:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(units=120, activation='relu', return_sequences=False,
               input_shape=(train_in.shape[1], 5)))
model.add(Dense(100, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(1))
I have trained the model and obtained its trained weights and biases, as shown below.
w = model.get_weights()
w[0].shape, w[1].shape, w[2].shape, w[3].shape, w[4].shape, w[5].shape, w[6].shape, w[7].shape, w[8].shape
The output I got is:
((5, 480), (120, 480), (480,), (120, 100), (100,), (100, 50), (50,), (50, 1), (1,))
This gives two weight matrices of shapes (5, 480) and (120, 480) and one bias vector of shape (480,) corresponding to the LSTM layer; the rest belong to the dense layers.
What I want to know is this: an LSTM has 4 layers, so how can I get the weights and biases of these 4 layers separately? Can I divide the total weights (5, 480) into 4 equal parts and consider the first 120 to correspond to the 1st layer of the LSTM, the next 120 to the 2nd layer, and so on?
Please share your thoughts on this. Any good references would also be appreciated.
An LSTM doesn't have 4 layers; it has 4 weight matrices because of its internal gate/cell structure. If this is confusing, it helps to read some resources on how an LSTM works (e.g. Chris Olah's "Understanding LSTM Networks" blog post). To summarize, the internals consist of 3 gates (input, forget, output) and 1 candidate cell state, which are combined to compute the new cell state and the final hidden state.
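For reference, the LSTM recurrence in the form Keras implements is the following, where W is the input kernel, U the recurrent kernel, b the bias, and act is the activation argument (tanh by default; relu in your model):

i_t = sigmoid(x_t · W_i + h_{t-1} · U_i + b_i)   # input gate
f_t = sigmoid(x_t · W_f + h_{t-1} · U_f + b_f)   # forget gate
c~_t = act(x_t · W_c + h_{t-1} · U_c + b_c)      # candidate cell state
o_t = sigmoid(x_t · W_o + h_{t-1} · U_o + b_o)   # output gate
c_t = f_t * c_{t-1} + i_t * c~_t                 # new cell state
h_t = o_t * act(c_t)                             # hidden state / output

Each of the 4 per-gate matrices W_i, W_f, W_c, W_o has shape (5, 120) in your case, which is why they are stored concatenated as one (5, 480) array (and similarly for U and b).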
If you check the underlying Keras implementation, you can see the order in which they are concatenated along the last axis:
[i, f, c, o]
# i is the input gate weights (W_i).
# f is the forget gate weights (W_f).
# c is the candidate cell state weights (W_c).
# o is the output gate weights (W_o).
So, taking your bias tensor of shape (480,) as an example, you can divide it into 4 sub-tensors of size 120 each, where w[2][:120] holds the input gate biases, w[2][120:240] the forget gate biases, and so on. The same split applies column-wise to the (5, 480) kernel and the (120, 480) recurrent kernel.
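As a concrete sketch (the variable names here are my own, and the per-gate shapes assume your units=120 model), you could split all three LSTM arrays into their per-gate parts like this:

import numpy as np

W = w[0]  # input kernel, shape (5, 480) = (input_dim, 4 * units)
U = w[1]  # recurrent kernel, shape (120, 480) = (units, 4 * units)
b = w[2]  # bias, shape (480,) = (4 * units,)

# Split each array into 4 equal parts along the last axis,
# in Keras gate order: [i, f, c, o]
W_i, W_f, W_c, W_o = np.split(W, 4, axis=-1)  # each (5, 120)
U_i, U_f, U_c, U_o = np.split(U, 4, axis=-1)  # each (120, 120)
b_i, b_f, b_c, b_o = np.split(b, 4, axis=-1)  # each (120,)

print(W_i.shape, U_f.shape, b_o.shape)  # (5, 120) (120, 120) (120,)

So your guess was close: the arrays do split into 4 equal parts in gate order, but the kernels split along their last (column) axis, giving (5, 120) and (120, 120) blocks per gate rather than groups of rows.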