This is a Bidirectional LSTM model. I don't understand why, after the second model2.add(Bidirectional(LSTM(10, recurrent_dropout=0.2))), the output has only 2 dimensions (None, 20), while the first Bidirectional LSTM gives (None, 409, 20). Can anyone help me, please? Also, how can I add a self-attention layer to the model?
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Bidirectional
from tensorflow.keras.layers import SpatialDropout1D
from tensorflow.keras.layers import Embedding
from tensorflow.keras.preprocessing.text import Tokenizer
from keras_self_attention import SeqSelfAttention
embedding_vector_length = 100
model2 = Sequential()
model2.add(Embedding(len(tokenizer.word_index) + 1, embedding_vector_length,
                     input_length=409))
model2.add(Bidirectional(LSTM(10, return_sequences=True, recurrent_dropout=0.2)))
model2.add(Dropout(0.4))
model2.add(Bidirectional(LSTM(10, recurrent_dropout=0.2)))
model2.add(SeqSelfAttention())  # note: this layer does not appear in the summary below
#model.add(Dropout(dropout))
#model2.add(Dense(256, activation='relu'))
#model.add(Dropout(0.2))
model2.add(Dense(3, activation='softmax'))
model2.compile(loss='binary_crossentropy', optimizer='adam',
               metrics=['accuracy'])
print(model2.summary())
and the output:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_23 (Embedding)     (None, 409, 100)          1766600
_________________________________________________________________
bidirectional_12 (Bidirectio (None, 409, 20)           8880
_________________________________________________________________
dropout_8 (Dropout)          (None, 409, 20)           0
_________________________________________________________________
bidirectional_13 (Bidirectio (None, 20)                2480
_________________________________________________________________
dense_15 (Dense)             (None, 3)                 63
=================================================================
Total params: 1,778,023
Trainable params: 1,778,023
Non-trainable params: 0
_________________________________________________________________
None
In the second Bidirectional LSTM, return_sequences defaults to False, so the layer returns only its output at the last time step (many-to-one), which is why the shape drops to (None, 20). If you want the output of every time step, set return_sequences=True: model2.add(Bidirectional(LSTM(10, return_sequences=True, recurrent_dropout=0.2))).
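A minimal sketch of the difference (the vocabulary size 17,666 is inferred from the embedding parameter count in the summary above; the other sizes mirror the question's model):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM

# return_sequences=True: one 20-dim vector (10 units per direction) for each of the 409 steps
seq_model = Sequential()
seq_model.add(Embedding(17666, 100, input_length=409))
seq_model.add(Bidirectional(LSTM(10, return_sequences=True)))
print(seq_model.output_shape)   # (None, 409, 20)

# return_sequences=False (the default): only the last step's output is returned
last_model = Sequential()
last_model.add(Embedding(17666, 100, input_length=409))
last_model.add(Bidirectional(LSTM(10)))
print(last_model.output_shape)  # (None, 20)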
For the attention mechanism in LSTMs, you may refer to this and this.
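A rough sketch of one way to use the SeqSelfAttention layer already imported in the question (this assumes the keras-self-attention package is installed and works with your TensorFlow version; attention_activation='sigmoid' and the GlobalAveragePooling1D step are my choices, not the only option): the attention layer needs the per-time-step 3D output, so the second Bidirectional LSTM must also keep return_sequences=True, and the sequence is pooled back to a single vector before the final Dense layer. Since the last layer is a 3-way softmax, categorical_crossentropy is used instead of binary_crossentropy.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dropout, Dense, GlobalAveragePooling1D
from keras_self_attention import SeqSelfAttention  # pip install keras-self-attention

model2 = Sequential()
model2.add(Embedding(len(tokenizer.word_index) + 1, 100, input_length=409))
model2.add(Bidirectional(LSTM(10, return_sequences=True, recurrent_dropout=0.2)))
model2.add(Dropout(0.4))
# keep the full sequence so the attention layer can weigh every time step
model2.add(Bidirectional(LSTM(10, return_sequences=True, recurrent_dropout=0.2)))
model2.add(SeqSelfAttention(attention_activation='sigmoid'))  # output stays (None, 409, 20)
model2.add(GlobalAveragePooling1D())                          # collapse to (None, 20)
model2.add(Dense(3, activation='softmax'))
model2.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model2.summary())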