Tags: python, tensorflow, keras, deep-learning

Bidirectional LSTM output shape


There is a Bidirectional LSTM model, and I don't understand why, after the second call to model2.add(Bidirectional(LSTM(10, recurrent_dropout=0.2))), the result has only 2 dimensions (None, 20), while the first bidirectional LSTM gives (None, 409, 20). Can anyone help me, please? And also, how can I add a self-attention layer to the model?

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Bidirectional
from tensorflow.keras.layers import SpatialDropout1D
from tensorflow.keras.layers import Embedding
from tensorflow.keras.preprocessing.text import Tokenizer
from keras_self_attention import SeqSelfAttention  # pip install keras-self-attention


embedding_vector_length = 100

model2 = Sequential()

model2.add(Embedding(len(tokenizer.word_index) + 1, embedding_vector_length,
                     input_length=409))

model2.add(Bidirectional(LSTM(10, return_sequences=True, recurrent_dropout=0.2)))
model2.add(Dropout(0.4))
model2.add(Bidirectional(LSTM(10, recurrent_dropout=0.2)))
model2.add(SeqSelfAttention())  # NB: not reflected in the summary below

#model.add(Dropout(dropout))
#model2.add(Dense(256, activation='relu'))

#model.add(Dropout(0.2))

model2.add(Dense(3, activation='softmax'))
model2.compile(loss='binary_crossentropy', optimizer='adam',
               metrics=['accuracy'])
print(model2.summary())


and the output:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_23 (Embedding)     (None, 409, 100)          1766600   
_________________________________________________________________
bidirectional_12 (Bidirectio (None, 409, 20)           8880      
_________________________________________________________________
dropout_8 (Dropout)          (None, 409, 20)           0         
_________________________________________________________________
bidirectional_13 (Bidirectio (None, 20)                2480      
_________________________________________________________________
dense_15 (Dense)             (None, 3)                 63        
=================================================================
Total params: 1,778,023
Trainable params: 1,778,023
Non-trainable params: 0
_________________________________________________________________
None

Solution

  • For the second Bidirectional LSTM, return_sequences defaults to False, so the layer behaves many-to-one: it returns only the output of the final time step, hence the shape (None, 20), which is 10 forward units concatenated with 10 backward units. The first Bidirectional LSTM was built with return_sequences=True, so it returns the output of every one of the 409 time steps, hence (None, 409, 20). If you want the per-time-step output from the second layer as well, simply use model2.add(Bidirectional(LSTM(10, return_sequences=True, recurrent_dropout=0.2))) (see the small shape check after this list).

    For an attention mechanism on top of an LSTM, you may refer to this and this. A sketch of one way to wire SeqSelfAttention into this model follows below.
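
To see the return_sequences rule in isolation, here is a minimal, self-contained shape check (toy batch size of 1; the time and feature dimensions are taken from the question):

import tensorflow as tf
from tensorflow.keras.layers import LSTM, Bidirectional

x = tf.zeros((1, 409, 100))                        # (batch, time, features)
seq  = Bidirectional(LSTM(10, return_sequences=True))(x)
last = Bidirectional(LSTM(10))(x)                  # return_sequences defaults to False
print(seq.shape)    # (1, 409, 20): one 20-dim vector per time step
print(last.shape)   # (1, 20): only the final time step survives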
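
For the attention part of the question, here is a minimal sketch, not the original answerer's code. It assumes the third-party keras-self-attention package (pip install keras-self-attention), which provides the SeqSelfAttention layer used in the question. The vocabulary size is inferred from the question's parameter count (1,766,600 / 100), and the pooling layer and the switch to categorical_crossentropy (a better match for a 3-class softmax than binary_crossentropy) are editorial assumptions:

# pip install keras-self-attention   (third-party package by CyberZHG)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Embedding, LSTM, Bidirectional, Dropout,
                                     Dense, GlobalAveragePooling1D)
from keras_self_attention import SeqSelfAttention

vocab_size = 17666          # implied by the question: 1,766,600 params / 100 dims
embedding_vector_length = 100

model2 = Sequential()
model2.add(Embedding(vocab_size, embedding_vector_length, input_length=409))
model2.add(Bidirectional(LSTM(10, return_sequences=True, recurrent_dropout=0.2)))
model2.add(Dropout(0.4))
# return_sequences=True here too, so the attention layer sees (None, 409, 20)
model2.add(Bidirectional(LSTM(10, return_sequences=True, recurrent_dropout=0.2)))
model2.add(SeqSelfAttention(attention_activation='sigmoid'))  # shape unchanged: (None, 409, 20)
model2.add(GlobalAveragePooling1D())   # collapse the time axis -> (None, 20)
model2.add(Dense(3, activation='softmax'))
# categorical_crossentropy matches a 3-class softmax (assumes one-hot labels)
model2.compile(loss='categorical_crossentropy', optimizer='adam',
               metrics=['accuracy'])
print(model2.summary())

The key point ties back to the first bullet: SeqSelfAttention needs the full sequence, so the second LSTM must keep return_sequences=True, and the time axis is only collapsed afterwards (Flatten() would also work in place of the pooling layer, since input_length is fixed at 409).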