I am building an encoder-decoder architecture for text summarization of restaurant reviews, following this guide. The model uses three stacked LSTM layers in the encoder and a single LSTM layer in the decoder. This is what it looks like right now:
from tensorflow.keras.layers import (Input, Embedding, LSTM, Dense,
                                     TimeDistributed, Concatenate)
from tensorflow.keras.models import Model

latent_dim = 300
embedding_dim = 200

# Encoder input and embedding
encoder_inputs = Input(shape=(max_review_len, ))
enc_emb = Embedding(x_voc, embedding_dim,
                    trainable=True)(encoder_inputs)
# Encoder LSTM 1
encoder_lstm1 = LSTM(latent_dim, return_sequences=True,
                     return_state=True, dropout=0.4,
                     recurrent_dropout=0.4)
(encoder_output1, state_h1, state_c1) = encoder_lstm1(enc_emb)

# Encoder LSTM 2
encoder_lstm2 = LSTM(latent_dim, return_sequences=True,
                     return_state=True, dropout=0.4,
                     recurrent_dropout=0.4)
(encoder_output2, state_h2, state_c2) = encoder_lstm2(encoder_output1)
# Encoder LSTM 3 (its final states initialise the decoder)
encoder_lstm3 = LSTM(latent_dim, return_state=True,
                     return_sequences=True, dropout=0.4,
                     recurrent_dropout=0.4)
(encoder_outputs, state_h, state_c) = encoder_lstm3(encoder_output2)
# Decoder input and embedding
decoder_inputs = Input(shape=(None, ))
dec_emb_layer = Embedding(y_voc, embedding_dim, trainable=True)
dec_emb = dec_emb_layer(decoder_inputs)

# Decoder LSTM, initialised with the final encoder states
decoder_lstm = LSTM(latent_dim, return_sequences=True,
                    return_state=True, dropout=0.4,
                    recurrent_dropout=0.2)
(decoder_outputs, decoder_fwd_state, decoder_back_state) = \
    decoder_lstm(dec_emb, initial_state=[state_h, state_c])
# Dense layer: softmax over the target vocabulary at every timestep
decoder_dense = TimeDistributed(Dense(y_voc, activation='softmax'))
decoder_outputs = decoder_dense(decoder_outputs)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()
In order to improve the summarization results, I would like to add an attention layer, ideally wired in like this (as suggested by this guide):
# Attention layer
attn_layer = AttentionLayer(name='attention_layer')
attn_out, attn_states = attn_layer([encoder_outputs, decoder_outputs])
decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_outputs, attn_out])
The guide suggests doing the following import:
from attention import Attention
However, this leads to an invalid syntax error on the declaration of the attn_layer variable, and even after resolving that, I get an "AttentionLayer is not defined" error.
I then turned to the attention_keras package, which seems to be exactly what I need, but pip install attention_keras is unsuccessful:
ERROR: Could not find a version that satisfies the requirement attention_keras (from versions: none)
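The "from versions: none" part of the message indicates that the package simply isn't published on PyPI, so pip cannot install it. Presumably the guide expects attention.py to be copied from the thushv89/attention_keras GitHub repository into the project, so that the class can be imported under its actual name; something like this (the file location is my assumption):
# Presumed setup: attention.py copied from
# https://github.com/thushv89/attention_keras into the project directory
from attention import AttentionLayer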
Rather than relying on that external file, I solved the problem by using this import instead:
from tensorflow.keras.layers import Attention
This built-in layer implements dot-product (Luong-style) attention and expects its inputs as [query, value], so it is called with the decoder outputs as the query and the encoder outputs as the value to produce the desired attention distribution:
attention = Attention()
attention_outputs = attention([decoder_outputs, encoder_outputs])
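# In recent TensorFlow versions (2.4+, as far as I know) the same call can also
# return the attention weights for inspection, e.g.:
# attention_outputs, attention_scores = attention(
#     [decoder_outputs, encoder_outputs], return_attention_scores=True)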
concatenate = Concatenate(axis=-1)
decoder_concat_input = concatenate([decoder_outputs, attention_outputs])
# Dense layer
decoder_dense = TimeDistributed(Dense(y_voc, activation='softmax'))
decoder_outputs = decoder_dense(decoder_concat_input)
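Since the graph now ends in the attention-augmented dense layer, the model has to be rebuilt (and recompiled) from these new tensors. A minimal sketch, assuming integer-encoded target sequences and with the optimizer choice only as a placeholder:
# Rebuild the model so it trains through the attention-augmented decoder head
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')
model.summary()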
I followed this guide.