python, tensorflow, keras, deep-learning, lstm

Keras - adding an attention layer


I am building an encoder-decoder architecture to do text summarization on restaurant reviews. I have been following this guide. My model has three LSTM layers for the encoder and one LSTM layer for the decoder. This is what it looks like right now:

from tensorflow.keras.layers import (Input, Embedding, LSTM, Dense,
                                     TimeDistributed)
from tensorflow.keras.models import Model

latent_dim = 300

embedding_dim = 200

encoder_inputs = Input(shape=(max_review_len, ))

enc_emb = Embedding(x_voc, embedding_dim,
                    trainable=True)(encoder_inputs)

encoder_lstm1 = LSTM(latent_dim, return_sequences=True,
                     return_state=True, dropout=0.4,
                     recurrent_dropout=0.4)
(encoder_output1, state_h1, state_c1) = encoder_lstm1(enc_emb)

encoder_lstm2 = LSTM(latent_dim, return_sequences=True,
                     return_state=True, dropout=0.4,
                     recurrent_dropout=0.4)
(encoder_output2, state_h2, state_c2) = encoder_lstm2(encoder_output1)

encoder_lstm3 = LSTM(latent_dim, return_state=True,
                     return_sequences=True, dropout=0.4,
                     recurrent_dropout=0.4)
(encoder_outputs, state_h, state_c) = encoder_lstm3(encoder_output2)

decoder_inputs = Input(shape=(None, ))

dec_emb_layer = Embedding(y_voc, embedding_dim, trainable=True)
dec_emb = dec_emb_layer(decoder_inputs)

decoder_lstm = LSTM(latent_dim, return_sequences=True,
                    return_state=True, dropout=0.4,
                    recurrent_dropout=0.2)
(decoder_outputs, decoder_fwd_state, decoder_back_state) = \
    decoder_lstm(dec_emb, initial_state=[state_h, state_c])


decoder_dense = TimeDistributed(Dense(y_voc, activation='softmax'))
decoder_outputs = decoder_dense(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

model.summary()

In order to improve the summarization results I would like to add an attention layer, ideally like this (as suggested by this guide):

# Attention layer
attn_layer = AttentionLayer(name='attention_layer')
attn_out, attn_states = attn_layer([encoder_outputs, decoder_outputs])

decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_outputs, attn_out])

The guide suggests doing the following import:

from attention import Attention

However, this leads to an invalid syntax error on the declaration of the attn_layer variable. Even after resolving that error, I get an AttentionLayer is not defined error.

I've turned to the attention_keras module, which seems to be just what I need, but pip install attention_keras is unsuccessful:

ERROR: Could not find a version that satisfies the requirement attention_keras (from versions: none)

Solution

  • I solved the problem by using this import:

    from tensorflow.keras.layers import Attention
    

    The built-in attention layer takes the decoder outputs as the query and the encoder outputs as the value, producing the desired attention distribution:

    # Dot-product attention: query = decoder outputs, value = encoder outputs
    attention = Attention()
    attention_outputs = attention([decoder_outputs, encoder_outputs])

    # Concatenate the attention context with the decoder outputs
    concatenate = Concatenate(axis=-1)
    decoder_concat_input = concatenate([decoder_outputs, attention_outputs])

    # Dense layer: per-timestep softmax over the target vocabulary
    decoder_dense = TimeDistributed(Dense(y_voc, activation='softmax'))
    decoder_outputs = decoder_dense(decoder_concat_input)
    

    I followed this guide.
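
    For completeness, below is a minimal end-to-end sketch of the full model with the built-in attention layer wired in. The maximum review length and the vocabulary sizes are placeholder values here (in the real model they come from the preprocessing step); the layer sizes match the code from the question:

    from tensorflow.keras.layers import (Input, Embedding, LSTM, Dense,
                                         Attention, Concatenate, TimeDistributed)
    from tensorflow.keras.models import Model

    # Placeholder sizes -- in the real model these come from preprocessing
    max_review_len, x_voc, y_voc = 100, 8000, 2000
    latent_dim = 300
    embedding_dim = 200

    # Encoder: embedding followed by three stacked LSTMs
    encoder_inputs = Input(shape=(max_review_len,))
    enc_emb = Embedding(x_voc, embedding_dim, trainable=True)(encoder_inputs)
    enc_out1, _, _ = LSTM(latent_dim, return_sequences=True, return_state=True,
                          dropout=0.4, recurrent_dropout=0.4)(enc_emb)
    enc_out2, _, _ = LSTM(latent_dim, return_sequences=True, return_state=True,
                          dropout=0.4, recurrent_dropout=0.4)(enc_out1)
    encoder_outputs, state_h, state_c = LSTM(latent_dim, return_sequences=True,
                                             return_state=True, dropout=0.4,
                                             recurrent_dropout=0.4)(enc_out2)

    # Decoder: embedding and one LSTM initialised with the final encoder states
    decoder_inputs = Input(shape=(None,))
    dec_emb = Embedding(y_voc, embedding_dim, trainable=True)(decoder_inputs)
    decoder_outputs, _, _ = LSTM(latent_dim, return_sequences=True,
                                 return_state=True, dropout=0.4,
                                 recurrent_dropout=0.2)(
        dec_emb, initial_state=[state_h, state_c])

    # Dot-product attention over the encoder outputs, queried by the decoder
    attention_outputs = Attention()([decoder_outputs, encoder_outputs])
    decoder_concat_input = Concatenate(axis=-1)([decoder_outputs, attention_outputs])

    # Per-timestep softmax over the target vocabulary
    decoder_outputs = TimeDistributed(
        Dense(y_voc, activation='softmax'))(decoder_concat_input)

    model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
    model.summary()

    If you prefer the Bahdanau-style (additive) attention that the attention_keras project implements, tf.keras also ships an AdditiveAttention layer that can be dropped in with the same [decoder_outputs, encoder_outputs] call.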