keras, deep-learning, neural-network, lstm, attention-model

LSTM + Attention performance decreases


Pardon the beginner question. I tried following a simple code tutorial on Kaggle:

https://www.kaggle.com/code/haithemhermessi/attention-mechanism-keras-as-simple-as-possible#About-77%-of-accuracy-has-been-observed-with-basic-LSTM-and-without-Attention.-Lets-now-implement-an-Attetion-layer-and-add-it-to-the-LSTM-Based-model 

to basically compare the performance of a pure LSTM model with an LSTM+Attention model on a text sentiment analysis scenario.

I borrowed almost all the code from the link, with two small modifications. Firstly, I changed the dataset to the IMDB Dataset of 50K Movie Reviews on Kaggle, with the labels changed to 0s and 1s. Secondly, I split the dataset into training and test sets instead of using all the data for training as in the original tutorial.
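The relabelling is nothing fancy, just mapping the sentiment strings to integers, roughly like this (the column names are those of the Kaggle CSV):

import pandas as pd

# 'IMDB Dataset.csv' has a 'review' column and a 'sentiment' column ('positive'/'negative')
df = pd.read_csv('IMDB Dataset.csv')
df['sentiment'] = df['sentiment'].map({'negative': 0, 'positive': 1})

The train/test split is then: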

X_train, X_test, Y_train, Y_test =  train_test_split(X, Y, test_size=0.2, stratify=Y, random_state=0)

A code segment for building the LSTM+Attention model is:

import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

inputs = Input(shape=(text_pad.shape[1],))
x = Embedding(input_dim=vocab_lenght + 1, output_dim=32,
              input_length=text_pad.shape[1],
              embeddings_regularizer=tf.keras.regularizers.l2(.001))(inputs)
x1 = LSTM(100, return_sequences=True, dropout=0.3, recurrent_dropout=0.2)(x)
atte_layer = attention()(x1)   # attention() is the custom layer from the tutorial
outputs = Dense(1, activation='sigmoid', trainable=True)(atte_layer)
model = Model(inputs, outputs)
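Training is done in the usual way; the optimiser, epochs and batch size below are placeholders rather than the tutorial's exact settings:

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# X_train/X_test here are the padded sequences from the split above;
# epochs and batch_size are illustrative values only
history = model.fit(X_train, Y_train, validation_data=(X_test, Y_test),
                    epochs=5, batch_size=64)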

The Attention class is the same as in the code tutorial.
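For reference, the class in the tutorial follows the usual additive (Bahdanau-style) attention pattern over the LSTM timesteps; a sketch of that shape (not the notebook's exact code) is:

from tensorflow.keras.layers import Layer
import tensorflow.keras.backend as K

class attention(Layer):
    # Additive attention over the LSTM timesteps (sketch of the usual pattern)
    def build(self, input_shape):
        # input_shape: (batch, timesteps, features)
        self.W = self.add_weight(name='att_weight', shape=(input_shape[-1], 1),
                                 initializer='random_normal', trainable=True)
        self.b = self.add_weight(name='att_bias', shape=(input_shape[1], 1),
                                 initializer='zeros', trainable=True)
        super().build(input_shape)

    def call(self, x):
        e = K.tanh(K.dot(x, self.W) + self.b)   # (batch, timesteps, 1) scores
        a = K.softmax(e, axis=1)                # attention weights over timesteps
        return K.sum(x * a, axis=1)             # weighted sum -> (batch, features)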

However, the performance of the LSTM+Attention model does not improve. The original LSTM training loss/accuracy is plotted as:

[plot: LSTM training loss/accuracy]

Accuracy on the validation set = 0.7989

Whereas the LSTM+Attention model has:

[plot: LSTM+Attention training loss/accuracy]

Accuracy on the validation set = 0.7878

I tried increasing the number of epochs, but it was not very helpful. The performance of the LSTM+Attention model stays slightly below that of the pure LSTM model (not a huge decrease).

I am seeking some advice on where to start debugging the original code from the Kaggle tutorial above. Thanks.


Solution

  • Who knows, maybe that guy faked it.
    Normally, when I use attention, I concatenate the result with the previous output; this way you don't lose any information. You can try it.
    For example:

    ...
    x = layers.Concatenate()([atte_layer, x1])
    outputs = Dense(1, activation='sigmoid', trainable=True)(x)
    

    Just look at the example in the docs:
    https://keras.io/api/layers/attention_layers/attention/
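
    With the built-in layers.Attention from that page, the head of the model could look roughly like this (a sketch only; both branches are pooled to fixed-size vectors so their shapes match before concatenating, and max_len / vocab_size are placeholder names):

    from tensorflow.keras import layers, Model

    inputs = layers.Input(shape=(max_len,))            # max_len: padded sequence length
    x = layers.Embedding(input_dim=vocab_size + 1, output_dim=32)(inputs)
    x1 = layers.LSTM(100, return_sequences=True, dropout=0.3)(x)

    att = layers.Attention()([x1, x1])                 # self-attention: query = value = x1

    # pool both branches, then concatenate so no information is lost
    x1_pool = layers.GlobalAveragePooling1D()(x1)
    att_pool = layers.GlobalAveragePooling1D()(att)
    x = layers.Concatenate()([att_pool, x1_pool])

    outputs = layers.Dense(1, activation='sigmoid')(x)
    model = Model(inputs, outputs)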