Tags: keras, deep-learning, nlp, lstm, sentiment-analysis

Underfitting Pre-Trained GloVe + LSTM Model: Accuracy Unchanged


I am doing sentiment classification using pre-trained GloVe embeddings and an LSTM model. I use Google Play reviews that I scraped myself, resulting in 50k+ texts, and I apply random over-sampling to the minority classes (a minimal sketch of that step follows the dataset details below).

However, when I train my LSTM model, the training accuracy remains unchanged after several epochs. I need insight into how to fix this issue.

Here is some information about the dataset:

Embedding size: (41151, 100)

Maximum sequence length: 731

Label distribution before random over sampling: {'positive': 58749, 'negative': 26643, 'neutral': 9106}

Label distribution after random over sampling: {'positive': 58749, 'negative': 58749, 'neutral': 58749}

Total x training set (padded): (140997, 200)

Total x validation set (padded): (17625, 200)

Total x testing set (padded): (17625, 200)

Total y training set (one hot): (140997, 3)

Total y validation set (one hot): (17625, 3)

Total y testing set (one hot): (17625, 3)
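For reference, a minimal sketch of that kind of over-sampling, using imbalanced-learn's RandomOverSampler (the variable names x_padded and labels are illustrative, not my exact code):

from imblearn.over_sampling import RandomOverSampler

# x_padded: (n, 200) matrix of padded sequences; labels: array of class labels
ros = RandomOverSampler(random_state=42)  # duplicates minority rows up to the majority count
x_resampled, y_resampled = ros.fit_resample(x_padded, labels)
# each class now has 58749 samples, consistent with the split totals above
# (140997 + 17625 + 17625 = 176247 = 3 x 58749)

Note that because RandomOverSampler works by duplicating rows, applying it before the train/validation/test split means copies of the same review can land on both sides of the split.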

This is my full code: (link omitted)

This is the part of the code most relevant to this issue:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.optimizers import Adam

lstm_model = Sequential()
lstm_model.add(Input(shape=(max_len,)))
# frozen pre-trained GloVe weights: embedding_matrix has shape (41151, 100)
lstm_model.add(Embedding(input_dim=total_vocab, output_dim=embedding_dim, weights=[embedding_matrix], trainable=False))
lstm_model.add(LSTM(256, return_sequences=True))
lstm_model.add(LSTM(128, return_sequences=True))
lstm_model.add(LSTM(64))
lstm_model.add(Dense(128, activation='relu'))
lstm_model.add(Dense(units=3, activation='softmax'))

lstm_model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.001), metrics=['accuracy'])

lstm_model.summary()
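When accuracy freezes like this, one quick check (my own suggestion; x_val is a hypothetical name for the padded validation set) is whether the network has collapsed to predicting a single class:

import numpy as np

preds = lstm_model.predict(x_val)
pred_classes = np.argmax(preds, axis=1)
print(np.unique(pred_classes, return_counts=True))
# if one class dominates these counts, the model is stuck on the majority
# class, and the loss, not just the accuracy, is the number to watch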

(Screenshot of the lstm_model.summary() output omitted.)


Solution

  • Based on extra information in the comments, I'm going to say the reason the LSTM model hits a wall at an (unspecified) accuracy below the 85% you are trying to reach is that it is not the best type of model for the problem. In that case, tweaking parameters is likely to be wasted effort.

    I'm fairly sure encoder transformers (e.g. BERT) surpassed LSTMs on sentiment-analysis benchmarks a number of years ago (but sorry, a quick search couldn't find a killer reference to insert here), and transformers have only got bigger and better since then. (A minimal fine-tuning sketch is at the end of this answer.)

    Extra thought: building on top of GloVe embeddings presents you with the problem that they don't handle multiple meanings of a word. So "queen" might be a female king (as in the embedding's party trick: king - male + female = queen), or it might be a pop group, a gay man, or a chess piece. This puts a limit on the accuracy of models built on them, whereas transformers don't have that limitation, because they look at the whole string to see each word in context. (You could argue with that, of course, because bringing in context is exactly what the LSTM is there for. But transformers are still scaling strongly with 20+ layers, whereas LSTMs tend to choke after two layers.) A quick demo of the single-vector behaviour follows below.
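    To see the party trick for yourself, here is a short demo using gensim's pre-packaged GloVe vectors; I've used the standard man/woman form of the analogy, which is the one that reliably works:

    import gensim.downloader as api

    glove = api.load("glove-wiki-gigaword-100")  # 100-d GloVe, same dimension as in the question
    print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
    # 'queen' typically comes out on top -- but it is the same single vector
    # whether the text meant the monarch, the band, or the chess piece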
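    And for completeness, a minimal sketch of fine-tuning an encoder transformer on this task with the Hugging Face transformers library. This is a generic starting point, not tested on your data: train_texts (a list of review strings), train_labels (integer class ids for negative/neutral/positive), and the checkpoint name are all placeholders; pick a checkpoint matching the language of your reviews.

    import tensorflow as tf
    from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

    # tokenize to fixed-length tensors, mirroring the 200-token padding above
    enc = tokenizer(train_texts, padding=True, truncation=True, max_length=200, return_tensors="tf")

    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),  # a small LR is typical for fine-tuning
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),  # the model outputs logits
        metrics=["accuracy"],
    )
    model.fit(dict(enc), tf.constant(train_labels), validation_split=0.1, epochs=3, batch_size=32)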