I am using the MOSI dataset for multimodal sentiment analysis, but for now I am training the model on the text modality only. For text, I am using 300-dimensional GloVe embeddings. My total vocab size is 2173 and my padded sequence length is 30. My target array looks like [0,0,0,0,0,0,1], where the leftmost position is highly negative and the rightmost is highly positive.
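(For context, the 7-class one-hot targets are built from integer sentiment bins roughly like this; the variable names are illustrative and the binning of the raw MOSI scores is not shown:)

import numpy as np
from keras.utils import to_categorical

# illustrative integer class labels: 0 = highly negative ... 6 = highly positive
class_indices = np.array([6, 3, 0])
y7 = to_categorical(class_indices, num_classes=7)   # e.g. class 6 -> [0,0,0,0,0,0,1]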
I am splitting the dataset like this:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(WDatasetX, y7, test_size=0.20, random_state=42)
My tokenization process is:
from keras.preprocessing.text import Tokenizer

MAX_NB_WORDS = 3000
tokenizer = Tokenizer(num_words=MAX_NB_WORDS, oov_token="OOV")
tokenizer.fit_on_texts(Text_X_Train)
tokenized_X_train = tokenizer.texts_to_sequences(Text_X_Train)
tokenized_X_test = tokenizer.texts_to_sequences(Text_X_Test)
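(The sequences are then padded to the stated length of 30; a sketch, assuming post-padding:)

from keras.preprocessing.sequence import pad_sequences

sequence_length = 30
X_train_pad = pad_sequences(tokenized_X_train, maxlen=sequence_length, padding='post')
X_test_pad = pad_sequences(tokenized_X_test, maxlen=sequence_length, padding='post')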
My embedding matrix:
import numpy as np

vocab_size = len(tokenizer.word_index) + 1
embed_dim = 300
emb_mean = 0

def embedding_matrix_filteration():
    all_embs = np.stack(list(embeddings_index.values()))
    print(all_embs.shape)
    emb_mean, emb_std = np.mean(all_embs), np.std(all_embs)
    print(emb_mean)
    # matrix of the required size, initialised with values drawn from a Gaussian distribution
    embedding_matrix = np.random.normal(emb_mean, emb_std, (vocab_size, embed_dim))
    print(embedding_matrix.shape)
    print("length of word2id:", len(word2id))
    embeddedCount = 0
    not_found = []
    for word, idx in tokenizer.word_index.items():
        embedding_vector = embeddings_index.get(word.lower())
        if word == ' ':
            embedding_vector = np.zeros_like(emb_mean)
        if embedding_vector is not None:
            # word is present in GloVe: copy its vector into the matrix
            embedding_matrix[idx] = embedding_vector
            embeddedCount += 1
        else:
            print(word)
            print("$$$")
    # words common between the GloVe vectors and my word set
    print('total embedded:', embeddedCount, 'common words')
    print("length of word2id:", len(word2id))
    print(len(embedding_matrix))
    return embedding_matrix

emb = embedding_matrix_filteration()
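(For context, the embeddings_index dictionary used above is built from the GloVe file in the usual way; the file name/path here is an assumption:)

embeddings_index = {}
with open('glove.6B.300d.txt', encoding='utf-8') as f:  # path is an assumption
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs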
Embedding Layer:
embedding_layer = Embedding(
    vocab_size,
    300,
    weights=[emb],
    trainable=False,
    input_length=sequence_length
)
My model:
import keras
from keras import regularizers, layers
from keras.models import Sequential
from keras.layers import Dense, Dropout, Bidirectional

model = Sequential()
model.add(embedding_layer)
model.add(Bidirectional(layers.LSTM(512, return_sequences=True)))
model.add(Bidirectional(layers.LSTM(512, return_sequences=True)))
model.add(Bidirectional(layers.LSTM(256, return_sequences=True)))
model.add(Bidirectional(layers.LSTM(256)))  # kernel_regularizer=regularizers.l2(0.001)
model.add(Dense(128, activation='relu'))
# model.add(Dropout(0.2))
model.add(Dense(128, activation='relu'))
# model.add(Dropout(0.2))
model.add(Dense(7, activation='softmax'))
For some reason, even when my training accuracy reaches 80%, the validation accuracy remains very low. I have tried different regularization techniques, optimizers, and loss functions, but the result is the same. I don't know why. Please help!
Edit: The total number of tokens is 2719 and the total number of sentences (including the test and train datasets) is 2183.
Compiler:

model.compile(optimizer='adam',
              loss='mean_squared_error',
              metrics=['accuracy'])
I have reduced the number of classes from 7 to 3, i.e. a 3-element one-hot target whose positions stand for +ve, neutral, -ve (so [0,1,0] means neutral).
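(The collapse from 7 bins to 3 was done roughly along these lines; the bin boundaries and class ordering here are assumptions:)

idx7 = np.argmax(y7, axis=1)                # 0 = highly -ve ... 6 = highly +ve
idx3 = 2 - np.digitize(idx7, bins=[3, 4])   # 0 = +ve, 1 = neutral, 2 = -ve
y3 = to_categorical(idx3, num_classes=3)    # positions: [+ve, neutral, -ve]

The new model: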
model = Sequential()
model.add(embedding_layer)
model.add(Bidirectional(layers.LSTM(16,activation='relu')))
model.add(Dropout(0.2))
model.add(Dense(3, activation='softmax'))
Compiled:
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.00005),
    loss='categorical_crossentropy',
    metrics=['accuracy'])
But the loss is still high. Also, I have now stratified the dataset.
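(The stratified split was done along these lines; a sketch, with y3 being the 3-class one-hot targets:)

X_train, X_test, y_train, y_test = train_test_split(
    WDatasetX, y3,
    test_size=0.20,
    random_state=42,
    stratify=np.argmax(y3, axis=1))  # stratify on the class index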
A couple of recommendations:

1. Use categorical_crossentropy instead of mean_squared_error; it can help you a lot when doing classification (although the latter could also work, the former does it better).
2. If your labels are one-hot encoded, use softmax + categorical_crossentropy; otherwise (e.g. a label that looks like [1,0,0,0,0,0,1]) use sigmoid + binary_crossentropy.
3. Shrink the network and add Dropout(). Use only one layer of LSTM (64/128 units would probably suffice).
4. Use a stratified split; in this way, certain examples definitely appear both in the training set and in the validation set, also keeping a good proportion of each class.
5. Use a lower learning rate (0.0001/0.00005).

Bear in mind that, in order to have a reasonable final result in your case, you need to employ a data-centric approach, rather than a model-centric one. Regardless of the possible improvements, if the data is scarce and not comprehensive, you will not be able to achieve great results.
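Putting these points together, a minimal sketch of the kind of model I have in mind (layer sizes, dropout rate, and learning rate are just starting points, not tuned values):

from keras.models import Sequential
from keras.layers import Bidirectional, LSTM, Dense, Dropout
from keras.optimizers import Adam

model = Sequential()
model.add(embedding_layer)                 # your frozen GloVe embedding layer
model.add(Bidirectional(LSTM(64)))         # a single, much smaller recurrent layer
model.add(Dropout(0.3))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(3, activation='softmax'))  # or 7, matching your one-hot targets

model.compile(optimizer=Adam(learning_rate=0.0001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])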