I'm doing sentiment classification with an LSTM in Keras, and I want to obtain the probability that the LSTM assigns to each word of a sentence, in order to know which words are most representative.
For example, for the following sentence:
"This landscape is wonderful and calming"
I would consider the most representative words for classifying this sentence as positive to be "wonderful" and "calming".
How can I obtain the probability that LSTM assigns to each word?
lstm_layer = layers.LSTM(size)(embedding_layer)
output_layer1 = layers.Dense(50, activation=activation)(lstm_layer)
output_layer1 = layers.Dropout(0.25)(output_layer1)
output_layer2 = layers.Dense(1, activation="sigmoid")(output_layer1)
model = models.Model(inputs=input_layer, outputs=output_layer2)
model.compile(optimizer=optimizer, loss='binary_crossentropy')
Thanks
You can get the probabilities from the final layer (a Dense layer with softmax applied at every time step). Example model:
import keras
import keras.layers as L
# instantiate sequential model
model = keras.models.Sequential()
# define input layer
model.add(L.InputLayer([None], dtype='int32'))
# define embedding layer for dictionary size of 'len(all_words)' and 50 features/units
model.add(L.Embedding(len(all_words), 50))
# define fully-connected RNN with 64 output units. Crucially: we return the outputs of the RNN for every time step instead of just the last time step
model.add(L.SimpleRNN(64, return_sequences=True))
# define dense layer of 'len(all_words)' outputs and softmax activation
# this will produce a vector of size len(all_words)
stepwise_dense = L.Dense(len(all_words), activation='softmax')
# TimeDistributed applies the Dense layer to each time step (input word) independently,
# i.e. the same weights are used across the time dimension for every batch.
# (Note: in recent Keras versions a Dense layer already acts on the last axis of a 3D input,
# so TimeDistributed is optional here, but it makes the per-time-step intent explicit.)
# So, for the given time step (input word), each element 'i' in the output vector is the probability of the ith word from the target dictionary
stepwise_dense = L.TimeDistributed(stepwise_dense)
model.add(stepwise_dense)
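As a quick sanity check (using a made-up toy vocabulary for `all_words`), the model maps a batch of token IDs of shape (batch, time) to probabilities of shape (batch, time, len(all_words)), with each time step's row summing to 1:

```python
import numpy as np
import keras
import keras.layers as L

# toy dictionary standing in for all_words
all_words = ['<pad>', 'this', 'landscape', 'is', 'wonderful', 'and', 'calming']

model = keras.models.Sequential()
model.add(L.InputLayer([None], dtype='int32'))
model.add(L.Embedding(len(all_words), 50))
model.add(L.SimpleRNN(64, return_sequences=True))
model.add(L.TimeDistributed(L.Dense(len(all_words), activation='softmax')))

# one sentence of 6 token IDs -> probabilities of shape (1, 6, 7)
probs = model.predict(np.array([[1, 2, 3, 4, 5, 6]]))
print(probs.shape)         # (1, 6, 7)
print(probs.sum(axis=-1))  # each time step sums to ~1.0
```

Note the untrained weights are random, so the probabilities themselves are meaningless until you fit the model; only the shape and the sum-to-one property are being checked here.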
Then, compile and fit (train) your model (`generate_batches` and `EvaluateAccuracy` here are user-defined helpers, not part of Keras):
model.compile('adam', 'categorical_crossentropy')
model.fit_generator(generate_batches(train_data), len(train_data) // BATCH_SIZE,
                    callbacks=[EvaluateAccuracy()], epochs=5)
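Since `generate_batches` is not part of Keras, here is a minimal hypothetical sketch of such a generator, assuming `data` is a list of (input token-ID sequence, target token-ID sequence) pairs of equal length, and that targets are one-hot encoded for `categorical_crossentropy`:

```python
import numpy as np

BATCH_SIZE = 32

def generate_batches(data, batch_size=BATCH_SIZE, n_classes=10):
    """data: list of (input_id_sequence, target_id_sequence) pairs, all the same length."""
    while True:  # Keras generators are expected to loop forever
        idx = np.random.choice(len(data), batch_size)
        x = np.array([data[i][0] for i in idx], dtype='int32')
        y_ids = np.array([data[i][1] for i in idx])
        # one-hot encode the targets, as required by categorical_crossentropy
        y = np.eye(n_classes)[y_ids]
        yield x, y
```

In practice you would also pad variable-length sequences to a common length per batch (e.g. with `keras.preprocessing.sequence.pad_sequences`).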
Finally, just use the predict function to get the probabilities:
model.predict(input_to_your_network)
And just to be clear: the ith output unit of the softmax layer represents the predicted probability of the ith class.
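For instance (a NumPy-only sketch with a made-up 3-word dictionary), each time step's row in the prediction is a distribution over the dictionary, and `argmax` over the last axis gives the most probable word at that step:

```python
import numpy as np

vocab = ['bad', 'good', 'wonderful']  # hypothetical 3-word dictionary

# pretend output of model.predict for one sentence of 2 time steps:
# shape (batch=1, time=2, classes=3); each row sums to 1
probs = np.array([[[0.10, 0.70, 0.20],
                   [0.05, 0.15, 0.80]]])

best = probs.argmax(axis=-1)        # index of the highest-probability class per step
print([vocab[i] for i in best[0]])  # ['good', 'wonderful']
```

You can equally well index `probs[0, t, i]` directly to read off the probability the model assigns to word `i` at time step `t`.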