Search code examples
python-3.xtensorflowkerasgoogle-colaboratorylstm

How to use my pretrained LSTM saved model to make new classifications


I have a simple pretrained LSTM model builded with Keras and Tensorflow, I trained, compiled and fitted it, and make a test prediction with a simple sentence, and it works, then I saved my model using model.save(sentanalysis.h5 and everything OK. Then, I loaded this model with model.load_model(), and it loads without error, but when I tried model.predict() I got an array with floats that doesn't shows anything related to the classes:

How can I use my pretrained model to make new classifications? The dataset I use to train it is very simple, a csv with text and sentiment columns, nothing else. Can you help me? This is the code of the model:

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import nlp
import random
from keras.preprocessing.text import Tokenizer
from keras_preprocessing.sequence import pad_sequences

dataset = nlp.load_dataset('csv', data_files={'train':'/content/drive/MyDrive/Proyect/BehaviorClassifier/tass2019_pe_train_final.csv',
                                              'test': '/content/drive/MyDrive/Proyect/BehaviorClassifier/tass2019_pe_test_final.csv',
                                              'validation': '/content/drive/MyDrive/Proyect/BehaviorClassifier/tass2019_pe_val_final.csv'})
train = dataset['train']
val = dataset['validation']
test = dataset['test']

def get_tweet(data):
    tweets = [x['Text'] for x in data]
    labels = [x['behavior'] for x in data]
    return tweets, labels

tweets, labels = get_tweet(train)

tokenizer = Tokenizer(num_words=10000, oov_token='<UNK>')
tokenizer.fit_on_texts(tweets)

maxlen = 140

def get_sequences(tokenizer, tweets):
    sequences = tokenizer.texts_to_sequences(tweets)
    padded = pad_sequences(sequences, truncating='post', padding='post', maxlen=maxlen)
    return padded

padded_train_seq = get_sequences(tokenizer, tweets)

classes = set(labels)
class_to_index = dict((c, i) for i, c in enumerate(classes))
index_to_class = dict((v, k) for k, v in class_to_index.items())
names_to_ids = lambda labels: np.array([class_to_index.get(x) for x in labels])
train_labels = names_to_ids(labels)

model = tf.keras.models.Sequential([
    tf.keras.layers.Embedding(10000, 16, input_length=maxlen),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(20, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(20)),
    tf.keras.layers.Dense(6, activation='softmax')
])
model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

val_tweets, val_labels = get_tweet(val)
val_seq = get_sequences(tokenizer, val_tweets)
val_labels= names_to_ids(val_labels)
h = model.fit(
     padded_train_seq, train_labels,
     validation_data=(val_seq, val_labels),
     epochs=8#,
     #callbacks=[tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=2)]
)

test_tweets, test_labels=get_tweet(test)
test_seq = get_sequences(tokenizer, test_tweets)
test_labels=names_to_ids(test_labels)
model.evaluate(test_seq, test_labels)

# This code works when I loaded the previos code
sentence = 'I am very happy now'
sequence = tokenizer.texts_to_sequences([sentence])
paddedSequence = pad_sequences(sequence, truncating = 'post', padding='post', maxlen=maxlen)
p = model.predict(np.expand_dims(paddedSequence[0], axis=0))[0]
pred_class=index_to_class[np.argmax(p).astype('uint8')]
print('Sentence: ', sentence)
print('Sentiment: ', pred_class)

And this is how I save and load my model withouth loading previous code:

model.save('/content/drive/MyDrive/Proyect/BehaviorClassifier/twitterBehaviorClassifier.h5')
model = keras.models.load_model('/content/drive/MyDrive/Proyect/BehaviorClassifier/twitterBehaviorClassifier.h5')

#### ISSUE HERE
new = ["I am very happy"]
tokenizer = Tokenizer(num_words=10000, oov_token='<UNK>')
tokenizer.fit_on_texts(new)
seq = tokenizer.texts_to_sequences(new)
padded = pad_sequences(seq, maxlen=140)
pred = model.predict(padded)

And I get this:

1/1 [==============================] - 0s 29ms/step
[[7.0648360e-01 1.1568426e-01 1.7581969e-01 7.2872970e-04 4.2903548e-04
  8.5460022e-04]]

I've reading some doc, but nothing helped me.


Solution

  • So, from your model code, you have the following:

    tf.keras.layers.Dense(6, activation='softmax')

    Presumably, you have 6 different sentiment classes. The output you are seeing from your model.predict() are the probabilities that the input belongs to the corresponding class, i.e. 70.6% chance that the sentiment is class 0, 11.5% that the sentiment is class 1, 17.5% that the sentiment is class 2, etc.

    So what is typically done to postprocess these results is take the largest probability as the prediction using np.argmax(pred), which in the case you posted should give you 0, which then can be interpreted as your model believes your tweet is 70.6% likely to belong to class zero.