Tags: deep-learning, keras, lstm, keras-layer

How to input new text for prediction in keras while using an inbuilt dataset


I am going through the examples in keras and I ran the example for using an LSTM for classifying sentiment on the inbuilt imdb dataset (https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py).

On inspecting the data, each review is represented as an array of numbers, which I assume are the words' indices in a vocabulary built from this dataset.

My question, however, is how do I input a new piece of text (something that I make up) into this model to get a prediction? How would I get access to this vocabulary of words?

After that I could preprocess my input text into an array of numbers and feed it in. Thanks!


Solution

  • The dataset module also exposes the word index used for encoding the sequences:

    from keras.datasets import imdb
    word_index = imdb.get_word_index()  # the question uses imdb, not reuters

    It returns a dictionary whose keys are words (str) and whose values are indices (int), e.g. word_index["giraffe"] might return 1234.
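
    To go from that dictionary to something the model accepts, a rough sketch of the encoding step could look like the following. It assumes the data was loaded with the default arguments of load_data (start_char=1, oov_char=2, index_from=3), so every index from the word index is shifted by 3. The helper name encode_review, the regex tokenizer, the num_words default, and the toy dictionary are my own illustrative choices, not part of Keras, and the tokenization may not match exactly how the dataset was built.

```python
import re


def encode_review(text, word_index, num_words=20000,
                  index_from=3, start_char=1, oov_char=2):
    """Encode raw text roughly the way load_data() encodes reviews.

    Hypothetical helper: the parameter defaults mirror load_data()'s
    defaults, and num_words should match the value used for training.
    """
    # Assumed tokenization: lowercase, keep letters, digits and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    encoded = [start_char]
    for token in tokens:
        rank = word_index.get(token)
        if rank is None or rank + index_from >= num_words:
            # Unknown word, or one outside the vocabulary the model saw.
            encoded.append(oov_char)
        else:
            # Shift by index_from, since low indices are reserved.
            encoded.append(rank + index_from)
    return encoded


# Toy stand-in for the real dictionary from get_word_index().
toy_index = {"the": 1, "movie": 17, "great": 84}
print(encode_review("The movie was great!", toy_index))  # [1, 4, 20, 2, 87]
```

    With the real word index, the resulting list would then be padded to the training length (for example with pad_sequences from keras.preprocessing.sequence, using the same maxlen as in training) and passed to model.predict.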