I am going through the Keras examples and ran the one that uses an LSTM to classify sentiment on the built-in IMDB dataset (https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py).
On inspecting the data, each review is represented as an array of numbers, which I assume are the words' indices in a vocabulary built from this dataset.
My question, however, is: how do I feed a new piece of text (something that I make up) into this model to get a prediction? How would I get access to this vocabulary of words?
With that, I could preprocess my input text into an array of numbers and feed it in. Thanks!
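The dataset also makes available the word index used for encoding the sequences. The Keras docs show this for the Reuters dataset (reuters.get_word_index); the imdb module exposes the same function, which is the one you want here:
word_index = imdb.get_word_index()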
It returns a dictionary where keys are words (str) and values are indexes (integer), e.g. word_index["giraffe"] might return 1234.
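To turn made-up text into the same kind of number array, something along these lines should work. It is only a sketch under a few assumptions: encode_review and toy_index are names I made up for illustration, the regex tokenization is a guess at something close enough to how the dataset was built, and the defaults (start_char=1, oov_char=2, index_from=3, num_words=20000) are meant to mirror the imdb.load_data defaults and the vocabulary cap in the example script, so check them against your Keras version.

```python
import re


def encode_review(text, word_index, num_words=20000,
                  start_char=1, oov_char=2, index_from=3):
    """Map raw text to a list of integer ids using a word -> index dict.

    Hypothetical helper: the defaults are assumed to match imdb.load_data.
    """
    # Assumption: lowercase and split on anything that is not a letter,
    # digit or apostrophe.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    ids = [start_char]  # every sequence starts with the start marker
    for tok in tokens:
        idx = word_index.get(tok)
        if idx is None:
            ids.append(oov_char)  # word not in the vocabulary at all
            continue
        idx += index_from  # raw indices are shifted to make room for markers
        # Words outside the vocabulary cap are replaced by the OOV marker.
        ids.append(idx if idx < num_words else oov_char)
    return ids


# Toy dictionary so the snippet runs offline; in practice use
# word_index = imdb.get_word_index()
toy_index = {"the": 1, "movie": 17, "was": 13, "great": 84}
encoded = encode_review("The movie was GREAT, truly!", toy_index)
print(encoded)  # [1, 4, 20, 16, 87, 2]
```

With the real word index, you would then pad the result the same way the example script pads its training data (sequence.pad_sequences([encoded], maxlen=maxlen), with the same maxlen you trained with) and pass that to model.predict.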