I can't seem to make sense of the data provided by Keras' reuters dataset.
The set is loaded like so:
(x_train, y_train), (x_test, y_test) = reuters.load_data()
As far as I understand, the "x" arrays are arrays of sequences (lists) of word indices from news stories, and the "y" arrays hold the topic labels of these sequences.
But when I try to translate the word indices of one of the sequences with the provided dictionary into actual words:
wordDict = {y:x for x,y in reuters.get_word_index().items()}
for index in x_train[0]:
    print(wordDict.get(index))
The sequence seems to make no sense. How do I turn the sequences back into the original news?
Edit: found a similar thread here. Seems like there is a problem with the indices in the dictionary not matching the word indices in the dataset. But redownloading the data does not resolve the problem for me.
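The default value of the load_data argument "index_from" is 3, so the indices of actual words in the dataset are shifted up by 3 relative to the dictionary and are all >3. The lower indices are reserved for special markers (by default, 1 marks the start of a sequence and 2 marks an out-of-vocabulary word).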
One can reconstruct the texts by using wordDict.get(index - 3).
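As a minimal sketch of that offset fix: the helper name decode_sequence, the "?" placeholder for reserved indices, and the toy dictionary below are assumptions made for illustration, not part of Keras. The toy data stands in for reuters.get_word_index() and x_train[0] so the snippet runs without downloading anything.

```python
def decode_sequence(sequence, word_index, index_from=3):
    """Map a sequence of shifted word indices back to a string of words."""
    # Invert the word -> index mapping into index -> word.
    index_to_word = {idx: word for word, idx in word_index.items()}
    # Undo the index_from shift; reserved markers (start, unknown, padding)
    # have no dictionary entry after the shift and are shown as "?".
    return " ".join(index_to_word.get(i - index_from, "?") for i in sequence)


# Hypothetical stand-in for reuters.get_word_index().
toy_word_index = {"the": 1, "market": 2, "rose": 3}
# Start marker (1), three shifted word indices, unknown-word marker (2).
toy_sequence = [1, 4, 5, 6, 2]

print(decode_sequence(toy_sequence, toy_word_index))
```

With the real data, the same call would presumably be decode_sequence(x_train[0], reuters.get_word_index()), as long as load_data was called with its default index_from.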