I am a newbie at implementing language models with Keras RNNs. I have a dataset of discrete words (not drawn from a single paragraph) with the following statistics:
Now I want to build a model that accepts a character and predicts the next character in the word. I have padded all the words so that they have the same length, so my input is Word_input with shape 1953 x 9 and my target is 1953 x 9 x 33. I also want to use an Embedding layer. So my network architecture is:
self.wordmodel=Sequential()
self.wordmodel.add(Embedding(33,embedding_size,input_length=9))
self.wordmodel.add(LSTM(128, return_sequences=True))
self.wordmodel.add(TimeDistributed(Dense(33)))
self.wordmodel.compile(loss='mse',optimizer='rmsprop',metrics=['accuracy'])
As an example, the word "CAT" with padding is represented as
Input to network -- START C A T END * * * * (9 characters)
Target of the same -- C A T END * * * * * (9 characters)
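To make the setup concrete, here is how I encode one word; the vocabulary below is only illustrative (my actual 33-symbol mapping differs), and `encode` is a helper name I made up:

```python
import numpy as np

# Illustrative vocabulary: '*' is padding, plus START/END markers and A-Z.
# The real 33-symbol mapping in my data is different; indices here are arbitrary.
vocab = ['*', 'START', 'END'] + [chr(c) for c in range(ord('A'), ord('Z') + 1)]
while len(vocab) < 33:          # pad the symbol list out to 33 entries
    vocab.append('?%d' % len(vocab))
char_to_idx = {ch: i for i, ch in enumerate(vocab)}

def encode(word, maxlen=9):
    # Input sequence: START + characters, padded with '*' to maxlen
    seq = ['START'] + list(word)
    seq += ['*'] * (maxlen - len(seq))
    # Target sequence: characters + END, padded with '*' to maxlen
    tgt = list(word) + ['END']
    tgt += ['*'] * (maxlen - len(tgt))
    x = np.array([char_to_idx[c] for c in seq])   # shape (9,)
    y = np.zeros((maxlen, len(vocab)))            # one-hot target, shape (9, 33)
    y[np.arange(maxlen), [char_to_idx[c] for c in tgt]] = 1.0
    return x, y

x, y = encode('CAT')
print(x.shape, y.shape)  # (9,) (9, 33)
```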
So with the TimeDistributed output I am measuring the difference between the network's prediction and the target. I have also set batch_size to 1, so that the network resets its state after reading each sample word.
My question is: am I doing this conceptually right? Whenever I run training, the accuracy gets stuck at about 56%.
Kindly enlighten me. Thanks.
To my knowledge, the structure is basic and may work to some degree. I have a few suggestions.
In the TimeDistributed layer, you should add a softmax activation function, which is widely employed in multi-class classification. Right now your output is unbounded, which is not intuitive given that your target is one-hot.
With a softmax output, you can change the loss to categorical cross-entropy, which increases the probability of the correct class and decreases the others. It's more appropriate.
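You can see this "raise the correct class, lower the rest" behavior directly in the gradient of softmax cross-entropy with respect to the logits, which is simply `p - target`; the numbers below are just an illustration:

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy 5-class example; class 2 is the one-hot target.
logits = np.array([1.0, 2.0, 0.5, -1.0, 0.0])
target = np.array([0.0, 0.0, 1.0, 0.0, 0.0])

p = softmax(logits)
# Gradient of cross-entropy loss w.r.t. the logits:
grad = p - target
# grad is negative only for the correct class (its logit is pushed up),
# and positive for every other class (their logits are pushed down).
print(grad)
```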
You can give it a try. For a more capable model, you could try the structure given in the PyTorch tutorial. Thanks.
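Putting these suggestions together, a corrected version of your model might look like the sketch below. `embedding_size` is a placeholder value, and I use an explicit `Input` layer instead of `input_length` so it also works on newer Keras versions:

```python
from keras.models import Sequential
from keras.layers import Input, Embedding, LSTM, TimeDistributed, Dense

embedding_size = 16  # placeholder; tune as needed

model = Sequential([
    Input(shape=(9,)),                 # 9 characters per padded word
    Embedding(33, embedding_size),     # 33-symbol vocabulary
    LSTM(128, return_sequences=True),
    # softmax turns each timestep's 33 scores into a probability distribution
    TimeDistributed(Dense(33, activation='softmax')),
])
# categorical cross-entropy matches one-hot targets of shape (9, 33)
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
```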