
word prediction with rnn using word2vec


I'm trying to predict words with a recurrent neural network. I'm training the network by feeding it independently pre-trained word2vec vectors of the words as input.

I wonder whether I can also use the word2vec vector of the target word to calculate the error cost. It doesn't seem to work, and I've never seen such examples or papers. Is it possible to use a word2vec vector as the target value when calculating the error cost? If so, what kind of cost function should I use? If not, please explain the reason mathematically.
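For concreteness, here is a minimal numpy sketch of what "using the word2vec vector as the target" would mean: the network's output is a vector in embedding space, and the cost compares it to the target word's pre-trained vector. The two obvious candidates are mean squared error and cosine distance; all names and values here are illustrative stand-ins, not an endorsement that this trains well.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50
predicted = rng.normal(size=dim)   # stand-in for the RNN's output vector
target = rng.normal(size=dim)      # pre-trained word2vec vector of the target word

def mse_loss(pred, tgt):
    """Mean squared error between predicted and target embeddings."""
    return np.mean((pred - tgt) ** 2)

def cosine_loss(pred, tgt):
    """1 - cosine similarity: 0 when the vectors point the same way, up to 2."""
    cos = pred @ tgt / (np.linalg.norm(pred) * np.linalg.norm(tgt))
    return 1.0 - cos

loss = cosine_loss(predicted, target)
```

Either quantity is differentiable, so nothing stops you from backpropagating through it; the practical question is whether minimizing distance in embedding space gives as useful a training signal as a softmax over the vocabulary.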

Also, how should I set up the input and target? Currently I'm using an architecture like the one below:

input : word1, word2, word3, target : word4
input : word1, word2, word3, word4, target : word5

Maybe I could use another option, like:

input : word1, word2, target : word2, word3
input : word1, word2, word3, target : word2, word3, word4

Which one is better? Or is there another option?

If there's any reference, please let me know.


Solution

  • The prediction is usually made through an output softmax layer that gives the probabilities for all words in the vocabulary.
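    A minimal numpy sketch of that standard setup (shapes and values are illustrative): the RNN's last hidden state `h` is mapped to one logit per vocabulary word, softmax turns the logits into probabilities, and the cost is the cross-entropy of the true next word.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    vocab_size, hidden_dim = 10_000, 128

    W = rng.normal(scale=0.01, size=(vocab_size, hidden_dim))  # output layer
    b = np.zeros(vocab_size)
    h = rng.normal(size=hidden_dim)   # stand-in for the RNN's last hidden state

    logits = W @ h + b
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()              # softmax over the whole vocabulary

    target_id = 42                    # index of the true next word (illustrative)
    loss = -np.log(probs[target_id])  # cross-entropy for this prediction
    ```

    Note that the output layer alone has `vocab_size * hidden_dim` parameters, which is why the weight-tying idea below matters.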

    However, a recent paper suggests tying the input word vectors to the output word classifiers and training them end-to-end. This significantly reduces the number of parameters: https://arxiv.org/abs/1611.01462
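    A sketch of the tying idea (not the paper's exact training procedure): the input embedding matrix `E` is reused, transposed, as the output classifier, so no separate output matrix is learned. This assumes the hidden state has the same dimensionality as the embeddings (otherwise a projection is needed).

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    vocab_size, dim = 10_000, 128
    E = rng.normal(scale=0.01, size=(vocab_size, dim))  # shared embedding table

    x = E[7]          # look up the input word's embedding (row 7 is illustrative)
    h = np.tanh(x)    # stand-in for the RNN's hidden state, same dimension as x
    logits = E @ h    # output classifier is the tied embedding matrix

    # Untied: separate input and output matrices, 2 * vocab_size * dim parameters.
    # Tied:   one shared matrix E,               vocab_size * dim parameters.
    ```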

    With regard to architectures, at least for training I would prefer the second option, since the first one discards the information that the second and third words are also targets that could be used for training.
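    The second option amounts to shifting the sequence by one position, so every timestep contributes a loss term. A tiny sketch of how the input/target pairs fall out of one training sentence:

    ```python
    # One training sentence; the target at each timestep is simply the next word.
    sentence = ["word1", "word2", "word3", "word4", "word5"]
    inputs = sentence[:-1]    # word1 .. word4
    targets = sentence[1:]    # word2 .. word5

    pairs = list(zip(inputs, targets))
    # [('word1', 'word2'), ('word2', 'word3'), ('word3', 'word4'), ('word4', 'word5')]
    ```

    With the first option, the same sentence yields only a single loss term (predict word5 from the first four words), so most of the available supervision is thrown away.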