I'm trying to predict the next word with a recurrent neural network.
I'm training the network by feeding the independently pre-trained word2vec
vectors of the input words.
And I wonder whether I can use the word2vec vector
of the target word to compute the error cost.
It doesn't seem to work, and I've never seen such examples or papers.
Is it possible to use word2vec as a target value for calculating error cost?
If so, what kind of cost function should I use?
If not, please explain the reason mathematically.
And how should I set up the input and target? Right now I'm using an architecture like the one below:
input : word1, word2, word3, target : word4
input : word1, word2, word3, word4, target : word5
Maybe I could use another option like:
input : word1, word2, target : word2, word3
input : word1, word2, word3, target : word2, word3, word4
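To make the two setups concrete, here is a small sketch of how each option would build (input, target) training pairs from one tokenized sequence (the variable names are just illustrative):

```python
# Hypothetical sketch of the two training-pair setups described above.
words = ["word1", "word2", "word3", "word4", "word5"]

# Option 1: growing input context, single next word as target.
option1 = [(words[:i], words[i]) for i in range(3, len(words))]
# e.g. (["word1", "word2", "word3"], "word4")

# Option 2: input sequence paired with the same sequence shifted by one,
# so every time step has a target.
option2 = [(words[:i], words[1:i + 1]) for i in range(2, len(words))]
# e.g. (["word1", "word2"], ["word2", "word3"])
```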
Which one is better? Or is there another option?
If there's any reference, please let me know.
The prediction is usually made through an output softmax layer that gives the probabilities for all words in the vocabulary.
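As a minimal sketch of that output layer (numpy, with made-up shapes): the final hidden state is projected to vocabulary-sized logits, softmax turns them into a distribution over all words, and the loss is the cross-entropy of the true next word.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 10, 4          # illustrative sizes

h = rng.standard_normal(hidden_size)     # last RNN hidden state (stand-in)
W_out = rng.standard_normal((vocab_size, hidden_size))
b_out = np.zeros(vocab_size)

logits = W_out @ h + b_out               # one score per vocabulary word
probs = np.exp(logits - logits.max())    # numerically stable softmax
probs /= probs.sum()                     # probabilities sum to 1

target = 3                               # index of the true next word
loss = -np.log(probs[target])            # cross-entropy for this step
```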
However, a recent paper suggests tying the input word vectors to the output word classifiers and training them end-to-end, which significantly reduces the number of parameters: https://arxiv.org/abs/1611.01462
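A rough sketch of that tying idea (illustrative only, not the paper's exact formulation): the same embedding matrix is used both to look up the input word vector and as the output classifier, so no separate output projection is needed for those parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_size, emb_size = 10, 4                      # illustrative sizes
E = rng.standard_normal((vocab_size, emb_size))   # shared embedding matrix

x = E[2]            # input side: embedding lookup for word index 2
h = np.tanh(x)      # stand-in for the recurrent transition
logits = E @ h      # output side: the tied matrix acts as the classifier
```

Note that tying requires the hidden size to match the embedding size (or an extra projection between them).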
Regarding architectures, at least for training I would prefer the second option, since the first one loses information about the second and third words that could also be used for training: the shifted-target setup gives the network a prediction loss at every time step instead of only at the end of the sequence.