Tags: deep-learning, seq2seq, encoder-decoder

Why is the context vector not passed to every input of the decoder?


[figure: encoder-decoder (seq2seq) diagram]

In this model, in the encoder part, we feed in an input sentence with three words A, B, and C, and we get a context vector W, which is passed to the decoder. Why don't we pass W to every cell of the decoder instead of the output of the previous cell (i.e. W is passed first, then X to the next cell, then Y to the cell after that)?

Can someone explain what exactly is going on in the cell state of the decoder? What happens to the cell state of the encoder that is passed to the decoder?


Solution

  • This is a vanilla encoder-decoder model without attention, so there is no context vector here; "context vector" is the name given to the output of the attention mechanism.

    After reading the sentence ABC, the LSTM state should contain the information about the entire input sequence, so we can start decoding. As the first word, we decode W and feed it as input in the next step, where we decode the word X, and so on. The LSTM is not fed a context vector, but the embedding of the respective word.

    The decoder must always be given the previous word, simply because it does not know which word was decoded in the previous step. The LSTM state is projected to the vocabulary size, giving a distribution over all possible words; any word can be sampled from that distribution and fed back as the input in the next step (see the sketch below).
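
    A minimal sketch of what this looks like in PyTorch, not the answerer's code: the class and parameter names (`Seq2Seq`, `emb_dim`, `hidden_dim`, `bos_id`, etc.) are made up for illustration. The point it demonstrates is that the decoder LSTM is initialized with the encoder's final (h, c) state, and at each step it receives only the embedding of the previously produced word; the state is then projected to the vocabulary to get a distribution from which the next word is sampled.

    ```python
    import torch
    import torch.nn as nn

    class Seq2Seq(nn.Module):
        def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
            self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
            self.proj = nn.Linear(hidden_dim, vocab_size)  # LSTM state -> vocabulary logits

        def forward(self, src_ids, bos_id, max_len=20):
            # Encoder: read the whole source sentence ("A B C");
            # only its final (h, c) state is kept and handed to the decoder.
            _, (h, c) = self.encoder(self.embed(src_ids))

            # Decoder: start from the encoder state and a begin-of-sentence token.
            prev = torch.full((src_ids.size(0), 1), bos_id, dtype=torch.long)
            outputs = []
            for _ in range(max_len):
                # Input at each step is the embedding of the previously decoded word,
                # not a context vector; (h, c) carries everything the encoder produced.
                out, (h, c) = self.decoder(self.embed(prev), (h, c))
                logits = self.proj(out[:, -1])       # distribution over all possible words
                probs = torch.softmax(logits, dim=-1)
                prev = torch.multinomial(probs, 1)   # sample (or argmax) the next word
                outputs.append(prev)
            return torch.cat(outputs, dim=1)

    model = Seq2Seq(vocab_size=1000)
    src = torch.randint(0, 1000, (1, 3))   # a 3-token input like "A B C"
    print(model(src, bos_id=1))            # sampled output token ids ("W X Y ...")
    ```

    With attention, the loop above would additionally recompute a context vector from all encoder outputs at every decoding step; in this vanilla model that step simply does not exist.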