
Start Token in LSTM Decoder


I understand the encoder-decoder model and how the output of the encoder becomes the input of the decoder. Assume I have only the decoder model, and the decoder's initial state is given (i.e. decoder_states_inputs is available).

I want decoder_inputs to be the start token (for example <start>), but I don't know how to supply it, or in what format.

from tensorflow.keras.layers import LSTM

decoder_lstm = LSTM(n_units, return_sequences=True, return_state=True)
decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)

Also, must I add the start token to my original sequences? That is:

 <start> statement1
 <start> statement2
 ...

Solution

  • How you add the <start> and <end> symbols really depends on how you implement the rest of the model, but in most cases the result is the same. For example, the official TensorFlow example adds these symbols to every sentence:

    def preprocess_sentence(w):
        # other preprocessing
    
        w = w.rstrip().strip()
    
        # adding a start and an end token to the sentence
        # so that the model knows when to start and stop predicting.
        w = '<start> ' + w + ' <end>'
        return w
    
    # rest of the code
    # ... word2idx is a dictionary that map words into unique ids
    
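    For concreteness, here is a minimal sketch of such a word2idx mapping and how it turns a preprocessed sentence into ids. This tiny vocabulary is hypothetical, except that <start> and <end> get the ids 4 and 5 as in the tutorial:

    # hypothetical tiny vocabulary; only the <start>/<end> ids (4 and 5)
    # match the tutorial's mapping
    word2idx = {'<pad>': 0, 'he': 1, 'is': 2, 'here': 3, '<start>': 4, '<end>': 5}

    def tokenize(sentence):
        # map each whitespace-separated word to its integer id
        return [word2idx[w] for w in sentence.split()]

    tokenize(preprocess_sentence('he is here'))  # -> [4, 1, 2, 3, 5]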

    Then, in the tokenization part, the <start> and <end> symbols map to the ids 4 and 5, respectively. But, as the tutorial's diagram shows, only <start> is fed into the decoder's input, and <end> appears only in the decoder's target output. This means our data looks like:

    decoder_inp = raw_decoder_input[:, 0:-1]  # drop the last token (<end>)
    decoder_out = raw_decoder_input[:, 1:]    # drop the first token (<start>)
    
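    Coming back to the original question: at inference time you feed the decoder one step at a time, starting from the start token encoded exactly like the training inputs. Below is a minimal greedy-decoding sketch, assuming a decoder_model that wraps decoder_lstm plus a softmax Dense layer over the vocabulary; decoder_model, num_decoder_tokens, max_decoder_len, and word2idx are hypothetical names, not from the original code:

    import numpy as np

    start_id, end_id = word2idx['<start>'], word2idx['<end>']

    # one-hot encode <start> as the first decoder input,
    # shape (batch=1, timesteps=1, num_decoder_tokens)
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    target_seq[0, 0, start_id] = 1.0

    states = decoder_states_inputs  # the given initial state [state_h, state_c]
    decoded_ids = []
    for _ in range(max_decoder_len):
        probs, state_h, state_c = decoder_model.predict([target_seq] + states)
        next_id = int(np.argmax(probs[0, -1, :]))
        if next_id == end_id:  # stop once the model emits <end>
            break
        decoded_ids.append(next_id)
        # feed the prediction back in as the next single-step input
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, next_id] = 1.0
        states = [state_h, state_c]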
