
The role of the initial state of an LSTM layer in a seq2seq encoder


I am trying to follow this guide to implement a seq2seq machine translation model: https://www.tensorflow.org/tutorials/text/nmt_with_attention

The tutorial's Encoder has an initialize_hidden_state() function that is used to generate an all-zero initial state for the encoder. However, I am a bit confused as to why this is necessary. As far as I can tell, the only places the encoder is called (in train_step and evaluate), it is initialized with the state produced by initialize_hidden_state(). My questions are: 1) what is the purpose of this initial state? Doesn't the Keras layer automatically initialize the LSTM states to begin with? And 2) why not always just initialize the encoder with an all-zero hidden state, given that the encoder is always called with states generated by initialize_hidden_state()?
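
For reference, the pattern in question looks roughly like this. This is a minimal sketch of the encoder described above, not the tutorial's exact code; the layer choices and arguments are simplified for illustration:

```python
import tensorflow as tf

class Encoder(tf.keras.Model):
    # Minimal sketch of the encoder pattern described above.
    def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
        super().__init__()
        self.batch_sz = batch_sz
        self.enc_units = enc_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.lstm = tf.keras.layers.LSTM(enc_units,
                                         return_sequences=True,
                                         return_state=True)

    def call(self, x, hidden):
        x = self.embedding(x)
        # hidden is the [h, c] pair produced by initialize_hidden_state().
        output, state_h, state_c = self.lstm(x, initial_state=hidden)
        return output, state_h, state_c

    def initialize_hidden_state(self):
        # Explicit all-zero initial state, passed in at every call site.
        return [tf.zeros((self.batch_sz, self.enc_units)),
                tf.zeros((self.batch_sz, self.enc_units))]
```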


Solution

  • You are totally right; the code in the example is a little misleading. The LSTM cells are automatically initialized with zeros, so you can simply delete the initialize_hidden_state() function.
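
    To convince yourself of this, compare a call with an explicit all-zero initial state against a call with no initial_state argument at all. A minimal sketch (the layer sizes here are arbitrary, chosen only for illustration):

    ```python
    import tensorflow as tf

    batch_size, timesteps, features, units = 4, 7, 8, 16
    inputs = tf.random.normal((batch_size, timesteps, features))

    lstm = tf.keras.layers.LSTM(units, return_sequences=True, return_state=True)

    # Explicit all-zero [h, c] state, i.e. what initialize_hidden_state() produces.
    zero_state = [tf.zeros((batch_size, units)), tf.zeros((batch_size, units))]
    out_explicit, h_explicit, c_explicit = lstm(inputs, initial_state=zero_state)

    # No initial_state argument: Keras initializes the states to zeros itself.
    out_default, h_default, c_default = lstm(inputs)

    # The two calls produce identical outputs.
    print(tf.reduce_max(tf.abs(out_explicit - out_default)).numpy())  # 0.0
    ```

    An explicit initial state only matters when you want something other than zeros, e.g. feeding the encoder's final state into the decoder, or carrying state across batch boundaries in a stateful setup.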