Here is my understanding of a basic sequence-to-sequence LSTM. Suppose we are tackling a question-answering setting.
You have two sets of LSTMs (green and blue below). The cells within each set share weights (i.e. all 4 green cells have the same weights, and likewise for the blue cells). The first set is a many-to-one LSTM, which summarises the question in its last hidden state / cell memory.
The second set (blue) is a many-to-many LSTM whose weights differ from those of the first set. Its input is the answer sentence, and its output is the same sentence shifted by one step.
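The "shifted by one" relationship between the decoder's input and its target (often called teacher forcing) can be made concrete with a small pure-Python sketch. The `<start>`/`<end>` markers and the helper name are hypothetical choices for illustration, not part of any library.

```python
# Hypothetical special tokens marking the start and end of an answer.
START, END = "<start>", "<end>"

def make_decoder_pair(answer_tokens):
    """Build (decoder_input, decoder_target) for teacher forcing.

    The target is the input shifted one step to the left, so at step t
    the decoder is fed token t and trained to predict token t + 1.
    """
    decoder_input = [START] + list(answer_tokens)
    decoder_target = list(answer_tokens) + [END]
    return decoder_input, decoder_target

inp, tgt = make_decoder_pair(["it", "is", "blue"])
print(inp)  # ['<start>', 'it', 'is', 'blue']
print(tgt)  # ['it', 'is', 'blue', '<end>']
```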
The question is twofold:
1. Are we passing only the last hidden state to the blue LSTMs as the initial hidden state, or the last hidden state and the cell memory?
2. Is there a way to set the initial hidden state and cell memory in Keras or TensorFlow? If so, is there a reference?
- Are we passing only the last hidden state to the blue LSTMs as the initial hidden state, or the last hidden state and the cell memory?
Both the hidden state `h` and the cell memory `c` are passed to the decoder.
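In the TensorFlow seq2seq source code, you can find the following in `basic_rnn_seq2seq()`:

```python
# Encoder: discard the per-step outputs, keep only the final state.
_, enc_state = rnn.static_rnn(enc_cell, encoder_inputs, dtype=dtype)
# Decoder: starts from the full encoder state.
return rnn_decoder(decoder_inputs, enc_state, cell)
```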
If you use an `LSTMCell`, the `enc_state` returned from the encoder will be a tuple `(c, h)` (an `LSTMStateTuple`). As you can see, this tuple is passed directly to the decoder.
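To see what "passing both `c` and `h`" means numerically, here is a minimal NumPy sketch of an encoder and a decoder with separate weights, where the decoder is unrolled from the encoder's final `(c, h)`. It mimics the `static_rnn` / `rnn_decoder` pattern above but is not TensorFlow's implementation: the weights and embeddings are random placeholders, and the gate layout (input, forget, candidate, output stacked in one matrix) is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_cell(input_dim, units):
    """Random placeholder weights for one LSTM cell, all four gates stacked."""
    return (rng.normal(scale=0.5, size=(input_dim, 4 * units)),
            rng.normal(scale=0.5, size=(units, 4 * units)),
            np.zeros(4 * units))

def unroll(cell, inputs, state):
    """Run the cell over inputs; state is (c, h), the order TF's LSTMCell uses."""
    W, U, b = cell
    c, h = state
    outputs = []
    for x in inputs:
        i, f, g, o = np.split(x @ W + h @ U + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell memory
        h = sigmoid(o) * np.tanh(c)                   # new hidden state
        outputs.append(h)
    return outputs, (c, h)

units, emb = 4, 3
encoder = make_cell(emb, units)  # "green" cells: one shared set of weights
decoder = make_cell(emb, units)  # "blue" cells: a separate set of weights

question = rng.normal(size=(5, emb))  # placeholder question embeddings
answer = rng.normal(size=(3, emb))    # placeholder answer embeddings
zero_state = (np.zeros(units), np.zeros(units))

# Encoder: ignore the per-step outputs, keep only the final (c, h).
_, enc_state = unroll(encoder, question, zero_state)
# Decoder: start from BOTH c and h of the encoder, not just h.
dec_outputs, _ = unroll(decoder, answer, enc_state)
```

Because the decoder's first step reads both `c` and `h`, its outputs differ from those of a decoder started from zeros; dropping `c` would discard the encoder's cell memory.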
In Keras, the "state" defined for an `LSTMCell` is also a pair, `(h, c)` (note that the order is different from TF). In `LSTMCell.call()`, you can find:

```python
h_tm1 = states[0]  # previous hidden state
c_tm1 = states[1]  # previous cell memory
```
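Since only the order differs, moving a state between TF-style and Keras-style code is a matter of swapping the two entries. The helpers below are hypothetical (neither library provides them) and use strings as stand-ins for real tensors.

```python
# Hypothetical helpers, not part of TF or Keras: only the order differs.
def tf_state_to_keras(tf_state):
    """TF's LSTMCell state is (c, h); Keras's LSTMCell state is [h, c]."""
    c, h = tf_state
    return [h, c]

def keras_state_to_tf(keras_state):
    """Inverse of tf_state_to_keras: [h, c] -> (c, h)."""
    h, c = keras_state
    return (c, h)

state = ("c_value", "h_value")  # stand-ins for real tensors
print(tf_state_to_keras(state))                               # ['h_value', 'c_value']
print(keras_state_to_tf(tf_state_to_keras(state)) == state)   # True
```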
To get the states returned from an `LSTM` layer, specify `return_state=True`. The layer then returns three tensors, `o, h, c`. The tensor `o` is the output of the layer, which equals `h` unless you specify `return_sequences=True`, in which case `o` contains the output at every time step (and its last time step equals `h`).
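The relationship between `o`, `h` and `c` can be checked with a small NumPy sketch that mimics what an LSTM layer with `return_state=True` hands back. It uses the standard LSTM equations with random placeholder weights; it is an illustration of the semantics, not Keras's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
units, input_dim, timesteps = 4, 3, 6

# Random placeholder weights; gates stacked as [input, forget, candidate, output].
W = rng.normal(scale=0.5, size=(input_dim, 4 * units))
U = rng.normal(scale=0.5, size=(units, 4 * units))
b = np.zeros(4 * units)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_layer(xs, h, c, return_sequences=False):
    """Return (o, h, c) like an LSTM layer called with return_state=True."""
    outputs = []
    for x in xs:
        i, f, g, o_gate = np.split(x @ W + h @ U + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o_gate) * np.tanh(c)
        outputs.append(h)  # the output at each step is the hidden state
    o = np.stack(outputs) if return_sequences else outputs[-1]
    return o, h, c

xs = rng.normal(size=(timesteps, input_dim))
h0 = c0 = np.zeros(units)

o, h, c = lstm_layer(xs, h0, c0)
print(np.allclose(o, h))        # True: o is just the final hidden state

seq, h, c = lstm_layer(xs, h0, c0, return_sequences=True)
print(seq.shape)                # (6, 4): one output per time step
print(np.allclose(seq[-1], h))  # True: the last step still equals h
```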
- Is there a way to set the initial hidden state and cell memory in Keras or TensorFlow? If so, is there a reference?
### TensorFlow
Just provide the initial state to an `LSTMCell` when calling it. For example, in the official RNN tutorial:

```python
lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
...
# The state is passed in explicitly on every call, so you choose its initial value.
output, state = lstm(current_batch_of_words, state)
```
There is also an `initial_state` argument for functions such as `tf.nn.static_rnn`. If you use the seq2seq module, pass the states to `rnn_decoder`, as shown in the code for question 1.
### Keras
Use the keyword argument `initial_state` when calling the `LSTM` layer:

```python
out = LSTM(32)(input_tensor, initial_state=[h, c])
```
This usage is also described in the official documentation:

> **Note on specifying the initial state of RNNs**
>
> You can specify the initial state of RNN layers symbolically by calling them with the keyword argument `initial_state`. The value of `initial_state` should be a tensor or list of tensors representing the initial state of the RNN layer.
EDIT:
There is now an example script in Keras (`lstm_seq2seq.py`) showing how to implement a basic seq2seq model in Keras. The script also covers how to make predictions after training a seq2seq model.
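At prediction time the true answer is not available, so the decoder is typically run one step at a time, feeding back its own previous prediction together with the updated states. A minimal pure-Python sketch of such a greedy loop follows; `step`, `toy_step`, and the token ids are hypothetical stand-ins for a trained decoder model, not code taken from `lstm_seq2seq.py`.

```python
def greedy_decode(step, init_state, start_id, end_id, max_len=20):
    """Greedy seq2seq inference loop.

    step(token_id, state) -> (scores, new_state) is a hypothetical decoder
    step: scores holds one score per vocabulary entry, and state is whatever
    the decoder carries between steps (e.g. [h, c] from the encoder).
    """
    token, state, result = start_id, init_state, []
    for _ in range(max_len):
        scores, state = step(token, state)
        token = max(range(len(scores)), key=scores.__getitem__)  # argmax
        if token == end_id:
            break
        result.append(token)
    return result

def toy_step(token, state):
    """Toy decoder over a 5-token vocabulary: always predicts token + 1."""
    scores = [0.0] * 5
    scores[(token + 1) % 5] = 1.0
    return scores, state + 1  # the "state" here just counts steps

print(greedy_decode(toy_step, 0, start_id=0, end_id=4))  # [1, 2, 3]
```

In a real model, `init_state` would be the encoder's final states and `step` would call the trained decoder; the `max_len` cap stops the loop if the end token is never predicted.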