I am building a toy encoder-decoder model for machine translation using TensorFlow.
I use TensorFlow 1.8.0 (CPU version). Pretrained 300-dimensional fastText word vectors are used in the embedding layer. A batch of training data then goes through the encoder and a decoder with an attention mechanism. In the training stage the decoder uses TrainingHelper, and in the inference stage GreedyEmbeddingHelper is used.
I already ran the model successfully with a single-layer bidirectional LSTM encoder. However, when I try to improve the model by switching to a multilayer LSTM, an error arises. The code that builds the training-stage model is below:
def BuildTrainModel(train_iterator):
    ((source, source_lengths), (target, target_lengths)) = train_iterator.get_next()
    encoder_inputs = tf.transpose(source, [1, 0])  # to time-major
    decoder_inputs = tf.transpose(target, [1, 0])
    decoder_outputs = tf.pad(decoder_inputs[1:], tf.constant([[0, 1], [0, 0]]),
                             constant_values=tar_eos_id)
    embedding_encoder = tf.Variable(embedding_matrix_src, name='embedding_encoder')
    embedding_decoder = tf.Variable(embedding_matrix_tar, name='embedding_decoder')
    # Embedding layer
    encoder_emb_inp = tf.nn.embedding_lookup(embedding_encoder, encoder_inputs)
    decoder_emb_inp = tf.nn.embedding_lookup(embedding_decoder, decoder_inputs)
    # Encoder: construct forward and backward cells
    forward_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
    backward_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
    encoder_outputs, encoder_states_fw, encoder_states_bw = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
        [forward_cell] * num_layers, [backward_cell] * num_layers, encoder_emb_inp,
        dtype=tf.float64, sequence_length=source_lengths, time_major=True)
Here I only show the encoder part. For the full code and hyperparameters, please see my GitHub: https://github.com/nkjsy/Neural-Machine-Translation/blob/master/nmt3.ipynb
The error message is:
InvalidArgumentError: Dimensions must be equal, but are 96 and 332 for 'stack_bidirectional_rnn/cell_0/bidirectional_rnn/fw/fw/while/basic_lstm_cell/MatMul_1' (op: 'MatMul') with input shapes: [?,96], [332,128].
When I pass [forward_cell] and [backward_cell] (i.e., a single layer, as I did before), there is no problem. Once I add more layers, the issue arises.
Define the list of cells explicitly, with one constructor call per layer:
forward_cell = [tf.contrib.rnn.BasicLSTMCell(num_units),tf.contrib.rnn.BasicLSTMCell(num_units)]
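For an arbitrary num_layers, a list comprehension avoids hard-coding one constructor call per layer. A minimal sketch, using a generic `make_cells` helper (not part of TensorFlow, just an illustration) so the idea can be shown without a TensorFlow session:

```python
def make_cells(factory, num_layers):
    """Call the cell factory once per layer so each layer gets its own instance."""
    return [factory() for _ in range(num_layers)]

# With TensorFlow 1.x this would be used as, e.g.:
#   forward_cells = make_cells(lambda: tf.contrib.rnn.BasicLSTMCell(num_units), num_layers)

# Demonstrate with a plain object standing in for an LSTM cell:
cells = make_cells(object, 3)
print(len({id(c) for c in cells}))  # 3 distinct instances
```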
You can see the difference when you print the two lists:
num_units = 128
num_layers = 2

# Method 1
forward_cell = [tf.contrib.rnn.BasicLSTMCell(num_units), tf.contrib.rnn.BasicLSTMCell(num_units)]
print(forward_cell)

# Method 2
forward_cell = [tf.contrib.rnn.BasicLSTMCell(num_units)] * num_layers
print(forward_cell)
The above snippet prints something similar to the following:
[<tensorflow.python.ops.rnn_cell_impl.BasicLSTMCell object at 0x00000087798E6EF0>, <tensorflow.python.ops.rnn_cell_impl.BasicLSTMCell object at 0x0000008709AE72E8>]
[<tensorflow.python.ops.rnn_cell_impl.BasicLSTMCell object at 0x0000008709AFDC50>, <tensorflow.python.ops.rnn_cell_impl.BasicLSTMCell object at 0x0000008709AFDC50>]
As you can see, Method 2 outputs a list in which both entries are the same cell instance (note the identical memory addresses), which is not what you want. With a shared instance, stack_bidirectional_dynamic_rnn builds the cell's kernel for the first layer's input size and then tries to reuse it on deeper layers, whose inputs (the concatenated forward and backward outputs) have a different size; that is why your error reports a dimension mismatch in a MatMul op.
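This aliasing is plain Python list semantics, independent of TensorFlow. A minimal sketch with a hypothetical Cell class in place of a real LSTM cell:

```python
class Cell:
    """Stand-in for an RNN cell; any mutable object shows the same behaviour."""
    pass

# Method 2: list multiplication copies the reference, not the object.
shared = [Cell()] * 2
print(shared[0] is shared[1])      # True: both slots point to one instance

# Method 1 (generalised): the comprehension calls the constructor per layer.
distinct = [Cell() for _ in range(2)]
print(distinct[0] is distinct[1])  # False: two independent instances
```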
Hope this helps.