Tags: python, tensorflow, artificial-intelligence, lstm, seq2seq

Using an LSTM decoder without teacher forcing - TensorFlow


I'm trying to build a sequence-to-sequence model in TensorFlow. I have followed several tutorials and all was good, until I reached a point where I decided to remove teacher forcing from my model. Below is a sample of the decoder network that I'm using:

import tensorflow as tf  # TensorFlow 1.x (tf.contrib.seq2seq)

def decoding_layer_train(encoder_state, dec_cell, dec_embed_input,
                         target_sequence_length, max_summary_length,
                         output_layer, keep_prob):
    """
    Create a decoding layer for training
    :param encoder_state: Encoder State
    :param dec_cell: Decoder RNN Cell
    :param dec_embed_input: Decoder embedded input
    :param target_sequence_length: The lengths of each sequence in the target batch
    :param max_summary_length: The length of the longest sequence in the batch
    :param output_layer: Function to apply the output layer
    :param keep_prob: Dropout keep probability
    :return: BasicDecoderOutput containing training logits and sample_id
    """

    # TrainingHelper feeds the embedded ground-truth targets as the decoder
    # input at every step (teacher forcing)
    training_helper = tf.contrib.seq2seq.TrainingHelper(inputs=dec_embed_input,
                                                        sequence_length=target_sequence_length,
                                                        time_major=False)

    training_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell, training_helper, encoder_state, output_layer)

    training_decoder_output = tf.contrib.seq2seq.dynamic_decode(training_decoder,
                                                                impute_finished=True,
                                                                maximum_iterations=max_summary_length)[0]
    return training_decoder_output

As I understand it, TrainingHelper is what does the teacher forcing, especially since it takes the true output as one of its arguments. I tried to use the decoder without a helper, but a helper appears to be mandatory. I tried setting the true output to 0, but apparently TrainingHelper needs the real output. I have also searched for a solution but did not find anything related.
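To make the difference concrete, here is a framework-free sketch of the two ways a decoder can get its input at each step. Everything in it is hypothetical and for illustration only: `step` is a toy stand-in for one decoder-cell step plus output layer, and `GO`, `EOS` and `VOCAB_SIZE` are made-up values. Teacher forcing (what TrainingHelper does) feeds the true previous token; greedy free-running decoding feeds the model's own previous prediction, so the number of steps is decided by the model rather than by the targets.

```python
# Framework-free sketch: teacher forcing vs. greedy (free-running) decoding.
# `step`, GO, EOS and VOCAB_SIZE are hypothetical toy values, not TensorFlow APIs.

GO, EOS, VOCAB_SIZE = 0, 3, 4


def step(prev_token):
    """Toy decoder step (a real one would also carry the RNN state):
    returns logits that always favour token (prev_token + 1) % VOCAB_SIZE."""
    logits = [0.0] * VOCAB_SIZE
    logits[(prev_token + 1) % VOCAB_SIZE] = 1.0
    return logits


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def decode_teacher_forcing(targets):
    """TrainingHelper-style: the input at step t is the true token targets[t-1]."""
    inputs = [GO] + targets[:-1]  # ground truth shifted right by one step
    predictions = [argmax(step(token)) for token in inputs]
    return inputs, predictions  # always exactly len(targets) steps


def decode_greedy(max_iterations):
    """GreedyEmbeddingHelper-style: the input at step t is the model's own
    argmax prediction from step t-1; stops at EOS or after max_iterations."""
    token, inputs, predictions = GO, [], []
    for _ in range(max_iterations):
        inputs.append(token)
        token = argmax(step(token))
        predictions.append(token)
        if token == EOS:
            break
    return inputs, predictions


print(decode_teacher_forcing([2, 1, 1, 3]))  # ([0, 2, 1, 1], [1, 3, 2, 2])
print(decode_greedy(max_iterations=10))      # ([0, 1, 2], [1, 2, 3])
```

With targets `[2, 1, 1, 3]`, teacher forcing always runs 4 steps, while the greedy loop stops after 3 steps because it emits `EOS` early. That length difference is exactly the kind of shape mismatch that appears once teacher forcing is removed.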

===================Update=============

I apologize for not mentioning this earlier, but I also tried using GreedyEmbeddingHelper. The model runs fine for a couple of iterations and then starts throwing a runtime error. It appears that GreedyEmbeddingHelper starts producing output with a different shape than expected. Below is my function when using GreedyEmbeddingHelper:

def decoding_layer_train(encoder_state, dec_cell, dec_embeddings, 
                         target_sequence_length, max_summary_length, 
                         output_layer, keep_prob):
    """
    Create a decoding layer for training
    :param encoder_state: Encoder State
    :param dec_cell: Decoder RNN Cell
    :param dec_embeddings: Decoder embedding matrix
    :param target_sequence_length: The lengths of each sequence in the target batch
    :param max_summary_length: The length of the longest sequence in the batch
    :param output_layer: Function to apply the output layer
    :param keep_prob: Dropout keep probability
    :return: BasicDecoderOutput containing training logits and sample_id
    """

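    # target_vocab_to_int and batch_size are not parameters of this function,
    # so they must be defined outside it (e.g. as notebook globals)
    start_tokens = tf.tile(tf.constant([target_vocab_to_int['<GO>']], dtype=tf.int32), [batch_size], name='start_tokens')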


    training_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(dec_embeddings,
                                                                start_tokens,
                                                                target_vocab_to_int['<EOS>'])

    training_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell, training_helper, encoder_state, output_layer)

    training_decoder_output = tf.contrib.seq2seq.dynamic_decode(training_decoder,
                                                                impute_finished=True,
                                                                maximum_iterations=max_summary_length)[0]
    return training_decoder_output

This is a sample of the error that gets thrown after a couple of training iterations:

    Ok

Epoch   0 Batch    5/91 - Train Accuracy: 0.4347, Validation Accuracy: 0.3557, Loss: 2.8656
++++Epoch   0 Batch    5/91 - Train WER: 1.0000, Validation WER: 1.0000

Epoch   0 Batch   10/91 - Train Accuracy: 0.4050, Validation Accuracy: 0.3864, Loss: 2.6347
++++Epoch   0 Batch   10/91 - Train WER: 1.0000, Validation WER: 1.0000

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-115-1d2a9495ad42> in <module>()
     57                  target_sequence_length: targets_lengths,
     58                  source_sequence_length: sources_lengths,
---> 59                  keep_prob: keep_probability})
     60 
     61 

/Users/alsulaimi/Documents/AI/Tensorflow-make/workspace/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
    887     try:
    888       result = self._run(None, fetches, feed_dict, options_ptr,
--> 889                          run_metadata_ptr)
    890       if run_metadata:
    891         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/Users/alsulaimi/Documents/AI/Tensorflow-make/workspace/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1116     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1117       results = self._do_run(handle, final_targets, final_fetches,
-> 1118                              feed_dict_tensor, options, run_metadata)
   1119     else:
   1120       results = []

/Users/alsulaimi/Documents/AI/Tensorflow-make/workspace/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1313     if handle is None:
   1314       return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1315                            options, run_metadata)
   1316     else:
   1317       return self._do_call(_prun_fn, self._session, handle, feeds, fetches)

/Users/alsulaimi/Documents/AI/Tensorflow-make/workspace/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_call(self, fn, *args)
   1332         except KeyError:
   1333           pass
-> 1334       raise type(e)(node_def, op, message)
   1335 
   1336   def _extend_graph(self):

InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [1100,78] and labels shape [1400]
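The shapes in this error follow from simple arithmetic. A sketch, assuming the loss flattens logits from [batch, time, vocab] to [batch * time, vocab] and targets from [batch, time] to [batch * time] (as tf.contrib.seq2seq.sequence_loss does before calling sparse_softmax_cross_entropy_with_logits). The batch size of 100 and the step counts below are guesses for illustration, since the actual batch_size is not shown; only the 78 classes come from the error.

```python
def flattened_shapes(batch_size, decoded_steps, target_steps, vocab_size):
    """Shapes seen by the cross-entropy op after merging the batch and time axes."""
    logits_shape = [batch_size * decoded_steps, vocab_size]
    labels_shape = [batch_size * target_steps]
    return logits_shape, labels_shape


# Hypothetical numbers: 100 sequences per batch, 78 classes (from the error),
# greedy decoding ran 11 steps while the padded targets have 14 steps.
logits_shape, labels_shape = flattened_shapes(100, 11, 14, 78)
print(logits_shape, labels_shape)  # [1100, 78] [1400]
```

Any batch size that divides both 1100 and 1400 fits equally well; what matters is that the decoder produced fewer time steps than the targets contain.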

I'm not sure, but I guess GreedyEmbeddingHelper should not be used for training. I would appreciate your help and thoughts on how to stop the teacher forcing.

Thank you.


Solution

  • There are several helpers, and they all inherit from the same base class, tf.contrib.seq2seq.Helper; you can find more information in the documentation. As you said, TrainingHelper requires predefined true inputs (the tokens the decoder is expected to output), and these true inputs are fed in at each next step instead of the output of the previous step. This approach (teacher forcing) has been reported in some research to speed up training of the decoder.

    In your case, you are looking for GreedyEmbeddingHelper. Just use it in place of TrainingHelper:

    training_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(
        embedding=embedding,
        start_tokens=tf.tile([GO_SYMBOL], [batch_size]),
        end_token=END_SYMBOL)
    

    Replace embedding, GO_SYMBOL, END_SYMBOL and batch_size with the embedding tensor and variables you use in your problem. This helper automatically takes the output of each step (the argmax over its logits), applies the embedding, and feeds the result as the input to the next step. For the first step, start_tokens is used as the input.

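    The output produced with GreedyEmbeddingHelper does not have to match the length of the expected output, because decoding stops once every sequence in the batch has emitted end_token (or maximum_iterations is reached). That is what your error shows: after flattening, there are 1100 rows of logits but 1400 labels, because the decoder ran fewer time steps than the targets have. You have to use padding to make their shapes match; TensorFlow provides the function tf.pad() for this. Also, tf.contrib.seq2seq.dynamic_decode returns a tuple (final_outputs, final_state, final_sequence_lengths), so you can use the value of final_sequence_lengths to compute the padding.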

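    # Assumes batch-major tensors: logits [batch, time, vocab], targets [batch, time].
    # expected_length: the time dimension of the targets, e.g. tf.shape(targets)[1].
    decoded_length = tf.reduce_max(final_seq_lengths)

    logits_pad = tf.pad(
        logits,
        [[0, 0],
         [0, tf.maximum(expected_length - decoded_length, 0)],
         [0, 0]],
        mode='CONSTANT')  # missing time steps get all-zero logits

    targets_pad = tf.pad(
        targets,
        [[0, 0],
         [0, tf.maximum(decoded_length - expected_length, 0)]],
        constant_values=PAD_VALUE,
        mode='CONSTANT')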
    

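    You may have to adjust the padding depending on the shapes of your inputs. Also, you don't have to pad the targets if you set the maximum_iterations parameter of dynamic_decode to the targets' time dimension: the decoder can then never run longer than the targets, although it can still stop early, so the logits may still need padding.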
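The padding logic can be checked without building a TensorFlow graph. A minimal NumPy sketch, assuming batch-major arrays (logits [batch, time, vocab], targets [batch, time]) and a hypothetical PAD_VALUE of 0 for the <PAD> token id; np.pad takes its padding widths in the same per-axis (before, after) layout as tf.pad.

```python
import numpy as np

PAD_VALUE = 0  # hypothetical id of the <PAD> token


def pad_to_common_length(logits, targets, pad_value=PAD_VALUE):
    """Pad logits [batch, time, vocab] and targets [batch, time] along the time
    axis so both end up with the same number of time steps."""
    decoded_len = logits.shape[1]
    target_len = targets.shape[1]
    # Decoder stopped early: append all-zero logit steps.
    logits_pad = np.pad(
        logits,
        [(0, 0), (0, max(target_len - decoded_len, 0)), (0, 0)],
        mode='constant', constant_values=0.0)
    # Decoder ran longer than the targets: append PAD tokens to the targets.
    targets_pad = np.pad(
        targets,
        [(0, 0), (0, max(decoded_len - target_len, 0))],
        mode='constant', constant_values=pad_value)
    return logits_pad, targets_pad


logits = np.ones((2, 3, 5))       # 2 sequences, 3 decoded steps, 5 classes
targets = np.full((2, 4), 7)      # 2 sequences, 4 target steps
logits_pad, targets_pad = pad_to_common_length(logits, targets)
print(logits_pad.shape, targets_pad.shape)  # (2, 4, 5) (2, 4)
```

Whichever tensor is shorter along the time axis gets padded and the other is left unchanged, so the flattened first dimensions agree when the loss is computed.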