Tags: keras, lstm, recurrent-neural-network

LSTM Encoder-Decoder stuck in plateau and not learning


I am testing my LSTM Encoder-Decoder architecture on a simple task: recognising vowels in random character sequences. Each digit in the output is 1 where the corresponding input character is a vowel. My TSV data looks like this:

molteyhpr   010011000
dlkz        0000
fabgovmgg   010010000
qgvowdykl   000100100
kgncpiot    00000110
pisvdf      010000

I've generated 100K samples like this.
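
For reference, a minimal sketch of how such samples could be generated (the actual generation script isn't shown; the vowel set is inferred from the samples above, where 'y' is also labelled 1):

    import random
    import string

    VOWELS = set("aeiouy")  # inferred from the samples above, where 'y' is labelled 1

    def make_sample(min_len=4, max_len=10):
        # Random lowercase string plus a 0/1 mask marking vowel positions.
        word = "".join(random.choices(string.ascii_lowercase,
                                      k=random.randint(min_len, max_len)))
        mask = "".join("1" if c in VOWELS else "0" for c in word)
        return word, mask

    with open("samples.tsv", "w") as f:
        for _ in range(100_000):
            word, mask = make_sample()
            f.write(f"{word}\t{mask}\n")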

My model code (a slightly modified version of the Keras seq2seq example):

        # Module-level imports assumed (not shown in the question):
        # from tensorflow.keras.layers import Input, LSTM, Dense
        # from tensorflow.keras.models import Model

        self.latent_dim = 256

        enc_input_layer = Input(name="enc_input", shape=(None, self.source.enc_vocab_len))
        enc_lstm_layer  = LSTM(self.latent_dim, name="enc_lstm", return_state=True)
        enc_outputs, state_h, state_c = enc_lstm_layer(enc_input_layer)

        # We discard 'enc_outputs' and only keep the states.
        enc_states = [state_h, state_c]

        # Set up the decoder, using 'enc_states' as initial state.
        dec_input_layer = Input(name="dec_input", shape=(None, self.source.dec_vocab_len))

        # We set up our decoder to return full output sequences,
        # and to return internal states as well. We don't use the
        # return states in the training model, but we will use them in inference.
        dec_lstm_layer = LSTM(self.latent_dim, name="dec_lstm", return_sequences=True, return_state=True)

        dec_outputs, _, _ = dec_lstm_layer(dec_input_layer, initial_state=enc_states)
        dec_dense_layer = Dense(self.source.dec_vocab_len, name="dec_dense", activation='softmax')
        dec_outputs = dec_dense_layer(dec_outputs)

        # Define the model that will turn
        # 'encoder_input_data' & 'decoder_input_data' into 'decoder_target_data'

        model = Model([enc_input_layer, dec_input_layer], dec_outputs)
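
Not shown above: the inference models, which follow the same Keras example and reuse the layers defined above (a rough sketch):

        enc_model = Model(enc_input_layer, enc_states)

        dec_state_input_h = Input(shape=(self.latent_dim,), name="dec_state_h")
        dec_state_input_c = Input(shape=(self.latent_dim,), name="dec_state_c")
        dec_states_inputs = [dec_state_input_h, dec_state_input_c]

        # Reuse the trained decoder LSTM and Dense layers, but feed the
        # states in explicitly so decoding can run one step at a time.
        dec_outputs, state_h, state_c = dec_lstm_layer(
            dec_input_layer, initial_state=dec_states_inputs)
        dec_model = Model([dec_input_layer] + dec_states_inputs,
                          [dec_dense_layer(dec_outputs), state_h, state_c])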

All data is turned into equal-length one-hot representations. This is how it is generated:

    def _generator(self, enc_data, dec_data, is_training):
        enc_oh_input_batch  = None
        dec_oh_input_batch  = None
        dec_oh_output_batch = None

        enc_space_token = self.enc_vocab[self.TOKEN_EMPTY]
        dec_space_token = self.dec_vocab[self.TOKEN_EMPTY]

        current_idx = 0
        samples_len = len(enc_data)

        while True:
            # Create zero batch arrays
            enc_oh_input_batch = np.zeros(
                (self.batch_size, self.enc_max_seq_len, self.enc_vocab_len), dtype='int8')
            dec_oh_input_batch = np.zeros(
                (self.batch_size, self.dec_max_seq_len, self.dec_vocab_len), dtype='int8')
            dec_oh_output_batch = np.zeros(
                (self.batch_size, self.dec_max_seq_len, self.dec_vocab_len), dtype='int8')

            # Compile batch
            for i in range(self.batch_size):
                # when we get to the end of samples - start over
                if i + current_idx >= samples_len:
                    current_idx = 0
                    if is_training:
                        self.epoch += 1

                tokens_in  = enc_data[i + current_idx]
                tokens_out = dec_data[i + current_idx]

                # vectorize encoder input
                for t, token in enumerate(tokens_in):
                    enc_oh_input_batch[i, t, token] = 1
                enc_oh_input_batch[i, t + 1:, enc_space_token] = 1

                # vectorize decoder input and output
                for t, token in enumerate(tokens_out):
                    dec_oh_input_batch[i, t, token] = 1
                    if t > 0:
                        # self.dec_oh_output will be ahead by one timestep
                        # and will not include the start character.
                        dec_oh_output_batch[i, t - 1, token] = 1
                # pad the remaining decoder-input timesteps once, after the loop
                # (inside the loop this would also set the pad token on
                # positions that later receive a real token)
                dec_oh_input_batch[i, t + 1:, dec_space_token] = 1

            current_idx += self.batch_size

            yield ([enc_oh_input_batch, dec_oh_input_batch], dec_oh_output_batch)
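
As a sanity check, one can pull a single batch from the generator and confirm the shapes and the one-hot property (a sketch; `source` is a stand-in for the data object wired up as above):

    gen = source._generator(enc_data, dec_data, is_training=False)
    inputs, targets = next(gen)
    enc_batch, dec_batch = inputs

    # Expected shapes:
    # enc_batch: (batch_size, enc_max_seq_len, enc_vocab_len)
    # dec_batch: (batch_size, dec_max_seq_len, dec_vocab_len)
    # targets:   (batch_size, dec_max_seq_len, dec_vocab_len)
    print(enc_batch.shape, dec_batch.shape, targets.shape)

    # Every encoder timestep should be exactly one-hot.
    assert (enc_batch.sum(axis=-1) == 1).all()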

I train it like so (with a generator, the effective batch size comes from the generator itself):

        h = self.model.fit(self.source.train_generator(),
            batch_size       = self.conf.batch_size,
            epochs           = self.conf.epochs,
            initial_epoch    = self.source.epoch,
            steps_per_epoch  = batches_per_epoch,
            validation_steps = batches_per_epoch,
            validation_data  = self.source.validation_generator(),
            validation_freq  = self.conf.validation_freq
        )

With these settings:

epochs           = 10
validation_freq  = 10
validation_split = 0.2
batch_size       = 30
loss             = 'categorical_crossentropy'
metrics          = ['accuracy']
optimizer = {
    'name'          : 'Adam',
    'learning_rate' : 0.0001,
}
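
The compile call isn't shown above; the settings map to roughly this:

    from tensorflow.keras.optimizers import Adam

    model.compile(
        optimizer=Adam(learning_rate=0.0001),
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )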

I tried playing around with the learning rate, batch size, and different optimizers, but no matter what I do, training gets stuck:

Training model ...
Epoch 1/10
33/33 [==============================] - 6s 90ms/step - loss: 0.1759 - accuracy: 0.4380
Epoch 2/10
33/33 [==============================] - 3s 91ms/step - loss: 0.1370 - accuracy: 0.4533
Epoch 3/10
33/33 [==============================] - 3s 90ms/step - loss: 0.1258 - accuracy: 0.4634
Epoch 4/10
33/33 [==============================] - 3s 93ms/step - loss: 0.1220 - accuracy: 0.4602
Epoch 5/10
33/33 [==============================] - 3s 95ms/step - loss: 0.1199 - accuracy: 0.4602
Epoch 6/10
33/33 [==============================] - 3s 92ms/step - loss: 0.1218 - accuracy: 0.4625
Epoch 7/10
33/33 [==============================] - 3s 94ms/step - loss: 0.1208 - accuracy: 0.4643
Epoch 8/10
33/33 [==============================] - 3s 93ms/step - loss: 0.1202 - accuracy: 0.4619
Epoch 9/10
33/33 [==============================] - 3s 95ms/step - loss: 0.1199 - accuracy: 0.4601
Epoch 10/10
33/33 [==============================] - 5s 149ms/step - loss: 0.1207 - accuracy: 0.4630 - val_loss: 0.1195 - val_accuracy: 0.4630

What am I doing wrong?


Solution

  • In the original character-by-character translation task, the decoder input and target data are shifted by one time step because the decoder needs to predict the next character based on the current and past characters.

    However, in your task, the goal is to map each character in the input directly to a character in the output. So, there's no need to shift the target data.

    I have changed the for-loop where decoder_input_data and decoder_target_data are pre-processed.

    Try using this:

    for t, char in enumerate(target_text):
        decoder_input_data[i, t, target_token_index[char]] = 1.
        decoder_target_data[i, t, target_token_index[char]] = 1.
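
    Translated back to your generator code, the equivalent change is to drop the one-timestep shift (a sketch using the same variable names as in your question; the target is padded with the empty token so every timestep stays one-hot):

        # vectorize decoder input and output (no shift: output position t
        # corresponds directly to input position t)
        for t, token in enumerate(tokens_out):
            dec_oh_input_batch[i, t, token] = 1
            dec_oh_output_batch[i, t, token] = 1
        dec_oh_input_batch[i, t + 1:, dec_space_token] = 1
        dec_oh_output_batch[i, t + 1:, dec_space_token] = 1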