I am testing my LSTM encoder-decoder architecture with a simple task: recognising vowels in random character sequences. My TSV data looks like this:
molteyhpr 010011000
dlkz 0000
fabgovmgg 010010000
qgvowdykl 000100100
kgncpiot 00000110
pisvdf 010000
I've generated 100K samples like this.
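Roughly, the samples are generated like this (a simplified sketch, not my exact script; the names are illustrative):

import random
import string

VOWELS = set('aeiou')

def make_sample(min_len=4, max_len=10):
    # Random lowercase string paired with its 0/1 vowel mask.
    word = ''.join(random.choices(string.ascii_lowercase,
                                  k=random.randint(min_len, max_len)))
    mask = ''.join('1' if ch in VOWELS else '0' for ch in word)
    return word + '\t' + mask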
My model code (a slightly modified version of the Keras seq2seq example):
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

self.latent_dim = 256
enc_input_layer = Input(name="enc_input", shape=(None, self.source.enc_vocab_len))
enc_lstm_layer = LSTM(self.latent_dim, name="enc_lstm", return_state=True)
enc_outputs, state_h, state_c = enc_lstm_layer(enc_input_layer)
# We discard 'enc_outputs' and only keep the states.
enc_states = [state_h, state_c]
# Set up the decoder, using 'enc_states' as initial state.
dec_input_layer = Input(name="dec_input", shape=(None, self.source.dec_vocab_len))
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
dec_lstm_layer = LSTM(self.latent_dim, name="dec_lstm", return_sequences=True, return_state=True)
dec_outputs, _, _ = dec_lstm_layer(dec_input_layer, initial_state=enc_states)
dec_dense_layer = Dense(self.source.dec_vocab_len, name="dec_dense", activation='softmax')
dec_outputs = dec_dense_layer(dec_outputs)
# Define the model that will turn
# 'encoder_input_data' & 'decoder_input_data' into 'decoder_target_data'
model = Model([enc_input_layer, dec_input_layer], dec_outputs)
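For completeness, the inference models mentioned in the comments would be built by reusing the same layers, following the same Keras example (a sketch, not shown in my actual code):

enc_model = Model(enc_input_layer, enc_states)

dec_state_input_h = Input(shape=(self.latent_dim,), name="dec_state_h")
dec_state_input_c = Input(shape=(self.latent_dim,), name="dec_state_c")
dec_states_inputs = [dec_state_input_h, dec_state_input_c]
dec_inf_outputs, dec_h, dec_c = dec_lstm_layer(dec_input_layer,
                                               initial_state=dec_states_inputs)
dec_model = Model([dec_input_layer] + dec_states_inputs,
                  [dec_dense_layer(dec_inf_outputs), dec_h, dec_c])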
All data is turned into equal-length one-hot representations. This is how it is generated:
def _generator(self, enc_data, dec_data, is_training):
    enc_oh_input_batch = None
    dec_oh_input_batch = None
    dec_oh_output_batch = None
    enc_space_token = self.enc_vocab[self.TOKEN_EMPTY]
    dec_space_token = self.dec_vocab[self.TOKEN_EMPTY]
    current_idx = 0
    samples_len = len(enc_data)
    while True:
        # Create zero batch arrays
        enc_oh_input_batch = np.zeros(
            (self.batch_size, self.enc_max_seq_len, self.enc_vocab_len), dtype='int8')
        dec_oh_input_batch = np.zeros(
            (self.batch_size, self.dec_max_seq_len, self.dec_vocab_len), dtype='int8')
        dec_oh_output_batch = np.zeros(
            (self.batch_size, self.dec_max_seq_len, self.dec_vocab_len), dtype='int8')
        # Compile batch
        for i in range(self.batch_size):
            # When we get to the end of the samples, start over
            if i + current_idx >= samples_len:
                current_idx = 0
                if is_training:
                    self.epoch += 1
            tokens_in = enc_data[i + current_idx]
            tokens_out = dec_data[i + current_idx]
            # Vectorize encoder input; pad the tail with the empty token
            for t, token in enumerate(tokens_in):
                enc_oh_input_batch[i, t, token] = 1
            enc_oh_input_batch[i, t + 1:, enc_space_token] = 1
            # Vectorize decoder input and output
            for t, token in enumerate(tokens_out):
                dec_oh_input_batch[i, t, token] = 1
                if t > 0:
                    # dec_oh_output_batch will be ahead by one timestep
                    # and will not include the start character.
                    dec_oh_output_batch[i, t - 1, token] = 1
            dec_oh_input_batch[i, t + 1:, dec_space_token] = 1
        current_idx += self.batch_size
        yield [enc_oh_input_batch, dec_oh_input_batch], dec_oh_output_batch
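To sanity-check the batches, one can pull a single batch from the generator and confirm the shapes line up with the model inputs (a quick sketch):

(enc_batch, dec_batch), target_batch = next(self.source.train_generator())
print(enc_batch.shape)     # (batch_size, enc_max_seq_len, enc_vocab_len)
print(dec_batch.shape)     # (batch_size, dec_max_seq_len, dec_vocab_len)
print(target_batch.shape)  # (batch_size, dec_max_seq_len, dec_vocab_len)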
I train it like so:
h = self.model.fit(self.source.train_generator(),
                   batch_size=self.conf.batch_size,
                   epochs=self.conf.epochs,
                   initial_epoch=self.source.epoch,
                   steps_per_epoch=batches_per_epoch,
                   validation_steps=batches_per_epoch,
                   validation_data=self.source.validation_generator(),
                   validation_freq=self.conf.validation_freq)
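The returned History object keeps the per-epoch metrics for later inspection, e.g.:

print(h.history['loss'])
print(h.history.get('val_accuracy'))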
With these settings:
epochs = 10
validation_freq = 10
validation_split = 0.2
batch_size = 30
loss = 'categorical_crossentropy'
metrics = ['accuracy']
optimizer = {
    'name': 'Adam',
    'learning_rate': 0.0001,
}
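These settings end up in a compile call roughly like this (paraphrased, since the config plumbing isn't shown here):

from tensorflow.keras.optimizers import Adam

self.model.compile(optimizer=Adam(learning_rate=0.0001),
                   loss='categorical_crossentropy',
                   metrics=['accuracy'])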
I tried playing around with the learning rate, batch size, and different optimizers, but no matter what, training gets stuck:
Training model ...
Epoch 1/10
33/33 [==============================] - 6s 90ms/step - loss: 0.1759 - accuracy: 0.4380
Epoch 2/10
33/33 [==============================] - 3s 91ms/step - loss: 0.1370 - accuracy: 0.4533
Epoch 3/10
33/33 [==============================] - 3s 90ms/step - loss: 0.1258 - accuracy: 0.4634
Epoch 4/10
33/33 [==============================] - 3s 93ms/step - loss: 0.1220 - accuracy: 0.4602
Epoch 5/10
33/33 [==============================] - 3s 95ms/step - loss: 0.1199 - accuracy: 0.4602
Epoch 6/10
33/33 [==============================] - 3s 92ms/step - loss: 0.1218 - accuracy: 0.4625
Epoch 7/10
33/33 [==============================] - 3s 94ms/step - loss: 0.1208 - accuracy: 0.4643
Epoch 8/10
33/33 [==============================] - 3s 93ms/step - loss: 0.1202 - accuracy: 0.4619
Epoch 9/10
33/33 [==============================] - 3s 95ms/step - loss: 0.1199 - accuracy: 0.4601
Epoch 10/10
33/33 [==============================] - 5s 149ms/step - loss: 0.1207 - accuracy: 0.4630 - val_loss: 0.1195 - val_accuracy: 0.4630
What am I doing wrong?
In the original character-by-character translation task, the decoder input and target data are shifted by one timestep, because the decoder has to predict the next character from the characters seen so far.
In your task, however, the goal is to map each input character directly to an output character, so there is no need to shift the target data.
I have changed the for-loop where decoder_input_data and decoder_target_data are pre-processed.
Try using this:
for t, char in enumerate(target_text):
    decoder_input_data[i, t, target_token_index[char]] = 1.
    decoder_target_data[i, t, target_token_index[char]] = 1.
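Applied to your _generator, the equivalent change is to drop the one-step shift, so each target bit lands at the same timestep as its input character (a sketch against your batch code above):

# No shift: the target at timestep t corresponds to the same
# character position as the decoder input at timestep t.
for t, token in enumerate(tokens_out):
    dec_oh_input_batch[i, t, token] = 1
    dec_oh_output_batch[i, t, token] = 1
dec_oh_input_batch[i, t + 1:, dec_space_token] = 1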