I am experimenting with an LSTM on variable-length input for this reason. I wanted to be sure that the loss is calculated correctly under masking, so I trained the model below, which uses a Masking layer on padded sequences.
from tensorflow.keras.layers import LSTM, Masking, Dense
from tensorflow.keras.utils import to_categorical
from tensorflow.keras import models, losses
import tensorflow as tf
import numpy as np
import os

def set_seed(seed):
    """Set seeds to get reproducible results."""
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)

def create_padded_seq(num_samples, timesteps, num_feats, mask_value):
    """Set some rightmost timesteps to the mask value, like padding."""
    feats = np.random.random([num_samples, timesteps, num_feats]).astype(np.float32)  # Generate samples
    for i in range(num_samples):
        # Pick the first padded timestep (at least 2 real timesteps are kept)
        rand_index = np.random.randint(low=2, high=timesteps, size=1)[0]
        feats[i, rand_index:, 0] = mask_value  # Apply padding
    return feats

set_seed(42)
num_samples = 100
timesteps = 6
num_feats = 1
num_classes = 3
num_lstm_cells = 1
mask_value = -100
num_epochs = 5
X_train = create_padded_seq(num_samples, timesteps, num_feats, mask_value)
y_train = np.random.randint(num_classes, size=num_samples)
cat_y_train = to_categorical(y_train, num_classes)
masked_model = models.Sequential(name='masked')
masked_model.add(Masking(mask_value=mask_value, input_shape=(timesteps, num_feats)))
masked_model.add(LSTM(num_lstm_cells, return_sequences=False))
masked_model.add(Dense(num_classes, activation='relu'))
masked_model.compile(loss=losses.categorical_crossentropy, optimizer='adam', metrics=["accuracy"])
print(masked_model.summary())
masked_model.fit(X_train, cat_y_train, batch_size=1, epochs=num_epochs, verbose=True)
This is the verbose output:
Model: "masked"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
masking (Masking)            (None, 6, 1)              0
_________________________________________________________________
lstm (LSTM)                  (None, 1)                 12
_________________________________________________________________
dense (Dense)                (None, 3)                 6
=================================================================
Total params: 18
Trainable params: 18
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/5
100/100 [==============================] - 0s 2ms/step - loss: 10.6379 - accuracy: 0.3400
Epoch 2/5
100/100 [==============================] - 0s 2ms/step - loss: 10.6379 - accuracy: 0.3400
Epoch 3/5
100/100 [==============================] - 0s 2ms/step - loss: 10.6379 - accuracy: 0.3400
Epoch 4/5
100/100 [==============================] - 0s 2ms/step - loss: 10.6379 - accuracy: 0.3400
Epoch 5/5
100/100 [==============================] - 0s 2ms/step - loss: 10.6379 - accuracy: 0.3400
I also removed the Masking layer and trained another model on the same data to see the effect of masking. This is the model:
unmasked_model = models.Sequential(name='unmasked')
unmasked_model.add(LSTM(num_lstm_cells, return_sequences=False, input_shape=(timesteps, num_feats)))
unmasked_model.add(Dense(num_classes, activation='relu'))
unmasked_model.compile(loss=losses.categorical_crossentropy, optimizer='adam', metrics=["accuracy"])
print(unmasked_model.summary())
unmasked_model.fit(X_train, cat_y_train, batch_size=1, epochs=num_epochs, verbose=True)
And this is the verbose output:
Model: "unmasked"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm (LSTM)                  (None, 1)                 12
_________________________________________________________________
dense (Dense)                (None, 3)                 6
=================================================================
Total params: 18
Trainable params: 18
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/5
100/100 [==============================] - 0s 1ms/step - loss: 10.6379 - accuracy: 0.3400
Epoch 2/5
100/100 [==============================] - 0s 2ms/step - loss: 10.6379 - accuracy: 0.3400
Epoch 3/5
100/100 [==============================] - 0s 1ms/step - loss: 10.6379 - accuracy: 0.3400
Epoch 4/5
100/100 [==============================] - 0s 1ms/step - loss: 10.6379 - accuracy: 0.3400
Epoch 5/5
100/100 [==============================] - 0s 1ms/step - loss: 10.6379 - accuracy: 0.3400
The losses are the same in both outputs. What is the reason for that? It seems like the Masking layer has no effect on the loss; is that correct? If not, how can I observe the effect of the Masking layer?
For a multi-class classification task, the problem seems to be the final activation function. If you replace relu with softmax, your network produces probabilities in the range [0, 1] that sum to 1, which is what categorical_crossentropy expects by default.
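
The constant loss of 10.6379 in both logs can be reproduced by hand, which may help explain why masking appears to make no difference. The sketch below is a pure-Python approximation, not the library code: it assumes that categorical cross-entropy on non-logit outputs rescales the predictions to sum to 1 and clips them to [1e-7, 1 - 1e-7] before taking the log. The function name `keras_style_cce` and the example vector `relu_output` are hypothetical, chosen only for illustration.

```python
import math

EPSILON = 1e-7  # assumption: the default Keras backend epsilon


def keras_style_cce(y_true, y_pred, eps=EPSILON):
    """Hand-written approximation of categorical cross-entropy on non-logit outputs."""
    total = sum(y_pred)
    probs = [p / total for p in y_pred]                    # rescale so the outputs sum to 1
    probs = [min(max(p, eps), 1.0 - eps) for p in probs]   # clip away exact 0 and 1
    return -sum(t * math.log(p) for t, p in zip(y_true, probs))


# Hypothetical ReLU output: one unit positive, the other two clamped to zero.
relu_output = [0.7, 0.0, 0.0]

loss_hit = keras_style_cce([1, 0, 0], relu_output)   # target is the positive unit
loss_miss = keras_style_cce([0, 1, 0], relu_output)  # target is a zeroed unit

# Weight the two cases by the accuracy reported in the logs (0.34).
accuracy = 0.34
mean_loss = accuracy * loss_hit + (1 - accuracy) * loss_miss
print(f"{loss_hit:.4f} {loss_miss:.4f} {mean_loss:.4f}")
```

Under these assumptions the mean comes out at about 10.6379, matching the logs. If that reading is right, the ReLU head always predicts the same class with a rescaled probability of 1, so the loss does not depend on what the LSTM computes, with or without the Masking layer, and the gradients are zero. With softmax the outputs depend on the LSTM state, so the two models should start to differ.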