Given a sequence of 10 days of sensor events, plus a true/false label specifying whether the sensor triggered an alert within that 10-day period:
| sensor_id | timestamp | feature_1 | feature_2 | 10_days_alert_label |
|---|---|---|---|---|
| 1 | 2020-12-20 01:00:34.565 | 0.23 | 0.1 | 1 |
| 1 | 2020-12-20 01:03:13.897 | 0.3 | 0.12 | 1 |
| 2 | 2020-12-20 01:00:34.565 | 0.13 | 0.4 | 0 |
| 2 | 2020-12-20 01:03:13.897 | 0.2 | 0.9 | 0 |
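For context, a minimal sketch of how rows like these could be windowed into the [samples, timesteps, features] shape an LSTM expects (the column names come from the table above; the fixed window length and the helper name `build_windows` are assumptions for illustration):

import numpy as np
import pandas as pd

def build_windows(df, timesteps=10):
    # assumed: df holds the columns shown in the table above
    # (sensor_id, timestamp, feature_1, feature_2, 10_days_alert_label)
    X, y = [], []
    for sensor_id, events in df.sort_values('timestamp').groupby('sensor_id'):
        feats = events[['feature_1', 'feature_2']].to_numpy()
        if len(feats) >= timesteps:          # keep sensors with a full window
            X.append(feats[:timesteps])
            y.append(events['10_days_alert_label'].iloc[0])
    return np.stack(X), np.array(y)

In practice sensors emit different numbers of events, so real data would likely need resampling or padding rather than simple truncation.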
95% of the sensors never trigger an alert, so the data is imbalanced. I was thinking of an autoencoder model to detect the anomalies (sensors that triggered an alarm). Since I'm not interested in decoding the entire sequence, just the LSTM's learned context vector, I was thinking of an architecture like the one sketched below, where the decoder reconstructs the encoder output:
I've googled around and found this simple LSTM autoencoder example:
# lstm autoencoder recreate sequence
from numpy import array
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import RepeatVector
from tensorflow.keras.layers import TimeDistributed
from tensorflow.keras.utils import plot_model
# define input sequence
sequence = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
# reshape input into [samples, timesteps, features]
n_in = len(sequence)
sequence = sequence.reshape((1, n_in, 1))
# define model
model = Sequential()
model.add(LSTM(100, activation='relu', input_shape=(n_in,1)))
model.add(RepeatVector(n_in))
model.add(LSTM(100, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(sequence, sequence, epochs=300, verbose=0)
plot_model(model, show_shapes=True, to_file='reconstruct_lstm_autoencoder.png')
# demonstrate recreation
yhat = model.predict(sequence, verbose=0)
print(yhat[0,:,0])
I would like to modify the above example so the first LSTM output is used as the decoder target. Something like:
# lstm autoencoder: reconstruct the encoder's context vector
from numpy import array
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import RepeatVector
from tensorflow.keras.layers import TimeDistributed
from tensorflow.keras.utils import plot_model
# define input sequence
sequence = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
# reshape input into [samples, timesteps, features]
n_in = len(sequence)
sequence = sequence.reshape((1, n_in, 1))
# define model
model = Sequential()
model.add(LSTM(100, activation='relu', input_shape=(n_in,1)))
model.add(Dense(100, activation='relu')) # First LSTM output
model.add(Dense(32, activation='relu')) # Bottleneck
model.add(Dense(100, activation='sigmoid')) # Decoded vector
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(sequence, FIRST_LSTM_OUTPUT, epochs=300, verbose=0) # <--- ???
Q: Can I use the first LSTM output vector as a target?
You can do it using `model.add_loss`. In `add_loss` we specify the loss of interest (in our case: `mse`) and the tensors used to compute it (in our case: the LSTM encoder output and the model predictions).
Below is a dummy example:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

n_sample, timesteps = 100, 9
X = np.random.uniform(0, 1, (n_sample, timesteps, 1))

# reconstruction loss between the encoder output and the model prediction
def mse(enc_output, pred):
    return tf.reduce_mean(tf.square(enc_output - pred))

inp = Input((timesteps, 1))
enc = LSTM(100, activation='relu')(inp)    # encoder: learned context vector
x = Dense(100, activation='relu')(enc)
x = Dense(32, activation='relu')(x)        # bottleneck
out = Dense(100, activation='sigmoid')(x)  # reconstruction of the context vector

model = Model(inp, out)
# the target is an internal tensor (the encoder output), so the loss is
# attached with add_loss instead of being passed to compile
model.add_loss(mse(enc, out))
model.compile(optimizer='adam', loss=None)

model.fit(X, y=None, epochs=3)
The example above is self-contained and runs end to end.
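Once trained, here is a hedged sketch of how this model could be used to flag anomalous sensors (the scoring approach and the 95th-percentile threshold are assumptions; the cutoff simply mirrors the ~5% alert rate mentioned in the question):

# score each sequence by how poorly the decoder reconstructs
# the encoder's context vector; higher error = more anomalous
encoder = Model(inp, enc)            # reuses the layers defined above
enc_vectors = encoder.predict(X)     # (n_sample, 100) context vectors
recon = model.predict(X)             # (n_sample, 100) reconstructions
errors = np.mean(np.square(enc_vectors - recon), axis=1)

# assumed threshold: flag the top 5% of errors (~5% of sensors alert)
threshold = np.quantile(errors, 0.95)
anomalies = errors > threshold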