I am currently working with an LSTM sequence-to-sequence model for time-domain signal prediction. From domain knowledge I know that the first part of the prediction (about 20%) can never be predicted correctly, since the information required is not available in the given input sequence. The remaining 80% of the predicted sequence is usually predicted quite well. To exclude the first 20% from the training optimization, I would like to define a loss function that operates on a given index range, like the numpy code below:
import numpy as np

start = int(0.2 * sequence_length)
stop = sequence_length

def mse(pred, target):
    """Mean squared error between two time series np.arrays."""
    return 1 / target.shape[0] * np.sum((pred - target) ** 2)

def range_mse_loss(y_pred, y):
    return mse(y_pred[start:stop], y[start:stop])
How do I have to write this loss function so that it works with my preexisting Keras code, where the loss is simply given by model.compile(loss='mse')?
You can slice your tensors so that only the last 80% of the data is used:
size = int(y_true.shape[0] * 0.8)  # e.g. size = 80 for a (100, 1) tensor
loss_fn = tf.keras.losses.MeanSquaredError(name='mse')
loss_fn(y_true[-size:], y_pred[-size:])  # keep only the last 80% of the sequence
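If you want this to plug straight into model.compile() as asked in the question, one option is to wrap the slice in a loss callable. This is only a sketch, assuming y_true and y_pred arrive as (batch, timesteps, features) tensors and that the cut-off applies to the time axis:

import tensorflow as tf

def range_mse(y_true, y_pred):
    # Drop the first 20% of the time steps before computing the mean squared error.
    start = tf.shape(y_true)[1] // 5
    return tf.reduce_mean(tf.square(y_true[:, start:, :] - y_pred[:, start:, :]))

# model.compile(optimizer='adam', loss=range_mse)

This keeps the rest of your existing compile/fit code unchanged.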
You can also use the sample_weight argument of tf.keras.losses.MeanSquaredError(), passing an array of weights where the first 20% of the weights are zero:
size = int(y_true.shape[0] * 0.8)  # e.g. size = 80 for a (100, 1) tensor
zeros = tf.zeros((y_true.shape[0] - size), dtype=tf.float32)  # float weights, so they can be multiplied with the float losses
ones = tf.ones((size), dtype=tf.float32)
weights = tf.concat([zeros, ones], 0)
loss_fn = tf.keras.losses.MeanSquaredError(name='mse')
loss_fn(y_true, y_pred, sample_weight=weights)
One warning about the second solution: its final loss will be lower than that of the first solution, because you are zeroing out the first prediction values but not removing them from the denominator of MSE = 1/n * sum((y - y_hat)^2).
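If you want the two approaches to agree numerically, a possible fix (just a sketch, assuming the default SUM_OVER_BATCH_SIZE reduction that produces the 1/n behaviour above) is to rescale the weighted loss by n / sum(weights):

weighted = loss_fn(y_true, y_pred, sample_weight=weights)  # sums errors over the last 80%, divides by n
n = tf.cast(tf.shape(y_true)[0], tf.float32)
rescaled = weighted * n / tf.reduce_sum(weights)  # effectively divides by 0.8 * n instead
# rescaled now matches loss_fn(y_true[-size:], y_pred[-size:]) from the first solution,
# up to floating point error.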