I see loss jumps in the epoch graph for my LSTM model (screenshot attached). What does this tell me? Is it overfitting, the wrong loss function, the wrong activation, unnecessary layers, etc.?
# Model
from keras.layers import Input, LSTM, Dense, concatenate
from keras.models import Model

time_input = Input(shape=(timesteps, data_dim))          # sequence input
Feature_Data_input = Input(shape=(timesteps, data_dim))  # auxiliary feature input
LSTM_outs = LSTM(hidden_dim, unroll=True, return_sequences=True)(time_input)
MLP_inputs = concatenate([LSTM_outs, Feature_Data_input])
MLP_outs = Dense(MLP_hidden_dim, activation='relu')(MLP_inputs)
MLP_outs = Dense(MLP_hidden_dim, activation='relu')(MLP_outs)
outs = Dense(data_dim, activation='linear')(MLP_outs)

# optimization / compile / fit
model = Model(inputs=[time_input, Feature_Data_input], outputs=[outs])
model.compile(loss='mse', optimizer='adam', metrics=['mse', 'mae', 'mape', 'cosine'])
history = model.fit(x=[input_data, Feature_Data_train], y=truth, batch_size=1, epochs=1000, verbose=2)
Epoch 999/1000 - 0s - loss: 0.0132 - mean_squared_error: 0.0132 - mean_absolute_error: 0.0619 - mean_absolute_percentage_error: 45287253.3333 - cosine_proximity: -6.5984e-01
Epoch 1000/1000 - 0s - loss: 0.0132 - mean_squared_error: 0.0132 - mean_absolute_error: 0.0618 - mean_absolute_percentage_error: 45145968.0000 - cosine_proximity: -6.5985e-01
I would start by using a batch_size greater than 1. You want the optimizer to consider multiple data points at a time, not just a single sample. Assuming your samples differ from one another, averaging the gradient over a batch smooths the updates and should reduce the jumps you see in the loss curve; see the sketch below.
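For example, a minimal change to your fit call could look like this (a sketch that keeps the rest of the model as posted; batch_size=32 is only an illustrative starting value to tune, not a recommendation specific to your data):

# A minimal sketch, assuming the same input_data, Feature_Data_train and truth
# arrays from the question; batch_size=32 is an assumed illustrative value.
history = model.fit(
    x=[input_data, Feature_Data_train],
    y=truth,
    batch_size=32,   # average gradients over 32 samples per weight update
    epochs=1000,
    shuffle=True,    # Keras shuffles by default; kept explicit here
    verbose=2,
)

Larger batches give each update a less noisy gradient estimate, which usually makes the loss curve much smoother than training sample by sample.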