python keras time-series lstm tensorflow2.0

Keras LSTM model overfitting

I am using an LSTM model in Keras. During the fitting stage, I added the validation_data parameter. When I plot my training vs. validation loss, there appear to be major overfitting issues: my validation loss just won't decrease.

My full data is a sequence with shape [50,]. The first 20 records are used for training and the remaining records for testing.

I have tried adding dropout and reducing the model complexity as much as I can, but still no luck.

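from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

# transform data to be stationary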
raw_values = series.values
diff_values = difference_series(raw_values, 1)

# transform data to be supervised learning
# using a sliding window
supervised = timeseries_to_supervised(diff_values, 1)
supervised_values = supervised.values

# split data into train and test-sets
train, test = supervised_values[:20], supervised_values[20:]

# transform the scale of the data
# scale() fits MinMaxScaler(feature_range=(-1, 1)) on the training set only,
# then applies it to both train and test
scaler, train_scaled, test_scaled = scale(train, test)

batch_size = 1
nb_epoch = 1000
neurons = 1
X, y = train_scaled[:, 0:-1], train_scaled[:, -1]
X = X.reshape(X.shape[0], 1, X.shape[1])
testX, testY = test_scaled[:, 0:-1].reshape(-1,1,1), test_scaled[:, -1]
model = Sequential()
model.add(LSTM(units=neurons, batch_input_shape=(batch_size, X.shape[1], X.shape[2]),
              stateful=True))
model.add(Dropout(0.1))
model.add(Dense(1, activation="linear"))
model.compile(loss='mean_squared_error', optimizer='adam')
history = model.fit(X, y, epochs=nb_epoch, batch_size=batch_size, verbose=0, shuffle=False,
                    validation_data=(testX, testY))
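
The helpers difference_series, timeseries_to_supervised and scale are not shown above. A minimal sketch of what they might look like follows. It rests on several assumptions: NumPy arrays instead of the pandas objects the snippet uses, a sliding window that drops incomplete rows, and a hand-written min-max transform standing in for MinMaxScaler. The toy series is made up purely for illustration.

```python
import numpy as np


def difference_series(values, interval=1):
    """Return values[t] - values[t - interval] for every valid t."""
    values = np.asarray(values, dtype=float)
    return values[interval:] - values[:-interval]


def timeseries_to_supervised(values, lag=1):
    """Build rows [x(t-lag), ..., x(t-1), x(t)] with a sliding window."""
    values = np.asarray(values, dtype=float)
    rows = [values[i:i + lag + 1] for i in range(len(values) - lag)]
    return np.array(rows)


def scale(train, test, feature_range=(-1.0, 1.0)):
    """Min-max scale both sets using statistics from the training set only."""
    lo, hi = feature_range
    col_min = train.min(axis=0)
    col_max = train.max(axis=0)
    # Guard against a zero range in a constant column.
    span = np.where(col_max > col_min, col_max - col_min, 1.0)

    def transform(a):
        return (a - col_min) / span * (hi - lo) + lo

    return (col_min, col_max), transform(train), transform(test)


# Hypothetical 50-point series, used only to exercise the helpers.
series = np.arange(50, dtype=float) ** 1.5
diff_values = difference_series(series, 1)
supervised_values = timeseries_to_supervised(diff_values, 1)
train, test = supervised_values[:20], supervised_values[20:]
_, train_scaled, test_scaled = scale(train, test)
print(train_scaled.shape, test_scaled.shape)
print(train_scaled.min(), train_scaled.max(), test_scaled.max())
```

Because the scaling statistics come from the training rows only, the test values of this toy series land outside [-1, 1].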

This is what it looks like when changing the number of neurons. I even tried using Keras Tuner (Hyperband) to find the optimal parameters.

def fit_model(hp):
    batch_size = 1
    model = Sequential()
    # Hyperparameter names must be unique; reusing "units" for both layers
    # would tie them to one shared value.
    model.add(LSTM(units=hp.Int("lstm_units", min_value=1, max_value=20, step=1),
                   batch_input_shape=(batch_size, X.shape[1], X.shape[2]),
                   stateful=True))
    model.add(Dense(units=hp.Int("dense_units", min_value=1, max_value=10),
                    activation="linear"))
    model.compile(loss='mse', metrics=["mse"],
                  optimizer=keras.optimizers.Adam(
                      hp.Choice("learning_rate", values=[1e-2, 1e-3, 1e-4])))
    return model

X, y = train_scaled[:, 0:-1], train_scaled[:, -1]
X = X.reshape(X.shape[0], 1, X.shape[1])

tuner = kt.Hyperband(
    fit_model,
    objective='mse',
    max_epochs=100,
    hyperband_iterations=2,
    overwrite=True)

tuner.search(X, y, epochs=100, validation_split=0.2)

When evaluating the model against X_test and y_test, I get the same loss and accuracy score. But when fitting the "best model", I get this:

However, my predictions look very reasonable against my true values. What should I do to get a better fit?

Solution

Twenty training records is too few. There won't be enough variation in the training data for the model to approximate the underlying function accurately. Your validation data, which is likely much smaller than 20 records, will therefore probably contain an example wildly different from anything in those 20 training records (i.e. the model never saw an example of that nature during training), resulting in a much higher loss.
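
To put a number on how small the held-out set gets: Keras's validation_split holds out the last fraction of the samples it is given, before any shuffling. Below is a rough sketch of that arithmetic. validation_split_counts is a hypothetical helper written for illustration, not a Keras function, and it only approximates how Keras computes the split.

```python
import math


def validation_split_counts(n_samples, validation_split):
    """Return (n_train, n_val) when the last fraction of samples is held out."""
    split_at = int(math.floor(n_samples * (1.0 - validation_split)))
    return split_at, n_samples - split_at


# The tuner call in the question: 20 training rows, validation_split=0.2.
n_train, n_val = validation_split_counts(20, 0.2)
print(n_train, n_val)
```

Under these assumptions the tuner trains on 16 rows and validates on 4, so a single unusual row can dominate the validation loss.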