I tried to apply the code from @Daniel Möller to my data. This is a time-series forecasting problem using an LSTM. https://github.com/danmoller/TestRepo/blob/master/TestBookLSTM.ipynb
import numpy as np, pandas as pd, matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM
from keras.callbacks import EarlyStopping
fi = 'pollution.csv'
raw = pd.read_csv(fi, delimiter=',')
raw = raw.drop('Dates', axis=1)  # keep only the numeric columns
print(raw.shape)

# scale everything to [-1, 1]
scaler = MinMaxScaler(feature_range=(-1, 1))
raw = scaler.fit_transform(raw)

n_rows = raw.shape[0]
n_feats = raw.shape[1]
time_shift = 7  # the model predicts 7 steps ahead
train_size = int(n_rows * 0.8)

train_data = raw[:train_size, :]
test_data = raw[train_size:, :]

# targets are the inputs shifted `time_shift` steps into the future
x_train = train_data[:-time_shift, :]
x_test = test_data[:-time_shift, :]
x_predict = raw[:-time_shift, :]
y_train = train_data[time_shift:, :]
y_test = test_data[time_shift:, :]
y_predict_true = raw[time_shift:, :]
# reshape to (batch=1, time steps, features): each array is one long sequence
x_train = x_train.reshape(1, x_train.shape[0], x_train.shape[1])
y_train = y_train.reshape(1, y_train.shape[0], y_train.shape[1])
x_test = x_test.reshape(1, x_test.shape[0], x_test.shape[1])
y_test = y_test.reshape(1, y_test.shape[0], y_test.shape[1])
x_predict = x_predict.reshape(1, x_predict.shape[0], x_predict.shape[1])
y_predict_true = y_predict_true.reshape(1, y_predict_true.shape[0], y_predict_true.shape[1])
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)
model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(None, n_feats)))
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(n_feats, return_sequences=True))

stop = EarlyStopping(monitor='loss', min_delta=1e-12, patience=30)
model.compile(loss='mse', optimizer='adam')
model.fit(x_train, y_train, epochs=10, callbacks=[stop], verbose=2,
          validation_data=(x_test, y_test))
# stateful copy of the model, used for step-by-step prediction
newModel = Sequential()
newModel.add(LSTM(64, return_sequences=True, stateful=True, batch_input_shape=(1, None, n_feats)))
newModel.add(LSTM(32, return_sequences=True, stateful=True))
newModel.add(LSTM(n_feats, return_sequences=False, stateful=True))
newModel.set_weights(model.get_weights())
newModel.reset_states()
# seed the buffer with the last `time_shift` known steps
lastSteps = np.empty((1, n_rows, n_feats))
lastSteps[:, :time_shift] = x_predict[:, -time_shift:]

# warm up the internal states on the known sequence (this output is discarded)
newModel.predict(x_predict)

rangeLen = n_rows - time_shift
for i in range(rangeLen):
    # each new step is predicted from a previously predicted step
    lastSteps[:, i + time_shift] = newModel.predict(lastSteps[:, i:i+1, :]).reshape(1, 1, n_feats)
forecastFromSelf = lastSteps[:, time_shift:, :]
print(forecastFromSelf.shape)
# back to the original scale (note: this overwrites y_predict_true with a 2D, unscaled array)
forecastFromSelf = scaler.inverse_transform(forecastFromSelf.reshape(forecastFromSelf.shape[1], forecastFromSelf.shape[2]))
y_predict_true = scaler.inverse_transform(y_predict_true.reshape(y_predict_true.shape[1], y_predict_true.shape[2]))
plt.plot(y_predict_true[:,0], color='b', label='True')
plt.plot(forecastFromSelf[:,0],color='r', label='Predict')
plt.legend()
plt.title("Self forcast (Feat 1)")
plt.show()
newModel.reset_states()
newModel.predict(x_predict)  # full-sequence pass before stepping through the inputs one by one

# here every step is predicted from a true input step
newSteps = []
for i in range(x_predict.shape[1]):
    newSteps.append(newModel.predict(x_predict[:, i:i+1, :]))
forecastFromInput = np.asarray(newSteps).reshape(1, x_predict.shape[1], n_feats)
print(forecastFromInput.shape)
forecastFromInput = scaler.inverse_transform(forecastFromInput.reshape(forecastFromInput.shape[1], forecastFromInput.shape[2]))
plt.plot(y_predict_true[:,0], color='b', label='True')
plt.plot(forecastFromInput[:,0], color='r', label='Predict')
plt.legend()
plt.title("Forecast from input (Feat 1)")
plt.show()
The predictions could be improved by adding more layers and training for more epochs. The question, however, is: why is "Self forecast" worse than "Forecast from input"?
The pollution data is here: https://github.com/sirjanrocky/some-neural-tests-for-study/blob/master/pollution.csv This code runs without error; you can try it yourself.
Suppose your data ends at step 1000, and you don't have any more data.
But you still want to predict up to step 1100. Since you have no input data for those steps, you will have to rely on predicted outputs.
In "self forecast", you predict those 100 steps from nothing but the model's own previous outputs.
But every prediction has an associated error (it's not perfect). When you predict from a prediction, you feed in an input that already carries error and get an output with even more error. That's unavoidable.
Predicting from predictions that were themselves predicted from predictions, and so on, accumulates a lot of error.
If your model can do this kind of prediction well, then you can surely say it has learned a lot about the sequence.
When you predict from known inputs, you're doing a much safer thing. You have true data to input, and true data is error-free.
So your predictions, although they certainly have errors, do not "accumulate" error. You're not predicting from data with error; you're predicting from true data.
In images:
Self forecast (errors accumulate):

true input (steps 1 to 1000) --> model --> predicted step 1001 (a little error)
predicted step 1001          --> model --> predicted step 1002 (more error)
predicted step 1002          --> model --> predicted step 1003 (even more error)

Forecast from input (errors don't accumulate):

true input (steps 1 to 1000) --> model --> predicted step 1001 (a little error)
true input (step 1001)       --> model --> predicted step 1002 (a little error)
true input (step 1002)       --> model --> predicted step 1003 (a little error)
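To see this numerically, here is a tiny toy sketch (a made-up linear system, not the pollution model): the "trained model" differs from the true dynamics only by a slightly wrong coefficient, yet that one-step error compounds only in self-forecast mode.

import numpy as np

# assumed toy dynamics: x[t+1] = 0.99 * x[t]; the "model" learned 0.98 instead
a_true, a_model = 0.99, 0.98
true = 100.0 * a_true ** np.arange(101)  # 101 true steps

# forecast from input: every prediction starts from a true value
from_input = a_model * true[:-1]

# self forecast: every prediction starts from the previous prediction
self_fc = []
x = 100.0
for _ in range(100):
    x = a_model * x  # feed the previous prediction back in
    self_fc.append(x)
self_fc = np.array(self_fc)

print(np.abs(from_input - true[1:])[[0, -1]])  # stays at or below the one-step error
print(np.abs(self_fc - true[1:])[[0, -1]])     # grows far beyond the one-step error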
If you want to predict test data to compare:
(The numbers in the comments are just for my own orientation; they assume training data of length 1000 and test data of length 200, 1200 elements in total.)
newModel.reset_states()  # clear any leftover states

lastSteps = np.empty((1, n_rows - train_size, n_feats))  # test size = 200
lastSteps[:, :time_shift] = y_train[:, -time_shift:]     # elements 0 to 6 = steps 993 to 999
newModel.predict(x_train)                                # warm up the states; the last output predicts step 999

rangeLen = n_rows - train_size - time_shift
for i in range(rangeLen):
    # element 7 (step 1000) is predicted from element 0 (step 993), and so on
    lastSteps[:, i + time_shift] = newModel.predict(lastSteps[:, i:i+1, :]).reshape(1, 1, n_feats)
forecastFromSelf = lastSteps[:, time_shift:, :]          # steps 1000 onward
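You can then compare this against the held-out targets. A quick sketch, reusing scaler and y_test from the question's code (with the lengths above, both arrays cover the same 193 steps):

from sklearn.metrics import mean_squared_error

pred = scaler.inverse_transform(forecastFromSelf.reshape(forecastFromSelf.shape[1], forecastFromSelf.shape[2]))
true = scaler.inverse_transform(y_test.reshape(y_test.shape[1], y_test.shape[2]))

print('test RMSE:', np.sqrt(mean_squared_error(true, pred)))
plt.plot(true[:, 0], color='b', label='True')
plt.plot(pred[:, 0], color='r', label='Predict')
plt.legend()
plt.title("Self forecast on test data (Feat 1)")
plt.show()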
If you want to predict unknown data after the end:
You should first train on the entire data (train with x_predict and y_predict_true).
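That retraining step could look like this sketch, assuming x_predict and y_predict_true are still the scaled 3D arrays built earlier (the plotting section above overwrote y_predict_true with its inverse-transformed 2D version, so rebuild it first if needed):

model.fit(x_predict, y_predict_true, epochs=10, callbacks=[stop], verbose=2)
newModel.set_weights(model.get_weights())
newModel.reset_states()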
new_predictions = 100  # for example: forecast 100 steps past the end
lastSteps = np.empty((1, new_predictions + time_shift, n_feats))
lastSteps[:, :time_shift] = y_predict_true[:, -time_shift:]  # elements 0 to 6 = steps 1193 to 1199
newModel.predict(x_predict)                                  # warm up the states; the last output predicts step 1199

rangeLen = new_predictions
for i in range(rangeLen):
    # element 7 (step 1200) is predicted from element 0 (step 1193), and so on
    lastSteps[:, i + time_shift] = newModel.predict(lastSteps[:, i:i+1, :]).reshape(1, 1, n_feats)
forecastFromSelf = lastSteps[:, time_shift:, :]              # steps 1200 onward
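Finally, a quick way to visualize the extrapolation against the known history (a sketch using the same inverse-transform pattern as above; raw is still the scaled full data at this point):

future = scaler.inverse_transform(forecastFromSelf.reshape(forecastFromSelf.shape[1], forecastFromSelf.shape[2]))
history = scaler.inverse_transform(raw)

plt.plot(np.arange(n_rows), history[:, 0], color='b', label='Known data')
plt.plot(np.arange(n_rows, n_rows + new_predictions), future[:, 0], color='r', label='Forecast')
plt.legend()
plt.title("Forecast beyond the end (Feat 1)")
plt.show()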