
python forecasting building LSTM


I came across these two pages - page 1 and page 2 - which use LSTMs for forecasting. What confused me is how (or whether) they use past Y variable values to predict future Y variable values - for example, Y at times 1, 2 and 3 to predict Y at times 4, 5 and 6.

Currently these models seem to use consecutive data points of the x variables to predict the Y variable in the future - for example, x variables from times 1, 2 and 3 to predict the y variable at times 4, 5 and 6. Would it be okay to use the Y variable along with the x variables? For example, Y from times 1, 2 and 3, together with the x variables from that same period, to predict y at times 4, 5 and 6. I could do this just by adding Y as a new x variable in the data; the rest of the process (the function custom_ts_multi_data_prep, which prepares the data for modelling) would remain exactly the same.

Please suggest a better link that employs similar LSTMs, if there is one, and clarify the questions in paragraphs 1 and 2.


Solution

  • It is completely sensible to use y[t-1], or y[t-n] for some n > 0, to predict y[t]. You shouldn't, though, use y[t] itself to predict y[t], as you generally don't know ahead of time the value you are trying to predict.

    In fact, in the example you gave (page 2), the variable traffic_volume that is being predicted also exists in the input sequence, so if I understand you correctly, that example is exactly what you are looking for. The function custom_ts_multi_data_prep() adds, for each time step, the data from the previous time steps into X and the following time steps into y.(*) That data is also implicitly encoded in the activations of the LSTM itself - an LSTM is a type of recurrent network that encodes the data it has seen so far as input for the next step of the prediction process. Still, it can be very sensible to incorporate true data from previous time steps into the prediction, for a few reasons:

    • The model's state that is passed on to the next prediction step is only a partial view of the true state, and knowing the actual progression in the "real world" may be critical for predicting the next step.
    • Similar to the rationale behind residual skip connections in CNNs, adding the "raw" value of the previous time step may help the model by letting it focus on only the residual problem - how to get from y[t-1] to y[t] using x[t] (or x[t-1], depending on your specific problem) - rather than performing the full jump from x[t] to y[t] with no true data from previous time steps.

    Having said that, adding almost any feature from the system will likely make your model "better" but also more prone to overfitting, so take this into consideration and choose your features wisely.
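
    In practice, feeding past y values into the model just means appending y as one more input column before the windowing step, exactly as you describe. A minimal sketch (the array names here are my own, not taken from either page):

    ```python
    import numpy as np

    # Hypothetical multivariate series: 3 x-features plus the target y.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(10, 3))        # shape (timesteps, n_x_features)
    y = rng.normal(size=(10,))          # shape (timesteps,)

    # Treat past y values as just another input feature: append y as a column.
    # A windowing function like custom_ts_multi_data_prep then works unchanged,
    # since it only slices rows of this combined feature matrix.
    features = np.column_stack([x, y])  # shape (timesteps, n_x_features + 1)

    print(features.shape)               # (10, 4)
    ```

    Because the windows only ever contain rows strictly before the prediction target, the model sees y[t-1], y[t-2], ... but never y[t] itself, which keeps the setup honest.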


    (*) Small remark: note that in this specific example they leave a gap of one time step that appears in neither X nor y, and I am not sure whether it is intentional or a mistake (X contains data from i-window to i-1 while y contains data from i+1 to i+horizon, so i itself is included in neither - this might be a misunderstanding by the author of how range() works, or I might be missing something).
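
    The indexing described in the footnote can be reproduced with a small sketch (the function and variable names here are assumed, not copied from the original post):

    ```python
    import numpy as np

    def make_windows(data, target, window, horizon):
        # Sketch of a custom_ts_multi_data_prep-style split, reproducing the
        # indexing described above: X takes data[i-window : i] (steps
        # i-window .. i-1) and y takes target[i+1 : i+1+horizon] (steps
        # i+1 .. i+horizon), so step i itself lands in neither slice.
        X, Y = [], []
        for i in range(window, len(data) - horizon):
            X.append(data[i - window:i])
            Y.append(target[i + 1:i + 1 + horizon])
        return np.array(X), np.array(Y)

    series = np.arange(10, dtype=float).reshape(-1, 1)
    target = np.arange(10, dtype=float)
    X, Y = make_windows(series, target, window=3, horizon=2)
    print(X[0].ravel())  # [0. 1. 2.]  -> input steps 0..2
    print(Y[0])          # [4. 5.]    -> target steps 4..5; step 3 is skipped
    ```

    For the first sample, the inputs cover steps 0-2 and the targets cover steps 4-5, so step 3 is the one-step gap in question; slicing the target as target[i : i + horizon] instead would close it.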