python, machine-learning, keras, lstm

What is the correct input shape of multivariate time series for LSTM in keras?


I can't work out the right way to feed a multivariate time series to an LSTM.

Let's say I have a dataset with 3 features that vary over time, like this:

feat1 feat2 feat3
1 2 3
4 5 6
7 8 9

Should I present this to my LSTM as it is, stacking the rows with numpy.vstack()? Like this:

[[1,2,3],  
[4,5,6],  
[7,8,9]]

Or should I stack it by columns with numpy.column_stack(), so that each row is the sequence of one feature? Like this:

[[1,4,7],  
[2,5,8],  
[3,6,9]]

Solution

  • From the keras LSTM API:

    inputs: A 3D tensor with shape [batch, timesteps, feature].

    Therefore, the features (your variables) go on the last dimension and each row is one timestep, which means your first suggestion is the right one.

    Note: when you call fit on a whole dataset, the array should already have shape (samples, timesteps, features), and Keras splits it into batches for you. If you are passing a single example of shape (timesteps, features), for instance at inference, add the batch dimension first by applying numpy.expand_dims on axis 0.
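
The shapes described above can be checked with NumPy alone. This is a minimal sketch, not a full model: the names series, single_example, window and dataset are made up for illustration, and cutting the series into overlapping windows of 2 timesteps is just one assumed way of building several samples for fit.

```python
import numpy as np

# The data from the question: one row per timestep, one column per feature.
series = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]], dtype="float32")
print(series.shape)  # (timesteps, features)

# A single example still needs a leading batch axis before it goes to the LSTM.
single_example = np.expand_dims(series, axis=0)
print(single_example.shape)  # (batch, timesteps, features)

# Assumption for illustration: build a small dataset for fit() by cutting the
# series into overlapping windows of `window` timesteps each.
window = 2
dataset = np.stack([series[i:i + window]
                    for i in range(len(series) - window + 1)])
print(dataset.shape)  # (samples, timesteps, features)
```

With the column-stacked layout from the second suggestion, the last axis would hold timesteps instead of features, so the LSTM would treat each feature's history as a single timestep.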