I can't figure out the right way to feed a multivariate time series to an LSTM.
Let's say I have a dataset with 3 features that vary over time, like this:
feat1 | feat2 | feat3 |
---|---|---|
1 | 2 | 3 |
4 | 5 | 6 |
7 | 8 | 9 |
Should I present this to my LSTM as it is, using numpy.vstack()? Like this:
[[1,2,3],
[4,5,6],
[7,8,9]]
Or should I stack it by columns, so that each row is the sequence of one feature, using numpy.column_stack()? Like this:
[[1,4,7],
[2,5,8],
[3,6,9]]
From the Keras LSTM API:
inputs: A 3D tensor with shape [batch, timesteps, feature].
The features (the multiple variables) therefore belong in the last dimension, and the time steps in the one before it. That means your first suggestion, with one row per time step and one column per feature, is the right one.
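To make the two layouts concrete, here is a minimal NumPy sketch using the numbers from the question. The variable names (`step_0`, `sequence`, `transposed`) are just illustrative, and it assumes each list holds the values of the 3 features at one time step:

```python
import numpy as np

# One list per time step, holding the values of the 3 features at that step.
step_0 = [1, 2, 3]
step_1 = [4, 5, 6]
step_2 = [7, 8, 9]

# vstack keeps one row per time step: shape (timesteps, features).
sequence = np.vstack([step_0, step_1, step_2])

# column_stack turns each list into a column: shape (features, timesteps).
# This is the transposed layout, which does not match what the LSTM expects.
transposed = np.column_stack([step_0, step_1, step_2])

print(sequence.shape)    # (timesteps, features)
print(sequence[:, 0])    # feat1 over time
print(transposed.shape)  # (features, timesteps)
```

With the first layout, indexing the last axis selects a feature and indexing the first axis selects a time step, which is the order the quoted `[batch, timesteps, feature]` shape asks for once a batch axis is in front.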
Note: the batch dimension needs no extra handling when you call fit on a whole dataset, because the samples are already stacked along the first axis. If you are presenting a single example (for instance, at inference), add that axis yourself by applying numpy.expand_dims on axis 0.
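As a rough sketch of the batch axis, assuming the `(timesteps, features)` array from the question, this shows both cases. The second sequence in `dataset` is made up purely for illustration:

```python
import numpy as np

# A single sequence: 3 time steps, 3 features each -> shape (3, 3).
sequence = np.array([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]], dtype="float32")

# Single example: add a leading batch axis of size 1 -> shape (1, 3, 3).
single_batch = np.expand_dims(sequence, axis=0)

# Whole dataset: stack equal-length sequences along a new first axis.
# The second sequence here is invented just to have more than one sample.
dataset = np.stack([sequence, sequence + 10])  # shape (2, 3, 3)

print(single_batch.shape)
print(dataset.shape)
```

An array shaped like `single_batch` is what you would hand to the model for one prediction, while an array shaped like `dataset` is what you would pass to fit.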