My data set is composed of sequences, and each time step in a sequence has 4 features, like so:
S0:
t0 -> f1, f2, f3, f4
t1 -> f1, f2, f3, f4
t2 -> f1, f2, f3, f4
t3 -> f1, f2, f3, f4
S1:
t0 -> f1, f2, f3, f4
t1 -> f1, f2, f3, f4
t2 -> f1, f2, f3, f4
t3 -> f1, f2, f3, f4
t4 -> f1, f2, f3, f4
t5 -> f1, f2, f3, f4
t6 -> f1, f2, f3, f4
t7 -> f1, f2, f3, f4
etc...
As you can see, the sequences vary in length, and the range is large (anywhere from 10 to 500 time steps).
My goal is to input t0 and use each prediction to aid in the next prediction and do so until a goal is reached.
i0 -> [t0] - predicts > t1
i1 -> [t0, t1] - predicts > t2
i2 -> [t0, t1, t2] - predicts > t3
and so on
I'm not sure how to structure my data for training in Keras. I currently have the following for my 'x'
[ [[f1, f2, f3, f4], [f1, f2, f3, f4]] , [[f1, f2, f3, f4]] ] ...
Questions:
How do you handle variable length sequences in Keras?
How do I format my 'y' expected output data?
Would it be possible to have a start timestep and an end timestep and then fill in timesteps between the two?
How do you handle variable length sequences in Keras?
Keras has a nice way to handle variable-length sequences. For example, if you are using an LSTM layer for sequence prediction, you can set the time dimension of the input shape to None:
model.add(LSTM(num_units, input_shape=(None, data_dim)))
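Note that None only lifts the fixed-length requirement from the model itself. The arrays inside a single batch still have to share one length, so you either feed one sequence per batch or pad the sequences in a batch to a common length. Below is a minimal NumPy sketch of the padding route. The helper name pad_batch and the toy sequences are my own, made up for illustration, and it assumes every sequence is a 2-D array of shape (timesteps, features).

```python
import numpy as np

def pad_batch(sequences, pad_value=0.0):
    """Stack variable-length (timesteps, features) arrays into one padded 3-D array.

    pad_batch is a hypothetical helper, not part of Keras.
    """
    max_len = max(len(seq) for seq in sequences)
    n_features = sequences[0].shape[1]
    batch = np.full((len(sequences), max_len, n_features), pad_value, dtype="float32")
    for i, seq in enumerate(sequences):
        batch[i, :len(seq)] = seq  # real time steps first, padding after
    return batch

# Two toy sequences with 4 features each, of lengths 4 and 8 (like S0 and S1)
s0 = np.ones((4, 4), dtype="float32")
s1 = np.ones((8, 4), dtype="float32")
batch = pad_batch([s0, s1])
print(batch.shape)  # (2, 8, 4)
```

If you pad like this, putting a Masking(mask_value=0.0) layer in front of the LSTM should let Keras skip the padded time steps, assuming an all-zero feature vector never occurs in your real data.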
How do I format my 'y' expected output data?
Your ys can be viewed as your xs shifted to the left by one time step.
e.g.
# if
x = [t0,t1,t2,t3,t4]
#then
y = [t1,t2,t3,t4]
If both x and y are numpy arrays, you can get y from x as follows.
y = x[1:]
Since the last time step of x has no following time step to predict, you should remove it so that x and y have the same length.
x = x[:-1]
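Putting the two slices together for one sequence of 4-feature time steps could look like the sketch below. The function name make_xy and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np

def make_xy(sequence):
    """Split one (timesteps, features) sequence into inputs and next-step targets.

    make_xy is a hypothetical helper name.
    """
    seq = np.asarray(sequence)
    x = seq[:-1]  # t0 .. t(n-1): every step that has a successor
    y = seq[1:]   # t1 .. tn: the step that follows each input step
    return x, y

# Toy sequence: 5 time steps, 4 features each
seq = np.arange(20).reshape(5, 4)
x, y = make_xy(seq)
print(x.shape, y.shape)  # (4, 4) (4, 4)
```

Here y[i] is the time step that follows x[i]. A target with one row per input step like this pairs with an LSTM built with return_sequences=True, which emits an output at every time step. With the default return_sequences=False the layer only returns the output for the last step, so the target would be a single time step instead.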