python machine-learning keras deep-learning lstm

Keras predict next time series item

My data set is composed of sequences, and each time step in a sequence has 4 features, like so:

S0:
t0 -> f1, f2, f3, f4
t1 -> f1, f2, f3, f4
t2 -> f1, f2, f3, f4
t3 -> f1, f2, f3, f4

S1:
t0 -> f1, f2, f3, f4
t1 -> f1, f2, f3, f4
t2 -> f1, f2, f3, f4
t3 -> f1, f2, f3, f4
t4 -> f1, f2, f3, f4
t5 -> f1, f2, f3, f4
t6 -> f1, f2, f3, f4
t7 -> f1, f2, f3, f4

etc...

As you can see, the sequences vary in length, and the variation is large (anywhere from 10 to 500 time steps).

My goal is to input t0 and use each prediction to aid in the next prediction and do so until a goal is reached.

i0 -> [t0] - predicts > t1
i1 -> [t0, t1] - predicts > t2
i2 -> [t0, t1, t2] - predicts > t3

and so on

I'm not sure how to structure my data for training in Keras. I currently have the following for my 'x'

[ [[f1, f2, f3, f4], [f1, f2, f3, f4]] , [[f1, f2, f3, f4]] ] ...

Questions:

How do you handle variable length sequences in Keras?
How do I format my 'y' expected output data?
Would it be possible to have a start timestep and an end timestep and then fill in timesteps between the two?

Solution

How do you handle variable length sequences in Keras?

Keras has a convenient way to handle variable-length sequences. For example, if you are using an LSTM layer for sequence prediction, you can set the time dimension of the input shape to None:

model.add(LSTM(num_units, input_shape=(None, data_dim)))
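Setting the time dimension to None lets the model accept any sequence length, but all sequences within a single batch still need the same length. One option is to train with a batch size of 1; another is to pad each batch to a common length. Below is a minimal sketch of the padding approach using only NumPy. The helper name `pad_batch` and the pad value of 0.0 are assumptions for illustration, not part of Keras.

```python
import numpy as np

def pad_batch(sequences, pad_value=0.0):
    """Right-pad a list of (timesteps, features) arrays to a common length.

    Hypothetical helper for illustration; returns an array of shape
    (num_sequences, max_timesteps, features).
    """
    max_len = max(len(s) for s in sequences)
    n_features = sequences[0].shape[1]
    batch = np.full((len(sequences), max_len, n_features), pad_value, dtype="float32")
    for i, seq in enumerate(sequences):
        # Copy the real time steps; the remainder stays at pad_value.
        batch[i, :len(seq)] = seq
    return batch

# Two toy sequences with 4 features each, of length 3 and 5.
s0 = np.ones((3, 4), dtype="float32")
s1 = np.ones((5, 4), dtype="float32")
padded = pad_batch([s0, s1])
print(padded.shape)
```

If you pad this way, you would probably want the model to ignore the padded steps, for instance by placing a Masking layer with a matching mask_value in front of the LSTM. This assumes that 0.0 never occurs as a full feature vector in your real data.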

How do I format my 'y' expected output data?

Your y values can be viewed as your x values shifted to the left by one time step.

e.g.

# if
x = [t0, t1, t2, t3, t4]
# then
y = [t1, t2, t3, t4]

If both x and y are numpy arrays, you can get y from x as follows.

y = x[1:]

Since the last value of x has no following time step to predict, you should remove it so that x and y have the same length.

x = x[:-1]
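Putting the two slices together, here is a small sketch of how one sequence could be turned into an input/target pair. The function name `make_xy` and the toy data are assumptions for illustration; the sequence is assumed to be an array of shape (timesteps, features).

```python
import numpy as np

def make_xy(sequence):
    """Split one (timesteps, features) sequence into inputs and next-step targets."""
    seq = np.asarray(sequence)
    x = seq[:-1]  # t0 .. t(n-2): every step that has a successor
    y = seq[1:]   # t1 .. t(n-1): the step that follows each input
    return x, y

# Toy sequence: 5 time steps, 4 features each.
seq = np.arange(20).reshape(5, 4)
x, y = make_xy(seq)
print(x.shape, y.shape)
```

Each row of y is the time step that follows the corresponding row of x. Note that targets of this shape give one prediction per time step, so the LSTM would likely need return_sequences=True, followed by an output layer with data_dim units.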