python tensorflow keras time-series recurrent-neural-network

Keras timeseriesgenerator: how to predict multiple data points in one step?


I have meteorological data that looks like this:

DateIdx               winddir   windspeed   hum         press       temp
2017-04-17 00:00:00   0.369397  0.155039    0.386792    0.196721    0.238889
2017-04-17 00:15:00   0.363214  0.147287    0.429245    0.196721    0.233333
2017-04-17 00:30:00   0.357032  0.139535    0.471698    0.196721    0.227778
2017-04-17 00:45:00   0.323029  0.127907    0.429245    0.204918    0.219444
2017-04-17 01:00:00   0.347759  0.116279    0.386792    0.213115    0.211111
2017-04-17 01:15:00   0.346213  0.127907    0.476415    0.204918    0.169444
2017-04-17 01:30:00   0.259660  0.139535    0.566038    0.196721    0.127778
2017-04-17 01:45:00   0.205564  0.073643    0.523585    0.172131    0.091667
2017-04-17 02:00:00   0.157650  0.007752    0.481132    0.147541    0.055556
2017-04-17 02:15:00   0.122101  0.003876    0.476415    0.122951    0.091667

My aim is to use the Keras TimeseriesGenerator (from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator) to train on and predict multiple data points (multiple rows) at once, i.e. not to do

[input X]                  | [targets y]
[dp1, dp2, dp3, dp4, dp5]  | [dp6]
[dp2, dp3, dp4, dp5, dp6]  | [dp7]
[dp3, dp4, dp5, dp6, dp7]  | [dp8]
                          ...

but to do

[input X]                  | [targets y]
[dp1, dp2, dp3, dp4, dp5]  | [dp6, dp7, dp8]
[dp2, dp3, dp4, dp5, dp6]  | [dp7, dp8, dp9]
[dp3, dp4, dp5, dp6, dp7]  | [dp8, dp9, dp10]
                          ...

I can achieve the top kind of predictions with

generator = TimeseriesGenerator(
    X,
    X,
    length=5,
    sampling_rate=1,
    stride=1,
    start_index=0,
    end_index=None,
    shuffle=False,
    reverse=False,
    batch_size=1,
)

but I haven't figured out how to set the generator options to get the second kind of prediction.

Is there an easy way to get the desired prediction window of 3 data points with the TimeseriesGenerator? If not, can you suggest some code to bin my predictions y to achieve this? Thanks.


Solution

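  • What you can do with the TimeseriesGenerator is change the targets argument. The generator pairs the input window data[i - length:i] with targets[i], so to predict the next three timesteps, row i of your target should hold rows i, i + 1 and i + 2 of X side by side:

    import numpy as np

    target = np.concatenate((X,                       # first predicted step (dp6 for the first window)
                             np.roll(X, -1, axis=0),  # second predicted step
                             np.roll(X, -2, axis=0),  # third predicted step
                             ), axis=1)

    np.roll with a negative shift moves the rows up and wraps around, so the last two rows of target contain values from the start of the series. Throw those rows away, e.g. with X, target = X[:-2], target[:-2] (the generator needs both arrays to have the same length). When you define your generator, you can now pass target as the second argument: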

    generator = TimeseriesGenerator(
        X,
        target,
        length=5,
        sampling_rate=1,
        stride=1,
        start_index=0,
        end_index=None,
        shuffle=False,
        reverse=False,
        batch_size=1,
    )
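To check the alignment without running Keras, the pairing rule can be emulated by hand in NumPy. This is a minimal sketch, not the generator itself: it assumes the documented behaviour that, with sampling_rate=1, stride=1 and batch_size=1, the generator pairs data[i - length:i] with targets[i], and it leaves out the batch dimension. The toy array X and the helper emulate_generator are made up for illustration. Under that rule, shifts of 0, -1 and -2 reproduce the [dp6, dp7, dp8] layout from the question.

```python
import numpy as np

# Toy series: 12 data points with 2 features; row k holds the value k + 1,
# so "dp6" is easy to recognise as a row of 6s.
X = np.repeat(np.arange(1, 13, dtype=float)[:, None], 2, axis=1)

length, n_out = 5, 3

# Row i of target holds [X[i], X[i + 1], X[i + 2]] side by side.
target = np.concatenate([np.roll(X, -k, axis=0) for k in range(n_out)], axis=1)

# Drop the last n_out - 1 rows, whose targets wrapped around to the start.
X_trim, target_trim = X[: -(n_out - 1)], target[: -(n_out - 1)]


def emulate_generator(data, targets, length):
    """Hypothetical helper: pair data[i - length:i] with targets[i]."""
    return [(data[i - length:i], targets[i]) for i in range(length, len(data))]


pairs = emulate_generator(X_trim, target_trim, length)
first_x, first_y = pairs[0]
print(first_x[:, 0])                     # first feature of dp1..dp5
print(first_y.reshape(n_out, -1)[:, 0])  # first feature of dp6, dp7, dp8
```

The last pair ends with dp10, dp11 and dp12 as targets, so no row of the series is wasted by the trimming.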
    

    Note that when you call model.fit, the model is now expected to output 3 * X.shape[1] values per sample (15 for the five columns above), so your model architecture and/or loss function needs to account for this. Either widen the last layer to that output dimension directly, or build one sub-model per predicted step and join their outputs with layers.concatenate (from tensorflow.keras import layers). The sub-models can share weights, in which case a single network produces all three predicted steps:

    layers.concatenate([model_timeplus1, model_timeplus2, model_timeplus3], axis=-1)
    

    This is comparable to an unrolled recurrent neural network.
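Since the target rows are flat vectors, the predictions come back flat as well and usually need to be split into steps again. The sketch below shows one way to do that in NumPy. It assumes the target was built by concatenating the per-step arrays along axis=1, that there are five feature columns with temp as the last one (DateIdx being the index), and it uses random numbers as a stand-in for the output of model.predict.

```python
import numpy as np

n_out, n_features = 3, 5  # three steps ahead, five weather columns

# Stand-in for model.predict(...) on a batch of 4 windows: one flat row of
# n_out * n_features values per window (random numbers, not real predictions).
rng = np.random.default_rng(0)
flat_pred = rng.random((4, n_out * n_features))

# Each flat row is [step 1 features, step 2 features, step 3 features],
# so a row-major reshape recovers one (n_out, n_features) block per window.
pred = flat_pred.reshape(-1, n_out, n_features)

temp_col = 4  # assumed position of "temp" among the feature columns
temp_forecast = pred[:, :, temp_col]  # shape (4, 3): three temp steps per window
print(pred.shape, temp_forecast.shape)
```

The same reshape applies to the targets y yielded by the generator, which makes it easy to compute a separate error per predicted step.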