Tags: python, keras, lstm

If I want to predict the next element in a sequence of numbers, what do I need to pass as the second argument to Keras' fit method?


I'm trying to program a simple example to understand how LSTMs work. I want to take the simple integer series 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 and predict the next number. I have the code below, but I don't know what the second argument of the fit method needs to be.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

# build a DataFrame holding the series
df = pd.DataFrame(columns = ['Serie'])
for i in range(0, 10):
    df.loc[i, 'Serie'] = i

# scale the series to the range [0, 1]
sc = MinMaxScaler(feature_range = (0, 1))
train_set = sc.fit_transform(df.iloc[:, [0]])

# build sliding windows of 3 consecutive values
xTrain = []
for i in range(0, len(train_set) - 3):
    xTrain.append(train_set[i:i + 3, 0])

# LSTM layers expect input of shape (samples, timesteps, features)
xTrain = np.array(xTrain)
xTrain = np.reshape(xTrain, (xTrain.shape[0], xTrain.shape[1], 1))

regresor = Sequential()
regresor.add(LSTM(units = 1, input_shape = (3, 1)))
regresor.compile(optimizer = 'rmsprop', loss = 'mse')
regresor.fit(xTrain, ???, batch_size = 1)  # <-- what goes here?

Can someone give me a very simple example of this?


Solution

  • You need to frame the problem as a supervised one: every sample contains the independent variable x and the dependent variable y. Based on your question, x contains samples of 3 timesteps and 1 feature. Start off by doing the necessary imports:

    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler
    import numpy as np
    import tensorflow as tf
    

    Let's define some constants:

    points = 30    # number of data points to generate
    timesteps = 3  # time steps per sample; LSTM layers need input shape (samples, timesteps, features)
    features = 1   # features per time step
    

    Generate a sequence from 0 to 30:

    x = np.arange(points + 1) # array([ 0,  1, ..., 29, 30])
    

    Here is where we start framing the problem as a supervised one, with x as a sequence of numbers and y as the sequence of next numbers:

    y = x[1:] # [ 1,  2, ..., 29, 30 ]
    x = x[:points] # [ 0,  1, ..., 28, 29 ]
    

    Stack x and y together as two columns so they can be scaled jointly (MinMaxScaler scales each column independently):

    dataset = np.hstack((x.reshape((points, 1)), y.reshape((points, 1))))
    scaler = MinMaxScaler((0, 1))
    scaled = scaler.fit_transform(dataset)
    

    Let's define the inputs and outputs of our model:

    x_train = scaled[:, 0] # first column
    x_train = x_train.reshape((points // timesteps, timesteps, features)) # as stated before, LSTM layers need input shape (samples, timesteps, features)

    y_train = scaled[:, 1] # second column
    y_train = y_train[2::3] # start at the third element, in steps of 3: one target per window, 10 in total
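
    A quick sanity check (mine, not part of the original answer) prints the shapes defined above and confirms that each 3-step window is paired with the value that follows it:

    print(x_train.shape) # (10, 3, 1) -> (samples, timesteps, features)
    print(y_train.shape) # (10,) -> one target per window
    print(x_train[0, :, 0], '->', y_train[0]) # scaled [0, 1, 2] -> scaled 3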
    

    Model definition and compilation. I decided to make the model architecture a little more robust for "better" performance (see the results below). Note that the first LSTM layer sets return_sequences = True so that the second LSTM layer receives the full sequence of hidden states:

    regresor = tf.keras.models.Sequential()
    regresor.add(tf.keras.layers.LSTM(units = 4, return_sequences = True))
    regresor.add(tf.keras.layers.LSTM(units = 2))
    regresor.add(tf.keras.layers.Dense(units = 1))
    regresor.compile(optimizer = 'rmsprop', loss = 'mse')
    

    Train the model:

    regresor.fit(x_train, y_train, batch_size = 2, epochs = 500, verbose = 1)

    Some predictions:

    y_hats = regresor.predict(x_train)
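
    One way to line the real and predicted values up side by side (a small sketch using the arrays above):

    for real, pred in zip(y_train, y_hats.flatten()):
        print(f'{real:.6f}    {pred:.6f}')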

    The results:

        real y      predicted y
        0.068966    0.086510
        0.172414    0.162209
        0.275862    0.252749
        0.379310    0.356117
        0.482759    0.467885
        0.586207    0.582081
        0.689655    0.692756
        0.793103    0.795362
        0.896552    0.887317
        1.000000    0.967796
    

    As you can see, the predictions are close enough to the real values.

    A plot of the results:

    [figure: real vs. predicted y values]

    Note that for simplicity I performed the predictions on the training data set; proper testing should be done on held-out test data. For that, you will have to generate more points and split them accordingly (e.g. 70% training, 30% testing). Also, you can map the values back to the original range by calling the scaler's inverse_transform method, as sketched below.
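
    A minimal sketch of that inversion plus a one-step forecast (mine, not part of the original answer; it assumes the scaler, model, and constants defined above):

    # y is the second column of the fitted dataset, so pad a dummy first
    # column before calling inverse_transform
    padded = np.hstack((np.zeros_like(y_hats), y_hats))
    print(scaler.inverse_transform(padded)[:, 1]) # roughly 3, 6, ..., 30

    # forecast the number after the series: scale a new window with the
    # x-column statistics, then map the output back with the y-column ones
    # (30 lies just outside the fitted x range, so this is a mild extrapolation)
    window = (np.array([28., 29., 30.]) - scaler.data_min_[0]) / (scaler.data_max_[0] - scaler.data_min_[0])
    pred = regresor.predict(window.reshape((1, timesteps, features)))
    print(pred[0, 0] * (scaler.data_max_[1] - scaler.data_min_[1]) + scaler.data_min_[1]) # ideally close to 31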