tensorflow deep-learning keras lstm keras-layer

LSTM time series prediction for event data


I am trying to solve a time series problem. The data set has n systems, and we record the number of faults that occurred on each system for about t days. My goal is to predict the number of faults that can occur on any given system on day t+1. The toy data set looks something like this; each row gives the number of faults on 11 consecutive days for one system.

        x = [[0,1,2,20,24,8,2,2,1,2,1],
             [0,100,200,250,300,80,25,20,10,1,1],
             [1,25,30,35,10,0,0,1,1,0,1],
             [0,10,10,2,24,8,2,2,1,2,1],
             [0,4,20,20,24,80,10,20,30,90,150]]

My training data then excludes the last day of each row.

        x_train = [[0,1,2,20,24,8,2,2,1,2],
                   [0,100,200,250,300,80,25,20,10,1],
                   [1,25,30,35,10,0,0,1,1,0],
                   [0,10,10,2,24,8,2,2,1,2],
                   [0,4,20,20,24,80,10,20,30,90]]
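A minimal sketch of that split with NumPy slicing, assuming the held-out last day is used as the prediction target (the name `y_train` is illustrative and not part of the data above):

```python
import numpy as np

# Toy data from above: one row per system, one column per day.
x = np.array([[0, 1, 2, 20, 24, 8, 2, 2, 1, 2, 1],
              [0, 100, 200, 250, 300, 80, 25, 20, 10, 1, 1],
              [1, 25, 30, 35, 10, 0, 0, 1, 1, 0, 1],
              [0, 10, 10, 2, 24, 8, 2, 2, 1, 2, 1],
              [0, 4, 20, 20, 24, 80, 10, 20, 30, 90, 150]])

x_train = x[:, :-1]  # first 10 days of every system
y_train = x[:, -1]   # assumed target: the fault count on day 11

print(x_train.shape, y_train.shape)  # (5, 10) (5,)
print(y_train.tolist())              # [1, 1, 1, 1, 150]
```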

How should I modify my data to work with an LSTM? Any sample code is much appreciated. All the existing examples model a single entity, whereas in my case I have n different systems. Here is my simple attempt; please give feedback on whether it represents my requirement. My data looks as follows.

|    | t1 | t2 | t3 |
|----|----|----|----|
| x1 | 1  | 2  | 3  |
| x2 | 3  | 4  | 5  |
| x3 | 5  | 6  | 7  |

    import numpy as np
    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    # Inputs are (t1, t2) and targets are (t2, t3) for each of the 3 systems.
    x = np.array([[1,2],[3,4],[5,6]])
    y = np.array([[2,3],[4,5],[6,7]])
    x = np.reshape(x,(3,1,2))  # (samples, timesteps, features)
    y = np.reshape(y,(3,2))
    test_x = np.array([[6,7]])
    test_x = np.reshape(test_x,(1,1,2))

    model = Sequential()
    model.add(LSTM(4,batch_input_shape=(1,1,2), return_sequences=False))
    model.add(Dense(2,activation='relu'))
    model.compile(loss='mean_absolute_error', optimizer='adam')
    model.fit(x, y, epochs=100, batch_size=1)  # Keras 2 renamed nb_epoch to epochs
    model.reset_states()
    model.predict(test_x)

Thanks


Solution

  • If you need the model to predict t+1, you just need to shift your data by one position to produce the labels. For example, if your data is [1,2,3,4,5,6,7] and seq_len is 3, your input batch is [[1,2,3], [4,5,6]] and your target batch is [[2,3,4], [5,6,7]]. The code may look like this:

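    import numpy as np

    # int_text is the 1-D series; n_batches, batch_size and seq_length are chosen by you.
    # The targets are the inputs shifted by one time step.
    inputs = np.array(int_text[: n_batches * batch_size * seq_length])
    outputs = np.array(int_text[1: n_batches * batch_size * seq_length + 1])

    # Split into n_batches arrays, each of shape (batch_size, seq_length).
    x = np.split(inputs.reshape(batch_size, -1), n_batches, 1)
    y = np.split(outputs.reshape(batch_size, -1), n_batches, 1)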
    

    EDIT:

    [[1,2,3], [4,5,6]] is an input batch. batch_size is 2, seq_length is 3.

    [[2,3,4],[5,6,7]] is a target batch. batch_size is 2, seq_length is 3.

    Whichever method you use, all you need is to arrange your data as shown above.
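Applied to the multi-system data from the question, the same shift can be done row by row. The following is a sketch under assumptions, not the only way to do it: `make_shifted_pairs` is a hypothetical helper, and the trailing axis of size 1 assumes the `(samples, timesteps, features)` input layout used by Keras LSTM layers, with one feature (the fault count) per time step.

```python
import numpy as np

def make_shifted_pairs(series):
    """Return (inputs, targets) where targets are the inputs shifted by one step.

    series: array-like of shape (n_systems, n_days).
    Both outputs have shape (n_systems, n_days - 1, 1).
    """
    series = np.asarray(series, dtype=np.float32)
    inputs = series[:, :-1]   # days 1 .. t-1
    targets = series[:, 1:]   # days 2 .. t
    # Add a trailing feature axis: (samples, timesteps, features).
    return inputs[..., np.newaxis], targets[..., np.newaxis]

# The example series from the answer, treated as a single system.
demo_in, demo_out = make_shifted_pairs([[1, 2, 3, 4, 5, 6, 7]])
print(demo_in[0, :, 0].tolist())   # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(demo_out[0, :, 0].tolist())  # [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

# The toy data from the question: 5 systems, 11 days each.
x = [[0, 1, 2, 20, 24, 8, 2, 2, 1, 2, 1],
     [0, 100, 200, 250, 300, 80, 25, 20, 10, 1, 1],
     [1, 25, 30, 35, 10, 0, 0, 1, 1, 0, 1],
     [0, 10, 10, 2, 24, 8, 2, 2, 1, 2, 1],
     [0, 4, 20, 20, 24, 80, 10, 20, 30, 90, 150]]
inputs, targets = make_shifted_pairs(x)
print(inputs.shape, targets.shape)  # (5, 10, 1) (5, 10, 1)
```

With targets of this shape the model would need to emit one value per time step (for example an LSTM with `return_sequences=True`). To predict only day t+1, `targets[:, -1]` could be used as the label instead.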