Tags: time-series, lstm, recurrent-neural-network, multiclass-classification

How to reshape data for LSTM - Time series multi class classification


I'm working on time series classification using the ASHRAE RP-1043 chiller data set, which has 65 sensor columns and more than 3000 rows for each chiller fault and for the normal condition. I have used an LSTM, but I'm not quite sure the data structure I have used is suitable for time series classification. Below is an image of the data frame I created from the collected data set, which contains records of multiple chiller conditions (7 faulty conditions and the normal condition). Each record has been labeled with its class (condition). I built the data set from the separate files for each faulty condition and the normal condition.

[Image: data frame of the labeled chiller sensor records]

The training data shapes are as follows:

X_train.shape,y_train.shape

((81600, 65), (81600, 8))

But LSTM input needs to be 3D, so I reshaped it as follows (with only 1 time step):

# make it 3d input
X_train = X_train.reshape(-1,1,65)
X_train.shape,y_train.shape

((81600, 1, 65), (81600, 8))
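A quick NumPy check of the reshape above, using random data of the same shape as a stand-in for the real sensor matrix (the array names here are illustrative). Reshaping to `(-1, 1, 65)` only inserts a length-1 time axis, so the sample count stays at 81600 and still matches `y_train`.

```python
import numpy as np

# Stand-in for the scaled sensor matrix: 81600 rows x 65 features (random values)
X = np.random.rand(81600, 65).astype(np.float32)

# Inserting a time axis of length 1 does not change the number of samples
X_1step = X.reshape(-1, 1, 65)
print(X_1step.shape)  # (81600, 1, 65)

# Equivalent to adding the axis explicitly with np.newaxis
assert np.array_equal(X_1step, X[:, np.newaxis, :])
```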

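from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def create_nn_model():
  model = Sequential()
  # Input per sample: (time_steps, num_features) = (X_train.shape[1], X_train.shape[2])
  # return_sequences=True keeps the time axis, so the output is (samples, time_steps, 8)
  model.add(LSTM(100, dropout=0.2,
                 input_shape=(X_train.shape[1], X_train.shape[2]),
                 return_sequences=True))
  model.add(Dense(100, activation='relu'))
  model.add(Dense(8, activation='softmax'))
  model.compile(loss='categorical_crossentropy',
                optimizer='adam', metrics=['accuracy'])
  return model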

This works, and I can fit the model without any error.

But how can I increase the number of time steps in X_train, e.g. to 100 time steps?

scaled_x_train.reshape(-1,100,65)
X_train.shape,y_train.shape

((816, 100, 65), (81600, 8))

Now X_train has been reshaped, but I cannot fit the model because X_train and y_train have different numbers of samples. I have tried reshaping y_train the same way as X_train, but then I would have to return sequences, which is not what I need. Is there anything wrong with my data set structure (102000 rows and 65 columns)? Can I split the data shown in the image above directly into training and test sets, or do I need more preprocessing? I'd appreciate any help.
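The mismatch is arithmetic: `reshape(-1, 100, 65)` packs every 100 consecutive rows into one sample, so 81600 rows become 81600 / 100 = 816 samples, while `y_train` still has one label per row. A minimal sketch of one way to realign them, assuming the rows are still in time order (not shuffled by a train/test split) and that every block of 100 rows belongs to a single condition: reshape the one-hot labels the same way and keep one label per block. Random features and evenly sized, contiguous class blocks stand in for the real data.

```python
import numpy as np

n_rows, n_features, n_classes, time_steps = 81600, 65, 8, 100

# Stand-ins: random features, and 8 contiguous condition blocks of 10200 rows each
X_train = np.random.rand(n_rows, n_features).astype(np.float32)
labels = np.repeat(np.arange(n_classes), n_rows // n_classes)
y_train = np.eye(n_classes, dtype=np.float32)[labels]  # one-hot, shape (81600, 8)

# Non-overlapping windows: 81600 rows -> 816 windows of 100 time steps
X_win = X_train.reshape(-1, time_steps, n_features)

# Group the labels the same way and keep the label of the last row in each window
# (assumes each window lies within one condition, so all 100 labels agree)
y_win = y_train.reshape(-1, time_steps, n_classes)[:, -1, :]

print(X_win.shape, y_win.shape)  # (816, 100, 65) (816, 8)
```

With `return_sequences=False` on the LSTM layer, the LSTM returns only its last output, so the model predicts one `(8,)` vector per window, which matches `y_win`.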

P.S. Regarding Priya's answer: [image not available]


Solution

  • You cannot directly reshape into this:

    scaled_x_train.reshape(-1,100,65)
    X_train.shape,y_train.shape
    

    This does not raise an error when time_steps=1, because num_samples in X_train.shape = (num_samples, time_steps, num_features) does not change: a dimension of size 1 can be inserted on any axis.

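    But when time_steps > 1, the number of samples changes. A plain reshape turns 81600 rows into 81600 / 100 = 816 samples, which no longer matches y_train. If you instead build overlapping (sliding) windows of time_steps rows with one label each, num_samples = len(dataset) - time_steps.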

    I am including a snippet of code that creates input data for the LSTM model, assuming that the last column is your target variable. I think the rest of your model code is fine.

    import numpy as np 
    
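    # Convert a 2-D array (rows in time order, last column = target label)
    # into overlapping windows of time_steps rows
    def new_dataset(dataset, time_steps):
        data_X, data_Y = [], []
        for i in range(len(dataset)-time_steps):
            # Features of rows i .. i+time_steps-1 (all columns except the target)
            a = dataset[i:(i+time_steps), :-1]
            data_X.append(a)
            # Label: target of the row right after the window
            data_Y.append(dataset[i + time_steps, -1])
        # Shapes: (num_samples, time_steps, num_features) and (num_samples,)
        return np.array(data_X), np.array(data_Y)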
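A usage sketch of the function above, under assumptions the question does not confirm: rows are in time order, each condition's records form one contiguous block, and the last column holds an integer class id from 0 to 7. The helper `windows_per_condition` is hypothetical; it calls `new_dataset` on each block separately so that no window mixes two chiller conditions, then one-hot encodes the labels with `np.eye` to match the `(num_samples, 8)` target used with `categorical_crossentropy`. `new_dataset` is repeated so the block runs on its own, and random numbers stand in for the sensor readings.

```python
import numpy as np

def new_dataset(dataset, time_steps):
    # Same logic as the answer: windows of time_steps rows, label from the next row
    data_X, data_Y = [], []
    for i in range(len(dataset) - time_steps):
        data_X.append(dataset[i:(i + time_steps), :-1])
        data_Y.append(dataset[i + time_steps, -1])
    return np.array(data_X), np.array(data_Y)

def windows_per_condition(dataset, time_steps):
    # Hypothetical helper: window each contiguous run of one label separately,
    # so no window spans two conditions
    labels = dataset[:, -1]
    # Positions where the label changes mark the start of a new run
    boundaries = np.flatnonzero(np.diff(labels)) + 1
    X_parts, y_parts = [], []
    for segment in np.split(dataset, boundaries):
        if len(segment) > time_steps:  # skip runs too short for one window
            X_seg, y_seg = new_dataset(segment, time_steps)
            X_parts.append(X_seg)
            y_parts.append(y_seg)
    return np.concatenate(X_parts), np.concatenate(y_parts)

# Stand-in data: 8 conditions x 150 rows, 65 sensor columns + 1 integer label column
n_classes, rows_per_class, n_features, time_steps = 8, 150, 65, 100
features = np.random.rand(n_classes * rows_per_class, n_features)
labels = np.repeat(np.arange(n_classes), rows_per_class)
dataset = np.column_stack([features, labels])

X, y_int = windows_per_condition(dataset, time_steps)
y = np.eye(n_classes)[y_int.astype(int)]  # one-hot labels for categorical_crossentropy

print(X.shape, y.shape)  # (400, 100, 65) (400, 8)
```

Building the windows before splitting into training and test sets keeps the rows inside each window in time order; shuffling individual rows first (as a default random split does) would break that ordering.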