Tags: python, machine-learning, keras, lstm, recurrent-neural-network

Understanding LSTMs - layers and data dimensions


I don't understand how LSTM layers are fed with data.

LSTM layers require input with three dimensions (x, y, z).

I have a time-series dataset: 2900 rows in total, which should conceptually be divided into groups of 23 consecutive rows, where each row is described by 178 features. In other words, every 23 rows form a new 23-row sequence belonging to a new patient.

Are the following statements right?

  • x (samples) = the number of 23-row sequences, namely len(dataframe) / 23
  • y (time steps) = the length of each sequence, which is 23 here by domain assumption
  • z (feature size) = the number of columns in each row, 178 in this case

Therefore x * y = the number of rows in the dataset.
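
To make that concrete, here is a minimal sketch of the reshape I have in mind (assuming the rows are already ordered patient by patient, and using synthetic data in place of my real dataframe):

    import numpy as np
    import pandas as pd

    n_timesteps = 23     # y: length of each patient's sequence
    n_features = 178     # z: features per row
    n_samples = 126      # x: number of sequences (~2900 / 23)

    # Stand-in for the real dataframe: one row per time step, one column per feature.
    df = pd.DataFrame(np.random.rand(n_samples * n_timesteps, n_features))

    # Reshape the flat table (rows, features) into (samples, time steps, features).
    X = df.values.reshape(n_samples, n_timesteps, n_features)
    print(X.shape)       # (126, 23, 178)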

Assuming this is correct, what is the batch size when training a model in this case?

Might it be the number of samples considered in one epoch during training?

If so, with x (the number of samples) equal to 200, it would make no sense to set a batch_size greater than 200, because that is my upper limit - I don't have more data to train on.


Solution

  • I interpret your description as saying that your total dataset consists of 2900 rows, where every 23 consecutive rows form one data sample: 23 time steps, each with a 178-dimensional feature vector. That gives roughly 2900 / 23 ≈ 126 samples.

    If that is the case, the input_shape for your model should be defined as (23, 178). The batch size is simply the number of samples (out of those ~126) processed together in a single training / test / prediction step; it is not the number of samples in an epoch, since one epoch runs through all of them, batch by batch.

    Try the following:

    from keras.models import Sequential
    from keras.layers import LSTM


    # input_shape = (time steps, features); the batch dimension is left out.
    model = Sequential()
    model.add(LSTM(64, input_shape=(23, 178)))
    model.compile(loss='mse', optimizer='sgd')
    model.summary()

    print(model.input)
    

    This is just a simple model that outputs a single 64-wide vector per sample. You will see that the expected model.input is:

    Tensor("lstm_3_input:0", shape=(?, 23, 178), dtype=float32)
    

    The batch size is left unset in the input shape (the ? in the first dimension), which means the model can be used to train on or predict batches of different sizes.
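
    As a usage sketch, you can fit the model defined above on stand-in data of that shape (random numbers here, just to illustrate the shapes) and pick any batch size; batch_size is how many samples are processed per weight update, while one epoch still covers all of them:

    import numpy as np

    # Stand-in data: 126 sequences of 23 time steps with 178 features each.
    X = np.random.rand(126, 23, 178)
    y = np.random.rand(126, 64)   # targets matching the 64-wide LSTM output

    # 126 samples with batch_size=32 -> 4 weight updates per epoch.
    model.fit(X, y, batch_size=32, epochs=2)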