python pandas lstm recurrent-neural-network

Training an RNN/LSTM model raises a KeyError equal to the value of length

I am trying to train this model:

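# Imports assumed here; the original post did not show them.
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator

scaler = StandardScaler()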
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

length = 60
n_features = X_train_s.shape[1]
batch_size = 1

early_stop = EarlyStopping(monitor = 'val_accuracy', mode = 'max', verbose = 1, patience = 5)

generator = TimeseriesGenerator(data = X_train_s, 
                                targets = Y_train[['TARGET_KEEP_LONG', 
                                                   'TARGET_KEEP_SHORT', 
                                                   'TARGET_STAY_FLAT']], 
                                length = length, 
                                batch_size = batch_size)


RNN_model = Sequential()
RNN_model.add(LSTM(180, activation = 'relu', input_shape = (length, n_features)))
RNN_model.add(Dense(3))
RNN_model.compile(optimizer = 'adam', loss = 'binary_crossentropy')

validation_generator = TimeseriesGenerator(data = X_test_s, 
                                           targets = Y_test[['TARGET_KEEP_LONG', 
                                                             'TARGET_KEEP_SHORT', 
                                                             'TARGET_STAY_FLAT']], 
                                           length = length, 
                                           batch_size = batch_size)


RNN_model.fit(generator, 
              epochs=20, 
              validation_data = validation_generator,
              callbacks = [early_stop])

I get the error "KeyError: 60", where 60 is the value of the variable "length" (if I change it, the number in the error changes accordingly).

The shape of the scaled test set is

X_test_s.shape
(114125, 89)

X_train_s.shape is the same, and n_features == 89.

Solution

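Finding the cause was tedious because the error message is misleading. The problem was the format of the targets: TimeseriesGenerator does not work with pandas DataFrames as targets, only with NumPy arrays. It looks up targets by integer position, starting at position length, and on a DataFrame an integer in square brackets is treated as a column label rather than a row number, hence "KeyError: 60". Therefore this call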

generator = TimeseriesGenerator(data = X_train_s,
                                targets = Y_train[['TARGET_KEEP_LONG',
                                                   'TARGET_KEEP_SHORT',
                                                   'TARGET_STAY_FLAT']],
                                length = length,
                                batch_size = batch_size)

should have been written as

generator = TimeseriesGenerator(data = X_train_s,
                                targets = Y_train[['TARGET_KEEP_LONG',
                                                   'TARGET_KEEP_SHORT',
                                                   'TARGET_STAY_FLAT']].to_numpy(),
                                length = length,
                                batch_size = batch_size)
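The difference between the two calls can be reproduced without Keras. The snippet below is only a sketch: targets_df is a made-up stand-in for Y_train, and it assumes the generator fetches each target with a plain integer lookup such as targets[60].

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Y_train: 100 rows, three target columns.
targets_df = pd.DataFrame(
    np.zeros((100, 3), dtype=int),
    columns=["TARGET_KEEP_LONG", "TARGET_KEEP_SHORT", "TARGET_STAY_FLAT"],
)

length = 60

# On a DataFrame, an integer in square brackets is a column lookup.
# There is no column named 60, so this raises KeyError.
caught = None
try:
    targets_df[length]
except KeyError as exc:
    caught = exc

# On the NumPy array, the same integer selects row 60.
row = targets_df.to_numpy()[length]

print("DataFrame lookup raised:", repr(caught))
print("NumPy row shape:", row.shape)
```

The number in the error is simply the first position that gets looked up, which is why it follows the value of length.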

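With just one target, the following was enough:

generator = TimeseriesGenerator(data = X_train_s, targets = Y_train['TARGET_KEEP_LONG'], length = length, batch_size = batch_size)

with just one level of square brackets, not two. Single brackets return a pandas Series, and with the default integer index a lookup such as Y_train['TARGET_KEEP_LONG'][60] finds the label 60. This only works while the index labels match the row positions, so converting with .to_numpy() is the safer choice here as well.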
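A small sketch of why the single-target version can still break. The Series below are invented for illustration. The second one mimics a test split that kept its original row labels (100 to 199) instead of being renumbered from 0.

```python
import numpy as np
import pandas as pd

# Hypothetical single target with the default index 0..99.
target = pd.Series(np.arange(100), name="TARGET_KEEP_LONG")
default_value = target[60]  # label 60 exists, so the lookup succeeds

# Same values, but the index labels run from 100 to 199.
shifted = pd.Series(
    np.arange(100), index=np.arange(100, 200), name="TARGET_KEEP_LONG"
)

# Integer lookups on an integer-indexed Series go by label, not position.
shifted_failed = False
try:
    shifted[60]
except KeyError:
    shifted_failed = True

# The NumPy array is always indexed by position.
positional_value = shifted.to_numpy()[60]

print(default_value, shifted_failed, positional_value)
```

So a Series only behaves like an array here when its labels happen to be 0, 1, 2, and so on. Calling .to_numpy() on the targets removes that dependency.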