python, tensorflow, keras, lstm

ValueError: `validation_split` is only supported for Tensors or NumPy arrays, found: (keras.preprocessing.sequence.TimeseriesGenerator object)


When I tried to add validation_split to my LSTM model, I got this error:

ValueError: `validation_split` is only supported for Tensors or NumPy arrays, found: (<tensorflow.python.keras.preprocessing.sequence.TimeseriesGenerator object)

This is the code

from keras.preprocessing.sequence import TimeseriesGenerator
train_generator = TimeseriesGenerator(df_scaled, df_scaled, length=n_timestamp, batch_size=1)

model.fit(train_generator, epochs=50, verbose=2, callbacks=[tensorboard_callback], validation_split=0.1)

One reason I can think of is that validation_split expects a tensor or NumPy array, as the error says, whereas passing the training data through TimeseriesGenerator turns it into a 3D array.
And since TimeseriesGenerator has to be used with an LSTM, does this mean we can't use validation_split for LSTMs at all?
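
For reference, a quick way to see this is to inspect one batch from the generator (a minimal sketch; the random array and window length below are placeholders standing in for df_scaled and n_timestamp):

import numpy as np
from keras.preprocessing.sequence import TimeseriesGenerator

# Placeholder data: 100 timesteps of one scaled feature.
data = np.random.rand(100, 1)
n_timestamp = 10

gen = TimeseriesGenerator(data, data, length=n_timestamp, batch_size=1)

x, y = gen[0]
print(x.shape)  # (1, 10, 1) -> (batch, timesteps, features), i.e. 3D
print(y.shape)  # (1, 1)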


Solution

  • Your first intuition is right: you cannot use validation_split when using a data generator.

    You have to understand how a data generator works. At the start of the first epoch, the model.fit API does not know how many records or batches your dataset contains, because the data is generated and supplied to the model one batch at a time during training. So there is no way for the API to know up front how many records there are and to carve a validation set out of them. For this reason you cannot use validation_split with a data generator. You can read this in the documentation:

    Float between 0 and 1. Fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch. The validation data is selected from the last samples in the x and y data provided, before shuffling. This argument is not supported when x is a dataset, generator or keras.utils.Sequence instance.

    Note the last sentence, which says explicitly that validation_split is not supported when the input is a dataset, generator or keras.utils.Sequence instance.

    What you can do instead is split the dataset yourself with the code below; a short sketch of how to train on the resulting splits follows the snippet. You can read about this in detail here; I am only copying the important part from that link.

    # Splitting the dataset for training and testing.
    # `dataset` is assumed to be a tf.data.Dataset built from your data, e.g.
    # dataset = tf.data.Dataset.from_tensor_slices((features, targets))
    import tensorflow as tf

    # Every 4th element (25%) goes to the test/validation split.
    def is_test(x, _):
        return x % 4 == 0


    def is_train(x, y):
        return not is_test(x, y)


    # Drop the enumeration index and keep only the element itself.
    recover = lambda x, y: y

    # Split off the dataset used for testing/validation.
    test_dataset = dataset.enumerate() \
        .filter(is_test) \
        .map(recover)

    # Split off the dataset used for training.
    train_dataset = dataset.enumerate() \
        .filter(is_train) \
        .map(recover)
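
    As a rough sketch of how the two splits might then be used (this assumes `dataset` yields (input, target) windows and still needs batching; the batch size here is an arbitrary choice), you pass the validation split through validation_data instead of validation_split:

    # Batch both splits and give the validation set to fit() explicitly,
    # since validation_split cannot be used with datasets.
    model.fit(
        train_dataset.batch(32),
        validation_data=test_dataset.batch(32),
        epochs=50,
        verbose=2,
        callbacks=[tensorboard_callback],
    )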
    

    I hope my answer helps you.