Tags: python, keras, types, time-series, generator

Keras timeseries_dataset_from_array with multiple column types?


For an LSTM, I would like to use tensorflow.keras.utils.timeseries_dataset_from_array() to create sequences of training samples. My training data contains multiple data types (numerical and categorical), which I would like to preprocess with Keras preprocessing layers inside the neural network. However, timeseries_dataset_from_array() does not seem to be compatible with columns of different data types, although the documentation does not mention this:

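import numpy as np
import pandas as pd
from tensorflow.keras.utils import timeseries_dataset_from_array

# Mixed-dtype frame: one string column and one integer column
X = pd.DataFrame({
    "categorical": ["a", "b", "c"],
    "numerical": [1, 2, 3]
})

y = np.array([1, 2, 3])
n_timesteps = 1
batch_size = 1

# Raises the ValueError quoted below
input_dataset = timeseries_dataset_from_array(
    X, y, sequence_length=n_timesteps, sequence_stride=1, batch_size=batch_size
)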

This results in the following error: ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type int).

So, can you only use data of the same type with timeseries_dataset_from_array()? And if so, what can I do if I want to create training data sequences of multiple data types?


Solution

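  • I think that this answer sums it up nicely: a tensor can't hold more than one data type. A DataFrame that mixes string and integer columns is converted to a single NumPy array of dtype object, and TensorFlow cannot convert that array to a tensor.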
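
One possible workaround, sketched here and not taken from the linked answer, is to window each data type separately and then combine the results with tf.data.Dataset.zip, so that every tensor stays homogeneous. The helper make_windows and the dictionary keys are hypothetical names chosen for illustration. The sketch assumes shuffle is left at its default of False, so that both datasets yield windows in the same order.

```python
import numpy as np
import pandas as pd
import tensorflow as tf

X = pd.DataFrame({
    "categorical": ["a", "b", "c"],
    "numerical": [1, 2, 3],
})
y = np.array([1, 2, 3])
n_timesteps = 1
batch_size = 1


def make_windows(values, targets=None):
    """Hypothetical helper: window one homogeneous array (shuffle stays off)."""
    return tf.keras.utils.timeseries_dataset_from_array(
        values,
        targets,
        sequence_length=n_timesteps,
        sequence_stride=1,
        batch_size=batch_size,
    )


# One dataset per dtype: a string tensor and a float32 tensor.
cat_ds = make_windows(X[["categorical"]].to_numpy().astype(str))
# Targets are attached to only one of the datasets, so this one yields (inputs, y).
num_ds = make_windows(X[["numerical"]].to_numpy().astype("float32"), y)

# Zip the aligned windows and reshape each element to (features_dict, target).
dataset = tf.data.Dataset.zip((cat_ds, num_ds)).map(
    lambda cat, num_and_y: (
        {"categorical": cat, "numerical": num_and_y[0]},
        num_and_y[1],
    )
)

for features, target in dataset:
    print(features["categorical"].dtype, features["numerical"].dtype, target.shape)
```

A model with two named inputs could then apply a layer such as StringLookup to the categorical branch and Normalization to the numerical branch before merging them ahead of the LSTM. That part is left out here because it depends on the architecture.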