For an LSTM, I would like to use tensorflow.keras.utils.timeseries_dataset_from_array() to create sequences of training data samples. My training data contains multiple data types (numerical, categorical), which I would like to preprocess with Keras' preprocessing layers inside the neural network.
However, it seems that timeseries_dataset_from_array() is not compatible with columns of different data types, although the documentation does not mention this:
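import numpy as np
import pandas as pd
from tensorflow.keras.utils import timeseries_dataset_from_array

# One string column and one integer column
X = pd.DataFrame({
    "categorical": ["a", "b", "c"],
    "numerical": [1, 2, 3]
})
y = np.array([1, 2, 3])
n_timesteps = 1
batch_size = 1
# Raises the ValueError quoted below
input_dataset = timeseries_dataset_from_array(
    X, y, sequence_length=n_timesteps, sequence_stride=1, batch_size=batch_size
)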
This results in the following error: ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type int).
So, can you only use data of the same type with timeseries_dataset_from_array()? And if so, what can I do if I want to create training data sequences of multiple data types?
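I think that this answer sums it up nicely: a tensor can't hold different data types. timeseries_dataset_from_array() turns its input into a single tensor, and a DataFrame with both string and integer columns becomes an object-dtype NumPy array that cannot be converted.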
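One possible workaround, as a rough sketch rather than an official recipe: build one windowed dataset per data type and zip them together. The helper name make_windows and the dictionary keys are my own choices for illustration, and the sketch assumes both datasets use identical window settings with shuffle=False so that the windows stay aligned.

```python
import numpy as np
import pandas as pd
import tensorflow as tf

X = pd.DataFrame({
    "categorical": ["a", "b", "c"],
    "numerical": [1, 2, 3],
})
y = np.array([1, 2, 3])
n_timesteps = 1
batch_size = 1

# Mixed columns collapse into one object-dtype array, which TensorFlow rejects.
print(X.to_numpy().dtype)  # object


def make_windows(values, targets=None):
    # Hypothetical helper: the same window settings are used for every column
    # group, and shuffling is off, so the resulting datasets line up.
    return tf.keras.utils.timeseries_dataset_from_array(
        values,
        targets,
        sequence_length=n_timesteps,
        sequence_stride=1,
        batch_size=batch_size,
        shuffle=False,
    )


# One homogeneous array per data type; only one of them carries the targets.
cat_ds = make_windows(X[["categorical"]].to_numpy().astype(str))
num_ds = make_windows(X[["numerical"]].to_numpy().astype("float32"), y)

# Zip yields (cat, (num, y)); regroup into ({feature name: tensor}, target).
input_dataset = tf.data.Dataset.zip((cat_ds, num_ds)).map(
    lambda cat, num_and_y: (
        {"categorical": cat, "numerical": num_and_y[0]},
        num_and_y[1],
    )
)

for features, target in input_dataset.take(1):
    print(features["categorical"].dtype, features["numerical"].dtype)
```

A model consuming this dataset would presumably need two named inputs (for example, a string input followed by a StringLookup layer and a float input followed by Normalization), which are then concatenated before the LSTM.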