Tags: python, keras, encoding, one-hot-encoding

Keras CategoryEncoding layer with time sequences


For an LSTM, I create time sequences with tensorflow.keras.utils.timeseries_dataset_from_array(). For some of the features, I would like to apply one-hot encoding using Keras preprocessing layers.
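To see where the rank-3 input shape comes from, here is a rough NumPy approximation of the windowing that timeseries_dataset_from_array() performs with its defaults (sequence_stride=1, sampling_rate=1, no targets). It is a sketch, not the Keras implementation: make_windows is a hypothetical helper, and it returns all windows at once instead of yielding batches.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(data, n_timesteps):
    """Hypothetical helper approximating timeseries_dataset_from_array(data, None,
    sequence_length=n_timesteps) with sequence_stride=1 and sampling_rate=1."""
    # Sliding over axis 0 of an (N, F) array gives shape (N - T + 1, F, T)
    windows = sliding_window_view(data, window_shape=n_timesteps, axis=0)
    # Move the window axis next to the sample axis: (N - T + 1, T, F)
    return np.moveaxis(windows, -1, 1)


data = np.arange(25).reshape(25, 1)  # 25 time steps, 1 categorical feature
windows = make_windows(data, n_timesteps=20)
print(windows.shape)  # (6, 20, 1)
```

Each window is (n_timesteps, n_categorical_features); once Keras batches them, the batch axis appears as None, which gives the (None, 20, 1) shape in the error below.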

I have the following code:

    from tensorflow.keras.layers import Input, IntegerLookup, CategoryEncoding

    n_timesteps = 20
    n_categorical_features = 1

    # Symbolic input for the windows produced by timeseries_dataset_from_array()
    cat_inp = Input(shape=(n_timesteps, n_categorical_features), name="categorical_input")

    # Build the vocabulary of the categorical feature (X["br"] holds its raw values)
    index = IntegerLookup()
    index.adapt(X["br"])

    # One-hot encode the categorical input
    encoder = CategoryEncoding(num_tokens=index.vocabulary_size(), output_mode="one_hot")(cat_inp)

However, the last line raises this error:

    ValueError: Exception encountered when calling layer "category_encoding_22" (type CategoryEncoding). When output_mode is not 'int', maximum supported output rank is 2. Received output_mode one_hot and input shape (None, 20, 1), which would result in output rank 3.

The problem seems to be that CategoryEncoding does not support the shape of my input tensor, (None, n_timesteps, n_categorical_features).

How can I one-hot encode the input tensor produced by timeseries_dataset_from_array()?


Solution

  • Try wrapping CategoryEncoding in a TimeDistributed layer:

    encoder = tf.keras.layers.TimeDistributed(CategoryEncoding(num_tokens=index.vocabulary_size(), output_mode = "one_hot"))(cat_inp)
    

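    TimeDistributed applies CategoryEncoding to every timestep of the sequence, treating each timestep as a separate sample. The wrapped layer therefore only sees a rank-2 (batch, n_categorical_features) tensor, and the per-step outputs are stacked back into a (batch, n_timesteps, num_tokens) tensor for the single categorical feature here. See https://keras.io/api/layers/recurrent_layers/time_distributed/ for more information.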
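To make the mechanism concrete, here is a NumPy sketch of what the wrapper does conceptually: fold the time axis into the batch axis, call the inner layer once on the resulting rank-2 array, then unfold the result. one_hot_rank2 and time_distributed are hypothetical helpers standing in for CategoryEncoding(output_mode="one_hot") and TimeDistributed, not the Keras code. The inputs are assumed to already be integer indices in [0, num_tokens).

```python
import numpy as np


def one_hot_rank2(codes, num_tokens):
    """Hypothetical stand-in for CategoryEncoding(output_mode="one_hot") on a
    (batch, 1) integer input; returns a (batch, num_tokens) array."""
    codes = np.asarray(codes)
    if codes.ndim != 2 or codes.shape[1] != 1:
        raise ValueError("expected integer codes of shape (batch, 1)")
    idx = codes[:, 0]
    out = np.zeros((idx.shape[0], num_tokens), dtype=np.float32)
    out[np.arange(idx.shape[0]), idx] = 1.0  # set one column per row
    return out


def time_distributed(layer_fn, x):
    """Hypothetical sketch of TimeDistributed: fold time into batch, apply, unfold."""
    batch, steps = x.shape[0], x.shape[1]
    flat = x.reshape((batch * steps,) + x.shape[2:])   # (batch * T, F)
    y = layer_fn(flat)                                 # (batch * T, num_tokens)
    return y.reshape((batch, steps) + y.shape[1:])     # (batch, T, num_tokens)


x = np.array([[[0], [2], [1]], [[3], [3], [0]]])  # (batch=2, T=3, F=1)
y = time_distributed(lambda t: one_hot_rank2(t, num_tokens=4), x)
print(y.shape)  # (2, 3, 4)
```

Because the inner call only ever sees (batch * n_timesteps, 1), the rank limit of CategoryEncoding is respected. Note also that the question's code never calls index(cat_inp); since CategoryEncoding expects indices in [0, num_tokens), the raw values probably need to pass through the adapted IntegerLookup before encoding.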