python tensorflow machine-learning keras lstm

How can I use LSTM for tabular data?


I'm working on an LSTM model for network intrusion detection. My dataset is a table with 48 features and 8 class labels. Each row represents an instance of network traffic, and its label indicates whether the instance is benign (0) or one of seven attack types (1-7). I have created an LSTM model for traffic classification as follows:

model = keras.Sequential()
model.add(keras.layers.Input(shape=(None, 48)))
model.add(keras.layers.LSTM(256, activation='relu', return_sequences=True))
model.add(keras.layers.LSTM(256, activation='relu', return_sequences=True))
model.add(keras.layers.LSTM(128, activation='relu', return_sequences=False))
model.add(keras.layers.Dense(100, activation='relu'))
model.add(keras.layers.Dense(80, activation='relu'))
model.add(keras.layers.Dense(8, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['mae', 'accuracy'])

However, when I try to fit the model, I get an error:

ValueError: Exception encountered when calling layer 'sequential_2' (type Sequential).
    Input 0 of layer "lstm_4" is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 48)

Before that, I get the warning:

WARNING:tensorflow:Model was constructed with shape (None, None, 48) for input KerasTensor(type_spec=TensorSpec(shape=(None, None, 48), dtype=tf.float32, name='input_3'), name='input_3', description="created by layer 'input_3'"), but it was called on an input with incompatible shape (None, 48).

I guess I have to do something with the shape of my data, but I have no idea what exactly. Thank you very much for your help.


Solution

  • You mentioned that your data has 48 features and that each row is one instance of network traffic. If you treat each row as a sequence with a single timestep, then the input shape your network expects, (batch_size, n_timesteps, n_features), would be (None, 1, 48). (Note that an LSTM has no temporal context to work with when n_timesteps is 1, so you may want to increase n_timesteps, which you could do by windowing over your data.)

    Assuming that your input table is an array of shape (n_rows, 48), which I can tell from the warning message, you need to reshape your data. If your data is a NumPy array x, you can do this with np.expand_dims:

    x_reshaped = np.expand_dims(x, axis=1)
    

    np.expand_dims with axis=1 inserts a new axis of length 1 at position 1, so your data shape goes from (n_rows, 48) to (n_rows, 1, 48). Then you can call model.fit(x_reshaped, y) without hitting the shape error.
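
The windowing idea mentioned above can be sketched roughly as follows. This is a minimal illustration, not the only way to do it: the helper name make_windows, the dummy data, and the choice to label each window with the label of its last row are all assumptions made for the example, and it only makes sense if the rows of the table are in chronological order. It relies on np.lib.stride_tricks.sliding_window_view, which needs NumPy 1.20 or newer.

```python
import numpy as np


def make_windows(x, y, n_timesteps):
    """Stack consecutive rows into overlapping windows (hypothetical helper).

    x has shape (n_rows, n_features) and y has shape (n_rows,).
    Returns an array of shape (n_rows - n_timesteps + 1, n_timesteps, n_features)
    and, as an assumed labelling rule, the label of the last row of each window.
    """
    # sliding_window_view appends the window axis last:
    # (n_windows, n_features, n_timesteps)
    windows = np.lib.stride_tricks.sliding_window_view(x, n_timesteps, axis=0)
    # Reorder the axes to the (n_windows, n_timesteps, n_features) layout Keras expects
    windows = np.transpose(windows, (0, 2, 1))
    # The view is read-only, so copy it into a regular contiguous array
    return np.ascontiguousarray(windows), y[n_timesteps - 1:]


# Dummy stand-in for the real table: 10 rows, 48 features, labels in 0..7
x = np.arange(10 * 48, dtype=np.float32).reshape(10, 48)
y = np.arange(10) % 8

# One timestep per sample, as in the answer above
x_single = np.expand_dims(x, axis=1)

# Four consecutive rows per sample
x_windows, y_windows = make_windows(x, y, n_timesteps=4)

print(x_single.shape)
print(x_windows.shape, y_windows.shape)
```

With windows like these, the model from the question could be fitted with model.fit(x_windows, y_windows), since Input(shape=(None, 48)) accepts any number of timesteps. Whether windows that mix benign and attack rows are meaningful depends on how the traffic was recorded, so the labelling rule is something to decide for your own data.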