Keras Bidirectional LSTMs

I want to implement the following diagram. The dataset is made up of images. I divide each image into 12 patches and apply feature extraction to each patch, resulting in a feature vector of length 5376 per patch. I want to train a bidirectional LSTM on the 12 feature vectors of each image to classify the image into 1 of 4 categories. This is my code:

model = Sequential()
model.add(Bidirectional(LSTM(12, input_shape=(12, 5376), return_sequences=True, dropout=0.25, recurrent_dropout=0.1)))
model.add(Dense(4, activation="relu"))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(
    x_train,
    y_train,
    steps_per_epoch=10,
    epochs=2,
    validation_data=(x_val, y_val),
    validation_steps=10,
    verbose=1)

where,

x_train is an array (one entry per image) of arrays (the 12 feature vectors concatenated). Example: with 400 training images, x_train is an array of length 400 whose elements are arrays of length 64512 (12*5376).
y_train is an array of integers, where each integer represents one of the four classes for the corresponding image.
x_val and y_val are structured the same way.

I am not sure if the bidirectional lstm parameters are correct and I get the error

input_shape = (None,) + tuple(inputs.shape[1:])
AttributeError: 'list' object has no attribute 'shape'

Solution

There are a few issues with your code. I have listed them below, followed by a code example of what you are trying to implement.

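Your inputs must be a NumPy array rather than a Python list. Passing a list is what raises the AttributeError, and a NumPy array also lets you use reshape. The input shape of the model must be (patches, features), i.e. (12, 5376), so x_train needs the shape (images, 12, 5376).
The image you link stacks two (or more) bidirectional LSTMs over a sequence input and then adds a dense layer with 4 neurons as the output. Every LSTM except the last one needs return_sequences=True so that the next LSTM receives a full sequence, while the last one returns a single vector per image for the dense layer. See the code for details.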

You are trying to solve a multi-class, single-label problem. Use softmax as the output layer activation (not relu) and a categorical cross-entropy loss (not binary_crossentropy). Refer to this table for more details.

Lastly, as you mention, your labels are a 1D array in which each element is a class label from 0 to 3 (4 classes), not a one-hot vector. Since your model predicts 4 probabilities (1 for each class) while the labels are plain integers, you have to use sparse_categorical_crossentropy instead of categorical_crossentropy.
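The first point above (NumPy array plus reshape) can be sketched as follows. This is only an illustration under stated assumptions: x_train_list and y_train_list are hypothetical stand-ins for the question's data, the sizes are shrunk so it runs quickly, and it assumes each flat vector stores the 12 patch vectors back to back, so a row-major reshape recovers them in order.

```python
import numpy as np

# Assumed demo sizes; the question's real sizes would be 400, 12 and 5376.
n_images, n_patches, n_features = 8, 12, 16

# Hypothetical stand-in for the question's data: a Python list of flat vectors,
# each one being the 12 patch feature vectors concatenated end to end.
rng = np.random.default_rng(0)
x_train_list = [rng.random(n_patches * n_features) for _ in range(n_images)]
y_train_list = [int(v) for v in rng.integers(0, 4, size=n_images)]

# Convert the lists to NumPy arrays, then split each flat vector back into
# (patches, features). -1 lets NumPy infer the number of images.
x_train = np.asarray(x_train_list, dtype="float32").reshape(-1, n_patches, n_features)
y_train = np.asarray(y_train_list, dtype="int32")

print(x_train.shape)  # (images, patches, features)
print(y_train.shape)  # (images,)
```

If the patches were concatenated in a different order (for example interleaved feature by feature), a plain reshape would mix them up, so it is worth checking one image by hand before training.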

import numpy as np
from tensorflow.keras import layers, Model, utils

X = np.random.random((100, 12, 530))  # 100 images, 12 patches, 530 features instead of 5376
y = np.random.randint(0, 4, (100,))   # integer class labels in 0..3

inp = layers.Input((12, 530))  # input shape of (batch, 12, 530)
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(inp)
x = layers.Bidirectional(layers.LSTM(64))(x)
out = layers.Dense(4, activation='softmax')(x)

model = Model(inp, out)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=3)

utils.plot_model(model, show_layer_names=False, show_shapes=True)

Epoch 1/3
4/4 [==============================] - 0s 25ms/step - loss: 1.5111 - accuracy: 0.2400
Epoch 2/3
4/4 [==============================] - 0s 17ms/step - loss: 1.4031 - accuracy: 0.2600
Epoch 3/3
4/4 [==============================] - 0s 25ms/step - loss: 1.3954 - accuracy: 0.2600
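
To see why integer labels go with sparse_categorical_crossentropy, here is a small NumPy-only sketch of what the two losses compute. It is not the Keras implementation, and the probabilities and labels are made-up values for illustration. The sparse variant indexes the predicted probability of the true class directly, while the categorical variant expects one-hot labels. Both give the same number.

```python
import numpy as np

# Made-up softmax outputs for 3 images and 4 classes (each row sums to 1).
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.5, 0.2, 0.1],
                  [0.1, 0.1, 0.1, 0.7]])

# Integer labels, as in the question's y_train.
labels = np.array([0, 1, 3])

# Sparse form: pick the probability of the true class by integer index.
sparse_ce = -np.log(probs[np.arange(len(labels)), labels]).mean()

# Categorical form: the same labels must first be one-hot encoded.
one_hot = np.eye(4)[labels]
categorical_ce = -(one_hot * np.log(probs)).sum(axis=1).mean()

print(sparse_ce, categorical_ce)  # the two values match
```

So with y_train kept as integers you can use the sparse loss as is. If you prefer categorical_crossentropy, you would have to one-hot encode the labels first.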