Tags: tensorflow, keras, deep-learning, lstm

How can I reduce the dimension of data, loaded through the flow_from_directory function of ImageDataGenerator?


Since I load my data (images) from structured folders, I use the flow_from_directory function of the ImageDataGenerator class provided by Keras. I have no issues feeding this data to a CNN model, but when it comes to an LSTM model I get the following error: ValueError: Error when checking input: expected lstm_1_input to have 3 dimensions, but got array with shape (64, 28, 28, 1). How can I reduce the dimension of the input data while reading it via ImageDataGenerator objects, so that I can use an LSTM model instead of a CNN?

p.s. The input images are grayscale with shape (28, 28).

train_valid_datagen = ImageDataGenerator(validation_split=0.2)

train_gen = train_valid_datagen.flow_from_directory(
    directory=TRAIN_IMAGES_PATH,
    target_size=(28, 28),
    color_mode='grayscale',
    batch_size=64,
    class_mode='categorical',
    shuffle=True,
    subset='training'
)

Update: The LSTM model code:

inp = Input(shape=(28, 28, 1))
inp = Lambda(lambda x: squeeze(x, axis=-1))(inp)  # from 4D to 3D
x = LSTM(num_units, dropout=dropout, recurrent_dropout=recurrent_dropout, activation=activation_fn, return_sequences=True)(inp)
x = BatchNormalization()(x)
x = Dense(128, activation=activation_fn)(x)
output = Dense(nb_classes, activation='softmax', kernel_regularizer=l2(0.001))(x)

model = Model(inputs=inp, outputs=output)

Solution

  • You feed your network 4D data (your image batches) in order to stay compatible with ImageDataGenerator, and then you have to reshape it into 3D format for the LSTM.

    These are the possibilities:

    With only one channel, you can simply squeeze the last dimension:

    import tensorflow as tf
    from tensorflow.keras.layers import Input, Lambda, LSTM

    inp = Input(shape=(28, 28, 1))
    x = Lambda(lambda x: tf.squeeze(x, axis=-1))(inp)  # from 4D to 3D
    x = LSTM(32)(x)
    
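To see what the squeeze does to a batch, here is a minimal NumPy sketch (NumPy stands in for the tensor op; tf.squeeze drops the singleton channel axis the same way):

```python
import numpy as np

# A grayscale batch as produced by flow_from_directory: 4D.
batch = np.zeros((64, 28, 28, 1), dtype=np.float32)

# Dropping the singleton channel axis yields the 3D input the LSTM wants:
# (batch, timesteps, features), with each image row as one timestep.
batch_3d = np.squeeze(batch, axis=-1)
assert batch_3d.shape == (64, 28, 28)
```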

    If you have multiple channels (as with RGB images, or if you would like to apply an RNN after a Conv2D layer), a solution can be this:

    from tensorflow.keras.layers import Input, Conv2D, Reshape, LSTM

    inp = Input(shape=(28, 28, 1))
    x = Conv2D(32, 3, padding='same', activation='relu')(inp)
    x = Reshape((28, 28 * 32))(x)  # from 4D to 3D: merge width and channels
    x = LSTM(32)(x)
    
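The Reshape merges the width and channel axes into a single feature axis, so each of the 28 rows becomes a timestep of 28 * 32 = 896 features; a NumPy sketch of the same reshape (shapes taken from the snippet above):

```python
import numpy as np

# Conv2D output for a batch of 2 images: (batch, height, width, channels).
conv_out = np.zeros((2, 28, 28, 32), dtype=np.float32)

# Merge width and channels into the feature axis: 28 * 32 = 896 features
# per timestep, matching Reshape((28, 28 * 32)).
lstm_in = conv_out.reshape(2, 28, 28 * 32)
assert lstm_in.shape == (2, 28, 896)
```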

    Training then works as usual with model.fit_generator.


    UPDATE: model review

    inp = Input(shape=(28, 28, 1))
    x = Lambda(lambda x: tf.squeeze(x, axis=-1))(inp)  # from 4D to 3D
    x = LSTM(32, dropout=dropout, recurrent_dropout=recurrent_dropout, activation=activation_fn, return_sequences=False)(x)
    x = BatchNormalization()(x)
    x = Dense(128, activation=activation_fn)(x)
    output = Dense(nb_classes, activation='softmax', kernel_regularizer=l2(0.001))(x)
    
    model = Model(inputs=inp, outputs=output)
    model.summary()
    

    Pay attention when you define the inp variable: don't overwrite it (in your code the Lambda output replaced the Input tensor, so Model(inputs=inp, ...) no longer received the actual input layer).

    Set return_sequences=False in the LSTM in order to get a 2D output for the final Dense layers.