Tags: deep-learning, lstm, recurrent-neural-network, autoencoder

Error received while building the autoencoder


I am trying to build an autoencoder for my term project, using a CNN as the encoder and an LSTM as the decoder. However, when I display the summary of the model, I receive the following error:

ValueError: Input 0 is incompatible with layer lstm_10: expected ndim=3, found ndim=2

x.shape = (45406, 100, 100)
y.shape = (45406,)

I already tried changing the shape of the input for the LSTM, but it didn't work.

# Assuming the standard Keras imports (not shown in the original post):
from keras.models import Sequential, Model
from keras.layers import (Lambda, Conv2D, BatchNormalization, Activation,
                          MaxPooling2D, Flatten, LSTM, Dropout, Dense)

def keras_model(image_x, image_y):

    model = Sequential()
    model.add(Lambda(lambda x: x / 127.5 - 1., input_shape=(image_x, image_y, 1)))

    last = model.output
    x = Conv2D(3, (3, 3), padding='same')(last)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = MaxPooling2D((2, 2), padding='valid')(x)

    encoded = Flatten()(x)  # 2-D output: (batch, 50*50*3)
    x = LSTM(8, return_sequences=True, input_shape=(100, 100))(encoded)  # fails: LSTM expects 3-D input
    decoded = LSTM(64, return_sequences=True)(x)

    x = Dropout(0.5)(decoded)
    x = Dense(400, activation='relu')(x)
    x = Dense(25, activation='relu')(x)
    final = Dense(1, activation='relu')(x)

    autoencoder = Model(model.input, final)

    autoencoder.compile(optimizer="Adam", loss="mse")
    autoencoder.summary()

model = keras_model(100, 100)

Solution

  • Since you are using an LSTM, you need a time dimension, so your input shape should be (time, image_x, image_y, nb_image_channels).

    I would suggest getting a more in-depth understanding of autoencoders, LSTMs and 2D convolution, since all of these play together here. These are helpful intros: https://machinelearningmastery.com/lstm-autoencoders/ and https://blog.keras.io/building-autoencoders-in-keras.html.

    Also have a look at this example, where someone implemented an LSTM with Conv2D: How to reshape 3 channel dataset for input to neural network. The TimeDistributed layer comes in useful here (see the sketch after the code below).

    However, just to get your error fixed, you can add a Reshape() layer to fake the extra dimension:

    # Assuming the standard Keras imports (not shown in the original post):
    from keras.models import Sequential, Model
    from keras.layers import (Lambda, Conv2D, BatchNormalization, Activation,
                              MaxPooling2D, Flatten, Reshape, LSTM, Dropout, Dense)

    def keras_model(image_x, image_y):

        model = Sequential()
        model.add(Lambda(lambda x: x / 127.5 - 1., input_shape=(image_x, image_y, 1)))

        last = model.output
        x = Conv2D(3, (3, 3), padding='same')(last)
        x = BatchNormalization()(x)
        x = Activation('relu')(x)
        x = MaxPooling2D((2, 2), padding='valid')(x)

        encoded = Flatten()(x)
        # (50, 50, 3) is the output shape of the max pooling layer (see model summary),
        # so treat the flattened vector as 50*50*3 timesteps of 1 feature each
        encoded = Reshape((50 * 50 * 3, 1))(encoded)
        x = LSTM(8, return_sequences=True)(encoded)  # input_shape can be removed
        decoded = LSTM(64, return_sequences=True)(x)

        x = Dropout(0.5)(decoded)
        x = Dense(400, activation='relu')(x)
        x = Dense(25, activation='relu')(x)
        final = Dense(1, activation='relu')(x)

        autoencoder = Model(model.input, final)

        autoencoder.compile(optimizer="Adam", loss="mse")
        autoencoder.summary()
        return autoencoder  # return the model so the caller actually gets it

    model = keras_model(100, 100)
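
    Note that to feed the original x of shape (45406, 100, 100) into this model, a channel axis still has to be added, e.g. x = np.expand_dims(x, -1) to get (45406, 100, 100, 1).

    If you later want a real time dimension instead of a faked one, a rough sketch of the TimeDistributed approach mentioned above could look like the following. This is not code from the question: the sequence length of 10, the layer sizes and the plain Keras imports are illustrative assumptions.

    from keras.models import Model
    from keras.layers import (Input, TimeDistributed, Conv2D, BatchNormalization,
                              Activation, MaxPooling2D, Flatten, LSTM, Dense)

    def keras_sequence_model(time_steps, image_x, image_y):
        # Illustrative sketch: the input is now a sequence of frames,
        # shaped (time, height, width, channels)
        inp = Input(shape=(time_steps, image_x, image_y, 1))

        # TimeDistributed applies the same Conv2D encoder to every frame
        x = TimeDistributed(Conv2D(3, (3, 3), padding='same'))(inp)
        x = TimeDistributed(BatchNormalization())(x)
        x = TimeDistributed(Activation('relu'))(x)
        x = TimeDistributed(MaxPooling2D((2, 2), padding='valid'))(x)
        x = TimeDistributed(Flatten())(x)  # -> (batch, time_steps, 50*50*3)

        # The LSTM now receives a genuine 3-D tensor: (batch, time, features)
        x = LSTM(64)(x)
        final = Dense(1, activation='relu')(x)

        model = Model(inp, final)
        model.compile(optimizer="Adam", loss="mse")
        return model

    model = keras_sequence_model(10, 100, 100)
    model.summary()

    This also changes the required data layout: consecutive frames would first have to be grouped into windows, giving x a shape of (n_samples, time_steps, 100, 100, 1).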