I am trying to build an autoencoder for my term project, using a CNN as the encoder and an LSTM as the decoder. However, when I display the summary of the model, I receive the following error:
ValueError: Input 0 is incompatible with layer lstm_10: expected ndim=3, found ndim=2
x.shape = (45406, 100, 100)
y.shape = (45406,)
I already tried changing the shape of the input for the LSTM, but it didn't work.
def keras_model(image_x, image_y):
    model = Sequential()
    model.add(Lambda(lambda x: x / 127.5 - 1., input_shape=(image_x, image_y, 1)))
    last = model.output
    x = Conv2D(3, (3, 3), padding='same')(last)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = MaxPooling2D((2, 2), padding='valid')(x)
    encoded = Flatten()(x)
    x = LSTM(8, return_sequences=True, input_shape=(100, 100))(encoded)
    decoded = LSTM(64, return_sequences=True)(x)
    x = Dropout(0.5)(decoded)
    x = Dense(400, activation='relu')(x)
    x = Dense(25, activation='relu')(x)
    final = Dense(1, activation='relu')(x)
    autoencoder = Model(model.input, final)
    autoencoder.compile(optimizer="Adam", loss="mse")
    autoencoder.summary()

model = keras_model(100, 100)
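Since you are using an LSTM, you need a time dimension. An LSTM expects 3D input of shape (batch, timesteps, features), but Flatten() outputs a 2D tensor of shape (batch, features), which is why you get "expected ndim=3, found ndim=2". To feed sequences of images, your input shape should be: (time, image_x, image_y, nb_image_channels).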
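To see where the 2D tensor comes from, the layer output shapes can be traced by hand. The following is a minimal pure-Python sketch, not Keras code: the helper names (conv2d_same, max_pool_valid, flatten) are hypothetical and only mimic the shape arithmetic of the corresponding layers, assuming a 100x100 single-channel input as in the question.

```python
# Hypothetical helpers that reproduce only the shape arithmetic of the layers.
def conv2d_same(shape, filters):
    """Conv2D with padding='same' and stride 1 keeps height and width."""
    height, width, _ = shape
    return (height, width, filters)

def max_pool_valid(shape, pool=2):
    """MaxPooling2D with padding='valid' and stride equal to the pool size."""
    height, width, channels = shape
    return (height // pool, width // pool, channels)

def flatten(shape):
    """Flatten collapses all non-batch axes into one."""
    size = 1
    for dim in shape:
        size *= dim
    return (size,)

shape = (100, 100, 1)            # per-sample input shape (no batch axis)
shape = conv2d_same(shape, 3)    # (100, 100, 3)
shape = max_pool_valid(shape)    # (50, 50, 3)
flat = flatten(shape)            # (7500,)

# Adding the batch axis, the flattened tensor has ndim 2, but an LSTM needs 3.
flat_ndim = 1 + len(flat)
# Reshaping to (timesteps, features) restores the missing axis.
reshaped = (flat[0], 1)
reshaped_ndim = 1 + len(reshaped)
print(shape, flat, flat_ndim, reshaped, reshaped_ndim)
```

The trace shows the Flatten output has only a batch axis and a feature axis, so the LSTM has no timestep axis to iterate over until one is added.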
I would suggest getting a more in-depth understanding of autoencoders, LSTMs and 2D convolution, as all of these play together here. These are helpful intros: https://machinelearningmastery.com/lstm-autoencoders/ and https://blog.keras.io/building-autoencoders-in-keras.html.
Also have a look at this example, where someone implemented an LSTM with Conv2D: "How to reshape 3 channel dataset for input to neural network". The TimeDistributed layer is useful here.
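As a rough illustration of the TimeDistributed approach, here is a minimal sketch that applies the same convolutional stack to every frame of a sequence and then feeds the per-frame features to an LSTM. It is not your model: the function name build_sequence_model, the 5 time steps, and the layer sizes are assumptions chosen for the example, and it assumes your data can be arranged as sequences of images.

```python
from keras.models import Model
from keras.layers import (Input, TimeDistributed, Conv2D, MaxPooling2D,
                          Flatten, LSTM, Dense)

def build_sequence_model(time_steps, image_x, image_y, channels=1):
    """Hypothetical example: per-frame CNN features followed by an LSTM."""
    # Input carries an explicit time axis: (time, height, width, channels)
    inputs = Input(shape=(time_steps, image_x, image_y, channels))
    # TimeDistributed applies the wrapped layer to each time step separately
    x = TimeDistributed(Conv2D(3, (3, 3), padding='same', activation='relu'))(inputs)
    x = TimeDistributed(MaxPooling2D((2, 2)))(x)
    # Flattening per frame keeps the time axis: (batch, time, features)
    x = TimeDistributed(Flatten())(x)
    # The LSTM now receives the 3D input it expects
    x = LSTM(8)(x)
    outputs = Dense(1)(x)
    return Model(inputs, outputs)

model = build_sequence_model(time_steps=5, image_x=100, image_y=100)
model.summary()
```

With this layout the time axis comes from the data itself instead of being created artificially from the flattened pixels.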
However, just to get your error fixed you can add a Reshape() layer to fake the extra dimension:
from keras.models import Sequential, Model
from keras.layers import (Lambda, Conv2D, BatchNormalization, Activation,
                          MaxPooling2D, Flatten, Reshape, LSTM, Dropout, Dense)

def keras_model(image_x, image_y):
    model = Sequential()
    # Scale pixel values from [0, 255] to [-1, 1]
    model.add(Lambda(lambda x: x / 127.5 - 1., input_shape=(image_x, image_y, 1)))
    last = model.output
    x = Conv2D(3, (3, 3), padding='same')(last)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = MaxPooling2D((2, 2), padding='valid')(x)
    encoded = Flatten()(x)
    # (50, 50, 3) is the output shape of the max pooling layer (see model summary)
    # Reshape to (timesteps, features) so the LSTM receives 3D input
    encoded = Reshape((50*50*3, 1))(encoded)
    x = LSTM(8, return_sequences=True)(encoded)  # input_shape can be removed
    decoded = LSTM(64, return_sequences=True)(x)
    x = Dropout(0.5)(decoded)
    x = Dense(400, activation='relu')(x)
    x = Dense(25, activation='relu')(x)
    final = Dense(1, activation='relu')(x)
    autoencoder = Model(model.input, final)
    autoencoder.compile(optimizer="Adam", loss="mse")
    autoencoder.summary()  # summary() prints itself, no print() needed
    return autoencoder     # return the model so the caller can train it

model = keras_model(100, 100)