python tensorflow keras conv-neural-network autoencoder

Autoencoder working for MNIST but crashing for images with larger size


I followed the Keras tutorial to build an autoencoder for MNIST, and it worked. I then switched to different images and changed the input shape from (28, 28, 1) to (150, 150, 3), and now I receive the following error:

ValueError: Error when checking target: expected conv2d_6 to have shape (148, 148, 1) but got array with shape (150, 150, 3)

Autoencoder architecture:

input_img = Input(shape=(150, 150, 3))

x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer=Adam(0.01), loss='binary_crossentropy')

Training settings:

autoencoder.fit(x_train, y_train,
                epochs=50,
                batch_size=512,
                shuffle=True,
                validation_data=(x_test, y_test))

My data shapes are as follows:

x_train shape: (4022, 150, 150, 3)
y_train shape: (4022, 150, 150, 3)
x_test shape: (447, 150, 150, 3)
y_test shape: (447, 150, 150, 3)

Colaboratory link to my workspace:

https://colab.research.google.com/drive/1C8RX7OYS2BXaHJh6VOMscxEbrFTuQY5H


Solution

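  • The decoder has two problems: the Conv2D(16, ...) layer without padding='same' shrinks the feature map, so the output ends up 148 x 148 instead of 150 x 150, and the last layer outputs 1 channel instead of 3. The following code fixes both: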

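    # Imports assume standalone Keras; with TensorFlow's bundled Keras,
    # import the same names from tensorflow.keras instead.
    from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, ZeroPadding2D
    from keras.models import Model
    from keras.optimizers import Adam

    input_img = Input(shape=(150, 150, 3))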
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    encoded = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    x = UpSampling2D((2, 2))(x)
    # Pad 76 x 76 to 78 x 78 so the unpadded conv below returns 76 x 76
    x = ZeroPadding2D(padding=(1, 1))(x)
    x = Conv2D(16, (3, 3), activation='relu')(x)
    x = UpSampling2D((2, 2))(x)
    decoded = Conv2D(3, (3, 3), activation='sigmoid', padding='valid')(x)
    autoencoder = Model(input_img, decoded)
    autoencoder.compile(optimizer=Adam(0.01), loss='binary_crossentropy')
    

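    I added a ZeroPadding2D layer before the unpadded convolution and changed the last Conv2D layer to output 3 channels with padding='valid', which trims the 152 x 152 upsampled map down to 150 x 150.

    print(autoencoder.summary()) then gives the following (the trailing None is the return value of summary()):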

    Layer (type)                 Output Shape              Param #   
    =================================================================
    input_10 (InputLayer)        (None, 150, 150, 3)       0         
    _________________________________________________________________
    conv2d_58 (Conv2D)           (None, 150, 150, 16)      448       
    _________________________________________________________________
    max_pooling2d_25 (MaxPooling (None, 75, 75, 16)        0         
    _________________________________________________________________
    conv2d_59 (Conv2D)           (None, 75, 75, 8)         1160      
    _________________________________________________________________
    max_pooling2d_26 (MaxPooling (None, 38, 38, 8)         0         
    _________________________________________________________________
    conv2d_60 (Conv2D)           (None, 38, 38, 8)         584       
    _________________________________________________________________
    max_pooling2d_27 (MaxPooling (None, 19, 19, 8)         0         
    _________________________________________________________________
    conv2d_61 (Conv2D)           (None, 19, 19, 8)         584       
    _________________________________________________________________
    up_sampling2d_25 (UpSampling (None, 38, 38, 8)         0         
    _________________________________________________________________
    conv2d_62 (Conv2D)           (None, 38, 38, 8)         584       
    _________________________________________________________________
    up_sampling2d_26 (UpSampling (None, 76, 76, 8)         0         
    _________________________________________________________________
    zero_padding2d_4 (ZeroPaddin (None, 78, 78, 8)         0         
    _________________________________________________________________
    conv2d_63 (Conv2D)           (None, 76, 76, 16)        1168      
    _________________________________________________________________
    up_sampling2d_27 (UpSampling (None, 152, 152, 16)      0         
    _________________________________________________________________
    conv2d_64 (Conv2D)           (None, 150, 150, 3)       435       
    =================================================================
    Total params: 4,963
    Trainable params: 4,963
    Non-trainable params: 0
    _________________________________________________________________
    None
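
The shapes and parameter counts in the summary can be checked by hand. The sketch below is plain Python and does not call Keras; the `trace` helper and the tuple-based layer descriptions are made up for illustration, and it assumes square inputs, stride-1 convolutions, and 'same' padding on the pooling layers, as in the code above. It shows why the original decoder works for 28 x 28 MNIST images but produces (148, 148, 1) for 150 x 150 inputs, and why the padded version returns (150, 150, 3).

```python
import math

def trace(layers, size, channels):
    """Follow a square feature map through the layers.

    Returns (side length, channels, total trainable parameters).
    Hypothetical helper for illustration; not part of Keras.
    """
    params = 0
    for layer in layers:
        kind = layer[0]
        if kind == "conv":
            _, filters, kernel, padding = layer
            # weights (kernel * kernel * channels_in) plus one bias, per filter
            params += (kernel * kernel * channels + 1) * filters
            if padding == "valid":
                size = size - kernel + 1  # an unpadded 3x3 conv loses 2 pixels
            channels = filters
        elif kind == "pool":
            size = math.ceil(size / layer[1])  # MaxPooling2D with padding='same'
        elif kind == "up":
            size *= layer[1]
        elif kind == "pad":
            size += 2 * layer[1]  # ZeroPadding2D pads both sides
    return size, channels, params

# Layers shared by the question's model and the fixed model
shared = [
    ("conv", 16, 3, "same"), ("pool", 2),
    ("conv", 8, 3, "same"), ("pool", 2),
    ("conv", 8, 3, "same"), ("pool", 2),
    ("conv", 8, 3, "same"), ("up", 2),
    ("conv", 8, 3, "same"), ("up", 2),
]

original = shared + [("conv", 16, 3, "valid"), ("up", 2), ("conv", 1, 3, "same")]
fixed = shared + [("pad", 1), ("conv", 16, 3, "valid"), ("up", 2),
                  ("conv", 3, 3, "valid")]

print(trace(original, 28, 1)[:2])   # (28, 1): matches MNIST targets
print(trace(original, 150, 3)[:2])  # (148, 1): the shape in the error message
print(trace(fixed, 150, 3))         # (150, 3, 4963): matches the summary
```

The unpadded convolution happens to be what makes the MNIST case work: 28 pools down to 4, upsamples to 16, and the missing 2 pixels bring it to 14 before the final upsampling to 28. For 150, the pooling rounds 75 up to 38, so the decoder reaches 76 instead of 75 and the same trick no longer lands on the input size.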