Tags: python, tensorflow, conv-neural-network, autoencoder

Keras CNN autoencoder for high-resolution images


I got this autoencoder model from a tutorial:

from tensorflow import keras
from tensorflow.keras import layers

input_img = layers.Input(shape=(28,28,1))
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)


x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(encoded)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(32, (3, 3), activation='relu')(x)
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)


autoencoder = keras.Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

With this input the model returns an output of shape (28, 28, 1). But when I change the input shape to (512, 512, 1), the output shape becomes (508, 508, 1). How can I adjust the model so that the output is (512, 512, 1)?
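
The two shapes reported above can be reproduced with plain arithmetic, without building the model. The following is only a sketch: the helper names (conv_out, pool_out, up_out, trace) are made up for illustration and are not Keras functions. It assumes stride-1 convolutions, that padding='same' pooling rounds the size up, and that a Conv2D without a padding argument uses 'valid'.

```python
import math


def conv_out(size, kernel=3, padding="same"):
    """Spatial size after a stride-1 convolution."""
    return size if padding == "same" else size - kernel + 1


def pool_out(size, pool=2):
    """Spatial size after 2x2 max pooling with padding='same' (rounds up)."""
    return math.ceil(size / pool)


def up_out(size, factor=2):
    """Spatial size after UpSampling2D."""
    return size * factor


def trace(size, third_decoder_padding="valid"):
    """Follow one spatial dimension through the autoencoder from the question."""
    # Encoder: three conv('same') + pool('same') stages.
    for _ in range(3):
        size = pool_out(conv_out(size))
    # Decoder: conv + upsample, three times. The third conv is the one
    # written without a padding argument in the question.
    size = up_out(conv_out(size))
    size = up_out(conv_out(size))
    size = up_out(conv_out(size, padding=third_decoder_padding))
    # Final single-filter conv uses padding='same'.
    return conv_out(size)


print(trace(28))           # 28
print(trace(512))          # 508
print(trace(512, "same"))  # 512
```

One thing this trace makes visible: 28 is not divisible by 8, so the encoder rounds 7 up to 4 and the decoder overshoots to 16 before the unpadded convolution trims it to 14. With padding='same' everywhere, trace(28, "same") gives 32, which is presumably why the tutorial left the padding off for 28x28 inputs.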


Solution

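  • Add padding='same' to the line below. Without it, Conv2D falls back to
    its default padding='valid', so the 3x3 kernel shrinks 256 to 254, and
    the last UpSampling2D doubles that to 508 instead of 512: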

    x = layers.Conv2D(32, (3, 3), activation='relu')(x)
    

    Full code:

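    from tensorflow import keras
    from tensorflow.keras import layers

    # Encoder: 512 -> 256 -> 128 -> 64
    input_img = layers.Input(shape=(512,512,1))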
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
    x = layers.MaxPooling2D((2, 2), padding='same')(x)
    x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D((2, 2), padding='same')(x)
    x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
    encoded = layers.MaxPooling2D((2, 2), padding='same')(x)
    
    
    # Decoder: 64 -> 128 -> 256 -> 512
    x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(encoded)
    x = layers.UpSampling2D((2, 2))(x)
    x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
    x = layers.UpSampling2D((2, 2))(x)
    # The fix: padding='same' keeps this layer at 256x256 instead of 254x254
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    x = layers.UpSampling2D((2, 2))(x)
    decoded = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
    
    
    autoencoder = keras.Model(input_img, decoded)
    autoencoder.summary()
    autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
    

    Summary of model:

    Model: "model"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    input_1 (InputLayer)         [(None, 512, 512, 1)]     0         
    _________________________________________________________________
    conv2d (Conv2D)              (None, 512, 512, 32)      320       
    _________________________________________________________________
    max_pooling2d (MaxPooling2D) (None, 256, 256, 32)      0         
    _________________________________________________________________
    conv2d_1 (Conv2D)            (None, 256, 256, 16)      4624      
    _________________________________________________________________
    max_pooling2d_1 (MaxPooling2 (None, 128, 128, 16)      0         
    _________________________________________________________________
    conv2d_2 (Conv2D)            (None, 128, 128, 16)      2320      
    _________________________________________________________________
    max_pooling2d_2 (MaxPooling2 (None, 64, 64, 16)        0         
    _________________________________________________________________
    conv2d_3 (Conv2D)            (None, 64, 64, 16)        2320      
    _________________________________________________________________
    up_sampling2d (UpSampling2D) (None, 128, 128, 16)      0         
    _________________________________________________________________
    conv2d_4 (Conv2D)            (None, 128, 128, 16)      2320      
    _________________________________________________________________
    up_sampling2d_1 (UpSampling2 (None, 256, 256, 16)      0         
    _________________________________________________________________
    conv2d_5 (Conv2D)            (None, 256, 256, 32)      4640      
    _________________________________________________________________
    up_sampling2d_2 (UpSampling2 (None, 512, 512, 32)      0         
    _________________________________________________________________
    conv2d_6 (Conv2D)            (None, 512, 512, 1)       289       
    =================================================================
    Total params: 16,833
    Trainable params: 16,833
    Non-trainable params: 0
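
As a sanity check on the summary above, the Param # column can be recomputed by hand. This is a rough sketch, assuming the standard Conv2D count of kernel weights plus one bias per filter; conv2d_params and conv_layers are illustrative names, not part of Keras. The pooling and upsampling layers have no trainable parameters, so only the seven convolutions contribute.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Kernel weights plus one bias per filter for a standard Conv2D layer."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


# (input channels, filters) for conv2d through conv2d_6, read off the summary.
conv_layers = [(1, 32), (32, 16), (16, 16), (16, 16), (16, 16), (16, 32), (32, 1)]

counts = [conv2d_params(3, 3, c_in, f) for c_in, f in conv_layers]
print(counts)       # [320, 4624, 2320, 2320, 2320, 4640, 289]
print(sum(counts))  # 16833
```

The total does not depend on the input resolution, so the same 16,833 parameters apply to the 28x28 and the 512x512 versions of the model.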