Very briefly, my question is about image size: after a max-pooling layer, the output is no longer the same size as the input image, even though I use padding='same' in my Keras code. I am working through the Keras blog post "Building Autoencoders in Keras" and building the convolutional autoencoder. The autoencoder code is as follows:
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model

input_layer = Input(shape=(28, 28, 1))
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_layer)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
# at this point the representation is (4, 4, 8) i.e. 128-dimensional
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_layer, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
According to autoencoder.summary(), the output of the very first layer, Conv2D(16, (3, 3), activation='relu', padding='same')(input_layer), is 28 x 28 x 16, i.e. the same spatial size as the input image. This is because padding is 'same'.
In [49]: autoencoder.summary()
(The layer numbers on the left were added by me; they are not part of the output.)
_________________________________________________________________
Layer (type)                     Output Shape              Param #
=================================================================
1.  input_1 (InputLayer)         (None, 28, 28, 1)         0
_________________________________________________________________
2.  conv2d_1 (Conv2D)            (None, 28, 28, 16)        160
_________________________________________________________________
3.  max_pooling2d_1 (MaxPooling2 (None, 14, 14, 16)        0
_________________________________________________________________
4.  conv2d_2 (Conv2D)            (None, 14, 14, 8)         1160
_________________________________________________________________
5.  max_pooling2d_2 (MaxPooling2 (None, 7, 7, 8)           0
_________________________________________________________________
6.  conv2d_3 (Conv2D)            (None, 7, 7, 8)           584
_________________________________________________________________
7.  max_pooling2d_3 (MaxPooling2 (None, 4, 4, 8)           0
_________________________________________________________________
8.  conv2d_4 (Conv2D)            (None, 4, 4, 8)           584
_________________________________________________________________
9.  up_sampling2d_1 (UpSampling2 (None, 8, 8, 8)           0
_________________________________________________________________
10. conv2d_5 (Conv2D)            (None, 8, 8, 8)           584
_________________________________________________________________
11. up_sampling2d_2 (UpSampling2 (None, 16, 16, 8)         0
_________________________________________________________________
12. conv2d_6 (Conv2D)            (None, 14, 14, 16)        1168
_________________________________________________________________
13. up_sampling2d_3 (UpSampling2 (None, 28, 28, 16)        0
_________________________________________________________________
14. conv2d_7 (Conv2D)            (None, 28, 28, 1)         145
=================================================================
The next layer (layer 3) is MaxPooling2D((2, 2), padding='same')(x). summary() shows the output size of this layer as 14 x 14 x 16, but padding in this layer is also 'same'. So why doesn't the output stay 28 x 28 x 16, filled out with padded zeros?
It is also not clear to me how the output shape becomes 14 x 14 x 16 after layer 12, when the input coming from the layer before it is 16 x 16 x 8.
There seems to be a misunderstanding of what padding does. Padding only handles the border cases (what to do when the window runs past the edge of the image); it does not stop a layer from downsampling. You have a 2x2 max-pooling operation, and in Keras the default stride equals the pool size, so the stride is 2, which halves the image size. With padding='same', the output length is ceil(input / stride), which is also why layer 7 turns its 7 x 7 input into 4 x 4 rather than 3 x 3. To keep the size unchanged, you need to set strides=1 explicitly. From the Keras docs:
pool_size: integer or tuple of 2 integers, factors by which to downscale (vertical, horizontal). (2, 2) will halve the input in both spatial dimension. If only one integer is specified, the same window length will be used for both dimensions.
strides: Integer, tuple of 2 integers, or None. Strides values. If None, it will default to pool_size.
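As a rough illustration of the rule quoted above, here is a minimal pure-Python sketch (not Keras code) of the spatial size a 2D max-pooling layer produces along one axis. The function name pool_output_size is hypothetical, and the formulas assume TensorFlow's usual conventions: ceil(n / stride) for 'same' and ceil((n - pool_size + 1) / stride) for 'valid'. In Keras itself, the strides=1 case corresponds to MaxPooling2D((2, 2), strides=(1, 1), padding='same').

```python
import math


def pool_output_size(n, pool_size, strides=None, padding="valid"):
    """Hypothetical helper: spatial length after 2D max pooling along one axis.

    Mirrors the documented default that strides=None falls back to pool_size.
    Assumes TensorFlow's output-size rules for 'same' and 'valid'.
    """
    if strides is None:
        strides = pool_size
    if padding == "same":
        # Just enough zero-padding so every input position is covered
        return math.ceil(n / strides)
    if padding == "valid":
        # No padding: each pooling window must fit inside the input
        return math.ceil((n - pool_size + 1) / strides)
    raise ValueError(f"unknown padding: {padding!r}")


print(pool_output_size(28, 2, padding="same"))             # default strides=2 -> 14
print(pool_output_size(28, 2, strides=1, padding="same"))  # strides=1 -> 28
print(pool_output_size(7, 2, padding="same"))              # ceil(7 / 2) -> 4
print(pool_output_size(7, 2, padding="valid"))             # 'valid' drops the edge -> 3
```

The last two lines show that padding='same' only changes how the leftover edge row or column is handled (kept as 4 instead of dropped to 3); the halving itself comes from the stride.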
For the second question: layer 12 does not specify padding='same', so it uses the Conv2D default, padding='valid'. With no padding, a 3x3 kernel fits in only 16 - 3 + 1 = 14 positions along each axis, so the 16 x 16 x 8 input becomes 14 x 14, and the channel count becomes 16 because the layer has 16 filters.
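To check this against the whole summary, here is a minimal pure-Python sketch that replays the autoencoder layer by layer with the same size rules (assumed to be TensorFlow's: ceil(n / stride) for 'same', ceil((n - k + 1) / stride) for 'valid'). The helpers output_size and trace_autoencoder and the layer table are hypothetical, written only for this illustration; no Keras is needed to run it.

```python
import math


def output_size(n, window, stride, padding):
    """Length of one spatial axis after a conv/pool window (assumed TF rules)."""
    if padding == "same":
        # Zero-padding is added so that the output length is ceil(n / stride)
        return math.ceil(n / stride)
    if padding == "valid":
        # No padding: the window must fit entirely inside the input
        return math.ceil((n - window + 1) / stride)
    raise ValueError(f"unknown padding: {padding!r}")


def trace_autoencoder(layer12_padding="valid", size=28):
    """Hypothetical tracer: (height, width, channels) after each layer of the
    autoencoder in the question, for a square size x size x 1 input."""
    # (name, kind, window, padding, filters); filters=None keeps the channel count
    layers = [
        ("conv2d_1", "conv", 3, "same", 16),
        ("max_pooling2d_1", "pool", 2, "same", None),
        ("conv2d_2", "conv", 3, "same", 8),
        ("max_pooling2d_2", "pool", 2, "same", None),
        ("conv2d_3", "conv", 3, "same", 8),
        ("max_pooling2d_3", "pool", 2, "same", None),
        ("conv2d_4", "conv", 3, "same", 8),
        ("up_sampling2d_1", "up", 2, None, None),
        ("conv2d_5", "conv", 3, "same", 8),
        ("up_sampling2d_2", "up", 2, None, None),
        ("conv2d_6", "conv", 3, layer12_padding, 16),  # layer 12 in the summary
        ("up_sampling2d_3", "up", 2, None, None),
        ("conv2d_7", "conv", 3, "same", 1),
    ]
    channels = 1
    shapes = {}
    for name, kind, window, padding, filters in layers:
        if kind == "conv":
            # Conv2D defaults to strides=1
            size = output_size(size, window, 1, padding)
            channels = filters
        elif kind == "pool":
            # Pooling strides default to pool_size
            size = output_size(size, window, window, padding)
        else:
            # Upsampling repeats each row and column `window` times
            size *= window
        shapes[name] = (size, size, channels)
    return shapes


as_written = trace_autoencoder()
print(as_written["conv2d_6"], as_written["conv2d_7"])  # (14, 14, 16) (28, 28, 1)
padded = trace_autoencoder(layer12_padding="same")
print(padded["conv2d_6"], padded["conv2d_7"])          # (16, 16, 16) (32, 32, 1)
```

With padding='same' on layer 12, the decoder would end at 32 x 32 x 1, which no longer matches the 28 x 28 x 1 input; leaving that one layer at 'valid' is what brings the output back to 28 x 28.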