
Why does this Autoencoder consisting of all convolutions keep pushing the output to a blank/white image?


I am having a lot of trouble understanding the behaviour of my model and need some help to figure it out.
Suppose this autoencoder, consisting entirely of convolutional layers:

from tensorflow.keras.layers import Input, Conv2D, BatchNormalization, Activation, UpSampling2D
from tensorflow.keras.models import Model
from tensorflow.keras.initializers import he_uniform, glorot_uniform
from tensorflow.keras.optimizers import Adam

initializer = he_uniform()

#Input
input_tensor_a = Input(shape=(128,128,3))

#Encoder
conv1 = Conv2D(64, kernel_size=7, padding='same', kernel_initializer=initializer)(input_tensor_a)
bn1 = BatchNormalization()(conv1)
relu1 = Activation('relu')(bn1)

conv2 = Conv2D(128, kernel_size=3, strides=2, padding='same', kernel_initializer=initializer)(relu1)
bn2 = BatchNormalization()(conv2)
relu2 = Activation('relu')(bn2)

conv3 = Conv2D(256, kernel_size=3, strides=2, padding='same', kernel_initializer=initializer)(relu2)
bn3 = BatchNormalization()(conv3)
relu3 = Activation('relu')(bn3)

conv4 = Conv2D(512, kernel_size=3, strides=2, padding='same', kernel_initializer=initializer)(relu3)
bn4 = BatchNormalization()(conv4)
relu4 = Activation('relu')(bn4)

#Decoder
ups1 = UpSampling2D(size=(2,2))(relu4)
up_conv1 = Conv2D(256, kernel_size=3, padding='same', kernel_initializer=initializer)(ups1)
bn5 = BatchNormalization()(up_conv1)
relu5 = Activation('relu')(bn5)

ups2 = UpSampling2D(size=(2,2))(relu5)
up_conv2 = Conv2D(128, kernel_size=3, padding='same', kernel_initializer=initializer)(ups2)
bn6 = BatchNormalization()(up_conv2)
relu6 = Activation('relu')(bn6)

ups3 = UpSampling2D(size=(2,2))(relu6)
up_conv3 = Conv2D(64, kernel_size=3, padding='same', kernel_initializer=initializer)(ups3)
bn7 = BatchNormalization()(up_conv3)
relu7 = Activation('relu')(bn7)

up_conv4 = Conv2D(3, kernel_size=7, padding='same', activation='tanh', kernel_initializer=glorot_uniform())(relu7)

#Build and compile the model; loss is mean_squared_error
autoencoder = Model(input_tensor_a, up_conv4)
optimizer = Adam(0.00001)
autoencoder.compile(optimizer=optimizer, loss='mean_squared_error')

I know this architecture might not be very good, but I just want to understand its behaviour.

The following keeps happening:
(I don't have enough reputation to post the images directly)

This is the input Image:
(https://i.sstatic.net/72Miv.png)

This is the image after the first 5 epochs:
(https://i.sstatic.net/K6JhR.png)

And now after some more epochs:
(https://i.sstatic.net/P7mBQ.png)

As you can see, the model pushes the output image further and further toward a blank image, until in the end it is a fully white blank image.

The loss starts off at around 25,000 and decreases very slowly to around 24,000, where it gets stuck (which probably corresponds to the blank-image state).
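As a sanity check on that loss value, here is a hypothetical back-of-the-envelope calculation (not from the original post): if the input image is kept as raw pixels in [0, 255] while the tanh output layer can only produce values in [-1, 1], the best achievable MSE is still in the tens of thousands, which matches the observed plateau:

```python
import numpy as np

# Illustrative only: compare raw [0, 255] pixels against the best a
# tanh layer can do, namely outputting 1.0 everywhere.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(128, 128, 3)).astype(np.float32)  # raw pixels
best_output = np.ones_like(image)  # tanh saturates at 1.0
mse = float(np.mean((image - best_output) ** 2))
print(mse)  # on the order of 20,000 for uniform random pixels
```

An all-ones output (pure white once rendered) minimises this mismatch, which would explain why training drives the reconstruction toward a blank white image.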

I have tried different learning rates, swapping the order of the BatchNormalization and ReLU layers, loss functions other than 'mse', and running with and without the kernel_initializer, but nothing helps.

So I assume it has to do with the architecture, but I don't understand why.

If anyone could give me a good explanation I would be very grateful.


Solution

  • You mentioned that the loss you are using is MSE, and that the initial loss is around 25,000. My guess is that the inputs to your model are not within the range of [-1, 1]. Since the final layer uses a tanh activation, the outputs are confined to [-1, 1], so if the targets were also in that range the maximal mean squared error would be 4. With such a scale mismatch, no autoencoder can learn a meaningful reconstruction. If this is indeed the case, try re-scaling the inputs to the same scale as the outputs.
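A minimal sketch of the suggested fix, assuming the images arrive as raw uint8 pixels in [0, 255] (the helper name is my own, not from the post):

```python
import numpy as np

def rescale_to_tanh_range(images):
    """Map pixel values from [0, 255] to [-1, 1], matching the tanh output."""
    return images.astype(np.float32) / 127.5 - 1.0

raw = np.array([[0, 255]], dtype=np.uint8)
scaled = rescale_to_tanh_range(raw)
print(scaled)  # [[-1.  1.]]
```

With inputs and outputs on the same scale, the MSE is bounded and its gradients actually reward reconstructing the image rather than saturating the tanh.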