Tags: python, tensorflow, keras, autoencoder

Why is the loss of my autoencoder not going down at all during training?


I am following this tutorial to build a Keras-based autoencoder, but with my own data. The dataset contains about 20k training and about 4k validation images, all of which are very similar and show the very same object. I haven't modified the model layout from the tutorial; I only changed the input size, since I use 300x300 images. So my model looks like this:

Model: "autoencoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 300, 300, 1)]     0
_________________________________________________________________
encoder (Functional)         (None, 16)                5779216
_________________________________________________________________
decoder (Functional)         (None, 300, 300, 1)       6176065
=================================================================
Total params: 11,955,281
Trainable params: 11,954,897
Non-trainable params: 384
_________________________________________________________________
Model: "encoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 300, 300, 1)]     0
_________________________________________________________________
conv2d (Conv2D)              (None, 150, 150, 32)      320
_________________________________________________________________
leaky_re_lu (LeakyReLU)      (None, 150, 150, 32)      0
_________________________________________________________________
batch_normalization (BatchNo (None, 150, 150, 32)      128
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 75, 75, 64)        18496
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 75, 75, 64)        0
_________________________________________________________________
batch_normalization_1 (Batch (None, 75, 75, 64)        256
_________________________________________________________________
flatten (Flatten)            (None, 360000)            0
_________________________________________________________________
dense (Dense)                (None, 16)                5760016
=================================================================
Total params: 5,779,216
Trainable params: 5,779,024
Non-trainable params: 192
_________________________________________________________________
Model: "decoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         [(None, 16)]              0
_________________________________________________________________
dense_1 (Dense)              (None, 360000)            6120000
_________________________________________________________________
reshape (Reshape)            (None, 75, 75, 64)        0
_________________________________________________________________
conv2d_transpose (Conv2DTran (None, 150, 150, 64)      36928
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 150, 150, 64)      0
_________________________________________________________________
batch_normalization_2 (Batch (None, 150, 150, 64)      256
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 300, 300, 32)      18464
_________________________________________________________________
leaky_re_lu_3 (LeakyReLU)    (None, 300, 300, 32)      0
_________________________________________________________________
batch_normalization_3 (Batch (None, 300, 300, 32)      128
_________________________________________________________________
conv2d_transpose_2 (Conv2DTr (None, 300, 300, 1)       289
_________________________________________________________________
activation (Activation)      (None, 300, 300, 1)       0
=================================================================
Total params: 6,176,065
Trainable params: 6,175,873
Non-trainable params: 192
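
For reference, the summaries above are consistent with a ConvAutoencoder.build() along the following lines; the 3x3 kernels, the stride-2 downsampling and the sigmoid output are assumptions inferred from the output shapes and parameter counts, not copied from the tutorial code:

from tensorflow.keras import Model
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D,
                                     Conv2DTranspose, Dense, Flatten, Input,
                                     LeakyReLU, Reshape)

# Hypothetical reconstruction of the tutorial's ConvAutoencoder.build()
def build_autoencoder(width=300, height=300, depth=1, latent_dim=16):
    # encoder: two stride-2 convolutions, then a dense bottleneck
    inputs = Input(shape=(height, width, depth))
    x = Conv2D(32, (3, 3), strides=2, padding="same")(inputs)
    x = LeakyReLU()(x)
    x = BatchNormalization()(x)
    x = Conv2D(64, (3, 3), strides=2, padding="same")(x)
    x = LeakyReLU()(x)
    x = BatchNormalization()(x)
    x = Flatten()(x)
    latent = Dense(latent_dim)(x)
    encoder = Model(inputs, latent, name="encoder")

    # decoder: dense layer back to 75x75x64, then two stride-2 transposed convolutions
    latent_inputs = Input(shape=(latent_dim,))
    x = Dense(75 * 75 * 64)(latent_inputs)
    x = Reshape((75, 75, 64))(x)
    x = Conv2DTranspose(64, (3, 3), strides=2, padding="same")(x)
    x = LeakyReLU()(x)
    x = BatchNormalization()(x)
    x = Conv2DTranspose(32, (3, 3), strides=2, padding="same")(x)
    x = LeakyReLU()(x)
    x = BatchNormalization()(x)
    x = Conv2DTranspose(depth, (3, 3), padding="same")(x)
    outputs = Activation("sigmoid")(x)  # sigmoid assumed; the summary only shows "activation"
    decoder = Model(latent_inputs, outputs, name="decoder")

    autoencoder = Model(inputs, decoder(encoder(inputs)), name="autoencoder")
    return encoder, decoder, autoencoder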

Then I initialize my model like this:

from tensorflow.keras.optimizers import Adam
from tensorflow.keras.optimizers.schedules import ExponentialDecay

IMGSIZE = 300
EPOCHS = 20
LR = 0.0001
BS = 32

# ConvAutoencoder is the model-building class from the tutorial
(encoder, decoder, autoencoder) = ConvAutoencoder.build(IMGSIZE, IMGSIZE, 1)
sched = ExponentialDecay(initial_learning_rate=LR, decay_steps=EPOCHS, decay_rate=LR / EPOCHS)
autoencoder.compile(loss="mean_squared_error", optimizer=Adam(learning_rate=sched))

Then I train my model like this:

import os

from tensorflow.keras.preprocessing.image import ImageDataGenerator

image_generator = ImageDataGenerator(rescale=1.0 / 255)
# class_mode="input" makes the generator return the image itself as the target,
# which is what an autoencoder needs
train_gen = image_generator.flow_from_directory(
    os.path.join(args.images, "training"),
    class_mode="input",
    color_mode="grayscale",
    target_size=(IMGSIZE, IMGSIZE),
    batch_size=BS,
)
val_gen = image_generator.flow_from_directory(
    os.path.join(args.images, "validation"),
    class_mode="input",
    color_mode="grayscale",
    target_size=(IMGSIZE, IMGSIZE),
    batch_size=BS,
)
hist = autoencoder.fit(train_gen, validation_data=val_gen, epochs=EPOCHS, batch_size=BS)

My batch size BS is 32, and I start with an initial Adam learning rate of 0.001 (but I have also tried values from 0.1 down to 0.0001). I also tried increasing the latent dimensionality to something like 1024, but that didn't solve the issue either.

During training, the loss drops from about 0.5 to about 0.2 in the first epoch. From the second epoch on, it gets stuck at the very same value, e.g. 0.1989, and stays there "forever", regardless of how many epochs I train or which initial learning rate I use.

Any ideas what could be the problem here?


Solution

  • It could be that the decay_rate argument in tf.keras.optimizers.schedules.ExponentialDecay is decaying your learning rate much faster than you expect, effectively driving it to zero. ExponentialDecay computes lr = initial_learning_rate * decay_rate ** (step / decay_steps), where step counts optimizer steps (i.e. batches), not epochs. With decay_steps=EPOCHS (20) and decay_rate=LR / EPOCHS (0.0001 / 20 = 5e-6), the learning rate is already multiplied by 5e-6 after just 20 of the roughly 625 batches (20000 / 32) in the first epoch, so the weights barely move afterwards; that is consistent with the loss dropping a bit during the first epoch and then freezing. See the sketch below.
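
A minimal sketch of what that schedule does to the learning rate, using the LR and EPOCHS values from the question (the steps_per_epoch and the 0.96 decay factor in the second schedule are illustrative assumptions, not values taken from the question or the tutorial):

import tensorflow as tf

LR = 0.0001
EPOCHS = 20

# The schedule from the question: lr(step) = LR * (LR / EPOCHS) ** (step / decay_steps)
sched = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=LR, decay_steps=EPOCHS, decay_rate=LR / EPOCHS
)
print(float(sched(0)))    # ≈ 1e-04  (initial learning rate)
print(float(sched(20)))   # ≈ 5e-10  (after 20 batches the LR is already negligible)
print(float(sched(625)))  # ≈ 0.0    (after one epoch of 20000 / 32 ≈ 625 batches)

# An alternative that decays gently, reaching a factor of 0.96 after each epoch
# (illustrative values):
steps_per_epoch = 20000 // 32  # ≈ 625 batches per epoch
sched = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=LR,
    decay_steps=steps_per_epoch,
    decay_rate=0.96,
)

Another quick test is to drop the schedule entirely and compile with optimizer=Adam(learning_rate=LR); if the loss then keeps decreasing past the first epoch, the schedule was the culprit.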