keras deep-learning tensorflow2.0 resnet batch-normalization

Validation and Test accuracy at random performance, whereas Train accuracy very high

I am trying to build a classifier in TensorFlow 2.1 for CIFAR10, using ResNet50 pre-trained on ImageNet from keras.applications and stacking a small FNN on top of it:

import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras import applications, layers, models

# Load ResNet50 pre-trained on imagenet
resn = applications.resnet50.ResNet50(weights='imagenet', input_shape=(IMG_SIZE, IMG_SIZE, 3), pooling='avg', include_top=False)

# Load CIFAR10
(c10_train, c10_test), info = tfds.load(name='cifar10', split=['train', 'test'], with_info=True, as_supervised=True)

# Make sure none of the layers are trainable
for layer in resn.layers:
    layer.trainable = False

# Transfer learning for CIFAR10: fine-tune the network by stacking a trainable FNN on top of ResNet

def build_model():
  model = models.Sequential()
  # Feature extractor
  model.add(resn)
  # Small FNN
  model.add(layers.Dense(256, activation='relu'))
  model.add(layers.Dropout(0.4))
  model.add(layers.Dense(10, activation='softmax'))

  model.compile(loss='categorical_crossentropy',
                optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
                metrics=['accuracy'])

  return model

# Build the resulting net
resn50_c10 = build_model()

I run into the following issue when validating or testing the model:

history = resn50_c10.fit_generator(c10_train.shuffle(1000).batch(BATCH_SIZE), validation_data=c10_test.batch(BATCH_SIZE), epochs=20)

Epoch 1/20
25/25 [==============================] - 113s 5s/step - loss: 0.9659 - accuracy: 0.6634 - val_loss: 2.8157 - val_accuracy: 0.1000
Epoch 2/20
25/25 [==============================] - 109s 4s/step - loss: 0.8908 - accuracy: 0.6920 - val_loss: 2.8165 - val_accuracy: 0.1094
Epoch 3/20
25/25 [==============================] - 116s 5s/step - loss: 0.8743 - accuracy: 0.7038 - val_loss: 2.7555 - val_accuracy: 0.1016
Epoch 4/20
25/25 [==============================] - 132s 5s/step - loss: 0.8319 - accuracy: 0.7166 - val_loss: 2.8398 - val_accuracy: 0.1013
Epoch 5/20
25/25 [==============================] - 132s 5s/step - loss: 0.7903 - accuracy: 0.7253 - val_loss: 2.8624 - val_accuracy: 0.1000
Epoch 6/20
25/25 [==============================] - 132s 5s/step - loss: 0.7697 - accuracy: 0.7325 - val_loss: 2.8409 - val_accuracy: 0.1000
Epoch 7/20
25/25 [==============================] - 132s 5s/step - loss: 0.7515 - accuracy: 0.7406 - val_loss: 2.7697 - val_accuracy: 0.1000   
#... (same for the remaining epochs)

Although the model seems to learn adequately from the training split, neither the accuracy nor the loss on the validation set improves at all. What is causing this behavior?
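As a quick sanity check on these numbers, here is a minimal sketch in plain Python (not part of the original code) of what a classifier guessing uniformly over the 10 CIFAR10 classes would score. It assumes balanced classes and categorical cross-entropy as the loss:

```python
import math

NUM_CLASSES = 10  # CIFAR10 has 10 classes

# A classifier that assigns probability 1/10 to every class.
chance_accuracy = 1.0 / NUM_CLASSES
# Categorical cross-entropy of a uniform prediction: -ln(1/10)
chance_loss = -math.log(1.0 / NUM_CLASSES)

print(f"chance accuracy: {chance_accuracy:.4f}")  # 0.1000
print(f"chance loss:     {chance_loss:.4f}")      # 2.3026
```

The reported val_accuracy of about 0.10 matches chance, while the val_loss of about 2.8 is higher than the uniform-guess loss of about 2.30. This suggests the model is confidently wrong on the validation data rather than merely uncertain.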

I am ruling out overfitting, since I am applying Dropout and the model never really improves on the test set.

What I have done so far:

Checked that the one-hot labelling is consistent across train and test
Tried different FNN configurations
Tried the method fit_generator instead of fit
Preprocessed the images and resized them with different input_shapes

but I always ran into the same problem.

Any hint would be extremely appreciated.

Solution

Apparently the problem was caused solely by the use of ResNet50.

As a workaround, I downloaded and used other pre-trained deep networks, such as keras.applications.vgg16.VGG16 and keras.applications.densenet.DenseNet121, and the accuracy on the test set increased as expected.

UPDATE

The workaround above is only a stopgap. To understand what is really happening, and to use transfer learning properly with ResNet50, keep reading.

The root cause appears to lie in how Keras handles the Batch Normalization layer:

During fine-tuning, if a Batch Normalization layer is frozen it uses the mini-batch statistics. I believe this is incorrect and it can lead to reduced accuracy especially when we use Transfer learning. A better approach in this case would be to use the values of the moving mean and variance.

This is explained in more depth here: https://github.com/keras-team/keras/pull/9965
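To illustrate the difference described in the quote, here is a minimal sketch in plain Python rather than Keras code. The moving statistics and batch values are made up for illustration, and the learned scale and offset (gamma and beta) are left out for brevity. It normalizes the same activation once with mini-batch statistics and once with stored moving statistics:

```python
import math

def batch_stats(values):
    """Mean and (biased) variance of one mini-batch."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

def normalize(values, mean, var, eps=1e-3):
    """Batch-norm style normalization without gamma/beta."""
    return [(v - mean) / math.sqrt(var + eps) for v in values]

# Hypothetical moving statistics stored in a pre-trained layer.
moving_mean, moving_var = 0.0, 1.0

# The same activation (2.0) appears first in two different mini-batches.
batch_a = [2.0, 1.0, 3.0]
batch_b = [2.0, 10.0, 12.0]

# Mini-batch statistics: the output for 2.0 depends on its batch neighbours.
train_a = normalize(batch_a, *batch_stats(batch_a))[0]
train_b = normalize(batch_b, *batch_stats(batch_b))[0]

# Moving statistics: the output for 2.0 is the same in both batches.
infer_a = normalize(batch_a, moving_mean, moving_var)[0]
infer_b = normalize(batch_b, moving_mean, moving_var)[0]

print("mini-batch stats:", train_a, train_b)
print("moving stats:    ", infer_a, infer_b)
```

With mini-batch statistics the two results differ, while with moving statistics they are identical. If the frozen layers see one kind of statistics during training and another during evaluation, the FNN on top receives differently distributed features in the two phases, which is consistent with the gap between training and validation accuracy shown in the question.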

Even though the correct approach has been implemented in TensorFlow 2, tf.keras.applications still references the TensorFlow 1.0 behavior for Batch Normalization. That is why we need to explicitly inject the reference to TensorFlow 2 by adding the argument layers=tf.keras.layers when loading the model. In my case, the loading of ResNet50 becomes

resn = applications.resnet50.ResNet50(weights='imagenet', input_shape=(IMG_SIZE, IMG_SIZE, 3), pooling='avg', include_top=False, layers=tf.keras.layers)

and that will do the trick.

Credits for the solution to @rpeloff: https://github.com/keras-team/keras/pull/9965#issuecomment-549126009