Note: I am not sure this is the right website to ask this kind of question. Please tell me where I should ask it before downvoting this "because this isn't the right place to ask". Thanks!
I am currently experimenting with deep learning using Keras. I have already tried a model similar to the one found in the Keras examples. This yields the expected results:
After this I wanted to try transfer learning. I did this by using the VGG16 network without retraining its weights (see code below). This gave very poor results: 63% accuracy after 10 epochs with a very shallow curve (see picture below), which seems to indicate that it will reach acceptable results only (if ever) after a very long training time (I would expect 200-300 epochs before it reaches 80%).
Is this normal behavior for this kind of application? Here are a few things I imagine could be causing these bad results:
- The images are only 32x32 pixels, which might be too few for the VGG16 net.
- Maybe I could get better results by setting the VGG16 layers to trainable, or by starting with random weights (only copying the model architecture and not the weights).
Thanks in advance!
My code:
Note that the inputs are 2 datasets (50000 training images and 10000 testing images) of labeled images with shape 32x32x3. Each pixel value is a float in the range [0.0, 1.0].
import keras

# load and preprocess data...

# get VGG16 base model and define new input shape
vgg16 = keras.applications.vgg16.VGG16(input_shape=(32, 32, 3),
                                       weights='imagenet',
                                       include_top=False)

# add new dense layers at the top
x = keras.layers.Flatten()(vgg16.output)
x = keras.layers.Dense(1024, activation='relu')(x)
x = keras.layers.Dropout(0.5)(x)
x = keras.layers.Dense(128, activation='relu')(x)
predictions = keras.layers.Dense(10, activation='softmax')(x)

# define and compile model
model = keras.Model(inputs=vgg16.inputs, outputs=predictions)
for layer in vgg16.layers:
    layer.trainable = False
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# training and validation
model.fit(x_train, y_train,
          batch_size=256,
          epochs=10,
          validation_data=(x_test, y_test))
model.evaluate(x_test, y_test)
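For reference, the elided loading/preprocessing step could look roughly like this. This is only a sketch under an assumption: that the data is CIFAR-10 loaded via keras.datasets, which matches the 50000/10000 split and the 32x32x3 shape described above.

import keras

# hypothetical preprocessing sketch -- assumes the data is CIFAR-10
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
x_train = x_train.astype('float32') / 255.0   # scale pixel values to [0.0, 1.0]
x_test = x_test.astype('float32') / 255.0
y_train = keras.utils.to_categorical(y_train, 10)   # one-hot labels for categorical_crossentropy
y_test = keras.utils.to_categorical(y_test, 10)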
It's not that the VGG16 model doesn't work on that input size; it's that the weights you're using were pre-trained on a different input size (ImageNet). You need your source and target datasets to have the same input size so the pre-trained weights can transfer. So you could either do the pre-training on ImageNet images rescaled to 32x32x3, or keep the input size roughly the same as the one the pre-training was done on (often 224x224x3 or similar for ImageNet) and scale your target images to match. I have seen a paper recently where they transferred from ImageNet to CIFAR-10 and CIFAR-100 by upscaling the latter, which worked reasonably well, but that wouldn't be ideal.
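A minimal sketch of the upscaling approach, relative to the code in the question: upsample the 32x32 inputs before the VGG16 base so the input size roughly matches what the ImageNet weights were trained on. The 7x nearest-neighbor UpSampling2D (32x32 -> 224x224) is just one possible choice, not necessarily what that paper did.

import keras

# upscale 32x32 inputs to 224x224 before the pre-trained VGG16 base
inputs = keras.layers.Input(shape=(32, 32, 3))
upscaled = keras.layers.UpSampling2D(size=(7, 7))(inputs)   # 32x32 -> 224x224

vgg16 = keras.applications.vgg16.VGG16(input_tensor=upscaled,
                                       weights='imagenet',
                                       include_top=False)
for layer in vgg16.layers:
    layer.trainable = False   # same frozen setup as in the question

x = keras.layers.Flatten()(vgg16.output)
x = keras.layers.Dense(1024, activation='relu')(x)
x = keras.layers.Dropout(0.5)(x)
predictions = keras.layers.Dense(10, activation='softmax')(x)
model = keras.Model(inputs=inputs, outputs=predictions)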
Second, with a target dataset that has that many training examples, freezing all the transferred layers is unlikely to be a good solution; in fact, setting all the layers to trainable will probably work best.
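A minimal sketch of that change, relative to the code in the question (the fine-tuning learning rate of 1e-4 is an assumption; a smaller-than-default rate is commonly used so the pre-trained weights aren't destroyed, and older Keras versions spell the argument lr instead of learning_rate):

# make the transferred VGG16 layers trainable as well (fine-tuning)
for layer in vgg16.layers:
    layer.trainable = True

# re-compile so the trainability change takes effect
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=256,
          epochs=10,
          validation_data=(x_test, y_test))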