I want to train MobileNetV2 from scratch on CIFAR-100, but I get the following results, where it just stops learning after a while.
I would like to see at least 60-70% validation accuracy. Do I have to pre-train it on ImageNet, or is the problem that CIFAR-100 images are only 32x32x3? Due to some restrictions, I am using Keras 2.2.4 with TensorFlow 1.12.0. Here is my code:
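import numpy as np
from keras.applications.mobilenet_v2 import MobileNetV2
from keras.callbacks import ReduceLROnPlateau
from keras.datasets import cifar100
from keras.layers import Dense, GlobalAveragePooling2D, Input
from keras.models import Model
from keras.optimizers import Adam
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import np_utils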
(x_train, y_train), (x_test, y_test) = cifar100.load_data()
x_train = x_train / 255
x_test = x_test / 255
y_train = np_utils.to_categorical(y_train, 100)
y_test = np_utils.to_categorical(y_test, 100)
input_tensor = Input(shape=(32,32,3))
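# The remaining arguments of this call were cut off in the post;
# weights=None is assumed here because the model is trained from scratch.
x = MobileNetV2(include_top=False, weights=None, input_shape=(32, 32, 3))(input_tensor)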
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
x = Dense(512, activation='relu')(x)
preds = Dense(100, activation='softmax')(x)
model = Model(inputs=[input_tensor], outputs=[preds])
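optimizer = Adam(lr=1e-3)
# The compile step is not shown in the post; categorical cross-entropy is
# assumed, matching the one-hot labels and the softmax output.
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])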
epochs = 300
batch_size = 64
callbacks = [ReduceLROnPlateau(monitor='val_loss', factor=np.sqrt(0.1), cooldown=0, patience=10, min_lr=1e-6)]
generator = ImageDataGenerator(rotation_range=15,
                               width_shift_range=5. / 32,
                               height_shift_range=5. / 32)
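# Pass batch_size to flow() so it matches steps_per_epoch (flow() defaults to 32).
model.fit_generator(generator.flow(x_train, y_train, batch_size=batch_size),
                    validation_data=(x_test, y_test),
                    steps_per_epoch=(len(x_train) // batch_size),
                    epochs=epochs, verbose=1,
                    callbacks=callbacks)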
Well, MobileNets and all the other ImageNet-based models downsample the image five times (224 -> 7), then apply GlobalAveragePooling2D, and then the output layers.
I think feeding 32x32 images into these models directly won't give you a good result, because the tensor is already 1x1 before it even reaches GlobalAveragePooling2D.
Maybe you should try resizing the images to something like 96x96, or removing the first stride=2. Take the NASNet paper as a reference: they use 4 poolings in both the CIFAR and ImageNet versions, while only the ImageNet version has stride=2 in the first convolution layer.
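To make the arithmetic concrete, here is a rough sketch of how the spatial size shrinks. It assumes each stride-2 stage halves the side length and rounds up, as 'same' padding does; the helper name `final_feature_map_size` is made up for this illustration and is not part of Keras.

```python
def final_feature_map_size(input_size, num_stride2_stages=5):
    """Side length of the feature map after the given number of stride-2 stages.

    Assumes every stage halves the side length and rounds up, which is how
    'same' padding behaves. This is an approximation of the real network.
    """
    size = input_size
    for _ in range(num_stride2_stages):
        size = (size + 1) // 2  # integer form of ceil(size / 2)
    return size


if __name__ == "__main__":
    cases = [
        ("ImageNet input, 5 stride-2 stages", 224, 5),
        ("CIFAR input, 5 stride-2 stages", 32, 5),
        ("CIFAR resized to 96, 5 stride-2 stages", 96, 5),
        ("CIFAR input, first stride=2 removed", 32, 4),
    ]
    for label, input_size, stages in cases:
        side = final_feature_map_size(input_size, stages)
        print("{}: {}x{} -> {}x{}".format(label, input_size, input_size, side, side))
```

With a 32x32 input there is only a single spatial position left for the pooling layer to average over, while resizing to 96x96 or dropping one stride-2 stage keeps a small grid (3x3 or 2x2).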