I'm using MobileNet and TensorFlow 2 to distinguish between 4 fairly similar toys. I have exactly 750 images for each toy, plus one 'negative' label with 750 images that contain none of the toys.
I've used MobileNet for this before with a fair degree of success, but something about this case is causing a lot of overfitting (~30-40% discrepancy between training and validation accuracy). The model trains very quickly to a training accuracy of about 99.8% within 3 epochs, but the validation accuracy is stuck around 75%. The validation set is a random 20% of the input images. Looking at the predictions, there's a strong bias towards one of the toys, with many of the other toys falsely identified as it.
I've tried pretty much everything out there to combat this:
I've added Dropout after the Conv2D layer that sits on top of MobileNet, trying various dropout rates between 0.2 and 0.9:
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(label_count, activation='softmax')
])
I've added an additional Dropout layer before the Conv2D layer, which seemed to marginally improve things:
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(label_count, activation='softmax')
])
I've also added more training data, trying mixtures of photographs of the toys in various lighting conditions and backgrounds, as well as generated images of the toys superimposed over random backgrounds. None of these had a significant impact.
Should I be adding dropout within the MobileNet model itself, rather than just in the layers I'm adding after it? I came across code on GitHub that does this, but I've no idea whether it's actually a good idea, or quite how to achieve it with TensorFlow 2. Is this sensible, or even feasible?
Alternatively, are there other approaches I should be trying?
Since the model is overfitting, you can shuffle the data by setting shuffle=True in cnn_model.fit. Code is shown below:
history = cnn_model.fit(x=X_train_reshaped,
                        y=y_train,
                        batch_size=512,
                        epochs=epochs,
                        callbacks=[callback],
                        verbose=1,
                        validation_data=(X_test_reshaped, y_test),
                        validation_steps=10,
                        steps_per_epoch=steps_per_epoch,
                        shuffle=True)
Use early stopping. Code is shown below:
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=15)
Use regularization. Code for regularization is shown below (you can try l1 regularization or l1_l2 regularization as well):
from tensorflow.keras.layers import Conv2D, Dense
from tensorflow.keras.regularizers import l2

regularizer = l2(0.001)

# apply the penalty to both the layer weights and the layer outputs
cnn_model.add(Conv2D(64, (3, 3), input_shape=(28, 28, 1), activation='relu',
                     data_format='channels_last',
                     activity_regularizer=regularizer, kernel_regularizer=regularizer))
cnn_model.add(Dense(units=10, activation='sigmoid',
                    activity_regularizer=regularizer, kernel_regularizer=regularizer))
Try replacing GlobalAveragePooling2D with MaxPool2D, as in the sketch below.
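For example, a minimal sketch of the head from your question with max pooling instead of global average pooling (this assumes the same base_model and label_count as in your code):

model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.MaxPool2D(),  # downsamples the feature maps instead of averaging them away
    tf.keras.layers.Flatten(),    # MaxPool2D keeps the spatial dimensions, so flatten before Dense
    tf.keras.layers.Dense(label_count, activation='softmax')
])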
You can also try using BatchNormalization.
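A sketch of one common placement (Conv2D, then BatchNormalization, then the activation; exactly where to put it is a judgment call, this just illustrates the API):

model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.Conv2D(32, 3, use_bias=False),  # bias is redundant when BatchNorm follows
    tf.keras.layers.BatchNormalization(),           # normalizes activations across each batch
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(label_count, activation='softmax')
])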
Perform image data augmentation using ImageDataGenerator (see the Keras documentation for details).
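A minimal sketch, assuming your training data is in arrays named X_train_reshaped / y_train as in the fit call above (the transform ranges here are illustrative, not tuned):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=20,       # random rotations of up to 20 degrees
                             width_shift_range=0.1,   # random horizontal shifts
                             height_shift_range=0.1,  # random vertical shifts
                             zoom_range=0.1,
                             horizontal_flip=True)

# train on randomly transformed batches rather than the raw arrays
history = cnn_model.fit(datagen.flow(X_train_reshaped, y_train, batch_size=512),
                        epochs=epochs,
                        validation_data=(X_test_reshaped, y_test))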
If the pixels are not normalized, dividing the pixel values by 255 also helps.
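For example, assuming 8-bit images stored as arrays:

# scale pixel values from [0, 255] down to [0, 1]
X_train_reshaped = X_train_reshaped.astype('float32') / 255.0
X_test_reshaped = X_test_reshaped.astype('float32') / 255.0

(Since you're using MobileNet, tf.keras.applications.mobilenet.preprocess_input, which scales pixels to [-1, 1], is another option.)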
Finally, if there is still no change, you can try other pre-trained models like ResNet, VGGNet, or DenseNet (as mentioned by Mohsin in the comments).
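Swapping the base model is a small change. A sketch using ResNet50 as a drop-in replacement for the MobileNet base (this assumes 224x224 RGB inputs; adjust input_shape to match your images):

import tensorflow as tf

# ResNet50 without its ImageNet classification head, weights frozen like the MobileNet base
base_model = tf.keras.applications.ResNet50(input_shape=(224, 224, 3),
                                            include_top=False,
                                            weights='imagenet')
base_model.trainable = False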