I'm using MobileNet and TensorFlow 2 to distinguish between 4 fairly similar toys. I have exactly 750 images for each toy, plus one 'negative' label with 750 images that contain none of the toys.
I've used MobileNet for this before with a fair degree of success, but something about this case is causing a lot of overfitting (~30-40% discrepancy between training and validation accuracy). The model trains very quickly to a training accuracy of about 99.8% within 3 epochs, but the validation accuracy is stuck around 75%. The validation set is a random 20% of the input images. Looking at the predictions, there's a strong bias towards one of the toys, with many of the other toys falsely identified as it.
I've tried pretty much everything out there to combat this:
I've added Dropout after the Conv2D layer that sits on top of MobileNet, trying various dropout rates between 0.2 and 0.9:
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(label_count, activation='softmax')
])
I've added an additional Dropout layer before the Conv2D layer, which seemed to marginally improve things:
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(label_count, activation='softmax')
])
I've also added more training data, trying mixtures of photographs of the toys in various lighting conditions and backgrounds, as well as generated images of the toys superimposed over random backgrounds. None of these had a significant impact.
Should I be adding dropout within the MobileNet model itself, rather than just in the layers I'm adding after it? I came across code on GitHub that does this, but I've no idea whether it's actually a good idea, or quite how to achieve it with TensorFlow 2. Is this sensible, or even feasible?
Alternatively, are there other approaches I should be trying?
Since the model is overfitting, you can shuffle the data by setting shuffle=True in cnn_model.fit. Code is shown below:
history = cnn_model.fit(x=X_train_reshaped,
                        y=y_train,
                        batch_size=512,
                        epochs=epochs,
                        callbacks=[callback],
                        verbose=1,
                        validation_data=(X_test_reshaped, y_test),
                        validation_steps=10,
                        steps_per_epoch=steps_per_epoch,
                        shuffle=True)
Use early stopping. Code is shown below:
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=15)
Use regularization. Code for regularization is shown below (you can try l1 regularization or l1_l2 regularization as well):
from tensorflow.keras.layers import Conv2D, Dense
from tensorflow.keras.regularizers import l2

regularizer = l2(0.001)

# apply the penalty to both the layer weights and the layer outputs
cnn_model.add(Conv2D(64, (3, 3), input_shape=(28, 28, 1), activation='relu',
                     data_format='channels_last',
                     activity_regularizer=regularizer, kernel_regularizer=regularizer))
cnn_model.add(Dense(units=10, activation='sigmoid',
                    activity_regularizer=regularizer, kernel_regularizer=regularizer))
Try replacing GlobalAveragePooling2D with MaxPool2D, as in the sketch below.
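For example, a minimal sketch of the head from your question with max pooling instead of global average pooling (this assumes the same base_model and label_count as in your code):

model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.MaxPool2D(),  # downsamples the feature maps instead of averaging them away
    tf.keras.layers.Flatten(),    # MaxPool2D keeps the spatial dimensions, so flatten before Dense
    tf.keras.layers.Dense(label_count, activation='softmax')
])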
You can also try using BatchNormalization.
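A sketch of one common placement (Conv2D, then BatchNormalization, then the activation; exactly where to put it is a judgment call, this just illustrates the API):

model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.Conv2D(32, 3, use_bias=False),  # bias is redundant when BatchNorm follows
    tf.keras.layers.BatchNormalization(),           # normalizes activations across each batch
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(label_count, activation='softmax')
])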
Perform image data augmentation using ImageDataGenerator (see the Keras documentation for details).
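A minimal sketch, assuming your training data is in arrays named X_train_reshaped / y_train as in the fit call above (the transform ranges here are illustrative, not tuned):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=20,       # random rotations of up to 20 degrees
                             width_shift_range=0.1,   # random horizontal shifts
                             height_shift_range=0.1,  # random vertical shifts
                             zoom_range=0.1,
                             horizontal_flip=True)

# train on randomly transformed batches rather than the raw arrays
history = cnn_model.fit(datagen.flow(X_train_reshaped, y_train, batch_size=512),
                        epochs=epochs,
                        validation_data=(X_test_reshaped, y_test))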
If the pixels are not normalized, dividing the pixel values by 255 also helps.
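For example, assuming 8-bit images stored as arrays:

# scale pixel values from [0, 255] down to [0, 1]
X_train_reshaped = X_train_reshaped.astype('float32') / 255.0
X_test_reshaped = X_test_reshaped.astype('float32') / 255.0

(Since you're using MobileNet, tf.keras.applications.mobilenet.preprocess_input, which scales pixels to [-1, 1], is another option.)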
Finally, if there is still no change, you can try other pre-trained models like ResNet, VGGNet, or DenseNet (as mentioned by Mohsin in the comments).
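Swapping the base model is a small change. A sketch using ResNet50 as a drop-in replacement for the MobileNet base (this assumes 224x224 RGB inputs; adjust input_shape to match your images):

import tensorflow as tf

# ResNet50 without its ImageNet classification head, weights frozen like the MobileNet base
base_model = tf.keras.applications.ResNet50(input_shape=(224, 224, 3),
                                            include_top=False,
                                            weights='imagenet')
base_model.trainable = False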