tensorflow machine-learning keras sequential mlp

How to avoid overfitting with keras?

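import numpy as np
from tensorflow import keras

def build_model():
  model = keras.models.Sequential()

  model.add(keras.layers.Flatten(input_shape=[32,32,3]))
  # Dropout layers must be passed to model.add(); calling keras.layers.Dropout(...) on its own
  # only creates a layer object that is never attached to the model, so no dropout is applied.
  model.add(keras.layers.Dropout(rate=0.2))

  model.add(keras.layers.Dense(500, activation="relu"))
  model.add(keras.layers.Dropout(rate=0.2))

  model.add(keras.layers.Dense(300, activation="relu"))
  model.add(keras.layers.Dropout(rate=0.2))

  # 10-way softmax output; labels are integer class indices, hence sparse_categorical_crossentropy
  model.add(keras.layers.Dense(10, activation="softmax"))
  model.compile(loss='sparse_categorical_crossentropy', optimizer=keras.optimizers.SGD(), metrics=['accuracy'])
  return model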

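# keras.wrappers.scikit_learn is deprecated and was removed in newer TensorFlow releases;
# the separate SciKeras package provides a replacement KerasClassifier.
keras_clf = keras.wrappers.scikit_learn.KerasClassifier(build_model)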

def exponential_decay_fn(epoch): 
  return 0.05 * 0.1**(epoch / 20)

lr_scheduler = keras.callbacks.LearningRateScheduler(exponential_decay_fn)

history = keras_clf.fit(np.array(X_train_new), np.array(y_train_new), epochs=100,
                      validation_data=(np.array(X_validation), np.array(y_validation)),
                      callbacks=[keras.callbacks.EarlyStopping(patience=10),lr_scheduler])
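To see how quickly the `exponential_decay_fn` schedule above shrinks the learning rate, this small plain-Python sketch (no Keras needed) just evaluates the same formula at a few epochs. The formula divides the rate by 10 every 20 epochs: 0.05 at epoch 0, 0.005 at epoch 20, 0.0005 at epoch 40, and so on.

```python
def exponential_decay_fn(epoch):
    # Same formula as in the question: start at 0.05 and shrink by a factor of 10 every 20 epochs.
    return 0.05 * 0.1 ** (epoch / 20)

# Print the learning rate the LearningRateScheduler callback would set at a few epochs.
for epoch in (0, 10, 20, 40, 60, 99):
    print(f"epoch {epoch:3d}: lr = {exponential_decay_fn(epoch):.2e}")
```

By the later epochs the updates become very small, so further training changes the weights very little (and `EarlyStopping(patience=10)` may stop the run before then anyway).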

I use dropout, early stopping, and a learning-rate scheduler, but the results still seem to overfit. I tried reducing the number of neurons in the hidden layers to (300, 100), but then the model underfit: the training accuracy was only around 0.5.

Are there any suggestions?

Solution

In dealing with this issue, I first start out with a simple model, just a few dense layers without many nodes. I run the model and look at the resulting training accuracy. The first step in modelling is to get a high training accuracy: add more layers and/or more nodes in each layer until you reach a satisfactory level of accuracy. Once that is achieved, start to evaluate the validation loss.

If after a certain number of epochs the training loss continues to decrease but the validation loss starts to TREND upward, then you are in an overfitting condition. The word TREND is important. I can't tell from your graphs whether you are really overfitting, but it looks to me as if the validation loss has reached its minimum and is probably oscillating around it. This is normal and is NOT overfitting. An adjustable learning-rate callback that monitors validation loss, or alternatively a learning-rate scheduler, may get you to a lower minimum loss by lowering the learning rate. At some point, though (provided you run for enough epochs), continually reducing the learning rate no longer gets you a lower loss; the model has simply done the best it can.

Now, if you are REALLY overfitting, you can take remedial actions. One is to add more dropout, at the cost of potentially reduced training accuracy. Another is to add L1 and/or L2 regularization. Documentation for that is here. If your training accuracy is high but your validation accuracy is poor, it usually means you need more training samples, because the samples you have are not fully representative of the data's probability distribution. More training data is always better.

I notice you have 10 classes. Look at the balance of your dataset: if the classes have significantly different numbers of samples, this can cause problems. There are a number of methods to handle that, such as over-sampling underrepresented classes, under-sampling overrepresented classes, or a combination of both.
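Over-sampling underrepresented classes, as mentioned above, can be sketched with NumPy alone. This is a minimal illustration of plain random over-sampling (duplicating existing samples with replacement), not any specific library's implementation; the function name `random_oversample` is hypothetical. It should be applied to the training set only, after the validation set has been split off, so that duplicated samples do not leak into validation.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class samples until every class matches the largest one."""
    X = np.asarray(X)
    y = np.asarray(y).ravel()
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = [np.arange(len(y))]  # start with every original sample
    for c, n in zip(classes, counts):
        if n < target:
            idx = np.flatnonzero(y == c)
            # Sample with replacement to make up this class's shortfall.
            keep.append(rng.choice(idx, size=target - n, replace=True))
    order = np.concatenate(keep)
    rng.shuffle(order)  # mix duplicates in with the originals
    return X[order], y[order]

# Made-up example: 80 samples of class 0 and 20 of class 1.
X_demo = np.arange(100).reshape(100, 1)
y_demo = np.array([0] * 80 + [1] * 20)
X_bal, y_bal = random_oversample(X_demo, y_demo)
```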
An easy method is to use the class_weight parameter in model.fit. Look at your validation set and make sure it does not use too many samples from underrepresented classes. It is always best to select the validation set randomly from the overall dataset.
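Below is a NumPy-only sketch of both suggestions. `balanced_class_weights` builds a dict suitable for the `class_weight` argument of `model.fit`, using the common "balanced" heuristic `n_samples / (n_classes * count)` (an assumption; any weighting that up-weights rare classes would do). `stratified_split` is a variant of random selection that draws the validation set at random within each class, so rare classes keep the same proportion in both sets. Both function names are hypothetical.

```python
import numpy as np

def balanced_class_weights(y):
    """Map class index -> weight = n_samples / (n_classes * count_of_that_class)."""
    y = np.asarray(y).ravel()
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}

def stratified_split(y, val_fraction=0.2, seed=0):
    """Randomly pick validation indices per class so each class keeps the same proportion."""
    y = np.asarray(y).ravel()
    rng = np.random.default_rng(seed)
    val_idx = []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        rng.shuffle(idx)
        n_val = int(round(len(idx) * val_fraction))
        val_idx.extend(idx[:n_val].tolist())
    val_idx = np.array(sorted(val_idx), dtype=int)
    train_idx = np.setdiff1d(np.arange(len(y)), val_idx)
    return train_idx, val_idx

# Made-up labels: 80 samples of class 0 and 20 of class 1.
y = np.array([0] * 80 + [1] * 20)
weights = balanced_class_weights(y)  # {0: 0.625, 1: 2.5}
train_idx, val_idx = stratified_split(y, val_fraction=0.2)
# With real data this would be used roughly as:
# model.fit(X[train_idx], y[train_idx], class_weight=weights,
#           validation_data=(X[val_idx], y[val_idx]))
```

Here class 1 has a quarter as many samples as class 0, so each of its samples counts four times as much in the loss.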