I am asking this question because I noticed that in competitions people tend to minimize the loss to 0. I have a binary image classification problem, and I have already reduced the binary cross-entropy loss to 0.003 with a "train from scratch" transfer-learning model. How can I reduce it further, towards 0? Should I fine-tune the model again, or should I go back and do image feature engineering?
Additionally, judging from the picture here, I suspect I am facing a "vanishing gradient" problem rather than "overfitting". If so, what should my next step be?
Thank you!
Since you are performing binary image classification, if you can drive both your training and validation loss to 0, that essentially means your network is "perfectly" trained: it recognizes all the validation images using only what it learned from the training images. When that happens, I think it's better to find "harder" data for your network to learn from.
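One common way to make the existing images "harder" without collecting new data is augmentation. Here is a minimal Keras sketch; the directory name `train_dir` and the specific augmentation settings are just illustrative assumptions, not tuned values:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative augmentation settings -- tune these for your dataset.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,       # random rotations up to 20 degrees
    width_shift_range=0.1,   # random horizontal shifts
    height_shift_range=0.1,  # random vertical shifts
    horizontal_flip=True,    # random left-right flips
)

# 'train_dir' is a hypothetical directory with one subfolder per class.
train_generator = train_datagen.flow_from_directory(
    "train_dir",
    target_size=(224, 224),
    batch_size=32,
    class_mode="binary",
)
```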
From your image, I think you should continue training your model for more epochs, since val_loss does not seem to have converged yet; consequently, there is no sign of "overfitting" so far.
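If you do keep training, an `EarlyStopping` callback lets you run for many more epochs while stopping automatically once val_loss stops improving. A minimal sketch, assuming your compiled `model` and the data generators (`train_generator`, `val_generator`) already exist:

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop once val_loss has not improved for 10 epochs,
# and restore the weights from the best epoch seen.
early_stop = EarlyStopping(
    monitor="val_loss",
    patience=10,
    restore_best_weights=True,
)

history = model.fit(
    train_generator,                # hypothetical training data
    validation_data=val_generator,  # hypothetical validation data
    epochs=200,                     # upper bound; EarlyStopping ends training sooner
    callbacks=[early_stop],
)
```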
Regarding the "vanishing gradient" problem, it's not possible to tell from your picture alone: the usual symptom is that gradients in the early layers shrink towards 0, so those layers' weights essentially stop updating. To check for this, I suggest tracking the distribution of your model's weights (and gradients) over training, in addition to the losses.
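In Keras you can log per-layer weight histograms to TensorBoard alongside the losses. A minimal sketch (the log directory name is just a placeholder):

```python
import tensorflow as tf

# histogram_freq=1 writes a histogram of every layer's weights
# (and biases) after each epoch, so you can watch whether the
# early-layer distributions stop changing over time.
tensorboard_cb = tf.keras.callbacks.TensorBoard(
    log_dir="logs/weight_tracking",  # placeholder path
    histogram_freq=1,
)

model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=50,
    callbacks=[tensorboard_cb],
)
# Then inspect the "Histograms" tab with:
#   tensorboard --logdir logs/weight_tracking
```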