I'm training an FCN (Fully Convolutional Network) with "Sigmoid Cross Entropy" as the loss function. My evaluation metrics are F-measure and MAE. The Train/Dev loss vs. #iteration graph looks like the one below: although the Dev loss starts to increase slightly after #iter = 2200, my metrics on the Dev set keep improving up to around #iter = 10000. I want to know whether this is possible in machine learning at all. If the F-measure improves, shouldn't the loss also decrease? How do you explain it?
Every answer would be appreciated.
Short answer, yes it's possible.
I would explain it by reasoning about how the cross-entropy loss differs from your metrics. Classification losses are, generally speaking, computed on predicted probabilities (e.g. 0.1 / 0.9), while metrics like F-measure are usually computed on the hard predicted labels (0 / 1) after thresholding. So the loss can increase if the model becomes very confidently wrong on a few examples, while at the same time more examples cross to the correct side of the decision threshold, which is all the metric sees.
Plotting the distribution of your predictions would help to confirm this hypothesis.
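A small numeric sketch (with made-up toy probabilities, not your data) illustrates the effect: the "late" model pushes more examples to the correct side of the 0.5 threshold, so its F-measure is higher, but one confidently wrong prediction (p = 0.99 on a negative) inflates its cross-entropy loss above the "early" model's.

```python
import numpy as np

def bce(y, p):
    # mean binary (sigmoid) cross-entropy, the loss being optimized
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def f1(y, p, thresh=0.5):
    # F-measure on hard labels obtained by thresholding the probabilities
    pred = (p >= thresh).astype(int)
    tp = int(np.sum((pred == 1) & (y == 1)))
    fp = int(np.sum((pred == 1) & (y == 0)))
    fn = int(np.sum((pred == 0) & (y == 1)))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y = np.array([1, 1, 1, 1, 0, 0, 0, 0])

# "early" model: timid probabilities, never confidently wrong
p_early = np.array([0.60, 0.55, 0.45, 0.45, 0.40, 0.40, 0.45, 0.45])
# "late" model: most examples now on the correct side of 0.5,
# but very confidently wrong on the last negative (p = 0.99)
p_late = np.array([0.90, 0.85, 0.70, 0.60, 0.20, 0.20, 0.30, 0.99])

print(f"early: loss={bce(y, p_early):.3f}  F1={f1(y, p_early):.3f}")
print(f"late:  loss={bce(y, p_late):.3f}  F1={f1(y, p_late):.3f}")
# late has a HIGHER loss but also a HIGHER F1 than early
```

That single p = 0.99 mistake contributes -log(0.01) ≈ 4.6 to the loss sum, dominating it, while costing the F-measure only one false positive.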