Tags: python, machine-learning, keras, imbalanced-data

Why do we use the loss to update our model but the metrics to choose the model we need?


First of all, I am confused about why we use the loss to update the model but use the metrics to choose the model we need.

Maybe not all, but most of the code I've seen uses EarlyStopping to monitor a metric on the validation data to find the best epoch (and the loss and the metric are different quantities).

Since you have chosen to use the loss to update the model, why not use the loss to select the model? After all, the loss and the metrics are not exactly the same. It gives me the impression that you optimize with one objective and then evaluate the result with a different indicator, which seems very strange to me. Take regression as an example: when someone uses 'mse' as the loss, why do they define metrics=['mae'] and monitor that for early stopping or for reducing the learning rate? I just can't understand it, and I want to know what the advantage of doing this is.
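To make this concrete, here is a minimal Keras sketch of the kind of setup I mean (the model, the random data, and the hyperparameters are only placeholders): MSE is the loss that the optimizer minimizes, while EarlyStopping monitors a different quantity, val_mae, on the validation data.

```python
import numpy as np
from tensorflow import keras

# Dummy data, just to make the sketch runnable
x_train, y_train = np.random.rand(200, 10), np.random.rand(200, 1)
x_val, y_val = np.random.rand(50, 10), np.random.rand(50, 1)

model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])

# MSE is the quantity the optimizer minimizes; MAE is only computed and logged
model.compile(
    optimizer="adam",
    loss="mse",
    metrics=[keras.metrics.MeanAbsoluteError(name="mae")],
)

# Early stopping watches a *different* quantity (val_mae) than the one being
# optimized (mse); monitor="val_loss" would be equally valid
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_mae",
    patience=5,
    restore_best_weights=True,
)

model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=50,
    callbacks=[early_stop],
    verbose=0,
)
```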

Secondly, when your training data is imbalanced and the problem is a classification problem, some tutorials will tell you to use F1 or AUC as your metric, saying that this mitigates the problems caused by the imbalanced data. I don't understand why these metrics help with class imbalance.

Thirdly, I am confused about cases where someone passes more than one metric to the metrics parameter of compile. I don't understand why multiple metrics rather than one. What is the advantage of defining multiple metrics over a single one?

I seem to have too many questions, and they have been bothering me for a long time.

Thank you for your kind answer.


The content above is what I originally wrote. Some people think my questions are too broad, so I want to rephrase them.

Now suppose that there is a binary classification problem and the data is imbalanced: the ratio of positive to negative classes is 500:1.

I chose a DNN as my classification model and cross entropy as my loss. The question now is whether I should also choose cross entropy as my metric, or something else, and why.
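To be concrete, this is roughly the setup I have in mind (the network size and the number of input features are just placeholders; the metrics argument is exactly what I am asking about, with AUC shown only as one possible candidate):

```python
from tensorflow import keras

# Hypothetical DNN for the imbalanced binary classification problem above;
# 20 input features are assumed purely for illustration
model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",               # cross entropy drives the weight updates
    metrics=[keras.metrics.AUC(name="auc")],  # the metric choice is the open question;
                                              # AUC is shown only as one candidate
)
```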

I also want to summarize what I have gathered from other people's answers: when the problem is a regression problem, the usual metrics and losses are differentiable, so whether you choose the same quantity for the metric and the loss, or different ones, depends entirely on your own understanding of the problem. But if the problem is a classification problem, the metric we actually care about, such as F1 or AUC, is not differentiable, so we choose a different loss and metric. Why don't we simply choose cross entropy as the metric as well?


Solution

  • Question is arguably too broad for SO; nevertheless, here are a couple of things which you will hopefully find helpful...

    Since you have chosen to use the loss to update the model, why not use the loss to select the model?

    Because, while the loss is the quantity we have to optimize from the mathematical perspective, the quantity of interest from the business perspective is the metric; in other words, at the end of the day, as users of the model, we are interested in the metric, and not in the loss (at least for settings where these two quantities are by default different, such as in classification problems).

    That said, selecting the model based on the loss is a perfectly valid strategy, too; as always, there is some subjectivity, and it depends on the specific problem.
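    For illustration only (the file names and the "auc" metric name below are placeholders, not a recommendation), a ModelCheckpoint can be pointed at either the validation loss or a validation metric; both are legitimate selection criteria:

```python
from tensorflow import keras

# Select the model (epoch) with the lowest validation loss
select_by_loss = keras.callbacks.ModelCheckpoint(
    "best_by_loss.keras", monitor="val_loss", mode="min", save_best_only=True
)

# Select the model with the highest validation metric
# (assumes a metric named "auc" was passed to compile)
select_by_metric = keras.callbacks.ModelCheckpoint(
    "best_by_auc.keras", monitor="val_auc", mode="max", save_best_only=True
)
```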

    Take regression as an example: when someone uses 'mse' as the loss, why do they define metrics=['mae']

    This is not the norm, and is far from standard; normally, for regression problems, it is perfectly natural to use the loss as the metric, too. I agree with you that choices like the one you refer to seem unnatural and in general do not make much sense. Just keep in mind that the fact that someone used it in a blog post or elsewhere does not necessarily make it "correct" (or a good idea), but it is difficult to argue in general without taking into account possible arguments for the specific case.

    I don't understand why these metrics [F1 or AUC] help with class imbalance.

    They don't "improve" anything - they are simply more appropriate than accuracy, for which a naive approach on a heavily imbalanced dataset (think of a 99% majority class) would be to classify everything as the majority class, giving 99% accuracy without the model having learned anything.
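    A quick toy demonstration of this point (the class ratio and the scikit-learn calls are just for illustration): a classifier that blindly predicts the majority class on a 99%-majority dataset scores 99% accuracy, while its F1 for the minority class is 0 and its AUC is no better than chance.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

y_true = np.array([0] * 990 + [1] * 10)        # 99% negative, 1% positive
y_pred = np.zeros_like(y_true)                 # "classify everything as the majority class"
y_score = np.zeros(len(y_true), dtype=float)   # constant scores -> uninformative ranking

print(accuracy_score(y_true, y_pred))                   # 0.99
print(f1_score(y_true, y_pred, zero_division=0))        # 0.0
print(roc_auc_score(y_true, y_score))                   # 0.5 (chance level)
```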

    I am confused about cases where someone passes more than one metric to the metrics parameter of compile. I don't understand why multiple metrics rather than one. What is the advantage of defining multiple metrics over a single one?

    Again, generally speaking, there is no particular advantage, nor is this the norm; but everything depends on the possible specifics of the problem.
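    Just to show what passing several metrics looks like in practice (the model and the metric choices below are arbitrary): all of them are merely computed and logged each epoch, while only the loss drives the weight updates.

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",   # the only quantity that is optimized
    metrics=[                     # all of these are just reported per epoch
        "accuracy",
        keras.metrics.Precision(name="precision"),
        keras.metrics.Recall(name="recall"),
        keras.metrics.AUC(name="auc"),
    ],
)
```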


    UPDATE (after comment): Limiting the discussion to classification settings (since in regression, the loss and the metric can be the same thing), similar questions pop up rather frequently, I guess because the subtle differences between the loss and the various available metrics (accuracy, precision, recall, F1 score etc) are not well understood; consider for example the inverse of your question:

    Optimizing for accuracy instead of loss in Keras model

    and the links therein. Quoting from one of my own linked answers:

    Loss and accuracy are different things; roughly speaking, the accuracy is what we are actually interested in from a business perspective, while the loss is the objective function that the learning algorithms (optimizers) are trying to minimize from a mathematical perspective. Even more roughly speaking, you can think of the loss as the "translation" of the business objective (accuracy) to the mathematical domain, a translation which is necessary in classification problems (in regression ones, usually the loss and the business objective are the same, or at least can be the same in principle, e.g. the RMSE)...

    You may also find the discussion in Cost function training target versus accuracy desired goal helpful.