
Why would I choose a loss-function differing from my metrics?


When I look through tutorials on the internet or at models posted here on SO, I often see that the loss function differs from the metrics used to evaluate the model. This might look like:

model.compile(loss='mse', optimizer='adadelta', metrics=['mae', 'mape'])

Following this example, why wouldn't I optimize 'mae' or 'mape' as the loss instead of 'mse', when I don't even care about 'mse' in my metrics (hypothetically speaking, if this were my model)?


Solution

  • In many cases the metric you are interested in may not be differentiable, so you cannot use it as a loss. This is the case for accuracy, for example, where the differentiable cross-entropy loss is used instead.
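    To see why accuracy gives the optimizer nothing to work with, here is a minimal sketch in plain Python (no Keras needed; the labels and probabilities are made-up toy values). Accuracy only depends on which side of the 0.5 threshold each prediction falls, so it stays flat while cross-entropy still rewards better probabilities:

    ```python
    import math

    def accuracy(y_true, p):
        # hard 0.5 threshold: the value only changes when a prediction crosses it
        preds = [1 if pi >= 0.5 else 0 for pi in p]
        return sum(int(a == b) for a, b in zip(y_true, preds)) / len(y_true)

    def cross_entropy(y_true, p):
        # mean binary cross-entropy: smooth in p, so gradients exist
        return -sum(y * math.log(pi) + (1 - y) * math.log(1 - pi)
                    for y, pi in zip(y_true, p)) / len(y_true)

    y  = [1, 0, 1]
    p1 = [0.6, 0.4, 0.7]
    p2 = [0.7, 0.3, 0.8]  # strictly better probabilities, same hard predictions

    print(accuracy(y, p1), accuracy(y, p2))           # both 1.0 -> no signal to improve
    print(cross_entropy(y, p1) > cross_entropy(y, p2))  # True: the loss still improves
    ```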

    For metrics that are already differentiable, you just want to get additional information from the learning process, as each metric measures something different. For example, the MSE is on a squared scale relative to the data/predictions, so to get a value on the same scale as the data you have to use the RMSE or the MAE. The MAPE gives you relative (not absolute) error. All of these metrics measure something different that might be of interest.

    In the case of accuracy, this metric is used because it is easily interpretable by a human, while the cross-entropy loss is less intuitive to interpret.
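The scale differences described above can be checked numerically with plain Python; the targets and predictions below are made-up toy values:

```python
import math

y_true = [100.0, 200.0, 300.0]
y_pred = [110.0, 190.0, 330.0]

n = len(y_true)
errors = [yp - yt for yt, yp in zip(y_true, y_pred)]

mse  = sum(e ** 2 for e in errors) / n   # squared units of the data
rmse = math.sqrt(mse)                    # back on the data's scale
mae  = sum(abs(e) for e in errors) / n   # also on the data's scale
mape = 100 * sum(abs(e) / yt for e, yt in zip(errors, y_true)) / n  # relative error, in %

print(mse, rmse, mae, mape)  # ~366.67, ~19.15, ~16.67, ~8.33
```

With errors of 10, 10, and 30, the MSE (~366.67) is wildly out of scale with the data, while RMSE and MAE stay near the typical error magnitude and MAPE expresses it as a percentage — which is why reporting several of them at once is informative even when only one is the loss.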