python · r · machine-learning · keras · loss-function

Optimizing for accuracy instead of loss in Keras model


If I correctly understood the significance of the loss function to the model, it directs the model to be trained by minimizing the loss value. So, for example, if I want my model to be trained in order to have the least mean absolute error, I should use MAE as the loss function. Why is it, then, that you sometimes see someone wanting to achieve the best accuracy possible, yet building the model to minimize a completely different function? For example:

model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['acc'])

How come the model above is trained to give us the best acc, since during its training it will try to minimize another function (MSE)? I know that, once trained, the metric of the model will report the best acc found during training.

My doubt is: shouldn't the focus of the model during its training be to maximize acc (or minimize 1/acc) instead of minimizing MSE? If done that way, wouldn't the model give us an even higher accuracy, since it knows it has to maximize it during its training?


Solution

  • To start with, the code snippet you have used as an example:

    model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['acc'])
    

    is actually invalid (although Keras will not produce any error or warning), for a very simple and elementary reason: MSE is a valid loss for regression problems, for which accuracy is meaningless (accuracy is meaningful only for classification problems, where MSE is not a valid loss function). For details (including a code example), see my own answer in What function defines accuracy in Keras when the loss is mean squared error (MSE)?; for a similar situation in scikit-learn, see my own answer in this thread.

    Continuing to your general question: in regression settings, we usually don't need a separate performance metric, and we normally use just the loss function itself for this purpose, i.e. the correct code for the example you have used would simply be

    model.compile(loss='mean_squared_error', optimizer='sgd')
    

    without any metrics specified. We could of course use metrics=['mse'], but this is redundant and not really needed. Sometimes people use something like

    model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['mse','mae'])
    

    i.e. optimise the model according to the MSE loss, but also report its performance in terms of the mean absolute error (MAE), in addition to MSE.
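
    As a minimal, self-contained sketch of such a setup (with made-up toy data, purely for illustration), the model is optimised on MSE while MAE is merely reported during training:

    # minimal regression sketch with toy data (illustration only)
    import numpy as np
    from tensorflow import keras

    X = np.random.rand(100, 8)   # 100 samples, 8 features
    y = np.random.rand(100, 1)   # continuous target

    model = keras.Sequential([
        keras.layers.Dense(16, activation='relu', input_shape=(8,)),
        keras.layers.Dense(1)    # linear output, as appropriate for regression
    ])

    # optimise the MSE loss; MAE is only monitored, it does not drive training
    model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['mae'])
    model.fit(X, y, epochs=5, verbose=0)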

    Now, your question:

    shouldn't the focus of the model during its training be to maximize acc (or minimize 1/acc) instead of minimizing MSE?

    is indeed valid, at least in principle (save for the reference to MSE), but only for classification problems, where, roughly speaking, the situation is as follows: we cannot use the vast arsenal of convex optimization methods to directly maximize the accuracy, because accuracy is not a differentiable function; so, we need a differentiable proxy function to use as the loss. The most common example of such a loss function suitable for classification problems is the cross entropy.
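
    To make the roles concrete, here is a minimal sketch (again with toy data, just for illustration) where the differentiable cross-entropy loss is what actually gets minimised, while accuracy is only monitored as a metric:

    # minimal classification sketch: cross entropy drives the optimisation,
    # accuracy is merely reported alongside it
    import numpy as np
    from tensorflow import keras

    X = np.random.rand(100, 8)                 # toy features
    y = np.random.randint(0, 3, size=(100,))   # 3 toy classes

    model = keras.Sequential([
        keras.layers.Dense(16, activation='relu', input_shape=(8,)),
        keras.layers.Dense(3, activation='softmax')   # class probabilities
    ])

    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer='sgd',
                  metrics=['accuracy'])
    model.fit(X, y, epochs=5, verbose=0)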

    Rather unsurprisingly, this question of yours pops up from time to time, albeit in slightly different contexts; see for example my own answers in

    For the interplay between loss and accuracy in the special case of binary classification, you may find my answers in the following threads useful: