deep-learning, pytorch, regression

How to add confidence score to CNN regression model?


I have a model that predicts rectangular coordinates: it uses a ResNet backbone, outputs two values (x, y), and is optimized with MSE. The problem is that at prediction time I also want a confidence score. How can I design a confidence score similar to the probability value a classification model outputs?

I tried a multitask approach that outputs an extra value trained on the difference between the predicted (x, y) and the actual (x, y), but unfortunately the results were very bad. I want a score with every prediction: if the prediction is good, the score should be high, and if the prediction is bad, the score should be low.


Solution

  • Confidence is usually a categorical concept. For a regression problem like predicting coordinates or bounding boxes, the analog is the variance or standard deviation of the prediction.

    Your goal is that if the model is confident, the var/std it outputs is low, and vice versa.

    Architecturally, just add one or two more outputs for the std (see the sketch below). That's the easy part. Now, how do you train it? That boils down to finding the right loss function.
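
    As a concrete illustration, here is a minimal sketch of such a two-headed model (the class name, the resnet18 backbone, and the softplus choice are my assumptions, not a prescribed design):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torchvision

    class CoordWithStd(nn.Module):
        def __init__(self):
            super().__init__()
            backbone = torchvision.models.resnet18(weights=None)
            backbone.fc = nn.Identity()         # keep ResNet as a feature extractor
            self.backbone = backbone
            self.mean_head = nn.Linear(512, 2)  # predicts (x, y)
            self.std_head = nn.Linear(512, 2)   # predicts per-coordinate std

        def forward(self, img):
            feat = self.backbone(img)
            mean = self.mean_head(feat)
            std = F.softplus(self.std_head(feat)) + 1e-6  # keep std strictly positive
            return mean, std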

    To figure out the right loss for the std head, you have to think about Maximum Likelihood Estimation (MLE) in the context of predicting a std. Expectation Maximization is a well-known related technique, and we can borrow some ideas from it.

    It's correct that MSE is the right loss for regressing the mean (based on the interplay of MLE and the Gaussian distribution, going back to 20th-century work by Fisher and 19th-century work by Legendre, Gauss, and Laplace). For the var/std, when the model is badly wrong, you want to penalize it less if the predicted var is wide and more if it is narrow; and when the model is spot on, you should penalize it more if the var is wide and less if it is narrow.
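
    To see this behavior concretely, here is a quick numeric check (the error/std values are made up purely for illustration) of the Gaussian negative log-likelihood in the four cases:

    import torch
    from torch.distributions import Normal

    actual = torch.tensor(0.0)
    # (prediction error, predicted std)
    for err, sigma in [(3.0, 0.1), (3.0, 3.0), (0.0, 3.0), (0.0, 0.1)]:
        nll = -Normal(actual + err, sigma).log_prob(actual)
        print(f"error={err}, std={sigma} -> NLL={nll.item():7.2f}")
    # wrong + narrow std  -> NLL ~ 448.62  (punished hardest)
    # wrong + wide std    -> NLL ~   2.52
    # right + wide std    -> NLL ~   2.02
    # right + narrow std  -> NLL ~  -1.38  (rewarded most)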

    It turns out that MLE works here very directly: just take the negative log of the probability density function of the normal distribution parameterized by the model's predicted mean and std (or variance).

    It'll be something like this in your custom loss function, and your goal is to make it small through SGD or whatever.

    dist = torch.distributions.normal.Normal(predicted_mean, predicted_std)
    std_loss = -dist.log_prob(actual).mean()  # negative log-likelihood
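
    As an aside, recent PyTorch versions also ship torch.nn.GaussianNLLLoss, which computes this same Gaussian negative log-likelihood for you; note that it takes the variance rather than the std. A minimal sketch, reusing the tensor names from above:

    import torch.nn as nn

    criterion = nn.GaussianNLLLoss()
    loss = criterion(predicted_mean, actual, predicted_std ** 2)  # var = std**2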
    

    Note that the distributions package is obviously vectorized:

    >>> import torch
    >>> import torch.distributions as D
    >>> D.normal.Normal(torch.tensor([0., 2.]), torch.tensor([1., 4.]))
    Normal(loc: torch.Size([2]), scale: torch.Size([2]))
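
    Putting it all together, a minimal end-to-end sketch (the model class from above, the 1/(1 + std) mapping, and the function names are all illustrative assumptions): train both heads jointly on the negative log-likelihood, then at inference turn the predicted std into a score that is high when the model is confident and low when it is not.

    import torch
    from torch.distributions import Normal

    model = CoordWithStd()  # the two-headed model sketched earlier
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)

    def train_step(imgs, targets):
        mean, std = model(imgs)
        loss = -Normal(mean, std).log_prob(targets).mean()  # Gaussian NLL
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

    @torch.no_grad()
    def predict_with_confidence(img):
        mean, std = model(img)
        # map std to (0, 1]: small std -> high confidence; one arbitrary choice
        confidence = 1.0 / (1.0 + std.mean(dim=-1))
        return mean, confidence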