python machine-learning scikit-learn face-recognition

Confidence score for machine learning with SciKit Learn?

I have followed the scikit-learn example that applies machine learning to face recognition: https://scikit-learn.org/stable/auto_examples/applications/plot_face_recognition.html#sphx-glr-auto-examples-applications-plot-face-recognition-py

I have been able to adapt the example to my own data successfully. However, I am lost on one point:

After preparing the data and training the model, you end up with the line: y_pred = clf.predict(X_test_pca)

This produces a vector of predictions, one per face. What I can't figure out is how to get a confidence measure for each of those predictions.

The classification method is a forced choice, so that each face passed in MUST be classified as one of the known faces, even if it isn't even close.

How can I get a number per face that will reflect how well the result matches the known face?

Solution

It seems like you are looking for the .predict_proba() method of the scikit-learn estimators. It returns the probabilities of possible outcomes instead of a single prediction.

The example you are referring to uses an SVC, which is a special case for this method. As its documentation states:

The model need to have probability information computed at training time: fit with attribute probability set to True.

So, if you are using the same model as in the example, instantiate it with:

SVC(kernel='rbf', class_weight='balanced', probability=True)

and use .predict_proba() instead of .predict():

y_pred = clf.predict_proba(X_test_pca)
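To see the two steps above working together, here is a minimal end-to-end sketch. It uses synthetic data from make_classification as a stand-in for the face images (an assumption, so that it runs without downloading anything), and the PCA size and class count are arbitrary. Note that probability=True makes fitting slower, because the SVC runs an internal cross-validation to calibrate the probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the face data (assumption: the real example uses LFW images).
X, y = make_classification(
    n_samples=300, n_features=50, n_informative=10, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Reduce dimensionality with PCA, as the face recognition example does.
pca = PCA(n_components=10, whiten=True, random_state=0).fit(X_train)
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)

# probability=True must be set before fitting, otherwise predict_proba is unavailable.
clf = SVC(kernel="rbf", class_weight="balanced", probability=True, random_state=0)
clf.fit(X_train_pca, y_train)

# One row per test sample, one column per class (ordered as in clf.classes_).
proba = clf.predict_proba(X_test_pca)
print(proba.shape)
print(proba[:3].round(3))
```

If you keep the hyperparameter search from the example, passing the SVC above into the search object should work the same way, since the search object forwards predict_proba to its best estimator.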

This returns an array of shape (n_samples, n_classes), i.e. one row per sample holding the probability of each class, with the columns ordered as in clf.classes_. The probabilities for sample i are y_pred[i], and the probabilities of class k across all samples are y_pred[:, k].
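To get a single number per face, one option is to take the highest probability in each row and treat it as the confidence of the predicted class. The sketch below uses a hand-written probability array and made-up names in place of real predict_proba output and clf.classes_, and the 0.5 threshold for rejecting a face as "unknown" is an arbitrary choice you would need to tune on your own data.

```python
import numpy as np

# Hypothetical predict_proba output: 3 faces, 4 known people.
proba = np.array([
    [0.85, 0.05, 0.05, 0.05],
    [0.30, 0.28, 0.22, 0.20],
    [0.10, 0.10, 0.70, 0.10],
])
# Stand-in for clf.classes_ (column order of proba).
classes = np.array(["alice", "bob", "carol", "dave"])

best = proba.argmax(axis=1)      # column index of the most probable class per face
confidence = proba.max(axis=1)   # probability of that class, one number per face
labels = classes[best]

# Reject low-confidence matches instead of forcing a choice.
threshold = 0.5  # arbitrary cutoff, an assumption for illustration
labels_or_unknown = np.where(confidence >= threshold, labels, "unknown")

for name, conf in zip(labels_or_unknown, confidence):
    print(f"{name}: {conf:.2f}")
```

Keep in mind that the probabilities always sum to 1 over the known classes, so a face that belongs to nobody in the training set still gets spread across them; a low maximum is only a heuristic hint that the match is poor. Also, for SVC the class with the highest predict_proba value can occasionally differ from what predict returns.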