I have followed scikit-learn's example of applying machine learning to facial recognition: https://scikit-learn.org/stable/auto_examples/applications/plot_face_recognition.html#sphx-glr-auto-examples-applications-plot-face-recognition-py
I have been able to adapt the example to my own data successfully. However, I am lost on one point:
after preparing the data and training the model, you ultimately end up with the line: y_pred = clf.predict(X_test_pca)
This produces a vector of predictions, one per face. What I can't figure out is how to get any confidence measurement to correspond with that.
The classification method is a forced choice, so that each face passed in MUST be classified as one of the known faces, even if it isn't even close.
How can I get a number per face that will reflect how well the result matches the known face?
It seems like you are looking for the .predict_proba() method of the scikit-learn estimators. It returns the probabilities of possible outcomes instead of a single prediction.
The example you are referring to uses an SVC. It is a little special in regard to this function, as its documentation states: "The model need to have probability information computed at training time: fit with attribute probability set to True."
So, if you are using the same model as in the example, instantiate it with:
SVC(kernel='rbf', class_weight='balanced', probability=True)
and use .predict_proba() instead of .predict():
y_pred = clf.predict_proba(X_test_pca)
This returns an array of shape (n_samples, n_classes), i.e. the probabilities for each class for each sample, with columns ordered according to clf.classes_. The probabilities for class k across all samples are then y_pred[:, k], and the probability vector for a single sample i is y_pred[i].
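Putting it together, here is a minimal sketch on synthetic data (the array shapes and random features are placeholders; in the linked example X would be the PCA-reduced face features). The per-face confidence you asked about is just the largest class probability in each row:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X_train = rng.rand(60, 10)       # 60 training samples, 10 features (placeholder)
y_train = rng.randint(0, 3, 60)  # 3 known identities, labels 0..2
X_test = rng.rand(5, 10)         # 5 faces to classify

# probability=True enables predict_proba (fitting is slower as a result)
clf = SVC(kernel='rbf', class_weight='balanced', probability=True)
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_test)                # shape (5, n_classes)
labels = clf.classes_[np.argmax(proba, axis=1)]  # most likely identity per face
confidence = proba.max(axis=1)                   # per-face confidence in [0, 1]
```

A low value in confidence (e.g. close to 1/n_classes) flags faces where the forced choice was not a good match, which you could threshold to reject unknown faces.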