I am using a face recognition library to detect faces. The model produces a 128-value embedding from each image. To check whether two faces match, it checks whether the distance between those two points is less than 0.6. I am not sure what is meant by the distance between two images. As I understand it, does it mean comparing the distance between two points in the known images with the same distance in the image we want it to recognize? I could not find any documentation on this online. Please help.
The face_recognition package uses dlib in the background. Dlib builds a ResNet model, which is a CNN. The output layer of the ResNet model has 128 nodes. In other words, when you feed a facial image to the ResNet model, it generates a 128-dimensional vector. Some sources call this vector a representation, and others call it an embedding.
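When you compare two facial images, you feed each of them to the ResNet model separately, so you get two 128D vectors as output. Each vector is one point in 128-dimensional space, and the "distance between two images" is the distance between those two points. It is not a distance between landmarks inside a single image.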
Finally, you need to measure how similar these two vectors are. Cosine similarity and Euclidean distance are the most common measures. The author of dlib tuned the threshold for Euclidean distance and found it to be 0.6: a pair with a distance below 0.6 is treated as the same person. If you use cosine similarity instead, the threshold will be very different, because it is measured on a different scale and a higher value means a closer match.
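To make the comparison concrete, here is a minimal sketch in plain Python. The function names and the short 4-value vectors are made up for illustration; real embeddings from the model have 128 values, but the arithmetic is the same. Only the 0.6 threshold comes from the answer above.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_same_person(a, b, threshold=0.6):
    """Treat two embeddings as a match when their distance is below the threshold."""
    return euclidean_distance(a, b) < threshold

# Hypothetical embeddings, shortened to 4 values for readability.
face_a = [0.10, -0.20, 0.05, 0.30]   # known image of person A
face_b = [0.12, -0.18, 0.07, 0.28]   # another image, close to face_a
face_c = [0.90, 0.40, -0.60, -0.10]  # an image far from face_a

print(euclidean_distance(face_a, face_b), is_same_person(face_a, face_b))
print(euclidean_distance(face_a, face_c), is_same_person(face_a, face_c))
print(cosine_similarity(face_a, face_b), cosine_similarity(face_a, face_c))
```

Note that the distance is computed once per pair of images, between the two whole vectors. Nothing is measured inside a single image.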
The question then is how this threshold was determined. He passed positive pairs (two images of the same identity) and negative pairs (images of different identities) to the ResNet model, obtained their representations, and computed the Euclidean distance for each pair.
Once you have distance values for positive and negative examples, you can feed them to a basic decision tree algorithm such as ID3, C4.5, CART or CHAID. It will find the split point on the distance that best separates same-person pairs from different-person pairs, and that split point is the threshold.
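As a rough illustration of that last step, the sketch below searches for the single split on the distance that gives the lowest weighted Gini impurity, which is what a depth-one CART-style tree would do. The distance values, labels, and function names are invented for the example; this is not dlib's actual tuning code or data.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(distances, labels):
    """Return the distance threshold with the lowest weighted Gini impurity."""
    pairs = sorted(zip(distances, labels))
    values = sorted(set(d for d, _ in pairs))
    best_threshold, best_impurity = None, float("inf")
    # Candidate thresholds are midpoints between consecutive distinct distances.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2.0
        left = [y for d, y in pairs if d <= threshold]
        right = [y for d, y in pairs if d > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold

# Hypothetical distances: label 1 = same person, label 0 = different people.
distances = [0.31, 0.42, 0.48, 0.55, 0.66, 0.72, 0.81, 0.93]
labels    = [1,    1,    1,    1,    0,    0,    0,    0]

# With this toy data the chosen split is the midpoint between 0.55 and 0.66.
print(best_split(distances, labels))
```

With real data the two groups overlap, so no threshold separates them perfectly and the chosen value is a trade-off between false matches and missed matches.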