I have a set of 1900+ 512-D facial embeddings (vectors), and I'd like to group the faces that belong to the same individual. The number of distinct faces is unknown.
I've employed sklearn.cluster.DBSCAN, similar to the suggestion in PyImageSearch's "Face Clustering with Python". However, it can't cluster effectively and returns 0 clusters. I believe the matrix is too sparse, and I think there are a couple of options:
I'm in the process of trying the different methodologies right now, but perhaps there is a well-known method/approach I'm missing?
First, I think it is important to check which similarity measure your face recognition engine uses to decide whether two embeddings belong to the same person. Some engines (for example SphereFace or ArcFace) use cosine similarity rather than Euclidean distance.
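To illustrate why the choice of measure matters, here is a minimal sketch with toy 4-D vectors standing in for real 512-D embeddings. The helper names (`l2_normalize`, `cosine_similarity`, `euclidean_distance`) and the toy values are my own assumptions for illustration, not part of any face recognition library.

```python
import numpy as np

def l2_normalize(x):
    """Scale vector(s) to unit length along the last axis."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)  # guard against division by zero

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores vector magnitude."""
    return float(np.dot(l2_normalize(a), l2_normalize(b)))

def euclidean_distance(a, b):
    """Straight-line distance; sensitive to vector magnitude."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.linalg.norm(diff))

# Toy vectors (hypothetical values, not real face embeddings).
a = np.array([1.0, 2.0, 0.0, 0.0])
b = np.array([10.0, 20.0, 0.0, 0.0])  # same direction as a, larger magnitude
c = np.array([0.0, 0.0, 3.0, 1.0])    # orthogonal to a

cos_ab = cosine_similarity(a, b)
cos_ac = cosine_similarity(a, c)
dist_ab = euclidean_distance(a, b)
dist_ac = euclidean_distance(a, c)

# On unit-length vectors the two measures are tied together:
# ||u - v||^2 = 2 - 2 * cos(u, v)
u, v = l2_normalize(a), l2_normalize(c)
squared_dist_uv = euclidean_distance(u, v) ** 2
```

Here cosine similarity treats `a` and `b` as the same direction, while raw Euclidean distance puts `c` closer to `a` than `b` is. If the engine was trained with a cosine-based loss, L2-normalizing the embeddings before any distance-based clustering is probably worth trying.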
Second, I would check what threshold on this similarity measure the face recognition engine uses to consider two embeddings to be of the same person. Usually it is chosen in order to balance true positives (TP) against false positives (FP) on a labeled dataset.
Using the two points above, I would use the following algorithm:
Note about the second point: you can find the threshold yourself using a face recognition dataset such as LFW or CelebA, and decide exactly how you want to tune it by balancing FP and TP.
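As a rough sketch of that tuning step, assuming you already have similarity scores for labeled pairs (same person or not), you could sweep candidate thresholds and keep the lowest one whose false positive rate stays under a budget. The function names, the `max_fpr` parameter, and the toy scores below are all hypothetical; real scores would come from running your engine on pairs from a labeled dataset.

```python
import numpy as np

def rates_at_threshold(similarities, same_person, threshold):
    """Return (TPR, FPR) when pairs with similarity >= threshold are called 'same'."""
    sims = np.asarray(similarities, dtype=float)
    labels = np.asarray(same_person, dtype=bool)
    predicted_same = sims >= threshold
    tp = np.sum(predicted_same & labels)
    fp = np.sum(predicted_same & ~labels)
    tpr = tp / max(labels.sum(), 1)      # fraction of true matches accepted
    fpr = fp / max((~labels).sum(), 1)   # fraction of non-matches accepted
    return float(tpr), float(fpr)

def pick_threshold(similarities, same_person, max_fpr=0.01):
    """Lowest observed score whose FPR is within max_fpr, or None if none qualifies.

    FPR and TPR both shrink as the threshold rises, so the lowest
    qualifying threshold is also the one with the highest TPR.
    """
    for t in np.unique(similarities):  # np.unique returns values in ascending order
        tpr, fpr = rates_at_threshold(similarities, same_person, t)
        if fpr <= max_fpr:
            return float(t), tpr, fpr
    return None

# Toy labeled pairs (hypothetical scores, not measured on LFW or CelebA).
toy_sims = [0.92, 0.85, 0.78, 0.60, 0.81, 0.55, 0.40, 0.30, 0.10]
toy_same = [True, True, True, True, False, False, False, False, False]

strict = pick_threshold(toy_sims, toy_same, max_fpr=0.0)
relaxed = pick_threshold(toy_sims, toy_same, max_fpr=0.2)
```

A stricter FP budget pushes the threshold up and costs you true positives, which is exactly the trade-off you have to decide on for your use case.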