
Clustering 512-D Facial Embeddings/Vectors


I have a set of 1900+ 512-D facial embeddings/vectors, and I'd like to group all similar individuals/faces. There is also an unknown number of distinct faces.

I've employed sklearn.cluster.DBSCAN, similar to the suggestion in PyImageSearch's Face Clustering with Python (a sketch of my attempt follows the list below). However, it fails to cluster effectively, returning 0 clusters; I believe the points are too sparse in 512 dimensions. I see a few options:

  • Calculate the Euclidean similarity for every pair of the 1900 embeddings - slow, even with matrix multiplication, but it works
  • Apply dimensionality reduction (PCA) down to 128-D vectors and try DBSCAN again
  • Use Nearest-Neighbors - but I would have to know how many distinct people there are beforehand
  • Chinese Whispers clustering
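
A minimal sketch of the sort of DBSCAN call I'm using; the `eps` and `min_samples` values here are placeholders and would need tuning per embedding model:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import normalize

embeddings = np.random.rand(1900, 512).astype(np.float32)  # placeholder for real embeddings

# L2-normalize and cluster on cosine distance; eps=0.5 is a guess
X = normalize(embeddings)
labels = DBSCAN(eps=0.5, min_samples=3, metric="cosine").fit_predict(X)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"clusters: {n_clusters}, noise points: {int((labels == -1).sum())}")
```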

I'm in the process of trying these different methodologies right now, but perhaps there is a well-known method/approach I'm missing?


Solution

  • First, I think it is important to check which similarity measure your face recognition engine uses to decide whether two embeddings are of the same person. Some engines use cosine similarity rather than Euclidean distance (for example, SphereFace or ArcFace).

    Second, I would check the threshold on this similarity measure that the face recognition engine uses to consider two embeddings the same person. It is usually chosen to balance true positives (TP) against false positives (FP) on a labeled dataset.
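
    For illustration, cosine similarity between two embeddings is just a normalized dot product; the `THRESHOLD` below is a placeholder assumption, not a value from any specific engine:

    ```python
    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        """Cosine similarity between two embedding vectors."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    emb_i, emb_j = np.random.rand(512), np.random.rand(512)  # placeholder embeddings
    THRESHOLD = 0.5  # assumption; take this from your engine or tune it yourself

    same_person = cosine_similarity(emb_i, emb_j) >= THRESHOLD
    ```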

    Using these two points, I would apply the following algorithm (a runnable sketch follows the list):

    1. Create a similarity matrix between all of the embeddings: a 1900×1900 matrix where entry (i, j) holds the similarity measured between embedding i and embedding j.
    2. Threshold the matrix using the appropriate value (second point above): each entry above the threshold becomes 1, each entry below becomes 0.
    3. Treat the thresholded matrix as the adjacency matrix of a graph and run a connected-components algorithm (using BFS or DFS). Each component corresponds to a unique identity.
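
    Putting steps 1-3 together, here is a minimal sketch in Python using cosine similarity; the `threshold=0.5` value is an assumption, use the value from the second point above:

    ```python
    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import connected_components
    from sklearn.preprocessing import normalize

    def cluster_by_components(embeddings: np.ndarray, threshold: float) -> np.ndarray:
        """Similarity matrix -> threshold -> connected components (steps 1-3)."""
        # Step 1: with L2-normalized rows, X @ X.T is the cosine-similarity matrix
        X = normalize(embeddings)
        sim = X @ X.T  # (N, N)

        # Step 2: binarize -- entries above the threshold get 1, below get 0
        adj = (sim >= threshold).astype(np.int8)

        # Step 3: treat it as an adjacency matrix; each connected component
        # corresponds to one unique identity
        n_ids, labels = connected_components(csr_matrix(adj), directed=False)
        return labels

    embeddings = np.random.rand(1900, 512).astype(np.float32)  # placeholder data
    labels = cluster_by_components(embeddings, threshold=0.5)
    print(f"{labels.max() + 1} distinct identities")
    ```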

    Note on the second point: you can find the threshold yourself using a face recognition dataset such as LFW or CelebA, and fine-tune it by balancing FP against TP.
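
    A rough sketch of that tuning loop, assuming you have already scored labeled same/different pairs from such a dataset; the similarity scores and labels below are random placeholders:

    ```python
    import numpy as np

    # Similarities and same-identity labels for labeled pairs (e.g. from LFW).
    # Random placeholders here -- replace with real scored pairs.
    sims = np.random.rand(10_000)
    is_same = np.random.randint(0, 2, size=10_000).astype(bool)

    for t in np.linspace(0.3, 0.9, 13):
        pred = sims >= t
        tp = int(np.sum(pred & is_same))   # true positives at this threshold
        fp = int(np.sum(pred & ~is_same))  # false positives at this threshold
        print(f"threshold={t:.2f}  TP={tp}  FP={fp}")
    ```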