machine-learning, computer-vision, face-recognition, facial-identification

How do I programmatically collect a huge face dataset for facial recognition?


I am working on a facial recognition system using FaceNet. The main difficulty is the data: I need labeled face images for at least 100K classes (identities). Eventually I want to store the encodings in a database for real-time face detection. Datasets such as 'Labeled Faces in the Wild' contain many face images, but the image quality and the number of images per class are inconsistent. I also looked into how FaceNet was trained and found that it was trained on a '1 million celebrity face dataset'. I assume I can't use that dataset, because it was used to train the FaceNet model I am trying to use in my project. So, my question is: how do I programmatically collect a face dataset? Thank you.


Solution

  • I recommend looking at the VGGFace2 dataset. It contains 3.3M face images of 9K+ identities.

    Another good one is FaceScrub. It contains about 100K face images of 530 identities.

    FaceNet was trained on neither VGGFace2 nor FaceScrub. Nowadays, many studies train models on those datasets and test them on the Labeled Faces in the Wild (LFW) dataset. LFW contains 13K face images of 5,749 identities.
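
Since the question is concerned about an uneven number of images per class, it may help to check that after downloading any of these datasets. Below is a minimal sketch, assuming the dataset is unpacked with one sub-folder per identity and the images inside it. The folder layout, the function names `images_per_identity` and `usable_identities`, and the `min_images` threshold are illustrative assumptions, not part of any dataset's official tooling.

```python
from collections import Counter
from pathlib import Path

# Assumption: images use one of these file extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def images_per_identity(root):
    """Count image files in each identity sub-folder of `root`.

    Assumes a layout like root/<identity>/<image file>.
    """
    counts = Counter()
    for identity_dir in Path(root).iterdir():
        if identity_dir.is_dir():
            counts[identity_dir.name] = sum(
                1
                for f in identity_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def usable_identities(counts, min_images=5):
    """Return the identities with at least `min_images` images, sorted by name."""
    return sorted(name for name, n in counts.items() if n >= min_images)
```

For example, `usable_identities(images_per_identity("path/to/dataset"), min_images=10)` would list only the identities that have ten or more images, where `"path/to/dataset"` is a placeholder for wherever the data was unpacked. The right threshold depends on the project.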