python opencv torchvision

Shapes of face encodings differ


I'm trying to make a face recognition program, but some of the face encodings are longer than others, so I'm getting the error

ValueError: setting an array element with a sequence.

Here's my code to generate the encodings

import cv2

class FaceEncoder():
    def __init__(self, files, singleton=False, model_path='./models/lbpcascade_animeface.xml', scale_factor=1.1, min_neighbours=1):
        self.singleton = singleton
        self.files = files
        self.model = model_path
        self.scale_factor = scale_factor
        self.min_neighbours = min_neighbours

    def encode(self, singleton=False):

        if not self.singleton:
            encodings = []
            labels = []

            # Load the cascade once instead of once per file
            cascade = cv2.CascadeClassifier(self.model)

            for file in self.files:
                image = cv2.imread(file)
                rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

                # faces is an (N, 4) array with one (x, y, w, h) box per detected face
                faces = cascade.detectMultiScale(rgb, self.scale_factor, self.min_neighbours)

                if len(faces) > 0:
                    print('Found face in ' + file)
                    # flatten() joins all N boxes into a single row of 4*N numbers
                    encodings.append(faces.flatten())
                    labels.append(file.split('/')[2])
                else:
                    print("Couldn't find face in " + file)

            return encodings, labels

Here are some of the encodings

[204  96 211 211]
[525 168 680 680]
[205  11 269 269]
[ 165   31  316  316 1098  181  179  179]
[ 113  422 1371 1371]
[ 71  86 183 183]
[209  19  33  33  88  27  60  60 133  80  65  65  68 117  52  52]
[117  77 149 149]
[ 63  77 284 284]
[370 222 490 490]
[433 112 114 114 183  98 358 358]
[ 44  35  48  48 192  34  48  48]
[210  82 229 229]
[429  90 153 153]
[318  50 174 174 118 142 120 120]
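The error can be reproduced with NumPy alone. This is a minimal sketch: the box values are copied from the output above (reading the 8-number row as two 4-number boxes), and the cascade itself is not needed to show the shape problem.

```python
import numpy as np

# Values copied from the output above. detectMultiScale returns one
# (x, y, w, h) row per detected face, so an image with two faces gives a
# (2, 4) array, and flatten() turns it into 8 numbers instead of 4.
one_face = np.array([[204, 96, 211, 211]])
two_faces = np.array([[165, 31, 316, 316], [1098, 181, 179, 179]])

rows = [one_face.flatten(), two_faces.flatten()]
lengths = [len(r) for r in rows]
print(lengths)  # [4, 8]

# Packing rows of different lengths into one numeric matrix fails.
raised = False
try:
    np.array(rows, dtype=float)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# One row per face (instead of one row per image) stacks cleanly.
per_face = np.vstack([one_face, two_faces])
print(per_face.shape)  # (3, 4)
```

The exact wording of the message depends on the NumPy version, but it is a ValueError either way.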

Solution

  • You should not put several detected rects into the same list entry. If several faces are found, put each one on its own row and add a label per face found (not per image).

    Also, what you have now are NOT "encodings", just bounding boxes (x, y, w, h rectangles).

    Read up on how to get real encodings (facenet, spherenet?). Then you need to:

    • crop the face region from the image
    • resize it to the network's input size (e.g. 96x96)
    • run it through the network to get the encoding
    • save that, along with its label, to a db/list
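For the first point, the per-image detections only need to be split into one row per box, with the image's label repeated for each face. A minimal sketch, assuming detections have already been collected as (label, boxes) pairs; `one_row_per_face` is a hypothetical helper and the labels are made up. Note the result is still just boxes, not encodings.

```python
import numpy as np

def one_row_per_face(detections):
    """Hypothetical helper: turn per-image detections into per-face rows.

    detections is a list of (label, boxes) pairs, where boxes is what
    detectMultiScale returned for that image: an (N, 4) array of
    (x, y, w, h) boxes, or an empty tuple when nothing was found.
    """
    rows, labels = [], []
    for label, boxes in detections:
        for box in np.asarray(boxes).reshape(-1, 4):
            rows.append(box)      # each face gets its own row...
            labels.append(label)  # ...and its own copy of the label
    return np.array(rows).reshape(-1, 4), labels


# Usage with made-up labels and boxes taken from the question's output.
detections = [
    ("label_a", np.array([[204, 96, 211, 211]])),
    ("label_b", np.array([[165, 31, 316, 316], [1098, 181, 179, 179]])),
    ("label_c", ()),  # image where no face was detected
]
boxes, labels = one_row_per_face(detections)
print(boxes.shape)  # (3, 4)
print(labels)       # ['label_a', 'label_b', 'label_b']
```

In the question's loop this amounts to appending each `box` in `faces` (plus a label) instead of `faces.flatten()`.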
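The crop / resize / embed / store steps can be sketched as below. This is an illustration under loud assumptions: it uses NumPy only, the nearest-neighbour resize stands in for `cv2.resize`, and `make_placeholder_embedder` is a hypothetical fixed random projection that is NOT a face model. A real network (e.g. FaceNet) would replace it; the 128-dimensional output is just an example size.

```python
import numpy as np

INPUT_SIZE = 96  # example network input size (96x96)


def crop_face(image, box):
    """Step 1: cut one (x, y, w, h) box out of an H x W x 3 image."""
    x, y, w, h = (int(v) for v in box)
    return image[y:y + h, x:x + w]


def resize_nearest(face, size=INPUT_SIZE):
    """Step 2: nearest-neighbour resize to size x size.
    In the real program, cv2.resize(face, (size, size)) does this."""
    h, w = face.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return face[rows][:, cols]


def make_placeholder_embedder(dim=128, seed=0):
    """Step 3 stand-in: a fixed random projection, NOT a face model.
    Swap in a real network that maps a face crop to a vector."""
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((INPUT_SIZE * INPUT_SIZE * 3, dim))

    def embed(face):
        pixels = face.astype(np.float64).reshape(-1) / 255.0
        vec = pixels @ proj
        return vec / (np.linalg.norm(vec) + 1e-12)  # unit-length vector

    return embed


def encode_faces(image, boxes, label, embed, db):
    """Steps 1-4 for every detected face: one (encoding, label) entry per face."""
    for box in np.asarray(boxes).reshape(-1, 4):
        face = resize_nearest(crop_face(image, box))
        db.append((embed(face), label))  # step 4: store with its label
    return db


# Usage with a synthetic image and two made-up boxes.
rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(200, 300, 3), dtype=np.uint8)
embed = make_placeholder_embedder()
db = encode_faces(image, [[10, 20, 50, 50], [100, 40, 80, 60]], "label_a", embed, [])
print(len(db), db[0][0].shape)  # 2 (128,)
```

Every entry in `db` has the same length regardless of how many faces an image contains, which is what makes the vectors comparable later.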