I have a video, https://www.youtube.com/watch?v=LdNrXndwyCc . I am trying to count the number of unknown people in this video, with the following restrictions:
Problem:
When a person smiles, or turns their face from front to left (or front to right), it detects the face as a new face and increases the unknown count.
It also stops detecting new faces properly after several frames.
Tried with Python 3, OpenCV (cv2), and the face_recognition Python library, on Ubuntu 18.04.
import math
import threading

import cv2
import face_recognition


class FaceCount:
    def __init__(self):
        self.unknown_count = 0

    def face_distance_to_conf(self, face_distance, face_match_threshold=0.6):
        # Map a face distance to a confidence score in [0, 1].
        if face_distance > face_match_threshold:
            span = 1.0 - face_match_threshold
            linear_val = (1.0 - face_distance) / (span * 2.0)
            return linear_val
        else:
            span = face_match_threshold
            linear_val = 1.0 - (face_distance / (span * 2.0))
            return linear_val + ((1.0 - linear_val) * math.pow((linear_val - 0.5) * 2, 0.2))

    def countFaceThread(self, facelist, face_encodings):
        matched_with_no_one = True
        for face_encoding in face_encodings:
            dup = False
            for face in facelist:
                match = face_recognition.compare_faces([face_encoding], face)[0]
                face_distance = face_recognition.face_distance([face_encoding], face)
                percent = self.face_distance_to_conf(face_distance[0])
                print(percent)
                if match and percent > 0.40:
                    dup = True
                    matched_with_no_one = False
                    break
            if not dup:
                self.unknown_count += 1
                print("unknown_count---->", self.unknown_count)
                facelist.append(face_encoding)
        # If nothing in this frame matched any stored encoding, start over
        # with the encodings from the current frame.
        if matched_with_no_one:
            print("clearing facelist....")
            facelist.clear()
            print("unknown_count---->", self.unknown_count)
            for f_encode in face_encodings:
                facelist.append(f_encode)

    def countUnknown(self):
        cap = cv2.VideoCapture('livetest.webm')
        cap.set(cv2.CAP_PROP_POS_MSEC, 30)
        facelist = []
        while cap.isOpened():
            try:
                ret, frame = cap.read()
                # face_recognition expects RGB; OpenCV delivers BGR.
                rgb_frame = frame[:, :, ::-1]
                face_locations = face_recognition.face_locations(rgb_frame)
                face_encodings = face_recognition.face_encodings(rgb_frame, face_locations)
                for (top, right, bottom, left) in face_locations:
                    cv2.rectangle(frame, (left, top), (right, bottom), (0, 0, 255), 2)
                cv2.imshow('frame', frame)
                if facelist and face_encodings:
                    t2 = threading.Thread(target=self.countFaceThread,
                                          args=(facelist, face_encodings))
                    t2.start()
                elif face_locations:
                    # First faces ever seen: count them all as unknown.
                    self.unknown_count += len(face_locations)
                    print("unknown people------->", self.unknown_count)
                    for face in face_encodings:
                        facelist.append(face)
                    continue
                if cv2.waitKey(1) & 0xFF == ord('q'):
                    break
            except Exception as e:
                print(e)
        # When everything is done, release the capture.
        cap.release()
        cv2.destroyAllWindows()


if __name__ == '__main__':
    t1 = threading.Thread(target=FaceCount().countUnknown)
    t1.start()
https://www.youtube.com/watch?v=LdNrXndwyCc : if you play this video, at around 0:02 the person should be treated as an unknown person, increasing the count by one. But it doesn't; the count only increases when the person is smiling.
This is not a Python version issue. The problem you want to solve is very challenging, and it lies in the detection and association parts: first, you might not get a detection at all; second, the detected face may not be associated with the same person in the next frame.
match=face_recognition.compare_faces([face_encoding],face)[0]
face_distance=face_recognition.face_distance([face_encoding],face)
If the distance to the target is too large or too small, you will get failed associations and false associations. In this case, you most likely have to improve the face-feature association accuracy by using a better distance function or face-encoding function.
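One low-effort improvement along those lines is to store several encodings per person and match a new face against the closest stored encoding, instead of a single pairwise comparison. A minimal sketch follows; the match_person helper, the known_people structure, and the 0.5 threshold are all illustrative assumptions, not part of the original code:

import face_recognition

def match_person(face_encoding, known_people, threshold=0.5):
    # known_people: one list of stored encodings per person (hypothetical
    # structure). Returns the index of the best-matching person, or None.
    best_idx, best_dist = None, threshold
    for idx, encodings in enumerate(known_people):
        # Distance to the closest stored encoding for this person; keeping
        # several encodings per person (frontal, turned, smiling) makes the
        # match more tolerant of pose and expression changes.
        dist = face_recognition.face_distance(encodings, face_encoding).min()
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx

If match_person returns None, count a new unknown person and store the encoding; otherwise, append the new encoding to the matched person's list so that later comparisons cover more poses.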
A couple of things you could do with minimal effort to improve the result.
Instead of
frame1 -> detect -> associate
frame2 -> detect -> associate
frame3 -> detect -> associate
...
Try
frame1 -> detect -> tracking
frame2 -> detect -> tracking -> associate
frame3 -> detect -> tracking -> associate
The tracking can be any method, such as a KCF or TLD tracker. These were originally implemented as single-target trackers, and there is work that extends them into multi-target trackers. You can find them on GitHub.
With tracking in place, even when you have multiple people in the frame, there will be fewer false or failed associations.
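A rough sketch of that detect-then-track loop with OpenCV's built-in KCF tracker is below. Treat it as an outline, not drop-in code: the tracker constructor lives under cv2 in OpenCV 3.x but under cv2.legacy in OpenCV 4.5+, the every-10th-frame detection interval is an arbitrary choice, and the overlap test between new detections and already tracked boxes is omitted:

import cv2
import face_recognition

cap = cv2.VideoCapture('livetest.webm')
trackers = []  # one tracker per face currently being followed

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Update existing trackers first: a face that is still tracked keeps
    # its identity, so a smile or a head turn does not create a new face.
    trackers = [t for t in trackers if t.update(frame)[0]]

    # Run the expensive detector only every 10th frame, and only spawn a
    # tracker for detections that do not overlap an already tracked box
    # (overlap test omitted for brevity).
    if int(cap.get(cv2.CAP_PROP_POS_FRAMES)) % 10 == 0:
        rgb_frame = frame[:, :, ::-1]
        for (top, right, bottom, left) in face_recognition.face_locations(rgb_frame):
            tracker = cv2.TrackerKCF_create()  # cv2.legacy.TrackerKCF_create() on OpenCV >= 4.5
            tracker.init(frame, (left, top, right - left, bottom - top))
            trackers.append(tracker)

cap.release()

Counting then happens once per tracked face, when its tracker is created; the face_recognition encodings only need to be compared when a tracker is lost and a new detection appears, which is where the association step from above comes back in.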
The other thing you can try is skeleton detection/tracking/association for the scene. Frankly speaking, I can't really differentiate between the left and the right boy, especially when they are not facing the camera directly; there could be multiple cases of failed detection/tracking/association.
However, we encounter this kind of pose/detection/association question frequently in skeleton detection, where people can move, dance, and change poses all the time. There are many open-source skeleton detection and tracking packages on GitHub as well.
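For instance, MediaPipe Pose is one such package, chosen here purely as an illustration (it handles a single person per frame, so a multi-person scene needs a person detector first; the file name is hypothetical):

import cv2
import mediapipe as mp

pose = mp.solutions.pose.Pose()
frame = cv2.imread('frame.png')  # any video frame (hypothetical file name)
results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
if results.pose_landmarks:
    # 33 body landmarks (nose, shoulders, hips, ...) that can be tracked
    # across frames even when the face is turned away from the camera.
    for landmark in results.pose_landmarks.landmark:
        print(landmark.x, landmark.y, landmark.visibility)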
Depending on how much effort you want to put into this, there could be many other solutions.