I have a video, https://www.youtube.com/watch?v=LdNrXndwyCc . I am trying to count the number of unknown people in this video, with the following restrictions:
Problem:
When a person smiles, or turns their face from front to left (or front to right), it detects the face as a new face and increases the unknown count.
It also stops detecting new faces properly after several frames.
Tried with Python 3, OpenCV (cv2), and the face_recognition Python library, on Ubuntu 18.04.
import math
import threading

import cv2
import face_recognition


class FaceCount:
    def __init__(self):
        self.unknown_count = 0

    def face_distance_to_conf(self, face_distance, face_match_threshold=0.6):
        # Map a face distance to a confidence score in [0, 1].
        if face_distance > face_match_threshold:
            span = 1.0 - face_match_threshold
            linear_val = (1.0 - face_distance) / (span * 2.0)
            return linear_val
        else:
            span = face_match_threshold
            linear_val = 1.0 - (face_distance / (span * 2.0))
            return linear_val + ((1.0 - linear_val) * math.pow((linear_val - 0.5) * 2, 0.2))

    def countFaceThread(self, facelist, face_encodings):
        matched_with_no_one = True
        for face_encoding in face_encodings:
            dup = False
            for face in facelist:
                match = face_recognition.compare_faces([face_encoding], face)[0]
                face_distance = face_recognition.face_distance([face_encoding], face)
                percent = self.face_distance_to_conf(face_distance[0])
                print(percent)
                if match and percent > 0.40:
                    dup = True
                    matched_with_no_one = False
                    break
            if not dup:
                self.unknown_count += 1
                print("unknown_count---->", self.unknown_count)
                facelist.append(face_encoding)
        # If nothing in this frame matched any stored encoding, start over
        # with the encodings from the current frame.
        if matched_with_no_one:
            print("clearing facelist....")
            facelist.clear()
            print("unknown_count---->", self.unknown_count)
            for f_encode in face_encodings:
                facelist.append(f_encode)

    def countUnknown(self):
        cap = cv2.VideoCapture('livetest.webm')
        cap.set(cv2.CAP_PROP_POS_MSEC, 30)
        facelist = []
        while cap.isOpened():
            try:
                ret, frame = cap.read()
                # face_recognition expects RGB; OpenCV delivers BGR.
                rgb_frame = frame[:, :, ::-1]
                face_locations = face_recognition.face_locations(rgb_frame)
                face_encodings = face_recognition.face_encodings(rgb_frame, face_locations)
                for (top, right, bottom, left) in face_locations:
                    cv2.rectangle(frame, (left, top), (right, bottom), (0, 0, 255), 2)
                cv2.imshow('frame', frame)
                if facelist and face_encodings:
                    t2 = threading.Thread(target=self.countFaceThread,
                                          args=(facelist, face_encodings))
                    t2.start()
                elif face_locations:
                    # First faces ever seen: count them all as unknown.
                    self.unknown_count += len(face_locations)
                    print("unknown people------->", self.unknown_count)
                    for face in face_encodings:
                        facelist.append(face)
                    continue
                if cv2.waitKey(1) & 0xFF == ord('q'):
                    break
            except Exception as e:
                print(e)
        # When everything is done, release the capture.
        cap.release()
        cv2.destroyAllWindows()


if __name__ == '__main__':
    t1 = threading.Thread(target=FaceCount().countUnknown)
    t1.start()
https://www.youtube.com/watch?v=LdNrXndwyCc : if you play this video, at around 0:02 the person should be treated as an unknown person, increasing the count by one. But it doesn't; the count only increases when the person is smiling.
This is not a Python version issue. The problem you want to solve is very challenging, and it lies in the detection and association parts: first, you might not get a detection at all; second, the detected face may not be associated with the same person in the next frame.
match=face_recognition.compare_faces([face_encoding],face)[0]
face_distance=face_recognition.face_distance([face_encoding],face)
If the distance to the target is too large or too small, you will get failed associations and false associations. In this case, you most likely have to improve the face-feature association accuracy by using a better distance function or face-encoding function.
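One low-effort improvement along those lines is to store several encodings per person and match a new face against the closest stored encoding, instead of a single pairwise comparison. A minimal sketch follows; the match_person helper, the known_people structure, and the 0.5 threshold are all illustrative assumptions, not part of the original code:

import face_recognition

def match_person(face_encoding, known_people, threshold=0.5):
    # known_people: one list of stored encodings per person (hypothetical
    # structure). Returns the index of the best-matching person, or None.
    best_idx, best_dist = None, threshold
    for idx, encodings in enumerate(known_people):
        # Distance to the closest stored encoding for this person; keeping
        # several encodings per person (frontal, turned, smiling) makes the
        # match more tolerant of pose and expression changes.
        dist = face_recognition.face_distance(encodings, face_encoding).min()
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx

If match_person returns None, count a new unknown person and store the encoding; otherwise, append the new encoding to the matched person's list so that later comparisons cover more poses.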
A couple of things you could do with minimal effort to improve the result.
Instead of
frame1 -> detect -> associate
frame2 -> detect -> associate
frame3 -> detect -> associate
...
Try
frame1 -> detect -> tracking
frame2 -> detect -> tracking -> associate
frame3 -> detect -> tracking -> associate
The tracking can be any method, such as a KCF or TLD tracker. These were originally implemented as single-target trackers, and there is work that extends them into multi-target trackers. You can find them on GitHub.
With tracking in place, even when you have multiple people in the frame, there will be fewer false or failed associations.
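A rough sketch of that detect-then-track loop with OpenCV's built-in KCF tracker is below. Treat it as an outline, not drop-in code: the tracker constructor lives under cv2 in OpenCV 3.x but under cv2.legacy in OpenCV 4.5+, the every-10th-frame detection interval is an arbitrary choice, and the overlap test between new detections and already tracked boxes is omitted:

import cv2
import face_recognition

cap = cv2.VideoCapture('livetest.webm')
trackers = []  # one tracker per face currently being followed

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Update existing trackers first: a face that is still tracked keeps
    # its identity, so a smile or a head turn does not create a new face.
    trackers = [t for t in trackers if t.update(frame)[0]]

    # Run the expensive detector only every 10th frame, and only spawn a
    # tracker for detections that do not overlap an already tracked box
    # (overlap test omitted for brevity).
    if int(cap.get(cv2.CAP_PROP_POS_FRAMES)) % 10 == 0:
        rgb_frame = frame[:, :, ::-1]
        for (top, right, bottom, left) in face_recognition.face_locations(rgb_frame):
            tracker = cv2.TrackerKCF_create()  # cv2.legacy.TrackerKCF_create() on OpenCV >= 4.5
            tracker.init(frame, (left, top, right - left, bottom - top))
            trackers.append(tracker)

cap.release()

Counting then happens once per tracked face, when its tracker is created; the face_recognition encodings only need to be compared when a tracker is lost and a new detection appears, which is where the association step from above comes back in.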
The other thing you can try is skeleton detection/tracking/association for the scene. Frankly speaking, I can't really differentiate between the left and the right boy, especially when they are not facing the camera directly; there could be multiple cases of failed detection/tracking/association.
However, we encounter this kind of pose/detection/association question frequently in skeleton detection, where people can move, dance, and change poses all the time. There are many open-source skeleton detection and tracking packages on GitHub as well.
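For instance, MediaPipe Pose is one such package, chosen here purely as an illustration (it handles a single person per frame, so a multi-person scene needs a person detector first; the file name is hypothetical):

import cv2
import mediapipe as mp

pose = mp.solutions.pose.Pose()
frame = cv2.imread('frame.png')  # any video frame (hypothetical file name)
results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
if results.pose_landmarks:
    # 33 body landmarks (nose, shoulders, hips, ...) that can be tracked
    # across frames even when the face is turned away from the camera.
    for landmark in results.pose_landmarks.landmark:
        print(landmark.x, landmark.y, landmark.visibility)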
Depending on how much effort you want to put into this, there could be many other solutions.