I need face detection for my homework. After searching online, I found that using a pre-trained deep learning face detector with OpenCV's DNN module is easy and works well. I learned it from this tutorial: https://www.pyimagesearch.com/2018/02/26/face-detection-with-opencv-and-deep-learning/ . However, I am confused about the 4D array returned by net.forward():
    import cv2

    net = cv2.dnn.readNetFromCaffe("deploy.prototxt", "res10_300x300_ssd_iter_140000_fp16.caffemodel")

    def detect_img(net, image):
        # Resize to 300x300 and subtract the per-channel mean values.
        blob = cv2.dnn.blobFromImage(image, 1.0, (300, 300), (104.0, 177.0, 123.0), False, False)
        net.setInput(blob)
        detections = net.forward()  # Here is the 4D array.
        print(detections.shape)
        return show_detections(image, detections)
I know almost nothing about deep learning. I guessed at a few things by reading "deploy.prototxt", which seems to be the configuration file of the pre-trained model, but I am still confused. Is there a quick way to understand the meaning of the 4D array? And with little knowledge of deep learning, could I get a rough understanding of how the pre-trained model works within a week?
The 3rd dimension lets you iterate over the predictions (one entry per candidate box), and
the 4th dimension holds the actual results for each prediction:
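class_label = int(inference_results[0, 0, i, 1])
--> gives the class ID (an integer index, not a one-hot vector) of the i-th box
conf = inference_results[0, 0, i, 2]
--> gives the confidence of the i-th box prediction
TopLeftX, TopLeftY, BottomRightX, BottomRightY = inference_results[0, 0, i, 3:7]
--> gives the corner coordinates of the bounding box, normalized to the range 0 to 1; multiply the X values by the image width and the Y values by the image height to get pixel coordinates
(index 0 of the 4th dimension is the index of the image within the batch)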
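For this model the 1st and 2nd dimensions both have size 1 (the shape is (1, 1, N, 7)), so you always index them with 0. The 2nd dimension would only matter for a model that stacks predictions from more than one stage there. Note that YOLO, which makes predictions at 3 different layers, is usually read in OpenCV as separate output arrays, one per output layer, rather than through this dimension.
If a model did use the 2nd dimension, you could iterate over its predictions with [:, i, :, :].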
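To make the indexing concrete, here is a minimal sketch of how such an array could be unpacked. It assumes the output has shape (1, 1, N, 7) with normalized coordinates, as described above. The function name parse_detections, its parameters, and the hand-made array (a stand-in for the real net.forward() result, so the snippet runs without the model files) are all hypothetical and not part of OpenCV.

```python
import numpy as np

def parse_detections(detections, image_width, image_height, conf_threshold=0.5):
    """Turn a (1, 1, N, 7) detection array into a list of pixel-space boxes."""
    results = []
    # Assumption: coordinates are normalized to [0, 1], so scale by image size.
    scale = np.array([image_width, image_height, image_width, image_height])
    for i in range(detections.shape[2]):  # 3rd dimension: one entry per box
        class_id = int(detections[0, 0, i, 1])
        confidence = float(detections[0, 0, i, 2])
        if confidence < conf_threshold:
            continue  # skip weak predictions
        x1, y1, x2, y2 = (detections[0, 0, i, 3:7] * scale).astype(int)
        results.append({
            "class_id": class_id,
            "confidence": confidence,
            "box": (int(x1), int(y1), int(x2), int(y2)),
        })
    return results

# Hand-made array in place of net.forward(): two candidate boxes.
fake = np.zeros((1, 1, 2, 7), dtype=np.float32)
fake[0, 0, 0] = [0, 1, 0.9, 0.25, 0.5, 0.75, 1.0]  # confident detection
fake[0, 0, 1] = [0, 1, 0.1, 0.0, 0.0, 0.5, 0.5]    # below the threshold

boxes = parse_detections(fake, image_width=400, image_height=200)
print(boxes)
```

With a real model you would pass the result of net.forward() together with the width and height of the original image (not the 300x300 blob), since the coordinates are relative to the image size.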