Tags: opencv3.0, object-detection, cascade-classifier, opencv, python

Meaning of Parameters of detectMultiScale(a, b, c)


OpenCV-Python version 3.4.1

I am trying to detect multiple objects through a camera: faces, eyes, a spoon, and a pen. The spoon and pen detectors are specific, i.e. they should only detect the particular pen and spoon I trained them on. Faces and eyes, however, are detected generically, because I use the '.xml' cascade files for face and eye detection that ship with OpenCV-Python.

My question is about the code. One line in my code below calls detectMultiScale(gray, 1.3, 10). I read the documentation but still could not clearly understand the last two parameters in that call.

My code:

# with camera feed
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')
spoon_cascade = cv2.CascadeClassifier('SpoonCascade.xml')
pen_cascade = cv2.CascadeClassifier('PenCascade.xml')

cap = cv2.VideoCapture('link')

while True:
    ret, img = cap.read()
    if not ret:
        # Stop when the stream ends or a frame cannot be read
        break
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    faces = face_cascade.detectMultiScale(gray, 1.3, 5)

    spoons = spoon_cascade.detectMultiScale(gray, 1.3, 10)

    pens = pen_cascade.detectMultiScale(gray, 1.3, 10)

    for (x, y, w, h) in spoons:
        font = cv2.FONT_HERSHEY_SIMPLEX
        cv2.putText(img, 'Spoon', (x-w, y-h), font, 0.5, (0, 255, 255), 2,
                    cv2.LINE_AA)
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)

    for (x, y, w, h) in pens:
        font = cv2.FONT_HERSHEY_SIMPLEX
        cv2.putText(img, 'Pen', (x-w, y-h), font, 0.5, (0, 255, 255), 2,
                    cv2.LINE_AA)
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)

    for (x, y, w, h) in faces:
        font = cv2.FONT_HERSHEY_SIMPLEX
        cv2.putText(img, 'Face', (x + w, y + h), font, 0.5, (0, 255, 255), 2,
                    cv2.LINE_AA)
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
        roi_color = img[y:y + h, x:x + w]
        roi_gray = gray[y:y + h, x:x + w]
        eyes = eye_cascade.detectMultiScale(roi_gray)

        for (ex, ey, ew, eh) in eyes:
            cv2.rectangle(roi_color, (ex, ey), (ex + ew, ey + eh),
                          (0, 0, 255), 2)

    cv2.imshow('Voila', img)
    cv2.imwrite('KuchhToDetected.jpg', img)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

My question:

  1. Is adjusting these last two parameters just a matter of trial and error, or can one know how to change them according to the images?

  2. In my view, these two parameters are really significant and make detection very sensitive, since they affect the false-positive rate. How do I set them properly to reduce false positives?

These are really important parameters for object detection, so a definitive answer would benefit everyone.

Thank you.


Solution

  • Did you get the code (including the call to detectMultiScale) from somewhere, or write it yourself?

    "Is it just a matter of trial and error adjusting these last two parameters or can one know how to change them according to the images?"

    There is some trial and error in fine-tuning, but you should understand all the parameters and choose initial values that give a good level of performance. You can then fine-tune automatically, i.e. iteratively re-train and re-test with different parameter values and check whether detection improves or worsens (but beware of overfitting). Since the parameters form a large multi-dimensional space, finding good values by random guessing is not practical.
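
    As a rough illustration of that systematic approach, here is a minimal grid-search sketch over the two detection parameters. The `evaluate` callable is hypothetical: it stands for whatever code you write to run the detector on a labelled validation set and count errors. Weighting false positives and false negatives equally is also just an assumption.

```python
from itertools import product


def grid_search(evaluate, scale_factors, min_neighbors_values):
    """Try every (scaleFactor, minNeighbors) pair and return the best one.

    `evaluate` is a hypothetical callable supplied by you. It should run the
    detector on a labelled validation set with the given parameters and
    return a tuple (false_positives, false_negatives).
    """
    best_params, best_errors = None, None
    for scale_factor, min_neighbors in product(scale_factors, min_neighbors_values):
        false_pos, false_neg = evaluate(scale_factor, min_neighbors)
        errors = false_pos + false_neg  # equal weighting is an assumption
        if best_errors is None or errors < best_errors:
            best_params, best_errors = (scale_factor, min_neighbors), errors
    return best_params, best_errors


def toy_evaluate(scale_factor, min_neighbors):
    """Made-up error counts, only here so the sketch can be run as-is."""
    false_pos = max(0, 12 - 2 * min_neighbors)
    false_neg = min_neighbors + round(10 * (scale_factor - 1.1))
    return false_pos, false_neg


best_params, best_errors = grid_search(toy_evaluate, [1.1, 1.2, 1.3], [3, 5, 10])
print(best_params, best_errors)
```

    Evaluate on images that were not used for training, otherwise the chosen values will be overfitted to the training set.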

    Looking at the Python OpenCV bindings, the two numeric parameters you pass are scaleFactor and minNeighbors, respectively. There is a good explanation of minNeighbors in this question: "OpenCV detectMultiScale() minNeighbors parameter". Setting it higher should reduce your false positives, as described there, at the cost of possibly missing some true objects.
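
    To make the idea behind minNeighbors concrete, here is a simplified pure-Python sketch. It is not OpenCV's actual grouping code: the overlap measure, the 0.5 threshold, and the helper names `iou` and `group_detections` are all assumptions for illustration. Raw hits that overlap are grouped, and a group is kept only if each hit in it has at least `min_neighbors` other hits alongside it.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def group_detections(rects, min_neighbors, min_iou=0.5):
    """Group overlapping raw hits and drop groups with too few neighbours."""
    groups = []
    for rect in rects:
        for group in groups:
            # Compare against the first rectangle of each existing group
            if iou(rect, group[0]) >= min_iou:
                group.append(rect)
                break
        else:
            groups.append([rect])

    kept = []
    for group in groups:
        if len(group) - 1 >= min_neighbors:
            n = len(group)
            # Report the average rectangle of the group
            kept.append(tuple(round(sum(r[i] for r in group) / n) for i in range(4)))
    return kept


raw_hits = [
    (100, 100, 50, 50), (102, 101, 50, 50), (98, 99, 52, 52), (101, 103, 49, 49),
    (300, 200, 40, 40),  # isolated hit, likely a false positive
]
print(group_detections(raw_hits, min_neighbors=3))
```

    A real object tends to trigger many overlapping hits at nearby positions and scales, while a spurious hit is usually isolated, which is why raising the threshold removes the lone rectangle first.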

    The scaleFactor parameter determines a trade-off between detection accuracy and speed. The detection window starts out at size minSize, and after testing all windows of that size, the window is scaled up by scaleFactor and re-tested, and so on until the window reaches or exceeds maxSize. If scaleFactor is large (e.g. 2.0), there will be fewer steps, so detection is faster, but you may miss objects whose size falls between two tested scales. Haar-like features are inherently robust to small variations in scale, however, so there is no need to make scaleFactor very small (e.g. 1.001); that just wastes time on needless steps. That is why the default is 1.1 rather than something much smaller, and why a somewhat larger value such as your 1.3 is a common choice when speed matters.
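
    The effect on the number of steps is easy to see with a little arithmetic. The sketch below only lists the window sizes implied by the description above; `window_sizes` is a hypothetical helper, and the 24 and 480 pixel limits are made-up example values.

```python
def window_sizes(min_size, max_size, scale_factor):
    """List the window side lengths tested between min_size and max_size."""
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    sizes = []
    size = float(min_size)
    while size <= max_size:
        sizes.append(round(size))
        size *= scale_factor  # grow the window by scale_factor each step
    return sizes


for factor in (2.0, 1.3, 1.1):
    sizes = window_sizes(24, 480, factor)
    print(factor, len(sizes), sizes[:5])
```

    With a factor of 2.0 only the sizes 24, 48, 96, 192 and 384 are tested, so an object about 130 pixels wide sits between two scales; with 1.1 there are several times as many scales to scan.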

    Setting minSize and maxSize is also important to maximise detection speed. Don't test windows that are smaller or larger than the size range you expect given your setup. So you should specify those in your call.
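
    One possible way to pass a size range is sketched below. `detect_in_size_range` and its `min_frac` / `max_frac` arguments are hypothetical; they assume you know roughly what fraction of the frame's shorter side the object occupies. The keyword names scaleFactor, minNeighbors, minSize and maxSize are the ones used by detectMultiScale in the Python bindings.

```python
def detect_in_size_range(cascade, gray, min_frac=0.1, max_frac=0.6,
                         scale_factor=1.3, min_neighbors=10):
    """Run a cascade only over window sizes you actually expect.

    `cascade` is a cv2.CascadeClassifier and `gray` a grayscale frame.
    `min_frac` and `max_frac` are assumed bounds on the object size,
    expressed as fractions of the frame's shorter side.
    """
    height, width = gray.shape[:2]
    short_side = min(height, width)
    min_side = max(1, round(short_side * min_frac))
    max_side = max(min_side, round(short_side * max_frac))
    return cascade.detectMultiScale(
        gray,
        scaleFactor=scale_factor,
        minNeighbors=min_neighbors,
        minSize=(min_side, min_side),
        maxSize=(max_side, max_side),
    )
```

    In the question's loop this could replace a direct call, for example `pens = detect_in_size_range(pen_cascade, gray)`, with the fractions adjusted to your camera setup.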

    To be honest, I don't see Haar cascade classifiers being that good at detecting pens or spoons in unknown orientations (if that is your use case). Pens are long and thin, which is poorly suited to a square detection window. You may have more success with LINEMOD, for example.

    "According to me these two are really significant and make the code very sensitive as it affects false positives. How do I set them properly to reduce false positives?"

    As long as your false negative rate and speed are OK, don't play with scaleFactor. Instead, work on improving your training data to reduce your high false positive rate. If speed falls to unacceptable levels while doing that (because the cascade grows to include too many classifier stages), revisit scaleFactor.