Search code examples
algorithmface-detectionface-recognitionhaar-classifierviola-jones

Viola Jones threshold value Haar features error value


I have read the viola paper from 2004. In 3.1 they explain the threshold calculation. But I am super confused. It reads as

For each feature, the examples are sorted based on feature value

Question1) Is sorted list a list of haar feature values calculated from integral image of examples. So if we have a feature and 10 images(positive and negative). we get 10 results associated with each input image.

The AdaBoost optimal threshold for that feature can then be computed in a single pass over this sorted list. For each element in the sorted list, four sums are maintained and evaluated: the total sum of positive example weights T +, the total sum of negative example weights T −, the sum of positive weights below the current example S+ and the sum of negative weights below the current example S−

Question 2) what is the purpose of sorting. I guess the one with the highest is the one describes the image best. But algorithmically how does it affect (S- S+ T+ T-).

Question3) Now for a sorted list we calculate (S- S+ T+ T-). Does this mean each entry holds its own (S- S+ T- T+) or is there only One (S- S+ T- T+) for the whole list.

The error for a threshold which splits the range between the current and previous example in the sorted list is: e = min ( S+ + (T − − S−), S− + (T + − S+)) ,

Question4) This somewhat answers my previous question but I am not sure. So in order for us have "e "for each input image. We need to maintain (S- S+ T- T+) for each entry in the list. But what do we do with "e" after we calculate N of them (one for each image) for that feature.

Thanks in advance, Please let me know if this is confusing or you need more clarification for my questions.


Solution

  • Question1) Is sorted list a list of haar feature values calculated from integral image of examples. So if we have a feature and 10 images(positive and negative). We get 10 results associated with each input image.

    You get 10 results for the feature, one result associated with each input image. Each image is marked as positive or negative.

    Question 2) what is the purpose of sorting. I guess the one with the highest is the one describes the image best. But algorithmically how does it affect (S- S+ T+ T-).

    The image with the highest is the one with the highest response for that feature. You sort based on the response, not based on the weight.

    The reason you sort them is that two of the things you are trying to calculate are "the sum of positive weights below the current example S+ and the sum of negative weights below the current example S−". If the list is sorted then you can keep a running sum, and at each point, all the examples whose weights you have added to the sum until then will have a feature response less than (i.e. "below") the current example. That doesn't work if the list isn't sorted. Then you can evaluate the error associated with using the response level halfway between that example and the next one as a threshold.

    Question3) Now for a sorted list we calculate (S- S+ T+ T-). Does this mean each entry holds its own (S- S+ T- T+) or is there only One (S- S+ T- T+) for the whole list.

    There will be one S- and one S+ per example, because it's "the sum of positive weights below the current example". T+ and T- are calculated for the whole list, I don't know why they say you need to maintain it for each element.

    Question4) This somewhat answers my previous question but I am not sure. So in order for us have "e "for each input image. We need to maintain (S- S+ T- T+) for each entry in the list. But what do we do with "e" after we calculate N of them (one for each image) for that feature.

    You chose the minimum out of all of them, and that's the optimal place to put the threshold (which will be the midpoint of the responses of those two examples), because it has the minimum error (false positives + false negatives). BTW, the reason there are two choices at each point (i.e. e = min ( S+ + (T − − S−), S− + (T + − S+)) ) is that you can choose whether to make the threshold so that values above that response level are considered positive (the first term), or values below it are considered positive.

    If it's the former, then S+ are your false positives, (T- - S-) are your false negatives. If it's the latter, then S- are your false negatives and (T+ - S+) are your false positives.