Haar Classifier Clarification

I am trying to understand how Haar classifiers work. I am reading the OpenCV documentation here: http://docs.opencv.org/modules/objdetect/doc/cascade_classification.html and it seems like you basically train on a set of data to get something like a template. Then you lay the template over the actual image that you want to check and go through each pixel to see whether it is likely or unlikely to be what you are looking for. Assuming that is right, I got up to the point where I was looking at the image below, which I did not understand. Are the blocks supposed to represent regions of "likely" and "unlikely"? Thanks in advance.

[Image: the Haar-like feature patterns from the OpenCV documentation, with labels such as 1a and 3a]

Solution

These patterns are the features that are evaluated on your training images. For example, for feature 1a the training process finds square regions in all your training images where the left half is usually brighter than the right half (or vice versa). For feature 3a, the training finds square regions where the center is darker than its surroundings.
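To make "the left half is brighter than the right half" concrete, here is a minimal sketch in plain Python. The function names and the toy patch are made up for illustration, and the sign convention (left minus right) is an assumption; OpenCV's own implementation differs in detail.

```python
def region_sum(img, x, y, w, h):
    """Sum of the pixel values in the w-by-h rectangle with top-left corner (x, y)."""
    return sum(sum(row[x:x + w]) for row in img[y:y + h])

def edge_feature_1a(img, x, y, w, h):
    """Left-half sum minus right-half sum; positive means the left half is brighter."""
    half = w // 2
    left = region_sum(img, x, y, half, h)
    right = region_sum(img, x + half, y, half, h)
    return left - right

# Toy 4x4 grayscale patch: bright on the left, dark on the right.
patch = [
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [200, 200, 10, 10],
]

value = edge_feature_1a(patch, 0, 0, 4, 4)
print(value)
```

A large positive value means "left clearly brighter", a value near zero means "no edge here", and a large negative value means the opposite contrast.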

The particular features you depicted were chosen for the Haar cascade not because they are particularly good features, but mainly because they are extremely fast to evaluate.
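The usual reason rectangle features like these are so cheap is the integral image (summed-area table): once it is built, the sum over any rectangle takes four lookups, regardless of the rectangle's size. The sketch below assumes a grayscale image stored as a list of rows; the helper names are illustrative.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over all rows < y and all columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum over the w-by-h rectangle at (x, y) using four table lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

img = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(img)
bottom_right_2x2 = rect_sum(ii, 1, 1, 2, 2)  # 5 + 6 + 8 + 9
whole_image = rect_sum(ii, 0, 0, 3, 3)
```

A two-rectangle feature such as 1a then costs a handful of lookups and subtractions per candidate region.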

More specifically, the training of a Haar cascade finds the one feature that best differentiates your positive and negative training images (roughly, the feature that is most often true for the positive images and most often false for the negative images). That feature will be the first stage of the resulting Haar cascade. The second best feature will be the second stage, and so on.
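As a rough illustration of "the feature that best separates positives from negatives", the sketch below tries every threshold for every feature and keeps the combination with the fewest misclassified training samples. It is a simplification: it counts plain errors, whereas real training weights the samples, and the feature values in the table are invented.

```python
def best_stump(values, labels):
    """Return (errors, threshold, polarity) for the best single-threshold rule.

    polarity +1 predicts 'positive' when value > threshold,
    polarity -1 predicts 'positive' when value <= threshold.
    """
    best = None
    for t in sorted(set(values)):
        for polarity in (1, -1):
            errors = 0
            for v, label in zip(values, labels):
                predicted = (v > t) if polarity == 1 else (v <= t)
                if predicted != label:
                    errors += 1
            if best is None or errors < best[0]:
                best = (errors, t, polarity)
    return best

def pick_feature(feature_table, labels):
    """Choose the feature whose best stump makes the fewest errors."""
    scored = {name: best_stump(vals, labels) for name, vals in feature_table.items()}
    name = min(scored, key=lambda n: scored[n][0])
    return name, scored[name]

# Three positive samples followed by three negative samples (made-up values).
labels = [True, True, True, False, False, False]
feature_table = {
    "1a": [50, 60, 55, 10, 20, 58],
    "3a": [5, 7, 6, 1, 2, 0],
}
chosen, stump = pick_feature(feature_table, labels)
```

Here feature "3a" separates the two groups perfectly, while "1a" always gets at least one sample wrong, so "3a" would be picked first.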

After training, a Haar cascade consists of a series of rules, or stages, like this:

evaluate feature 1a for region (x1;y1)-(x2;y2). Is the result greater than threshold z1?
(meaning: is the left half of that region brighter than the right half by a certain amount?)
- if yes, return 'not a match'
- if no, execute the next stage

In the simplest form of a Haar cascade, each such rule, involving only a single feature at a single location with a single threshold, is one stage of the cascade. OpenCV actually uses a boosted cascade, meaning each stage combines several of these simple features.
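One way to picture a boosted stage: every simple feature casts a vote depending on which side of its own threshold it falls, and the stage passes only if the summed votes reach a stage threshold. The sketch below is a hypothetical illustration of that idea; the rule layout, vote weights, and thresholds are invented and do not reproduce OpenCV's internal format.

```python
def boosted_stage(feature_values, weak_rules, stage_threshold):
    """Return True if the weighted votes of all weak rules reach stage_threshold.

    weak_rules is a list of (feature_name, threshold, vote_if_above, vote_if_not_above).
    """
    score = 0.0
    for name, threshold, vote_above, vote_not_above in weak_rules:
        if feature_values[name] > threshold:
            score += vote_above
        else:
            score += vote_not_above
    return score >= stage_threshold

# Invented rules: a strong 1a response counts against a match, a strong 3a response for it.
rules = [
    ("1a", 100, -1.0, 0.8),
    ("3a", 30, 0.9, -0.7),
]

likely_window = {"1a": 40, "3a": 55}
unlikely_window = {"1a": 150, "3a": 10}

passes_likely = boosted_stage(likely_window, rules, stage_threshold=0.5)
passes_unlikely = boosted_stage(unlikely_window, rules, stage_threshold=0.5)
```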

The principle remains: each stage is a very weak classifier that by itself is just barely better than wild guessing. The threshold for each stage is chosen so that the chance of false negatives is very low, so a stage will almost never wrongly reject a good match, but will quite frequently wrongly accept a bad match.
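The trade-off can be sketched by choosing the threshold from the positive samples alone: set it just low enough that (almost) every positive passes, then see how many negatives slip through. The scores below are made up, and `max_missed` is a hypothetical parameter name for how many positives a stage may reject.

```python
def pick_stage_threshold(positive_scores, max_missed=0):
    """Largest threshold that rejects at most max_missed positives.

    A sample passes the stage when its score is >= the returned threshold.
    """
    ranked = sorted(positive_scores)
    return ranked[max_missed]

positive_scores = [0.9, 1.4, 2.0, 2.2, 3.1]
negative_scores = [-1.0, 0.2, 1.0, 1.5, 2.5]

threshold = pick_stage_threshold(positive_scores, max_missed=0)
missed_positives = sum(1 for s in positive_scores if s < threshold)
false_accepts = sum(1 for s in negative_scores if s >= threshold)
```

With these numbers no positive is rejected, but three of the five negatives still pass. That is acceptable for a single stage, because later stages get another chance to reject them.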

When the Haar cascade is executed, the stages are evaluated in order, and evaluation stops as soon as one stage rejects. Only an image region that passes the first AND the second AND the third ... stage will be accepted.
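In code, running the cascade is a loop with an early exit. This is a minimal sketch in which each stage is a function returning True (pass) or False (reject); the two example stages and their thresholds are invented, with the first one mirroring the "greater than threshold means not a match" rule shown earlier.

```python
def run_cascade(stages, window):
    """Return (accepted, stages_evaluated) for one candidate window."""
    for index, stage in enumerate(stages):
        if not stage(window):
            # Rejected: the remaining stages are never evaluated.
            return False, index + 1
    return True, len(stages)

# Each window is represented here by precomputed feature values.
stages = [
    lambda w: w["1a"] <= 100,  # reject if feature 1a exceeds its threshold
    lambda w: w["3a"] > 30,    # reject if feature 3a is too weak
]

good = run_cascade(stages, {"1a": 40, "3a": 55})
bad_early = run_cascade(stages, {"1a": 150, "3a": 55})
bad_late = run_cascade(stages, {"1a": 40, "3a": 10})
```

Because most windows in an image are not matches and are thrown out by the first stage or two, the average cost per window stays low even when the full cascade has many stages.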

Training follows the same order: the first stage is trained first. Then the second stage is trained only with the training images that would pass the first stage, and so on.
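The stage-by-stage training loop can be sketched as follows. Everything here is a toy under stated assumptions: each sample is a tuple of precomputed feature values plus a label, and the hypothetical `train_stage` simply picks the feature that rejects the most remaining negatives without rejecting any positive. Real training is far more involved.

```python
def train_stage(samples):
    """Pick one feature and threshold that keep all positives and reject the most negatives."""
    positives = [x for x, is_pos in samples if is_pos]
    negatives = [x for x, is_pos in samples if not is_pos]
    best = None
    for f in range(len(positives[0])):
        threshold = min(p[f] for p in positives)  # no positive falls below this
        rejected = sum(1 for n in negatives if n[f] < threshold)
        if best is None or rejected > best[0]:
            best = (rejected, f, threshold)
    _, f, threshold = best
    return lambda x: x[f] >= threshold

def train_cascade(samples, num_stages):
    """Train stages one after another, each only on samples that passed all earlier stages."""
    stages = []
    remaining = list(samples)
    for _ in range(num_stages):
        stage = train_stage(remaining)
        stages.append(stage)
        remaining = [(x, is_pos) for x, is_pos in remaining if stage(x)]
    return stages, remaining

samples = [
    ((5, 9), True), ((6, 8), True), ((7, 7), True),
    ((1, 9), False), ((2, 8), False), ((6, 1), False),
    ((9, 2), False), ((3, 3), False),
]
stages, survivors = train_cascade(samples, num_stages=2)
```

In this toy data the first stage removes three negatives using the first feature, and the second stage, trained only on what is left, removes the other two using the second feature.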