Can anyone list all the different techniques used in face detection? Techniques like neural networks, support vector machines, eigenfaces, etc.
What others are there?
The technique I'm going to talk about is a machine-learning-oriented approach; in my opinion it is quite fascinating, though not very recent. It was described in the article "Robust Real-Time Face Detection" by Viola and Jones. I used the OpenCV implementation for a university project.
It is based on Haar-like features, which consist of additions and subtractions of pixel intensities within rectangular regions of the image. These can be computed very quickly using a structure called the integral image, for which GPGPU implementations also exist (there the underlying operation is sometimes called a "prefix scan"). Once the integral image has been computed, in time linear in the number of pixels, any Haar-like feature can be evaluated in constant time. A feature is basically a function that takes a 24x24 sub-window S of the image and computes a value feature(S); a triplet (feature, threshold, polarity) is called a weak classifier, because
polarity * feature(S) < polarity * threshold
holds true on certain images and false on others; a weak classifier is only expected to perform slightly better than random guessing (for instance, it should have an accuracy of at least 51-52%).
Polarity is either -1 or +1.
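To make the integral image and the weak classifier test concrete, here is a minimal pure-Python sketch. The function names and the tiny 4x4 image are my own illustration, not OpenCV's API, and the two-rectangle feature (left half minus right half) is just one assumed example of a Haar-like feature.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) summed-area table with a zero top row and left column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle at (x, y) of size w x h, using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Hypothetical two-rectangle feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


def weak_classify(feature_value, threshold, polarity):
    """True when polarity * feature(S) < polarity * threshold."""
    return polarity * feature_value < polarity * threshold


# Toy image: bright left half, dark right half.
img = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(img)
value = two_rect_feature(ii, 0, 0, 4, 4)  # 72 - 8 = 64
print(value, weak_classify(value, 10, -1))
```

The integral image is built once per image; after that, every rectangle sum costs four table lookups regardless of the rectangle's size, which is what makes the features constant-time.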
The feature space is large (~160,000 features), but finite.
Although the threshold could in principle be any number, simple considerations on the training set show that with N examples only N + 1 thresholds need to be examined for each polarity and each feature in order to find the one that yields the best accuracy: the output of the classifier only changes when the threshold crosses one of the N feature values. The best weak classifier can thus be found by exhaustively searching the space of triplets.
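As a rough check on that number, the sketch below counts every position and size of a feature inside a 24x24 window. It assumes the five base shapes commonly used (2x1, 1x2, 3x1, 1x3 and 2x2 cells); a different feature set would give a different count.

```python
def count_features(window=24, shapes=((2, 1), (1, 2), (3, 1), (1, 3), (2, 2))):
    """Count all placements of each base shape, scaled by integer factors."""
    total = 0
    for base_w, base_h in shapes:
        # Widths and heights are multiples of the base shape's cell counts.
        for w in range(base_w, window + 1, base_w):
            for h in range(base_h, window + 1, base_h):
                # Number of top-left positions where a w x h feature fits.
                total += (window - w + 1) * (window - h + 1)
    return total


n_features = count_features()
print(n_features)  # 162336
```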
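The threshold search for a single feature can be sketched as follows. This is my own illustration, assuming labels of 1 for faces and 0 for non-faces and one weight per example; the name best_stump is hypothetical. Sorting the feature values and keeping running sums means the N + 1 candidate thresholds are all scored in one pass.

```python
def best_stump(values, labels, weights):
    """Return (weighted_error, threshold, polarity) for one feature.

    A window is classified as a face when polarity * value < polarity * threshold.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    total_pos = sum(w for w, y in zip(weights, labels) if y == 1)
    total_neg = sum(w for w, y in zip(weights, labels) if y == 0)
    pos_below = neg_below = 0.0
    best = (float("inf"), None, None)
    for k in range(n + 1):  # k = number of examples below the threshold
        if k == 0:
            threshold = values[order[0]] - 1.0
        else:
            i = order[k - 1]
            if labels[i] == 1:
                pos_below += weights[i]
            else:
                neg_below += weights[i]
            if k == n:
                threshold = values[i] + 1.0
            elif values[order[k]] == values[i]:
                continue  # tied values cannot be split by a threshold
            else:
                threshold = (values[i] + values[order[k]]) / 2.0
        # polarity +1: "face" below the threshold; polarity -1: "face" above it.
        err_plus = neg_below + (total_pos - pos_below)
        err_minus = pos_below + (total_neg - neg_below)
        for err, polarity in ((err_plus, 1), (err_minus, -1)):
            if err < best[0]:
                best = (err, threshold, polarity)
    return best


values = [1.0, 2.0, 3.0, 8.0, 9.0]
labels = [1, 1, 1, 0, 0]
weights = [0.2] * 5
err, threshold, polarity = best_stump(values, labels, weights)
print(err, threshold, polarity)
```

Running this for every feature and keeping the triplet with the lowest weighted error gives the exhaustive search described above.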
A strong classifier can be assembled by iteratively choosing the best possible weak classifier, using an algorithm called "adaptive boosting", or AdaBoost; at each iteration, the examples that were misclassified in the previous iteration are given more weight. The strong classifier is characterized by its own global threshold, computed by AdaBoost.
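Here is a small sketch of the boosting loop, under some assumptions of mine: each row of X holds precomputed feature values for one training window, labels are 1 (face) or 0 (non-face), the weak learner is a brute-force stump search, and the weight update follows the beta = err / (1 - err) form used by Viola and Jones. All names, including threshold_scale, are illustrative rather than taken from any library.

```python
import math


def stump_predict(value, threshold, polarity):
    return 1 if polarity * value < polarity * threshold else 0


def best_weak_classifier(X, y, w):
    """Brute-force search over (feature, threshold, polarity) for the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        col = sorted(set(row[f] for row in X))
        mids = [(a + b) / 2.0 for a, b in zip(col, col[1:])]
        for threshold in [col[0] - 1.0] + mids + [col[-1] + 1.0]:
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row[f], threshold, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, f, threshold, polarity)
    return best


def adaboost(X, y, rounds):
    n = len(X)
    w = [1.0 / n] * n
    strong = []
    for _ in range(rounds):
        total = sum(w)
        w = [wi / total for wi in w]  # normalise the weights
        err, f, threshold, polarity = best_weak_classifier(X, y, w)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid division by zero
        beta = err / (1.0 - err)
        alpha = math.log(1.0 / beta)
        # Correctly classified examples are down-weighted, so mistakes weigh more.
        w = [wi * beta if stump_predict(row[f], threshold, polarity) == yi else wi
             for row, yi, wi in zip(X, y, w)]
        strong.append((alpha, f, threshold, polarity))
    return strong


def strong_predict(strong, row, threshold_scale=0.5):
    """Weighted vote; the global threshold is threshold_scale * sum of alphas."""
    score = sum(a * stump_predict(row[f], t, p) for a, f, t, p in strong)
    return 1 if score >= threshold_scale * sum(a for a, _, _, _ in strong) else 0


# Toy, linearly separable data: feature 0 alone separates the classes.
X = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [7.0, 2.0], [8.0, 5.0], [9.0, 3.0]]
y = [1, 1, 1, 0, 0, 0]
strong = adaboost(X, y, rounds=3)
predictions = [strong_predict(strong, row) for row in X]
print(predictions)
```

Lowering threshold_scale below 0.5 makes the strong classifier accept more windows, which is the kind of tuning used when it becomes a stage of the cascade.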
Several strong classifiers are combined as stages in an attentional cascade. The idea behind the attentional cascade is that 24x24 sub-windows that are obviously not faces are discarded in the first stages; a strong classifier usually contains only a few weak classifiers (like 30 or 40), hence it is very fast to compute. Each stage should have a very high recall, while its false positive rate is not very important. If there are 10 stages, each with 0.99 recall and a 0.3 false positive rate, the final cascade will have about 0.9 recall (0.99^10) and an extremely low false positive rate (0.3^10, roughly 6 in a million). For this reason, strong classifiers are usually tuned to increase recall, at the cost of a higher false positive rate. Tuning basically involves reducing the global threshold computed by AdaBoost.
A sub-window that makes its way to the end of the cascade is considered a face.
Many sub-windows of the initial image must be tested, possibly overlapping and possibly after rescaling the image.
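The cascade arithmetic and the early-exit behaviour can be checked with a few lines. This sketch assumes the stages behave independently, so per-stage rates simply multiply; the stage functions passed to run_cascade are hypothetical stand-ins for tuned strong classifiers.

```python
def cascade_rates(stage_recall, stage_fpr, n_stages):
    """Overall recall and false positive rate, assuming independent stages."""
    return stage_recall ** n_stages, stage_fpr ** n_stages


def run_cascade(stages, window):
    """Reject the window at the first stage that says 'not a face'."""
    for stage in stages:
        if not stage(window):
            return False  # later stages are never evaluated
    return True


recall, fpr = cascade_rates(0.99, 0.3, 10)
print(round(recall, 3), fpr)  # about 0.904 and about 5.9e-06
```

Because most sub-windows are rejected by the first stage or two, the average cost per window stays close to the cost of those cheap early stages.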
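The scan over sub-windows can be sketched as a generator. Note the assumptions: this version grows the window instead of shrinking the image (the two cover the same positions and scales), and the default scale step of 1.25 and shift of 2 pixels are illustrative values of mine, not something prescribed above.

```python
def sliding_windows(img_w, img_h, base=24, scale_step=1.25, shift=2):
    """Yield (x, y, size) for square sub-windows at increasing scales."""
    scale = 1.0
    while True:
        size = int(round(base * scale))
        if size > min(img_w, img_h):
            break  # the window no longer fits in the image
        step = max(1, int(round(shift * scale)))  # larger windows move in larger steps
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                yield x, y, size
        scale *= scale_step


windows = list(sliding_windows(26, 26, shift=1))
print(len(windows))  # 9 windows of size 24 in a 26x26 image
```

Each yielded window would be passed through the cascade; in practice, overlapping detections of the same face are usually merged afterwards.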