
Is this AdaBoost behavior correct?


I'm implementing AdaBoost as described by the Viola-Jones paper for my own edification. In the process of unit testing the algorithm I have found some strange behavior. It is possible this is just the algorithm acting strangely on canned data or it is possible I am missing something. I'd like to know which case it is.

To start with I have:

2 instances of A type faces
1 instance  of a B type face
3 instances of noise
--------------------
6 total instances

So each image has an initial weight of 1/6.

The first feature selected by the classifier identifies A type faces but not B type faces and not any of the noise. As a result it has an error (and associated weight in the boosted classifier) of 1/6.

Next, the weights are updated: first, the correctly classified images are multiplied by error / (1 - error) = 0.2, yielding:

A type face weight: 1/30
B type face weight: 1/6
noise image weight: 1/6

Then the weights are normalized (to sum to 1):

A type face weight: 1/22
B type face weight: 5/22
noise image weight: 5/22

The second feature correctly selects B type images but not noise or A type images. As a result it has an error of 1/11 (2/22) which is noticeably less than 1/6.

Since the "default" threshold proposed by Viola-Jones (this is before we get to cascades and threshold tweaking in the later sections of the paper) is half the sum of the classifier weights, and there are only two weights here, with the second feature's weight being the larger (since it had the lower error), the resulting boosted classifier only correctly classifies the B type faces.
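For reference, the decision rule I'm describing can be sketched as below; the two alpha values are made-up numbers just to illustrate the "larger alpha alone wins" behavior, not the ones from my run:

```python
def strong_classify(hs, x):
    """hs: list of (alpha, weak_classifier) pairs; each weak classifier returns 0 or 1."""
    score = sum(alpha * h(x) for alpha, h in hs)
    # "Default" threshold from the paper: half the sum of the alphas.
    return 1 if score >= 0.5 * sum(alpha for alpha, _ in hs) else 0

# Two hypothetical voters: the B-detector got the lower error, hence the larger alpha.
h_a = lambda x: 1 if x == "A" else 0
h_b = lambda x: 1 if x == "B" else 0
voters = [(1.0, h_a), (1.5, h_b)]
print(strong_classify(voters, "A"), strong_classify(voters, "B"))  # 0 1
```

With only two voters, whenever they disagree the one with the larger alpha decides the outcome by itself, which is why only B faces pass.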

Intuitively, I would expect a strong classifier composed of a weak classifier which detects A faces and a weak classifier which detects B faces to detect both A and B faces.

I'm even willing to accept that I'll only get one of the two since AdaBoost is sort of a majority voting algorithm and might behave strangely with only 2 voters, but I would expect if it's only going to correctly classify one of the faces then it would correctly classify the A faces since there are more of them.

In other words, I would expect each weak classifier added to the strong classifier to have successively lower weight.

Am I missing a step or is this just strange behavior for overly-simple data?


Solution

  • There is a mistake in the computation. With the first feature, the error is 1/6 because only the B face is misclassified; the noise images and the A faces are classified correctly. Therefore, when you update the weights according to w(i) = w(i) * beta^(1 - e(i)), e(i) equals 1 only for the B face; for the A faces and the noise, e(i) = 0. Thus the weights for both the noise images and the A faces are scaled down by beta:

    A type face weight: 1/30
    B type face weight: 1/6
    noise image weight: 1/30
    

    After normalization:

    A type face weight: 1/10
    B type face weight: 1/2
    noise image weight: 1/10
    

    Now, when you are using the second feature, the error is 1/5. Since 1/5 is larger than 1/6, the second weak classifier receives a smaller weight in the boosted classifier than the first, matching your intuition.
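    The corrected bookkeeping can be checked mechanically with exact fractions. This is a minimal sketch of just the reweighting step (not the paper's full feature-selection loop), with labels mirroring the six example images:

    ```python
    from fractions import Fraction

    labels = ["A", "A", "B", "noise", "noise", "noise"]
    w = [Fraction(1, 6)] * 6  # uniform initial weights

    def reweight(w, miss):
        # error = total weight of the misclassified examples
        err = sum(wi for wi, m in zip(w, miss) if m)
        beta = err / (1 - err)
        # w(i) <- w(i) * beta^(1 - e(i)): correct examples are scaled by beta
        w = [wi if m else wi * beta for wi, m in zip(w, miss)]
        total = sum(w)
        return [wi / total for wi in w], err  # weights normalized to sum to 1

    # First feature detects only A faces, so only the B face is misclassified.
    w, err1 = reweight(w, [lab == "B" for lab in labels])
    print(err1, w)  # error 1/6; weights become 1/10, 1/10, 1/2, 1/10, 1/10, 1/10

    # Second feature detects only B faces, so the two A faces are misclassified.
    err2 = sum(wi for wi, lab in zip(w, labels) if lab == "A")
    print(err2)  # 1/5
    ```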