Why does this trivially learnable example break AdaBoost?


I'm testing out a boosted tree model that I built using Matlab's fitensemble method.

X = rand(100, 10);                                   % 100 examples, 10 random features
Y = X(:, end) > .5;                                  % label depends only on the last feature
boosted_tree = fitensemble(X, Y, 'AdaBoostM1', 100, 'Tree');
predicted_Y = predict(boosted_tree, X);

I just wanted to run it on a few simple examples, so I threw in an easy case: one feature is > .5 for positive examples and < .5 for negative examples. I get the warning

Warning: AdaBoostM1 exits because classification error = 0

This leads me to think: great, it figured out the relevant feature, and all the training examples were correctly classified.

But if I look at the accuracy

sum(predicted_Y == Y) / length(Y)

The result is 0.5 because the classifier simply assigned the positive class to all examples!

Why does Matlab think that classification error = 0 when it is clearly not 0? I believe this example should be easily learnable. Is there a way to prevent this error and get the correct result using this method?

Edit: The code above should reproduce the warning.


Solution

  • This is not a bug; AdaBoost is simply not designed for the case where the first weak learner achieves perfect classification. More details:

    1) The warning refers to the error of the first weak learner, which is indeed zero. You can see this by following the stack trace that comes with the warning into the function Ensemble.m (in Matlab R2013b, at line 194). If you place a breakpoint there, run your example, and then run the command H.predict(X), you will see that this learner has perfect prediction.

    2) So why doesn't your ensemble have perfect prediction? If you look further into Ensemble.m, you'll see that this perfect learner never gets added to the ensemble. This is also reflected in the fact that boosted_tree.NTrained is zero. (A defensive fallback based on the NTrained property is sketched at the end of this answer.)

    3) So why doesn't this perfect learner get added to the ensemble? If you find a description of the AdaBoost.M1 algorithm, you'll see that in each round, training examples are reweighted according to the error of the previous weak learner: correctly classified examples are scaled by beta = err/(1 - err). But if that learner had zero error, beta is zero, so every weight goes to zero and all subsequent learners have nothing to train on. (A short numeric sketch of this step follows the code below.)

    4) If you come across this situation in the real world, what do you do? Don't bother with AdaBoost! The problem is easy enough that a single one of your weak learners can solve it:

    X = rand(100, 10);
    Y = X(:, end) > .5;
    tree = fit(ClassificationTree.template, X, Y);   % train a single decision tree
    predicted_Y = predict(tree, X);
    accuracy = sum(predicted_Y == Y) / length(Y)     % 1 -- the tree finds the split on the last feature
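
    To make point 3 concrete, here is a minimal numeric sketch of the AdaBoost.M1 reweighting step (the textbook update, not Matlab's internal code). Correctly classified examples are scaled by beta = err/(1 - err), so a zero-error first learner drives every weight to zero:

    n = 100;
    w = ones(n, 1) / n;              % uniform initial weights
    correct = true(n, 1);            % the first weak learner classifies everything correctly
    err = sum(w(~correct));          % weighted error = 0
    beta = err / (1 - err);          % beta = 0
    w(correct) = w(correct) * beta;  % every weight becomes 0
    sum(w)                           % 0 -- nothing left to renormalize, so boosting stops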
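
    And if you do want to keep fitensemble in a pipeline and guard against this edge case (see point 2), one option is to check the ensemble's NTrained property and fall back to a single tree when it is zero. This is just a sketch using the same R2013b-era API as above; in later releases fitctree plays the role of ClassificationTree.fit:

    X = rand(100, 10);
    Y = X(:, end) > .5;
    boosted_tree = fitensemble(X, Y, 'AdaBoostM1', 100, 'Tree');
    if boosted_tree.NTrained == 0
        % the perfect first learner was never added to the ensemble
        model = ClassificationTree.fit(X, Y);   % fall back to a single tree
    else
        model = boosted_tree;
    end
    predicted_Y = predict(model, X);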