In my bachelor thesis I am supposed to use AdaBoostM1 with a MultinomialNaiveBayes classifier on a text classification problem. The problem is that in most cases, AdaBoostM1 performs worse than, or at best equal to, plain MultinomialNaiveBayes without boosting.
I use the following code:
import weka.classifiers.bayes.NaiveBayesMultinomial;
import weka.classifiers.meta.AdaBoostM1;

AdaBoostM1 m1 = new AdaBoostM1();
m1.setClassifier(new NaiveBayesMultinomial());
m1.buildClassifier(training);
So I don't understand why AdaBoost is not able to improve the results. Unfortunately, I couldn't find anything else about this on the web, as most people seem to be very satisfied with AdaBoost.
AdaBoost is a binary/dichotomous/2-class classifier, designed to boost a weak learner that is just better than 1/2 accuracy. AdaBoostM1 is an M-class classifier, but it still requires the weak learner to be better than 1/2 accuracy, when one would expect chance level to be around 1/M. Balancing/weighting is used initially to get equal-prevalence classes, but the reweighting inherent to AdaBoost can quickly destroy this. A solution is to base boosting on chance-corrected measures such as Kappa or Informedness (AdaBook).
As M grows, e.g. with text classification, this mismatch grows, and thus a classifier much stronger than chance is needed. With M=100, chance level is about 1%, yet AdaBoostM1 still demands a minimum accuracy of 50%.
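One quick way to check whether this threshold is the bottleneck on your data is to cross-validate the base learner alone and compare its accuracy against both chance level and the 50% bar. Here is a minimal sketch using Weka's Evaluation API; the file name train.arff and the 10-fold setup are placeholders of mine, not anything from the original question:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayesMultinomial;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BoostabilityCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical data file; substitute your own ARFF.
        Instances training = DataSource.read("train.arff");
        training.setClassIndex(training.numAttributes() - 1);

        // 10-fold cross-validation of the unboosted base learner.
        Evaluation eval = new Evaluation(training);
        eval.crossValidateModel(new NaiveBayesMultinomial(), training, 10, new Random(1));

        double accuracy = eval.pctCorrect() / 100.0;  // fraction correct
        double chance = 1.0 / training.numClasses();  // about 1/M for balanced classes
        // AdaBoostM1 needs better than 1/2 accuracy, however small 1/M is.
        System.out.printf("accuracy=%.3f chance=%.3f boostable=%b%n",
                accuracy, chance, accuracy > 0.5);
    }
}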
As base classifiers get stronger (viz. no longer barely above chance), the scope for boosting to improve things shrinks: boosting has already pulled us to a very specific part of the search space, and it is increasingly likely to have overfitted to errors and outliers, leaving no scope to balance a wide variety of variants.
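One way to see this effect on a given dataset is to cross-validate the boosted and unboosted classifier side by side, reporting a chance-corrected statistic (Kappa) alongside raw accuracy, so that apparent gains which merely track class prevalence are exposed. A sketch of such a comparison, meant to drop into the main method above with training already loaded and its class index set:

// Additional imports needed: weka.classifiers.Classifier,
// weka.classifiers.meta.AdaBoostM1
AdaBoostM1 boosted = new AdaBoostM1();
boosted.setClassifier(new NaiveBayesMultinomial());

for (Classifier c : new Classifier[] { new NaiveBayesMultinomial(), boosted }) {
    Evaluation eval = new Evaluation(training);
    eval.crossValidateModel(c, training, 10, new Random(1));
    // Kappa corrects for chance agreement, unlike percent correct.
    System.out.printf("%-22s %.2f%% correct, kappa=%.3f%n",
            c.getClass().getSimpleName(), eval.pctCorrect(), eval.kappa());
}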
A number of resources on Informedness (including Matlab code, xls sheets and early papers) are available here: http://david.wardpowers.info/BM
A comparison with other chance-corrected Kappa measures is here: http://aclweb.org/anthology-new/E/E12/E12-1035.pdf
A Weka implementation of, and experimentation with, AdaBoost using Bookmaker Informedness is available; contact the author.