Tags: boost, machine-learning, classification, weak

Weak learners based on distributions: Decision stump


I need to boost the decision stump weak classifier. So on every iteration I have to train the weak classifier based on certain weights, and then update the weights after the iteration. That much I understand. The part that is unclear to me is "train the decision stump weak classifier based on the weights". How exactly is that done? Can anyone explain it in layman's terms?

Say I have a training dataset {(x1, y1), (x2, y2), ..., (xm, ym)}, where the x are feature vectors (say 10 features) and y is a binary class label.

Initially the weights are w(x) = 1/m

So the decision stump weak learner should give a binary hypothesis h(x), learned based on these weights.

How does the algorithm work? Which features will the decision stump use?


Solution

  • For the boosting task you need to pick the best classifier on each iteration of the algorithm. To do so you minimize the average error of the stump on the dataset with respect to the weights, so you must take the objects' weights into account when computing the classifier's error measure. Thus the penalty for incorrectly labeling an object with a big weight is larger than the penalty for incorrectly labeling an object with a small weight (see the weighted-error sketch at the end of this answer).

    You can see my implementation of boosting on decision trees in R; it works well. For decision stumps, just change the depth of the tree on line 132 to 1, and you can test the accuracy with a different number of classifiers by changing the parameter T.

    If you need deeper understanding: you can learn a stump in the same way as a tree of depth 1. To learn a tree on a weighted dataset you need to pick the feature and value that best separate the dataset into 2 parts according to a weighted metric, for example Entropy and Information Gain. You can iterate with a for loop over all available features; in a nested loop, sort the picked feature and try every possible separation of the dataset into two sets according to the chosen feature and separator value, then calculate the entropy over every resulting set S as it is written on Wikipedia, but instead of calculating p(x) as

    The proportion of the number of elements in class x to the number of elements in set S

    you need to sum the weights of all objects with class x in the set and divide this number by the sum of the weights of all objects in the set:

    p(x) = sum(W_x) / sum(W)

    where W is the set of weights of all objects in set S, and W_x is the set of weights of objects in set S with class x (a code sketch of this weighted entropy is given at the end of this answer).

    Then you can calculate the information gain but, again, you need to use the weighted proportion p(t) instead of the variant on Wikipedia (a proportion of counts):

    p(t) = sum(W_t) / sum(W_S)

    where W_S is the set of weights of objects from the initial (not yet divided by the separator) set S, and W_t is the set of weights of objects from subset t (you get 2 subsets t by separating S with some separator value).

    Pick the feature and separator value that give you the biggest gain, and that's all: you have just learned a new stump classifier on weighted data and it is ready for work. The final sketch at the end of this answer puts the whole search together.

    I've made a picture to provide an example of the computation; in it I picked only 1 separator, but you need to check the gain for every possible separator. (example image)
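
Here is a minimal sketch of the weighted error measure described in the first paragraph, written in Python rather than the R implementation linked above; the names `weighted_error`, `h`, `X`, `y` and `w` are my own illustration, not from the original code.

```python
def weighted_error(h, X, y, w):
    """Weighted misclassification error: the sum of the weights of the
    objects that classifier h labels incorrectly, normalized by the
    total weight of the dataset."""
    total = sum(w)
    wrong = sum(wi for xi, yi, wi in zip(X, y, w) if h(xi) != yi)
    return wrong / total
```

On the first boosting iteration, with uniform weights w_i = 1/m, this reduces to the ordinary error rate; once the weights change, mistakes on heavy objects cost proportionally more, which is exactly why the stump must be trained with the weights in mind.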
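Along the same lines, a sketch of the weighted entropy using p(x) = sum(W_x) / sum(W) exactly as defined above; `weighted_entropy` and its arguments are illustrative names of mine.

```python
import math

def weighted_entropy(y, w):
    """Entropy of a labeled set where class proportions are computed
    from the objects' weights instead of their counts."""
    total = sum(w)
    if total == 0:
        return 0.0
    ent = 0.0
    for cls in set(y):
        # p(x) = sum(W_x) / sum(W): weighted proportion of class cls
        p = sum(wi for yi, wi in zip(y, w) if yi == cls) / total
        if p > 0:
            ent -= p * math.log2(p)
    return ent
```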
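Finally, a sketch that puts the whole procedure together: loop over every feature, try every separator value, compute the weighted information gain with p(t) = sum(W_t) / sum(W_S), and keep the best split. The function names are mine, and labeling each side by the class with the largest total weight on that side is my assumption, since the answer only covers choosing the split itself.

```python
import math

def weighted_entropy(y, w):
    """Same weighted entropy as in the previous sketch, repeated here
    so this snippet runs on its own: p(x) = sum(W_x) / sum(W)."""
    total = sum(w)
    ent = 0.0
    for cls in set(y):
        p = sum(wi for yi, wi in zip(y, w) if yi == cls) / total
        if p > 0:
            ent -= p * math.log2(p)
    return ent

def train_stump(X, y, w):
    """Pick the (feature, separator) pair with the largest weighted
    information gain and return the resulting stump as a function."""
    m, n_features = len(X), len(X[0])
    base_entropy = weighted_entropy(y, w)
    best = None  # (gain, feature index, separator value)
    for f in range(n_features):                   # every available feature
        for sep in sorted(set(x[f] for x in X)):  # every candidate separator
            left = [i for i in range(m) if X[i][f] <= sep]
            right = [i for i in range(m) if X[i][f] > sep]
            if not left or not right:
                continue
            gain = base_entropy
            for part in (left, right):
                w_t = [w[i] for i in part]
                y_t = [y[i] for i in part]
                p_t = sum(w_t) / sum(w)           # p(t) = sum(W_t) / sum(W_S)
                gain -= p_t * weighted_entropy(y_t, w_t)
            if best is None or gain > best[0]:
                best = (gain, f, sep)
    _, f, sep = best
    # Label each side by the class with the largest total weight there
    # (my assumption; the answer above only covers choosing the split).
    def side_label(idx):
        return max(set(y[i] for i in idx),
                   key=lambda c: sum(w[i] for i in idx if y[i] == c))
    left_label = side_label([i for i in range(m) if X[i][f] <= sep])
    right_label = side_label([i for i in range(m) if X[i][f] > sep])
    return lambda x: left_label if x[f] <= sep else right_label

# Tiny usage example with uniform initial weights w_i = 1/m:
X = [[1.0, 5.0], [2.0, 4.0], [3.0, 1.0], [4.0, 0.5]]
y = [+1, +1, -1, -1]
w = [1.0 / len(X)] * len(X)
h = train_stump(X, y, w)
print([h(x) for x in X])  # matches y on this toy, separable data
```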