machine-learning classification traffic prediction adaboost

About the AdaBoost algorithm

I'm working on traffic flow prediction, where I want to predict whether a place has heavy or light traffic. I have classified the traffic level as 1-5, with 1 being the lightest traffic and 5 the heaviest.

I came across the AdaBoost algorithm in this paper, http://www.waset.org/journals/waset/v25/v25-36.pdf, and I'm having real difficulty understanding it, especially the part where S is the set ((xi, yi), i=(1,2,…,m)) and Y={-1,+1}. What are x, y and the constant L? What is the value of L?

Can someone explain this algorithm to me? :)

Solution

S={(x1,y1),...,(xm,ym)}: Every (x,y) pair is a sample used for training (or testing) your classifier:

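x = The features that describe this particular sample, for example the number of cars on the road, the day of the week, etc.
y = The label for a particular x, which in your case can be 1, 2, 3, 4 or 5. Note that the algorithm as written in the paper assumes Y={-1,+1}, so to use it directly you would have to reduce your five levels to a yes/no label (for example, heavy vs. not heavy).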

Table 1 in the paper shows the x features they used, namely: DAY, TIME, INT, DET, LINK, POS, GRE, DIS, VOL and OCC. The last column of the table shows the label (y), which they set to either 1 or -1 (i.e., yes or no). Every row in the table is one sample.
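To make the shape of S concrete, here is a rough sketch of how such a training set could be laid out in Python. It is only an illustration: the feature values are invented, only four of the ten columns from Table 1 are used to keep it short, and the helper `to_binary_label` with its `heavy_from=4` cut-off is a hypothetical way to turn a 1-5 traffic level into the {-1, +1} label the algorithm expects.

```python
# Subset of the feature columns named in Table 1 (the rest are omitted for brevity).
FEATURES = ["DAY", "TIME", "VOL", "OCC"]


def to_binary_label(level, heavy_from=4):
    """Map a 1-5 traffic level to +1 (heavy) or -1 (not heavy).

    The cut-off `heavy_from` is an assumption made for this sketch.
    """
    return 1 if level >= heavy_from else -1


# Invented example rows: (feature dict, traffic level from 1 to 5).
raw_rows = [
    ({"DAY": 1, "TIME": 8, "VOL": 120, "OCC": 35}, 5),
    ({"DAY": 1, "TIME": 14, "VOL": 60, "OCC": 12}, 2),
    ({"DAY": 6, "TIME": 9, "VOL": 30, "OCC": 5}, 1),
    ({"DAY": 3, "TIME": 18, "VOL": 110, "OCC": 30}, 4),
]

# S = {(x1, y1), ..., (xm, ym)}: x is a feature vector, y is -1 or +1.
S = [
    (tuple(row[name] for name in FEATURES), to_binary_label(level))
    for row, level in raw_rows
]

for x, y in S:
    print(x, y)
```

Each element of `S` is one row of the table: the tuple is x and the final number is y.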

L is the number of rounds for which AdaBoost trains a weak learner (the paper uses Random Forests as the weak classifier). If you set L to 1, AdaBoost runs a single round and trains only one weak classifier, which will generally give poor results. Run several experiments with different values of L to find the best one (i.e., the point where AdaBoost has converged or where it starts to overfit).
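To see what L does in practice, here is a minimal sketch of the standard discrete AdaBoost loop in plain Python. It is not the paper's exact setup: it swaps in one-feature decision stumps as the weak learner instead of Random Forests, and the ten-point toy dataset and all function names are made up for the illustration.

```python
import math


def stump_predict(x, threshold, polarity):
    # polarity +1: predict +1 below the threshold and -1 above it; -1 flips this.
    return polarity if x < threshold else -polarity


def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best decision stump."""
    values = sorted(set(xs))
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for threshold in thresholds:
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost_train(xs, ys, L):
    """Run L boosting rounds and return a list of (alpha, threshold, polarity)."""
    m = len(xs)
    weights = [1.0 / m] * m  # start with uniform sample weights
    ensemble = []
    for _ in range(L):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this weak learner
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified samples, decrease the others.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, x):
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1


def count_errors(ensemble, xs, ys):
    return sum(1 for x, y in zip(xs, ys) if adaboost_predict(ensemble, x) != y)


# Toy 1-D data that no single stump can separate.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

for L in (1, 3):
    print(L, count_errors(adaboost_train(xs, ys, L), xs, ys))
```

On this toy data a single round (L=1) misclassifies 3 of the 10 training points, while three rounds classify all of them correctly. On real data you would compare values of L on a held-out set rather than on the training set, since training error alone will not show overfitting.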