I'm working on traffic flow prediction, where I want to predict whether a place has heavy or light traffic. I have classified traffic into levels 1-5, 1 being the lightest traffic and 5 being the heaviest.
I came across this paper on the AdaBoost algorithm, http://www.waset.org/journals/waset/v25/v25-36.pdf, and I'm having real difficulty understanding the algorithm.
Especially the part where S is the set ((xi, yi), i=(1,2,…,m)), where Y={-1,+1}. What are x, y and the constant L? What is the value of L?
Can someone explain this algorithm to me? :)
S={(x1,y1),...,(xm,ym)}: Every (x,y) pair is a sample used for training (or testing) your classifier:
x = The features which describe this particular sample, for example values which list the number of cars on the road, the day of the week, etc.
y = The label for a particular x, which in your case can be 1, 2, 3, 4 or 5
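To make the notation concrete, here is a minimal sketch of what S could look like in Python. The two features (cars on the road, day of the week) come from the example above; the numeric values are made up purely for illustration and are not taken from the paper.

```python
# Each sample is a pair (x, y): x is a feature vector, y is the traffic label.
# All values below are invented for illustration only.
S = [
    # x = (cars_on_road, day_of_week)   y = traffic level 1..5
    ((12, 0), 1),
    ((85, 1), 4),
    ((140, 4), 5),
    ((40, 6), 2),
]

m = len(S)               # number of samples (the "m" in the paper's notation)
X = [x for x, _ in S]    # all feature vectors x1..xm
Y = [y for _, y in S]    # all labels y1..ym
```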
Table 1 in the paper shows the x features they used, namely: DAY, TIME, INT, DET, LINK, POS, GRE, DIS, VOL and OCC. The last column of the table shows the label (y), which they set to either 1 or -1 (i.e., yes or no). Every row in the table is 1 sample.
L is the number of rounds in which AdaBoost trains a weak learner (in the paper, Random Forests is used as the weak classifier). If you set L to 1, then AdaBoost will run 1 round and only 1 weak classifier will be trained, which will most likely give poor results. Perform multiple experiments with different values for L to find the optimal value (i.e., the point where AdaBoost has converged or where it starts to overfit).
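To show the role of L, here is a rough sketch of AdaBoost in plain Python. Note that it uses simple one-feature threshold classifiers (decision stumps) as the weak learner instead of the Random Forests from the paper, and the tiny one-dimensional data set is a toy example, not traffic data. All function names are my own.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump."""
    best = None
    for j in range(len(X[0])):                     # try every feature
        for thr in sorted({x[j] for x in X}):      # every candidate threshold
            for sign in (1, -1):                   # both orientations
                preds = [sign if x[j] >= thr else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, x):
    _, j, thr, sign = stump
    return sign if x[j] >= thr else -sign

def adaboost(X, y, L):
    """Train L weak classifiers; returns a list of (alpha, stump) pairs."""
    m = len(X)
    w = [1.0 / m] * m                              # start with uniform weights
    ensemble = []
    for _ in range(L):                             # one weak learner per round
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10) # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)    # weight of this classifier
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the others.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]               # renormalise to sum to 1
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

def accuracy(ensemble, X, y):
    return sum(predict(ensemble, xi) == yi for xi, yi in zip(X, y)) / len(X)

# Toy data: no single threshold separates the two classes.
X = [(i,) for i in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

for L in (1, 2, 3):
    print(L, accuracy(adaboost(X, y, L), X, y))
```

On this toy set a single round cannot classify every point, while three rounds fit all of the training points. In a real experiment you would measure accuracy on held-out samples rather than on the training set, since that is how you notice overfitting as L grows.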