I am having trouble understanding the weight update rule for perceptrons:
w(t + 1) = w(t) + y(t)x(t).
Assume we have a linearly separable data set.
- w is a set of weights [w0, w1, w2, ...] where w0 is a bias.
- x is a set of input parameters [x0, x1, x2, ...] where x0 is fixed at 1 to accommodate the bias.
At iteration t, where t = 0, 1, 2, ...,
- w(t) is the set of weights at iteration t.
- x(t) is a misclassified training example.
- y(t) is the target output of x(t) (either -1 or 1).
Why does this update rule move the boundary in the right direction?
The perceptron's output is the hard limit of the dot product between the instance and the weight. Let's see how this changes after the update. Since
w(t + 1) = w(t) + y(t)x(t),
then
x(t) ⋅ w(t + 1) = x(t) ⋅ w(t) + x(t) ⋅ (y(t) x(t)) = x(t) ⋅ w(t) + y(t) [x(t) ⋅ x(t))].
Note that:
- By the algorithm's specification, the update is only applied if x(t) was misclassified.
- By the definition of the dot product, x(t) ⋅ x(t) ≥ 0.
How does this move the boundary relative to x(t)?
- If x(t) was correctly classified, then the algorithm does not apply the update rule, so nothing changes.
- If x(t) was incorrectly classified as negative, then y(t) = 1. It follows that the new dot product increased by x(t) ⋅ x(t) (which is positive). The boundary moved in the right direction as far as x(t) is concerned, therefore.
- Conversely, if x(t) was incorrectly classified as positive, then y(t) = -1. It follows that the new dot product decreased by x(t) ⋅ x(t) (which is positive). The boundary moved in the right direction as far as x(t) is concerned, therefore.