Search code examples
algorithmboosting

boosting algorithm - classifiers yields the correct label


I am a computer science student; I am studying the Algorithms course independently.

During the course, I saw this question:

Suppose we have a set X = {x1, . . . , xn} of elements, each with a label L(i) ∈ {0, 1} (think of x(i) as a picture, and the label indicates whether it is a cat or not). We also have a set of classifiers H, and an algorithm A that given any distribution D on X, outputs h ∈ H such that

Pr(i∼D)[h(x(i)) = L(i)] ≥ 0.51

Show an algorithm that produces a set of T = O(log n) classifiers h (1), . . . , h(T) ∈ H, such that the majority vote among these T classifiers yields the correct label for all 1 ≤ i ≤ n.

photo of the question

From what I can understand this is a question related to boosting. But it is not clear to me how to show an algorithm for this question.

I found an algorithm, but I do not know if it fits the problem:

Algorithm 1 Boost(D, A)
Let T ← 4 log n/ E^2 for E < 0.01.
Initialize a copy of polynomial weights to run over w^t ∈ ∆n.
for t = 1 to T do
  Let h^t = A(D, w^t)
  Let L^t ∈ [0, 1]^m be such that L^t_i = 1[h^t(xi) = yi].
  Pass L^t to the PW algorithm.
end for
Let pˆ =1/T(SIGMA^T_t=1 e_h^t )
Return fpˆ(x).

link to algorithm, page 16

to be perfectly honest, did not understand how to solve the question.


Solution

  • We can view this as a two-player zero-sum game. Carol (the “classifier” player) chooses a classifier, and Dave (the “data” player) chooses a labeled element. Carol wins if the classifier is correct on that element, and Dave wins if it’s incorrect.

    Algorithm A implies that Carol can win this game at least 51% of the time. We can run algorithm ComputeEQ with ε = 0.005 (0.5%) to find a strategy for Carol where she chooses uniformly at random from O(log n) classifiers and wins at least 50.5% of the time regardless of Dave’s strategy. This implies that the majority vote is correct on all n elements.

    (This is really a question for https://cs.stackexchange.com.)