I have checked the Wikipedia page but can't find the difference between them. Both seem to convert a multi-class problem into multiple binary (linear) classifiers.
It is about the strategy used to split the data for training. Suppose you have N data samples belonging to C classes.
One-vs-One: Here, you pick 2 classes at a time and train a two-class classifier using samples from the selected two classes only (all other samples are ignored in this step). You repeat this for every pair of classes, so you end up with C(C-1)/2 classifiers. At test time, you do majority voting among these classifiers.
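A minimal sketch of the One-vs-One data split (the sample names and labels here are just a toy illustration): each pair of classes gets its own training subset, and the number of subsets is C(C-1)/2.

```python
from itertools import combinations

# Toy dataset: (sample, label) pairs drawn from C = 3 classes.
data = [("x1", "c1"), ("x2", "c1"), ("x3", "c2"),
        ("x4", "c1"), ("x5", "c2"), ("x6", "c3"), ("x7", "c3")]
classes = sorted({label for _, label in data})

# One-vs-One: one binary problem per unordered pair of classes,
# keeping only the samples that belong to the two chosen classes.
ovo_subsets = {
    (a, b): [(x, y) for x, y in data if y in (a, b)]
    for a, b in combinations(classes, 2)
}

C = len(classes)
assert len(ovo_subsets) == C * (C - 1) // 2  # 3 pairwise classifiers for 3 classes
```

Note that every subset drops the samples of the remaining classes, which is exactly what "other samples are ignored" means above.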
One-vs-Rest: Here, you pick one class and train a two-class classifier with the samples of the selected class on one side and all the other samples on the other side. Thus, you end up with C classifiers. At test time, you simply assign the sample to the class whose classifier gives the maximum score among the C classifiers.
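The One-vs-Rest split can be sketched the same way (again with toy sample names): each class gets one binary problem over the full dataset, with every other sample relabeled as the negative class ~c.

```python
# Toy dataset: (sample, label) pairs drawn from C = 3 classes.
data = [("x1", "c1"), ("x2", "c1"), ("x3", "c2"),
        ("x4", "c1"), ("x5", "c2"), ("x6", "c3"), ("x7", "c3")]
classes = sorted({label for _, label in data})

# One-vs-Rest: one binary problem per class; samples of the chosen
# class keep their label, everything else becomes "~c" (NOT c).
ovr_subsets = {
    c: [(x, y if y == c else "~" + c) for x, y in data]
    for c in classes
}

assert len(ovr_subsets) == len(classes)                        # C classifiers
assert all(len(s) == len(data) for s in ovr_subsets.values())  # every sample used
```

Unlike One-vs-One, every classifier here sees all N samples, which is why One-vs-Rest trains fewer but larger binary problems.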
As an example, suppose we have a 3-class problem with class labels c1, c2, and c3, samples x1, x2, ..., and classifiers f1, f2, .... Suppose your training data is { {x1, c1}, {x2, c1}, {x3, c2}, {x4, c1}, {x5, c2}, {x6, c3}, {x7, c3} }. Then:
One-vs-One:
f1: trained with the subset { {x1, c1}, {x2, c1}, {x3, c2}, {x4, c1}, {x5, c2} }, for classes c1 and c2.
f2: trained with the subset { {x3, c2}, {x5, c2}, {x6, c3}, {x7, c3} }, for classes c2 and c3.
f3: trained with the subset { {x1, c1}, {x2, c1}, {x4, c1}, {x6, c3}, {x7, c3} }, for classes c1 and c3.
One-vs-Rest:
f1: trained with { {x1, c1}, {x2, c1}, {x3, ~c1}, {x4, c1}, {x5, ~c1}, {x6, ~c1}, {x7, ~c1} }, for class c1 and the rest (~c1, i.e., NOT c1).
f2: trained with { {x1, ~c2}, {x2, ~c2}, {x3, c2}, {x4, ~c2}, {x5, c2}, {x6, ~c2}, {x7, ~c2} }, for class c2 and the rest (~c2, i.e., NOT c2).
f3: trained with { {x1, ~c3}, {x2, ~c3}, {x3, ~c3}, {x4, ~c3}, {x5, ~c3}, {x6, c3}, {x7, c3} }, for class c3 and the rest (~c3, i.e., NOT c3).
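To make the test-time difference concrete, here is a sketch of the two prediction rules. The classifiers f1, f2, f3 are stubbed with hard-coded outputs purely for illustration; in practice they would be the trained binary models from above.

```python
from collections import Counter

classes = ["c1", "c2", "c3"]

# Hypothetical trained One-vs-One classifiers: each returns one
# of its two classes for a given sample (stubbed outputs here).
ovo_classifiers = {
    ("c1", "c2"): lambda x: "c1",
    ("c2", "c3"): lambda x: "c3",
    ("c1", "c3"): lambda x: "c1",
}

# Hypothetical trained One-vs-Rest classifiers: each returns a
# score for "x belongs to this class" (stubbed scores here).
ovr_classifiers = {
    "c1": lambda x: 0.9,
    "c2": lambda x: -0.3,
    "c3": lambda x: 0.1,
}

def predict_ovo(x):
    # Majority vote among the C(C-1)/2 pairwise classifiers.
    votes = Counter(f(x) for f in ovo_classifiers.values())
    return votes.most_common(1)[0][0]

def predict_ovr(x):
    # Pick the class whose classifier gives the maximum score.
    return max(classes, key=lambda c: ovr_classifiers[c](x))

assert predict_ovo("some sample") == "c1"  # two of three votes go to c1
assert predict_ovr("some sample") == "c1"  # highest score is 0.9
```

Both rules happen to agree here, but they need not in general: One-vs-One aggregates discrete votes, while One-vs-Rest compares continuous scores.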