I was wondering why libSVM gives different accuracy results depending on whether I predict with or without probability estimates, and I found an FAQ on this page which says:
Q: Why using svm-predict -b 0 and -b 1 gives different accuracy values?
Let's just consider two-class classification here. After probability information is obtained in training, we do not have prob >= 0.5 if and only if decision value >= 0. So predictions may be different with -b 0 and 1.
I read and re-read it a dozen times but still do not understand it. Can someone explain it more clearly?
A "normal" SVM model calculates a decision value for each data point, which is essentially the signed distance of that point from the separating hyperplane (up to a scaling factor). Everything on one side of the hyperplane (dec_value >= 0) is predicted as class A, everything on the other side (dec_value < 0) as class B.
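If you now calculate class probabilities, there may be a point with a decision value of (for example) 0.1, which would make it class A. But the probability estimate for class A could be 45% and for class B 55%, so the algorithm would now predict it as B. This happens because the probabilities come from a separate model (a sigmoid fitted to the decision values during training), and that model's 50% point does not have to lie exactly at decision value 0.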
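A minimal Python sketch of this effect, assuming a Platt-style sigmoid P(A | f) = 1 / (1 + exp(a·f + b)) as described in the LIBSVM paper (the paper calls the parameters A and B; they are renamed a and b here to avoid a clash with class A). The values a = -2.0 and b = 0.4 are made up for illustration, since LIBSVM fits them during training, and the function names are hypothetical, not part of LIBSVM's API:

```python
import math

def platt_probability(f: float, a: float, b: float) -> float:
    """Sigmoid estimate of P(class A | decision value f) = 1 / (1 + exp(a*f + b))."""
    z = a * f + b
    # Evaluate 1 / (1 + exp(z)) without overflowing for large |z|
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def predict_b0(f: float) -> str:
    """Like -b 0: the sign of the decision value decides."""
    return "A" if f >= 0 else "B"

def predict_b1(f: float, a: float, b: float) -> str:
    """Like -b 1: the class with the larger estimated probability wins."""
    p_a = platt_probability(f, a, b)
    return "A" if p_a >= 0.5 else "B"

# Hypothetical sigmoid parameters (LIBSVM would fit these during training)
a, b = -2.0, 0.4
f = 0.1  # decision value from the example above

print(predict_b0(f))                         # A
print(round(platt_probability(f, a, b), 2))  # 0.45
print(predict_b1(f, a, b))                   # B
```

With these made-up values the 50% point sits at f = -b/a = 0.2, so every decision value in [0, 0.2) is labelled A by the sign rule but B by the probability rule.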
The algorithms used to calculate these class probabilities are described in Section 8 of their paper.
The sentence in question:
After probability information is obtained in training, we do not have prob >= 0.5 if and only if decision value >= 0. So predictions may be different with -b 0 and 1.
basically says: "A decision value >= 0 does not imply probA >= probB, and vice versa."
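Assuming the sigmoid form from the LIBSVM paper, P(A | f) = 1 / (1 + exp(a f + b)), with a and b fitted during training (the paper calls them A and B) and assuming a < 0 so that the probability grows with the decision value, the condition prob >= 0.5 can be rewritten as a threshold on the decision value f:

```latex
P(\mathrm{A} \mid f) = \frac{1}{1 + e^{a f + b}} \ge \frac{1}{2}
\iff e^{a f + b} \le 1
\iff a f + b \le 0
\iff f \ge -\frac{b}{a} \qquad (a < 0)
```

So the probability rule agrees with the sign rule (f >= 0) only when b = 0; any decision value between 0 and -b/a is classified differently by -b 0 and -b 1.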