I'm using SVM-light as described in the tutorial to classify data into 2 classes:
Training file:
+1 6357:1 8984:1 11814:1 15465:1 16031:1
+1 6357:1 7629:0.727 7630:42 7631:0.025
-1 6357:1 11814:1 11960:1 13973:1
...
And the test file:
0 6357:1 8984:1 11814:1 15465:1
0 6357:1 7629:1.08 7630:33 7631:0.049 7632:0.03
0 6357:1 7629:0.069 7630:6 7631:0.016
...
By executing svm_learn.exe train_file model and then svm_classify.exe test_file model output, I get some unexpected values in the output:
-1.0016219
-1.0016328
-1.0016218
-0.99985838
-0.99985853
Shouldn't the output be exactly +1 or -1, like the classes in the training file? Or at least a float between -1 and +1, so that I could pick 0 (or some other value) as the classification threshold myself? I find it unexpected that all of the numbers are close to -1, and some of them are even below it.
UPD1: I've read that if the result is negative, the class is -1, and if it's positive, the class is +1. But what does the magnitude of the value mean? I've just started exploring SVMs, so this may be an easy or stupid question :) And if the predictions turn out to be pretty bad, what steps should I take: try other kernels? Or are there other options that would make SVM-light fit my data better?
Short answer: just take the sign of the result.
Longer answer: An SVM takes an input and returns a real-valued output (which is what you are seeing).
On the training data, the learning algorithm tries to set the output to be >= +1 for all positive examples and <= -1 for all negative examples. Such points have no error. This gap between -1 and +1 is the "margin." Points in "no-man's land" between -1 and +1 and points on the completely wrong side (like a negative point with an output of >+1) are errors (which the learning algorithm is trying to minimize over the training data).
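To make the notion of "error" above concrete, here is a minimal sketch assuming the standard soft-margin formulation, where the per-example penalty is the hinge loss max(0, 1 - y * f(x)). The function name hinge_loss is mine, and treating the question's outputs as negative examples is only for illustration; the test file does not carry true labels.

```python
def hinge_loss(label, output):
    """Per-example penalty: zero once the output is on the correct side of the margin.

    label  -- true class, +1 or -1
    output -- real-valued SVM output for that example
    """
    return max(0.0, 1.0 - label * output)


# Outputs copied from the question, plus two made-up values (0.5 and 1.2).
# All are scored here as if the true label were -1 (an assumption).
for output in (-1.0016219, -0.99985838, 0.5, 1.2):
    print(output, hinge_loss(-1, output))
```

A negative example with output -1.0016219 incurs no penalty, one with -0.99985838 sits just inside the margin and incurs a tiny penalty, and the penalty grows linearly as the output moves toward and past +1.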
So, when testing, if the result is less than -1, you can be reasonably certain it is a negative example. If it is greater than +1, you can be reasonably certain it is a positive example. If it is in between, then the SVM is pretty uncertain about it. Usually, you must make a decision (and cannot say "I don't know") and so people use 0 as the cut-off between positive and negative labels.
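As a rough sketch of that decision rule, the snippet below turns raw outputs into labels using 0 as the cut-off and flags whether each value lies outside the margin. The names interpret and read_predictions are hypothetical helpers, and read_predictions assumes the output file holds one real number per line, as in the values shown in the question.

```python
def interpret(output, threshold=0.0):
    """Map a real-valued SVM output to (label, outside_margin).

    label          -- +1 if output is above the threshold, otherwise -1
                      (an output exactly at the threshold goes to -1; that
                      tie-break is an arbitrary choice)
    outside_margin -- True when |output| >= 1, i.e. not in "no-man's land"
    """
    label = 1 if output > threshold else -1
    outside_margin = abs(output) >= 1.0
    return label, outside_margin


def read_predictions(path):
    """Read one float per non-empty line (assumed output file layout)."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


# Values copied from the question's output.
outputs = [-1.0016219, -1.0016328, -1.0016218, -0.99985838, -0.99985853]
for value in outputs:
    print(value, interpret(value))
```

All five values come out as class -1; the first three lie just outside the margin and the last two just inside it, so in practice they are nearly equally confident negatives.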