Tags: machine-learning, svm, svmlight

How to interpret SVM-light results


I'm using SVM-light as described in the tutorial to classify data into two classes:

Train file:

 +1 6357:1 8984:1 11814:1 15465:1 16031:1
 +1 6357:1 7629:0.727 7630:42 7631:0.025
 -1 6357:1 11814:1 11960:1 13973:1
 ...

And test file:

 0 6357:1 8984:1 11814:1 15465:1
 0 6357:1 7629:1.08 7630:33 7631:0.049 7632:0.03
 0 6357:1 7629:0.069 7630:6 7631:0.016
 ...
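Both files use SVM-light's sparse text format: each line is a target value followed by feature:value pairs (feature indices in increasing order, absent features treated as zero), optionally ending with a "# comment". The training file uses +1/-1 targets; the test file here uses 0 because the labels are unknown. A minimal Python sketch of reading one such line, assuming only that format (parse_svmlight_line is a hypothetical helper, not part of SVM-light, and qid: fields are simply skipped):

```python
from __future__ import annotations


def parse_svmlight_line(line: str) -> tuple[float, dict[int, float]]:
    """Parse one SVM-light line into (target, {feature_index: value})."""
    # Drop an optional trailing "# comment" and surrounding whitespace.
    body = line.split("#", 1)[0].strip()
    tokens = body.split()
    # +1 / -1 in the training file, 0 for unlabeled test rows.
    target = float(tokens[0])
    features: dict[int, float] = {}
    for tok in tokens[1:]:
        if tok.startswith("qid:"):
            continue  # query ids matter only for ranking; ignored in this sketch
        index, value = tok.split(":", 1)
        features[int(index)] = float(value)
    return target, features


target, feats = parse_svmlight_line(" +1 6357:1 7629:0.727 7630:42 7631:0.025")
print(target, feats)
# 1.0 {6357: 1.0, 7629: 0.727, 7630: 42.0, 7631: 0.025}
```

Every feature not listed on a line (for example 8984 on the second training line) is implicitly 0, which is why the dictionary only holds the non-zero entries.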

I train and classify with:

 svm_learn.exe train_file model
 svm_classify.exe test_file model output

and get some unexpected values in the output file:

 -1.0016219
 -1.0016328
 -1.0016218
 -0.99985838
 -0.99985853

Shouldn't these be exactly +1 or -1, like the classes in the training file? Or, failing that, floats between -1 and +1 where I manually choose 0 (or some other number) as the decision threshold? To me it's quite unexpected that all of the numbers are close to -1, and some are even less than -1.

UPD1: I've read that a negative result means class -1 and a positive result means class +1. But what does the magnitude of the value mean? I've only just started exploring SVMs, so this may be an easy or naive question :) Also, if I get poor predictions, what steps should I take: try other kernels, or change other SVM-light options so it fits my data better?


Solution

  • Short answer: just take the sign of the result.

    Longer answer: An SVM takes an input and returns a real-valued output, the value of its decision function f(x), which is what you are seeing.

    On the training data, the learning algorithm tries to make the output >= +1 for every positive example and <= -1 for every negative example (equivalently, y·f(x) >= 1 for a label y in {-1, +1}). Such points incur no error. The band between -1 and +1 is the "margin." Points in this "no-man's land" between -1 and +1, and points on the completely wrong side (such as a negative example with an output above +1), are errors, which the learning algorithm tries to minimize over the training data.

    So, when testing, if the result is less than -1, you can be reasonably confident the example is negative; if it is greater than +1, you can be reasonably confident it is positive. If it falls in between, the SVM is fairly uncertain about it. Usually you must make a decision (you cannot answer "I don't know"), so people use 0 as the cut-off between positive and negative labels.
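Applied to the values from the question, the rule above can be sketched in Python. This assumes svm_classify writes one decision value per line to the output file; interpret is a hypothetical helper, and the zone names are just labels for the three regions described in the answer:

```python
def interpret(decision_value: float, threshold: float = 0.0) -> tuple[int, str]:
    """Map an SVM-light decision value to a hard label and a margin zone."""
    # Hard decision: the sign relative to the cut-off (0 by default).
    # A value exactly equal to the threshold is arbitrarily mapped to -1 here.
    label = 1 if decision_value > threshold else -1
    # Zone relative to the training margin boundaries at -1 and +1.
    if decision_value >= 1.0:
        zone = "beyond +1 (confident positive)"
    elif decision_value <= -1.0:
        zone = "beyond -1 (confident negative)"
    else:
        zone = "inside the margin (uncertain)"
    return label, zone


# The five values from the question; with a real file you could instead
# iterate over open("output") and skip blank lines.
outputs = """-1.0016219
-1.0016328
-1.0016218
-0.99985838
-0.99985853"""

for line in outputs.splitlines():
    value = float(line)
    print(value, *interpret(value))
```

All five values get label -1. The first three lie just past -1, while the last two sit just inside the margin, so the sign is what decides the class; the small differences around -1 only indicate how far each point is from the margin boundary.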