I'm trying to familiarize myself with CvSVM by using this hand-labeled database of 590 images, which are graded from 0-5 (0 is blurry, 5 is perfect). If a grade is <3 I label it 0 (blurry), and if it's >=3 I label it 1 (clear).
For features I'm simply using five common blur-evaluation metrics. Each is standardized by its mean and standard deviation in the training data, and the same training mean and standard deviation are used to standardize the test data.
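For reference, the standardization looks roughly like this (a sketch rather than my exact code; standardize, train, and test are made-up names, and the features are assumed to sit one per column in CV_32F Mats):

// sketch: standardize each feature column by the training mean/stddev,
// then apply the same training statistics to the test set
void standardize(Mat& train, Mat& test)
{
    for (int j = 0; j < train.cols; j++)
    {
        Scalar mean, stddev;
        meanStdDev(train.col(j), mean, stddev); // statistics from training data only
        train.col(j) = (train.col(j) - mean[0]) / stddev[0];
        test.col(j) = (test.col(j) - mean[0]) / stddev[0];
    }
}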
For some reason, my SVM only predicts whole numbers. I have checked for int casts and other silly mistakes but cannot figure it out. I realize that my features are probably not very robust since there is so much variance between different images (the standardization isn't very helpful, as the ranges of the standardized test features end up being larger than those of the training features), but still I feel like I should be getting some decimal predictions, even if they're inaccurate.
Training:
// data format is [ img1 grade feature1 feature2 ... feature5, img2... ]
void train_svm(CvSVM& svm, const Mat& data)
{
    CvSVMParams params;
    params.svm_type = CvSVM::EPS_SVR;   // regression, so predictions should be real-valued
    params.kernel_type = CvSVM::RBF;
    params.term_crit = cvTermCriteria(CV_TERMCRIT_ITER + CV_TERMCRIT_EPS, (int)1e8, FLT_EPSILON);

    // search grids for the parameters train_auto will tune
    CvParamGrid Cgrid(.01, 100, exp(1));
    CvParamGrid gammaGrid(.01, 10, exp(.05));
    CvParamGrid pGrid(.01, 1.8, exp(.01));
    params.C = Cgrid.min_val;
    params.gamma = gammaGrid.min_val;
    params.p = pGrid.min_val;

    // split features from grades: column 0 is the image,
    // column 1 the grade, columns 2 onward the five features
    Mat features = data.colRange(2, data.cols);
    Mat grades = data.colRange(1, 2);

    try
    {
        svm.train_auto(features, grades, Mat(), Mat(), params, 10,
                       Cgrid,
                       gammaGrid,
                       pGrid,
                       CvSVM::get_default_grid(CvSVM::NU),
                       CvSVM::get_default_grid(CvSVM::COEF),
                       CvSVM::get_default_grid(CvSVM::DEGREE),
                       false);
    }
    catch (const cv::Exception& e)
    {
        qDebug() << e.what();
        params = svm.get_params();
        qDebug() << params.C << params.gamma << params.p;
    }
    // retrain once with whichever parameters train_auto settled on
    params = svm.get_params();
    svm.train(features, grades, Mat(), Mat(), params);
}
Testing:
void test_svm(const CvSVM& svm, const Mat& data)
{
    // same layout as training: column 1 is the grade, 2 onward the features
    Mat features = data.colRange(2, data.cols);
    Mat grades = data.colRange(1, 2);
    int num_test = features.rows;
    assert(features.rows == grades.rows);

    Mat results(num_test, 1, CV_32FC1);
    svm.predict(features, results);   // batch prediction over all samples

    qDebug() << "Act\t\tPred";
    for (int i = 0; i < num_test; i++)
    {
        float actual = grades.at<float>(i, 0);
        float predicted = results.at<float>(i, 0);
        qDebug() << actual << "\t" << predicted;
    }
}
The predictions are always 0 or 1. No decimals.
Can anyone figure out what I'm doing wrong?
As usual, the answer is so simple that I'm embarrassed.
The problem was that I was passing all my test features to the CvSVM in one go, and the batch version of predict strictly classifies each sample, hence the whole numbers. From the CvSVM documentation:
C++: float CvSVM::predict(const CvMat* samples, CvMat* results) const
However, when samples are tested individually, there is the option of getting the result as the distance from the margin, which is the float I was looking for:
C++: float CvSVM::predict(const Mat& sample, bool returnDFVal=false) const
As the documentation clearly explains:
returnDFVal – Specifies a type of the return value. If true and the problem is 2-class classification then the method returns the decision function value that is signed distance to the margin, else the function returns a class label (classification) or estimated function value (regression).
Predicting test samples individually with returnDFVal=true solved my problem.
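For completeness, here is roughly what the fixed loop in test_svm looks like (a sketch; features.row(i) is just one way to pull out a single sample):

// predict each sample individually so returnDFVal can be used
for (int i = 0; i < num_test; i++)
{
    float actual = grades.at<float>(i, 0);
    // returnDFVal=true: for a 2-class problem this returns the signed
    // distance to the margin; for regression, the estimated function value
    float predicted = svm.predict(features.row(i), true);
    qDebug() << actual << "\t" << predicted;
}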