Tags: svm, libsvm, feature-selection

libsvm: scaling of real and categorical features


I am wondering whether categorical features, after conversion to one-hot encoding (e.g. 0 0 0 1 0 0 for a variable with 6 possible values), should be scaled alongside the real-valued features using the svm-scale tool. The libsvm guide seems to suggest so.
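For concreteness, here is a sketch of the kind of preprocessing I mean, with scikit-learn standing in for svm-scale and made-up data (svm-scale rescales each column to [-1, 1] by default, or [0, 1] with -l 0):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.svm import SVC

# Hypothetical design matrix: column 0 is categorical (6 levels),
# columns 1-2 are real-valued.
X = np.array([[3, 0.5, 120.0],
              [1, 0.9,  80.0],
              [5, 0.1, 200.0],
              [0, 0.7, 150.0]])
y = np.array([0, 1, 0, 1])

pre = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), [0]),  # -> 0/1 indicator columns
    ("num", MinMaxScaler(feature_range=(0, 1)), [1, 2]),   # real features -> [0, 1]
])

clf = make_pipeline(pre, SVC(kernel="rbf"))
clf.fit(X, y)
```

Note that the one-hot indicator columns already lie in [0, 1], so they end up on the same scale as the min-max-scaled real features whether or not svm-scale touches them.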

Also, what is the effect on SVM learning of features that carry no discriminative information, e.g. pure random noise? Should I remove such features before training? My guess is that they can hurt learning, because an SVM essentially computes distances between data points represented as feature vectors. I am not much concerned with running time, as the number of features is small. Could you also point me to a standard feature-selection algorithm implementation for SVMs? Any suggestion is welcome.

Thank you.


Solution

  • You have several questions in there:

    1) Should 0-1 features get scaled?
    2) What is the effect of noise features?
    3) Should noise features be removed?
    4) If so, how?

    The general answer to (1) and (3) is to use cross-validation (or a held-out validation set): try it both ways and keep whichever scores better. If I had to guess, I'd say that scaling 0-1 features probably doesn't matter much, because SVMs are not that scale-dependent as long as all of the features are O(1), which those are. A moderate number of noise features is probably OK, too. As for (2), you are correct that noise features usually degrade SVM performance somewhat. Feature selection is a big topic; there is a decent introduction to it in the scikit-learn user guide. A sketch of both suggestions follows below.
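Here is a minimal illustration on synthetic data: compare preprocessing/selection variants by cross-validation and keep the winner. The selector (SelectKBest) and all numbers are purely illustrative, not a recommendation of one particular method:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic data: 5 informative features plus 15 pure-noise features.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_redundant=0, random_state=0)

candidates = {
    "svm": make_pipeline(MinMaxScaler(), SVC()),
    "svm + select 5 best": make_pipeline(
        MinMaxScaler(), SelectKBest(f_classif, k=5), SVC()),
}

# Whichever variant cross-validates better is the one to keep.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The same pattern works for the scaling question: put the scaled and unscaled variants in the candidate dictionary and let cross-validation decide.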